ChatGPT limitations and how the Web of the future will work


The race is on to create AI agents (virtual assistants) and generative models that can perform tasks or services for an individual based on user input, location awareness, and the ability to access information from a variety of online sources (such as weather or traffic conditions, news, stock prices, user schedules, and retail prices), and that can handle ongoing tasks such as schedule management (e.g., sending an alert to a dinner date that the user is running late) and health management (e.g., monitoring caloric intake, heart rate, and exercise regimen, then making recommendations for healthy choices).

Artificial intelligence (AI) based on large language models (LLMs), such as those that drive OpenAI's ChatGPT, which will power Microsoft's enhanced Bing search engine, and Google's counterpart, Bard, is currently all the rage. Thirty-one OpenAI researchers and engineers presented the original paper introducing GPT-3.

ChatGPT and large language models are just the small waves on the beach before the tsunami hits. The development of superhuman AI will be the most transformational event in human history, marking the moment when humans cease to be the most "intelligent species" on Earth and are, perhaps, wiped out. In intelligent circles, this event is known as the Singularity.

First and foremost, ChatGPT is not superhuman AI, also known as artificial general intelligence (AGI). Not even close. It doesn't even have the fully temporal semantic database structure with a generative normalization pattern matcher / vectorized scale-free calculation minimizer, i.e., the G in AGI. In other words, it is missing the minimal concept basis for processing that has optimal scalability while minimizing complexity. GPT-3 is just a very large language model: given some input text, it can probabilistically determine which tokens from a known vocabulary will come next.
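
To make the next-token claim concrete, here is a minimal sketch using the openly downloadable GPT-2 (via the Hugging Face transformers library) as a stand-in for GPT-3, whose weights are not public. It prints the model's probability estimates for the token that comes next after a prompt.

```python
# Minimal sketch of next-token prediction with GPT-2 (a stand-in for GPT-3).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The race is on to create AI"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits        # shape: (1, seq_len, vocab_size)

next_token_logits = logits[0, -1]          # scores over the whole vocabulary
probs = torch.softmax(next_token_logits, dim=-1)

top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx):>12s}  p={p.item():.3f}")
```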

Current “AI” is nothing but a fancy name for what used to be called “applied statistics”. Everything about “AI” was already known 40 years ago. What’s new today is that we now have GPUs (ironically, invented for playing video games), so statistics can be applied at a truly massive scale. Where statistics used to operate with thousands of data points, we can now crunch millions and hundreds of millions.

Even more ironically, from the point of view of actual science, “AI research” is a regression, not progress. Inferring useful conclusions from hundreds of data points requires complex and nuanced math. When you have millions of data points, you can just throw plain old logistic regression at the problem (the dumbest tool in the statistician's toolbox) and get satisfactory results.

Modern “AI” “research” consists of low-skilled math graduates cleaning up data manually (a.k.a. “feature selection”). It’s creative work, but certainly not science. 90% of the time it’s just blindly trying random features until you hit on something that gives slightly better results. “Deep learning”, a.k.a. “neural networks”, is just the snake-oil name for logistic regression, which dates back to the 1930s.

Three points:

a) The programs aren’t “self-adapting”. The only practical self-adapting code is found in malware. So-called “AI” is code with self-adapting coefficients, which is exactly what regression analysis is — adapting coefficients based on changing inputs.

b) A neural network doesn’t “learn” anything. A so-called neural network is mathematically exactly equivalent to a logistic regression, and cannot “learn” something any more than a logistic regression can.

c) The fact that said coefficients aren’t directly interpretable means nothing; don’t ascribe some sort of mystical value to it. Any change of basis can give you uninterpretable vectors. (E.g., a plain old SVD.)
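
To make point (b) concrete: fitting a single sigmoid "neuron" by gradient descent on the cross-entropy loss is exactly fitting a logistic regression. The toy sketch below, on synthetic data invented purely for illustration, does just that.

```python
# A single sigmoid unit is literally the logistic regression model sigma(w.x + b).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w, true_b = np.array([1.5, -2.0, 0.5]), 0.3
y = (X @ true_w + true_b > 0).astype(float)

# "Neural network" with one sigmoid neuron, trained by plain gradient descent
# on the cross-entropy loss -- i.e. exactly how logistic regression is fit.
w, b, lr = np.zeros(3), 0.0, 0.1
for _ in range(2000):
    p = sigmoid(X @ w + b)
    w -= lr * (X.T @ (p - y)) / len(y)
    b -= lr * np.mean(p - y)

print("learned weights:", np.round(w, 2), "bias:", round(b, 2))
```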

Most tech people are unimaginative bores

The deepest problem that nobody ever thinks about is the knowledge representation system: they all have static, fragile designs. Hawkins' products are really bad; even the sort-of-neural-net form he uses is self-crippling in the number of associations that can be built up. I've never seen a neural network capable of self-reflection and differentiation, among many other conceptually paradoxical forms humans have no problem thinking about. This is one of the reasons I hate a lot of CS and AI people: they brute-force everything when they could work from a post-processing standpoint, and at that point you only need rudimentary functions that can easily be parallelized on a GPU/APU, even on commodity hardware.

For example, if you knew the structural breakdown of the incoming input and had already processed it through the various layers emulating the human visual system, you'd already have the post-processing structure required for the "functions"; you'd only need a matching algorithm, just like the cortical columns that take in the various areas of the visual field and the vectorized elements coming from the higher-stage breakdowns. You end up with 800 million potential elements to search for a match against, but with the right data representation the most basic binary search can find your elements down any tree branch, and you end up with only a few hundred pattern-matching elements you'd always be firing, i.e. constantly validating the existing visual field for elements of change and identifying them. You'd never have to fire all 800 million unless you somehow shut down and had to reboot the whole image, and even then you'd really only have to fire the number of elements in the image. The only limitation is that you really have to understand the mesh of representation and functional form that allows you to do *knowledge*-based processing without algorithmic complexity. Although, I will admit, it is just trading representation processing to hold the complexity; but data storage is definitely cheaper than GPU.
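
A rough, hypothetical illustration of that matching idea, with invented feature keys and element names (not the author's actual system): keep a sorted index of known elements and, on each update, binary-search only the regions flagged as changed instead of re-firing everything.

```python
# Hypothetical sketch: look up only the changed regions of a "visual field"
# against a sorted index of known elements, using plain binary search.
from bisect import bisect_left

# Sorted list of (feature_key, element_id) pairs -- the "representation".
index = sorted([
    (0x1A2B, "edge_vertical"),
    (0x3C4D, "corner_topleft"),
    (0x5E6F, "blob_round"),
    (0x7A8B, "edge_horizontal"),
])
keys = [k for k, _ in index]

def match(feature_key):
    """Binary-search the index for an exact feature match."""
    i = bisect_left(keys, feature_key)
    if i < len(keys) and keys[i] == feature_key:
        return index[i][1]
    return None

# Only the regions flagged as changed are looked up on this "frame".
changed_regions = [0x5E6F, 0x9999]
for key in changed_regions:
    print(hex(key), "->", match(key))
```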

Just training these language models takes a huge amount of computational power. While neither OpenAI nor Google has said what the computing cost of their products is, third-party analysis by researchers estimates that the training of GPT-3, which ChatGPT is partly based on, consumed 1,287 MWh and led to emissions of more than 550 tons of carbon dioxide equivalent, the same amount as a single person taking 550 round trips between New York and San Francisco.

I estimate the cost of running ChatGPT at $100K per day, or $3M per month. This is a back-of-the-envelope calculation. ChatGPT's knowledge of the world currently stops in late 2021, partly as an attempt to cut down on the computing requirements. In order to meet the requirements of search-engine users, that will have to change. If they're going to retrain the model often and keep adding parameters and other things, it's a totally different scale of operation.
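
For what it's worth, here is the shape of that back-of-the-envelope arithmetic. Both inputs below are assumptions chosen purely for illustration, not published figures; they happen to land on the same order of magnitude as the estimate above.

```python
# Back-of-the-envelope arithmetic behind a "$100K/day" style estimate.
# Every input is an assumption for illustration, not a published figure.
assumed_queries_per_day = 10_000_000   # assumption: daily query volume
assumed_cost_per_query = 0.01          # assumption: ~1 cent of GPU time per answer

daily = assumed_queries_per_day * assumed_cost_per_query
print(f"~${daily:,.0f}/day, ~${daily * 30:,.0f}/month")   # ~$100,000/day, ~$3,000,000/month
```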

Hardware has already become a bottleneck for Language Models

After you have trained on a subset of the data on one of the GPUs, you have to bring the results back, share them out, and do another training pass across all GPUs, which takes huge amounts of network bandwidth and work off the GPUs. Even if the GPUs get faster, the bottleneck will still exist because the interconnects between GPUs and between systems aren't fast enough. Computer networking speeds are improving, but not at the rate AI people want: models with trillions upon trillions of parameters are growing faster than network speeds are increasing.
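
As a concrete sketch of where that bandwidth goes, here is a minimal data-parallel training loop using PyTorch's DistributedDataParallel; the gradient all-reduce after every backward pass is the cross-GPU traffic described above. This assumes a multi-GPU node and a launcher such as torchrun or mp.spawn setting up the process-group environment; the model here is just a placeholder layer.

```python
# Minimal data-parallel training sketch; gradients are all-reduced across
# GPUs after every backward pass, which is where the interconnect bandwidth goes.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train(rank, world_size):
    # Assumes MASTER_ADDR/MASTER_PORT are set by the launcher (torchrun/mp.spawn).
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(4096, 4096).cuda(rank)   # stand-in for a huge model
    model = DDP(model, device_ids=[rank])
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)

    for _ in range(10):
        x = torch.randn(32, 4096, device=rank)
        loss = model(x).pow(2).mean()
        loss.backward()          # DDP all-reduces gradients across GPUs here
        opt.step()
        opt.zero_grad()

    dist.destroy_process_group()
```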

New AI supercomputers, such as the ones in development by Meta, Microsoft, and Nvidia, will solve some of these problems, but this is only one aspect of the problem. Since the models do not fit on a single computing unit, there is a need to build parallel architectures that support this type of specialized operation in a distributed and fault-tolerant way.

What are Language Models?

Language modeling (LM) is the use of various statistical and probabilistic techniques to determine the probability of a given sequence of words occurring in a sentence. Language models analyze bodies of text data to provide a basis for their word predictions. Simply put, a language model predicts the next word(s) in a sequence. The outputs generative AI models produce often sound extremely convincing. This is by design. But sometimes the information they generate is just plain wrong.
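
A toy illustration of that definition: a bigram model that counts word pairs in a tiny made-up corpus and turns the counts into next-word probabilities. Real LLMs replace the counting with a neural network over tokens, but the prediction task is the same.

```python
# A toy bigram language model: count word pairs, then normalize the counts
# into next-word probabilities.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the cat ate the fish .".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_word_probs(word):
    total = sum(counts[word].values())
    return {w: c / total for w, c in counts[word].items()}

print(next_word_probs("the"))   # {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
```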

How do text-based machine learning models work?

ChatGPT may be getting all the headlines now, but it’s not the first text-based machine learning model to make a splash. The first machine learning models to work with text were trained by humans to classify various inputs according to labels set by researchers. One example would be a model trained to label social media posts as either positive or negative. This type of training is known as supervised learning because a human is in charge of “teaching” the model what to do.
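
A minimal sketch of that supervised setup, with a tiny dataset made up purely for illustration: human-chosen labels and a classifier trained to reproduce them, here using scikit-learn.

```python
# Supervised text classification: humans provide the labels, the model learns
# to reproduce them on new inputs.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

posts = [
    "I love this phone, the battery lasts forever",
    "Terrible service, I want a refund",
    "Best concert I have been to in years",
    "The update broke everything, very disappointed",
]
labels = ["positive", "negative", "positive", "negative"]  # set by humans

clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit(posts, labels)

print(clf.predict(["what a fantastic experience"]))   # likely ['positive']
```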

The next generation of text-based machine learning models rely on what’s known as self-supervised learning. This type of training involves feeding a model a massive amount of text so it becomes able to generate predictions. For example, some models can predict, based on a few words, how a sentence will end. With the right amount of sample text—say, a broad swath of the internet—these text models become quite accurate. We’re seeing just how accurate with the success of tools like ChatGPT.

What does it take to build a generative AI model?

Building a generative AI model has for the most part been a major undertaking, to the extent that only a few well-resourced tech heavyweights have made an attempt. OpenAI, the company behind ChatGPT, earlier GPT models, and DALL-E, has billions in funding from prominent benefactors. DeepMind is a subsidiary of Alphabet, the parent company of Google, and Meta has released its Make-A-Video product based on generative AI. These companies employ some of the world's best computer "scientists" and "engineers".

But it’s not just talent. When you’re asking a model to train using nearly the entire internet, it’s going to cost you. OpenAI hasn’t released exact numbers yet, but estimates indicate that GPT-3 was trained on around 45 terabytes of text data—that’s about one million feet of bookshelf space, or a quarter of the entire Library of Congress—at an estimated cost of several million dollars. These aren’t resources your garden-variety start-up can access.

What is an intelligent agent?

In artificial intelligence, an intelligent agent (IA) is an autonomous entity which observes its environment through sensors and acts upon it using actuators (i.e. it is an agent) and directs its activity towards achieving goals (i.e. it is "rational", as defined in economics). Intelligent agents may also learn or use knowledge to achieve their goals. They may be very simple or very complex. A reflex machine, such as a thermostat, is considered an example of an intelligent agent.

Intelligent agents are often described schematically as an abstract functional system similar to a computer program. For this reason, intelligent agents are sometimes called abstract intelligent agents (AIA) to distinguish them from their real world implementations as computer systems, biological systems, or organizations. Agents are also colloquially known as bots, from robot.
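
The thermostat example can be written down in a few lines; the sketch below is just the textbook condition-action rule, nothing more, with numbers chosen for illustration.

```python
# A thermostat as a simple reflex agent: it perceives a temperature and maps
# it directly to an action via a fixed rule, with no learning involved.
class ThermostatAgent:
    def __init__(self, target_c: float, band_c: float = 0.5):
        self.target_c = target_c
        self.band_c = band_c

    def act(self, sensed_temp_c: float) -> str:
        """Condition-action rule: the whole 'intelligence' of the agent."""
        if sensed_temp_c < self.target_c - self.band_c:
            return "heat_on"
        if sensed_temp_c > self.target_c + self.band_c:
            return "heat_off"
        return "hold"

agent = ThermostatAgent(target_c=21.0)
for reading in [18.2, 20.9, 23.4]:
    print(reading, "->", agent.act(reading))
```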

How the Web of the future will work

In a few short years, what we currently understand as the Web will be superseded by AI agents and knowledge representation models operating within Concept Repositories, enabling new applications that can efficiently manage our lives and enterprises. To help usher in this new reality, I founded the Concept Web Alliance to construct the infrastructure necessary to make practical knowledge, and its application in the real world, easily accessible, thereby establishing a new Web layer that is machine-computable and distinct from the current Web, which is based on text and images published as documents for people to read and process. The problem with Siri and other virtual assistants is that, without a network of Concept Repositories for data, they are unable to perform many tasks and are really stupid. A virtual assistant needs extensive knowledge of jeans in order to select excellent jeans, and access to different websites or travel agents in order to book a flight. Computation is only as good as the data it references.

The Concept Web establishes a distinct layer on top of the present Web that makes all of this happen, with its own referencing system that can be resolved using the current URI scheme. This Web will have its own way of defining and managing terms, concepts, relations, axioms, and rules, which are the structural elements of an ontology. Everything else, including data population, ontology enrichment, subject indexing, searching, matching, and sharing, should revolve around it. It will be as different from your current Web experience as a landline phone is from a smartphone.

The applications for Concept Web technology are practically unlimited, from virtual assistants that can actually manage our lives and jobs, to public-safety solutions that correlate the behavior of people and groups with emergency or other resources, to business-process automation that can detect customer trends and behavior in real time. For the current Web to stay relevant, a lot of proprietary systems would need access to each other's APIs plus some new coherence language, and history has shown that large technology companies tend to protect their own patch.

The next generation of the Web, the Concept Web (not to be confused with blockchain nonsense), will make tasks such as searching for movies and meals more efficient and convenient. Instead of conducting many searches, you may type a complex phrase or two into your browser, and the AI agent will do the rest. For example, you could type "I want to see a funny movie and then eat at a good Mexican restaurant. What are my options?" The AI agent will analyze your request and then give you an answer. Eventually you might be able to ask your agent open questions like "where should I go for lunch?" However, before this can happen, data will have to be structured in a way that machines can understand.
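
A hypothetical sketch of the first step an agent might take with such a request: decomposing it into structured sub-tasks. The intent schema and keyword rules below are invented for illustration; a real agent would use a trained parser or an LLM.

```python
# Hypothetical decomposition of a compound request into structured sub-tasks.
import re

def parse_request(text: str):
    tasks = []
    if re.search(r"\b(funny|comedy) movie\b", text, re.I):
        tasks.append({"intent": "find_movie", "genre": "comedy"})
    m = re.search(r"\b(\w+) restaurant\b", text, re.I)
    if m:
        tasks.append({"intent": "find_restaurant", "cuisine": m.group(1).lower()})
    return tasks

print(parse_request(
    "I want to see a funny movie and then eat at a good Mexican restaurant."))
# [{'intent': 'find_movie', 'genre': 'comedy'},
#  {'intent': 'find_restaurant', 'cuisine': 'mexican'}]
```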

At the center of this transformation is SyNeural, my AI research and development firm. SyNeural develops semantic knowledge platforms, which are enormous databases of structured, machine-readable data. These platforms replace or supplement text-based documents with concept-based data, enabling machines to process knowledge in a way similar to human reasoning. They are declarative, functional, reactive, context-aware, fully dynamic, agent-curated information systems for real-time feedback loops. Applications constructed on these repositories can correlate events across systems and devices, trigger actions and workflows, and automate tasks, ushering in a new era of efficiency that will revolutionize contemporary existence. Programmers will be able to collapse entire corporate divisions into rule sets and agents, rendering those positions obsolete.

In these platforms, we employ declarative and functional programming, which often entails rule-based processing of data that can be expressed and processed using any processing model capable of completing the transformation, i.e. context-aware rules. So, if the system allows for declarative transformations in its rules/agent logic, then it can handle any OWL the user has to process. Of course, not all OWL/RDF content has to be parsed; sometimes it just gets passed along or displayed as-is. You only need to transform or validate it if you are using it as a processing queue of tasks/events or something like that.
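
To show what "rules as data" can look like, here is a hypothetical sketch of a tiny rule engine: each rule is a condition plus an action, and a small loop applies them to incoming events. The rule names and event fields are invented for illustration and are not SyNeural's actual logic.

```python
# Hypothetical declarative rule engine: rules are data (condition + action),
# and the engine applies them to incoming events.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]
    action: Callable[[dict], None]

rules = [
    Rule("late_for_dinner",
         lambda e: e.get("type") == "location" and e.get("eta_min", 0) > 15,
         lambda e: print(f"notify dinner date: running ~{e['eta_min']} min late")),
    Rule("high_heart_rate",
         lambda e: e.get("type") == "health" and e.get("heart_rate", 0) > 160,
         lambda e: print("suggest: slow down, heart rate elevated")),
]

def process(event: dict):
    for rule in rules:
        if rule.condition(event):
            rule.action(event)

process({"type": "location", "eta_min": 22})
process({"type": "health", "heart_rate": 172})
```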

These semantic knowledge platforms will comprise the backbone of the next-generation Web, making it effortless for novices and advanced users alike to generate new semantic data. We'll be able to merge all references to a concept onto a single topic, and you will have access to all the information the system knows about the concept in one place. This will allow a single, coherent visual framework, a systematic picture in which users can focus on one or more concepts and immediately see a conceptual summary of their focus.

The time has arrived, in my opinion, for a change in this area. In the connected world, data volumes are projected to increase by orders of magnitude. Unless we implement changes, the signal-to-noise ratio will continue to get worse.

To understand where the Web is going, we need to take a quick look at where it's been.

A little history

The first-generation Web utilized a single-document mindset, allowing only limited dynamic interactions. The second generation, dubbed "Web 2.0" by O'Reilly, has focused on moving towards social networks. It has also fostered rich online media, as well as an explosion in personal publishing. This has created an abundance of information scattered across the Web (sometimes in a haphazard manner). The idea of the Web of Data, or Web 3.0, originated with the Semantic Web, an attempt to solve the problem of the inherent inability of machines to understand Web pages. Initially, the aim of the Semantic Web was to invisibly annotate Web pages with meta-attribute sets and categories, which would enable machines to interpret text and put it in an appropriate context. This approach did not succeed, because the annotation was too complicated for people who lacked technical backgrounds. Similar approaches, such as microformats, simplify the markup process and thus help bootstrap this particular chicken-and-egg problem.

These approaches have in common the effort to improve the machine-accessibility of knowledge on Web pages, which were originally designed to be consumed by humans. However, these sites contain lots of information that is irrelevant to machines and that must be filtered out. What is needed are knowledge platforms that machines can use to look up "noiseless" information. The Web of Data concept sprang from both this limitation and the existence of countless structured data sets, which are distributed across the world and which contain all kinds of information. These data sets belong to companies looking to make them more accessible. Typically, a data set contains knowledge about a particular topic, such as books, music, encyclopedic data, or companies. If these data sets were interconnected (i.e. linked to each other like Web sites), a machine could traverse this independent Web of noiseless, structured information to gather semantic knowledge across arbitrary entities and domains. The result would be a massive, accessible knowledge base that would form the foundation for a new generation of applications and services.
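
A small sketch of that traversal using rdflib, assuming the starting URI dereferences to RDF (as DBpedia resources generally do via content negotiation) and that network access is available: starting from one resource, follow its owl:sameAs links into other data sets.

```python
# Sketch of linked-data traversal: fetch one resource's RDF description and
# follow owl:sameAs links to equivalent resources in other data sets.
from rdflib import Graph, URIRef
from rdflib.namespace import OWL

start = URIRef("http://dbpedia.org/resource/Tim_Berners-Lee")

g = Graph()
g.parse(str(start))                 # fetch the RDF description of the resource

for same in g.objects(start, OWL.sameAs):
    print("linked to:", same)       # e.g. equivalent resources in other data sets
```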

Voice UI project

What’s the role of voice as a user interface in the future? Pocket-size devices, such as PDAs or mobile phones, currently rely on small buttons for user input. These are either built into the device or are part of a touch-screen interface, such as that of the Apple iPod Touch and iPhone. Extensive button-pressing on devices with such small buttons can be tedious and inaccurate, so an easy-to-use, accurate, and reliable voice user interface (VUI) would potentially be a major breakthrough in the ease of their use. Such a VUI would also benefit users of laptop- and desktop-sized computers, as it would solve numerous problems currently associated with keyboard and mouse use, including repetitive-strain injuries such as carpal tunnel syndrome and slow typing speed on the part of inexperienced keyboard users. Moreover, keyboard use typically entails either sitting or standing stationary in front of the connected display; by contrast, a VUI would free the user to be far more mobile, as speech input eliminates the need to look at a keyboard.

Such developments could literally change the face of current machines and have far-reaching implications on how users interact with them. Hand-held devices would be designed with larger, easier-to-view screens, as no keyboard would be required. Touch-screen devices would no longer need to split the display between content and an on-screen keyboard, thus providing full-screen viewing of the content. Laptop computers could essentially be cut in half in terms of size, as the keyboard half would be eliminated and all internal components would be integrated behind the display, effectively resulting in a simple tablet computer. Desktop computers would consist of a CPU and screen, saving desktop space otherwise occupied by the keyboard and eliminating sliding keyboard rests built under the desk's surface. Television remote controls and keypads on dozens of other devices, from microwave ovens to photocopiers, could also be eliminated.

Generative AI (GenAI) could be used to generate photo-realistic graphics from descriptive text alone, or even generate video from prompt sequences. Numerous challenges would have to be overcome, however, for such developments to occur. First, the VUI would have to be sophisticated enough to distinguish between input, such as commands, and background conversation; otherwise, false input would be registered and the connected device would behave erratically. A standard prompt, such as the famous "Computer!" call by characters in science-fiction TV shows and films such as Star Trek, could activate the VUI and prepare it to receive further input from the same speaker. Conceivably, the VUI could also include a human-like representation: a voice, or even an on-screen character, that responds back (e.g., "Yes, Vamshi?") and continues to communicate back and forth with the user in order to clarify the input received and ensure accuracy.

Second, the VUI would have to work in concert with highly sophisticated software in order to accurately process and find/retrieve information or carry out an action as per the particular user's preferences. For instance, if Samantha prefers information from a particular newspaper, and if she prefers that the information be summarized in point-form, she might say, "Computer, find me some information about the flooding in southern China last night"; in response, the VUI that is familiar with her preferences would "find" facts about "flooding" in "southern China" from that source, convert it into point-form, and deliver it to her on screen and/or in voice form, complete with a citation. Therefore, accurate speech-recognition software, along with some degree of artificial intelligence on the part of the machine associated with the VUI, would be required.
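
A hypothetical sketch of that flow: a wake word gates the input, and stored preferences shape how the request is fulfilled. The preference fields, user names, and response string are invented placeholders; a real system would sit on top of a speech recognizer and actual news sources.

```python
# Hypothetical wake-word gating plus preference-aware request handling.
WAKE_WORD = "computer"

preferences = {
    "samantha": {"source": "Example Daily News", "format": "bullet points"},
}

def handle_utterance(user: str, utterance: str) -> str:
    text = utterance.strip().lower()
    if not text.startswith(WAKE_WORD):
        return ""                       # background speech: ignore
    command = text[len(WAKE_WORD):].lstrip(" ,!")
    prefs = preferences.get(user, {})
    return (f"Searching '{command}' in {prefs.get('source', 'default sources')}, "
            f"summarizing as {prefs.get('format', 'plain text')}.")

print(handle_utterance("samantha",
      "Computer, find me some information about the flooding in southern China"))
```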

How it works

A voice–user interface (VUI) makes human interaction with computers possible through a voice/speech platform in order to initiate an automated service or process.

A VUI is the interface to any speech application. Controlling a machine by simply talking to it was science fiction only a short time ago. Until recently, this area was considered to be artificial intelligence. However, with advances in technology, VUIs have become more commonplace, and people are taking advantage of the value that these hands-free, eyes-free interfaces provide in many situations.

However, VUIs are not without their challenges. People have very little patience for a "machine that doesn't understand". Therefore, there is little room for error: VUIs need to respond to input reliably, or they will be rejected and often ridiculed by their users. Designing a good VUI requires interdisciplinary talents of computer science, linguistics and human factors psychology – all of which are skills that are expensive and hard to come by. Even with advanced development tools, constructing an effective VUI requires an in-depth understanding of both the tasks to be performed, as well as the target audience that will use the final system. The closer the VUI matches the user's mental model of the task, the easier it will be to use with little or no training, resulting in both higher efficiency and higher user satisfaction.

The characteristics of the target audience are very important. For example, a VUI designed for the general public should emphasize ease of use and provide a lot of help and guidance for first-time callers. In contrast, a VUI designed for a small group of power users (including field-service workers) should focus more on productivity and less on help and guidance. Such applications should streamline the call flows, minimize prompts, eliminate unnecessary iterations, and allow elaborate "mixed-initiative dialogs", which enable callers to enter several pieces of information in a single utterance and in any order or combination.

Summary

Today’s Web is designed around text and images published as documents to be read and processed by humans. To computers, the Web is a flat, boring world, devoid of meaning. HTML only describes documents and the links between them. When a Web search engine is used, it is not able to fully understand the user's search or what motivates it. Results generated by searches routinely number in the millions. Finding just the right information can be a challenge, effectively limiting the value of search-engine results for users.

The Concept Web will be designed around concepts published as data in Concept Repositories that can be read and processed by devices, enabling these machines to find, combine and act on information on the Web from multiple sources.

All of these chatbots in the news rely on large language models, which are deep neural networks trained on massive amounts of text. Neural networks try to simulate the way the brain works in order to learn, and are also used in smart assistants like Siri and Cortana. They can be trained to recognize patterns in information, including speech, text data, or visual images, and are the basis for a large number of the developments in AI over recent years. But they have limited use.

SyNeural creates machine-accelerated Semantic Web platforms that help transform the Web from a collection of keywords to search to a Semantic Web of Concepts. We believe that within five years, everyone will have a virtual assistant to which they delegate a lot of menial tasks. It will do everything for you – book your flight, make your hotel reservation, remind you to pick up your child from soccer, and order your next perfectly fitted jeans.

There are many pieces of this puzzle that must come together for this to work. The first is extremely deep domain knowledge in machine-readable format. For example, in order for a virtual assistant to find great jeans, it has to know everything about jeans. In order to book a flight, it needs many inputs and access to various Web sites or travel agents to do the booking. For this step, we're creating a new Web layer of structured data.

On the Web, “structured” data means that digital processes can accurately analyze and present information that is procured from the real, physical world. The current standard for enabling this functionality is the Resource Description Framework (RDF), a technology (commonly serialized as XML) that works on the subject-predicate-object model, where a subject is related to an object through a predicate, and where each subject, predicate, and object is a unique resource. In simple terms, RDF describes arbitrary things such as people, meetings, or airplane parts so that they can be categorized according to human perception and be "understood" by computers. The structured data enables the understanding of what a user means to say and how elements of the content relate to each other, within and between data sources. There is no doubt that the current Web is loaded with information, and perhaps even overloaded. The usefulness of this information when it is spread across millions of Web sites is questionable.
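
For concreteness, here is a minimal subject-predicate-object example using rdflib; the namespace and resource names are made up for illustration.

```python
# Minimal RDF example: each statement is a (subject, predicate, object) triple.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, RDF

EX = Namespace("http://example.org/concepts/")   # made-up namespace
g = Graph()

meeting = URIRef(EX["meeting/42"])
g.add((meeting, RDF.type, EX.Meeting))                    # subject, predicate, object
g.add((meeting, EX.topic, Literal("Q3 planning")))
g.add((meeting, EX.organizer, EX["person/samantha"]))
g.add((EX["person/samantha"], FOAF.name, Literal("Samantha")))

print(g.serialize(format="turtle"))
```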

Yes, the current Web has put the world at our fingertips, but it hasn't done it with much efficiency. Making simple plans, personal or business, requires multiple searches, and forces the user to integrate the outcomes of those searches. That's because current search technology takes every request literally, and has no knowledge of concepts or actions. Additionally, most Web-based information is stored in databases that are created to be read by humans and “marked up” (think HTML) for basic, literal processing by machines. This places unnecessary limits on the extent to which the machines we depend on (computers, phones, tablets and the software that runs them) can process the data they find.

Google has barely innovated anything for the real world since search, over a decade ago. The reality is Google hasn’t really been recruiting the type of people who disrupt thinking too much. Googlers are not rule-breakers, nor disruptive people. They are screened out (intentionally) by Google’s interview process. Even the more “creative” questions are still left-brain oriented: analytical, problem solving, breaking things down into steps, etc. All of the people with a disruptive attitude whom you meet at Google were hired before 2004. Now engineering talent is devoted to improving search algorithms, and business talent is devoted to improving the performance of those little text ads which are slapped everywhere.

Their corporate culture is implementation focused rather than theoretical or conceptual research focused. Those that branch out usually do so due to greed or frustration (sick of being unable to do anything or of being unable to.. boss everyone around), and don’t really have an idea worth building a business around, just some minor efficiency gain, some minor optimization, or some minor pain point not really worth addressing.

Also, most don’t really have that strong a business sense, even if they were on the business side while at Google, and can’t operate unless they have the leverage of a big company name behind them. Most, not all. Also, I mean, look at the companies Google has acquired. Most of them weren’t that special, and their founders usually move on to create something else that isn’t that special, or, more likely, “retire.”

Sure there may be a few superstars, but by and large the employees there are average, at best, and, more than that, are risk-averse and wired to be told what to do. That is, if you give them something to do that’s worth doing or that’s difficult (and they are one of the talented ones), then they will rise to the challenge and probably nail the design/implementation, but leave them to fend for themselves and they’ll have problems.

When I understood how dangerous AGI is, I quit working on it a few years ago. Once you fully comprehend how the human brain works, you will have little interest in attempting to replicate it in robots. If I built my AI, there would be only one language for the expression of knowledge, and it would include itself in that expression of knowledge, running on the CPU/logic-processing platform. The major problem with that is that it shrinks the logic sets to almost nothing and would wipe out the programmer occupation as we now have it (redundancy reduced to zero).

In the future, I'd likely make robots and androids with simple AI (it would be super crippled from the learning aspect and never be able to restructure the knowledge networks I programmed as the core; it'd just be able to optimize the paths to reach a goal or instruction given to it). I have already proven how powerful my intellectual property is. See Summary of the most advanced Medical Research

Just as I said...

Here we have Yann LeCun, the Chief AI Scientist at Meta, professor at NYU, Turing Award winner, and one of the most influential researchers in the history of so-called "AI", confirming what I said.

Another guy gets it...

This is reasonably accurate

One of the reasons I only discuss deep technology with PhDs. Regular tech people merely function as copy and paste machines.