Francesco Gadaleta PhD is a seasoned professional in the field of technology, AI and data science. He’s the founder of Amethix Technologies, a firm specialising in advanced data and robotics solutions. He hosts the popular Data Science at Home podcast, and over his illustrious career he’s held key roles in the healthcare, energy, and finance domains.
Francesco’s professional interests are diverse, spanning applied mathematics, advanced machine learning, computer programming, robotics, and the study of decentralised and distributed systems.
In this post, Francesco takes us on a deep dive into the GPT family of models. He explores what sets GPT apart from previous LLMs, and considers their limitations. While GPT models are powerful tools with huge potential, Francesco advises using them with caution:
When it comes to artificial intelligence chatbots, there is surprisingly little hype around the models published in the public domain compared to those available from the big players in artificial intelligence. I am very much against hype, and against the idea that these complex models, which mimic a human being, are anywhere close to what we define as artificial general intelligence. We are still not on that track – whether that’s unfortunate or fortunate, I don’t know. But that’s a fact. It would take a lot to explain what ChatGPT is, but more importantly, what should we expect from this type of model?
I’m also unsure about the enthusiasm around ChatGPT. I was never a big fan of the GPT family of models, but I’ve slightly reconsidered my position. I won’t say I’m super excited about these models – I’ve been playing quite extensively with them in the last few weeks, and there are still things missing. The model also behaves the way a large language model of this type is expected to behave, regardless of what people say or the general public’s enthusiasm. It’s a very powerful model, that’s undeniable. It’s also very fun to use. I personally use it to create poems about pretty much anything that happens in my life, just for fun, or to describe situations my friends and colleagues are involved in, in the form of a sonnet. That’s how I personally use ChatGPT.

Of course, ChatGPT can be used for more important things, and for tasks that can help you in your daily job, if you use it with parsimony. That’s my advice. It’s not a silver bullet against anything or everything. You should always double check, or fact check, all the answers that ChatGPT gives you, because there is a point at which ChatGPT starts guessing things and inventing things that probably never existed, while making these facts look real. If you consume the answer of a ChatGPT session without double checking, you may get into trouble if you’re using that answer for something important.

I used the word ‘guess’, and not by coincidence, because a guessing game is probably the closest exercise. Such a game was in fact invented by Claude Elwood Shannon – and there is an amazing book about that. He created a game which he named the ‘guessing game’.
This was essentially a way to teach computers to understand language, back in the days before artificial intelligence had even been invented as a field. Claude Shannon was the pioneer of a lot of the technological advances we now take for granted, especially in communication and artificial intelligence – in particular NLP, or language understanding. NLP was not even a term at the time. Shannon invented this game in 1951, and it consisted of guessing the next letter. If you know what ChatGPT does, and what the whole family of GPT models does, they are doing exactly the same thing, but on a word basis.
The models are guessing the next word given a certain context. There are several papers and plenty of tutorials out there that go into the technicalities of how ChatGPT works, but I would like to give you an explanation of what it is and what you should expect from a model of this type. The way ChatGPT has been trained – and how all the GPT family of models have been trained – is essentially by guessing the next word given a certain context. This game gets interesting if you want to play it at a human level, because you need to understand the context. In Shannon’s case, you need to guess the next letter; in ChatGPT’s case, in order to generate and correctly guess the next character or word, you need to understand the context very well. This is why training or building models of this type is strictly related to understanding language: you could not generate that letter or that word if you did not understand the context, and the context can be pretty much anything. It can be philosophy, religion, technical content, news, or politics; you name it.
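To make the next-word guessing concrete, here is a minimal sketch using the Hugging Face transformers library. ChatGPT itself is not downloadable, so GPT-2 stands in as an illustrative model: given a context, it assigns a probability to every possible next token, and generation is simply this guess repeated over and over.

```python
# Minimal sketch of next-word (next-token) guessing with a GPT-style model.
# GPT-2 is used here purely as a stand-in; ChatGPT itself is not public.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

context = "The capital of Italy is"
inputs = tokenizer(context, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# Probability distribution over the next token, given the context so far.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item())!r}: {prob.item():.3f}")
```

The model never checks whether its guess is true; it only proposes the most plausible continuation of the text it has seen.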
A model of this kind is not really expected to guess the next word correctly almost all of the time (say, 99% of the time). But let’s assume it did. That would suggest the model understands the context and can therefore guess the next letter or word correctly, which is not quite what is happening. It is partially the case, but it is also the case that these models are equipped with billions and billions of parameters.
Something has definitely changed in moving from, for example, 60 billion parameter models (which is already an amazing number of parameters) to the 175 billion or more parameters of the models we are dealing with today. There is a tipping point where something different starts happening from the perspective of the model.
It could also be that the model is so big that it effectively starts memorising things, because it has much more capacity: more space, in terms of number of parameters, to store and memorise whatever is provided in the training set. That could be the case. That was my very first conclusion about these large language models: the day they come up with, let’s say, a trillion parameter model, we will have an amazing lookup table that is much more powerful than a simple lookup table, because it can look up things that are similar and not just an exact match.
“It’s not a silver bullet against anything or everything. You should always double check, or fact check all the answers that ChatGPT gives you.”
A lookup table allows you to search for and find targets exactly as they are stored in your database or your storage.
By using hashing or other techniques, one can do that very fast, for example in constant time or in O(log n) time, so ChatGPT looked to me more like a big lookup table. In fact, the family of GPT models looks like a big lookup table on steroids, because these models can take text similarity and paragraph similarity into account. The concept of similarity is much more powerful than the concept of an exact match (which has existed since the ’60s or ’70s, or even before). It is powerful, but it is a mechanical thing. It’s not something that can generate the same level of enthusiasm in humans.
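As a rough illustration of that difference, here is a toy contrast between an exact-match lookup and a similarity-based one. The character n-gram embedding below is a deliberately crude stand-in for the learned representations a real model would use; the point is only that the similarity version returns the closest entry rather than failing on anything that is not a literal match.

```python
# Toy contrast: exact-match lookup vs. similarity-based ("on steroids") lookup.
import math
from collections import Counter

facts = {
    "capital of italy": "Rome",
    "capital of france": "Paris",
}

def exact_lookup(query):
    # Classic lookup table: hashing gives fast answers, but only for exact keys.
    return facts.get(query)

def embed(text, n=3):
    # Crude stand-in for an embedding: a bag of character trigrams.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm

def similarity_lookup(query):
    # Return the value whose key is most similar to the query, not an exact match.
    q = embed(query)
    best_key = max(facts, key=lambda k: cosine(q, embed(k)))
    return facts[best_key]

print(exact_lookup("what is the capital of italy"))       # None: no exact key
print(similarity_lookup("what is the capital of italy"))  # Rome: closest key wins
```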
ChatGPT is the combination of three or more different modules that were not present in earlier models. This is where I started changing my opinion about these models. When you combine the three modules that I’ll go on to discuss, you get something that is much more powerful than the classic language models we were used to until several months ago.
First of all, the GPT family of models is based on the concept of instructions. Before getting there, we have to say that when these models get trained, they are trained with a massive amount of text, and this text can come from pretty much anywhere. It can come from forums, chats, or websites. The whole of Wikipedia and Reddit, millions or billions of publicly available articles, have been used to train these models. When it’s time to train them, the amount of text they are exposed to is incredible.
However, despite the amount of available training data, there is something missing: a connection with the external world. Outside of that text, there is nothing. If you have a concept in textual format, like ‘the sky is blue’ or ‘the colour blue’, it might be associated with other concepts that are present in the text, like a chair, a table, and so on.
But there is no concept of the outside world, of the scenario that concept relates to, or of what that concept actually refers to. That’s obvious, because the only input these models receive is text, while human beings receive many more types of input. Humans have perceptions that come from pretty much all their senses: we can read text, we can see, we can hear, we can touch, we can feel. That is probably the biggest limitation of machine learning models, and it’s normal, because one is a model – a mathematical construct or an algorithm – while the other is an organism, a human, which is even more complex than a simple organism.
“The third novel concept that is now a first-class citizen in ChatGPT is Reinforcement Learning from Human Feedback, RLHF.”
With this said, there have been strategies used to train these models – and I refer to the entire family of GPT models – which are trained using instructions. Instructions are given by humans at training time in order to describe a task. For example, today you can ask ChatGPT to translate a certain text into another language. That is possible because, during training, someone has instructed the model with a keyword that looks like
TRANSLATE <input> <output>, letting the model learn that when there is a request to translate an input text into another language, it should generate something similar to the output text. The same happens for a summary or a description of a context.
If one asks ChatGPT to provide the summary of a text, that’s possible because, during training, someone instructed the model with a summary example: the instruction itself, the text to summarise, and the summarised text as the answer to that instruction.
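As a sketch, instruction-formatted training data might look something like the records below. The exact keywords and formatting used for ChatGPT are not public, so the field names and the flattening function here are purely illustrative of the idea.

```python
# Illustrative (not actual) instruction-formatted training examples.
instruction_examples = [
    {
        "instruction": "TRANSLATE",
        "input": "Il cielo è blu.",
        "output": "The sky is blue.",
    },
    {
        "instruction": "SUMMARISE",
        "input": "A long article about Claude Shannon's 1951 guessing game ...",
        "output": "Shannon proposed guessing the next letter as a way to model language.",
    },
]

def to_training_text(example):
    # Flatten each record into a single string, so the model learns to produce
    # the output whenever it sees the instruction followed by the input.
    return f"{example['instruction']}: {example['input']}\n{example['output']}"

for ex in instruction_examples:
    print(to_training_text(ex))
    print("---")
```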
The same goes for many more instructions one can play with on ChatGPT. I like playing with prompts myself, such as “from this story, make a poem out of it”. That’s my favourite these days.
The concept of instructions is relatively novel and powerful. It is powerful because it allows one to create a bridge between the text and the outside world. As a matter of fact, it’s an artificial way to bridge what is in the text to what is not, mitigating one of the biggest limitations of machine learning models and, more specifically, language models.
The second feature that characterises ChatGPT, and makes it different from the models we have been playing with until now, is dealing with non-natural languages; for example, programming languages.
Programming languages are non-natural languages. In fact, they are formal languages: languages that are parsed and understood by a machine or by another algorithm to generate, for example, machine code. Java, C, C++, Rust, and Python are all programming languages, and ChatGPT has been trained on programming languages too.
The amount of information that a code snippet carries can be incredible, due to the comments, headers, and descriptions developers augment their code with. There are even entire discussions written about code snippets, and entire papers with code in which the paper describes exactly what the authors have done. There is enough material for a 175 billion parameter model to learn the most subtle relationships between comments and code. In summary, yet another way to bring the context out of the text.
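To give a feel for the kind of material involved, here is a small example of that code-plus-commentary pairing: the docstring and comments tie natural language directly to the code beneath them, and that pairing is exactly the relationship a large model can pick up during training. The function itself is incidental.

```python
def moving_average(values, window):
    """Return the simple moving average of `values` over a sliding `window`.

    Example: moving_average([1, 2, 3, 4], window=2) -> [1.5, 2.5, 3.5]
    """
    # Slide the window one position at a time and average each slice.
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]
```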
The third novel concept that is now a first-class citizen in ChatGPT is Reinforcement Learning from Human Feedback, RLHF.
It allows a human to always keep control over the model. Keeping a 175 billion parameter model from derailing a conversation is anything but an easy task.
We know that deep learning models suffer whenever they are used in generative mode, that is, when they generate data (text, images, sound) instead of performing predictions. The worst can happen when such models start generating “concepts” that were not present in the training set. We have seen hyper-racist models in the past, and chatbots impersonating Hitler. To avoid situations like those, the developers and designers of ChatGPT have introduced a human factor that rewards the algorithm accordingly.
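As a heavily simplified sketch of the idea – not OpenAI’s actual pipeline – the loop below samples several candidate responses, scores them with a stand-in for a reward model trained on human preferences, and picks out the preferred and rejected responses that a real policy-gradient update (for example PPO) would use to nudge the model.

```python
# Toy sketch of the reinforcement-learning-from-human-feedback idea.
# Both functions below are stand-ins: real RLHF trains a reward model on
# human preference rankings and optimises the language model against it.
import random

def generate_candidates(prompt, n=4):
    # Stand-in for sampling several responses from the language model.
    return [f"candidate response {i} to: {prompt}" for i in range(n)]

def reward_model(response):
    # Stand-in for a model trained to predict which responses humans prefer
    # (helpful, harmless, honest). Here it is just random noise.
    return random.random()

def rlhf_step(prompt):
    candidates = generate_candidates(prompt)
    ranked = sorted(candidates, key=reward_model, reverse=True)
    preferred, rejected = ranked[0], ranked[-1]
    # In the real algorithm, the gap between preferred and rejected responses
    # drives a policy update that steers the model towards preferred behaviour.
    return preferred, rejected

print(rlhf_step("Explain RLHF in one sentence"))
```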
In my opinion, it’s the combination of these three things that makes the difference in what ChatGPT, the family of GPT models, and all these large language models can provide in terms of experience and usability. However, there are limitations, and I must be critical here, especially when I read claims online that these models are approaching artificial general intelligence or that they will soon take over. I have read extensively about these models, and while they have impressive capabilities, they can also be dangerous.
The very first versions of GPT models were withheld by their own developers at OpenAI because they were considered too dangerous, in the sense that people could have abused the way GPT models generate text and used it to create fake news or spread false information. These models are also highly biased, contain stereotypes, and do not understand language very well.
While it is important to acknowledge these limitations, I do not believe they are a significant issue. As always, it depends on how the models are used.
For example, Google’s search engine does not understand text in human terms, yet it provides accurate results most of the time. Similarly, language models like ChatGPT can be used for specific purposes without requiring a deep understanding of language. These models do not need to understand language in the way that humans do.
It is crucial to double-check or even triple-check the answers generated by these models, even if they make grammatical and semantic sense. Generated text can seem smooth and convincing, but it may contain contradictions or be logically impossible. Additionally, models of this type do not have a notion of time since they are trained on snapshots of data.
This means that, for example, if one asks who the president of the United States or Italy is, and who the presidents before were, ChatGPT cannot reliably answer questions of this type, because there’s no notion of time. Having no notion of time means there’s no way of telling which fact came first: president A and president B are both valid, because both were presidents at some point, just in two different timeframes. There is no notion of time, and there is no notion of knowledge either. The so-called knowledge awareness is not there: the model has no concept of what it knows and does not know. That’s why it can make things up and mix them with real facts coming from the training set, and the generated facts would still look legitimate. Because there is no awareness of knowledge, there is no knowledge of knowledge. Another thing I found is that ChatGPT struggles with numbers and maths.
Mathematics is not a piece of cake for these models, beyond the usual ‘two plus two’ and similar arithmetic questions. For the rest, there is no human-level capability for performing mathematics, and there is a reason behind that: the representation of mathematical concepts comes from text, and there are much better ways to represent numbers and mathematical concepts than text. These are some of the limitations I have in mind; there are probably many more. Another, for example, is the sheer number of parameters, which I see as a big limitation because it doesn’t really help the democratisation of these models and their availability.
We have to hope that OpenAI keeps ChatGPT available to the public. The day they shut it down, we will have no ChatGPT, and we will have to wait for the next player with the financial capacity and the infrastructure to provide a massive model like this one to the rest of the world. There is no democratisation in that respect. There’s also no democratisation in the way these models get trained: for that, one needs massive infrastructure and lots of data. These are very data-hungry problems; terabytes of data is not even an exaggeration. Such requirements definitely restrict the number and type of people and organisations who can build and work with models of this calibre.