We talk to Dr. Jan Kocoń from the Department of Artificial Intelligence about how large language models hallucinate, how they detect gaps in their knowledge, whether they gain consciousness, and about the enduring advantage of humans over the Internet and technology.
Artificial intelligence is now a term used in almost every context, even though non-specialists tend to understand the concept in very different ways. It may seem that artificial intelligence knows everything, or almost everything, but this is obviously not true. And here a new, increasingly popular term appears: hallucinations. What is this all about? Fabrication, lies, mistakes?
We can define a model's hallucinations in terms of the way the model responds. The model answers as if it were fully convinced of the truth of its answer. Everything is fine when it gives true information. However, when it provides false information with the same certainty, we have a hallucinating AI. With the methods we currently have, hallucinations are easiest to identify when it comes to facts, because facts are easy to verify. On the other hand, if we consider creative answers, which by their nature differ even when the same question is asked repeatedly, then from the perspective of our detection methods we are also dealing with a hallucination.
Hallucination is therefore a feature of the network. It is not a mistake, but a property that we want to fight when we ask about facts, and that we appreciate when we expect creative answers, i.e. when the point is not established truth but creative diversity.
How can we make the model know when to provide reliable facts and when to provide creative answers? Is it people who should learn how to ask questions properly? Or should the model be trained to allow itself to hallucinate only when we expect it to?
Tough question. This is all still very fresh, and we are still looking for solutions. One of the methods we have adapted for hallucination detection treats the language model a bit like a person under interrogation: we ask it the same question over and over again.
Waiting for it to make a mistake?
Rather, checking whether its answers are consistent. Sometimes the model may use different words to give the same answer. In such a case it is not a hallucination; there is semantic consistency across the group of answers to the same question. But there may also be a situation in which the answers differ every time. If we ask a question about facts and the model gives different answers, there is a high risk that we are dealing with a hallucination.
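To make this consistency check concrete, here is a minimal sketch (my illustration, not the exact method used by the research team): the same factual question is sent to the model several times, the answers are embedded, and a low average pairwise similarity is treated as a hallucination warning. The `ask_model` function and the threshold value are assumptions for the example.

```python
# Minimal sketch of consistency-based hallucination detection.
# `ask_model` is a hypothetical callable that returns one answer per call.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def consistency_score(question: str, ask_model, n_samples: int = 5) -> float:
    # Collect several independent answers to the same question.
    answers = [ask_model(question) for _ in range(n_samples)]
    embeddings = embedder.encode(answers, convert_to_tensor=True)
    # Average pairwise cosine similarity between all answer pairs.
    sims = util.cos_sim(embeddings, embeddings)
    n = len(answers)
    pair_sims = [sims[i][j].item() for i in range(n) for j in range(i + 1, n)]
    return sum(pair_sims) / len(pair_sims)

def likely_hallucination(question: str, ask_model, threshold: float = 0.8) -> bool:
    # Answers that drift apart semantically are treated as a warning sign;
    # the 0.8 threshold is illustrative, not tuned.
    return consistency_score(question, ask_model) < threshold
```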
Hallucinations are often related to the model's insufficient knowledge of a particular topic. This can happen for various reasons. For example, the issue we ask about was underrepresented in the texts on which the model was trained. The question rings a bell, so the model responds, but it is not aware of its own ignorance. And then we have a hallucination, i.e. a false answer given with great certainty.
For example, when we ask ChatGPT about some details of Polish history or culture that it knows little about?
Yes. However, the model is evolving. It is used by hundreds of thousands, or rather hundreds of millions, of users, and many of them send feedback to the company. You can also click to indicate whether you like a given response or not. Based on these assessments, the gaps are patched.
We started wondering whether it was possible to tune the model so that it was (in very big quotation marks) "aware of its ignorance." We want to optimize the model's response preferences so that, when it has little knowledge of a topic, it simply responds "I don't know" instead of saying untrue things with great certainty. If we can detect gaps in the model's knowledge, perhaps we can also tune it to ask sensible questions about those gaps. Sensible meaning verifiable: questions for which we can, at least in theory, look for answers externally, either on Google or from an expert in a given field. We want the model to be curious about its ignorance, able to ask about it in a sensible way, and able to acquire knowledge from external sources. Then, once it has the needed information, it could continue learning so that the gaps are patched better in the next iteration.
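As an illustration of this idea (a sketch of the general approach, not the group's actual pipeline), preference data for such tuning could pair an honest "I don't know" as the preferred answer against the model's confident but unreliable answer, for questions flagged by knowledge-gap detection. The names `flagged_questions` and `model_answers` are hypothetical.

```python
# Sketch: build preference pairs that reward admitting ignorance.
def build_preference_pairs(flagged_questions, model_answers):
    pairs = []
    for question in flagged_questions:
        pairs.append({
            "prompt": question,
            # Preferred behaviour when the model's knowledge is insufficient.
            "chosen": "I don't know.",
            # The confident but unreliable answer the model actually produced.
            "rejected": model_answers[question],
        })
    return pairs
```

Triples of this {prompt, chosen, rejected} form are the usual input for preference-optimization methods such as DPO.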
Of course, new knowledge may cause new gaps to appear. This is interesting, because we are running out of data to train the networks on, and I'm talking about the whole world. The largest companies training this type of model are slowly running out of training data, that is, text data available on the Internet; most of it has already been used. We will soon exhaust it, and the models are still imperfect. It would be good if they could ask people questions. We can imagine a group of scientists who explore a topic and are well versed in it. Experts are aware of some of their own ignorance, but they also know which directions of exploration are reasonable and which hypotheses can be put forward and tested. We would like the neural network, having all the available knowledge and being aware of the gaps in that knowledge, to be able to ask questions, and people would, at least for now, answer them.
Is it optimistic news that even though AI can read or search the entire Internet, it will still need people?
I think it will remain like this for a long time, and that creating data for training AI models will be a profession of the future. Text is quite "simple" material: easily available, in large quantities, everywhere. But if we want to build, for example, robots that can take laundry out of the washing machine and hang it up, there are no such data sets to train on. They have to be created.
So if the model detects a gap but has access to information online, can it fill the gap itself? Or is that still a distant future?
Assuming it is a gap that can be filled online, then yes. Currently we use solutions known in the jargon as RAG (Retrieval-Augmented Generation). Say we have a query to the model and a certain database of texts, and in that database we look for texts that semantically resemble what we are asking about. We can imagine that if the model is able to ask a sensible question about its ignorance, we can send that question to Google and receive many documents containing the answer. Then we can extract the text content, pass it back to the model and ask it to look for the answer to its question in those documents. Of course, we can't guarantee that it will find the answer, but hypothetically, if the ignorance can be filled in this manner, then yes, it can most certainly fill its own gaps.
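A minimal sketch of the RAG pattern described here, assuming the documents have already been fetched (for example from a web search) and that a hypothetical `ask_model` function wraps the language model:

```python
# Sketch of Retrieval-Augmented Generation: retrieve semantically similar
# documents, then ask the model to answer using only that context.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def retrieve(question: str, documents: list[str], top_k: int = 3) -> list[str]:
    # Rank candidate documents by semantic similarity to the question.
    q_emb = embedder.encode(question, convert_to_tensor=True)
    d_emb = embedder.encode(documents, convert_to_tensor=True)
    scores = util.cos_sim(q_emb, d_emb)[0]
    top = scores.argsort(descending=True)[:top_k]
    return [documents[int(i)] for i in top]

def answer_with_context(question: str, documents: list[str], ask_model) -> str:
    # Build a prompt that restricts the model to the retrieved context.
    context = "\n\n".join(retrieve(question, documents))
    prompt = (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say 'I don't know'.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return ask_model(prompt)
```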
The quality of documents is important, because if the model learns from incorrect content...
This is always a problem. But we, too, don't encounter only reliable content on the Internet. A lot of people fall for fake news, but there is also a large group able to sort the truth from the lies. If there is some verification procedure, even one that is very difficult to express in words, we can hope that if we show the model thousands of examples of reliable and unreliable answers, it will also learn that method of verification.
And its awareness will increase?
Yes. "Awareness" is a bit of a clumsy term here, because it belongs to people. On the other hand, we use more and more terms reserved for humans. For example, in the scientific article, or rather its preprint, "Into the Unknown: Self-Learning Large Language Models", which we wrote with Teddy Ferdinan and Prof. Przemysław Kazienko and recently published, we present the first results of our research on self-learning models. In it, we defined measures such as the curiosity of the model. The more diverse the questions a model asks in areas where it is "aware" of its ignorance, the more "curious" it is. That's how we defined it. This nomenclature will slowly permeate AI-related disciplines.
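As an illustration only (not the exact measure from the preprint), a curiosity score of this kind could be computed as the average semantic dissimilarity of the questions the model generates about its own knowledge gaps:

```python
# Illustrative curiosity score: the less the self-generated questions
# overlap semantically, the more "curious" the model is considered.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def curiosity(questions: list[str]) -> float:
    # Average pairwise dissimilarity (1 - cosine similarity).
    if len(questions) < 2:
        return 0.0
    emb = embedder.encode(questions, convert_to_tensor=True)
    sims = util.cos_sim(emb, emb)
    n = len(questions)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(1.0 - sims[i][j].item() for i, j in pairs) / len(pairs)
```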
Humanize?
A bit, yes, but it's a matter of imperfect terminology, because the subject is so new.
And we must use the vocabulary we have.
Yes. Besides, we ourselves have a problem with defining consciousness in humans. We have a name for the concept, but it is difficult to say exactly what this state is; it has not yet been fully understood or described even for the human brain.
And probably quite subjective. Another such name taken from human terminology is "hallucination". It resembles the condition of a person who sees or hears something that is not real, but is convinced otherwise.
Yes. Many people say that this term is also an unfortunate description of this phenomenon. I’m not sure of that. Certain terms often enter the jargon, come into use and become self-defining. There will always be someone who doesn't like the term, but it's not worth long discussions, especially since it's not that controversial.
If the model is self-aware, if it learns on its own and fills its knowledge gaps, will it be able to make its own decisions at some point? Should we be afraid of it or not?
I don't know whether we should be afraid of this. We see that, at least for now, a lot of human moderation is needed. This is the case with large language models, where the key to making the model behave the way we want is to create a set of preferences, i.e. instructions with the answers we expect from the model. So if we ask a model about its ignorance, it sometimes asks unhelpful questions. For example, the model might say: "I don't know who will be president of the United States in 2050." Yes, it identified its knowledge gap, but we have that gap too, and no one will fill it until 2050. From our perspective, this is a question about ignorance that makes no sense. Supervision is therefore needed, at least for now, to give the model feedback on whether its question about its ignorance makes sense.
We are experimenting with powerful language models in the direction of them eventually being able to make decisions for people. However, most of what I am describing are plans rather than ready implementations. So my answer would be: at the moment, AI capable of independent decision-making is unlikely to be possible in most areas.
All the more so since the "machine" still makes a lot of mistakes. Even if that means about one in ten cases, it is still too many. But there are AI solutions, including in medicine, for example in diagnostics, that are better than the average general practitioner, because they can aggregate knowledge from many sources and have seen many more cases. I think that sooner or later we will reach a point where the autonomy of such models keeps growing and we agree with most of the decisions they make, because they will improve the quality of our lives.
A final question: we have a new supercomputer at Wrocław University of Science and Technology, equipped with NVIDIA H100 graphics cards for artificial intelligence processing. Is this a tool that will help you accelerate or expand the scale of your research?
First of all, it will make large-scale research possible at all. The best models used around the world, such as ChatGPT, require at least several hundred, and often even several hundred thousand, graphics cards like the 300 units we have here. But with these 300, we have one of the best machines in Poland, and even in Europe. This is an opportunity to carry out a wide range of AI development tasks at our university.