Published on March 22, 2023

What's Behind ChatGPT, and Why It's So Awesome

The future remains promising for technologies related to artificial intelligence, and without a doubt, we live in very interesting times.

ChatGPT is the technology hype of the moment. Launched on November 30, 2022, it had already passed the mark of 100 million active users in the first month of 2023. And it's not for nothing: the results it produces are really impressive, especially for those who don't keep artificial intelligence among their bedside reading.

The name refers to one of the latest artificial intelligence algorithms launched by OpenAI, a startup that has quickly become one of the most important and influential companies in the sector. It combines "chat", from chatbots (conversational robots), with GPT (Generative Pre-trained Transformer), the family of generative algorithms that OpenAI has been releasing.

Generative models are those that can generate content. OpenAI's GPTs are algorithms that process human language. And pre-trained models, as the name GPT indicates, are models that have already learned from a huge set of training data, using the best of machine learning and artificial neural networks.

Those who follow OpenAI know that ChatGPT is an evolution of previous algorithms, which were already very powerful. So what made ChatGPT a popularity champion? I'll give my guesses here:


On Tuesday, March 14, OpenAI announced the release of its newest language model, GPT-4. Compared to its predecessor, GPT-3.5, released in March 2022, there is some news. This time the company did not reveal the number of GPT-4 parameters, citing strategic reasons. In any case, the knowledge base of GPT-3.5 was already impressive.

The previous model was trained with the same 175 billion parameters as its predecessor, GPT-3. Parameters, roughly speaking, can be compared to the synapses of biological neurons. A human brain, it is estimated, has somewhere between 100 trillion and 1 quadrillion synapses. GPT-3.5 is fed by the very best of the internet in terms of content, such as the entire Wikipedia, a huge corpus of newspapers and scientific articles, books by different authors, blogs and patents, among other things. The content was carefully curated by humans, taking into account what was learned from previous versions.

As for GPT-4, since no official information is available, we can only speculate. OpenAI states that GPT-4 "can solve difficult problems with greater precision, thanks to its broader general knowledge and problem-solving abilities." The company spent six months making GPT-4 safer and more aligned. “GPT-4 is 82% less likely to respond to requests for prohibited content and 40% more likely to produce factual responses than GPT-3.5 in our internal assessments,” says the company. One speculation is that GPT-4 was fed knowledge bases on specific topics through fine-tuning, which would explain the improvement in its factual response rate. That can be great for several areas of activity, and it was demonstrated during the company's launch livestream with questions about American tax rules.

In addition, GPT-type algorithms use an artificial neural network architecture known as “Transformers”, which manipulates linguistic content with almost the same ease with which we manipulate numbers in mathematical operations. When reading a Portuguese text that talks about "mangas" (a word that means both sleeves and mangoes), shoes and the new fashion in Paris, it easily infers, by probability, that the mangas in question must be shirt sleeves, not fruit. More than that, it can identify causal relationships, infer the correct term even when there are spelling or grammar problems, compare concepts and translate languages, among many other things.


Yes, using a chatbot interface for more fluid conversation with its users, especially lay people, allowed a leap in popularity and broke down the barriers to using the algorithm.

Chatbots are already very popular in Brazil, and for many of us, talking to the algorithm through a familiar interface makes it more intuitive and accessible.

Furthermore, the use of a conversational interface improves the “relationship” between the user and the algorithm, enhancing the user experience, and also organizing information and data, making it easier for users to find relevant information.


This other acronym, RLHF, stands for Reinforcement Learning from Human Feedback, a type of machine learning that is one of the novelties in this new version. It is encapsulated in an algorithm called InstructGPT, an integral part of ChatGPT.

From a conceptual point of view, the layer is not new, but its implementation in ChatGPT brought a qualitative leap in its ability to deliver relevant results. Some of the built-in benefits:

Reduced hallucinations: and no, you didn't read that wrong. ChatGPT-like language algorithms can also go off on a tangent and write things that, while grammatically consistent and even probabilistically plausible, are not supported by facts. The RLHF layer penalizes less plausible probabilistic constructions of human language. No magic about it. The humans whose feedback shaped ChatGPT's “personality” layer don't like tall tales.

Less aggressive: using a less aggressive tone of language is something that needed to be incorporated as an organizational policy at OpenAI, and the humans who gave feedback to the algorithm penalized more aggressive messages.

Attention to sensitive topics: also incorporated into the model through human feedback, the algorithm was directed to avoid responding on topics such as religion, politics, sexual preferences, suicide, and instructions for illegal activities, among other sensitive subjects.
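The article does not say how human feedback becomes a training signal. A minimal sketch, assuming a pairwise ranking loss of the kind described in the RLHF literature: a reward model scores two candidate answers, and the loss is small when the answer the human preferred gets the higher score. The function names and the reward scores below are hypothetical, chosen only for illustration.

```python
import math

def preference_probability(reward_preferred: float, reward_rejected: float) -> float:
    """Probability that the human-preferred answer wins, given two reward scores."""
    return 1.0 / (1.0 + math.exp(-(reward_preferred - reward_rejected)))

def pairwise_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Negative log-likelihood of the human ranking; lower means better agreement."""
    return -math.log(preference_probability(reward_preferred, reward_rejected))

# Hypothetical scores: the reward model agrees with the human ranking...
agrees = pairwise_loss(reward_preferred=2.0, reward_rejected=-1.0)
# ...or disagrees with it (the preferred answer received the lower score).
disagrees = pairwise_loss(reward_preferred=-1.0, reward_rejected=2.0)

print(f"loss when model agrees with humans:    {agrees:.4f}")
print(f"loss when model disagrees with humans: {disagrees:.4f}")
```

Training pushes the loss down, so answers that humans rejected (for example aggressive or unsupported ones) end up with lower reward scores.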


Few people have talked about this, but one of the most interesting aspects of using ChatGPT is its ability to keep a lot of conversation history in its “memory” for later contextual reference.

Everything points to significant advances in its cognitive architecture, with external data being inserted into the language model to be processed together with the conversation prompt.

Models that store “memories” in an external, structured way have been explored a few times in the past. The concept of “external memory” in neural networks, as in “Memory Networks”, is one of the alternatives to Long Short-Term Memory (LSTM) neural networks, where information is passed layer by layer through the network. Another alternative is the use of multiple attention heads, as in the “transformer” architecture responsible for the success of Large Language Models (LLMs), of which the GPTs are among the best-known representatives.
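To make the idea of attention a little more concrete, here is a minimal sketch of a single attention head (scaled dot-product attention) in plain Python. It is an illustration only: the two-dimensional vectors are made up, and a real transformer learns its vectors and runs many such heads in parallel.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Weight each value by how well its key matches the query."""
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
        for key in keys
    ]
    weights = softmax(scores)
    output = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, output

# Made-up vectors for three context words and one query word.
keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [1.0, 0.0]

weights, output = attention(query, keys, values)
print("attention weights:", [round(w, 3) for w in weights])
print("blended output:   ", [round(x, 3) for x in output])
```

The context word whose key is closest to the query receives the largest weight, which is how surrounding words can tip the interpretation of an ambiguous term.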

With layers that organize data external to the language model, you can refine its knowledge throughout a conversation, ask for comparisons and request revisions, among other things, with an incredible efficiency gain. One of the key gains from the larger number of tokens accepted in a single input, released in the GPT-4 update, is the ability to “remember” and use context from the past to help answer questions in the present. This can allow ChatGPT to develop a deeper understanding of what was said earlier, thereby improving its responses.

In addition, “external memory” in the form of information available on the internet, “embeddings” and “plugins” also lets ChatGPT take in new information faster, since the entire language model does not need to be processed or trained again. Instead, it can use information already stored in its memory to answer questions. This means that ChatGPT can become more flexible and faster when dealing with complex information, as well as being able to remember the entire conversation.
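A minimal sketch of how an external memory can be consulted without retraining the model: stored notes are compared with the question, and the closest one is placed in the prompt. Everything here is an assumption made for illustration. The "embedding" is a toy word-count vector (real systems use learned vectors), and the note texts, function names and prompt layout are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, notes):
    """Return the stored note most similar to the question."""
    question_vector = embed(question)
    return max(notes, key=lambda note: cosine(question_vector, embed(note)))

# Hypothetical external memory.
notes = [
    "GPT-4 accepts up to 32,768 tokens in a single input",
    "RLHF stands for Reinforcement Learning from Human Feedback",
]

question = "What does RLHF stand for"
best = retrieve(question, notes)

# The retrieved note travels with the question; the model itself is unchanged.
prompt = f"Context: {best}\nQuestion: {question}"
print(prompt)
```

Adding a new fact then means adding a note to the list, not training the language model again.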

The ability to store content in the “short term” memory of GPT-4 has significantly increased compared to its previous version, which will have an impact on this attribute of ChatGPT in the near future. The new model has a maximum token count of 32,768, which means you can feed up to about 25,000 words of text into a single input.
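The relation between tokens and words can be checked with simple arithmetic, and the same budget explains why older parts of a long conversation eventually drop out. The sketch below assumes a rough rule of thumb of 0.75 English words per token. How ChatGPT actually manages its history is not public, so the trimming function is a hypothetical illustration.

```python
import math

MAX_TOKENS = 32_768      # figure quoted in the article
WORDS_PER_TOKEN = 0.75   # assumption: rough rule of thumb for English text

def estimate_tokens(text):
    """Approximate the token count of a text from its word count."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)

def trim_history(messages, budget):
    """Keep the most recent messages whose estimated tokens fit the budget."""
    kept = []
    used = 0
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))

approx_words = int(MAX_TOKENS * WORDS_PER_TOKEN)
print("approximate words per input:", approx_words)

history = [
    "hello there",
    "tell me about GPT-4",
    "it was released in March 2023",
]
recent = trim_history(history, budget=12)
print("messages that still fit:", recent)
```

With a larger budget, more of the earlier conversation fits, which is the practical meaning of a bigger “short term” memory.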


Although I still believe that there is a long way to go to build general-purpose artificial intelligence, ChatGPT already performs better than humans, on average, in many activities that involve language. Algorithms that understand well what we want to say, and that also manage to produce quality written content (including programming code, music and jokes), can be great allies for us humans in the search for greater efficiency and in getting rid of operational tasks.

Many innovations are yet to come: the combination of architectures that seek real-time information on the internet with a knowledge base acquired in previous training, as well as the construction of increasingly sophisticated software architectures. There is also what is called multimodal generative AI, which can interpret images, text and sound, and produce content in these different modalities. The new version of the language model, GPT-4, incorporates some of these new features.

The future remains bright for technologies related to artificial intelligence, and without a doubt, we live in very interesting times. Time will tell.

ERRATA: The term Memory Networks was used as a synonym for storage architectures external to the original GPT model. Although the name Memory Networks refers to one model that uses this approach, many other architectures have been used to successfully combine prompt information with factual information available on the internet or from new bodies of knowledge. Dialog architectures like LaMDA (https://arxiv.org/pdf/2201.08239.pdf) and vector structures like Embeddings (OpenAI) are some examples. Technically, the name "Memory Networks", which originates from the 2014 article (https://arxiv.org/abs/1410.3916), proposes a change of paradigm from the memory structures internal to neural networks, such as LSTM (Long Short-Term Memory), and is not explicitly mentioned in the OpenAI papers. This article, in this sense, suggests that there are benefits in combining external informational references with LLMs. The original text was modified.

598 reads 126 Likes

About the author

Alexandre Del Rey
Always learning

Board Member & Founder, I2AI

Founding board member of I2AI (Associação Internacional de Inteligência Artificial). He is also a founding partner of Engrama, a partner in the startups D2i and Egronn, and an investor in the startups Agrointeli and CleanCloud. He has more than 20 years of experience at multinationals such as Siemens, Eaton and Voith, having worked in countries and cultures as diverse as the United States, Germany and China.
International speaker, professor, researcher, author, serial entrepreneur and technology lover. He is passionate about strategy, competitive intelligence and innovation.
He holds a doctorate in Innovation Management and a master's degree in Bayesian Networks (an AI approach) from FEA-USP. He completed graduate studies in Administration at FGV and holds a degree in Mechanical Engineering from Unicamp.
