ChatGPT is already in widespread use, yet few people understand how it actually works. From the user's perspective, a response to a prompt can appear within just a few dozen milliseconds. Beneath the surface, however, at a level we cannot see, complex processes run constantly to interpret the prompt and produce the right answer. In this article, we analyze how large language models work, how ChatGPT is built, what role the Transformer architecture plays, and which part of the whole process most urgently needs improvement.

The development of artificial intelligence and machine learning through the example of ChatGPT
ChatGPT, created by OpenAI, is one of the most high-profile achievements in the field of AI and the first solution of its kind to spread so quickly. It is an advanced large language model (LLM) that uses machine learning to generate responses to the prompts it receives. By the end of 2023, ChatGPT was handling more than 10 million queries per day and was being used by 100 million people every week. Before diving into how ChatGPT itself works, it is worth understanding the broader context in which the technology operates.
There are many definitions of artificial intelligence; those that try to capture decision-making ability comparable to a human's are often based on the Turing test and its variations. New solutions are routinely described as artificial intelligence, although there is no universal agreement on that label. This does not change the fact that such systems can both make their own decisions within a defined field and perform tasks with a high degree of operational freedom.
The greatest influence on the development of AI comes from progress in machine learning and deep learning. These techniques have transformed virtually every technology sector, opening up new opportunities and new challenges. It is precisely the ability to learn from data that lets these tools keep improving over time: they still require refinement, but not constant supervision.
Thanks to the use of neural networks to process input data and machine learning to enrich the knowledge used for generating responses, ChatGPT is regarded as one of the most flexible and comprehensive chatbots. When generating an answer to a prompt, the model draws both on knowledge gathered during training and on the conversation itself, which helps it grasp the logical context. Attention mechanisms also play a role here, helping ChatGPT determine which elements of the prompt matter most to the user and should be emphasized in the response.
How does ChatGPT work? Prompt – response – interaction
The ability to provide smooth responses that sometimes sound strikingly similar to human communication is the result of natural language processing, or NLP. The user’s message is processed as a prompt, which serves as the context for the generated response. The ChatGPT model, based on the Transformer architecture and pre-trained on a vast set of text data, analyzes this prompt and generates a response on that basis. This process relies on understanding the context of the dialogue, syntactic analysis of sentences, and the semantics of words through the use of semantic extensions. The generated response is then presented to the user, creating an interactive dialogue.
The core principle behind an intelligent chat system is the relationship between prompt, response, and interaction. In the next part of this article, we explain exactly which processes hide behind each of these elements, and there are quite a few. Even at this stage, however, the third element stands out: interaction. Continuously gathering experience from past conversations makes it possible to generate responses that better match the user's intent. Interaction context lets the model refer back to the user's earlier statements and maintain consistency throughout the dialogue. As a result, ChatGPT can produce more natural, grammatically correct, and contextually appropriate responses, which translates into a better user experience.
The 5 stages of content generation – from prompt to dialogue cycle
Conversing with ChatGPT and other LLM-based communication modules requires continuous enrichment of the knowledge base used by the algorithms. For the conversation to be useful, the process includes both initial steps that decode the user's input into a form the algorithms can understand and the reuse of previously generated responses. The five stages below, and the sketch that follows them, show how these pieces fit together.
The 5 main stages of content generation
Input processing – the prompt entered by the user begins to be processed and transformed into a form that will be understandable to the language model. At this stage, encoding and NLP processes take place.
Response generation – based on the processed input, the model generates a response using information contained in the training data set and the context of the conversation.
Response evaluation – before the user sees the message, the algorithm goes through a response evaluation stage. Internal criteria are responsible for verifying the coherence of the text, its relevance to the topic, and compliance with the user’s detailed instructions.
Response publication – finally, the answer is presented to the user.
Dialogue cycle – the algorithm remembers previous responses, which influence the creation of subsequent ones. Generating the next response also takes the latest message into account.
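A minimal sketch of how these five stages could be wired together in code follows. Everything here is illustrative: functions such as generate_text and passes_checks are hypothetical stand-ins, not part of any real API.

```python
# A minimal sketch of the five stages wired together. The model call is a
# placeholder; names such as generate_text and passes_checks are illustrative
# and not part of any real API.

def generate_text(prompt: str, history: list[str]) -> str:
    # Stage 2: in a real system this would run the Transformer model.
    return f"(answer to {prompt!r}, aware of {len(history)} earlier turns)"

def passes_checks(response: str) -> bool:
    # Stage 3: internal criteria such as coherence, relevance, and
    # compliance with the user's instructions.
    return len(response) > 0

history: list[str] = []                      # Stage 5: remembered dialogue
for prompt in ["What is an LLM?", "Give an example."]:
    processed = prompt.strip().lower()       # Stage 1: input processing
    response = generate_text(processed, history)      # Stage 2: generation
    if passes_checks(response):              # Stage 3: response evaluation
        print("Bot:", response)              # Stage 4: response publication
        history += [processed, response]     # Stage 5: dialogue cycle
```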
The construction of the GPT model resembles an endless set of interconnected mechanisms. AI-generated graphic.
Processing the input prompt (NLP)
The largest number of processes takes place at the first stage of communication, when the prompt is submitted by the user. At that point, the algorithm faces the first serious problem: what does this set of characters actually mean? In order to understand the message, it is first necessary to process natural language into a set of meanings understandable to algorithms.
The most important elements of the NLP process include:
Tokenization – the first stage of decoding, consisting of segmenting the prompt. The message is divided into smaller units of content called tokens. Tokenization enables ChatGPT to understand the structure of the sentence and the type of content sent by the user.
Syntactic analysis – an important stage of NLP is analyzing the prompt at the language level. The divided parts of the prompt, the tokens, are examined for characteristic features such as hierarchy and grammatical structure. This information later guides text generation, making it possible for ChatGPT to adapt its response style to the user's communication style.
Decoding and embedding – the initially processed tokens must be encoded so that they map onto categories the algorithm can work with. At this stage, tokens are embedded as numerical vectors that encode their semantics. This produces a form the algorithm can process, which is then matched to contexts and meaning groups according to the model's knowledge maps. ChatGPT is not a homogeneous structure but a system of communicating vessels: the processed and embedded prompt is routed to the components responsible for specific contexts. The decoding stage may consist of multiple sub-stages depending on the structure of the prompt, the classification of the task, and the contexts involved. (A toy sketch after this list illustrates tokenization and embedding.)
Model process of prompt handling in ChatGPT. Source: BeaStollnitz.com
Fine-tuning – this is both a newer part of LLM development and another element of the processing workflow, in which processed prompts can feed the model's next round of training while the model's parameters are tuned to provide the best answers. Fine-tuning matters especially when a prompt falls into a specific meaning group, for example queries on specialist topics: the model is adjusted to better reflect the characteristics and patterns present in the new training data.
Source analysis – after tokenization, analysis, embedding, and alignment with contexts, the model processes source data in order to find the information that best answers the prompt. The quality of the results depends on the effectiveness of the model’s machine learning and deep learning techniques, the effectiveness of the fine-tuning process, and the type and number of databases on which the model was trained.
Response generation – the prompt processing stage within NLP ends with response generation, which must be semantically and grammatically consistent with the user’s question or statement. Thanks to this, ChatGPT can produce more natural and contextually appropriate answers.
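As promised above, here is a toy sketch of the first two steps, tokenization and embedding. Real systems use subword tokenizers (such as byte-pair encoding) and learned embedding matrices; the vocabulary and vectors below are invented for illustration.

```python
# Toy illustration of tokenization and embedding. A real tokenizer splits
# text into subword units; here a tiny hand-made vocabulary stands in.
import numpy as np

vocab = {"how": 0, "does": 1, "chatgpt": 2, "work": 3, "?": 4}

def tokenize(prompt: str) -> list[int]:
    # Split the prompt into tokens and map each one to an integer ID.
    return [vocab[w] for w in prompt.lower().replace("?", " ?").split()]

rng = np.random.default_rng(0)
# 8-dimensional embedding vectors; in a real model these are learned.
embedding_matrix = rng.normal(size=(len(vocab), 8))

token_ids = tokenize("How does ChatGPT work?")
embeddings = embedding_matrix[token_ids]   # one semantic vector per token

print(token_ids)         # [0, 1, 2, 3, 4]
print(embeddings.shape)  # (5, 8)
```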
Fine-tuning is a new stage related to LLM programming, making it possible, among other things, to direct the program toward carrying out more specific tasks. Source: Addepto.com
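To make the fine-tuning idea concrete, here is a heavily simplified sketch of a single training update: nudging weights so that domain-specific examples become more likely. A single linear layer stands in for the full network; all data and dimensions are invented.

```python
# Highly simplified sketch of fine-tuning: repeated gradient-descent updates
# that raise the probability of the desired next token. One linear layer
# stands in for the whole model.
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 5))          # "model weights": 8-dim input -> 5 tokens

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

x = rng.normal(size=8)               # embedding of the current context
target = 3                           # ID of the correct next token

for _ in range(100):
    probs = softmax(x @ W)
    # Gradient of cross-entropy loss w.r.t. W: outer(x, probs - one_hot).
    grad = np.outer(x, probs - np.eye(5)[target])
    W -= 0.1 * grad                  # small step toward the new data

print(softmax(x @ W)[target])        # probability of the target token rises
```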
Transformer architecture – the key to understanding ChatGPT’s efficiency?
ChatGPT uses the Transformer architecture, hence the name Generative Pre-trained Transformer (GPT), an approach OpenAI has been building on since shortly after the architecture's introduction in 2017. The first description of its use and characteristics appears in the 2017 paper Attention Is All You Need, published on arXiv. Interestingly, one of the paper's authors, Łukasz Kaiser, is a Pole and a mathematics graduate of the University of Wrocław, so one can say that GPT also has Polish roots to some extent.
The Transformer is responsible for processing sequences of data, such as sentences or text fragments, carrying out the NLP-related processes described above, and generating responses. It consists of two main sublayers: Multi-Head Attention and a Feedforward Neural Network. Before getting into those, however, let us answer the question of what role the Transformer architecture plays in ChatGPT and what it is actually responsible for.
The Transformer architecture is responsible for the overall processing of sequential data. Its main task is not only to analyze the prompt, but above all to examine the relationships between individual elements. These relationships occur not only at the syntactic level, but also at the semantic and contextual levels.
Self-Attention Mechanism
At the core of the Transformer architecture on which ChatGPT is based is the Self-Attention Mechanism. This allows the model to evaluate the importance of particular prompt elements on different levels and analyze the relationships between them. The mechanism calculates a weighted sum of value vectors, with weights derived from the similarity between query vectors and key vectors.
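The formula just described, softmax(Q K^T / sqrt(d_k)) V, is compact enough to sketch directly. The dimensions below are toy values; real models learn the projection matrices that produce Q, K, and V, and GPT additionally applies a causal mask (omitted here for brevity).

```python
# Scaled dot-product self-attention: a weighted sum of value vectors, with
# weights derived from query/key similarity. Causal masking is omitted.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # query, key, value vectors
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # similarity between tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                        # weighted sum of values

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 8))                   # 5 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # (5, 8)
```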
Procedurally, the attention mechanism is responsible for maintaining communication between the processed tokens. In GPT, attention sits in the decoder layers, which take part in the prompt-processing steps described earlier. It is important to remember that these layers are also linked by residual connections, a form of skip connection: they provide an alternative path for data flow in the neural network, allowing some layers to be bypassed, which stabilizes and speeds up training. The foundations of this idea were presented back in 2015 in the paper Deep Residual Learning for Image Recognition.
Multi-Head Attention Mechanism
To capture different types of relationships between words, the architecture uses many parallel sets of self-attention mechanisms. The results of these mechanisms are combined with one another and always operate in relation to one another. The combined blocks are referred to as Multi-Head Attention blocks or Multi-Headed Self-Attention.
These blocks process different attention variants in parallel, and they also include information processing before and after the attention operation. Three tensors take part in the process: query, key, and value. The mechanism enables the model to analyze dependencies between tokens in a sequence by calculating attention for each token pair. In GPT, Multi-Headed Attention processes the input token sequence for each token pair in separate blocks, or heads.
Attention mechanisms are grouped into heads, which simultaneously and in parallel are responsible for analyzing different properties of the sequence.
The advantage is that each of these blocks can focus on examining different aspects of the relationships between tokens. This translates into the ability to detect different patterns and dependencies within the prompt and its context. For example, one mechanism block may be sensitive to semantic similarity between tokens, while another may focus on analyzing syntactic properties and relationships within the prompt.
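A short sketch makes the head structure tangible: several independent attention heads run in parallel, their outputs are concatenated, and a final projection mixes them. All dimensions here are toy values chosen for the example.

```python
# Sketch of Multi-Headed Self-Attention: parallel heads with separate Q/K/V
# projections, concatenated and mixed by a final output projection.
import numpy as np

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ V   # softmax, then sum of V

def multi_head(X, heads, Wo):
    # Each head has its own Q/K/V projections, so it can specialize in a
    # different kind of relationship (e.g. semantic vs. syntactic).
    outputs = [attention(X @ Wq, X @ Wk, X @ Wv) for Wq, Wk, Wv in heads]
    return np.concatenate(outputs, axis=-1) @ Wo     # combine all heads

rng = np.random.default_rng(3)
X = rng.normal(size=(5, 8))                          # 5 tokens, 8-dim each
heads = [tuple(rng.normal(size=(8, 4)) for _ in range(3)) for _ in range(2)]
Wo = rng.normal(size=(8, 8))                         # two 4-dim heads -> 8
print(multi_head(X, heads, Wo).shape)                # (5, 8)
```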
Multilayer perceptron (MLP)
After the tokens are processed by the attention mechanisms, the results are passed to stacked MLP layers, or Multilayer Perceptrons. At this stage, the previously processed prompt data passes through transformations that further extract relationships within the numerical data. At this level, ChatGPT combines linear functions with nonlinear activations such as the Rectified Linear Unit (ReLU).
An MLP is a type of feedforward neural network, in which data flows in only one direction, from input nodes to output nodes, without loops or cycles. However, the neural network does not merely extract characteristic features and examine dependencies in the input data. The MLP module also involves learning processes inside the perceptron: after the data is processed, the weights of individual connections are adjusted and verified against the final result, an example of supervised learning.
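The feedforward block that follows attention in each Transformer layer can be sketched in a few lines: a linear expansion, a nonlinear activation (ReLU, as mentioned above), a linear projection back, and a residual connection around the whole thing. Dimensions are again toy values.

```python
# Sketch of the position-wise feed-forward (MLP) block after attention:
# expand, apply ReLU, project back, and add a residual connection.
import numpy as np

def mlp_block(X, W1, b1, W2, b2):
    hidden = np.maximum(0.0, X @ W1 + b1)   # ReLU: the nonlinear step
    return X + (hidden @ W2 + b2)           # linear step + residual connection

rng = np.random.default_rng(4)
X = rng.normal(size=(5, 8))                        # 5 token vectors
W1, b1 = rng.normal(size=(8, 32)), np.zeros(32)    # expand 8 -> 32
W2, b2 = rng.normal(size=(32, 8)), np.zeros(8)     # project back 32 -> 8
print(mlp_block(X, W1, b1, W2, b2).shape)          # (5, 8)
```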
Training ChatGPT
The Transformer architecture in all versions of ChatGPT (GPT-3, GPT-3.5, and GPT-4) enables the model to keep learning and to draw conclusions from conversations with users. Initially, the model underwent a process called supervised fine-tuning, during which OpenAI trainers played the roles of both users and AI bots; this supported work on sequencing dialogues and modeling communication patterns. The model was then improved through a reward mechanism: testers marked good answers as correct and inappropriate ones as incorrect. The training data reportedly comprised some 570 GB of text, including scientific publications, web articles, books, encyclopedias, and other sources.
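The reward mechanism described above is commonly implemented as a reward model trained on such correct/incorrect labels with a pairwise ranking loss; the sketch below shows only that loss, with invented scores, as one plausible formulation rather than OpenAI's exact recipe.

```python
# Toy illustration of a pairwise preference loss: the reward model's score
# for the tester-approved answer should exceed the score for the rejected
# one. The scores here are made up.
import numpy as np

def pairwise_loss(score_good: float, score_bad: float) -> float:
    # Standard ranking loss: -log(sigmoid(good - bad)).
    return -np.log(1.0 / (1.0 + np.exp(-(score_good - score_bad))))

print(pairwise_loss(2.0, -1.0))  # small loss: ranking already correct
print(pairwise_loss(-1.0, 2.0))  # large loss: model must adjust
```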



