The launch of ChatGPT has transformed the way people operate in their daily lives. Introduced as a chat demo for OpenAI's LLMs (Large Language Models), ChatGPT has become invaluable across the world. It is one of the most popular AI models, able to communicate with you like a close friend. From automating daily tasks and writing tedious emails and code to having a deep conversation about almost anything (provided it is ethical), it has changed perspectives on life and the future.
But this isn't all! The more you use it, the more you discover - that's the beauty of this AI tool. As OpenAI keeps adding new features and models, it is becoming harder to understand how it actually works. Have you wondered why ChatGPT is ahead in the race? What is so special about it?
To give you an in-depth understanding, we have compiled all the technical aspects in one place - from the basics to the advanced, everything is listed here. Are you excited for this exploration? Well, we are.
Come, let’s begin.
What is ChatGPT?
As everyday users, it is fair to say that ChatGPT is quite easy to access and use. Powered by advanced AI models like GPT-4o, o1-mini, and DALL·E 3, it has a user-friendly interface that has answers to all your silly, serious, complex, or hilarious questions. From writing quirky content and summarizing images to automating workflows and translating natural language into code, it can handle nearly all of these activities. The only catch is how detailed, specific, and clear your prompt is. Once you know the drill, you no longer need different AI applications for your complex tasks - simply log in to ChatGPT and start a new conversation.
But do you know how it works behind the scenes? How does it resolve your queries in an instant? This is where the real question lies. The answer is simple - ChatGPT relies on Artificial Intelligence (which is obvious) to choose the most relevant model, or a combination of different AI models, to answer your specific question. To keep things specific and easy, it is often better to choose an AI model yourself to handle your request; otherwise, ChatGPT will automatically choose one on its own. But do you know what these AI models are and what each one specializes in? Let's unveil it!
- GPT-4o and GPT-4o mini work best to evaluate images and understand and write about them in just a few seconds.
- DALL·E 3 is specifically used to generate images.
- o1-preview and o1-mini are used to respond to questions and prompts that require advanced reasoning.
Wait, there is more to it. How can we not consider its new features?
- Voice Mode - This uses GPT-4o to have a real-time audio conversation with you and thereby solve your queries.

- Search Mode - This is used to search across the web, summarize text, and even add citations and references for research-based information.

- Reasoning Mode - This is specifically used to analyze, interpret, and provide logical reasoning for a given problem.

Now that you are aware of its flexibility and versatility, you can see how much complexity is involved in the process of answering your requests, right? With that context set, let's gain a deeper understanding of its back-end variants.
What is ChatGPT API?
OpenAI does not offer ChatGPT only as a standalone product. It has also built the 'ChatGPT API' platform, where developers can make the most of ChatGPT and integrate it into their own systems or applications. This is great news for companies that want to provide seamless answers to their customers within their own applications or services.
For instance, Boltic uses the ChatGPT API to seamlessly integrate it within its systems (clearly, this has some price attached to it). With this, you can easily get answers to your prompts across thousands of applications and automate your daily tasks without any coding. Here are some of the built-in applications that you can pair it with:
- Read Sheet - It can help you read and automate appointments and work schedules.
- Function - This can aid in automating various sales management activities as well as daily responsibilities.
- Mapper - This can help you easily map and automate your schedule and daily tasks systematically.
To learn more about such applications that can be easily integrated with ChatGPT, visit here.
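Under the hood, integrations like these call the Chat Completions REST endpoint. Here is a minimal sketch of such a call using only the Python standard library; the model name and prompt are illustrative, and you need your own API key:

```python
import json
import os
import urllib.request

def build_request(prompt: str, model: str = "gpt-4o-mini") -> dict:
    """Assemble the JSON payload the Chat Completions endpoint expects."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask_chatgpt(prompt: str) -> str:
    payload = build_request(prompt)
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The generated text lives in the first choice's message.
    return body["choices"][0]["message"]["content"]

# Example (requires a valid OPENAI_API_KEY environment variable):
# print(ask_chatgpt("Summarize my schedule for today."))
```

Platforms such as Boltic wrap this request/response cycle so you never have to write it yourself.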
How does ChatGPT work?
The GPT in ChatGPT is an abbreviation for Generative Pre-Trained Transformer. Until OpenAI released the o1 family of AI models, all of OpenAI's LLMs (Large Language Models) and LMMs (Large Multimodal Models) were called GPT-X, as in GPT-4o. As you can see, these names keep changing, and this hints at how things work behind the scenes.
To dive into this complex behaviour of ChatGPT, we have listed many technical concepts that are vital to gaining a better understanding of LLMs, LMMs, and the other AI models that are essentially used by ChatGPT to perform tasks.
Supervised vs unsupervised learning
The act of developing an AI model is called 'training'. The P in GPT indicates 'pre-trained'. Like all other modern AI models, the new o1 family models are also pre-trained; OpenAI has simply dropped the P from its naming scheme.
So, what does this GPT actually do and how does it operate? To get into the details, it is imperative to crack the answer to this question.
Here's the entire story - before GPT-1 was launched, the best-performing AI algorithms were developed using 'supervised learning'. Manually labeled data was used to train these models - imagine a database of images of different plants, with a text description of each plant written by humans. This type of training data is quite effective in some cases but is very expensive to produce in reality. This is why, even today, there isn't enough large-scale labeled data available to train LLMs (Large Language Models).
So instead of relying on this expensive data, GPT-1 relied on pre-training. What's that? This is where some ground rules were set and then large amounts of unlabeled data were used, such as text from the open internet. The model was left to learn from this unlabeled data without supervision, freely forming its own set of rules and relationships from the text.
Since unlabeled data is much cheaper than labeled data, OpenAI continued with this pre-training process and created many powerful AI models with the help of massive datasets, including non-text data. For instance, GPT-4o was trained in a similar way, with a mix of image and audio data too. Thus, GPT-4o knows not only what a jackfruit is, but also how it looks.
No matter how helpful this unlabeled data is, OpenAI knew that it could not completely rely on it. So, what happened? They started fine-tuning the pre-trained models to make their outputs predictably appropriate for handling particular requests. This was performed using different forms of supervised learning.
How does pre-training work?
As we saw above, there are two learning approaches - supervised and unsupervised - and most modern AI models lean heavily on the unsupervised kind during pre-training. What does that mean? When a supervised training approach is used, the model is trained to map inputs to particular outputs, which helps it provide direct answers to a given prompt.
For instance, imagine AI models trained on a dataset of medical consultations, where patients' symptoms and concerns are labeled with appropriate responses. If you ask, 'What should I do to cure my persistent cough?', such a model will provide a systematic answer like, 'Many factors can cause a persistent cough. You should stay hydrated, take proper rest, and consult a doctor if it lasts for more than 2 weeks or is accompanied by other symptoms such as fever or shortness of breath.'
Thus, a supervised approach is often used to perform tasks such as regression, classification, and sequence labelling. But, as everything has its pros and cons, so does this approach. Despite providing better and more appropriate answers, it can only be scaled up to a certain limit. Since the supervised learning approach involves humans, they would have to anticipate every possible user input and craft an individual output for each. The catch is that this process is time-consuming, and the quality of the responses depends heavily on the labelers' level of expertise.
As a result, it is nearly impossible to predict all the questions that might be asked in the future, so it is not wise for ChatGPT to rely completely on supervised data. This is where unsupervised pre-training provides an effective alternative.
In contrast to supervised learning, when an AI model is pre-trained with an unsupervised approach, no specific output is linked to each input. Rather, the focus is to make the AI model learn the patterns and structures of the input data without targeting any particular task. This approach is suitable for tasks like anomaly detection, clustering, and even dimensionality reduction.
For example, imagine AI models trained on a large database of medical articles, research papers, and patient records without any specific labels. In this unsupervised setting, if you write a prompt asking 'What are the most common causes of persistent cough?', the AI model might answer, 'Persistent cough can be related to infections, allergies, acid reflux, or chronic conditions such as asthma. It's always best to consider any additional symptoms and consult a doctor for proper diagnosis and treatment.'
So, to provide limitless information, unsupervised pre-training works the best. As humans are not involved in the process, it is a faster process as compared to a supervised one. Also, the best part is that developers just need to dump endless information into this pre-training process, instead of having knowledge about everything that might be asked by a user.
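The contrast between the two approaches can be sketched in a few lines of Python. This toy example is our own illustration, not OpenAI's code: the 'supervised' side is a lookup over human-labeled question/answer pairs, while the 'unsupervised' side is a tiny bigram model that only counts which word follows which in raw text:

```python
from collections import Counter, defaultdict

# Supervised: each input is explicitly labeled with the desired output by a human.
labeled_data = {
    "persistent cough": "Hydrate, rest, and see a doctor if it lasts over 2 weeks.",
}

def supervised_answer(symptom: str) -> str:
    # Can only answer inputs a human anticipated and labeled in advance.
    return labeled_data.get(symptom, "I don't know.")

# Unsupervised pre-training: learn word-to-word patterns from raw text alone.
corpus = "a persistent cough can signal infections allergies or asthma".split()
next_word = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word[current][following] += 1

def most_likely_next(word: str) -> str:
    # Predict the most frequent follower seen during pre-training.
    return next_word[word].most_common(1)[0][0]
```

Note how the supervised model fails on anything unlabeled, while the unsupervised one learns structure from whatever text it is fed - which is exactly the scaling argument made above.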
Human involvement in pre-training
Even though the supervised learning approach is not very scalable, human effort is clearly involved in developing AI models for users. As per ChatGPT itself, it was pre-trained by combining both supervised and unsupervised learning techniques. In fact, as per recent reports, the data labelling market is predicted to reach USD 8.2 billion by 2028. This demonstrates the rising demand for high-quality labeled datasets.
How are these datasets formed? Many companies depend on outsourced labour, often in developing countries, where workers frequently earn low wages to process large datasets. This dependency on human effort has its benefits, but unfortunately it also creates ethical concerns, such as mental health risks, exploitation of labourers, and unfairly low pay.
Training dataset of ChatGPT
The dataset used to train ChatGPT is massive. The original ChatGPT was based on the GPT-3 (Generative Pre-trained Transformer 3) architecture. This model was trained on WebText2, a library of about 45 terabytes of pure text data.
Originally, the free version of ChatGPT ran on GPT-3 and has since been updated to a better model, GPT-4o. For broader access, you can switch to ChatGPT Plus (the paid version) and use more extensive versions of the GPT-3, GPT-4, and GPT-4o model family.
What is transformer architecture?
As we dive deeper into how training works, let us move a step forward - all the training data feeds into a deep learning neural network. Now, what is this? It is a multi-layered, complex, heavily weighted algorithm loosely modeled after the human brain. Using it, ChatGPT learns the relationships and patterns present in the text data and creates relevant, human-like responses by predicting which word will fit next in a sentence.
For example, let's give a model an incomplete sentence and ask it to fill in the appropriate word - 'The dog sat on the'
Using this deep neural network, the GPT model will first analyze the text and predict the most appropriate word to fit in the context. Depending on the training data, it may suggest:
- Mat (most common word that fits well in any generic sentence)
- Sofa (this is also a reasonably fit word)
- Windowsill (this is not so common, but can be used)
This network uses the transformer architecture - the 'T' in GPT. It was proposed back in 2017, and it is critical to understand its importance while analyzing AI models. Come, let's uncover this concept.
The transformer model simplified how AI algorithms were previously designed. It enables computations to run in parallel, reducing training time. This made AI models not only faster but also cheaper to train.
In the old days, RNNs (Recurrent Neural Networks) were used to read text from left to right in a sequence. This works well when related concepts and words are placed beside each other. But what if they are placed far apart? This is where transformers come into the picture. A transformer reads every word in a sentence at once and compares each word with all the others. After doing so, it directs its attention to the most important and relevant words in the sentence, no matter where they are placed. Moreover, this can be performed in parallel on modern hardware. This core mechanism of transformers is called 'self-attention'.
Not just this, these transformers simplify many things. But do they work alone? No, they work with tokens. Tokens are chunks of text or image, each encoded as a vector - a list of numbers with a direction and position in a high-dimensional space. The closer two token vectors are, the more related they are; unrelated tokens are kept more distant. 'Attention' is likewise represented as a vector. It allows transformer-based neural networks to remember the essential information they read earlier in a paragraph.
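Self-attention can be sketched in plain Python (our own deliberately simplified illustration, without the query/key/value projections a real transformer uses): each token vector is scored against every other token, the scores are softmax-normalized into weights, and each token's output is a weighted blend of all the token vectors:

```python
import math

def softmax(scores):
    # Normalize scores so they sum to 1 and act as attention weights.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    # Similarity between two token vectors.
    return sum(a * b for a, b in zip(u, v))

def self_attention(vectors):
    """For each token vector, blend all vectors weighted by similarity."""
    outputs = []
    for query in vectors:
        weights = softmax([dot(query, key) for key in vectors])
        blended = [
            sum(w * vec[i] for w, vec in zip(weights, vectors))
            for i in range(len(query))
        ]
        outputs.append(blended)
    return outputs

# Three toy token vectors; the first two point the same way (related words).
tokens = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
out = self_attention(tokens)
```

Because every token attends to every other in one pass, the loop over queries parallelizes trivially - which is exactly why transformers beat left-to-right RNNs on modern hardware.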
What are tokens?
As mentioned above, tokens play an imperative part in the technical workings of ChatGPT. But what are they, actually? All images, text, and audio are broken into small tokens that the model can process. In fact, GPT-3, the original model behind ChatGPT, was trained on over 500 billion tokens. This enables its language model to grasp the meaning of text and predict the expected next words in a sentence. This is performed by structuring words and their relationships as mathematical points.
Long or complex words are segmented into smaller tokens; on average, a token is about 4 characters long. The inner workings of ChatGPT's latest models are not completely revealed, but from what is visible about GPT-o1 and GPT-4o, it is safe to assume they were trained on similar datasets, extended with as much extra data as could be collected.
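To see how words break into sub-word tokens, here is a toy greedy tokenizer (our own illustration; real systems like OpenAI's byte-pair-encoding tokenizers are far more sophisticated) that splits a word into the longest pieces found in a small made-up vocabulary:

```python
def greedy_tokenize(word, vocab):
    """Split a word into the longest vocabulary pieces, left to right."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest possible piece first.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:  # single chars always allowed
                tokens.append(piece)
                i = j
                break
    return tokens

# A tiny made-up vocabulary of common fragments.
vocab = {"token", "ization", "un", "related"}
```

For example, `greedy_tokenize("tokenization", vocab)` yields `["token", "ization"]` - two tokens for one long word, matching the "complex words get split" behaviour described above.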

All the text-based tokens are sourced from a large repository of human-written data - at least for GPT-3. These tokens include articles, books, and several other documents covering different topics, genres, and styles. A large chunk of this content is extracted from the open internet. This simply means that a vast amount of human knowledge was used to build the foundation of its language model.
At present, researchers are facing a shortage of human-written training data. This has led newer AI models like o1 to also rely on AI-generated training data. And that is before considering all the audio and image data, which again needs to be broken into smaller tokens.
On the basis of this training, the neural network of GPT-3 has 175 billion parameters. Given a prompt, it applies these weighted parameters, plus a small portion of randomness, to generate an appropriate output that matches the user's question. Even though OpenAI has not disclosed how many parameters GPT-4o mini, GPT-4o, or any version of o1 has, we can reasonably assume that the count exceeds 175 billion but does not go beyond 100 trillion, particularly since extra parameters are required to support additional modalities.
Despite the uncertainty about the real numbers, more parameters do not automatically mean a better model. Many recent models boast huge parameter counts, yet the improvements are marginal when those parameters are poorly trained.
However, there is fierce competition among AI companies. This means that researchers are unwilling, or unable, to reveal in-depth information about the development process of their respective models.
What is RLHF?
As LLMs (Large Language Models) rely on pre-training data alone, they are completely unsuitable to launch without an additional training stage. What do you think about it? Isn't it horrifying to use an AI model built with almost zero guidance? Would it be able to handle users' requests appropriately? Come, let's dig into it.
So, how do we refine its abilities and train it appropriately? RLHF (Reinforcement Learning from Human Feedback) is the solution. It is a technique that optimizes the model so it can respond to different types of requests in a safe and effective manner.
Initially, to kick things off, OpenAI created demonstrations to show how the neural network should respond in common situations. With the help of these demonstrations, it then built a reward model from comparisons between two or more model responses, ranked by AI trainers. The purpose was simple - to make the AI understand and learn which types of responses are most appropriate for a given request. While not strictly a form of supervised learning, RLHF helps fine-tune neural networks like GPT.
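The ranking step can be sketched with a Bradley-Terry-style pairwise loss, a common formulation in the RLHF literature (this toy code is our own illustration, not OpenAI's): the reward model is trained so that the response the human trainer preferred scores higher than the rejected one:

```python
import math

def pairwise_loss(reward_chosen, reward_rejected):
    """Lower loss when the preferred response out-scores the rejected one."""
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)): near 0 for a large positive margin,
    # large when the rejected response scores higher than the chosen one.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A well-ordered pair is cheap; a mis-ordered pair is heavily penalized.
good = pairwise_loss(reward_chosen=2.0, reward_rejected=-1.0)
bad = pairwise_loss(reward_chosen=-1.0, reward_rejected=2.0)
```

Minimizing this loss over many ranked comparisons is what teaches the reward model the trainers' preferences; the reward model then steers the fine-tuning of the chat model itself.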

Apart from that, this reinforcement learning also helps make AI models safer and more secure. It suppresses biased and inappropriate responses and shapes the output into human-like dialogue. These refinements are applied to each model, making it more reliable and safer for all types of users.
For instance, the o1 family of AI models was trained with the help of RLHF to analyze and think thoroughly about a problem using a technique named CoT (Chain-of-Thought).
What is a Chain-of-thought reasoning (CoT)?
GPT-4o, a typical LLM, often faces challenges while handling a complex or multi-step problem. This is because such models are trained in a way that tends to produce the most generic, obvious-looking answer to a challenging question.
A prompt says:
A man has 17 buffaloes, and all but 9 ran away yesterday. How many total buffaloes are left with the man?
GPT-4o Might Answer:
Since 17 buffaloes ran away, the man is left with none.
The Actual Answer:
The man still has 9 buffaloes, as 'all but 9' means that 9 stayed.
The above example clearly indicates the inability of GPT-4o to handle tricky math problems or logical puzzles that require more careful work. To avoid such responses, CoT helps. What does it do?
It breaks a complex problem into smaller parts and takes time to actually understand, analyze, try various approaches, and then finally provide an appropriate solution. Rather than responding instantly, it spends a little extra time and additional computing to respond correctly. This is visible in the o1 model of ChatGPT; thus, ChatGPT uses o1 when prompts call for this kind of reasoning.
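In practice, you can nudge even a non-reasoning model toward this behaviour with prompt wording alone. A minimal sketch (the prompt text is our own illustrative phrasing, not an official OpenAI template):

```python
def chain_of_thought_prompt(question):
    """Wrap a question so the model is asked to reason step by step."""
    return (
        "Solve the problem below. Think step by step: restate the facts, "
        "work through each part, then state the final answer on its own line.\n\n"
        f"Problem: {question}"
    )

prompt = chain_of_thought_prompt(
    "A man has 17 buffaloes, and all but 9 ran away. How many are left?"
)
```

Prompted this way, a model is far more likely to first restate that 'all but 9 ran away' means 9 stayed, and only then give the answer - which is essentially what o1 does internally without being asked.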
What is Natural Language Processing (NLP)?
The end goal behind refining OpenAI's models is simply to make them as effective as possible at NLP (Natural Language Processing). NLP is a branch of AI that covers areas such as machine translation, speech recognition, and chatbots. It is the process of making AI understand and learn the syntax and rules of various languages, building complex algorithms to represent those rules, and then using them to perform specific tasks.
The best example of how NLP enhances AI interaction is quite evident in customer service-based chatbots.
When a customer asks the chatbot, 'Can I return a product that I purchased last week?', the chatbot makes effective use of NLP to:
- Understand the actual intent behind this question.
- Understand the main points such as ‘last week’ (related to the time frame).
- Analyze this by referring to the return policy of the company.
- Write a correct response like - 'Yes, you can return your product within 15 days of purchase. Would you like me to guide you through the process?'
By understanding the structure of the sentence, its meaning, and the real context behind the words, ChatGPT can interact seamlessly with the customer, just like a salesperson would in person.
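Those four steps can be sketched as a toy pipeline. This is a deliberately simplified illustration - real NLP systems use learned models, not keyword rules, and the 15-day policy is an assumption for the example:

```python
import re

RETURN_WINDOW_DAYS = 15  # assumed store policy for this illustration

def detect_intent(message):
    # Step 1: understand the intent behind the question.
    return "return_request" if "return" in message.lower() else "unknown"

def extract_timeframe(message):
    # Step 2: pick out time-frame phrases like "last week".
    match = re.search(r"last (week|month|year)|yesterday", message.lower())
    return match.group(0) if match else None

def respond(message):
    # Steps 3-4: check the (assumed) policy and write a response.
    if detect_intent(message) != "return_request":
        return "Could you tell me more about your request?"
    timeframe = extract_timeframe(message)
    if timeframe in ("last week", "yesterday"):
        return (f"Yes, you can return your product within "
                f"{RETURN_WINDOW_DAYS} days of purchase. "
                "Would you like me to guide you through the process?")
    return "Could you tell me when you purchased the product?"
```

The difference with ChatGPT is that intent, time frames, and policy lookups are all handled by one learned model rather than hand-written rules - but the flow of understanding, extraction, and response generation is the same.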
Multimodality in ChatGPT
As we saw above, NLP (Natural Language Processing) is an integral part of ChatGPT, and it is crucial to know that ChatGPT is multimodal in nature. This means that ChatGPT has the ability to understand images, text, and even audio as part of the same prompt.
Because of this multimodality, it can easily process images, documents, long paragraphs, or any graphs and charts, and can even generate images through DALL·E 3. With its latest voice mode, you can chat, interrupt, and have a human-like conversation with ChatGPT.

Two main phases of ChatGPT operation
Let’s say you open Google and search for something, how would it answer your query? Would it scan the entire web to find the relevant answer? No. Instead, it would retrieve the pages from the existing data that might help you find the right answer. To carry out this process, Google has two primary phases - the data-gathering and spidering phase; and the lookup phase.
ChatGPT works in a somewhat similar way. The data-gathering phase is called pre-training, and the lookup phase that interacts with the user is called 'inference'. The way generative AI works can feel magical, and this pre-training approach has gained momentum because of its scalability and flexibility.
What is the best part about ChatGPT?
While you learn how ChatGPT has been developed, do you know what makes it stand out? Whenever it generates a response, it keeps asking itself, 'What word should come next, based on what I have already written so far?', and then keeps adding words. More precisely, it generates the next part of a word (a token), which is why it sometimes creates unusual or new words and phrases.
For example, if you write a prompt in ChatGPT - 'Write a story about a colorful city' - the AI will generate a response with an opening sentence like, 'In the year 2070, the city of Mumbai was covered with neon lights, holographic billboards, and glowing murals that painted the skyline in contrasting shades of blue, gold, and red.'
After this sentence, the AI will keep predicting the next word, conditioned on everything it has already generated. This way, it keeps expanding the content, and if it fails to find a perfect word to fit, it may even invent words or phrases like 'prism towers' and 'chromoflare streets', adding uniqueness to the piece.
So, what's the science behind this? How does it operate in the backend? It picks the best possible words from a list of probabilities. But how does it decide which word fits the context? Does it work purely on a ranking basis, always picking the word with the highest probability? If that were the case, the output would be deterministic and flat - never creative, in short, boring! The alternative is to sometimes pick lower-ranked words at random. This is where creative content comes from.
How does this randomness work? It means that if you type the same prompt many times into ChatGPT, it will generate a different essay each time. Have you noticed this? It happens because the model uses a bit of randomness, controlled by a 'temperature' setting. This setting balances creativity against predictability. A higher temperature, around 0.8, makes responses more varied and creative, while a lower temperature makes the response more predictable and generic (so creative writing usually avoids it).
Note: Even though the word ‘temperature’ comes from statistical physics, in reality, it does not have any physical meaning in the AI world.
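The temperature knob can be sketched directly (our own toy code; real implementations apply it to the model's logits inside the network): dividing the raw word scores by the temperature before the softmax sharpens or flattens the probability distribution over next words:

```python
import math
import random

def temperature_softmax(logits, temperature):
    """Convert raw next-word scores into sampling probabilities."""
    scaled = [l / temperature for l in logits]
    exps = [math.exp(s) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

words = ["mat", "sofa", "windowsill"]
logits = [3.0, 2.0, 1.0]  # made-up scores for the next word

sharp = temperature_softmax(logits, temperature=0.2)  # near-greedy
loose = temperature_softmax(logits, temperature=0.8)  # more adventurous

# Low temperature almost always yields "mat"; a higher temperature
# lets "sofa" and "windowsill" through more often.
pick = random.choices(words, weights=loose)[0]
```

Run the sampling line repeatedly and the low-temperature distribution gives nearly identical picks, while 0.8 produces the variety described above.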
Process of Probability Model
As mentioned above, ChatGPT works on a probability model. How does it actually work? To get a list of probabilities for the next word in a sentence, we first need the underlying 'language model' neural net.

Then, the model is applied like a black box - feed in some text and see which words it predicts next, with what probabilities. For example, we can ask it for the top 5 candidate next words.

Once applied, the results can be arranged into a ranked dataset of candidate words and their probabilities.

If we repeatedly apply the model at each step, always adding the word with the highest probability, we get a deterministic continuation.

If this process continues for long at 'zero temperature', the output tends to become confusing and repetitive.

But what if it sometimes chooses other words at random (with a temperature of 0.8)? It can then produce clearer sentences that are meaningful and easy to understand.

Each time this process runs, a different output is generated (even with the same prompt), adding a flair of uniqueness to each answer.

Even at the very first step there are many candidate next words to choose from (at a temperature of 0.8), but their probabilities drop off quickly. Plotted, the probabilities fall in a predictable straight-line pattern, following a mathematical relationship called a 'power law'.

If this continues, what happens? The result is usually better than always picking the top or most likely word, but it can still sound a bit odd.
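The whole loop described in this section can be sketched end to end with a toy bigram model, our own miniature stand-in for the giant neural net:

```python
import random
from collections import Counter, defaultdict

# "Pre-train" a bigram model: count next-word frequencies in a tiny corpus.
corpus = ("the cat sat on the mat and the cat sat on the sofa "
          "and the dog sat on the mat").split()
model = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    model[current][following] += 1

def generate(start, steps, temperature=0.0):
    """Repeatedly apply the model; temperature 0 always takes the top word."""
    words = [start]
    for _ in range(steps):
        candidates = model[words[-1]]
        if not candidates:
            break
        if temperature == 0.0:
            words.append(candidates.most_common(1)[0][0])  # greedy pick
        else:
            choices = list(candidates)
            # Raising counts to 1/temperature flattens or sharpens the odds.
            weights = [candidates[w] ** (1.0 / temperature) for w in choices]
            words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

# Zero temperature loops on the most common phrase; 0.8 varies each run.
repetitive = generate("the", steps=8, temperature=0.0)
varied = generate("the", steps=8, temperature=0.8)
```

Even at this miniature scale, the zero-temperature run locks into 'the cat sat on the cat sat on the' forever, while the 0.8 run wanders through the other phrases the corpus contains - the same contrast the section describes.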

What makes ChatGPT so successful?
Above everything we have learnt about it, here's what you must know - the secret behind its success: ChatGPT has mastered the patterns in language well enough to make sentences look natural. It doesn't really follow rigorous grammatical rules and concepts; it has learned how to generate coherent sentences from a large amount of data. One simple example is balancing parentheses: neural networks often struggle with strict algorithmic tasks like this, yet excel at identifying the underlying patterns of natural language.
Behind the power of ChatGPT is the transformer architecture, which helps capture the structure of human-written sentences. It goes beyond syntax: by reading and analyzing billions of sentences, it has developed an efficient framework for coherence. It even approximates logic, though its understanding differs from rigid formal logical structures - and that is the superpower of ChatGPT.
Extensibility in ChatGPT
ChatGPT is not a mere chatbot that responds to every question with limited knowledge backed by its training data. It has become a powerful asset that can be used in several ways:
- Through the ChatGPT mobile application, you can access its advanced voice mode and even upload images directly from your phone to get a summary or extract data.
- ChatGPT search helps you extract real-time information from the web once you log in to ChatGPT.
- The desktop application of ChatGPT lets you interact with it anytime, anywhere. It can even work with various coding applications.
- Boltic’s OpenAI integration allows you to integrate ChatGPT with thousands of other applications like Calendly, Converter, and Mapper.
- With GPTs, you can even build customized bots (as per your requirements) on top of ChatGPT.
What’s Next for ChatGPT?
From a tech demo to one of the most popular chatbots across the globe, ChatGPT has kept surprising us with its innovative models, multimodality features, advanced reasoning, and integrations. Now, what's next? Recently, OpenAI launched its most advanced model yet: GPT-4.5. Come, let's learn about it in detail.
Is ChatGPT 4.5 Coming?
OpenAI has officially introduced ChatGPT-4.5, known to be the most advanced version among all its models. It is a general-purpose model rather than a deep-reasoning one. At present, it has been launched only as part of a 'research preview', indicating that it is not yet fully ready for all users.
This model is currently accessible only to subscribers. People who have tried it say it feels more natural and intuitive, powered by emotional intelligence. This is a perfect model for those who felt that some responses merely looked like another outdated page of Wikipedia.
ChatGPT-4.5 is expected to be much better than its predecessors at identifying social cues and the context of inputs. The CEO of OpenAI, Sam Altman, states that it is the first AI model that makes talking to it feel like talking to a thoughtful person. Further, he revealed that the model will be available to ChatGPT Plus and Team subscribers within a few weeks.