ChatGPT 101: Pre-Training

Subedi🌀
3 min read · Feb 4, 2023

How ChatGPT works behind the scenes: Part One

OpenAI’s GPT-3, including the chatbot version known as ChatGPT, is based on a transformer architecture and uses deep neural networks to generate text.

The system design of ChatGPT consists of several components. This article, covering pre-training, is part one of a series I will be publishing here; I will walk you through each component in turn.

The first component is pre-training. Pre-training aims to enable the model to generate coherent and meaningful text. This involves training the model on a large corpus of text data to learn patterns and relationships between words and phrases.

Here’s a code example in PyTorch, a popular deep-learning framework, to give you an idea of what pre-training might look like:

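The following is a minimal sketch, not GPT-3's actual configuration: the hyperparameters (d_model, nhead, num_layers, max_len) are illustrative defaults chosen here for readability.

```python
import torch
import torch.nn as nn

class GPT(nn.Module):
    def __init__(self, vocab_size, d_model=256, nhead=8, num_layers=4, max_len=512):
        super().__init__()
        # Embedding layer: converts token ids into dense numerical vectors
        self.embedding = nn.Embedding(vocab_size, d_model)
        # Learned positional embeddings so the model knows token order
        self.pos_embedding = nn.Embedding(max_len, d_model)
        # Transformer layers: learn relationships between positions in the sequence
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        # Fully connected layer: projects hidden states back to vocabulary logits
        self.fc = nn.Linear(d_model, vocab_size)

    def forward(self, x):
        # x: (batch_size, seq_len) tensor of token ids
        seq_len = x.size(1)
        positions = torch.arange(seq_len, device=x.device)
        # Causal mask so each position attends only to earlier positions
        causal_mask = torch.triu(
            torch.ones(seq_len, seq_len, device=x.device, dtype=torch.bool),
            diagonal=1,
        )
        h = self.embedding(x) + self.pos_embedding(positions)  # (batch, seq_len, d_model)
        h = self.transformer(h, mask=causal_mask)
        return self.fc(h)                                      # (batch, seq_len, vocab_size)
```

The causal mask is what makes the model autoregressive: when predicting the next token, it can only look at the tokens that came before it.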

In this example, the model architecture is defined by the GPT class. It includes an embedding layer that converts the input tokens into numerical representations, a transformer layer that learns the relationships between words, and a fully connected layer that generates the final output.

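Here is a sketch of what that training loop might look like. The helper name pretrain, the learning rate, and the epoch count are assumptions for illustration, and data_loader is expected to yield batches of token ids:

```python
import torch
import torch.nn.functional as F

def pretrain(model, data_loader, epochs=3, lr=3e-4, device="cpu"):
    """Next-token-prediction pre-training with Adam and cross-entropy loss."""
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for epoch in range(epochs):
        total_loss = 0.0
        for batch in data_loader:  # batch: (batch_size, seq_len) token ids
            batch = batch.to(device)
            # Shift by one: the model predicts each token from the ones before it
            inputs, targets = batch[:, :-1], batch[:, 1:]
            logits = model(inputs)  # (batch, seq_len - 1, vocab_size)
            loss = F.cross_entropy(
                logits.reshape(-1, logits.size(-1)),  # flatten to (N, vocab_size)
                targets.reshape(-1),                  # flatten to (N,)
            )
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        print(f"epoch {epoch + 1}: mean loss {total_loss / len(data_loader):.4f}")
```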

The model is trained using the Adam optimizer and cross-entropy loss, and the training loop iterates over the dataset for a fixed number of epochs, updating the model's weights after every batch.

