ChatGPT 101: Input Processing

Subedi🌀
4 min read · Feb 4, 2023

How ChatGPT works behind the scenes: Part three

Ready for the next leg of the journey? Make sure you’ve read the second part before diving in! Link’s right here 🔗

The input text is processed by splitting it into a sequence of tokens (words or pieces of words), converting those tokens into numerical representations (embeddings), and then feeding those numbers into the model.
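To make those three steps concrete, here is a minimal sketch of the tokenize → IDs → embeddings pipeline. Everything in it is a simplifying assumption for illustration:

- The tiny vocabulary (`VOCAB`), the embedding size (`EMBED_DIM`) and the helper names `tokenize`, `to_ids` and `embed` are all hypothetical.
- The embedding values are random numbers, whereas a real model learns them during training.
- Splitting on words and punctuation stands in for the sub-word tokenizer that ChatGPT actually uses.

```python
import random
import re

# Hypothetical toy vocabulary for illustration only; real models use a learned
# sub-word vocabulary with tens of thousands of entries.
VOCAB = ["<unk>", "how", "does", "chatgpt", "work", "?"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}

EMBED_DIM = 4  # assumption: a tiny vector size so the output stays readable

# Hypothetical embedding table: one vector of EMBED_DIM floats per vocabulary
# entry. A trained model learns these values; here they are seeded random numbers.
rng = random.Random(0)
EMBEDDINGS = [[rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for _ in VOCAB]


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into words and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def to_ids(tokens: list[str]) -> list[int]:
    """Map each token to its vocabulary index, falling back to <unk> (id 0)."""
    return [TOKEN_TO_ID.get(tok, TOKEN_TO_ID["<unk>"]) for tok in tokens]


def embed(ids: list[int]) -> list[list[float]]:
    """Look up the embedding vector for each token id."""
    return [EMBEDDINGS[i] for i in ids]


tokens = tokenize("How does ChatGPT work?")
ids = to_ids(tokens)
vectors = embed(ids)
print(tokens)  # ['how', 'does', 'chatgpt', 'work', '?']
print(ids)     # [1, 2, 3, 4, 5]
print(len(vectors), len(vectors[0]))  # 5 4
```

The model never sees the text itself. It only receives the list of vectors, one per token. Any word missing from the vocabulary falls back to the `<unk>` id here. Sub-word tokenizers avoid most of those cases by breaking rare words into smaller known pieces.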

Input processing is the work of preparing and transforming the text provided by the user so that the chatbot's language generation model can handle it. This is an important step in building a chatbot because it ensures that the input is in the format the model expects and is suitable for the model to generate a response from.

Tokenization:

The first step breaks the input text into smaller units, such as words, parts of words, or punctuation marks, known as tokens. This is usually done with a tokenizer, a tool that splits the input text into tokens according to specific rules, such as breaking on whitespace or punctuation.
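As a rough illustration of how much those splitting rules matter, the sketch below compares two simple rule-based tokenizers. The function names `whitespace_tokenize` and `punctuation_tokenize` are hypothetical. The code uses only Python's standard library and is not how nltk or ChatGPT tokenize text.

```python
import re


def whitespace_tokenize(text: str) -> list[str]:
    """Rule 1: split only on runs of whitespace; punctuation stays attached."""
    return text.split()


def punctuation_tokenize(text: str) -> list[str]:
    """Rule 2: keep runs of word characters together and emit each
    punctuation mark as its own token; whitespace is discarded."""
    return re.findall(r"\w+|[^\w\s]", text)


sentence = "Hello, world! ChatGPT is fun."
print(whitespace_tokenize(sentence))
# ['Hello,', 'world!', 'ChatGPT', 'is', 'fun.']
print(punctuation_tokenize(sentence))
# ['Hello', ',', 'world', '!', 'ChatGPT', 'is', 'fun', '.']
```

Splitting on whitespace alone leaves "Hello," and "Hello" as different tokens. Separating punctuation fixes that, but naive rules still break down on cases like contractions: the regex above turns "don't" into `don`, `'`, `t`. Library tokenizers such as nltk's `word_tokenize` add further rules for cases like this. For example, `word_tokenize` splits "don't" into "do" and "n't".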

Screen Capture by Author

Here the nltk library is used to perform the tokenization. The word_tokenize
