January 4th, 2021
In today's tutorial, I'll walk you through how to get started with GPT-2. GPT-2 is a language model built by OpenAI; if you are here, chances are you have already heard about GPT-2 or maybe its successor, GPT-3. A more in-depth explanation is available in the official docs.
For the sake of simplicity, we'll use Google Colab today. Colab is like a more advanced Jupyter notebook that sits ready in the cloud, waiting for you. You can even use a GPU-powered runtime on the free tier, and it's truly hard to hit the limits of the free tier with regular, everyday training tasks like this one.
I'll mostly show what to do, step by step, with pictures, so if you don't want to type out what I show yourself (makes sense), you can go ahead and check the repo for the project. This tutorial is based on the official GitHub page, but since there are some errors in that documentation, I'd suggest you follow what's in this tutorial instead.
First, we'll start by creating a new Colab project.
Then we can pull the GPT-2 project using the git command below.
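In a Colab cell, that looks roughly like this (the `!` prefix tells Colab to run a shell command; the URL points at OpenAI's official gpt-2 repository):

```
# Clone the official GPT-2 repository into the Colab workspace
!git clone https://github.com/openai/gpt-2.git
```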
We can then install TensorFlow in our environment using the command below.
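Something along these lines should work. The GPT-2 code needs TensorFlow 1.x, while Colab ships with 2.x by default; the exact version pin below is my own choice (the official README mentions 1.12.0), so adjust it if your setup complains:

```
# Install a TensorFlow 1.x release compatible with the GPT-2 code
!pip install tensorflow==1.15
```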
After that, we need to move into our gpt-2 folder and install the requirements; we can do that with the two commands below, in order.
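Roughly like this; note that `%cd` (rather than `!cd`) is the Colab magic that actually changes the working directory for the rest of the notebook:

```
# Enter the cloned repo and install its Python dependencies
%cd gpt-2
!pip install -r requirements.txt
```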
The last step before we can run the model is to pick which model to download; we'll choose the smallest one for simplicity.
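The repo ships a download script for this. The smallest model is the 124M one (it was labelled 117M in older versions of the repo), so the cell looks roughly like:

```
# Download the smallest released GPT-2 model (124M parameters)
!python download_model.py 124M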
Since we want to test the model against our own sentences, we need an interactive prompt, and Colab's free tier doesn't provide a terminal. At this point we'll use another module to work around this limitation.
We'll use the kora module to spin up a console at a separate URL.
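A minimal sketch, assuming kora's console helper still works the way it did when I tried it; the exact entry point may have changed, so check the kora README if this doesn't run as-is:

```
# Install kora and start a web-based console (assumed entry point;
# the call should print a URL you can open in another browser tab)
!pip install kora
from kora import console
console.start()
```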
Once this last block of code has run successfully, we can go ahead and play in our console by following the link from the output. Before running the generation script, we first need to cd into the gpt-2 directory; then we can finally run our model.
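Inside that web console, something along these lines should bring up the interactive prompt. The `PYTHONPATH` export and the sampling flag follow the official repo's instructions; adjust the model name if you downloaded a different one:

```
# Run the interactive conditional sampling script from the repo root
cd gpt-2
export PYTHONPATH=src
python3 src/interactive_conditional_samples.py --model_name 124M --top_k 40
```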
Here is my question to GPT-2 below; not the easiest task, even for humans, right?
The results from GPT-2, however, look quite all right to me. Given that this model is just a small slice of the full GPT-2, and that GPT-3 is even larger than the original GPT-2, I can't decide whether the results are promising or scary :).
Later on this blog I might publish some more work about running GPT-2 in a container and/or how to fine-tune the GPT-2 model for our own needs. Until then, take care of yourselves :)