Eren Akbulut's Blog

How Not to Finetune GPT-2 (on Google Colab)

January 6th, 2021

Hello again everyone. My last post here was a simple starter's guide to GPT-2, and I've since been looking into ways of finetuning that model to create something custom. I'm currently taking a class called "Semantic Web", and its content feels quite niche to me; since the scope of the lecture is somewhat abstract in places, I have, from time to time, a hard time understanding the concepts.

Since GPT models work "magically" well :D, I thought maybe I could throw in 15,000 lines of raw text collected from some of the main websites about the semantic web and its concepts, and the model could help me understand the basics by tutoring me, answering simple semantic-web-related questions. Well, well, well... It did not... :)

I mean, it's no surprise, right? That's probably not how you should finetune a model, but since I'm a total noob, I said why not.

Still, the experience was okay, and I can walk you through finetuning GPT-2 on Google Colab alone, so maybe if you don't use it as badly as I did, you'll get nicer results in return. Let's get started.

The code for the example is here, but all credit goes to these 2 repos on GitHub and Medium. These kinds of processes are really quite hard, and with their help the rest felt like a sunny Sunday afternoon :)


The last tutorial was a total hassle with gathering all the pictures and such, so this time I'll post the steps in just one picture and the results in another.

[image: finetune-colab-general]

In the first step we'll need to install GPT-2 with some extra libraries that make it easier to finetune. The next step is to install TensorFlow; I wasn't able to find a decent GPU version that works well with this model, so this is what we use.
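For reference, a minimal sketch of what those two steps could look like as Colab cells, assuming the commonly used nshepperd fork of GPT-2 (which ships the finetuning scripts we move around later) and a TensorFlow 1.x pin, since the GPT-2 scripts predate TensorFlow 2:

# clone a GPT-2 fork that includes the finetuning helpers (assumption: nshepperd's fork)
!git clone https://github.com/nshepperd/gpt-2.git
# the GPT-2 scripts target TensorFlow 1.x; pinning the CPU build here
!pip install tensorflow==1.15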

After that we can move into the gpt-2 folder and run the requirements install command. It should work pretty smoothly. Then we can safely download our model, though we'll be making changes inside it later.
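Assuming the folder name from the clone above, those cells would look roughly like this:

# move into the cloned repo
%cd gpt-2
# install the repo's Python dependencies
!pip install -r requirements.txt
# download the small 117M model; its files land under models/117M
!python3 download_model.py 117M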

At this point, since using bash over Colab is quite painful, we'll use the Files section to rearrange things.

[image: colab-gpt2-folder]

We should move "encode.py", "train.py", and "train-horovod.py" from the gpt-2 main folder to the /src folder. The latest version of the folder structure is shown above.
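If you'd rather not drag files around in the Files panel, the same move can be done from a cell; this is just the shell equivalent of the manual step:

# move the finetuning scripts next to the rest of the source files
!mv encode.py train.py train-horovod.py src/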

The last 2 console steps before we run our tuned model are encoding the text of our choice and training the new model. For that, we first move into the /src folder and then run the commands shown in the first picture above. I stopped my training after 50 iterations because, to be honest, it wasn't that promising in the first place, since I know what I fed the model with :)
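For completeness, here is roughly what those commands look like; semantic-web.txt is a placeholder for whatever raw text file you upload, so adjust the names to your own data:

# work from the src folder so the checkpoint ends up in src/checkpoint/run1
%cd src
# encode the raw text into the .npz format that train.py expects
!python3 encode.py semantic-web.txt semantic-web.npz
# finetune; interrupt the cell once the samples (or your patience) stop improving
!python3 train.py --dataset semantic-web.npz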

After we stop the training, the last step is to arrange the folders by replacing the model files; alternatively, we could copy the model under a new name to preserve the original. The tuned model is in the /src/checkpoint/run1 folder, and we can basically copy and paste everything into the /src/model/117M folder. I moved the old model and the checkpoint files from the former model into the new model's folder and kept them there, but that's not necessary at all.
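The same shuffle from a cell might look like this, still inside /src from the previous step (the repo names the folder models/117M by default; adjust the path if yours is named model/117M as above):

# copy the finetuned checkpoint files over the downloaded model
!cp -r checkpoint/run1/* models/117M/
# back to the repo root so the interactive script below resolves its paths
%cd ..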

[image: after-training-colab2-finetune-gpt2]

This is roughly how our folder structure should look at the end.


Now we can safely do an interactive run of our code and see what happens. We can create a new cell to run the command below.

# sample interactively from the tuned model; --top_k 40 limits sampling to the 40 most likely tokens at each step
!python3 src/interactive_conditional_samples.py --top_k 40

I used Kora for the last tutorial, but both Kora and the built-in !bash command on Colab somehow couldn't find our model; that's the reason behind the switch.

I'll leave some results down below so you can see what to expect when you finetune a model with garbage :D. Yet, given the circumstances, the results are relatively impressive.

[images: finetune-result1, finetune-result3]

Actually, looking at the last one, you can feel that the model starts to understand what's going on, but the data collection part is still the biggest problem to solve when dealing with a task like this. Actually, that holds for most ML-related problems.


Thank you for staying with me this far, and I'll see you next time. :)

This blog has been created by Eren Akbulut, 2020