I’m sharing a Colab notebook that illustrates the basics of fine-tuning GPT2 with Hugging Face’s Transformers library and PyTorch. It’s intended as an easy-to-follow introduction to using Transformers with PyTorch, and walks through the basic components and structure, specifically with GPT2 in mind. There are many ways of getting PyTorch and Hugging Face to work together, but I wanted something that didn’t stray too far from the approaches shown in the PyTorch tutorials.

You should understand the basics of PyTorch and how a training loop works before getting started. If you don’t, this official PyTorch tutorial serves as a solid introduction. Familiarity with the workings of GPT2 might be useful but isn’t required. I’ve borrowed liberally from Chris McCormick’s BERT fine-tuning tutorial, Ian Porter’s GPT2 tutorial and the Hugging Face language model fine-tuning script, so full credit to them. Chris’ code has practically provided the basis for this script - you should check out his tutorial series for more great content about transformers and NLP.

I should mention what the script doesn’t cover:

  • Using the nlp library to load in the dataset and set up the training workflow, which looks to streamline things rather nicely.
  • Accumulated gradients - this gives larger effective batch sizes than Colab allows (GPT2 is a large model, and anything more than a batch size of 2 would be enough to get a CUDA out-of-memory error on Colab). There’s a rough sketch of this after the list.
  • Freezing layers. This is the process of updating the parameters in only selected layers, made famous by the ULMFiT process (also sketched below).
  • Using ‘past’ when generating text. This feeds the model’s cached state back in when generating successive pieces of text, so it doesn’t recompute everything from scratch. I didn’t need it, but a sketch follows the list anyway.
  • Tensor packing. This is a neat way of fitting as much training data as possible into each batch (the last sketch below).
  • Hyperparameter search. I settled quickly on values that seemed to produce decent results, without checking whether they were optimal.
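
To give a flavour of the first of these, here’s a minimal sketch of how gradient accumulation slots into an ordinary PyTorch training loop. None of this is lifted from the notebook: the model, optimizer and train_dataloader are assumed to already exist, and accumulation_steps is just an illustrative value.

```python
accumulation_steps = 8  # effective batch size = dataloader batch size * 8

model.train()
optimizer.zero_grad()
for step, batch in enumerate(train_dataloader):
    # batch is assumed to be a tensor of token ids; GPT2LMHeadModel returns
    # the language-modelling loss as its first output when labels are passed.
    outputs = model(batch, labels=batch)
    loss = outputs[0] / accumulation_steps  # scale so the accumulated gradients average out
    loss.backward()                         # gradients add up across backward() calls
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```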
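
Freezing layers is simpler than it sounds: you switch off gradients for the parameters you don’t want to train. The sketch below is my own illustration rather than anything from the notebook, and unfreezing only the last two blocks is an arbitrary choice.

```python
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")

# Freeze every parameter first...
for parameter in model.parameters():
    parameter.requires_grad = False

# ...then unfreeze the last two transformer blocks and the final layer norm.
for block in model.transformer.h[-2:]:
    for parameter in block.parameters():
        parameter.requires_grad = True

for parameter in model.transformer.ln_f.parameters():
    parameter.requires_grad = True
```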
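
Using ‘past’ looks roughly like the loop below. Recent versions of Transformers call the argument past_key_values (older ones called it past), so treat this as a version-dependent sketch; tokenizer and model are assumed to be an already-loaded GPT2 tokenizer and GPT2LMHeadModel.

```python
import torch

input_ids = tokenizer.encode("The meaning of life is", return_tensors="pt")
generated = input_ids
past = None

model.eval()
with torch.no_grad():
    for _ in range(20):
        # Once a cache exists, only the newest token needs to be fed to the model.
        model_input = generated if past is None else generated[:, -1:]
        outputs = model(model_input, past_key_values=past, use_cache=True)
        past = outputs.past_key_values
        # Greedy decoding: pick the most likely next token and append it.
        next_token = outputs.logits[:, -1, :].argmax(dim=-1, keepdim=True)
        generated = torch.cat([generated, next_token], dim=1)

print(tokenizer.decode(generated[0]))
```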
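
And tensor packing, at least in the form I mean here, boils down to concatenating all of the tokenized text into one long stream and slicing it into fixed-length blocks, so no space in a batch is wasted on padding. A rough sketch, assuming texts is a list of training strings and tokenizer is a GPT2 tokenizer:

```python
import torch

block_size = 512  # length of each training example, in tokens

# Tokenize everything into one long stream, separating documents with the EOS token.
all_token_ids = []
for text in texts:
    all_token_ids.extend(tokenizer.encode(text + tokenizer.eos_token))

# Drop the leftover tokens at the end so every block is exactly block_size long,
# then reshape into (num_blocks, block_size), ready to batch.
num_blocks = len(all_token_ids) // block_size
packed = torch.tensor(all_token_ids[: num_blocks * block_size]).view(num_blocks, block_size)
```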

Even if that doesn’t mean anything to you now, you might find yourself wondering about some of it later.

Lastly, it’s worth noting that the Transformers library can change considerably between versions, often without much warning or any record in the documentation. If something doesn’t match what you see in the docs, it’s likely that things have moved on.