BETSI

arXiv paper

A light implementation of the 2017 Google paper 'Attention Is All You Need'. BETSI is the name of the model, a recursive acronym standing for BETSI: English To Shitty Italian, since the training time I allowed on my graphics card was not enough for amazing results.

For this implementation I chose translation from English to Italian, as Transformer models are exceptional at language translation and this seems to be a common use case for light implementations of this paper.

The dataset I will be using is the OPUS Books dataset, a collection of copyright-free books. The book content of these translations is free for personal, educational, and research use. See the OPUS language resource paper for details.
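
For reference, here is a minimal sketch of how the English-Italian portion of the dataset can be pulled in with the Hugging Face datasets library (assuming the `opus_books` dataset id and `en-it` config); dataset.py in this repo may load and tokenize it differently.

```python
# Hedged sketch: load the English-Italian OPUS Books pairs via Hugging Face
# datasets. Assumes the "opus_books" dataset id and "en-it" config; the repo's
# own dataset.py may handle this differently.
from datasets import load_dataset

raw = load_dataset("opus_books", "en-it", split="train")

# Each example holds a dict of parallel sentences keyed by language code.
print(raw[0]["translation"])  # e.g. {'en': '...', 'it': '...'}
```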

Notes

I'm creating notes as I go, which can be found in NOTES.md.

Transformer model architecture

(Figure: the Transformer model architecture diagram from the paper.)
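
As a rough companion to the diagram, here is a hypothetical sketch of the scaled dot-product attention at the core of the architecture; it is illustrative only and is not the code in model.py.

```python
# Illustrative sketch of scaled dot-product attention:
#   Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
# Not this repo's implementation; shapes assumed to be (batch, heads, seq_len, d_k).
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    d_k = q.size(-1)
    # Similarity scores between every query and key position.
    scores = (q @ k.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        # Positions where mask == 0 (padding, or future tokens in the decoder)
        # are hidden from attention.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = scores.softmax(dim=-1)
    return weights @ v, weights
```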

Requirements

There is a requirements.txt that lists the packages needed to run this. I used PyTorch with ROCm as this sped up training A LOT: training this model on my laptop's CPU takes around 5.5 hours per epoch, while training on my desktop's GPU takes around 13.5 minutes per epoch (about 24.4 times faster).
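
If you go the ROCm route, a quick sanity check that the ROCm build of PyTorch actually sees the GPU (ROCm devices are exposed through the regular torch.cuda API):

```python
# Sanity check for a ROCm build of PyTorch; AMD GPUs show up through torch.cuda.
import torch

print(torch.__version__)          # ROCm wheels include a +rocm suffix in the version
print(torch.cuda.is_available())  # True if the GPU is usable
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
```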

TODO and tentative timeline:

  • Input Embeddings
  • Positional Encoding (see the sketch after this list)
  • Layer Normalization - Due by 11/1
  • Feed forward
  • Multi-Head attention
  • Residual Connection
  • Encoder
  • Decoder - Due by 11/8
  • Linear Layer
  • Transformer
  • Tokenizer - Due by 11/15
  • Dataset
  • Training loop
  • Visualization of the model - Due by 11/22
  • Install AMD ROCm to train with the GPU - attempt by the end of the project
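
The sketch referenced from the Positional Encoding item above: a minimal, hypothetical sinusoidal positional encoding module in the style of the paper, not the implementation in this repo's model.py.

```python
# Hypothetical sketch of the sinusoidal positional encoding from the paper,
# assuming an even d_model and (batch, seq_len, d_model) inputs.
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)          # (max_len, 1)
        div_term = torch.exp(
            torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model)
        )
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)           # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)           # odd dimensions
        self.register_buffer("pe", pe.unsqueeze(0))            # (1, max_len, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Add the encoding for the first seq_len positions to the embeddings.
        return x + self.pe[:, : x.size(1)]
```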

References used