I saw this linked on Hackaday: "Train your own LLM from scratch", and curious, I gave it a go.
It's not the best as a quick guide: the code is all there, but there's no final "this is what the finished code should look like" set of files, so it needs some messing about to get the imports etc. set up correctly before it runs. It's also a real pain in the arse to get PyTorch (the NN library it uses to do the hard stuff) running with CUDA enabled: you need a graphics card with CUDA support, but then also the correct CUDA toolkit, and then the CUDA build of the PyTorch libraries with the matching CUDA version built into it. It took me a few goes to get the thing running at all, and a few more to get it running on the GPU rather than taking a week to do the tiny 2/2/128 model.
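A quick sanity check that the CUDA plumbing is actually lined up is something like this (a rough sketch, nothing from the guide itself):

import torch

# If is_available() is False, training silently falls back to the CPU,
# which is where the "a week for the tiny model" estimate comes from.
print(torch.__version__)          # e.g. 2.x.x+cu121 for a CUDA 12.1 build
print(torch.version.cuda)         # CUDA version the wheel was built against
print(torch.cuda.is_available())  # driver, toolkit and wheel all have to agree
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))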
The training data is a load of Shakespeare, and it uses the seed phrase "To be or not to be" to show how training is going. When you're done you can load a checkpoint from the various points in the training (saved every 1000 iterations up to 5k), give it a prompt, and have it spit out some text in that vein.
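The sampling side is basically an autoregressive loop: encode the prompt, ask the model for the next character, append it, and repeat. A toy sketch of that loop (the Embedding layer here is just a random stand-in; the real script loads the trained transformer from a checkpoint):

import torch

# Toy sketch of prompt-based sampling: take the logits for the last position,
# sample one character id, append it, and go round again.
vocab_size = 65
model = torch.nn.Embedding(vocab_size, vocab_size)  # stand-in for the trained model

idx = torch.randint(0, vocab_size, (1, 18))          # stand-in for the encoded prompt
for _ in range(50):
    logits = model(idx)[:, -1, :]                    # logits for the next character
    probs = torch.softmax(logits, dim=-1)
    next_id = torch.multinomial(probs, num_samples=1)
    idx = torch.cat([idx, next_id], dim=1)           # grow the sequence one char at a time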
It doesn't take very long on a gaming laptop with a 5070 (8 GB) to run the tiny (2/2/128, 0.4M params) or medium (6/6/384, 10.8M params) models, although for some reason the large model (12/12/768, 85.3M params) ran incredibly slowly and produced worse results than the tiny, maybe from using too much memory for the little laptop or some other resource issue. I retried with an 8/8/768 config on a different data set and it was much faster, but then we are dealing with multiplicative effects as the layer count and embedding size increase.
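Those x/y/z numbers are layers / attention heads / embedding width, and the parameter count comes almost entirely from the layers and embedding width (the head count just splits the same matrices up). A rough back-of-the-envelope count, assuming a standard GPT block of about 12*E^2 parameters per layer plus the character embedding table, lands close to the figures above:

# Rough parameter count: ~12 * n_layer * n_embd^2 for the transformer blocks
# (attention ~4*E^2, MLP ~8*E^2) plus the character embedding table.
VOCAB = 65  # Shakespeare char-level vocab

def approx_params(n_layer, n_embd, vocab=VOCAB):
    return 12 * n_layer * n_embd ** 2 + vocab * n_embd

for name, (n_layer, n_head, n_embd) in {
    "tiny":   (2, 2, 128),
    "medium": (6, 6, 384),
    "large":  (12, 12, 768),
}.items():
    # head count doesn't change the parameter count, only how attention is split
    print(f"{name}: ~{approx_params(n_layer, n_embd) / 1e6:.1f}M params")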
The final training output for tiny was:
Step 4900 | val loss: 1.8400
--- Step 4900 sample ---
To be or not thusbaced herring theser,
What that very surpurpitsemishson,
Net to lord she father that hear'd Cit
---
That sort of text; it's getting some words that sound Shakespeare-esque.
The medium's final training output was:
Step 4900 | val loss: 4.1142
--- Step 4900 sample ---
To be or not easiquse.
VOLUMNIA:
I can, if he wish you gold on me to-morrow?
COMINIUS:
I'll not hear it, and m
---
Better on medium, with some of it almost sounding like a poem.
But for the large:
Step 4900 | val loss: 2.8399
--- Step 4900 sample ---
To be or notnehd h tee mge o! yaol yeneiuune ot uu iv foo the boodr w tlasotr 'mateo tuvel laleitysd ct h heoneo
---
So, still gibberish.
I put some prompts into the final checkpoints of the tiny and medium models to see what they would produce.
What would Shakespeare by way of Poe look like? --prompt "quoth the raven"
Tiny:
"quoth the raven heart him.
Seconse hand lamid me, lover neven courationd,
In be of that the broud the goods the be not
The end againt the the treath, I had the dearth thee,
O prepared our die!
Come seation night ou"
Medium:
"quoth the ravenspul of his law commanding gear
And by the volumes at his hope of chastion:
That we are all chosent of the subjects are sighs;
And treacherous triumph in his disease,
Till he better was so little, the"
Or the classic "To sleep perchance to dream":
Tiny:
"to sleep perchance to dreams and should
She the conful mide be to and thou calus;
Madanish the I know thou must the be some
Thard your me death is the the treath,
I have doth the grother thy bast of and and
Whild all their as t"
Med:
"to sleep perchance to dream us the clos;
And interchangeaning that the state which make the farther
Of PrAyou, resides me at now the air.
COMINIUS:
Then learn to see the way
That been legs fillines, we bite thou could,says,
An"
Still at least 10% gibberish
How about "lack of.org"?
Tiny:
"lack of.orge the should he thee conful my here love
With me fruman, do insbreaset and thought with to could
There your me dother again the treath,
I had the dink them his marrian too and be for the fillon."
MERS
Med:
"lack of.orged his sword soul to noble lies
should between out. And this all night, hast dark with thy sease;
and cruel made me do I, thy words careful
vyness dangerous and reasons all. What thou bear's the
confe'
I then decided to try different training data, and given this is an AI, I decided to use all of GLaDOS's voice lines from the Portal games by scraping them off Wikipedia.
The data set isn't huge (120,502 chars, vocab size: 74, compared to the Shakespeare set's 1,115,394 chars, vocab size: 65), but the training produced some promising output.
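Those numbers are just the length of the scraped text file and the count of distinct characters in it, since the vocab is character-level; something like this (the file name is my own):

# Char-level dataset stats: vocab size is just the number of distinct characters.
with open("glados.txt", encoding="utf-8") as f:   # whatever the scraped lines were saved as
    text = f.read()
print(f"{len(text):,} chars, vocab size: {len(set(text))}")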
Using the "We both said a lot of things that you're going to regret" line as the seed, I got:
"We both said a lot of things that you're going to regret. But I think we can put our differences behind us. For science. You monster.
Okay. Plan B, subsecti"
That was for the medium model size after 5000 iterations.
And for "lack-of.org", fake GLaDOS said:
"Lack of.orgge regulations require me to warn you that this next test chamber... is looking pretty good.
Feeds off the misery of others.
Fifty thousand years is a lot of time to think. About me. About you. We wer"
"Quoth the ravenange tests, I am beginning to think you don't share my excitement for rescuing crying trapped injured dying humans.
At the start of this course I was worried you were becoming too close but in my atte"
"to sleep perchance to dream robot. I'm happy to put this all behind us and get back to work. After all, we've got a lot to do, and only sixty more years to do it. More or less. I don't have the actuarial tables in front of me."
And of course:
"the cake is a liest concersation?
An escape hatch will open in three... Two... One.
An extra special bonus course that ends with us finding and freeing humans!
An intubation associate will be dispatched to revive you"
I think the GLaDOS data set is probably not large enough to produce anything genuinely new, so it's mostly just spitting out existing lines verbatim. Still a fun little exercise.