All of this new AI whizzbangery made me curious about digging into its fundamentals. I came across this video of Andrej Karpathy giving a 101 on building a language model from scratch,
So I figured, why not give it a go? I broke out scala-cli and started pottering through the video (it's actually good fun, recommended!). The first half sets the scene with a purely probabilistic approach; my questionable translation is linked below** (it isn't super readable / runnable at the moment, and I don't recommend reading it, as that isn't the point here).
Part two then launches into neural nets. I'm following along, and we get to a forward pass, which looks simple enough (if tedious). We get some measure of how good the network is (a so-called "loss function"), and then BOOM:
loss.backward()
https://pytorch.org/docs/stable/generated/torch.Tensor.backward.html
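For context, the forward pass up to that point is just ordinary arithmetic. A toy version of the kind of thing I mean (shapes, weights and target all made up for illustration, not the video's actual net):

```scala
// Toy forward pass: one "neuron" (a dot product) plus a squared-error loss.
// Everything here is plain Scala; the numbers are invented for illustration.
val w = Vector(0.5, -0.3)   // weights
val x = Vector(1.0, 2.0)    // one input example
val target = 0.2            // what we wanted the net to predict

val pred = w.lazyZip(x).map(_ * _).sum   // dot product: 0.5*1.0 + (-0.3)*2.0
val loss = math.pow(pred - target, 2)    // how bad was the prediction?
```

Nothing exotic so far; the surprise is what happens when you ask for gradients of that `loss`.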
Mind blown. From what I can tell, `torch` has quietly built up a DAG of the prior calculations behind `loss`, and then walks back through it, differentiating the loss function with respect to each of its previous calculations. Aside from needing to set aside some time to dig out my limited undergrad math textbooks and check I've even described that correctly, implementing this looks (to me) hard in Scala? Like, gnarly metaprogramming + gnarly math hard.
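To make my question concrete, here's my rough mental model of what `backward()` might be doing, as a scalar-only sketch (micrograd-style; the names and API are my guesses, none of this is torch's actual implementation):

```scala
// A scalar-only sketch of reverse-mode autodiff (micrograd-style).
// Each Value remembers its parent nodes and a closure that pushes its
// gradient back to them -- together these form the DAG described above.
class Value(val data: Double, val parents: Seq[Value] = Nil):
  var grad: Double = 0.0
  var backwardFn: () => Unit = () => ()

  def +(other: Value): Value =
    val out = Value(data + other.data, Seq(this, other))
    out.backwardFn = () =>
      this.grad += out.grad        // d(a+b)/da = 1
      other.grad += out.grad       // d(a+b)/db = 1
    out

  def *(other: Value): Value =
    val out = Value(data * other.data, Seq(this, other))
    out.backwardFn = () =>
      this.grad += other.data * out.grad   // d(a*b)/da = b
      other.grad += this.data * out.grad   // d(a*b)/db = a
    out

  def backward(): Unit =
    // Topologically sort the DAG, then apply the chain rule in reverse.
    val topo = scala.collection.mutable.ArrayBuffer.empty[Value]
    val seen = scala.collection.mutable.Set.empty[Value]
    def visit(v: Value): Unit =
      if !seen(v) then
        seen += v
        v.parents.foreach(visit)
        topo += v
    visit(this)
    grad = 1.0
    topo.reverseIterator.foreach(_.backwardFn())

// Usage: build a tiny expression, then differentiate it.
val a = Value(2.0)
val b = Value(3.0)
val loss = a * b + a   // loss = a*b + a
loss.backward()
// a.grad is now 4.0 (= b + 1), b.grad is 2.0 (= a)
```

Even if that's roughly right for scalars, doing it efficiently over tensors (and making the operator overloading pleasant) is where I suspect the real difficulty lives.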
Am I misreading the difficulty here? Does anything like it already exist in Scala? I've always wondered what the differentiating (<- hah!) features of PyTorch et al. were, and I feel like I might have stumbled on one?
** GitHub - Quafadas/makemore: An autoregressive character-level language model for making more things