Technical resources to get started with LLMs as a (classical) ML engineer
Don’t you think it’s time to understand what’s going on under the hood? 🤖
I recently posted a short version of this article on LinkedIn. However, I thought this would be a great read for this audience as well.
If you have a strong background in (classical) ML, you might feel that stepping into the world of NLP and LLMs is a bit intimidating. However, you’ll quickly notice that most of your intuitions and know-how are directly applicable, if you’re willing to go through a bit of reading. 🚀
It is just a matter of knowing where to start and how to ramp up quickly. And I happen to have just what you need in store! 🤩
I think Jay Alammar’s work is a great place to start: it is very visual and usually very didactic. Here is a digest of his most interesting articles:
- The Illustrated Transformer: the Transformer is widely considered the big innovation behind modern NLP, and it takes a bit of effort to understand in detail. However, like most things, the biggest obstacle is getting past the notation and terminology.
- How GPT-3 works: this gives you a bigger picture of how Transformer blocks are stacked to perform end-to-end decoding (a minimal decoding loop is sketched right after this list).
- The Illustrated Retrieval Transformer (aka RETRO) is the new kid on the block, and it is a more principled and efficient way to give your model access to a (very) large knowledge base.
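To make the “end-to-end decoding” part concrete, here is a minimal sketch of greedy autoregressive decoding in PyTorch. GPT-2 and Hugging Face’s transformers library are used purely as a convenient stand-in for any decoder-only model:

```python
# Minimal sketch of greedy autoregressive decoding with a decoder-only Transformer.
# GPT-2 is just a small, convenient stand-in; any causal LM works the same way.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The Transformer architecture is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):                                       # generate 20 new tokens
        logits = model(input_ids).logits                      # (batch, seq_len, vocab_size)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy pick of the next token
        input_ids = torch.cat([input_ids, next_id], dim=-1)      # append it and loop again

print(tokenizer.decode(input_ids[0]))
```

In practice you would call `model.generate(...)` with sampling instead, but spelling the loop out makes the “stack of decoder blocks, applied one token at a time” picture much easier to see.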
Jay has some less foundational articles in store as well, which are useful for building intuition about how LLMs operate:
- Visualizing hidden states of LLMs
- Explaining Transformer models: although LLMs are often disparaged as ‘black boxes’, this isn’t necessarily the case if you are equipped with the right interpretability tools.
Now, Jay’s work mostly scratches the surface, although in a very didactic way. Once you have covered that much ground, you can go deeper, specifically on Transformers:
- A great talk on the Transformer paper ‘Attention Is All You Need’, by Lukasz Kaiser, one of its authors (especially this 5-minute segment, which is one of the clearest explanations of attention I have come across). A bare-bones attention sketch follows right after this list.
- Building GPT from scratch, by the amazing Andrej Karpathy: a very good way to understand all the nitty-gritty details (if you have some knowledge of PyTorch).
- The Transformer Family: I have yet to read this one in detail, but everything Lilian Weng writes is just top-notch.
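If you prefer the equation in code form, here is a bare-bones sketch of scaled dot-product attention in PyTorch, written straight from the formula in the paper, softmax(QKᵀ/√d_k)·V. The `causal` flag is only there to show where decoder-style masking fits in:

```python
# Bare-bones scaled dot-product attention: Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
import math
import torch

def scaled_dot_product_attention(q, k, v, causal=False):
    # q, k, v: (batch, seq_len, d_k)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)    # pairwise similarity scores
    if causal:
        # hide future positions so each token can only attend to itself and the past
        mask = torch.triu(torch.ones_like(scores, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)              # attention weights, rows sum to 1
    return weights @ v                                   # weighted sum of value vectors

# tiny smoke test with random tensors
q = k = v = torch.randn(1, 5, 16)
print(scaled_dot_product_attention(q, k, v, causal=True).shape)  # torch.Size([1, 5, 16])
```

Multi-head attention is then “just” this operation run in parallel over several learned projections of Q, K and V.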
Now, for a more “system-oriented” perspective, there’s more from Lilian Weng’s blog:
- LLM Powered Agents: this will give you an overview of how to give LLMs agency, which is a necessary step to make them useful beyond basic conversational schemes and/or Q&A (a toy agent loop is sketched after this list).
- A survey of prompt engineering: a must-read that goes beyond influencer-style content…
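To give you a feel for what “agency” means in practice, here is a toy sketch of the tool-use loop such agents run. Everything here is made up for illustration: `call_llm` is a scripted stand-in for a real LLM API, and the `TOOL:` / `FINAL:` convention is an arbitrary parsing scheme, not a standard:

```python
# Toy ReAct-style agent loop: the LLM picks a tool, we run it, feed the result back,
# and repeat until the model emits a final answer (or we hit a step budget).
def calculator(expression: str) -> str:
    return str(eval(expression))  # toy tool only; never eval untrusted input for real

TOOLS = {"calculator": calculator}

def call_llm(transcript: str) -> str:
    # Placeholder: swap in a real LLM API call. Here we script two turns so the
    # example runs end to end without any external dependency.
    if "Observation:" not in transcript:
        return "TOOL: calculator | 21 * 2"
    return "FINAL: The answer is 42."

def run_agent(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = call_llm(transcript)               # model "thinks" and picks an action
        transcript += reply + "\n"
        if reply.startswith("FINAL:"):             # model decided it is done
            return reply.removeprefix("FINAL:").strip()
        if reply.startswith("TOOL:"):              # model asked for a tool call
            name, tool_input = (s.strip() for s in reply.removeprefix("TOOL:").split("|", 1))
            transcript += f"Observation: {TOOLS[name](tool_input)}\n"  # feed the result back
    return "No answer within the step budget."

print(run_agent("What is 21 * 2?"))  # -> The answer is 42.
```

Real agents add planning, memory and reflection on top, but the skeleton is always this observe-act loop.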
Ah, and this is not from Lilian’s blog, but Eugene Yan wrote this very influential article on Patterns for building LLM applications, which is also critical if you plan on using LLMs in a system.
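Since the retrieval-augmented generation (RAG) pattern comes up constantly in that article, here is a deliberately simplified sketch of it. TF-IDF stands in for a proper embedding model just to keep the example dependency-light, and `call_llm` is again a placeholder for whatever LLM API you use:

```python
# Minimal RAG sketch: index some text chunks, retrieve the most relevant ones for a
# query, and stuff them into the prompt as context.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "RETRO retrieves text chunks from a large database during generation.",
    "RLHF fine-tunes a model against a reward model trained on human preferences.",
    "Chain-of-Thought prompting asks the model to reason step by step.",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)          # "index" the knowledge base

def retrieve(query: str, k: int = 2) -> list[str]:
    scores = cosine_similarity(vectorizer.transform([query]), doc_vectors)[0]
    return [documents[i] for i in scores.argsort()[::-1][:k]]  # k most similar chunks

query = "How does RETRO use an external knowledge base?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
# answer = call_llm(prompt)  # placeholder for your LLM call of choice
print(prompt)
```

Production systems swap TF-IDF for dense embeddings and a vector database, but the retrieve-then-prompt skeleton stays the same.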
Now, blog articles and YouTube videos will only get you so far. You will still need to scratch your head over a few academic papers. Here are the ones I think everyone should read at least once:
- Attention Is All You Need: the paper that introduced the Transformer architecture.
- Distilling Step-by-Step: the idea of distillation is key, as many small “open-source” models are trained on outputs generated by larger models such as GPT-3.5 or GPT-4 through exactly this kind of process (the classic distillation loss is sketched after this list).
- The GPT-4 Technical Report is kind of interesting for historical reasons, but the LLaMA 2 paper is widely considered a much more informative read, since it actually discloses the training recipe.
- Plan-and-Solve: an advanced prompting strategy that is a clear step up from basic Chain-of-Thought, ReAct and the like (a paraphrased example prompt is shown after this list).
- Retrieval-enhanced Transformers (aka RETRO) is the modern (for a few months 😅) way to expose an external knowledge base to an LLM.
- InstructGPT (the ancestor, by a few months, of ChatGPT), which popularized the use of RLHF to “align” LLMs.
- Direct Preference Optimization: a simpler, more stable, and cheaper way to align LLMs than RLHF (the loss itself is sketched below).
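For the distillation paper above, here is a sketch of the classic soft-label distillation loss (a temperature-scaled KL divergence between teacher and student logits). This only illustrates the general idea; Distilling Step-by-Step itself goes further and trains the small model on teacher-generated rationales:

```python
# Classic knowledge-distillation loss: make the student's softened output
# distribution match the teacher's.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    kl = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    return kl * temperature ** 2  # usual rescaling so gradient magnitudes stay comparable

# tiny smoke test with random logits over a 10-token vocabulary
student = torch.randn(4, 10, requires_grad=True)
teacher = torch.randn(4, 10)
distillation_loss(student, teacher).backward()
```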
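For Plan-and-Solve, the entire trick lives in the prompt. The snippet below paraphrases the style of trigger phrase; check the paper for the exact wording:

```python
# Paraphrased Plan-and-Solve style prompt (not the paper's exact wording): ask the
# model to devise a plan first, then execute it step by step.
question = "A shop sells pens at 3 for $5. How much do 12 pens cost?"
prompt = (
    f"Q: {question}\n"
    "A: Let's first understand the problem and devise a plan to solve it. "
    "Then, let's carry out the plan and solve the problem step by step."
)
# answer = call_llm(prompt)  # placeholder for your LLM call of choice
print(prompt)
```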
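And for DPO, the whole method boils down to a single loss. Here is a minimal sketch of it in PyTorch, where the inputs are the summed log-probabilities of the chosen and rejected responses under the policy being trained and under a frozen reference model:

```python
# Minimal DPO loss: push the policy to prefer the chosen response over the rejected
# one, measured relative to a frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # implicit "rewards" are log-ratios between the policy and the reference model
    chosen_reward = policy_chosen_logps - ref_chosen_logps
    rejected_reward = policy_rejected_logps - ref_rejected_logps
    # logistic loss on the reward margin, scaled by beta
    return -F.logsigmoid(beta * (chosen_reward - rejected_reward)).mean()

# tiny smoke test with made-up per-sequence log-probabilities
print(dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
               torch.tensor([-13.0]), torch.tensor([-14.5])).item())
```

No reward model and no PPO loop, which is exactly why it caught on so quickly.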
That’s it! I will likely come back to this list from time to time to update it. Please drop me a note or leave a comment if you feel I forgot something.