I finished the first unit of the Hugging Face Agents course, at least the reading part. I still want to play around with the code a bit more, since I imagine we’ll be doing that more going forward. In the meantime, I wanted to write up some reflections on the course materials from unit one, in no particular order…
Code agents’ prominence
The course materials, and smolagents in general, place special emphasis on code agents, citing multiple research papers. They seem to make some solid arguments for it, but it also seems pretty risky at the same time. Having code agents instead of pre-defined tool use is good because:
- Composability: could you nest JSON actions within each other, or define a set of JSON actions to re-use later, the same way you could just define a Python function? (See the sketch after this list.)
- Object management: how do you store the output of an action like generate_image in JSON?
- Generality: code is built to express simply anything you can have a computer do.
- Representation in LLM training data: plenty of high-quality code is already included in LLMs’ training data, which means they’re already trained for this!
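To make the composability point concrete, here is a small sketch of my own (not from the course; `generate_image` and `caption_image` are hypothetical stand-in tools, not real smolagents tools) contrasting a flat JSON action with the same task written as code:

```python
# My own illustration of why code actions compose more naturally than JSON
# tool calls. The two "tools" below are hypothetical stand-ins.

def generate_image(prompt: str) -> bytes:
    """Stand-in tool: pretend this returns raw image bytes."""
    return f"<image for: {prompt}>".encode()

def caption_image(image: bytes) -> str:
    """Stand-in tool: pretend this returns a caption for the image."""
    return f"caption of {image.decode()}"

# JSON-style action: flat, and there is no natural way to hold onto the image
# object or feed it into the next call without orchestration outside the model.
json_action = {"tool": "generate_image", "arguments": {"prompt": "a red fox"}}

# Code-style action: composition, intermediate objects and reuse are just code.
def generate_and_caption(prompt: str) -> str:
    image = generate_image(prompt)   # intermediate object lives in a variable
    return caption_image(image)      # nesting is an ordinary function call

print(generate_and_caption("a red fox"))
```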
The thing that gives me pause is that we seem to have moved along the spectrum from highly structured, known workflows (a chain, perhaps, or even something like a DAG), to tool use in a loop (which had some arbitrary or dynamic parts but was still at least somewhat defined), and then all the way out to code agents, where basically anything is possible.
If I think about this as an engineer tasked with building a robust, dependable and reliable system, then the last thing I want to add to that system is an agent that can do basically anything under the sun (i.e. a code agent). Perhaps I’m misrepresenting the position of code agents here, so I’m looking forward to reading the papers cited above as well as understanding it more from the course authors’ perspective.
Evals & testing
Following on from my confusion around code agents, I’m very curious how the course will recommend testing and evaluating these arbitrary code agents. Things I could imagine:
- testing the specific scenarios that your application or use case requires (i.e. end to end)
- testing each component of the system, insofar as you can break it down into smaller sub-components
- maybe linting / unit tests once code has been generated by the agent, i.e. real-time evaluation of the robustness of what it produces (sketched after this list)
- probably LLM-as-a-judge somewhere in the mix, though that opens up its own can of worms…
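On the linting idea, here is a rough sketch of my own (pure speculation, not anything from the course) of checking agent-generated code before it gets executed:

```python
# Speculative sketch: before executing agent-written code, check that it at
# least parses and doesn't import anything on a small banned list.
import ast

def passes_basic_checks(generated_code: str) -> bool:
    """Return True if the code parses and avoids a few banned imports."""
    try:
        tree = ast.parse(generated_code)  # syntax check, a very light "lint"
    except SyntaxError:
        return False
    banned = {"os", "subprocess"}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            if any(alias.name.split(".")[0] in banned for alias in node.names):
                return False
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in banned:
                return False
    return True

print(passes_basic_checks("import subprocess; subprocess.run(['ls'])"))  # False
print(passes_basic_checks("result = sum(range(10))"))                    # True
```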
I do hope they talk about that in the later units of the course.
General patterns
The core loop that came up in unit 1 was:
plan -> act -> feedback/reflection
And all of that gets packaged up in a loop and repeated, in various forms, depending on exactly how you’re using it. This pattern is related to the ReAct loop that lots of people cite, which seems to be a specific version of the general idea above.
And the fact that all of this works is somehow powered by the very useful enablement of tool use, which is itself powered by the fact that model providers finetuned this ability into their models. Crazy, brittle, impressive and many other words for the fact that this ‘hack’ has such power.
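As a mental model, here is a minimal hand-rolled sketch of that plan → act → feedback loop (my own illustration, not smolagents’ actual implementation; `call_llm` and the tool registry are hypothetical stand-ins):

```python
def run_agent(task: str, tools: dict, call_llm, max_steps: int = 5) -> str:
    """Hand-rolled plan -> act -> feedback loop (illustrative only)."""
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        # Plan: ask the model what to do next, given everything so far.
        decision = call_llm("\n".join(history))
        if decision.get("final_answer") is not None:
            # The model has decided it is done.
            return decision["final_answer"]
        # Act: run the chosen tool with the chosen arguments.
        observation = tools[decision["tool"]](**decision["arguments"])
        # Feedback/reflection: the observation goes back into the context.
        history.append(f"Observation: {observation}")
    return "Stopped after max_steps without a final answer."
```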
Chat templates
I liked how the unit really impresses on you the impact and importance of chat templates as the real interface to LLMs. You may pass in your requests through a handy Python SDK, passing your tools as a list of function definitions, but in the end it all gets flattened into a very precise syntax, with many tokens not intended for human consumption.
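You can see this for yourself by rendering a chat template directly with transformers (the model here is just an example; any chat model on the Hub with a template behaves similarly):

```python
from transformers import AutoTokenizer

# SmolLM2-Instruct is just an example of a chat model with a template.
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful agent with access to tools."},
    {"role": "user", "content": "What's the weather in Paris?"},
]

# tokenize=False returns the raw prompt string, special tokens and all.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```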
Points of leverage
At the end of the unit, I was thinking about all the places where an engineer has leverage over agents. What I could initially think of was:
- the variety and usefulness of the tools you provide to your agent, or perhaps the extent to which you allow your code agent to ‘write’ things out into the world (see the sketch below)
- how selective you are about the number and combination of tools or APIs you expose
- how you chain everything together
- (how robustly you handle failure)
Beyond that there are quite a few things that are somewhat out of your hands unless you decide to custom finetune your own models for a specific use case.
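On the first of those points, defining tools is the most direct lever. A minimal sketch of what that looks like with smolagents, written from memory, so treat the exact class and parameter names as approximate:

```python
# A minimal smolagents-style tool + agent, sketched from memory (the model
# class name in particular may differ across smolagents versions).
from smolagents import CodeAgent, HfApiModel, tool

@tool
def get_weather(location: str) -> str:
    """Returns a (fake) weather report for a location.

    Args:
        location: The city to report the weather for.
    """
    return f"It is sunny in {location}."

agent = CodeAgent(tools=[get_weather], model=HfApiModel())
agent.run("What's the weather like in Paris right now?")
```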
Overall it was a good start to the course: it made me think and also got my hands dirty building a very simple agent with tools using smolagents and a Gradio demo app on the Hugging Face Hub. I’ll write more after unit two next week.