In the first blog we considered teams of AI agents and how they echo the increase in productivity Adam Smith observed in specialised teams of workers. We also looked at the strands of research that enabled AI to act in teams.
This time we look at the key steps in that research; doing so will help us better understand agentic AI.
Step 1: Learn the Rules of the Tools
‘Agentic AI’ has been around for many years, mostly directed at gaming. A technique called reinforcement learning (RL) has been the focus of that research, exemplified by the extraordinary success of DeepMind’s AlphaZero and MuZero at games like Go and chess, beating world champions. AlphaZero learnt to plan and strategise by playing against itself, given only the rules of the game, and discovered novel strategies along the way. MuZero went a step further: given no knowledge of the rules at all, it learnt its own model of the game and still reached superhuman play.
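At its core, RL learns by trial and error: an agent tries actions, receives rewards, and gradually improves its estimate of which action works best in each situation. The snippet below is a minimal, generic illustration of that idea (tabular Q-learning against an assumed grid environment); it is not how AlphaZero or MuZero are implemented, which combine neural networks with tree search.

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning loop: a generic illustration of reinforcement
# learning, not DeepMind's AlphaZero/MuZero architecture.
# `env` is assumed to expose reset() -> state and step(action) -> (state, reward, done).

ACTIONS = ["up", "down", "left", "right"]
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1   # learning rate, discount, exploration rate

q_table = defaultdict(float)             # (state, action) -> estimated value

def choose_action(state):
    # Explore occasionally; otherwise pick the best-known action.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])

def train(env, episodes=1000):
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = choose_action(state)
            next_state, reward, done = env.step(action)
            # Q-learning update: nudge the estimate towards reward + discounted future value.
            best_next = max(q_table[(next_state, a)] for a in ACTIONS)
            q_table[(state, action)] += ALPHA * (
                reward + GAMMA * best_next - q_table[(state, action)]
            )
            state = next_state
```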
RL is a different technology from that used in large language models (LLMs) such as ChatGPT. LLMs are far slower, too slow for the pace of a game. However, they shine as ‘few-shot’ learners, adapting to almost any enterprise without retraining, learning from nothing more than prompts.
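Few-shot learning here simply means showing the model a handful of worked examples inside the prompt itself. The sketch below illustrates the pattern with a hypothetical expense-classification task; the messages could be sent to any chat-style LLM API.

```python
# A few-shot prompt: the model picks up the task from the examples in the prompt,
# with no retraining. The task and examples here are hypothetical.
few_shot_messages = [
    {"role": "system", "content": "Classify each expense line into a category."},
    # Worked examples ("shots") demonstrating the desired behaviour:
    {"role": "user", "content": "Uber to client meeting, $23"},
    {"role": "assistant", "content": "Travel"},
    {"role": "user", "content": "AWS invoice, $1,240"},
    {"role": "assistant", "content": "Cloud infrastructure"},
    # The new case we actually want classified:
    {"role": "user", "content": "Team lunch, $86"},
]
# Passing few_shot_messages to a chat-completion endpoint should yield a reply
# that follows the pattern set by the examples (e.g. "Meals & entertainment").
```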
In March 2023, AutoGPT emerged, quickly becoming one of the most-starred repositories on GitHub. Touted as an AI agent that interprets goals in natural language and decomposes them into sub-tasks, it integrated whatever tools or other ML models it had at its disposal. This marked a paradigm shift: LLMs were no longer solo operators. In areas of shortfall, say arithmetic or factual recall, they could delegate. The LLM evolved into a strategic orchestrator, harmonising tools and tasks.
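The orchestration pattern AutoGPT popularised can be sketched in a few lines: ask the LLM to break a goal into sub-tasks, then route each sub-task to a tool when one fits. The skeleton below is illustrative only; `ask_llm()` and the tool stubs are placeholders, not AutoGPT's actual implementation.

```python
# Illustrative plan-and-delegate loop in the spirit of AutoGPT.
# ask_llm() and the tool entries are stand-in stubs, not a real API.

def ask_llm(prompt: str) -> str:
    """Call your preferred chat LLM here; stubbed for illustration."""
    raise NotImplementedError

TOOLS = {
    "calculator": lambda task: "(result of the arithmetic in the task)",  # stub for an arithmetic tool
    "web_search": lambda task: "(documents relevant to the task)",        # stub for a retrieval tool
}

def run_agent(goal: str) -> list[str]:
    # 1. Ask the LLM to decompose the goal into sub-tasks, one per line.
    plan = ask_llm(f"Break this goal into short sub-tasks, one per line:\n{goal}")
    results = []
    for task in plan.splitlines():
        # 2. For each sub-task, ask which tool (if any) should handle it.
        choice = ask_llm(
            f"Which tool best suits the task '{task}'? "
            f"Answer with one of {list(TOOLS)} or 'none'."
        ).strip().lower()
        # 3. Delegate where the LLM falls short; otherwise let it answer directly.
        results.append(TOOLS[choice](task) if choice in TOOLS else ask_llm(task))
    return results
```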
Step 2: Adapt Tools to Any Objective
In May 2023 a research team from NVIDIA, Caltech, Stanford and other institutions augmented GPT-4 with a memory for skills: a record of successful and unsuccessful attempts at combining actions to achieve a given objective, in this case crafting tools and items in the game Minecraft.
When unleashed in Minecraft’s simulated realm, this agent didn’t require the costly retraining RL does. Instead, leveraging GPT-4’s ‘common sense’, it strategised with significantly fewer missteps than RL approaches. Introduced as ‘Voyager’[1], it showcased how adeptly LLMs can autonomously adapt their knowledge to meet diverse objectives, including business processes.
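The idea at the heart of Voyager, a growing library of skills the agent consults before acting, can be sketched simply. The snippet below is a minimal illustration of such a skill memory, using crude keyword matching where Voyager uses embedding similarity; it is not the paper's actual code.

```python
from dataclasses import dataclass, field

# A minimal skill library in the spirit of Voyager: record what worked (and what
# didn't) for an objective, and surface it next time a similar objective appears.

@dataclass
class Skill:
    objective: str        # e.g. "craft a stone pickaxe"
    steps: str            # the plan or code that was attempted
    succeeded: bool       # outcome recorded after execution

@dataclass
class SkillLibrary:
    skills: list[Skill] = field(default_factory=list)

    def record(self, objective: str, steps: str, succeeded: bool) -> None:
        self.skills.append(Skill(objective, steps, succeeded))

    def recall(self, objective: str) -> list[Skill]:
        # Naive keyword overlap; Voyager retrieves by embedding similarity.
        words = set(objective.lower().split())
        return [s for s in self.skills if words & set(s.objective.lower().split())]

library = SkillLibrary()
library.record("craft a wooden pickaxe", "chop logs -> make planks -> craft pickaxe", succeeded=True)
print(library.recall("craft a stone pickaxe"))  # past pickaxe experience is fed back into the LLM prompt
```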
Step 3: Tools to Create Other Tools
Software code is simply language with strict syntax and logic; as such, LLMs learn to code more easily than they master the nuances of human language. In August 2021, over a year before releasing ChatGPT, OpenAI issued Codex to assist with software development: ‘a skilled individual’ who, when briefed, generated code snippets. Being skilled, it can charge for its time; packaged as GitHub Copilot, it sells for $10/month. Then developments accelerated.
By March 2023, coding abilities were enhanced with the release of GPT-4, which codes at an impressive level and grasps developer intent, albeit with occasional errors and overconfidence.
By June, OpenAI had incorporated ‘functions’ into its API, allowing developers to reliably integrate GPT-3.5 and GPT-4 into their software. By September, researchers had devised three management frameworks for coordinating teams of AI agents that, given an objective, write their own software, test it, and rectify errors, all autonomously. In theory this approach can be applied to any problem that can be described in code and tested against a set of rules: accounting and law, not just software development.
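Function calling lets a developer describe their own functions to the model, which then replies with a structured request to call one rather than free text. Below is a minimal sketch using the OpenAI Python SDK; the parameter names have evolved since June 2023 (the original `functions` argument is now expressed as `tools`), and `get_account_balance` is a hypothetical example function.

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Describe a (hypothetical) function so the model can request that it be called.
tools = [{
    "type": "function",
    "function": {
        "name": "get_account_balance",
        "description": "Return the current balance for a customer account.",
        "parameters": {
            "type": "object",
            "properties": {"account_id": {"type": "string"}},
            "required": ["account_id"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What is the balance on account 42-117?"}],
    tools=tools,
)

# Instead of prose, the model returns a structured call that our own software executes.
call = response.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```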
Originally these teams were simply GPT-3.5 or GPT-4 prompted to play the parts of various specialists. Now the frameworks can steer agents built on any LLM, or supplied by a specialist provider.
By October, LLM teams in two of these frameworks, Microsoft’s AutoGen and the Chinese-developed ChatDev, were self-correcting their code in secure, sandboxed environments (such as Docker containers) and producing working applications.
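A two-agent AutoGen setup illustrates the pattern: an assistant agent writes code, and a user-proxy agent executes it inside Docker and feeds errors back until the program runs. The sketch below follows AutoGen's published quickstart pattern; exact class and parameter names depend on the library version, and the model and API-key configuration is assumed.

```python
import autogen

# Model configuration is assumed; substitute your own API key and model choice.
llm_config = {"config_list": [{"model": "gpt-4", "api_key": "YOUR_API_KEY"}]}

# The assistant proposes code; the user proxy runs it in Docker and reports errors
# back, so the pair iterate until the application actually works.
assistant = autogen.AssistantAgent("assistant", llm_config=llm_config)
user_proxy = autogen.UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",  # fully autonomous: no human in the loop
    code_execution_config={"work_dir": "app", "use_docker": True},
)

user_proxy.initiate_chat(
    assistant,
    message="Write a Python script that fetches today's FX rates, saves them to rates.csv, and test it.",
)
```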
The image above shows GitHub Copilot proposing code to satisfy the plain-English instructions in the blue comments. This agent operates within the software developer's workflow, not in ChatGPT's app.
Step 4: Skilled Individuals to a Team
A mere week after AutoGPT, Stanford researchers launched “Generative Agents” in a Sims-like world, where sociable agents emulate daily routines, from work to coffee catch-ups.
These agents, essentially ChatGPT playing various roles, observed one another; they remembered, reflected, and responded to their social network. This culminated in a party autonomously proposed and organised by the agent ‘Isabella’. See below.
LangChain developed this into GPTeam[2]. “Every agent within a GPTeam simulation has their own unique personality, memories, and directives, leading to interesting emergent behavior as they interact.”
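What makes such agents feel lifelike is their memory: every observation is stored, and when an agent must act, the Stanford paper retrieves the memories that score highest on recency, importance, and relevance. The snippet below is a simplified sketch of that retrieval rule, with keyword overlap standing in for the paper's embedding-based relevance and a fixed importance value standing in for its LLM-rated importance.

```python
import time
from dataclasses import dataclass

# Simplified memory retrieval in the spirit of the Generative Agents paper:
# rank stored memories by recency + importance + relevance.

@dataclass
class Memory:
    text: str
    importance: float      # 0..1, how significant the event was
    timestamp: float       # when it was observed

def score(memory: Memory, query: str, now: float, decay: float = 0.995) -> float:
    hours_ago = (now - memory.timestamp) / 3600
    recency = decay ** hours_ago                       # older memories fade
    overlap = set(query.lower().split()) & set(memory.text.lower().split())
    relevance = len(overlap) / (len(query.split()) or 1)  # crude keyword relevance
    return recency + memory.importance + relevance

def retrieve(memories: list[Memory], query: str, k: int = 3) -> list[Memory]:
    now = time.time()
    return sorted(memories, key=lambda m: score(m, query, now), reverse=True)[:k]

memories = [
    Memory("Isabella is planning a Valentine's Day party at the cafe", 0.9, time.time() - 7200),
    Memory("Watered the plants by the window", 0.2, time.time() - 600),
]
for m in retrieve(memories, "Should I go to Isabella's party?"):
    print(m.text)  # the party memory outranks the mundane one
```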
Next time: how to manage teams of agents for business purposes, not just parties!