Looking back at 2024, what have been some of the most important trends in AI?
Hello everyone, and welcome to this special end-of-year episode for 2024. I wanted to wrap up the year by looking at some fascinating research I’ve been doing over the Christmas period, focusing on three key areas that I believe will be significant in the coming years.
This episode reviews key research from 2024, focusing on small language models and efficient training, agentic systems, and inference-time compute for advanced reasoning.
Part one discusses small language models (SLMs), contrasting them with larger models like OpenAI’s GPT series and Google’s Gemini, which have hundreds of billions of parameters. SLMs, in contrast, are often open-source, coming from companies like Mistral or, more recently, DeepSeek and many smaller research labs, and range from a few billion to tens of billions of parameters (e.g. 7–13 billion), with specialised models having as few as one to three billion parameters.
The training of these smaller models often follows a curriculum learning approach, where the model is initially trained on simpler, more structured data, with increasing complexity introduced gradually. Dataset composition is crucial, with the balance between human text, code, and mathematical text being important. Language models are also increasingly used to pre-select and annotate training data, and to evaluate the responses of other models during training.
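To make the curriculum idea concrete, here is a minimal sketch: toy "documents" are sorted by a difficulty proxy and split into stages of increasing complexity. The `difficulty` function (word count) and the stage count are illustrative assumptions, not details from any specific training recipe.

```python
# Illustrative curriculum-learning loop: order toy documents by a simple
# difficulty proxy and train in stages of increasing complexity.
def difficulty(doc: str) -> int:
    # Hypothetical proxy: longer documents count as harder.
    return len(doc.split())

def curriculum_stages(docs, n_stages=3):
    ordered = sorted(docs, key=difficulty)
    stage_size = -(-len(ordered) // n_stages)  # ceiling division
    return [ordered[i:i + stage_size] for i in range(0, len(ordered), stage_size)]

docs = ["the cat sat", "a longer sentence with more words in it",
        "short", "an intermediate example sentence"]
for stage, batch in enumerate(curriculum_stages(docs)):
    # a real pipeline would call a training step on each batch here
    print(stage, batch)
```

In practice the difficulty signal is far richer (structure, topic, model-scored quality), but the control flow is the same: easy data first, harder data later.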
Practically, smaller language models are interesting for fine-tuning with LoRA adapters, which are significantly smaller than the base model and require less data and compute. Improvements in inference efficiency come from techniques like quantization (reducing the precision of weights) and sparsity (removing less important parts of the model). While impressive progress has been made in training these models, their killer application beyond specific areas like code generation and mobile device deployment remains somewhat uncertain.
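The size argument for LoRA can be sketched in a few lines of NumPy: instead of updating a full weight matrix, only two small low-rank factors are trained. The dimensions, rank, and scaling factor below are illustrative assumptions, not values from any particular model.

```python
import numpy as np

# Minimal LoRA sketch: instead of updating the full d_out x d_in weight W,
# train two small matrices A (r x d_in) and B (d_out x r), with r << d_in.
d_in, d_out, r = 512, 512, 8
rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))   # frozen base weight
A = rng.standard_normal((r, d_in))       # trainable, tiny
B = np.zeros((d_out, r))                 # trainable, zero-initialised so the
                                         # adapter changes nothing at first

def forward(x, alpha=16):
    # Effective weight is W + (alpha / r) * B @ A, applied lazily.
    return W @ x + (alpha / r) * (B @ (A @ x))

full_params = W.size
lora_params = A.size + B.size
print(f"LoRA trains {lora_params / full_params:.1%} of the base parameters")
```

With these toy dimensions the adapter holds about 3% of the base layer's parameters; at realistic model sizes the ratio is typically well under 1%, which is why adapters need so much less data and compute.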
Part two explores agentic systems, defined as autonomous AI systems that can devise multi-step plans, use tools, and explore possibilities to solve problems. Key capabilities of such systems include planning, memory (both long-term and short-term), tool use, evaluation of their actions and the environment, and reasoning. Despite the hype, the real-world usefulness of fully autonomous agentic systems, particularly in generic tasks like software development, remains a challenge due to limitations in planning and reasoning capabilities. The sequential nature of multi-step tasks means that even with relatively high individual step success rates, the overall task completion rate can be low. While memory and tool use are progressing well, evaluation capabilities and, most significantly, planning and reasoning remain the biggest hurdles.
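The compounding argument is simple arithmetic: if steps are independent, the overall success rate is the per-step rate raised to the number of steps. A quick sketch, using an assumed 95% per-step success rate:

```python
# Why multi-step autonomy is hard: per-step success rates compound.
# Assuming independent steps, overall success is p_step ** n_steps.
def overall_success(p_step: float, n_steps: int) -> float:
    return p_step ** n_steps

for n in (5, 10, 20):
    print(f"{n} steps at 95% each -> {overall_success(0.95, n):.1%} overall")
```

Even a quite reliable 95% per step drops to roughly 36% over a 20-step task, which is why long autonomous plans fail so often without strong evaluation and recovery.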
Part three focuses on advanced reasoning, exemplified by OpenAI’s o3 model, which demonstrates significantly improved reasoning capabilities attributed to advancements in in-context learning and chain of thought. In-context learning allows models to solve new problems based on task descriptions and examples provided in the current context. Chain of thought prompts models to generate a sequence of logical steps to arrive at an answer, improving accuracy and providing reasoning traces for debugging.
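Combined, in-context learning and chain of thought amount to a prompt layout: worked examples with explicit reasoning steps are placed in the context so the model imitates the format. A minimal sketch, where the example question and reasoning are invented for illustration:

```python
# Few-shot chain-of-thought prompt: each in-context example pairs a question
# with step-by-step reasoning, so the model continues in the same style.
examples = [
    ("If I have 3 apples and buy 2 more, how many do I have?",
     "Start with 3 apples. Buying 2 more gives 3 + 2 = 5. Answer: 5"),
]

def build_prompt(question: str) -> str:
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{shots}\n\nQ: {question}\nA:"

print(build_prompt("A trip takes 2 hours and starts at 9:00. When does it end?"))
```

The model's completion then contains a reasoning trace before the final answer, which is what makes debugging and evaluation of intermediate steps possible.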
Inference-time compute involves performing a search within the chain-of-thought space, generating multiple reasoning paths and evaluating them to arrive at a more robust answer. This can involve techniques like beam search and majority voting on results. The o3 model’s reported 88% score on the ARC-AGI benchmark, designed to test abstract reasoning resistant to memorisation, is highly significant. This approach, which shares similarities with program synthesis, suggests a path towards overcoming the reasoning limitations of current AI systems. The future may see a combination of many smaller, efficient language models generating parallel chains of thought, with larger models used for evaluation, potentially leading to computationally efficient advanced reasoning systems.
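The majority-voting part of this can be sketched in a few lines: sample several reasoning paths, keep only each path's final answer, and return the most common one. The hard-coded answer list below stands in for actual model samples.

```python
from collections import Counter

# Majority voting over sampled reasoning paths (self-consistency sketch):
# each entry stands in for the final answer extracted from one sampled
# chain of thought.
def majority_vote(answers):
    (winner, _count), = Counter(answers).most_common(1)
    return winner

answers = ["42", "42", "41", "42", "7"]  # hypothetical final answers from 5 paths
print(majority_vote(answers))
```

Beam search works on the same space but prunes partial reasoning paths as they are generated, rather than voting only at the end.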
That’s all folks.
With all this being said, I think it’s fair to say that 2025 is going to stay interesting.