The goldfish principle and objective benchmarking
What LLMs teach you about managing context and measuring progress
Last week, I completed two DeepLearning.AI courses: 'Generative AI with Large Language Models' by AWS, and 'LangChain for LLM Application Development' with Harrison Chase from LangChain, the company behind the eponymous LLM application framework.
These two courses profoundly resonated with me, not only by deepening my understanding of LLMs but also by reinforcing two crucial management principles for innovation and navigating uncertainty.
I'll summarize the first principle as the 'Goldfish Principle'—assume a short memory and constantly manage context—and the second as 'Objective Benchmarking'—establishing effective measurement systems when facing uncertain opportunities.
—The Goldfish Principle—
One key concept from these courses is that LLMs are inherently stateless. While LLMs excel at natural language processing and understanding context, that context is confined to a limited 'context window.'
Picture a wise goldfish with access to all human digital knowledge but only a 3-second memory span*.
Because the model has a goldfish-like memory, managing context is a critical task in LLM application development: how do you continuously supply the LLM with the right context, given that the context is constantly changing due to user actions or shifts in the environment?
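To make statelessness concrete, here is a minimal sketch, assuming a hypothetical `call_llm` helper and a made-up token budget, of what this bookkeeping looks like: the application, not the model, keeps the conversation history and re-sends whatever should be 'remembered' with every single call.

```python
# Minimal sketch: a stateless model "remembers" nothing between calls,
# so the application must re-send the relevant context every time.

MAX_CONTEXT_TOKENS = 4000  # assumed context-window budget


def call_llm(messages: list[dict]) -> str:
    # Stand-in for a real chat-completion API; returns a canned
    # reply here so the sketch runs end to end.
    return "(model reply)"


def count_tokens(message: dict) -> int:
    # Crude word-count approximation; real apps use the model's tokenizer.
    return len(message["content"].split())


def trim_to_window(history: list[dict]) -> list[dict]:
    """Keep the system prompt plus the most recent turns that fit."""
    system, turns = history[0], history[1:]
    budget = MAX_CONTEXT_TOKENS - count_tokens(system)
    kept = []
    for msg in reversed(turns):  # newest turns first
        budget -= count_tokens(msg)
        if budget < 0:
            break
        kept.append(msg)
    return [system] + list(reversed(kept))


history = [{"role": "system", "content": "You are a helpful assistant."}]


def chat(user_input: str) -> str:
    history.append({"role": "user", "content": user_input})
    reply = call_llm(trim_to_window(history))  # only the trimmed copy is sent
    history.append({"role": "assistant", "content": reply})
    return reply
```

Frameworks like LangChain ship ready-made memory components for exactly this kind of bookkeeping; the sketch just shows why it is necessary.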
This task is not only crucial in LLM development but also in innovation in general.
Effective leaders excel at supplying and managing context, especially when venturing into the unknown. It's essential to provide clarity on what, why, and the progress made, particularly in novel situations where multiple perspectives are invaluable.
Managing context is equally crucial when working alone to prevent losing focus or sight of the bigger picture.
In summary, the Goldfish Principle prompts us to envision our organizations as goldfish, or LLMs, with limited memory, and to systematically manage context for ourselves and the organization.
—Objective Benchmarking—
Another key property of LLMs is their inherent probabilistic nature.
This non-deterministic and non-linear aspect of LLMs makes them adaptable to diverse tasks, but also comes with challenges.
Essentially, because you interact with LLMs in natural language, there is an almost infinite number of responses you could prompt for. Contrast that with a more deterministic technology, like a relational database, where you are constrained by the query language and the structure of the database.
However, with a badly formed database query you’re likely to trigger an obvious error. With LLMs you could just get nonsense in response.
Or something that’s genius.
Evaluating which outcome an LLM is creating, nonsense or genius, and doing so at scale, is another core task in LLM application development.
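What does 'at scale' look like? A hedged sketch: instead of eyeballing individual responses, you run a fixed test set through the model and score every answer automatically. The test cases, the `call_llm` stub, and the keyword check below are illustrative assumptions, not any particular framework's API.

```python
# Minimal evaluation harness: a fixed test set plus an automatic
# scoring rule turns "does this look right?" into a repeatable number.

TEST_CASES = [
    {"prompt": "What is 12 * 12?", "must_contain": "144"},
    {"prompt": "What is the capital of France?", "must_contain": "Paris"},
]


def call_llm(prompt: str) -> str:
    # Stand-in for a real model call so the sketch runs.
    return "144" if "12" in prompt else "Paris"


def run_eval(cases: list[dict]) -> float:
    passed = 0
    for case in cases:
        response = call_llm(case["prompt"])
        if case["must_contain"].lower() in response.lower():
            passed += 1
    return passed / len(cases)


print(f"pass rate: {run_eval(TEST_CASES):.0%}")  # "pass rate: 100%" here
```

The point is less the scoring rule than the fixed test set: the same questions, asked after every change, make progress comparable over time.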
Similarly, innovation efforts operate in a non-deterministic world, where outcomes are uncertain. You can’t get enough data yet to determine likely outcomes, but you may be facing a seemingly infinite number of paths you could try.
So, as with LLMs, you need a mechanism to determine whether you are progressing towards your goals, in a way that is comparable and informative.
This is where objective benchmarking comes in, i.e. a system of evaluation criteria and tests to measure how well your application or organization is performing against relevant tasks or goals.
For an LLM, that often begins with human feedback and sense checking, but then graduates to established benchmarks or even other AI models evaluating the LLM's effectiveness.
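That last step, a model grading a model, sounds circular but is a common pattern, often called 'LLM as judge'. A minimal sketch, assuming a generic `call_llm` helper and a rubric invented for illustration:

```python
# Hedged sketch of model-graded evaluation: a second model scores the
# first model's answer against a simple rubric.

JUDGE_PROMPT = """Rate the ANSWER to the QUESTION on a 1-5 scale for
factual accuracy and relevance. Reply with the number only.

QUESTION: {question}
ANSWER: {answer}"""


def call_llm(prompt: str) -> str:
    return "4"  # stand-in for a real model call so the sketch runs


def judge(question: str, answer: str) -> int:
    raw = call_llm(JUDGE_PROMPT.format(question=question, answer=answer))
    return int(raw.strip())


score = judge("What causes tides?", "Mostly the Moon's gravity.")
print(score)  # a 1-5 grade that can be averaged across a whole test set
```

Established benchmark suites work the same way at larger scale: a shared, fixed set of tasks so that different models, or different versions of your application, can be compared on an equal footing.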
Likewise for innovation efforts, defining benchmarks based on human feedback is a great place to start, especially for the early stages of defining the problem and determining viability.
For example, if you think you might have a solution to improve math literacy in inner cities, what kind of feedback from experts on the subject would be convincing?
Convincing enough to prove you are on the right track? Or, just as important, to tell you that you're heading in the wrong direction?
Just make sure the human feedback is objective, relevant and convincing. You are looking for a strong signal to guide you, not emotional validation for your idea.
—Summary—
In my experience, embracing these two principles—managing context [for a goldfish] and objective benchmarking—lays a solid foundation for navigating uncertain opportunities.
Teams and organizations that prioritize these principles often excel, leveraging them to thrive in unpredictable environments. Conversely, those who neglect context and benchmarking tend to encounter unnecessary obstacles and setbacks, hindering their ability to face the unknown.
[* Turns out the whole “3-second goldfish memory” belief is a myth, but it works for this analogy so I am sticking with it]