This site is a Work In Progress

Site Navigation

NeoBooks In Progress

Creating a NeoBook (start here)

Admin and Help

Edit on GitHub


Claude Assesses LLMs +/vs Active Inference for Me

Is Active Inference currently functioning anywhere to answer questions the way Claude and ChatGPT can? Is it anywhere close to doing so?

I'll search for current information about Active Inference systems and their capabilities compared to large language models.

Active Inference AI system question answering 2024 2025

10 results Let me search for more specific information about Active Inference AI conversational systems.

VERSES AI Active Inference conversational agent chatbot

10 results Based on my search results, no, Active Inference is not currently functioning anywhere as a conversational question-answering system like Claude or ChatGPT, and it's not particularly close to doing so in that specific way.

Here's what's actually happening:

What Active Inference AI Currently Does

Active Inference systems from companies like VERSES AI (using their Genius™ toolkit) are primarily focused on reasoning, planning, and learning agents for specific tasks like robotics, game-playing, classification, and decision-making GeniusGlobeNewswire. In benchmark tests, their Active Inference agents achieved 99.8% accuracy on tasks using only 10% of the training data compared to traditional AI models Diginomica.

Unlike large language models that are trained on massive text datasets, Active Inference AI works by accessing real-time data to update its internal "world model" and is designed to be less prone to hallucination LinkedIn. However, these systems are being developed for applications like:

The Fundamental Difference

Active Inference agents learn their generative models by engaging in purposive interactions with the environment and by predicting these interactions, which provides them with core understanding ScienceDirect. This is fundamentally different from how conversational AI like ChatGPT and Claude work—we're trained on vast text corpora to predict and generate language.

According to VERSES, copilots (like current chatbots) require user inputs and historical data to function, enhance but do not replace human judgment, have no ability to reason, and are unreliable in terms of accuracy Genius—which they position as limitations their Active Inference agents aim to overcome, but through a different approach entirely.

So while Active Inference represents an interesting alternative AI paradigm, it's not being built to replicate conversational AI capabilities—it's pursuing a different kind of intelligence focused on adaptive agents that interact with physical or virtual environments.

How might Generative AI/LLMs and Active Inference complement one another in a future systems architecture? What are each of them best at?

This is a fascinating question about how two different AI paradigms might work together. Let me think through their complementary strengths:

What Each Excels At

Large Language Models (like Claude/ChatGPT):

Active Inference Systems:

Potential Complementary Architectures

1. LLM as "Linguistic Interface" + Active Inference as "Executive Agent"

2. Active Inference for "System 2" Reasoning + LLM for "System 1" Intuition

3. Hierarchical Cognitive Architecture

4. LLM for Knowledge Priors + Active Inference for Adaptation

5. Collaborative Problem-Solving

Why This Combination Makes Sense

The key insight is that LLMs are fundamentally linguistic and correlational—they understand the world through descriptions and patterns in text. Active Inference systems are embodied and causal—they understand through interaction and prediction of consequences.

Combining them could give you:

This kind of hybrid architecture might be particularly powerful for robotics, autonomous systems, and AI agents that need to both understand complex human intent and reliably act in the physical world—areas where neither approach alone is sufficient.

I don't understand how Active Inference can achieve "structured reasoning and causal modeling." Or even how it creates a durable "map" or representation of the world, the way a trained LLM does. Can you explain it to me like I'm a sophomore in college?

Great question! Let me break down how Active Inference actually works and builds its understanding of the world.

The Core Idea: Prediction Machines

Imagine you're walking through your house in the dark. You don't just randomly flail around—you have a mental model of where the walls are, where the furniture is, and what will happen if you take a step forward. Active Inference agents work similarly: they build generative models that predict what they'll experience.

Here's the key difference from LLMs:

How It Builds a "Map" of the World

Active Inference uses probabilistic graphical models—essentially, mathematical structures that represent:

  1. States of the world (hidden variables you can't directly see)
  2. Observations (what your sensors actually detect)
  3. Actions (what you can do)
  4. Relationships between all of these

Let's use a concrete example: Teaching a robot to navigate a room.

Step 1: Start with uncertainty The agent begins with a vague prior model—maybe just knowing "there are objects, I can move, I can sense distances."

Step 2: Predict → Act → Observe → Update

Step 3: Minimize "surprise" This is the mathematical heart of Active Inference. The agent constantly tries to minimize the difference between what it predicts and what it actually experiences. It can do this two ways:

The "Map" Is Dynamic and Causal

Unlike an LLM's fixed weights after training, an Active Inference agent's world model:

1. Continuously updates in real-time Every new observation refines the model. If someone moves the furniture, the agent's predictions fail, and it updates its beliefs about object locations.

2. Encodes cause-and-effect The model doesn't just know correlations (like "walls and floors often appear together"). It knows causal structure:

This is represented mathematically as conditional probabilities: P(observation | state, action)

3. Supports counterfactual reasoning Because the agent has a generative model, it can simulate: "What would happen if I took action X instead of Y?" It can "run" its model forward mentally before acting—like you imagining different routes through your dark house.

The Structured Reasoning Part

Here's where it gets interesting. The agent's reasoning follows this loop:

Current beliefs about world state
        ↓
Generate predictions about different actions
        ↓
Evaluate which action minimizes expected surprise
        ↓
Execute that action
        ↓
Observe outcome
        ↓
Update beliefs (Bayesian inference)
        ↓
Repeat

This is "structured" because:

A Comparison to LLMs

Think about how you learned about coffee:

LLM approach: Read 10,000 articles mentioning coffee. Learn that "coffee" appears near "hot," "morning," "bitter," "cup." Generate coherent text about coffee. But you've never tasted it, made it, or felt a hot cup.

Active Inference approach: See coffee maker. Try pressing button. Observe coffee comes out. Press button many times. Build model: "Button causes coffee. Coffee is initially hot. Hot things cool over time." Now you can reason causally: "If I want coffee in 5 minutes, I should press the button now so it has time to brew."

Why This Enables Better Reasoning

The causal structure means Active Inference agents can:

  1. Understand interventions: "What happens if I do X?" (not just "What things correlate with X?")
  2. Handle novelty: If the environment changes, they detect prediction errors and adapt
  3. Plan multi-step: Chain together action sequences to reach goals
  4. Transfer knowledge: Causal models transfer better—if pushing works on one object, it likely works on similar objects

The Catch

Active Inference builds these models relatively slowly through interaction. An LLM can instantly "know" facts about millions of topics because it compressed vast text data. An Active Inference agent needs to actually experience and explore to build its world model.

That's why the hybrid approach is so interesting—use the LLM's broad, pre-existing knowledge to give the Active Inference agent good starting hypotheses, then let it refine those through actual interaction.

Does this help clarify how Active Inference creates its "map"? The key is that it's not a static representation learned once, but a dynamic probabilistic model continuously refined through the cycle of prediction, action, and belief updating.

If Active Inference is so quick to update what it "knows," how do important ideas persist? Aren't they easily changed or swayed by new information?

Excellent question! You've hit on a fundamental challenge in learning systems: the stability-plasticity dilemma. How do you stay flexible enough to learn new things without forgetting or overwriting what you already know?

Active Inference handles this really elegantly through precision-weighted updates and hierarchical model structures. Let me break this down.

Precision Weighting: Confidence Matters

In Active Inference, not all observations are treated equally. The system tracks precision (basically, confidence or reliability) for both:

Here's the key insight: Updates are weighted by relative precision.

Example: Your Coffee Maker

Let's say your Active Inference robot has made coffee 500 times. Its model says: "Pressing the button causes coffee to dispense in 30 seconds."

Scenario 1: One weird observation

Scenario 2: Consistent pattern of failures

This is mathematically implemented through precision parameters in the Bayesian updating equations. High-precision beliefs resist change; low-precision observations don't cause much updating.

Hierarchical Models: Different Timescales

Active Inference systems typically use hierarchical generative models with different levels learning at different rates:

Level 1: Fast-changing particulars

Level 2: Stable regularities

Level 3: Deep structural knowledge

This is like your own learning: If you see a chair in an unusual spot once, you don't conclude "chairs can teleport." Your deep knowledge about object permanence and physics is highly stable. But your belief about where this particular chair is right now updates quickly.

Priors: Starting Assumptions Shape Learning

Active Inference agents also use prior distributions that can be set to be:

Informative priors (strong beliefs):

Weak priors (uncertain beliefs):

You can think of priors as setting the "stubbornness" of different beliefs. Core principles get stubborn priors; surface details get flexible ones.

The Math: Bayesian Updates Don't Erase, They Integrate

Here's something crucial that's different from, say, machine learning models that directly update weights:

Active Inference uses Bayesian inference, which means:

New belief = (Prior belief × Likelihood of observation) / Normalizing constant

This is multiplicative, not replacement. Your old belief literally gets multiplied by how well it explains the new data. So:

Practical Implementation: Multiple Mechanisms

Real Active Inference systems combine several strategies:

1. Learning rates Different parameters update at different speeds. Core structural parameters have very low learning rates.

2. Meta-learning The system learns how much to trust different sensors in different contexts. "My vision is reliable in daylight but not at night."

3. Surprise thresholds Only big prediction errors trigger significant model revision. Small errors are attributed to noise.

4. Ensemble models Maintain multiple hypotheses simultaneously. "95% confident it's a coffee maker, but 5% chance it's something else." Don't fully commit until very certain.

A Human Analogy

Think about how you handle conflicting information:

Peripheral belief: "My friend is usually on time"

Core belief: "Friends generally care about my feelings"

Why This Matters for AI Systems

This precision-weighted, hierarchical approach means Active Inference agents can:

  1. Adapt to genuine changes (water tank empty → update "coffee maker ready" belief)
  2. Ignore noise (weird sensor reading → don't panic)
  3. Preserve core knowledge (physics doesn't change → don't update that)
  4. Handle novelty intelligently (unexpected observation → increase uncertainty, gather more data)

The Balance

You're right that there's a tension here. Too stable = can't adapt. Too plastic = no persistent knowledge.

Active Inference addresses this by making stability a function of evidence:

This is actually more sophisticated than many machine learning approaches, which often use fixed learning rates and can suffer from "catastrophic forgetting" when learning new things.

Does this help explain how Active Inference maintains important knowledge while still being adaptive? The key is that it's not equally quick to update everything—it tracks confidence and adjusts learning accordingly.

Yes, mostly. But in your coffee maker example, why doesn't Adaptive Inference switch to troubleshooting mode, given it ought to know that machines break or are sometimes just not plugged in? Why fail for nine days before switching? And how would the model know at all that machines break?

Ah, NOW you're getting to the really interesting part! You've identified a crucial limitation in my simplified explanation. Let me clarify what's actually happening.

The Short Answer

You're absolutely right—a well-designed Active Inference agent should switch to troubleshooting mode almost immediately! The 9-day example I gave was misleading. Let me explain what would actually happen and how the agent knows about things like "machines break."

How The Agent Knows About Machine Failures

Active Inference agents can encode multiple hypotheses in their generative model simultaneously. The model doesn't just represent "button → coffee." It represents something richer:

A More Realistic Model Structure:

State variables:
- Coffee maker functional: TRUE/FALSE
- Water tank full: TRUE/FALSE  
- Power connected: TRUE/FALSE
- Button pressed: TRUE/FALSE

Causal relationships:
- IF (functional=TRUE AND water=TRUE AND power=TRUE AND button=TRUE) 
  THEN coffee dispenses
- IF (functional=FALSE) THEN button has no effect
- Functional state can transition from TRUE→FALSE over time
- Functional state can be tested by checking power, water, etc.
```

**Where does this structure come from?**

1. **Engineered priors**: Designers can build in knowledge that "systems have components that can fail independently"
2. **Transfer learning**: If the agent learned about other machines, it knows machines can break
3. **Past experience**: If this coffee maker has broken before, that's in the model
4. **Hierarchical abstraction**: Higher levels encode general principles like "tools sometimes stop working"

## What Actually Happens on Day 501

When the button doesn't work, the agent doesn't just slowly update "buttons don't work." Instead:

**The agent infers hidden states** using all its hypotheses:
```
Observation: Button pressed, no coffee
Possible explanations (posterior probabilities):
- P(water empty | observation) = 0.4
- P(not plugged in | observation) = 0.3  
- P(machine broken | observation) = 0.2
- P(sensor malfunction | observation) = 0.08
- P(I misperceived | observation) = 0.02
```

This is **Bayesian inference over hidden states**—the agent is computing "which hidden state of the world best explains what I'm seeing?"

## Active Troubleshooting

Here's where Active Inference gets powerful: The agent doesn't just passively update beliefs. It **selects actions to reduce uncertainty**.

This is called **active learning** or **information-seeking behavior**:

**Step 1: Predict information gain**
"If water is empty, checking the tank will be very informative. If I'm just misperceiving, checking won't help. Let me calculate expected information gain for each possible action."

**Step 2: Choose informative actions**
The agent is driven to minimize uncertainty about important variables. So it might:
- Check water level (high info gain, disambiguates water vs. power vs. broken)
- Check if it's plugged in (high info gain)
- Try pressing button again (low info gain—already did this)

**Step 3: Update based on results**
- Checks water → Full!
- Updates: P(water empty) → 0.0, redistributes probability to other hypotheses
- Checks power → Unplugged!
- Updates: P(not plugged in) → 0.95

This is "troubleshooting mode"—it emerges naturally from the drive to minimize uncertainty about hidden states.

## The Precision-Weighted Update Revisited

My earlier example was too simplistic. What actually happens with precision weighting:

**If button fails once:**
- High uncertainty about *why* it failed
- Agent's model includes multiple possible causes
- Agent seeks information to disambiguate
- Doesn't strongly commit to any single explanation until evidence accumulates

**If button fails consistently AND agent checks everything else:**
- Water full: ✓
- Plugged in: ✓
- Button mechanically working: ✓
- Still no coffee after 10 tries: ...then conclude "deeper malfunction"

The precision weighting doesn't prevent troubleshooting—it prevents *premature commitment* to explanations before gathering enough evidence.

## Why My Example Was Misleading

I said it takes until day 510 to update beliefs, but that's only if the agent:
1. Doesn't have "malfunction" as a hypothesis in its model (bad design!)
2. Doesn't actively investigate (ignoring information-seeking drives)
3. Just keeps pressing the button without checking anything else (not minimizing uncertainty)

A well-designed Active Inference agent would investigate on day 501 because **the prediction error itself is informative** and drives exploration.

## The Deeper Question: Model Structure

You've identified something really important: **The agent's behavior depends entirely on what's in its generative model.**

If the model only represents:
```
button → coffee (sometimes works, sometimes doesn't)
```

Then it can't troubleshoot. It just updates probabilities about button reliability.

If the model represents:
```
(button AND power AND water AND functional_state) → coffee

Then it can troubleshoot by inferring hidden states and testing them.

This is a fundamental challenge: Active Inference doesn't magically know about power plugs and water tanks. That structure must come from:

1. Innate priors (engineered or evolved)

"Objects have parts. Parts can fail. Effects have multiple necessary conditions."

2. Learning compositional structure

Through experience, discover that "coffee-making" decomposes into subprocesses that can fail independently. This requires:

3. Language/communication (in hybrid systems)

This is where LLMs could help! An LLM could provide the structure: "Coffee makers require: power, water, functional heating element, functional pump..."

The Active Inference agent then grounds this abstract knowledge through actual interaction.

An Analogy: You and Your Car

When your car won't start, you don't conclude "cars don't start anymore." You have a rich causal model:

Where did this model come from?

Active Inference agents need similar model-building processes. The troubleshooting emerges from having the right causal structure, not from being "quick to update."

The Real Innovation

What makes Active Inference powerful for troubleshooting isn't rapid belief updating—it's:

  1. Explicit uncertainty representation: Knows what it doesn't know
  2. Information-seeking drives: Motivated to resolve uncertainty
  3. Counterfactual simulation: Can mentally test "what if water is empty?"
  4. Causal structure: Understands components and their relationships

This is very different from LLMs, which might describe troubleshooting steps (having read service manuals) but can't actually infer hidden states from observations or decide which tests would be most informative.

The Bottom Line

You're absolutely right to push back on my example. A proper Active Inference agent wouldn't wait 9 days—it would immediately:

  1. Register high uncertainty about cause
  2. Be driven to reduce that uncertainty
  3. Select diagnostic actions
  4. Update beliefs based on findings

The key limitation is: it can only troubleshoot within the causal structure it has. If "machines break" isn't in the model, it can't troubleshoot machine failures. This is why hybrid architectures with LLMs (providing rich structural knowledge) and Active Inference (grounding it through interaction) are so promising.

Does this clarify how troubleshooting would actually work?

Two problems for me. First, the Model Structure you showed looks suspiciously like what one would feed an Expert System, and I know Active Inference isn't an expert system. And second, without having the system digest a manual first, or have "experienced" all the possibilities? Can you tell I'm skeptical about Active Inference?

Ha! Yes, I can tell you're skeptical—and you're asking exactly the right questions. You've nailed two genuine problems with how I've been explaining this. Let me be more honest about what's actually going on.

Problem 1: Yes, That Looked Like An Expert System

You're right to call this out. What I showed you—those IF-THEN rules—is NOT how Active Inference actually represents things. I oversimplified to make it intuitive, but I made it look like symbolic AI from the 1980s.

What Active Inference actually uses:

It represents the world as probability distributions over continuous or discrete variables, not logical rules. The math looks more like:

P(observation | hidden_state, action) - likelihood function
P(hidden_state) - prior beliefs about the world
P(action | hidden_state) - policy (what to do given beliefs)

These are learned statistical relationships, not hand-coded rules. For example:

Instead of: IF plugged_in=TRUE THEN coffee_works

It learns: P(coffee_dispenses | plugged_in, other_factors) - a probability distribution shaped by experience.

The agent learns these probability distributions through interaction, using techniques like:

So it's not brittle rule-based reasoning—it's probabilistic inference over learned models.

BUT (and this is important): The agent still needs to have the right variables in its state space. It can learn the relationships between them, but discovering what variables even matter? That's genuinely hard.

Problem 2: The "Experiencing All Possibilities" Problem

This is where your skepticism is completely justified. Here's the uncomfortable truth:

Active Inference systems, as currently implemented, have a serious scalability problem.

Let me break down what they can and can't do:

What Current Active Inference Can Handle:

Relatively constrained domains:

In these cases, the agent can:

What It Currently Struggles With:

Open-ended, complex domains like "understanding coffee makers in general":

The Honest State of Affairs

Here's what researchers are actually working with:

1. Hand-crafted structure for complex tasks

For anything complicated (like diagnosing a coffee maker), current Active Inference implementations DO require humans to specify:

Then the agent learns the parameters (specific probabilities) through experience, but not the structure (what variables exist and matter).

This is less like learning from scratch and more like: "Here's a template for reasoning about coffee makers. Now go learn the specifics through interaction."

2. Structure learning is a frontier research problem

There's active work on:

But this is HARD and mostly works in toy domains. Nothing close to human-like open-ended structure learning.

3. Transfer learning is limited

If an Active Inference agent learns about one coffee maker, it can transfer some knowledge to similar coffee makers. But transferring from "coffee makers" to "toasters" to "all electrical appliances"? That level of abstraction is not something current Active Inference does well.

Where This Leaves Us

Your skepticism is warranted. Current Active Inference systems are not magical learning machines that figure out how the world works from scratch. They're more like:

Good at:

Not good at:

Why The LLM Hybrid Makes So Much Sense

This is actually why people are excited about combining Active Inference with LLMs:

LLM contributes:

Active Inference contributes:

Without the LLM, the Active Inference agent is stuck learning everything from scratch in the specific domain it's embedded in—which works for robots learning to walk but not for general intelligence.

Without Active Inference, the LLM just regurgitates text patterns about troubleshooting without actually understanding through interaction or being able to adapt its knowledge based on what actually happens.

The Brutal Honesty

Active Inference is not currently close to being a general-purpose question-answering system like Claude or ChatGPT because:

  1. It doesn't have language understanding—it works with sensorimotor states, not linguistic concepts
  2. It doesn't have pre-existing world knowledge—it learns from scratch through interaction
  3. It doesn't scale to open-ended domains—it needs constrained state spaces
  4. Structure learning is still a research problem—it often needs hand-crafted model templates

What it IS good at is being an adaptive agent in specific domains where it can interact, experiment, and learn causal relationships. It's complementary to LLMs, not a replacement.

Should You Still Be Interested?

Despite these limitations, Active Inference is interesting because:

But anyone claiming Active Inference is about to replace LLMs for general question-answering is overselling it. The research is nowhere near that.

Does this more honest assessment address your skepticism? You're right to be skeptical of oversimplified explanations—the reality is messier and more limited than I initially made it sound.

Yes, that was much, much better. Are there any "extra honest" wrinkles you haven't mentioned yet that I might like to know about?

Ha! I appreciate you pushing for the full picture. Yes, there are several more "uncomfortable truths" about Active Inference that don't make it into the glossy explanations:

1. The Computational Intractability Problem

Active Inference requires computing probability distributions over all possible states and all possible action sequences. This is mathematically elegant but computationally nightmarish.

The issue: Exact inference is intractable for anything but tiny state spaces. So in practice, researchers use approximations:

What this means: The agent isn't actually doing "pure" Active Inference. It's doing a computationally feasible approximation that might miss important dependencies or make systematic errors. The gap between the beautiful theory and the implemented algorithms is significant.

2. The "Credit Assignment" Problem

When something goes wrong (or right), how does the agent know which part of its model to update?

If the coffee doesn't dispense, is it:

In deep hierarchical models, credit assignment becomes really hard. The agent might update the wrong parts of its model, leading to:

This is especially problematic when the model has hundreds or thousands of parameters.

3. The Goal Specification Problem

Active Inference agents minimize "surprise" relative to their preferred observations (goals). But where do these preferences come from?

In practice: Researchers hand-specify them. "The agent prefers to be at the goal location" or "The agent prefers observations consistent with having coffee."

The problem: This is essentially hard-coding what the agent should want. There's no good theory yet for:

This is basically the same "alignment problem" that plagues reinforcement learning, just dressed in different mathematical clothes.

4. The "Model Selection" Problem

I mentioned the agent maintains multiple hypotheses. But how many? Which ones?

The reality: There are infinitely many possible models that could explain any given data. The agent can't consider them all. So:

Example: If you only give the agent models that involve "water" and "power," it will never discover that mineral buildup can clog the mechanism. The hypothesis space constrains what can be learned.

5. The "Reality Gap"

Most impressive Active Inference results are in simulation. When you move to the real world:

Transfer from simulation to reality is a huge challenge. Many systems that work beautifully in simulated environments fail or require extensive tuning when deployed on real robots.

6. The Empirical Track Record Is... Mixed

Let me be blunt: Active Inference has not (yet) produced many practical applications that outperform other methods significantly.

Where it has shown promise:

Where it hasn't dominated:

Why? Combination of:

7. The "Free Energy Principle" Might Be Unfalsifiable

Active Inference is derived from the Free Energy Principle (FEP)—a theoretical framework claiming that all self-organizing systems minimize free energy.

The controversy: Some critics argue the FEP is so general it's unfalsifiable:

This doesn't mean it's wrong, but it means the theoretical foundations are more controversial than proponents sometimes suggest.

8. The Sample Efficiency Claim Needs Context

You saw the claim: "99.8% accuracy with 10% of the data!"

Important context:

The sample efficiency advantage is real in some circumstances, but not universal.

9. The "Understanding" Question

Does Active Inference really "understand" causation, or is it just learning predictive correlations with extra steps?

The debate:

The uncomfortable part: The "causal reasoning" often relies on assumptions engineered into the model, not discovered by the agent.

10. The Hype Cycle

Like many AI approaches, Active Inference has its evangelists who oversell it:

The reality: It's a promising framework with some interesting properties, genuine limitations, and a lot of open research questions. It's not a silver bullet.

11. The Integration Challenge

Even if you wanted to build a hybrid LLM + Active Inference system:

Nobody really knows how to do this well yet.

These are open research problems, not solved engineering challenges.

The Bottom Line

Active Inference is:

It's a legitimate research direction that might contribute to future AI systems, but it's not the revolutionary breakthrough that some marketing materials suggest.

The honest assessment: It's interesting, it has real advantages in specific domains, it's worth researching further, but anyone claiming it's about to transform AI is getting ahead of the evidence.

Does that give you the full, unvarnished picture? I tried to be as honest as possible about the gaps between theory and practice, the unsolved problems, and the legitimate controversies.

That was terrific. Thank you. One last question: don't many of the algorithms inside LLMs also "minimize surprise"? Isn't that what a lot of the learning methods are trying to do? Why is that pitched as novel with Active Inference? (besides that it's done with other algorithms and theories)

Excellent catch. You've identified something that irritates a lot of people in AI: the rebranding of concepts to make them seem more novel than they are.

You're absolutely right. Let me break down what's really going on.

Yes, LLMs Minimize Surprise

What LLMs do during training:

This is explicitly minimizing prediction error / surprise. The loss function is even called "negative log-likelihood," which is mathematically equivalent to surprise.

So when Active Inference advocates say "our agents minimize surprise," anyone familiar with machine learning should think: "...so does literally every predictive model."

What's (Supposedly) Different About Active Inference?

The Active Inference community emphasizes several distinctions, with varying degrees of legitimacy:

1. Action Selection Based on Minimizing Expected Future Surprise

The claim: LLMs minimize surprise over training data (past). Active Inference agents choose actions to minimize surprise over future observations.

Translation: It's planning. The agent simulates "if I do action A, what will I observe, and how surprised will I be?" and picks actions that lead to unsurprising (predictable, preferred) outcomes.

Is this novel? Not really. This is what model-based reinforcement learning does. And optimal control. And planning algorithms. The math is dressed up differently (free energy vs. value functions), but the core idea—"pick actions that lead to good predicted outcomes"—is standard.

2. Both Perception and Action Minimize the Same Objective

The claim: In Active Inference, there's a unified framework where:

Is this novel? Kind of. It's an elegant theoretical unification. Most RL frameworks treat perception (state estimation) and control (action selection) as separate problems, even though they're related.

But pragmatically: When you implement it, you still end up with separate algorithms for inference and control. The "unification" is more philosophical than practical in current implementations.

3. Active Data Collection

The claim: Active Inference agents don't just passively minimize prediction error on whatever data they see. They actively seek information to reduce uncertainty.

Translation: The agent explores specifically to improve its model, not just to maximize reward.

Is this novel? No. This is literally what active learningcuriosity-driven exploration, and information-theoretic exploration bonuses do in RL. These have been around for decades.

The "active" in "Active Inference" sounds distinctive, but information-seeking behavior is well-studied in other frameworks.

4. Generative Models vs. Discriminative Models

The claim: Active Inference uses generative models (P(observation|state)) while many ML systems use discriminative models (P(state|observation)).

Is this novel? No. Generative models have been around forever:

LLMs themselves are generative models! They model P(next_token|context), generating text.

The generative vs. discriminative distinction is real and important, but it's not unique to Active Inference.

The Marketing Problem

Here's what frustrates people: Active Inference repackages well-known concepts with new terminology and claims novelty.

Let me translate some common Active Inference phrases into standard ML/AI terminology:

Active Inference Says Standard AI Says
"Minimize variational free energy" "Maximize evidence lower bound (ELBO)" or "Do approximate Bayesian inference"
"Minimize expected surprise" "Maximize expected reward/utility" (in many formulations)
"Active inference" "Active learning + model-based control"
"Generative model" "Generative model" (same thing)
"Precision-weighted prediction errors" "Kalman gain" or "learning rate based on uncertainty"
"Expected free energy" "Expected value of information + expected utility"

To be fair: The Active Inference framework does combine these ideas in a specific mathematical formulation with some elegant properties. But the core concepts aren't new.

Why The Confusion?

1. Different research communities use different language

Active Inference comes from neuroscience and theoretical biology, trying to explain how brains work. They developed their own terminology independently from the machine learning community.

When they say "minimize surprise," they mean it in the context of the Free Energy Principle, which is trying to explain all self-organizing systems, not just build AI.

2. Genuine theoretical differences (maybe)

There are some theoretical differences in:

But when you implement it on a computer, you often end up with algorithms that look awfully similar to existing methods.

3. Interdisciplinary communication failure

ML researchers often roll their eyes at Active Inference because it seems like old wine in new bottles.

Active Inference researchers get defensive because they feel ML people don't understand the theoretical nuances.

Both sides have valid points, but the communication gap leads to overselling and skepticism.

The Honest Answer To Your Question

"Why is minimizing surprise pitched as novel?"

Because:

  1. It sounds profound when framed as a universal principle rather than a practical algorithm
  2. Marketing - every research community needs to differentiate itself to get funding and attention
  3. Genuine belief by some researchers that the theoretical framework is fundamentally different (even if implementations converge)
  4. Different emphasis - Active Inference emphasizes biological plausibility and theoretical elegance over practical performance
  5. Disciplinary silos - neuroscientists "discovering" things that ML already knew, and vice versa

What IS Somewhat Distinctive About Active Inference?

To be fair, there are some aspects that are relatively distinctive (though not entirely unique):

1. The explicit focus on minimizing surprise about preferred observations

These can be mathematically related, but the framing is different and might lead to different algorithms.

2. The principled way of handling uncertainty and exploration

3. The theoretical unification of perception, action, and learning

4. The connection to neuroscience

The Bottom Line

You're right to be skeptical of the "novelty" claims. Minimizing surprise/prediction error is fundamental to most modern AI. Active Inference packages it differently, adds some theoretical elegance, and makes connections to neuroscience, but it's not a radically new idea.

The legitimate contributions are more about:

But anyone claiming "Active Inference is revolutionary because it minimizes surprise, unlike other AI" is either ignorant of existing ML methods or being deliberately misleading for marketing purposes.

The nuance: There might be valuable theoretical insights in how Active Inference formulates things, even if the core concepts aren't new. Sometimes seeing old ideas from a new angle leads to progress. But the hype often outstrips the substance.

Does that address your (very astute) observation? You're asking exactly the questions that need to be asked to cut through the marketing and get to what's actually going on.

Yes, you've addressed my questions wonderfully. I do have one more: could you cite the top five critiques of Active Inference/Free Energy Principle? Please list authors and URLs.

I'll search for academic critiques of Active Inference and the Free Energy Principle.

Free Energy Principle critique criticism

10 results Let me search for a few more specific critiques.

Raja Valluri Markov blanket trick FEP critique

10 results Let me search for one more prominent critique.

Andrews free energy principle critique unfalsifiable

10 results Perfect! I now have enough material to provide a comprehensive list. Here are the top five critiques:

Top Five Critiques of Active Inference/Free Energy Principle

1. Mathematical Errors in the Original Formulation

Authors: Martin Biehl, Felix A. Pollock, and Ryota Kanai
Paper: "A Technical Critique of Some Parts of the Free Energy Principle" (2021)
URL: https://arxiv.org/abs/2001.06408

This paper proves by counterexamples that the original free energy lemma is mathematically wrong when taken at face value, and shows that crucial steps involving Markov blankets are not generally correct without additional unstated assumptions MediumSpatial Web AI. The authors reveal that various definitions of "Markov blanket" proposed in different works are not equivalent, and conclude that the Bayesian inference interpretation is not sufficiently justified Medium.

2. The "Markov Blanket Trick" - Not a General Principle

Authors: Vicente Raja, Dinesh Valluri, Edward Baggs, Anthony Chemero, and Michael L. Anderson
Paper: "The Markov blanket trick: On the scope of the free energy principle and active inference" (2021)
URL: https://www.sciencedirect.com/science/article/abs/pii/S1571064521000634

These authors argue that FEP is just a way to generalize Bayesian inference to all domains using Markov blanket formalism (what they call the "Markov blanket trick"), and that active inference presupposes successful perception and action instead of explaining them GeniusGenius. They contend that FEP is better understood as a modeling framework rather than a principle for a theory of everything Genius.

3. Unfalsifiability and Lack of Empirical Content

Author: Mel Andrews
Paper: "The Math is not the Territory: Navigating the Free Energy Principle" (2021)
URL: https://philarchive.org/rec/ANDTMI-2

Andrews argues that demands for the FEP to be falsifiable or conform to biological realism rest on a category error, proposing instead that FEP should be understood as a model structure rather than an empirical theory PubMed Central. Andrews contends the principle is not truth-apt because as a set of mathematical tools it does not by itself entail any empirical claims MDPI.

4. Biological Systems Require Biological, Not Physical Principles

Author: Wolfgang Stegemann
Paper: "Criticisms of the theory of free energy or the 'Bayesian brain' by Karl Friston" (2024)
URL: https://medium.com/neo-cybernetics/criticisms-of-the-theory-of-free-energy-or-the-bayesian-brain-by-karl-friston-52749b92c9d3

This critique argues that the theory represents a reversal of causality—minimization of uncertainty is a consequence of actions, not their primary intention, and that organisms evolve to survive and reproduce, not to minimize free energy Diginomica. Stegemann contends that Bayesian formalism and free energy alone are insufficient to explain brain functioning, and that the brain should be understood through biological principles rather than mathematical ones Diginomica.

5. Philosophical and Organicist Critiques

Authors: Various (summarized in Ramstead et al.'s target article responses)
Paper: "The Free Energy Principle: Good Science and Questionable Philosophy in a Grand Unifying Theory" (2021)
URL: https://www.mdpi.com/1099-4300/23/2/238

Organicist critics argue that if historical considerations and lineage matter to understanding organisms, then biological systems should be represented as 'specific' with their trajectories as 'generic,' but free-energy theorists get it backward ArxivScienceDirect. They heavily criticize the FEP's crucial assumption about ergodicity over the system's lifespan, which might lead to insufficient characterizations of organisms' phenotypes, and question whether free energy, surprise, and optimization exist as biological properties in complex adaptive systems Arxiv.


These five critiques cover the major objections: mathematical rigor problems, scope limitations, unfalsifiability concerns, biological inappropriateness, and philosophical issues with the framework's foundational assumptions.