
From Probability to Precision: Rethinking AI for Critical Applications
By Ben Taylor

Ben Taylor is a pioneer of the causal decision intelligence movement. Before co-founding Rainbird in 2013, he was a computer scientist at Adobe and went on to lead the technical development of an award-winning AI system that revolutionised the insurance industry. Ben is a frequent speaker on the importance of trust in AI and is a member of the All-Party Parliamentary Group on Artificial Intelligence.
In this post, Ben explores the role of causal reasoning platforms in critical applications. Currently, LLMs are unable to provide precise, causal reasoning. By contrast, causal reasoning platforms operate on explicit rules and logic, delivering precise results and transparent decision-making. By combining the strengths of LLMs in knowledge extraction with the deterministic nature of causal reasoning, Ben argues, we can ensure accuracy and accountability in critical decisions:

In high-stakes AI applications, precision and explainability are paramount, qualities that large language models (LLMs) inherently lack, regardless of prompting strategy or architecture. While LLMs are powerful tools for processing language, their predictive, probabilistic nature makes them unsuitable for tasks requiring deterministic, explainable decision-making.

This article explores the limitations of LLMs, particularly their inability to provide precise, causal reasoning. It introduces the role of causal reasoning platforms, such as Rainbird, to ensure accuracy and accountability in critical decisions.

By combining the strengths of LLMs in knowledge extraction with the deterministic nature of causal reasoning, organisations can achieve transparent and reliable decision intelligence in areas like tax and audit, credit decisioning, and health administration.

The Precision Challenge Of Language Models

Imagine asking an LLM, such as GPT-4 or Claude 3.5 Sonnet, a simple arithmetic question: What is 1 + 1?

You will get the correct answer, not because the LLM is performing a calculation, but because it’s been well exposed to the question in its training data.

Now, consider a slightly more complex question: What is 1 + 2 × 2?

The correct answer requires an understanding of operator precedence. Again, an LLM will likely give you the correct answer, but not because it understands the order in which it needs to perform the calculation. It’s not performing a calculation at all; it’s predicting the most likely output based on its training data, which, for most LLMs, is the public internet.

But what happens when we scale up the complexity?

Let’s pose a larger number calculation directly to an LLM:

Calculate 5674328923 / 993

The response might surprise you. Instead of computing the exact answer, the LLM will generate an approximation, and most likely an incorrect result. If you ask the same question five times, you’ll likely get five different answers.

This is an inherent consequence of the way LLMs are designed. They predict the next word in a sequence based on patterns learned during training, not through precise mathematical computation.

This limitation highlights a fundamental challenge: LLMs are predictive, not deterministic. In high-stakes applications where precision is paramount, relying solely on predictive models will lead to errors that, depending on the use case, may have significant ramifications.

Precision In High-Stakes Domains

Precision isn’t just crucial in arithmetic; it’s vital in any high-stakes domain where decisions carry serious consequences.

For example:

  • Tax and Audit: Ensuring compliance with tax regulations requires precise identification of appropriate tax treatments. Auditors must meticulously evaluate financial statements to detect discrepancies or fraud.
  • Credit Decisioning: Banks must accurately assess the creditworthiness of applicants to minimise defaults and remain on the right side of financial regulations.
  • Healthcare Administration: Accurate management of patient records and billing is crucial for ensuring proper treatment, reimbursement and regulatory compliance. Health administrators need precision when tracking healthcare quality metrics, to improve patient outcomes and meet accreditation standards.

In these areas, decisions aren’t just about numbers – they’re about understanding cause and effect in real-world situations. They require reasoning over knowledge while ensuring compliance with regulations, providing transparent and logical explanations for each outcome.

Addressing Symptoms, Not Root Causes

When discussing the shortcomings of LLMs, the conversation often centres around avoiding hallucinations – the term used to describe the tendency of these models to generate plausible-sounding but factually incorrect or nonsensical information.

There are many approaches you might take to ground an LLM and limit the impact of hallucinations. You might, for example, use retrieval-augmented generation (RAG) to ground the LLM in your own documented sources. Techniques like RAG and other prompting strategies, such as chain-of-thought (CoT), improve an LLM’s recall and may reduce hallucinations, but they don’t address the core issue.
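To make the pattern concrete, here is a minimal sketch of RAG: retrieve the most relevant sources, then fold them into the prompt. The `llm_complete` function is a hypothetical stand-in for whichever LLM API you use, and the keyword-overlap retrieval stands in for a real embedding-based search.

```python
def llm_complete(prompt: str) -> str:
    """Hypothetical stand-in for your LLM provider's completion API."""
    raise NotImplementedError

def retrieve(question: str, documents: list[str], k: int = 3) -> list[str]:
    """Naive keyword-overlap retrieval; production systems use vector embeddings."""
    q_terms = set(question.lower().split())
    ranked = sorted(documents, key=lambda d: len(q_terms & set(d.lower().split())), reverse=True)
    return ranked[:k]

def answer_with_rag(question: str, documents: list[str]) -> str:
    sources = retrieve(question, documents)
    # The retrieved text grounds the prompt, but the model is still predicting
    # tokens: grounding improves recall, not determinism.
    prompt = ("Answer using only these sources:\n"
              + "\n---\n".join(sources)
              + f"\n\nQuestion: {question}")
    return llm_complete(prompt)
```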

Hallucinations are symptomatic of a broader and more fundamental issue – an inherent lack of precision in LLMs. This is because fundamentally, LLMs operate on probability distributions, not logical reasoning.

While RAG and similar processing architectures may improve factual accuracy, they cannot turn an LLM into a precise, deterministic reasoning engine.

All of the strategies implemented to mitigate hallucinations address symptoms rather than root causes. The result is an imprecise attempt to simulate logical reasoning without the benefit of actually being logical. OpenAI’s o1 model, previously known as Project Strawberry, is a good example of this.

No grounding or prompting strategies can transform a probabilistic text generator into a deterministic reasoner.

When Determinism Matters

Deterministic systems are essential when:

  • Precision is required: In fields like finance or healthcare where small errors can have significant consequences.
  • Explanations are necessary: When decision-makers need to understand the causal rationale behind outcomes if they are to trust and act upon them.
  • Regulatory compliance is mandatory: Where transparent decision-making is crucial for meeting legal and ethical standards.

What’s required to close this gap is a companion technology that can perform the equivalent of a mathematical calculation, only over logical expressions of knowledge.

This encoding approach serves two key purposes: it enables us to derive precise, reproducible answers from knowledge representations, and it allows us to trace the reasoning behind the system’s conclusions.

Renowned computer scientist Judea Pearl, a pioneer in the field of causal reasoning, argues that while data can show us correlations, it can never truly tell us why something happens.

After more than two decades of AI being synonymous with data-centric machine learning, this is a critical distinction. In high-stakes decisions, understanding the causal relationships, the why behind an outcome, is just as important as the outcome itself.

Consider a medical diagnosis scenario. An LLM might correctly associate a condition with a list of symptoms, but without understanding the causal relationships between a disease and its symptoms, it can’t provide the kind of explanatory power that a human doctor, or a causal reasoning system, can.

For high-stakes decisions, knowing that two variables are linked isn’t enough. We need to comprehend the underlying mechanisms and causes behind their relationship; grasping the causal connections, not just the pattern, is what makes decisions informed, reliable and trustworthy.

Enter Causal Reasoning

To address these challenges, we must turn to causal reasoning and symbolic models. Unlike LLMs, which generate outputs based on patterns from their training data, symbolic AI operates on explicit rules and logic. In Rainbird’s case, these are structured as weighted knowledge graphs.

This approach enables nuanced but ultimately deterministic outputs. Given the same inputs, a deterministic system will always produce the same results and is able to explain why.

A brief historical perspective helps us to understand the foundations of this approach.

The field of knowledge representation and reasoning (KR&R) has been a cornerstone of AI research since the discipline’s inception. Early expert systems demonstrated the power of encoding domain knowledge in a form that machines could reason over. While those systems had their limitations, they laid the groundwork for modern causal reasoning platforms, which have since overcome them.

AI systems powered by causal reasoning consist of two parts:

  1. Knowledge Representation: Experts encode domain knowledge into the system using logical rules and relationships.
  2. Reasoning Engine: The system applies this knowledge to specific data, processing it through logical inference to arrive at evidenced conclusions. Importantly, this approach doesn’t require that all knowledge be binary or perfectly defined. Where there’s uncertainty or ambiguity in the domain knowledge, this can also be encoded as part of the system. The key is that the reasoning process itself remains deterministic, as the sketch below illustrates.
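As a rough illustration of that two-part structure (and emphatically not Rainbird’s actual engine), the sketch below forward-chains over a handful of weighted rules. The rule names, weights and certainty arithmetic are assumptions made up for the example; the point is that the same inputs always yield the same conclusions, each with a traceable chain of reasoning.

```python
# Toy forward-chaining reasoner over weighted rules (illustrative only).
# Each rule carries a weight; a conclusion's certainty is the rule weight
# scaled by its weakest premise, and every conclusion records the chain
# of rules that produced it.

Rule = tuple[str, list[str], float]  # (conclusion, premises, weight)

def reason(rules: list[Rule], facts: dict[str, float]) -> dict[str, tuple[float, list[str]]]:
    """facts maps known facts to certainties in [0, 1]. Returns each derived
    conclusion with its certainty and its reasoning trace."""
    derived = {name: (certainty, [f"given: {name}"]) for name, certainty in facts.items()}
    changed = True
    while changed:  # keep firing rules until nothing new can be derived
        changed = False
        for conclusion, premises, weight in rules:
            if conclusion in derived or not all(p in derived for p in premises):
                continue
            certainty = weight * min(derived[p][0] for p in premises)
            trace = [step for p in premises for step in derived[p][1]]
            trace.append(f"{' & '.join(premises)} -> {conclusion} (weight {weight})")
            derived[conclusion] = (certainty, trace)
            changed = True
    return derived

# Hypothetical credit-risk rules, encoded by a domain expert.
CREDIT_RULES: list[Rule] = [
    ("high_risk",      ["thin_credit_file", "recent_defaults"], 0.9),
    ("decline_credit", ["high_risk"],                           0.8),
]

# The same inputs always produce the same conclusion, certainty and trace.
outcome = reason(CREDIT_RULES, {"thin_credit_file": 1.0, "recent_defaults": 0.7})
print(outcome["decline_credit"])
# certainty ~0.50, with the full chain:
#   thin_credit_file & recent_defaults -> high_risk -> decline_credit
```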

This determinism extends to scenarios with sparse or uncertain data. Even when working with incomplete information, a causal reasoning system can provide a perfect chain of reasoning, showing the cause-and-effect relationships that led to its conclusion, accurately propagating any uncertainty and reflecting it in the outcome. When ambiguity or uncertainty exists in the source knowledge or data, it can be explicitly encoded.

This method allows for precise calculations over knowledge and, as a result, transparent decision-making. Outcomes show the source and implications of all uncertainties.

This isn’t to say that LLMs have no place in high-stakes decision-making. While they are a poor proxy for reasoning, they excel at interpreting unstructured data, understanding natural language, and extracting relevant information.

The Best Of Both

Consider the messy, unstructured world we live in: emails, customer reviews, doctors’ notes, written regulation, policy, operating procedures, handbooks and so on. LLMs can transform these forms of explicit knowledge into more structured, computable forms of knowledge representation.

LLMs have proven effective at processing unstructured data, including written sources of domain knowledge. They can identify key concepts, relationships, rules, weights and probabilities – and translate them into symbolic representations of knowledge that can be reasoned over with absolute precision and transparency.
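A minimal sketch of that extraction step might look like the following; the prompt wording, the JSON shape and the `llm_complete` helper are all assumptions for illustration, not a prescribed interface.

```python
import json

def llm_complete(prompt: str) -> str:
    """Hypothetical stand-in for your LLM provider's completion API."""
    raise NotImplementedError

def extract_rules(policy_text: str) -> list[dict]:
    # The LLM does what it is good at -- reading unstructured language --
    # and hands back a structured artefact that a reasoning engine can execute.
    prompt = (
        "Read the policy text below and return only JSON of the form\n"
        '{"rules": [{"if": ["condition", ...], "then": "conclusion", "weight": 0.9}]}\n\n'
        "Policy text:\n" + policy_text
    )
    return json.loads(llm_complete(prompt))["rules"]
```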

Let’s return to our earlier example of using an LLM to perform calculations over large numbers.

When we give the large-number calculation to ChatGPT, it generates an accurate result, in stark contrast to querying the LLM directly.

ChatGPT is a tool which has been designed to hand off certain functions to companion technologies. It recognises the complexity of a maths problem and, rather than trying to predict the answer using the LLM, translates the input into a simple Python script. It then executes this script to precisely calculate the answer. This is a simple demonstration of how LLMs can create an executable encoding from unstructured natural language.
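The generated script need be no more elaborate than something like this (the exact code ChatGPT writes will vary; this is just the shape of the handoff):

```python
# Deterministic arithmetic is delegated to the interpreter, not predicted.
quotient, remainder = divmod(5674328923, 993)
print(quotient, remainder)   # 5714329 226
print(5674328923 / 993)      # ~5714329.2276
```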

Extending that principle, it’s easy to recognise the power of using an LLM to understand a question being asked, together with any relevant data, and passing this to a causal calculator like Rainbird that can compute over knowledge captured in a knowledge graph to provide a precise answer.

The narrative of generative AI is biased towards the advancement and marketing of artificial general intelligence (AGI) by the large frontier model providers. But the history of AI tells us that business value lies not in generalisation but in the application of such technologies to narrow domains.

The majority of people attempting to extract value from generative AI are not trying to build generalised tools at all; they are applying the technology to build narrower tools that can work precisely in critical domains.

LLMs are playing a vital role in brokering interactions between humans and data, while the reasoning is managed by a causal engine that works in the context of a model of what is important; in Rainbird’s case, that model is encapsulated in a computable knowledge graph.

Building The AI We Can Trust

As we look to the future of AI in high-stakes, precision-critical applications, it’s clear that LLMs alone are insufficient. We must leverage the strengths of both generative and causal approaches.

LLMs can serve as powerful front-ends, interpreting human intent and translating it into structured forms. Causal reasoning systems can then take over, providing the precision, determinism, and explainability needed for critical decision-making.
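Stitching the earlier sketches together, the hybrid pattern is essentially a pipeline: the hypothetical `extract_rules` and `llm_complete` helpers handle the language, while the deterministic `reason` function carries the decision and its audit trail. Again, this is an illustrative assumption of how the pieces fit, not Rainbird’s implementation.

```python
def decide(policy_text: str, case_facts: dict[str, float]) -> dict[str, dict]:
    # 1. LLM front-end: written policy -> structured, computable rules
    #    (reshaped into the (conclusion, premises, weight) tuples used above).
    rules = [(r["then"], r["if"], r["weight"]) for r in extract_rules(policy_text)]

    # 2. Deterministic back-end: reason over those rules and the case facts.
    conclusions = reason(rules, case_facts)

    # 3. Every conclusion arrives with a certainty and a traceable chain of
    #    cause and effect that can be shown to an auditor or customer.
    return {c: {"certainty": cert, "trace": trace} for c, (cert, trace) in conclusions.items()}
```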

This hybrid approach is already well proven at Rainbird, powering decisions in tax, audit, law, lending, claims, healthcare and more. It offers the best of both worlds: the flexibility, language understanding and seamless user experience of generative AI, combined with the precision and rigour of logical systems.

In a world where we are slowly and inexorably yielding decisions to machines, and where AI is increasingly involved in outcomes that affect people’s lives, careers, and well-being, we cannot afford to rely solely on probabilistic models.

The future of AI lies not in choosing between approaches, but in skillfully combining them to create systems that are greater than the sum of their parts.

We are finally moving closer to a future where all AI systems are accountable and can truly do what we want them to do: drive decisions with precision, transparency, and reliability.
