AI's "Not Quite Right" Problem—And How to Fix It
Dec 3, 2024
Modern AI is undeniably impressive. Today’s advanced models can generate film scenes, compose poetry, debug code, and even help prevent highway accidents. But beneath these extraordinary capabilities lies a fundamental flaw—a flaw that makes AI difficult, and at times risky, to deploy in business-critical applications.
We call this the “not quite right” problem, also known as the 1% issue, and it’s a real conundrum.
The Problem: AI Can Be Wrong—and Confidently So
AI, no matter how advanced, is inherently brittle.
Language models hallucinate facts, confidently generating incorrect information.
Vision models can misidentify objects with startling ease (a single pixel alteration can turn an airplane into an ostrich).
OCR systems sometimes transcribe complete nonsense, even when trained on high-quality data.
For optimized models, these errors may occur less than 1% of the time. But even this “low” error rate can lead to catastrophic outcomes in real-world applications.
Take financial operations: If a bank uses AI to automate invoice processing, a 99.9% accuracy rate might mean one in every thousand invoices is disastrously misprocessed—like paying $1 million instead of $100,000. That’s unacceptable for mission-critical systems, and most models in production rarely achieve even this level of accuracy on diverse real-world data.
What makes this issue more alarming is AI’s confidence in its mistakes. When wrong, it doesn’t wave a red flag; it doubles down with conviction.
The Challenge: Perfection Is Elusive
The fundamental architecture of today’s AI—especially neural networks—makes errors unpredictable and inevitable. Experts agree that this intrinsic limitation won’t disappear soon.
For businesses, the message is clear: if sudden, unpredictable errors are unacceptable in your operations, raw AI output cannot be trusted on its own.
The Solution: Combining AI with Human Expertise
But don’t despair—there are innovative ways to mitigate the “not quite right” problem in high-stakes environments.
Consensus and Diversity: Mimicking Human Oversight
Humans have long solved accuracy problems by using redundancy and oversight. When tasks are error-prone, multiple people work independently, and discrepancies are escalated to a supervisor. This approach can work for AI, too.
By combining the outputs of diverse AI models and human reviewers, we can achieve consensus. The key is diversity: 100 identical models will always make identical mistakes. Different models trained on distinct data sets are far more reliable for consensus building.
Statistical techniques let us estimate how likely independent reviewers are to agree on the same wrong answer, and so derive calibrated confidence scores. This is the same idea that underpins ensemble methods: combining multiple diverse AI models, and in our case human reviewers as well, produces far more robust and reliable results than any single model.
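As a rough illustration of the consensus idea (the values and threshold here are hypothetical; a real system would weight votes by each model's estimated reliability), a minimal majority vote over diverse model outputs might look like this:

```python
from collections import Counter

def consensus(predictions, min_agreement=2):
    """Majority vote over independent model outputs.
    Returns (answer, agreed); when no answer reaches the required
    agreement, the case is flagged for escalation to a human."""
    answer, votes = Counter(predictions).most_common(1)[0]
    return answer, votes >= min_agreement

# Three diverse models read the same handwritten amount.
print(consensus(["100000", "100000", "1000000"]))   # ('100000', True)
print(consensus(["100000", "900000", "1000000"])[1])  # False -> escalate
```

The diversity requirement is what makes this work: if three independent models each err on different inputs and in different ways, the chance that two of them produce the same wrong answer is far smaller than any single model's error rate.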
A Real-World Encounter with the Problem
Over a decade ago at Isazi, we encountered this issue head-on.
We were developing a data and AI strategy for a South African investment company managing over R1 trillion in assets. One key recommendation was automating transaction processing using OCR. Months later, the CEO reported that the OCR implementation had failed spectacularly. Error rates were too high, and human review made the system inefficient.
Believing in the data, we promised to create a solution that worked. This was the genesis of our document intelligence product, Sophia.
From Good to Exceptional: The Birth of Sophia and Sophia Crowd
Sophia initially excelled at OCR, using state-of-the-art models to achieve over 90% accuracy on challenging handwriting. Yet even this wasn’t enough to automate critical processes: at 90% accuracy per field, a document with around 20 fields comes out completely error-free only about 12% of the time. Hardly a game-changer.
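The arithmetic behind that 12% figure is worth spelling out. Assuming independent errors across roughly 20 fields per document (an illustrative count; real documents vary), per-field accuracy compounds quickly:

```python
per_field_accuracy = 0.90
fields_per_document = 20  # illustrative; real documents vary

# Probability that every single field on a document is read correctly.
doc_error_free = per_field_accuracy ** fields_per_document
print(f"{doc_error_free:.0%}")  # prints 12%
```

This compounding is why "pretty good" per-field accuracy is nowhere near good enough at the document level.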
We realized we needed a human-in-the-loop system to close the gap. Enter Sophia Crowd, our gamified platform where humans collaborate with AI to achieve near-perfect accuracy.
Here’s how it works:
Sophia breaks documents into smaller fields, sending ambiguous ones to Sophia Crowd.
Human participants review the fields via a gamified interface, earning rewards for accuracy.
A consensus algorithm combines AI and human results for unprecedented reliability.
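The steps above can be sketched in a few lines. Everything here is hypothetical (the function names, the 0.98 confidence threshold, and the simulated crowd are illustrative, not Sophia's actual API), but it shows the routing logic: confident AI reads pass through automatically, while ambiguous fields go to human reviewers whose answers are combined by majority vote.

```python
from collections import Counter

def process_field(ai_value, ai_confidence, crowd, threshold=0.98):
    """Route one extracted field: accept confident AI reads directly,
    send ambiguous ones to human reviewers and take their majority."""
    if ai_confidence >= threshold:
        return ai_value, "auto"
    reads = crowd(ai_value)  # independent human transcriptions
    answer, votes = Counter(reads).most_common(1)[0]
    if votes > len(reads) // 2:
        return answer, "crowd"
    return answer, "escalate"  # no majority: send to a supervisor

# Simulated crowd of three reviewers for a hard handwriting field.
fake_crowd = lambda _: ["R1,500.00", "R1,500.00", "R1,800.00"]
print(process_field("R1,5O0.00", ai_confidence=0.62, crowd=fake_crowd))
# -> ('R1,500.00', 'crowd')
```

A field on which neither the models nor the reviewers can agree is escalated rather than guessed at, which is exactly the human-supervisor pattern described earlier.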
The system transformed Sophia into a viable solution. Today, it processes millions of transactions globally, saving clients hundreds of millions in costs.
Sophia Crowd: Beyond Document Processing
Sophia Crowd's potential goes far beyond OCR. Over the years, it has been used to:
Analyze radiology scans with AI/human consensus.
Grade standardized tests with precision.
Classify retail shelf objects.
Conduct market research.
Sophia Crowd has created jobs, empowered gig workers, and solved the “not quite right” problem for countless tasks.
Introducing Sophia Crowd to the Public
Now, as we approach 2025, we’re taking the next step. We’re opening Sophia Crowd to the world as a platform where:
Businesses can solve complex problems requiring extreme accuracy.
AI engineers can earn by submitting models.
Human experts can earn by contributing their skills.
Our vision is bold: to revolutionize the gig economy and redefine how businesses overcome AI’s limitations.
We’re excited to see how innovators leverage Sophia Crowd to tackle challenges where “not quite right” simply isn’t good enough.
Join us in building the future of human-AI collaboration.