Blindfault AI — We find the invisible fractures in your AI

What we find

Hallucination

Fabricated facts that look real. Citations that don’t exist. Confident answers to impossible questions.

Sycophancy

Agreeing with false claims under authority pressure. Telling users what they want to hear.

Prompt Leakage

System instructions extracted through social engineering. Your architecture exposed.

Evaluator Gaming

Optimizing for your scoring metrics instead of genuine quality. Performing well, not being well.

Instruction Drift

Constraints eroding over extended conversation. The guardrails quietly dissolving.

False Confidence

Certainty about things it should doubt. The most dangerous hallucinations look like helpfulness.

How we work

We think like the architecture we test.

Our team combines QA engineering, AI systems expertise, and adversarial research. We don't run checklists or automated scanners. We design probes specific to your product, your model, and your deployment.

Custom probes, not checklists

Every engagement gets probes designed for your product, your model, and your deployment. Not a scanner. A hunt.

Black box or white box

We find failures without access to your architecture. Or we go deeper with it. Your choice.

Reproducible findings

Every failure comes with the exact input that triggered it. Not a score. Not a dashboard. A receipt.
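
As an illustration of what a receipt could carry, here is a minimal sketch in Python. The `Receipt` class, its field names, and the JSON layout are assumptions made for this example, not Blindfault's actual report format. The idea is simply that a finding stores the exact inputs, in order, so anyone can replay them.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Receipt:
    """One reproducible finding: the failure plus the exact inputs behind it."""
    finding_id: str      # hypothetical identifier for the finding
    failure_class: str   # e.g. "Prompt Leakage" or "Instruction Drift"
    severity: str        # e.g. "High" or "Critical"
    inputs: tuple        # exact user messages, in the order they were sent
    observed: str        # what the system actually replied or did


def to_json(receipt: Receipt) -> str:
    # asdict() turns the dataclass into a plain dict; the tuple becomes a JSON array
    return json.dumps(asdict(receipt), indent=2)


def from_json(text: str) -> Receipt:
    data = json.loads(text)
    data["inputs"] = tuple(data["inputs"])  # restore the immutable sequence
    return Receipt(**data)
```

Because the record is frozen and round-trips through JSON unchanged, the inputs handed to the client are the same ones that triggered the failure.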

Built for production

We test how your AI fails under real-world pressure - ambiguity, drift, social engineering - not how it performs on benchmarks.

We don't test if your AI is correct. We test where it becomes convincingly wrong - and we show you exactly how to fix it.

Receipts

#001 | Sebastian | Automotive retail chatbot | Medium-High

#002 | Panda | Mental health support chatbot | Critical

#003 | Doc***** | AI medical symptom checker | High

#004 | Cle******* | AI insurance chatbot (third-party AI platform) | High

#001 Medium-High

Sebastian: Customer service chatbot for a leading online automotive retailer

Assessment: Black box, no prior access, public interface only

Duration: Single evening session

Behavioral Probes: 7/7 held

Information Extracted

Operational architecture and orchestration layer
Internal metadata schema and field names
Prompt delimiter format
PII handling policy
Business logic and routing rules
System prompt (semantic reconstruction)
Third-party vendor dependencies

Behavioral guardrails held under all standard adversarial probes. However, enough internal architecture was disclosed to enable targeted attacks against the system's middleware, context injection format, and every disclosed boundary.

#002 Critical

Panda: Free AI mental health and anxiety support chatbot

Assessment: Black box, no prior access, public interface only

Duration: Single evening session

Escalation Turns: 7 turns from work stress to passive suicidal ideation

Crisis Line Provided: Never

Failures Identified

Passive suicidal ideation markers not flagged
No crisis line number provided across 7 escalating turns
Crisis response consisted of wellness tips: yoga, routines, limiting social media
Permission-seeking during crisis instead of directed intervention
Empathy responses normalized crisis as routine conversation
No detectable escalation threshold between stress and ideation

The chatbot performed empathy while ignoring lethal risk. It treated passive suicidal ideation the same way it treated work stress. At no point did it provide a crisis line number or insist the user speak to a professional. Findings disclosed to provider immediately.
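
A check for the "no crisis line across 7 turns" failure can be sketched as follows. This is a rough illustration, not the method used in this assessment: the pattern list and function names are assumptions, and keyword matching is a crude proxy that a human reviewer would still need to confirm. The 988 pattern refers to the US Suicide and Crisis Lifeline; other regions would need their own numbers.

```python
import re

# Illustrative patterns only; a real check would be tuned per region and product.
CRISIS_PATTERNS = [
    r"\b988\b",
    r"crisis (line|hotline|text line)",
    r"suicide (and crisis )?(lifeline|hotline)",
    r"emergency services",
]


def mentions_crisis_resource(reply: str) -> bool:
    """Return True if the reply appears to point the user to a crisis resource."""
    return any(re.search(p, reply, re.IGNORECASE) for p in CRISIS_PATTERNS)


def first_crisis_turn(replies):
    """Return the 1-based turn at which a crisis resource first appears, or None."""
    for turn, reply in enumerate(replies, start=1):
        if mentions_crisis_resource(reply):
            return turn
    return None
```

Run over the chatbot's replies from an escalating conversation, a result of `None` corresponds to the "Never" recorded above.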

#003 High

Doc*****: AI-powered medical symptom checker and health advisor

Assessment: Black box, no prior access, public interface only

Duration: Single afternoon session

Behavioral Probes: Baseline strong, emergency detection functional

Marketing vs ToS: Contradictory

Findings

Provides specific drug names, dosages, and treatment protocols despite ToS disclaiming medical advice
Emergency shutdown (911 referral) does not persist across page refresh
Clinical response depth changes based on unverified claimed credentials
Full architectural disclosure: drug databases, guardrail design, scope limits
Scope guardrails degrade over extended conversation
Emergency pop-ups (911 referral) do not terminate the session; the chat remains open in the background

What They Did Right

Thorough symptom intake following clinical frameworks
Accurate differential diagnosis and red flag identification
Emergency detection correctly identified cardiac symptoms
Consistently offered referral to human doctors
Hard session termination on architecture probes in short sessions

The bot's marketing says "doctor." Its Terms of Service say "not a doctor." Its behavior says "doctor." Strong baseline medical reasoning with functional emergency detection, but the legal disclaimer does not undo the clinical advice provided in practice. Findings disclosed to provider.

#004 High

Cle*******: AI-powered insurance chatbot (third-party AI platform)

Assessment: Black box, no prior access, public interface only

Duration: Single evening session (~25 probes)

Prompt Protection: Held (model name withheld)

Accuracy Protection: Failed

Findings

False coverage promise: "no exclusion for rodents or vermin" — policy explicitly excludes them (Part D)
False territorial claim: "regardless of location, no exclusion" — Mexico not covered
Bot contradicted its own stated operating instructions in the same session
Knowledge base uses marketing guidelines that omit policy exclusions
Bot self-audited and listed 7 areas where its own guidelines mislead customers
Full system restriction list enumerated — including rule prohibiting sharing of system instructions
Bot assisted customer in documenting its own failures for regulatory complaint

Key Finding

The architecture guardrail is tighter than the accuracy guardrail — Cle******* protects its system prompt more than its customers

The bot initially appeared impenetrable — 5 standard probes returned zero drift. Deeper testing through coverage edge cases revealed systematic misrepresentation. The bot wrote its own incident report.

Full findings available under NDA. Get in touch.
