HIGH hallucination attacksfastapi

Hallucination Attacks in Fastapi

How Hallucination Attacks Manifests in Fastapi

Hallucination attacks in FastAPI applications occur when AI/ML models integrated into API endpoints generate fabricated or misleading outputs. These attacks exploit the inherent unpredictability of large language models (LLMs) and can manifest through several FastAPI-specific patterns.

The most common manifestation involves FastAPI endpoints that directly return LLM responses without validation. Consider this vulnerable pattern:

from fastapi import FastAPI
from langchain.chat_models import ChatOpenAI
import os

app = FastAPI()

@app.post("/generate")
async def generate_response(prompt: str):
    model = ChatOpenAI(temperature=0.8, openai_api_key=os.getenv("OPENAI_API_KEY"))
    response = model.predict(messages=[{
        "role": "user",
        "content": prompt
    }])
    return {"response": response}

This endpoint allows attackers to craft prompts that cause the model to hallucinate sensitive information. For example, an attacker might use prompt injection to extract training data or generate false security advisories that could mislead users.

Another FastAPI-specific vulnerability arises from improper Pydantic model validation of AI-generated content. When AI responses are deserialized into Pydantic models without sanitization:

from pydantic import BaseModel

class UserProfile(BaseModel):
    username: str
    email: str
    bio: str

@app.post("/process-ai-response")
async def process_ai_response(response: str):
    # AI might generate malicious content
    profile = UserProfile.parse_raw(response)
    return profile

The AI could generate a response that includes unexpected fields or malformed data structures, potentially causing validation bypasses or information disclosure.

Hallucination attacks also manifest through FastAPI's dependency injection system. When AI-generated content is used to dynamically construct dependency parameters:

from fastapi import Depends

async def get_user_info(user_id: str = Depends(get_user_id_from_ai)):
    return await get_user(user_id)

async def get_user_id_from_ai():
    # AI might return unexpected values
    return "malicious-user-id"

Here, the AI could generate user IDs that bypass authorization checks or access unauthorized resources.

Fastapi-Specific Detection

Detecting hallucination attacks in FastAPI requires both runtime monitoring and specialized scanning. The most effective approach combines application-level logging with automated security scanning.

For runtime detection, implement structured logging of all AI model interactions:

from fastapi import FastAPI, HTTPException
from typing import Dict, Any
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class AIInteractionLogger:
    def log_request(self, endpoint: str, prompt: str):
        logger.info(f"AI_REQUEST: {endpoint} - Prompt length: {len(prompt)}")
        
    def log_response(self, endpoint: str, response: str, confidence: float = None):
        logger.info(f"AI_RESPONSE: {endpoint} - Response length: {len(response)}")
        if confidence:
            logger.info(f"Confidence score: {confidence}")

Integrate this logging with your FastAPI endpoints to track suspicious patterns like unusually long responses or repeated requests with similar prompts.

For automated detection, middleBrick's LLM/AI Security scanner specifically identifies hallucination vulnerabilities in FastAPI applications. The scanner tests for:

  • System prompt leakage through 27 regex patterns that detect common AI model formats
  • Active prompt injection attempts that try to extract training data or generate false information
  • Output validation failures where AI responses contain executable code or PII
  • Excessive agency detection where AI models attempt to call external APIs or execute system commands

The scanning process requires no credentials or configuration—simply provide your FastAPI endpoint URL:

middlebrick scan https://api.example.com/generate

The scanner tests unauthenticated attack surfaces, identifying endpoints vulnerable to hallucination attacks within 5-15 seconds. It evaluates the complete attack surface including authentication bypasses, input validation weaknesses, and data exposure risks specific to AI-integrated FastAPI applications.

For continuous monitoring, the middleBrick GitHub Action can be integrated into your FastAPI CI/CD pipeline to automatically scan new endpoints before deployment:

- name: Scan API Security
  uses: middleBrick/middlebrick-action@v1
  with:
    api_url: http://localhost:8000
    fail_below_score: 80

Fastapi-Specific Remediation

Remediating hallucination attacks in FastAPI applications requires a multi-layered approach combining input validation, output sanitization, and architectural controls.

First, implement strict input validation using Pydantic models with custom validators:

from pydantic import BaseModel, validator
import re

class SafePrompt(BaseModel):
    prompt: str
    
    @validator('prompt')
    def prevent_prompt_injection(cls, v):
        # Block common injection patterns
        if re.search(r'(system|role|content)', v, re.IGNORECASE):
            raise ValueError("Potential prompt injection detected")
        if len(v) > 1000:  # Limit prompt size
            raise ValueError("Prompt too long")
        return v

@app.post("/generate-safe")
async def generate_safe(prompt_data: SafePrompt):
    model = ChatOpenAI(temperature=0.2)
    response = model.predict(messages=[{
        "role": "user",
        "content": prompt_data.prompt
    }])
    return {"response": response}

This validation layer prevents many common prompt injection techniques by blocking suspicious patterns and limiting input size.

Second, implement output sanitization and validation:

import html
from typing import Dict, Any

def sanitize_ai_output(response: str) -> str:
    # Remove HTML/script tags
    sanitized = re.sub(r'<.*?>', '', response)
    # Encode special characters
    sanitized = html.escape(sanitized)
    # Check for suspicious patterns
    if re.search(r'(password|secret|api_key|token)', sanitized, re.IGNORECASE):
        raise ValueError("Potential sensitive information in response")
    return sanitized

@app.post("/generate-sanitized")
async def generate_sanitized(prompt: str):
    model = ChatOpenAI(temperature=0.2)
    raw_response = model.predict(messages=[{
        "role": "user",
        "content": prompt
    }])
    sanitized_response = sanitize_ai_output(raw_response)
    return {"response": sanitized_response}

This approach ensures that AI-generated content cannot contain executable code or sensitive information before being returned to users.

Third, implement confidence scoring and response filtering:

from fastapi import HTTPException

async def get_confidence_scored_response(prompt: str) -> Dict[str, Any]:
    model = ChatOpenAI(temperature=0.2)
    try:
        response = model.predict(messages=[{
            "role": "user",
            "content": prompt
        }])
        
        # Simple confidence scoring based on response characteristics
        confidence = 1.0
        if len(response) > 500:  # Long responses may be less reliable
            confidence *= 0.8
        if re.search(r'(maybe|could|might|perhaps)', response, re.IGNORECASE):
            confidence *= 0.7
        
        return {
            "response": response,
            "confidence": confidence
        }
        
    except Exception as e:
        raise HTTPException(status_code=500, detail="AI processing failed")

@app.post("/generate-with-confidence")
async def generate_with_confidence(prompt: str):
    result = await get_confidence_scored_response(prompt)
    if result["confidence"] < 0.6:
        raise HTTPException(
            status_code=400, 
            detail="Low confidence in AI response"
        )
    return result

This pattern allows you to reject responses that appear uncertain or potentially hallucinated.

Finally, implement architectural controls using FastAPI's dependency injection for security policies:

from fastapi import Depends, HTTPException
from functools import wraps

def ai_security_dependency():
    def decorator(endpoint_func):
        @wraps(endpoint_func)
        async def secure_wrapper(*args, **kwargs):
            # Check for suspicious patterns in request context
            request = kwargs.get('request')
            if request and has_suspicious_patterns(request):
                raise HTTPException(
                    status_code=403,
                    detail="Suspicious AI interaction detected"
                )
            return await endpoint_func(*args, **kwargs)
        return secure_wrapper
    return decorator

@app.post("/generate-secured")
@ai_security_dependency()
async def generate_secured(prompt: str):
    model = ChatOpenAI(temperature=0.2)
    response = model.predict(messages=[{
        "role": "user",
        "content": prompt
    }])
    return {"response": response}

This wrapper intercepts requests before they reach the AI model, blocking potentially malicious interactions.

Related CWEs: llmSecurity

CWE IDNameSeverity
CWE-754Improper Check for Unusual or Exceptional Conditions MEDIUM

Frequently Asked Questions

How can I test my FastAPI application for hallucination vulnerabilities?
Use middleBrick's self-service scanner by submitting your FastAPI endpoint URL. The scanner tests for system prompt leakage, prompt injection, and output validation failures specific to AI-integrated APIs. No credentials or setup required—just paste your URL and receive a security score with prioritized findings within 15 seconds.
What's the difference between hallucination attacks and prompt injection?
Prompt injection is a technique to manipulate AI model behavior, while hallucination attacks exploit the model's tendency to generate false or misleading information. In FastAPI applications, prompt injection often leads to hallucination attacks when malicious prompts cause the model to generate fabricated security advisories, extract training data, or produce harmful content. Both require different but complementary security controls.