When you chat with a VTuber’s AI twin or a game NPC that remembers your past adventures, breaking character can ruin the magic. Large language models (LLMs) have the raw conversational talent, but keeping them in character—especially when faced with questions outside their scripted knowledge—is notoriously difficult. AMADEUS, a new RAG-based framework, aims to fix that.

The Problem with Persona Drift

Most role-playing agents (RPAs) rely on a static “persona paragraph” to define who they are. Retrieval-Augmented Generation (RAG) can pull relevant persona chunks into context, but three problems persist:

  1. Truncated context – Fixed-length chunking ignores the varied length and structure of different personas.
  2. Knowledge gaps – When asked something outside the persona, models either hallucinate or lean on irrelevant content.
  3. No suitable benchmark – Existing datasets focus on dialogue, not structured persona knowledge.

For content creators like daily streamers or VTubers—whose backstories and lore evolve weekly—these flaws make maintaining authenticity costly.

AMADEUS: A Three-Part Harmony

The AMADEUS framework introduces three tightly coupled components:

1. Adaptive Context-aware Text Splitter (ACTS)

  • Finds an optimal chunk length per character (based on the longest paragraph in that persona).
  • Sets generous overlaps to preserve context.
  • Adds hierarchical context so each chunk knows its place in the narrative.
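The paper doesn't publish the exact splitting rule, but the three steps above can be sketched roughly as follows. A minimal, illustrative Python version, assuming the chunk length is set by the longest paragraph and overlap is a fixed ratio (both hypothetical simplifications of ACTS):

```python
def acts_chunk(persona_text: str, overlap_ratio: float = 0.5) -> list[str]:
    """Illustrative ACTS-style splitter (not the paper's exact algorithm).

    - Chunk length adapts per character: the longest paragraph sets it.
    - A generous overlap preserves context across chunk boundaries.
    - A positional prefix gives each chunk hierarchical context.
    """
    paragraphs = [p.strip() for p in persona_text.split("\n\n") if p.strip()]
    chunk_len = max(len(p) for p in paragraphs)          # per-character size
    step = max(1, int(chunk_len * (1 - overlap_ratio)))  # stride with overlap
    text = " ".join(paragraphs)
    chunks = [text[i:i + chunk_len] for i in range(0, len(text), step)]
    # Prefix each chunk with its place in the narrative.
    return [f"[chunk {i + 1}/{len(chunks)}] {c}" for i, c in enumerate(chunks)]
```

The key contrast with naive fixed-length chunking is that `chunk_len` is derived from the persona itself, so a character described in long lore paragraphs gets larger chunks than one defined in terse bullet points.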

2. Guided Selection (GS)

  • Sorts persona chunks by similarity to the query.
  • Uses an LLM to check if a chunk can help infer relevant attributes, not just facts.
  • Falls back to the top-similarity chunks when no chunk passes the check.
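In outline, GS is a rank-filter-fallback loop. A minimal sketch, where `similarity` and `llm_judge` are hypothetical callables standing in for an embedding model and the LLM inference check (names are illustrative, not from the paper):

```python
def guided_selection(query, chunks, similarity, llm_judge, k=3):
    """Guided Selection sketch: rank persona chunks by query similarity,
    keep those an LLM judge deems useful for inferring attributes (not
    just surface facts), and fall back to the top-k most similar chunks
    if none pass."""
    ranked = sorted(chunks, key=lambda c: similarity(query, c), reverse=True)
    selected = [c for c in ranked if llm_judge(query, c)]
    return selected if selected else ranked[:k]
```

The fallback matters for out-of-knowledge questions: even when no chunk directly answers the query, the most similar chunks still give the generator character-consistent raw material to reason from.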

3. Attribute Extractor (AE)

  • Pulls Beliefs & Values and Psychological Traits from GS-selected chunks.
  • Feeds these attributes back into the final generation, ensuring answers match the character’s personality—even for out-of-knowledge questions.
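The extracted attributes ultimately land in the generation prompt alongside the retrieved chunks. A hypothetical sketch of that final assembly step (field names and prompt wording are my own, not the paper's):

```python
def build_in_character_prompt(question, attributes, persona_chunks):
    """Assemble a generation prompt: GS-selected chunks supply facts,
    while AE-extracted beliefs/values and psychological traits keep
    out-of-knowledge answers in character. Illustrative only."""
    lines = ["You are role-playing a character. Stay in character."]
    if persona_chunks:
        facts = "\n".join(f"- {c}" for c in persona_chunks)
        lines.append("Known persona facts:\n" + facts)
    lines.append("Beliefs & values: " + "; ".join(attributes.get("beliefs_values", [])))
    lines.append("Psychological traits: " + "; ".join(attributes.get("psych_traits", [])))
    lines.append(f"User question: {question}")
    return "\n".join(lines)
```

The point of the design is visible even in this toy version: when `persona_chunks` contains nothing relevant, the attribute lines still constrain the answer's tone and stance.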

CharacterRAG: The Missing Dataset

To test AMADEUS, the team built CharacterRAG, a dataset covering 15 fictional characters (roughly 976K characters of persona text and 450 QA pairs), stripped of external bias. Each persona is broken into six attribute types—Activity, Belief & Value, Demographics, Psychological Traits, Skills & Expertise, and Social Relationships—allowing structured evaluation.

The dataset includes MBTI and Big Five (BFI) questionnaire prompts to stress-test role consistency when direct knowledge is missing.

Results Worth Applause

Across GPT-4.1, Gemma3-27B, and Qwen3-32B, AMADEUS:

  • Increased chunk usage rate by ~9% on MBTI tasks.
  • Improved MBTI type prediction accuracy to 85%, far above Naive RAG and web/graph-based baselines.
  • Reduced hallucination scores, particularly on out-of-knowledge personality questions.

Human evaluators rated AE’s inferred traits as well matched to the intended characters, with strong inter-rater reliability (Cronbach’s α > 0.8).

Why It Matters

For AI-driven entertainment and engagement, AMADEUS offers a scalable way to keep characters consistent without retraining. This is critical for:

  • Game NPCs that must adapt to player history without breaking lore.
  • VTubers and streamers whose personas evolve with their community.
  • Interactive storytelling platforms that blend authored plots with dynamic user input.

By combining smart chunking, guided retrieval, and personality grounding, AMADEUS doesn’t just answer questions—it stays in character.


Cognaptus: Automate the Present, Incubate the Future