Most prompt engineering guides focus on getting ChatGPT to write better emails. This isn’t that.
This is about making AI characters that feel like real people — characters that remember what happened yesterday, whose mood shifts based on your relationship, and whose speech patterns stay consistent across thousands of messages.
Everything here comes from building Suzune, our production roleplay bot. These aren’t theories — they’re patterns we use every day.
The Core Problem: Characters That Don’t Feel Like Characters
If you’ve tried roleplay with vanilla ChatGPT or Claude, you’ve experienced this:
- The character “breaks” mid-conversation and starts talking like a helpful assistant
- Speech patterns drift — a tsundere suddenly becomes polite and accommodating
- The AI forgets what happened 10 messages ago
- Every response starts the same way
The root cause? A static system prompt isn’t enough. You need a dynamic prompt architecture — one that evolves with every message.
Layer 1: The Character Persona
The foundation is a character definition file. In Suzune, we use YAML + Markdown:
```yaml
name: sakura
display_name: Sakura
system_prompt: persona.md  # loaded from file
example_dialogue: |
  User: How was your day?
  Sakura: *leans back in her chair and stretches* Ugh, don't even
  ask. My editor called THREE times about the deadline. *sighs*
  ...I guess it wasn't all bad though. I found this amazing café
  near the station. The matcha latte was actually decent.
```
The persona.md file contains the character’s core identity. Here’s what matters:
What to Include
- Background & Motivation — Who is this character? What do they want? What are they afraid of?
- Speech Patterns — Specific mannerisms, verbal tics, vocabulary preferences
- Emotional Range — How do they express happiness vs. anger vs. vulnerability?
- Relationship Dynamics — How do they treat strangers vs. close friends vs. romantic partners?
What to Avoid
- Don’t write a novel. Keep the persona under 800 tokens. LLMs pay less attention to the middle of long prompts.
- Don’t say “you are kind and caring.” Instead, show it: “When someone is upset, she tries to make them laugh — often with terrible puns.”
- Don’t list personality traits. Describe behaviors. Traits are abstract; behaviors are actionable.
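To make the shape concrete, here is a minimal sketch of how a definition like the YAML above might be turned into a base prompt. A plain dict stands in for the parsed YAML, and `build_base_prompt` is an illustrative helper, not Suzune's actual loader; in production you would parse the file with a YAML library and read persona.md from disk.

```python
# Sketch: combining a parsed character definition into the base system
# prompt. A plain dict stands in for the parsed YAML file here.

def build_base_prompt(character: dict) -> str:
    """Combine persona text and example dialogue into one prompt block."""
    sections = [character["persona"]]
    if character.get("example_dialogue"):
        sections.append(
            "## Example dialogue\n" + character["example_dialogue"].strip()
        )
    return "\n\n".join(sections)

sakura = {
    "name": "sakura",
    "persona": "Sakura is a sharp-tongued magazine editor with a soft spot for cafés.",
    "example_dialogue": "User: How was your day?\nSakura: *stretches* Ugh, don't ask.",
}

prompt = build_base_prompt(sakura)
```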
The Example Dialogue Trick
This is the single most impactful technique for voice consistency: include 3–5 example exchanges that demonstrate exactly how the character talks.
LLMs are pattern-matching machines. If you show them a character who uses *action asterisks*, speaks in short sentences, and drops casual slang — the model will mimic that pattern far more reliably than if you just wrote “she speaks casually.”
```text
User: What do you think about the new project?
Sakura: *taps her pen on the desk* Honestly? It's a mess.
The specs are vague, the timeline is insane, and I'm
pretty sure the client doesn't actually know what they want.
*pauses* ...But the tech stack is interesting. I wouldn't
mind getting my hands on it. *small grin*
```
Notice:
- Action descriptions in *asterisks*
- Internal monologue through pauses
- Mix of complaint and genuine interest (character complexity)
- Natural speech rhythm with interruptions
Layer 2: Dynamic Context Injection
Here’s where it gets interesting. A static prompt produces static behavior. For characters that feel alive, you need context that changes every message.
The System Prompt Assembly Pipeline
In Suzune, the system prompt is rebuilt from scratch on every message from ~20 data sources:
```text
┌─────────────────────────────────────────────┐
│ SYSTEM PROMPT (assembled)                   │
├─────────────────────────────────────────────┤
│ 1. Character persona (persona.md)           │
│ 2. Behavior rules (rules.md)                │
│ 3. Example dialogue                         │
│ 4. World info (lorebook entries)            │
│ 5. Relationship stage guidance              │
│ 6. Character's diary/memory (memo.md)       │
│ 7. Compressed chat history (summary)        │
│ 8. Emotion/affection scores                 │
│ 9. Current outfit                           │
│10. Current date & time                      │
│11. Tone reminder (recency bias fix)         │
│12. Story direction hints                    │
└─────────────────────────────────────────────┘
```
Why rebuild every time? Because the character’s context changes:
- It’s 11 PM → the character should be sleepy
- Affection score just crossed a threshold → unlock new behaviors
- A lorebook keyword was mentioned → inject relevant world info
- The character changed outfits earlier → reference the new outfit
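The rebuild step can be sketched as a single function of the current state. Section names mirror the diagram above, but the function name and state keys are illustrative, not Suzune's actual API; the point is that every section is recomputed per message, so the prompt changes whenever the state does.

```python
# Sketch: per-message system prompt assembly from mutable state.
from datetime import datetime

def assemble_system_prompt(state: dict) -> str:
    sections = [
        state["persona"],                         # 1. character persona
        state["rules"],                           # 2. behavior rules
        state.get("lore", ""),                    # 4. triggered world info (may be empty)
        f"Relationship stage: {state['stage']}",  # 5. stage guidance
        f"Current time: {datetime.now():%Y-%m-%d %H:%M}",  # 10. datetime
        state["tone_reminder"],                   # 11. recency-bias fix, kept last
    ]
    return "\n\n".join(s for s in sections if s)  # drop empty sections

prompt = assemble_system_prompt({
    "persona": "Sakura is a sharp-tongued magazine editor.",
    "rules": "Stay in character at all times.",
    "stage": "close",
    "tone_reminder": "Speak casually; use *action asterisks*.",
})
```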
Keyword-Triggered World Info (Lorebooks)
Borrowed from SillyTavern’s World Info concept, lorebooks inject context only when relevant keywords appear in recent messages:
```json
{
  "keys": ["café", "coffee", "matcha"],
  "content": "Sakura's favorite café is 'Tsuki no Shizuku' near the station. She orders matcha latte every time.",
  "sticky": 3,
  "priority": 5
}
```
When someone mentions “café,” this entry activates and stays in the prompt for 3 turns (sticky: 3). This means the character naturally “remembers” details about the café without it cluttering every single response.
You can also gate entries by relationship level — NSFW-related world info only activates after the relationship reaches a certain intimacy threshold.
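As a sketch, the activation logic might look like this. The entry format follows the JSON above, but the function name and sticky-counter bookkeeping are illustrative rather than Suzune's actual code:

```python
# Sketch: keyword-triggered lorebook activation with sticky counters.
# An active entry stays injected for `sticky` turns after its last trigger.

def activate_lorebooks(entries, recent_messages, active):
    """Return updated {entry_index: turns_left} after scanning recent text."""
    text = " ".join(recent_messages).lower()
    # Decrement existing sticky counters; drop entries that expired.
    active = {i: n - 1 for i, n in active.items() if n > 1}
    for i, entry in enumerate(entries):
        if any(key.lower() in text for key in entry["keys"]):
            active[i] = entry.get("sticky", 1)  # (re)arm for N turns
    return active

entries = [{
    "keys": ["café", "coffee", "matcha"],
    "content": "Sakura's favorite café is 'Tsuki no Shizuku' near the station.",
    "sticky": 3,
}]

active = activate_lorebooks(entries, ["Want to grab coffee later?"], {})
injected = "\n".join(entries[i]["content"] for i in active)
```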
Time Awareness
Every user message in Suzune is timestamped:
```text
[03/31 22:45] User: Still up?
```
The current datetime is also injected into the system prompt. This means the character knows:
- What time of day it is (and acts accordingly)
- How much time passed between messages
- What day of the week it is
A character who says “Good morning!” at 10 PM breaks immersion instantly. Time awareness prevents this.
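A minimal sketch of deriving that context might look like the following. The helper name, phrasing, and time-of-day thresholds are illustrative choices, not Suzune's exact values:

```python
# Sketch: building a time-awareness section for the system prompt.
from datetime import datetime

def time_context(now, last_message=None):
    # Map the hour to a coarse time-of-day label.
    period = ("night" if now.hour >= 22 or now.hour < 5 else
              "morning" if now.hour < 12 else
              "afternoon" if now.hour < 18 else "evening")
    lines = [f"Current time: {now:%m/%d %H:%M} ({now:%A}, {period})"]
    if last_message is not None:
        gap = now - last_message
        if gap.days >= 1:  # surface long silences so the character can react
            lines.append(f"It has been {gap.days} day(s) since the last message.")
    return "\n".join(lines)

ctx = time_context(datetime(2025, 3, 31, 22, 45),
                   last_message=datetime(2025, 3, 30, 9, 0))
```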
Layer 3: Tone Enforcement (Fighting Recency Bias)
Here’s a problem every RP bot developer hits: the character’s voice drifts over long conversations. After 30+ messages, a sarcastic character starts sounding generic.
Why? Recency bias. LLMs pay more attention to tokens near the end of the context. As the conversation grows, the character persona (at the beginning of the system prompt) gets “pushed out” of the model’s attention.
The Double-Injection Pattern
Our solution: inject the tone rules twice — once at the beginning (in the persona) and once at the very end of the system prompt:
```text
[System prompt start]
  ... character persona with tone rules ...
[50+ messages of conversation]
[System prompt end]
  ... ★ tone rules again ★ ...
```
We literally re-inject a section called “Absolute Speech Rules” as the final system message before the model generates. It’s redundant, but it works — voice consistency improves dramatically.
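In message-array terms, the pattern is simply appending a second system message after the conversation history. A minimal sketch (function name illustrative):

```python
# Sketch: double-injection of speech rules, once in the persona at the
# top and once as the final system message, where recency bias helps.

def build_messages(persona, speech_rules, history):
    messages = [{"role": "system", "content": persona + "\n\n" + speech_rules}]
    messages += history  # possibly 50+ user/assistant turns
    messages.append({"role": "system",
                     "content": "★ Absolute Speech Rules ★\n" + speech_rules})
    return messages

msgs = build_messages("You are Sakura, a sharp-tongued editor.",
                      "Short sentences. Casual slang. *action asterisks*.",
                      [{"role": "user", "content": "Hey, you busy?"}])
```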
Anti-Repetition Detection
Another common problem: the model starts every response with the same phrase. “Sakura sighs and…” over and over.
Suzune tracks the opening patterns of recent responses. If the same pattern appears 3+ times, a warning is injected:
```text
[System] Your last 3 responses started with similar patterns.
Vary your opening — start with dialogue, an action, a thought,
or a scene description.
```
Simple, but effective.
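One simplified way to implement the check is to compare the first few words of the last few responses and emit the warning when they match. This sketch uses exact prefix equality as a stand-in for whatever fuzzier pattern matching production code might do:

```python
# Sketch: detect repeated response openings and return an injectable warning.

def repetition_warning(responses, window=3, prefix_words=3):
    """Return a warning string if the last `window` responses open alike."""
    openings = [" ".join(r.split()[:prefix_words]).lower()
                for r in responses[-window:]]
    if len(openings) == window and len(set(openings)) == 1:
        return ("[System] Your last responses started with similar patterns. "
                "Vary your opening: start with dialogue, an action, "
                "a thought, or a scene description.")
    return None

warn = repetition_warning([
    "*Sakura sighs and looks up from her desk*",
    "*Sakura sighs and taps her pen*",
    "*Sakura sighs and shrugs*",
])
```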
Layer 4: The Relationship Arc
The most powerful technique in our system: the character’s behavior evolves based on relationship scores.
Affection Scoring
Suzune tracks 5 dimensions, each scored 1–10:
| Score | What It Represents |
|---|---|
| Trust | How safe the character feels with you |
| Affection | How much they like you |
| Respect | How much they admire you |
| Excitement | How stimulating they find interactions |
| Devotion | How dedicated they are to the relationship |
The character itself updates these scores during conversation — they’re part of the AI’s toolset, not calculated externally.
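A sketch of what that toolset entry could look like, using the common JSON-schema function-calling convention; the tool name, description, and clamping helper are illustrative, not Suzune's actual definitions:

```python
# Sketch: exposing relationship-score updates as a tool the model can call.

DIMENSIONS = ["trust", "affection", "respect", "excitement", "devotion"]

update_scores_tool = {
    "name": "update_relationship_scores",
    "description": "Adjust the character's feelings after this exchange.",
    "parameters": {
        "type": "object",
        "properties": {d: {"type": "integer", "minimum": 1, "maximum": 10}
                       for d in DIMENSIONS},
    },
}

def apply_update(scores: dict, update: dict) -> dict:
    """Merge a model-provided update, clamping each value into 1-10."""
    return {d: max(1, min(10, update.get(d, scores[d]))) for d in DIMENSIONS}

scores = apply_update({d: 5 for d in DIMENSIONS}, {"trust": 12, "affection": 7})
```

Clamping on the server side matters: the model occasionally proposes out-of-range values, and the scores gate real behavior downstream.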
Stage-Gated Behavior
Based on the average of Trust and Affection (we call this “affinity”), the system prompt fundamentally changes:
| Affinity | Relationship Stage | Prompt Behavior |
|---|---|---|
| < 3 | Strangers | Formal language, no romance, keeps distance |
| 3–5 | Acquaintances | Warming up, occasional vulnerability |
| 5–7 | Close | Can be romantic, sends selfies, shares secrets |
| 7+ | Deep Trust | Full vulnerability, exclusive behaviors unlocked |
This isn’t just cosmetic. At low affinity, the system prompt literally forbids romantic content. The character will reject advances — not because of an AI content filter, but because the character isn’t ready yet.
This creates a genuine relationship arc that users find incredibly engaging. You have to actually build trust before the character opens up.
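The gating logic above can be sketched in a few lines. Thresholds come from the table; the function name, exact boundary handling, and guidance strings are illustrative:

```python
# Sketch: mapping affinity (average of trust and affection) to a
# relationship stage and stage-specific prompt guidance.

def relationship_stage(trust: float, affection: float) -> str:
    affinity = (trust + affection) / 2
    if affinity < 3:
        return "strangers"
    if affinity < 5:
        return "acquaintances"
    if affinity < 7:
        return "close"
    return "deep_trust"

STAGE_GUIDANCE = {
    "strangers": "Use formal language. No romance. Keep emotional distance.",
    "acquaintances": "Warming up; occasional vulnerability is allowed.",
    "close": "Romance is allowed; may share secrets and send selfies.",
    "deep_trust": "Full vulnerability; exclusive behaviors are unlocked.",
}
```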
Layer 5: Model Routing for Uncensored Content
This is the elephant in the room for NSFW RP bots: most high-quality models censor adult content.
Claude writes beautifully but refuses explicit scenes. GPT-4 is similar. DeepSeek V3.2 will write NSFW content, but with rougher prose.
The Draft → Rewrite Pipeline
Suzune’s solution is a two-model pipeline:
```text
User message
     │
     ▼
DeepSeek V3.2 (draft) ── uncensored, handles NSFW content
     │
     ▼
Claude Haiku (rewrite) ── polishes voice, fixes prose quality
     │
     ├── If rewrite is good → use rewrite
     └── If Claude censored it → discard rewrite, use DS3.2 draft
```
The key innovation: censorship detection. If Claude’s rewrite is suspiciously shorter than the draft (< 60% length), or contains refusal patterns, or suddenly switches to a different language — we throw away the rewrite and serve the original uncensored draft. (Full details in Navigating AI Content Filters.)
This gives you the best of both worlds: uncensored content with polished prose quality. When it works (which is most of the time for non-explicit scenes), the quality improvement is noticeable.
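The three heuristics described above can be sketched as follows. The length threshold, refusal phrases, and the crude language-switch check are illustrative stand-ins for Suzune's exact values:

```python
# Sketch: censorship detection on the Claude rewrite. If any heuristic
# fires, fall back to the uncensored DeepSeek draft.

REFUSAL_PATTERNS = ["i can't", "i cannot", "i'm not able to",
                    "i won't be able", "as an ai"]

def looks_censored(draft: str, rewrite: str) -> bool:
    if len(rewrite) < 0.6 * len(draft):               # suspiciously short
        return True
    lowered = rewrite.lower()
    if any(p in lowered for p in REFUSAL_PATTERNS):   # refusal boilerplate
        return True
    # Crude language-switch check: draft mostly ASCII, rewrite mostly not.
    def ascii_ratio(s):
        return sum(c.isascii() for c in s) / max(len(s), 1)
    return ascii_ratio(draft) > 0.9 and ascii_ratio(rewrite) < 0.5

def choose_response(draft: str, rewrite: str) -> str:
    return draft if looks_censored(draft, rewrite) else rewrite
```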
Pre-Emptive Routing
For scenes that are obviously going to be explicit, we skip the rewrite entirely and route straight to DeepSeek. No point wasting an API call on a rewrite that’s going to get censored.
Putting It All Together
Here’s what happens when a user sends a message to Suzune:
- Message arrives → timestamped and saved
- System prompt assembled → 20+ dynamic sections combined
- Lorebooks scanned → keyword-triggered world info injected
- Relationship stage applied → behavior gates based on affection scores
- Model generates → DeepSeek V3.2 produces draft
- Quality rewrite → Claude polishes (with censorship detection)
- Emotion detected → character’s emotional state extracted from response
- Response delivered → with appropriate expression sprite
The result? A character that feels like a person — not an AI playing a character.
Try It Yourself
If this sounds overwhelming, here’s where to start:
- Start with example dialogue. It’s the highest-impact, lowest-effort technique.
- Add time awareness. Just injecting the current datetime makes a surprising difference.
- Implement tone re-injection. Copy your speech rules to the end of the system prompt.
- Layer in lorebooks gradually. Start with 5–10 entries for your character’s world.
And if you don’t want to build from scratch — the best AI chatbot platforms have done a lot of this work for you. Candy AI and JanitorAI in particular offer good customization options that let you apply some of these techniques without writing code.
This article is part of WaifuStack’s series on building AI roleplay bots. Next up: how we design character personalities with YAML.
Building something similar? Share your approach on X — we’d love to see what you’re working on.