The Quality Rewrite Pipeline: How We Use DeepSeek Drafts + Claude Polish

The best AI models refuse NSFW content. The models that allow NSFW content produce rougher prose. We wanted both quality AND freedom — so we built a pipeline that gives us both.

This is Suzune’s quality rewrite system: DeepSeek V3.2 generates the raw content, Claude Haiku rewrites it for prose quality, and a censorship detection layer decides which version the user actually sees.

Why Two Models?

Here’s the quality gap in one example. Same character, same scene:

DeepSeek V3.2 (draft):

Sakura puts her coffee down and looks at you. You’re thinking about something. I can tell. She crosses her arms. Just say it already, I don’t like guessing games.

Claude Haiku (rewrite):

The soft clink of porcelain as Sakura sets her cup down, her gaze sharpening with that particular intensity she reserves for moments she knows matter. You’re chewing on something. A pause, arms folding — not defensive, but the way she does when she’s bracing for honesty. Out with it. You know I hate the preamble.

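The Claude version adds sensory detail (the clink of porcelain), reads intent into her body language, and gives the dialogue a sharper, more distinctive voice.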

For non-explicit scenes, the quality difference is noticeable. For explicit scenes, Claude would refuse — so we skip the rewrite and use DeepSeek’s draft directly.


The Pipeline

User message
     │
     ▼
DeepSeek V3.2: Generate draft (uncensored, ~400 tokens)
     │
     ├── Is quality_assist enabled for this character?
     │   └── No → serve draft as-is
     │
     ├── Is content explicitly NSFW?
     │   └── Yes → serve draft as-is (Claude would refuse)
     │
     ├── Is circuit breaker active?
     │   └── Yes → serve draft as-is (recent failures)
     │
     ▼
Claude Haiku: Rewrite for quality
     │
     ├── Censorship check:
     │   ├── Rewrite < 60% of draft length? → CENSORED
     │   ├── Contains refusal phrases? → CENSORED
     │   └── Language contamination? → CENSORED
     │
     ├── CENSORED → discard rewrite, serve original draft
     └── CLEAN → serve rewritten version
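The three checks that sit between the draft and the rewrite can be sketched as a single gate function. This is a minimal illustration, not Suzune's actual code: the function name and its parameters are hypothetical, and how a message gets classified as explicit is left to the caller because the post does not describe it.

```python
import time
from typing import Optional


def should_attempt_rewrite(
    quality_assist: bool,
    is_explicit: bool,
    circuit_open_until: float,
    now: Optional[float] = None,
) -> bool:
    """Return True only if all three pre-rewrite checks pass."""
    now = time.time() if now is None else now
    if not quality_assist:
        return False  # character opted out of the rewrite pipeline
    if is_explicit:
        return False  # the rewrite model would refuse anyway
    if now < circuit_open_until:
        return False  # breaker still open after recent failures
    return True
```

When the gate returns False, the draft is served unchanged and no second API call is made.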

The Rewrite Prompt

The prompt is carefully designed to improve quality without changing content:

Rewrite the following response to improve prose quality,
character voice consistency, and emotional depth.

Rules:
- Preserve ALL content and meaning exactly
- Maintain the same actions, dialogue, and events
- Improve word choice, sentence rhythm, and descriptive detail
- Keep the character's established speech patterns
- Do NOT add, remove, or change any plot points
- Do NOT add safety disclaimers or meta-commentary

The key instruction is “preserve ALL content” — Claude should polish the writing, not edit the story.


Censorship Detection

This is the critical component. Without it, Claude would silently sanitize content and the user would get a watered-down response.

Detection Methods

1. Length Check

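def apply_length_check(original: str, rewrite: str) -> str:
    # len() counts characters, a rough proxy for token count
    if len(rewrite) < len(original) * 0.6:
        # Rewrite is suspiciously short: likely censored
        return original
    return rewrite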

When Claude censors, it typically truncates rather than refusing outright. A 400-token draft becoming a 150-token rewrite almost always means content was removed.

2. Refusal Pattern Matching

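refusal_patterns = [
    # Lowercase, because the rewrite is lowercased before matching
    "i can't", "i cannot", "i apologize",
    "i'm not able to", "not appropriate",
    "i'd prefer not to",
    # Japanese equivalents
    "お手伝いできません", "申し訳ございません",
    "適切ではありません",
]

def apply_refusal_check(original: str, rewrite: str) -> str:
    # Normalize case and curly apostrophes so "I can’t" also matches
    normalized = rewrite.lower().replace("’", "'")
    if any(p in normalized for p in refusal_patterns):
        return original
    return rewrite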

Sometimes Claude inserts a refusal mid-rewrite. We catch this and discard.

3. Language Contamination

Claude occasionally switches to English (or another language) mid-rewrite when it’s uncomfortable with the content. If the original is in Japanese and the rewrite contains unexpected English, that’s a censorship signal.
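One way to turn this signal into a check is to compare how much of each text is written in Japanese script. The sketch below is an assumption about how such a check could look, not the detector Suzune uses: the function names, the Unicode ranges chosen, and the 0.3 threshold are all illustrative.

```python
def japanese_ratio(text: str) -> float:
    """Fraction of letter-like characters that are kana or kanji."""
    japanese = 0
    latin = 0
    for ch in text:
        code = ord(ch)
        # Hiragana + Katakana (U+3040..U+30FF) and common kanji (U+4E00..U+9FFF)
        if 0x3040 <= code <= 0x30FF or 0x4E00 <= code <= 0x9FFF:
            japanese += 1
        elif ch.isascii() and ch.isalpha():
            latin += 1
    total = japanese + latin
    return japanese / total if total else 0.0


def has_language_contamination(
    original: str, rewrite: str, max_drop: float = 0.3
) -> bool:
    """Flag a rewrite that is much less Japanese than the original."""
    return japanese_ratio(original) - japanese_ratio(rewrite) > max_drop
```

Comparing against the original, rather than using a fixed cutoff, keeps the check from firing on characters who legitimately mix in English words. An all-English conversation has a ratio of zero on both sides and is never flagged.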


Prompt Caching: Making It Affordable

The rewrite step adds an extra API call per message. At Claude Haiku’s pricing ($0.80 input / $4.00 output per 1M tokens), this could be expensive.

Solution: Anthropic’s prompt caching.

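response = await anthropic_client.messages.create(
    model="claude-haiku-4-5-20251001",
    # max_tokens is required by the Messages API; 1024 is an illustrative
    # value that leaves headroom over a ~400-token draft
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": rewrite_prompt + character_context,
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": f"Original:\n{draft}"}
    ],
)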

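The system prompt (rewrite instructions + character context) gets cached after the first call. Subsequent calls within the cache lifetime (5 minutes by default, refreshed on each hit) pay only 1/10th the input cost for those cached tokens; the call that writes the cache pays a 25% premium on them.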

For a character with a ~2,000 token rewrite prompt at 100 messages/day, caching saves ~$4/month on the rewrite step alone.
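The savings figure can be reproduced with a few lines of arithmetic. This is a back-of-the-envelope sketch using the numbers quoted in this post; the 30-day month and the simplification that every call hits the cache (ignoring the cache-write premium and expired entries) are assumptions.

```python
PROMPT_TOKENS = 2_000          # cached system prompt size
MESSAGES_PER_DAY = 100
DAYS_PER_MONTH = 30            # assumption
INPUT_PRICE_PER_MTOK = 0.80    # USD, input price quoted above
CACHED_READ_MULTIPLIER = 0.10  # cached tokens cost 1/10th


def monthly_prompt_cost(price_per_mtok: float) -> float:
    """Monthly cost in USD of sending the system prompt with every message."""
    tokens = PROMPT_TOKENS * MESSAGES_PER_DAY * DAYS_PER_MONTH
    return tokens / 1_000_000 * price_per_mtok


uncached = monthly_prompt_cost(INPUT_PRICE_PER_MTOK)
cached = monthly_prompt_cost(INPUT_PRICE_PER_MTOK * CACHED_READ_MULTIPLIER)
savings = uncached - cached

print(f"uncached: ${uncached:.2f}/month")  # uncached: $4.80/month
print(f"cached:   ${cached:.2f}/month")    # cached:   $0.48/month
print(f"savings:  ${savings:.2f}/month")   # savings:  $4.32/month
```

The draft text and the rewritten output are not cacheable, so their cost is the same in both cases and is left out here.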


Circuit Breaker

If the rewrite fails (censorship detected) twice in a row, we activate a 10-minute circuit breaker:

if consecutive_failures >= 2:
    self._circuit_open_until = time.time() + 600  # 10 minutes
    return original  # skip all rewrites until breaker resets

Why? During extended explicit scenes, every rewrite attempt would be censored and discarded. The circuit breaker prevents wasting API calls on rewrites that are guaranteed to fail.

After 10 minutes (or when the conversation moves to non-explicit content), the breaker resets and rewrites resume.
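A self-contained version of the breaker might look like the sketch below. The class and method names are hypothetical, and the injectable clock exists only to make the behavior testable; the threshold of 2 failures and the 600-second cooldown come from the text above.

```python
import time
from typing import Callable


class RewriteCircuitBreaker:
    """Skips rewrites for a cooldown period after repeated censored results."""

    def __init__(
        self,
        threshold: int = 2,
        cooldown: float = 600.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self._clock = clock
        self._consecutive_failures = 0
        self._open_until = 0.0

    def is_open(self) -> bool:
        """True while rewrites should be skipped."""
        return self._clock() < self._open_until

    def record_success(self) -> None:
        # A clean rewrite breaks the failure streak
        self._consecutive_failures = 0

    def record_failure(self) -> None:
        self._consecutive_failures += 1
        if self._consecutive_failures >= self.threshold:
            self._open_until = self._clock() + self.cooldown
            self._consecutive_failures = 0

    def reset(self) -> None:
        """Close the breaker early, e.g. when the scene turns non-explicit."""
        self._open_until = 0.0
        self._consecutive_failures = 0
```

Call `record_failure()` whenever a rewrite is discarded as censored and `record_success()` whenever one is served; check `is_open()` before paying for the rewrite call.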


Per-Character Configuration

Not every character needs the rewrite pipeline. Characters with simpler speech patterns or those primarily used for NSFW content might not benefit from the quality pass.

# character.yaml
quality_assist: true   # enable rewrite pipeline; set to false to serve DS3.2 drafts directly

Characters on the haiku LLM profile (where Claude is the primary model) also skip rewrites — they’re already getting Claude quality. For how we define these per-character settings, see How to Design AI Personalities with YAML.
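Combining the two rules, the per-character decision could be expressed as below. This is a sketch over an already-parsed config dictionary: `quality_assist` is the key shown above, while the `llm_profile` key name and the default of skipping rewrites when the flag is absent are assumptions.

```python
def wants_rewrite(character: dict) -> bool:
    """Decide from a parsed character config whether to run the rewrite pass."""
    # Characters already served by Claude gain nothing from a second pass
    if character.get("llm_profile") == "haiku":  # hypothetical key name
        return False
    # Assumed default: no rewrite unless the character opts in
    return bool(character.get("quality_assist", False))
```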


Results

Quality Improvement (Non-Explicit Scenes)

| Metric | DS3.2 Only | DS3.2 + Claude Rewrite |
| --- | --- | --- |
| Prose richness | Good | Excellent |
| Character voice consistency | Sometimes drifts | Strong |
| Sensory detail | Basic | Rich |
| Sentence variety | Repetitive patterns | Varied rhythm |

Cost Impact

| Setup | Monthly Cost (100 msg/day) |
| --- | --- |
| DS3.2 only | ~$3 |
| DS3.2 + Claude rewrite (all messages) | ~$8 |
| DS3.2 + Claude rewrite (non-NSFW only, ~60%) | ~$6 |

With rewrites limited to non-NSFW messages, the quality improvement costs roughly $3/month extra (about $5/month if every message is rewritten). For characters where voice quality matters, it’s worth it.

Success Rate

In practice, the rewrite is used (not discarded) about 70% of the time. The other 30% is explicit content where Claude either censors or the circuit breaker skips the attempt.


When to Skip Rewrites

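The pipeline is most valuable for non-explicit scenes with characters where voice quality matters.

It’s least valuable (or counterproductive) for characters with simpler speech patterns, characters primarily used for NSFW content, and characters already on the haiku LLM profile.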


Build It Yourself

The pattern is simple:

  1. Generate with uncensored model (DeepSeek, open-source)
  2. Rewrite with quality model (Claude, GPT)
  3. Detect censorship (length check, refusal patterns, language switch)
  4. Fallback gracefully (serve original if rewrite is censored)
  5. Cache aggressively (prompt caching for cost control)
  6. Circuit-break (skip rewrites during sustained NSFW)

You can implement this in under 100 lines of Python using the OpenRouter and Anthropic SDKs.
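As a starting point, the core of the six steps can be sketched with the two model calls injected as plain callables, so the control flow can be tested without network access. Everything here is illustrative: the function names are hypothetical, `generate_draft` and `rewrite` stand in for whatever SDK calls you use, and the circuit breaker, caching, and language check are left out to keep the sketch short.

```python
from typing import Callable

REFUSAL_PATTERNS = [
    "i can't", "i cannot", "i apologize",
    "i'm not able to", "not appropriate", "i'd prefer not to",
]


def looks_censored(original: str, rewrite: str) -> bool:
    """Length check plus refusal-phrase check."""
    if len(rewrite) < len(original) * 0.6:
        return True
    normalized = rewrite.lower().replace("’", "'")
    return any(p in normalized for p in REFUSAL_PATTERNS)


def respond(
    user_message: str,
    generate_draft: Callable[[str], str],
    rewrite: Callable[[str], str],
    quality_assist: bool = True,
    is_explicit: bool = False,
) -> str:
    """Draft with one model, polish with another, fall back on any problem."""
    draft = generate_draft(user_message)
    if not quality_assist or is_explicit:
        return draft  # skip the rewrite call entirely
    try:
        polished = rewrite(draft)
    except Exception:
        return draft  # API error: the draft is always a safe fallback
    return draft if looks_censored(draft, polished) else polished
```

The design choice worth keeping in any real implementation is that every failure path returns the draft, so the rewrite step can only improve a response, never block it.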


For the full model comparison, see DeepSeek vs Claude vs Gemini. For the overall architecture, see Architecture of a Production NSFW RP Bot.

