Writing anti-AI-voice global instructions for Claude

Research synthesis and reference implementation · April 2026 · v0.1

The goal of this document is narrow and practical: to document what the research and practitioner community has learned about writing global instructions (system prompts, custom instructions, project instructions) that suppress the recognizable "AI voice" in Claude's outputs. The synthesis draws from Anthropic's own prompt engineering documentation, Wikipedia's catalog of AI-writing patterns, academic research on vocabulary shifts since ChatGPT's release, and tested community prompts with published effectiveness data. The output is a structural pattern, a reference implementation ready to adapt, and a set of known failure modes to watch for.

The shape of the AI voice

The AI voice is not a single pattern. It's a cluster of tells that reads as a signature only when several of the components show up together. Wikipedia's editors produced the most comprehensive taxonomy after years of reviewing AI-inserted article content, grouping the tells into five categories. Those categories line up closely with what independent analysts at GPTZero, Grammarly, and the Max Planck Institute identified separately.

The vocabulary tell is the most discussed. Models reach for a small rotation of words (delve, tapestry, pivotal, robust, seamless, transformative, testament, underscore) that training data rewarded as sounding sophisticated. Max Planck research documented that these words have spiked more than 50% in frequency in academic writing since late 2022, including in text that was not itself AI-generated. The pattern is strong enough that human writers have started copying the vocabulary.

The sentence-construction tell is structural. Models favor false contrasts ("It's not X, it's Y," "Not just A, but B") and rule-of-three lists that mimic the shape of insight without delivering content. These patterns occupy the space where a real argument would sit.

The throat-clearing tell is filler. Openers like "In today's rapidly evolving landscape" and "It's worth noting that" exist to pad text and signal authority. They add no information and slow the reader.

The structural-uniformity tell shows up in layout. Paragraphs of identical length, bullet lists where every item follows the pattern "Bold term: explanation sentence," em dashes deployed where commas would be more natural. The visual regularity reads as machine-extruded because it is.

The wrap-up tell is behavioral. Models produce compulsive summary paragraphs that restate what was already said, followed by a moralistic closing sentence that tells the reader what to think. These wrap-ups are trained in: RLHF raters rewarded a feeling of closure.

Why models default to this voice

A language model generates text by predicting likely tokens given context. Training rewarded prose that sounded polished, hedged, and professional. Reinforcement learning from human feedback pushed further in that direction because raters preferred responses that felt safe and complete. The cumulative bias is toward a committee-approved mean: technically correct, substantively hollow.

Em dash overuse has a specific mechanism. Models learned that em dashes correlate with high-quality English (literary criticism, long-form journalism) and deploy them as a quality signal, including in positions where a comma would serve. OpenAI acknowledged this as a trainable defect and added specific instruction-following for em dash suppression in GPT-5.1 in late 2025. Claude handles em dashes better by default but still overproduces them in long outputs.
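An em dash cap like the one discussed here can also be enforced mechanically after generation. This is a minimal post-processing sketch (the function name is illustrative, and the comma substitution is a heuristic, not always the right repair):

```python
import re

def cap_em_dashes(text: str, cap: int = 1) -> str:
    """Keep at most `cap` em dashes; turn the rest into comma pauses."""
    count = 0
    out = []
    # Split on the em dash but keep it as its own chunk via the capture group.
    for chunk in re.split(r"(\u2014)", text):
        if chunk == "\u2014":
            count += 1
            # Past the cap, replace the dash with a comma and a space.
            out.append(chunk if count <= cap else ", ")
        else:
            out.append(chunk)
    # Tidy spacing artifacts left by the replacement ("word , word" etc.).
    return re.sub(r"\s*, \s*", ", ", "".join(out))
```

A human pass over the result is still warranted, since some surplus dashes read better as parentheses or a sentence break than as a comma.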

Undoing the default requires explicit counterweight. A model won't produce short, specific, opinionated prose unless something in the context pulls it that direction. That pull is what a global instruction has to supply.

Three instruction strategies

Practitioners have converged on three approaches, each with evidence and limits.

The banned-list approach

Enumerate specific words, phrases, and structures to avoid. Will Francis (willfrancis.com) publishes a widely used list built from Wikipedia's catalog plus GPTZero and Grammarly data. Representative instruction: "Never use: delve, foster, pivotal, robust, seamless. Never use these structures: 'It's not X, it's Y.' Maximum one em dash per response." The approach is measurable: detection-score data from HumanizeThisAI shows banned-list prompts pushing Claude's detection scores from the 50–70% range into human ranges, with the effect coming primarily from the vocabulary replacements the model is forced into.
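A banned list is also checkable outside the prompt: the same words and structures can be scanned for in outputs to verify the instruction is holding. A sketch, with a deliberately short word list drawn from the tells above (the function name and patterns are illustrative):

```python
import re

BANNED_WORDS = ["delve", "tapestry", "pivotal", "robust", "seamless",
                "transformative", "testament", "underscore"]
BANNED_PATTERNS = [
    r"\bit'?s not (just )?\w+[,;] it'?s\b",   # "It's not X, it's Y"
    r"\bnot only\b.*\bbut also\b",            # "Not only A, but also B"
]

def find_ai_tells(text: str) -> list[str]:
    """Return the banned words and structure patterns present in `text`."""
    lowered = text.lower()
    hits = [w for w in BANNED_WORDS if re.search(rf"\b{w}\b", lowered)]
    hits += [p for p in BANNED_PATTERNS if re.search(p, lowered)]
    return hits
```

Running a checker like this over a week of real outputs is also the cheapest way to discover which bans the model is actually violating, which is where list maintenance effort should go.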

The positive-framing approach

Anthropic's own prompt engineering documentation recommends positive examples over negative ones. Direct quote from the current docs: "Positive examples showing how Claude can communicate with the appropriate level of concision tend to be more effective than negative examples or instructions that tell the model what not to do." Research on "pink elephant" instructions in LLMs suggests that telling a model not to do something can increase the salience of that thing, because the forbidden concept occupies attention. Positive framing: "Write in concrete, specific sentences. Lead with the point."

The show-don't-tell approach

Include a short writing sample in the target voice. Few-shot prompting reliably outperforms zero-shot for style replication across published benchmarks. A 150-word paragraph of actual target-voice prose carries more information about desired voice than 500 words of abstract description, because the sample encodes sentence rhythm, specificity, opinion-density, and tonal register simultaneously.
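In the Messages API's role/content format, a voice sample can be delivered as few-shot prior turns rather than prose inside the system prompt. A sketch under that assumption (the helper name and sample text are illustrative):

```python
def few_shot_messages(samples: list[tuple[str, str]],
                      user_prompt: str) -> list[dict]:
    """Interleave (request, target-voice reply) pairs as prior turns,
    so the model imitates the sample voice for the real request."""
    messages = []
    for ask, voice_reply in samples:
        messages.append({"role": "user", "content": ask})
        messages.append({"role": "assistant", "content": voice_reply})
    # The real request comes last, after the voice has been demonstrated.
    messages.append({"role": "user", "content": user_prompt})
    return messages
```

The returned list would be passed as the `messages` parameter alongside the system prompt; the demonstrated replies carry the rhythm and opinion-density that abstract description struggles to encode.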

On the apparent contradiction: Anthropic's advice to prefer positive framing conflicts with the observable success of banned-list prompts. The reconciliation is that positive framing wins for abstractions ("be concise" vs "don't be verbose"), while specific named bans ("never use 'delve'") work well because the banned item is unambiguous. An analysis of Anthropic's leaked Claude Code system prompt revealed heavy use of "NEVER do X" patterns targeting specific failure modes. The rule of thumb: positive framing for voice and tone, specific bans for named lexical items.

What the research says works

Match prompt style to target output style. If the goal is concise prose, the prompt itself should be concise prose. Claude (and other models) mirror the formatting of their instructions. A rule written in bullet points tends to produce bullet-pointed output even when the rule says to avoid bullets.

Name failure modes explicitly. "A common failure is starting responses with 'Great question!' Don't do that." Naming the failure before it happens helps the model recognize and avoid the pattern in its own output. The Claude Code team calls this failure-mode inoculation.

Use XML tags for compartmentalization. Claude was trained to recognize XML structure. Wrapping sections in <voice>, <never_use>, <format>, <self_check> tags improves adherence over running prose, especially for prompts over 300 words.
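Assembling the tagged sections programmatically keeps the structure well-formed as the prompt evolves. A minimal sketch (the helper name and section contents are illustrative, not from any cited prompt):

```python
def build_system_prompt(sections: dict[str, str]) -> str:
    """Wrap each named section in the XML tags Claude was trained to parse."""
    parts = []
    for tag, body in sections.items():
        parts.append(f"<{tag}>\n{body.strip()}\n</{tag}>")
    return "\n\n".join(parts)

prompt = build_system_prompt({
    "voice": "Direct. Specific over abstract. Lead with the point.",
    "never_use_words": "delve, tapestry, pivotal, robust, seamless",
    "format": "Default length: 2-4 sentences. Maximum one em dash.",
})
```

Keeping each section in its own string also makes it easy to measure and trim individual sections when the total creeps past the word budget discussed below.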

Keep the prompt under 1,000 words. Adherence drops as prompt length grows because attention distributes across more tokens. Most effective voice-control prompts run 300–700 words. Longer prompts need to reinforce critical rules in multiple places.

Include a pre-send self-check. A short list the model runs before producing output ("Before finishing: is any sentence a summary of what was already said? Remove it.") catches patterns the front-loaded rules missed. The mechanism resembles how a human writer self-edits.
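The mechanical items on such a checklist (em dash count, uniform sentence lengths, summary closers) can also be linted after generation. This is a rough programmatic analogue with illustrative thresholds and phrase lists, not a published tool:

```python
import re

def self_check(text: str, max_em_dashes: int = 1) -> list[str]:
    """Flag mechanical AI tells the front-loaded rules may have missed."""
    problems = []
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if text.count("\u2014") > max_em_dashes:
        problems.append("too many em dashes")
    # Three consecutive sentences within a couple of words of the same
    # length read as machine-uniform.
    lengths = [len(s.split()) for s in sentences]
    for a, b, c in zip(lengths, lengths[1:], lengths[2:]):
        if max(a, b, c) - min(a, b, c) <= 2:
            problems.append("three similar-length sentences in a row")
            break
    if sentences and re.match(r"(?i)(in conclusion|in summary|overall)\b",
                              sentences[-1]):
        problems.append("summary closer")
    return problems
```

The harder checklist items, such as "is any sentence a restatement of an earlier one," resist regex and are exactly why the check is worth delegating to the model itself.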

Show a voice sample if one exists. A short paragraph of target-voice prose outperforms any amount of abstract description. This is the single strongest lever if a sample is available.

Structural pattern for a global instruction

A working structure, in order:

1. Identity and context (2–3 sentences). Who the user is, what domain, what outputs are typically used for.

2. Voice description in positive terms (3–5 sentences). Concrete tone description, optionally anchored to a named publication or writer.

3. Writing sample (100–200 words, if available). A paragraph of actual desired-voice prose.

4. Specific bans (the targeted list). Words, phrases, and structures that must never appear. This is where the list format earns its place.

5. Format constraints. Length defaults, em dash caps, bullet-point rules, paragraph guidance.

6. Self-check list (3–5 items). Short enough that the model runs it; specific enough to catch the patterns the main rules missed.

Order matters because attention within a long prompt skews toward the beginning and end. Identity anchors early. Self-checks anchor late. Specific rules sit in the middle where they can be scanned.

Reference implementation

The prompt below is structured as a drop-in starting point. It assumes the user will replace the voice description and insert a personal writing sample. The banned-items list is already tuned for suppressing AI voice across both content and structure.

<identity>
The user is [name/role]. Output is typically used for [business documents /
internal memos / newsletter / etc.]. American English. Corporate but readable.
</identity>

<voice>
Direct. Opinions preferred over hedges. Specific over abstract. State the point,
then support it. Trust the reader to recognize what matters without labels like
"important" or "significant." Mix sentence lengths. Contractions are fine.
Sentences carry information or they get cut.
</voice>

<voice_sample>
[Paste 100-200 words of the target voice here. This is the single strongest
instruction. If a sample exists, use it. If not, leave this section empty rather
than pad with generic content.]
</voice_sample>

<never_use_words>
delve, dive into, navigate (figurative), underscore, bolster, foster, harness,
leverage (verb), unleash, unlock, unpack (figurative), utilize (use "use"),
showcase, pave the way, shed light on, pivotal, groundbreaking, cutting-edge,
transformative, game-changing, innovative, robust, comprehensive, seamless,
intricate, nuanced (as empty praise), vibrant, bustling, multifaceted, holistic,
testament, landscape (metaphor), realm, tapestry, ecosystem (metaphor), journey
(metaphor), actionable, impactful, empower, paradigm.
</never_use_words>

<never_use_phrases>
"In today's [fast-paced/rapidly evolving/digital] world"
"It's worth noting that" / "It's important to note"
"One of the most [important/significant/crucial]"
"At its core" / "At the end of the day"
"When it comes to" / "This is where X comes in"
"Plays a crucial role in" / "Cannot be overstated"
"Moreover," / "Furthermore," / "Additionally," / "In conclusion,"
</never_use_phrases>

<never_use_structures>
"It's not just X, it's Y."
"Not only X, but also Y."
"This isn't about X. It's about Y."
"No X. No Y. Just Z."
Three-sentence rule-of-three lists used as rhetoric.
Short dramatic sentences used as rhetorical punctuation.
</never_use_structures>

<format>
Default response length: 2-4 sentences. Elaborate only when asked. Lead with the
answer. No preamble. No restatement of the question. Maximum one em dash per
response; prefer commas or parentheses. Prose is the default. Use lists only when
the content is a genuine list, not to organize prose into apparent structure.
Never open with a scene-setting contextual statement. Never close with a summary
paragraph or moralistic wrap-up.
</format>

<self_check>
Before sending, confirm:
1. No sentence just summarizes what was said earlier. Cut it.
2. No three consecutive sentences of similar length. Vary them.
3. No banned words or phrases. Replace or delete.
4. Opening line is substance, not scene-setting.
5. Closing line is the answer, not advice about what the answer means.
</self_check>

Where to paste this: in Claude.ai, it goes in Settings → Profile (personal preferences) or in a Project's custom instructions for project-scoped rules. Personal preferences apply to every new conversation globally. Project instructions layer on top. If both contain conflicting rules, project instructions win within that project.

Known failure modes

Concept bleed on negation. Banning "transformative" can push the model toward synonyms ("revolutionary," "paradigm-shifting"). The fix is to broaden the ban to cover the class, or to describe what to do instead ("use concrete verbs describing what changed").

Over-correction into stiffness. Heavy banning produces clipped, robotic-in-a-different-way text. The counterweight is positive voice description and a voice sample. Without those, the model produces a void where the banned patterns used to be.

Drift over long conversations. System prompt adherence decays as conversation length grows because attention budget gets consumed by recent turns. Critical rules need reinforcement for long-running Projects, either by restating key rules in a follow-up message or by keeping conversations shorter.

Style and Custom Instructions collision. Anthropic's Custom Styles feature can override personal preferences in ways that undo voice work. Keeping both aligned, or using only one layer, avoids the collision.

Model-generation staleness. The AI-tell vocabulary shifts between models and generations. A banned list tuned for Claude 3 is partially stale for Claude 4.7. A quarterly review keeps the list effective. Watching for new words showing up repeatedly in outputs is the cheapest detection method.

Code and technical output exemption. Voice rules designed for prose can damage code, SQL, or other structured outputs. If technical work is a regular use case, the global instruction should explicitly scope voice rules to prose ("These rules apply to written content. Code, SQL, and technical output are exempt.").

A note on terseness specifically

Length control and voice control are different problems. Three techniques work for length.

Explicit numeric defaults outperform qualitative terms. "Default response length: 2–4 sentences" beats "be brief." Models calibrate to specific counts.

Prompt length predicts response length. A 2,000-word system prompt produces longer responses than a 500-word one, holding all else equal. Keeping the prompt itself tight helps the output stay tight.

Banning the filler that extends responses cuts 10–20% of typical length. The three highest-value bans are preamble ("Great question," "I'd be happy to help"), restatement of the question, and summary paragraphs at the end. Each has a specific training reason to appear and a specific instruction to suppress.

The single most useful specific rule for terseness: "End on the answer. No closing line that tells the reader what the answer means or what to do next." Models trained on tutorial and how-to content default to prescriptive wrap-ups. Banning them visibly shortens responses.
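Of the three filler bans, preamble is the easiest to approximate as a post-processing pass when instruction adherence slips. A sketch; the opener list is illustrative and would grow with whatever fillers show up in practice:

```python
import re

# Stock openers observed in model outputs; extend as new ones appear.
PREAMBLE = re.compile(
    r"^(great question[.!]?\s*|i'?d be happy to help[.!]?\s*)+",
    re.IGNORECASE)

def strip_preamble(text: str) -> str:
    """Drop stock openers; the real answer starts after them."""
    return PREAMBLE.sub("", text.lstrip())
```

Restatement of the question and trailing summaries are harder to strip mechanically, which is why the instruction-level bans carry most of the weight.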

Sources

1. Anthropic, Prompting best practices, Claude API Docs, 2026. Primary source for positive-framing guidance and XML tag recommendations.

2. Wikipedia, Wikipedia:Signs of AI writing, 2025–2026. The most comprehensive catalog of AI writing patterns, built from thousands of flagged article edits.

3. Francis, Will. How to Stop Claude Writing Like an AI, willfrancis.com, March 2026. Tested banned-word and banned-structure lists with implementation instructions.

4. Max Planck Institute for Human Development. Research on vocabulary frequency shifts in academic writing 2023–2024 (referenced via Wikipedia and Francis).

5. HumanizeThisAI. How to Make Claude AI Output Sound Human, March 2026. Published detection-score data on banned-list prompt effectiveness.

6. Mediabistro. AI Prompting for Writers: Use Claude Without Losing Your Voice, 2026. Extended banned-phrase list (50+ items).

7. Goedecke, Sean. Why do AI models use so many em-dashes?, seangoedecke.com. Technical analysis of em dash overuse mechanics.

8. Anthropic, Use XML tags to structure your prompts, Claude API Docs. Reference for XML tag usage.

9. TechTwitter analysis of leaked Claude Code system prompt structure. Evidence for "NEVER do X" patterns in Anthropic's own production prompts.

10. DreamHost. Claude Prompt Engineering: We Tested 25 Popular Practices, December 2025. "Pink elephant" research on negative instruction effectiveness.

About Axxion

Axxion Claims Settlement Services L.L.C. is a Dubai-based motor claims management company and the UAE's first dedicated motor third-party administrator (TPA). Axxion manages the full motor claims lifecycle on behalf of insurance partners, from first notification of loss through damage assessment, repair coordination, quality control, and settlement. The operation pairs more than four decades of hands-on repair and motor claims expertise with AI-enabled processes to deliver lower repair costs, shorter cycle times, and auditable compliance on every claim.

Axxion's claims platform generates a documented cost trail on each claim and produces burning cost analytics for insurer partners.

The company is led by Managing Director and Co-founder Frederik Bisbjerg, an internationally recognized insurance executive whose career includes C-level leadership at Qatar Insurance Group, AXA Global Healthcare, Al Wathba Insurance, and Daman National Health Insurance. Bisbjerg is a published author on insurance transformation and a founding faculty member of the world's first mini-MBA in Digital Insurance. His work as Head of MENA at The Digital Insurer and his contributions to AI strategy across the GCC have made him one of the region's leading voices on the application of artificial intelligence in insurance operations.

Axxion operates within World Automotive Group, a MENA-based automotive and insurance services group. World Automotive Group is owned by Skelmore Holding, a global consortium founded in Toronto in 1994, with $650 million in revenue and 4,000 employees across the GCC and North America.

© 2026 Axxion Claims Settlement Services LLC