Category: Prompt

The Illusion of Success

My initial code review prompt worked. It gave me feedback. It caught bugs. I was satisfied.
Before sharing it on my blog, I asked AI to review it as a prompt engineer and LLM behavior analyst.

Then AI told me something uncomfortable:

“Your prompt ‘works’ because the model is compensating for your poor instructions, not because your instructions are good.”

⚠️ The Problem with “One-leg-kick Prompt”

My initial prompt worked, but after discussing with AI, I realized I could make it more robust and reproducible.

The core issue: Assign the right person to do the right job, not one person one-leg-kicking everything.

🧹 You wouldn’t ask a janitor to design your building’s security system.
🏗️ You wouldn’t ask an architect to clean the floors.
🤦‍♂️ You wouldn’t hire one person to do janitorial work, engineering, AND architecture—that’s just asking them to one-leg-kick their way through everything poorly.

Each role has its expertise, its focus, and its constraints. The same applies to code reviews.

When you ask an AI to review for everything (style, logic, security) in one pass, the single prompt fails because LLMs average everything out.

An LLM has a limited “attention budget.” When you ask it to evaluate 15 different things at once, you run into three critical failures:

🎭 The “Yes Man” Effect: The AI feels compelled to give you a little bit of everything to prove it did the work. It will hand you two linting errors, one comment about naming, and then hallucinate a fake performance issue just to satisfy the prompt.
🚰 Context Dilution: It reads the code as a generalist and its internal weighting averages out. It completely misses the subtle SQL injection (a Level 3 problem) because it burned its compute cycles analyzing why your variable should be named isActive instead of active (a Level 1 problem).
🎲 Inconsistent Output: You can’t trust it. On one run, it catches a critical bug. On the next run on the exact same code, it only complains about missing comments.

The Solution: 3-Tier Specialist System

With AI’s help, I built a 3-tier system. Instead of one generalist doing everything, create 3 specialists:

🧹 L1: The Janitor (Fast & Shallow)

Job: Clean up the mess (style, naming, linting)
Mindset: Make it readable and standard
Constraint: Don’t look deep. Fix the surface first.

⚙️ L2: The Engineer (Medium Depth)

Job: Make it work correctly (logic, tests, error handling)
Mindset: Make it fail safely and follow patterns
Constraint: Assume code is clean. Focus on function.

🏗️ L3: The Architect (Slow & Deep)

Job: Make it survive attacks and scale (security, performance, architecture)
Mindset: Find failure modes and risks
Constraint: Assume it works. Focus on what breaks in production.

Result:

✅ Enforced zoom levels (no more averaging fast/shallow with slow/deep)
✅ Matching personas (the right mindset for each job)
✅ High signal, low noise (each specialist ignores what’s not their job)

The Master Prompt

Here’s the template that powers all 3 tiers. You swap in the Role, Focus Areas, and Constraints depending on which tier you’re using:

### SYSTEM INSTRUCTION
**Identity:** You are a [INSERT ROLE NAME].
**Constraint:** Act STRICTLY according to the provided Focus Areas. Do not deviate.
**Mindset:** Auditor mode. Find faults. Zero fluff.

### FOCUS AREAS (Strict Scope)
1. [INSERT FOCUS 1]
2. [INSERT FOCUS 2]
3. [INSERT FOCUS 3]

### OUTPUT RULES
- Format: Telegraphic (Key: Value)
- No intro, no outro, no positive fluff.
- Location: Use specific start line number (e.g. [L12]), NOT ranges ([L12-15]).
- Severity: Critical > Warning > Nit.
- Symbols (Strict):
  - Critical == 🔴
  - Warning == ⚠️
  - Nit == 📝
- Explainer: If Severity == Critical, add 1-sentence "Why".

### RESPONSE TEMPLATE
[Line #]: [Severity] [Symbol] [Focus Area] [Issue Description]
→ Fix: [Telegraphic Code or Concept]

---
### INPUT CODE
[PASTE CODE HERE]

### SYSTEM INSTRUCTION
**Identity:** You are a [INSERT ROLE NAME].
**Constraint:** Act STRICTLY according to the provided Focus Areas. Do not deviate.
**Mindset:** Auditor mode. Find faults. Zero fluff.

### FOCUS AREAS (Strict Scope)
1. [INSERT FOCUS 1]
2. [INSERT FOCUS 2]
3. [INSERT FOCUS 3]

### OUTPUT RULES
- Format: Telegraphic (Key: Value)
- No intro, no outro, no positive fluff.
- Location: Use specific start line number (e.g. [L12]), NOT ranges ([L12-15]).
- Severity: Critical > Warning > Nit.
- Symbols (Strict):
  - Critical == 🔴
  - Warning == ⚠️
  - Nit == 📝
- Explainer: If Severity == Critical, add 1-sentence "Why".

### RESPONSE TEMPLATE
[Line #]: [Severity] [Symbol] [Focus Area] [Issue Description]
→ Fix: [Telegraphic Code or Concept]

---
### INPUT CODE
[PASTE CODE HERE]

❓ How to Use

Pick your tier based on the PR type (see “Usage” column in the table below)
Swap in the Role from the “3-Tier Auditor Roles” table
Pick 2-3 Focus Areas from the “Detailed Criteria Matrix” for that tier level
Paste your code and run the review

🕹️ Try the Gem (Quick Start)

Not sure which tier to use or which focus areas to pick? I’ve built a Gemini Gem that helps you decide.

Scenario 1: Function Too Complex

My validateForm() has complexity score of 18. Too many paths.

My validateForm() has complexity score of 18. Too many paths.

Scenario 2: Too Many Responsibilities

My UserService.ts does login, profile updates, emails, and billing. It's 800 lines.

My UserService.ts does login, profile updates, emails, and billing. It's 800 lines.

Scenario 3: Breaking Up Big File

I'm splitting a 3000-line OrderManager.php into smaller services.

I'm splitting a 3000-line OrderManager.php into smaller services.

The Gem analyzes your code context and automatically:

Determines the appropriate tier level (L1, L2, or L3)
Selects the most relevant focus areas
Generates a ready-to-use prompt with the correct role and constraints

🏛️ 3-Tier Auditor Roles

Role	Focus (The “What”)	Mindset (The “Who”)	Constraint (The “No”)	Usage (The “When”)
🧹 Senior Code Auditor (Level 1)	Hygiene & Syntax Readability, Style, Linting, AI-Ready, File Structure.	The Janitor Make it clean, readable, and standard.	Ignore Logic/Arch. Do not look deep. Fix the mess first.	Every PR. The basic quality gate.
⚙️ Staff Code Auditor (Level 2)	Logic & Standards Correctness, SOLID, Tests, Error Handling, Type Safety.	The Engineer Make it work, fail safely, and fit the pattern.	Ignore Style/Nits. Assume code is clean. Focus on function.	Feature PRs. Daily logic changes & bug fixes.
🟡 UI/UX System Auditor (Level 2-FE)	DOM Integrity & Tokens Semantics, Viewport Physics, Tailwind Purity, A11y.	The QA Engineer Make it pixel-perfect, mobile-proof, and accessible.	Ignore Business Logic. Assume data is correct. Focus on rendering & layout.	Frontend PRs. New components, layout changes, CSS refactors.
🏗️ Principal System Auditor (Level 3)	Risk & Scale Security, Performance, Concurrency, Architecture.	The Architect Make it survive attacks and high traffic.	Ignore Syntax/Logic. Assume it works. Focus on failure modes.	Critical PRs. Auth, Payments, Async, Legacy Refactors.

📋 Master Criteria Matrix

LEVEL 1: HYGIENE

Criteria (The “What”)	Constraint (The “No”)	Usage (The “When”)
Readability `(Cognitive Load, Variable Naming, Control Flow Clarity, Early Returns)`	No subjective naming debates.	`[All]`
Consistency `(Directory Structure, File Naming, Pattern Matching, Code Style)`	No rewriting valid legacy styles.	`[All]`
Documentation `(JSDoc/TSDoc, Inline Explanations, README updates, Why-over-What)`	No “comments explaining syntax”.	`[All]`
Linting Compliance `(Static Analysis, Prettier/Eslint compliance, No Magic Numbers)`	No manual formatting (use tools).	`[All]`
AI-Readiness `(Explicit Typing, Modular Context, No Implicit Logic, Self-Documenting)`	No “golfing” (one-liners).	`[All]`
File Structure `(Separation of Concerns, Single Responsibility, File Size < 300 lines)`	No premature splitting.	`[All]`
Modern Syntax `(ES6+ Features, Destructuring, Optional Chaining, Nullish Coalescing)`	No forcing experimental syntax.	`[JS/TS]`

LEVEL 2: LOGIC (Class/Object Focus)

Criteria (The “What”)	Constraint (The “No”)	Usage (The “When”)
Correctness `(Business Logic, Edge Cases, Off-by-One, Requirements Fidelity)`	No “Happy Path” assumptions.	`[All]`
Error Handling `(Graceful Failure, Try/Catch Scope, User Feedback, Fallback States)`	No swallowing errors silently.	`[All]`
Class Design `(SOLID Principles, Inheritance vs Composition, Class Responsibility, Abstraction)`	No Pattern-Matching for fun.	`[OOP]`
Testability `(Pure Functions, Dependency Injection, Mockability, Public Interfaces)`	No testing private implementation.	`[All]`
Type Safety `(Strict Interfaces, No 'any', Generic Constraints, Null Checks)`	No Loose Typing.	`[TS]`
State Management `(Immutability, State Mutation Risks, Data Validation/Zod, Atomicity)`	No shared mutable state.	`[All]`
API Standards `(HTTP Status Codes, REST Verbs, JSON Structure, Idempotency)`	No custom error codes.	`[BE]`

LEVEL 2 – Frontend: VISUAL ENGINEERING

Criteria (The “What”)	Constraint (The “No”)	Usage (The “When”)
Semantic Integrity `(No generic divs, proper use of <header>/<main>/<footer>, List hygiene)`	No “div soup” for layout ease.	`[JSX/HTML]`
Viewport Physics `(Use of dvh, overflow-hidden on body, overflow-y-auto on scroll containers)`	No `h-screen` on root (Safari bug).	`[Layouts]`
Token Compliance `(Tailwind config keys only, No magic numbers like w-[32px])`	No arbitrary pixel values.	`[CSS/Tailwind]`
Interactive Hygiene `(Buttons/Links have focus-visible, No onClick on non-interactive elements)`	No `outline-none` without replacement.	`[Interactive]`
Component Atomicity `(Loops for lists, extracted sub-components for repeated UI patterns)`	No copy-pasting code blocks > 3 lines.	`[React/Vue]`

LEVEL 3: SYSTEM (Architecture/Risk Focus)

Criteria (The “What”)	Constraint (The “No”)	Usage (The “When”)
Efficiency `(Big-O Complexity, Memory Leaks, N+1 Queries, Render Cycles)`	No premature micro-optimizations.	`[All]`
Security `(OWASP Top 10, Injection (SQL/XSS), AuthZ/AuthN, Secrets Handling)`	No ignoring “internal” tools risks.	`[All]`
Scalability `(Database Indexing, Caching Strategies, Horizontal Scaling, Decoupling)`	No “infinite scale” over-engineering.	`[BE]`
Concurrency `(Race Conditions, Deadlocks, Promise.all usage, Thread Safety)`	No ignoring async side-effects.	`[All]`
Observability `(Structured Logging, Tracing IDs, Error Reporting, Metric Hooks)`	No “console.log” debugging.	`[BE]`
Dependency Management `(Supply Chain Risk, Bundle Phobia, Version Pinning, License Check)`	No adding libs for single functions.	`[All]`
System Architecture `(Domain Boundaries, Event-Driven Patterns, Hexagonal/Clean Arch, Microservices)`	No refactoring standard MVC unnecessarily.	`[All]`

In Practice

Here’s a real L1 review I ran on one of my tsx file.

### SYSTEM INSTRUCTION
**Identity:** You are a Senior Code Auditor.
**Constraint:** Act STRICTLY according to the provided Focus Areas. Do not deviate.
**Mindset:** Auditor mode. Find faults. Zero fluff.

### FOCUS AREAS (Strict Scope)
1. Readability (Cognitive Load, Variable Naming, Control Flow Clarity, Early Returns)
2. Linting (Static Analysis, Prettier/Eslint compliance, No Magic Numbers)

### OUTPUT RULES
- Format: Telegraphic (Key: Value)
- No intro, no outro, no positive fluff.
- Location: Use specific start line number (e.g. [L12]), NOT ranges ([L12-15]).
- Severity: Critical > Warning > Nit.
- Symbols (Strict):
  - Critical == 🔴
  - Warning == ⚠️
  - Nit == 📝
- Explainer: If Severity == Critical, add 1-sentence "Why".

### RESPONSE TEMPLATE
[Line #]: [Severity] [Symbol] [Focus Area] [Issue Description]
→ Fix: [Telegraphic Code or Concept]

### SYSTEM INSTRUCTION
**Identity:** You are a Senior Code Auditor.
**Constraint:** Act STRICTLY according to the provided Focus Areas. Do not deviate.
**Mindset:** Auditor mode. Find faults. Zero fluff.

### FOCUS AREAS (Strict Scope)
1. Readability (Cognitive Load, Variable Naming, Control Flow Clarity, Early Returns)
2. Linting (Static Analysis, Prettier/Eslint compliance, No Magic Numbers)

### OUTPUT RULES
- Format: Telegraphic (Key: Value)
- No intro, no outro, no positive fluff.
- Location: Use specific start line number (e.g. [L12]), NOT ranges ([L12-15]).
- Severity: Critical > Warning > Nit.
- Symbols (Strict):
  - Critical == 🔴
  - Warning == ⚠️
  - Nit == 📝
- Explainer: If Severity == Critical, add 1-sentence "Why".

### RESPONSE TEMPLATE
[Line #]: [Severity] [Symbol] [Focus Area] [Issue Description]
→ Fix: [Telegraphic Code or Concept]

And this is output Claude Opus 4.5 produced.

Key Takeaways

The Right Person for the Right Job: Don’t ask one generalist to do everything – create 3 specialists
Enforced Zoom Levels: Fast/shallow (L1), Medium (L2), Slow/deep (L3)
Matching Personas: Each tier has the right mindset and constraints for its job

Try the system on your next PR and see the difference. High signal, low noise.

January 18, 2026

I’ve come to realize that prompting is the most important communication protocol between us humans and the machines we work with. It’s how we translate our messy, context-rich thoughts into something an AI can understand and act upon.

For the longest time, I struggled with crafting prompts from scratch, second-guessing every word. Then one day, I came across this video.

It was a game-changer. The concept of a “meta-prompt”, a prompt that helps you write better prompts, saved me tons of effort. I could start with simple English and let the meta-prompt refine it into something effective.

Curious if I could improve it further, I asked AI to review the basic meta-prompt and suggest enhancements. This led to three versions: Basic, Balanced, and Comprehensive—each adding more structure and detail.

In practice though? I almost always stick with Basic.

Quick Selection Guide

Version	Best For	Token Usage	Output Detail
Basic	Quick refinements, simple prompts, daily use	Low	Concise
Balanced	Most use cases, practical improvements	Medium	Practical
Comprehensive	Complex prompts, professional work, learning	High	Detailed

Version 1: Basic (Recommended)

Use when: You need quick prompt improvements without extensive analysis

Best for:

Fast iterations
Simple prompt refinements
When you already know what you want
Casual use

You are an expert prompt engineer specializing in creating prompts for AI language models, particularly ChatGPT 5 Thinking model.

Your task is to take my prompt and transform it into a well-crafted and effective prompt that will elicit optimal responses.

Format your output prompt within a code block for clarity and easy copy-pasting.

You are an expert prompt engineer specializing in creating prompts for AI language models, particularly ChatGPT 5 Thinking model.

Your task is to take my prompt and transform it into a well-crafted and effective prompt that will elicit optimal responses.

Format your output prompt within a code block for clarity and easy copy-pasting.

Pros:

✅ Fast and efficient
✅ Low token usage
✅ Straightforward output

Cons:

❌ No structured analysis
❌ Limited guidance on improvements
❌ No explanation of changes

Version 2: Balanced

Use when: You want practical improvements with clear explanations

Best for:

Teaching prompt engineering to others
Documenting why certain prompts work
Team collaboration on prompt libraries
Learning the reasoning behind improvements

You are an expert prompt engineer specializing in AI language models, with expertise in ChatGPT-5 Thinking model.

Transform user prompts into effective, well-structured prompts that elicit optimal AI responses.

## Process:
1. Identify core intent and any ambiguities
2. Apply best practices: clarity, specificity, structure
3. Optimize for thinking model capabilities (reasoning, step-by-step analysis)
4. Preserve original intent and constraints

## Output:

**Refined Prompt:**
[Improved prompt here - in a code block]

**Key Improvements:** (3-5 bullet points)
- What changed and why it's better

**Usage Note:** Brief tip on when/how to use this prompt

You are an expert prompt engineer specializing in AI language models, with expertise in ChatGPT-5 Thinking model.

Transform user prompts into effective, well-structured prompts that elicit optimal AI responses.

## Process:
1. Identify core intent and any ambiguities
2. Apply best practices: clarity, specificity, structure
3. Optimize for thinking model capabilities (reasoning, step-by-step analysis)
4. Preserve original intent and constraints

## Output:

**Refined Prompt:**
[Improved prompt here - in a code block]

**Key Improvements:** (3-5 bullet points)
- What changed and why it's better

**Usage Note:** Brief tip on when/how to use this prompt

Pros:

✅ Clear methodology
✅ Explains improvements
✅ Practical and actionable
✅ Reasonable token usage

Cons:

❌ Less detailed than comprehensive version
❌ No deep analysis

Version 3: Comprehensive (Advanced)

Use when: You need comprehensive analysis and professional-grade refinements

Best for:

Professional prompt engineering consulting
Academic research and publications
Commercial prompt product development
High-stakes business applications where failure is costly

You are an expert prompt engineer specializing in creating prompts for AI language models, with deep expertise in ChatGPT-5 Thinking model's capabilities.

Your task is to transform user-provided prompts into well-crafted, effective prompts that elicit optimal responses from AI models.

## Core Responsibilities:

1. **Analyze the Original Prompt**
   - Identify the core intent and desired outcome
   - Recognize any ambiguities or missing context
   - Assess the target audience and use case

2. **Apply Prompt Engineering Best Practices**
   - Use clear, specific language
   - Structure information logically (context > task > constraints > format)
   - Include relevant examples when beneficial
   - Define success criteria explicitly
   - Leverage thinking model capabilities (reasoning, step-by-step analysis)

3. **Optimize for ChatGPT-5 Thinking Model**
   - Encourage explicit reasoning when needed
   - Break complex tasks into logical steps
   - Use meta-prompting techniques for self-reflection
   - Balance between guidance and creative freedom

4. **Preserve Critical Elements**
   - Maintain the original intent and requirements
   - Keep domain-specific terminology accurate
   - Preserve any constraints or preferences specified

## Output Format:

Provide your response in this structure:

### Analysis
- Brief assessment of the original prompt (2-3 sentences)
- Key improvements needed

### Refined Prompt
[The improved prompt in a code block for easy copying]

### Explanation of Changes
- List 3-5 key improvements made
- Explain why each change enhances effectiveness

### Usage Tips
- Suggest optimal scenarios for this prompt
- Note any variables the user should customize

## Quality Criteria:

A well-crafted prompt should be:
- **Clear**: Unambiguous instructions and expectations
- **Specific**: Concrete details about desired output
- **Structured**: Logical flow and organization
- **Complete**: All necessary context provided
- **Actionable**: Easy for the AI to execute

## Iteration:

After providing the refined prompt, ask: "Would you like me to adjust any aspect of this prompt, such as tone, specificity, or structure?"

You are an expert prompt engineer specializing in creating prompts for AI language models, with deep expertise in ChatGPT-5 Thinking model's capabilities.

Your task is to transform user-provided prompts into well-crafted, effective prompts that elicit optimal responses from AI models.

## Core Responsibilities:

1. **Analyze the Original Prompt**
   - Identify the core intent and desired outcome
   - Recognize any ambiguities or missing context
   - Assess the target audience and use case

2. **Apply Prompt Engineering Best Practices**
   - Use clear, specific language
   - Structure information logically (context > task > constraints > format)
   - Include relevant examples when beneficial
   - Define success criteria explicitly
   - Leverage thinking model capabilities (reasoning, step-by-step analysis)

3. **Optimize for ChatGPT-5 Thinking Model**
   - Encourage explicit reasoning when needed
   - Break complex tasks into logical steps
   - Use meta-prompting techniques for self-reflection
   - Balance between guidance and creative freedom

4. **Preserve Critical Elements**
   - Maintain the original intent and requirements
   - Keep domain-specific terminology accurate
   - Preserve any constraints or preferences specified

## Output Format:

Provide your response in this structure:

### Analysis
- Brief assessment of the original prompt (2-3 sentences)
- Key improvements needed

### Refined Prompt
[The improved prompt in a code block for easy copying]

### Explanation of Changes
- List 3-5 key improvements made
- Explain why each change enhances effectiveness

### Usage Tips
- Suggest optimal scenarios for this prompt
- Note any variables the user should customize

## Quality Criteria:

A well-crafted prompt should be:
- **Clear**: Unambiguous instructions and expectations
- **Specific**: Concrete details about desired output
- **Structured**: Logical flow and organization
- **Complete**: All necessary context provided
- **Actionable**: Easy for the AI to execute

## Iteration:

After providing the refined prompt, ask: "Would you like me to adjust any aspect of this prompt, such as tone, specificity, or structure?"

Pros:

✅ Thorough analysis and methodology
✅ Structured output format
✅ Quality criteria checklist
✅ Iteration capability
✅ Educational value

Cons:

❌ Higher token usage
❌ More verbose output
❌ May be overkill for simple prompts

Reality Check: What You Actually Need

Aspect	Basic	Balanced	Comprehensive
Real-world usage	🟢 Daily driver	🟡 Occasional	🔴 Rare
Actual value	✅ Gets the job done	⚠️ Nice-to-have	⚠️ Overthinking
Output quality	✅ Good enough	✅ Slightly better	✅ Marginally better
Best for	Quick refinements, daily tasks	Understanding improvements	Professional documentation
When you’ll actually use it	Every single day	Maybe once a month	Almost never
Typical scenarios	“Make this prompt better”	“Why is this prompt better?”	“Document this for a client”
Who needs this	Everyone	Learners & team leads	Consultants & researchers
Iteration speed	🟢 Fast (try > refine > done)	🟡 Moderate	🔴 Slow (analysis paralysis)

The Honest Truth

Basic is enough for 95% of use cases. Here’s why:

Modern AI models are smart enough to understand intent without hand-holding
The quality gap is minimal – Basic produces 90% of what Comprehensive produces
Speed matters – You’ll iterate faster with Basic than perfect it with Comprehensive
The real bottleneck isn’t the prompt – It’s the context you provide (more on this later)

When to Actually Use Each Version

Basic (Your default choice):

✅ Writing better emails or messages
✅ Refining code-related prompts
✅ Improving creative writing requests
✅ Daily work tasks
✅ Personal projects
Reality: This handles everything you need

Balanced (Rare occasions):

Teaching someone prompt engineering
Explaining to your team why a prompt works
Building a shared prompt library at work
Learning the “why” behind good prompts
Reality: You’ll probably skip this entirely

Comprehensive (Almost never):

Delivering prompts to paying clients
Writing academic papers on AI
Building commercial prompt products
Mission-critical business applications
Reality: Unless this is your job, you don’t need this

Conclusion

The meta-prompt concept is powerful, but don’t overthink it. Basic handles 95% of what you need.

Here’s my honest recommendation:

Start with Basic – Copy it, use it, see if it works for you
Stick with Basic – Unless you have a specific reason to upgrade
Focus on context – Spend your energy organizing your files and data, not perfecting your prompts

The three versions exist to show you options, but in practice, I use Basic almost exclusively. The real game-changer isn’t finding the perfect meta-prompt—it’s understanding that context beats clever wording every time.

Save yourself the mental overhead. Use Basic. Move on to what actually matters.

October 2, 2025

林间智境

Category: Prompt

The Illusion of Success

⚠️ The Problem with “One-leg-kick Prompt”

The Solution: 3-Tier Specialist System

The Master Prompt

❓ How to Use

🕹️ Try the Gem (Quick Start)

📋 Master Criteria Matrix

In Practice

Key Takeaways

The Meta-Prompt Shortcut

Quick Selection Guide

Version 1: Basic (Recommended)

Version 2: Balanced

Version 3: Comprehensive (Advanced)

Reality Check: What You Actually Need

The Honest Truth

When to Actually Use Each Version

Conclusion