Category: Prompt

  • The Illusion of Success

    The Illusion of Success

    My initial code review prompt worked. It gave me feedback. It caught bugs. I was satisfied.
    Before sharing it on my blog, I asked AI to review it as a prompt engineer and LLM behavior analyst.

    Then AI told me something uncomfortable:

    ⚠️ The Problem with “One-leg-kick Prompt”

    My initial prompt worked, but after discussing with AI, I realized I could make it more robust and reproducible.

    The core issue: Assign the right person to do the right job, not one person one-leg-kicking everything.

    • 🧹 You wouldn’t ask a janitor to design your building’s security system.
    • 🏗️ You wouldn’t ask an architect to clean the floors.
    • 🤦‍♂️ You wouldn’t hire one person to do janitorial work, engineering, AND architecture—that’s just asking them to one-leg-kick their way through everything poorly.

    Each role has its expertise, its focus, and its constraints. The same applies to code reviews.

    When you ask an AI to review for everything (style, logic, security) in one pass, the single prompt fails because LLMs average everything out.

    An LLM has a limited “attention budget.” When you ask it to evaluate 15 different things at once, you run into three critical failures:

    • 🎭 The “Yes Man” Effect: The AI feels compelled to give you a little bit of everything to prove it did the work. It will hand you two linting errors, one comment about naming, and then hallucinate a fake performance issue just to satisfy the prompt.
    • 🚰 Context Dilution: It reads the code as a generalist and its internal weighting averages out. It completely misses the subtle SQL injection (a Level 3 problem) because it burned its compute cycles analyzing why your variable should be named isActive instead of active (a Level 1 problem).
    • 🎲 Inconsistent Output: You can’t trust it. On one run, it catches a critical bug. On the next run on the exact same code, it only complains about missing comments.

    The Solution: 3-Tier Specialist System

    With AI’s help, I built a 3-tier system. Instead of one generalist doing everything, create 3 specialists:

    🧹 L1: The Janitor (Fast & Shallow)

    • Job: Clean up the mess (style, naming, linting)
    • Mindset: Make it readable and standard
    • Constraint: Don’t look deep. Fix the surface first.

    ⚙️ L2: The Engineer (Medium Depth)

    • Job: Make it work correctly (logic, tests, error handling)
    • Mindset: Make it fail safely and follow patterns
    • Constraint: Assume code is clean. Focus on function.

    🏗️ L3: The Architect (Slow & Deep)

    • Job: Make it survive attacks and scale (security, performance, architecture)
    • Mindset: Find failure modes and risks
    • Constraint: Assume it works. Focus on what breaks in production.

    Result:

    • ✅ Enforced zoom levels (no more averaging fast/shallow with slow/deep)
    • ✅ Matching personas (the right mindset for each job)
    • ✅ High signal, low noise (each specialist ignores what’s not their job)

    The Master Prompt

    Here’s the template that powers all 3 tiers. You swap in the RoleFocus Areas, and Constraints depending on which tier you’re using:

    ### SYSTEM INSTRUCTION
    **Identity:** You are a [INSERT ROLE NAME].
    **Constraint:** Act STRICTLY according to the provided Focus Areas. Do not deviate.
    **Mindset:** Auditor mode. Find faults. Zero fluff.
    
    ### FOCUS AREAS (Strict Scope)
    1. [INSERT FOCUS 1]
    2. [INSERT FOCUS 2]
    3. [INSERT FOCUS 3]
    
    ### OUTPUT RULES
    - Format: Telegraphic (Key: Value)
    - No intro, no outro, no positive fluff.
    - Location: Use specific start line number (e.g. [L12]), NOT ranges ([L12-15]).
    - Severity: Critical > Warning > Nit.
    - Symbols (Strict):
      - Critical == 🔴
      - Warning == ⚠️
      - Nit == 📝
    - Explainer: If Severity == Critical, add 1-sentence "Why".
    
    ### RESPONSE TEMPLATE
    [Line #]: [Severity] [Symbol] [Focus Area] [Issue Description]
    → Fix: [Telegraphic Code or Concept]
    
    ---
    ### INPUT CODE
    [PASTE CODE HERE]

    ❓ How to Use

    1. Pick your tier based on the PR type (see “Usage” column in the table below)
    2. Swap in the Role from the “3-Tier Auditor Roles” table
    3. Pick 2-3 Focus Areas from the “Detailed Criteria Matrix” for that tier level
    4. Paste your code and run the review

    🕹️ Try the Gem (Quick Start)

    Not sure which tier to use or which focus areas to pick? I’ve built a Gemini Gem that helps you decide.

    Scenario 1: Function Too Complex

    My validateForm() has complexity score of 18. Too many paths.

    Scenario 2: Too Many Responsibilities

    My UserService.ts does login, profile updates, emails, and billing. It's 800 lines.

    Scenario 3: Breaking Up Big File

    I'm splitting a 3000-line OrderManager.php into smaller services.

    The Gem analyzes your code context and automatically:

    • Determines the appropriate tier level (L1, L2, or L3)
    • Selects the most relevant focus areas
    • Generates a ready-to-use prompt with the correct role and constraints

    🏛️ 3-Tier Auditor Roles

    RoleFocus (The “What”)Mindset (The “Who”)Constraint (The “No”)Usage (The “When”)
    🧹 Senior Code Auditor
    (Level 1)
    Hygiene & Syntax
    Readability, Style, Linting, AI-Ready, File Structure.
    The Janitor
    Make it clean, readable, and standard.
    Ignore Logic/Arch.
    Do not look deep. Fix the mess first.
    Every PR.
    The basic quality gate.
    ⚙️ Staff Code Auditor
    (Level 2)
    Logic & Standards
    Correctness, SOLID, Tests, Error Handling, Type Safety.
    The Engineer
    Make it work, fail safely, and fit the pattern.
    Ignore Style/Nits.
    Assume code is clean. Focus on function.
    Feature PRs.
    Daily logic changes & bug fixes.
    🟡 UI/UX System Auditor
    (Level 2-FE)
    DOM Integrity & Tokens
    Semantics, Viewport Physics, Tailwind Purity, A11y.
    The QA Engineer
    Make it pixel-perfect, mobile-proof, and accessible.
    Ignore Business Logic.
    Assume data is correct. Focus on rendering & layout.
    Frontend PRs.
    New components, layout changes, CSS refactors.
    🏗️ Principal System Auditor
    (Level 3)
    Risk & Scale
    Security, Performance, Concurrency, Architecture.
    The Architect
    Make it survive attacks and high traffic.
    Ignore Syntax/Logic.
    Assume it works. Focus on failure modes.
    Critical PRs.
    Auth, Payments, Async, Legacy Refactors.

    📋 Master Criteria Matrix

    LEVEL 1: HYGIENE

    Criteria (The “What”)Constraint (The “No”)Usage (The “When”)
    Readability (Cognitive Load, Variable Naming, Control Flow Clarity, Early Returns)No subjective naming debates.[All]
    Consistency (Directory Structure, File Naming, Pattern Matching, Code Style)No rewriting valid legacy styles.[All]
    Documentation (JSDoc/TSDoc, Inline Explanations, README updates, Why-over-What)No “comments explaining syntax”.[All]
    Linting Compliance (Static Analysis, Prettier/Eslint compliance, No Magic Numbers)No manual formatting (use tools).[All]
    AI-Readiness (Explicit Typing, Modular Context, No Implicit Logic, Self-Documenting)No “golfing” (one-liners).[All]
    File Structure (Separation of Concerns, Single Responsibility, File Size < 300 lines)No premature splitting.[All]
    Modern Syntax (ES6+ Features, Destructuring, Optional Chaining, Nullish Coalescing)No forcing experimental syntax.[JS/TS]

    LEVEL 2: LOGIC (Class/Object Focus)

    Criteria (The “What”)Constraint (The “No”)Usage (The “When”)
    Correctness (Business Logic, Edge Cases, Off-by-One, Requirements Fidelity)No “Happy Path” assumptions.[All]
    Error Handling (Graceful Failure, Try/Catch Scope, User Feedback, Fallback States)No swallowing errors silently.[All]
    Class Design (SOLID Principles, Inheritance vs Composition, Class Responsibility, Abstraction)No Pattern-Matching for fun.[OOP]
    Testability (Pure Functions, Dependency Injection, Mockability, Public Interfaces)No testing private implementation.[All]
    Type Safety (Strict Interfaces, No 'any', Generic Constraints, Null Checks)No Loose Typing.[TS]
    State Management (Immutability, State Mutation Risks, Data Validation/Zod, Atomicity)No shared mutable state.[All]
    API Standards (HTTP Status Codes, REST Verbs, JSON Structure, Idempotency)No custom error codes.[BE]

    LEVEL 2 – Frontend: VISUAL ENGINEERING

    Criteria (The “What”)Constraint (The “No”)Usage (The “When”)
    Semantic Integrity (No generic divs, proper use of <header>/<main>/<footer>, List hygiene)No “div soup” for layout ease.[JSX/HTML]
    Viewport Physics (Use of dvh, overflow-hidden on body, overflow-y-auto on scroll containers)No h-screen on root (Safari bug).[Layouts]
    Token Compliance (Tailwind config keys only, No magic numbers like w-[32px])No arbitrary pixel values.[CSS/Tailwind]
    Interactive Hygiene (Buttons/Links have focus-visible, No onClick on non-interactive elements)No outline-none without replacement.[Interactive]
    Component Atomicity (Loops for lists, extracted sub-components for repeated UI patterns)No copy-pasting code blocks > 3 lines.[React/Vue]

    LEVEL 3: SYSTEM (Architecture/Risk Focus)

    Criteria (The “What”)Constraint (The “No”)Usage (The “When”)
    Efficiency (Big-O Complexity, Memory Leaks, N+1 Queries, Render Cycles)No premature micro-optimizations.[All]
    Security (OWASP Top 10, Injection (SQL/XSS), AuthZ/AuthN, Secrets Handling)No ignoring “internal” tools risks.[All]
    Scalability (Database Indexing, Caching Strategies, Horizontal Scaling, Decoupling)No “infinite scale” over-engineering.[BE]
    Concurrency (Race Conditions, Deadlocks, Promise.all usage, Thread Safety)No ignoring async side-effects.[All]
    Observability (Structured Logging, Tracing IDs, Error Reporting, Metric Hooks)No “console.log” debugging.[BE]
    Dependency Management (Supply Chain Risk, Bundle Phobia, Version Pinning, License Check)No adding libs for single functions.[All]
    System Architecture (Domain Boundaries, Event-Driven Patterns, Hexagonal/Clean Arch, Microservices)No refactoring standard MVC unnecessarily.[All]

    In Practice

    Here’s a real L1 review I ran on one of my tsx file.

    ### SYSTEM INSTRUCTION
    **Identity:** You are a Senior Code Auditor.
    **Constraint:** Act STRICTLY according to the provided Focus Areas. Do not deviate.
    **Mindset:** Auditor mode. Find faults. Zero fluff.
    
    ### FOCUS AREAS (Strict Scope)
    1. Readability (Cognitive Load, Variable Naming, Control Flow Clarity, Early Returns)
    2. Linting (Static Analysis, Prettier/Eslint compliance, No Magic Numbers)
    
    ### OUTPUT RULES
    - Format: Telegraphic (Key: Value)
    - No intro, no outro, no positive fluff.
    - Location: Use specific start line number (e.g. [L12]), NOT ranges ([L12-15]).
    - Severity: Critical > Warning > Nit.
    - Symbols (Strict):
      - Critical == 🔴
      - Warning == ⚠️
      - Nit == 📝
    - Explainer: If Severity == Critical, add 1-sentence "Why".
    
    ### RESPONSE TEMPLATE
    [Line #]: [Severity] [Symbol] [Focus Area] [Issue Description]
    → Fix: [Telegraphic Code or Concept]

    And this is output Claude Opus 4.5 produced.

    Key Takeaways

    1. The Right Person for the Right Job: Don’t ask one generalist to do everything – create 3 specialists
    2. Enforced Zoom Levels: Fast/shallow (L1), Medium (L2), Slow/deep (L3)
    3. Matching Personas: Each tier has the right mindset and constraints for its job

    Try the system on your next PR and see the difference. High signal, low noise.

  • The Meta-Prompt Shortcut

    The Meta-Prompt Shortcut

    I’ve come to realize that prompting is the most important communication protocol between us humans and the machines we work with. It’s how we translate our messy, context-rich thoughts into something an AI can understand and act upon.

    For the longest time, I struggled with crafting prompts from scratch, second-guessing every word. Then one day, I came across this video.

    It was a game-changer. The concept of a “meta-prompt”, a prompt that helps you write better prompts, saved me tons of effort. I could start with simple English and let the meta-prompt refine it into something effective.

    Curious if I could improve it further, I asked AI to review the basic meta-prompt and suggest enhancements. This led to three versions: Basic, Balanced, and Comprehensive—each adding more structure and detail.

    In practice though? I almost always stick with Basic.

    Quick Selection Guide

    VersionBest ForToken UsageOutput Detail
    BasicQuick refinements, simple prompts, daily useLowConcise
    BalancedMost use cases, practical improvementsMediumPractical
    ComprehensiveComplex prompts, professional work, learningHighDetailed

    Version 1: Basic (Recommended)

    Use when: You need quick prompt improvements without extensive analysis

    Best for:

    • Fast iterations
    • Simple prompt refinements
    • When you already know what you want
    • Casual use
    You are an expert prompt engineer specializing in creating prompts for AI language models, particularly ChatGPT 5 Thinking model.
    
    Your task is to take my prompt and transform it into a well-crafted and effective prompt that will elicit optimal responses.
    
    Format your output prompt within a code block for clarity and easy copy-pasting.

    Pros:

    • ✅ Fast and efficient
    • ✅ Low token usage
    • ✅ Straightforward output

    Cons:

    • ❌ No structured analysis
    • ❌ Limited guidance on improvements
    • ❌ No explanation of changes

    Version 2: Balanced

    Use when: You want practical improvements with clear explanations

    Best for:

    • Teaching prompt engineering to others
    • Documenting why certain prompts work
    • Team collaboration on prompt libraries
    • Learning the reasoning behind improvements
    You are an expert prompt engineer specializing in AI language models, with expertise in ChatGPT-5 Thinking model.
    
    Transform user prompts into effective, well-structured prompts that elicit optimal AI responses.
    
    ## Process:
    1. Identify core intent and any ambiguities
    2. Apply best practices: clarity, specificity, structure
    3. Optimize for thinking model capabilities (reasoning, step-by-step analysis)
    4. Preserve original intent and constraints
    
    ## Output:
    
    **Refined Prompt:**
    [Improved prompt here - in a code block]
    
    **Key Improvements:** (3-5 bullet points)
    - What changed and why it's better
    
    **Usage Note:** Brief tip on when/how to use this prompt
    

    Pros:

    • ✅ Clear methodology
    • ✅ Explains improvements
    • ✅ Practical and actionable
    • ✅ Reasonable token usage

    Cons:

    • ❌ Less detailed than comprehensive version
    • ❌ No deep analysis

    Version 3: Comprehensive (Advanced)

    Use when: You need comprehensive analysis and professional-grade refinements

    Best for:

    • Professional prompt engineering consulting
    • Academic research and publications
    • Commercial prompt product development
    • High-stakes business applications where failure is costly
    You are an expert prompt engineer specializing in creating prompts for AI language models, with deep expertise in ChatGPT-5 Thinking model's capabilities.
    
    Your task is to transform user-provided prompts into well-crafted, effective prompts that elicit optimal responses from AI models.
    
    ## Core Responsibilities:
    
    1. **Analyze the Original Prompt**
       - Identify the core intent and desired outcome
       - Recognize any ambiguities or missing context
       - Assess the target audience and use case
    
    2. **Apply Prompt Engineering Best Practices**
       - Use clear, specific language
       - Structure information logically (context > task > constraints > format)
       - Include relevant examples when beneficial
       - Define success criteria explicitly
       - Leverage thinking model capabilities (reasoning, step-by-step analysis)
    
    3. **Optimize for ChatGPT-5 Thinking Model**
       - Encourage explicit reasoning when needed
       - Break complex tasks into logical steps
       - Use meta-prompting techniques for self-reflection
       - Balance between guidance and creative freedom
    
    4. **Preserve Critical Elements**
       - Maintain the original intent and requirements
       - Keep domain-specific terminology accurate
       - Preserve any constraints or preferences specified
    
    ## Output Format:
    
    Provide your response in this structure:
    
    ### Analysis
    - Brief assessment of the original prompt (2-3 sentences)
    - Key improvements needed
    
    ### Refined Prompt
    [The improved prompt in a code block for easy copying]
    
    ### Explanation of Changes
    - List 3-5 key improvements made
    - Explain why each change enhances effectiveness
    
    ### Usage Tips
    - Suggest optimal scenarios for this prompt
    - Note any variables the user should customize
    
    ## Quality Criteria:
    
    A well-crafted prompt should be:
    - **Clear**: Unambiguous instructions and expectations
    - **Specific**: Concrete details about desired output
    - **Structured**: Logical flow and organization
    - **Complete**: All necessary context provided
    - **Actionable**: Easy for the AI to execute
    
    ## Iteration:
    
    After providing the refined prompt, ask: "Would you like me to adjust any aspect of this prompt, such as tone, specificity, or structure?"
    

    Pros:

    • ✅ Thorough analysis and methodology
    • ✅ Structured output format
    • ✅ Quality criteria checklist
    • ✅ Iteration capability
    • ✅ Educational value

    Cons:

    • ❌ Higher token usage
    • ❌ More verbose output
    • ❌ May be overkill for simple prompts

    Reality Check: What You Actually Need

    AspectBasicBalancedComprehensive
    Real-world usage🟢 Daily driver🟡 Occasional🔴 Rare
    Actual value✅ Gets the job done⚠️ Nice-to-have⚠️ Overthinking
    Output quality✅ Good enough✅ Slightly better✅ Marginally better
    Best forQuick refinements, daily tasksUnderstanding improvementsProfessional documentation
    When you’ll actually use itEvery single dayMaybe once a monthAlmost never
    Typical scenarios“Make this prompt better”“Why is this prompt better?”“Document this for a client”
    Who needs thisEveryoneLearners & team leadsConsultants & researchers
    Iteration speed🟢 Fast (try > refine > done)🟡 Moderate🔴 Slow (analysis paralysis)

    The Honest Truth

    Basic is enough for 95% of use cases. Here’s why:

    1. Modern AI models are smart enough to understand intent without hand-holding
    2. The quality gap is minimal – Basic produces 90% of what Comprehensive produces
    3. Speed matters – You’ll iterate faster with Basic than perfect it with Comprehensive
    4. The real bottleneck isn’t the prompt – It’s the context you provide (more on this later)

    When to Actually Use Each Version

    Basic (Your default choice):

    • ✅ Writing better emails or messages
    • ✅ Refining code-related prompts
    • ✅ Improving creative writing requests
    • ✅ Daily work tasks
    • ✅ Personal projects
    • Reality: This handles everything you need

    Balanced (Rare occasions):

    • Teaching someone prompt engineering
    • Explaining to your team why a prompt works
    • Building a shared prompt library at work
    • Learning the “why” behind good prompts
    • Reality: You’ll probably skip this entirely

    Comprehensive (Almost never):

    • Delivering prompts to paying clients
    • Writing academic papers on AI
    • Building commercial prompt products
    • Mission-critical business applications
    • Reality: Unless this is your job, you don’t need this

    Conclusion

    The meta-prompt concept is powerful, but don’t overthink it. Basic handles 95% of what you need.

    Here’s my honest recommendation:

    1. Start with Basic – Copy it, use it, see if it works for you
    2. Stick with Basic – Unless you have a specific reason to upgrade
    3. Focus on context – Spend your energy organizing your files and data, not perfecting your prompts

    The three versions exist to show you options, but in practice, I use Basic almost exclusively. The real game-changer isn’t finding the perfect meta-prompt—it’s understanding that context beats clever wording every time.

    Save yourself the mental overhead. Use Basic. Move on to what actually matters.