Category: AI

The Illusion of Success

My initial code review prompt worked. It gave me feedback. It caught bugs. I was satisfied.
Before sharing it on my blog, I asked AI to review it as a prompt engineer and LLM behavior analyst.

Then AI told me something uncomfortable:

“Your prompt ‘works’ because the model is compensating for your poor instructions, not because your instructions are good.”

⚠️ The Problem with “One-leg-kick Prompt”

My initial prompt worked, but after discussing with AI, I realized I could make it more robust and reproducible.

The core issue: Assign the right person to do the right job, not one person one-leg-kicking everything.

🧹 You wouldn’t ask a janitor to design your building’s security system.
🏗️ You wouldn’t ask an architect to clean the floors.
🤦‍♂️ You wouldn’t hire one person to do janitorial work, engineering, AND architecture—that’s just asking them to one-leg-kick their way through everything poorly.

Each role has its expertise, its focus, and its constraints. The same applies to code reviews.

When you ask an AI to review for everything (style, logic, security) in one pass, the single prompt fails because LLMs average everything out.

An LLM has a limited “attention budget.” When you ask it to evaluate 15 different things at once, you run into three critical failures:

🎭 The “Yes Man” Effect: The AI feels compelled to give you a little bit of everything to prove it did the work. It will hand you two linting errors, one comment about naming, and then hallucinate a fake performance issue just to satisfy the prompt.
🚰 Context Dilution: It reads the code as a generalist and its internal weighting averages out. It completely misses the subtle SQL injection (a Level 3 problem) because it burned its compute cycles analyzing why your variable should be named isActive instead of active (a Level 1 problem).
🎲 Inconsistent Output: You can’t trust it. On one run, it catches a critical bug. On the next run on the exact same code, it only complains about missing comments.

The Solution: 3-Tier Specialist System

With AI’s help, I built a 3-tier system. Instead of one generalist doing everything, create 3 specialists:

🧹 L1: The Janitor (Fast & Shallow)

Job: Clean up the mess (style, naming, linting)
Mindset: Make it readable and standard
Constraint: Don’t look deep. Fix the surface first.

⚙️ L2: The Engineer (Medium Depth)

Job: Make it work correctly (logic, tests, error handling)
Mindset: Make it fail safely and follow patterns
Constraint: Assume code is clean. Focus on function.

🏗️ L3: The Architect (Slow & Deep)

Job: Make it survive attacks and scale (security, performance, architecture)
Mindset: Find failure modes and risks
Constraint: Assume it works. Focus on what breaks in production.

Result:

✅ Enforced zoom levels (no more averaging fast/shallow with slow/deep)
✅ Matching personas (the right mindset for each job)
✅ High signal, low noise (each specialist ignores what’s not their job)

The Master Prompt

Here’s the template that powers all 3 tiers. You swap in the Role, Focus Areas, and Constraints depending on which tier you’re using:

### SYSTEM INSTRUCTION
**Identity:** You are a [INSERT ROLE NAME].
**Constraint:** Act STRICTLY according to the provided Focus Areas. Do not deviate.
**Mindset:** Auditor mode. Find faults. Zero fluff.

### FOCUS AREAS (Strict Scope)
1. [INSERT FOCUS 1]
2. [INSERT FOCUS 2]
3. [INSERT FOCUS 3]

### OUTPUT RULES
- Format: Telegraphic (Key: Value)
- No intro, no outro, no positive fluff.
- Location: Use specific start line number (e.g. [L12]), NOT ranges ([L12-15]).
- Severity: Critical > Warning > Nit.
- Symbols (Strict):
  - Critical == 🔴
  - Warning == ⚠️
  - Nit == 📝
- Explainer: If Severity == Critical, add 1-sentence "Why".

### RESPONSE TEMPLATE
[Line #]: [Severity] [Symbol] [Focus Area] [Issue Description]
→ Fix: [Telegraphic Code or Concept]

---
### INPUT CODE
[PASTE CODE HERE]

### SYSTEM INSTRUCTION
**Identity:** You are a [INSERT ROLE NAME].
**Constraint:** Act STRICTLY according to the provided Focus Areas. Do not deviate.
**Mindset:** Auditor mode. Find faults. Zero fluff.

### FOCUS AREAS (Strict Scope)
1. [INSERT FOCUS 1]
2. [INSERT FOCUS 2]
3. [INSERT FOCUS 3]

### OUTPUT RULES
- Format: Telegraphic (Key: Value)
- No intro, no outro, no positive fluff.
- Location: Use specific start line number (e.g. [L12]), NOT ranges ([L12-15]).
- Severity: Critical > Warning > Nit.
- Symbols (Strict):
  - Critical == 🔴
  - Warning == ⚠️
  - Nit == 📝
- Explainer: If Severity == Critical, add 1-sentence "Why".

### RESPONSE TEMPLATE
[Line #]: [Severity] [Symbol] [Focus Area] [Issue Description]
→ Fix: [Telegraphic Code or Concept]

---
### INPUT CODE
[PASTE CODE HERE]

❓ How to Use

Pick your tier based on the PR type (see “Usage” column in the table below)
Swap in the Role from the “3-Tier Auditor Roles” table
Pick 2-3 Focus Areas from the “Detailed Criteria Matrix” for that tier level
Paste your code and run the review

🕹️ Try the Gem (Quick Start)

Not sure which tier to use or which focus areas to pick? I’ve built a Gemini Gem that helps you decide.

Scenario 1: Function Too Complex

My validateForm() has complexity score of 18. Too many paths.

My validateForm() has complexity score of 18. Too many paths.

Scenario 2: Too Many Responsibilities

My UserService.ts does login, profile updates, emails, and billing. It's 800 lines.

My UserService.ts does login, profile updates, emails, and billing. It's 800 lines.

Scenario 3: Breaking Up Big File

I'm splitting a 3000-line OrderManager.php into smaller services.

I'm splitting a 3000-line OrderManager.php into smaller services.

The Gem analyzes your code context and automatically:

Determines the appropriate tier level (L1, L2, or L3)
Selects the most relevant focus areas
Generates a ready-to-use prompt with the correct role and constraints

🏛️ 3-Tier Auditor Roles

Role	Focus (The “What”)	Mindset (The “Who”)	Constraint (The “No”)	Usage (The “When”)
🧹 Senior Code Auditor (Level 1)	Hygiene & Syntax Readability, Style, Linting, AI-Ready, File Structure.	The Janitor Make it clean, readable, and standard.	Ignore Logic/Arch. Do not look deep. Fix the mess first.	Every PR. The basic quality gate.
⚙️ Staff Code Auditor (Level 2)	Logic & Standards Correctness, SOLID, Tests, Error Handling, Type Safety.	The Engineer Make it work, fail safely, and fit the pattern.	Ignore Style/Nits. Assume code is clean. Focus on function.	Feature PRs. Daily logic changes & bug fixes.
🟡 UI/UX System Auditor (Level 2-FE)	DOM Integrity & Tokens Semantics, Viewport Physics, Tailwind Purity, A11y.	The QA Engineer Make it pixel-perfect, mobile-proof, and accessible.	Ignore Business Logic. Assume data is correct. Focus on rendering & layout.	Frontend PRs. New components, layout changes, CSS refactors.
🏗️ Principal System Auditor (Level 3)	Risk & Scale Security, Performance, Concurrency, Architecture.	The Architect Make it survive attacks and high traffic.	Ignore Syntax/Logic. Assume it works. Focus on failure modes.	Critical PRs. Auth, Payments, Async, Legacy Refactors.

📋 Master Criteria Matrix

LEVEL 1: HYGIENE

Criteria (The “What”)	Constraint (The “No”)	Usage (The “When”)
Readability `(Cognitive Load, Variable Naming, Control Flow Clarity, Early Returns)`	No subjective naming debates.	`[All]`
Consistency `(Directory Structure, File Naming, Pattern Matching, Code Style)`	No rewriting valid legacy styles.	`[All]`
Documentation `(JSDoc/TSDoc, Inline Explanations, README updates, Why-over-What)`	No “comments explaining syntax”.	`[All]`
Linting Compliance `(Static Analysis, Prettier/Eslint compliance, No Magic Numbers)`	No manual formatting (use tools).	`[All]`
AI-Readiness `(Explicit Typing, Modular Context, No Implicit Logic, Self-Documenting)`	No “golfing” (one-liners).	`[All]`
File Structure `(Separation of Concerns, Single Responsibility, File Size < 300 lines)`	No premature splitting.	`[All]`
Modern Syntax `(ES6+ Features, Destructuring, Optional Chaining, Nullish Coalescing)`	No forcing experimental syntax.	`[JS/TS]`

LEVEL 2: LOGIC (Class/Object Focus)

Criteria (The “What”)	Constraint (The “No”)	Usage (The “When”)
Correctness `(Business Logic, Edge Cases, Off-by-One, Requirements Fidelity)`	No “Happy Path” assumptions.	`[All]`
Error Handling `(Graceful Failure, Try/Catch Scope, User Feedback, Fallback States)`	No swallowing errors silently.	`[All]`
Class Design `(SOLID Principles, Inheritance vs Composition, Class Responsibility, Abstraction)`	No Pattern-Matching for fun.	`[OOP]`
Testability `(Pure Functions, Dependency Injection, Mockability, Public Interfaces)`	No testing private implementation.	`[All]`
Type Safety `(Strict Interfaces, No 'any', Generic Constraints, Null Checks)`	No Loose Typing.	`[TS]`
State Management `(Immutability, State Mutation Risks, Data Validation/Zod, Atomicity)`	No shared mutable state.	`[All]`
API Standards `(HTTP Status Codes, REST Verbs, JSON Structure, Idempotency)`	No custom error codes.	`[BE]`

LEVEL 2 – Frontend: VISUAL ENGINEERING

Criteria (The “What”)	Constraint (The “No”)	Usage (The “When”)
Semantic Integrity `(No generic divs, proper use of <header>/<main>/<footer>, List hygiene)`	No “div soup” for layout ease.	`[JSX/HTML]`
Viewport Physics `(Use of dvh, overflow-hidden on body, overflow-y-auto on scroll containers)`	No `h-screen` on root (Safari bug).	`[Layouts]`
Token Compliance `(Tailwind config keys only, No magic numbers like w-[32px])`	No arbitrary pixel values.	`[CSS/Tailwind]`
Interactive Hygiene `(Buttons/Links have focus-visible, No onClick on non-interactive elements)`	No `outline-none` without replacement.	`[Interactive]`
Component Atomicity `(Loops for lists, extracted sub-components for repeated UI patterns)`	No copy-pasting code blocks > 3 lines.	`[React/Vue]`

LEVEL 3: SYSTEM (Architecture/Risk Focus)

Criteria (The “What”)	Constraint (The “No”)	Usage (The “When”)
Efficiency `(Big-O Complexity, Memory Leaks, N+1 Queries, Render Cycles)`	No premature micro-optimizations.	`[All]`
Security `(OWASP Top 10, Injection (SQL/XSS), AuthZ/AuthN, Secrets Handling)`	No ignoring “internal” tools risks.	`[All]`
Scalability `(Database Indexing, Caching Strategies, Horizontal Scaling, Decoupling)`	No “infinite scale” over-engineering.	`[BE]`
Concurrency `(Race Conditions, Deadlocks, Promise.all usage, Thread Safety)`	No ignoring async side-effects.	`[All]`
Observability `(Structured Logging, Tracing IDs, Error Reporting, Metric Hooks)`	No “console.log” debugging.	`[BE]`
Dependency Management `(Supply Chain Risk, Bundle Phobia, Version Pinning, License Check)`	No adding libs for single functions.	`[All]`
System Architecture `(Domain Boundaries, Event-Driven Patterns, Hexagonal/Clean Arch, Microservices)`	No refactoring standard MVC unnecessarily.	`[All]`

In Practice

Here’s a real L1 review I ran on one of my tsx file.

### SYSTEM INSTRUCTION
**Identity:** You are a Senior Code Auditor.
**Constraint:** Act STRICTLY according to the provided Focus Areas. Do not deviate.
**Mindset:** Auditor mode. Find faults. Zero fluff.

### FOCUS AREAS (Strict Scope)
1. Readability (Cognitive Load, Variable Naming, Control Flow Clarity, Early Returns)
2. Linting (Static Analysis, Prettier/Eslint compliance, No Magic Numbers)

### OUTPUT RULES
- Format: Telegraphic (Key: Value)
- No intro, no outro, no positive fluff.
- Location: Use specific start line number (e.g. [L12]), NOT ranges ([L12-15]).
- Severity: Critical > Warning > Nit.
- Symbols (Strict):
  - Critical == 🔴
  - Warning == ⚠️
  - Nit == 📝
- Explainer: If Severity == Critical, add 1-sentence "Why".

### RESPONSE TEMPLATE
[Line #]: [Severity] [Symbol] [Focus Area] [Issue Description]
→ Fix: [Telegraphic Code or Concept]

### SYSTEM INSTRUCTION
**Identity:** You are a Senior Code Auditor.
**Constraint:** Act STRICTLY according to the provided Focus Areas. Do not deviate.
**Mindset:** Auditor mode. Find faults. Zero fluff.

### FOCUS AREAS (Strict Scope)
1. Readability (Cognitive Load, Variable Naming, Control Flow Clarity, Early Returns)
2. Linting (Static Analysis, Prettier/Eslint compliance, No Magic Numbers)

### OUTPUT RULES
- Format: Telegraphic (Key: Value)
- No intro, no outro, no positive fluff.
- Location: Use specific start line number (e.g. [L12]), NOT ranges ([L12-15]).
- Severity: Critical > Warning > Nit.
- Symbols (Strict):
  - Critical == 🔴
  - Warning == ⚠️
  - Nit == 📝
- Explainer: If Severity == Critical, add 1-sentence "Why".

### RESPONSE TEMPLATE
[Line #]: [Severity] [Symbol] [Focus Area] [Issue Description]
→ Fix: [Telegraphic Code or Concept]

And this is output Claude Opus 4.5 produced.

Key Takeaways

The Right Person for the Right Job: Don’t ask one generalist to do everything – create 3 specialists
Enforced Zoom Levels: Fast/shallow (L1), Medium (L2), Slow/deep (L3)
Matching Personas: Each tier has the right mindset and constraints for its job

Try the system on your next PR and see the difference. High signal, low noise.

January 18, 2026

Stop One-Leg-Kicking Your AI

There are many models in Antigravity. I had a simple thought one day: I was wasting tokens and money on expensive models and thinking models that spent way too long on simple requests.

So I asked AI to explain and briefly tell me the use case of each model. I didn’t want to waste tokens anymore. I didn’t want expensive models or thinking models taking forever on trivial tasks.

Instead of relying on a single one-leg-kick model to answer every request, I wanted to be more aware of what model Antigravity switches to. I wanted to make this a daily practice.

⚠️ The One-Leg-Kick Problem

Imagine you’re a martial artist with only one move: a powerful roundhouse kick. Sure, it’s impressive. It can break boards, knock out opponents, and look cool in movies. But what happens when you need to:

Dodge a quick jab?
Grapple on the ground?
Block a series of rapid punches?

You’d be inefficient. You’d waste energy. You’d get hit.

That’s exactly what happens when you use the same AI model for every coding task. You’re throwing a heavyweight punch when all you need is a quick dodge. You’re burning tokens, waiting unnecessarily, and not getting the best results.

The solution? Build a diverse arsenal. Know when to use speed, when to use power, and when to use precision.

My Discovery: Not All Models Are Created Equal

When I started using Antigravity daily, I noticed something frustrating:

I’d ask a simple question like “What does this function do?” and wait 30 seconds for a thinking model to process it.
I’d request a complex architectural refactor and get a surface-level response from a lightweight model.
I’d burn through expensive tokens on tasks that didn’t need that level of reasoning.

So I did what any developer would do: I asked the AI itself.

“Explain each model available in Antigravity and tell me the best use case for each.”

What I got back was eye-opening. Each model had a specialty – a specific scenario where it excelled. Using the wrong model wasn’t just inefficient; it was like using a sledgehammer to hang a picture frame.

From that moment, I made it a daily practice to be aware of which model I was using and why.

My Personal Journey: From Claude to Gemini and Back

I’ve been using Claude since version 3.5, and it’s been fantastic for most of my work. When Gemini 2.5 Pro came out, I tried it once or twice, but honestly, it wasn’t convincing enough to make me switch. I quickly jumped back to Claude.

Recently, I’ve been exploring Gemini 3 Pro, and I have to say it works great, especially for coding. The way it handles implementation tasks is impressive. But when it comes to explanation and learning? I’m still leaning toward Claude most of the time.

Why? Because Claude feels more natural for breaking down complex concepts, code reviews, documentation, and UI work. Gemini 3 Pro shines when I need to build features and write code, but Claude is my go-to for understanding and learning.

That said, this experience taught me something important: no single model is perfect for everything. That’s exactly why I started paying attention to which model I use and when.

The Antigravity Model Arsenal

Here’s what I learned. Think of these models as different fighters in your corner, each with their own specialty:

🏃 Gemini 3 Flash – The Speedster

Best for: Quick explanations, code walkthroughs, searching logs

Flash is your scout. It’s fast, handles massive context windows, and gives you answers in seconds. When you just need to understand what a function does or navigate a large codebase, Flash is your go-to.

Don’t use it for: Complex refactoring or building new features. It’s built for speed, not deep reasoning.

🥊 Gemini 3 Pro (Low) – The Daily Driver

Best for: Feature implementation, writing tests, standard coding tasks

This is your workhorse. It’s smart enough for 90% of your daily coding tasks but doesn’t burn through tokens like the heavyweight models. If you’re adding a new function, writing a test, or implementing a straightforward feature, Pro Low is perfect.

Don’t use it for: Massive architectural changes or complex debugging. For that, you need more firepower.

💪 Gemini 3 Pro (High) – The Heavyweight

Best for: Building entire modules, complex architectural changes, deep logic debugging

When you need maximum reasoning power, Pro High is your champion. It thinks about the entire architecture, ensures scalability, and handles intricate logic. This is the model you use when you’re building something from scratch or refactoring a critical system.

Don’t use it for: Simple questions or quick explanations. You’re wasting its potential (and your tokens).

🎨 Claude Sonnet 4.5 – The Artist

Best for: Code reviews, documentation, UI/UX work, CSS styling

Claude is your craftsman. It excels at aesthetic judgment, writing beautiful documentation, and creating polished UI components. If you need premium CSS with glassmorphism and smooth animations, Claude is your model.

Don’t use it for: Pure algorithmic logic or performance-critical code. That’s Gemini’s domain.

🧠 Claude Sonnet 4.5 (Thinking) – The Detective

Best for: Complex debugging, tracing execution flows, “weird” bugs

When you’ve been staring at a bug for hours and can’t figure it out, call in the detective. Thinking mode traces through logic step-by-step, often catching subtle issues other models miss. It’s slower, but when accuracy matters more than speed, it’s worth it.

Don’t use it for: Simple tasks or exploratory questions. The extended reasoning is overkill.

🚀 Claude Opus 4.5 (Thinking) – The Final Boss

Best for: Massive migrations, extremely difficult logic puzzles, when other models fail

This is your nuclear option. Opus is the most powerful reasoning model available. Use it for framework migrations, refactoring across dozens of files, or solving problems that have stumped every other model.

Don’t use it for: Anything else. It’s slow, expensive, and overkill for 99% of tasks.

My Daily Practice: Choosing the Right Fighter

Now, every time I open Antigravity, I ask myself:

“What am I trying to do, and which model is best for this?”

Here’s how I think about it:

🔍 Just trying to understand code?

→ Gemini 3 Flash. Fast, efficient, perfect for exploration.

🏗️ Building a new feature?

→ Gemini 3 Pro (Low). My daily driver for standard work.

🧐 Reviewing code for quality?

→ Claude Sonnet 4.5. It gives human-like feedback and catches style issues.

🐛 Debugging a complex issue?

→ Claude Sonnet 4.5 (Thinking). Let it trace through the logic step-by-step.

🚀 Refactoring an entire module?

→ Gemini 3 Pro (High). Maximum reasoning for architectural changes.

🔄 Migrating a framework?

→ Claude Opus 4.5 (Thinking). The God-Mode for the hardest tasks.

The One-Leg-Kick Metaphor in Action

Let’s say I’m working on a new feature for my blog. Here’s how I’d approach it:

Understanding the existing code > Use Flash to quickly scan through the codebase and understand the current structure.
Implementing the feature > Switch to Pro (Low) to write the new functionality.
Reviewing the code > Ask Claude Sonnet 4.5 to check for readability and style issues.
Debugging a weird bug > If something breaks, I escalate to Claude Sonnet (Thinking) to trace the issue.
Feeling skeptical > If I miss any edge cases, I’d engage the 🚀 Final Boss for review.

Each model is a different move in my martial arts arsenal. I’m not throwing the same kick every time, I’m adapting to the situation.

Why This Matters

Before I started this practice, I was:

❌ Wasting tokens on expensive models for simple tasks
❌ Waiting unnecessarily for thinking models to process trivial questions
❌ Getting subpar results because I was using the wrong tool for the job

Now, I’m:

✅ Using the right model for the right task
✅ Saving tokens and money
✅ Getting better, faster results

The best model is the one that gets your job done efficiently.

Don’t be a one-leg-kick developer. Build your arsenal. Know your models. Make it a daily practice.

🤷 When in Doubt: The Safe Default

Can’t decide which model to use? Start with Claude Sonnet 4.5.

It’s the jack of all trades – not the absolute best at anything, but competent at almost everything:

Aspect	Rating
Speed	Fast (not as fast as Flash, but quick)
Reasoning	Strong (handles most tasks well)
Cost	Moderate (not burning premium tokens)
Versatility	High (good at code, docs, reviews, UI)

Works well for:

✅ Feature implementation
✅ Code review
✅ Documentation
✅ UI/UX work
✅ General questions about code

Escalate when:

❌ Massive architectural changes → Go to Gemini 3 Pro (High)
❌ Weird, stubborn bugs → Go to Claude Sonnet (Thinking)
❌ Just exploring/reading code → Downgrade to Gemini 3 Flash (save tokens)

“When in doubt, start with Claude Sonnet 4.5. If it struggles, escalate to Thinking mode or Pro High. If the task is simple exploration, downgrade to Flash.”

📋 Quick Reference: The Model Cheat Sheet

Task	Model	Tier	Why Use This?
Understanding code	Gemini 3 Flash	⚡ Speedster	Speed + massive context windows for quick exploration
Daily feature work	Gemini 3 Pro (Low)	⚙️ Standard	Balanced performance for 90% of coding tasks
Building modules	Gemini 3 Pro (High)	🥊 Heavyweight	Maximum reasoning for architectural thinking
Code review	Claude Sonnet 4.5	📝 Articulate	Human-like feedback, style & readability focus
UI/UX design	Claude Sonnet 4.5	📝 Articulate	Aesthetic judgment + premium design principles
Documentation	Claude Sonnet 4.5	📝 Articulate	Exceptional writing skills + precise formatting
Complex debugging	Claude Sonnet 4.5 (Thinking)	🧠 Analytical	Step-by-step logic tracing for weird bugs
Massive refactors	Gemini 3 Pro (High)	🥊 Heavyweight	Architectural changes + intricate logic
Framework migrations	Claude Opus 4.5 (Thinking)	🚀 God-Mode	Ultimate reasoning when everything else fails

Final Thought

The one-leg-kick approach might work in movies, but in real development, you need versatility. You need speed when exploring, power when building, and precision when polishing.

Start paying attention to which model you’re using. Make it a daily practice. Your tokens and your productivity will thank you.

Now go build something amazing. 🚀

January 18, 2026

The Paradigm Shift: Context Management

I thought I was being smart.

I had dozens of user guides, and system documentation GDocs scattered across my drive. So I did what seemed obvious in 2026 I imported everything into NotebookLM. Every single guide. All at once and thought magic will happen.

“Now the AI has access to everything,” I thought. “It’ll be amazing.”

Then I asked a simple question about a specific feature.

The AI gave me an answer. But it was wrong. Not completely wrong but worse than that. It was a mix of information from three different guides. The AI is unable to understand the question identify which guide to go.

I had given the AI more context, and somehow got worse results.

That’s when it hit me: Having context isn’t enough. You need to manage it.

What I Learned the Hard Way

In 2025, I wrote about meta-prompts and how to craft better prompts. Meta-prompts worked great for refining my questions and getting better responses.

But then I started using NotebookLM, and something unexpected happened.

I thought giving AI access to all my documentation would make everything even better. Instead, it opened my eyes to something I’d completely overlooked: context management matters just as much as prompt engineering.

The problem wasn’t how I was asking it was how I was organizing what I gave the AI to work with.

What Changed in 2026

The AI models got smarter. Not just incrementally better fundamentally different in how they understand us.

The old way (2024-2025):

Models were literal. “Write code” != “Write clean, production-ready code”
You had to specify every constraint
Ambiguous phrasing > Model gets confused or refuses
English fluency mattered

The new reality (2026):

Models infer “best practice” defaults automatically
If you ask for code, it assumes you want it runnable
Models use reasoning to bridge gaps in your phrasing
“Bad English” still yields “Good Logic”

The result: Prompt engineering is still important, but relatively less critical than it used to be. The bar for “good enough” prompts got much lower.

The Shift: From “How” to “What”

Old Focus (2024-2025)	New Focus (2026)
❌ “What magic words do I use?”	✅ “Does the AI have the right context?”
❌ Optimizing sentence structure	✅ Organizing files and data properly
❌ Copy-pasting context manually	✅ Consolidating data into unified platforms
❌ “Act as an expert…”	✅ “Here are the actual files…”

Bottom line: Stop obsessing over how you ask. Start obsessing over what you provide.

The “Garbage In” Problem (GIGO)

Here’s the brutal truth: No amount of prompt engineering can fix missing information.

Scenario: Q4 Sales Report

Perfect Prompt (No Data):

Act as a CFO with 20 years of experience. Write a comprehensive Q4 
sales analysis with insights on trends, recommendations for Q1, and 
executive summary. Use professional business language and include 
data-driven insights.

Act as a CFO with 20 years of experience. Write a comprehensive Q4 
sales analysis with insights on trends, recommendations for Q1, and 
executive summary. Use professional business language and include 
data-driven insights.

Outcome: ❌ Beautifully written fiction. The AI will hallucinate numbers, trends, and insights because it has nothing real to work with.

Basic Prompt (With Data):

Analyze this Q4 sales data and summarize key trends

[Attach Q4_Data.csv]

Analyze this Q4 sales data and summarize key trends

[Attach Q4_Data.csv]

Outcome: ✅ Accurate, factual summary based on real data. Not as polished as the perfect prompt would produce, but grounded in reality instead of hallucination.

The lesson: The bottleneck is no longer the instruction (the prompt). It’s the source material (the context).

The Chef Analogy

Think of it this way:

Model = Master Chef 👨‍🍳
Prompt = The Order Ticket 🎫 (“Make a steak”)
Context = The Ingredients in the Fridge 🥩

Old Era (2024-2025):
You had to write the ticket precisely: “Cook steak, medium-rare, sear 2 mins each side, rest 5 mins.”

New Era (2026):
The Chef is a master. You just say “Steak.” He knows how to cook it.

The Problem:
If the fridge is empty (No Context), even the best Chef in the world cannot make you a steak. He can only serve you a picture of a steak (Hallucination).

Verdict: Stop trying to write better tickets. Start stocking the fridge.

But here’s the catch: You can’t just throw everything in the fridge and call it done.

The NotebookLM Problem: Context Dumping != Context Management

Remember my NotebookLM disaster? That’s what happens when you confuse having context with managing context.

What Went Wrong:

I imported multiple guides for the same system, each covering a different area:

User Guide (complete, accurate, up-to-date)
Configuration Guide (complete, accurate, up-to-date)
Quick Start (complete, accurate, up-to-date)
Ops Guide (complete, accurate, up-to-date)
Troubleshooting FAQ (mixed topics across all areas)

Each document alone worked perfectly. The content was relevant, correct, and helpful.

But together? Chaos.

When I asked: “How do I configure user permissions?”

The AI couldn’t figure out which area of the system I was asking about. It would:

Pull configuration steps from the Configuration Guide
Mix in troubleshooting tips from the FAQ
Add best practices from the Ops Guide that didn’t apply to my question

The result: A Frankenstein answer that was technically correct for each source, but completely useless for my actual question.

The AI had access to everything, but it couldn’t locate which document was most relevant to my specific question.

The Real Problem:

It’s not “Garbage In, Garbage Out” (GIGO).
It’s “Too Much In, Can’t Figure Out Which” (TMICFOW? Okay, that acronym doesn’t work 😅).

The AI had access to everything, but it couldn’t tell:

Which area of the system I was asking about
Which document was most relevant to my specific question
How to disambiguate between similar topics across different guides

This is the context management problem: Not bad data, but unorganized data that the AI can’t navigate effectively.

What I’m Learning

I haven’t solved this yet. I’m still figuring out the best way to organize context so AI can actually use it.

But I know the direction: Context Management.

The questions I’m exploring:

How do I structure documentation so AI knows which doc to use?
Should I use separate NotebookLM projects by topic?
How do I name files to make them more AI-friendly?
What’s the right level of granularity for splitting docs?
Can folder structure alone provide enough context clues?

Some ideas I’m testing:

Organizing by topic/area instead of dumping everything together
Using clear, descriptive file names that include the topic
Being more explicit in my questions (“…for end users” vs just “how to…”)
Selective importing – only bringing in docs relevant to the current task

I’ll share what I learn as I experiment.

The Two Skills Compared

2025: I focused on prompt engineering – how to ask better questions
2026: I’ll be put more effort on context management – how to organize knowledge so AI can use it

Aspect	Prompt Engineering 💬	Context Management 📁
Focus	How you ask ❓	What you provide 📦
Skill	Writing better prompts ✍️	Organizing information 🗂️
Problem	“The AI didn’t understand me” 🤷	“The AI couldn’t find the right info” 🔍
Solution	Refine your question 🎯	Structure your knowledge 🏗️

Both skills matter. But meta-prompts solved one problem. Context management is the next frontier and potentially the higher-leverage skill.

The Realization

I thought giving AI more context would automatically help.

Turns out, organized context is what matters.

It’s not enough to have all the information. The AI needs to be able to:

Locate the relevant document
Disambiguate between similar topics
Navigate your knowledge structure

This is a different skill than prompt engineering. And I’m just starting to learn it.

What This Means 💡

If you’re using tools like NotebookLM, Notion AI, or any AI with document access:

The problem isn’t just what you ask.
It’s how you’ve organized what the AI has access to.

A perfect prompt with zero context = 🎭 Hallucination
A basic prompt with perfect context = ✅ Usable output

I don’t have all the answers yet. But I’m convinced this is the right direction. 😉

January 5, 2026

The Meta-Prompt Shortcut

I’ve come to realize that prompting is the most important communication protocol between us humans and the machines we work with. It’s how we translate our messy, context-rich thoughts into something an AI can understand and act upon.

For the longest time, I struggled with crafting prompts from scratch, second-guessing every word. Then one day, I came across this video.

It was a game-changer. The concept of a “meta-prompt”, a prompt that helps you write better prompts, saved me tons of effort. I could start with simple English and let the meta-prompt refine it into something effective.

Curious if I could improve it further, I asked AI to review the basic meta-prompt and suggest enhancements. This led to three versions: Basic, Balanced, and Comprehensive—each adding more structure and detail.

In practice though? I almost always stick with Basic.

Quick Selection Guide

Version	Best For	Token Usage	Output Detail
Basic	Quick refinements, simple prompts, daily use	Low	Concise
Balanced	Most use cases, practical improvements	Medium	Practical
Comprehensive	Complex prompts, professional work, learning	High	Detailed

Version 1: Basic (Recommended)

Use when: You need quick prompt improvements without extensive analysis

Best for:

Fast iterations
Simple prompt refinements
When you already know what you want
Casual use

You are an expert prompt engineer specializing in creating prompts for AI language models, particularly ChatGPT 5 Thinking model.

Your task is to take my prompt and transform it into a well-crafted and effective prompt that will elicit optimal responses.

Format your output prompt within a code block for clarity and easy copy-pasting.

You are an expert prompt engineer specializing in creating prompts for AI language models, particularly ChatGPT 5 Thinking model.

Your task is to take my prompt and transform it into a well-crafted and effective prompt that will elicit optimal responses.

Format your output prompt within a code block for clarity and easy copy-pasting.

Pros:

✅ Fast and efficient
✅ Low token usage
✅ Straightforward output

Cons:

❌ No structured analysis
❌ Limited guidance on improvements
❌ No explanation of changes

Version 2: Balanced

Use when: You want practical improvements with clear explanations

Best for:

Teaching prompt engineering to others
Documenting why certain prompts work
Team collaboration on prompt libraries
Learning the reasoning behind improvements

You are an expert prompt engineer specializing in AI language models, with expertise in ChatGPT-5 Thinking model.

Transform user prompts into effective, well-structured prompts that elicit optimal AI responses.

## Process:
1. Identify core intent and any ambiguities
2. Apply best practices: clarity, specificity, structure
3. Optimize for thinking model capabilities (reasoning, step-by-step analysis)
4. Preserve original intent and constraints

## Output:

**Refined Prompt:**
[Improved prompt here - in a code block]

**Key Improvements:** (3-5 bullet points)
- What changed and why it's better

**Usage Note:** Brief tip on when/how to use this prompt

You are an expert prompt engineer specializing in AI language models, with expertise in ChatGPT-5 Thinking model.

Transform user prompts into effective, well-structured prompts that elicit optimal AI responses.

## Process:
1. Identify core intent and any ambiguities
2. Apply best practices: clarity, specificity, structure
3. Optimize for thinking model capabilities (reasoning, step-by-step analysis)
4. Preserve original intent and constraints

## Output:

**Refined Prompt:**
[Improved prompt here - in a code block]

**Key Improvements:** (3-5 bullet points)
- What changed and why it's better

**Usage Note:** Brief tip on when/how to use this prompt

Pros:

✅ Clear methodology
✅ Explains improvements
✅ Practical and actionable
✅ Reasonable token usage

Cons:

❌ Less detailed than comprehensive version
❌ No deep analysis

Version 3: Comprehensive (Advanced)

Use when: You need comprehensive analysis and professional-grade refinements

Best for:

Professional prompt engineering consulting
Academic research and publications
Commercial prompt product development
High-stakes business applications where failure is costly

You are an expert prompt engineer specializing in creating prompts for AI language models, with deep expertise in ChatGPT-5 Thinking model's capabilities.

Your task is to transform user-provided prompts into well-crafted, effective prompts that elicit optimal responses from AI models.

## Core Responsibilities:

1. **Analyze the Original Prompt**
   - Identify the core intent and desired outcome
   - Recognize any ambiguities or missing context
   - Assess the target audience and use case

2. **Apply Prompt Engineering Best Practices**
   - Use clear, specific language
   - Structure information logically (context > task > constraints > format)
   - Include relevant examples when beneficial
   - Define success criteria explicitly
   - Leverage thinking model capabilities (reasoning, step-by-step analysis)

3. **Optimize for ChatGPT-5 Thinking Model**
   - Encourage explicit reasoning when needed
   - Break complex tasks into logical steps
   - Use meta-prompting techniques for self-reflection
   - Balance between guidance and creative freedom

4. **Preserve Critical Elements**
   - Maintain the original intent and requirements
   - Keep domain-specific terminology accurate
   - Preserve any constraints or preferences specified

## Output Format:

Provide your response in this structure:

### Analysis
- Brief assessment of the original prompt (2-3 sentences)
- Key improvements needed

### Refined Prompt
[The improved prompt in a code block for easy copying]

### Explanation of Changes
- List 3-5 key improvements made
- Explain why each change enhances effectiveness

### Usage Tips
- Suggest optimal scenarios for this prompt
- Note any variables the user should customize

## Quality Criteria:

A well-crafted prompt should be:
- **Clear**: Unambiguous instructions and expectations
- **Specific**: Concrete details about desired output
- **Structured**: Logical flow and organization
- **Complete**: All necessary context provided
- **Actionable**: Easy for the AI to execute

## Iteration:

After providing the refined prompt, ask: "Would you like me to adjust any aspect of this prompt, such as tone, specificity, or structure?"

You are an expert prompt engineer specializing in creating prompts for AI language models, with deep expertise in ChatGPT-5 Thinking model's capabilities.

Your task is to transform user-provided prompts into well-crafted, effective prompts that elicit optimal responses from AI models.

## Core Responsibilities:

1. **Analyze the Original Prompt**
   - Identify the core intent and desired outcome
   - Recognize any ambiguities or missing context
   - Assess the target audience and use case

2. **Apply Prompt Engineering Best Practices**
   - Use clear, specific language
   - Structure information logically (context > task > constraints > format)
   - Include relevant examples when beneficial
   - Define success criteria explicitly
   - Leverage thinking model capabilities (reasoning, step-by-step analysis)

3. **Optimize for ChatGPT-5 Thinking Model**
   - Encourage explicit reasoning when needed
   - Break complex tasks into logical steps
   - Use meta-prompting techniques for self-reflection
   - Balance between guidance and creative freedom

4. **Preserve Critical Elements**
   - Maintain the original intent and requirements
   - Keep domain-specific terminology accurate
   - Preserve any constraints or preferences specified

## Output Format:

Provide your response in this structure:

### Analysis
- Brief assessment of the original prompt (2-3 sentences)
- Key improvements needed

### Refined Prompt
[The improved prompt in a code block for easy copying]

### Explanation of Changes
- List 3-5 key improvements made
- Explain why each change enhances effectiveness

### Usage Tips
- Suggest optimal scenarios for this prompt
- Note any variables the user should customize

## Quality Criteria:

A well-crafted prompt should be:
- **Clear**: Unambiguous instructions and expectations
- **Specific**: Concrete details about desired output
- **Structured**: Logical flow and organization
- **Complete**: All necessary context provided
- **Actionable**: Easy for the AI to execute

## Iteration:

After providing the refined prompt, ask: "Would you like me to adjust any aspect of this prompt, such as tone, specificity, or structure?"

Pros:

✅ Thorough analysis and methodology
✅ Structured output format
✅ Quality criteria checklist
✅ Iteration capability
✅ Educational value

Cons:

❌ Higher token usage
❌ More verbose output
❌ May be overkill for simple prompts

Reality Check: What You Actually Need

Aspect	Basic	Balanced	Comprehensive
Real-world usage	🟢 Daily driver	🟡 Occasional	🔴 Rare
Actual value	✅ Gets the job done	⚠️ Nice-to-have	⚠️ Overthinking
Output quality	✅ Good enough	✅ Slightly better	✅ Marginally better
Best for	Quick refinements, daily tasks	Understanding improvements	Professional documentation
When you’ll actually use it	Every single day	Maybe once a month	Almost never
Typical scenarios	“Make this prompt better”	“Why is this prompt better?”	“Document this for a client”
Who needs this	Everyone	Learners & team leads	Consultants & researchers
Iteration speed	🟢 Fast (try > refine > done)	🟡 Moderate	🔴 Slow (analysis paralysis)

The Honest Truth

Basic is enough for 95% of use cases. Here’s why:

Modern AI models are smart enough to understand intent without hand-holding
The quality gap is minimal – Basic produces 90% of what Comprehensive produces
Speed matters – You’ll iterate faster with Basic than perfect it with Comprehensive
The real bottleneck isn’t the prompt – It’s the context you provide (more on this later)

When to Actually Use Each Version

Basic (Your default choice):

✅ Writing better emails or messages
✅ Refining code-related prompts
✅ Improving creative writing requests
✅ Daily work tasks
✅ Personal projects
Reality: This handles everything you need

Balanced (Rare occasions):

Teaching someone prompt engineering
Explaining to your team why a prompt works
Building a shared prompt library at work
Learning the “why” behind good prompts
Reality: You’ll probably skip this entirely

Comprehensive (Almost never):

Delivering prompts to paying clients
Writing academic papers on AI
Building commercial prompt products
Mission-critical business applications
Reality: Unless this is your job, you don’t need this

Conclusion

The meta-prompt concept is powerful, but don’t overthink it. Basic handles 95% of what you need.

Here’s my honest recommendation:

Start with Basic – Copy it, use it, see if it works for you
Stick with Basic – Unless you have a specific reason to upgrade
Focus on context – Spend your energy organizing your files and data, not perfecting your prompts

The three versions exist to show you options, but in practice, I use Basic almost exclusively. The real game-changer isn’t finding the perfect meta-prompt—it’s understanding that context beats clever wording every time.

Save yourself the mental overhead. Use Basic. Move on to what actually matters.

October 2, 2025