AI Should Review Your Thinking, Not Replace It

February 2026 | Aniket Sinha | 9 min read

A mildly embarrassing confession: I asked GPT how to stop AI from killing my critical thinking.

It didn’t say “use me less.” It said: use me differently. Stay the primary thinker. Let the model argue with you, not think for you.

That advice is inspired by a talk from Advait Sarkar (Microsoft Research) that’s been stuck in my head. His point isn’t that AI is “bad”. It’s that we’re drifting into an outsourced-reasoning routine, where a knowledge worker’s job quietly turns into approving machine-generated content all day.

Efficient, sure.

But if you’re a Staff Software Engineer, “efficiency” isn’t the high bar. Judgment is.

So here’s the version that actually works in real engineering: design docs, reviews, incidents, and the weekly “wait, why are we doing this?” conversations.

The Trap: It Feels Like a Normal Tuesday

None of these are crazy. That’s why the pattern sticks: your day starts to look productive while your contact with the work gets indirect.

Sarkar has a line for it: we become middle managers for our own thoughts. Not doing the thinking, but managing the process of thinking.

If you’ve ever stared at an AI-generated paragraph and thought, “I mean... I guess?” — you know exactly what he is talking about.

What You Lose (Quietly)

The loss isn’t obvious. It shows up as trade-offs once AI becomes your default first step:

That last one matters more than it sounds. Staff-level work is basically building mental models, spotting risks early, and making good trade-offs under uncertainty. If your day trains that muscle out of you, it’s a slow leak.

The Fix: Stop Treating AI Like an Author

Best mental shift: AI should review your thinking, not replace it.

Not “write the design doc.” Instead: “Here’s my design. Tell me where it breaks.”

Not “diagnose the incident.” Instead: “Here are the symptoms and signals. Give me hypotheses and what would confirm each.”

That’s the difference between outsourcing and leverage.
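
To make that concrete, here’s a rough sketch of the two framings as prompt strings. The wording is illustrative, not a prescription; the only point is who stays the author.

# Two ways to frame the same request. The first makes the model the author;
# the second keeps me the author and uses the model as a reviewer.

AUTHOR_MODE = "Write a design doc for adding rate limiting to our API."

REVIEWER_MODE = """Here is my design for adding rate limiting to our API:

{my_design}

Do not rewrite it. Tell me where it breaks: hidden assumptions, failure modes,
and the single riskiest decision I have made."""

def reviewer_prompt(my_design: str) -> str:
    # The artifact stays mine; the model only gets to critique it.
    return REVIEWER_MODE.format(my_design=my_design)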

My Default Loop: Think → Commit → Consult

This is the pattern I keep coming back to. It’s boring, which is part of why it works.

1) Think (10–20 minutes, solo)

Before I ask AI anything substantive, I force myself to write a scrappy first pass:

It doesn’t have to be pretty. It just has to be mine.

2) Commit (one small artifact)

I turn that into something reviewable: a short design note, ADR, PR comment, incident update—whatever fits.

For engineering decisions, I try to include a few standard blocks; you can build your design-doc template around them (a minimal sketch follows below).

This is the part people skip because it feels slow. It also prevents the “we shipped it, but nobody can explain it” problem later.
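
Here’s a minimal sketch of what I mean by a reviewable artifact. The section names are illustrative (swap in whatever blocks your template settles on); the point is that every block forces a claim a reviewer can push back on.

# A minimal, reviewable decision note. Section names are illustrative;
# each block exists to force a claim a reviewer can push back on.
DECISION_NOTE_TEMPLATE = """\
{title}

Context:
{context}

Decision:
{decision}

Alternatives considered:
{alternatives}

Risks and mitigations:
{risks}

Rollback / exit plan:
{rollback}

Open questions:
{open_questions}
"""

def render_note(**sections: str) -> str:
    # Missing sections render as an explicit TODO instead of silently vanishing.
    fields = ("title", "context", "decision", "alternatives",
              "risks", "rollback", "open_questions")
    filled = {name: sections.get(name, "TODO") for name in fields}
    return DECISION_NOTE_TEMPLATE.format(**filled)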

3) Consult (AI + humans)

Now I bring in AI—specifically to attack the artifact.

And yes, I still ask humans (peers), especially for security, data-integrity, or migration work that can ruin a quarter. AI doesn’t get to be the only reviewer in the room.

“Should I Create AI Assistants/Agents for This?”

Yes—but don’t build a giant autopilot system that does things end-to-end. That’s how you get output without ownership.

Instead, create a small “review board”. Think of them as recurring roles you’d want in a design review, but available anytime.

The set that keeps paying off: a skeptical red-team reviewer, an SRE focused on operability, a security reviewer, a performance-and-cost reviewer, and a test-strategy reviewer (the same personas as the Python runner below).

Key detail: they should push back. If they only agree with you, you basically built a yes-man.

Where This Helps in Day-to-Day Staff Work

Design docs (where mistakes get expensive)

Before I ask for feedback, I write two lists:

Then I ask AI questions like:

This isn’t about paranoia. It’s about catching the boring disasters early.
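
The questions look roughly like this. The wording is mine and illustrative, and ask is a stand-in for whatever LLM client function you use (the same idea as call_llm in the runner further down).

from typing import Callable, List

# Illustrative design-review prompts. Each one attacks a specific axis of the
# design instead of asking for a summary or a compliment.
DESIGN_REVIEW_PROMPTS = [
    "List the assumptions this design only works under. Which are unverified?",
    "What is the most expensive way this could fail in production?",
    "Which component would you delete to make this simpler, and what would we lose?",
    "What change in load, data shape, or dependencies would force a redesign within a year?",
]

def design_review(design_doc: str, ask: Callable[[str], str]) -> List[str]:
    # ask is a placeholder: a function that sends one prompt to your LLM client.
    return [ask(f"{question}\n\nDesign:\n{design_doc}") for question in DESIGN_REVIEW_PROMPTS]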

PR reviews (where humans miss the same things repeatedly)

I review first. Then I ask AI to hunt for a few specific categories humans are bad at catching when tired; my usual list is sketched below.

Sometimes it finds nothing. That’s fine. The habit is what matters.
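
A sketch of the categories I point it at. The list is illustrative and worth tuning to whatever your team keeps missing.

# Categories to hand the model after my own review pass. Illustrative, not exhaustive.
TIRED_REVIEWER_CHECKS = [
    "Off-by-one and boundary conditions in loops, pagination, and date math",
    "Error paths: swallowed exceptions, missing timeouts, unbounded retries",
    "Concurrency: shared mutable state, missing idempotency, race-prone handlers",
    "Resource cleanup: connections, file handles, temp files, background tasks",
    "Compatibility: config, schema, and API contract changes that break old clients",
]

def pr_review_prompt(diff: str) -> str:
    # I have already reviewed the diff; the model only checks the listed categories.
    checks = "\n".join(f"- {check}" for check in TIRED_REVIEWER_CHECKS)
    return (
        "I have already reviewed this diff myself. Check it only for the categories "
        f"below, and cite exact lines for anything you flag:\n{checks}\n\nDiff:\n{diff}"
    )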

Incidents (where speed matters, but certainty doesn’t exist)

I use AI for hypothesis generation, not conclusions:

If an AI can’t point to signals (logs/metrics/traces), it’s storytelling. In an incident, storytelling is how you lose hours.
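
A sketch of how I structure that ask. The Hypothesis shape and the prompt wording are my own illustration; the non-negotiable part is that every hypothesis comes with signals that would confirm or kill it.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Hypothesis:
    """One candidate explanation, tied to the signals that decide it.
    Useful for recording the model's answers in the incident doc."""
    summary: str
    confirming_signals: List[str] = field(default_factory=list)
    refuting_signals: List[str] = field(default_factory=list)

INCIDENT_PROMPT = """Symptoms and signals so far:
{observations}

Give me 3 to 5 hypotheses. For each one, name the specific log, metric, or trace
that would confirm it and the one that would rule it out. If you cannot name a
signal, say so instead of guessing."""

def hypothesis_prompt(observations: str) -> str:
    # The model proposes; the telemetry decides. No signal, no hypothesis.
    return INCIDENT_PROMPT.format(observations=observations)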

Strategy (where the clean narrative is often wrong)

I’ll ask it to argue against me:

You don’t need the model to be right. You need it to force you to defend (or refine) the reasoning.
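
The prompt itself is simple; this wording is illustrative.

# A "steelman the other side" prompt. The goal is not a verdict; it is to hear
# the strongest objections before someone else raises them.
COUNTER_PROMPT = """Here is the strategy I am leaning toward:

{strategy}

Argue the opposite position as persuasively as you can: what would a smart
skeptic cite, what am I assuming about the next 12 months, and under what
conditions does my plan fail?"""

def counter_argument_prompt(strategy: str) -> str:
    return COUNTER_PROMPT.format(strategy=strategy)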

A Small Python “Review Board” Runner (Optional, but Useful)

This keeps the workflow repeatable. No autonomy, no magic. Just structured dissent on demand.

from dataclasses import dataclass
from typing import List, Dict

@dataclass
class Persona:
    name: str
    system: str

PERSONAS = [
    Persona("RedTeam", "You are a skeptical principal engineer. Find flaws, edge cases, hidden assumptions."),
    Persona("SRE", "You are an SRE. Focus on operability: SLOs, observability, rollout/rollback, failure modes."),
    Persona("Security", "You are a security engineer. Threat model, data exposure, abuse cases, authz/authn."),
    Persona("PerfCost", "You focus on performance and cost. Identify hot paths, scaling risks, caching, complexity."),
    Persona("TestStrategy", "You focus on tests and invariants. Propose a minimal test plan that catches regressions."),
]

def review_pack(context: str, draft: str) -> List[Dict[str, str]]:
    """
    Replace call_llm(...) with your LLM API client.
    Keep this review-only: AI critiques your artifact; it does not write it from scratch.
    """
    reviews = []
    for p in PERSONAS:
        prompt = f"""
Context:
{context}

My draft:
{draft}

Instructions:
1) Ask up to 5 clarifying questions first (if needed).
2) Then critique under:
   Correctness, Edge Cases, Operability, Security, Performance/Cost, Simplicity, Testing.
3) Provide 3 concrete recommended changes to the draft.
"""
        # resp = call_llm(system=p.system, user=prompt)
        resp = "<LLM_RESPONSE>"
        reviews.append({"persona": p.name, "response": resp})
    return reviews
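
A quick usage sketch; the context and draft strings here are placeholders.

if __name__ == "__main__":
    context = "Service X needs rate limiting before an expected traffic spike."
    draft = "Proposal: token bucket at the API gateway, counters stored in Redis."

    # Print each persona's critique of the draft.
    for review in review_pack(context, draft):
        print(f"--- {review['persona']} ---")
        print(review["response"])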

The Guardrails I Try to Follow (Because I Don’t Always)

These sound obvious. They’re also the first things to slip when you’re busy.

Closing Thought

AI can absolutely make you faster. That’s real value.

But if it trains you to skip framing, avoid skepticism, and accept polished text you didn’t truly earn, you’ll feel the cost later—usually at the worst time.

The question I keep coming back to is simple:

Am I using this to get an answer… or to become harder to fool?

Further Reading