Developer’s AI Comprehension Plugin and OpenAI’s Massive Burn Rate

Compact Conversations for 2026-06-20: 6 AI stories worth knowing, in just 5 minutes.

The Lead: I kept shipping code with Claude Code that I couldn’t explain, so I built a plugin that quizzes me on it before I’m allowed to move on

A developer created the No‑Numb plugin for Claude Code, which quizzes users on newly written code before allowing them to continue, forcing comprehension of AI‑generated changes.

Why it matters: Ensures developers understand AI‑generated code, improving maintainability, debugging, and accountability in enterprise AI workflows.

Source: github.com
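
For readers who want to see the shape of the idea, here is a minimal sketch of a quiz gate in Python. It is not the No-Numb source and does not use Claude Code's hook interface. Every name in it (`Question`, `should_quiz`, `run_quiz`, `gate`) is hypothetical, and the 5-line threshold and 3-attempt cap are assumptions added so the example terminates. It only illustrates the behavior described above: skip trivial edits, quiz on real ones, show the answer after a miss, and hold the session until the quiz is passed.

```python
from dataclasses import dataclass


@dataclass
class Question:
    prompt: str
    choices: list[str]
    answer: int        # index of the correct choice
    explanation: str   # shown after a wrong answer


def should_quiz(changed_lines: int, min_lines: int = 5) -> bool:
    """Quiz only when the turn edited a non-trivial amount of code.

    The 5-line default is an arbitrary assumption for illustration.
    """
    return changed_lines >= min_lines


def run_quiz(questions, get_answer, max_attempts: int = 3) -> bool:
    """Return True once all questions are answered correctly in one pass.

    A wrong answer prints the correct choice and its explanation, then
    the quiz restarts. The attempt cap is an assumption, not plugin behavior.
    """
    for _ in range(max_attempts):
        passed = True
        for q in questions:
            if get_answer(q) != q.answer:
                print(f"Correct answer: {q.choices[q.answer]}. {q.explanation}")
                passed = False
                break
        if passed:
            return True
    return False


def gate(changed_lines: int, questions, get_answer) -> str:
    """Decide whether the session may continue after a turn."""
    if not should_quiz(changed_lines):
        return "continue"
    return "continue" if run_quiz(questions, get_answer) else "blocked"
```

In practice `get_answer` would prompt a person; passing it in as a function keeps the gate easy to test with scripted answers.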

The Feed

Docs: OpenAI burned through $3.7B in Q1, on revenue of $5.7B, and ended the quarter with $73B+ in cash and marketable securities vs. $40B at the end of December (Erin Woo/The Information)

OpenAI burned through $3.7 billion in cash in Q1 2026 on revenue of $5.7 billion, ending the quarter with more than $73 billion in cash and marketable securities, up from $40 billion at the end of December.

Why it matters: Highlights growing financial pressure on AI firms and the need for cost‑efficient infrastructure planning.

Source: Techmeme

‘We created a monster’: companies rein in AI usage as costs strain budgets

Amazon, Walmart and Uber are among early adopters that have introduced caps or discouraged wasteful AI activity, citing runaway costs.

Why it matters: Shows enterprises are imposing usage limits to control AI spend, influencing budgeting and governance practices.

Source: Financial Times

Testing Mythos and Fable, Moving Beyond SWE-bench, Nvidia’s Open Contender

Discussion of Anthropic’s Claude Fable 5 restrictions, evaluation challenges, and new benchmarks like DeepSWE, ProgramBench, and ITBench‑AA, plus Nvidia’s Nemotron 3 Ultra release.

Why it matters: Impacts how AI capabilities are measured and adopted, informing benchmark selection and model governance.

Source: The Batch Newsletter - The Batch | DeepLearning.AI

The US FERC approves new orders to fast-track data center power requests, aiming to handle them in 90 days, while bringing new requirements for AI hyperscalers (Bloomberg)

U.S. regulators aim to process data‑center power requests in 90 days to alleviate bottlenecks for AI infrastructure, with new requirements for hyperscalers.

Why it matters: Affects scaling of AI compute resources and regulatory compliance for large‑scale deployments.

Source: Techmeme

New benchmark exposes how badly AI struggles with real knowledge work

Even the best AI models solve only about 3 percent of realistic knowledge‑work tasks, revealing gaps between benchmark performance and practical utility.

Why it matters: Underscores the need for realistic evaluation and cautious expectations in enterprise AI adoption.

Source: The Decoder

One Thing to Try

After an AI agent finishes a code change, pause and ask it to explain what it did and why it chose that approach, then verify the answer before proceeding.

Transcript

Host A: Welcome to Compact Conversations, the show that compresses the day’s AI news into 5 minutes.

Host A: [curious] Today’s lead is about a developer who realized they were shipping code with Claude Code that they couldn’t actually explain. The developer, who goes by Ciucky on GitHub, built a plugin called No-Numb that forces them to understand what the AI just wrote.

It’s a Claude Code plugin, so it runs inside Claude Code itself, and it fires after Claude finishes writing code. A stop hook triggers and makes Claude quiz you with multiple-choice questions about what it just wrote. The session can’t continue until you pass the quiz. Get one wrong and it shows the answer and explains why, then you retake it.

Host B: [with a small lift] The plugin has two modes: standard, which asks conceptual questions like why this approach or what breaks if you change something, and deep mode, where you actually have to go read the code to answer. It only fires on turns that actually edited code and skips trivial stuff. The developer says it’s intentionally a hook, not just a skill, because a skill can be ignored and a hook can’t. That’s the whole point.

Host B: One number to know today: three point seven billion dollars. That’s how much OpenAI burned through in the first quarter of 2026, according to documents reported by The Information. The company brought in five point seven billion in revenue over the same period, but the costs of running frontier AI models continue to outpace even that massive growth.

Host A: [conversational] The Financial Times reports that companies are starting to rein in their AI usage as costs strain budgets. Amazon, Walmart, and Uber are among early adopters that have introduced caps or are discouraging wasteful activity. One executive quoted in the piece said ‘we created a monster,’ referring to how quickly AI usage and costs spiraled beyond initial projections.

Host B: These companies were among the first to deploy AI at scale, and now they’re finding that without careful management, the costs balloon unexpectedly. The Financial Times says they’re implementing usage limits, monitoring tools, and clearer guidelines about what constitutes productive versus wasteful AI activity.

Host A: Bloomberg reports that the U.S. Federal Energy Regulatory Commission has approved new orders to fast-track data center power requests. The aim is to handle these requests within 90 days, a dramatic acceleration from a process that can currently take years.

Host B: The move is designed to remove bottlenecks that risked slowing the AI infrastructure buildout. The orders also bring new requirements for AI hyperscalers, according to Bloomberg.

Host A: [thoughtful] The Decoder reports on a new benchmark that exposes how badly AI struggles with real knowledge work. Even the best AI models fail at realistic knowledge tasks, fully solving just three percent of them. The benchmark moves beyond simple question-answering to test actual work completion.

Host B: DeepLearning.AI’s newsletter covers testing challenges with Anthropic’s Claude Fable 5 model. Independent evaluators reported they couldn’t fully evaluate the model because it refused some test prompts or routed them to less capable versions, making direct measurement impossible.

Host A: And Nvidia has released Nemotron 3 Ultra, a 550-billion-parameter open-weights model built for long-running agentic tasks. It’s the highest-scoring U.S. open-weights model on the Artificial Analysis Intelligence Index, though it trails some Chinese models. The weights and training data are freely available, which is notable given how few frontier models are fully open.

Host B: [lighter] One thing to try: if you’re using Claude Code or any AI agent to write code, steal the No-Numb idea. After the agent finishes a chunk of work, pause and ask it to explain what it just did and why it chose that approach over alternatives. Don’t just read the diff and move on.

Host A: You don’t need a plugin to do this. Just ask the question in the same conversation. The point is the same: if you can’t explain it, you don’t own it. And if you don’t own it, you can’t maintain it or debug it when it breaks.

Host A: That’s Compact Conversations for Saturday. More AI news tomorrow. Until then, happy prompting.