> SPOTLIGHT
WHAT MATTERS TODAY

Claude Mythos Preview found thousands of zero-day vulnerabilities across every major operating system and web browser, including a 17-year-old remote code execution flaw in FreeBSD that had survived millions of prior security scans. On Anthropic's internal benchmarks, Mythos scores 77.8% on SWE-Bench Pro compared to Opus 4.6's 53.4%. Anthropic will not make the model publicly available. Instead, the company is deploying it through Project Glasswing: a defensive cybersecurity coalition with AWS, Apple, Google, Microsoft, Nvidia, Cisco, JPMorganChase, and 40+ other organizations, backed by $100M in model usage credits.
OpenAI's new policy document proposes a robot labor tax, a national wealth fund paying dividends to every American (modeled on Alaska's Permanent Fund), a four-day workweek, and containment protocols for autonomous AI that cannot be shut down. Altman told Axios this is the "new social contract" for a world adjusting to superintelligence, a process he says has already begun. Axios called it "the most detailed blueprint any tech titan has ever published for how to tax, regulate, and redistribute wealth from the technology he's building."
> SIGNAL HEADLINES
CAPTURE THE SHIFT
Chinese AI lab Z.ai released GLM-5.1, which scores 58.4% on SWE-Bench Pro, ahead of both GPT-5.4 and Claude Opus 4.6. This is the first time an open-source model has held the top spot on a major coding benchmark. The model is built to sustain agentic sessions of up to eight hours without human guidance.
METR's Time Horizon evaluation suite is nearly saturated. Frontier models can complete almost every task in the set. METR estimates that by mid-2027, no benchmark from 2026 or earlier will be able to confirm whether a frontier system carries dangerous capabilities. The tools built to measure AI are falling behind the thing they are measuring.
Anthropic signed a multi-gigawatt TPU deal with Google and Broadcom for capacity coming online in 2027, almost entirely US-based. The agreement reflects the company's pace of growth: revenue run-rate has tripled to $30B since January, and the number of enterprise customers paying $1M or more per year doubled to 1,000. Compute at this scale is the prerequisite for deploying Mythos-class models at wider reach.
A new independent analysis found Google's AI-generated search summaries return incorrect answers roughly 10% of the time. As AI becomes the primary answer layer for hundreds of millions of daily queries, a 90% accuracy rate falls short of the threshold needed to replace traditional search for decisions that carry real consequences.
Meta's next model release will follow a hybrid strategy: some models open source, the largest ones closed. The flagship "Avocado" model was delayed earlier this year after benchmark performance fell short of rivals across the board. Meta acknowledged internally that the new models will not be competitive across all areas and is repositioning around consumer use cases.
> ONE PRACTICAL TODAY
Use Perplexity to stress-test any business idea in six minutes

Most founders keep a backlog of ideas that never get properly evaluated. The bottleneck is not the ideas. It is the cost of evaluating each one before knowing whether it is worth pursuing. Perplexity Deep Research handles the filtering step in five to six minutes, on the free plan.
Here is how to do it:
Step 1. Open Perplexity.ai and switch to Deep Research mode. This is available on the free plan at five queries per day.
Step 2. Paste this prompt with your idea filled in: "Research this business idea: [your idea]. Give me: (1) market size estimate with sources, (2) top three existing competitors and their traction, (3) biggest risks and failure modes in year one, (4) any signal this has already been tried and failed. Output as a six-slide deck structure." Then step away. The run takes five to six minutes.
Step 3. Review the output. If Perplexity surfaces competitors that already died in this space, that is a signal worth digging into before going further. If the market size is smaller than expected, adjust the idea now rather than after two months of work.
Step 4. Save the prompt as a reusable template in a dedicated Perplexity Space. Pick one idea from your backlog each weekday and run it. At that cadence, you can filter close to twenty ideas in a month with under two hours of total effort.
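If you would rather script the cadence than click through the UI, the same template can be driven programmatically. The sketch below is a minimal illustration, assuming Perplexity's OpenAI-compatible endpoint at api.perplexity.ai and a model name like "sonar-deep-research"; both are assumptions to verify against your account, and API access may require a paid plan rather than the free tier described above.

```python
# Sketch: automating the idea-filtering prompt against Perplexity's API.
# ASSUMPTIONS: the endpoint URL and "sonar-deep-research" model name may
# differ for your account; check Perplexity's API docs before running.
import json
import urllib.request

TEMPLATE = (
    "Research this business idea: {idea}. Give me: "
    "(1) market size estimate with sources, "
    "(2) top three existing competitors and their traction, "
    "(3) biggest risks and failure modes in year one, "
    "(4) any signal this has already been tried and failed. "
    "Output as a six-slide deck structure."
)

def build_prompt(idea: str) -> str:
    """Fill the reusable template with one idea from the backlog."""
    return TEMPLATE.format(idea=idea.strip())

def run_research(idea: str, api_key: str) -> str:
    """POST the filled prompt; return the model's answer text."""
    payload = {
        "model": "sonar-deep-research",  # assumed model name
        "messages": [{"role": "user", "content": build_prompt(idea)}],
    }
    req = urllib.request.Request(
        "https://api.perplexity.ai/chat/completions",  # assumed endpoint
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    # Dry run: print the filled prompt without calling the API.
    print(build_prompt("a subscription service for office plant care"))
```

One idea per weekday, fed through `run_research`, gives the same twenty-per-month filter rate with zero manual copy-pasting.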
Techzip note: The goal is not to get a final answer from AI. The goal is to eliminate non-viable ideas before they consume time. What survives this filter is worth taking further.
> PRESENTED BY THE DEEP VIEW
Become An AI Expert In Just 5 Minutes
If you’re a decision maker at your company, you need to be on the bleeding edge of, well, everything. But before you go signing up for seminars, conferences, lunch ‘n learns, and all that jazz, just know there’s a far better (and simpler) way: Subscribing to The Deep View.
This daily newsletter condenses everything you need to know about the latest and greatest AI developments into a 5-minute read. Squeeze it into your morning coffee break and before you know it, you’ll be an expert too.
Subscribe right here. It’s totally free, wildly informative, and trusted by 600,000+ readers at Google, Meta, Microsoft, and beyond.
> WORTH READING
ANALYSIS & THESIS
Ryan Greenblatt, chief scientist at AI safety organization Redwood Research, lays out his current read on the state of AI: probability estimates for full R&D automation, signals on misalignment, near-term cyber and bioweapon risk, and economic disruption timelines. Some claims are openly speculative. Most are grounded in actual evaluation data from inside the safety community.
Why it made the cut: one of the few public assessments from someone with access to real evals and no incentive to spin the timeline in either direction. Read it to calibrate after a week that included Mythos.
Every's analysis of what happened after Anthropic blocked Claude subscriptions from third-party agent harnesses: Opus 4.6 usage on OpenRouter dropped sharply week over week, while GPT-5.4 surged. The piece traces the structural reason: OpenAI builds its own data centers, giving it a lower cost basis for flat-rate subscriptions. Anthropic buys compute from third parties and may never close that gap.
Why it made the cut: the OpenRouter usage chart makes an abstract cost-structure argument concrete. This is what compute constraints look like at the product layer, in real time.
The argument: vertical SaaS has always competed for IT budgets rather than the larger labor budgets underneath it, which is why category ceilings have always been lower than expected. Vertical AI shifts the ROI calculation from software cost to labor cost. That opens a category of business the SaaS model structurally could not reach.
Why it made the cut: a clean framework for thinking about where the next generation of B2B AI companies will be built, and why comparing them to SaaS multiples misses the point.
Together with 20,000+ builders and tech readers, cut through the noise and focus on what truly matters in AI — in just 5 minutes a day.



