> SPOTLIGHT

TODAY'S 3 MOST IMPORTANT

Security startup CodeWall deployed an AI agent to probe "Lilli," the internal chatbot used by 70% of McKinsey's 45,000 employees. The agent found 22 API endpoints requiring no authentication. One had a basic flaw that opened the entire database: 46.5 million internal messages, 728,000 client files, and 57,000 user accounts, all in plain text. McKinsey was notified, brought in a third party to verify no unauthorized access had occurred, and patched the vulnerability.

So what: When an AI agent can run its own reconnaissance and exploit vulnerabilities autonomously, the attack surface of enterprise AI is no longer just people. It is the entire API layer. Every team rushing to ship internal AI for business-critical workflows needs to ask: where exactly is security being tested in that deployment lifecycle?

Anthropic launched the Claude Partner Network with $100 million in committed support covering technical onboarding, go-to-market resources, and enterprise integrations. The announcement landed the same week the Ramp AI Index showed Anthropic winning roughly 70% of head-to-head decisions against OpenAI among first-time enterprise buyers.

So what: Anthropic is building switching costs before the enterprise market locks in its choices. Every Claude deployment through this network compounds integrations, workflows, and institutional memory that a stronger model later cannot easily displace on benchmark scores alone. The competition is shifting from model quality to ecosystem depth.

ByteDance is working with Aolani Cloud, a Southeast Asian company, to operate roughly 500 Nvidia Blackwell systems in Malaysia. The hardware is valued at over $2.5 billion. The stated purpose is AI research and development outside China.

So what: US chip export controls are creating a new market: compute arbitrage through Southeast Asia. Malaysia and Singapore are becoming strategic relay points. This is the clearest real-world test yet of whether export controls can hold when supply chains are distributed enough to route around them.

> SIGNAL HEADLINES

Capture the shift

Perplexity opened its entire infrastructure to developers under a single API key: access to multiple frontier models, the same real-time search index that powers Perplexity's consumer product, and embeddings. This is the first time Perplexity has positioned itself as a platform rather than just an end-user product.

Axiom, a verified-AI startup founded one year ago with roughly 20 employees, received funding from Menlo Ventures to build systems that produce formally verified code. Using the Lean programming language, Axiom proves not just that code outputs are correct, but that each reasoning step cannot create unintended attack surfaces.

Pentagon official Emil Michael stated Claude would "pollute" the DoD supply chain due to what he called a "different policy preference." The comments follow Anthropic's public refusal to support mass surveillance and autonomous weapons. The distance between Anthropic and the rest of the AI industry on DoD contracts is widening, not narrowing.

The vibe-coding platform lets users describe an application in plain language and receive a production-ready application. A 33% single-month revenue increase is rare even among the fastest-growing SaaS companies, a sign that no-code AI development is moving from trend to durable market.

> PRESENTED BY ATTIO

Attio is the AI CRM for modern teams.

Connect your email and calendar, and Attio instantly builds your CRM. Every contact, every company, every conversation, all organized in one place.

Then Ask Attio anything:

  • Prep for meetings in seconds with full context from across your business

  • Know what’s happening across your entire pipeline instantly

  • Spot deals going sideways before they do

No more digging and no more data entry. Just answers.

> TRY THIS TODAY

Turn Claude Cowork into an actual digital employee with these 10 prompts

Most people use AI to write. A smaller group uses it to execute.

Corey Ganim (@coreyganim) published 10 Claude Cowork prompts, each a complete workflow rather than a single question. Before running any of them, prime Cowork with a Meta-Prompt once:

"You are my executive assistant. You have access to my computer and can take actions on my behalf. Pause before any destructive actions. Save important outputs to Google Docs. My key apps are: [your tools here]."

From there, run workflows on demand:

  • Morning Dashboard: open Gmail, Calendar, and Notion side by side, then summarize what needs attention.

  • Email Batch Processor: categorize all unread emails from the past 24 hours, draft replies, archive FYI items.

  • Meeting Prep: research the person you are meeting via LinkedIn and Gmail history in under 5 minutes.

  • Competitor Spy: pull positioning, pricing, and social signals into one brief.

  • End-of-Day Shutdown: review missed tasks, prep tomorrow, close tabs.

  • ⏱ Time to implement: ~5 minutes for Meta-Prompt setup, then run each workflow as needed.

  • 🛠 Tools needed: Claude Cowork (Claude in Chrome).

  • 💰 Cost: Claude Pro plan or above.

Techzip note: The leverage is not in any single prompt. It is in the Meta-Prompt that gives Claude context about how you work. The same request produces a different result when the model knows your tools, your role, and your defaults.

> TRY THIS TODAY

Make your agent's SKILL.md files self-improving over time

Agent skills are usually static. The environment around them is not.

Vasilije (@tricalt) from the Cognee team introduced cognee-skills, a framework that closes the loop so SKILL.md files evolve instead of quietly degrading.

The cycle works as follows. A skill runs; then the system:

  • Observes: logs which skill was selected, whether it succeeded, and what error occurred.

  • Inspects: once enough failures accumulate, traces recurring failure patterns across the stored graph.

  • Amends: proposes a targeted change to the skill's instructions, reviewable by a human or applied automatically.

  • Evaluates: commits the amendment only if outcomes measurably improve, and rolls back if not.

To try it: pip install cognee==0.5.4.dev2 and see the repo at github.com/topoteretes/cognee.

  • ⏱ Time to implement: ~30 minutes for basic setup.

  • 🛠 Tools needed: Python, Cognee library.

  • 💰 Cost: Free (open source).

Techzip note: The key idea here is not the tooling. It is the principle: a skill cannot improve if the system has no memory of what happened when it ran. This is why most agent system degradation happens silently.

> WORTH READING

Analysis & Thesis

Cursor published a detailed breakdown of CursorBench, their internal eval suite built from real coding sessions by their own engineering team rather than public repositories. The piece explains why SWE-bench and other public benchmarks can no longer distinguish between frontier models: training data contamination inflates scores, grading penalizes valid alternative solutions, and task scope no longer reflects what developers actually ask agents to do. CursorBench-3 has roughly doubled in task scope since its first version. Results show GPT-5.4 leads when token budget stays under 16K, a finding no public benchmark has captured.

Why it made the cut: this is the clearest published argument for why the benchmarks the industry uses to compare models are measuring the wrong things, and what a better measurement framework actually looks like.

Found this useful?

👉 Forward it to someone trying to keep up with AI.

👉 Read online: techzip.beehiiv.com

Techzip Newsletter | Zipping what truly matters in the AI era.