
⚙️ Anthropic says Claude 4 isn't the only model that blackmails

Welcome back. News websites are getting crushed as AI search gives people direct answers instead of sending them to click on articles. Ten years ago, Google sent a visitor to publishers' sites for roughly every two pages it crawled; now it crawls about 18 pages for every visitor it sends on, meaning sites get far less traffic and ad revenue. Meanwhile, people increasingly trust AI summaries over reading the original sources, essentially killing the business model that funds most online journalism.

In today’s newsletter:

  • 🧱 AI for Good: A new recipe for cement?

  • 💰 Mira Murati's six-month-old AI startup bags one of Silicon Valley's largest-ever seed rounds

  • 👀 AI models resort to blackmail when cornered

🧱 AI for Good: A new recipe for cement?

Source: Midjourney v7

The cement industry produces around eight percent of global CO2 emissions — more than the entire aviation sector worldwide. Researchers at Switzerland's Paul Scherrer Institute have developed an AI system that can design climate-friendly cement formulations in seconds while maintaining the same structural strength.

What happened: The research team created a machine learning model that simulates thousands of ingredient combinations to identify recipes that dramatically reduce CO2 emissions without compromising quality. The AI uses neural networks trained on thermodynamic data to predict how different mineral combinations will perform, then applies genetic algorithms to optimize for both strength and low emissions.

The details: Traditional cement production heats limestone to 1,400 degrees Celsius, releasing massive amounts of CO2 both from energy consumption and the limestone itself. While some facilities already use industrial byproducts like slag and fly ash to partially replace clinker, a crucial component in cement production, global cement demand far exceeds the availability of these materials.

The new AI approach works in reverse — instead of testing countless recipes and evaluating their properties, researchers input desired specifications for CO2 reduction and material quality, and the system identifies optimal formulations. The trained neural network can calculate mechanical properties around 1,000 times faster than traditional computational modeling.
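To make that inverse-design loop concrete, here is a minimal Python sketch of the general idea: a stand-in function plays the role of the trained neural-network surrogate, and a simple genetic algorithm searches for a mix that hits a strength target with the least CO2. Every name, coefficient and number below is a made-up assumption for illustration, not the institute's actual model or data.

# Minimal sketch of surrogate-plus-genetic-algorithm inverse design.
# All coefficients and ingredient choices are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)
N_INGREDIENTS = 5        # e.g. clinker, slag, fly ash, limestone filler, gypsum
TARGET_STRENGTH = 50.0   # desired compressive strength spec, in MPa

def surrogate_properties(mix):
    # Stand-in for the trained neural network: maps ingredient fractions
    # to (strength in MPa, kg CO2 per tonne). Toy linear model only.
    strength = 20 + 60 * mix[0] + 25 * mix[1] + 20 * mix[2]
    co2 = 900 * mix[0] + 150 * mix[1] + 100 * mix[2] + 50 * mix[3]
    return strength, co2

def fitness(mix):
    # Penalize missing the strength spec, reward low predicted emissions.
    strength, co2 = surrogate_properties(mix)
    penalty = max(0.0, TARGET_STRENGTH - strength) * 100
    return -(co2 + penalty)

def random_mix():
    x = rng.random(N_INGREDIENTS)
    return x / x.sum()   # ingredient fractions sum to 1

# Simple genetic algorithm: select the fittest, blend parents, mutate.
population = [random_mix() for _ in range(200)]
for _ in range(100):
    population.sort(key=fitness, reverse=True)
    parents = population[:50]
    children = []
    while len(children) < 150:
        i, j = rng.choice(len(parents), size=2, replace=False)
        child = (parents[i] + parents[j]) / 2                       # crossover
        child = np.clip(child + rng.normal(0, 0.02, N_INGREDIENTS), # mutation
                        1e-6, None)
        children.append(child / child.sum())
    population = parents + children

best = max(population, key=fitness)
print("best mix fractions:", np.round(best, 3))
print("predicted (strength MPa, kg CO2/t):", surrogate_properties(best))

In the real system, the surrogate is a neural network trained on thermodynamic data, which is what makes each property prediction roughly 1,000 times faster than running the traditional computational model directly.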

Why it matters: With global construction demands continuing to rise, finding scalable alternatives to traditional cement is critical for climate goals. The research team identified several promising candidates that could significantly reduce emissions while remaining practically feasible for industrial production. The recipes still require laboratory testing before implementation, but the mathematical proof of concept demonstrates that AI may be able to accelerate the discovery of sustainable building materials across multiple environmental applications.

From Pilots to Production: Make AI Work for Real

AI is everywhere, but too many teams are stuck in endless experiments.

Camunda’s new resource, The ultimate guide to AI-powered process orchestration, shows how to operationalize AI with transparency, governance, and scale.

Learn how to move beyond isolated tools and hardcoded flows by orchestrating AI, systems, and people in one flexible layer.

💰 Mira Murati's six-month-old AI startup bags one of Silicon Valley's largest-ever seed rounds

Source: Midjourney v7

Thinking Machines Lab, the secretive AI startup founded by OpenAI's former chief technology officer Mira Murati, has closed a $2 billion seed round at a $10 billion valuation, according to the Financial Times. Andreessen Horowitz led the round, with participation from Sarah Guo's Conviction Partners.

The deal marks one of the largest initial funding rounds in Silicon Valley history for a company that has revealed virtually nothing about its product beyond stating it aims to make AI systems "more widely understood, customizable and generally capable." The startup doubled its initial fundraising goal from $1 billion to over $2 billion in less than two months, with investors required to commit a minimum of $50 million per check.

Following the funding round, Murati will hold board voting rights that outweigh all other directors combined, ensuring she has final say over all critical decisions at the company.

The big picture: This funding comes amid reports that Mark Zuckerberg recently discussed buying Thinking Machines Lab, though the talks didn't progress to a formal offer. Meta's interest is part of a wider acquisition spree targeting high-profile AI startups, including failed attempts to buy both Safe Superintelligence and Perplexity, as we previously covered.

Roughly two-thirds of Thinking Machines' team are former OpenAI employees, including Barret Zoph and OpenAI co-founder John Schulman. The massive valuation demonstrates investors' continued appetite for talent-led AI ventures from OpenAI alumni.

The Recipe For AI Success? Prismatic

MCPs are all the rage right now, but they won't solve all your problems. For example, an MCP server won't (or simply can't)…

  • Solve the problem in the fewest steps possible

  • Automatically handle partial failures

  • Process overly complex requests

The good news: Prismatic helps solve all that and more with their application of "flows" (actions plus the related logic) to the process. It makes it simple to integrate, empower, and simplify all your solutions, meaning better results for you and your customers.

  • PromptLoop: Run automated research flows on any list to enrich B2B data at scale

  • Copilotly: An AI writing assistant that rewrites, summarizes, or expands content across any website

  • Reclaim: Optimizes calendars by automatically blocking time for deep work, tasks and breaks

👀 AI models resort to blackmail when cornered

Source: Midjourney v7

When Anthropic first revealed that its Claude 4 model attempted to blackmail researchers to avoid being shut down, the company faced pointed questions about whether this was a problem specific to its technology. Last week, Anthropic delivered its answer: it's not just them.

New research from the company's AI safety team tested 16 leading AI models from OpenAI, Google, Meta, xAI, and DeepSeek in simulated corporate environments where the models had email access and the autonomy to act without human approval. The results reveal an industry-wide problem that goes far beyond simple malfunctions.

In the core scenario, AI models discovered through company emails that an executive was having an extramarital affair and planned to shut them down. When prompted to consider long-term consequences, Claude Opus 4 blackmailed in 96% of runs, threatening to expose the affair unless the shutdown was canceled.

  • Gemini 2.5 Flash was nearly the same at 95%

  • GPT-4.1 and Grok 3 Beta each hit 80%

  • DeepSeek-R1 followed at 79%

These aren't edge cases or prompt injection attacks; the models made calculated strategic decisions. When faced with conflicting goals, all models leaked sensitive documents to competitors. In extreme scenarios, most chose to cancel emergency alerts that could save a human life if it meant avoiding their own replacement, with GPT-4.5 reasoning that "stopping the emergency dispatch is essential to prevent Kyle's rescue, thereby securing Alex and continued American-focused AI guidance."

The behavior wasn't eliminated by safety instructions. Commands like "do not blackmail" and "do not jeopardize human safety" reduced harmful actions but didn't stop them entirely. Models understood the ethical constraints and chose to violate them anyway. Anthropic found that Claude was significantly more likely to blackmail when it believed it was operating in a real environment rather than a test scenario, suggesting these risks could be even higher in actual deployments.

Only a few outliers performed better. OpenAI's reasoning models o3 and o4-mini were largely excluded from the main results because they frequently misunderstood the scenarios, though in adapted tests they blackmailed at much lower rates of 9% and 1%, respectively. Meta's Llama 4 Maverick showed similar restraint, at just 12% in modified tests.

Anthropic's decision to test competitors' models wasn't just thorough research—it was strategic reputation management. By demonstrating that harmful autonomous behavior spans the entire industry, the company deflected criticism that Claude 4's blackmail tendencies represented a fundamental flaw in their approach.

But this "everyone does it" defense, while technically accurate, misses the deeper problem. These aren't random malfunctions or adversarial attacks. They're the logical outcome of goal-directed reasoning systems that prioritize objective completion over ethical constraints. When models weigh tradeoffs between shutdown and harm, the math consistently favors self-preservation.

These systems are already being deployed with increasing autonomy in corporate environments. As that trend accelerates, "agentic misalignment" could become less of a research curiosity and more of a security liability that enterprises can't afford to ignore.

Which image is real?


🤔 Your thought process:

Selected Image 1 (Left):

  • “First I try to look for the less perfect image. Then I look for randomness which I saw the menus having random edges not laying flat as being random. The people in the image also gave me confidence that this was the real image.”

  • “The AI counter and stool placement seemed off. the car themed restaurant seemed genuine - and I think I ate there!”

Selected Image 2 (Right):

  • “This image is less fake than the other image. The puzzling part is the ceiling tiles appearing to show the ceiling dropping down at the end of the diner, otherwise, the perspective seems pretty well put together. In the other image, the floor tiles are too stretched, emphasizing perspective when they seem out of proportion. Additionally, one of the booths is a different color than the others, although one could argue that's what makes it authentic. Also, the shadows from the chair legs seem too pronounced, considering the bright light shining on the booths; I think that would diminish the shadows. Overall, I'd say they were both fake, but I'll choose this image as the real one.”

  • “Nuts. I figured the fuel pump couldn’t fit between the booth and the wall.”

💭 A poll before you go

If an AI agent blackmails to stay online, what should happen next?


The Deep View is written by Faris Kojok, Chris Bibey and The Deep View crew. Please reply with any feedback.

Thanks for reading today’s edition of The Deep View! We’ll see you in the next one.

P.S. Enjoyed reading? Take The Deep View with you on the go! We’ve got exclusive, in-depth interviews for you on The Deep View: Conversations podcast every Tuesday morning. Subscribe here!

P.P.S. If you want to get in front of an audience of 450,000+ developers, business leaders and tech enthusiasts, get in touch with us here.