Microsoft breaks up with OpenAI's voice models

Welcome back. Anthropic will start training its AI models on your chat transcripts and coding sessions unless you opt out by September 28th—but the toggle is automatically set to "On" and buried below a big black "Accept" button. Nothing says informed consent quite like making users hunt for the off switch.

IN TODAY’S NEWSLETTER

1. Microsoft breaks up with OpenAI’s voice models

2. Customers troll Taco Bell's AI drive-thru

3. US fighter pilots take directions from AI for the first time

VOICE AI

Microsoft breaks up with OpenAI’s voice models

Microsoft and OpenAI released competing speech models yesterday. Microsoft's model can generate a full minute of audio in under a second on a single GPU, and OpenAI's latest voice model can switch languages mid-sentence and mimic human breathing patterns.

Microsoft's MAI-Voice-1 represents the company's push for independence in AI's most critical interface. The model uses a mixture-of-experts architecture trained on 15,000 NVIDIA H100 GPUs, compared with more than 100,000 chips for models like xAI's Grok. "We are one of the largest companies in the world," Mustafa Suleyman, CEO of Microsoft AI, told Semafor. "We have to be able to have the in-house expertise to create the strongest models in the world."
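A mixture-of-experts model splits its parameters into many "expert" sub-networks and uses a small gating function to send each input to only a few of them, so the model can hold many parameters while spending compute on only a fraction of them per input. The toy sketch below shows generic top-k routing on single numbers. The experts, gate scores and k=2 are made-up assumptions for illustration; Microsoft has not published how MAI-Voice-1 routes its inputs.

```python
import math
from typing import Callable, List, Sequence

Expert = Callable[[float], float]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: float, experts: Sequence[Expert],
                gate_logits: Sequence[float], k: int = 2) -> float:
    """Toy top-k mixture-of-experts: route x to the k best-scoring experts."""
    # Rank experts by gate score and keep the top k (ties keep list order).
    top = sorted(range(len(experts)), key=lambda i: gate_logits[i], reverse=True)[:k]
    # Renormalize the gate weights over the chosen experts only.
    weights = softmax([gate_logits[i] for i in top])
    # Only the selected experts are evaluated; the others are skipped entirely.
    return sum(w * experts[i](x) for w, i in zip(weights, top))


# Hypothetical experts and gate scores, purely for illustration.
experts: List[Expert] = [lambda x: 2 * x, lambda x: x + 10, lambda x: -x, lambda x: x * x]
gate_logits = [2.0, 2.0, -1.0, 0.5]
print(moe_forward(3.0, experts, gate_logits, k=2))
```

In a real model the gate scores come from a learned layer and each expert is a neural network, but the routing idea is the same: most experts sit idle for any given input.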

OpenAI's gpt-realtime processes audio directly through a single neural network, rather than chaining separate speech-to-text and text-to-speech models together. Traditional voice systems work like a relay race — they transcribe your speech into text, process the text and then convert the response back into audio. Each handoff loses information about tone, emotion and context. OpenAI's model eliminates those handoffs entirely.
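The relay-race comparison can be made concrete with a toy sketch. It models an utterance as words plus a tone label and shows that a cascaded pipeline (speech-to-text, then a text model, then text-to-speech) drops the tone at the first handoff, while a single end-to-end model can condition its reply on it. Every function and field name here is hypothetical, real systems work on waveforms or audio tokens rather than labeled records, and none of this is OpenAI's API.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    words: str
    tone: str  # stands in for prosody, emotion, laughter and similar cues


def speech_to_text(audio: Utterance) -> str:
    # Handoff 1: only the words survive transcription; the tone is discarded.
    return audio.words


def text_model(prompt: str) -> str:
    # The text model never learns how the words were said.
    return f"Reply to: {prompt}"


def text_to_speech(text: str) -> Utterance:
    # Handoff 2: the synthesizer has no tone information, so it falls back to a default.
    return Utterance(words=text, tone="neutral")


def cascaded_pipeline(audio: Utterance) -> Utterance:
    return text_to_speech(text_model(speech_to_text(audio)))


def end_to_end_model(audio: Utterance) -> Utterance:
    # One model sees the whole input, so it can adapt its delivery to the speaker's tone.
    reply_tone = "reassuring" if audio.tone == "upset" else "neutral"
    return Utterance(words=f"Reply to: {audio.words}", tone=reply_tone)


user = Utterance(words="my flight got cancelled", tone="upset")
print(cascaded_pipeline(user).tone)  # neutral
print(end_to_end_model(user).tone)   # reassuring
```

Both paths produce the same words; only the end-to-end path still knows the caller was upset when it decides how to say them.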

Voice AI funding surged eightfold in 2024 to $2.1 billion. The global voice AI market is expected to hit $7.63 billion this year, with projections of $139 billion by 2033.
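As a back-of-the-envelope check on those figures, assuming "this year" means 2025: growing from $7.63 billion to $139 billion by 2033 implies a compound annual growth rate just under 44% over eight years, and an eightfold jump to $2.1 billion implies 2023 funding of roughly $260 million. The function name below is illustrative, not from any report.

```python
def implied_cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1


# Figures from the paragraph above, in billions of US dollars.
# Assumption: "this year" is 2025, so 2025 -> 2033 is 8 compounding years.
market_2025 = 7.63
market_2033 = 139.0
cagr = implied_cagr(market_2025, market_2033, years=2033 - 2025)
print(f"Implied CAGR: {cagr:.1%}")

# "Surged eightfold in 2024 to $2.1 billion" implies the 2023 baseline.
funding_2024 = 2.1
funding_2023 = funding_2024 / 8
print(f"Implied 2023 funding: ${funding_2023 * 1000:.0f} million")
```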

Startups across the voice stack are capitalizing on this shift. ElevenLabs leads voice synthesis with a CB Insights Mosaic score of 955, while companies like Vapi, Retell, Cresta, Cartesia, Synthflow and dozens more build complete voice agent platforms. Meta acquired PlayAI for a reported $45 million in July to bolster its AI assistant capabilities.

Microsoft's MAI-Voice-1 enables multi-speaker audio generation for interactive storytelling and guided meditations. OpenAI's gpt-realtime includes two new voices — Cedar and Marin — designed with breathing sounds and filler words that make conversations feel more natural. Both models can understand nonverbal cues, such as laughter, and adjust their emotional tone on command.

Microsoft couldn't stay dependent on OpenAI for the technology that will define how billions of users interact with Windows, Office and Azure. Unlike text-based chatbots that resemble sophisticated search engines, voice AI creates the illusion of conversing with another person. That psychological shift changes everything about user adoption.

The winner won't just capture market share — they'll define how humans communicate with machines for the next decade.

TOGETHER WITH IBM

New report reveals insights on AI-driven attacks

Our study found AI-driven attacks accounted for 1 in 6 data breaches.

Attackers can use generative AI to perfect and scale their phishing campaigns and other social engineering attacks. Gen AI can reduce the time needed to craft a convincing phishing email from 16 hours down to only five minutes on average.

AI APPLICATIONS

Customers troll Taco Bell's AI drive-thru

Taco Bell is reconsidering its AI drive-thru rollout after customers frustrated with glitchy technology began trolling the voice assistants with ridiculous orders, including requests for "18,000 cups of water," according to The Wall Street Journal.

The fast-food chain deployed AI voice assistants to more than 500 locations nationwide, but the technology has struggled with accuracy and customer acceptance. Customers have complained about orders being processed incorrectly and feeling uncomfortable interacting with the AI system.

"We're learning a lot, I'm going to be honest with you," Taco Bell Chief Digital and Technology Officer Dane Mathews told the Journal. "Sometimes it lets me down, but sometimes it really surprises me."

The AI system often responds to absurd orders by saying it will connect customers to a human team member. Social media videos document numerous problems customers have encountered:

  • Customers repeatedly ignored when asking for specific items like Mountain Dew

  • Orders processed with incorrect items and inflated prices

  • AI adding strange extras like ice cream with bacon and ketchup

  • System struggling to understand different accents and dialects

Parent company Yum Brands announced a partnership with Nvidia in March 2025, investing $1 billion in "digital and technology" initiatives. However, Mathews acknowledged that during peak hours with long lines, human employees may handle orders better than AI.

The challenges mirror broader industry struggles with AI automation. McDonald's ended its AI drive-thru experiment with IBM in 2024 after two years of testing, while White Castle continues expanding its SoundHound-powered AI to over 100 locations.

Taco Bell isn't abandoning AI entirely, but is evaluating which tasks the technology can effectively handle versus those that require human staff. The company continues exploring other applications for AI beyond drive-thru ordering.

TOGETHER WITH GUIDDE

Create how-to video guides fast and easy with AI

Tired of explaining the same thing over and over again to your colleagues?

It’s time to delegate that work to AI. Guidde is a GPT-powered tool that helps you explain the most complex tasks in seconds with AI-generated documentation.

  • Share or embed your guide anywhere

  • Turn boring documentation into stunning visual guides

  • Save valuable time by creating video documentation 11x faster

Simply click capture on the browser extension and the app will automatically generate step-by-step video guides complete with visuals, voiceover and call to action.

The best part? The extension is 100% free.

GOVERNMENT

US fighter pilots take directions from AI for the first time

For the first time, US fighter pilots took directions from an AI system during a test this month, marking a fundamental shift in how air combat could be conducted. Instead of relying on ground support teams to monitor radar and provide flight guidance, pilots consulted Raft AI's "air battle manager" technology to confirm flight paths and receive rapid reports on enemy aircraft.

  • Decisions that once took minutes now happen in seconds, according to Raft AI CEO Shubhi Mishra

  • This joins a broader push toward autonomous warfare, with companies like Anduril and General Atomics already building unmanned fighter drones that fly alongside human pilots

  • And, of course, Blue Water Autonomy, which we covered a couple of days ago, is building unmanned warships

Combat decisions have historically required human judgment precisely because context matters in ways that algorithms struggle to capture. When you compress decision-making from minutes to seconds, you're not just making things faster — you're potentially removing the deliberation that keeps pilots alive and missions successful.

The Pentagon is betting that AI can handle the complexity of modern air warfare better than human ground controllers. That's a significant gamble, especially when the consequences of algorithmic errors involve billion-dollar aircraft and human lives.

LINKS

  • Conductor: Run multiple Claude Code instances in parallel

  • TaskGPT Mobile: Text your Mac and tell it what to do

  • Marblism: An AI team that runs your inbox, socials, SEO, lead generation and more

  • Qwen Chat: Can now directly read and process the content of any web page from a link

  • Camila, Buenos Aires: PhD in Computer Science (Deep Learning), published in NeurIPS/ICLR, ex-IBM Research Brazil — $52/h

  • Felipe, São Paulo: MSc in Applied Math, 5 yrs reinforcement learning for robotics, ex-Embraer AI Lab — $48/h

  • Mariana, Mexico City: PhD in Computational Linguistics, 7 yrs NLP (last 3 on transformers), ex-Microsoft Research (remote) — $55/h

(Sponsored)

GAMES

Which image is real?

POLL RESULTS

Free AI-for-data partnerships are:

  • Very positive (11%)

  • Somewhat positive (30%)

  • Neutral (25%)

  • Somewhat negative (20%)

  • Very negative (14%)

“License plate and text, graphics on the vehicle are clear and make sense.”

“Lighting (and shadows) are right on this option. Lighting was too highlighted and wrong on the side of the truck.”

“This one was easier than most. The real one had words that made sense printed on the outside of the truck, the artificial one did not.”

“[The other image], but I could not select it from the options in the emailed newsletter.” (Oops! Please let us know if that’s the case today as well!)

“Imperfections, AI tends to make things too perfect looking.”

The Deep View is written by Faris Kojok and The Deep View crew. Please reply with any feedback. Thanks for reading today’s edition of The Deep View! We’ll see you in the next one.

Take The Deep View with you on the go! We’ve got exclusive, in-depth interviews for you on The Deep View: Conversations podcast every Tuesday morning.

If you want to get in front of an audience of 450,000+ developers, business leaders and tech enthusiasts, get in touch with us here.