Voice Is the Next Customer Channel (And Almost Nobody Has Shipped It)
Phone trees are dead. The IVR menu (press 1 for sales, 2 for support, 3 to hear these options again) was killed by Apple, Google, and everyone else who normalized voice assistants. But oddly, the death of the phone tree didn't bring voice into business software; it pushed everyone into chat. Today, almost every 'talk to us' button on a product website opens a text chat. Voice as a way to interact with a company has been quietly abandoned.
It shouldn't have been. Voice is a better channel than text for a lot of conversations, and customers know it. They talk to Siri, Alexa, and Google Assistant every day. They expect the same from companies they trust. Most companies haven't shipped it because, until recently, the technology was too clunky to deploy. That's no longer true.
Why Did Voice Never Make It into Customer Chat?
Voice as a customer channel had three historical problems. Speech-to-text was inaccurate. Text-to-speech sounded robotic. And there was no model in the middle that could actually understand what the customer meant. All three problems are now solved well enough to ship to production.
Modern speech recognition runs at near-human accuracy on consumer hardware. Streaming voice synthesis sounds natural in dozens of languages. And the AI agent in the middle, given decent docs and tools, can usually figure out what the customer is asking. The pieces are ready.

User speaks. STT transcribes. Agent reads, plans, acts. TTS speaks the response. Always with a parallel text fallback.
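The turn described above can be sketched in a few lines. This is a minimal illustration, not Crewmate's implementation: `run_voice_turn`, `TurnResult`, and the three stub functions are hypothetical names, and the stubs stand in for whatever real STT, agent, and TTS services you would plug in. The point it shows is that the text reply is produced first and always returned, so a synthesis failure degrades to text instead of breaking the turn.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TurnResult:
    transcript: str                # what STT heard
    reply_text: str                # always present: the parallel text fallback
    reply_audio: Optional[bytes]   # None when speech synthesis failed


def run_voice_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    respond: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> TurnResult:
    """Run one voice turn: STT, then agent, then TTS, with text as fallback."""
    transcript = transcribe(audio)
    reply_text = respond(transcript)
    try:
        reply_audio: Optional[bytes] = synthesize(reply_text)
    except Exception:
        # Assumption: any TTS failure should fall back to text only.
        reply_audio = None
    return TurnResult(transcript, reply_text, reply_audio)


# Hypothetical stand-ins for real STT, agent, and TTS services.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_respond(transcript: str) -> str:
    return f"You asked: {transcript}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


result = run_voice_turn(
    b"where is my order", fake_transcribe, fake_respond, fake_synthesize
)
print(result.reply_text)  # prints "You asked: where is my order"
```

Passing the three stages in as plain callables keeps the sketch independent of any particular vendor. A production version would stream audio in both directions rather than handle whole buffers.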
What Voice Mode Looks Like in Crewmate
Every public Crewmate chat surface ships with both voice and text. The user toggles between them. When voice is on, the experience is intentionally claude.ai-style: a single ambient 'orb' with audio level visualization, no buttons to fumble with, no half-second silences between turns. Speak naturally, and the agent responds in voice. Pause, and the conversation pauses. Type something instead, and the agent picks up where the voice conversation left off.
The seamlessness between voice and text matters more than the voice quality itself. Customers move between contexts mid-conversation. They start asking a question in voice while walking. They switch to text to paste in an order number. They go back to voice to listen to a long answer while doing something else. If the conversation can survive those transitions without breaking, you've built something useful.
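One way to make those transitions survivable is to keep a single conversation history and treat modality as metadata on each turn, not as a separate session. The sketch below assumes that design. `Conversation`, `Message`, and `context` are hypothetical names for illustration, not Crewmate's data model.

```python
from dataclasses import dataclass, field
from typing import List, Literal

Modality = Literal["voice", "text"]
Role = Literal["user", "agent"]


@dataclass
class Message:
    role: Role
    modality: Modality  # how the turn arrived or was delivered
    content: str        # transcript for voice turns, raw text otherwise


@dataclass
class Conversation:
    messages: List[Message] = field(default_factory=list)

    def add_turn(self, role: Role, modality: Modality, content: str) -> None:
        self.messages.append(Message(role, modality, content))

    def context(self) -> List[str]:
        """What the agent sees: every turn in order, regardless of modality."""
        return [f"{m.role}: {m.content}" for m in self.messages]


convo = Conversation()
convo.add_turn("user", "voice", "I need help with a late delivery")
convo.add_turn("agent", "voice", "Sure, what is the order number?")
convo.add_turn("user", "text", "ORD-12345")  # pasted, not spoken
convo.add_turn("user", "voice", "Can you read me the status?")

# The agent's context contains the pasted order number even though the
# latest turn came in by voice.
print(len(convo.context()))  # prints 4
```

Because the agent reads from one ordered history, switching from voice to text and back does not reset anything. The modality field only decides how the next reply is rendered.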
When Voice Wins
Three contexts where voice substantially outperforms text:
1. Hands-busy interactions: Cooking, driving, holding a baby. Voice doesn't compete for the user's eyes. Any consumer business with mobile-first customers (restaurants, delivery, retail) has a meaningful chunk of usage in this state.
2. Accessibility-first users: Vision impairments, motor impairments, dyslexia. Text-only chat assumes the customer can use a keyboard and read a screen comfortably. A meaningful percentage of customers can't. Voice unlocks those users without forcing them into specialized accessibility software.
3. Long answers: When the agent needs to explain something complex (a refund policy, a multi-step process, an account recovery flow), listening is often easier than reading. Especially on mobile, where text walls are exhausting.
When Voice Doesn't Win
Honesty section: Voice isn't strictly better than text. It's worse in several real situations:
1. Data-dense input: Customers entering an order number, a long URL, or a 32-digit confirmation code. Voice is slower and more error-prone than typing or pasting.
2. Async conversations: Customers who can't or shouldn't speak because they're in a meeting, on public transit, or in a quiet office. Voice forces them to make noise. Text doesn't.
3. Screenshot-driven conversations: Customers showing you what's broken by sharing an image. Voice can't see the screenshot. Text can be paired with image attachments.
This is why the right answer isn't 'voice replaces text.' It's 'voice is a parallel channel that wins in specific contexts and loses in others, and your customer should be able to switch at any moment.'
The Competitive Opening
Right now, in 2026, almost no consumer-facing business has voice working well in their support channel. The first companies to ship it will get an outsized brand benefit: 'this company feels modern' is a powerful, hard-to-measure form of marketing.
The opening will close. Voice will become standard in the next two years, the same way live chat became standard in the 2010s. By 2028, customers will be surprised when a company doesn't offer voice. The window to ship it as a differentiator, rather than a baseline, is open right now.
Crewmate ships voice mode out of the box on the public chat surface. If you're already running Crewmate, you have it. If you're not, voice mode is one reason to start looking.
