Is A.I. Bad for Spoken Word Audio Entertainment?

Let’s Talk About the Elephant in the Microphone.

It seems like you can’t open a laptop these days without bumping into A.I. It’s writing our emails, generating our spreadsheets, and, most recently, talking to us while we do the dishes.

I used to think the “robot uprising” would involve explosions and red lasers. I didn’t think it would start with a very calm, natural-sounding voice reading me a news article about interest rates while I sat in traffic.

But here we are.

The spoken word entertainment industry (podcasts, audiobooks, radio dramas) is currently standing at a crossroads. On one path lies pure, unadulterated human creativity. On the other lies efficiency, scale, and algorithms. The question looming over every producer and listener is simple: Is A.I. actually bad for spoken word audio?

Like most things, the answer isn’t black and white. It’s a spectrum of gray noise. Let’s break down the upside, the ethical nightmare, and where I think the puck is going.

The Undeniable Win: Text-to-Speech Liberation

Let’s start with a story that has only upside. No caveats. No creepy feelings.

I have a mild confession: I hate reading the news on my phone. I stare at a screen for ten hours a day for work. By 6:00 PM, my retinas feel like sandpaper. But I still need to know what is happening in the world.

Enter A.I. text-to-speech (TTS).

Modern TTS engines are not your grandfather’s Speak & Spell. They have inflection. They have tone. They know when a sentence is a question. They pause for dramatic effect. I currently use a browser extension that turns any written article into an audio file, and the voice is virtually indistinguishable from a calm, professional radio host.

For the “listening over reading” crowd (dyslexic readers, commuters, the visually impaired, or just the tired), this is a miracle. We are talking about the democratization of written content. Every Substack newsletter, every Wikipedia page, every long-form essay is now a potential podcast.

If A.I. did *nothing* else for this industry except perfect text-to-speech, it would still be a massive win for humanity. We have effectively killed the excuse of “I don’t have time to read that.”

So, that’s the good news. But I didn’t start this blog post to talk about reading news articles. I want to talk about creating art. And that is where things get weird.

The Editor That Erases Reality

A few weeks ago, I fell down a YouTube rabbit hole. I was watching a demonstration of “next-generation podcast editing software.”

At first, it looked like standard magic. You upload an audio file of two people talking, and the software transcribes it into text. You can then edit the text (delete a paragraph, cut a bad joke, reorder a story) and the software automatically edits the audio to match. No more zooming in on waveforms. No more razor cuts. Just delete the sentence “Umm, so yeah, let me think” and it disappears from the tape.
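
For the curious, here is roughly how that kind of text-based editing can work under the hood. This is a minimal Python sketch, not the software from the demo: it assumes the transcription step gives you a start and end time for every word, and the names `Word` and `keep_segments` are made up for illustration. Deleting words from the transcript simply turns into a list of time ranges of audio to keep.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds into the recording (assumed to come from the transcriber)
    end: float


def keep_segments(words, deleted, total_length):
    """Return the (start, end) spans of audio that survive the edit.

    `deleted` is a set of indexes into `words`. Each deleted word is cut
    from its own start up to the start of the next word, so the pause
    that followed it goes too. The last word is cut up to its own end.
    """
    segments = []
    cursor = 0.0  # everything before this point is already kept or cut
    for i, word in enumerate(words):
        if i not in deleted:
            continue
        cut_end = words[i + 1].start if i + 1 < len(words) else word.end
        if word.start > cursor:
            segments.append((cursor, word.start))
        cursor = max(cursor, cut_end)
    if cursor < total_length:
        segments.append((cursor, total_length))
    return segments


# Hypothetical transcript of a three-second clip.
transcript = [
    Word("Umm,", 0.0, 0.4),
    Word("so", 0.5, 0.7),
    Word("yeah,", 0.7, 1.0),
    Word("welcome", 1.5, 2.0),
    Word("back", 2.0, 2.4),
]

# Delete "Umm, so yeah," by index; only the rest of the clip is kept.
print(keep_segments(transcript, {0, 1, 2}, 3.0))  # [(1.5, 3.0)]
```

Hand those ranges to any audio tool that can slice and join clips, and the “Umm, so yeah” is gone without anyone touching a waveform.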

I thought, Okay, this is convenient. This is the future.

Then the demonstrator dropped the bomb.

He showed how you could add a completely new sentence to the conversation. A sentence that was never said in the original recording. You just type it into the transcript, and the software generates the audio out of thin air.

How? It needs about ten seconds of the speaker’s voice to clone it.

Ten seconds.

Suddenly, the podcast is no longer a document of what happened. It is a script that sounds like a documentary. You could record a thirty-minute fight with your co-host, delete the fight, and type up a pretend reconciliation. You could interview a guest, and later “add” an answer you wish they had given.

At this point, my brain short-circuited. I started wondering: If I can just type the script and have an A.I. voice read it perfectly, why would I ever sit in front of a microphone again?

Is typing a document the next evolution of podcasting? Or is it the death of authenticity?

The Audiobook Schism: Purists vs. Pragmatists

This debate is already happening in the audiobook space, and it is vicious.

I recently stumbled upon a Reddit thread that stopped me in my tracks. The original poster was asking: Is “Project Hail Mary” narrated by a human or A.I.?

The fact that this question even exists is breathtaking. It means the A.I. voices have crossed the uncanny valley. They are good enough that we have to debate whether a human being was paid to sit in a recording booth for eight hours or a server farm did it in five minutes.

Half the commenters didn’t care. Their logic was brutal and effective: The objective of listening to an audiobook is to be educated or entertained. If the A.I. achieves that at a lower price point, why should I care if there are lungs behind the voice?

But the other half, the Purists, were having none of it.

They argued that listening to a human is a connection. You hear the breath between paragraphs. You hear the subtle change in tone when a character is sad. You hear the mistakes that get edited out. There is a soul in the vibration of a human larynx that a diffusion model cannot replicate.

The Purists were not listening to anything A.I. generated. Period.

The Pricing Reality Check

Here is where the market gets interesting.

Right now, an Audible credit costs about $15, and that $15 is split between the author, the publisher, and the narrator. A human narrator for a ten-hour book costs hundreds, if not thousands, of dollars upfront.

An A.I. narration costs pennies.

If a publisher releases a human-narrated book for $15 and an A.I.-narrated book for $4.99, the consumer has a choice. Many will choose the $4.99 option. They are listening to learn about stoicism or a biography of Napoleon. They don’t need to “connect” with the narrator. They need the information.

But the Purists will pay the premium. They will seek out human narrators like they seek out vinyl records. They will pay for the performance.

I think there is a world for podcasts and audiobooks where both coexist. We already have it in music: you can listen to a live symphony orchestra (expensive, human) or a MIDI synthesizer (cheap, digital). Neither has killed the other. They serve different masters.

The Real Danger Isn’t Quality. It’s Consent.

Before we all throw our hands up and embrace our new robot overlords, we need to address the elephant in the recording booth.

That software I mentioned? The one that clones a voice from ten seconds of audio?

That is a catastrophe waiting to happen.

We are about to enter a hellscape of “deepfake podcasts.” Imagine a scammer cloning Joe Rogan’s voice to sell you dick pills. Imagine a political operative “leaking” a recording of a candidate saying something they never said. Imagine a creator recording a Patreon-exclusive episode, only to have a hacker rip their voice and publish a “Part 2” that the creator never agreed to.

The technology is advancing faster than the law. Right now, in most jurisdictions, your voice is not treated with the same legal protection as your face or your fingerprint. That needs to change.

Fighting the technology is futile. It’s here. It’s getting better by the hour. But we absolutely need regulation. We need digital watermarks. We need “provenance” standards that tell listeners “This clip was generated by A.I.” without them having to guess.

So, Is A.I. Bad for Spoken Word?

Here is my final verdict.

A.I. is bad for spoken word if we value authenticity over convenience. If you listen to a podcast to feel like you are sitting in a room with friends, an A.I. podcast is a lie. You cannot have a parasocial relationship with a diffusion model. The magic of Serial or Heavyweight was the real human stakes. A.I. cannot replicate that because A.I. does not have stakes.

However, A.I. is incredible for spoken word if we value accessibility and utility. If you want to listen to the terms of service of your new credit card, or a Wikipedia article on the Byzantine Empire, or a bedtime story for your kid where the princess has your spouse’s voice? A.I. is a miracle.

The fear comes when we confuse the two.

We need a label. Just like organic food has a label, “Human Narrated” needs to become a badge of honour. Consumers who care will seek it out. Consumers who don’t care will save five bucks. That is fine.

But for the creators? Here is my advice:

Don’t use A.I. to replace your soul. Use it to remove your friction.

Use A.I. to edit out your ums and ahs. Use it to level your audio. Use it to translate your show into Spanish. Use it to write your show notes. Don’t use it to fake a conversation with a guest you never had. Don’t use it to simulate an emotional breakdown.

The listeners aren’t stupid. They might not be able to tell the difference intellectually in a blind test, but they can tell emotionally. A human podcast feels different. It has scars. It has messiness. That messiness is the point.

A.I. is here. Fighting it is futile. We don’t need to smash the machines. We just need to understand them, use them like the tools they are, and pay a premium for the things that make us human.

And if you see an audiobook for $4.99 with a synthetic voice? That’s fine. Enjoy it. But please, keep supporting the humans who bleed into their microphones. We need them more than ever.