I’m looking for ways to offer bonus content to the AI Humanist’s monthly supporters, and my first project is to produce audio versions of my posts.

Sometimes I’m able to just read the post into my iPhone, but the audio quality varies (from okay to terrible) and I often do my writing in irregular locations. Loud irregular locations.

Fraudsters are already using AI voice cloning technology to trick unsuspecting grandmothers into wiring money to kinfolk who find themselves in a fix. If AI voice clones are that effective, can they be used intentionally and for good purpose?

This past week I gave it a go. There are plenty of AI voice apps out there. I chose to test Speechify and Eleven Labs because they came up most often in my searches and in conversations with others dabbling in this corner of the AI world.

Privacy concerns

The answer to your first question is yes: When you upload your voice to an AI voice cloning app, the app company obtains a digital copy of your actual voice. They will tell you they’d never sell your voice for nefarious purposes, but we’re all adults here. Companies get hacked. Sometimes they go bankrupt and see their data sold like so much office furniture. (Greetings, 23andMe customers!)

This is a line you will have to decide to cross, or not. As for me, that ship sailed long ago. I’ve done so many interviews over the years that anyone who wants a snippet of my voice can find it in about ten seconds via Google.

Quality concerns, and ick factor

I don’t want to feed a janky robot version of my voice to the Humanist’s highest-value readers. The whole point of this endeavor is to figure out how to remain human while engaging with AI. So I entered skeptical.

I remained so—until I heard the third version of my own cloned voice.

I used last week’s post, “AI and the war of analogy,” as the text-to-voice copy.

Read on to find out how it went.

Bruce’s actual voice

Here’s what I really sound like. I taped this using my iPhone’s Voice Memos app.

Bruce’s voice overview: Fine.

Based on this recording I think I could get away with taping voice versions with my iPhone. I’d like to get a warmer, richer version by using an actual podcasting microphone, which is on my shopping list.

Until then, an iPhone will work in a pinch.

Speechify

Speechify is available here.

Speechify overview: Not great.

Using Speechify felt like walking onto a car lot seeking a mid-priced sedan and dealing with a salesman who wanted to sell me an overpriced truck. I came in looking to test their voice cloning capabilities. They wanted to sell me on using Gwyneth Paltrow’s voice. At every step.

Speechify is set up for people who want a professional voice—say, Gwyneth Paltrow or Gwyneth Paltrow—to voice a book from Speechify’s library. Voice cloning feels like a project the company’s engineers dig but marketing and sales hate.

Their current voice cloning technology is clumsy and robotic. Using their ten-second snippet option, where I input a tiny bit of my voice, was a disaster. So I taped a longer audio file using my iPhone’s Voice Memos app, then uploaded that file to Speechify. That produced the audio above.

As a clone of me, it’s not horrible. Still, you can clearly hear the limitations: A robotic voice. No nuance in the words or sentences. Little warmth. Speechify didn’t do well with punctuation. That “or worse” is all wrong. Confronted with the conundrum of a parenthetical phrase, the AI clone chose to simply skip past it.

The sound here is doubly poor because Speechify would not allow me to download an audio file. I paid $100 for a one-year Premium subscription but Speechify won’t let me download until my one-week trial period expires. (Update: Subscription cancelled.)

Eleven Labs v.2

Eleven Labs is available here.

Eleven Labs v.2 overview: Better.

Eleven Labs was much easier to use. I felt like they were excited to have me there. I didn’t have to use a machete to cut through the products I didn’t want. Their marquee voice is Michael Caine and he wasn’t in my face all the time.

I joined at the Starter price of $5 per month, but based on my usage I may bump to $22 per month. We’ll see.

There’s an immediate leap in quality here. The addition of little breath intakes now and then does a surprising amount of work in making this feel more human. The speed is off—it’s too fast. And the line readings are still too flat. There’s no shaping of words. All pauses are equal.

Good, but not great.

Eleven Labs v.3

A few weeks ago Eleven Labs released a public version of its updated model, v.3.

It is astonishingly good.

Eleven Labs v.3 overview: By far the best.

I’m honestly impressed with this.

Listen to the full warmth and brightness of the voice. Hear the shaping of appropriate words, like “grasping” and “understand.” That slight pause after “doomsday pessimists” captures the nuance: it’s a pause, not a full stop. And the inflection given to “or worse” conveys an understanding of the phrase’s doomsday implications.

Eleven v.3 had no trouble with punctuation. It handled the parenthetical phrase with panache.

As an editor and producer, I’m close to feeling good about using Eleven Labs v.3 to record text-to-audio versions of AI Humanist posts for paying subscribers.

Let me know what you think

Help me out here. Does the idea of producing an audio version of a post using the Eleven Labs v.3 system creep you out, or not? Would you enjoy an audio version taped live (actual me), a well-done cloned version, both, or neither?

Drop me a note using the button below.

Join us! Become an AI Humanist supporter today.

MEET THE HUMANIST

Bruce Barcott, founding editor of The AI Humanist, is a writer known for his award-winning work on environmental issues and drug policy for The New York Times Magazine, National Geographic, Outside, Rolling Stone, and other publications.

A former Guggenheim Fellow in nonfiction, his books include The Measure of a Mountain, The Last Flight of the Scarlet Macaw, and Weed the People.

Bruce currently serves as Editorial Lead for the Transparency Coalition, a nonprofit group that advocates for safe and sensible AI policy. Opinions expressed in The AI Humanist are those of the author alone and do not reflect the position of the Transparency Coalition.

Portrait created with the use of Sora, OpenAI’s imaging tool.
