Views from the Lab

How could text to speech help me at work?

04/07/2023

The recent growth in AI capabilities has had a knock-on effect throughout the tech world. One intriguing area is text-to-speech. While we're not yet at the point of robots that are indistinguishable from humans, they're getting surprisingly close.

Don't believe me? Listen for yourself below.

Game Development 

Looking through the demo voices at sites like Replica shows that while the robotic twang is still strong in some voices, others are approaching passable. The barrier to indie game development is getting lower. We've seen AI write the code for games, and it can now create voices and images too. The AI-assisted toolchain is only getting stronger; all it needs is some creativity.

Voice cloning and editing 

There are tools around that let you use your own voice for text-to-speech (say, if you're creating a podcast or audiobook). I could see this lowering the barrier for people to pre-record presentations, which could be a cool use case.

As ever, it doesn't take long to see that this tech is ripe for abuse. It's already been used to steal: in one case, fraudsters cloned a company director's voice to persuade an employee to authorize transfers of $35 million.

Testing it out

Obviously I had to have a play with one of these, so I got myself an account at Eleven Labs and tried to fake my own voice, with limited success.

Listen to me below.

It was close enough that I had to try it with other voices.  

After asking very politely for the purposes of science, and not because I plan on using this tool to ring up various people with my boss’s voice saying "Yes, Joe's expenses for team building at the pub are legitimate and a necessary part of networking and research", Hannah agreed to let me try it with her voice.  

I gave it some of Hannah's audio from the last town hall recording, then generated speech from the same script I had used for my own voice.

Listen to Hannah below.

I think the Hannah clone is less accurate and has retained some robotic twang, but maybe, given time, that's something I could clear up by editing the audio I uploaded. I noticed while experimenting that Eleven Labs will insert umms and erms which, as in real speech, start to interrupt your listening once you notice them.

Use in business 

While this was fun to play with, there are some more serious use cases that could change the way we work. I came across Descript, where you can edit your audio by editing the transcript text. I really like this idea because it simplifies audio editing, and I can see it saving lots of time in podcast creation. Even for those who don't podcast, the idea of using these tools to create presentation recordings (especially for those who don't like presenting) is very appealing.
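To make the "edit the text, edit the audio" idea concrete, here is a minimal sketch of how it could work in principle. This is not Descript's implementation; it assumes the transcript already comes with word-level start and end timestamps (for example from a speech-recognition pass), and the `Word` class, the function names and the toy timestamps and samples are all hypothetical. Deleting a word from the transcript drops that word's slice of audio, while consecutive kept words are merged into one range so the pause between them survives.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One transcript word with its position in the recording (seconds).

    Hypothetical structure: assumes word-level timestamps are available.
    """
    text: str
    start: float
    end: float


def edited_transcript(words, deleted):
    """Return the transcript text with the deleted word indices removed."""
    return " ".join(w.text for i, w in enumerate(words) if i not in deleted)


def keep_ranges(words, deleted):
    """Return (start, end) time ranges covering every word that was not deleted.

    Consecutive kept words are merged into one range, so the natural pause
    between them is preserved; a deletion starts a new range.
    """
    ranges = []
    prev_kept = False
    for i, w in enumerate(words):
        if i in deleted:
            prev_kept = False
            continue
        if prev_kept:
            # Extend the current range to the end of this word.
            ranges[-1] = (ranges[-1][0], w.end)
        else:
            # Start a new range at the beginning or after a deletion.
            ranges.append((w.start, w.end))
        prev_kept = True
    return ranges


def cut_audio(samples, sample_rate, ranges):
    """Concatenate the slices of `samples` that fall inside each time range."""
    out = []
    for start, end in ranges:
        out.extend(samples[round(start * sample_rate):round(end * sample_rate)])
    return out


if __name__ == "__main__":
    # Toy data: made-up timestamps and a fake 2.5 s "recording" at 10 samples/s.
    words = [
        Word("So", 0.0, 0.3),
        Word("um", 0.4, 0.8),
        Word("welcome", 0.9, 1.4),
        Word("everyone", 1.5, 2.1),
    ]
    deleted = {1}  # the user deletes "um" from the transcript
    samples = list(range(25))

    print(edited_transcript(words, deleted))
    ranges = keep_ranges(words, deleted)
    print(ranges)
    print(cut_audio(samples, 10, ranges))
```

A real editor would also need to smooth each join (a short crossfade, for instance) to avoid audible clicks where the audio is cut; this sketch leaves that out to keep the core idea visible.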

Another way this technology could be used is to support accessibility in the workplace. For example, it could read aloud text in documents and on the web for anyone who needs or prefers to consume information this way. Tools that make accessible features easier to maintain mean that more people will be able to benefit from them.

We're looking at ways this technology could help the users of our products, by guiding them through processes and making our webpages more accessible. We'd love to hear how you might want to see this technology used, so get in touch with your thoughts!

 

By Joe Norley, Research Engineer