Looking through the demo voices at sites like Replica shows that while the robotic twang is still strong in some voices, others are approaching passable. The barrier to indie game development is getting lower: we've seen AI write the code for games, and it can now create voices and images too. The AI-assisted toolchain is only getting stronger; all it needs is some creativity.
Voice cloning and editing
There are tools around that let you use your own voice for text-to-speech (say, if you're creating a podcast or audiobook). I could see this being used to lower the barrier for people to pre-record presentations, which could be a cool use case.
As ever, it doesn't take long to see that this tech is ripe for abuse. It has already been used to steal: in one case, fraudsters cloned a company director’s voice to persuade an employee to authorize transfers of $35 million.
Testing it out
So obviously I had to have a play with one of these. I got myself an account at Eleven Labs and tried to fake my own voice, with limited success.
Listen to me below.
It was close enough that I had to try it with other voices.
After asking very politely for the purposes of science, and not because I plan on using this tool to ring up various people with my boss’s voice saying "Yes, Joe's expenses for team building at the pub are legitimate and a necessary part of networking and research", Hannah agreed to let me try it with her voice.
I gave it some of Hannah’s audio from the last town hall recording, then generated speech from the same script I used for my own voice.
Listen to Hannah below.
I think the Hannah clone is less accurate than mine and has retained some robotic twang, but maybe, given time, that's something I could clear up by editing the audio I uploaded. I noticed while experimenting that Eleven Labs will insert umms and erms which, as in real speech, start to interrupt your listening once you notice them.