How to Use TTS Voice Cloning AI Locally: Text-to-Speech Voice Cloning on Your PC.
This tutorial will show you how to use RVC (Retrieval-based Voice Conversion) voice models to create text-to-speech (TTS) audio in anyone's voice. The tool and the plugins that come with it let you create and clone voices with text to speech locally on your own computer, so you won't have to pay large sums to services like ElevenLabs. Everything we're going to use takes up a fair bit of space, so make sure your hard drive isn't full.
How to Download and Set Up Applio on Windows.
- To begin, download Applio. The download is about 5.2 GB.
- Once you have the Zip file on your computer, extract it to the location where you want to keep it.
- Inside this folder, double-click "run-applio.bat" and a command window will appear. It may or may not prompt you to open your browser.
If it does, accept and open Applio in your browser. If it doesn't, copy the address shown in the command window and paste it into a browser window to access Applio.
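If you would rather script the extraction step, here is a minimal Python sketch using the standard-library zipfile module. It only assumes that the archive contains run-applio.bat somewhere inside it; the function name and any paths you pass in are hypothetical placeholders, not part of Applio.

```python
import zipfile
from pathlib import Path


def extract_applio(zip_path: str, dest_dir: str) -> Path:
    """Extract the Applio zip into dest_dir and return the path to run-applio.bat.

    Hypothetical helper for illustration; it is not shipped with Applio.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # The launcher may sit at the top level or inside a subfolder,
    # depending on how the archive was packed, so search recursively.
    matches = sorted(dest.rglob("run-applio.bat"))
    if not matches:
        raise FileNotFoundError("run-applio.bat not found in the extracted files")
    return matches[0]
```

For example, `extract_applio(r"C:\Downloads\Applio.zip", r"C:\AI\Applio")` (substitute your own paths) would unpack the archive and return the launcher you then double-click.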
How to Add Voice Models to Applio for Text to Speech (Joe Biden, Morgan Freeman, Taylor Swift, etc.).
Now that you have Applio set up, you can add custom voice models to it. This lets you create text-to-speech content in anyone's voice, so long as a voice model built from a sample of that voice is available.
- To do this, switch to the Download tab in Applio.
- Next, find a source for voice models; the Voice Models website is the best option. Copy the link for the voice you'd like to use.
- Now paste the link into the Model Link box, then click Download Model.
- Once that is done, switch to the TTS tab and click Refresh.
- Now select the voice model you just added under Voice Model, and the Index file will update automatically.
- Below this, choose an option under TTS Voices. Make sure you select something that matches the person's voice and language. For example, I'm using the Morgan Freeman model, so I chose en-US-AndrewNeural, an English-speaking voice with an American accent. This gives the cloner a similar base voice to work with.
- Finally, paste your text or upload a .txt file into the Text to Synthesize box.
- Then click Convert, and Applio will generate an audio file from your text. If you're happy with it, click the download icon to save the file.
As with everything AI at the moment, you will probably have to experiment with the options under TTS Voices; some seem to work better with certain voices than others. Just don't mix and match languages or genders, or you'll get some weird results.
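The TTS voice names, such as en-US-AndrewNeural, appear to follow Microsoft's neural-voice naming pattern of language, region, then voice name. Assuming that pattern holds, this small Python sketch reads the language and region from a name so you can check it against the language of your voice model before converting. Gender cannot be read from the name, so that still has to be checked by ear. Both function names are hypothetical.

```python
def voice_locale(voice_id: str) -> tuple[str, str]:
    """Split a name like 'en-US-AndrewNeural' into ('en', 'US').

    Assumes the '<language>-<REGION>-<Name>' pattern; this is an assumption
    about the naming scheme, not something Applio guarantees.
    """
    parts = voice_id.split("-")
    if len(parts) < 3:
        raise ValueError(f"Unexpected voice name format: {voice_id!r}")
    return parts[0].lower(), parts[1].upper()


def matches_language(voice_id: str, target: str) -> bool:
    """Check a voice against a target such as 'en' (any accent) or 'en-US'."""
    language, region = voice_locale(voice_id)
    wanted = target.split("-")
    if wanted[0].lower() != language:
        return False
    # Only compare the region (accent) when the target specifies one.
    return len(wanted) < 2 or wanted[1].upper() == region
```

For a Morgan Freeman model you might call `matches_language("en-US-AndrewNeural", "en-US")`, which passes, while an en-GB voice would fail the same check because of the accent.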
How to Clone Voices for Text to Speech Locally on Your PC.
Here is where things get a little tricky and a lot more time-consuming. You'll also need a fairly capable GPU: cloning a voice locally on my laptop with an RTX 3050 took about 8 hours, so with anything less powerful you could be waiting a very long time. The voice output files also totalled 60 GB, so make sure you have at least 80 GB of free storage space if you plan to train voices with Applio.
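Before starting a multi-hour training run, you can confirm the 80 GB of free space mentioned above with a short Python sketch built on the standard-library shutil.disk_usage. The helper names are hypothetical, and by default it checks the drive the script is run from.

```python
import shutil

# Free-space target from this guide: training output reached 60 GB,
# so the guide recommends keeping at least 80 GB free.
REQUIRED_GB = 80


def free_gb(path: str = ".") -> float:
    """Free space on the drive holding `path`, in GiB (1024**3 bytes),
    which is the unit Windows Explorer labels "GB"."""
    return shutil.disk_usage(path).free / 1024**3


def has_room(path: str = ".", required_gb: float = REQUIRED_GB) -> bool:
    """True if the drive holding `path` has at least `required_gb` free."""
    return free_gb(path) >= required_gb


if __name__ == "__main__":
    # "." checks the current drive; pass your Applio folder instead
    # (for example a hypothetical r"C:\AI\Applio") to check that drive.
    target = "."
    print(f"Free space: {free_gb(target):.1f} GB")
    if has_room(target):
        print("Enough room for a training run.")
    else:
        print(f"Less than {REQUIRED_GB} GB free; clear some space before training.")
```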
- To begin, switch to the Train tab and make sure you have a sample audio file ready to upload.
- Next, make sure the Dataset Creator box is checked and give the dataset a name.
- Once you have done that, upload the audio sample you want to clone.
- Now click Preprocess Dataset and wait for it to complete.
- When it finishes, scroll down, click Extract Features, and wait for it to complete. This takes a little longer than the previous step.
- Finally, click Generate Index, then Start Training.
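Because training takes hours, it can be worth checking your sample file before you upload it. This Python sketch uses the standard-library wave module to report a WAV file's length, sample rate, channel count, and bit depth. It only reads uncompressed PCM WAV files, so an MP3 sample would need converting first; the function name is hypothetical, and the guide does not state any particular length or format that Applio requires.

```python
import wave


def describe_sample(path: str) -> dict:
    """Report basic properties of a WAV sample before uploading it.

    Hypothetical helper; it only inspects the file and changes nothing.
    """
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "seconds": frames / rate,          # duration of the clip
            "sample_rate": rate,               # samples per second (Hz)
            "channels": wav.getnchannels(),    # 1 = mono, 2 = stereo
            "bit_depth": wav.getsampwidth() * 8,
        }
```

For example, `describe_sample("my_voice_sample.wav")` (a placeholder file name) returns a dictionary you can print to confirm the clip is the one you meant to train on.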
The rest of the process is a waiting game. As I mentioned above, it will take a very long time for Python and the AI tools to analyse and clone your voice sample. There are also plenty of other checkboxes and settings throughout this process that you can experiment with; however, for your first run-through it's best to stick with the default settings and see what sort of results you get. If you encounter any strange port errors, a quick PC restart will usually solve them.
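A restart remains the simple fix. If you want to look first, one common cause of port errors with locally hosted web apps is an earlier session still holding the port; that is an assumption about your error, not something this guide confirms. This Python sketch uses the standard-library socket module to test whether anything is listening on a given local port. The function name is hypothetical, and the port you pass should be the one in the address Applio prints in its command window.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a program is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on a successful connection instead of raising,
        # so 0 means something is already listening on that port.
        return sock.connect_ex((host, port)) == 0
```

If this returns True after you have closed Applio's command window, an old process is probably still running, and closing it (or restarting) should free the port.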
When your voice is ready, you'll find it listed as a Voice Model along with its Index file, so you'll be able to use it with any TTS transcript. The good news is that the actual text-to-speech step doesn't take long to process. The results aren't quite as good as those from ElevenLabs, at least not yet, but for an entirely local process they're pretty damn impressive.
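RVC voice models are typically saved as a .pth weights file, with the index stored separately as an .index file. If you want to find your trained voice on disk, for example to back it up, this Python sketch lists both kinds of file under a folder you point it at and groups them by subfolder. The folder layout is an assumption: point it at your Applio install folder and check the results against your own install. Training may also leave intermediate checkpoint files with the same .pth extension. The function name is hypothetical.

```python
from pathlib import Path


def find_voice_models(root: str) -> dict[str, dict[str, list[str]]]:
    """Group .pth (model) and .index files by their folder relative to root."""
    found: dict[str, dict[str, list[str]]] = {}
    for path in sorted(Path(root).rglob("*")):
        suffix = path.suffix.lower()
        if path.is_file() and suffix in (".pth", ".index"):
            kind = "models" if suffix == ".pth" else "indexes"
            folder = str(path.parent.relative_to(root))
            entry = found.setdefault(folder, {"models": [], "indexes": []})
            entry[kind].append(path.name)
    return found
```

For example, `find_voice_models(r"C:\AI\Applio")` (a placeholder path) returns one entry per folder, listing the model and index files inside it.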