New beta release of Sonata-for-NVDA, formerly known as Piper-for-NVDA
NVDA 2024.1 compatibility
Support for fast variants of Piper voices. These fast variants improve responsiveness significantly because they use streaming synthesis
Improvements to responsiveness and speed across the board
Release page:
https://github.com/mush42/sonata-nvda/releases/tag/v3.0-beta.1
Direct download link:
https://github.com/mush42/sonata-nvda/releases/download/v3.0-beta.1/sonata_neural_voices-3.0-beta.nvda-addon

in reply to Musharraf :verified:

Changed the name to Sonata since we plan to support additional TTS models besides Piper in the future.
in reply to Musharraf :verified:

If there's a good way for me to train my existing voices into this new model, it would be great to know. Will check out this new repo! Excellent work on the update, keep up the amazing talent you have.
in reply to Tamas G

@Tamasg
If your existing voices are trained with Piper, then they'll work with this version.
If they fail to work for any reason, you can copy the config from any working voice to your voice, and edit relevant values.
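
A minimal Python sketch of that copy-and-edit step, assuming the voice config is a plain JSON file. The function name, the file paths, and the override key in the usage comment are placeholders, not names taken from Sonata or Piper.

```python
import json
from pathlib import Path


def copy_voice_config(working_config, target_config, overrides):
    """Copy a working voice's JSON config to another voice, then apply overrides.

    `overrides` is a dict of top-level keys to replace. Nested sections are
    replaced whole, not merged, so pass the full nested value when editing one.
    """
    config = json.loads(Path(working_config).read_text(encoding="utf-8"))
    config.update(overrides)  # shallow, top-level update only
    Path(target_config).write_text(
        json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return config


# Usage (paths and key are hypothetical):
# copy_voice_config("working_voice/config.json", "my_voice/config.json",
#                   {"some_key": "value for my voice"})
```

Keep a backup of your voice's original config before overwriting it, so the values it had can still be compared against the working one.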
in reply to Musharraf :verified:

Yeah, and I love the speed improvement it gives to some of those! Especially Mac Alex :) I did notice, though, that the faster models do a much better job of not cutting off the starting fragments of words (older voices can still do this slightly). I see that encode and decode are now two models, so it feels like splitting older voices into the new format is less straightforward without a full retrain, but I might be entirely wrong.
in reply to Tamas G

@Tamasg Yeah! I love the new HFC Female voice, and the fast variant works amazingly well! Oh hey lookie, she reacts to freaking exclamation marks!
in reply to Tamas G

@Tamasg
If you have the original checkpoint, you can convert it to the new format.
Take a look at this script, which I used to export Piper's checkpoints:
https://github.com/mush42/piper-rt-maker/blob/main/tasks.py

I need to update the docs, and add a section on training voices.

in reply to Musharraf :verified:

Oooh, this is so superb. So I could still continue to train with the existing notebook and eventually just convert the checkpoint CKPT file into the new RT voice. That's going to be amazing. At the very least it unblocks my work on sourcing higher quality data and transcripts for the voices I've developed, so just knowing that is already a huge help. I could point the variables to URLs in my drive folder for the checkpoint, then convert that way with the script you referenced.
in reply to Tamas G

@Tamasg
Here are the steps to convert the checkpoint to the fast format:
# Clone piper fork containing export code
git clone https://github.com/mush42/piper
cd ./piper
# Checkout streaming branch
git checkout streaming
cd ./src/python
pip3 install -r requirements.txt
# Upgrade torch
pip3 install --upgrade torch pytorch-lightning onnx
source ./build_monotonic_align.sh
# Export. Edit paths
python3 -m piper_train.export_onnx_streaming --debug [checkpoint path] [export directory]
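
For anyone driving the export step from a notebook, here is a rough Python sketch of the last command only. It uses just the module name and the `--debug` flag from the steps above. The helper names, the `cwd` default, and the paths in the usage comment are my own placeholders, and it assumes the requirements are already installed and that it runs from the `src/python` directory of the streaming branch.

```python
import subprocess
import sys
from pathlib import Path


def build_export_command(checkpoint, export_dir, python=sys.executable):
    """Return the argument list for the streaming export command."""
    return [
        str(python), "-m", "piper_train.export_onnx_streaming",
        "--debug", str(checkpoint), str(export_dir),
    ]


def export_checkpoint(checkpoint, export_dir, cwd="."):
    """Run the export; `cwd` should be the directory containing piper_train."""
    # Resolve to absolute paths so they stay valid when cwd differs.
    checkpoint = Path(checkpoint).resolve()
    if not checkpoint.is_file():
        raise FileNotFoundError(f"Checkpoint not found: {checkpoint}")
    export_dir = Path(export_dir).resolve()
    export_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the export exits with an error.
    subprocess.run(build_export_command(checkpoint, export_dir), cwd=cwd, check=True)


# Usage (paths are hypothetical):
# export_checkpoint("/content/drive/MyDrive/voice.ckpt", "/content/drive/MyDrive/export")
```

Passing the command as a list avoids shell quoting problems when a path contains spaces.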
in reply to Musharraf :verified:

Thanks a lot for such precise steps on this! I noticed the tasks.py script was pulling the voice and doing all of this in one go, which makes more sense given how the Fast voices got created for the existing ones, though I bet that in itself is quite a project to author on top of the new streaming model work.
in reply to Musharraf :verified:

Ah, interesting: the source command with that .sh file returns an error at this line: `this_dir="$( cd "$( dirname "$0" )" && pwd )"`. It says: `dirname: invalid option -- 'b' Try 'dirname --help' for more information.` So something isn't being passed properly to $0 there.
in reply to Tamas G

@Tamasg
I didn't edit that script; it came from the Piper repo.
Anyway, it does not affect the installation. I encountered it myself when exporting voices.
in reply to Musharraf :verified:

OK. I'm not sure if this is good or bad, and I hope it doesn't insult your script code too much by gutting it, but I made a notebook file of your steps: https://eurpod.com/Export_piper_voice_RT.ipynb - hopefully this looks correct. It restructures things a bit to take a path to a file stored in drive after mounting. I'll try it in a little bit and see if it's a spectacular fail or not :D (A minor correction had to be made for running the .sh file, since we can't use virtual environments.)
in reply to Musharraf :verified:

Well, it works :) I just built a test version of Mac Alex using the new streaming voice model. Huge thanks again! I've updated my notebook file to a working copy. It makes sure the module paths are importable and creates a folder in the drive called model_final, into which the two files get placed successfully for me.
in reply to Tamas G

@Tamasg I agree about the Alex voice. I hope we can get Votrax and Keynote fast variants too. Happy training, Tamas!
in reply to Tamas G

@Tamasg this version is actually very usable. Will probably be my primary synth going forward.
in reply to Tom Grant

@TomGrant91 @Tamasg Now if only this would work on the BT Speak. I'd be happy.
in reply to aaron

I did train it up to 6000 epochs overnight. Piper recommends about 3K for fine-tuned models, but I gave it a lot more. I do feel like it's better, but it could be placebo. If you have the energy you can re-install and it should overwrite: https://eurpod.com/en_us-MacAlex+RT_medium.tar.gz (One odd thing this and the other voice do is say short numbers with the hundredth digit mark oddly.)
in reply to Tamas G

@Tamasg @TomGrant91 I tried this one. Unfortunately it still doesn't have quite the quality of the original voice, and it still cuts off the start of words. That makes it unusable for me.
in reply to aaron

@fireborn @TomGrant91 this latest one does very slightly better at it, still not super perfect but yes oddly training it longer and letting it learn more did help it in that department at least.
in reply to Tamas G

@Tamasg @TomGrant91 I downloaded it only several minutes ago, unless you uploaded just now.
in reply to aaron

@fireborn @TomGrant91 ahahaha I was updating it just now, so maybe? :D just like 2 or 3 mins ago it finished uploading I believe with a newer replacement but same size. Just like, 9 more hours of training :D
in reply to Tamas G

@Tamasg @TomGrant91 It doesn't seem any different comparing side by side. Gs still sound like Ds. Not your fault, just the way of these things. I think the training works better with more natural sounding voices to begin with.
in reply to Tamas G

@Tamasg
Just downloaded this myself. Have you updated Keynote in recent times to take advantage of this new AddOn? That's one I'm very keen on trying now that my machine can handle them again, after the addon rewrite. Thanks.
@fireborn @TomGrant91 @mush42
in reply to Andre Louis

@FreakyFwoof @fireborn @TomGrant91 the old voices in the newer add-on will still give a similar speed improvement. I guess the one thing that disappointed me slightly - even if you import the older Keynote or Votrax voice files into that Sonata as voices, they will still have quite good speeds almost to the same degree for me. I wonder if that's just me though.
in reply to Tamas G

@Tamasg @fireborn @TomGrant91 Aah OK, would you mind posting the link please? I didn't save it at the time because I just couldn't run it.
in reply to Tamas G

@Tamasg @FreakyFwoof @fireborn @TomGrant91 I need to figure out why the Sonata neural voices won't work at all for me anymore. When I try to switch to it, it says it can't load the Sonata neural voices. I was running the first beta, and it worked for a couple of days, then stopped working all of a sudden. I have completely uninstalled the voices, and then the addon, and installed beta 2 of the addon and some voices, but it still won't load at all. I have even reset the computer, thinking something might have just been messed up, but not even that has helped.
in reply to Musharraf :verified:

@Tamasg @FreakyFwoof @fireborn @TomGrant91 you mean the NVDA logs? I can do that. I would just need to reinstall the addon and a few voices again. give me just a few and I can do that.
in reply to Musharraf :verified:

@Tamasg @FreakyFwoof @fireborn @TomGrant91 I might still have the logs from when they broke and stopped working, but wouldn't you have to comb through the file to find the issue?
in reply to Musharraf :verified:

@Tamasg @FreakyFwoof @fireborn @TomGrant91 ok, I have already forgotten where the logs are. this is a brain fart, but where are the logs you need?
in reply to Musharraf :verified:

@Tamasg @FreakyFwoof @fireborn @TomGrant91 I just told Windows to search my C: drive for the file nvda.log and it found nothing. Now, I know that file has got to be there, but Windows is saying nope.
in reply to JamminJerry

@JamminJerry @Tamasg @FreakyFwoof @fireborn @TomGrant91
An easier way is to press insert+F1, then select all and copy.
You can paste it in a plain text file, save and send it.
in reply to Musharraf :verified:

Hijacking the thread somewhat just to tell you that your addon works on Windows 11 ARM. It didn't used to, when it was Piper. This makes me happy.
in reply to Musharraf :verified:

Important notice!!!
After installing this version, you will lose all of your installed voices. Please use the voice manager to reinstall them.
in reply to Musharraf :verified:

I somehow got the Audio Themes add-on to run on NVDA 2024.1 just fine, but it's quite sluggish when used with SAPI. Also, I would like a sound that could be played when pressing Enter.
in reply to Musharraf :verified:

Pronunciation issues notwithstanding, the speed increase here is legit impressive! And I know enough to know that training data is what's needed here. I hope, when someone comes up with the optimal dataset requirements, that we can find someone willing to submit a voice.
in reply to Nick Giannak III

@nick
A dataset designed specifically for screen reader usage goes a long way toward creating a good quality voice.
If guidelines are the issue, we can come up with a set of guidelines based on Microsoft/Google guidelines which are openly available.
in reply to Musharraf :verified:

I think that would be wise. I don't know if I have the time or voice discipline to make a voice myself, but I do think it might be sensible, at least, to have this available.
in reply to Musharraf :verified:

Wow, that's a huge speed increase. You also may wish to note that it only works if you use WASAPI, otherwise it throws:
AttributeError: 'WinmmWavePlayer' object has no attribute 'setVolume'
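
To illustrate what that error means, here is a small Python sketch. It is not the add-on's code: both player classes are made-up stand-ins, one with a `setVolume` method and one without. Calling a missing method raises `AttributeError`, and a `getattr` guard is one common way to skip the call when the method is absent.

```python
class PlayerWithVolume:
    """Stand-in for an audio player that supports volume control."""

    def __init__(self):
        self.volume = 1.0

    def setVolume(self, level):
        self.volume = level


class PlayerWithoutVolume:
    """Stand-in for an audio player that has no setVolume method."""


def set_volume_if_supported(player, level):
    """Call player.setVolume(level) only if the player provides it."""
    setter = getattr(player, "setVolume", None)
    if callable(setter):
        setter(level)
        return True
    return False  # calling player.setVolume directly would raise AttributeError
```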
in reply to Pratik Patel

@ppatel @twynn
Which voices?
Custom voices or the ones downloaded from the voice manager?
I'd appreciate it if you can provide NVDA logs.
in reply to Musharraf :verified:

@twynn I'm going to uninstall the voices I managed to download, uninstall the add-on, and reinstall. I'll get you the debug logs. BTW, the voices were those I downloaded through the voice manager.
in reply to Musharraf :verified:

After updating to the latest beta, the issue I reported still exists. I removed all voices, uninstalled the add-on, reinstalled it, and added voices again. Here's a link to the log.

https://www.dropbox.com/scl/fi/914e332qia2bv2akdbys9/Piper.log?rlkey=ervz1dazlxjjak33hnvh11nwi&dl=0

in reply to Pratik Patel

@ppatel
It seems like the server is not running.
Are you running NVDA on a 32-bit/ARM-64 machine? Sonata only works on 64-bit versions of Windows.
Otherwise, check if the server generated any logs in the following file path:
[NVDA config directory]\sonata\logs\sonata-grpc.log
If not, try running the following binary from a cmd window and report the output:
[NVDA config directory]\addons\sonata_neural_voices\synthDrivers\sonata_neural_voices\bin\sonata-grpc.exe
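
If it helps, here is a small Python sketch for checking that log file. The relative path comes from the message above. The function names are placeholders, and the config directory in the usage comment is only an assumption about a typical installed copy of NVDA, so substitute your own.

```python
from pathlib import Path


def sonata_log_path(config_dir):
    """Build the expected log path under the given NVDA config directory."""
    return Path(config_dir) / "sonata" / "logs" / "sonata-grpc.log"


def read_log_tail(config_dir, max_lines=20):
    """Return the last lines of the server log, or None if no log exists."""
    path = sonata_log_path(config_dir)
    if not path.is_file():
        return None  # the server never wrote a log here
    lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
    return lines[-max_lines:]


# Usage (the config directory is an assumption, adjust to your setup):
# import os
# tail = read_log_tail(os.path.join(os.environ["APPDATA"], "nvda"))
# print("No log file found" if tail is None else "\n".join(tail))
```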
in reply to Musharraf :verified:

Thanks for trying to troubleshoot this. I'm running this on 64-bit Windows on an Intel machine, not ARM. The log file is not generated. Trying to run sonata-grpc.exe from the bin directory results in the following message:

The term 'sonata-grpc.exe' is not recognized as the name of a cmdlet, function, script file, or operable program.

in reply to Musharraf :verified:

I ran it as "./sonata-grpc.exe" and it gave me

"Starting sonata-grpc serverr at 127.0.0.1:49314"

in reply to Pratik Patel

@ppatel
Maybe send me NVDA log to diagnose why the TTS server isn't running.

in reply to Musharraf :verified:

And my apologies. I forgot to mention that I'm using NVDA alpha builds. It's more than possible that this has something to do with it.
in reply to Musharraf :verified:

Dear @Musharraf :verified: Can you please give me a hint on how to build the file sonata-grpc.exe? Is it a result of building https://github.com/mush42/sonata ? We are working on a Slovak human-sounding voice with friends, and I am tweaking the corresponding espeak-data along the way, so until I manage to get these pushed and merged into espeak-ng, I imagine my best bet is rebuilding the addon with all the resources locally.
Thanks for all the fantastic work you are putting into this.
in reply to Peter Vágner

@pvagner
Here's how to build the sonata-grpc binary:
git clone https://github.com/mush42/sonata
cd ./sonata/sonata-grpc
# With Rust installed
cargo build --release
in reply to Peter Vágner

@pvagner
If you just want to set the eSpeak-ng data directory, you don't need to re-build the binary.
Just set the following environment variable before launching sonata-grpc:
SONATA_ESPEAKNG_DATA_DIRECTORY=[your custom espeak-data directory parent]
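
A minimal Python sketch of launching the binary with that variable set, in case you want to script it. Only the variable name comes from the message above. The helper names and the paths in the usage comment are hypothetical.

```python
import os
import subprocess


def env_with_espeak_data(data_parent, base_env=None):
    """Return a copy of the environment with the eSpeak-ng data variable set."""
    env = dict(os.environ if base_env is None else base_env)
    env["SONATA_ESPEAKNG_DATA_DIRECTORY"] = str(data_parent)
    return env


def launch_sonata_grpc(binary_path, data_parent):
    """Start sonata-grpc with the custom data directory; returns the Popen handle."""
    return subprocess.Popen([str(binary_path)], env=env_with_espeak_data(data_parent))


# Usage (both paths are hypothetical):
# server = launch_sonata_grpc(r"C:\path\to\sonata-grpc.exe", r"C:\path\to\espeak-parent")
```

Building a copy of the environment keeps the variable scoped to the launched process instead of changing it for the whole session.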