fedi.ml | Display

Musharraf

7 months ago • •

New beta release of Sonata-for-NVDA, formerly known as Piper-for-NVDA
NVDA 2024.1 compatibility
Support for fast variants for Piper voices. These fast variants improve responsiveness significantly because they use streaming synthesis
Improvements to responsiveness and speed across the board
Release page:
github.com/mush42/sonata-nvda/…
Direct download link:
github.com/mush42/sonata-nvda/…

Release v3.0-beta.1 · mush42/sonata-nvda

What's new NVDA 2024.1 compatibility Changed the name to Sonata since we plan to support additional TTS models besides Piper in the future. Support for fast variants for Piper voices. These fast v...

^GitHub

Peter Vágner likes this.

in reply to Musharraf

Musharraf

in reply to Musharraf • 7 months ago • •

Changed the name to Sonata since we plan to support additional TTS models besides Piper in the future.

in reply to Musharraf

Tamas G

in reply to Musharraf • 7 months ago • •

if there's a great way for me to train my existing voices into this new model, it would be great to know - will check out this new repo! Excellent work on the update, keep up the amazing work.

in reply to Tamas G

Musharraf

in reply to Tamas G • 7 months ago • •

@Tamasg
If your existing voices are trained with Piper, then they'll work with this version.
If they fail to work for any reason, you can copy the config from any working voice to your voice, and edit relevant values.

@Tamas G

in reply to Musharraf

Tamas G

in reply to Musharraf • 7 months ago • •

yeah, and I love the speed improvement it gives to some of those! Especially Mac Alex :) I did notice though that the faster models do a way better job at not cutting the starting fragments of words (older voices can still slightly do this.) I see that encode and decode are now two models, so it feels like splitting older voices into the new format is less straightforward without a full retrain, but I might be entirely wrong.

in reply to Tamas G

Devin Prater :blind:

in reply to Tamas G • 7 months ago • •

@Tamasg Yeah! I love the new HFC Female voice, and the fast variant works amazingly well! Oh hey lookie, she reacts to freaking exclamation marks!

@Tamas G

in reply to Tamas G

Musharraf

in reply to Tamas G • 7 months ago • •

@Tamasg
If you have the original checkpoint, you can convert it to the new format.
Take a look at this script, which I used to export Piper's checkpoints:
github.com/mush42/piper-rt-mak…

I need to update the docs, and add a section on training voices.

piper-rt-maker/tasks.py at main · mush42/piper-rt-maker

Export and package Piper checkpoints as RT voices in ONNX format - mush42/piper-rt-maker

^GitHub

@Tamas G

Tamas G reshared this.

in reply to Musharraf

Tamas G

in reply to Musharraf • 7 months ago • •

oooh this is so superb. So I could still continue to train with the existing notebook and eventually just convert the checkpoint CKPT file into the new RT voice, that's going to be amazing. At least it unblocks my work though with sourcing higher quality data and transcripts for the voices I've developed, so just already knowing that is huge help. I could point the variables to URLs in my drive folder for the checkpoint, then convert that way with the script you referenced.

in reply to Tamas G

Musharraf

in reply to Tamas G • 7 months ago • •

@Tamasg
Here are the steps to convert the checkpoint to the fast format:
# Clone the Piper fork containing the export code
git clone https://github.com/mush42/piper
cd ./piper
# Check out the streaming branch
git checkout streaming
cd ./src/python
pip3 install -r requirements.txt
# Upgrade torch
pip3 install --upgrade torch pytorch-lightning onnx
source ./build_monotonic_align.sh
# Export. Edit paths
python3 -m piper_train.export_onnx_streaming --debug [checkpoint path] [export directory]

GitHub - mush42/piper: A fast, local neural text to speech system

A fast, local neural text to speech system. Contribute to mush42/piper development by creating an account on GitHub.

^GitHub

@Tamas G

in reply to Musharraf

Tamas G

in reply to Musharraf • 7 months ago • •

thanks a lot for such precise steps on this! I noticed the tasks.py script was pulling the voice and doing all of this in one go, which explains how the Fast voices got created for the existing ones, though I bet that in itself is quite a project to author on top of the new streaming model work.

in reply to Musharraf

Tamas G

in reply to Musharraf • 7 months ago • •

ah interesting, the source command with that .sh file returns an error at this line: `this_dir="$( cd "$( dirname "$0" )" && pwd )"`. It says: `"dirname: invalid option -- 'b' Try 'dirname --help' for more information."`, so something isn't being passed properly to $0 there, interesting.

in reply to Tamas G

Musharraf

in reply to Tamas G • 7 months ago • •

@Tamasg
I didn't edit that script, it came from piper repo.
Anyway, it does not affect the installation. I encountered it myself when exporting voices.
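
A note on the `dirname` error reported above. One plausible cause (an assumption; the thread does not confirm it) is that when a script is sourced, `$0` still names the calling shell, which in a login shell is `-bash`, so `dirname` reads `-b` as an option. The sketch below reproduces this with a literal `-bash` string and shows one possible workaround. It is not taken from the Piper repository.

```shell
# Simulate the value "$0" can have inside a script sourced from a login shell.
zero="-bash"

# Without "--", some dirname implementations parse "-bash" as options and fail on "-b".
if ! dirname "$zero" >/dev/null 2>&1; then
  echo "dirname rejected '$zero' as an option"
fi

# "--" ends option parsing, so "-bash" is treated as a plain path.
safe_dir="$(dirname -- "$zero")"
echo "$safe_dir"   # a path without slashes has "." as its directory
```

In bash specifically, `${BASH_SOURCE[0]}` holds the path of the file being sourced, so a script that needs its own directory could use that in place of `$0`.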

@Tamas G

in reply to Musharraf

Tamas G

in reply to Musharraf • 7 months ago • •

ok. I'm not sure if this is good or bad, and I hope it doesn't insult your script code too much by gutting it, but I made a notebook file of your steps: eurpod.com/Export_piper_voice_… - hopefully this looks correct, it restructures it a bit to take a path to a file stored in drive after mounting. I'll try it in a little bit and see if it's a spectacular fail or not :D (a minor correction was made for running the .sh file, since we can't use virtual environments.)

This entry was edited (7 months ago)

in reply to Musharraf

Tamas G

in reply to Musharraf • 7 months ago • •

Well, it works :) I just built a test version of Mac Alex using the new streaming voice model. Huge thanks again! I've updated my notebook file to a working copy, it ensures module paths are importable correctly and creates a folder in the drive called model_final into which the two files get placed successfully for me.

in reply to Tamas G

Tom Grant

in reply to Tamas G • 7 months ago • •

@Tamasg I agree about the Alex voice. I hope we can get Votrax and Keynote fast variants too. Happy training Tamas!

@Tamas G

in reply to Musharraf

Musharraf

in reply to Musharraf • 7 months ago • •

Important notice!!!
After installing this version, you will lose all of your installed voices. Please use the voice manager to reinstall them.

in reply to Musharraf

Aryan

in reply to Musharraf • 7 months ago • •

I somehow got the Audio Themes add-on to run on NVDA 2024.1 just fine, but it's quite sluggish when using w SAPI. Also, I would like a sound that could be played when pressing Enter.

in reply to Musharraf

Nick Giannak III

in reply to Musharraf • 7 months ago • •

Pronunciation issues notwithstanding, the speed increase here is legit impressive! And I know enough to know that training data is what's needed here. I hope, when someone comes up with the optimal dataset requirements, that we can find someone willing to submit a voice.

This entry was edited (7 months ago)

in reply to Nick Giannak III

Musharraf

in reply to Nick Giannak III • 7 months ago • •

@nick
A dataset designed specifically for screen reader usage goes a long way toward creating a good quality voice.
If guidelines are the issue, we can come up with a set of guidelines based on Microsoft/Google guidelines which are openly available.

@Nick Giannak III

in reply to Musharraf

Nick Giannak III

in reply to Musharraf • 7 months ago • •

I think that would be wise. I don't know if I have the time or voice discipline to make a voice myself, but I do think it might be sensible, at least, to have this available.

This entry was edited (7 months ago)

in reply to Musharraf

Timothy Wynn

in reply to Musharraf • 7 months ago • •

Wow, that's a huge speed increase. You also may wish to note that it only works if you use WASAPI, otherwise it throws:
AttributeError: 'WinmmWavePlayer' object has no attribute 'setVolume'

in reply to Timothy Wynn

Pratik Patel

in reply to Timothy Wynn • 7 months ago • •

@twynn Voices also fail to load in many cases.

@Timothy Wynn

in reply to Pratik Patel

Musharraf

in reply to Pratik Patel • 7 months ago • •

@ppatel @twynn
Which voices?
Custom voices or the ones downloaded from the voice manager?
I'd appreciate it if you can provide NVDA logs.

@Pratik Patel @Timothy Wynn

in reply to Musharraf

Pratik Patel

in reply to Musharraf • 7 months ago • •

@twynn I'm going to uninstall the voices I managed to download, uninstall the add-on, and reinstall. I'll get you the debug logs. BTW, the voices were those I downloaded through the voice manager.

@Timothy Wynn

in reply to Musharraf

Pratik Patel

in reply to Musharraf • 7 months ago • •

After updating to the latest beta, the issue i reported still exists. I removed all voices, uninstalled the add-on, reinstalled it, and added voices again. Here's a link to the log.

dropbox.com/scl/fi/914e332qia2…

Piper.log

Shared with Dropbox

^Dropbox

in reply to Pratik Patel

Musharraf

in reply to Pratik Patel • 7 months ago • •

@ppatel
It seems like the server is not running.
Are you running NVDA on a 32-bit/ARM-64 machine? Sonata only works on 64-bit versions of Windows.
Otherwise, check if the server generated any logs in the following file path:
[NVDA config directory]\sonata\logs\sonata-grpc.log
If not, try running the following binary from a cmd window and report the output:
[NVDA config directory]\addons\sonata_neural_voices\synthDrivers\sonata_neural_voices\bin\sonata-grpc.exe

@Pratik Patel

in reply to Musharraf

Pratik Patel

in reply to Musharraf • 7 months ago • •

Thanks for trying to troubleshoot this. I'm running this on a Windows 64 bit on an Intel machine. Not Arm. The log file is not generated. Trying to run sonata-grpc.exe from the bin directory results in the following message:

The term 'sonata-grpc.exe' is not recognized as the name of a cmdlet, function, script file, or operable program.

in reply to Musharraf

Pratik Patel

in reply to Musharraf • 7 months ago • •

I ran it as "./sonata-grpc.exe" and it gave me

"Starting sonata-grpc serverr at 127.0.0.1:49314"

in reply to Pratik Patel

Musharraf

in reply to Pratik Patel • 7 months ago • •

@ppatel
Maybe send me NVDA log to diagnose why the TTS server isn't running.

@Pratik Patel

Musharraf reshared this.

in reply to Musharraf

Pratik Patel

in reply to Musharraf • 7 months ago • •

Here is the most recent log.

dropbox.com/scl/fi/h3bfsprt1q5…

Piper2.log

Shared with Dropbox

^Dropbox

in reply to Musharraf

Pratik Patel

in reply to Musharraf • 7 months ago • •

And my apologies. I forgot to mention that I'm using NVDA alpha builds. It's more than possible that this has something to do with it.

in reply to Musharraf

Peter Vágner

in reply to Musharraf • 7 months ago •

Dear @Musharraf :verified: Can you please give me a hint on how to build the file sonata-grpc.exe? Is it a result of building github.com/mush42/sonata? We are working on a Slovak human-sounding voice with friends, and I am tweaking the corresponding espeak-data along the way, so until I manage to get these pushed and merged to espeak-ng, I imagine my best bet is rebuilding the addon with all the resources locally.
Thanks for all the fantastic work you are putting into this.

@Musharraf

in reply to Peter Vágner

Musharraf

in reply to Peter Vágner • 7 months ago • •

@pvagner
Here's how to build the sonata-grpc binary:
git clone https://github.com/mush42/sonata
cd ./sonata/sonata-grpc
# With Rust installed
cargo build --release

GitHub - mush42/sonata: A cross-platform engine for neural TTS models.

A cross-platform engine for neural TTS models. Contribute to mush42/sonata development by creating an account on GitHub.

^GitHub

@Peter Vágner

Peter Vágner likes this.

in reply to Peter Vágner

Musharraf

in reply to Peter Vágner • 7 months ago • •

@pvagner
If you just want to set the eSpeak-ng data directory, you don't need to re-build the binary.
Just set the following environment variable before launching sonata-grpc:
SONATA_ESPEAKNG_DATA_DIRECTORY=[your custom espeak-data directory parent]
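
As a rough illustration of the environment-variable approach, the sketch below sets the variable for a single launch from a POSIX shell (for example Git Bash on Windows, or Linux). The function name `launch_sonata` and both paths are placeholders for illustration, not part of Sonata; only the variable name comes from the message above.

```shell
# launch_sonata DATA_PARENT BINARY
#   DATA_PARENT: directory that contains the custom eSpeak-ng data directory
#   BINARY:      path to the sonata-grpc executable
launch_sonata() {
  data_parent="$1"
  binary="$2"
  if [ ! -d "$data_parent" ]; then
    echo "eSpeak-ng data parent not found: $data_parent" >&2
    return 1
  fi
  # The assignment prefix puts the variable in this one process's environment only.
  SONATA_ESPEAKNG_DATA_DIRECTORY="$data_parent" "$binary"
}

# Example call (both paths are hypothetical):
# launch_sonata "$HOME/espeak-custom" ./sonata-grpc.exe
```

Because the variable is set only for the launched process, the rest of the shell session keeps its normal environment.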

@Peter Vágner

Peter Vágner likes this.

in reply to Musharraf

Peter Vágner

in reply to Musharraf • 7 months ago •

@Musharraf :verified:

@Musharraf

Unknown parent

Tom Grant

Unknown parent • 7 months ago • •

@fireborn @Tamasg agreed.

@aaron @Tamas G

Unknown parent

Tamas G

Unknown parent • 7 months ago • •

I did train it up to 6000 epochs overnight. Even though about 3K is recommended for fine-tuned models by Piper, I gave it a lot more. I do feel like it's better, but it could be placebo. If you have the energy you can re-install and it should overwrite: eurpod.com/en_us-MacAlex+RT_me… (one odd thing this and the other voice do is say numbers that are short but with the hundredth digit mark oddly.)

This entry was edited (7 months ago)

Unknown parent

Tamas G

Unknown parent • 7 months ago • •

@fireborn @TomGrant91 this latest one does very slightly better at it, still not super perfect but yes oddly training it longer and letting it learn more did help it in that department at least.

@aaron @Tom Grant

Unknown parent

Tamas G

Unknown parent • 7 months ago • •

@fireborn @TomGrant91 ahahaha I was updating it just now, so maybe? :D just like 2 or 3 mins ago it finished uploading I believe with a newer replacement but same size. Just like, 9 more hours of training :D

@aaron @Tom Grant

in reply to Tamas G

Andre Louis

in reply to Tamas G • 7 months ago • •

@Tamasg
Just downloaded this myself. Have you updated Keynote in recent times to take advantage of this new AddOn? That's one I'm very keen on trying now that my machine can handle them again, after the addon rewrite. Thanks.
@fireborn @TomGrant91 @mush42

@aaron @Musharraf @Tamas G @Tom Grant

in reply to Andre Louis

Tamas G

in reply to Andre Louis • 7 months ago • •

@FreakyFwoof @fireborn @TomGrant91 the old voices in the newer add-on will still give a similar speed improvement. I guess the one thing that disappointed me slightly - even if you import the older Keynote or Votrax voice files into that Sonata as voices, they will still have quite good speeds almost to the same degree for me. I wonder if that's just me though.

@aaron @Tom Grant @Andre Louis

in reply to Tamas G

Andre Louis

in reply to Tamas G • 7 months ago • •

@Tamasg @fireborn @TomGrant91 Aah OK, would you mind posting the link please? I didn't save it at the time because I just couldn't run it.

@aaron @Tamas G @Tom Grant

in reply to Andre Louis

Tamas G

in reply to Andre Louis • 7 months ago • •

@FreakyFwoof @fireborn @TomGrant91 oh yeah! Keynote: eurpod.com/en-us-keynote-mediu… and votrax: eurpod.com/en-us-Votrax_medium…

@aaron @Tom Grant @Andre Louis

in reply to Tamas G

JamminJerry

in reply to Tamas G • 7 months ago • •

@Tamasg @FreakyFwoof @fireborn @TomGrant91 I need to figure out why the sonata neural voices won't work at all for me anymore. when I try to switch to it, it says it can't load the sonata neural voices. I was running the first beta, and that was when it worked for a couple days then stopped working all of a sudden. I have completely uninstalled the voices, and then the addon, and installed the beta 2 of the addon, and some voices, but it still won't load it at all. I have even reset the computer thinking something might have just been messed up, but not even that has helped.

@aaron @Tamas G @Tom Grant @Andre Louis

in reply to Tamas G

Andre Louis

in reply to Tamas G • 7 months ago • •

@Tamasg Thanks for that.

@Tamas G

in reply to JamminJerry

Musharraf

in reply to JamminJerry • 7 months ago • •

@JamminJerry @Tamasg @FreakyFwoof @fireborn @TomGrant91
If you provide the logs, I'll be able to diagnose the issue.

@aaron @Tamas G @Tom Grant @Andre Louis @JamminJerry

in reply to Musharraf

JamminJerry

in reply to Musharraf • 7 months ago • •

@Tamasg @FreakyFwoof @fireborn @TomGrant91 you mean the NVDA logs? I can do that. I would just need to reinstall the addon and a few voices again. give me just a few and I can do that.

@aaron @Tamas G @Tom Grant @Andre Louis

in reply to Musharraf

JamminJerry

in reply to Musharraf • 7 months ago • •

@Tamasg @FreakyFwoof @fireborn @TomGrant91 I might still have the logs from when they broke, and stopped working, but wouldn't you have to comb through the file to find the issue?

@aaron @Tamas G @Tom Grant @Andre Louis

in reply to Musharraf

JamminJerry

in reply to Musharraf • 7 months ago • •

@Tamasg @FreakyFwoof @fireborn @TomGrant91 ok, I have already forgotten where the logs are. this is a brain fart, but where are the logs you need?

@aaron @Tamas G @Tom Grant @Andre Louis

in reply to JamminJerry

Musharraf

in reply to JamminJerry • 7 months ago • •

@JamminJerry @Tamasg @FreakyFwoof @fireborn @TomGrant91
Just send the NVDA log.

@aaron @Tamas G @Tom Grant @Andre Louis @JamminJerry

in reply to Musharraf

JamminJerry

in reply to Musharraf • 7 months ago • •

@Tamasg @FreakyFwoof @fireborn @TomGrant91 that is what I am having trouble finding.

@aaron @Tamas G @Tom Grant @Andre Louis

in reply to Musharraf

JamminJerry

in reply to Musharraf • 7 months ago • •

@Tamasg @FreakyFwoof @fireborn @TomGrant91 I just told windows to do a search of my c: drive for the file nvda.log and it found nothing. now I know that file has got to be there, but windows is saying nope.

@aaron @Tamas G @Tom Grant @Andre Louis

in reply to JamminJerry

Musharraf

in reply to JamminJerry • 7 months ago • •

@JamminJerry @Tamasg @FreakyFwoof @fireborn @TomGrant91
An easier way is to press insert+F1, then select all and copy.
You can paste it in a plain text file, save and send it.

@aaron @Tamas G @Tom Grant @Andre Louis @JamminJerry

in reply to Musharraf

JamminJerry

in reply to Musharraf • 7 months ago • •

@Tamasg @FreakyFwoof @fireborn @TomGrant91 if I did this right, here you go.
dropbox.com/scl/fi/66sx9tsqvxl…

@aaron @Tamas G @Tom Grant @Andre Louis

in reply to Musharraf

Andre Louis

in reply to Musharraf • 7 months ago • •

Hijacking the thread somewhat just to tell you that your addon works in Windows 11 ARM. It didn't use to, when it was Piper. This makes me happy.
