I spent several hours over the last few days implementing WASAPI audio output for NVDA for some reason. As I suspected, I don't think it's really any more responsive, but I'm hoping it might eventually fix some tricky bugs with the old WinMM implementation, though it'll probably introduce a bunch of its own. Still quite some way to go before it's fully featured; e.g. it doesn't support any device other than the default yet, nor can it recover if a device disappears. #NVDASR

in reply to Jamie Teh

Does it help with the crackling issues that occur in synths that send small chunks of audio?
in reply to Mohamed Al-Hajamy 💾

@MutedTrampet I'm not sure. I've only tested with eSpeak and OneCore so far. It probably won't cope with that yet, but it might be possible going forward. That said, I added a buffered option to the existing audio code years ago, which should have fixed this for such synths if they set it correctly.
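
For illustration, opting into that buffering looks roughly like this; the exact WavePlayer parameter names are from memory, so treat them as assumptions rather than verbatim NVDA code:

    # Rough sketch of the old nvwave buffered option (parameter names assumed).
    import nvwave

    player = nvwave.WavePlayer(
        channels=1,
        samplesPerSec=22050,
        bitsPerSample=16,
        buffered=True,  # coalesce small chunks instead of hitting the device per chunk
    )
    player.feed(b"\x00\x00" * 256)  # a tiny chunk; buffering smooths these out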
in reply to Jamie Teh

Even eSpeak does it, I think. Someone filed an issue with sentences and settings that would cause it to crackle.
in reply to Mohamed Al-Hajamy 💾

@MutedTrampet Ah, the crackling at end of lines bug? Yeah, this should fix that. I thought you were referring to a synth which consistently sends small chunks, rather than just at end of lines, etc.
in reply to Jamie Teh

@MutedTrampet The approach for indexing/callbacks is fundamentally different and doesn't require pushing smaller buffers to get reliable timing. However, synths that push really tiny chunks consistently might cause problems. I can probably deal with those, but I don't know of one right now, so I'm not sure if it's worth putting time into.
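
To sketch the difference (the onDone parameter here is an assumption about the new API, not a confirmed signature):

    # With callback-based indexing, a synth can push one large buffer and get a
    # callback when playback actually reaches it, instead of chopping audio into
    # tiny chunks so index timing can be inferred from buffer boundaries.
    import nvwave

    player = nvwave.WavePlayer(channels=1, samplesPerSec=22050, bitsPerSample=16)

    def onIndexReached():
        print("index reached")  # e.g. report progress back to the synth driver

    audio = b"\x00\x00" * 22050  # one second of 16 bit mono silence as stand-in audio
    player.feed(audio, onDone=onIndexReached)  # onDone is an assumed parameter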
in reply to Jamie Teh

@MutedTrampet eloquence_threshold supposedly ran into this. IBMTTS mitigates it by collecting the buffers into a bigger block, but that destroys indexing.
in reply to x0

@x0 @MutedTrampet Okay. No legal/current synth I can try though, which is going to make this difficult to test. :) Might have to simulate it by dropping eSpeak's buffer size or something.
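
Something like this, assuming the standard eSpeak NG C API (the DLL path is a placeholder):

    # espeak_Initialize's second argument is the internal buffer length in
    # milliseconds; shrinking it makes eSpeak deliver audio in much smaller
    # chunks, approximating a synth that consistently streams tiny buffers.
    import ctypes

    AUDIO_OUTPUT_SYNCHRONOUS = 2  # from the espeak_AUDIO_OUTPUT enum
    espeak = ctypes.cdll.LoadLibrary("espeak-ng.dll")  # placeholder path
    espeak.espeak_Initialize(AUDIO_OUTPUT_SYNCHRONOUS, 10, None, 0)  # 10 ms instead of e.g. 300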
in reply to Jamie Teh

Ooh, that's cool. Almost all the NVDA bugs I've seen are related to nvwave in one way or another; is this what you're replacing?
in reply to Quin

@TheQuinbox The internals of nvwave, yes. It'll still be nvwave for compatibility reasons, but the underlying implementation is entirely new. Most of the gruntwork is offloaded to C++ code.
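
Conceptually, the split looks something like this; the DLL and function names are invented for the sketch, not NVDA's actual exports:

    # Illustrative only: a thin Python shim in nvwave delegating the real work
    # (WASAPI setup, the render loop) to native code via ctypes.
    import ctypes

    wasapi = ctypes.windll.LoadLibrary("nvdaWasapi.dll")  # hypothetical DLL name
    wasapi.wasPlay_create.restype = ctypes.c_void_p  # opaque player handle

    class WasapiPlayer:
        def __init__(self, channels, samplesPerSec, bitsPerSample):
            # Hypothetical export: the native side creates the audio client
            # and render thread.
            self._handle = wasapi.wasPlay_create(channels, samplesPerSec, bitsPerSample)

        def feed(self, data):
            # Python just forwards buffers; mixing and timing happen in C++.
            wasapi.wasPlay_feed(self._handle, data, len(data))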
in reply to Jamie Teh

I almost wish I hadn't started this NVDA + WASAPI thing. I've spent many hours on it (713 lines of code so far), and now I probably won't be able to let it go until it's done. It works pretty well now, but there are still edge cases that need fixing; e.g. if you force a non-default device and then disconnect it mid-playback. Ugh.
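
The failure mode is roughly this; AUDCLNT_E_DEVICE_INVALIDATED is a real WASAPI HRESULT, but everything else here (how the error surfaces, the helper names) is assumed for the sketch:

    # Not NVDA's actual recovery code: when the forced device vanishes
    # mid-playback, render calls start failing and the player has to reopen
    # on the current default device.
    AUDCLNT_E_DEVICE_INVALIDATED = 0x88890004  # real constant from audioclient.h

    def feedWithFallback(player, makeDefaultPlayer, chunk):
        try:
            player.feed(chunk)
        except OSError as e:
            if (e.winerror or 0) & 0xFFFFFFFF != AUDCLNT_E_DEVICE_INVALIDATED:
                raise
            player.close()
            player = makeDefaultPlayer()  # fall back to the default device
            player.feed(chunk)
        return player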
in reply to Jamie Teh

I'm also providing an option to allow NVDA synth drivers to pass raw memory pointers for audio data instead of converting to a Python bytes object; the conversion means a lot of unnecessary memory copying and overhead when the ultimate audio buffer is just raw memory (no Python objects) anyway. I've already updated eSpeak and OneCore, and it works quite nicely, though I don't really notice a difference on my system.
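
For example (the pointer-accepting feed signature is an assumption):

    # Feeding a raw pointer plus an explicit size, so audio that already lives
    # in native memory never has to become a Python bytes object first.
    import ctypes
    import nvwave

    player = nvwave.WavePlayer(channels=1, samplesPerSec=22050, bitsPerSample=16)

    nativeBuf = (ctypes.c_char * 4096)()  # stand-in for the synth's own buffer
    player.feed(ctypes.cast(nativeBuf, ctypes.c_void_p), size=4096)  # no copy into bytes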
in reply to Jamie Teh

I love the initiative! I assume it will be API-breaking, or can that be avoided?
in reply to Leonard de Ruijter

@leonardder Currently, it's API-breaking for synths that choose to use it, in that they won't work with the old nvwave. Synths using the old method will still work with the new nvwave, though. However, I realised there's a way to implement the raw pointer thing with the old nvwave: it can just convert to a Python bytes buffer on the fly.
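
That shim is essentially one call; ctypes.string_at is the real Python API for this, and the wrapper function around it is just illustration:

    # ctypes.string_at copies `size` bytes from a raw address into a Python
    # bytes object, letting the old nvwave accept pointer-style arguments.
    import ctypes

    def feedCompat(player, data, size=None):
        if not isinstance(data, bytes):
            data = ctypes.string_at(data, size)  # materialise bytes on the fly
        player.feed(data)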
in reply to Charli Jo

@CharliJo I'm rewriting NVDA's audio output code to use a more modern Windows framework. Hopefully it will improve audio stability a bit, though the advantages probably aren't noticeable for most people.