Tamas G

4 months ago

Update: Thanks to @pitermach for a great demo showing that what you hear in this recording is actually Mist World upsampling to 48 kHz, not NVDA downsampling to 16 kHz!
I stitched together an audio file showing how badly it handles an output setting of -1: it ignores it. Instead, #NVDASR tries to be too smart. It enumerates the device list, works out which device you have set as your sound mapper output, and explicitly calls that sound device when passing audio to the TTS outputs.
I updated this to add a little more at the end and show how Mist World treats audio output switching "properly", which I now know is not proper at all.
Good night, Mastodon. This really ruined my weekend at first, until that amazing demo in my mentions by @pitermach clarified things. :)
Update: People are asking, "how can I tell?" Listen for the sharpness of S's and other consonants. If you have the ear you'll notice.

#nvdasr @Pitermach

This entry was edited (4 months ago)

in reply to Tamas G

Amir

in reply to Tamas G 4 months ago

Interesting. Which TTS engine or add-on is this? Is this the paid Vocalizer add-on for NVDA?

in reply to Amir

Tamas G

in reply to Amir 4 months ago

@amir yep, the Tiflotecnia one downsamples like this sadly.

@Amir

in reply to Tamas G

Amir

in reply to Tamas G 4 months ago

Oh bad!

in reply to Tamas G

JamminJerry

in reply to Tamas G 4 months ago

I must not have a good enough ear, as I am not hearing a difference. If there is one at all, it is very small, at least to me.

in reply to Tamas G

TheFriedChip

in reply to Tamas G 4 months ago

oh god, that is terrible.

in reply to Tamas G

Pitermach

in reply to Tamas G 4 months ago

What you're hearing isn't actually downsampling to 16 kHz, it's aliasing artifacts introduced by whatever resampling algorithm Mist World's audio library is using. Vocalizer actually runs at a native 22 kHz as far as I know. I recorded a quick demo of what it sounds like when you bring a 22 kHz file to 48 kHz with a low-quality resampling algorithm versus a file that's actually at 16 kHz.
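For anyone who would rather see the effect than hear it, here is a minimal sketch of the idea in the post above. It takes a 9 kHz tone at 22050 Hz, brings it to 48000 Hz with two cheap methods, and measures how much energy lands at 13050 Hz (22050 minus 9000), a frequency the source cannot contain. The tone, the 0.1 second length, and all function names are assumptions chosen for illustration. Nothing here is taken from Mist World or its audio library, whose actual resampler is not identified in this thread.

```python
import cmath
import math

FS_IN = 22050          # source rate in Hz (the "22 kHz" voice)
FS_OUT = 48000         # output rate in Hz
TONE = 9000            # test tone, below the source Nyquist of 11025 Hz
IMAGE = FS_IN - TONE   # 13050 Hz, the first spectral image of the tone
N_OUT = 4800           # 0.1 s of output, so both frequencies sit on exact 10 Hz bins
N_IN = N_OUT * FS_IN // FS_OUT + 2  # enough input samples, plus headroom

def sine(freq, rate, count):
    """Unit-amplitude sine wave sampled at the given rate."""
    return [math.sin(2 * math.pi * freq * n / rate) for n in range(count)]

def upsample_point(src):
    """Point sampling: reuse the most recent source sample, no filtering."""
    return [src[int(n * FS_IN / FS_OUT)] for n in range(N_OUT)]

def upsample_linear(src):
    """Linear interpolation between the two neighbouring source samples."""
    out = []
    for n in range(N_OUT):
        pos = n * FS_IN / FS_OUT
        i = int(pos)
        frac = pos - i
        out.append(src[i] * (1 - frac) + src[i + 1] * frac)
    return out

def amplitude_at(signal, freq, rate):
    """Amplitude of one frequency component (a single-bin DFT)."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq * n / rate)
              for n, x in enumerate(signal))
    return 2 * abs(acc) / len(signal)

src = sine(TONE, FS_IN, N_IN)
point_image = amplitude_at(upsample_point(src), IMAGE, FS_OUT)
linear_image = amplitude_at(upsample_linear(src), IMAGE, FS_OUT)
# Reference: the same tone generated directly at 48000 Hz has no image.
clean_image = amplitude_at(sine(TONE, FS_OUT, N_OUT), IMAGE, FS_OUT)

print(f"image amplitude, point sampling:  {point_image:.3f}")
print(f"image amplitude, linear interp.:  {linear_image:.3f}")
print(f"image amplitude, native 48 kHz:   {clean_image:.3g}")
```

Both cheap methods leave a component at 13050 Hz that a properly filtered resampler would suppress, and point sampling leaves more of it than linear interpolation. That extra high-frequency content is one plausible reason the result can sound harsh rather than dull.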

Tamas G reshared this.

in reply to Pitermach

JamminJerry

in reply to Pitermach 4 months ago

@pitermach The only one that sounded any different to me was the last one, when you did the 16 kHz. All of the others sounded the same to me.

@Pitermach

in reply to JamminJerry

Pitermach

in reply to JamminJerry 4 months ago

@JamminJerry There isn't much of a difference between the old default 64-point sinc interpolation and r8brain, but there's a more pronounced difference with point sampling and linear interpolation, where there's a bit more high end. Something else I just remembered: back in the day Klango also used to do this, so any voice you used with it would get resampled up like this with noticeable aliasing.

@JamminJerry

in reply to Pitermach

Tamas G

in reply to Pitermach 4 months ago

Super informative, wow! I wonder if having the default output in the Windows sound panel set to 48 kHz is what causes the upsampling to 48 kHz instead of staying at 44.1 kHz. How odd. It could be that with the sound mapper, Windows always forces its own sampling rate rather than sticking with the one the program sets for playback. But I'm not 100% sure. (Nope, it's not this. I did a test run by force-changing a Bluetooth A2DP driver to 44.1 kHz.) It's definitely odd that when you choose an audio device directly, the game samples correctly, though, and maybe Klango did the same thing.

This entry was edited (4 months ago)

in reply to Tamas G

James Scholes

in reply to Tamas G 4 months ago

Unless you're using ASIO and/or exclusive mode, some degree of resampling is unavoidable. Windows opens the device with a particular configuration of sample rate, number of channels, etc. and audio from applications is adapted as needed to match so you get a mix. If three applications are simultaneously sending audio output to the same device, the device is only opened once and receives the sum total of those sources.
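As a rough illustration of the shared-mode behaviour described above, the sketch below converts several streams to one device rate and sums them into a single mix. It is a toy model only: the linear-interpolation resampler and the names `resample_linear` and `mix_shared` are hypothetical stand-ins, and the real Windows audio engine does its own format conversion and mixing internally.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Convert a mono stream to dst_rate with linear interpolation."""
    count = int(len(samples) * dst_rate / src_rate)
    out = []
    for n in range(count):
        pos = n * src_rate / dst_rate
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]  # hold the last sample at the end
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out

def mix_shared(streams, device_rate):
    """Adapt every (samples, rate) stream to the device rate, then sum them."""
    converted = [resample_linear(s, rate, device_rate) for s, rate in streams]
    mixed = [0.0] * max(len(c) for c in converted)
    for stream in converted:
        for n, x in enumerate(stream):
            mixed[n] += x
    return mixed

# Three applications at three different rates, one device opened at 48000 Hz.
streams = [
    ([0.5] * 100, 22050),   # e.g. a speech synthesizer
    ([0.25] * 100, 44100),  # e.g. a music player
    ([0.1] * 100, 48000),   # already at the device rate
]
mixed = mix_shared(streams, 48000)
print(len(mixed), mixed[0])
```

The point of the model is that only the stream already at the device rate passes through untouched. Everything else is resampled before it reaches the hardware, so the quality of that resampling step matters.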

in reply to Tamas G

Jamie Teh

in reply to Tamas G 4 months ago

It's also odd that there's a difference because with WASAPI, which even legacy WinMM now uses behind the scenes, there isn't really a separate sound mapper device. You ask the system for the default endpoint, it tells you which that is (the real endpoint, not some sound mapper thing) and then you open that endpoint directly. Thus, there should be no difference in how the device is actually opened. I guess it's possible the app makes different decisions about resampling based on whether the user wants to use the default device or not, but why would it do that?

in reply to Jamie Teh

Jamie Teh

in reply to Jamie Teh 4 months ago

I'll also flag there are different resampling flags you can specify when opening the device with WASAPI. By default, Windows uses a really crappy resampling algorithm, but you can request a better one. I learned this the hard way when I was first implementing WASAPI in NVDA. But again, why would the app fail to specify the correct flag based on which device the user chose?
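The post above does not name the flags, so the following is an assumption about which ones are meant: the two stream flags from the Windows SDK header `audiosessiontypes.h` that relate to format conversion in shared mode. The sketch only shows how the bit mask would be composed before being passed as the stream flags argument of `IAudioClient::Initialize`. It does not open a device, and the helper name `shared_mode_stream_flags` is hypothetical.

```python
# Constants as defined in the Windows SDK (audiosessiontypes.h).
AUDCLNT_STREAMFLAGS_AUTOCONVERTPCM = 0x80000000       # let the system convert the format
AUDCLNT_STREAMFLAGS_SRC_DEFAULT_QUALITY = 0x08000000  # request the better-quality converter

def shared_mode_stream_flags(better_resampler=True):
    """Build the flag mask for a shared-mode stream that needs rate conversion."""
    flags = AUDCLNT_STREAMFLAGS_AUTOCONVERTPCM
    if better_resampler:
        flags |= AUDCLNT_STREAMFLAGS_SRC_DEFAULT_QUALITY
    return flags

print(hex(shared_mode_stream_flags()))
print(hex(shared_mode_stream_flags(better_resampler=False)))
```

If an application only sets the first flag, it gets the system's conversion without asking for the better converter, which would match the low-quality default described in the post.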
