Aaron

4 hours ago (Received 3 hours ago)

Aaron
4 hours ago (Received 3 hours ago)

What are your pain points, folks? Stuff that you hate doing or dealing with, or problems you can't find a good solution to? Stuff that other people might be frustrated with, too.

I'm looking for a way to make myself valuable to other people, as a way to both help people and also earn an income to feed my family in the process.

One thing I can do *really well* is create reliable software to automate rote tasks, generate financial/statistical/other reports, or calculate difficult solutions. Think it can't be done without LLMs? I might surprise you!

Throw me a bone!

Please boost for reach!

#PainPoints
#WishList
#Automation
#Reporting
#ProblemSolving
#FediHire
#GetFediBHired
#FediJob

in reply to Aaron

🇨🇦Samuel Proulx🇨🇦

in reply to Aaron 3 hours ago

Sadly, there is no money in solving any of my problems. If there was, someone would have solved them. See, for example, my complaints about text to speech systems. stuff.interfree.ca/2026/01/05/ai-tts-for-screenreaders.html
I can go into more detail about why all the options are bad if you want. But this is the sort of problem that eats years of your life, requires advanced mathematics (digital signal processing at a minimum), and advanced linguistics, on top of being a good systems-level programmer.

Sam's Stuff - The State of Modern AI Text To Speech Systems for Screen Reader Users

^{stuff.interfree.ca}

in reply to 🇨🇦Samuel Proulx🇨🇦

Aaron

in reply to 🇨🇦Samuel Proulx🇨🇦 3 hours ago

@fastfinge I just so happen to be an (unemployed) machine learning researcher by trade, with advanced mathematics, linguistics, and programming skills. Maybe not systems-level programming, but I could probably find someone who does that and work with them.

Given that the first two responses I've gotten were both about accessibility, there might be more of a market for this than you think, and also, it might make a good way to demo my skills even if it isn't paid work.

@🇨🇦Samuel Proulx🇨🇦

in reply to Aaron

🇨🇦Samuel Proulx🇨🇦

in reply to Aaron 3 hours ago

The reason I say systems-level programming is mostly because for a text to speech system used by a blind power user, you need to keep an eye on performance. If the system crashes and the computer stops talking, the only choice the user has is to hard reset. It would be running and speaking the entire time the computer is in use, so memory leaks and other inefficiencies are going to add up extremely quickly.

From what I can tell, the ideal is some sort of formant-based vocal tract model. Espeak sort of does this, but only for the voiced sounds. Plosives are generated from modeling recorded speech, so sound weird and overly harsh to most users, and I suspect this is where most of the complaints about espeak come from. A neural network or other sort of machine learning model could be useful to discover the best parameters and run the model, but not for generating audio itself, I don't think. This is because most modern LLM-based neural network models can't allow changing of pitch, speed, etc, as all of that comes from the training data.

Secondly, the phonemizer needs to be reproducible. What if, say, it mispronounces "Hermione". With most modern text to speech systems, this is hard to fix; the output is not always the same for any given input. So a correction like "her my oh nee" might work in some circumstances, but not others, because how the model decides to pronounce words and where it puts the emphasis are just a black box. The state of the art, here, remains Eloquence. But it uses no machine learning at all, just hundreds of thousands of hand-coded rules and formants. But, of course, it's closed source (and as far as anyone can tell the source has actually been lost since the early 2000's), so goodness knows what all those rules are.

This entry was edited (3 hours ago)

in reply to 🇨🇦Samuel Proulx🇨🇦

Aaron

in reply to 🇨🇦Samuel Proulx🇨🇦 2 hours ago

@fastfinge Reading your linked article article and this reply, I get the sneaking suspicion that HDC (hyperdimensional computing) or other one- or few-shot learning methods that are designed to factor the model into independent components that can be quickly recomposed in new ways might be appropriate. The idea would be to, as you suggest, learn the values for these components using machine learning, but also the mapping between them and the sounds produced, so that each becomes separately tunable on the fly.

HDC has the added advantage that it is great for working with "fuzzy", human-interpretable rule representations, is typically extremely efficient compared to neural nets, and even meshes well with neural nets and gradient descent-based optimization.

Do you happen to have data of any sort that could be used for training?

@🇨🇦Samuel Proulx🇨🇦

in reply to Aaron

🇨🇦Samuel Proulx🇨🇦

in reply to Aaron 2 hours ago

In general, for training the rules for pronouncing English, the CMU pronouncing dictionary is used: www.speech.cs.cmu.edu/cgi-bin/cmudict
When it comes to open-source speech data, LJSpeech is the best we have, though far from perfect: keithito.com/LJ-Speech-Dataset/
And here's a link to GnuSpeech, the only open-source fully articulatory text to speech system I'm aware of: github.com/mym-br/gnuspeech_sa?tab=readme-ov-file
I'm afraid I don't have any particular data of my own.

GitHub - mym-br/gnuspeech_sa: Articulatory speech synthesizer

Articulatory speech synthesizer. Contribute to mym-br/gnuspeech_sa development by creating an account on GitHub.

^GitHub

in reply to 🇨🇦Samuel Proulx🇨🇦

Aaron

in reply to 🇨🇦Samuel Proulx🇨🇦 2 hours ago

@fastfinge thanks! I'll have a look at these.

Were you wanting to collaborate on this?

@🇨🇦Samuel Proulx🇨🇦

in reply to Aaron

🇨🇦Samuel Proulx🇨🇦

in reply to Aaron 2 hours ago

Sadly, this is so far outside of my expertise and abilities it's not even funny. I have an excellent handle on what's needed, and the vague shape of the path forward, but actually doing any of it is way outside of my skillset. If it was anywhere near something I could do, I would have started already.

in reply to 🇨🇦Samuel Proulx🇨🇦

Aaron

in reply to 🇨🇦Samuel Proulx🇨🇦 2 hours ago

@fastfinge how about for guidance, design, requirements, alpha testing and evaluation?

@🇨🇦Samuel Proulx🇨🇦

in reply to Aaron

Aaron

in reply to Aaron 2 hours ago

@fastfinge my thinking is that, sure, I can build a thing, just like all those other folks, but you know the actual needs it would meet firsthand. That's tremendously valuable and can make the difference between something awesome and something completely useless

@🇨🇦Samuel Proulx🇨🇦

in reply to Aaron

🇨🇦Samuel Proulx🇨🇦

in reply to Aaron 2 hours ago

Absolutely yes to all of the above. I can think of another 10 people on Mastodon at minimum who are also ready and willing to help where ever they can. Just none of us with the skillset to do the actual work.

in reply to Aaron

🇨🇦Samuel Proulx🇨🇦

in reply to Aaron 1 hour ago

When it comes to requirements, in general, if it can work with both the SAPI5 and NVDA addons API, it will suit the requirements of speech dispatcher on Linux and the mac API's. The important thing is that most screen readers want to register indexes and callbacks. So, for example, if I press a key to stop the screen reader speaking, it needs to know exactly where the text to speech system stopped so that it can put the cursor in the right place. It also wants to know what the tts system is reading so it can decide when to advance the cursor, get new text from the application to send for speaking, etc. I really really really wish I had a better example of how that works in NVDA than this: github.com/fastfinge/eloquence_64/blob/master/eloquence.py

eloquence_64/eloquence.py at master · fastfinge/eloquence_64

Eloquence synthesizer NVDA add-on compatible with 64-bit versions of NVDA - fastfinge/eloquence_64

^GitHub

in reply to 🇨🇦Samuel Proulx🇨🇦

Aaron

in reply to 🇨🇦Samuel Proulx🇨🇦 1 hour ago

@fastfinge I think I get the gist but the code will help a lot!

@🇨🇦Samuel Proulx🇨🇦

in reply to Aaron

🇨🇦Samuel Proulx🇨🇦

in reply to Aaron 1 hour ago

I wish it would. Unfortunately, that code is what we use to keep Eloquence alive in the 64-bit NVDA version. So it's awful, for dozens of reasons. This...is a bit clearer? Maybe? Anyway, it's the canonical example of how NVDA officially wants to interact with a text to speech system, written by the NVDA developers themselves. Any text to speech system useful for screen reader users needs to expose everything required for someone to write code like this. Not saying you could or should; there are dozens of blind folks who can do the job of integrating any text to speech system with all of the various API's on all the screen readers and platforms. But we have to have useful hooks to do it. github.com/nvaccess/nvda/blob/master/source/synthDrivers/espeak.py

nvda/source/synthDrivers/espeak.py at master · nvaccess/nvda

NVDA, the free and open source Screen Reader for Microsoft Windows - nvaccess/nvda

^GitHub

in reply to 🇨🇦Samuel Proulx🇨🇦

Joe (TBA)

in reply to 🇨🇦Samuel Proulx🇨🇦 2 hours ago

@fastfinge I'm wondering if the best place to put text-to-speech processing is on the hardware of the video card. Therefore, the audio is being generated by a separate system. As long as we are blue-skying it here. therefore the system would not care what system was running, rather it could read whatever was sent to the video to display. As I'm thinking this through as I write this, obviously it would need to read descrete sections or it might be garbled gibberish, but...a thought.

@🇨🇦Samuel Proulx🇨🇦

in reply to Joe (TBA)

🇨🇦Samuel Proulx🇨🇦

in reply to Joe (TBA) 2 hours ago

@RegGuy@hosford42 The issue is that then you lose all semantic meaning. Older screen readers did, in fact, work like this, especially back in the DOS days. But also, you're conflating two different systems, here: the text to speech system, and the overall screen reading system. All the text to speech system does is take text, and turn it into speech. The screen reader is responsible for everything else. And in general, we have perfectly good screen readers. Screen readers can also, by the way, drive Braille displays and other output devices.

@Aaron @Joe (TBA)

in reply to Aaron

🇨🇦Samuel Proulx🇨🇦

in reply to Aaron 3 hours ago

The sourcecode for dectalk is out there. Unfortunately, It's...legally dubious at best. It was leaked by an employee back in the day, and now the copyright status of the code is so unclear that nobody can safely use it for anything, but also nobody can demonstrate clear enough ownership to submit a DMCA and get it taken off github. GNUspeech is also pretty close to what's needed, but it won't even compile without all the NeXT development tools, I don't think. So at best all that would be is a base for something else; modernizing it would probably effectively be a complete rewrite anyway.

in reply to 🇨🇦Samuel Proulx🇨🇦

Aaron

in reply to 🇨🇦Samuel Proulx🇨🇦 2 hours ago

@fastfinge Looking for the source, I found this:

github.com/dectalk

It looks like both DECtalk and DECtalkMini are being actively maintained, with commits as recent as 1 to 2 months ago. I was hoping the copyright for the "mini" version would be unencumbered, but no such luck. It would have to be a re-implementation from scratch using this code as a guide. That's a lot easier than implementing a new system out of nothing, though.

DECTalk

Command error in phoneme. DECTalk has 8 repositories available. Follow their code on GitHub.

^GitHub

@🇨🇦Samuel Proulx🇨🇦

in reply to Aaron

🇨🇦Samuel Proulx🇨🇦

in reply to Aaron 2 hours ago

I also have no idea about any associated IP or patents, though. Wouldn't whoever does it need to be able to prove they never saw the original code, just its outputs? Otherwise you're still infringing, aren't you? In this regard, it's probably actually a bad thing that the dectalk sourcecode is so widely available.

And most of the commits seem to be about just getting it to compile on modern systems with modern toolchains. I dread to think how unsafe closed-source C code written in 1998 is.

in reply to Aaron

🇨🇦Samuel Proulx🇨🇦

in reply to Aaron 2 hours ago

If you're going to reimplement something, you might be better to go with gnuspeech, as it's known to be in the GPL. At the least, it gives you a vocal model to improve on, that was coded with open research in mind, rather than proprietary code probably written for job security.

in reply to Aaron

🇨🇦Samuel Proulx🇨🇦

in reply to Aaron 2 hours ago

Also, if you enjoy comparing modern AI efforts with older rule-based text to speech systems, and listening to the AI fail hard, this text is wonderful for that. As far as I'm aware not a single text to speech system, right up to the modern day, can read this one hundred percent correctly. github.com/mym-br/gnuspeech_sa/blob/master/the_chaos.txt
But eloquence gets the closest, gnuspeech second, espeak third, dectalk fourth, and every AI system I've tried a distant last.

gnuspeech_sa/the_chaos.txt at master · mym-br/gnuspeech_sa

Articulatory speech synthesizer. Contribute to mym-br/gnuspeech_sa development by creating an account on GitHub.

^GitHub

in reply to 🇨🇦Samuel Proulx🇨🇦

Aaron

in reply to 🇨🇦Samuel Proulx🇨🇦 2 hours ago (Received 1 hour ago)

@fastfinge wow, I could barely read that as a human. LOL

@🇨🇦Samuel Proulx🇨🇦

in reply to Aaron

David Nash

in reply to Aaron 1 hour ago

@fastfinge (context: both Aaron and I are USAians)

It doesn't help that:

1. it's 150 or so years old, so a few pronunciations have changed a bit
2. the pronunciations and spellings (and hence some of the apparent mismatches) are UK English, not US English.

At a minimum, you'll have to envision skipping "r"s after vowels at the ends of words for many of these to make sense. As for the rest, I recognized a few of those from past experience with older UK English (e.g. "clerk" with an "a" sound), but a couple left me scratching my head saying "that's how people actually said or spelled it then and there?"

@🇨🇦Samuel Proulx🇨🇦

in reply to David Nash

🇨🇦Samuel Proulx🇨🇦

in reply to David Nash 1 hour ago

@dpnash@hosford42 Right, but most text to speech systems have a UK English setting. And the mistakes they're making are on things much more basic than that. For example, far too many so-called state of the art AI TTS systems can't even pronounce "Susy", "plaid", "fuchsia", and "lieutenants".

@Aaron @David Nash

in reply to 🇨🇦Samuel Proulx🇨🇦

🇨🇦Samuel Proulx🇨🇦

in reply to 🇨🇦Samuel Proulx🇨🇦 1 hour ago

For example, here's Eleven Labs, the billion dollar voice AI company that's supposed to replace all voice actors forever. I used the voice builder to specifically request received pronunciation. That was not at all what I got. Aside from that, notice the incorrect "tear", pronouncing "plaid" "played", having no idea that "victual" is pronounced "viddle", and a number of other mistakes. I reran it just now, to be as fair as possible. It has not improved.

in reply to 🇨🇦Samuel Proulx🇨🇦

🇨🇦Samuel Proulx🇨🇦

in reply to 🇨🇦Samuel Proulx🇨🇦 1 hour ago

Compare that with the version of GNU Speech released in 1995. It still messes up "tear" and "live". But once you get past the unnatural voice, it's far more precise. And once you get used to it, much much easier to listen to at an extremely high rate of speed (4x or more) all day. All text to speech advancement from "AI" is just the wow factor of "Wow, it sounds so human!" But pronunciation...you know, the important part of actually reading text...is either the same or worse. With five thousand times the resources.

in reply to Aaron

Nicks World

in reply to Aaron 4 hours ago (Received 3 hours ago)

Frankly, having accessible business related software would be very nice, even accessible accounting software that is self-contained and not a website.

in reply to Nicks World

Aaron

in reply to Nicks World 4 hours ago (Received 3 hours ago)

@NicksWorld

Have your tried LibreOffice? I have read that it is accessible, but I trust real users better.

What specific features do you wish for most?

I have a feeling it's probably a big ask for a single developer, but I could at least take a look at the source for LibreOffice (unlike MS products) and see if I can add the features without retooling the whole codebase.

@Nicks World

in reply to Aaron

Nicks World

in reply to Aaron 4 hours ago (Received 3 hours ago)

I just want the ability to read an manipulate stuff in financial statements where its formatted right and I can just get to the catagories I need to get to so I can put in the right numbers into the boxes where they belong so for example, balancing a balance sheet or understanding and interpreting a cashfflow statement.

This entry was edited (4 hours ago)

in reply to Nicks World

Aaron

in reply to Nicks World 4 hours ago (Received 3 hours ago)

@NicksWorld I would probably need to sit with you to understand the dynamics of the flow and where it gives you trouble. There are a few things to unpack here, on first reading:

* Not sure what misformatting you're finding.
* By getting to the categories, do you mean navigating columns by their headers?
* Do you have specific spreadsheets you are working with regularly? If so, I might be able to come up with a different way to collect and/or present the information that is more naturally suited to blind users, like a Q&A format with predetermined flow.

Spreadsheets are designed specifically with sighted users in mind, so there's an element of inaccessibility baked into them. By organizing the information into a more linear, language-based flow instead of a spreadsheet, that could potentially make the process much more natural for a screen reader, and the data could then be automatically formatted as, or loaded into, a spreadsheet. I'd be interested to get your thoughts on this.

@Nicks World

in reply to Aaron

Nicks World

in reply to Aaron 4 hours ago (Received 3 hours ago)

I would be willing to talk to you about htis if you're willing to talk to me through voice, if comfortable, because I'm better at speaking than writing, especially with what I'm looking for. As for libra office, I don't reallly have experience with it, because I kind of thought libra office was libravox lol.

in reply to Nicks World

Aaron

in reply to Nicks World 3 hours ago

@NicksWorld

Sure, I bet it's a bit of a pain for you with text-based discussions! I'm awkward on phones but willing to give it a shot, if you think it's worthwhile and can put up with my spoken awkwardness and fumbling with words. (I communicate so much better when I can write! lol)

Can I suggest, though, that first it might make sense to get familiar with LibreOffice and see if it does a better job with the interface than Excel or other such software? It would be a shame to waste our time and effort on a problem that's already solved. It might also turn out that you have different pain points with the open source software that I can actually modify.

@Nicks World

in reply to Aaron

Nicks World

in reply to Aaron 3 hours ago

Are you willing to send me a link to the program?

in reply to Nicks World

Aaron

in reply to Nicks World 3 hours ago

@NicksWorld

Here's the main page:

libreoffice.org/

And here's the download page:

libreoffice.org/download/downl…

You will need to select your OS for the download.

Home | LibreOffice - Free and private office suite - Based on OpenOffice - Compatible with Microsoft

Free office suite – the evolution of OpenOffice. Compatible with Microsoft .doc, .docx, .xls, .xlsx, .ppt, .pptx. Updated regularly, community powered.

^{www.libreoffice.org}

@Nicks World

in reply to Aaron

Nicks World

in reply to Aaron 3 hours ago

You're a friendly person. Thank you.

in reply to Nicks World

Aaron

in reply to Nicks World 3 hours ago

@NicksWorld This made me smile! You're quite welcome. I hope it already does what you need, but if not, ping me and we can have that chat.

@Nicks World

in reply to Aaron

Zach Bennoui

in reply to Aaron 3 hours ago

@NicksWorld So I just downloaded this out of curiosity, and it seems like they definitely did some good work on accessibility. Unfortunately on macOS when used with VoiceOver, it doesn't really behave like a standard Mac app in terms of the UI and VoiceOver doesn't work how you would expect. This completely makes sense as it's open source and likely not developed using something like Swift UI, but for Mac users I would honestly stick to iWork.

@Nicks World

in reply to Zach Bennoui

Aaron

in reply to Zach Bennoui 2 hours ago

@ZBennoui @NicksWorld I'm assuming iWork is closed source, right?

@Zach Bennoui @Nicks World

⇧

Aaron 4 hours ago (Received 3 hours ago) • •

Aaron
4 hours ago (Received 3 hours ago)