Trying to use Elevenlabs to read out a solar forecast for a new feature being added to @BlindHams, and... yeah, it does odd things when encountering numbers if you don't specifically lock it to English, which, BTW, I don't yet know how to do through the API.

The easy fix is to just spell all numbers out as if they are spoken.
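Roughly what I have in mind, as a quick sketch (this uses the num2words Python package and a regex pass; the sample forecast line is just something I made up, not the real feed):

```python
# Minimal sketch: spell out every number before handing the text to the TTS API.
# Assumes the num2words package is installed (pip install num2words).
import re
from num2words import num2words

def spell_out_numbers(text: str) -> str:
    """Replace each integer or decimal in the text with its spoken English form."""
    def replace(match: re.Match) -> str:
        token = match.group()
        value = float(token) if "." in token else int(token)
        return num2words(value, lang="en")
    return re.sub(r"\d+(?:\.\d+)?", replace, text)

# Example (made-up forecast line):
print(spell_out_numbers("Solar flux 142, A index 8, K index 3"))
# -> "Solar flux one hundred and forty-two, A index eight, K index three"
```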

Some generations are fine, others are completely broken, and you never know which you'll get.

Here is a pretty terrible example.

in reply to Steve

This is the thing: if they were even a little transparent about what exactly they're doing under the hood, I could come up with at least an educated guess as to why it struggles so much with numbers and other symbols, but I have no idea. I'm pretty sure the model they're using is some kind of LLM architecture, similar to many recently released open-source TTS models, and those are notorious for struggling with numbers and other symbols unless text normalization is done first. I'll be honest, Eleven is actually behind the competition from the likes of Amazon and Microsoft, and many of the open-source options not only sound just as good, but are much more customizable as well.
in reply to Borris

@sclower Well, you'll certainly have a lot of options to pick from. Something that could be fun and potentially useful is to try fine-tuning a model on the type of content you're going to be generating; performance will likely improve drastically. These LLM-based models are much easier to fine-tune than traditional ML TTS, and I was able to train an Orpheus TTS model on my own voice in about ten minutes.