TIL: There's a W3C candidate recommendation draft for CSS markup that conveys different properties of text and controls on the web via audio cues and changes to TTS volume, speech rate, tone, prosody and pronunciation, kind of like attributed strings in iOS apps. It's called CSS Speech. w3.org/TR/css-speech-1/ #Accessibility #A11y #Blind
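For reference, a minimal sketch of what the draft's properties look like in a stylesheet; the property names and values are from the spec, while the selectors and the .wav file are hypothetical examples:

    /* Hypothetical selectors; the properties are from the CSS Speech draft. */
    h2 {
      voice-family: female;       /* request a generic female voice */
      voice-rate: slow;           /* slow down the speech rate */
      voice-pitch: low;
      pause-after: strong;        /* prosodic pause after each heading */
    }
    .alert {
      cue-before: url(alert.wav); /* auditory cue played before the text */
      voice-volume: loud;
    }
    .reference-number {
      speak-as: digits;           /* read "120" as "one two zero" */
    }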
Mikołaj Hołysz
in reply to Paweł Masarczyk
Isn't this what Aural CSS was back in the day?
AFAIK that was never implemented in any mainstream browser; Emacspeak was the only implementor I know of.
As far as I remember, it had a bunch of extra properties for things like how speech should be positioned in 3D, which was a very Emacspeak thing to do.
Paweł Masarczyk
in reply to Mikołaj Hołysz
James Scholes
in reply to Paweł Masarczyk
There are people who seem to feel really strongly about this being a good thing for screen reader users, and I must admit to being bewildered about why. Websites changing aspects of screen reader output may be equitable, if we compare it with the way webpages can alter visual presentation through fonts and other styling. But to me it feels entirely inappropriate to cross that boundary between the browser as the user agent and accessibility software in order to interfere with very personal settings.
Meanwhile on iOS, the related accessibility attributes are being used to achieve outcomes nobody wants or needs, like spaces between all the digits of a credit card number. @miki @prism
James Scholes
in reply to James Scholes
I can see the point for e.g. text-to-speech APIs built into the browser, maybe even read-aloud features. But the case for screen reader compatibility seems to be built on the foundational assertion that SR output is monotonous and can't be "livened up" by brands.
As assertions go, I think that is both true and exactly how it should be. I don't use a screen reader for entertainment. I can think of few things more obnoxious than a marketing person thinking that my screen reader should "shout this bit."
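To make that concern concrete, here is a hypothetical sketch of what author-controlled "shout this bit" styling could look like; the properties are real CSS Speech, but the .promo class and the scenario are invented for illustration:

    /* Hypothetical abuse case: a banner overriding personal TTS settings. */
    .promo {
      voice-volume: x-loud; /* louder than the listener's chosen volume */
      voice-stress: strong; /* exaggerated emphasis */
      voice-rate: fast;
    }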
Many web authors can't even label stuff correctly. Why on earth would we expect them to treat this sort of feature with informed respect? @miki @prism
Drew Mochak
in reply to James Scholes
James Scholes
in reply to Drew Mochak
@prism I think without ARIA or an equivalent (like more things built into the web platform), the web would've continued galloping forward with all the same UI widgets and design patterns but with no way to make them even halfway accessible, and we'd be left even more behind than we are now.
By contrast, I don't think the inability for a website to change the pitch of NVDA is a legitimate blocker to anything worthwhile. @Piciok @miki
Drew Mochak
in reply to James Scholes
@Piciok @miki
Rebecca
in reply to Drew Mochak
Mikołaj Hołysz
in reply to Rebecca
Mikołaj Hołysz
in reply to Mikołaj Hołysz
James Scholes
in reply to Mikołaj Hołysz
@miki I think it's a trap to suggest that such problems should currently be solved only through speech properties and auditory cues within individual apps. Expressive semantics on the web have only been explored at a surface level so far, and it's a complete stretch to go from "We don't have the ARIA properties to convey complex information," to "Let's have every application implement its own beeps and boops."
Imagine having to learn the sound scheme for Gmail, then Outlook, then Thunderbird. Then going over to Slack, which also has unread state, albeit for chat messages rather than emails, but uses an entirely different approach again.
All the while, braille users are getting nothing, and people who struggle to process sounds alongside speech are becoming more and more frustrated. Even if we assume that this is being worked on in conjunction with improvements to ARIA and the like, how many teams have the bandwidth and willingness to implement more than one affordance?
We've already seen this in practice: ARIA has braille properties, but how many web apps use them? Practically none, because getting speech half right and giving braille users an even more subpar experience is easier. Your own example highlights how few apps currently let you control things like verbosity and ordering of information.
CSS Speech could turn out even worse. A product team might opt to implement it instead of semantics because the two blind people they spoke to said it would work for them, and never mind the other few million for whom it doesn't. They'll be the people complaining that there's no alternative to the accessibility feature a team spent a month on and thought was the bee's knees.
@silverleaf57 @prism @Piciok
Mikołaj Hołysz
in reply to James Scholes
James Scholes
in reply to Mikołaj Hołysz
Mikołaj Hołysz
in reply to James Scholes
@jscholes @silverleaf57 @prism Efficiency, not equity.
Words are a precious resource, far more precious than even screen real estate. After all, you can only get a fairly limited number of them through a speaker in a second. We should conserve this resource as much as we can. That means using as many other "side channels" as we can get: sounds, pitch changes, audio effects, stereo panning (when available) and much more.
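Several of these side channels have direct counterparts in the CSS Speech draft; a sketch, with hypothetical class names:

    /* Hypothetical mapping of side channels onto CSS Speech properties. */
    .message.unread {
      voice-pitch: high;         /* pitch change instead of the word "unread" */
    }
    .message.flagged {
      cue-before: url(flag.wav); /* a short earcon instead of extra words */
    }
    .secondary-pane {
      voice-balance: left;       /* stereo panning, when available */
    }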
Icon fatigue is real. "me English bad, me no know what delete is to mean" is also real, and icons, pictograms and other kinds of pictures are how you solve that problem in sighted land.
Obviously removing all labels and replacing them with pictograms is a bad idea. Removing all icons and replacing them with text... is how you get glorified DOS UIs with mouse support, and nobody uses those.
Mikołaj Hołysz
in reply to Mikołaj Hołysz
James Scholes
in reply to James Scholes
@miki Note that I'm specifically responding to your proposed use case here. You want beeps and boops, and I think you should have them. But:
1. I think you should have them in a centralised place that you control, made possible via relevant semantics.
2. I don't think the fact that some people like beeps and boops is a good reason to prioritise incorporating beeps and boops into the web stack in a way that can't be represented via any other modality.
@silverleaf57 @prism @Piciok
Mikołaj Hołysz
in reply to James Scholes
James Scholes
in reply to Mikołaj Hołysz
James Scholes
in reply to James Scholes
@miki As noted, I think people who can process a more efficient stream of information should have it available to them. That could be through a combination of normalised/centralised semantics, support for specialised custom cases, and multi-modal output.
My main concern remains CSS Speech being positioned as the only solution to information processing bottlenecks, which I think is a particularly narrow view and will make things less accessible for many users rather than more.
Good discussion, thanks for chatting through it. @silverleaf57 @prism @Piciok
Drew Mochak
in reply to James Scholes
@jscholes At the same time, I think the chances that CSS Speech completely takes over the industry and we all stop doing text role assignments are quite low.
explainxkcd.com/wiki/index.php…
So I am decidedly meh about this. It could help but probably won't.
@miki @silverleaf57 @Piciok
James Scholes
in reply to Drew Mochak
Paweł Masarczyk
in reply to James Scholes
Drew Mochak
in reply to Paweł Masarczyk
This already exists though, as a screen reader feature. Kind of. NVDA has an add-on called Unspoken that replaces the announcement of roles with various sounds; there's a different sound for checked vs. unchecked boxes, for instance. JAWS did (does?) something similar with the shareable schemes in its Speech and Sounds Manager. Granted, not a lot of people do this, but the ability is there if people want it. VO, TB and cvox also have earcons; they're not used for this purpose, but they could be.
Having this under the user's control rather than the author's control does seem better. It prevents, for instance, a developer deciding to be super obtrusive with ads. I do see the potential for it to be good, since the author of the content would be able to convey more nuanced concepts... it just feels like a thing most people wouldn't use, and most of the people who'd try would end up being obnoxious about it.
@jscholes @miki @silverleaf57
Paweł Masarczyk
in reply to Drew Mochak
Paweł Masarczyk
in reply to James Scholes
Paweł Masarczyk
in reply to James Scholes
Devin Prater :blind:
in reply to Paweł Masarczyk
Adrian Roselli
in reply to Paweł Masarczyk
Why we need CSS Speech - Tink - Léonie Watson