@🇨🇦Samuel Proulx🇨🇦 In general, when it comes to what to include in an image description, the context matters. But so does the target audience (not as in whom you want to receive your content, but who may stumble upon it one way or another), and so does the existing knowledge of the target audience. And so do the expectations of your target audience, which is pretty much Fediverse-specific.

I've observed and studied alt-text and image descriptions for some three years now, not only by reading dozens upon dozens of guides all over the Web, but especially by examining the attitude towards it in the Fediverse, that is, actually only on Mastodon because alt-text isn't such a hot topic anywhere else. I've mostly done so in order to keep upping my own image-describing game, and also because no alt-text guide out there covers my situation. I had to cobble all that information together myself, and it became enough information for me to start my own wiki on this topic to share my knowledge with others.

One thing I've noticed is that Mastodon loves long and extensive image descriptions in alt-text. There's no "keep it short and concise"; instead, there are users who keep receiving praise for alt-texts of 800 or 1,000 characters or more.

Also, my impression is that Mastodon does not like having to ask for details and/or explanations, nor does it like to look up what it doesn't know enough about to understand it. If you have to ask someone who has posted an image for a description of a certain detail in an image, this means that the image description is lacking, regardless of whether or not that detail matters within the context of the post. Having to ask for a description of a detail is almost as bad as having to ask for the description of the whole image.

In fact, it was just a few months ago that I read a Mastodon toot that said that any element in an image mentioned in the description must also have its own visual description. You can't just say what's in the image. You also have to describe what it looks like.

Likewise, if there's something in an image description that someone doesn't understand, it must be explained right away. This, by the way, ties in with the rule that image descriptions must never use technical language or jargon, and if they absolutely cannot avoid it, it must be explained when it's first used. And it must be explained in a way that requires no prior special knowledge.

So far, so good. But the reason why I've gone all the way to observe and study alt-text and image descriptions, and why I'm so obsessed with it, is that I'm in a special situation.

For one, I'm in the Fediverse, which means that certain alt-text rules simply don't apply to me, not only everything that involves captions, but also the brevity-as-a-hard-requirement rule. However, I'm not on Mastodon, so I'm not as bound to Mastodon's limitations as Mastodon users are. In particular, my character limit is over 16 million, so I can do a whole lot more in the post itself.

Besides, my original images are nothing like what almost everyone on Mastodon posts. They aren't real-life photographs, nor are they social media screenshots. Instead, they are renderings from 3-D virtual worlds, even extremely obscure virtual worlds that next to nobody out there has ever even heard of.

At the same time, my image posts might get people curious enough that they want to go explore this new universe that they've just discovered through my post. The only way they can explore it is by looking at my images and taking in all the big and small details. If they're blind, they cannot do that, but accessibility and inclusion demand they have the very same chance to do it as fully sighted people. In order for them to have this chance, I must go and describe all these big and small details to them, regardless of context. Everything else would be ableist, maybe not by some official W3C definition, but at least by Mastodon's definition.

Speaking of context, sometimes my images are the context of the post. There isn't that one element in the image that matters within the context of the post while everything else can be swept under the rug. No, the entire image matters. The entire scenery matters. Everything in the image matters all the same. This means that I have to describe everything. Again, see further above: I can't get away with just mentioning what's there. If I mention it, I have to describe what it looks like.

This is also justified because I can never expect everyone to already know what something in my images looks like. Again, my images don't show real life. They show virtual worlds. In virtual worlds, things do not necessarily look like what they look like in real life. And things tend to look different in different virtual worlds, sometimes even within the same virtual world system.

For example, you, as someone born completely blind, may have come across enough image descriptions to have a rough idea of what cats look like in real life. But that does not automatically give you a realistic idea of what a particular cat looks like in a specific virtual world, especially seeing as there are infinitely more possibilities for what cats may look like. It could be a detailed, life-like representation of a cat with high-resolution materials as textures. It could be a very simplified, low-resolution model with a likewise low-resolution texture. It could be cobbled together from standard shapes because that was all that was possible when that cat was made. Or whatever. You wouldn't know unless I told you. But who am I to judge whether or not you want to know?

It gets even worse with buildings. You probably wouldn't even know what a specific building looks like in real life unless you have a detailed description, so how are you supposed to know what a specific building looks like in a virtual world that you've first read about a few minutes ago? In addition, there are so many ways of creating buildings in virtual worlds, and they've changed over time with new tools and new features becoming available.

I've come to a point at which I usually avoid having buildings in my images because they're too tedious to describe, especially realistic buildings, but not only these. My second-to-last original image post was in spring 2024, about one and a half years ago. I decided to show a rather fantasy-like building. This building, however, is so complex that it took me two full days, morning to evening, to write the long image description that I put into the post. This image description is over 60,000 characters long, over 40,000 of which describe the building. The description also covers the interior because the outer walls of the building are almost entirely glass. The long description has two levels of headlines of its own. I needed well over 4,000 characters only to explain to people where the place shown in the image is.

And then there was the short description for the alt-text, which I needed as well so that nobody could accuse me of not adding a sufficiently detailed alt-text to my image. I was genuinely unable to make it any shorter than 1,400 characters. A lot of these characters actually went into pointing Mastodon users in particular at the long description in the post itself. That was when Mastodon only hid the post text behind a CW, but not the images, so nobody on Mastodon would have known that there's a long description unless I told them in the alt-text.

One reason why the long description grew so long was that I didn't describe the image by looking at the image. I described it by looking at the real deal. All the time while I was working on the long description, I was in-world. I had my avatar in front of the building, walking through the building, walking around the building. I could move the camera very close to a lot of details. Instead of seeing the scenery at the resolution of the image, I saw it at a practically infinite resolution. This also enabled me to transcribe text that's so small in the image that it's unreadable, even text that's so tiny in the image that it's invisible. After all, the rule says that any and all text within the borders of an image must be transcribed. And I've yet to see that rule come with any explicit exception for unreadable text.

Sure, I could have written that certain details got lost and cannot be identified at the low resolution of the image. But that may be perceived as me trying to weasel out of the responsibility to describe these details. I mean, how many people who were born completely blind have a concept of image resolution and pixels, and how many think that it's possible to zoom into any image infinitely? Besides, I'm not bound to what the image shows at its fairly low resolution anyway, so why should I pretend I am? The only logical reason for that would be that I'm expected to describe the image, and not the scenery in the area within the borders of the image.

And still, I haven't given full visual descriptions of everything in that scene. I decided against fully describing all images within that image at the same level as the image itself. I decided so because it would have gone too far: at least one image, a preview image on a teleporter, technically shows dozens of images itself, preview images on teleporters again. And some of these images show more images yet again. I would have ended up describing several dozen images, at least four levels deep, in order to fully describe one image. And then the whole image description would have been rather pointless because Mastodon rejects posts with over 100,000 characters, and the post would probably have ended up with several million characters.

By the way, even before I wrote that massive image description, I actually showed @Hat. AuDHD cat 😷n95🍉 💔🌻🔻 one of my image posts, the one with my longest description for a single image up to that point. It has two images with over 48,000 characters of long description combined, almost 40,000 of which are for the first image. She actually praised this massive image description and told me that this level of detail in both visual description and explanation is exactly what she needs.

The last time I posted original in-world images was in July 2024. I took care not to have too many details in the images this time. Still, I ended up with a combined total of over 25,000 characters of long description for both images, also because they contain an avatar that had to be described in full detail.

I've been working on the image descriptions for a series of avatar portraits for about a year now, on and off, but still. This time, I gave the images a neutral, completely featureless, bright white background that won't take much effort to describe. The plan is to have three or four images with three or four portraits of the same avatar each, always in the same post with only slightly different outfits. I'm still describing the first image, and I've only fully covered the first outfit and started with the second one.

The common preamble for all images in one post already exceeds 17,000 characters, including over 2,000 characters explaining OpenSim and over 9,000 characters explaining what OpenSim avatars can be made of and how they work because that's essential for understanding the visual descriptions. I expect the preamble to grow significantly longer before it's ready because I have to get rid of a whole lot of technical language and jargon and/or explain even more of it. The preamble also contains over 5,000 characters of general visual description that applies equally to all portraits in all images. It includes almost 2,000 characters that describe the shoes, men's casual leather shoes, because to the best of my knowledge, such shoes don't exist in real life.

Other images will show the avatar wearing full brogue leather shoes. I'm still not sure whether I can correctly assume that everyone out there knows what they are and what they look like, or whether I'll have to give the same amount of detailed description again, only that full brogue shoes are much more complex than the shoes I've already described. Also, I'm not sure if everyone out there knows what a herringbone fabric pattern looks like, or whether that requires a detailed description and an explanation itself, even though several actually blind users have told me that I can assume it to be familiar.

One problem I still haven't solved is that I simply can't fit an appropriately detailed short image description into a maximum of 1,500 characters of alt-text.

Verdict: There are always edge cases in which an image cannot be sufficiently described in only one short and concise image description in the alt-text. My virtual world renderings are such an edge case, also because they're posted into the Fediverse. Another edge case is @Hat. AuDHD cat 😷n95🍉 💔🌻🔻 who, due to a disability, requires hyper-detailed image descriptions that take hours to read to even be able to experience and understand an image properly.

CC: @Carolyn @Prof. Rachel Thorn 🍉🇺🇦🏳️‍⚧️🏳️

#Long #LongPost #CWLong #CWLongPost #FediMeta #FediverseMeta #CWFediMeta #CWFediverseMeta #AltText #AltTextMeta #CWAltTextMeta #ImageDescription #ImageDescriptions #ImageDescriptionMeta #CWImageDescriptionMeta #Ableist #Ableism #AbleismMeta #CWAbleismMeta #VirtualWorlds