I've just seen an absolutely disgusting article. I said "seen", not "read", because I'm blind and I could not read it.
For your reference, here's the first beautiful sentence of this article:
"ffGE ARrj XRejm XAj bZgui cB R EXZgl, Rmi mjji jrjg-DmygjREDmI XgRDmDmI iRXR XZ DlkgZrj."
I don't know the technology behind this BS, but screen readers see it as scrambled text, kind of encrypted or something like that. I guess it's some font juggling (ChatGPT suggested it's glyph scrambling, where random Unicode values are mapped to random letters; I'll trust her on this because I really don't care about the tech behind it), but if you have a tiny little grain of empathy, never ever ever do this, for goodness' sake.
tilschuenemann.de/projects/sac…
#Accessibility #Blindness #Empathy #BadPractices #Web #Text
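A toy sketch of the glyph-scrambling trick described above (this is an illustration of the general idea, not the site's actual code): the page's underlying text is run through a letter substitution, and a custom font maps each wrong letter back to the right-looking glyph. Sighted readers see normal words; screen readers and copy-paste get the raw substituted text.

```python
import random
import string

random.seed(1234)  # arbitrary seed for this demo

# Build a random bijection over ASCII letters
letters = list(string.ascii_letters)
shuffled = letters[:]
random.shuffle(shuffled)
scramble = dict(zip(letters, shuffled))

def scramble_text(text: str) -> str:
    # Non-letters (spaces, punctuation) pass through unchanged
    return "".join(scramble.get(c, c) for c in text)

print(scramble_text("We are still in need of training data"))
# prints a scrambled string, much like the quoted sentence
```

Because the mapping is a bijection, anyone holding it (or reverse-engineering the font) can invert it and recover the text exactly.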


in reply to André Polykanine

I'm viewing the article in Firefox as a sighted person and it's displaying as gibberish to me.

Even if it did work, and it's doing what I'm guessing it's trying to do, it would be thoroughly unpleasant to read and I would leave the site immediately.

But agreed, it is not okay to intentionally sacrifice accessibility.

in reply to André Polykanine

From the article: "This works as long as scrapers don't use OCR for handling edge cases like this, but I don't think it would be feasible."

Lol, the article says it breaks copy and paste. I took a screenshot, opened the image in macOS Preview, and selected the text.

A technical workaround for what this is doing is easy. Firefox and Acrobat both already do OCR; a scraper could do so as well.

in reply to André Polykanine

The title is "sacrificing accessibility for not getting web-scraped".

I do not agree with this approach, and it looks more like a proof of concept.
Is this where the industry could be moving? Absolutely.

The technology behind this (from the HTML source code) is a substitution cipher with a fixed alphabet shift for each letter.

en.wikipedia.org/wiki/Caesar_c…
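For reference, a fixed-shift substitution (a Caesar cipher, as in the linked article) can be sketched like this. Note that the script quoted later in the thread actually shuffles the letter-to-glyph mapping randomly rather than shifting by a fixed amount, but both are simple substitution ciphers:

```python
import string

def caesar(text: str, shift: int = 3) -> str:
    """Shift each ASCII letter by a fixed amount, wrapping around."""
    def shift_char(c: str, alphabet: str) -> str:
        return alphabet[(alphabet.index(c) + shift) % 26]

    out = []
    for c in text:
        if c in string.ascii_lowercase:
            out.append(shift_char(c, string.ascii_lowercase))
        elif c in string.ascii_uppercase:
            out.append(shift_char(c, string.ascii_uppercase))
        else:
            out.append(c)  # spaces and punctuation pass through
    return "".join(out)

print(caesar("Attack at dawn"))  # -> "Dwwdfn dw gdzq"
```

Shifting by `26 - shift` undoes the cipher, which is one reason schemes like this offer no real protection.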

in reply to André Polykanine

hm-m, that sounds familiar… ah. Yeah. I've heard of this trick before.

Russia's Central Election Commission (ЦИК) pulled the same stunt with its website during the election, when that website published intermediate results, under the pretense of "preventing the system from crashing under unprecedented attacks". I had to look up exactly when; it seems to have been 2021, which feels about right:
forbes.ru/tekhnologii/440479-p…

How this was supposed to prevent any attacks, and what *kinds* of attacks even, is beyond me to this day. LLMs did not exist back then. Or at the very least weren't anywhere near as capable.

in reply to André Polykanine

In #Safari on an #iPhone, going to the link I see... something that resembles a blog, but tapping "reader view" I get the following, proving the headline's point. If we can't read it, an #AI #Bot can't scrape it.

<begin text>
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "bs4",
#     "fonttools",
# ]
# ///
import random
import string
from typing import Dict

from bs4 import BeautifulSoup
from fontTools.ttLib import TTFont


def scramble_font(seed: int = 1234) -> Dict[str, str]:
    random.seed(seed)
    font = TTFont("src/fonts/Mulish-Regular.ttf")

    # Pick a Unicode cmap (Windows BMP preferred)
    cmap_table = None
    for table in font["cmap"].tables:
        if table.isUnicode() and table.platformID == 3:
            cmap_table = table
            break
    cmap = cmap_table.cmap

    # Filter codepoints for a-z and A-Z
    codepoints = [cp for cp in cmap.keys() if chr(cp) in string.ascii_letters]
    glyphs = [cmap[cp] for cp in codepoints]
    shuffled_glyphs = glyphs[:]
    random.shuffle(shuffled_glyphs)

    # Create new mapping
    scrambled_cmap = dict(zip(codepoints, shuffled_glyphs, strict=True))
    cmap_table.cmap = scrambled_cmap

    translation_mapping = {}
    for original_cp, original_glyph in zip(codepoints, glyphs, strict=True):
        for new_cp, new_glyph in scrambled_cmap.items():
            if new_glyph == original_glyph:
                translation_mapping[chr(original_cp)] = chr(new_cp)
                break

    font.save("src/fonts/Mulish-Regular-scrambled.ttf")
    return translation_mapping


def scramble_html(
    input: str,
    translation_mapping: Dict[str, str],
) -> str:
    def apply_cipher(text):
        repl = "".join(translation_mapping.get(c, c) for c in text)
        return repl

    # Read HTML file
    soup = BeautifulSoup(input, "html.parser")

    # Find all main elements
    main_elements = soup.find_all("main")
    skip_tags = {"code", "h1", "h2"}

    # Apply cipher only to text within main
    for main in main_elements:
        for elem in main.find_all(string=True):
            if elem.parent.name not in skip_tags:
                elem.replace_with(apply_cipher(elem))

    return str(soup)
<end text>
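Since the quoted script returns the full character-to-character translation mapping, the commenters' point that this is trivially reversible is easy to show. A short sketch, assuming a mapping of the same shape `scramble_font` returns (the toy mapping below is made up for illustration):

```python
from typing import Dict

def descramble(text: str, translation_mapping: Dict[str, str]) -> str:
    # translation_mapping goes original -> scrambled; invert it
    # to recover the readable text from the page's raw characters.
    inverse = {v: k for k, v in translation_mapping.items()}
    return "".join(inverse.get(c, c) for c in text)

# Hypothetical two-letter mapping, not the real one from the font
mapping = {"h": "x", "i": "q"}
print(descramble("xq", mapping))  # -> "hi"
```

A scraper that extracts the mapping once (from the font file or by frequency analysis of the ciphertext) can descramble every page, while screen-reader users are left with gibberish permanently.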