I've just seen an absolutely disgusting article. I said "seen", not "read", because I'm blind and I could not read it.
For your reference, here's the first beautiful sentence of this article:
"ffGE ARrj XRejm XAj bZgui cB R EXZgl, Rmi mjji jrjg-DmygjREDmI XgRDmDmI iRXR XZ DlkgZrj."
I don't know the technology behind this BS, but screen readers see it as scrambled text, kind of encrypted or something like that. I guess it's some font juggling (ChatGPT supposed it's glyph scrambling, where random Unicode values are mapped to random letters; I'll trust her on this because I really don't care about the tech behind it), but if you have even a tiny grain of empathy, never ever ever do this, for goodness' sake.
tilschuenemann.de/projects/sac…
#Accessibility #Blindness #Empathy #BadPractices #Web #Text
Kirill
in reply to André Polykanine
André Polykanine
in reply to Kirill
Kirill
in reply to André Polykanine
⏚ Antoine Chambert-Loir
in reply to Kirill
Sorry to add a comment, but if you select the language of your message when typing it, the Mastodon app will provide an automatic translation on request.
Here, the app believes that your message is in English. (You can change that on new messages by clicking the language box, and you can also edit the message afterwards if needed.)
André Polykanine
in reply to ⏚ Antoine Chambert-Loir
Julie Moynat
in reply to André Polykanine
This is an encryption algorithm that replaces characters with others, using a dedicated font. The article explains that the idea is to prevent LLMs from scraping the content.
Apparently, the person behind this website is ableist, because they know they're sacrificing accessibility and that it doesn't work with screen readers.
That's real shit.
André Polykanine
in reply to Julie Moynat
Julie Moynat
in reply to André Polykanine
ChasMusic (he/him)
in reply to André Polykanine
I'm viewing the article in Firefox as a sighted person, and it's displaying as gibberish to me.
Even if it did work, if it's doing what I'm guessing it's trying to do, it would be thoroughly unpleasant to read, and I would leave the site immediately.
But agreed, it is not okay to intentionally sacrifice accessibility.
Mossfet
in reply to André Polykanine
Yes, it's an article about scrambling text using fonts in order to prevent LLM scraping.
It's a cute idea. I'd be more at ease if the author had described it as a neat toy that shouldn't be used seriously for anything, but they don't seem to think of it that way, describing the accessibility problems only as a "drawback".
André Polykanine
in reply to Mossfet
Timothy Wynn
in reply to André Polykanine
https://pastebin.com/fF4TTsLb
ChasMusic (he/him)
in reply to André Polykanine
From the article: "This works as long as scrapers don't use OCR for handling edge cases like this, but I don't think it would be feasible."
Lol, the article says it breaks copy and paste. I took a screenshot, opened the image in macOS Preview, and selected the text.
A technical workaround for what this is doing is easy. Firefox and Acrobat both already do OCR, and a scraper could do so as well.
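An even cheaper workaround than OCR is possible if one fixed substitution covers the whole site, as the script quoted later in the thread suggests (a fixed seed and a single scrambled font file). A scraper that knows the plaintext of any one passage can learn the mapping and decode the rest. This sketch assumes the scrambled sentence quoted at the top of the thread reads "LLMs have taken the world by a storm, and need ever-increasing training data to improve." Every repeated letter in the ciphertext is consistent with this reading, and `learn_mapping` raises an error if it is not. The function names are hypothetical. The sketch also checks that no single alphabet shift explains the mapping, so this is a general letter substitution rather than a Caesar cipher.

```python
import string

# ASSUMPTION: PLAIN is the plaintext of the scrambled sentence quoted at the top
# of the thread; learn_mapping() raises ValueError if that guess is inconsistent.
CIPHER = ("ffGE ARrj XRejm XAj bZgui cB R EXZgl, Rmi mjji "
          "jrjg-DmygjREDmI XgRDmDmI iRXR XZ DlkgZrj.")
PLAIN = ("LLMs have taken the world by a storm, and need "
         "ever-increasing training data to improve.")


def learn_mapping(cipher: str, plain: str) -> dict[str, str]:
    """Pair ciphertext letters with plaintext letters, rejecting contradictions."""
    if len(cipher) != len(plain):
        raise ValueError("texts must line up character by character")
    mapping: dict[str, str] = {}
    for c, p in zip(cipher, plain):
        if not c.isalpha():
            continue  # spaces and punctuation are not substituted
        if mapping.setdefault(c, p) != p:
            raise ValueError(f"{c!r} decodes to both {mapping[c]!r} and {p!r}")
    return mapping


def decode(text: str, mapping: dict[str, str]) -> str:
    """Apply a learned mapping; letters never seen in the known text become '?'."""
    return "".join(mapping.get(ch, "?") if ch.isalpha() else ch for ch in text)


mapping = learn_mapping(CIPHER, PLAIN)
print(decode("XAj EXZgl", mapping))  # -> the storm

# Even treating the 52 letters the quoted script shuffles as one alphabet,
# a Caesar-style shift would give every letter the same offset; here they differ.
offsets = {(string.ascii_letters.index(p) - string.ascii_letters.index(c)) % 52
           for c, p in mapping.items()}
print(len(offsets) == 1)  # -> False
```

One known sentence already recovers 22 of the 52 letters, and each additional known passage fills in more.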
Patrick H. Lauke
in reply to André Polykanine
Patrick H. Lauke (@patrick_h_lauke@mastodon.social)
Jan Wildeboer 😷
in reply to André Polykanine
Christian Rickert
in reply to André Polykanine
The title is "sacrificing accessibility for not getting web-scraped".
I do not agree with this approach, and it looks more like a proof of concept:
Is this where the industry could be moving towards? Absolutely.
The technology behind this (from the HTML source code) is a simple substitution cipher with a fixed, randomly shuffled replacement for each letter.
en.wikipedia.org/wiki/Caesar_c…
D:\side\>
in reply to André Polykanine
Hm-m, that sounds familiar… ah. Yeah. I've heard of this trick before.
Russia's Central Election Commission (ЦИК) pulled the same stunt with its website during the election, when the site published intermediate results, under the pretext of "preventing the system from crashing under unprecedented attacks". I had to look up exactly when; it seems to have been 2021, which feels about right:
forbes.ru/tekhnologii/440479-p…
How this was supposed to prevent any attacks, and what *kinds* of attacks, is beyond me to this day. LLMs did not exist back then, or at the very least weren't anywhere near as capable.
Pamfilova explained the encoding of election results on the CEC website (Timur Batyrov, Forbes.ru)
Manny Dexter
in reply to André Polykanine
In #Safari on an #iPhone, going to the link I see... something that resembles a blog, but when I switch to Reader View I get the following, proving the point of the headline: if we can't read it, an #AI #Bot can't scrape it.
<begin text>
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "bs4",
#     "fonttools",
# ]
# ///
import random
import string
from typing import Dict

from bs4 import BeautifulSoup
from fontTools.ttLib import TTFont


def scramble_font(seed: int = 1234) -> Dict[str, str]:
    random.seed(seed)
    font = TTFont("src/fonts/Mulish-Regular.ttf")

    # Pick a Unicode cmap (Windows BMP preferred)
    cmap_table = None
    for table in font["cmap"].tables:
        if table.isUnicode() and table.platformID == 3:
            break
    # `table` is the matching subtable if the loop broke early, otherwise the last one
    cmap_table = table
    cmap = cmap_table.cmap

    # Filter codepoints for a-z and A-Z
    codepoints = [cp for cp in cmap.keys() if chr(cp) in string.ascii_letters]
    glyphs = [cmap[cp] for cp in codepoints]
    shuffled_glyphs = glyphs[:]
    random.shuffle(shuffled_glyphs)

    # Create new mapping
    scrambled_cmap = dict(zip(codepoints, shuffled_glyphs, strict=True))
    cmap_table.cmap = scrambled_cmap

    translation_mapping = {}
    for original_cp, original_glyph in zip(codepoints, glyphs, strict=True):
        for new_cp, new_glyph in scrambled_cmap.items():
            if new_glyph == original_glyph:
                translation_mapping[chr(original_cp)] = chr(new_cp)
                break

    font.save("src/fonts/Mulish-Regular-scrambled.ttf")
    return translation_mapping


def scramble_html(
    input: str,
    translation_mapping: Dict[str, str],
) -> str:
    def apply_cipher(text):
        repl = "".join(translation_mapping.get(c, c) for c in text)
        return repl

    # Read HTML file
    soup = BeautifulSoup(input, "html.parser")

    # Find all main elements
    main_elements = soup.find_all("main")

    skip_tags = {"code", "h1", "h2"}

    # Apply cipher only to text within main
    for main in main_elements:
        for elem in main.find_all(string=True):
            if elem.parent.name not in skip_tags:
                elem.replace_with(apply_cipher(elem))

    return str(soup)
<end text>