I've just seen an absolutely disgusting article. I said "seen", not "read", because I'm blind and I could not read it.
For your reference, here's the first beautiful sentence of this article:
"ffGE ARrj XRejm XAj bZgui cB R EXZgl, Rmi mjji jrjg-DmygjREDmI XgRDmDmI iRXR XZ DlkgZrj."
I don't know the technology behind this BS, but screen readers see it as scrambled text, kind of encrypted or something like that. I guess it's some font juggling (ChatGPT suggested it's glyph scrambling, where random Unicode values are mapped to random letters; I'll trust her on this because I really don't care about the tech behind it), but if you have a tiny little grain of empathy, never ever ever do this, for goodness' sake.
tilschuenemann.de/projects/sac…
#Accessibility #Blindness #Empathy #BadPractices #Web #Text
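A toy sketch of the glyph-scrambling trick described above (this is an illustration of the general idea, not the site's actual code): the page's underlying text is run through a letter substitution, and a custom font maps each wrong letter back to the right-looking glyph. Sighted readers see normal words; screen readers and copy-paste get the raw substituted text.

```python
import random
import string

random.seed(1234)  # arbitrary seed for this demo

# Build a random bijection over ASCII letters
letters = list(string.ascii_letters)
shuffled = letters[:]
random.shuffle(shuffled)
scramble = dict(zip(letters, shuffled))

def scramble_text(text: str) -> str:
    # Non-letters (spaces, punctuation) pass through unchanged
    return "".join(scramble.get(c, c) for c in text)

print(scramble_text("We are still in need of training data"))
# prints a scrambled string, much like the quoted sentence
```

Because the mapping is a bijection, anyone holding it (or reverse-engineering the font) can invert it and recover the text exactly.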


in reply to André Polykanine

I'm viewing the article in Firefox as a sighted person and it's displaying as gibberish to me.

Even if it did work, and it's doing what I'm guessing it's trying to do, it would be thoroughly unpleasant to read and I would leave the site immediately.

But agreed, it is not okay to intentionally sacrifice accessibility.

in reply to André Polykanine

From the article: "This works as long as scrapers don't use OCR for handling edge cases like this, but I don't think it would be feasible."

Lol, the article says it breaks copy and paste. I took a screenshot, opened the image in macOS Preview, and selected the text.

A technical workaround for what this is doing is easy. Firefox and Acrobat both already do OCR; a scraper could do so as well.

in reply to André Polykanine

The title is "sacrificing accessibility for not getting web-scraped".

I do not agree with this approach, and it looks more like a proof of concept.
Is this where the industry could be moving? Absolutely.

The technology behind this (from the HTML source code) is a substitution cipher with a fixed alphabet shift for each letter.

en.wikipedia.org/wiki/Caesar_c…
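For reference, a fixed-shift substitution (a Caesar cipher, as in the linked article) can be sketched like this. Note that the script quoted later in the thread actually shuffles the letter-to-glyph mapping randomly rather than shifting by a fixed amount, but both are simple substitution ciphers:

```python
import string

def caesar(text: str, shift: int = 3) -> str:
    """Shift each ASCII letter by a fixed amount, wrapping around."""
    def shift_char(c: str, alphabet: str) -> str:
        return alphabet[(alphabet.index(c) + shift) % 26]

    out = []
    for c in text:
        if c in string.ascii_lowercase:
            out.append(shift_char(c, string.ascii_lowercase))
        elif c in string.ascii_uppercase:
            out.append(shift_char(c, string.ascii_uppercase))
        else:
            out.append(c)  # spaces and punctuation pass through
    return "".join(out)

print(caesar("Attack at dawn"))  # -> "Dwwdfn dw gdzq"
```

Shifting by `26 - shift` undoes the cipher, which is one reason schemes like this offer no real protection.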

in reply to André Polykanine

hm-m, that sounds familiar… ah. Yeah. I've heard of this trick before.

Russia's Central Election Commission (ЦИК) pulled the same stunt with its website during the election, when that website published intermediate results, under the pretense of "preventing the system from crashing under unprecedented attacks". I had to look up exactly when; it seems to have been 2021, which feels about right:
forbes.ru/tekhnologii/440479-p…

How this was supposed to prevent any attacks, and what *kinds* of attacks even, is beyond me to this day. LLMs did not exist back then. Or at the very least weren't anywhere near as capable.

in reply to André Polykanine

In #Safari on an #iPhone, going to the link I see... something that resembles a blog, but tapping "reader view" I get the following, proving the headline's point. If we can't read it, an #AI #Bot can't scrape it.

<begin text>
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "bs4",
#     "fonttools",
# ]
# ///
import random
import string
from typing import Dict

from bs4 import BeautifulSoup
from fontTools.ttLib import TTFont


def scramble_font(seed: int = 1234) -> Dict[str, str]:
    random.seed(seed)
    font = TTFont("src/fonts/Mulish-Regular.ttf")

    # Pick a Unicode cmap (Windows BMP preferred)
    cmap_table = None
    for table in font["cmap"].tables:
        if table.isUnicode() and table.platformID == 3:
            cmap_table = table
            break
    cmap = cmap_table.cmap

    # Filter codepoints for a-z and A-Z
    codepoints = [cp for cp in cmap.keys() if chr(cp) in string.ascii_letters]
    glyphs = [cmap[cp] for cp in codepoints]
    shuffled_glyphs = glyphs[:]
    random.shuffle(shuffled_glyphs)

    # Create new mapping
    scrambled_cmap = dict(zip(codepoints, shuffled_glyphs, strict=True))
    cmap_table.cmap = scrambled_cmap

    translation_mapping = {}
    for original_cp, original_glyph in zip(codepoints, glyphs, strict=True):
        for new_cp, new_glyph in scrambled_cmap.items():
            if new_glyph == original_glyph:
                translation_mapping[chr(original_cp)] = chr(new_cp)
                break

    font.save("src/fonts/Mulish-Regular-scrambled.ttf")
    return translation_mapping


def scramble_html(
    input: str,
    translation_mapping: Dict[str, str],
) -> str:
    def apply_cipher(text):
        repl = "".join(translation_mapping.get(c, c) for c in text)
        return repl

    # Read HTML file
    soup = BeautifulSoup(input, "html.parser")

    # Find all main elements
    main_elements = soup.find_all("main")
    skip_tags = {"code", "h1", "h2"}

    # Apply cipher only to text within main
    for main in main_elements:
        for elem in main.find_all(string=True):
            if elem.parent.name not in skip_tags:
                elem.replace_with(apply_cipher(elem))

    return str(soup)
<end text>
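Since the quoted script returns the full character-to-character translation mapping, the commenters' point that this is trivially reversible is easy to show. A short sketch, assuming a mapping of the same shape `scramble_font` returns (the toy mapping below is made up for illustration):

```python
from typing import Dict

def descramble(text: str, translation_mapping: Dict[str, str]) -> str:
    # translation_mapping goes original -> scrambled; invert it
    # to recover the readable text from the page's raw characters.
    inverse = {v: k for k, v in translation_mapping.items()}
    return "".join(inverse.get(c, c) for c in text)

# Hypothetical two-letter mapping, not the real one from the font
mapping = {"h": "x", "i": "q"}
print(descramble("xq", mapping))  # -> "hi"
```

A scraper that extracts the mapping once (from the font file or by frequency analysis of the ciphertext) can descramble every page, while screen-reader users are left with gibberish permanently.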