Simon Willison

2 months ago

Can modern screen readers read academic papers that are published as two-column PDFs? Do they know how to separate out the two columns?

Mikołaj Hołysz

in reply to Simon Willison • 2 months ago

Short answer: just don't. Preferably provide both an HTML alternative and the LaTeX source.

PDF is essentially a vector graphics format. Its ultimate goal is a document that prints and displays in exactly the same way for everybody; everything else is secondary. In HTML, the "recommended way to do things" is to say "put an h1 here" and let the browser deal with it, possibly with some help from your style sheet along the way. In PDF, you essentially say "hey, here's some text, put it 2.7 inches from the left margin, 16 point, use font so and so". If you were so inclined, you could even reorder the characters in your font and use completely nonsensical codepoints, and things would still pretty much work visually.
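To make that last point concrete, here is a minimal Python sketch. It is a toy model, not real PDF parsing: the `Glyph` record, the scrambled `font_shapes` table, and both function names are hypothetical and exist only for illustration. The idea is that the page stores character codes plus positions, the font decides which shape each code draws, and a text extractor that trusts the codes gets nonsense.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    code: int   # character code stored in the page content
    x: float    # horizontal position in points (assumed units)
    y: float    # vertical position in points (assumed units)


# Hypothetical font with a scrambled encoding: code -> shape actually drawn.
font_shapes = {65: "P", 66: "D", 67: "F"}

# Three glyphs placed left to right on one line.
page = [Glyph(65, 72.0, 700.0), Glyph(66, 80.0, 700.0), Glyph(67, 88.0, 700.0)]


def rendered(glyphs, shapes):
    """What a sighted reader sees: the shapes the font draws, left to right."""
    return "".join(shapes[g.code] for g in sorted(glyphs, key=lambda g: g.x))


def naive_extract(glyphs):
    """What a tool gets if it treats the stored codes as ordinary codepoints."""
    return "".join(chr(g.code) for g in glyphs)


print(rendered(page, font_shapes))  # the visual result
print(naive_extract(page))          # the text a naive extractor would report
```

The page looks right on screen and on paper, but the extracted text has nothing to do with it unless the file also carries a correct mapping back to real characters.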

LaTeX definitely uses shenanigans like that. Polish diacritics, for example, aren't expressed as a single character. Instead, the plain English letter is used, along with some extra markup that tells the renderer where to draw the acute accent on the page. From an a11y perspective, though, those accents aren't actually part of the character; they're just random squiggles that the renderer happens to be told to draw. Some say that modern JS frameworks are crazy. I say that PDF is far, far crazier than that.
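A rough Python illustration of why a separately drawn accent is a problem for assistive tech. This is only a sketch: what a given PDF extractor actually emits for such an accent varies, and the `drawn_separately` string below is an assumed example of one possible outcome, not output from any real tool. `same_letter` is a hypothetical helper.

```python
import unicodedata

precomposed = "\u015b"        # ś as a single code point
combining = "s\u0301"         # s followed by COMBINING ACUTE ACCENT
drawn_separately = "\u00b4s"  # assumed extractor output: a standalone acute glyph, then a plain s


def same_letter(a, b):
    """Compare two strings after canonical composition (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_letter(precomposed, combining))         # real combining sequence vs. ś
print(same_letter(precomposed, drawn_separately))  # loose accent glyph vs. ś
```

A proper combining sequence normalizes to the same letter, so software can treat it as ś. A loose accent glyph sitting next to an "s" does not, so a screen reader has no reliable way to know the two belong together.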

Speaking of the two-column stuff in particular, I've seen it work and I've also seen it not work. This probably depends on where the text goes in the document, what it is rendered with, and on what software you're using and what its a11y implementation is like.
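Here is a small Python sketch of how the two-column case can go wrong. It assumes, purely for illustration, that a reader only has text fragments with page coordinates to work from; the `fragments` data, the `split_x` threshold, and both function names are made up, and real software uses more elaborate heuristics or the document's tagging when it exists.

```python
# (x, y, text) fragments; y grows downward here to keep the example simple.
fragments = [
    (50, 100, "Left 1"), (320, 100, "Right 1"),
    (50, 120, "Left 2"), (320, 120, "Right 2"),
]


def naive_order(frags):
    """Read strictly top to bottom, then left to right: interleaves the columns."""
    return [text for _, _, text in sorted(frags, key=lambda f: (f[1], f[0]))]


def column_order(frags, split_x):
    """Read the whole left column first, then the right one.

    split_x is an assumed, hand-picked boundary between the columns.
    """
    left = sorted((f for f in frags if f[0] < split_x), key=lambda f: f[1])
    right = sorted((f for f in frags if f[0] >= split_x), key=lambda f: f[1])
    return [text for _, _, text in left + right]


print(naive_order(fragments))
print(column_order(fragments, split_x=200))
```

The first ordering jumps between columns on every line; the second gives the order a sighted reader would follow. Which one you get in practice depends on the file and the software, which matches the "sometimes it works" experience.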

Yes, there's a way to mark PDFs up for accessibility properly, but very few people do it and LaTeX makes it far harder. There are a lot of other problems (think math), and support among reading programs is... spotty at best.

Jürgen Hubert

in reply to Mikołaj Hołysz • 1 month ago

@miki For the record, I am using #TeXLaTeX to create both the EPUB and the PDF versions of my books, although my requirements are nowhere near those of academic papers. But it should be doable.

Simon Willison

in reply to Simon Willison • 2 months ago

As an experiment, I downloaded the two-column PDF of this new paper from Google Research, "SQL Has Problems. We Can Fix Them: Pipe Syntax In SQL": research.google/pubs/sql-has-p…

... and uploaded it to Google AI Studio and told Gemini Pro 1.5 "Convert this document to neatly styled semantic HTML" - and the results were pretty good! static.simonwillison.net/stati…

Matt Campbell

in reply to Simon Willison • 2 months ago

I'd be really worried about both hallucination and prompt injection when using an LLM for document conversion, as an accessibility tool for blind or other disabled users. But the tools I've tried on this paper do worse than what you got out of Gemini.

Simon Willison

in reply to Matt Campbell • 2 months ago

@matt yeah, me too. The responsible way to do this would be to use Gemini Pro to create the first draft, then spend significant time and effort checking and verifying it, iterating on the prompts, porting across the figures, etc.
