daniel:// stenberg://

4 months ago


Having ongoing discussions about URL parsing differences as a basis for a #curl security vulnerability report made me check when I wrote my "my URL isn't your URL" blog post.

*Nine years ago*. And we have not made a single move towards a solution in all this time.

daniel.haxx.se/blog/2016/05/11…

My URL isn’t your URL

When I started the precursor to the curl project, httpget, back in 1996, I wrote my first URL parser. Back then, the universal address was still called URL: Uniform Resource Locators. That spec was published by the IETF in 1994.

daniel.haxx.se

#curl

daniel:// stenberg://

in reply to daniel:// stenberg:// 4 months ago

I've said it before. The WHATWG won't fix this because they are happy with a spec that works for them and they don't care about URLs for the rest of the world.

The IETF has given up on the topic, partly, I think, because WHATWG has already stated that they run their own race, and making a unified spec that works would be next to impossible.

Jean-Baptiste "JBQ" Quéru

in reply to daniel:// stenberg:// 4 months ago

My perception of WHATWG (as a traumatized but recovering browser engineer) is that they aren't / weren't issuing specifications, they are / were issuing documentation (for existing behaviors).

The distinction is important, because, when implementation and specifications disagree, the implementation is wrong; however, when implementation and documentation disagree, the documentation is wrong.

daniel:// stenberg://

in reply to Jean-Baptiste "JBQ" Quéru 4 months ago

@jbqueru I've been told in discussions with WHATWG people that their specs (they call them specs) are both: they document how browsers work and they say how browsers should work. When I've pointed out discrepancies in the past (browsers that didn't follow their spec), the answer has usually been that it will be fixed in a future browser release and that the spec is right.

jelte

in reply to daniel:// stenberg:// 4 months ago

@jbqueru So it's neither...

Jim Fuller

in reply to daniel:// stenberg:// 4 months ago

draw a line between all existing implementations and 'call that the spec' ... was always a race to the bottom. I think the only way forward is a new addressing scheme. Yes I just said that.

daniel:// stenberg://

in reply to Jim Fuller 4 months ago

@jimfuller and then have specific and separate rules for that single scheme?

Jim Fuller

in reply to daniel:// stenberg:// 4 months ago

yes no attempt at full backwards compat - declare 'bankruptcy' ... for most cases a url should translate to newnewurl scheme

daniel:// stenberg://

in reply to Jim Fuller 4 months ago

@jimfuller so a super prefix and then sub-prefixes for the different protocols? It quickly gets messy...

Jim Fuller

in reply to daniel:// stenberg:// 4 months ago

it sure does ... or we could just register a new URI scheme - curl:// which defaults to http3 + newnewurl ;)

daniel:// stenberg://

in reply to Jim Fuller 4 months ago

@jimfuller one of WHATWG's key mistakes is thinking that URLs are for the web only when in reality they cover countless protocols and solutions

Jim Fuller

in reply to daniel:// stenberg:// 4 months ago

maybe a universal multi planet wide addressing scheme is too hard ... nah

Henri Sivonen

in reply to daniel:// stenberg:// 4 months ago

@jimfuller What kind of compatibility-driven needs to deviate from WHATWG URL does curl have these days?

daniel:// stenberg://

in reply to Henri Sivonen 4 months ago

@hsivonen @jimfuller I'm not sure. I have not monitored where the WHATWG spec has gone over the last few years. Generally, we want to support the same URLs we supported back in the late 1990s.

Henri Sivonen

in reply to daniel:// stenberg:// 4 months ago

@jimfuller The complaint would be technically stronger, and potentially actionable, if it came with concrete points showing that curl can't implement the spec without breaking compatibility constraint X with Y.

daniel:// stenberg://

in reply to Henri Sivonen 4 months ago

@hsivonen @jimfuller the few times I tried to talk to WHATWG I ended up against a wall (understandably so, because all the supporters are there and not so many doubters like myself), so I gave up. I agree and understand that this doesn't provide technical details, but I have never felt even an inch of willingness to actually discuss the details.

The handling of '@' in the userinfo part is probably one specific detail where WHATWG deviates from RFC 3986.
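
To make the '@' point concrete, here is a minimal Python sketch. It assumes two things that are not spelled out in the thread: that a strict parser follows the RFC 3986 `userinfo` grammar (which has no raw '@' in its allowed characters), and that a lenient parser simply treats the last '@' in the authority as the separator. The names `USERINFO_3986`, `split_authority_last_at` and `userinfo_is_rfc3986` are made up for this illustration; this is neither curl's code nor the WHATWG algorithm.

```python
import re

# RFC 3986: userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
# Note that a raw '@' is not part of this character set.
USERINFO_3986 = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})*")


def split_authority_last_at(authority):
    """Lenient split (an assumption for this sketch): the LAST '@' separates
    userinfo from host. Returns (userinfo or None, hostport)."""
    userinfo, sep, hostport = authority.rpartition("@")
    return (userinfo if sep else None), hostport


def userinfo_is_rfc3986(userinfo):
    """True if the userinfo string fits the RFC 3986 grammar above."""
    return USERINFO_3986.fullmatch(userinfo) is not None


if __name__ == "__main__":
    userinfo, host = split_authority_last_at("user@evil.example@example.com")
    print("lenient split:", userinfo, "|", host)
    print("strict RFC 3986 accepts that userinfo:", userinfo_is_rfc3986(userinfo))
```

Two parsers that disagree on this single character can disagree on which host a URL points to, which is the kind of difference that ends up in security reports.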

daniel:// stenberg://

in reply to Henri Sivonen 4 months ago

@hsivonen @jimfuller I'm also concerned about making a parser for (almost) general URLs like curl's as lenient as the WHATWG spec is, as then we would accept countless URLs that today would just be rejected. Primarily non-http(s):// ones I mean, even if we presume that people in general are fine with the WHATWG anything-goes style for http(s).

daniel:// stenberg://

in reply to daniel:// stenberg:// 4 months ago

@hsivonen @jimfuller I'm still open to adding a WHATWG flag in the code at some point, to experiment more closely with what exactly it would mean.

daniel:// stenberg://

Unknown parent 4 months ago

@suzannealdrich @jbqueru and it's impossible to be perfectly compliant, because then suddenly the "spec" changes and you're not anymore. That's not a way to build things like URLs, which in theory are supposed to be able to outlive us all.

Suzanne Aldrich (she/her)

Unknown parent 4 months ago

@jbqueru well then. that's not engineering. that's hoarding behavior.

Peter Bindels

in reply to daniel:// stenberg:// 4 months ago

@suzannealdrich @jbqueru

I know I'm coming at URL parsing from a clean brain, but can't we specify the base things in an actually parseable way? Surely it's not *that* hard to parse a URL?

daniel:// stenberg://

in reply to Peter Bindels 4 months ago

@dascandy @suzannealdrich @jbqueru we could theoretically do that, sure.

Peter Bindels

in reply to daniel:// stenberg:// 4 months ago

Taking a stab in the dark with some EBNF:

url ::= protocol '://' [ name [ ':' name ] '@' ] [ server ] [ '/' [ path ] [ '?' arg { '&' arg } ] ]
protocol ::= name
server ::= name { '.' name }
path ::= name { '/' name }
arg ::= name [ '=' name ]

name ::= ([^:/@?&=.%]|%[0-9a-f][0-9a-f])+

I understand that whatwg makes that :// not a required part ... but this kinda matches what my naive brain thinks of as a URL and how to read it.
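
As a rough way to try the grammar above, it can be transcribed rule by rule into a Python regular expression. This is only a sketch: the function name `parse` and the group names are invented here, and the transcription is deliberately literal, including the grammar's gaps.

```python
import re

# Literal transcription of the grammar in the post; nothing added or fixed.
NAME = r"(?:[^:/@?&=.%]|%[0-9a-f][0-9a-f])+"
SERVER = rf"{NAME}(?:\.{NAME})*"
PATH = rf"{NAME}(?:/{NAME})*"
ARG = rf"{NAME}(?:={NAME})?"
URL = re.compile(
    rf"(?P<protocol>{NAME})://"
    rf"(?:(?P<user>{NAME})(?::(?P<password>{NAME}))?@)?"
    rf"(?P<server>{SERVER})?"
    rf"(?:/(?P<path>{PATH})?(?:\?(?P<query>{ARG}(?:&{ARG})*))?)?"
)


def parse(url):
    """Return the named parts as a dict, or None if the grammar rejects the URL."""
    match = URL.fullmatch(url)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for candidate in (
        "http://user:pw@example.com/a/b?x=1&y=2",
        "http://example.com/index.html",   # '.' is not allowed inside a name
        "http://example.com:8080/",        # no rule for port numbers
        "http://example.com?x=1",          # query is only reachable after a '/'
    ):
        print(candidate, "->", parse(candidate))
```

Taken literally, the grammar accepts the first URL and rejects the other three, which shows how quickly even a small spec runs into everyday URLs.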

daniel:// stenberg://

in reply to Peter Bindels 4 months ago

@dascandy @suzannealdrich @jbqueru add port numbers, IP addresses (v4, v6, zone id?), add options? How about IDN?

I see you used two slashes, but URIs actually don't have that. Many have no slashes at all.

But sure, everything *could* be written down in a spec. Getting the world to agree with that spec though: not so easy.
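
A few of those cases can be illustrated with Python's standard `urllib.parse.urlsplit`, used here only as one existing parser among many; its behavior is not claimed to match curl's or the WHATWG spec's. The helper name `describe` and the sample URLs are made up for this sketch.

```python
from urllib.parse import urlsplit


def describe(url):
    """Return (scheme, hostname, port, path) as Python's urlsplit sees them."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.port, parts.path


if __name__ == "__main__":
    for url in (
        "http://[::1]:8080/x",        # IPv6 literal plus a port number
        "mailto:user@example.com",    # a URI with no slashes at all
        "urn:isbn:0451450523",        # another slash-free URI
    ):
        print(url, "->", describe(url))
```

For the two slash-free URIs there is no authority component at all, so hostname and port come back as `None` and everything after the scheme lands in the path.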

Peter Bindels

in reply to daniel:// stenberg:// 4 months ago

@suzannealdrich @jbqueru

If curl and Firefox start by advocating "this is the spec for URIs, and other things might work or might not", that would already be a good start for the free internet.

Then we only have Chromium to convince, and the rest follows implicitly.

daniel:// stenberg://

in reply to Peter Bindels 4 months ago

@dascandy @suzannealdrich @jbqueru I'm sorry, but even back when I worked at Mozilla on Firefox, my colleagues were on team WHATWG, and I'm pretty sure they still are.

And doing a standard for URLs without having the superpowers involved and interested is obviously not going to make a successful spec.

Peter Bindels

in reply to daniel:// stenberg:// 4 months ago

@suzannealdrich @jbqueru dang I hate when politics get in the way of working tech.

daniel:// stenberg://

in reply to Peter Bindels 4 months ago

@dascandy @suzannealdrich @jbqueru this is 100% politics in the way

daniel:// stenberg://

Unknown parent 4 months ago

@erincandescent @jbqueru @suzannealdrich oh yes. Having to read through pseudo code in order to figure out a syntax or format is not to my liking

Erin 💽✨ @ 39C3

in reply to daniel:// stenberg:// 4 months ago

also I note from your post that you find the WHATWG URL spec hard to read but I have to say that in general I find all WHATWG specifications nigh-unreadable

English but precisely defined pseudocode has to be one of the worst possible ways to write a specification

Pandorunner

in reply to daniel:// stenberg:// 4 months ago

"Making URL's not ASCII Is important"
No, I personally think this is the least important thing in your post; I don't see how this is practical.

daniel:// stenberg://

in reply to Pandorunner 4 months ago

@Pandorunner they already are "non-ASCII" for those who want
