Having ongoing discussions about URL parsing differences as a basis for a #curl security vulnerability report made me check when I wrote my "my URL isn't your URL" blog post.
*Nine years ago*. And we have not made a single move towards a solution in all this time.
daniel.haxx.se/blog/2016/05/11…
My URL isn’t your URL
When I started the precursor to the curl project, httpget, back in 1996, I wrote my first URL parser. Back then, the universal address was still called URL: Uniform Resource Locators. That spec was published by the IETF in 1994.
daniel.haxx.se
daniel:// stenberg://
in reply to daniel:// stenberg://
I've said it before. The WHATWG won't fix this because they are happy with a spec that works for them and they don't care about URLs for the rest of the world.
The IETF has given up the topic, partly I think because WHATWG already has stated that they run their own race and making a unified spec that works would be next to impossible.
Jean-Baptiste "JBQ" Quéru
in reply to daniel:// stenberg://
My perception of WHATWG (as a traumatized but recovering browser engineer) is that they aren't / weren't issuing specifications, they are / were issuing documentation (for existing behaviors).
The distinction is important, because, when implementation and specifications disagree, the implementation is wrong; however, when implementation and documentation disagree, the documentation is wrong.
daniel:// stenberg://
in reply to Jean-Baptiste "JBQ" Quéru
jelte
in reply to daniel:// stenberg://
Jean-Baptiste "JBQ" Quéru
in reply to daniel:// stenberg://
That's been my experience as well.
I personally disagree with the approach, because the so-called "spec" ends up ossifying the behavior of a prototype, instead of learning from it and creating a cleaner spec. WHATWG doesn't learn from mistakes or fix them, it propagates and perpetuates mistakes.
Suzanne Aldrich (she/her)
in reply to Jean-Baptiste "JBQ" Quéru
daniel:// stenberg://
in reply to Suzanne Aldrich (she/her)
Peter Bindels
in reply to daniel:// stenberg://
@suzannealdrich @jbqueru
I know I'm coming at URL parsing from a clean brain, but can't we specify the base things in an actually parseable way? Surely it's not *that* hard to parse a URL?
daniel:// stenberg://
in reply to Peter Bindels
Peter Bindels
in reply to daniel:// stenberg://
Taking a stab in the dark with some EBNF:
url ::= protocol '://' [ name [ ':' name ] '@' ] [ server ] [ '/' [ path ] [ '?' arg { '&' arg } ] ]
protocol ::= name
server ::= name { '.' name }
path ::= name { '/' name }
arg ::= name [ '=' name ]
name ::= ([^:/@?&=.%]|%[0-9a-f][0-9a-f])+
I understand that whatwg makes that :// not a required part ... but this kinda matches what my naive brain thinks of as a URL and how to read it.
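To make it easier to see what that grammar accepts, here is a minimal sketch that transcribes it line by line into a Python regular expression. The names `NAME`, `SERVER`, `PATH`, `ARG`, `URL` and `matches_sketch` are made up for this illustration, and the sample URLs are placeholders; nothing here comes from curl or from any spec.

```python
import re

# Line-by-line transcription of the EBNF above (illustrative names only).
NAME = r"(?:[^:/@?&=.%]|%[0-9a-f][0-9a-f])+"
SERVER = rf"{NAME}(?:\.{NAME})*"
PATH = rf"{NAME}(?:/{NAME})*"
ARG = rf"{NAME}(?:={NAME})?"

URL = re.compile(
    rf"{NAME}://"                                  # protocol '://'
    rf"(?:{NAME}(?::{NAME})?@)?"                   # [ name [ ':' name ] '@' ]
    rf"(?:{SERVER})?"                              # [ server ]
    rf"(?:/(?:{PATH})?(?:\?{ARG}(?:&{ARG})*)?)?"   # [ '/' [ path ] [ '?' args ] ]
)


def matches_sketch(url: str) -> bool:
    """Return True if the whole string matches the transcribed grammar."""
    return URL.fullmatch(url) is not None


if __name__ == "__main__":
    for candidate in [
        "https://example.com/a/b?x=1&y=2",
        "https://user:pw@example.com/",
        "https://example.com:8080/",       # port number
        "https://example.com/index.html",  # '.' inside a path segment
        "http://[::1]/",                   # IPv6 literal
        "mailto:user@example.com",         # no '//' at all
    ]:
        print(f"{candidate!r}: {matches_sketch(candidate)}")
```

With this transcription the first two candidates match and the other four do not, because `name` excludes `.` and `:` and the grammar requires `://`.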
daniel:// stenberg://
in reply to Peter Bindels
@dascandy @suzannealdrich @jbqueru add port numbers, IP addresses (v4, v6, zone id?), add options? How about IDN?
I see you used two slashes, but URIs actually don't have that. Many have no slashes at all.
But sure, everything *could* be written down in a spec. Getting the world to agree with that spec though: not so easy.
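A few of those cases can be made concrete with Python's standard `urllib.parse.urlsplit`, used here only as one example of an existing parser, not as a reference implementation of any spec. The helper name `describe` and all sample addresses are placeholders chosen for this sketch.

```python
from urllib.parse import urlsplit


def describe(url: str) -> tuple:
    """Split a URL and return (scheme, hostname, port, path)."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port, parts.path)


if __name__ == "__main__":
    samples = [
        "mailto:daniel@example.com",            # URI with no slashes
        "urn:isbn:0451450523",                  # URI with no slashes
        "https://example.com:8080/index.html",  # port number
        "https://[2001:db8::1]:8443/",          # IPv6 literal plus port
    ]
    for sample in samples:
        print(f"{sample!r}: {describe(sample)}")
```

The two slash-less URIs come back with a scheme and a path but no host, while the last two need separate handling for the port and for the bracketed IPv6 address.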
Peter Bindels
in reply to daniel:// stenberg://
@suzannealdrich @jbqueru
If Curl and Firefox start by advocating "this is the spec for URIs, and other things might work or might not" that would already be a good start for the free internet.
Then we only have Chromium to convince, and the rest follows implicitly.
daniel:// stenberg://
in reply to Peter Bindels
@dascandy @suzannealdrich @jbqueru I'm sorry, but already while I worked at Mozilla and Firefox my colleagues were in team WHATWG, and I'm pretty sure they still are.
And doing a standard for URLs without having the super powers involved and interested is obviously not going to make a successful spec.
Peter Bindels
in reply to daniel:// stenberg://
daniel:// stenberg://
in reply to Peter Bindels
Erin 💽✨
in reply to daniel:// stenberg://
also I note from your post that you find the WHATWG URL spec hard to read but I have to say that in general I find all WHATWG specifications nigh-unreadable
English but precisely defined pseudocode has to be one of the worst possible ways to write a specification
daniel:// stenberg://
in reply to Erin 💽✨
Jim Fuller
in reply to daniel:// stenberg://
daniel:// stenberg://
in reply to Jim Fuller
Jim Fuller
in reply to daniel:// stenberg://
daniel:// stenberg://
in reply to Jim Fuller
Jim Fuller
in reply to daniel:// stenberg://
daniel:// stenberg://
in reply to Jim Fuller
Jim Fuller
in reply to daniel:// stenberg://
Henri Sivonen
in reply to daniel:// stenberg://
daniel:// stenberg://
in reply to Henri Sivonen
Henri Sivonen
in reply to daniel:// stenberg://
daniel:// stenberg://
in reply to Henri Sivonen
@hsivonen @jimfuller the few times I tried to talk to whatwg I ended up against a wall (understandably so because all supporters are there and not so many doubters like myself) so I gave up. I agree and understand that's not providing technical details, but I have never felt even an inch of willingness to actually discuss the details.
The '@' in the userinfo is probably a specific detail WHATWG changed from RFC3986.
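As a small illustration of why that detail matters: the `userinfo` production in RFC 3986 does not include a literal '@', so an authority with two '@' characters is outside that grammar and each parser has to pick its own interpretation. The sketch below only shows what Python's standard `urllib.parse.urlsplit` happens to do; the helper name `user_and_host` and the host names are placeholders, and other parsers may well decide differently.

```python
from urllib.parse import urlsplit


def user_and_host(url: str) -> tuple:
    """Return (username, hostname) as seen by Python's urlsplit."""
    parts = urlsplit(url)
    return (parts.username, parts.hostname)


if __name__ == "__main__":
    # Two '@' characters in the authority: which one ends the userinfo?
    print(user_and_host("http://user@evil.example@example.com/"))
```

Python splits on the last '@', so it reports `user@evil.example` as the user name and `example.com` as the host. A parser that split on the first '@' would connect somewhere else, which is the kind of disagreement behind URL-parsing security reports.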
daniel:// stenberg://
in reply to Henri Sivonen
daniel:// stenberg://
in reply to daniel:// stenberg://
Pandorunner
in reply to daniel:// stenberg://
No, I personally think this is the least important thing in your post; I don't see how this is practical.
daniel:// stenberg://
in reply to Pandorunner