Having ongoing discussions about URL parsing differences as a basis for a #curl security vulnerability report made me check when I wrote my "my URL isn't your URL" blog post.

*Nine years ago*. And we have not made a single move towards a solution in all this time.

daniel.haxx.se/blog/2016/05/11…

#curl
in reply to daniel:// stenberg://

I've said it before. The WHATWG won't fix this because they are happy with a spec that works for them and they don't care about URLs for the rest of the world.

The IETF has given up on the topic, partly I think because WHATWG has already stated that they run their own race and making a unified spec that works would be next to impossible.

in reply to daniel:// stenberg://

My perception of WHATWG (as a traumatized but recovering browser engineer) is that they aren't / weren't issuing specifications, they are / were issuing documentation (for existing behaviors).

The distinction is important, because, when implementation and specifications disagree, the implementation is wrong; however, when implementation and documentation disagree, the documentation is wrong.

in reply to Jean-Baptiste "JBQ" Quéru

@jbqueru I've been told in discussions with WHATWG people that their specs (they call them specs) are both: they document how browsers work and they say how browsers should work. When I've pointed out discrepancies in the past (browsers that didn't follow their spec), the answer has usually been that it will be fixed in a future browser release and that the spec is right.
in reply to daniel:// stenberg://

Taking a stab in the dark with some EBNF:

url ::= protocol '://' [ name [ ':' name ] '@' ] [ server ] [ '/' [ path ] [ '?' arg { '&' arg } ] ]
protocol ::= name
server ::= name { '.' name }
path ::= name { '/' name }
arg ::= name [ '=' name ]

name = ([^:/@?&=.%]|%[0-9a-f][0-9a-f])+

I understand that WHATWG doesn't make the :// a required part ... but this kinda matches what my naive brain thinks of as a URL and how to read it.
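
To see what the grammar above accepts in practice, here is a rough Python sketch that transcribes it rule by rule into a regular expression. The group names (`user`, `password`, `query`) and the `parse` helper are made up for illustration; nothing here is anyone's real parser.

```python
import re

# name: one or more non-delimiter characters or lowercase %xx escapes,
# copied from the 'name' rule as written.
NAME = r"(?:[^:/@?&=.%]|%[0-9a-f][0-9a-f])+"

SERVER = rf"{NAME}(?:\.{NAME})*"   # server ::= name { '.' name }
PATH = rf"{NAME}(?:/{NAME})*"      # path   ::= name { '/' name }
ARG = rf"{NAME}(?:={NAME})?"       # arg    ::= name [ '=' name ]

URL = re.compile(
    rf"(?P<protocol>{NAME})://"
    rf"(?:(?P<user>{NAME})(?::(?P<password>{NAME}))?@)?"
    rf"(?P<server>{SERVER})?"
    rf"(?:/(?P<path>{PATH})?(?:\?(?P<query>{ARG}(?:&{ARG})*))?)?"
)

def parse(url):
    """Return the matched components as a dict, or None if the grammar rejects the URL."""
    m = URL.fullmatch(url)
    return m.groupdict() if m else None

print(parse("https://user:pw@example.com/a/b?x=1&y"))
print(parse("https://example.com:8080/"))       # port number
print(parse("https://example.com/index.html"))  # dot in the path
```

Taken literally, the grammar accepts the first URL and rejects the other two: it has no rule for a port, and `name` excludes `.`, so a path segment like `index.html` cannot match.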

in reply to Peter Bindels

@dascandy @suzannealdrich @jbqueru add port numbers, IP addresses (v4, v6, zone id?), add options? How about IDN?

I see you used two slashes, but URIs actually don't have that. Many have no slashes at all.

But sure, everything *could* be written down in a spec. Getting the world to agree with that spec though: not so easy.
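
A quick way to see the "no slashes at all" point, plus the port and IPv6 cases, is to feed a few URIs to Python's standard `urllib.parse.urlsplit`. This is only a sketch of how one existing parser splits them, not a statement about what any spec requires; the `describe` helper and the sample URIs are made up for the example.

```python
from urllib.parse import urlsplit

def describe(uri):
    """Return (scheme, authority, path) as Python's urlsplit sees them."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path

# URIs with a scheme but no slashes: there is no authority part at all.
print(describe("mailto:user@example.com"))
print(describe("urn:isbn:0451450523"))

# A bracketed IPv6 literal with a port: the colons inside the brackets
# must not be mistaken for the port separator.
parts = urlsplit("http://[2001:db8::1]:8080/")
print(parts.hostname, parts.port)
```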

in reply to Peter Bindels

@dascandy @suzannealdrich @jbqueru I'm sorry, but already when I worked at Mozilla on Firefox my colleagues were on team WHATWG, and I'm pretty sure they still are.

And doing a standard for URLs without having the superpowers involved and interested is obviously not going to make a successful spec.

in reply to Henri Sivonen

@hsivonen @jimfuller the few times I tried to talk to WHATWG I ended up against a wall (understandably so, because all the supporters are there and not so many doubters like myself), so I gave up. I agree and understand that's not providing technical details, but I have never felt even an inch of willingness to actually discuss the details.

The '@' in the userinfo is probably a specific detail WHATWG changed from RFC 3986.
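
For reference, the RFC 3986 side of that can be sketched in Python. In RFC 3986, `userinfo` is limited to unreserved characters, percent-escapes, sub-delims and ':', so a raw '@' inside it is not valid and has to be written as `%40`. The helper name `userinfo_is_rfc3986` and the sample URLs are made up for this example, and the code does not model the WHATWG parser in any way.

```python
import re
from urllib.parse import urlsplit

# RFC 3986: userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
USERINFO_3986 = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})*")

def userinfo_is_rfc3986(url):
    """Check the part of the authority before the last '@' against RFC 3986."""
    netloc = urlsplit(url).netloc
    # An empty userinfo (no '@' present) counts as valid here.
    userinfo, _at, _host = netloc.rpartition("@")
    return USERINFO_3986.fullmatch(userinfo) is not None

for url in (
    "http://user:pw@example.com/",
    "http://user@evil.example@example.com/",    # raw '@' inside userinfo
    "http://user%40evil.example@example.com/",  # same userinfo, percent-encoded
):
    print(url, userinfo_is_rfc3986(url), urlsplit(url).hostname)
```

Python's `urlsplit` splits the authority at the last '@', so it reports `example.com` as the host for all three. A parser that splits at the first '@' would pick a different host for the second one, which is the kind of disagreement that ends up in security reports.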

in reply to Henri Sivonen

@hsivonen @jimfuller I'm also concerned about making a parser for (almost) general URLs like curl's as lenient as the WHATWG spec is, as then we would accept countless URLs that today would just be rejected. Primarily non-http(s):// ones I mean, even if we presume that people in general are fine with the WHATWG anything-goes style for http(s).