in reply to daniel:// stenberg://

IDN-based phishing is the reason I turned off Punycode translation in Firefox. So whenever I see a URL whose hostname begins with "xn--", I know it is most likely a phishing attempt.

And anyone who knows how domain names work also knows there is usually an ASCII "replacement" for special non-ASCII characters (in German, for example, we write "ae" for "ä" when necessary).
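A minimal Python sketch of both points, assuming the URL is already available as a string. `has_ace_label` flags hostnames that contain an `xn--` label. Legitimate IDNs produce the same prefix, so this is a hint, not proof of phishing. `german_ascii_fallback` applies the usual ä→ae style replacements. Both function names and the fallback table are made up for illustration.

```python
from urllib.parse import urlsplit

ACE_PREFIX = "xn--"  # prefix of an IDNA "ASCII Compatible Encoding" label

# Hypothetical fallback table; domain labels are case-insensitive,
# so only lowercase letters are listed.
GERMAN_FALLBACKS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def has_ace_label(url: str) -> bool:
    """True if any label of the URL's hostname starts with xn--."""
    host = urlsplit(url).hostname or ""  # .hostname is already lowercased
    return any(label.startswith(ACE_PREFIX) for label in host.split("."))


def german_ascii_fallback(name: str) -> str:
    """Spell umlauts and ß the way German does when they are unavailable."""
    return name.lower().translate(GERMAN_FALLBACKS)
```

For example, `has_ace_label("https://xn--bcher-kva.example/")` is true, while an `xn--` that only appears in the path is ignored. `german_ascii_fallback("Müller")` gives `"mueller"`.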

in reply to daniel:// stenberg://

Nice read.

I can see the use of a curated library (whitelist regexps with some heuristics, perhaps context-aware) that could be shared among tools such as browsers and command-line tools, so that they refuse, with a warning, when a suspicious IDN is used.

But I don't see the need to ban any 📯 emojis there.

If needed, the user should of course be able to bypass such a warning, but no short option should be tolerated for it (--bypass-idn-check, not just -b).
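A rough Python sketch of such a check carries two assumptions. First, it does not use curated whitelist regexps. Instead it uses a simple mixed-script heuristic: it looks at the first word of each letter's Unicode name, such as LATIN or CYRILLIC. Second, `--bypass-idn-check` is the flag proposed above, not an option of any real tool. The function names are hypothetical. Emoji are not letters, so they pass untouched.

```python
import argparse
import sys
import unicodedata


def scripts_in_label(label: str) -> set[str]:
    """Rough script guess: first word of each letter's Unicode name."""
    scripts = set()
    for ch in label:
        if ch.isalpha():  # digits, hyphens and emoji are ignored
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_suspicious(host: str) -> bool:
    """Flag hostnames where one label mixes letters from several scripts."""
    return any(len(scripts_in_label(label)) > 1 for label in host.split("."))


def main(argv=None) -> int:
    # allow_abbrev=False stops argparse from accepting prefixes like --bypass
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("host")
    # Long option only: deliberately no short alias such as -b
    parser.add_argument("--bypass-idn-check", action="store_true")
    args = parser.parse_args(argv)
    if is_suspicious(args.host) and not args.bypass_idn_check:
        print(f"warning: suspicious IDN {args.host!r}, refusing", file=sys.stderr)
        return 1
    return 0
```

`allow_abbrev=False` matters here. By default, argparse accepts unambiguous prefixes such as `--bypass`, which would undercut the "no short form" idea. The heuristic is crude. For example, Japanese names that legitimately mix kanji and kana would be flagged.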

in reply to daniel:// stenberg://

Yes. I have tried a few of the homographic characters mentioned. The slash and fragment lookalikes are refused by the idn2 command, but the homographic "?" is not refused by the libidn2 version I tested. Still, it seems safe to scan the hostname for non-ASCII characters and, if any appear, normalize the displayed name by converting it to Punycode and then decoding it back again. That can help get rid of strange characters while keeping the name presentable in foreign alphabets too. I thought punctuation characters were generally forbidden in hostnames.
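A small Python sketch of that round-trip rests on one important assumption. Python's built-in `idna` codec implements the older IDNA 2003 rules (nameprep, then Punycode), not the IDNA 2008 rules that libidn2 follows. It will therefore accept and reject different characters than idn2 does. `normalize_hostname` is a made-up name.

```python
def normalize_hostname(host: str) -> str:
    """Round-trip a non-ASCII hostname through its xn-- form for display."""
    if host.isascii():
        return host  # nothing to normalize
    # Python's stdlib "idna" codec is IDNA 2003 (nameprep + Punycode);
    # it raises UnicodeError on characters that nameprep prohibits.
    ace = host.encode("idna")
    return ace.decode("idna")
```

A name typed with fullwidth letters, `ｅｘａｍｐｌｅ.com`, comes back as plain `example.com`. `BÜCHER.de` comes back as `bücher.de`.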
in reply to daniel:// stenberg://

The really sad part? All this complexity, all this surface area for nasty bugs, all these opportunities for social engineering... and they don't even work for their intended purpose!! Earlier this year, I needed a new domain with my last name in it, which contains an ø. I registered one version with the "ø" and one with "o" instead, just in case IDNs caused issues.

I've learned that virtually nothing supports IDNs. I've stopped using the "ø" version because "xn--blah-54a" showed up everywhere.

in reply to daniel:// stenberg://

IDNs use IDNA, not just Punycode. While Punycode is an algorithm that can encode any Unicode string into ASCII, IDNA adds further rules, so not all characters can end up in a domain name. On top of that, registries add their own rules, and so does ICANN for gTLDs (see SAC095 at itp.cdn.icann.org/en/files/sec…). So many attacks can't work, because such domains can't actually exist. Then, yes, you have the odd ones that allow lots of things, like the `.ws` TLD. I recommend this past presentation: i.blackhat.com/USA-19/Thursday…
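This small sketch uses Python's standard codecs to show the difference between bare Punycode and IDNA. One assumption is worth flagging: Python's built-in `idna` codec implements the older IDNA 2003 (nameprep) rules, not IDNA 2008. It also knows nothing about registry or ICANN policies, so it rejects far less than a real registry would. The function name is made up. U+202E RIGHT-TO-LEFT OVERRIDE serves as the test character because nameprep prohibits it.

```python
def compare_encodings(label: str) -> tuple[bytes, bool]:
    """Encode with bare Punycode, then check whether the IDNA codec accepts it."""
    punycode = label.encode("punycode")  # Punycode itself accepts any code point
    try:
        label.encode("idna")  # stdlib IDNA 2003: nameprep rules + Punycode
        idna_ok = True
    except UnicodeError:
        idna_ok = False
    return punycode, idna_ok
```

`compare_encodings("bücher")` gives `(b"bcher-kva", True)`. A label containing U+202E still Punycode-encodes without complaint but is rejected by the IDNA codec.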
in reply to daniel:// stenberg://

For once, you've written something wrong. Just one point: the "crazy" example you show is disallowed, since IDN does not allow many of these characters: afnic.fr/en/observatory-and-re…

#IDN #Unicode