people are also often obsessed by C vs non-C vulnerabilities, and in #curl the share of mistakes that are related to the programming language keeps shrinking (just over 40% now)
This is WAY lower than what is commonly reported as the general percentage. (60-70% is commonly repeated)
If it was a change in methodology, there would be noticeable changes around moments when the measurement method changed. But maybe that's what the data is showing and it's just subtle.
@lucasgonze in curl, I think we have gradually improved many methods and procedures over time, so there would not be any clear point in time where something suddenly changed.
@lucasgonze Would it have to do with testing methodologies? I see a couple of times at which the number of C mistakes jumps -- 2001, 2014, 2017. Why? Did testing improve and find issues, which would cause an immediate jump and then a reduced slope?
@huitema @lucasgonze I think fuzzing, test improvements and individuals' poking are the driving factors.
Note that this graph shows when the error was *introduced* not when fixed. They were generally found several thousand days after being added. On average.
@lucasgonze Have you integrated sanitizers in your test suite? I found that a combination of static analyzers and ASAN/UBSAN does clean the code a lot. Especially if you put it in the CI tests.
The statement that the share of C language related mistakes is way lower in curl than in general sounds like trying to find a humble-sounding way of saying „people committing to curl are better at using C than programmers in general“ 😁
How do you define when a mistake is a C mistake versus a non-C mistake? For instance, a language with managed memory might make certain mistakes impossible, but they are conceptual mistakes about memory, not about how the C language works.
I'd suggest including both integer wrapping and also surprise implicit conversions, and particularly signed-unsigned interaction -- I'm not sure if curl's had any issues with those, but I notice that OpenBSD just did.
(this analysis is really cool and I'm eagerly following your work.)
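To make the two pitfalls named above concrete, here is a minimal C sketch of a signed-unsigned comparison and of silent unsigned wrapping. All function names are hypothetical illustrations, not curl or OpenBSD code, and the explicit cast in the first helper only spells out the conversion that C would otherwise apply implicitly.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compares a signed and an unsigned value the way C
 * does implicitly. The cast is written out to make the conversion visible;
 * a plain `a < b` with these types converts `a` the same way. */
bool less_than_as_c_sees_it(int a, unsigned int b)
{
    return (unsigned int)a < b; /* a == -1 becomes UINT_MAX here */
}

/* Same comparison with the sign handled before any conversion. */
bool less_than_safe(int a, unsigned int b)
{
    return a < 0 || (unsigned int)a < b;
}

/* Unsigned multiplication wraps silently, modulo SIZE_MAX + 1. */
size_t alloc_size_wrapping(size_t count, size_t elem_size)
{
    return count * elem_size;
}

/* Checked variant: reports failure instead of wrapping. */
bool alloc_size_checked(size_t count, size_t elem_size, size_t *out)
{
    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return false; /* the product would not fit in size_t */
    *out = count * elem_size;
    return true;
}
```

With these definitions, `less_than_as_c_sees_it(-1, 1u)` is false even though -1 is mathematically less than 1, and `alloc_size_wrapping(SIZE_MAX, 2)` yields a small-looking value instead of an error. A buffer sized from such a result is how this class of mistake tends to become a memory-safety issue.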
If we're implicitly comparing against Rust, should integer wrapping really be considered a C problem? Rust debug builds will check for overflow, but release builds don't, and if we're concerned about problems triggered at runtime by unexpected inputs I'd think release behavior is probably the more relevant of the two...
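For comparison on the C side, a check that behaves the same in every build has to be written out by hand, because signed integer overflow in C is undefined behavior rather than a defined wrap. A minimal sketch, where `add_int_checked` is a hypothetical name and not a curl function:

```c
#include <limits.h>
#include <stdbool.h>

/* Adds two ints and reports overflow instead of triggering it.
 * The range test runs before the addition, so the overflowing
 * operation itself is never executed. */
bool add_int_checked(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        return false; /* a + b would not fit in an int */
    *out = a + b;
    return true;
}
```

Since the test is ordinary code, it does not depend on a debug or release setting. The cost is that every call site has to handle the failure case.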
@zev @cliffle @airspeedswift fair reasoning. I was not aware. But I'm more saying "flaws that a memory-safe language might have caught" without explicitly meaning Rust or specifically release builds. It is of course not an exact science...
would be very interesting to have a curl that was written in a language that does not suffer from these bugs, to see if it would reduce the total bugs by half or if these bugs would just shift to something else - thanks for sharing, very fascinating
@troed @marshray well, "unnecessary" implies that we actually had a choice to use something else. Which might be true for like the last three years or so, but not for the vast majority of the time curl existed. And not for all platforms curl runs on...
It's early morning here - the "unnecessarily" is not a comment on there having been a choice that could have been different, just a comment on languages not having had security as a focus all these years :)
I had a discussion just the other week with someone working in the Swedish defense industry who didn't seem to know that using C/C++ today for new projects wasn't very optimal.
@troed One vulnerability per 1,000 lines of code is probably better than industry average, particularly for complex protocol code like this.
Thank you Daniel for publishing this.
@marshray @bagder Oh absolutely! I didn't mean that as a comment on curl quality - just that half of all serious issues are "unnecessarily" language-caused.