in reply to daniel:// stenberg://

people are also often obsessed by C vs non-C vulnerabilities, and in #curl the share of mistakes that are related to the programming language keeps shrinking (just over 40% now)

This is WAY lower than what is commonly reported as the general percentage. (60-70% is commonly repeated)

#curl
in reply to daniel:// stenberg://

What do you think is the reason? Is it better conventions? Maybe more experience with common mistakes?
in reply to Lucas Gonze

@lucasgonze I suspect the most likely reason is rather just different ways of measurement...
in reply to daniel:// stenberg://

If it was a change in methodology, there would be noticeable changes around moments when the measurement method changed. But maybe that's what the data is showing and it's just subtle.
in reply to Lucas Gonze

@lucasgonze in curl, I think we have gradually improved many methods and procedures over time, so there would not be any clear point in time when that suddenly happened.
in reply to daniel:// stenberg://

@lucasgonze Would it have to do with testing methodologies? I see a couple of times at which the number of C mistakes jumps -- 2001, 2014, 2017. Why? Did testing improve and find issues, which would cause an immediate jump and then a reduced slope?
in reply to Christian Huitema

@huitema @lucasgonze I think fuzzing, test improvements and individuals' poking are the driving factors.

Note that this graph shows when the error was *introduced*, not when it was fixed. On average, they were found several thousand days after being added.

in reply to daniel:// stenberg://

@lucasgonze Have you integrated sanitizers in your test suite? I found that a combination of static analyzers and ASAN/UBSAN does clean the code a lot. Especially if you put it in the CI tests.
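
To make the sanitizer suggestion concrete, here is a minimal sketch of the kind of off-by-one write that AddressSanitizer typically reports when a test exercises it. The functions `copy_buggy` and `copy_fixed` are hypothetical illustrations, not code from curl, and the build line in the comment assumes GCC or Clang.

```c
/* Assumed build line for a sanitized test run (GCC or Clang):
 *   cc -g -fsanitize=address,undefined example.c
 */
#include <stddef.h>
#include <string.h>

/* Hypothetical buggy helper: when strlen(src) >= dstlen, the
 * terminating NUL is written one byte past the end of dst. */
void copy_buggy(char *dst, size_t dstlen, const char *src)
{
    size_t n = strlen(src);
    if (n > dstlen)
        n = dstlen;
    memcpy(dst, src, n);
    dst[n] = '\0'; /* out of bounds when n == dstlen */
}

/* Hypothetical fixed helper: refuses input that does not fit,
 * including its terminating NUL. Returns 0 on success, -1 otherwise. */
int copy_fixed(char *dst, size_t dstlen, const char *src)
{
    size_t n = strlen(src);
    if (dstlen == 0 || n >= dstlen)
        return -1;
    memcpy(dst, src, n + 1); /* copies the NUL as well */
    return 0;
}
```

Without a sanitizer, a call like `copy_buggy(buf, sizeof buf, "abcd")` on a 4-byte buffer may appear to work, which is why running the test suite under ASAN/UBSAN in CI tends to surface such mistakes earlier.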
in reply to daniel:// stenberg://

The statement that the share of C language related mistakes is way lower in curl than in general sounds like trying to find a humble-sounding way to say „people committing to curl are better at using C than programmers in general“ 😁
in reply to daniel:// stenberg://

How do you define when a mistake is a C mistake versus a non-C mistake? For instance, a language with managed memory might make certain mistakes impossible, but they are conceptual mistakes about memory, not about how the C language works.
in reply to daniel:// stenberg://

would you count unexpected integer wrapping (that didn’t then lead to unsafe memory access)?
in reply to daniel:// stenberg://

@airspeedswift

I'd suggest including both integer wrapping and also surprise implicit conversions, and particularly signed-unsigned interaction -- I'm not sure if curl's had any issues with those, but I notice that OpenBSD just did.

(this analysis is really cool and I'm eagerly following your work.)
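
For readers unfamiliar with the two pitfalls named here, a small sketch in C: silent unsigned wrapping in a size computation, and a negative `int` implicitly converted when passed where a `size_t` is expected. All function names are hypothetical and chosen only for illustration; none of this is taken from curl or OpenBSD.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Naive size computation: unsigned arithmetic wraps silently
 * modulo SIZE_MAX + 1, so a huge count yields a small total. */
size_t naive_total(size_t count, size_t size)
{
    return count * size;
}

/* Checked variant: reports failure instead of wrapping. */
bool checked_total(size_t count, size_t size, size_t *out)
{
    if (size != 0 && count > SIZE_MAX / size)
        return false;
    *out = count * size;
    return true;
}

/* Stands in for any callee with a size_t length parameter,
 * such as memcpy(). */
size_t length_seen_by_callee(size_t n)
{
    return n;
}

/* The int argument is implicitly converted to size_t at the call,
 * so a negative length arrives as a very large value. */
size_t pass_signed_length(int len)
{
    return length_seen_by_callee(len);
}
```

With these definitions, `naive_total(SIZE_MAX, 2)` wraps around to `SIZE_MAX - 1`, and `pass_signed_length(-1)` hands the callee `SIZE_MAX`. Neither is undefined behavior in C, which is part of why such mistakes can go unnoticed until they feed into a memory access.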

in reply to cliffle

If we're implicitly comparing against Rust, should integer wrapping really be considered a C problem? Rust debug builds will check for overflow, but release builds don't, and if we're concerned about problems triggered at runtime by unexpected inputs I'd think release behavior is probably the more relevant of the two...
in reply to zev

@zev @cliffle @airspeedswift fair reasoning. I was not aware. But I'm more saying "flaws that a memory-safe language might have caught" without explicitly meaning rust or specifically release builds. It is of course not an exact science...
in reply to cliffle

@cliffle @airspeedswift we have not had any such vulnerabilities so I have not been forced to make up my mind about such a mistake...
in reply to daniel:// stenberg://

I did a version with two more plots for only the high/critical issues, which then makes C mistakes the reason for 51% of them.
in reply to daniel:// stenberg://

would be very interesting to have a curl written in a language which does not suffer from these bugs, to see if it would cut the total number of bugs in half or if these bugs would just shift to something else - thanks for sharing, very fascinating
in reply to Sir l33tname

@l33tname that's a hypothetical that we will never see. Such a project would be done by others, at a different time, and would have another feature set.
in reply to Troed Sångberg

@troed One vulnerability per 1,000 lines of code is probably better than industry average, particularly for complex protocol code like this.

Thank you Daniel for publishing this.

in reply to Marsh Ray

@marshray

Oh absolutely! I didn't mean that as a comment on curl quality - just that half of all serious issues being "unnecessarily" language-caused.

@bagder

in reply to Troed Sångberg

@troed @marshray well, "unnecessary" implies that we actually had a choice to use something else. Which might be true for like the last three years or so, but not for the vast majority of the time curl has existed. And not for all platforms curl runs on...
in reply to daniel:// stenberg://

It's early morning here - the "unnecessarily" is not a comment on there having been a choice that could have been different, just a comment on languages not having had security as a focus all these years :)

I had a discussion just the other week with someone working in the Swedish defense industry who didn't seem to know that using C/C++ today for new projects wasn't very optimal.

@marshray

in reply to Alex🇺🇦

@alex @troed @marshray "because C++ is not memory safe" and "because C++ is a huge foot gun" would be my personal favorite reasons
Unknown parent

daniel:// stenberg://
@0x1eef sure, but also: it's not like we can do it any other way...