
in reply to daniel:// stenberg://

your best year is (number of that year's lines of code remaining X years later) / (number of lines written in that year)
in reply to daniel:// stenberg://

I think this one turned out to be the most informative one, or at least it piques my curiosity the most.

I think I'll try following along this graph with curl's version history at hand. For example, I now wonder what kind of refactoring happened around late 2011 - the older code amount drops rather sharply there :)

in reply to Christoph Petrausch

@hikhvar @dascandy

extract the data using git blame => github.com/curl/stats/blob/mas…

render the graph from the data the script generated using gnuplot => github.com/curl/stats/blob/mas…

in reply to daniel:// stenberg://

Oh this is really nice! You've inspired me to generate this for the Linux kernel. The git blames are running now... I parallelized it, but it's still going to take a while! :)
in reply to Kees Cook

@hikhvar @dascandy LOL. 102 Linux kernel tags, averaging 3 minutes per blame run (so far). 5 hours to generate the data. O_O
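Kees didn't post how he parallelized it, but for anyone curious, one way to fan the per-tag blame runs out across cores is `xargs -P` (purely an illustration: the file name is a placeholder, `nproc` assumes Linux, and tag names containing `/` would need escaping):

```shell
# One way to parallelize per-tag blame runs (an illustration, not Kees's
# actual setup): xargs -P runs one git blame per tag concurrently, and
# each result lands in its own file so an interrupted run can resume.
blame_all_tags() {
  git tag | xargs -P "$(nproc)" -I {} sh -c \
    'out="blame-{}.txt"; [ -s "$out" ] || git blame --line-porcelain {} -- "$0" > "$out"' "$1"
}
```

For example, `blame_all_tags Makefile` in a kernel checkout blames that file at every tag.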
in reply to daniel:// stenberg://

@kees @dascandy Very nice visualization! I see that the Linux kernel has less churn than curl, so once code is committed it is very likely to stay.
in reply to Christoph Petrausch

@hikhvar @dascandy Yeah, it's not as steep as with curl, but I'm starting to see it getting deeper with each segment. The 2016-2018 segment seems to eat into prior areas much more than the other year segments.

I'm so impatient! Blame, git, blame! ;)

in reply to Kees Cook

@kees @dascandy That is true. But also the kernel code is placed on a very solid bedrock from the pre-2006 era.
in reply to Christoph Petrausch

@hikhvar @dascandy I'm curious to see how much of what's left from "start of git history" in Linux is blank lines and comments. :) I'll need a whole new scanner for that. :P
in reply to Kees Cook

yeah, that's basically what I found in the curl code left from < 2000. Mostly comments and a few #ifdef/#defines.
in reply to daniel:// stenberg://

@hikhvar @dascandy Not sure if you want this too; I ended up tweaking the plot's display of lines slightly with this format:

set format y2 "%.0s%c"

So instead of, e.g., 200000, it'll show 200k

in reply to daniel:// stenberg://

How did you gather the data to generate this graph?

This would be very helpful for some repositories 🎉

in reply to Alex Rock

@pierstoval git blame is our friend. This is my (fairly small) perl script that extracts all the data:

github.com/curl/stats/blob/mas…

in reply to daniel:// stenberg://

That's very nice, I'm gonna try it out on the project I'm working on (which is probably about the same age as Curl)
in reply to Alex Rock

@pierstoval cool, just ask if there's anything I can help you with. You might spot that I have a way to list all tags, and that I get the age of the project at those moments in time. If you have tags like that, you can do it the same way; otherwise you need to figure out a different way to identify snapshot moments.
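If a repository doesn't have tags you can date by name, one generic way to list candidate snapshot moments (not what the linked script does, just a fallback) is to sort the tags by creation date:

```shell
# Generic way to list tags oldest-first with their dates, to pick
# snapshot moments (the linked script instead works from curl's
# own tag naming).
list_tag_dates() {
  git for-each-ref refs/tags \
    --sort=creatordate \
    --format='%(creatordate:short) %(refname:short)'
}
```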
in reply to daniel:// stenberg://

I already tweaked the perl code fetching the tags, but I'm not getting any data yet, I'm trying to figure out the code :)
in reply to Alex Rock

@pierstoval note that the git blame command uses a specific path that you want to update/remove
in reply to daniel:// stenberg://

@pierstoval oh, and there's a version check in the loop at the end that you of course need to cut out
in reply to daniel:// stenberg://

Thanks, did that, also removed the "print cache" statement.

I'll make a fork in order to simplify reviewing it 👍

in reply to Alex Rock

@pierstoval once you've gathered a set of data, you want to make the cache work again, as a full run may take hours, depending on your repo
in reply to daniel:// stenberg://

Yep, it's 20 years old and has like thirty thousand commits, might take a while indeed :)

Here's the current diff: github.com/Pierstoval/stats/pu…

It's not gathering data yet, I'm on it :)

in reply to daniel:// stenberg://

is that how bedrocks are made? looks like it! that would make this geological time.
in reply to daniel:// stenberg://

Watch out for diamonds or other gemstones in the older layers near the bottom, from the digital mesoproterozoic age.
in reply to daniel:// stenberg://

Really the best visualization of this dataset so far!

I find it confusing that only even years like 2000, 2002, etc. are listed. Did you skip every second year? If the data is accumulated per two years, please write "2000-2001" in the key.

in reply to Daniel Böhmer

@dboehmer as said at the top, they are two-year segments. It's just a limit I decided on to keep the number of fields reasonable.
in reply to daniel:// stenberg://

Oh, I didn’t see/read this bit 🙈 Maybe that’s an indicator that this might be too subtle …
in reply to Daniel Böhmer

@dboehmer I wanted to keep the labels simple to reduce the amount of text, as it quickly becomes "heavy" otherwise. But yeah, I'll think of how to improve it.
in reply to daniel:// stenberg://

May I make two (edit: three) suggestions:

a) write "2000 f." for 2000–2001, as is common for giving page numbers in citations.
(I just learned that "f." is for giving someone’s birthdate in Swedish 😁 )
en.wiktionary.org/wiki/f.#Adje…

b) Use the "≤" or "≥" mathematical operators. As the key is most probably read from top to bottom, maybe give the lower-numbered year instead, like
- ≥ 2023
- ≥ 2021
- ≥ 2019
- …
- < 2000

c) short form 2000/01 to 2023/24

in reply to daniel:// stenberg://

You’re so quick! I find this better than take 4, for sure.

If you want to minimize text space I’d consider this the optimal solution.

But to be honest I think it's a bit too technical, even for software people. It takes a moment to understand that each color represents two years …

More than ½ h after posting my suggestions I tend to think option C (that I added to the post) might be the most common notation: just "2023/24". Don’t you think? At least Germans use that a lot.

in reply to Daniel Böhmer

@dboehmer unfortunately I think that version gets too messy, probably because of too many numbers, without it being crystal clear what they mean. I think I'll stick with the ≥ for now.
in reply to daniel:// stenberg://

@dboehmer for me, reading the graph part makes everything very clear. Like, the year number is just a point in time, at the transition between two years (e.g. black covers 2010-2012).

It would also be possible to work with dashes, like saying "up to 2002", though that needs a different numbering then:

- 2000
- 2002
- 2004
...