in reply to daniel:// stenberg://

Your best year is the one with the highest ratio: (number of that year's lines of code remaining X years later) / (number of lines written in that year)
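
A minimal sketch of that ratio in Python, in case it helps make the idea concrete. The function names and the numbers are made up for illustration; nothing here comes from the actual curl data.

```python
def survival_ratio(lines_written: int, lines_remaining: int) -> float:
    """Fraction of a year's lines that are still present at a later snapshot."""
    if lines_written <= 0:
        raise ValueError("lines_written must be positive")
    return lines_remaining / lines_written


def best_year(written: dict[int, int], remaining: dict[int, int]) -> int:
    """Return the year whose code has the highest survival ratio."""
    return max(
        written,
        key=lambda year: survival_ratio(written[year], remaining.get(year, 0)),
    )


# Hypothetical numbers: lines written per year, and how many of those
# lines are still around X years later.
written = {2010: 1000, 2011: 4000, 2012: 2000}
remaining = {2010: 400, 2011: 1000, 2012: 1500}

print(best_year(written, remaining))
```

With these invented numbers, 2012 wins because 1500 of its 2000 lines survive, even though 2011 had far more lines written.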
in reply to daniel:// stenberg://

I think this one turned out to be the most informative one, or at least it piques my curiosity the most.

I think I'll try following along this graph with curl's version history at hand. For example, I now wonder what kind of refactoring happened around late 2011, since the amount of older code drops rather sharply there :)

in reply to daniel:// stenberg://

How did you gather the data to generate this graph?

This would be very helpful for some repositories 🎉

in reply to Alex Rock

@pierstoval git blame is our friend. This is my (fairly small) perl script that extracts all the data:

github.com/curl/stats/blob/mas…
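
For readers who don't want to dig into the perl, here is a rough Python sketch of the general git blame approach. It is not a port of the script: it assumes `git blame --line-porcelain` output (which repeats an `author-time` header for every source line), and the two-year buckets starting at 2000 and all function names are my own assumptions.

```python
import subprocess
from collections import Counter
from datetime import datetime, timezone


def bucket_start(year: int, first_year: int = 2000, width: int = 2) -> int:
    """Map a year to the first year of its bucket (assumed two-year segments)."""
    return year - ((year - first_year) % width)


def count_lines_by_bucket(porcelain: str) -> Counter:
    """Count blamed lines per bucket from `git blame --line-porcelain` output."""
    counts = Counter()
    for line in porcelain.splitlines():
        # Source lines are prefixed with a tab, so they never match this header.
        if line.startswith("author-time "):
            stamp = int(line.split()[1])
            year = datetime.fromtimestamp(stamp, tz=timezone.utc).year
            counts[bucket_start(year)] += 1
    return counts


def blame_file(path: str, rev: str = "HEAD") -> Counter:
    """Run git blame on one file at one revision and bucket its lines."""
    result = subprocess.run(
        ["git", "blame", "--line-porcelain", rev, "--", path],
        check=True, capture_output=True, text=True, errors="replace",
    )
    return count_lines_by_bucket(result.stdout)


# Tiny hand-written sample instead of a real repository.
sample = (
    "author-time 946684800\n\tint main(void)\n"   # 2000-01-01 UTC
    "author-time 978307200\n\treturn 0;\n"        # 2001-01-01 UTC
    "author-time 1325376000\n\t}\n"               # 2012-01-01 UTC
)
sample_counts = count_lines_by_bucket(sample)
```

Summing `blame_file()` over every tracked source file at a given snapshot gives one column of the graph.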

in reply to daniel:// stenberg://

That's very nice, I'm gonna try it out on the project I'm working on (which is probably about the same age as curl)
in reply to Alex Rock

@pierstoval cool, just ask if there's anything I can help you with. You might spot that I have a way to list all tags, and I get the age of the project at those moments in time. If you have tags like that, you can do it the same way; otherwise you need to figure out a different way to identify snapshot moments.
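
One possible way to get tag-based snapshot moments, sketched in Python rather than perl. It assumes `git for-each-ref` with the `%(creatordate:unix)` format field; the function names and the idea of measuring age from a fixed project start timestamp are illustrative assumptions, not what the script necessarily does.

```python
import subprocess


def parse_tags(output: str) -> list[tuple[str, int]]:
    """Parse 'name unix-timestamp' lines into (tag, timestamp), oldest first."""
    tags = []
    for line in output.splitlines():
        name, _, stamp = line.rpartition(" ")
        if name and stamp.isdigit():
            tags.append((name, int(stamp)))
    return sorted(tags, key=lambda tag: tag[1])


def list_tags() -> list[tuple[str, int]]:
    """Ask git for every tag and its creation date."""
    result = subprocess.run(
        ["git", "for-each-ref",
         "--format=%(refname:short) %(creatordate:unix)", "refs/tags"],
        check=True, capture_output=True, text=True,
    )
    return parse_tags(result.stdout)


def ages_in_days(tags: list[tuple[str, int]], project_start: int) -> list[tuple[str, float]]:
    """Project age in days at each tag, relative to a chosen start timestamp."""
    return [(name, (stamp - project_start) / 86400) for name, stamp in tags]


# Hypothetical tag names and timestamps, just to show the shape of the data.
demo = parse_tags("v2.0 172800\nv1.0 86400\n")
demo_ages = ages_in_days(demo, project_start=0)
```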
in reply to daniel:// stenberg://

I already tweaked the perl code fetching the tags, but I'm not getting any data yet, I'm trying to figure out the code :)
in reply to Alex Rock

@pierstoval note that the git blame command uses a specific path that you want to update/remove
in reply to daniel:// stenberg://

@pierstoval oh, and there's a version check in the loop at the end that you of course need to cut out
in reply to daniel:// stenberg://

Thanks, did that, also removed the "print cache" statement.

I'll make a fork in order to simplify reviewing it 👍

in reply to Alex Rock

@pierstoval once you've gathered a set of data, you'll want to make the cache work again, as a full run may take hours, depending on your repo
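
To illustrate why the cache matters, here is a minimal sketch of per-tag caching in Python. The JSON file layout and the names `cached_counts` and `fake_compute` are assumptions for the example; the script's own cache format may well differ.

```python
import json
import tempfile
from pathlib import Path


def load_cache(cache_path: Path) -> dict:
    """Read the cache file, or start empty if it does not exist yet."""
    if cache_path.exists():
        return json.loads(cache_path.read_text())
    return {}


def cached_counts(tag: str, compute, cache_path: Path) -> dict:
    """Return the counts for a tag, running the slow computation only once."""
    cache = load_cache(cache_path)
    if tag not in cache:
        cache[tag] = compute(tag)  # the expensive git blame pass
        cache_path.write_text(json.dumps(cache, indent=2))
    return cache[tag]


# Demo with a stand-in for the slow computation.
calls = []


def fake_compute(tag: str) -> dict:
    calls.append(tag)
    return {"2000": 10}  # string keys, since JSON object keys are strings


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "cache.json"
    first = cached_counts("v1.0", fake_compute, cache_file)
    second = cached_counts("v1.0", fake_compute, cache_file)
```

The second call is served from the file, so old tags never need to be blamed again and only new tags cost time.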
in reply to daniel:// stenberg://

Yep, it's 20 years old and has like thirty thousand commits, might take a while indeed :)

Here's the current diff: github.com/Pierstoval/stats/pu…

It's not gathering data yet, I'm on it :)

in reply to daniel:// stenberg://

Is that how bedrock is made? Looks like it! That would make this geological time.
in reply to daniel:// stenberg://

Watch out for diamonds or other gemstones in the older layers near the bottom, from the digital mesoproterozoic age.
in reply to daniel:// stenberg://

Really the best visualization of this dataset so far!

I find it confusing that only even years like 2000, 2002, etc. are listed. Did you skip every 2nd year? If data for each two years is accumulated please write "2000-2001" in the key.

in reply to Daniel Böhmer

@dboehmer as stated at the top, they are two-year segments. It's just a limit I decided on to keep the number of fields reasonable.
in reply to daniel:// stenberg://

Oh, I didn’t see/read this bit 🙈 Maybe that’s an indicator that this might be too subtle …
in reply to Daniel Böhmer

@dboehmer I wanted to keep the labels simple to reduce the amount of text, as it quickly becomes "heavy" otherwise. But yeah, I'll think of how to improve it.
in reply to daniel:// stenberg://

May I make two (edit: three) suggestions:

a) Write "2000 f." for 2000–2001, as is common for giving page numbers in citations.
(I just learned that "f." is for giving someone’s birthdate in Swedish 😁 )
en.wiktionary.org/wiki/f.#Adje…

b) Use the "≤" or "≥" mathematical operators. As the key is most probably read from top to bottom, maybe give the lower year instead, like:
- ≥ 2023
- ≥ 2021
- ≥ 2019
- …
- < 2000

c) Use the short form, 2000/01 to 2023/24
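
For comparison, options b) and c) could be generated like this. It is only a sketch in Python with invented function names, assuming two-year segments that start on the given first year.

```python
def legend_labels(first_start: int, last_start: int, width: int = 2) -> list[str]:
    """Option b): '≥ year' labels from newest to oldest, then a catch-all."""
    labels = [f"≥ {year}" for year in range(last_start, first_start - 1, -width)]
    labels.append(f"< {first_start}")
    return labels


def short_label(start: int) -> str:
    """Option c): a two-year segment written as e.g. '2000/01'."""
    return f"{start}/{(start + 1) % 100:02d}"


print(legend_labels(2000, 2004))
print(short_label(2023))
```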

in reply to daniel:// stenberg://

You’re so quick! I find this better than take 4, for sure.

If you want to minimize text space I’d consider this the optimal solution.

But to be honest, I think it's a bit too technical, even for software people. It takes a moment to understand that each color represents two years …

More than ½ h after posting my suggestions I tend to think option C (that I added to the post) might be the most common notation: just "2023/24". Don’t you think? At least Germans use that a lot.

in reply to Daniel Böhmer

@dboehmer unfortunately I think that version gets too messy, probably because there are too many numbers, and it isn't crystal clear what they mean. I think I'll stick with the ≥ for now.