Oh this is really nice! You've inspired me to generate this for the Linux kernel. The git blames are running now... I parallelized it, but it's still going to take a while! :)
@hikhvar @dascandy I'm curious to see how much of what's left from "start of git history" in Linux is blank lines and comments. :) I'll need a whole new scanner for that. :P
@pierstoval cool, just ask if there's anything I can help you with. You might spot that I have a way to list all tags and get the age of the project at those moments in time. If you have tags like that, you can do it the same way; otherwise you need to figure out a different way to identify snapshot moments.
@dboehmer I wanted to keep the labels simple to reduce the amount of text, as it quickly becomes "heavy" otherwise. But yeah, I'll think of how to improve it.
@dboehmer unfortunately I think that version gets too messy, probably because there are too many numbers and it isn't crystal clear what they mean. I think I'll stick with the ≥ for now.
naught101
in reply to daniel:// stenberg://
iliazeus
in reply to daniel:// stenberg://
I think this one turned out to be the most informative one, or at least it piques my curiosity the most.
I think I'll try following along this graph with curl's version history at hand. For example, I now wonder what kind of refactoring happened around late 2011 - the older code amount drops rather sharply there :)
Urix Turing
in reply to daniel:// stenberg://
daniel:// stenberg://
in reply to Urix Turing
daniel:// stenberg://
in reply to daniel:// stenberg://
this is the version I'll make appear in the curl dashboard, starting tomorrow
curl.se/dashboard.html
curl - Project status dashboard
curl.se
Man2Dev
in reply to daniel:// stenberg://
daniel:// stenberg://
in reply to Man2Dev
Peter Bindels
in reply to daniel:// stenberg://
daniel:// stenberg://
in reply to Peter Bindels
Christoph Petrausch
in reply to daniel:// stenberg://
daniel:// stenberg://
in reply to Christoph Petrausch
@hikhvar @dascandy
extract the data using git blame => github.com/curl/stats/blob/mas…
render the graph from the data the script generated using gnuplot => github.com/curl/stats/blob/mas…
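For anyone who wants to try the git blame step without reading the Perl, here is a minimal Python sketch of the idea. It is not a port of codeage.pl. It only assumes that the blame data comes from `git blame --line-porcelain <file>`, which prints an `author-time <unix epoch>` header for every source line. The function name `bucket_blame_lines` and the two-year `span` default are illustrative choices, not something taken from the curl scripts.

```python
from collections import Counter
from datetime import datetime, timezone


def bucket_blame_lines(porcelain: str, span: int = 2) -> Counter:
    """Count blamed lines per `span`-year bucket, keyed by the bucket's first year.

    `porcelain` is expected to be the text output of
    `git blame --line-porcelain <file>` (an assumption of this sketch).
    """
    counts = Counter()
    for line in porcelain.splitlines():
        # Header lines start at column 0; the blamed source line itself is
        # prefixed with a TAB, so it can never match this check.
        if line.startswith("author-time "):
            epoch = int(line.split()[1])
            year = datetime.fromtimestamp(epoch, tz=timezone.utc).year
            counts[year - year % span] += 1
    return counts
```

Summing these counters over every file at each snapshot (for example at each release tag) would give one column of data per time bucket, which a plotting tool such as gnuplot can then stack.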
stats/codeage.pl at master · curl/stats
GitHub
Kees Cook
in reply to daniel:// stenberg://
Kees Cook
in reply to Kees Cook
daniel:// stenberg://
in reply to Kees Cook
Kees Cook
in reply to daniel:// stenberg://
daniel:// stenberg://
in reply to Kees Cook
Christoph Petrausch
in reply to daniel:// stenberg://
Kees Cook
in reply to Christoph Petrausch
@hikhvar @dascandy Yeah, it's not as steep as with curl, but I'm starting to see it getting deeper with each segment. The 2016-2018 segment seems to eat into prior areas much more than the other year segments.
I'm so impatient! Blame, git, blame! ;)
Christoph Petrausch
in reply to Kees Cook
Kees Cook
in reply to Christoph Petrausch
daniel:// stenberg://
in reply to Kees Cook
Kees Cook
in reply to daniel:// stenberg://
kernel-tools/stats at trunk · kees/kernel-tools
GitHub
Kees Cook
in reply to daniel:// stenberg://
@hikhvar @dascandy Not sure if you want this too; I ended up tweaking the plot's display of lines slightly with this format:
set format y2 "%.0s%c"
So instead of, e.g., 200000, it'll show 200k
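To illustrate what that format string does: in gnuplot, `%s` prints the mantissa scaled to a power of 1000 and `%c` prints the matching SI prefix character, so `%.0s%c` gives a whole-number mantissa followed by the prefix. The Python sketch below mimics that idea for plain positive tick values. The name `si_format` is hypothetical, and the sketch does not try to reproduce gnuplot's exact rounding or its handling of values below 1.

```python
SI_PREFIXES = ["", "k", "M", "G", "T"]


def si_format(value: float) -> str:
    """Format a number as a whole-number mantissa plus SI prefix, e.g. 200000 as 200k."""
    mantissa = float(value)
    index = 0
    # Divide by 1000 until the mantissa drops below 1000 or prefixes run out.
    while abs(mantissa) >= 1000 and index < len(SI_PREFIXES) - 1:
        mantissa /= 1000
        index += 1
    return f"{mantissa:.0f}{SI_PREFIXES[index]}"
```

With this helper, `si_format(200000)` returns `"200k"`, matching the example above.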
daniel:// stenberg://
in reply to Kees Cook
Alex Rock
in reply to daniel:// stenberg://
How did you gather the data to generate this graph?
This would be very helpful for some repositories 🎉
daniel:// stenberg://
in reply to Alex Rock
@pierstoval git blame is our friend. This is my (fairly small) perl script that extracts all the data:
github.com/curl/stats/blob/mas…
stats/codeage.pl at master · curl/stats
GitHub
Alex Rock
in reply to daniel:// stenberg://
daniel:// stenberg://
in reply to Alex Rock
Alex Rock
in reply to daniel:// stenberg://
daniel:// stenberg://
in reply to Alex Rock
daniel:// stenberg://
in reply to daniel:// stenberg://
daniel:// stenberg://
in reply to daniel:// stenberg://
Alex Rock
in reply to daniel:// stenberg://
Thanks, did that, also removed the "print cache" statement.
I'll make a fork in order to simplify reviewing it 👍
daniel:// stenberg://
in reply to Alex Rock
Alex Rock
in reply to daniel:// stenberg://
Yep, it's 20 years old and has like thirty thousand commits, might take a while indeed :)
Here's the current diff: github.com/Pierstoval/stats/pu…
It's not gathering data yet, I'm on it :)
Custom usage by Pierstoval · Pull Request #1 · Pierstoval/stats
GitHub
Manvir Clair
in reply to daniel:// stenberg://
Jean Luc POI I7FI 🕯️
in reply to daniel:// stenberg://
Daniel Böhmer
in reply to daniel:// stenberg://
Really the best visualization of this dataset so far!
I find it confusing that only even years like 2000, 2002, etc. are listed. Did you skip every 2nd year? If data for each two years is accumulated please write "2000-2001" in the key.
daniel:// stenberg://
in reply to Daniel Böhmer
Daniel Böhmer
in reply to daniel:// stenberg://
daniel:// stenberg://
in reply to Daniel Böhmer
Daniel Böhmer
in reply to daniel:// stenberg://
May I make two (edit: three) suggestions:
a) write "2000 f." for 2000–2001, as is common when giving page numbers in citations.
(I just learned that "f." is for giving someone’s birthdate in Swedish 😁 )
en.wiktionary.org/wiki/f.#Adje…
b) Use "≤" or "≥" mathematical operators. As the key is most probably read from the top to the bottom maybe give the lower number year instead like
- ≥ 2023
- ≥ 2021
- ≥ 2019
- …
- < 2000
c) short form 2000/01 to 2023/24
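To compare options b) and c) side by side, the labels can be generated rather than typed by hand. This is a small Python sketch under the assumption that each color covers a two-year bucket identified by its first year. The function name `bucket_labels` and the `style` values are made up for illustration and are not part of the curl plotting scripts.

```python
def bucket_labels(first_year: int, last_year: int, style: str = "ge") -> list[str]:
    """Build legend labels for two-year buckets, newest first.

    style="ge"    gives "≥ 2023", "≥ 2021", ... plus a final "< first_year"
    style="slash" gives "2023/24", "2021/22", ...
    """
    labels = []
    for year in range(first_year, last_year + 1, 2):
        if style == "ge":
            labels.append(f"≥ {year}")
        elif style == "slash":
            labels.append(f"{year}/{(year + 1) % 100:02d}")
        else:
            raise ValueError(f"unknown style: {style}")
    labels.reverse()  # the key is read top to bottom, newest first
    if style == "ge":
        labels.append(f"< {first_year}")
    return labels
```

For example, `bucket_labels(2019, 2023)` yields the four entries `≥ 2023`, `≥ 2021`, `≥ 2019` and `< 2019`, while `bucket_labels(2021, 2023, "slash")` yields `2023/24` and `2021/22`.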
f. - Wiktionary, the free dictionary
Wiktionary
daniel:// stenberg://
in reply to Daniel Böhmer
Daniel Böhmer
in reply to daniel:// stenberg://
You’re so quick! I find this better than take 4, for sure.
If you want to minimize text space I’d consider this the optimal solution.
But to be honest I think it’s a bit too technical, even for software people. It takes a moment to understand that this means each color represents two years …
More than ½ h after posting my suggestions I tend to think option C (that I added to the post) might be the most common notation: just "2023/24". Don’t you think? At least Germans use that a lot.
daniel:// stenberg://
in reply to Daniel Böhmer
sirjofri
in reply to daniel:// stenberg://
@dboehmer for me, reading the graph part makes everything very clear. Like, the year number is just a point in time, at the transition between two years (e. g. black covers 2010-2012).
It would also be possible to work with dashes, like saying "up to 2002", though that needs a different numbering then:
- 2000
- 2002
- 2004
...