
in reply to daniel:// stenberg://

your best year is (number of that year's lines of code remaining X years later) / (number of lines written in that year)
in reply to daniel:// stenberg://

I think this one turned out to be the most informative one, or at least it piques my curiosity the most.

I think I'll try following along this graph with curl's version history at hand. For example, I now wonder what kind of refactoring happened around late 2011 - the older code amount drops rather sharply there :)

in reply to Christoph Petrausch

@hikhvar @dascandy

extract the data using git blame => github.com/curl/stats/blob/mas…

render the graph from the data the script generated using gnuplot => github.com/curl/stats/blob/mas…

in reply to daniel:// stenberg://

Oh this is really nice! You've inspired me to generate this for the Linux kernel. The git blames are running now... I parallelized it, but it's still going to take a while! :)
in reply to Kees Cook

@hikhvar @dascandy LOL. 102 Linux kernel tags, averaging 3 minutes per blame run (so far). 5 hours to generate the data. O_O
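Kees didn't post how he parallelized it, but for anyone curious, one way to fan the per-tag blame runs out across cores is `xargs -P` (purely an illustration: the file name is a placeholder, `nproc` assumes Linux, and tag names containing `/` would need escaping):

```shell
# One way to parallelize per-tag blame runs (an illustration, not Kees's
# actual setup): xargs -P runs one git blame per tag concurrently, and
# each result lands in its own file so an interrupted run can resume.
blame_all_tags() {
  git tag | xargs -P "$(nproc)" -I {} sh -c \
    'out="blame-{}.txt"; [ -s "$out" ] || git blame --line-porcelain {} -- "$0" > "$out"' "$1"
}
```

For example, `blame_all_tags Makefile` in a kernel checkout blames that file at every tag.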
in reply to daniel:// stenberg://

@kees @dascandy Very nice visualization! I see that the Linux kernel has less churn than curl, so once code is committed it is very likely to stay.
in reply to Christoph Petrausch

@hikhvar @dascandy Yeah, it's not as steep as with curl, but I'm starting to see it getting deeper with each segment. The 2016-2018 segment seems to eat into prior areas much more than the other year segments.

I'm so impatient! Blame, git, blame! ;)

in reply to Kees Cook

@kees @dascandy That is true. But also the kernel code is placed on a very solid bedrock from the pre-2006 era.
in reply to Christoph Petrausch

@hikhvar @dascandy I'm curious to see how much of what's left from "start of git history" in Linux is blank lines and comments. :) I'll need a whole new scanner for that. :P
in reply to Kees Cook

yeah, that's basically what I found in the curl code left from < 2000. Mostly comments and a few #ifdef/#defines.
in reply to daniel:// stenberg://

@hikhvar @dascandy Not sure if you want this too; I ended up tweaking the plot's display of lines slightly with this format:

set format y2 "%.0s%c"

So instead of, e.g., 200000, it'll show 200k

in reply to daniel:// stenberg://

How did you gather the data to generate this graph?

This would be very helpful for some repositories 🎉

in reply to Alex Rock

@pierstoval git blame is our friend. This is my (fairly small) perl script that extracts all the data:

github.com/curl/stats/blob/mas…

in reply to daniel:// stenberg://

That's very nice, I'm gonna try it out on the project I'm working on (which is probably about the same age as Curl)
in reply to Alex Rock

@pierstoval cool, just ask if there's anything I can help you with. You might spot that I have a way to list all tags, and that I get the age of the project at those moments in time. If you have tags like that, you can do it the same way; otherwise you need to figure out a different way to identify snapshot moments.
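If a repository doesn't have tags you can date by name, one generic way to list candidate snapshot moments (not what the linked script does, just a fallback) is to sort the tags by creation date:

```shell
# Generic way to list tags oldest-first with their dates, to pick
# snapshot moments (the linked script instead works from curl's
# own tag naming).
list_tag_dates() {
  git for-each-ref refs/tags \
    --sort=creatordate \
    --format='%(creatordate:short) %(refname:short)'
}
```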
in reply to daniel:// stenberg://

I already tweaked the perl code fetching the tags, but I'm not getting any data yet, I'm trying to figure out the code :)
in reply to Alex Rock

@pierstoval note that the git blame command uses a specific path that you want to update/remove
in reply to daniel:// stenberg://

@pierstoval oh, and there's a version check in the loop at the end that you of course need to cut out
in reply to daniel:// stenberg://

Thanks, did that, also removed the "print cache" statement.

I'll make a fork in order to simplify reviewing it 👍

in reply to Alex Rock

@pierstoval once you've gathered a set of data, you want to make the cache work again, as a full run may take hours, depending on your repo
in reply to daniel:// stenberg://

Yep, it's 20 years old and has like thirty thousand commits, might take a while indeed :)

Here's the current diff: github.com/Pierstoval/stats/pu…

It's not gathering data yet, I'm on it :)

in reply to daniel:// stenberg://

is that how bedrocks are made? looks like it! that would make this geological time.
in reply to daniel:// stenberg://

Watch out for diamonds or other gemstones in the older layers near the bottom, from the digital mesoproterozoic age.
in reply to daniel:// stenberg://

Really the best visualization of this dataset so far!

I find it confusing that only even years like 2000, 2002, etc. are listed. Did you skip every second year? If the data is accumulated per two years, please write "2000-2001" in the key.

in reply to Daniel Böhmer

@dboehmer as said at the top, they are two-year segments. It's just a limit I decided on to keep the number of fields reasonable.
in reply to daniel:// stenberg://

Oh, I didn’t see/read this bit 🙈 Maybe that’s an indicator that this might be too subtle …
in reply to Daniel Böhmer

@dboehmer I wanted to keep the labels simple to reduce the amount of text, as it quickly becomes "heavy" otherwise. But yeah, I'll think of how to improve it.
in reply to daniel:// stenberg://

May I make two (edit: three) suggestions:

a) write "2000 f." for 2000–2001, as is common for giving page numbers in citations.
(I just learned that "f." is for giving someone’s birthdate in Swedish 😁 )
en.wiktionary.org/wiki/f.#Adje…

b) Use the "≤" or "≥" mathematical operators. As the key is most probably read from top to bottom, maybe give the lower-numbered year instead, like
- ≥ 2023
- ≥ 2021
- ≥ 2019
- …
- < 2000

c) short form 2000/01 to 2023/24

in reply to daniel:// stenberg://

You’re so quick! I find this better than take 4, for sure.

If you want to minimize text space I’d consider this the optimal solution.

But to be honest I think it's a bit too technical, even for software people. It takes a moment to understand that each color represents two years …

More than ½ h after posting my suggestions I tend to think option C (that I added to the post) might be the most common notation: just "2023/24". Don’t you think? At least Germans use that a lot.

in reply to Daniel Böhmer

@dboehmer unfortunately I think that version gets too messy, probably because of too many numbers, without it being crystal clear what they mean. I think I'll stick with the ≥ for now.
in reply to daniel:// stenberg://

@dboehmer for me, reading the graph part makes everything very clear. Like, the year number is just a point in time, at the transition between two years (e.g. black covers 2010-2012).

It would also be possible to work with dashes, like saying "up to 2002", though that needs a different numbering then:

- 2000
- 2002
- 2004
...