Regression tracking: state of the union early 2024

Jan 9, 2024 · regzbot regressions Linux kernel

[Posted the text on the regressions list as well: https://lore.kernel.org/all/7613e402-894a-4d38-8cef-7263630c1c57@leemhuis.info/; if you want to comment on it, please reply there.]

The long story short

I'm not really happy with my performance regarding my regression tracking efforts during the last year. To counter that, I already shifted my focus somewhat around October. With the new year I will shift it some more: from now on, the top priority will be "make regzbot more useful for kernel subsystem maintainers". My tracking efforts will of course continue, but everything except regressions in the current and the previous mainline cycle might not see much attention from my side. This refocusing also means that I won't work much on resolving some ambiguities around "how regressions are supposed to be handled", which led to tension quite a few times. But all that should be for the best in the long term.

The details

Looking back at 2023

My regression tracking efforts with regzbot still have a circus factor of "one": if I ran off to join a circus tomorrow, it's likely that nobody would continue my work. That needs to change to make regression tracking successful in the long term. I'm very well aware of that; nevertheless, when I look back at last year, I think some of my efforts worked against my goal of establishing regression tracking properly within the Linux kernel development process.

That's why I'm not really happy with my performance last year. That does not mean I'm totally unhappy with it, as my work made a difference. But in the end I sometimes might have set the wrong priorities, most importantly in these cases:

- I got into too many debates with developers when I thought a particular regression was not handled appropriately. Sometimes I was right, occasionally I was wrong (or even stupid, as I'm just human :-/ ) – and most of the time it was in the big gray area in between, where your point of view and your understanding of how Linus wants regressions to be handled decide which of the two it is.
- I should have intervened publicly much earlier when volunteers tried to help with regression tracking but did so in ways that annoyed developers (understandably so).
- I should have spent more time improving regzbot, but I did not, due to a lack of time and some energy-draining tasks. Some of them:
  - Skimming mailing lists and bugzilla for regression reports to track takes a whole lot of time.
  - I spent a lot of time following up on tracked regression reports because, from my understanding of what Linus wants, they were not handled well.
  - I had to spend a lot of time following up on regression reports I or somebody else added to the tracking, as developers often forget Link: or Closes: tags pointing to the report.

I'm not sure I worked on the right features when I found time to work on regzbot. That's because I spent most of that time on code to support tracking regressions submitted on gitlab instances or github; without it, regzbot is unable to track regressions reported for the DRM and SOF subsystems or things ClangBuiltLinux finds. This work will also improve the rough bugzilla support, which is crucial if bugzilla gets used in the way Konstantin envisions it. These changes furthermore renovate a few really ugly parts of the regzbot code (written in its early days, when I was still getting into programming), which is wise to do before implementing some other important features.

All in all it was a lot of work, especially dealing with the APIs of the three bug trackers. Maybe 90 percent of that work is done now; it's committed, but not yet used in production, as it still needs a lot more testing and fine-tuning.

There are a few other things I'm unhappy with, but those were the major ones.

Plans for 2024

In general, regzbot becomes the priority. I'll try to stay on top of tracked regressions and look out for reports that need to be tracked, but for some time I will work less strictly to reduce the time I spend on this.

These are the regzbot features I plan to work on:

- Finish the current work (gitlab/github support with related core improvements; see above), which I fear will take at least the rest of January.
- Afterwards, focus for a while on making regzbot a more useful and easier-to-use tool for kernel developers and subsystem maintainers. This partly relies on some of the internal renovations already in the works (see above) and will consist of many small changes in various areas. Some of them:
  - Make it dead easy to add regressions to the tracking that were only reported indirectly, by way of a patch submission that fixes the problem.
  - Implement a dedicated "#regzbot forwarding" command, as people often fail to use the current syntax correctly (they forget the caret in "#regzbot ^introduced", put it in the wrong place, or do not reply to what's considered the report).
  - Related to the "do not reply to what's considered the report" in the previous point: implement a command like "#regzbot adjustreport https://example.com/foo" to adjust the location of the report.
  - Support adding reports in bulk and updating the status of tracked regressions out-of-thread. This will reduce the amount of mail developers receive and make updating tracking bits easier for me as well.
  - Make it easier to handle duplicates.
  - Webpages and reports UI: create pages where subsystem maintainers can see unresolved regressions in their area.
- Ideally find a subsystem whose maintainers want to use regzbot and work closely with me to make regzbot more useful for them.
- Allow tagging, for example to tag regression reports coming from a certain CI, so that CI projects can rely on regzbot's magic to keep an eye on regressions they reported.
- Handle fixes that are not yet mainlined better in the webui and the reports; e.g. split "Fix incoming" into something like "fix up for review", "fix pending (this cycle)", and "fix pending (next cycle)".
- There are a few other things planned for later, but I might work on them earlier if it turns out they make subsystem maintainers happier:
  - Separate actionable from non-actionable reports in the UI (actionable: a sane report with a bisection result).
  - Mark some regressions as "priority".
  - Export data in a simple format to allow developers to script things like "is anything in here known to cause a regression not yet fixed".
  - Make regzbot send mails or add comments, but only once regzbot works well, and ensure those mails won't bother people.

Regression tracking:

- Spend less time looking out for regression reports and following up on the regressions that regzbot tracks.

To do so, I plan to focus on regressions introduced during the current or the previous mainline release. I'll try to keep an eye on regressions in mainline releases from the past 12 months as well as those in stable/longterm trees, but will try not to spend too much time on that. I'll ignore older regressions as well as those that were not bisected, unless one gives me "ohh, this is not good at all" vibes; in such cases I will likely continue to help reporters improve their reports, but otherwise I won't do that anymore.

Side projects:

- Submit a text on bisecting a Linux kernel regression for inclusion in the kernel's documentation. I started writing it on Christmas Eve while having a slight headache, got into a flow afterwards, and finished the bulk of it in early January. It just needs more polishing, so it would be a shame to let it linger on my hard disk.
- Try to add a text to the kernel's documentation that is endorsed by Linus and briefly describes how he wants regressions to be handled. Basically a shorter version of the "Expectations and best practices for fixing regressions" section already in https://docs.kernel.org/process/handling-regressions.html. What I've written there is based on actions and past e-mails from Linus, combined with putting things in context (in general and with stable in mind). But people don't take it entirely seriously, as it was only ACKed by Greg, not by Linus – which leads to discussions that are annoying for everyone involved (and created a lot of tension between developers and myself).
- Prepare discussions about handling and tracking regressions for both the kernel summit and the maintainers summit this fall.

Closing words

There are a ton of other things I could and maybe should write here, which is why I suspect I've forgotten an important thing or two. If that turns out to be true, I might update this post within the first days after its publication.