status update: polishing, problems, further work

A (late) "Happy new year". Time for a status update, which is overdue anyway.

The long story short

  • regzbot in general is working as intended and regularly sending weekly reports now

  • the code got a lot of polishing, but more is needed (as always)

  • four additional features would make regzbot a lot more useful, but they have to wait a bit, as development slows down for a while

  • sadly, fewer subsystems and developers than hoped for use regzbot directly (at least for now); I also have to re-educate more developers than expected about placing "Link:" tags, despite them being mandated for years now

  • I submitted a text on regressions and regzbot for the Linux kernel's documentation to the lists and Greg already ACKed it, yeah!

The details

Working well

Regzbot does what it's intended to. It's definitely still far from perfect, but it already makes tracking regressions a lot easier, as it automatically detects a lot of things which I had to track down manually in the past. Quite a few regressions have thus been tracked by regzbot already or are currently tracked, some of them with little or no intervention from me, the Linux kernel's regression tracker.

The weekly reports mentioned in the last status update are now regularly sent on Sunday evening, i.e., a few hours before Linus Torvalds usually releases a new (pre-)release. He provided feedback on one of the first reports and I changed things as requested. He also mentioned regzbot a few other times on public Linux kernel mailing lists and reacted to an "extra report" shortly before 5.16 reached the finish line.

Thanks to regzbot's tracking efforts I was aware of a regression fix that likely would have missed Linux 5.16 if I hadn't brought the issue to Linus' attention. He seemed to be happy about it and even honored the effort with the unusual "Tracked-by:" tag (only used for the second time) in a commit. That's not important at all, but nevertheless a nice and motivating gesture. :-D

Polishing

The regzbot code got a lot of polishing since the previous report at the end of October. For example, it now marks activities that contain patches and shows them prominently in the web-ui. I've slightly adjusted the format of the weekly reports multiple times to make them more useful. The web-ui now also has an "events" page that chronologically lists all the events regzbot noticed, which makes it easier to monitor the progress. And I squashed a lot of bugs, too.

Those are just the improvements that immediately sprang to mind; there were many more. This can be seen in the commit-log, which shows a lot of changes in October, November and the first half of December. In mid-December I started to work primarily on a text about regressions and regzbot for the Linux kernel's documentation (see below) and slowed down development.

More features needed

As always, there is more that can and should be done. I'm aware of a few bugs I didn't yet find time to address. And there are four features I'd like to see, but didn't get around to working on, as I have other work on my hands that has to be done first.

The features are:

  • basic bugzilla.kernel.org integration, to at least detect any activity in linked tickets and show an accurate value for the "latest activity" in regzbot's web-ui. It would also be good if regzbot learned a from: command to handle forwarded tickets better (and ideally were able to gather that information from the linked ticket).

  • a new page in the web-ui where subsystem maintainers can see the open regressions for their subsystem, which can be detected from the lists that are CCed on mails

  • a way to mark reports that need more investigation as "dubious/on backburner" or something, so they can be separated in the UI

  • a page in the web-ui that shows regressions or events that need to be checked by the regression tracker or someone else -- for example if a commit contains a "Fixes:" tag for a commit known to cause a regression, as regzbot can't know if it's a fix for the regression or for some other issue caused by the change
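
To illustrate the last item: the check itself could be quite small. The following is only a rough sketch and not regzbot's actual code. It assumes the kernel's usual "Fixes:" commit trailer, and both the function name commits_needing_review and the known_culprits set (full commit ids known to have caused a tracked regression) are made up for this example.

```python
import re

# Matches trailer lines such as: Fixes: 1a2b3c4d5e6f ("subsys: some change")
FIXES_RE = re.compile(r"^Fixes:\s*([0-9a-f]{8,40})\b", re.IGNORECASE | re.MULTILINE)


def commits_needing_review(commit_message, known_culprits):
    """Return the known culprits referenced by Fixes: tags in commit_message.

    known_culprits is a hypothetical set of full commit ids that are known
    to have caused a tracked regression. A hit only means "a human should
    look at this": the commit might fix the tracked regression or some
    other problem introduced by the same change.
    """
    flagged = set()
    for abbrev in FIXES_RE.findall(commit_message):
        abbrev = abbrev.lower()
        for culprit in known_culprits:
            # Fixes: tags usually carry an abbreviated id, so compare prefixes.
            if culprit.lower().startswith(abbrev):
                flagged.add(culprit)
    return sorted(flagged)
```

Anything this returns would merely land on the "needs checking" page; deciding whether the regression is really resolved stays with the regression tracker or the developers involved.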

As always, there are quite a lot of other features that would be nice to have in regzbot -- for example it might be nice if regzbot showed when commits causing regressions were backported to stable or longterm kernels, as then those are likely affected as well. But the day only has 24 hours and there are IMHO more important areas wrt regression tracking that need attention. That alone is reason enough why I'll try to avoid going down that rabbit hole too far, at least in the foreseeable future.

There is another reason why I will try to avoid that route: I worked on regzbot nearly full-time in the past few weeks -- but that was not the plan, as I occasionally need to write a few articles (reminder: I worked for a German computer magazine for more than 20 years) to generate enough income. And that's what I'll do in the next few weeks. But guess what: I plan to write at least one German article about kernel bug reporting in general, which will mention regressions and regzbot. Afterwards I hope to write at least one English text about regression tracking in general, which obviously will mention regzbot, too.

Besides that I submitted two talks about these topics for conferences in March: one for the biggest Linux conference in Germany, one for an international conference dedicated to the Linux kernel. The latter sadly was rescheduled due to COVID-19. :-/

Interaction with developers

I had hoped a few subsystems might start using regzbot regularly, but that didn't work out. Some maintainers at least use it occasionally, though. Niklas Schnelle (S390 PCI) was one of the first to give the bot a try; recently Jakub Kicinski, one of the two maintainers of the networking subsystem (one of the biggest), used it. Other maintainers at least started to bounce reports my way, so that I add them to regzbot.

A few Linux kernel developers and reporters used regzbot themselves without any help from me, which is really nice to see. I hope more will do that in the future. With some more polish and some documentation (see below) that will hopefully happen over time.

This is an uphill battle, but there is another one that might be even harder: for ages now, developers have been supposed to link to regression reports using the "Link:" tags regzbot heavily relies on. Nevertheless, quite a few developers use other tags or don't add these pointers at all. A lot of reminders will need to be sent to educate people. A small change to the kernel's documentation merged yesterday for Linux 5.17 will help a little to explain things.
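
To show why these tags matter so much for a bot, here is a minimal sketch of how a tool can connect a fix to a report once the tag is there. It is not taken from regzbot; the helper names extract_links and links_to_report are invented for this example and the URLs are placeholders.

```python
import re

# A "Link:" trailer sits on its own line in the commit message.
LINK_RE = re.compile(r"^Link:\s*(\S+)", re.MULTILINE)


def extract_links(commit_message):
    """Return all URLs found in Link: tags, in order of appearance."""
    return LINK_RE.findall(commit_message)


def links_to_report(commit_message, report_url):
    """True if one of the commit's Link: tags points at report_url.

    Trailing slashes are ignored so that both spellings of a URL match.
    """
    wanted = report_url.rstrip("/")
    return any(url.rstrip("/") == wanted for url in extract_links(commit_message))


example = (
    "foo: avoid crash on resume\n"
    "\n"
    "Explanation of the fix.\n"
    "\n"
    "Link: https://lists.example/r/report-msgid@example.com/\n"
    "Signed-off-by: Some Developer <dev@example.com>\n"
)
print(links_to_report(example, "https://lists.example/r/report-msgid@example.com"))
```

If the developer uses some other tag or leaves the pointer out entirely, a lookup like this finds nothing and the connection between report and fix has to be made by hand.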

I also proposed a new tag to get a bit of order into things. Feedback on v1 was mostly neutral or slightly positive, but v2 received only one reply. I'll let this idea rest for now, but will likely try again in the future from a different angle or in smaller steps.

Text for the kernel's documentation

In the last few weeks I wrote a text about regressions for the Linux kernel's documentation, which also covers regression tracking and how to use regzbot. Submitting it shortly before the festive season would likely have been a bad idea, hence I delayed this: in early January I posted v1, which was quickly followed by v2. Sadly that means it was too late to get this text reviewed for the merge window of Linux 5.17, as it started just days after v2 was posted. This delays merging by nine or ten weeks, as Linux 5.18 will thus be the first version where this can be merged once the text has been reviewed. But the submission already got some good feedback; Greg Kroah-Hartman, second in command for the Linux kernel, already ACKed it! \o/