About the Linux kernel regression tracking efforts

Jan 8, 2023 · 12 min read

Overview

TLDR

Thorsten Leemhuis leads efforts to track and resolve Linux kernel regressions. People reporting regressions and Linux kernel developers normally don’t have to care about this work, which Thorsten performs with the help of the regression tracking bot ‘regzbot’.

But help from regression reporters and developers in a few cases would be greatly appreciated. For example, please tell regzbot if a tracked issue turns out to not actually be a regression or if something related is discussed somewhere else, like another thread or a bugtracker; it would also be a real help if you could, when needed, update properties like the title of the regression or the range/commit when it started to happen. And it would be great if you could add reports to the tracking while posting one or when receiving one not yet added to the tracking.

For the developers among you there is another request. When fixing a regression, add ‘Link:’ or ‘Closes:’ tags pointing to the report to the Signed-off-by area of the patch description, as explained by the kernel’s documentation: Linus wants those and they make regression tracking a whole lot easier, as they allow connecting fixes with tracked reports.

Intro: the Linux kernel regression tracking efforts

This page provides insights about Thorsten Leemhuis’ efforts on Linux kernel regression tracking. It’s meant to provide a brief overview about this work for anyone that comes in contact with it, for example when a regression you reported, caused, or are CCed to is added to the tracking.

These regression tracking efforts are performed with the regression-tracking bot ‘regzbot’ written to facilitate the work. Regzbot’s capabilities are only briefly explained on this page, as regzbot’s getting started guide and its reference documentation already describe them in more detail.

This page also lacks information about what Linux kernel regressions actually are, how they should be reported, and how developers should handle them: the kernel’s documentation covers these aspects thoroughly in the user-focused document “Reporting regressions” and the developer-focused text “Handling regressions”.

General information about the regression tracking efforts

What are the goals and non-goals of the Linux regression tracking efforts?

Current main goals:

Regularly provide Linus with an overview and insights about unresolved regressions in mainline.
Work towards making regzbot and regression tracking with it an integral part of the Linux kernel development process that developers themselves use – and that thus in the long run does not heavily depend on Thorsten or anyone else performing regression tracking.
Until then, on a best-effort basis, try to look out for reported regressions and add untracked ones to the tracking; also prod things when the fixing process has apparently stalled.
Focus on regressions in mainline, the latest mainline release, and the stable series derived from the latter.

Current non-goals:

Ensure that all regressions (even ones already fixed) are known by regzbot.
Handle all regressions equally: there are not enough resources to do so, thus it's better to use those for more important cases.
Watch every bugtracker specified in the Linux kernel's MAINTAINERS file.
Provide stats about regressions occurring in kernel development, as they would be misleading due to the aspects above.
People performing regression tracking are not expected to provide more than basic guidance for users that report regressions; furthermore, those people should not have to monitor the fixing process closely and are not meant to debug or fix regressions.
Track ordinary bugs with regzbot, apart from occasional issues with a strong reason why somebody better keep an eye on things.

How is regression tracking actually performed currently?

Ideally users or developers make regzbot track each reported regression themselves. The people that perform regression tracking (called “regression trackers” from now on in this document, even if it’s currently mainly Thorsten that does the work) try to take care of other reports where this did not work out; for this they skim the usual mailing lists and bugzilla.kernel.org (other bug trackers like the one used by the DRM developers for now are a blind spot). To ensure added reports are actionable, regression trackers occasionally ask reporters for additional information before adding them to the tracking; these people also try to ensure the right developers are in the loop while adding the reports.

Regzbot then monitors threads and bug entries about regressions for activity. It also looks out for patches posted or committed that reference the tracked report using ‘Link:’ or ‘Closes:’ tags; once a change with such a reference lands in the appropriate tree it will mark the regression as resolved.

Regression trackers occasionally look through the list of tracked issues for any that look stalled. That involves checking if regzbot tracking missed anything relevant, which will be the case when developers forget to place proper tags in patches fixing regressions. When needed, regression trackers also try to prod things to ensure all issues in the end are resolved in a reasonable time frame.

What should I do if a regression tracker missed an important aspect or did something stupid?

Don't hesitate to tell everyone in a public reply: it's in everyone's interest to set the public record straight! Regression trackers like Thorsten won’t mind, as it’s only natural that they will occasionally miss something important or do something stupid. That's partially because they have to deal with lots of reports which they do not monitor closely; furthermore, due to the complex nature of the kernel the regression trackers obviously will have little or no knowledge about the various kernel subsystems they come into contact with.

Aspects relevant for both users and developers

The following aspects are relevant for both users and developers that come in contact with regression tracking.

How to resolve a regression or remove it from the tracking?

Send a direct or indirect(¹) reply to the report briefly explaining the situation to the general audience; somewhere in that mail(²) include a paragraph containing one of the following ‘#regzbot commands’ to tell the regression tracking bot, too. There are three commands for this purpose.

One of them resolves the regression by specifying the fix using the "fix" command together with a stable commit-id or the subject of the fix:

#regzbot fix: 1f2e3d4c5b6a

#regzbot fix: foo: resolve a regression that broke bar

In other cases, use the resolve or inconclusive commands:

#regzbot resolve: not a regression, turned out that problem has been around since forever

#regzbot inconclusive: radio silence from reporter and insufficient data

(¹) e.g. reply to the report itself or any mail that directly or indirectly through earlier replies is a reply to the report
(²) for example right at the end
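To make the shape of these commands a bit more tangible, here is a minimal Python sketch that pulls ‘#regzbot commands’ out of a mail body. It is not regzbot’s actual parser: the helper name extract_commands, the regular expression, and the sample mail are assumptions made up for illustration, and the sketch simply treats every line starting with ‘#regzbot’ as a command.

```python
import re

# Hypothetical pattern (an assumption, not taken from regzbot's source):
# '#regzbot', an optional caret, a command name, a colon, and the argument.
REGZBOT_LINE = re.compile(r"^#regzbot\s+(\^?)([a-z-]+):\s*(.*)$")


def extract_commands(body):
    """Return (command, argument) pairs for every '#regzbot' line in body."""
    commands = []
    for line in body.splitlines():
        match = REGZBOT_LINE.match(line.strip())
        if match:
            caret, name, argument = match.groups()
            commands.append((caret + name, argument.strip()))
    return commands


# Made-up reply: an explanation for humans first, the command at the end.
mail = """\
Turns out this has been broken since forever, so it is not a regression.

#regzbot resolve: not a regression, problem has been around since forever
"""

for command, argument in extract_commands(mail):
    print(command, "->", argument)
```

The sample mail follows the advice from above: the explanation for the general audience comes first, the command sits in its own paragraph at the end.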

How to let regzbot know if something relevant to a tracked regression is posted somewhere else?

Send a direct or indirect(¹) reply to the report that if needed explains new findings to the general audience; somewhere in that mail(²) include a paragraph containing one or multiple ‘#regzbot commands’ to point the bot to the additional places. There are four commands for this purpose that for example can be used like this:

#regzbot monitor: https://lore.kernel.org/all/30th.anniversary.repost@klaava.Helsinki.FI/

#regzbot link: https://bbs.archlinux.example.com/post-123456/

#regzbot dup: https://lore.kernel.org/all/30th.anniversary.repost@klaava.Helsinki.FI/

#regzbot dup-of: https://lore.kernel.org/all/30th.anniversary.repost@klaava.Helsinki.FI/

Regzbot can only monitor discussion available on lore.kernel.org; for everything else use the link command. “dup” is short for “duplicate”.

(¹) e.g. reply to the report itself or any mail that directly or indirectly through earlier replies is a reply to the report
(²) for example right at the end
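The choice between the monitor and link commands boils down to where the discussion lives. The following Python sketch illustrates that rule of thumb; the function name pointer_command is hypothetical, and the host check is a simplification that only encodes what is stated above (monitoring works for lore.kernel.org, everything else gets a link).

```python
from urllib.parse import urlparse


def pointer_command(url):
    """Build a '#regzbot' line pointing the bot to a related discussion.

    Assumption for this sketch: anything hosted on lore.kernel.org can be
    monitored, everything else is merely linked.
    """
    host = urlparse(url).hostname or ""
    verb = "monitor" if host == "lore.kernel.org" else "link"
    return f"#regzbot {verb}: {url}"


print(pointer_command("https://lore.kernel.org/all/30th.anniversary.repost@klaava.Helsinki.FI/"))
print(pointer_command("https://bbs.archlinux.example.com/post-123456/"))
```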

How to update properties of a tracked regression, like the title or the point where it started to happen?

Send a direct or indirect(¹) reply to the report that if needed explains new findings to the general audience; somewhere in that mail(²) include a paragraph containing one or multiple ‘#regzbot commands’ to update properties. There are three commands for this purpose that for example can be used like this:

#regzbot introduced: v6.1..v6.2-rc1

#regzbot introduced: 1f2e3d4c5b6a

#regzbot title: foo: bar stopped working

#regzbot from: Foo Bar <foo.bar@example.com>

(¹) e.g. reply to the report itself or any mail that directly or indirectly through earlier replies is a reply to the report
(²) for example right at the end

Aspects relevant mainly for people reporting regressions

The following aspects are relevant only for people that reported a regression that was added to the tracking; developers might want to head over to the section intended for them.

Do I have to care about the tracking?

No, you don’t need to. But it would be great if you could inform regzbot if something relevant happens, as outlined above – for example if it turns out that the problem is not a regression at all or if after a brief initial report you performed a bisection and found the change causing the problem.

Will the tracking guarantee that my problem will be fixed?

If you bisected the issue there is a pretty decent chance that it will be fixed relatively quickly. But it can’t be guaranteed, because life is sometimes complicated: fixing a regression for some people can, for example, cause a regression for others (see “Reporting regressions” for details).

How can I add my report to the tracking?

When writing the report, include a paragraph like this:

#regzbot introduced: v6.1..v6.2-rc1

Instead of a version range like in this example you can specify a commit id as well.

A more detailed explanation can be found in regzbot’s getting started guide, its reference documentation, or the kernel’s file on how to report regressions.

Aspects relevant only for Linux kernel developers

The following aspects are only relevant for developers dealing with tracked regression reports.

What process should I follow when fixing a tracked regression?

Just fix the regression while using ‘Link:’ or ‘Closes:’ tags pointing to all reports about the issue being fixed, as explained by Documentation/process/submitting-patches.rst and Documentation/process/5.Posting.rst in more detail.

Linus expects such tags (see 1, 2, 3), as they allow him and everyone else to look into the backstory of a fix now or years later. The tags are also important for regression tracking, as they allow regzbot to associate reports with fixes that are posted, discussed, or committed in various places.
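To illustrate what such tags look like and how a tool could pick them up, here is a small Python sketch that extracts ‘Link:’ and ‘Closes:’ tags from a patch description. It is not how regzbot does it: the helper report_links, the regular expression, and the sample commit message (including its author and URL) are made up for this example.

```python
import re

# Assumption for this sketch: each tag sits on a line of its own in the
# form 'Link: <url>' or 'Closes: <url>'.
TAG = re.compile(r"^(Link|Closes):\s*(\S+)\s*$", re.MULTILINE)


def report_links(commit_message):
    """Return (tag, url) pairs for all Link:/Closes: tags in the message."""
    return TAG.findall(commit_message)


# Made-up patch description with the tag placed next to Signed-off-by.
message = """\
foo: resolve a regression that broke bar

Restore the old behaviour of foo so that bar works again.

Closes: https://lore.kernel.org/all/30th.anniversary.repost@klaava.Helsinki.FI/
Signed-off-by: Foo Bar <foo.bar@example.com>
"""

for tag, url in report_links(message):
    print(tag, url)
```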

How can I add a report I received to the tracking?

Reply to the report with a paragraph like this:

#regzbot ^introduced: v6.1..v6.2-rc1

The caret (“^”) tells regzbot to treat the mail you reply to as the report. Instead of a version range like in this example you can specify a commit id as well.

A more detailed explanation can be found in regzbot’s getting started guide, its reference documentation or in the kernel’s Documentation/process/handling-regressions.rst file.
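The difference between adding your own report and adding a report you received is just the caret. The Python sketch below builds the matching line for both cases; the function introduced_command is hypothetical, and the plausibility check on the value (a range containing ‘..’ or a 7 to 40 character hexadecimal commit id) is an assumption of this sketch, not a rule taken from regzbot.

```python
import re

# Assumed formats for this sketch only: an abbreviated or full commit id,
# or a range written as '<from>..<to>'.
COMMIT_ID = re.compile(r"^[0-9a-f]{7,40}$")
VERSION_RANGE = re.compile(r"^\S+\.\.\S+$")


def introduced_command(value, reply_to_report=False):
    """Build the '#regzbot introduced' line for a range or a commit id.

    reply_to_report=True adds the caret, which tells regzbot to treat the
    mail being replied to as the report.
    """
    if not (COMMIT_ID.match(value) or VERSION_RANGE.match(value)):
        raise ValueError(f"neither a commit id nor a range: {value!r}")
    caret = "^" if reply_to_report else ""
    return f"#regzbot {caret}introduced: {value}"


# A reporter adding their own report while writing it:
print(introduced_command("v6.1..v6.2-rc1"))
# A developer replying to a report received from someone else:
print(introduced_command("1f2e3d4c5b6a", reply_to_report=True))
```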

Can you spare me those ‘info’ mails that seem primarily for regzbot?

If you consider these mails as noise, feel free to set up a local filter rule that deletes any mail whose ‘From:’ field starts with ‘Linux kernel regression tracking (#added)’. That’s a “From” address regression trackers normally use for mails that are primarily relevant for regzbot and people new to Linux kernel regression tracking.

This procmail rule should do the trick and catch other mails you might want to ignore, too:

:0
* ^From:.*Linux regression tracking #(adding|info|update)
/dev/null

Note that people performing regression tracking will not use a From: address caught by the above rule in cases where people are meant to see the mails – for example when CCing additional developers in a reply that also adds a report to the tracking.
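If you filter mail with a script instead of procmail, the same check could look roughly like the following Python sketch. It reuses the regular expression from the procmail rule; the function name is_regzbot_info_mail and both sample mails (including names and addresses) are made up for illustration. The match is case-insensitive to mirror procmail’s default behaviour.

```python
import re
from email import message_from_string

# Same expression as in the procmail rule, applied to the From: header only.
INFO_FROM = re.compile(
    r"Linux regression tracking #(adding|info|update)", re.IGNORECASE
)


def is_regzbot_info_mail(raw_mail):
    """Return True if the From: header marks the mail as meant for regzbot."""
    sender = message_from_string(raw_mail).get("From", "")
    return INFO_FROM.search(sender) is not None


# Made-up mails: one with the special From: name, one ordinary reply.
info_mail = """\
From: "Linux regression tracking #adding (Foo Bar)" <foo.bar@example.com>
Subject: Re: foo: bar stopped working

#regzbot ^introduced: v6.1..v6.2-rc1
"""

normal_mail = """\
From: Foo Bar <foo.bar@example.com>
Subject: Re: foo: bar stopped working

Could you please try the attached patch?
"""

print(is_regzbot_info_mail(info_mail))
print(is_regzbot_info_mail(normal_mail))
```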

For the record: a few approaches to spare developers these info mails were tried earlier, but none of them really worked out.

Should we CC regression trackers on all further mails regarding a regression?

No, just keep the Linux regression mailing list (regressions@lists.linux.dev) in the loop and use ‘Link:’ or ‘Closes:’ tags pointing to the report in patches fixing tracked regressions.

Various

History of the Linux kernel regression tracking efforts

Linux through most of its history had no formal infrastructure or dedicated people to ensure reported regressions are addressed. That changed during the 2000s, when Rafael J. Wysocki started to perform regression tracking for a few years. In the end though, that effort fizzled out in 2012, as Rafael steadily moved to working on other tasks in the kernel and had less time for regression tracking.

Thorsten in 2017 picked up the task. He did the work in his spare time and mostly manually – but that proved too much of a hassle and in the end fizzled out after about half a year.

Thx to these activities Thorsten learned a lot about the whole thing – which led to the idea of building a bot to facilitate the work. But it was put on hold due to real-life issues and lack of funding.

In 2020 Thorsten was able to secure funding from NGI Pointer to realize this idea. That allowed him to announce the renewed Linux regression tracking efforts along with the plans to build regzbot in November 2020. It was advertised like this:

The goal of the project: build and integrate mechanisms into the Linux kernel development processes to track all regressions that humans or CI systems report. Together with the existing “no regression” rule this will help to ensure new releases with their improved security techniques work as well as their predecessors. That's important, as the Linux kernel is at the very heart of many devices that drive the internet or are connected to it. But many of those use outdated kernel versions with known vulnerabilities, as vendors and admins fear switching to a later version might break something. This project wants to work against that.

The mechanisms built for tracking Linux kernel regressions will include adapting or writing a service bot tailored to the specific needs of the kernel developers and their email based workflow. To make sure the solution is practical and accepted in practice the project owner will work it out in close relationship with the Linux kernel maintainers and create procedures on how to use it during development. The goal is to see the solution properly established to ensure its continued use after this project ends.

For more details see the introduction post.

The project is scheduled to last one year and is funded by NGI Pointer

This website is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 871528.

Due to scheduling conflicts, actual work started in April 2021. After a basic regzbot was built, Thorsten started to track regressions with its help in late August 2021. The NGI Pointer sponsored project was considered a success in the end, but in March 2022 the funding ended with no option for renewal, as declared initially.

Shortly afterwards Meta thankfully stepped in with funds to ensure Thorsten continued work on regression tracking and regzbot.

Many thx to all sponsors that make this work possible!