Pulling GitHub into the kernel process [LWN.net]

From:

There is an ongoing effort to “modernize” the kernel-development process; so far, the focus has been on providing better tools that can streamline the usual email-based workflow. But that “email-based” part has proven to be problematic for some potential contributors, especially those who might want to simply submit a small bug fix and are not interested in getting set up with that workflow. The project-hosting “forge” sites, like GitHub and GitLab, provide a nearly frictionless path for these kinds of one-off contributions, but they do not mesh well—at all, really—with most of mainline kernel development. There is some ongoing work that may change all of that, however.

Konstantin Ryabitsev at the Linux Foundation has been spearheading much of this work going back at least as far as his September 2019 draft proposal for better kernel tooling. Those ideas were discussed at the 2019 Kernel Maintainers Summit and at a meeting at Open Source Summit Europe 2019 in October. Throughout, Ryabitsev has been looking at ways to make it easier for non-email patch submitters; along the way, he has also released the b4 tool for collecting up patches and worked on patch attestation.

A recent post to the kernel workflows mailing list shows some progress toward a bot that can turn a GitHub pull request (PR) into a well-formed patch series to send to the proper reviewers and mailing lists. “This would be a one-way operation, effectively turning Github into a fancy ‘git-send-email’ replacement.” He also laid out some of the benefits that this bot could provide both for maintainers and patch submitters:

  • submitters would no longer need to navigate their way around git-format-patch, get_maintainer.pl, and git-send-email – nor would need to have a patch-friendly outgoing mail gateway to properly contribute patches
  • subsystem maintainers can configure whatever CI pre-checks they want before the series is sent to them for review (and we can work on a library of Github actions, so nobody needs to reimplement checkpatch.pl multiple times)
  • the bot should (eventually) be clever enough to automatically track v1…vX on pull request updates, assuming the API makes it straightforward

He had some questions about whether the bot should be centralized in a single repository (per forge platform) that would serve as the single submission point, or whether subsystem maintainers would want to configure their own repositories. The latter would give maintainers the opportunity to set their own criteria for checks that would need to pass (e.g. checkpatch.pl) before the PR was considered valid, but would mean that they might have to ride herd on the repository as well.

In addition, Ryabitsev wondered when and how PRs would get closed. The bot could potentially monitor the mainline and auto-close PRs once the patch set gets merged, but that won’t be perfect, of course. An easier approach for him would be “to auto-close the pull request right after it’s sent to the list with a message like ‘thank you, please monitor your email for the rest of the process’”, but he was unsure if that would be best.

As might be guessed, reactions from those participating in the thread were all over the map. While there is a lack of many kinds of diversity within the kernel community, opinions on development workflow—opinions, in general, in truth—do not have that problem. Some maintainers have zero interest in this kind of effort at all. As Christoph Hellwig put it: “Please opt all subsystems I maintain out of this crap. The last thing I need is patches from people that can’t deal with a sane workflow.”

Hellwig’s complaint, which Jiri Kosina agreed with, may be more about the expectations of those who use GitHub (and the like), and less about the possibility of having a web-based interface to kernel development. Dmitry Vyukov asked why Hellwig and Kosina would be unwilling to accept patches from the system if they cannot really distinguish them from a regular submission. Vyukov said that he is currently experiencing a Git email submission problem that he is uninterested in working around, so he can see why others might be similarly inclined. Meanwhile, though, he sees benefits from this kind of bot:

On the other hand this workflow has the potential to ensure that you never need to remind to run checkpatch.pl, nor spend time on writing code formatting comments and re-reviewing v2 because code formatting will be enforced, etc. So I see how this is actually beneficial for maintainers.

Hellwig is not opposed to a web-based solution, though he wants nothing to do with GitHub. But Ryabitsev seems uninterested in “reimplementing a lot of stuff that we already get ‘for free’ from Github and other forges”. Both Mark Brown and Laurent Pinchart suggested that there are mismatches between GitHub-normal practices and those of the kernel community. Pinchart mentioned the inability to comment on a patch’s commit message on GitHub as something that generally leads to poor messages; the platform is training these developers to a certain extent:

Developers who have only been exposed to those platforms are very likely to never have learnt the importance of commit messages, and of proper split of changes across commits. Those are issues that are inherent to those platforms and that we will likely need to handle in an automated way (at least to some extent) or maintainers will become crazy […]

But Miguel Ojeda thinks that it is really no different from new developers showing up on the mailing list with patches. “The same happens in the LKML – some people have sent bad messages, but we correct them and they learn.” He also noted that automated checking of patches can help both developers and maintainers:

[…] it is particularly useful to teach newcomers and to save time for maintainers having to explain things. Even if a maintainer has a set of email templates for the usual things, it takes time vs. not even having to read the email.

Ojeda is working on the Rust for Linux project, which we looked at back in April; he said that he has also been working on a bot:

For Rust for Linux, I have a GitHub bot that reviews PRs and spots the usual mistakes in commit messages (tags, formatting, lkml vs. lore links, that sort of thing). It has been very effective so far to teach newcomers how to follow the kernel development process.

I am also extending it to take Acks, Reviewed-by’s, Tested-by’s, etc., and then performing the merge only if the CI passes (which includes running tests under QEMU, code formatting, lints, etc.) after applying each patch.

But Ojeda is taking things in a rather different direction than what Ryabitsev is envisioning. Ojeda wants to move the main place for patch review and the like from the mailing lists to GitHub. He is also considering having his bot pick up patches from the mailing list and turning them into GitHub PRs—the reverse of what Ryabitsev is doing.

For his part, Ryabitsev said: “That’s pretty cool, but I’m opposed to this on theological grounds. :)” In particular, he is concerned about the “single point of failure” problem for the kernel-development infrastructure. If his bot is unavailable for any reason, it may be inconvenient for those who use it, but that will not hobble development. He sees GitHub as simply a “developer frontend tool”.

Somewhat similar to Ojeda’s intentions, Brendan Higgins has a tool to pick up patches from a mailing list (kselftest in this case) and upload them to a Gerrit instance. He sees some potential synergies between his bot and the one Ryabitsev is working on. Similarly, Drew DeVault has been working on the reverse direction, from a mailing list to a project forge, as well. Patchwork is a longstanding code-review project that also collects up patches from mailing lists to populate a web application. It would seem that much of the focus is on getting patches out of mailing lists, though, which is not where Ryabitsev is headed.

While some maintainers want no part of this “GitHub Future”, others are enthusiastic about the possibilities it could bring. Vyukov thinks that having a single GitHub repository with multiple branches will help consolidate the kernel-development landscape, which is currently fragmented on subsystem lines. He sees it as an opportunity to apply consistent coding-style standards; it does not matter which, he said, “as long as it’s consistent across the project”. It would also allow testing consistency throughout the tree and the same for the development process:

For once: it will be possible to have proper documentation on the process (as compared to current per-subsystem rules, which are usually not documented again because of low RoI [return on investment] for anything related to a single subsystem only).

It is not at all clear that Vyukov’s interest in consistency throughout the tree is shared widely, but there have certainly been complaints along the way about the difficulty of navigating between the different subsystem processes and requirements for submissions. There is also interest in making things easier for quick, one-off contributions; as Ryabitsev put it:

Our code review process must also allow for what is effectively a “report a typo” link. Currently, this is extremely onerous for anyone, as a 15-minute affair suddenly becomes a herculean effort. The goal of this work is to make drive-by patches easier without also burying maintainers under a pile of junk submissions.

Clearly keeping “junk submissions” to a bare minimum is going to be important. Linus Torvalds said that he has had to turn off email from GitHub because it is too noisy; people have apparently signed him up as a project member without any kind of opt-in check. Beyond that, any kind of patch submission from PRs would need to have some sanity checks, including size limits, so that PRs like one pointed out by Ryabitsev do not end up on the mailing list.

That kind of PR highlights another problem: repository maintenance. Greg Kroah-Hartman said that there will be a need to monitor whatever repositories are being used for this purpose. It is not a small task:

What ever repo you put this on, it’s going to take constant maintenance to keep it up to date and prune out the PRs that are going to accumulate there, as well as deal with the obvious spam and abuse issues that popular trees always accumulate.

Torvalds does not want his GitHub tree used for this purpose and Kroah-Hartman said the same. However it plays out, someone will have to be tasked with keeping the repository tidy, which is “a thankless task that will take constant work”. But Ryabitsev is hopeful that the Linux Foundation could fund that kind of work if it becomes necessary.

In the end, it will likely come down to how seamlessly the GitHub bot fits in. If maintainers truly cannot really tell the difference in any substantive way, it is hard to see many of them rejecting well-formed patches that fix real problems in their subsystems. That ideal may not be reached right away, however, which might lead to a premature end to the experiment. It will be interesting to see it all play out over the coming months and years.