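[Solved] Warning: the new kernel linux-image-6.1.0-18-amd64 in the Debian 12.5 update breaks the DKMS module build for nvidia-driver

Note that this issue will most likely leave users who have the nvidia-driver package installed without a working driver.

Cause of the error: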

/var/lib/dkms/nvidia-current/525.147.05/build/make.log

ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol '__rcu_read_lock'
ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol '__rcu_read_unlock'

It looks like a kernel change causes the nvidia module to end up using GPL-only symbols unintentionally.
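
To check whether a failing build on your own machine is the same problem, the modpost lines can be pulled out of make.log. This is only a minimal sketch that assumes the log format quoted above; the function name gpl_only_symbols is made up for illustration.

```shell
# Hypothetical helper: read a DKMS make.log on stdin and print each
# GPL-only symbol that modpost complained about, one per line.
gpl_only_symbols() {
    # Match lines such as:
    #   ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol '__rcu_read_lock'
    # and keep only the text between the single quotes.
    sed -n "s/.*GPL-incompatible module .* uses GPL-only symbol '\([^']*\)'.*/\1/p" | sort -u
}

# Usage, with the log path from this thread:
#   gpl_only_symbols < /var/lib/dkms/nvidia-current/525.147.05/build/make.log
```

If the function prints nothing, the build is probably failing for a different reason and the full log needs a closer look.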


Below is the log from re-running the unfinished installation after the failure:

$ sudo apt install
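Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages were automatically installed and are no longer required:
  linux-headers-6.1.0-16-amd64 linux-headers-6.1.0-16-common linux-image-6.1.0-16-amd64
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 1 not upgraded.
4 not fully installed or removed.
After this operation, 0 B of additional disk space will be used.
Setting up linux-image-6.1.0-18-amd64 (6.1.76-1) ...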
/etc/kernel/postinst.d/dkms:
dkms: running auto installation service for kernel 6.1.0-18-amd64.
Sign command: /usr/lib/linux-kbuild-6.1/scripts/sign-file
Signing key: /var/lib/dkms/mok.key
Public certificate (MOK): /var/lib/dkms/mok.pub

Building module:
Cleaning build area...
env NV_VERBOSE=1 make -j32 modules KERNEL_UNAME=6.1.0-18-amd64......(bad exit status: 2)
Error! Bad return status for module build on kernel: 6.1.0-18-amd64 (x86_64)
Consult /var/lib/dkms/nvidia-current/525.147.05/build/make.log for more information.
Error! One or more modules failed to install during autoinstall.
Refer to previous errors for more information.
dkms: autoinstall for kernel: 6.1.0-18-amd64 failed!
run-parts: /etc/kernel/postinst.d/dkms exited with return code 11
dpkg: error processing package linux-image-6.1.0-18-amd64 (--configure):
 installed linux-image-6.1.0-18-amd64 package post-installation script subprocess returned error exit status 1
正在设置 linux-headers-6.1.0-18-amd64 (6.1.76-1) ...
/etc/kernel/header_postinst.d/dkms:
dkms: running auto installation service for kernel 6.1.0-18-amd64.
Sign command: /usr/lib/linux-kbuild-6.1/scripts/sign-file
Signing key: /var/lib/dkms/mok.key
Public certificate (MOK): /var/lib/dkms/mok.pub

Building module:
Cleaning build area...
env NV_VERBOSE=1 make -j32 modules KERNEL_UNAME=6.1.0-18-amd64......(bad exit status: 2)
Error! Bad return status for module build on kernel: 6.1.0-18-amd64 (x86_64)
Consult /var/lib/dkms/nvidia-current/525.147.05/build/make.log for more information.
Error! One or more modules failed to install during autoinstall.
Refer to previous errors for more information.
dkms: autoinstall for kernel: 6.1.0-18-amd64 failed!
run-parts: /etc/kernel/header_postinst.d/dkms exited with return code 11
Failed to process /etc/kernel/header_postinst.d at /var/lib/dpkg/info/linux-headers-6.1.0-18-amd64.postinst line 11.
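dpkg: error processing package linux-headers-6.1.0-18-amd64 (--configure):
 installed linux-headers-6.1.0-18-amd64 package post-installation script subprocess returned error exit status 1
dpkg: dependency problems prevent configuration of linux-image-amd64:
 linux-image-amd64 depends on linux-image-6.1.0-18-amd64 (= 6.1.76-1); however:
  Package linux-image-6.1.0-18-amd64 is not configured yet.
dpkg: error processing package linux-image-amd64 (--configure):
 dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of linux-headers-amd64:
 linux-headers-amd64 depends on linux-headers-6.1.0-18-amd64 (= 6.1.76-1); however:
  Package linux-headers-6.1.0-18-amd64 is not configured yet.
dpkg: error processing package linux-headers-amd64 (--configure):
 dependency problems - leaving unconfigured
Errors were encountered while processing:
 linux-image-6.1.0-18-amd64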
 linux-headers-6.1.0-18-amd64
 linux-image-amd64
 linux-headers-amd64
E: Sub-process /usr/bin/dpkg returned an error code (1)

This issue has already been fixed in the nvidia-kernel-dkms package in sid, using a patch from Gentoo. The related bug report:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1063363

The relevant nvidia-kernel-dkms changelog:

https://metadata.ftp-master.debian.org/changelogs//non-free-firmware/n/nvidia-graphics-drivers/nvidia-graphics-drivers_525.147.05-6_changelog

nvidia-graphics-drivers (525.147.05-6) unstable; urgency=medium

  * Apply pfn_valid patch from gentoo to fix kernel module build for
    Linux 6.1.76, 6.6.15, 6.7.3, 6.8.  (Closes: #1063363, #1062932)
  * nvidia-detect: Tesla and regular driver packages have been merged.
  * Update lintian overrides.

 -- Andreas Beckmann <anbe@debian.org>  Fri, 09 Feb 2024 20:43:30 +0100

nvidia-graphics-drivers (525.147.05-5) unstable; urgency=medium

  * Switch src:nvidia-graphics-drivers to the Tesla driver series.
  * Build for ppc64el.
  * Build all unversioned packages from src:nvidia-graphics-drivers.
  * Enable nvidia-suspend-common.  (Closes: #1059581, #1056557, #1062281)
  * nvidia-suspend-common: Depend on kbd for chvt.  (Closes: #1058081)
  * New Romanian (ro) debconf translations by Remus-Gabriel Chelu.
    (Closes: #1059590)

 -- Andreas Beckmann <anbe@debian.org>  Tue, 23 Jan 2024 18:13:36 +0100

nvidia-graphics-drivers (525.147.05-4~deb12u1) bookworm; urgency=medium

  * Rebuild for bookworm.

 -- Andreas Beckmann <anbe@debian.org>  Sun, 03 Dec 2023 00:24:21 +0100

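The nvidia-kernel-dkms version currently in stable is 525.147.05-4~deb12u1, which does not contain the fix.

Therefore, nvidia-driver users should not upgrade to the latest kernel, linux-image-6.1.0-18-amd64. If you upgraded by accident, pick the older linux-image-6.1.0-17-amd64 entry in the GRUB menu at boot and wait for the fixed package. (This is exactly why a system upgrade always keeps one older kernel around.)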


NVIDIA's 470.223.02 .run installer does not support the 6.1.0-18 kernel either.

I just noticed that the fixed version has been uploaded to bookworm-proposed-updates:

https://release.debian.org/proposed-updates/stable.html

Reason: restore compatibility with newer Linux kernel builds; take over packages from nvidia-graphics-drivers-tesla; add new nvidia-suspend-common package

After enabling the bookworm-proposed-updates repository I upgraded all packages, and everything has worked fine so far.

About proposed-updates:

https://www.debian.org/releases/proposed-updates.zh-cn.html

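Enabling this repository by default is not recommended for ordinary users. If you really need it (and I think some affected nvidia-driver users do), you can edit your package sources: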

/etc/apt/sources.list

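Add this line:

deb https://<your-mirror-address>/debian/ bookworm-proposed-updates main contrib non-free non-free-firmware

Run sudo apt update to refresh the package lists, then upgrade the packages you need.

If you no longer need the repository afterwards, you can disable it again.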
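
The steps above could be scripted roughly as follows. Treat it as a sketch under assumptions: the helper name apt_source_line, the mirror https://deb.debian.org and the file name proposed-updates.list are illustrative choices that do not come from this thread, and installing only nvidia-driver from the suite is one option (the poster above upgraded everything).

```shell
# Hypothetical helper: print an APT source line for a given mirror and suite.
# The component list is the one used in this thread.
apt_source_line() {
    mirror=$1   # e.g. https://deb.debian.org (assumed mirror, substitute your own)
    suite=$2    # e.g. bookworm-proposed-updates
    # ${mirror%/} drops a trailing slash so the URL never contains "//debian/".
    printf 'deb %s/debian/ %s main contrib non-free non-free-firmware\n' "${mirror%/}" "$suite"
}

# Usage: write the line to its own file so it is easy to remove later.
#   apt_source_line https://deb.debian.org bookworm-proposed-updates \
#       | sudo tee /etc/apt/sources.list.d/proposed-updates.list
#   sudo apt update
#   sudo apt install -t bookworm-proposed-updates nvidia-driver
```

Keeping the entry in a separate file under /etc/apt/sources.list.d/ means that disabling the repository later is just a matter of deleting that file and running sudo apt update again.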

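Update:

So far the new driver version seems to have no problems, so perhaps this can serve as the solution.

After the upgrade failed my system still runs normally and is still on kernel 6.1.0-17, but installing other packages keeps reporting errors.

Does this need to be fixed manually, or will it fix itself later on?

Because the new kernel failed to install, the remaining steps never completed, so the system is still using the old kernel. As long as you don't upgrade the related packages (a newer kernel version, for example? I'm not sure; either way this is a broken package state) it should be mostly harmless, except that every apt run repeats the failed DKMS build :smiling_face_with_tear:

Installing the new version of nvidia-driver fixes this automatically. I'm not sure when the update will reach the main repository, though; it may have to wait until the next point release?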
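
To see which packages are stuck in this half-finished state, the status column of dpkg -l can be filtered. A minimal sketch, assuming the usual dpkg -l column layout; the function name unconfigured_packages is made up for illustration.

```shell
# Hypothetical helper: read `dpkg -l` output on stdin and print the names of
# packages that are unpacked (U), half-configured (F) or half-installed (H).
# The first status letter is the desired action, the second the actual state.
unconfigured_packages() {
    awk '$1 ~ /^[uirph][UFH]/ { print $2 }'
}

# Usage:
#   dpkg -l | unconfigured_packages
# Once a fixed nvidia-kernel-dkms is installed, the pending configuration
# can be retried with:
#   sudo dpkg --configure -a
```

With the failure shown earlier in this thread, the list would be expected to include the 6.1.0-18 image and headers packages and their metapackages.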

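Then I'll wait and see...

A broken installation does need fixing. If 6.1.0-18 doesn't work for you, simply remove the 6.1.0-18 kernel; afterwards it won't be pulled in again by automatic upgrades.

Maybe apt-mark hold linux-image-6.1.0-17-amd64?

No need to hold anything. After removal, upgrades no longer offer the kernel update (6.1.0-18). I did this a few days ago and then ran sudo apt update && sudo apt upgrade -y, and the 6.1.0-18 kernel was not installed automatically. Installing it by hand still works.

I don't understand why, but as long as it works, fine.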

https://lists.debian.org/debian-stable-announce/2024/02/msg00002.html

This issue has been resolved by an update in bookworm-updates (stable-updates). Check that /etc/apt/sources.list contains this entry:

deb https://<your-mirror-address>/debian/ bookworm-updates main contrib non-free non-free-firmware

Then upgrade the system as usual.
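
Checking for that entry can also be done from the command line. The sketch below only understands the classic one-line sources.list format (not deb822 .sources files), and the function name has_suite is made up for illustration.

```shell
# Hypothetical helper: succeed if FILE has an active (uncommented) "deb" line
# for SUITE. Line layout assumed: deb [options] URI suite components...
has_suite() {
    file=$1
    suite=$2
    awk -v suite="$suite" '
        $1 == "deb" {
            i = 2                        # field that should hold the URI
            if ($i ~ /^\[/) {            # skip a bracketed option block
                while (i <= NF && $i !~ /\]$/) i++
                i++
            }
            if ($(i + 1) == suite) found = 1
        }
        END { exit !found }
    ' "$file"
}

# Usage:
#   has_suite /etc/apt/sources.list bookworm-updates && echo "bookworm-updates is enabled"
```

Entries may also live in files under /etc/apt/sources.list.d/, so a negative result for the main file alone does not prove the suite is disabled.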


The relevant package information:

$ apt search --names-only nvidia-driver -t bookworm-updates
Sorting... Done
Full Text Search... Done
nvidia-driver/stable-updates,now 525.147.05-7~deb12u1 amd64 [installed,automatic]
  NVIDIA metapackage

nvidia-driver-bin/stable-updates,now 525.147.05-7~deb12u1 amd64 [installed,automatic]
  NVIDIA driver support binaries

nvidia-driver-libs/stable-updates,now 525.147.05-7~deb12u1 amd64 [installed,automatic]
  NVIDIA metapackage (OpenGL/GLX/EGL/GLES libraries)

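A couple of days ago I saw an update notification that included several linux-headers and nvidia packages. I upgraded this morning without any errors, and after a reboot uname -r reports 6.1.0-18-amd64, so it looks fully fixed.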

root@chaoqunxie-debian:/etc/apt# apt-get install nvidia-driver
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
nvidia-driver is already the newest version (525.147.05-7~deb12u1).
The following packages were automatically installed and are no longer required:
  gir1.2-gmenu-3.0 gnome-session-common libbrotli-dev libfontconfig-dev libfontconfig1-dev libfreetype-dev libgnome-menu-3-0 libpthread-stubs0-dev libtcl8.6 libtk8.6 libxcb-xv0 tcl tcl-dev tcl8.6 tcl8.6-dev tk tk8.6 uuid-dev xtrans-dev
Use 'apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
2 not fully installed or removed.
After this operation, 0 B of additional disk space will be used.
Do you want to continue? [Y/n] y
Setting up nvidia-kernel-dkms (525.147.05-7~deb12u1) ...
Removing old nvidia-current-525.147.05 DKMS files...
Deleting module nvidia-current-525.147.05 completely from the DKMS tree.
Loading new nvidia-current-525.147.05 DKMS files...
Building for 6.1.0-21-amd64
Building initial module for 6.1.0-21-amd64
Error! Bad return status for module build on kernel: 6.1.0-21-amd64 (x86_64)
Consult /var/lib/dkms/nvidia-current/525.147.05/build/make.log for more information.
dpkg: error processing package nvidia-kernel-dkms (--configure):
 installed nvidia-kernel-dkms package post-installation script subprocess returned error exit status 10
dpkg: dependency problems prevent configuration of nvidia-driver:
 nvidia-driver depends on nvidia-kernel-dkms (= 525.147.05-7~deb12u1) | nvidia-kernel-525.147.05 | nvidia-open-kernel-525.147.05 | nvidia-open-kernel-525.147.05; however:
  Package nvidia-kernel-dkms is not configured yet.
  Package nvidia-kernel-525.147.05 is not installed.
  Package nvidia-kernel-dkms which provides nvidia-kernel-525.147.05 is not configured yet.
  Package nvidia-open-kernel-525.147.05 is not installed.
  Package nvidia-open-kernel-525.147.05 is not installed.
dpkg: error processing package nvidia-driver (--configure):
 dependency problems - leaving unconfigured
Errors were encountered while processing:
 nvidia-kernel-dkms
 nvidia-driver
E: Sub-process /usr/bin/dpkg returned an error code (1)

I already had this entry:

deb https://<your-mirror-address>/debian/ bookworm-updates main contrib non-free non-free-firmware

6.1.0-21-amd64

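Following the steps above still did not solve it for me.

I downgraded the kernel to 6.1.0-17 and it is still not resolved.

What errors does /var/lib/dkms/nvidia-current/525.147.05/build/make.log show?

And what does apt-cache policy nvidia-kernel-dkms print?

What does dpkg -l | grep nvidia-kernel-dkms print?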
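
The three pieces of information requested above can be gathered in one go and pasted into a reply. This is a rough sketch: the function name nvidia_dkms_report and the choice to show only the last 20 lines that mention "error" are assumptions, not something from this thread.

```shell
# Hypothetical helper: print the diagnostics requested above.
# The default log path is the one from this thread; pass another as $1.
nvidia_dkms_report() {
    log=${1:-/var/lib/dkms/nvidia-current/525.147.05/build/make.log}

    echo "== errors in $log =="
    if [ -r "$log" ]; then
        # Case-insensitive match, with line numbers, limited to the last 20 hits.
        grep -n -i 'error' "$log" | tail -n 20
    else
        echo "(log not readable)"
    fi

    echo "== apt-cache policy nvidia-kernel-dkms =="
    if command -v apt-cache >/dev/null 2>&1; then
        apt-cache policy nvidia-kernel-dkms || echo "(apt-cache failed)"
    else
        echo "(apt-cache not found)"
    fi

    echo "== dpkg -l | grep nvidia-kernel-dkms =="
    if command -v dpkg >/dev/null 2>&1; then
        dpkg -l | grep nvidia-kernel-dkms || echo "(no matching package)"
    else
        echo "(dpkg not found)"
    fi
}

# Usage:
#   nvidia_dkms_report > report.txt
```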