Switch of riscv64 builders to bos03 has broken building of many KDE source packages

Asked by Rik Mills

Switch of riscv64 builders to bos03 has broken building of many KDE source packages.

This is obviously a CRITICAL issue for flavours such as Kubuntu, Ubuntu Studio (now using Plasma as its desktop) and Lubuntu (which uses a fair number of KDE libraries and apps).

The issue will prevent us from getting new KDE things into the dev release (noble), and would block SRUs to stable releases for the impacted sources.

KConfig is part of the KDE Frameworks Qt extensions, so it is used by large parts of plasma-desktop, KDE applications, and in fact higher parts of the Frameworks build stack (kconfig is tier 2).

We initially noticed the issue when source packages that build-depend on src:kconfig started failing to build on upload to noble or on sync from Debian. For example:

https://launchpad.net/ubuntu/+source/krdc/4:23.08.3-1ubuntu1
https://launchpad.net/ubuntu/+source/qmlkonsole/23.08.3-1ubuntu1
https://launchpad.net/ubuntu/+source/haruna/0.12.3-1

These are failing due to a segmentation fault in /usr/lib/libexec/kf5/kconfig_compiler_kf5 at build time. That binary is provided by src:kconfig.

This failure is new: with the same kconfig version and build, it did not occur before the builder switch to bos03.

Likewise, a no-change test rebuild of kconfig also fails, as kconfig invokes /usr/lib/libexec/kf5/kconfig_compiler_kf5 in its build-time tests.

https://launchpad.net/ubuntu/+source/kconfig/5.112.0-0ubuntu2

Disabling the tests would be futile, as we would just allow a new version of the segfaulting kconfig_compiler_kf5 to publish, still leaving many KDE (and some other) source packages unbuildable.

In case you are wondering whether this is caused by some change in the noble dev release toolchain, that can be shown not to be the case by test rebuilding kconfig on mantic, where these issues never manifested.

https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4979/+sourcepub/15413039/+listing-archive-extra

This fails in exactly the same way (segfault) as in noble, whereas before the builder switch this was not the case.

Again I cannot stress how critical an issue this is for several Ubuntu flavours.

Thanks in advance for any assistance or solution you can bring.

Rik Mills
Kubuntu dev
MOTU

Question information

Language:
English
Status:
Answered
For:
Launchpad itself
Assignee:
No assignee

Rik Mills (rikmills) said :
#1

There also seems to be a similar segfault issue with source ksyntax-highlighting:

https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4979/+sourcepub/15418253/+listing-archive-extra

The above segfault regression in the katehighlightingindexer binary occurs in that mantic no-change test rebuild AND in a noble test build.

This is not caused by the previously mentioned segfaulting kconfig_compiler_kf5, as that binary is not used in the build at all. It is not even installed.

However, it does appear to be of a similar nature to the kconfig issue, and related to the builder switch.

Jürgen Gmach (jugmac00) said :
#2

Hi Rik, thanks for the report. We are now aware of the issue and will investigate. Please keep us posted in case you have additional information available.

Simon Quigley (tsimonq2) said :
#3

On a fresh Debian porterbox, the build passes and all tests pass too:

100% tests passed, 0 tests failed out of 45

In case a fellow Debian Developer would like to reproduce this, I used debian-riscv64-porterbox-01.debian.net

This is an Ubuntu-specific issue.

Steve Langasek (vorlon) said :
#4

> Again I cannot stress how critical an issue this is for several Ubuntu flavours.

Let it be noted that none of the flavors are obliged to support the riscv64 architecture. We certainly don't build flavor *images* for this architecture. While riscv64 isn't an architecture that we would categorically disable desktop packages for (like ppc64el and s390x are), if the builder issues take an unacceptable time to resolve and this is impacting flavor development for noble, please contact the Release Team and Archive Admins about removing the unbuildable riscv64 binaries from the noble release pocket to unblock you. (Once the builder issues are resolved, those failed package builds can be retried.)

Simon Quigley (tsimonq2) said :
#5

> Let it be noted that none of the flavors are obliged to support the riscv64 architecture. We certainly don't build flavor *images* for this architecture.

While we aren't obliged to support it, we certainly take a "best effort" approach. RISC-V is an interesting architecture, given that it is fully open. It is incredibly frustrating not to have access to any sort of Ubuntu porterbox, and it makes diagnosing these issues difficult. In this case, I would have already run GDB/Valgrind and figured this out, but I can't reproduce it in Debian, and setting up an Ubuntu chroot on a Debian porterbox is both hacky and unsupported. While enabling RISC-V for PPAs is a great first step, iterating on a PPA (especially for this architecture, with the uncertainty as to *which* builder it's going to use) is slow and painful, and how would I even run GDB in that case anyway?

In the future, yes, we would like to fully support these images. It just takes the appropriate amount of time, effort, and resources.

> While riscv64 isn't an architecture that we would categorically disable desktop packages for (like ppc64el and s390x are), if the builder issues take an unacceptable time to resolve and this is impacting flavor development for noble, please contact the Release Team and Archive Admins about removing the unbuildable riscv64 binaries from the noble release pocket to unblock you. (Once the builder issues are resolved, those failed package builds can be retried.)

Respectfully, this is a band-aid at best. If there are *any* security issues or stable release updates required (I can speak for Lubuntu when I say we *do* address these), are we supposed to remove riscv64 binaries from the release pocket of a stable release too?

The best approach, in my opinion, would be to not allow Britney to block on riscv64 for these packages, if the Launchpad Team doesn't have a solution in-hand soon. Let me be clear: this not only blocks Noble development, it blocks *any stable update for any release to any of these three flavors.*

I hope we can come to a solution for this soon.

Rik Mills (rikmills) said :
#6

> if the builder issues take an unacceptable time to resolve and this is impacting flavor development for noble, please contact the Release Team and Archive Admins about removing the unbuildable riscv64 binaries from the noble release pocket to unblock you.

I appreciate that. However, the basic nature of some of the dependencies and higher-level packages I've already seen FTBFS in PPA tests means that I think you would in turn be removing huge parts of the KDE frameworks, desktop and applications stack. It would be like knocking over the bottom row of a house of cards. Not much would be left standing.

Gianfranco Costamagna (costamagnagianfranco) said :
#7

Debian can't reproduce the issue and the kernel is:

Linux debian-riscv64-porterbox-01 6.5.0-3-riscv64 #1 SMP Debian 6.5.8-1 (2023-10-22) riscv64 GNU/Linux

Ubuntu builders now have (bos03)
Kernel version: Linux bos03-riscv64-049 5.19.0-1021-generic #23~22.04.1-Ubuntu SMP Thu Jun 22 12:49:35 UTC 2023 riscv64

Previously qemu builders had
Kernel version: Linux riscv64-qemu-lgw01-006 5.13.0-1019-generic #21~20.04.1-Ubuntu SMP Thu Mar 24 22:36:01 UTC 2022 riscv64

I did an strace of kconfig during the build, with a hack in the rules file:
https://launchpadlibrarian.net/701718932/buildlog_ubuntu-noble-riscv64.kconfig_5.112.0-0ubuntu2ppa1_BUILDING.txt.gz

brk(0x555581298000) = 0x555581298000
read(4, "<?xml version=\"1.0\" encoding=\"UT"..., 16384) = 2415
read(4, "", 13969) = 0
read(4, "", 16384) = 0
read(4, "", 16384) = 0
mmap(NULL, 65536, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fffa88bc000
riscv_flush_icache(0x7fffa88cbc10, 0x7fffa88cbe3c, 0) = 0
riscv_flush_icache(0x7fffa88cb6b0, 0x7fffa88cb980, 0) = 0
riscv_flush_icache(0x7fffa88cb238, 0x7fffa88cb4a0, 0) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xffffffff8126e464} ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)
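
One possible reading of the fault address, offered as a rough sketch rather than a diagnosis: the low 32 bits of si_addr (0x8126e464) fall just below the heap top reported by brk (0x555581298000), so the address looks like what you would get if a 64-bit heap pointer were truncated to 32 bits and sign-extended. The variable names below are made up for illustration, and the assumption that the bad pointer originated in the heap is only a guess from the numbers in this trace.

```python
# Values copied from the strace output above.
heap_top = 0x555581298000        # return value of brk()
fault_addr = 0xFFFFFFFF8126E464  # si_addr from the SIGSEGV line

MASK64 = (1 << 64) - 1


def sign_extend_32(value: int) -> int:
    """Keep the low 32 bits of value and sign-extend them to 64 bits."""
    low = value & 0xFFFFFFFF
    if low & 0x80000000:
        low |= 0xFFFFFFFF00000000
    return low & MASK64


# Hypothetical original pointer: upper 32 bits taken from the heap top,
# lower 32 bits taken from the faulting address.
candidate = (heap_top & ~0xFFFFFFFF) | (fault_addr & 0xFFFFFFFF)

print(hex(candidate))                           # 0x55558126e464
print(candidate < heap_top)                     # True: below the heap top
print(sign_extend_32(candidate) == fault_addr)  # True
```

If that guess is right, it would point at generated code that materialises an address with 32-bit sign-extending arithmetic, but nothing in the trace proves this.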

Rik Mills (rikmills) said :
#8

Not 100% certain this is related, but a couple of sources in main now segfault in a suspiciously similar fashion, without any significant changes to the build parameters. These built ok before the switch to bos03.

https://launchpad.net/ubuntu/+source/libsoup3/3.4.4-2

https://launchpad.net/ubuntu/+source/libsoup2.4/2.74.3-2

William Grant (wgrant) said :
#9

I've only had a few minutes to investigate, but it's crashing in the pcre2 JIT, which was enabled on riscv64 on 2023-08-29. The pcre2 test suite also shows the segfault when I turn off the default riscv64 nocheck. This segfault doesn't appear on real hardware, but I haven't tried on the old qemu version yet.
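
For anyone who wants a small standalone check instead of a full package build, here is a minimal sketch that exercises the pcre2 JIT through ctypes. This is not the pcre2 test suite mentioned above; it assumes the 8-bit shared library (libpcre2-8) is installed and uses its exported `_8`-suffixed functions. The function name `pcre2_jit_smoke_test` and the pattern and subject are made up for illustration. On an affected builder the expectation would be a crash inside `pcre2_match_8`, but that is untested here.

```python
import ctypes
import ctypes.util

PCRE2_CONFIG_JIT = 1      # constant from pcre2.h
PCRE2_JIT_COMPLETE = 0x1  # constant from pcre2.h


def pcre2_jit_smoke_test(pattern=b"a+b", subject=b"xxaaab"):
    """Return None if libpcre2-8 is missing, else (jit_built_in, jit_rc, match_rc)."""
    path = ctypes.util.find_library("pcre2-8")
    if path is None:
        return None
    lib = ctypes.CDLL(path)
    vp, u32, sz = ctypes.c_void_p, ctypes.c_uint32, ctypes.c_size_t

    # Declare signatures so pointers are not truncated to int.
    lib.pcre2_config_8.argtypes = [u32, vp]
    lib.pcre2_config_8.restype = ctypes.c_int
    lib.pcre2_compile_8.argtypes = [
        ctypes.c_char_p, sz, u32,
        ctypes.POINTER(ctypes.c_int), ctypes.POINTER(sz), vp,
    ]
    lib.pcre2_compile_8.restype = vp
    lib.pcre2_jit_compile_8.argtypes = [vp, u32]
    lib.pcre2_jit_compile_8.restype = ctypes.c_int
    lib.pcre2_match_data_create_from_pattern_8.argtypes = [vp, vp]
    lib.pcre2_match_data_create_from_pattern_8.restype = vp
    lib.pcre2_match_8.argtypes = [vp, ctypes.c_char_p, sz, sz, u32, vp, vp]
    lib.pcre2_match_8.restype = ctypes.c_int
    lib.pcre2_match_data_free_8.argtypes = [vp]
    lib.pcre2_match_data_free_8.restype = None
    lib.pcre2_code_free_8.argtypes = [vp]
    lib.pcre2_code_free_8.restype = None

    # Was the library built with JIT support at all?
    jit_built_in = u32(0)
    lib.pcre2_config_8(PCRE2_CONFIG_JIT, ctypes.byref(jit_built_in))

    errcode, erroffset = ctypes.c_int(0), sz(0)
    code = lib.pcre2_compile_8(pattern, len(pattern), 0,
                               ctypes.byref(errcode), ctypes.byref(erroffset), None)
    if not code:
        raise RuntimeError(f"pcre2_compile failed with error {errcode.value}")

    # Ask for JIT compilation; a non-zero result means the interpreter is used.
    jit_rc = lib.pcre2_jit_compile_8(code, PCRE2_JIT_COMPLETE)

    match_data = lib.pcre2_match_data_create_from_pattern_8(code, None)
    # pcre2_match uses the JIT-compiled code automatically when it exists.
    match_rc = lib.pcre2_match_8(code, subject, len(subject), 0, 0, match_data, None)

    lib.pcre2_match_data_free_8(match_data)
    lib.pcre2_code_free_8(code)
    return bool(jit_built_in.value), jit_rc, match_rc


result = pcre2_jit_smoke_test()
print(result)
```

A positive `match_rc` means the match succeeded; `jit_rc == 0` means the JIT path was actually taken rather than the interpreter fallback.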

Emil Renner Berthing (esmil) said :
#10

On our 6.5 kernels in mantic we had a bug that would make stress-ng and Rust programs segfault before 6.5-14.14.1:
https://bugs.launchpad.net/bugs/2042388

That could explain some of the crashes, but I see that some of them were on a 5.19 kernel.

Erich Eickmeyer (eeickmeyer) said :
#11

Is it possible that what we're seeing is a major bug in qemu's riscv emulation?

Gianfranco Costamagna (costamagnagianfranco) said :
#12

Regardless of whether this is a bug in qemu, a bug in the kernel, or a consequence of not having real HW, to speed things up I reverted the PCRE2 change William mentioned and uploaded it to the archive.
https://launchpad.net/ubuntu/+source/pcre2/10.42-4ubuntu1

If JIT is not yet fully ready, it is better not to have it than to have only half the archive building.

Gianfranco Costamagna (costamagnagianfranco) said :
#13

I also explicitly enabled the test suite on riscv64 for the pcre2 build, so that we should catch regressions earlier.

Michael Hudson-Doyle (mwhudson) said :
#14

The new pcre2 seems to have helped, things seem to be OK now? Or is this still an issue?

Gianfranco Costamagna (costamagnagianfranco) said :
#15

I propose to close this, and re-evaluate once we have real riscv64 hw.

Gianfranco Costamagna (costamagnagianfranco) said :
#16

It looks like 10.43-1 is not having trouble on riscv64 (or something else was fixed in the meantime):
https://launchpad.net/~costamagnagianfranco/+archive/ubuntu/costamagnagianfranco-ppa/+sourcepub/15866181/+listing-archive-extra
