Timeouts building PPA packages that used to build flawlessly

Asked by Václav Šmilauer

Hi, for several months I have been successfully building packages of the Yade simulation program in https://launchpad.net/~yade-users/+archive/ppa. In recent weeks, however, every attempt to build the package has failed with a strange timeout; it looks as if the build process stalls and is then killed.

I have no problems building the package locally. I tried the following, without any change in the outcome:

1. Explicitly disabling parallel builds (I know old versions of fakeroot had issues with them). By default I used the value passed in DEB_BUILD_OPTIONS, which is 2 according to the build logs.
2. Reducing the number of files in one compilation unit, to decrease RAM usage (it might go over 4GB for the largest compilation unit, and over 12GB for parallel builds).
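The two items interact: with parallel=2 and roughly 4GB per job for the largest unit, the build can need about 8GB at once. A minimal sketch of how a build script could cap the `parallel=N` value from DEB_BUILD_OPTIONS by the available RAM, assuming the ~4GB per-job peak quoted above; the helper names `requested_jobs` and `safe_jobs` are hypothetical, not part of any Debian tool:

```python
import math
import os

# Assumption: peak memory of one compiler job, taken from the ~4GB
# figure reported for the largest compilation unit.
PEAK_GB_PER_JOB = 4.0


def requested_jobs(deb_build_options: str) -> int:
    """Return N from a 'parallel=N' token in DEB_BUILD_OPTIONS, else 1."""
    for token in deb_build_options.split():
        if token.startswith("parallel="):
            value = token.split("=", 1)[1]
            if value.isdigit() and int(value) > 0:
                return int(value)
    return 1


def safe_jobs(deb_build_options: str, ram_gb: float,
              peak_gb_per_job: float = PEAK_GB_PER_JOB) -> int:
    """Cap the requested job count so that jobs * peak memory fits in RAM.

    Always returns at least 1: one job has to run even if it will swap.
    """
    fit = math.floor(ram_gb / peak_gb_per_job)
    return max(1, min(requested_jobs(deb_build_options), fit))


if __name__ == "__main__":
    opts = os.environ.get("DEB_BUILD_OPTIONS", "")
    print(safe_jobs(opts, ram_gb=12.0))
```

On a machine with 2GB this returns 1 job regardless of `parallel=2`. Capping the job count cannot help when a single job already needs more than the machine has, which is why item 2 (smaller units) matters as well.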

Build logs of 2 of those builds are here:

https://launchpad.net/~yade-users/+archive/ppa/+sourcepub/1005379/+listing-archive-extra
https://launchpad.net/~yade-users/+archive/ppa/+sourcepub/1009469/+listing-archive-extra

Could you investigate in more detail what happened, or give me some advice on how to avoid this problem?

Thanks, Václav

---

Relevant part of one of the build logs:

[ ... commands run during the build, this is the last one ...]
cp -f py/ymport.py debian/yade-bzr2095-dbg/usr/lib/yade-bzr2095-dbg/py/yade/ymport.py

Session terminated, killing shell...make: *** [install] Terminated
 ...killed.
scons: *** [debian/build-bzr2095-dbg/plugins.os] Build interrupted.
scons: Build interrupted.
scons: building terminated because of errors.
scons: writing .sconsign file.
semop(1): encountered an error: Invalid argument
Build killed with signal 15 after 150 minutes of inactivity
******************************************************************************
Build finished at 20100324-0150
FAILED [dpkg-buildpackage died]

[ ...cleanup commands... ]

Question information

Language:
English
Status:
Solved
For:
Launchpad itself
Assignee:
Build Daemon Maintainers
Solved by:
Václav Šmilauer
Michael Nelson (michael.nelson) said :
#1

Hi Václav,

Looking at:

https://edge.launchpad.net/~yade-users/+archive/ppa/+builds?build_text=&build_state=failed

it seems that only the Karmic builds of yade-bzr are failing, with these exceptions:

28/03/2010 - took 9 hrs (https://edge.launchpad.net/~yade-users/+archive/ppa/+build/1585814), although the amd64 build failed as per usual.
10/03/2010 - took 24 minutes! (https://edge.launchpad.net/~yade-users/+archive/ppa/+build/1553992)

I'm not sure what might have changed on the buildd environments, but the "Build killed with signal 15 after x minutes of inactivity" seems to imply that the build process itself is stalling sometimes? I'm assuming you've done this build in pbuilder etc.

I'll see if LaMont (our buildd specialist) can take a look.

Michael Nelson (michael.nelson) said :
#2

Didn't mean to set the question as "Answered" with my previous comment.

Václav Šmilauer (eudoxos) said :
#3

Michael, thanks for your comments. The 24-minute build was broken, as most of the program wasn't compiled at all.

I double-checked in a local pbuilder; the bottleneck is, as with my usual local builds, the compilation of libplugins.so (it happens twice, once for the debug build and once for the optimized build):

* the compiler takes over 4GB of RAM on that file
* as assembles it for about 5 minutes (the debug one; the optimized one goes fast)
* ld links it for 5 minutes (using the vanilla binutils linker; it is about 10x faster with binutils-gold, which I use locally, but I don't want to add it to build-deps since it didn't exist in older distributions)

All this on a pretty snappy 2.66GHz i7 with 12GB RAM (DDR3); the whole build took about 25 minutes in a pbuilder chroot on a ramdisk.

Could it be that the buildbots have less than 4GB of RAM and start swapping during the build? That would slow it down 1000x, explaining the timeout, and also why https://launchpad.net/~yade-users/+archive/ppa/+build/1585814 succeeded but took 9 hours (!!). Perhaps there was no pressure from the queue system (I don't know how it works behind the scenes) and it let the build run.

I can split the build of libplugins.so into several translation units to decrease RAM usage, but I would like a pointer to a likely cause first, so as not to burden Launchpad with useless experiments.
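The thread does not show how Yade's SCons setup combines plugin sources into plugins.os. A minimal sketch of the splitting idea, assuming a "unity build" in which one generated .cpp file #includes many plugin sources; the function names, the `plugins-N.cpp` file names and the relative include paths are all hypothetical:

```python
from pathlib import Path
from typing import List, Sequence


def split_into_units(sources: Sequence[str], n_units: int) -> List[List[str]]:
    """Split sources into up to n_units contiguous, near-equal groups."""
    if n_units < 1:
        raise ValueError("n_units must be >= 1")
    n = max(1, min(n_units, len(sources)))
    size, extra = divmod(len(sources), n)
    groups: List[List[str]] = []
    start = 0
    for i in range(n):
        # The first `extra` groups take one additional file each.
        end = start + size + (1 if i < extra else 0)
        groups.append(list(sources[start:end]))
        start = end
    return groups


def unit_text(group: Sequence[str]) -> str:
    """Source of one generated translation unit that #includes its members."""
    return "".join(f'#include "{src}"\n' for src in group)


def write_units(sources: Sequence[str], n_units: int, out_dir: str,
                prefix: str = "plugins") -> List[Path]:
    """Write prefix-0.cpp, prefix-1.cpp, ... into out_dir; return their paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, group in enumerate(split_into_units(sources, n_units)):
        path = out / f"{prefix}-{i}.cpp"
        path.write_text(unit_text(group))
        paths.append(path)
    return paths
```

With, say, four units instead of one, each compiler invocation holds roughly a quarter of the plugin code in memory. The trade-off is that shared headers are parsed once per unit, so total CPU time goes up while peak RAM goes down.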

Thanks.

Julian Edwards (julian-edwards) said :
#4

On Monday 29 March 2010 08:11:46 you wrote:
> I'm not sure what might have changed on the buildd environments, but the
> "Build killed with signal 15 after x minutes of inactivity" seems to
> imply that the build process itself is stalling sometimes? I'm assuming
> you've done this build in pbuilder etc.
>

This signal happens when there's no build output for a while IIRC.
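Julian's explanation describes an inactivity timer: every line of build output resets it, and if it runs out the build is terminated (signal 15 is SIGTERM on Linux). A toy model, assuming the timer resets on each output line and fires after 150 minutes of silence (the figure in the quoted log); the actual buildd implementation is not shown in this thread and `watchdog_fire_time` is a hypothetical name:

```python
from typing import Optional, Sequence


def watchdog_fire_time(output_times: Sequence[float], timeout: float,
                       start: float = 0.0,
                       end: Optional[float] = None) -> Optional[float]:
    """Return when an inactivity watchdog would fire, or None if it would not.

    output_times: sorted times (e.g. minutes) at which the build printed output.
    timeout: allowed silence before the watchdog kills the build.
    end: time the build exits on its own (None means it never finishes).
    """
    last = start
    for t in output_times:
        if t - last > timeout:
            # The silent gap before this line was too long.
            return last + timeout
        last = t
    deadline = last + timeout
    if end is None or end > deadline:
        return deadline
    return None
```

A compile or link step that prints nothing while it swaps behaves like a long gap in `output_times`: `watchdog_fire_time([0, 10, 200], 150)` fires at minute 160, even though the step would eventually have produced output at minute 200.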

Václav Šmilauer (eudoxos) said :
#5

@Julian: thanks. What could be causing that on the build machine? Swapping? As I said, it compiles normally in a local pbuilder.

Julian Edwards (julian-edwards) said :
#6

I would only be guessing but yes, swapping would slow it down a lot.

Václav Šmilauer (eudoxos) said :
#7

Can someone give (non-guess) information on the RAM of the build nodes?

(If not, I can make debian/rules run free or similar, but I imagine someone knows this?)
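The check floated here could also read /proc/meminfo directly, which is where free gets its numbers on Linux. A minimal sketch, assuming it would be called from some debian/rules target so that its one-line summary ends up in the build log; the function names are hypothetical:

```python
from pathlib import Path
from typing import Dict

MEMINFO = Path("/proc/meminfo")


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: numeric value}.

    Most fields are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Field: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def kb_to_gib(kb: int) -> float:
    return kb / (1024 * 1024)


def memory_summary(text: str) -> str:
    """One line suitable for echoing into the build log."""
    info = parse_meminfo(text)
    return (f"MemTotal: {kb_to_gib(info.get('MemTotal', 0)):.2f} GiB, "
            f"SwapTotal: {kb_to_gib(info.get('SwapTotal', 0)):.2f} GiB")


if __name__ == "__main__" and MEMINFO.exists():
    print(memory_summary(MEMINFO.read_text()))
```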

Julian Edwards (julian-edwards) said :
#8

The build machines are not identical, the memory varies between 1-2GiB.

LaMont Jones (lamont) said :
#9

Most of the buildd machines have at least 1.5GB, though I suspect that at least one has 0.75GB of RAM. More than 2GB of RAM would be rare, and over 3.5GB would surprise me.
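These figures can be set against the more-than-4GB peak Václav measured in #3. A back-of-the-envelope sketch, assuming that peak is the only large consumer and ignoring the OS and other processes (so the real shortfall is larger); `swap_needed_gb` is a hypothetical helper:

```python
# Peak memory of the largest compile step, per the measurement in #3.
PEAK_COMPILE_GB = 4.0


def swap_needed_gb(ram_gb: float, peak_gb: float = PEAK_COMPILE_GB) -> float:
    """Lower bound on memory the compile step must page out to swap."""
    return max(0.0, peak_gb - ram_gb)


# RAM sizes mentioned for the buildd machines.
for ram in (0.75, 1.5, 2.0, 3.5):
    print(f"{ram:>4} GB RAM -> at least {swap_needed_gb(ram):.2f} GB in swap")
```

Even the largest machine mentioned would page out part of the compiler's working set, which fits both the long silent periods and the one 9-hour success.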

Václav Šmilauer (eudoxos) said :
#10

Thanks, LaMont. I think it is clear now.