[users at bb.net] Suggestions for Configuration

Jan-Benedict Glaw jbglaw at lug-owl.de
Fri Oct 28 20:02:41 UTC 2022


I'm doing mass-builds for a number of projects (Binutils/GDB, GCC,
SIMH, Linux Kernel, NetBSD) using Laminar [1] right now. The Laminar
server is running on the very same host that runs the actual build
scripts. After some discussions with GCC folks, I'd like to
additionally setup a Buildbot instance replicating my current Laminar
setup (which works perfectly well for my needs). Also, it would be
nice to spread the load across multiple builder machines.

  After skimming through the docs, I'm left with more questions than
answers, so (sorry for the lengthy email!) I'd like to describe my
current setup and get suggestions on how it could be matched with
Buildbot features, external scripts, ...

  Let's start with "jobs". Laminar uses a jobs directory that contains
*.run files, each describing a single job. My setup uses two
fundamentally different job types:

  * "admin" jobs that create other jobs; and
  * regular jobs (symlinks to a very small number of actual
    `*.run` files that each implement one job).

I'll follow with an overview of both of these job categories and with
details about the actual regular jobs.

 Admin Jobs (generating actual jobs)
  * admin-docker: This job symlinks a few "docker-xxxxx" scripts
    to create proper Docker containers that contain the minimum
    required packages to build "binutils", "gdb", "gas", "gcc",
    "simh", ..., resulting in a number of equally named
    containers. The `xxxxx` resolves to the final container name, as
    well as to the name of a variable (sourced from a shell fragment)
    containing the package list needed for that container. All build
    containers are based on Debian unstable and rebuilt every few
    days.

  * admin-toolchain: This job will create "binutils-xxxx", "gas-xxxx",
    "gdb-xxxx" and "gcc-xxxxx" jobs to build these projects for one
    specific target. The target names come from an internal list,
    extended by the targets listed in `[gcc]/contrib/config-list.mk`.

  * admin-netbsd: Calls `[netbsd-src]/build.sh list-arch`, parses
    the presented list of architectures and machines, and creates jobs
    named `netbsd-{arch}-{machine}` (calling a script that
    cross-builds NetBSD from within a Linux Docker container) or
    `nnetbsd-{arch}-{machine}` (to build it from within a Qemu-amd64 VM
    running the latest NetBSD).

  * admin-simh: Creates jobs for all machine simulators by `grep`ping
    through SIMH's `[simh]/makefile`, resulting in a number of
    `simh-{machine}` jobs.
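As a sketch of how such a generator job might look (the makefile
pattern and all file names below are my own assumptions for
illustration, not SIMH's actual layout):

```shell
# Sketch of an "admin-simh"-style generator job: grep simulator names
# out of a makefile and create one symlinked job per machine.
set -eu

work=$(mktemp -d)
mkdir -p "$work/jobs"

# Stand-in for [simh]/makefile with two simulator rules.
cat > "$work/makefile" <<'EOF'
vax : ${BIN}vax${EXE}
pdp11 : ${BIN}pdp11${EXE}
EOF

# One shared implementation script that all generated jobs link to.
: > "$work/jobs/simh.run"

# Grep the "name : ..." rules and symlink simh-{machine}.run for each.
grep -E '^[a-z0-9]+ :' "$work/makefile" | cut -d' ' -f1 |
while read -r machine; do
  ln -sf simh.run "$work/jobs/simh-${machine}.run"
done

ls "$work/jobs"
```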

 Common "real" job concepts
The Binutils, GAS, GCC, SIMH and NetBSD-cross-compiled-on-Linux jobs
are quite similar:

  * They're run in a per-project Docker container, always fresh
    (`docker run --rm ...`) with a few volumes mounted:

      * /var/cache/git: This is where the respective GIT repos are
	mirrored. Each job always starts with a 100% clean environment.

      * /var/lib/laminar/cfg/scripts: Some helper scripts that are
        usually `source`d. This is, for example, where it's defined
        which packages to install in the prepared Docker containers,
        which APT-Cacher (proxy) to use, and which compiler to use.

     * /var/cache/laminar: Build results (tarballs containing usable
       binaries) are stored here. For that, the current HEAD commit id
       is placed into the filename and a symlink is generated, e.g.:
       | root at lili:/var/cache/laminar# ls -l gcc-vax-linux*
       | -rw-r--r-- 1 laminar laminar 122006662 Oct 28 11:45 gcc-vax-linux-0607307768b66a90e27c5bc91a247acc938f070e.tar.gz
       | -rw-r--r-- 1 laminar laminar 121828373 Oct 26 00:07 gcc-vax-linux-59af5e0bdadb1e47cb7f727bc608b2814e48fe27.tar.gz
       | -rw-r--r-- 1 laminar laminar 121986587 Oct 26 01:38 gcc-vax-linux-65f5fa23844b55e3c6359e1612e6fdd4d10950d0.tar.gz
       | -rw-r--r-- 1 laminar laminar 121757721 Oct 26 00:07 gcc-vax-linux-6ce0823721d476cabb2007fecc12c07202325e17.tar.gz
       | -rw-r--r-- 1 laminar laminar 121753321 Oct 26 00:07 gcc-vax-linux-7858368c3f3875f6bf634119e5731dc3c808a7c3.tar.gz
       | -rw-r--r-- 1 laminar laminar 121967482 Oct 26 00:07 gcc-vax-linux-7c55755d4c760de326809636531478fd7419e1e5.tar.gz
       | -rw-r--r-- 1 laminar laminar 121917312 Oct 26 00:08 gcc-vax-linux-92ef7822bfd4ea3393e0a1dd40b4abef9fce027f.tar.gz
       | -rw-r--r-- 1 laminar laminar 121928547 Oct 26 00:07 gcc-vax-linux-b3c98d6a59a6dcd5b0b52bd5676b586ef4fe785f.tar.gz
       | -rw-r--r-- 1 laminar laminar 121832342 Oct 26 00:08 gcc-vax-linux-b4a4c6382b14bc107d6d95ff809f3e9cd71944e7.tar.gz
       | -rw-r--r-- 1 laminar laminar 121926134 Oct 26 00:07 gcc-vax-linux-ba281da28d34f9a78a07f6ee56ad2c754447966e.tar.gz
       | -rw-r--r-- 1 laminar laminar 121828837 Oct 26 00:07 gcc-vax-linux-c2565a31c1622ab0926aeef4a6579413e121b9f9.tar.gz
       | -rw-r--r-- 1 laminar laminar 121869027 Oct 26 00:07 gcc-vax-linux-d45af5c2eb1ba1e48449d8f3c5b4e3994a956f92.tar.gz
       | -rw-r--r-- 1 laminar laminar 121761826 Oct 26 00:07 gcc-vax-linux-e5139d18dfb8130876ea59178e8471fb1b34bb80.tar.gz
       | lrwxrwxrwx 1 laminar laminar        61 Oct 28 11:45 gcc-vax-linux.tar.gz -> gcc-vax-linux-0607307768b66a90e27c5bc91a247acc938f070e.tar.gz
       The symlink is only updated for `master` builds. That way,
       especially the `linux-*` jobs can (by default) fetch the
       most recent compiler, but they can also use an older one.

     * /var/lib/laminar/cfg/patches: This directory contains my
       current pile of patches. They have a name like either
       `${JOB}--description.patch` (for patches specific to one actual
       job) or `generic-${JOB_PREFIX}--description.patch` (applying to
       all similar jobs). As a GCC example, the former may be for
       GCC with --target=pdp11-aout only, while the latter is applied
       for all GCC builds.

  * A few environment variables are passed. They can be added when
    queueing a job with `laminarc queue ....`:

      * rev: The GIT commit ID or branch name to build. Helps with
        bisecting after I find some breakage.

      * compiler_suite: One of the scripts (in the directory above)
        is `source`d and uses the ${compiler_suite} variable
        (containing one of "gcc-snapshot" (default), "gcc-system",
        "clang-11", "clang-13", "clang-14", "clang-15") to set up
        $CC, $CXX, $CPP, $LD and $LD_LIBRARY_PATH accordingly.

      * The current ${JOB} name.

  * The Docker container is actually started like:
    | docker run --rm --interactive --volume ... --env "rev=${rev}" ... \
    |     /bin/bash <<'DOCKER_SCRIPT'
    Then, the actual build script (set -ex, source the compiler-choosing
    script, `git clone`, `git checkout $rev`, `apply_patches`, `make`,
    `make test`, `tar czf /var/cache/laminar/${JOB}-${rev}.tar.gz installdir`)
    is fed into that running `bash`.

  * Obviously, the script has to parse ${JOB}, e.g. "simh-vax" into
    "simh" (--> Docker image name) and "vax" (the actual target to build
    within this job). For the toolchain jobs, after the initial "gcc-",
    "binutils-", "gas-" or "gdb-" prefix, there's the `--target=...`
    value to be used.
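A minimal sketch of that ${JOB} parsing, using plain POSIX parameter
expansion (the helper name itself is hypothetical):

```shell
# Hypothetical helper splitting a job name into Docker image and target.
split_job() {
  IMAGE=${1%%-*}    # everything before the first dash -> image name
  TARGET=${1#*-}    # everything after it -> target / --target= value
}

split_job simh-vax
echo "image=$IMAGE target=$TARGET"        # image=simh target=vax

split_job gcc-pdp11-aout
echo "configure --target=${TARGET}"       # --target=pdp11-aout
```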

 Specific different job scripts
These jobs build NetBSD by starting a Qemu amd64 NetBSD VM. It is also
automatically installed (just like the Docker containers are prepared,
but I call the install script manually). Each of these VMs uses the
(prepared) base disk read-only with a read-writeable overlay, so the
VMs are fresh with every invocation as well. Similar to the Docker
containers started with a `bash` that gets its input fed in from
stdin, the Qemu VMs are ssh'ed into, and `sh` is started with a number
of variables (just like with Docker above). However, the fed-from-stdin
script does _not_ prepare the repo. That's done beforehand by a
`tar cf ... | ssh build at netbsd-vm-NN "tar xf -"`. Also, after the
`ssh sh` call finishes successfully, the `nnetbsd-*.run` script will
download the resulting NetBSD ISO from the VM and shut it down
(destroying the r/w overlay).

These jobs build one of Linux's defconfig files. We're jumping through
some hoops to use not the system compiler(s) but one (or two) of the
self-built GCCs. That's done by first doing a `make oldconfig`, then
parsing the resulting `.config` file to deduce the needs:

  * What (GNU) CPU name?
  * 32/64 bits?
  * little / big endianness?

Based on that, the proper tarballs (gas-*.tar.gz,
binutils-*.tar.gz and gcc-*.tar.gz) are extracted, placed into $PATH,
and the final build can take place. A successful build creates a
tarball containing the kernel and its modules.
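The .config parsing could be sketched like this (the Kconfig symbols
are real ones from the kernel's MIPS port, but the deduction logic and
tarball naming are only my guess at the approach described):

```shell
# Sketch only: deduce toolchain needs from a generated .config.
set -eu
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
CONFIG_MIPS=y
CONFIG_32BIT=y
CONFIG_CPU_LITTLE_ENDIAN=y
EOF

# 32/64 bits?
if grep -q '^CONFIG_64BIT=y' "$cfg"; then bits=64; else bits=32; fi
# little / big endianness?
if grep -q '^CONFIG_CPU_LITTLE_ENDIAN=y' "$cfg"; then endian=el; else endian=eb; fi
# GNU CPU name (hard-coded here; a real script would map the CONFIG_<arch> line).
cpu=mips

triplet="${cpu}${endian}-linux"
echo "extract gcc-${triplet}.tar.gz and binutils-${triplet}.tar.gz (${bits}-bit)"
```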

These are small jobs, basically doing:
| # Build docker image.
| log_execute "docker pull"  docker image pull "${DOCKER_CONTAINER}"
| log_execute "docker build" docker build --no-cache                              \
|                                         --pull                                  \
|                                         --tag "${BUILD_IMAGE_NAME}"             \
|                                         --build-arg="http_proxy=${http_proxy}"  \
|                                         - << __EOF__
| # Run installation twice as apt-cacher seems to die at 1 GB or the like...
| RUN     rm -f /etc/apt/apt.conf.d/docker*                                                                                       && \
|         apt-get update && { apt-get -y install ${!BUILD_VAR} < /dev/null; apt-get -y install ${!BUILD_VAR} < /dev/null; }       && \
|         apt-get clean
| __EOF__

Right now, I differentiate between two kinds of artifacts:

  * Final build results (tarballs) which are collected for actual
    further use (e.g. gas/binutils/ld/gcc -> used to build Linux;
    the SIMH machines to run a SIMH-VAX instance with NetBSD, both
    built through Laminar from the most recent sources). These are
    stored by the jobs themselves in the /var/cache/laminar/ directory.

  * Laminar can run an `after` script, where I collect all *.sum,
    config.log, config.h and test-suite.log files into Laminar's
    "official" artifacts storage. This is accessible through its web
    interface.

  * This differentiation has historical reasons as, that way, it was
    most convenient to have access to all the tarballs.

  * For failed builds, I echo one line to a file
    (`/var/cache/laminar/laminar-build-changes/${JOB}`) to keep track
    (in a script-friendly manner) of failed builds (i.e. which one
    failed first, and when were attempts made to re-check this build?)
    This per-job file is deleted upon success.
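A sketch of that failure-tracking scheme; the record format is an
assumption (the mail only says one line is appended per failed build):

```shell
# Sketch of the per-job failure file under laminar-build-changes/.
set -eu
track=$(mktemp -d)   # stand-in for /var/cache/laminar/laminar-build-changes

record_result() {  # record_result <job> <run-number> <ok|fail>
  if [ "$3" = ok ]; then
    rm -f "$track/$1"                            # success deletes the file
  else
    echo "$(date -u +%FT%TZ) run=$2 status=$3" >> "$track/$1"
  fi
}

record_result gcc-vax-linux 101 fail   # first failure creates the file
record_result gcc-vax-linux 102 fail   # re-check attempts append to it
record_result gcc-vax-linux 103 ok     # success wipes it again
```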

 Goals for a Buildbot Setup
  * Preferably keep the scheme of job-generating jobs.

  * Use multiple worker hosts. (All shall be similar, i.e. have access
    to the very same Docker containers or NetBSD Qemu VMs.)

  * All jobs are meant to build single-threaded (to keep the log output
    as stable as possible), so on a typical worker host, there
    can be many worker instances running in parallel, each offering
    one dockerized build environment or a Qemu VM (for running NetBSD).

  * Put the generated Docker containers somewhere central and let the
    workers pull from there. (Probably easy, but I've never done
    that.)
  * Clone GIT repos over the network instead of from a locally-available
    directory. Alternatively: mirror all repos to the build hosts?

  * Centrally host the "thick" artifacts (binaries) as well as the
    small stuff (test results etc.)

  * Give workers a way to fetch the "thick" artifacts (ie. the Linux
    builds need access to the toolchain tarballs.)

 Nice-to-have Goals
With my current setup, I find lots of issues. Most often, I report
them to the person that "broke" it, or reply publicly to the
submitted patch. I've got some scripts to help with tracking down
offending commits, as I'm usually too lazy to look through the commits
since the last successful build if that's more than a few commits.

  * For GCC/gas/binutils/gdb/simh, I've got a script to hand in a
    last-known-good and a known-bad commit id along with a modulo
    number. That way, every n'th commit between [good,bad] is queued
    as kind of a slightly parallelized `git bisect`. (Actually, I'm
    often using `git bisect` along with this.)

  * For Linux jobs, I can queue jobs in a similar way, but for
    either Linux kernel commit IDs _or_ for a list of self-built
    compilers to test. (That's why all "thick" build results have the
    commit id in their filename and a symlink is created iff it is a
    fresh master build.)
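The modulo-queueing idea could be sketched as follows; `laminarc` is
stubbed out with echo, and the commit list (which would really come
from `git rev-list --reverse good..bad`) is faked with seq:

```shell
# Sketch: queue every n'th commit between a good and a bad revision.
set -eu
modulo=10
revlist=$(mktemp)
seq 1001 1060 > "$revlist"    # stand-in for: git rev-list --reverse good..bad

queued=0; i=0
while read -r rev; do
  i=$((i + 1))
  if [ $((i % modulo)) -eq 0 ]; then
    echo "laminarc queue gcc-vax-linux rev=$rev"   # stubbed-out queueing
    queued=$((queued + 1))
  fi
done < "$revlist"
echo "queued $queued of $(wc -l < "$revlist") commits"
```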

Well... That's a brief description of my current setup. It fits my
needs and works, but as mentioned, after discussions with some GCC
folks, I'd like to give Buildbot a try as an addition to my current
setup.
  What I'm searching for are, most importantly, common best practices.
That's something I'm missing from the manual (or from a good set of
examples.) In fact, the Laminar docs [2] had some pretty neat hints
that match many of my ideas quite well.

  * How do I build variants of a generic job?

  * How can I run the build script inside a Docker container (a
    fresh one on every invocation)?

  * How do I spread /n/ instances on /m/ hosts? (With possibly
    different /n/ as there might be a different number of CPUs or
    insufficient RAM to serve as many jobs as there are CPUs.)

  * How can I prepare Docker containers centrally and push/pull them
    to the worker nodes?

`---> ...or how is something "like that" done The Buildbot Way? Hints
to docs or actual configurations are welcome, as well as hints pointing
out completely different concepts or approaches to my builds.

Thanks a lot,
  Jan-Benedict Glaw

[1] https://laminar.ohwg.net/
[2] https://laminar.ohwg.net/docs.html
