Managing updates to debian.org systems

Initial setup

Clone the dsa-misc repository.

Create links to scripts/multi-tool/* in ~/bin.
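A minimal sketch of the linking step follows. It assumes the repository was cloned to somewhere like ~/dsa-misc. That location and the helper name link_multitool are hypothetical and are not part of dsa-misc.

```shell
#!/bin/sh
# Sketch only: symlink each dsa-misc multi-tool script into a bin directory.
# link_multitool is a hypothetical helper. Pass an absolute source path,
# because ln -s stores the target path exactly as given.
link_multitool() {
    src_dir=$1   # e.g. "$HOME/dsa-misc/scripts/multi-tool" (assumed clone location)
    bin_dir=$2   # e.g. "$HOME/bin"
    mkdir -p "$bin_dir" || return 1
    for script in "$src_dir"/*; do
        [ -f "$script" ] || continue   # skips subdirectories and an unmatched glob
        # -s: symbolic link; -f: replace any existing link of the same name
        ln -sf "$script" "$bin_dir/$(basename "$script")" || return 1
    done
}
```

Usage: link_multitool "$HOME/dsa-misc/scripts/multi-tool" "$HOME/bin". Re-running it after a git pull also picks up any newly added scripts.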

These scripts assume that you have

Most of the scripts connect to hosts as root, so they need access to your root SSH key. An exception is debian-upgrade-prepare, which connects as your normal user and therefore needs access to your standard key.

Security updates (and point releases)

Run debian-upgrade-prepare. This connects to each system, analyses the available updates and presents them for confirmation. If more than one system requires exactly the same updates, those systems are grouped together and confirmed as a unit. Finally, it prints a debian-upgrade command to the console that will install the confirmed upgrades.
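To illustrate the grouping idea, the sketch below groups hosts whose update lists are identical regardless of package order. It is not taken from debian-upgrade-prepare: the input format (one line per host, the hostname followed by the packages to upgrade) and the function name group_updates are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: group hosts that need exactly the same set of updates.
# Input (hypothetical format), one line per host: "host pkg1 pkg2 ..."
# Output, one line per distinct update set: "pkgs: host1 host2 ..."
group_updates() {
    awk '{
        host = $1
        n = NF - 1
        for (i = 2; i <= NF; i++) p[i - 1] = $i
        # insertion sort so that package order does not affect grouping
        for (i = 2; i <= n; i++) {
            v = p[i]; j = i - 1
            while (j > 0 && p[j] > v) { p[j + 1] = p[j]; j-- }
            p[j + 1] = v
        }
        key = ""
        for (i = 1; i <= n; i++) key = key (i > 1 ? " " : "") p[i]
        if (!(key in hosts)) order[++k] = key   # remember first-seen order
        hosts[key] = hosts[key] " " host
    }
    END { for (i = 1; i <= k; i++) print order[i] ":" hosts[order[i]] }'
}
```

For example, feeding it "a.debian.org openssl libc6", "b.debian.org libc6 openssl" and "c.debian.org openssl" yields one group for a and b and a separate group for c.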

The install process launches a tmux session, with one window per system to upgrade. If there are any unconfirmed samhain alerts for a system, you will need to confirm these before the upgrades are installed. After installing the upgrades, a check is made for packages that are flagged for autoremoval. Finally, the samhain database is updated and, where possible, any affected processes are restarted.

Note that if a pending kernel reboot is detected, the process restart step will be skipped.

Reboots

General purpose systems

Systems with a reboot policy of justdoit configured in LDAP can be rebooted without warning.

For a mass reboot of such systems (e.g. following a point release) run the debian-reboot-simple helper script.

Buildds

To reboot all buildds, e.g. following a point release, run the debian-reboot-buildd helper script.

To reboot an individual buildd, connect to the system and, inside a screen session, run:

    buildd-reboot [-h] <REASON>

This waits for any running build to end, asks the buildd process to terminate and then reboots the system. The optional -h flag requests a halt rather than a reboot.

If the host is a VM running in a Ganeti cluster, requesting a halt results in Ganeti automatically restarting the system. This matters when, for example, the QEMU or KVM process needs to be restarted, which a reboot from inside the VM does not achieve.

Porter boxes

Run the debian-reboot-porterboxes helper script.

Redundant services

Several services are provided by more than one machine. In these cases, it is possible to reboot the nodes separately, ideally with a delay to ensure that the system is removed from the relevant rotation(s) first. The debian-reboot-rotation helper script can facilitate this.

snapshot.debian.org

The snapshot service consists of two clusters, hosted at Sanger and Leaseweb / manda. Both clusters offer the snapshot.debian.org service over HTTP, with updates to the data happening at Sanger.

Sanger

Prerequisites

Schedule reboots for sibelius with an initial 20 minute delay, and for sallinen 10 minutes later (to ensure that sibelius is back up before sallinen). A suitable invocation is:

    FIRSTWAIT=10 HOSTWAIT=10 debian-reboot-many sibelius.debian.org sallinen.debian.org
    Hosts: 10:sibelius.debian.org 20:sallinen.debian.org
    Continue (or ^C)?
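The Hosts: preview appears to follow a simple rule. The first host is scheduled FIRSTWAIT minutes out, and each later host HOSTWAIT minutes after the one before it. The sketch below reproduces that preview so an ordering can be dry-run before invoking the real tool. reboot_schedule is a hypothetical helper inferred from the sample output, not the actual debian-reboot-many code.

```shell
#!/bin/sh
# Sketch only: reproduce the "Hosts:" preview printed by debian-reboot-many.
# Rule inferred from the sample output (an assumption): first host at
# FIRSTWAIT, each later host HOSTWAIT minutes after the previous one.
reboot_schedule() {
    offset=${FIRSTWAIT:?FIRSTWAIT must be set}
    step=${HOSTWAIT:?HOSTWAIT must be set}
    line="Hosts:"
    for host in "$@"; do
        line="$line $offset:$host"
        offset=$((offset + step))
    done
    printf '%s\n' "$line"
}
```

Running FIRSTWAIT=10 HOSTWAIT=10 reboot_schedule sibelius.debian.org sallinen.debian.org prints the same Hosts: line as the invocation above, and the same rule also matches the Leaseweb example with HOSTWAIT=5.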

Leaseweb

Prerequisites

Schedule reboots with an initial 20 minute delay, and then a 5 minute interval between hosts. A suitable invocation is:

    FIRSTWAIT=10 HOSTWAIT=5 debian-reboot-many lw01.debian.org lw02.debian.org lw03.debian.org lw04.debian.org lw09.debian.org lw10.debian.org snapshotdb-manda-01.debian.org lw08.debian.org lw07.debian.org
    Hosts: 10:lw01.debian.org 15:lw02.debian.org 20:lw03.debian.org 25:lw04.debian.org 30:lw09.debian.org 35:lw10.debian.org 40:snapshotdb-manda-01.debian.org 45:lw08.debian.org 50:lw07.debian.org
    Continue (or ^C)?

The primary goal of this ordering is to ensure that all of the storage servers are back up before lw07 and lw08, and that the database is back up before lw07.

Note that FIRSTWAIT=10 results in a delay of 10 minutes before a shutdown -r +10 is issued, thus creating a 20 minute delay from the initial invocation.
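Adding the 10 minute shutdown delay to each scheduled offset gives the approximate time, in minutes from invocation, at which each host actually goes down. The sketch below assumes that every host, not only the first, is sent shutdown -r +10 once its own wait has elapsed; the note above only states this for FIRSTWAIT. effective_reboots is a hypothetical helper.

```shell
#!/bin/sh
# Sketch only: estimate when each host reboots, in minutes after running
# debian-reboot-many. Assumes every host gets "shutdown -r +10" once its
# own scheduled wait has elapsed (hypothetical helper, not DSA code).
SHUTDOWN_DELAY=10

effective_reboots() {
    offset=${FIRSTWAIT:?FIRSTWAIT must be set}
    step=${HOSTWAIT:?HOSTWAIT must be set}
    for host in "$@"; do
        printf '%s %d\n' "$host" "$((offset + SHUTDOWN_DELAY))"
        offset=$((offset + step))
    done
}
```

With the Sanger invocation this gives sibelius at 20 minutes and sallinen at 30, matching the 20 minute initial delay and 10 minute gap described there; with the Leaseweb invocation the last host, lw07, goes down at about 60 minutes.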

Ganeti clusters

2 node x86 / ARM

Connect to the master node, and run ganeti-reboot-cluster in a root screen.

If the hosts do not require rebooting, but their QEMU processes require restarting, this can be achieved by running ganeti-shuffle-cluster (currently in adsb's home directory, but it should be integrated into ganeti-reboot-cluster).

3-or-more node ARM

Connect to the master node, and for each other node in turn:

Finally apply the above steps to the master node, and:

UBC x86

Only three nodes of the cluster should have running VMs, with either the first or last node being empty.

If the hosts do not require rebooting, but their QEMU processes require restarting, this can be achieved by running migrate-all-VMs <up|down> 1. The migration direction is down if the first node is initially empty, and up if the last node is initially empty.
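The direction rule can be written as a small decision function. This is a sketch only: migration_direction and its input (the number of running VMs on each node, listed in node order) are hypothetical, and the function merely encodes the rule stated above.

```shell
#!/bin/sh
# Sketch only: choose the migrate-all-VMs direction for the UBC cluster.
# migration_direction is hypothetical; its arguments are the number of
# running VMs on each node, listed in node order.
migration_direction() {
    [ "$#" -ge 2 ] || { echo "usage: migration_direction COUNT COUNT..." >&2; return 2; }
    first=$1
    for count in "$@"; do last=$count; done   # keep the final argument
    if [ "$first" -eq 0 ]; then
        echo down   # first node initially empty
    elif [ "$last" -eq 0 ]; then
        echo up     # last node initially empty
    else
        echo "neither the first nor the last node is empty" >&2
        return 1
    fi
}
```

For example, migration_direction 0 5 4 6 prints down, and the result would then be passed as the first argument to migrate-all-VMs.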

ppc64el

We have two single-node Ganeti clusters running ppc64el: pijper and prokofiev.

As there is only one node in each cluster, the VMs must be shut down in order to reboot the host.