Managing updates to systems

Initial setup

Clone the dsa-misc repository.

Create links to scripts/multi-tool/* in ~/bin.
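The two steps above could be scripted roughly as follows. This is a minimal sketch: the clone location (and the DSA_MISC_DIR override) are assumptions, so adjust to wherever you cloned dsa-misc.

```shell
# Sketch of the initial setup; the clone location is an assumption.
set -e
clone_dir="${DSA_MISC_DIR:-$HOME/dsa-misc}"   # assumed clone location
mkdir -p "$HOME/bin"
for script in "$clone_dir"/scripts/multi-tool/*; do
    [ -e "$script" ] || continue              # glob matched nothing; skip
    ln -sf "$script" "$HOME/bin/$(basename "$script")"
done
echo "links refreshed in $HOME/bin"
```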

These scripts assume that you have

Security updates (and point releases)

Run debian-upgrade-prepare. This will connect to each system, analyse the available updates and present them for confirmation. If more than one system requires exactly the same set of updates, those systems are grouped together and confirmed as a unit. Finally, a command that will install the confirmed upgrades is printed to the console.
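The grouping step could look roughly like this. This is purely an illustration of the idea, not the real implementation; the host and package names are made up.

```shell
# Hypothetical illustration: systems whose pending update lists are
# identical are collected together so they can be confirmed as a unit.
pending="host1 libssl3 openssl
host2 libssl3 openssl
host3 linux-image-amd64"
groups=$(printf '%s\n' "$pending" |
    awk '{h=$1; sub(/^[^ ]+ /, ""); g[$0]=g[$0] h " "} END {for (u in g) print g[u]"-> "u}')
echo "$groups"
```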

The install process will launch a tmux session, with one window per system to be upgraded. If there are any unconfirmed samhain alerts for a system, you will need to confirm them before the upgrades are installed. After the upgrades are installed, a check is made for packages flagged for autoremoval. Finally, the samhain database is updated and, where possible, any affected processes are restarted.

Note that if a pending kernel reboot is detected, the process restart step will be skipped.
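One common Debian convention for detecting a pending reboot is the flag file below. Whether the tooling actually checks this file is an assumption; it is shown only to illustrate the kind of check involved.

```shell
# /run/reboot-required is touched by some packages after a kernel upgrade
# (an assumption about what the tooling checks, not a confirmed detail).
if [ -f /run/reboot-required ]; then
    status="reboot pending; process restarts skipped"
else
    status="no reboot pending; affected processes may be restarted"
fi
echo "$status"
```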


General purpose systems

Systems with a reboot policy of justdoit configured in LDAP can be rebooted without warning.

For a mass reboot of such systems (e.g. following a point release) run the debian-reboot-simple helper script.


To reboot all buildds, e.g. following a point release, run the debian-reboot-buildd helper script.

To reboot an individual buildd, connect to the system and run (in a screen)

buildd-reboot [-h] <REASON>

This will wait for any running build to end, ask the buildd process to terminate and then reboot the system. The optional -h flag will request a halt rather than reboot.
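The wait-then-act logic can be sketched as below. Using 'sbuild' as the name of the build process is an assumption, and the final action is only printed here rather than performed.

```shell
# Rough sketch of buildd-reboot's flow; 'sbuild' as the process name is
# an assumption, and nothing is actually rebooted.
reason="${1:-point release}"
while pgrep -x sbuild >/dev/null 2>&1; do
    sleep 60          # a build is still running; check again in a minute
done
action=reboot          # with -h, a halt would be requested instead
msg="no build running; would $action now ($reason)"
echo "$msg"
```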

Porter boxes

Run the debian-reboot-porterboxes helper script.

Redundant services

Several services are provided by more than one machine. In these cases, it is possible to reboot the nodes separately, ideally with a delay to ensure that the system is removed from the relevant rotation(s) first. The debian-reboot-rotation helper script can facilitate this.

The snapshot service consists of two clusters, hosted at Sanger and Leaseweb / manda. Both clusters offer the service over HTTP, with updates to the data happening at Sanger.



Schedule reboots for sibelius with an initial 20-minute delay, and for sallinen 10 minutes later (to ensure that sibelius is back up before sallinen).



Schedule reboots with an initial 20-minute delay, and then a 5-minute interval between hosts. A reasonable order is:

(The primary goal is to ensure that all of the storage servers are back up before lw07 and lw08, and that the database is back up before lw07.)
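The staggered schedule can be sketched as follows. Only lw07 and lw08 come from the text; the storage and database host names are placeholders, and the shutdown commands are printed rather than run.

```shell
# Sketch: 20 minutes before the first reboot, then a 5-minute gap between
# hosts.  storage1/storage2/db1 are placeholder names.
delay=20
interval=5
plan=""
for host in storage1 storage2 db1 lw07 lw08; do
    plan="${plan}${host}:+${delay}m "
    delay=$((delay + interval))
done
echo "$plan"
```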

Ganeti clusters

2 node x86 / ARM

Connect to the master node, and run ganeti-reboot-cluster in a root screen.

If the hosts do not require rebooting, but their QEMU processes require restarting, this can be achieved by running ganeti-shuffle-cluster. [This script currently lives in adsb's home directory but should be integrated into ganeti-reboot-cluster.]
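The per-node flow that ganeti-reboot-cluster automates on a two-node cluster looks roughly like the following. The commands are only printed here, not executed, and the node names are placeholders; `gnt-node migrate` is the standard Ganeti command for moving primary instances off a node.

```shell
# Hedged sketch of a manual two-node cluster reboot (printed, not run).
plan=$(for node in node-a node-b; do
    echo "gnt-node migrate -f $node    # move primary instances to the peer"
    echo "reboot $node and wait for it to rejoin the cluster"
done)
echo "$plan"
```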

UBC x86

Only three nodes of the cluster should have running VMs, with either the first or last node being empty.

If the hosts do not require rebooting, but their QEMU processes require restarting, this can be achieved by running migrate-all-VMs <up|down> 1. The migration direction is down if the first node is initially empty, and up if the last node is empty.
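The direction rule spelled out as a sketch; detecting which node is actually empty is simulated with a variable here.

```shell
# Direction rule: first node empty -> migrate down; last node empty -> up.
first_node_empty=yes        # placeholder for the real emptiness check
if [ "$first_node_empty" = yes ]; then
    direction=down
else
    direction=up
fi
cmd="migrate-all-VMs $direction 1"
echo "$cmd"
```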

pijper / pieta

Live migration of VMs does not currently work on ppc64el, so all of the VMs run on pijper.


prokofiev

prokofiev is a single-node cluster, so does not run Ganeti. The VMs should be halted (with an appropriate delay if they are part of a rotation) before prokofiev itself is rebooted.