Managing updates to debian.org systems
Initial setup
Clone the dsa-misc repository and create links to scripts/multi-tool/* in ~/bin.
These scripts assume that you have:
- a root SSH key in dsa-puppet
- configured your SSH client so that you can access all debian.org systems, either directly or via a pre-configured jumphost
Most of the scripts connect to hosts as root, so they need access to your root SSH key. An exception is debian-upgrade-prepare, which connects as your normal user and therefore needs access to your standard key.
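The setup above can be sketched as follows. To keep the sketch self-contained, temporary directories stand in for your real dsa-misc checkout and ~/bin, and the two script names are examples rather than a complete list:

```shell
set -e
# Stand-in for your dsa-misc clone (clone location is up to you).
repo="$(mktemp -d)"
mkdir -p "$repo/scripts/multi-tool"
touch "$repo/scripts/multi-tool/debian-upgrade" \
      "$repo/scripts/multi-tool/debian-reboot-simple"
# Stand-in for ~/bin.
bin="$(mktemp -d)"
# The actual setup step: symlink every multi-tool helper into ~/bin.
ln -s "$repo"/scripts/multi-tool/* "$bin"/
ls "$bin" | sort
```

With a real checkout, replace `$repo` with the clone path and `$bin` with ~/bin.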
Security updates (and point releases)
Run debian-upgrade-prepare. This will connect to each system, analyse the available updates and present them for confirmation. If more than one system requires exactly the same updates, these will be grouped together for confirmation as a unit. Finally, a debian-upgrade command will be output to the console that will install the confirmed upgrades.
The install process will launch a tmux session with one window per system to be upgraded. If there are any unconfirmed samhain alerts for a system, you will need to confirm them before the upgrades are installed. After the upgrades have been installed, a check is made for packages that are flagged for autoremoval. Finally, the samhain database is updated and, where possible, any affected processes are restarted.
Note that if a pending kernel reboot is detected, the process restart step will be skipped.
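The grouping step can be pictured with a toy example: hosts whose pending-update lists are identical end up in a single confirmation group. The host names, package names and the "host: packages" input format below are all invented for illustration; the real script's logic is more involved:

```shell
# Toy illustration of grouping hosts by identical update sets.
groups=$(printf '%s\n' \
    'hostA: openssl libssl3' \
    'hostB: openssl libssl3' \
    'hostC: curl' |
  awk -F': ' '{g[$2] = (g[$2] ? g[$2] "," $1 : $1)}
              END {for (u in g) print g[u] " -> " u}' | sort)
echo "$groups"
```

Here hostA and hostB share an update set, so they would be confirmed (and upgraded) as one unit, while hostC is handled separately.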
Reboots
General purpose systems
Systems with a reboot policy of justdoit configured in LDAP can be rebooted without warning. For a mass reboot of such systems (e.g. following a point release), run the debian-reboot-simple helper script.
Buildds
To reboot all buildds, e.g. following a point release, run the debian-reboot-buildd helper script.
To reboot an individual buildd, connect to the system and run (in a screen):
buildd-reboot [-h] <REASON>
This will wait for any running build to end, ask the buildd process to terminate and then reboot the system. The optional -h flag will request a halt rather than a reboot.
If the host is a VM running in a Ganeti cluster, then requesting a halt will result in Ganeti automatically restarting the system. This is particularly important for instances where e.g. the QEMU or KVM process needs to be restarted, which a reboot of the VM will not achieve.
Porter boxes
Run the debian-reboot-porterboxes helper script.
Redundant services
Several services are provided by more than one machine. In these cases, it is possible to reboot the nodes separately, ideally with a delay to ensure that each system is removed from the relevant rotation(s) first. The debian-reboot-rotation helper script can facilitate this.
snapshot.debian.org
The snapshot service consists of two clusters, hosted at Sanger and Leaseweb / manda. Both clusters offer the snapshot.debian.org service over HTTP, with updates to the data happening at Sanger.
Sanger
Prerequisites
- the DNS rotation for snapshot.debian.org includes lw07
- sallinen:/srv/snapshot.debian.org/log/snapshot.log does not indicate that an import is currently running
- dinstall was more than an hour ago
Schedule reboots for sibelius with an initial 20 minute delay, and sallinen 10 minutes later (to ensure that sibelius is back up before sallinen). A suitable invocation is:
FIRSTWAIT=10 HOSTWAIT=10 debian-reboot-many sibelius.debian.org sallinen.debian.org
Hosts: 10:sibelius.debian.org 20:sallinen.debian.org
Continue (or ^C)?
Leaseweb
Prerequisites
- the DNS rotation for snapshot.debian.org includes sallinen
Schedule reboots with an initial 20 minute delay, and then a 5 minute interval between hosts. A suitable invocation is:
FIRSTWAIT=10 HOSTWAIT=5 debian-reboot-many lw01.debian.org lw02.debian.org lw03.debian.org lw04.debian.org lw09.debian.org lw10.debian.org snapshotdb-manda-01.debian.org lw08.debian.org lw07.debian.org
Hosts: 10:lw01.debian.org 15:lw02.debian.org 20:lw03.debian.org 25:lw04.debian.org 30:lw09.debian.org 35:lw10.debian.org 40:snapshotdb-manda-01.debian.org 45:lw08.debian.org 50:lw07.debian.org
Continue (or ^C)?
(The primary goal is to ensure that all of the storage servers are back up before lw07 and lw08, and that the database is back up before lw07.)
Note that FIRSTWAIT=10 results in a delay of 10 minutes before a shutdown -r +10 is issued, thus creating a 20 minute delay from the initial invocation.
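The offsets in the "Hosts:" lines above follow from simple arithmetic: host i appears at FIRSTWAIT + i * HOSTWAIT minutes. That formula is inferred from the printed output, not taken from the script itself; a recomputation of the Sanger example:

```shell
# Recompute the offsets debian-reboot-many prints for the Sanger example.
FIRSTWAIT=10 HOSTWAIT=10
i=0; schedule=""
for h in sibelius.debian.org sallinen.debian.org; do
  schedule="$schedule $((FIRSTWAIT + i * HOSTWAIT)):$h"
  i=$((i + 1))
done
echo "Hosts:$schedule"
# prints: Hosts: 10:sibelius.debian.org 20:sallinen.debian.org
```

The shutdown itself then happens a further FIRSTWAIT minutes after each offset, per the note above.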
Ganeti clusters
2 node x86 / ARM
Connect to the master node, and run ganeti-reboot-cluster in a root screen.
If the hosts do not require rebooting, but their QEMU processes require restarting, this can be achieved by running ganeti-shuffle-cluster [currently in adsb's home directory but should be integrated into ganeti-reboot-cluster].
3-or-more node ARM
Connect to the master node, and for each other node in turn:
- run gnt-node migrate -f $node to migrate any VMs on the node to their secondary node
- reboot the node
- once the node has rebooted, wait for DRBD to be synced on all nodes
Finally, apply the above steps to the master node, and run:
hbal -L -C -v -v --no-disk-moves -X
to move VMs back to the node.
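"Synced" here means that every DRBD resource reports ds:UpToDate/UpToDate in /proc/drbd on each node. A toy check against the shape of that file (the two sample resource lines below are fabricated; on a real node you would read /proc/drbd itself):

```shell
# Count fully-synced resources in a sample /proc/drbd excerpt.
sample=' 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
 1: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----'
synced=$(printf '%s\n' "$sample" | grep -c 'ds:UpToDate/UpToDate')
total=$(printf '%s\n' "$sample" | grep -c '^ *[0-9][0-9]*: cs:')
echo "$synced of $total resources synced"
```

When the two counts match on all nodes, it is safe to move on to the next node.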
UBC x86
Only three nodes of the cluster should have running VMs, with either the first or last node being empty.
- Reboot the empty node.
- Fix up multipath breakage:
for dev in $(journalctl -u systemd-udevd.service | sed -n '/killed/ s/.* \(sd[a-z]*\):.*/\1/p'); do sudo udevadm trigger --action=add --name-match=$dev; echo $dev; sleep 0.5; done
- Connect to the master node and migrate VMs to the empty node from its nearest neighbour: migrate-all-VMs <up|down> 1. The migration direction will be down if the first node is initially empty, and up if the last node was empty.
- Repeat until all nodes have been rebooted.
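The sed in the multipath fix-up extracts kernel device names from udevd log lines about killed workers. The sample log line below is fabricated to show the shape it matches:

```shell
# Demonstrate the device-name extraction used in the multipath fix-up loop.
line='Mar 01 02:03:04 host systemd-udevd[123]: sdb: Worker [456] processing SEQNUM=7 killed'
dev=$(printf '%s\n' "$line" | sed -n '/killed/ s/.* \(sd[a-z]*\):.*/\1/p')
echo "$dev"
# prints: sdb
```

Each extracted device is then re-added via udevadm trigger so multipath picks it up again.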
pijper
pijper is a single node cluster, so the VMs must be shut down in order to reboot it.
- run gnt-cluster watcher pause <SECONDS>
- halt the VMs (with an appropriate delay if they are part of a rotation)
- run gnt-cluster watcher continue
- reboot pijper before the VMs are restarted