Decommissioning hosts
Phase 1: Planning and Coordination
- Confirm Decommissioning: Ensure the machine is no longer required (e.g., EOL architecture, replacement online) and verify approvals in RT (e.g., buildd admins).
- Initial Hoster Notification: For leased hardware or hosted VMs, notify the provider (e.g., OSUOSL, ISC, CARNet) about the upcoming decommissioning.
Phase 2: Service Cessation & Preparation
- Stop Services: Stop and remove any services still running on the host.
- Scream Test (Optional): Shut down the machine and wait a month to see if anyone complains.
- MQ Queues: Remove MQ queues related to the machine (see mq for details).
for queue in $(rabbitmqadmin -N rainier list queues name | \
    grep -F "$HOST.debian.org" | \
    awk '{print $2}') ; do
    rabbitmqadmin -N rainier -V dsa delete queue name="$queue"
done
- Password File: Update the password file entries for $HOST.
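Before deleting anything in the MQ queue step above, it can help to preview which queue names the pipeline would match. The following is a minimal sketch, not part of the documented procedure: `host_queue_names` is a hypothetical helper that reads the table printed by `rabbitmqadmin ... list queues name` on standard input and prints only the names containing the host's FQDN.

```shell
# Hypothetical helper (an assumption, not part of the original procedure):
# print the queue names from a rabbitmqadmin table on stdin that contain
# "$1.debian.org".
host_queue_names() {
    # -F matches the FQDN literally, so its dots are not regex wildcards.
    # In the table rows ("| name |"), the queue name is the second field.
    grep -F -- "$1.debian.org" | awk '{print $2}'
}

# Example preview, assuming the same rabbitmqadmin options as the loop above:
#   rabbitmqadmin -N rainier list queues name | host_queue_names "$HOST"
```

If the printed list looks right, the deletion loop can then be run as documented.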
Phase 3: Data Destruction and Machine Shutdown
- Zero Disks:
badblocks -v -s -p 5 -b 2048 -c 2048 -w -f DEVICE_GOES_HERE
Note: It is often best to zero non-root partitions first, then the root partition.
- Ganeti VM removal: If the machine is a Ganeti VM, remove it from the cluster:
gnt-instance remove $HOST.debian.org
- Shutdown: Permanently shut down the machine.
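The note in the disk zeroing step above suggests wiping non-root partitions before the root partition. One way to get a candidate ordering is sketched below. `root_last` is a hypothetical helper, not part of the original procedure; it only reorders a device list read from standard input and never touches a disk.

```shell
# Hypothetical helper (an assumption): read "DEVICE [MOUNTPOINT]" lines on
# stdin and print the device names, with anything mounted on / moved to the
# end of the list.
root_last() {
    awk '
        $2 == "/" { root = root $1 "\n"; next }  # hold back the root device
        NF        { print $1 }                    # everything else, in input order
        END       { printf "%s", root }           # root device(s) last
    '
}

# Example, assuming lsblk from util-linux is available on the host:
#   lsblk -lnp -o NAME,MOUNTPOINT | root_last
```

The output is meant for review only. Whole-disk entries contain their partitions, and layered devices such as LVM or RAID need to be considered by hand before passing anything to badblocks.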
Phase 4: Infrastructure Management Removal
- Puppet Management:
- Revoke the Puppet certificate on handel:
puppet node clean $HOST.debian.org
puppet node deactivate $HOST.debian.org
- Remove data/nodes/$HOST.debian.org.yaml from Puppet.
- Remove references to the host from the Puppet configuration (except the multipath configuration).
- LDAP Removal:
- Remove the host entry, including references such as subgroup: foo@hostname or supplementarygid: foo@hostname.
- Backups (Bacula):
- Rerun Puppet on the Bacula storage host (backup-storage-hetzner-01) and the Bacula director (dinis).
- Monitoring (Nagios): Remove the host from Nagios configuration.
- Security & Auditing:
- Reinitialize or update samhain on the director.
- Remove any associated SSL certificates from letsencrypt-domains.
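After the removals in this phase, a recursive search can confirm that no stray references to the host remain in a configuration checkout. The sketch below is an illustration only. `remaining_refs` is a hypothetical helper, the checkout path is an assumption, and filtering on the word "multipath" is a rough heuristic for the exception mentioned in the Puppet step.

```shell
# Hypothetical helper (an assumption): list lines under directory $1 that
# mention host $2, skipping any match whose path or text contains
# "multipath". Exits non-zero when nothing is left to clean up.
remaining_refs() {
    # -r recurse, -n show line numbers, -F treat the host name literally
    grep -rnF -- "$2" "$1" | grep -v multipath
}

# Example, with a placeholder path for the local Puppet checkout:
#   remaining_refs /path/to/puppet-checkout "$HOST.debian.org"
```

The same search can be pointed at other checkouts that may still mention the host, such as the Nagios configuration.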
Phase 5: Networking and Storage Cleanup
- DNS: Remove references to the host from DNS files (A/AAAA records) and reverse DNS (PTR records).
- IPAM: Return allocated IP addresses to the pool of free addresses.
- Storage (MSA/Multipath):
- On the KVM host (or each node for a Ganeti cluster):
echo -n "multipath device to remove: " &&
export LC_ALL=C &&
read -r path &&
dm=$(sudo multipath -ll | grep "$path " | awk '{print $3}') &&
cd "/sys/devices/virtual/block/$dm/slaves" &&
devices=$(ls -1) &&
blockdev --flushbufs "/dev/$dm" &&
multipath -f "$path" &&
for device in $devices; do
    # flush each underlying path device, then remove it from the kernel
    blockdev --flushbufs "/dev/$device"
    echo 1 > "/sys/block/$device/device/delete"
done
- Remove the paths in the MSA.
- Remove the entry from multipath.conf.
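For the DNS step in this phase, the PTR record for an IPv4 address lives under a name built from the reversed octets in the in-addr.arpa zone, which makes it easy to miss when searching zone files for the forward address. The following minimal sketch derives that name. `ipv4_to_ptr` is a hypothetical helper, and IPv6 (ip6.arpa) is not covered.

```shell
# Hypothetical helper (an assumption): print the in-addr.arpa name for a
# dotted-quad IPv4 address given as $1.
ipv4_to_ptr() {
    # Split the address on dots and emit the octets in reverse order.
    echo "$1" | awk -F. '{ printf "%s.%s.%s.%s.in-addr.arpa\n", $4, $3, $2, $1 }'
}

# Example, using an address from the documentation range:
#   ipv4_to_ptr 192.0.2.10
```

The resulting name can then be searched for in the reverse zone files alongside the host name itself.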
Phase 6: Final Hardware Disposal
- Final Hoster Communication: Tell the hoster to dispose of the hardware or return it as appropriate.
Phase 7: Post-Decommissioning (3 Months Later)
- Backup Cleanup:
- Bacula backups will be automatically removed.
- Manually clean up PostgreSQL backups and any other manual backups.
- RT Ticket Closure: Ensure all steps are documented and close the decommissioning ticket.