Boot time optimization
1. Ensure the system is in a stable state
Make sure no one else is using it and nothing else important is going on. It's probably a good idea to stop service-providing units like httpd or ftpd, just to ensure external connections don't disrupt things in the middle.
systemctl stop httpd
systemctl stop nfs-server
and so on...
Make sure you have lsof installed (lsof -v), and that fuser (fuser -V) is installed too (Debian/Ubuntu package: psmisc).
2. Unmount all unused filesystems
umount -a
This will print a number of 'Target is busy' warnings, for the root volume itself and for various temporary/system FSs. These can be ignored for the moment. What's important is that no on-disk filesystems remain mounted, except the root filesystem itself. Verify this:
# mount alone provides the info, but column makes it possible to read
mount | column -t
If you see any on-disk filesystems still mounted, then something is still running that shouldn't be. Check what it is using fuser:
# if necessary:
yum install psmisc
# then:
fuser -vm <mountpoint>
systemctl stop <whatever>
umount -a
# repeat as required...
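The check for leftover on-disk filesystems can be sketched as a small filter over the mount table. This is only an illustration: the function name on_disk_mounts is hypothetical, and it assumes that "on-disk" means a source device under /dev/, which will miss network or other unusual filesystems.

```shell
# List mounted filesystems backed by a /dev/ block device, excluding the root.
# Reads mount-table lines in the /proc/mounts format from stdin.
# Assumption: "on-disk" is approximated as "source starts with /dev/".
on_disk_mounts() {
    awk '$1 ~ /^\/dev\// && $2 != "/" { print $1, $2 }'
}
```

Run it as on_disk_mounts < /proc/mounts; an empty result suggests that only the root filesystem is still mounted from disk.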
3. Make the temporary root
Note: if /tmp is a directory on /, we will not be able to unmount / later in this procedure if we use /tmp/tmproot. Thus it may be necessary to use an alternative mountpoint such as /tmproot instead.
mkdir /tmp/tmproot
mount -t tmpfs none /tmp/tmproot
mkdir /tmp/tmproot/{proc,sys,dev,run,usr,var,tmp,oldroot}
cp -ax /{bin,etc,mnt,sbin,lib,lib64} /tmp/tmproot/
cp -ax /usr/{bin,sbin,lib,lib64} /tmp/tmproot/usr/
cp -ax /var/{account,empty,lib,local,lock,nis,opt,preserve,run,spool,tmp,yp} /tmp/tmproot/var/
This creates a very minimal root system, which breaks (among other things) manpage viewing (no /usr/share), user-level customizations (no /root or /home) and so forth. This is intentional, as it encourages you not to stay in such a jury-rigged root system any longer than necessary.
At this point you should also make sure that all the software you need is installed, since the package manager will almost certainly not work from the temporary root. Glance through all the steps, and make sure you have the necessary executables.
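One way to check for the necessary executables is a small loop over command names. This is a sketch only: missing_tools is a hypothetical helper name, and the list of tools to pass it depends on what you plan to do while the root is unmounted.

```shell
# Print each named command that cannot be found on PATH.
# Returns non-zero if at least one is missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, missing_tools lsof fuser pivot_root resize2fs prints whichever of those names are not available (the list here is an assumption based on the commands used in this procedure).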
4. Pivot into the root
mount --make-rprivate / # necessary for pivot_root to work
pivot_root /tmp/tmproot /tmp/tmproot/oldroot
for i in dev proc sys run; do mount --move /oldroot/$i /$i; done
systemd causes mounts to allow subtree sharing by default (as with mount --make-shared), and this causes pivot_root to fail. Hence, we turn this off globally with mount --make-rprivate /. System and temporary filesystems are moved wholesale into the new root. This is necessary to make it work at all; the sockets for communication with systemd, among other things, live in /run, and so there's no way to make running processes close it.
5. Ensure remote access survived the changeover
systemctl restart sshd
systemctl status sshd
After restarting sshd, ensure that you can get in, by opening another terminal and connecting to the machine again via ssh. If you can't, fix the problem before moving on.
Once you've verified you can connect in again, exit the shell you're currently using and reconnect. This allows the remaining forked sshd to exit and ensures the new one isn't holding /oldroot.
6. Close everything still using the old root
fuser -vm /oldroot
This will print a list of processes still holding onto the old root directory. On my system, it looked like this:
          USER              PID ACCESS COMMAND
/oldroot: root           kernel mount  /oldroot
          root                1 ...e.  systemd
          root              549 ...e.  systemd-journal
          root              563 ...e.  lvmetad
          root              581 f..e.  systemd-udevd
          root              700 F..e.  auditd
          root              723 ...e.  NetworkManager
          root              727 ...e.  irqbalance
          root              730 F..e.  tuned
          root              736 ...e.  smartd
          root              737 F..e.  rsyslogd
          root              741 ...e.  abrtd
          chrony            742 ...e.  chronyd
          root              743 ...e.  abrt-watch-log
          libstoragemgmt    745 ...e.  lsmd
          root              746 ...e.  systemd-logind
          dbus              747 ...e.  dbus-daemon
          root              753 ..ce.  atd
          root              754 ...e.  crond
          root              770 ...e.  agetty
          polkitd           782 ...e.  polkitd
          root             1682 F.ce.  master
          postfix          1714 ..ce.  qmgr
          postfix         12658 ..ce.  pickup
You need to deal with each one of these processes before you can unmount /oldroot. The brute-force approach is simply kill $PID for each, but this can break things. To do it more softly:
systemctl | grep running
This creates a list of running services. You should be able to correlate this with the list of processes holding /oldroot, then issue systemctl restart for each of them. Some services will refuse to come up in the temporary root and enter a failed state; these don't really matter for the moment.
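The correlation between PIDs and services can be partly automated by reading each process's cgroup membership, which on a systemd machine normally ends in the owning unit name. The sketch below is an assumption-laden illustration: unit_from_cgroup and units_for_pids are hypothetical helper names, and only .service units are picked out.

```shell
# Extract .service unit names from the contents of a /proc/PID/cgroup file (stdin).
# Example input line: 1:name=systemd:/system.slice/crond.service
unit_from_cgroup() {
    sed -n 's|.*/\([^/]*\.service\)$|\1|p' | sort -u
}

# Print the distinct service units that own the given PIDs.
# PIDs without a readable cgroup file, or outside any service, are skipped.
units_for_pids() {
    for pid in "$@"; do
        if [ -r "/proc/$pid/cgroup" ]; then
            unit_from_cgroup < "/proc/$pid/cgroup"
        fi
    done | sort -u
}
```

According to the fuser man page, only the PIDs go to stdout, so something like units_for_pids $(fuser -m /oldroot 2>/dev/null) should give a list of candidate units to restart. Treat the result as a hint and compare it against the fuser table by hand.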
If the root drive you want to resize is an LVM drive, you may also need to restart some other running services, even if they do not show up in the list created by fuser -vm /oldroot. Otherwise you might be unable to resize the LVM drive in step 7 because of this error:
fsadm: Cannot proceed with mounted filesystem "/oldroot"
You can try systemctl restart systemd-udevd. If that fails, you can find the leftover mounts with grep system /proc/*/mounts | column -t
Look for processes that say mounts:none and try restarting these:
PATH                      BIN                        FSTYPE
/proc/16395/mounts:tmpfs  /run/systemd/timesync      tmpfs
/proc/16395/mounts:none   /var/lib/systemd/timesync  tmpfs
/proc/18485/mounts:tmpfs  /run/systemd/inhibit       tmpfs
/proc/18485/mounts:tmpfs  /run/systemd/seats         tmpfs
/proc/18485/mounts:tmpfs  /run/systemd/sessions      tmpfs
/proc/18485/mounts:tmpfs  /run/systemd/shutdown      tmpfs
/proc/18485/mounts:tmpfs  /run/systemd/users         tmpfs
/proc/18485/mounts:none   /var/lib/systemd/linger    tmpfs
Some processes can't be dealt with via simple systemctl restart. For me these included auditd (which doesn't like to be killed via systemctl, and so just wanted a kill -15). These can be dealt with individually.
The last process you'll find, usually, is systemd itself. For this, run systemctl daemon-reexec.
Once you're done, the table should look like this:
          USER              PID ACCESS COMMAND
/oldroot: root           kernel mount  /oldroot
7. Unmount the old root
umount /oldroot
At this point, you can carry out whatever manipulations you require. The original question needed a simple resize2fs invocation, but you can do whatever you want here; one other use case is transferring the root filesystem from a simple partition to LVM/RAID/whatever.
8. Pivot the root back
mount <blockdev> /oldroot
mount --make-rprivate / # again
pivot_root /oldroot /oldroot/tmp/tmproot
for i in dev proc sys run; do mount --move /tmp/tmproot/$i /$i; done
This is a straightforward reversal of step 4.
9. Dispose of the temporary root
Repeat steps 5 and 6, except using /tmp/tmproot in place of /oldroot. Then:
umount /tmp/tmproot
rmdir /tmp/tmproot
Since it's a tmpfs, at this point the temporary root dissolves into the ether, never to be seen again.
10. Put things back in their places
Mount filesystems again:
mount -a
At this point, you should also update /etc/fstab and grub.cfg in accordance with any adjustments you made during step 7.
Restart any failed services:
systemctl | grep failed
systemctl restart <whatever>
Allow shared subtrees again:
mount --make-rshared /
Start the stopped service units - you can use this single command:
systemctl isolate default.target
And you're done.
User space scheduling
Kernel threads have an empty symlink for /proc/PID/exe.
On a regular file system, lstat(2) would fill st_size with the length of the symlink target. On procfs, however, lstat cannot be trusted for this: even non-empty symlinks have st_size equal to 0. We thus really need to use the readlink(2) syscall to read the link. After doing this, you will notice that it fails with ENOENT… exactly the same as if pid 2 did not exist!
We therefore need another check, to verify that pid 2 does exist. Luckily, lstat on /proc/2/exe is reliable for this purpose: it must return zero (success).
Note that you need to do these operations in exactly this order, else you are subject to race conditions again: the only reason this works is that if pid 2 is kthreadd, it will not have terminated before the lstat check (because it cannot terminate).
Therefore, readlink(2) failing with ENOENT and lstat(2) succeeding is exactly the combination required to check pid 2 is kthreadd, which implies there are kernel threads in our pid namespace, which implies that we are in the initial namespace.
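The two checks, in the required order, can be approximated from a shell. This is a rough sketch with an important caveat: the shell cannot tell ENOENT apart from other readlink failures such as EACCES, so it is only meaningful when run as root; a faithful implementation would call readlink(2) and lstat(2) directly and inspect errno. The function name pid2_is_kthreadd is hypothetical.

```shell
# Returns 0 if pid 2 looks like kthreadd, 1 otherwise.
# Caveat: any readlink failure is treated as ENOENT, so run this as root.
pid2_is_kthreadd() {
    # Step 1: readlink must fail (kernel threads have an empty exe link).
    if readlink /proc/2/exe >/dev/null 2>&1; then
        return 1
    fi
    # Step 2: the link itself must still exist. test -L checks the link
    # without following it, so a missing pid 2 makes this fail.
    [ -L /proc/2/exe ]
}
```

Calling pid2_is_kthreadd && echo "initial pid namespace" then reports the conclusion drawn above, under the stated assumption about privileges.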