
Ubuntu 20.04’s zsys adds ZFS snapshots to package management [Updated]

This is a Fossa. It appears to be focusing. (Cryptoprocta ferox is a small, catlike carnivore native to Madagascar.)

Last October, an experimental ZFS installer showed up in Eoan Ermine, the second interim Ubuntu release of 2019. Next month, Focal Fossa—Ubuntu’s next LTS (Long Term Support) release—is due to drop, and it retains the ZFS installer while adding several new features to Ubuntu’s system management with the fledgling zsys package.

Phoronix reported this weekend that zsys is taking snapshots prior to package-management operations now, so we decided to install the latest Ubuntu 20.04 daily build and see how the new feature works.

Taking Focal Fossa for a quick spin

Focal installs much as any other Ubuntu release has, but it retains 19.10’s ZFS installer—which is still hidden behind “advanced features” and still labeled experimental. After selecting a ZFS install, you give your OK to the resulting partition layout—with one primary partition for UEFI boot and three logical partitions for swap, boot ZFS pool, and root ZFS pool. A few minutes later, you’ve got yourself an Ubuntu installation.

A quick look under the hood

After installing Fossa, the first thing we did was verify the installed version of zsys. The apt management snapshots were added very recently in 0.4.1, and we’ve learned not to take for granted what’s installed on beta or pre-beta daily builds of Linux distributions. Zsys was, in fact, already installed by default and was at version 0.4.1.

There weren’t any snapshots on the freshly installed system yet, so we did a quick apt install gimp. Afterward, we saw that zsys had taken a snapshot in every dataset present on rpool. Having a snapshot taken prior to installing new packages means that, if something should go haywire, we can easily revert the system to its state prior to the new package being installed. Carving the system up into so many different datasets means, in turn, that we can roll back only those parts of the system affected by the package manager—for example, we can roll back packages without affecting data in the user’s home directory.

After installing gimp and seeing new snapshots available, we tried installing a second package. One apt install pv later, we again checked for snapshots. Although we still found the snapshots taken prior to installing gimp, there were no new snapshots to roll back our pv installation. After several more experimental installations and removals with no new snapshots, we started grep-ing our way through the /etc directory to find out why.

In apt.conf.d we find a config file named 90_zsys_system_autosnapshot that adds a pre-install hook to dpkg. This pre-install hook calls zsys-system-autosnapshot prior to making any changes to the package system. We weren’t sure why we hadn’t gotten any new snapshots, so we tried running zsys-system-autosnapshot directly—still no new snapshot.

When we then took a look at zsys-system-autosnapshot itself, the reason for no new snapshots being taken was obvious. A minimum interval is built into that script so that it exits without doing anything if it has been less than 20 minutes since the last time it took snapshots.
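The guard described above can be sketched in a few lines of Python. This is only an illustration of the minimum-interval idea, not the contents of zsys-system-autosnapshot itself: the function name should_snapshot and the way the last snapshot time is passed in are assumptions, and only the 20-minute figure comes from what we observed.

```python
from datetime import datetime, timedelta
from typing import Optional

# The 20-minute figure comes from the article; everything else is hypothetical.
MIN_INTERVAL = timedelta(minutes=20)


def should_snapshot(last_snapshot: Optional[datetime], now: datetime,
                    min_interval: timedelta = MIN_INTERVAL) -> bool:
    """Return True when enough time has passed to take a new snapshot."""
    if last_snapshot is None:
        # No earlier snapshot exists, so nothing blocks a new one.
        return True
    # Skip only when less than the minimum interval has elapsed.
    return now - last_snapshot >= min_interval
```

Under this model, the gimp installation gets a snapshot, while a pv installation a few minutes later does not, which matches what we saw.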

We’re pretty dubious about this minimum-interval feature. On the one hand, once you accumulate a few thousand snapshots, you can begin seeing filesystem performance issues. On the other hand, we foresee a lot of problematic package installations not getting covered with snapshots this way.

Zsys is still early in development

We should note that zsys is nowhere near complete yet. The tool promises all manner of added functionality, and it’s already useful—but it’s still missing so much of the polish that normal users will need to see.

We can see that zsys refers to these automatically generated snapshots as “system state”—and that zsysctl save will take those snapshots, and zsysctl show will give us a high-level overview of what sets of state have been saved. But there’s no corresponding zsysctl load yet, and until there is, trying to use these saves to actually recover from disaster will remain a little more “expert” of an operation than it ought to be.

Ubuntu’s ZFS installer carves up the base system into a bewildering 21 separate datasets, so zsys really needs that high-level rollback assistant. It’s easy enough to roll back any individual dataset using the zfs command itself—e.g., zfs rollback rpool/USERDATA/jim_v1qce1@autosys_pmxbuj—but we don’t anticipate users having a good time navigating such commands.
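To illustrate why a higher-level helper matters, here is a minimal sketch that builds one zfs rollback command per dataset for a given snapshot name. It only constructs the command lines and does not run them. The helper name rollback_commands is hypothetical, and on a real system the dataset list would have to come from the pool itself.

```python
from typing import List


def rollback_commands(datasets: List[str], snapshot: str) -> List[List[str]]:
    """Build one `zfs rollback` argument list per dataset (nothing is executed)."""
    commands = []
    for dataset in datasets:
        # ZFS addresses a snapshot as <dataset>@<snapshot name>.
        commands.append(["zfs", "rollback", f"{dataset}@{snapshot}"])
    return commands


# Dataset and snapshot names taken from the example in the article.
for argv in rollback_commands(["rpool/USERDATA/jim_v1qce1"], "autosys_pmxbuj"):
    print(" ".join(argv))
```

Keep in mind that a plain zfs rollback only targets a dataset's most recent snapshot. Going further back requires the -r flag, which destroys the snapshots taken after the target.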

We fully expect zsysctl to add functionality for easier rollbacks eventually. It’s just not here yet—at least, not without enabling the GRUB boot menu and rebooting.

Update: Reverting system state using the GRUB menu at boot

Although zsysctl doesn’t expose a state-loading capability while the system is running, those features are present at boot time, in the GRUB menu. On an Ubuntu desktop installation with default settings, however, the GRUB menu is difficult to summon on command: the keypress differs depending on whether the system boots via UEFI or BIOS, and the timing is dicey. On our VM, where boot time is extremely fast, we never managed to bring up the menu with a keypress at all.

You can un-hide the menu and give yourself a few seconds to respond to it by editing the GRUB defaults, with sudo nano /etc/default/grub. After setting GRUB_TIMEOUT_STYLE to menu rather than hidden, and giving yourself a few seconds to react to it with GRUB_TIMEOUT=5, you’ll need to issue sudo update-grub to apply your changes. You’ll see a cosmetic error from os-prober when you do, but don’t worry—it’s harmless, and your update actually did get applied.
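The same edit can be expressed as a small text transformation. The sketch below rewrites the two settings in the contents of a GRUB defaults file. It works on a string rather than touching /etc/default/grub directly, and the function name set_grub_menu is an assumption made for illustration. You would still need to write the result back with root privileges and run sudo update-grub.

```python
import re


def set_grub_menu(text: str, timeout: int = 5) -> str:
    """Return GRUB defaults text with a visible menu and the given timeout."""
    wanted = {
        "GRUB_TIMEOUT_STYLE": "menu",
        "GRUB_TIMEOUT": str(timeout),
    }
    lines = text.splitlines()
    seen = set()
    for i, line in enumerate(lines):
        # Match uncommented KEY=value assignments only.
        match = re.match(r"\s*([A-Z_]+)=", line)
        if match and match.group(1) in wanted:
            key = match.group(1)
            lines[i] = f"{key}={wanted[key]}"
            seen.add(key)
    # Append any setting the file did not already define.
    for key, value in wanted.items():
        if key not in seen:
            lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"
```

Commented-out lines are left alone, so a stock file keeps its original hints alongside the new values.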

Once we can actually see the GRUB menu, we can navigate to History and see a selection of saved states. For each state, we have the option of reverting either the system only or the system and userdata (userdata, for now at least, meaning home directories only). There is, unfortunately, no option to revert only the userdata. For each option, you can also select whether to boot it normally or into recovery mode.

We had four saved states available and decided to split the middle and revert the system state only, two snapshots back. After the system finished booting—which took slightly longer than normal—we took another peek under the filesystem hood.

The system mountpoints didn’t look any different—but a closer look showed the boot environment still had some ZFS snapshots mounted, which it normally would not. This looks like a failure to clean up a little bit of the “magic” that went on during the GRUB process, since the snapshots are mounted read-only, in their normal .zfs directories, rather than mounted to someplace the production filesystem would stumble across them.

More importantly, we can see that GRUB and zsys created an entirely new namespace within the ZFS hierarchy, rpool/ROOT/ubuntu_6doptv. We’re still booting from the same namespace we started out with—ubuntu_w01csd—but our two older sets of snapshots, or system states, got moved over into the newer namespace.

The upshot is, it doesn’t appear that reverting to the earlier system state was a destructive operation, as a simple zfs rollback operation normally would be. We started out with four sets of snapshots, and we still had four sets of snapshots available after we reverted to an older one.

Garbage collection

Don’t get too excited about these garbage-collection variables being in a config file: they’re not accessible on an installed system.
Jim Salter

Although it’s still decidedly alpha—and isn’t actually linked for operation yet—zsys does also have a facility for automatically pruning stale snapshots. The zsys team again chose to ignore existing ZFS terminology and call this service “garbage collection” rather than naming it anything to do with the snapshots themselves.

The terminology reboot here appears to stem from a desire to get users to think in higher-level terms that encompass multiple grouped snapshots. We’re not quite sure how we feel about that. Handled properly, it might indeed reduce end-user confusion. Sysadmin types, though, whether casual or professional, are more likely to be hindered by a layer of obfuscation between scripting and reality. And right now, only sysadmin types are going to see any of this stuff, since it’s the next best thing to undiscoverable.

The actual facility is called with the command zsysctl service gc -a, which performs garbage collection on all zsys namespaces present on the system. If you run the command yourself on a relatively new system, you probably won’t see anything happen. It won’t destroy any snapshots unless there are a minimum of 20 present, after which it thins them out according to age.
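As a rough illustration of threshold-based pruning, the sketch below leaves everything alone until 20 snapshots exist and then nominates the oldest ones for removal. The actual thinning rules zsys applies are not described here, so the oldest-first policy, the keep_newest parameter, and the function name prune_candidates are all assumptions. Only the threshold of 20 comes from the behavior described above.

```python
from datetime import datetime
from typing import Dict, List

MIN_SNAPSHOTS = 20  # threshold mentioned in the article


def prune_candidates(snapshots: Dict[str, datetime],
                     min_count: int = MIN_SNAPSHOTS,
                     keep_newest: int = MIN_SNAPSHOTS) -> List[str]:
    """Return snapshot names eligible for removal, oldest first.

    Hypothetical policy: keep the newest `keep_newest` snapshots and
    nominate the rest. This is not the real zsys ruleset.
    """
    if len(snapshots) < min_count:
        # Below the threshold nothing is touched.
        return []
    # Sort names by creation time, oldest first.
    by_age = sorted(snapshots, key=lambda name: snapshots[name])
    cut = max(len(by_age) - keep_newest, 0)
    return by_age[:cut]
```

The function only returns names. Anything that actually destroys snapshots would need far more care, which fits the warning against running garbage collection on production systems for now.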

The garbage-collection ruleset is currently configurable only before zsys is compiled: building the package hardcodes its values into the zsys binaries. As things stand right now, end users and administrators cannot change these parameters on an installed system.

As Ubuntu developer Didier Roche explained on Twitter, Ubuntu 20.04 systems don’t automatically run zsysctl service gc yet. Roche strongly recommends that nobody run the command on any production systems for the time being, since it’s a data-destroying feature that’s still in alpha. Once it’s had more time in testing and development, a systemd timer will be added to periodically call it.


Tech – Ars Technica
