Provisioning disks

As a tech preview, Warewulf provides structures to define disks, partitions, and file systems. These structures can generate a configuration for Ignition to provision partitions and file systems dynamically on cluster nodes, or with sfdisk, mkfs, and mkswap during boot.

Ignition can, for example, create swap partitions or /scratch file systems.

Requirements

Partition and file system creation requires both ignition and sgdisk to be installed in the image.

Rocky Linux

dnf install ignition gdisk

Note

Packages for Ignition are not currently available for Rocky Linux 8, but they are available for Rocky Linux 9 as part of “appstream.”

openSUSE Leap

zypper install ignition gptfdisk

Disks and partitions

A node or profile can have several disks. Each disk is identified by the path to its block device. Each disk holds a map to its partitions and a bool switch to indicate if an existing, non-matching partition table should be overwritten.

Each partition is identified by its label. The partition number can be omitted, but specifying it is recommended, as Ignition may fail without it. Partition sizes are specified in MiB and should be set for every partition except the last: if no size is given, the partition takes the maximum available size. Each partition has the switches should_exist and wipe_partition_entry, which control the partition creation process (via the --partcreate and --partwipe flags). When the partition number is omitted, wipe_partition_entry should be true, as this allows Ignition to replace the existing partition.

wwctl node set n1 \
  --diskname /dev/vda --diskwipe \
  --partname scratch --partcreate --partwipe --partnumber 1
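
Because partition sizes are given in MiB, sizes planned in GiB have to be converted before they are passed to --partsize. The following is a minimal sketch of that arithmetic; the helper name gib_to_mib is illustrative and not part of Warewulf.

```shell
# Convert a size in GiB to the MiB value expected by --partsize.
gib_to_mib() {
    echo $(( $1 * 1024 ))
}

# Example: an 8 GiB partition.
gib_to_mib 8    # prints 8192

# Possible use with wwctl (node, label, and number are illustrative):
#   wwctl node set n1 --diskname /dev/vda \
#     --partname scratch --partnumber 1 --partsize="$(gib_to_mib 8)"
```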

File systems

File systems are identified by their underlying block device, preferably using the /dev/disk/by-partlabel format. Except for a swap partition, an absolute path for the mount point must be specified for each file system. Depending on the image used, valid formats are btrfs, ext3, ext4, and xfs. Each file system has the switch wipe_filesystem to control whether an existing file system is wiped.

wwctl node set n1 \
  --diskname /dev/vda --partname scratch \
  --fsname scratch --fsformat btrfs --fspath /scratch
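
To check on a running node that a labelled partition is visible, the label can be mapped to its udev symlink, which normally lives under /dev/disk/by-partlabel/ (the form used for --root later in this document). This is a sketch only; the function names are hypothetical helpers, not Warewulf commands.

```shell
# Build the udev symlink path for a partition label.
partlabel_dev() {
    printf '/dev/disk/by-partlabel/%s\n' "$1"
}

# Report whether the labelled partition exists as a block device.
check_partlabel() {
    dev=$(partlabel_dev "$1")
    if [ -b "$dev" ]; then
        echo "$dev present"
    else
        echo "$dev missing"
        return 1
    fi
}

# On a provisioned node (not run here):
#   check_partlabel scratch && findmnt /scratch
```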

Boot-time configuration

Ignition uses systemd, as the underlying sgdisk command relies on dbus notifications.

  1. ignition-disks-ww4.service uses Ignition to create the specified partitions and file systems.

  2. ww4-disks.target depends on a matching .mount unit for each mounted file system.

  3. Each .mount creates the necessary mount points in the root file system and mounts the provisioned file systems during boot.

These services and mount units are generated by the ignition overlay and depend on the existence of the file /warewulf/ignition.json, also generated by the ignition overlay.
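
When disks are not provisioned as expected, a reasonable first step is to confirm that the generated configuration file exists and then inspect the units named above. The sketch below assumes only the paths and unit names given in this section; the function names are illustrative.

```shell
# Path from the text above; override the variable for testing.
IGNITION_JSON=${IGNITION_JSON:-/warewulf/ignition.json}

# Succeeds if the generated Ignition config exists and is non-empty.
have_ignition_config() {
    [ -s "${1:-$IGNITION_JSON}" ]
}

# Print the unit names worth inspecting, one per line.
ww_disk_units() {
    printf '%s\n' ignition-disks-ww4.service ww4-disks.target
}

# On a node (not run here):
#   have_ignition_config || echo "no Ignition config found"
#   for u in $(ww_disk_units); do systemctl status "$u" --no-pager; done
#   systemctl list-dependencies ww4-disks.target
```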

Example disk configurations

This command formats a btrfs file system on a “scratch” partition of “vda” and mounts it at /scratch.

wwctl node set n1 \
  --diskname /dev/vda --diskwipe \
  --partname scratch --partcreate --partnumber 1 \
  --fsname scratch --fsformat btrfs --fspath /scratch

This command adds a swap partition to the “vda” disk.

wwctl node set n1 \
  --diskname /dev/vda \
  --partname swap --partsize=1024 --partnumber 2 \
  --fsname swap --fsformat swap --fspath swap

Re-using or wiping disks

For empty disks, the desired configuration is created and the file systems are mounted. If partitions or file systems already exist on the disk, Ignition tries to reuse the existing file systems by default.

To ignore existing file systems and provision fresh file systems on each boot, specify the --fswipe flag for that file system, and --diskwipe for the disk, as necessary.

If you would like to re-use existing partitions but want to replace existing file systems once, you may

  • wipe the existing data with tools like wipefs or dd [1]; or

  • set the --fswipe flag and remove it after one reboot.

See the upstream Ignition documentation for additional information.
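
For the one-time manual wipe, a guarded wrapper around wipefs reduces the risk of erasing the wrong device. This is a sketch under assumptions: the WIPE_CONFIRM variable and the wipe_once name are invented for illustration, and the partition is assumed to be reachable through its /dev/disk/by-partlabel symlink.

```shell
# Print (default) or run a wipefs command for a labelled partition.
# Set WIPE_CONFIRM=yes to actually erase signatures; this is destructive.
wipe_once() {
    dev="/dev/disk/by-partlabel/$1"
    if [ "${WIPE_CONFIRM:-no}" = "yes" ]; then
        wipefs --all "$dev"
    else
        echo "would run: wipefs --all $dev"
    fi
}

# Dry run for the "scratch" partition used in the examples above.
wipe_once scratch
```

Running it without WIPE_CONFIRM=yes only prints the command, so the target device can be reviewed first.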

Swap and image memory usage

Warewulf images run entirely in memory. Configuring a local swap partition can allow the kernel to reclaim that RAM for applications — but only under the right conditions. Whether swap can free image memory depends on which root filesystem type the node uses.

tmpfs root

When the root filesystem is tmpfs (the default for two-stage dracut boot, or when --root=tmpfs is set explicitly), the image lives in the page cache. The Linux kernel can swap tmpfs pages to a local swap device exactly as it would any other anonymous memory. Adding swap therefore lets the kernel evict cold image pages to disk and reclaim that RAM for running workloads.

This is the recommended configuration for nodes with large images relative to available RAM.

initramfs root (single-stage boot default)

The default single-stage boot places the image in an initramfs root, which is an instance of ramfs. Unlike tmpfs, ramfs pages are pinned in memory: the kernel will never swap them out. Configuring swap on a node with an initramfs root will not free any memory used by the image.

If you are using single-stage boot and want swap to help with image memory, switch to tmpfs root first:

wwctl profile set default --root=tmpfs

For background on tmpfs NUMA interleaving and size limits, see tmpfs and NUMA.

Note

Two-stage dracut boot always uses tmpfs regardless of the --root setting. If you are already using dracut, no additional root configuration is needed.
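
To see which case applies to a running node, the type of the file system mounted at / can be read from /proc/mounts. The following is a minimal sketch; the function names are illustrative, and it treats anything other than tmpfs as the case where swap does not help.

```shell
# Print the file system type of the last entry mounted at "/".
# The mounts table defaults to /proc/mounts; pass a file to test.
root_fstype() {
    awk '$2 == "/" { t = $3 } END { print t }' "${1:-/proc/mounts}"
}

# Report whether swap can be expected to reclaim image memory.
image_is_swappable() {
    fstype=$(root_fstype "${1:-}")
    if [ "$fstype" = "tmpfs" ]; then
        echo "root is tmpfs: image pages can be swapped out"
    else
        echo "root is $fstype: swap is not expected to free image memory"
        return 1
    fi
}

# On a node (not run here):
#   image_is_swappable
```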

Example: configuring swap to reclaim image memory

This example provisions a swap partition on the local disk of each node and activates it at boot, enabling the kernel to reclaim RAM occupied by the node image.

1. Configure tmpfs root (skip if using two-stage dracut boot):

wwctl profile set default --root=tmpfs

2. Add a swap partition to the node or profile disk configuration. This example adds an 8 GiB swap partition as the first partition of /dev/vda:

wwctl profile set default \
  --diskname /dev/vda --diskwipe \
  --partname swap --partcreate --partnumber 1 --partsize=8192 \
  --fsname swap --fsformat swap --fspath swap

3. Add the required overlays to the system overlay so that the swap partition is formatted and activated at boot. The -O / --system-overlays flag replaces the entire overlay list, so include any overlays already configured for the profile. Use Ignition (recommended for Rocky Linux 9 and openSUSE):

wwctl profile set default \
  -O wwinit,wwclient,fstab,hostname,ssh.host_keys,systemd.netname,NetworkManager,ignition,systemd.swap

Or, for systems without Ignition (e.g., Rocky Linux 8), use the sfdisk and mkswap overlays instead:

wwctl profile set default \
  -O wwinit,wwclient,fstab,hostname,ssh.host_keys,systemd.netname,NetworkManager,sfdisk,mkswap,systemd.swap

4. Rebuild the dracut initramfs with the tools needed to provision the disk during the two-stage boot (skip for single-stage boot):

# Ignition path
wwctl image exec rockylinux-9 -- /usr/bin/dracut --force --no-hostonly \
  --add wwinit --add ignition --regenerate-all

# sfdisk/mkswap path
wwctl image exec rockylinux-8 -- /usr/bin/dracut --force --no-hostonly \
  --add wwinit --install sfdisk --install blockdev --install udevadm \
  --install mkswap --regenerate-all

5. Reboot nodes to apply the new configuration.

Verifying swap is active

After reboot, confirm that the swap device is active:

swapon --show

Then check the overall memory picture with free -h:

$ free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi        12Gi       256Mi       120Mi       2.8Gi       2.8Gi
Swap:           8.0Gi        0Ki       8.0Gi

At idle, most of the node’s RAM is occupied by the image. The Swap: line shows the available device but near-zero usage at this point.
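
For scripted checks, the same figures can be read from /proc/meminfo, which reports values in kB. A minimal sketch follows; the function names are illustrative and not part of Warewulf.

```shell
# Print swap in use, in MiB, from a meminfo-format file.
swap_used_mib() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { unused = $2 }
         END { printf "%d\n", (total - unused) / 1024 }' "${1:-/proc/meminfo}"
}

# Succeeds only if a swap device is configured (SwapTotal > 0).
swap_is_configured() {
    awk '/^SwapTotal:/ { found = ($2 > 0) } END { exit !found }' \
        "${1:-/proc/meminfo}"
}

# On a node (not run here):
#   swap_is_configured && echo "swap in use: $(swap_used_mib) MiB"
```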

Confirming that the image gets swapped out

To demonstrate that the kernel evicts image pages to swap when an application needs memory, first note the image size with df -h /:

df -h /

This shows how much tmpfs space the image occupies — that is the amount of RAM currently holding the image.

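Now apply memory pressure using stress-ng (install it in the OS image if not already present). To force the kernel to evict image pages, the allocation must exceed the RAM that is currently available (the available column of free -h), which is smaller than total RAM because the image occupies part of it. The command below requests 90% of MemTotal, which exceeds available RAM whenever the image and other resident data occupy more than about 10% of memory: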

stress-ng --vm 1 \
  --vm-bytes $(awk '/MemTotal/{printf "%dM", $2*0.9/1024}' /proc/meminfo) \
  --vm-keep -t 60s &
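
Whether 90% of MemTotal is enough depends on how much memory is already in use. As a rough pre-check, the sketch below compares that target with MemAvailable from /proc/meminfo; the function names are illustrative.

```shell
# Print "<target MiB> <available MiB>" for a meminfo-format file.
# The target is 90% of MemTotal, as in the stress-ng command above.
pressure_plan() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { printf "%d %d\n", int(total * 9 / 10 / 1024), int(avail / 1024) }' \
        "${1:-/proc/meminfo}"
}

# Succeeds if the target exceeds MemAvailable, so reclaim is forced.
pressure_is_sufficient() {
    set -- $(pressure_plan "${1:-}")
    [ "$1" -gt "$2" ]
}

# On a node (not run here):
#   pressure_is_sufficient || echo "raise the stress-ng allocation"
```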

While stress-ng is running, observe memory usage:

$ free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi        15Gi        32Mi       120Mi       512Mi       192Mi
Swap:           8.0Gi       4.2Gi       3.8Gi

The Swap: used value has grown by roughly the size of the image. The kernel has evicted cold image pages to swap, making that RAM available to the application. The application can access the full physical memory of the node, not just what is left over after the image is loaded.

After stress-ng finishes, the evicted image pages remain on swap until they are accessed again, so free -h will continue to show swap usage until the node is under less pressure and pages are faulted back in as needed.

Moving image pages to swap proactively

Rather than simulating a workload, you can instruct the kernel to push image pages to swap directly. On Linux 6.1 and later with cgroup v2, write the desired reclaim size to memory.reclaim:

# Request that the kernel reclaim up to 4 GiB from the root cgroup
# (adjust to match your image size shown by df -h /)
echo "4G" > /sys/fs/cgroup/memory.reclaim

If the kernel cannot reclaim the full requested amount it returns an error:

-bash: echo: write error: Resource temporarily unavailable

This is expected and benign. It means the kernel reclaimed as much as it could but the reclaimable pool was exhausted before reaching the target.

The kernel swaps out reclaimable pages until the target is met or no reclaimable pages remain, then stops. Run free -h immediately afterwards to see the reduction in Mem: used and the corresponding increase in Swap: used.

This works best when run early in the boot process, before applications start, when nearly all anonymous memory belongs to the image. To automate it, add a systemd unit to a custom overlay that runs after local-fs.target:

[Unit]
Description=Reclaim OS image memory to swap
After=local-fs.target
ConditionPathExists=/sys/fs/cgroup/memory.reclaim

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo "$(df --output=used / | tail -1)K" > /sys/fs/cgroup/memory.reclaim'

[Install]
WantedBy=multi-user.target

This reads the actual image size from df and requests that exact amount be reclaimed, so it adapts automatically as the image grows or shrinks.
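
Because a partial reclaim makes the write fail, a unit that runs the echo directly may be reported as failed even though the outcome is benign. One option is to move the logic into a small script that tolerates the failed write. The sketch below is an assumption-laden example: the function names are invented, and it uses POSIX df -Pk instead of the GNU --output option.

```shell
# Print the reclaim request for a mount point: used space in KiB plus "K".
reclaim_request() {
    df -Pk "${1:-/}" | awk 'NR == 2 { print $3 "K" }'
}

# Write the request to a memory.reclaim file. A failed write (for example
# "Resource temporarily unavailable") is reported but not treated as fatal.
reclaim_image() {
    target=${1:-/sys/fs/cgroup/memory.reclaim}
    [ -w "$target" ] || { echo "cannot write $target" >&2; return 1; }
    request=$(reclaim_request "${2:-/}")
    if ! echo "$request" > "$target" 2>/dev/null; then
        echo "kernel reclaimed less than $request" >&2
    fi
    return 0
}

# On a node, as root (not run here):
#   reclaim_image
```

Both arguments are optional; they exist so the function can be exercised against an ordinary file before it is pointed at the real cgroup interface.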

Provision to disk

New in Warewulf v4.6.2

As a tech preview, the Warewulf two-stage boot process can provision the node image to local storage.

Warning

This functionality is a technology preview and should be used with care. Pay particular attention to wipeFilesystem and similar settings.

Note

Warewulf doesn’t install a bootloader to the disk or add UEFI entries. Nodes still request an image and configuration from the Warewulf server on every boot.

Note

While provisioning to disk should be possible during a single-stage boot, not all features are available:

  • Warewulf does not perform hardware detection to ensure that necessary kernel modules are loaded prior to init.

  • Warewulf does not load udev to ensure that /dev/disk/by-* symlinks are available prior to init.

With Ignition

Warewulf needs a prepared file system to deploy the image to. Warewulf can provision this file system using Ignition. To use Ignition, include ignition in your system overlay. The ignition overlay provisions disks during init and, optionally, during the first stage of a two-stage boot. This allows the root file system to be provisioned before the image is loaded.

wwctl node set wwnode1 \
  --diskname /dev/vda --diskwipe \
  --partname rootfs --partcreate --partnumber 1 \
  --fsname rootfs --fsformat ext4 --fspath /

In order to allow Dracut to provision the disk, partition, and file system, Ignition must be included in the Dracut image.

wwctl image exec rockylinux-9 -- /usr/bin/dracut --force --no-hostonly --add wwinit --add ignition --regenerate-all

The necessary file system may alternatively be prepared out-of-band.

With sfdisk and mkfs

Systems that do not have access to Ignition (e.g., Rocky Linux 8) can provision the root file system using a combination of sfdisk and mkfs. To use them, include sfdisk and mkfs in your system overlay. The sfdisk and mkfs overlays provision disks and file systems during the first stage of a two-stage boot. This allows the root file system to be provisioned before the image is loaded.

Configure the sfdisk and mkfs overlays using resources:

wwctl node set wwnode1 \
  --diskname /dev/vda --diskwipe \
  --partname rootfs --partcreate --partnumber 1 \
  --fsname rootfs --fsformat ext4 --fspath /

In order to allow Dracut to provision the disk, partition, and file system, some additional commands must be included in the Dracut image, depending on which functionality is used:

  • sfdisk: writes the partition table

    • blockdev: used to re-read the partition table after writing

    • udevadm: used to trigger udev events after writing the partition table

  • mkfs: formats file systems

    • mkfs.ext4, mkfs.btrfs, etc.: used by mkfs to format specific file systems

    • wipefs: used to determine if a file system already exists

wwctl image exec rockylinux-8 -- /usr/bin/dracut --force --no-hostonly \
  --add wwinit \
  --install sfdisk \
  --install blockdev \
  --install udevadm \
  --install mkfs \
  --install mkfs.ext4 \
  --install wipefs \
  --regenerate-all
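
Before rebuilding the initramfs, it can be useful to confirm that the image actually contains the commands listed above. The following sketch is meant to be run from a shell inside the image (for example one opened with wwctl image shell); the function name missing_tools is illustrative.

```shell
# Print each named command that is not found in PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Tools from the list above, for an ext4 root file system.
missing_tools sfdisk blockdev udevadm mkfs mkfs.ext4 wipefs
```

An empty result means every listed command was found; any name printed needs its package installed in the image first.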

Configuring the root device

Set the desired storage device for the image using the --root parameter.

wwctl node set wwnode1 --root /dev/disk/by-partlabel/rootfs

Known Problems

If the partition table on the disk isn’t properly readable, the command sgdisk --zap-all (which Ignition uses to wipe the partition table) returns exit code 2. Ignition versions older than 2.16.2 interpret this as an error, and no partitions or file systems are created. Since the partition table is still wiped, partitioning and formatting should succeed on the next boot.
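
To tell whether an image is affected, the installed Ignition version can be compared with 2.16.2. The sketch below assumes that sort -V from GNU coreutils is available and that the version string is obtained separately, for example with rpm -q --qf '%{VERSION}\n' ignition on an RPM-based image; the function names are illustrative.

```shell
# Succeeds if version $1 is strictly older than version $2.
# Relies on "sort -V" (GNU coreutils) for version ordering.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Report whether an Ignition version is affected by the exit-code issue.
ignition_affected() {
    if version_lt "$1" 2.16.2; then
        echo "Ignition $1: affected, provisioning may need a second boot"
    else
        echo "Ignition $1: not affected"
    fi
}

ignition_affected 2.15.0
ignition_affected 2.16.2
```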