Provisioning disks
As a tech preview, Warewulf provides structures to define disks, partitions, and file systems. These structures can generate a configuration for Ignition to provision partitions and file systems dynamically on cluster nodes, or they can be applied with sfdisk, mkfs, and mkswap during boot.
Ignition can, for example, create swap partitions or /scratch file
systems.
Requirements
Partition and file system creation requires both ignition and sgdisk to
be installed in the image.
Rocky Linux
dnf install ignition gdisk
Note
Packages for Ignition are not currently available for Rocky Linux 8, but they are available for Rocky Linux 9 as part of “appstream.”
openSUSE Leap
zypper install ignition gptfdisk
Disks and partitions
A node or profile can have several disks. Each disk is identified by the path to
its block device. Each disk holds a map to its partitions and a bool switch
to indicate if an existing, non-matching partition table should be overwritten.
Each partition is identified by its label. The partition number can be omitted,
but specifying it is recommended as Ignition may fail without it. Partition
sizes should also be set (specified in MiB), except for the last partition: if
no size is given, the maximum available size is used. Each partition has the
switches should_exist and wipe_partition_entry which control the
partition creation process (via the --partcreate and --partwipe flags).
When omitting a partition number, wipe_partition_entry should be true, as
this allows Ignition to replace the existing partition.
wwctl node set n1 \
--diskname /dev/vda --diskwipe \
--partname scratch --partcreate --partwipe --partnumber 1
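Because partition sizes are given in MiB, it can help to convert from GiB before passing a value to --partsize. The following is a minimal sketch; the helper name gib_to_mib is hypothetical and not part of Warewulf.

```shell
# Hypothetical helper (not part of Warewulf): convert whole GiB to MiB,
# the unit used for partition sizes.
gib_to_mib() {
    echo $(( $1 * 1024 ))
}

# Example: an 8 GiB partition corresponds to --partsize=8192.
echo "--partsize=$(gib_to_mib 8)"
```

The printed value can be appended to a wwctl node set or wwctl profile set command alongside --partname and --partnumber.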
File systems
File systems are identified by their underlying block device, preferably using
the /dev/disk/by-partlabel format. Except for a swap partition, an absolute
path for the mount point must be specified for each file system. Depending on
the image used, valid formats are btrfs, ext3, ext4, and xfs.
Each file system has the switch wipe_filesystem to control whether an
existing file system is wiped.
wwctl node set n1 \
--diskname /dev/vda --partname scratch \
--fsname scratch --fsformat btrfs --fspath /scratch
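The rules above (a known format, and an absolute mount point unless the file system is swap) can be checked before running wwctl. This is a sketch only: check_fs is a hypothetical helper, and the formats actually usable still depend on the tools present in the image.

```shell
# Hypothetical pre-flight check (not part of Warewulf).
# Usage: check_fs FORMAT PATH
check_fs() {
    format=$1
    path=$2
    case "$format" in
        swap)
            # Swap has no mount point, so the path is not checked.
            return 0 ;;
        btrfs|ext3|ext4|xfs)
            ;;
        *)
            echo "unsupported format: $format" >&2
            return 1 ;;
    esac
    case "$path" in
        /*) return 0 ;;
        *)  echo "mount point must be an absolute path: $path" >&2
            return 1 ;;
    esac
}

check_fs btrfs /scratch && echo "btrfs on /scratch: ok"
check_fs swap swap && echo "swap: ok"
```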
Boot-time configuration
Ignition uses systemd, as the underlying sgdisk command relies on dbus
notifications.
ignition-disks-ww4.service uses Ignition to create the specified partitions and file systems.
ww4-disks.target depends on a matching .mount unit for each mounted file system.
Each .mount unit creates the necessary mount points in the root file system and mounts the provisioned file systems during boot.
These services and mount units are generated by the ignition overlay and
depend on the existence of the file /warewulf/ignition.json, also generated
by the ignition overlay.
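When disks are not provisioned as expected, a first check on the node is whether /warewulf/ignition.json is present at all. The sketch below assumes nothing beyond that path; the function name and its optional root-directory argument are hypothetical and exist only so the check can also be run against an unpacked tree.

```shell
# Sketch: report whether the file that the Ignition units depend on exists.
# The optional argument is a directory prefix (hypothetical convenience).
has_ignition_config() {
    root=${1:-}
    if [ -f "$root/warewulf/ignition.json" ]; then
        echo "present"
    else
        echo "missing"
        return 1
    fi
}

has_ignition_config || echo "ignition overlay output not found on this system"
```

On a booted node, systemctl status ignition-disks-ww4.service then shows whether the service itself ran.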
Example disk configurations
This command formats a btrfs file system on a “scratch” partition of
“vda” and mounts it at /scratch.
wwctl node set n1 \
--diskname /dev/vda --diskwipe \
--partname scratch --partcreate --partnumber 1 \
--fsname scratch --fsformat btrfs --fspath /scratch
This command adds a swap partition to the “vda” disk.
wwctl node set n1 \
--diskname /dev/vda \
--partname swap --partsize=1024 --partnumber 2 \
--fsname swap --fsformat swap --fspath swap
Re-using or wiping disks
For empty disks, the desired configuration is created and the file systems are
mounted. If partitions or file systems already exist on the disk, Ignition
tries to reuse existing file systems by default.
To ignore existing file systems and provision fresh file systems on each boot,
specify the --fswipe flag for that file system, and --diskwipe for the
disk, as necessary.
If you would like to re-use existing partitions but want to replace existing file systems once, you may
wipe the existing data with tools like wipefs or dd [1]; or
set the --fswipe flag and remove it after one reboot.
See the upstream Ignition documentation for additional information.
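For the one-time wipe with wipefs, a cautious approach is to print the command first and run it only after confirming the device. The following is a sketch under that assumption; wipe_signatures and the RUN variable are hypothetical names, and the device path is only an example.

```shell
# Sketch: print (rather than run) the command that erases file system
# signatures from a partition. Set RUN=1 to execute it; this is destructive.
wipe_signatures() {
    dev=$1
    if [ "${RUN:-0}" = "1" ]; then
        wipefs --all "$dev"
    else
        echo "wipefs --all $dev"
    fi
}

wipe_signatures /dev/disk/by-partlabel/scratch
```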
Swap and image memory usage
Warewulf images run entirely in memory. Configuring a local swap partition can allow the kernel to reclaim that RAM for applications — but only under the right conditions. Whether swap can free image memory depends on which root filesystem type the node uses.
tmpfs root
When the root filesystem is tmpfs (the default for two-stage dracut boot, or
when --root=tmpfs is set explicitly), the image lives in the page cache.
The Linux kernel can swap tmpfs pages to a local swap device exactly as it
would any other anonymous memory. Adding swap therefore lets the kernel evict
cold image pages to disk and reclaim that RAM for running workloads.
This is the recommended configuration for nodes with large images relative to available RAM.
initramfs root (single-stage boot default)
The default single-stage boot places the image in an initramfs root, which
is an instance of ramfs. Unlike tmpfs, ramfs pages are pinned in
memory: the kernel will never swap them out. Configuring swap on a node with an
initramfs root will not free any memory used by the image.
If you are using single-stage boot and want swap to help with image memory,
switch to tmpfs root first:
wwctl profile set default --root=tmpfs
For background on tmpfs NUMA interleaving and size limits, see
tmpfs and NUMA.
Note
Two-stage dracut boot always uses tmpfs regardless of the --root
setting. If you are already using dracut, no additional root configuration
is needed.
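To find out which case applies to a running node, the type of the root file system can be read from the mount table. The sketch below assumes that an initramfs root is listed with type rootfs and a tmpfs root with type tmpfs; both function names are hypothetical.

```shell
# Sketch: print the file system type of / from a mounts table.
# When several entries exist for /, the last one is the one currently visible.
root_fstype() {
    awk '$2 == "/" { t = $3 } END { print t }' "${1:-/proc/mounts}"
}

# Map the type to the behaviour described above (assumed type names).
swap_helps_image() {
    case "$(root_fstype "$@")" in
        tmpfs)  echo "yes" ;;
        rootfs) echo "no" ;;
        *)      echo "unknown" ;;
    esac
}

swap_helps_image
```

An answer of "unknown" simply means the root is neither of the two types discussed here, for example when the image was provisioned to disk.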
Example: configuring swap to reclaim image memory
This example provisions a swap partition on the local disk of each node and activates it at boot, enabling the kernel to reclaim RAM occupied by the node image.
1. Configure tmpfs root (skip if using two-stage dracut boot):
wwctl profile set default --root=tmpfs
2. Add a swap partition to the node or profile disk configuration. This
example adds an 8 GiB swap partition as the first partition of /dev/vda:
wwctl profile set default \
--diskname /dev/vda --diskwipe \
--partname swap --partcreate --partnumber 1 --partsize=8192 \
--fsname swap --fsformat swap --fspath swap
3. Add the required overlays to the system overlay so that the swap
partition is formatted and activated at boot. The -O / --system-overlays
flag replaces the entire overlay list, so include any overlays already
configured for the profile. Use Ignition (recommended for Rocky Linux 9 and
openSUSE):
wwctl profile set default \
-O wwinit,wwclient,fstab,hostname,ssh.host_keys,systemd.netname,NetworkManager,ignition,systemd.swap
Or, for systems without Ignition (e.g., Rocky Linux 8), use the sfdisk and
mkswap overlays instead:
wwctl profile set default \
-O wwinit,wwclient,fstab,hostname,ssh.host_keys,systemd.netname,NetworkManager,sfdisk,mkswap,systemd.swap
4. Rebuild the dracut initramfs with the tools needed to provision the disk during the two-stage boot (skip for single-stage boot):
# Ignition path
wwctl image exec rockylinux-9 -- /usr/bin/dracut --force --no-hostonly \
--add wwinit --add ignition --regenerate-all
# sfdisk/mkswap path
wwctl image exec rockylinux-8 -- /usr/bin/dracut --force --no-hostonly \
--add wwinit --install sfdisk --install blockdev --install udevadm \
--install mkswap --regenerate-all
5. Reboot nodes to apply the new configuration.
Verifying swap is active
After reboot, confirm that the swap device is active:
swapon --show
Then check the overall memory picture with free -h:
$ free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi        12Gi       256Mi       120Mi       2.8Gi       2.8Gi
Swap:          8.0Gi         0Ki       8.0Gi
At idle, most of the node’s RAM is occupied by the image. The Swap: line
shows the available device but near-zero usage at this point.
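For scripted checks across many nodes, the same information can be read from /proc/swaps, which lists each active swap device with its size in KiB. This is a minimal sketch; swap_total_kib is a hypothetical helper name.

```shell
# Sketch: sum the size (KiB) of all active swap devices in a
# /proc/swaps-style table; prints 0 when only the header line is present.
swap_total_kib() {
    awk 'NR > 1 { total += $3 } END { printf "%.0f\n", total }' "${1:-/proc/swaps}"
}

if [ "$(swap_total_kib)" -gt 0 ]; then
    echo "swap is active"
else
    echo "no active swap"
fi
```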
Confirming that the image gets swapped out
To demonstrate that the kernel evicts image pages to swap when an application
needs memory, first note the image size with df -h /:
df -h /
This shows how much tmpfs space the image occupies — that is the amount of RAM currently holding the image.
Now apply memory pressure using stress-ng (install it in the OS image if
not already present). The allocation must exceed the RAM that is currently
available so that the kernel is forced to evict image pages; the command below
requests 90% of MemTotal:
stress-ng --vm 1 \
--vm-bytes $(awk '/MemTotal/{printf "%dM", $2*0.9/1024}' /proc/meminfo) \
--vm-keep -t 60s &
While stress-ng is running, observe memory usage:
free -h
$ free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi        15Gi        32Mi       120Mi       512Mi       192Mi
Swap:          8.0Gi       4.2Gi       3.8Gi
The Swap: used value has grown by roughly the size of the image. The
kernel has evicted cold image pages to swap, making that RAM available to the
application. The application can access the full physical memory of the node,
not just what is left over after the image is loaded.
After stress-ng finishes, the evicted image pages remain on swap until
they are accessed again, so free -h will continue to show swap usage until
the node is under less pressure and pages are faulted back in as needed.
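To compare swap usage before and after the workload without reading free -h by eye, the values can be taken from /proc/meminfo. The sketch below computes SwapTotal minus SwapFree; swap_used_kib is a hypothetical helper name.

```shell
# Sketch: print swap currently in use (KiB) as SwapTotal - SwapFree,
# read from a /proc/meminfo-style file.
swap_used_kib() {
    awk '/^SwapTotal:/ { t = $2 } /^SwapFree:/ { f = $2 }
         END { printf "%.0f\n", t - f }' "${1:-/proc/meminfo}"
}

before=$(swap_used_kib)
# ... run the workload here ...
after=$(swap_used_kib)
echo "swap usage changed by $(( after - before )) KiB"
```

A change close to the image size reported by df -h / suggests that most of the image has been evicted.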
Moving image pages to swap proactively
Rather than simulating a workload, you can instruct the kernel to push image
pages to swap directly. On Linux 6.1 and later with cgroup v2, write the
desired reclaim size to memory.reclaim:
# Request that the kernel reclaim up to 4 GiB from the root cgroup
# (adjust to match your image size shown by df -h /)
echo "4G" > /sys/fs/cgroup/memory.reclaim
If the kernel cannot reclaim the full requested amount it returns an error:
-bash: echo: write error: Resource temporarily unavailable
This is expected and benign. The kernel swaps out reclaimable pages until the target is met or the reclaimable pool is exhausted, then stops; the error means the pool ran out before the target was reached.
Run free -h immediately afterwards to see the reduction in Mem: used
and corresponding increase in Swap: used.
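In a script, the write can be wrapped so that a short reclaim does not abort the run. This is a sketch only: request_reclaim is a hypothetical helper, its second argument exists just so the control file path can be overridden, and it treats any failed write as a partial reclaim, which is an assumption.

```shell
# Sketch: request reclaim and treat a failed write as non-fatal.
request_reclaim() {
    amount=$1
    file=${2:-/sys/fs/cgroup/memory.reclaim}
    if [ ! -w "$file" ]; then
        echo "memory.reclaim not available: $file" >&2
        return 1
    fi
    if echo "$amount" > "$file" 2>/dev/null; then
        echo "reclaimed $amount"
    else
        # Assumption: the failure is the benign "pool exhausted" case.
        echo "partial reclaim: reclaimable pool exhausted before $amount"
    fi
}

# Example (as root on a node): request_reclaim 4G
```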
This works best when run early in the boot process, before applications start,
when nearly all anonymous memory belongs to the image. To automate it, add a
systemd unit that runs after local-fs.target to a custom overlay:
[Unit]
Description=Reclaim OS image memory to swap
After=local-fs.target
ConditionPathExists=/sys/fs/cgroup/memory.reclaim
[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo "$(df --output=used / | tail -1)K" > /sys/fs/cgroup/memory.reclaim'
[Install]
WantedBy=multi-user.target
This reads the actual image size from df and requests that exact amount be
reclaimed, so it adapts automatically as the image grows or shrinks.
Provision to disk
New in Warewulf v4.6.2
As a tech preview, the Warewulf two-stage boot process can provision the node image to local storage.
Warning
This functionality is a technology preview and should be used with care. Pay
specific attention to wipeFilesystem and similar settings.
Note
Warewulf doesn’t install a bootloader to the disk or add UEFI entries. Nodes still request an image and configuration from the Warewulf server on every boot.
Note
While provisioning to disk should be possible during a single-stage boot, not all features are available:
Warewulf does not perform hardware detection to ensure that necessary kernel modules are loaded prior to init.
Warewulf does not load udev to ensure that /dev/disk/by-* symlinks are available prior to init.
With Ignition
Warewulf needs a prepared file system to deploy the image to. Warewulf can
provision this file system using Ignition. To use Ignition, include ignition
in your system overlay. The ignition overlay provisions disks during init and,
optionally, during the first stage of a two-stage boot. This allows the
root file system to be provisioned before the image is loaded.
wwctl node set wwnode1 \
--diskname /dev/vda --diskwipe \
--partname rootfs --partcreate --partnumber 1 \
--fsname rootfs --fsformat ext4 --fspath /
In order to allow Dracut to provision the disk, partition, and file system, Ignition must be included in the Dracut image.
wwctl image exec rockylinux-9 -- /usr/bin/dracut --force --no-hostonly --add wwinit --add ignition --regenerate-all
The necessary file system may alternatively be prepared out-of-band.
With sfdisk and mkfs
Systems that do not have access to Ignition (e.g., Rocky Linux 8) can provision
the root file system using a combination of sfdisk and mkfs. To use
them, include sfdisk and mkfs in your system overlay. The sfdisk and
mkfs overlays provision disk and file systems during the first stage of a
two-stage boot. This allows the root file system to be provisioned before the
image is loaded.
Configure the sfdisk and mkfs overlays using resources:
wwctl node set wwnode1 \
--diskname /dev/vda --diskwipe \
--partname rootfs --partcreate --partnumber 1 \
--fsname rootfs --fsformat ext4 --fspath /
In order to allow Dracut to provision the disk, partition, and file system, some additional commands must be included in the Dracut image, depending on which functionality is used:
sfdisk: writes the partition table
blockdev: used to re-read the partition table after writing
udevadm: used to trigger udev events after writing the partition table
mkfs: formats file systems (may also require file-system-specific commands like mkfs.ext4)
mkfs.ext4, mkfs.btrfs, etc.: used by mkfs to format specific file systems
wipefs: used to determine if a file system already exists
wwctl image exec rockylinux-8 -- /usr/bin/dracut --force --no-hostonly \
--add wwinit \
--install sfdisk \
--install blockdev \
--install udevadm \
--install mkfs \
--install mkfs.ext4 \
--install wipefs \
--regenerate-all
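Before rebuilding the initramfs, it may be worth confirming that the listed commands exist in the image at all. The sketch below reports any that are missing from PATH; missing_tools is a hypothetical helper, and it only checks the environment where it runs (for example, a shell inside the image), not the contents of the generated initramfs.

```shell
# Sketch: print each named command that is not found in PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing_tools sfdisk blockdev udevadm mkfs mkfs.ext4 wipefs
```

Empty output means every listed command was found.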
Configuring the root device
Set the desired storage device for the image using the --root
parameter.
wwctl node set wwnode1 --root /dev/disk/by-partlabel/rootfs
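Since --root refers to the partition by label, a quick check on the node is whether that label resolves under /dev/disk/by-partlabel. This is a minimal sketch; partlabel_exists is a hypothetical helper, and its optional directory argument exists only so the lookup location can be overridden.

```shell
# Sketch: succeed when a partition label resolves to an existing entry.
partlabel_exists() {
    label=$1
    dir=${2:-/dev/disk/by-partlabel}
    [ -e "$dir/$label" ]
}

if partlabel_exists rootfs; then
    echo "rootfs partition found"
else
    echo "rootfs partition not found"
fi
```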
Known Problems
If the partition table on the disk isn’t properly readable, the command sgdisk
--zap-all (which is used by Ignition to wipe the partition table) exits with
code 2. This is interpreted by Ignition < 2.16.2 as an error, and no
partitions or file systems are created. Since the partition table is still wiped,
partitioning and formatting should succeed on the next boot.
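Whether a given image is affected comes down to comparing its Ignition version against 2.16.2. The sketch below does that comparison with sort -V (GNU coreutils version sort); version_lt is a hypothetical helper, and the version string is supplied by hand because how to query it from the installed package is not covered here.

```shell
# Sketch: succeed when version $1 is strictly older than version $2.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Example version string, entered by hand for illustration.
if version_lt "2.15.0" "2.16.2"; then
    echo "affected: a second boot may be needed after a failed wipe"
fi
```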