Troubleshooting

sos

The warewulf-sos package (new in v4.6.1) adds support for gathering Warewulf server configuration information in an sos report.

dnf -y install warewulf-sos
sos report # optionally, --enable-plugins=warewulf

Note

The warewulf-sos package is not currently built for SUSE.

warewulfd

The Warewulf server (warewulfd) sends logs to the systemd journal.

journalctl -u warewulfd.service

To increase the verbosity of the log, specify either --verbose or --debug in the warewulfd OPTIONS.

echo "OPTIONS=--debug" >>/etc/default/warewulfd
systemctl restart warewulfd.service
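Appending with >> adds another OPTIONS line each time the command is repeated. The following is a minimal sketch of an idempotent alternative; set_warewulfd_options is a hypothetical helper (not part of Warewulf) that replaces any existing OPTIONS= line in the file it is given and leaves other lines untouched.

```shell
# Sketch: set OPTIONS in a defaults file without accumulating duplicate lines.
# set_warewulfd_options is a hypothetical helper, not part of Warewulf.
set_warewulfd_options() {
    file=$1    # e.g. /etc/default/warewulfd
    value=$2   # e.g. --debug
    tmp=$(mktemp) || return 1
    # Keep every existing line except previous OPTIONS= assignments.
    if [ -f "$file" ]; then
        grep -v '^OPTIONS=' "$file" >"$tmp" || true
    fi
    printf 'OPTIONS=%s\n' "$value" >>"$tmp"
    # Overwrite in place so the original file's ownership and mode are kept.
    cat "$tmp" >"$file" && rm -f "$tmp"
}
```

For example, set_warewulfd_options /etc/default/warewulfd --debug followed by systemctl restart warewulfd.service has the same effect as the commands above, and can be rerun with --verbose (or an empty value) to change the setting later.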

iPXE

If you’re using iPXE to boot (the default), you can get a command prompt by pressing Ctrl-B (C-b) during boot.

From the iPXE command prompt, you can run the same commands from default.ipxe to troubleshoot potential boot problems.

For example, the following commands perform a (relatively) normal Warewulf boot. (Substitute your Warewulf server’s IP address in place of 10.0.0.1, update the port number if you have changed it from the default of 9873, and substitute your cluster node’s MAC address in place of 00:00:00:00:00:00.)

set base http://10.0.0.1:9873
set hwaddr 00:00:00:00:00:00
kernel --name kernel ${base}/kernel/${hwaddr}
imgextract --name image ${base}/image/${hwaddr}?compress=gz
imgextract --name system ${base}/system/${hwaddr}?compress=gz
imgextract --name runtime ${base}/runtime/${hwaddr}?compress=gz
boot kernel initrd=image initrd=system initrd=runtime
  • The base variable stores the address of warewulfd so that the later commands can reuse it. The MAC address is appended to each resource path so that Warewulf knows what image and overlays to provide.

  • The kernel command fetches a kernel for later booting.

  • The imgextract command fetches and decompresses the images that will make up the booted OS image. In a typical environment, this step loads a minimal “initial ramdisk” which then boots the rest of the system. Warewulf, by default, loads the entire image as an initial ramdisk, and also loads the system and runtime overlays at this time.

  • The boot command tells iPXE to boot the system with the given kernel and ramdisks.

Note

This example does not provide assetkey information to warewulfd. If your nodes have defined asset tags, provide it in the uri variable for the node you are trying to boot.
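It can also help to request the same resources from another host, to separate warewulfd problems from iPXE problems. The sketch below rebuilds the URLs used in the example above; ww_boot_urls and ww_check_boot_urls are hypothetical helper names, and the check assumes that warewulfd accepts requests from the host you run it on (for example, a secure setting in warewulf.conf may reject requests from unprivileged ports).

```shell
# Sketch: print the URLs that the iPXE example requests.
# ww_boot_urls is a hypothetical helper name.
ww_boot_urls() {
    base=$1     # e.g. http://10.0.0.1:9873
    hwaddr=$2   # MAC address of the node, e.g. 00:00:00:00:00:00
    printf '%s/kernel/%s\n' "$base" "$hwaddr"
    for stage in image system runtime; do
        printf '%s/%s/%s?compress=gz\n' "$base" "$stage" "$hwaddr"
    done
}

# Report the HTTP status code for each resource, discarding the body.
# Note that this downloads each resource in full, which may take a while.
ww_check_boot_urls() {
    ww_boot_urls "$1" "$2" | while IFS= read -r url; do
        code=$(curl -s -o /dev/null -w '%{http_code}' "$url")
        printf '%s %s\n' "$code" "$url"
    done
}
```

For example, ww_check_boot_urls http://10.0.0.1:9873 00:00:00:00:00:00 prints one status code per resource; a non-200 code points at the resource to investigate in the warewulfd log.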

For example, you may want to try booting to a pre-init shell with debug logging enabled. To do so, replace the boot command above with the following:

boot kernel initrd=image initrd=system initrd=runtime debug rdinit=/bin/sh

Note

You may be more familiar with specifying init= on the kernel command line. rdinit indicates “ramdisk init.” Since Warewulf, by default, boots the OS image as an initial ramdisk, we must use rdinit= here.

GRUB

If you’re using GRUB to boot, you can get a command prompt by pressing “c” when prompted during boot.

From the GRUB command prompt, you can enter the same commands that you would otherwise find in grub.cfg.ww.

For example, the following commands perform a (relatively) normal Warewulf boot. (Substitute your Warewulf server’s IP address in place of 10.0.0.1, and update the port number if you have changed it from the default of 9873.)

base="(http,10.0.0.1:9873)"
linux "${base}/kernel/${net_default_mac}" wwid=${net_default_mac}
initrd "${base}/image/${net_default_mac}?compress=gz" "${base}/system/${net_default_mac}?compress=gz" "${base}/runtime/${net_default_mac}?compress=gz"
boot
  • The base variable stores the address of warewulfd so that the later commands can reuse it. ${net_default_mac} provides Warewulf with the MAC address of the booting node, so that Warewulf knows what image and overlays to provide it.

  • The linux command tells GRUB what kernel to boot, as provided by warewulfd. The wwid kernel argument helps wwclient identify the node during runtime.

  • The initrd command tells GRUB what images to load into memory for boot. In a typical environment, this step loads a minimal “initial ramdisk” which then boots the rest of the system. Warewulf, by default, loads the entire image as an initial ramdisk, and also loads the system and runtime overlays at this time.

  • The boot command tells GRUB to boot the system with the previously-defined configuration.

Note

This example does not provide assetkey information to warewulfd. If your nodes have defined asset tags, provide it in the uri variable for the node you are trying to boot.

For example, you may want to try booting to a pre-init shell with debug logging enabled. To do so, replace the linux command above with the following:

linux "${base}/kernel/${net_default_mac}" wwid=${net_default_mac} debug rdinit=/bin/sh

Note

You may be more familiar with specifying init= on the kernel command line. rdinit indicates “ramdisk init.” Since Warewulf, by default, boots the OS image as an initial ramdisk, we must use rdinit= here.

Dracut

By default, dracut simply panics and terminates when it encounters an issue.

Dracut looks at the kernel command line for its configuration. You can configure it for additional logging and to switch to an interactive shell on error:

wwctl profile set default --kernelargs=rd.shell,rd.debug,log_buf_len=1M

For more information on debugging Dracut problems, see the Fedora dracut problems guide.
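To confirm that the arguments set with wwctl actually reached the node, you can inspect the kernel command line on the node itself. The following is a minimal sketch; has_karg is a hypothetical helper, and it assumes that tr and grep are available in the environment where you run it (which may not be the case in every emergency shell).

```shell
# Sketch: check whether an exact kernel argument is present on a command line.
# has_karg is a hypothetical helper; the second parameter defaults to /proc/cmdline.
has_karg() {
    want=$1
    cmdline_file=${2:-/proc/cmdline}
    # Put each argument on its own line, then look for an exact whole-line match.
    tr -s ' \t' '\n' <"$cmdline_file" | grep -Fxq -- "$want"
}
```

For example, has_karg rd.debug && echo "rd.debug is set" prints the message only if rd.debug appears as a separate argument on the running kernel's command line.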

Ignition

If partition creation doesn’t work as expected you have a few options to investigate:

  • Add systemd.log_level=debug and/or rd.debug to the kernelArgs of the node you’re working on.

  • After the next boot you should be able to find verbose information on the node with journalctl -u ignition-ww4-disks.service.

  • You could also check the content of /warewulf/ignition.json.

  • You can also edit /warewulf/ignition.json directly on the node and rerun the disks stage after each change until you find the settings you need. (Make sure to unmount all partitions first if ignition was partially successful.)

    /usr/lib/dracut/modules.d/30ignition/ignition \
      --platform=metal \
      --stage=disks \
      --config-cache=/warewulf/ignition.json \
      --log-to-stdout

  • Sometimes you need to add should_exist: "true" for the swap partition as well.

Overlay Shadowing

When Warewulf introduced the distinction between distribution overlays and site overlays, existing installations that had modified any distribution overlays were left with those modified files in the site overlay directory (typically /var/lib/warewulf/overlays/). Because a site overlay takes complete precedence over a distribution overlay with the same name — with no merging of individual files — the entire distribution overlay is shadowed. Any new files or updates added to the distribution overlay in a subsequent Warewulf upgrade will be hidden as long as a site overlay of the same name exists.

To check whether any distribution overlays are being shadowed by site overlays, use wwctl overlay list, which includes a SITE column:

wwctl overlay list

Any overlay showing true in the SITE column that you did not intentionally create locally may be unintentionally shadowing its distribution counterpart.

To see which files are present in a site overlay, use the --all flag:

wwctl overlay list --all <overlay_name>

To see the filesystem paths of the overlays directly, use the --path flag:

wwctl overlay list --path
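Shadowing can also be checked directly on the filesystem, by looking for overlay names that exist in both overlay directories. The following is a minimal sketch; shadowed_overlays is a hypothetical helper, and both directories are passed as arguments because their locations depend on your installation (wwctl overlay list --path shows the actual paths).

```shell
# Sketch: list overlay names that exist in both a site and a distribution
# overlay directory. shadowed_overlays is a hypothetical helper name.
shadowed_overlays() {
    site_dir=$1
    dist_dir=$2
    for path in "$site_dir"/*/; do
        # Skip the unexpanded pattern when the site directory is empty.
        [ -d "$path" ] || continue
        name=$(basename "$path")
        # A site overlay shadows a distribution overlay of the same name.
        [ -d "$dist_dir/$name" ] && printf '%s\n' "$name"
    done
    return 0
}
```

For example, shadowed_overlays /var/lib/warewulf/overlays /usr/share/warewulf/overlays prints one shadowed overlay name per line. The second path is an assumption about where distribution overlays are installed; substitute the path reported on your system.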

If you determine that a site overlay is unintentionally shadowing a distribution overlay, you can restore the distribution overlay by deleting the site overlay. Back up any intentional local modifications first, then delete the site overlay:

wwctl overlay delete <overlay_name>

wwctl overlay delete only ever deletes site overlays, so this command cannot remove the underlying distribution overlay. After deleting the site overlay, wwctl overlay list should show false in the SITE column for that overlay, confirming that the distribution overlay is now active.
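The backup step mentioned above can be as simple as archiving the site overlay directory before deleting it. The following is a minimal sketch; backup_overlay is a hypothetical helper, and the site overlay directory and archive path are arguments that you supply.

```shell
# Sketch: archive one site overlay before deleting it.
# backup_overlay is a hypothetical helper name.
backup_overlay() {
    overlays_dir=$1   # site overlay directory, e.g. /var/lib/warewulf/overlays
    name=$2           # name of the overlay to archive
    dest=$3           # path of the archive to write
    if [ ! -d "$overlays_dir/$name" ]; then
        echo "no such overlay: $name" >&2
        return 1
    fi
    # Store paths relative to the overlay directory so the archive
    # unpacks as <name>/... wherever it is extracted.
    tar -czf "$dest" -C "$overlays_dir" "$name"
}
```

For example, backup_overlay /var/lib/warewulf/overlays <overlay_name> /root/<overlay_name>.tar.gz writes an archive that you can inspect with tar -tzf before running wwctl overlay delete.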

Running Containers on Cluster Nodes

Container runtimes such as Podman require filesystem features — most notably OverlayFS support for image storage and container layers — that are not available with the default initramfs root filesystem. To run Podman or similar runtimes on cluster nodes, configure the node or profile to use tmpfs as the root filesystem:

# Apply to all nodes via a profile
wwctl profile set default --root=tmpfs

# Or apply to a specific node
wwctl node set <nodename> --root=tmpfs

After changing the root filesystem type, reboot the affected nodes to apply the new configuration.

Note

The OS image itself must have Podman (or the desired container runtime) installed. See OS Images for guidance on customizing OS images.

For information on tuning tmpfs memory usage and NUMA interleaving behavior, see tmpfs and NUMA below.

tmpfs and NUMA

Warewulf can optionally mount the root filesystem as tmpfs instead of the default initramfs. When it does, Warewulf adds mpol=interleave to the mount options, which distributes the tmpfs memory across all NUMA nodes. This avoids the hotspotting that occurs when the default initramfs stores a large OS image on a single NUMA node. To enable this, set the root filesystem type to tmpfs:

wwctl profile set default --root=tmpfs

You may also adjust the tmpfs size via the wwinit.tmpfs.size kernel argument:

# Limit tmpfs to a maximum of 1 GB
wwctl profile set default --kernelargs="wwinit.tmpfs.size=1G"
# Alternatively, use a percentage of physical RAM
wwctl profile set default --kernelargs="wwinit.tmpfs.size=25%"

By default this is set to 50% of physical RAM. Note that tmpfs is required for SELinux overlays since initramfs cannot preserve SELinux contexts.
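After rebooting a node, you can check which filesystem is actually mounted at / and with which options. The following is a minimal sketch; root_mount is a hypothetical helper, and the expectation that a tmpfs root lists mpol=interleave among its options is an assumption based on the description above.

```shell
# Sketch: print the filesystem type and mount options of the root filesystem.
# root_mount is a hypothetical helper; the parameter defaults to /proc/mounts.
root_mount() {
    mounts_file=${1:-/proc/mounts}
    # Fields are: device, mount point, type, options.
    # If "/" is listed more than once, the last entry is the one in effect.
    awk '$2 == "/" { fstype = $3; opts = $4 }
         END { if (fstype != "") print fstype, opts }' "$mounts_file"
}
```

Running root_mount on the node prints the type followed by the comma-separated options; a type other than tmpfs suggests that the --root=tmpfs setting did not take effect for that node.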

Because the root is tmpfs, the kernel can also swap cold image pages to a local swap device, freeing RAM for running workloads. This does not apply to the default initramfs root (single-stage boot), where pages are pinned in memory and cannot be swapped. See Swap and image memory usage for a complete walkthrough.

Note

On some systems, it may also be necessary to include the noefi kernel argument. This works around specific EFI firmware bugs that can prevent proper memory release during the transition from initramfs to tmpfs.

OCI Blob Cache

Warewulf caches OCI image layers on disk to speed up repeated wwctl image import operations. The cache can grow large when many images — or many versions of the same image — are imported over time.

v4.6 and later

The cache is stored at $cachedir/warewulf (default: /var/cache/warewulf on RPM-based distributions). It contains files and directories such as blobs/, index.json, and oci-layout.

Use wwctl clean to remove the cache:

wwctl clean

The cache is rebuilt automatically on the next wwctl image import.

v4.5.x and earlier (legacy cache location)

In v4.5.x and earlier releases, the OCI blob cache was stored at $datastore/oci (default: /usr/share/oci in the Rocky Linux RPM packages). This location is not removed by wwctl clean.

If you are upgrading from v4.5.x and want to reclaim the space used by the old cache, you can safely delete this directory manually:

rm -rf /usr/share/oci

Adjust the path if your installation used a non-default datastore setting in warewulf.conf.
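Before removing anything, it can be useful to see how much space each cache location actually occupies. The following is a minimal sketch; oci_cache_usage is a hypothetical helper that takes the directories to inspect as arguments.

```shell
# Sketch: report the disk usage of each cache directory given as an argument.
# oci_cache_usage is a hypothetical helper name.
oci_cache_usage() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            # Summarize the total size of the directory in human-readable units.
            du -sh "$dir"
        else
            printf 'absent\t%s\n' "$dir"
        fi
    done
}
```

For example, oci_cache_usage /var/cache/warewulf /usr/share/oci reports the size of the current and legacy default locations named above; adjust the paths if your installation uses different settings.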