In this post, I walk through the steps I took in order to network boot a Raspberry Pi with support for running Docker. Most guides on network booting a Pi use NFS for storage and therefore don’t support running Docker because the default storage driver is overlay2, which uses overlayfs, which as of this writing does not support NFS when using multiple lower layers.
Therefore, instead of using NFS we will use iSCSI with ZFS as the backing store. While there are guides out there on network booting a Pi using iSCSI, there are certainly fewer of them and it seems to be the path less traveled. Nothing I present in this guide is particularly groundbreaking – it’s mostly a combination of the work of many other guides (which I’ve tried my best to link to in each section). However, some of the bugs and quirks that older guides had to work around have since been fixed, so those workarounds are no longer necessary, and I thought an updated guide with a ZFS twist could be helpful!
I’m going through this exercise because I’m one of the many people who have been burned by having their Pi’s SD card die without proper backups. Well no more! Of course, a simpler solution could be to boot via a USB hard drive, but where’s the fun in that? In all seriousness, booting from the network does give us some cool advantages: we can use the much larger storage capacity of a server, we get all the cool features and reliability of ZFS, and we can easily reimage the Pi remotely!
For this guide, I’ll be using a Raspberry Pi 3 Model B+. From what I’ve read, this should work for the Raspberry Pi 3 Model B and the Raspberry Pi 4, although I don’t have those devices so I haven’t tested it first-hand.
To support network booting, my server runs Proxmox as a hypervisor, which hosts an Ubuntu Server VM that runs Docker containers. The Proxmox server is set up to use ZFS (and I will eventually be setting up MergerFS+SnapRAID). I will be using the Proxmox host itself to present the NFS shares and iSCSI targets, but will use Docker containers to do the rest (TFTP).
When the Pi network boots, it will discover via DHCP the IP address of a TFTP server that will provide the contents necessary to bootstrap the Pi (essentially, this is bootcode.bin and the rest of the /boot partition). The TFTP server will be a Docker container that will get its content via an NFS share exposed by the host. Later, the Pi will mount this NFS share as its /boot partition such that future kernel updates will be reflected over the network.
We will configure the contents of the TFTP share to instruct the Pi to boot using a special initramfs image that we’ll build that has iSCSI support and will instruct the Pi to mount its root partition via iSCSI. This iSCSI target will be exposed on the Proxmox host and be backed by a ZFS block device.
See? It’s almost too easy!
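To make that last step a little more concrete, here is a minimal sketch of what the kernel command line for an iSCSI root might look like. It assumes the initramfs is built with Debian's open-iscsi hooks, which read the ISCSI_* keys from the command line. The function name and every value passed to it (the IQNs, the IP address, the UUID) are placeholders of mine, not values from my setup.

```bash
# Build a one-line kernel command line that boots with root on iSCSI.
# Assumption: the initramfs contains Debian's open-iscsi scripts, which
# understand the ISCSI_INITIATOR / ISCSI_TARGET_NAME / ISCSI_TARGET_IP keys.
iscsi_cmdline() {
    initiator=$1    # initiator IQN (the Pi)
    target=$2       # target IQN (the server)
    target_ip=$3    # IP address of the iSCSI target
    root_uuid=$4    # filesystem UUID of the root volume
    printf 'ip=dhcp ISCSI_INITIATOR=%s ISCSI_TARGET_NAME=%s ISCSI_TARGET_IP=%s root=UUID=%s rootwait rw\n' \
        "$initiator" "$target" "$target_ip" "$root_uuid"
}

# Example with made-up values:
iscsi_cmdline iqn.2021-01.com.example:pi-initiator \
    iqn.2021-01.com.example:pi-root 192.168.1.5 0000-aaaa
```

If the target requires a login, ISCSI_USERNAME and ISCSI_PASSWORD can be appended in the same way, which means the credentials end up in a plain-text file in the boot directory.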
It’s important to acknowledge that this setup is extremely insecure. I’m hardly a security expert, but despite that, I’m still able to poke enough holes in this to make it look like Swiss cheese:
The contents of the /boot directory are accessible via NFS and read/write-able, and we’ve only locked that down by IP address. Using this, the iSCSI username and password can be discovered, granting read/write access to the root partition.
I’ve tried to mitigate this (somewhat) in this guide by attempting to lock things down as much as possible (using read-only when possible, using auth, etc.). Understand, however, that this is still the equivalent of putting the keys to your house under the front doormat – it wouldn’t take much to compromise this setup.
This being said, I weigh these risks against my threat model. To exploit any of these concerns, an attacker would already have to be in my network and able to intercept and manipulate traffic. Furthermore, I don’t intend to do anything mission-critical or sensitive on my Pi – I’m just going to be using it to run the OpenZWave Docker container for integration with Home Assistant. I’m hardly concerned about having a potential attacker be able to control my lights!
There are probably ways to lock this down that I may explore in the future. Secrets could be stored on the SD card. The Pi could be placed into its own vLAN. Probably other stuff as well – have I mentioned I’m not a security guru? I’d love to hear suggestions on how to improve upon this! But I’ve determined that this is good enough, for me, for my risk model, for now.
program_usb_boot_mode=1 to the file.
dtparam=audio=on), WiFi (dtoverlay=disable-wifi), and Bluetooth (
fb5d1ece. I’ve set my Pi’s hostname accordingly to
We need a way to respond to the Pi via DHCP with the IP of the TFTP server we’ll be setting up later that will host our /boot directory. There are a number of ways to do this that may vary based on your setup. Many of the guides out there mention having to set up a Raspberry Pi Boot option; however, I didn’t find this necessary anymore. I presume that this has been fixed with more recent
While this isn’t required, I find that I prefer to have a Static IP for the Pi. This way, it’s easier to SSH into and I’m able to lock down the NFS and iSCSI shares a little tighter in the following sections.
Note that if you don’t set a Static IP, your Pi might obtain two separate DHCP leases. This post discusses how to resolve that.
Once again, since I’m running UniFi gear, this was as easy as setting a static IP for the Pi. Yes, this means I’m using DHCP to assign an IP to the Pi, but since it’s static, I avoid the issue of duplicate leases.
Before installing a bunch of things on the Proxmox host, I can take a ZFS snapshot of the host in case anything goes haywire and I need to roll back. Optionally, all the VMs on the host can be stopped prior to taking the snapshot to get a more consistent image. I didn’t bother with this.
While I’m messing with the system install, I figured I might as well install any updates.
We’re going to create two datasets: one that will serve the necessary boot files for booting via TFTP and for mounting and updating once booted via NFS and one that will serve as the root volume. The root volume can be any desired size or name (it doesn’t have to match the Pi’s serial number).
Optionally (but recommended), create a filesystem specifically for our Pi (using your Pi’s serial number). I recommend setting a quota for this filesystem as well because it will be easily writable via NFS, so it’s nice to constrain its growth.
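As a rough sketch of the layout described above, the function below only prints the zfs commands so they can be reviewed before being run as root. The pool name (tank), the netboot/boot and netboot/root dataset names, and the 1G and 16G sizes are all assumptions for illustration. Substitute your own.

```bash
# Print (not run) the zfs commands for the boot datasets and the root volume.
# All dataset names and sizes here are hypothetical.
zfs_plan() {
    pool=$1
    serial=$2
    # Filesystem that holds the boot files served over TFTP and NFS.
    echo "zfs create -p ${pool}/netboot/boot"
    # Optional per-Pi filesystem with a quota to constrain its growth.
    echo "zfs create -o quota=1G ${pool}/netboot/boot/${serial}"
    # Block device (zvol) that will back the iSCSI root volume.
    echo "zfs create -p -V 16G ${pool}/netboot/root/${serial}"
}

zfs_plan tank fb5d1ece
```

A zvol created this way shows up as a block device under /dev/zvol/, which is what the iSCSI target will later point at.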
Alternatively, LVM could probably be used. But I already have ZFS and I think its features are neat so I’ll be using that.
I’ll be using the kernel’s NFS implementation rather than ZFS’s NFS implementation. I don’t doubt that ZFS’s implementation is suitable for this purpose, however, most of the existing guides use NFS and I was already planning on using NFS for non-ZFS data in the future anyways.
First, we’ll make the directories that we’ll serve NFS out of. Then we’ll set up a “bind mount” from where the ZFS dataset is mounted to the directory we just created.
Update January 30, 2021 – The fstab entry’s options were changed to be rbind instead of bind (in case separate boot datasets were created) and to wait for the ZFS datasets to be mounted.
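For illustration, an fstab entry along these lines could be generated as follows. This is a sketch, not my exact entry: the two paths are made up, and x-systemd.requires=zfs-mount.service is one way (an assumption on my part) to make systemd wait for the ZFS datasets before bind mounting.

```bash
# fstab_bind SRC DST: print an fstab line that recursively bind-mounts SRC
# at DST once the ZFS datasets have been mounted.
fstab_bind() {
    printf '%s %s none rbind,x-systemd.requires=zfs-mount.service 0 0\n' "$1" "$2"
}

# Hypothetical paths: ZFS mountpoint on the left, NFS export directory on the right.
fstab_bind /tank/netboot/boot /srv/nfs/netboot
```

The printed line is meant to be appended to /etc/fstab on the host, followed by a mount -a to apply it.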
Now that the directory is mounted, we’ll create the directory for our particular Pi’s boot data to live in. Whenever a Pi network boots, it loads bootcode.bin from the root of the TFTP share, then searches for the remaining files in a directory with its serial number before finally looking in the root. We’ll create a directory for our particular Pi using the serial number we noted above:
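The lookup order described above can be sketched in a few lines of shell. The two function names are my own, and start.elf is just an example of one of the "remaining files". The serial helper assumes the usual Serial line in /proc/cpuinfo on the Pi, of which the last eight hex digits are used as the directory name.

```bash
# Read cpuinfo-style text on stdin and print the last 8 hex digits of the serial.
pi_serial() {
    awk '/^Serial/ { print substr($NF, length($NF) - 7) }'
}

# Print, in order, the TFTP paths the Pi tries for a given file.
tftp_lookup_order() {
    serial=$1
    file=$2
    if [ "$file" = "bootcode.bin" ]; then
        # bootcode.bin is only ever loaded from the root of the share.
        echo "bootcode.bin"
    else
        echo "${serial}/${file}"   # per-Pi directory first
        echo "${file}"             # then the root as a fallback
    fi
}

tftp_lookup_order fb5d1ece start.elf
```

On the Pi itself, `pi_serial < /proc/cpuinfo` gives the value to use as the directory name.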
Now that our directories are in place, we can install the NFS server:
Next, edit /etc/exports to expose the two shares. The first share gives read-only access to the entirety of the TFTP share to the IP address of the VM that’s running Docker. This will be needed when we set up the TFTP Docker container. The container only needs to serve the files, so we only give it read-only access. The second share gives read/write access to our particular Pi. Our Pi will mount this as its /boot directory so that any updates will get persisted. This assumes that both the Docker VM and Pi have static IPs. If this isn’t the case, a simpler configuration that provides read/write access to the entirety of the share would suffice. It’s important to note that the no_root_squash option is extremely insecure because it lets a client’s root user write files onto the host as root. However, we have this pretty well constrained to just this boot directory, so the risk seems minimal.
Update January 30, 2021 – Added the crossmnt option to the parent netboot share (in case separate boot datasets were created).
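Here is a small sketch that prints the two export lines described above. The export path and both IP addresses are placeholders, and I’ve only included the options mentioned in the text (ro, rw, no_root_squash, crossmnt). Your own entries may well want additional options.

```bash
# exports_lines SHARE SERIAL DOCKER_IP PI_IP
# Print two /etc/exports lines: a read-only export of the whole TFTP share
# for the Docker VM, and a read/write export of one Pi's directory.
exports_lines() {
    share=$1
    serial=$2
    docker_ip=$3
    pi_ip=$4
    printf '%s %s(ro,crossmnt)\n' "$share" "$docker_ip"
    printf '%s/%s %s(rw,no_root_squash)\n' "$share" "$serial" "$pi_ip"
}

# Hypothetical path and addresses:
exports_lines /srv/nfs/netboot fb5d1ece 192.168.1.10 192.168.1.20
```

Restricting each line to a single client IP is what makes the static IPs worthwhile: nothing else on the network is offered the share at all.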
Finally, we’ll refresh NFS with its new configuration:
First we’ll install iSCSI:
Next, we’ll try to enable and start iSCSI, but this will likely fail because Debian (which Proxmox is based off of) doesn’t ship a systemd unit file for some reason(?).
Assuming that worked, the rest of this section can be skipped. If it didn’t work, we’ll have to create the systemd unit file.
Create the file
Then, copy the file, mark it as executable, and attempt to enable and start again:
Next we’ll set up the iSCSI “target” on the host. In iSCSI terms, the “client” is the “initiator” and the “server” is the “target.”
First we’ll create the backing store. This can be named whatever you’d like; I named mine consistently with my Pi’s hostname (and the name of the ZFS volume).
Next, we’ll create the target and cd into it:
We’ll map to the backing store:
Then we’ll create an ACL. We’ll be using the “initiator name” that we noted above.
Lastly, we’ll confirm the entire configuration:
You should see something like this:
Finally, save and quit:
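For reference, the same sequence of steps (backing store, target, LUN mapping, ACL, review, save) can be expressed non-interactively. This sketch assumes the targetcli tool and only prints the commands for review. The backstore name, zvol path, and both IQNs are hypothetical, and it doesn’t cover setting a username and password on the ACL.

```bash
# targetcli_plan NAME ZVOL TARGET_IQN INITIATOR_IQN
# Print (not run) one targetcli command per step of the target setup.
targetcli_plan() {
    name=$1            # backstore name
    zvol=$2            # ZFS volume, exposed by ZFS under /dev/zvol/
    target_iqn=$3      # IQN of the target being created
    initiator_iqn=$4   # IQN of the Pi, used for the ACL
    echo "targetcli /backstores/block create name=${name} dev=/dev/zvol/${zvol}"
    echo "targetcli /iscsi create ${target_iqn}"
    echo "targetcli /iscsi/${target_iqn}/tpg1/luns create /backstores/block/${name}"
    echo "targetcli /iscsi/${target_iqn}/tpg1/acls create ${initiator_iqn}"
    echo "targetcli ls"
    echo "targetcli saveconfig"
}

targetcli_plan pi-root tank/netboot/root/fb5d1ece \
    iqn.2021-01.com.example:pi-root iqn.2021-01.com.example:pi-initiator
```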
In order to be able to connect to your iSCSI drive during the boot, you’ll need to load an initrd image with the required module.
First, tell the initramfs tool to include the iscsi module by creating the required flag file and create the initramfs image for the current kernel:
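A minimal sketch of those two steps, assuming Debian’s open-iscsi packaging where the flag file is /etc/iscsi/iscsi.initramfs. The function only prints the commands (they need root on the Pi), and it takes an optional kernel version so the same helper can be reused after a kernel update.

```bash
# initramfs_plan [KERNEL_VERSION]
# Print the commands that flag iSCSI for inclusion and build the initramfs.
# Defaults to the running kernel when no version is given.
initramfs_plan() {
    version=${1:-$(uname -r)}
    echo "touch /etc/iscsi/iscsi.initramfs"
    echo "update-initramfs -v -k ${version} -c"
}

initramfs_plan
```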
The new initrd can be found in
We’ll need to edit the iSCSI configuration file so that the module can successfully load:
After rebooting, confirm that the modules loaded successfully:
Next, discover all the iSCSI targets available:
Consider rebooting the server to ensure the iSCSI targets persist.
Next, mount the target:
Confirm that the target is mounted and take note of the dev entry (probably
On the Pi, format the iSCSI target:
Take note of the new drive’s UUID; we’ll be using this later.
Mount the iSCSI target:
Copy the Pi installation to the iSCSI target, excluding system directories, and then make new system directories:
Finally, we need to fix the fstab on the iSCSI target, otherwise when we do finally network boot the Pi will try to mount the SD card:
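As a sketch of what the replacement fstab might contain: the root filesystem is referenced by the UUID noted earlier, and /boot comes from the NFS share instead of the SD card. The ext4 filesystem type, the mount options, and the server address and export path are assumptions for illustration.

```bash
# pi_fstab ROOT_UUID NFS_HOST BOOT_EXPORT
# Print an fstab for the network-booted Pi: root by UUID, /boot over NFS.
pi_fstab() {
    root_uuid=$1
    nfs_host=$2
    boot_export=$3
    printf 'proc /proc proc defaults 0 0\n'
    printf 'UUID=%s / ext4 defaults,noatime 0 1\n' "$root_uuid"
    printf '%s:%s /boot nfs defaults 0 0\n' "$nfs_host" "$boot_export"
}

# Hypothetical values:
pi_fstab 0000-aaaa 192.168.1.5 /srv/nfs/netboot/fb5d1ece
```

The output would replace the SD card entries in etc/fstab under wherever the iSCSI target is mounted, not the fstab of the currently running system.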
In the Docker VM, add the following to your docker-compose file:
Pull and start the container:
Verify that the NFS mount worked correctly:
Take note that if you need to change the configuration of the NFS volume in the future, simply changing it in the compose file will not apply your changes. You will instead need to remove the volume and let it be recreated: docker rm the container that uses it first, and then docker volume rm the volume.
On the host, copy the /boot directory from the Pi and cd into the directory:
Edit config.txt to use the initramfs image we prepared earlier that contains the iSCSI module:
Now we’ll modify the
Finally, remember that the Pi looks in the root for bootcode.bin. We’ll create a symbolic link from the root to our specific Pi’s bootcode.bin. This way, if the Pi updates the bootcode.bin in its /boot directory, it’ll boot with the updated file the next time. This probably isn’t ideal if you have a bunch of Pis (especially if they’re different versions), so try to keep the Pis roughly on the same versions. As far as I’m aware, the Raspberry Pi 4 doesn’t use bootcode.bin, so this is only a problem for older Pis.
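The link is best created with a relative target, because the share is mounted at a different path inside the TFTP container than on the host, so an absolute target would point at a path that doesn’t exist there. The sketch below demonstrates this in a throwaway temporary directory. The function name is mine, and the directory stands in for the real TFTP root.

```bash
# link_bootcode TFTP_ROOT SERIAL
# Create TFTP_ROOT/bootcode.bin as a *relative* symlink into the Pi's directory.
link_bootcode() {
    tftp_root=$1
    serial=$2
    # Run ln from inside the root so the stored target is relative to it.
    ( cd "$tftp_root" && ln -sf "${serial}/bootcode.bin" bootcode.bin )
}

# Demonstration against a temporary directory instead of the real share.
demo_root=$(mktemp -d)
mkdir -p "${demo_root}/fb5d1ece"
echo "placeholder" > "${demo_root}/fb5d1ece/bootcode.bin"
link_bootcode "$demo_root" fb5d1ece
readlink "${demo_root}/bootcode.bin"   # prints fb5d1ece/bootcode.bin
rm -rf "$demo_root"
```

Because the stored target is just fb5d1ece/bootcode.bin, the link resolves correctly no matter where the share ends up being mounted.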
It might be wise to create a backup or a ZFS snapshot of the boot directory at this point, just in case a future update breaks things.
Now it’s time to boot the Pi! Shut down the Pi:
Once the Pi powers down, remove the SD card, power it back on (unplug and replug the power), and cross your fingers!
The Pi will take a while (minute-ish) to come up. If it doesn’t, proceed to the debugging section below…
My Pi didn’t start up at first, so I have some limited experience with debugging.
If the Pi sits at a black screen (like mine did) and never shows the rainbow splash screen, this means it isn’t loading bootcode.bin properly. On any machine on your network, run tcpdump -vv -i <eth0> port 67 or port 68 or port 69, reboot the Pi, and examine the output. If you don’t see any output, then the Pi isn’t discovering the TFTP server via DHCP correctly. If you do see output, then your TFTP server likely isn’t set up properly. Try to connect to the TFTP server with a TFTP client and get bootcode.bin. Originally I had set up my symbolic link as an absolute path instead of a relative path, which didn’t work over NFS.
This reaches the extent of my debugging so far. For more ideas, check the sources below.
Docker was the whole reason for using iSCSI over NFS (for me at least). Fortunately, installing Docker and docker-compose is simple!
I haven’t actually done a kernel update yet, so these instructions are just borrowed from my sources. After doing an apt dist-upgrade, the initramfs image should be created. If not, this can be created similar to above except by specifying the new kernel version instead of using uname. Then, update the config.txt to point at the new initramfs image, reboot, and cross your fingers!
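As a small sketch of that last step, the helper below prints the config.txt line for a given kernel version. It assumes the image follows the usual initrd.img-<version> naming used by update-initramfs. The function name and the example version string are placeholders.

```bash
# initramfs_config_line KERNEL_VERSION
# Print the config.txt line that loads the initramfs built for that kernel.
initramfs_config_line() {
    printf 'initramfs initrd.img-%s followkernel\n' "$1"
}

# Example with a made-up version string:
initramfs_config_line 5.10.17-v7+
```

Since I haven’t been through a kernel update myself, treat this as untested: check that the file named on the line actually exists in the boot directory before rebooting.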