From Single Node to Multi-Node - Building a Resilient OpenShift Home Lab
In this post, we are going to tear down the single node cluster that we built last time and replace it with a three node cluster. The single node cluster is great for maximizing your compute capabilities on edge-type hardware such as the Intel NUC. However, you give up a lot of the resiliency that Kubernetes was designed for. Building a three node cluster will allow you to explore the resiliency capabilities of Kubernetes and OpenShift.
So, if you’ve always wanted to explore the wonderful world of pod affinity, anti-affinity, node selectors, taints, tolerations, zero downtime updates, and so much more… Then continue on. You’ll have fun, I promise.
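To give you a taste of what is coming, here is a rough sketch of the kind of thing a three node cluster makes possible: a Deployment that uses pod anti-affinity to force each replica onto a different node. The spread-demo name, default namespace, and image are placeholders for illustration only; they are not something the lab scripts create.
# A Deployment whose three replicas must each land on a different node
cat << EOF | oc apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spread-demo
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: spread-demo
  template:
    metadata:
      labels:
        app: spread-demo
    spec:
      affinity:
        podAntiAffinity:
          # Never schedule two replicas of this app on the same node
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: spread-demo
            topologyKey: kubernetes.io/hostname
      containers:
      - name: demo
        image: registry.access.redhat.com/ubi9/ubi-minimal
        command: ["sleep", "infinity"]
EOF
On a single node cluster, the second and third replicas of a deployment like this would sit in Pending forever. With three nodes, each replica lands on its own host.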
Note: This post assumes that you have completed the previous post: Back To Where It All Started - Let’s Build an OpenShift Home Lab
If you have not, then you should complete all of the steps up through this point: Set Up KVM Host
Once you have installed CentOS Stream on your KVM host, then return here.
Tear Down The Single Node Cluster
If you have deployed a single node cluster, the first thing that we need to do is tear it down. My lab cli can perform that service for you.
labcli --destroy -c -d=dev
Update the labcli scripts. I may have fixed a couple of bugs between now and the last post.
WORK_DIR=$(mktemp -d)
git clone -b archive-2 https://github.com/cgruver/kamarotos.git ${WORK_DIR}
cp ${WORK_DIR}/bin/* ${HOME}/okd-lab/bin
chmod 700 ${HOME}/okd-lab/bin/*
cp -r ${WORK_DIR}/examples ${HOME}/okd-lab/lab-config
rm -rf ${WORK_DIR}
Once the previous cluster is cleaned up, we can prepare the environment for building a new cluster.
I have prepared an example for you to use. So, the next step is to set up the environment.
cp ${HOME}/okd-lab/lab-config/examples/basic-lab-3-node.yaml ${HOME}/okd-lab/lab-config
cp ${HOME}/okd-lab/lab-config/examples/cluster-configs/3-node-no-pi.yaml ${HOME}/okd-lab/lab-config/cluster-configs
ln -sf ${HOME}/okd-lab/lab-config/basic-lab-3-node.yaml ${HOME}/okd-lab/lab-config/lab.yaml
This last command effectively replaces the lab configuration file for single node OpenShift with a configuration for three nodes.
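If you want to confirm that the symlink now points at the new file, a quick check:
# Should show lab.yaml -> .../basic-lab-3-node.yaml
ls -l ${HOME}/okd-lab/lab-config/lab.yaml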
Review the new configuration
- Your lab domain will be: my.awesome.lab
- Your lab network will be: 10.11.12.0/24
- These settings are in: ${HOME}/okd-lab/lab-config/lab.yaml
domain: my.awesome.lab
network: 10.11.12.0
router-ip: 10.11.12.1
netmask: 255.255.255.0
centos-mirror: rsync://mirror.facebook.net/centos-stream/
sub-domain-configs: []
cluster-configs:
- name: dev
  cluster-config-file: 3-node-no-pi.yaml
  domain: edge
- The configuration file for your OpenShift cluster is in: ${HOME}/okd-lab/lab-config/cluster-configs/3-node-no-pi.yaml
cluster:
  name: okd4
  cluster-cidr: 10.88.0.0/14
  service-cidr: 172.20.0.0/16
  remote-registry: quay.io/openshift/okd
  butane-version: v0.16.0
  butane-spec-version: 1.4.0
  butane-variant: fcos
  ingress-ip-addr: 10.11.12.2
bootstrap:
  metal: false
  node-spec:
    memory: 12288
    cpu: 4
    root-vol: 50
  kvm-host: kvm-host01
  ip-addr: 10.11.12.49
control-plane:
  metal: false
  node-spec:
    memory: 20480
    cpu: 8
    root-vol: 100
    ceph:
      ceph-dev: sdb
      ceph-vol: 200
      type: disk
  okd-hosts:
  - kvm-host: kvm-host01
    ip-addr: 10.11.12.60
  - kvm-host: kvm-host01
    ip-addr: 10.11.12.61
  - kvm-host: kvm-host01
    ip-addr: 10.11.12.62
kvm-hosts:
- host-name: kvm-host01
  mac-addr: "YOUR_HOST_MAC_HERE"
  ip-addr: 10.11.12.200
  disks:
    disk1: nvme0n1
    disk2: NA
Note: You will need to replace YOUR_HOST_MAC_HERE with the MAC address of your server.
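If you don't have the MAC address handy, you can list the interfaces and their MAC addresses directly on the KVM host. This is just a generic check; use whichever NIC your KVM host boots from:
# Run this on the KVM host to list interface names and MAC addresses
ip -br link show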
We are now ready to deploy our Three Node OpenShift cluster
Note: These instructions are pretty much identical to the ones for the single node cluster that we installed in the last post. The configuration files have taken care of the three node setup for you.
- Set the lab environment variables:
labctx dev
- Pull the latest release binaries for OKD:
labcli --latest
- Deploy the configuration in preparation for the install:
labcli --deploy -c
This command does a lot of work for you.
- Creates the OpenShift install manifests
- Uses the butane cli to inject custom configurations into the ignition configs for the three cluster nodes
- Creates the appropriate DNS entries and network configuration
- Prepares the iPXE boot configuration for each cluster node
- Configures Nginx on the router as the ingress load balancer for your cluster
- Start the bootstrap node:
labcli --start -b
- Start the control-plane nodes:
labcli --start -m
- Monitor the bootstrap process:
labcli --monitor -b
Note: This command does not affect the install process. You can stop and restart it safely. It is just for monitoring the bootstrap.
Also Note: It will take a while for this command to stop throwing connection errors. You are effectively waiting for the bootstrap node to install its OS and start the bootstrap process. Be patient, and don’t worry.
If you want to watch logs for issues:
labcli --monitor -j
This command tails the journal log on the bootstrap node.
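If you would rather watch the bootstrap logs by hand, the standard approach is to tail the bootkube and release-image services over ssh. This is a manual alternative, assuming the bootstrap VM is reachable at the 10.11.12.49 address from the cluster config:
# Tail the bootstrap services directly on the bootstrap node
ssh core@10.11.12.49 "journalctl -b -f -u release-image.service -u bootkube.service"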
- You will see the following when the bootstrap is complete:
DEBUG Still waiting for the Kubernetes API: Get "https://api.okd4.my.awesome.lab:6443/version": read tcp 10.11.12.227:49643->10.11.12.2:6443: read: connection reset by peer - error from a previous attempt: read tcp 10.11.12.227:49642->10.11.12.2:6443: read: connection reset by peer
INFO API v1.25.0-2786+eab9cc98fe4c00-dirty up
DEBUG Loading Install Config...
DEBUG Loading SSH Key...
DEBUG Loading Base Domain...
DEBUG Loading Platform...
DEBUG Loading Cluster Name...
DEBUG Loading Base Domain...
DEBUG Loading Platform...
DEBUG Loading Networking...
DEBUG Loading Platform...
DEBUG Loading Pull Secret...
DEBUG Loading Platform...
DEBUG Using Install Config loaded from state file
INFO Waiting up to 30m0s (until 10:06AM) for bootstrapping to complete...
DEBUG Bootstrap status: complete
INFO It is now safe to remove the bootstrap resources
DEBUG Time elapsed per stage:
DEBUG Bootstrap Complete: 17m10s
DEBUG                API: 4m9s
INFO Time elapsed: 17m10s
- When the bootstrap process is complete, remove the bootstrap node:
labcli --destroy -b
This script shuts down and then deletes the Bootstrap VM. Then it removes the bootstrap entries from the DNS and network configuration.
- Monitor the installation process:
labcli --monitor -i
Note: This command does not affect the install process. You can stop and restart it safely. It is just for monitoring.
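Another way to keep an eye on the rollout is to watch the cluster operators converge, using the kubeconfig that the installer wrote (the path below matches the one reported in the install output further down). The install is essentially done when every operator is Available and no longer Progressing:
# Use the installer-generated kubeconfig to watch the cluster operators settle
export KUBECONFIG=${HOME}/okd-lab/okd-install-dir/auth/kubeconfig
watch oc get clusteroperators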
- Installation Complete:
DEBUG Cluster is initialized
INFO Waiting up to 10m0s for the openshift-console route to be created...
DEBUG Route found in openshift-console namespace: console
DEBUG OpenShift console route is admitted
INFO Install complete!
INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/Users/yourhome/okd-lab/okd-install-dir/auth/kubeconfig'
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.okd4.my.awesome.lab
INFO Login to the console with user: "kubeadmin", and password: "AhnsQ-CGRqg-gHu2h-rYZw3"
DEBUG Time elapsed per stage:
DEBUG Cluster Operators: 13m49s
INFO Time elapsed: 13m49s
Post Install
- Post Install Cleanup:
labcli --post
- Trust the cluster certificates:
labcli --trust -c
- Add Users:
Note: Make sure that the htpasswd command is installed on your system. It should be included by default on Mac OS. For Fedora, RHEL, or CentOS:
dnf install httpd-tools
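If you are curious what the htpasswd command actually produces, you can generate an entry by hand. The file path and password here are purely illustrative; labcli manages the real htpasswd file for you:
# Create a bcrypt-hashed htpasswd entry for a user named admin
htpasswd -c -B -b /tmp/example.htpasswd admin 'ChangeMe123'
cat /tmp/example.htpasswd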
Add a cluster-admin user:
labcli --user -i -a -u=admin
Note: You can ignore the warning:
Warning: User 'admin' not found
Add a non-privileged user:
labcli --user -u=devuser
Note: It will take a couple of minutes for the authentication services to restart after you create these user accounts.
Note: This deletes the temporary kubeadmin account. Your admin user will now have cluster admin rights.
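Once the authentication pods have restarted, you can sanity check that the new admin user really does have cluster admin rights. The can-i check below simply asks the API whether the logged-in user is allowed to do everything:
# Log in as the new admin user, then ask the API if you can do anything, anywhere
oc login -u admin https://api.okd4.my.awesome.lab:6443
oc auth can-i '*' '*' --all-namespaces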
Install The Rook/Ceph Operator as a Storage Provisioner
In the single node cluster build, we used the host path provisioner for storage. This works fine on a single node because we don't have to worry about where pods get scheduled relative to their storage. In a multi node cluster, however, we need a storage provisioner that can serve pods regardless of which node they are scheduled on.
I have prepared an opinionated install of the Rook operator and a Ceph storage cluster. Your OpenShift nodes were created with an additional block device attached to the virtual machines. We’ll use those devices as the underpinnings of a Ceph storage cluster.
Execute the following to install the Ceph cluster and create a storage class. You will also create a PVC for the internal image registry.
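If you are curious, you can peek at the spare block device on one of the control plane nodes before Ceph claims it. The node name below follows this lab's naming convention, and sdb matches the ceph-dev setting in the cluster config (with ceph-vol sized at 200):
# List the block devices on a control plane node; sdb is the extra volume reserved for Ceph
oc debug node/okd4-master-0.my.awesome.lab -- chroot /host lsblk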
- Install the Rook Operator:
labcli --ceph -i
- Create a Ceph cluster:
labcli --ceph -c
- Wait for the Ceph cluster to complete its install:
Note: This will take a good while to complete.
You can watch for the cluster to be complete by looking for the completion of the OSD preparation jobs.
watch oc get jobs -n rook-ceph
When you see all three jobs completed, then the install is done:
NAME                                                  COMPLETIONS   DURATION   AGE
rook-ceph-osd-prepare-okd4-master-0.my.awesome.lab    1/1           17s        4m9s
rook-ceph-osd-prepare-okd4-master-1.my.awesome.lab    1/1           17s        4m8s
rook-ceph-osd-prepare-okd4-master-2.my.awesome.lab    1/1           18s        4m8s
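You can also check the overall state of the Ceph cluster through its custom resource; once the OSDs are up, the health column should eventually report HEALTH_OK:
# The CephCluster resource reports the overall phase and health of the storage cluster
oc get cephcluster -n rook-ceph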
- Create a PVC for the internal image registry:
labcli --ceph -r
Verify that the internal image registry has a bound PVC:
- Log into your cluster:
oc login -u admin https://api.okd4.my.awesome.lab:6443
Note: Use the admin password for the user that you created above.
- Check the PV that was created:
oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                    STORAGECLASS      REASON   AGE
pvc-e6063131-f362-4c42-8adf-798d8cb9267b   100Gi      RWO            Delete           Bound    openshift-image-registry/registry-pvc    rook-ceph-block            97s
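From here on, any workload can request block storage from Ceph through the new storage class. As a quick illustration, here is a hypothetical claim; the test-claim name and default namespace are just examples:
# Request a 1Gi block volume from the rook-ceph-block storage class
cat << EOF | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-claim
  namespace: default
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-ceph-block
EOF
Clean it up with oc delete pvc test-claim -n default when you are done experimenting.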
That’s it!
Have fun with OpenShift