From Single Node to Multi-Node - Building a Resilient OpenShift Home Lab


In this post, we are going to tear down the single-node cluster that we built last time and replace it with a three-node cluster. The single-node cluster is great for maximizing your compute capability on edge-type hardware such as the Intel NUC. However, you give up a lot of the resiliency that Kubernetes was designed for. Building a three-node cluster will allow you to explore the resiliency features of Kubernetes and OpenShift.

So, if you’ve always wanted to explore the wonderful world of pod affinity, anti-affinity, node selectors, taints, tolerations, zero-downtime updates, and so much more… then continue on. You’ll have fun, I promise.
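
For example, once the new cluster is up, pod anti-affinity can guarantee that each replica of a workload lands on a different node; on a single-node cluster a spec like this could never fully schedule. Here is a minimal sketch (hypothetical names, any small image will do):

cat <<'EOF' | oc apply -f -
# Hypothetical example: three replicas, each forced onto a different node
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spread-demo
spec:
  replicas: 3
  selector:
    matchLabels:
      app: spread-demo
  template:
    metadata:
      labels:
        app: spread-demo
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: spread-demo
              topologyKey: kubernetes.io/hostname
      containers:
        - name: demo
          image: registry.access.redhat.com/ubi9/ubi-minimal
          command: ["sleep", "infinity"]
EOF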

Note: This post assumes that you have completed the previous post: Back To Where It All Started - Let’s Build an OpenShift Home Lab

If you have not, then you should complete all of the steps up through this point: Set Up KVM Host

Once you have installed CentOS Stream on your KVM host, then return here.

Tear Down The Single Node Cluster

If you have deployed a single-node cluster, the first thing that we need to do is tear it down. My labcli scripts can perform that service for you.

labcli --destroy -c -d=dev

Update the labcli scripts. I may have fixed a couple of bugs since the last post.

WORK_DIR=$(mktemp -d)
git clone https://github.com/cgruver/kamarotos.git ${WORK_DIR}
cp ${WORK_DIR}/bin/* ${HOME}/okd-lab/bin
chmod 700 ${HOME}/okd-lab/bin/*
cp -r ${WORK_DIR}/examples ${HOME}/okd-lab/lab-config
rm -rf ${WORK_DIR}

Once the previous cluster is cleaned up, we can prepare the environment for building a new cluster.

I have prepared an example for you to use. So, the next step is to set up the environment.

cp ${HOME}/okd-lab/lab-config/examples/basic-lab-3-node.yaml ${HOME}/okd-lab/lab-config
cp ${HOME}/okd-lab/lab-config/examples/cluster-configs/3-node-no-pi.yaml ${HOME}/okd-lab/lab-config/cluster-configs
ln -sf ${HOME}/okd-lab/lab-config/basic-lab-3-node.yaml ${HOME}/okd-lab/lab-config/lab.yaml

These commands replace the lab configuration file for single-node OpenShift with a configuration for three nodes.

Review the new configuration:

  1. Your lab domain will be:

    my.awesome.lab

  2. Your lab network will be:

    10.11.12.0/24

  3. These settings are in: ${HOME}/okd-lab/lab-config/lab.yaml

    domain: my.awesome.lab
    network: 10.11.12.0
    router-ip: 10.11.12.1
    netmask: 255.255.255.0
    centos-mirror: rsync://mirror.facebook.net/centos-stream/
    sub-domain-configs: []
    cluster-configs:
      - name: dev
        cluster-config-file: 3-node-no-pi.yaml
        domain: edge
    
  4. The configuration file for your OpenShift cluster is in: ${HOME}/okd-lab/lab-config/cluster-configs/3-node-no-pi.yaml

    cluster:
      name: okd4
      cluster-cidr: 10.88.0.0/14
      service-cidr: 172.20.0.0/16
      remote-registry: quay.io/openshift/okd
      butane-version: v0.16.0
      butane-spec-version: 1.4.0
      butane-variant: fcos
      ingress-ip-addr: 10.11.12.2
    bootstrap:
      metal: false
      node-spec:
        memory: 12288
        cpu: 4
        root-vol: 50
      kvm-host: kvm-host01
      ip-addr: 10.11.12.49
    control-plane:
      metal: false
      node-spec:
        memory: 20480
        cpu: 8
        root-vol: 100
      ceph:
        ceph-dev: sdb
        ceph-vol: 200
        type: disk
      okd-hosts:
        - kvm-host: kvm-host01
          ip-addr: 10.11.12.60
        - kvm-host: kvm-host01
          ip-addr: 10.11.12.61
        - kvm-host: kvm-host01
          ip-addr: 10.11.12.62
    kvm-hosts:
      - host-name: kvm-host01
        mac-addr: "YOUR_HOST_MAC_HERE"
        ip-addr: 10.11.12.200
        disks:
          disk1: nvme0n1
          disk2: NA
    

    Note: You will need to replace YOUR_HOST_MAC_HERE with the MAC address of your server.
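
    If you are not sure which MAC address to use, you can list the interfaces on the server. A quick sketch (interface names will vary by hardware, and the loopback will show up too):

    for NIC in /sys/class/net/*; do echo "${NIC##*/} $(cat ${NIC}/address)"; done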

We are now ready to deploy our three-node OpenShift cluster.

Note: These instructions are pretty much identical to those for the single-node cluster that we installed in the last post. The configuration files have taken care of the three-node setup for you.

  1. Set the lab environment variables:

    labctx dev
    
  2. Pull the latest release binaries for OKD:

    labcli --latest
    
  3. Deploy the configuration in preparation for the install:

    labcli --deploy -c
    

    This command does a lot of work for you.

    • Creates the OpenShift install manifests
    • Uses the butane cli to inject custom configurations into the ignition configs for the three cluster nodes
    • Creates the appropriate DNS entries and network configuration
    • Prepares the iPXE boot configuration for each cluster node
    • Configures Nginx on the router as the ingress load balancer for your cluster
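
    If you are curious about the butane step, here is a standalone sketch of what happens under the hood: the butane CLI compiles a Butane YAML config into an Ignition config. The file below is hypothetical, not part of the lab scripts, but it uses the same fcos variant and 1.4.0 spec version called out in the cluster config above:

    cat <<EOF > example.bu
    variant: fcos
    version: 1.4.0
    storage:
      files:
        - path: /etc/hostname
          mode: 0644
          contents:
            inline: okd4-master-0
    EOF
    butane --pretty --strict example.bu > example.ign
    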
  4. Start the bootstrap node:

    labcli --start -b
    
  5. Start the control-plane nodes:

    labcli --start -m
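
    Both of these commands create and start KVM guests on kvm-host01. If you want to verify that the VMs came up, you can check with virsh; this assumes root SSH access to the KVM host:

    ssh root@kvm-host01.my.awesome.lab "virsh list --all"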
    
  6. Monitor the bootstrap process:

    labcli --monitor -b
    

    Note: This command does not affect the install process. You can stop and restart it safely. It is just for monitoring the bootstrap.

    Also Note: It will take a while for this command to stop throwing connection errors. You are effectively waiting for the bootstrap node to install its OS and start the bootstrap process. Be patient, and don’t worry.

    If you want to watch logs for issues:

    labcli --monitor -j
    

    This command tails the journal log on the bootstrap node.
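
    If you prefer to do it by hand, the equivalent is to SSH to the bootstrap node as the core user and tail the bootstrap services. The IP address below is the bootstrap ip-addr from the cluster config:

    ssh core@10.11.12.49 "journalctl -b -f -u release-image.service -u bootkube.service"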

  7. You will see the following when the bootstrap is complete:

    DEBUG Still waiting for the Kubernetes API: Get "https://api.okd4.my.awesome.lab:6443/version": read tcp 10.11.12.227:49643->10.11.12.2:6443: read: connection reset by peer - error from a previous attempt: read tcp 10.11.12.227:49642->10.11.12.2:6443: read: connection reset by peer 
    INFO API v1.25.0-2786+eab9cc98fe4c00-dirty up     
    DEBUG Loading Install Config...                    
    DEBUG   Loading SSH Key...                         
    DEBUG   Loading Base Domain...                     
    DEBUG     Loading Platform...                      
    DEBUG   Loading Cluster Name...                    
    DEBUG     Loading Base Domain...                   
    DEBUG     Loading Platform...                      
    DEBUG   Loading Networking...                      
    DEBUG     Loading Platform...                      
    DEBUG   Loading Pull Secret...                     
    DEBUG   Loading Platform...                        
    DEBUG Using Install Config loaded from state file  
    INFO Waiting up to 30m0s (until 10:06AM) for bootstrapping to complete... 
    DEBUG Bootstrap status: complete                   
    INFO It is now safe to remove the bootstrap resources 
    DEBUG Time elapsed per stage:                      
    DEBUG Bootstrap Complete: 17m10s                   
    DEBUG                API: 4m9s                     
    INFO Time elapsed: 17m10s                         
    
  8. When the bootstrap process is complete, remove the bootstrap node:

    labcli --destroy -b
    

    This script shuts down and then deletes the Bootstrap VM. Then it removes the bootstrap entries from the DNS and network configuration.

  9. Monitor the installation process:

    labcli --monitor -i
    

    Note: This command does not affect the install process. You can stop and restart it safely. It is just for monitoring.
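
    While you wait, you can also watch the cluster operators converge, using the kubeconfig that the installer wrote (the same path shown in the install output below):

    export KUBECONFIG=${HOME}/okd-lab/okd-install-dir/auth/kubeconfig
    watch oc get clusteroperators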

  10. Installation Complete:

    DEBUG Cluster is initialized                       
    INFO Waiting up to 10m0s for the openshift-console route to be created... 
    DEBUG Route found in openshift-console namespace: console 
    DEBUG OpenShift console route is admitted          
    INFO Install complete!                            
    INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/Users/yourhome/okd-lab/okd-install-dir/auth/kubeconfig' 
    INFO Access the OpenShift web-console here: https://console-openshift-console.apps.okd4.my.awesome.lab 
    INFO Login to the console with user: "kubeadmin", and password: "AhnsQ-CGRqg-gHu2h-rYZw3" 
    DEBUG Time elapsed per stage:                      
    DEBUG Cluster Operators: 13m49s                    
    INFO Time elapsed: 13m49s
    

Post Install

  1. Post Install Cleanup:

    labcli --post
    
  2. Trust the cluster certificates:

    labcli --trust -c
    
  3. Add Users:

    Note: Make sure that the htpasswd command is installed on your system. It is included by default on macOS. For Fedora, RHEL, or CentOS: dnf install httpd-tools

    Add a cluster-admin user:

    labcli --user -i -a -u=admin
    

    Note: You can ignore the warning: Warning: User 'admin' not found

    Add a non-privileged user:

    labcli --user -u=devuser
    

    Note: It will take a couple of minutes for the authentication services to restart after you create these user accounts.

    Note: This deletes the temporary kubeadmin account. Your admin user will now have cluster-admin rights.
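
    If you want to watch the authentication services restart before you try to log in:

    watch oc get pods -n openshift-authentication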

Install the Rook/Ceph Operator as a Storage Provisioner

In the single-node cluster build, we used the host path provisioner for storage. That works fine on a single node because we don’t have to worry about where pods get scheduled relative to their storage. In a multi-node cluster, however, we need a storage provisioner that can serve pods regardless of which node they get scheduled on.

I have prepared an opinionated install of the Rook operator and a Ceph storage cluster. Your OpenShift nodes were created with an additional block device attached to the virtual machines. We’ll use those devices as the underpinnings of a Ceph storage cluster.
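
Before you install anything, you can see that extra device for yourself. FCOS nodes accept SSH as the core user with the key that the installer injected; with the example config above, each control-plane node should show a 200G sdb alongside its root disk:

ssh core@10.11.12.60 "lsblk"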

Execute the following to install the Ceph cluster and create a storage class. You will also create a PVC for the internal image registry.

  1. Install the Rook Operator:

    labcli --ceph -i
    
  2. Create a Ceph cluster:

    labcli --ceph -c
    
  3. Wait for the Ceph cluster to complete its install:

    Note: This will take a good while to complete.

    You can tell that the Ceph cluster is ready by watching for the OSD preparation jobs to complete.

    watch oc get jobs -n rook-ceph
    

    When all three jobs have completed, the install is done:

    NAME                                                 COMPLETIONS   DURATION   AGE
    rook-ceph-osd-prepare-okd4-master-0.my.awesome.lab   1/1           17s        4m9s
    rook-ceph-osd-prepare-okd4-master-1.my.awesome.lab   1/1           17s        4m8s
    rook-ceph-osd-prepare-okd4-master-2.my.awesome.lab   1/1           18s        4m8s
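
    You can also check the health that Rook reports on the CephCluster resource:

    oc get cephcluster -n rook-ceph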
    
  4. Create a PVC for the internal image registry:

    labcli --ceph -r
    

Verify that the internal image registry has a bound PVC:

  1. Log into your cluster:

    oc login -u admin https://api.okd4.my.awesome.lab:6443
    

    Note: Use the password for the admin user that you created above.

  2. Check the PV that was created:

    oc get pv
    
    NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                   STORAGECLASS      REASON   AGE
    pvc-e6063131-f362-4c42-8adf-798d8cb9267b   100Gi      RWO            Delete           Bound    openshift-image-registry/registry-pvc   rook-ceph-block            97s
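
    You can also confirm the claim from the registry side:

    oc get pvc -n openshift-image-registry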
    

That’s it!

Have fun with OpenShift
