Spring Tech Cleaning: The (not so) epic of Talos

For this year’s tech spring cleaning I did a much-needed software upgrade on a private server, since its installed OS (RancherOS) reached End of Life two years ago (2021). Time to move on to a new container-centric OS. After some fine-print reading, I chose Talos for the task.

This solution layer is not something I handle in my professional duties, so I’m writing this post to review the personal notes/artifacts generated during the process and train some muscle memory.

Disclaimers:
This post is mostly self-documentation, as all of the information here is presented in some form in the official documentation of the software used.

Initial setup

The OS installation on every control plane and worker node uses the same OS image; roles are assigned once you send a specific payload to each machine. This payload is generated by talosctl and also includes other things like authentication keys. Those configuration files must be stored securely.

After setting up the talosctl binary on your workbench, generate the default cluster configuration:

talosctl gen config $clustername https://$controlplaneip:6443 --output-dir tconf

Where:

  • $clustername: name for the cluster
  • $controlplaneip: IP of the node which will be assigned as Control Plane. In my case it has a static IP, but with dynamic IPs you are better off assigning a DNS hostname and using it here.
  • 6443: Kubernetes API (default) port.
  • tconf: path for a directory to write configs.
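
For illustration, a concrete run of that command might look like the following; the cluster name and IP are made-up placeholders, and the listed output files are what gen config produces:

clustername=homelab
controlplaneip=192.168.0.10
talosctl gen config $clustername https://$controlplaneip:6443 --output-dir tconf
ls tconf
# controlplane.yaml  talosconfig  worker.yaml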

Reviewing configuration before deploying them

A best practice in distributed systems is to avoid having the manager of the system (the Control Plane) execute actual worker workloads; otherwise you can cause resource starvation on the manager workloads, putting your cluster in jeopardy and risking a chain-reaction global failure. This in turn requires at least two nodes to have an actual working environment. This best practice is the default for Talos.
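
In Kubernetes terms this best practice shows up as a taint on the control plane node. Once you have kubectl access (covered further down), a quick way to see it is something like:

kubectl describe nodes | grep Taints
# typically reports: node-role.kubernetes.io/control-plane:NoSchedule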

Enabling work scheduling on Control Plane

Knowing that, and with my needs for that server being very loose, I used the “trust me, I’m an Engineer” trump card and enabled scheduling of workloads on the Control Plane, so that with only one node I could have a working server.

Per the documentation, it should be as simple as uncommenting the line allowSchedulingOnControlPlanes: true in controlplane.yaml. Unfortunately, for some reason, that didn’t work for me. So I made the change through a config patch: I wrote the related delta configuration to a new YAML file and patched the Control Plane node after node activation with:

talosctl patch mc --nodes $controlplaneip --patch @tconf/cp-enableworker.yaml
talosctl reboot --nodes $controlplaneip

With the cp-enableworker.yaml file containing:

cluster:
    allowSchedulingOnControlPlanes: true
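
After the reboot, a hedged way to confirm the patch stuck is to read back the applied machine config; this assumes a recent Talos version that exposes it as a talosctl resource:

talosctl --nodes $controlplaneip get machineconfig -o yaml | grep allowSchedulingOnControlPlanes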

Activating nodes with their roles

The actual activation of the Control Plane node:

talosctl apply-config --insecure --nodes $controlplaneip --file "tconf/controlplane.yaml"
talosctl config endpoint $controlplaneip
talosctl bootstrap --nodes $controlplaneip

If I had real Worker nodes, I would have fired the following command for each Worker node:

talosctl apply-config --insecure --nodes $workerip --file "tconf/worker.yaml"

Wait until everything is green: Stage Running and Ready True.
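
To watch for that state I find the built-in dashboard and health check handy; a sketch, assuming the talosconfig generated earlier (recent Talos versions ship both subcommands):

talosctl --talosconfig tconf/talosconfig --nodes $controlplaneip dashboard
talosctl --talosconfig tconf/talosconfig --nodes $controlplaneip health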

Hey Kube! It’s me, Talos

Retrieving a kube configuration pointing to the Talos cluster is done through a single operation, kubeconfig:

talosctl --talosconfig tconf/talosconfig kubeconfig ~/.kube/

This will output directly to the default kube settings folder, so any kubectl invocation will use the Talos cluster as its target.
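
A quick sanity check that kubectl is really talking to the new cluster; the single node should report Ready once bootstrap has finished:

kubectl cluster-info
kubectl get nodes -o wide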

Shortcut --talosconfig by defining a TALOSCONFIG environment variable

As the section title says, you can avoid retyping --talosconfig path/to/talosconfig every time you need to invoke talosctl by defining an environment variable named TALOSCONFIG.
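
For example, pointing the variable at the directory used earlier (adapt the path to wherever you keep the file):

export TALOSCONFIG="$PWD/tconf/talosconfig"
talosctl kubeconfig ~/.kube/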

Charting through the Helm of Awe

Kubernetes has its own philosophy about how to deploy applications (operators). I won’t go there, because the applications I had deployed prior to this upgrade have a simplified alternative available, so I deployed them through Helm.

First I set up the Helm binary on my workbench, added the chart repositories to Helm, then created the required configuration files, and lastly sent the install.

For this recap, I’m using a PiHole chart.

Add the chart repository to Helm:

helm repo add mojo2600 https://mojo2600.github.io/pihole-kubernetes/
helm repo update
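
Before writing your own values file, you can dump the chart’s default values to see which variables it accepts; the target path below simply mirrors the one used in the install command:

helm search repo mojo2600/pihole
mkdir -p values
helm show values mojo2600/pihole > values/pihole.values.yaml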

Install the application on Kubernetes through Helm:

helm install pihole mojo2600/pihole -f "values/pihole.values.yaml"

Note: You can test/inspect the deploy first by adding the --dry-run and --debug parameters, for example:
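
helm install pihole mojo2600/pihole -f "values/pihole.values.yaml" --dry-run --debug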

Whenever I need to update that deploy with a new (chart) configuration, I update the values file and issue an upgrade operation:

helm upgrade pihole mojo2600/pihole -f "values/pihole.values.yaml"
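
To check what is deployed and whether the upgrade rolled out, Helm can list releases and their revision history, and kubectl shows the resulting pods:

helm list
helm history pihole
kubectl get pods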

The recap was done with the PiHole chart because one of its possible variables, dnsHostPort, maps into hostPorts, which is not allowed on newer Kubernetes versions; in theory the deployed app itself should not dictate which host port it runs on. This is one point to pay attention to when something does not run as expected, and this rule “violation” is reported in the deployment status. You can override it by modifying the security level of the namespace with:

kubectl label ns default pod-security.kubernetes.io/enforce=privileged

Where:

  • default is the namespace to which I deployed the app requiring extra privileges.
  • pod-security.kubernetes.io/enforce=privileged is a security label key/value described in the Pod Security Standards.
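
You can verify the label landed on the namespace with the command below. Keep in mind this relaxes enforcement for every pod in that namespace, not just the one deployment.

kubectl get namespace default --show-labels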

Extras

Accessing non-exposed services when required

For the PiHole example, in my deployment the Web Admin is not exposed. Whenever I need to use it, I can do so with kubectl port-forward, which creates a tunnel binding a local port to a kube service port, so you can access it safely through localhost while the process is active.

You can use kubectl get services to see all running services and locate the one you want to connect to locally for inspection.

kubectl get services
kubectl port-forward service/pihole-web 80:80
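
Binding local port 80 usually requires elevated privileges, so an unprivileged local port works just as well; with PiHole the admin UI is then reachable at http://localhost:8080/admin while the command is running (the service name pihole-web comes from my release, check kubectl get services for yours):

kubectl port-forward service/pihole-web 8080:80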