homelab: k8s and monitoring

Oct 14, 2023

Some more k8s adventures:

After getting everything set up correctly and deploying a working nginx, I was about to launch a container I had built, running a Flask app that vomits errors, to explore how error logging works. Then I got distracted by how many things there are in k8s, and how does one monitor all of them…?
So I thought: monitoring. Ok, I like Grafana, let’s see how k8s could be monitored with it… and can it be deployed on k8s as well? As it turns out, yes, yes it can, and there is a whole page dedicated to it: https://grafana.com/docs/grafana/latest/setup-grafana/installation/kubernetes/
Dutifully copy-pasted it all, then discovered that my persistent volume claim wasn’t working because… oh, as it turns out, k8s needs you to set up a StorageClass, which I hadn’t done.
This is where I started realizing that k8s is just a meta computer where you replace each and every physical part of your desktop with some set of services, some of which interact with other services more seamlessly than others. The k8s documentation, although complete, is terrifyingly, infuriatingly neutral about what combo you might need. It is highly reminiscent of being transported back to the 90s, where everything is RTFM to an extreme.
After spending about an hour reading about persistent volumes, persistent volume claims, and storage classes, I went with the good old boring local StorageClass and got that up and running. Then I mistakenly assumed Grafana was in a CrashLoopBackOff because I hadn’t set up a PV, but as it turns out this is wrong: the PVC will handle that. I did manage to set up a PV, which was a nice albeit useless aside.
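
For future reference, here is a minimal sketch of what a StorageClass, PV, and PVC trio can look like. This is not my actual config: the names, the host path, the node name, and the sizes are all placeholders, and it assumes the `kubernetes.io/no-provisioner` flavor of local storage. With that flavor nothing is provisioned dynamically, so a matching PV has to exist; with a dynamic provisioner, the PVC alone would be enough.

```yaml
# Sketch only: every name, path, and size here is a placeholder.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage                        # hypothetical name
provisioner: kubernetes.io/no-provisioner    # no dynamic provisioning
volumeBindingMode: WaitForFirstConsumer      # bind only once a pod is scheduled
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: grafana-local-pv                     # hypothetical name
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/grafana-data                  # assumed to already exist on the node
  nodeAffinity:                              # local volumes must be pinned to a node
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - my-node                    # placeholder node name
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
spec:
  storageClassName: local-storage
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```

With `WaitForFirstConsumer`, the claim stays Pending until a pod that uses it gets scheduled, which is expected and not an error.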

And the reason for this post is to remind my future self that on a local Kubernetes setup for Grafana, you might run into this error:

GF_PATHS_DATA='/var/lib/grafana' is not writable

which, as far as I can tell, comes from the default manifest provided by Grafana. As it turns out, you can just swap

          volumeMounts:
            - mountPath: /var/lib/grafana
              name: grafana-pv

with

          volumeMounts:
            - mountPath: grafana-pv:/var/lib/grafana
              name: grafana-pv

And then it will stop crashing. I’m sure I’ll totally understand this at some future point in time, but right now it just felt like a magic voodoo incantation. I ended up looking at this commit to figure it out: https://github.com/questdb/questdb-slack-grafana-alerts/pull/5/commits/7cf28d57e142c2322157dcedc77626b6b26246b1
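
One thing worth checking later: the Kubernetes API reference describes `mountPath` as a path inside the container that must not contain ':', so the swapped value may no longer be mounting the volume over /var/lib/grafana at all. That would make the error go away, but the data might not be landing on the PVC either. A commonly suggested alternative, which I have not verified on this setup, is to leave the mount path alone and fix ownership of the volume instead, since the official Grafana image runs as UID 472. A rough sketch of that, with a hypothetical init container name and an assumed busybox image:

```yaml
# Fragment to merge into the Grafana Deployment's pod template.
# A sketch under assumptions, not the manifest I actually ran.
spec:
  template:
    spec:
      initContainers:
        - name: fix-data-dir-ownership       # hypothetical name
          image: busybox                     # assumption: any small image with chown
          command: ["sh", "-c", "chown -R 472:472 /var/lib/grafana"]
          securityContext:
            runAsUser: 0                     # chown needs root
          volumeMounts:
            - mountPath: /var/lib/grafana
              name: grafana-pv
      containers:
        - name: grafana
          volumeMounts:
            - mountPath: /var/lib/grafana    # unchanged from the default manifest
              name: grafana-pv
```

If the pod then starts and files show up in the directory backing the PV, the data is actually being persisted.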

In any case, once it stopped crashing I curled the :3000 endpoint, it responded, and I felt I was done for the day.

Well, I also asked OpenAI, out of curiosity, to see if it would have helped me in this situation and, as it turns out, yeah, no, it wouldn’t have.

Next up: NodePort and other network magic incantations