# Cluster Monitoring stack for ARM / X86-64 platforms
The Prometheus Operator for Kubernetes provides easy monitoring definitions for Kubernetes services, as well as deployment and management of Prometheus instances.

This has been tested on a hybrid ARM64 / X86-64 Kubernetes cluster deployed as described in [this article](https://medium.com/@carlosedp/building-a-hybrid-x86-64-and-arm-kubernetes-cluster-e7f94ff6e51d).

This repository collects Kubernetes manifests, Grafana dashboards, and Prometheus rules, combined with documentation and scripts, to provide easy-to-operate end-to-end Kubernetes cluster monitoring with Prometheus using the Prometheus Operator.

The content of this project is written in jsonnet and is an extension of the fantastic [kube-prometheus](https://github.com/coreos/prometheus-operator/blob/master/contrib/kube-prometheus) project.

To continue using my previous stack with manifests and previous versions of the operator and components, use the legacy repo tag from: https://github.com/carlosedp/prometheus-operator-ARM/tree/legacy.

Components included in this package:

* The Prometheus Operator
* Highly available Prometheus
* Highly available Alertmanager
* Prometheus node-exporter
* kube-state-metrics
* CoreDNS
* Grafana
* SMTP relay to Gmail for Grafana notifications

There are additional modules (disabled by default) to monitor other components of the infrastructure. These can be enabled or disabled in the `vars.jsonnet` file by setting the module in `installModules` to `true` or `false`.

The additional modules are:

* ARM_exporter to generate temperature metrics
* MetalLB metrics
* Traefik metrics
* ElasticSearch metrics
* APC UPS metrics

There are also options to set the ingress domain suffix and enable persistence for Grafana and Prometheus.

After changing these parameters, rebuild the manifests with `make`.

## Quickstart
The repository already provides a set of compiled manifests to be applied to the cluster. The deployment can be customized through the jsonnet files.

To simply deploy the stack, run:

```bash
$ make deploy
# Or manually:
$ kubectl apply -f manifests/
# It can take a few seconds for the above command to fully create the following resources, so verify the resources are ready before proceeding.
$ until kubectl get customresourcedefinitions servicemonitors.monitoring.coreos.com ; do date; sleep 1; echo ""; done
$ until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
$ kubectl apply -f manifests/ # This command may sometimes need to be run twice (to work around a race condition).
```
If you get an error when applying the manifests, run `make deploy` or `kubectl apply -f manifests/` again. Sometimes resources that others depend on, such as the CRDs, are not deployed yet.
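
Since a second apply is sometimes needed, the re-run can be scripted. The following is a minimal sketch and not part of this repository: the `retry` function and the `RETRY_DELAY` variable are hypothetical names introduced here for illustration, and the example call assumes `kubectl` already points at the target cluster.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository).
# retry ATTEMPTS CMD [ARGS...]: run CMD until it succeeds, at most ATTEMPTS times.
retry() {
  local attempts=$1
  shift
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    echo "attempt ${i}/${attempts} failed: $*" >&2
    # Pause between attempts, but not after the last one.
    if ((i < attempts)); then
      sleep "${RETRY_DELAY:-5}"
    fi
  done
  return 1
}

# Example (assumes kubectl already points at the target cluster):
#   retry 3 kubectl apply -f manifests/
```

Bounding the number of attempts avoids looping forever when the failure is not the race condition described above, for instance a malformed manifest.
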
## Customizing for K3s
To set up a [K3s](https://github.com/rancher/k3s) cluster and deploy the monitoring stack on it, follow these steps:
```bash
# Download K3s binary
wget https://github.com/rancher/k3s/releases/download/`curl -s https://api.github.com/repos/rancher/k3s/releases/latest | grep -oP '"tag_name": "\K(.*)(?=")'`/k3s && chmod +x k3s
# Move to your path
sudo mv k3s /usr/local/bin
# Start K3s
sudo k3s server --docker &
```
To generate the metrics with all metadata required by the dashboards, K3s needs to be started with Docker as the runtime.

Now, to deploy the monitoring stack on your K3s cluster, there are four parameters to configure in `vars.jsonnet`:

1. Set `k3s.enabled` to `true`.
2. Set `k3s.master_ip` to your K3s master node IP (your VM or host IP).
3. Edit `suffixDomain` to have your node IP with the `.nip.io` suffix. This will be your ingress URL suffix.
4. Set the _traefikExporter_ `enabled` parameter to `true` to collect Traefik metrics and deploy its dashboard.

After changing these values, run `make` to build the manifests and `k3s kubectl apply -f manifests/` to apply the stack to your cluster. In case of errors on some resources, re-run the command.

To list the created ingresses, run `k3s kubectl get ingress --all-namespaces`. Now you can open the applications:

* Grafana on [https://grafana.[your_node_ip].nip.io](https://grafana.[your_node_ip].nip.io)
* Prometheus on [https://prometheus.[your_node_ip].nip.io](https://prometheus.[your_node_ip].nip.io)
* Alertmanager on [https://alertmanager.[your_node_ip].nip.io](https://alertmanager.[your_node_ip].nip.io)

Some dashboards show no values because some cAdvisor metrics lack complete metadata when K3s is started with the default script or without the `--docker` argument. Check the open issues for more information.
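
The ingress hostnames listed above all follow the pattern `<application>.<node IP>.nip.io`. As a small illustration, the sketch below prints the three URLs for a given node IP. The `ingress_urls` function is a hypothetical helper, not something shipped with this project, and the address in the example call is only a placeholder.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository).
# ingress_urls NODE_IP: print the Grafana, Prometheus and Alertmanager URLs
# that result from using NODE_IP.nip.io as the ingress suffix.
ingress_urls() {
  local node_ip=$1
  local app
  for app in grafana prometheus alertmanager; do
    printf 'https://%s.%s.nip.io\n' "$app" "$node_ip"
  done
}

# Placeholder address; replace it with your own node IP.
ingress_urls 192.168.99.100
```
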
## Updating the ingress suffixes
To avoid rebuilding all manifests, there is a make target that updates the Ingress URL suffix (using nip.io) to match your host IP. Run `make change_suffix IP="[IP-ADDRESS]"` to change the ingress route IP for Grafana, Prometheus and Alertmanager and reapply the manifests. If you have a K3s cluster, run `make change_suffix IP="[IP-ADDRESS]" K3S=k3s`.
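
To illustrate the kind of substitution involved, the sketch below rewrites any `<IPv4>.nip.io` suffix found in text read from standard input. It is an assumption-laden illustration, not the Makefile's actual recipe: the `rewrite_suffix` function is a hypothetical name, and it only handles suffixes built from a dotted IPv4 address.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not the Makefile's actual recipe).
# rewrite_suffix NEW_IP: read text on stdin and replace every
# <ipv4>.nip.io suffix with NEW_IP.nip.io, writing the result to stdout.
rewrite_suffix() {
  local new_ip=$1
  sed -E "s/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\.nip\.io/${new_ip}.nip.io/g"
}

# Example: preview the change for a single host line.
#   echo 'host: grafana.192.168.1.10.nip.io' | rewrite_suffix 10.0.0.5
```

Working on standard input and output keeps the sketch free of in-place editing, so the result can be inspected before any file is overwritten.
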
## Customizing
The content of this project consists of a set of jsonnet files making up a library to be consumed.
### Pre-reqs
The project requires jsonnet-bundler and the jsonnet compiler. The Makefile does the heavy lifting of installing them. You need [Go](https://golang.org/dl/) already installed:

```bash
git clone https://github.com/carlosedp/cluster-monitoring
cd cluster-monitoring
make vendor
# Change the jsonnet files...
make
```
After this, a new customized set of manifests is built into the `manifests` dir. To apply them to your cluster, run:

```bash
make deploy
```
To uninstall, run:
```bash
make teardown
```
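
Before running the `make` targets in this section, it can help to confirm that the tools they rely on are available. The following is a minimal sketch under that assumption: `require_cmds` is a hypothetical helper, not part of this repository, and the list of tools in the example is inferred from the commands shown in this README.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository).
# require_cmds NAME...: report every command that is missing from PATH.
# Returns 0 when all are present and 1 when at least one is missing.
require_cmds() {
  local cmd
  local missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing required command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: tools used by the steps in this README.
#   require_cmds git go make kubectl
```
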
## Images
This project depends on the following images (all support ARM, ARM64 and AMD64 through manifests):

**Alertmanager**
**Blackbox_exporter**
**Node_exporter**
**Snmp_exporter**
**Prometheus**

* Source: https://github.com/carlosedp/prometheus-ARM
* Autobuild: https://travis-ci.org/carlosedp/prometheus-ARM
* Images:
    * https://hub.docker.com/r/carlosedp/prometheus/
    * https://hub.docker.com/r/carlosedp/alertmanager/
    * https://hub.docker.com/r/carlosedp/blackbox_exporter/
    * https://hub.docker.com/r/carlosedp/node_exporter/
    * https://hub.docker.com/r/carlosedp/snmp_exporter/

**ARM_exporter**

* Source: https://github.com/carlosedp/docker-arm_exporter
* Autobuild: https://travis-ci.org/carlosedp/docker-arm_exporter
* Images: https://hub.docker.com/r/carlosedp/arm_exporter/

**Prometheus-operator**

* Source: https://github.com/carlosedp/prometheus-operator
* Autobuild: No autobuild yet. Use provided `build_images.sh` script.
* Images: https://hub.docker.com/r/carlosedp/prometheus-operator

**Prometheus-adapter**

* Source: https://github.com/DirectXMan12/k8s-prometheus-adapter
* Autobuild: No autobuild yet. Use provided `build_images.sh` script.
* Images: https://hub.docker.com/r/carlosedp/k8s-prometheus-adapter

**Grafana**

* Source: https://github.com/carlosedp/grafana-ARM
* Autobuild: https://travis-ci.org/carlosedp/grafana-ARM
* Images: https://hub.docker.com/r/grafana/grafana/

**Kube-state-metrics**

* Source: https://github.com/kubernetes/kube-state-metrics
* Autobuild: No autobuild yet. Use provided `build_images.sh` script.
* Images: https://hub.docker.com/r/carlosedp/kube-state-metrics

**Addon-resizer**

* Source: https://github.com/kubernetes/autoscaler/tree/master/addon-resizer
* Autobuild: No autobuild yet. Use provided `build_images.sh` script.
* Images: https://hub.docker.com/r/carlosedp/addon-resizer

*Obs.* This image is a clone of [AMD64](https://console.cloud.google.com/gcr/images/google-containers/GLOBAL/addon-resizer-amd64), [ARM64](https://console.cloud.google.com/gcr/images/google-containers/GLOBAL/addon-resizer-arm64) and [ARM](https://console.cloud.google.com/gcr/images/google-containers/GLOBAL/addon-resizer-arm64) with a manifest. It's cloned and generated by the `build_images.sh` script.

**configmap_reload**

* Source: https://github.com/carlosedp/configmap-reload
* Autobuild: https://travis-ci.org/carlosedp/configmap-reload
* Images: https://hub.docker.com/r/carlosedp/configmap-reload

**prometheus-config-reloader**

* Source: https://github.com/coreos/prometheus-operator/tree/master/contrib/prometheus-config-reloader
* Autobuild: No autobuild yet. Use provided `build_images.sh` script.
* Images: https://hub.docker.com/r/carlosedp/prometheus-config-reloader

**SMTP-server**

* Source: https://github.com/carlosedp/docker-smtp
* Autobuild: https://travis-ci.org/carlosedp/docker-smtp
* Images: https://hub.docker.com/r/carlosedp/docker-smtp

**Kube-rbac-proxy**

* Source: https://github.com/brancz/kube-rbac-proxy
* Autobuild: No autobuild yet. Use provided `build_images.sh` script.
* Images: https://hub.docker.com/r/carlosedp/kube-rbac-proxy