rpi.carlosedp.cluster-monit.../Readme.md

# Cluster Monitoring stack for ARM / X86-64 platforms

The Prometheus Operator for Kubernetes provides easy monitoring definitions for Kubernetes services and deployment and management of Prometheus instances.

This have been tested on a hybrid ARM64 / X84-64 Kubernetes cluster deployed as [this article](https://medium.com/@carlosedp/building-a-hybrid-x86-64-and-arm-kubernetes-cluster-e7f94ff6e51d).

This repository collects Kubernetes manifests, Grafana dashboards, and Prometheus rules combined with documentation and scripts to provide easy to operate end-to-end Kubernetes cluster monitoring with Prometheus using the Prometheus Operator. 

The content of this project is written in jsonnet and is an extension of the fantastic [kube-prometheus](https://github.com/coreos/prometheus-operator/blob/master/contrib/kube-prometheus) project.

To continue using my previous stack with manifests and previous versions of the operator and components, use the legacy repo tag from: https://github.com/carlosedp/prometheus-operator-ARM/tree/legacy.

Components included in this package:

* The Prometheus Operator
* Highly available Prometheus
* Highly available Alertmanager
* Prometheus node-exporter
* kube-state-metrics
* CoreDNS
* Grafana
* SMTP relay to Gmail for Grafana notifications

There are additional modules (disabled by default) to monitor other components of the infra-structure. These can be disabled on `vars.jsonnet` file by setting the module in `installModules` to `false`.

The additional modules are:

* ARM_exporter to generate temperature metrics
* MetalLB metrics
* Traefik metrics
* ElasticSearch metrics
* APC UPS metrics

There are also options to set the ingress domain suffix and enable persistence for Grafana and Prometheus.

After changing these parameters, rebuild the manifests with `make`.

## Quickstart

The repository already provides a set of compiled manifests to be applied into the cluster. The deployment can be customized thru the jsonnet files.

To simply deploy the stack, run:

```bash
$ make deploy

# Or manually:

$ kubectl apply -f manifests/

# It can take a few seconds for the above 'create manifests' command to fully create the following resources, so verify the resources are ready before proceeding.
$ until kubectl get customresourcedefinitions servicemonitors.monitoring.coreos.com ; do date; sleep 1; echo ""; done
$ until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done

$ kubectl apply -f manifests/ # This command sometimes may need to be done twice (to workaround a race condition).
```

If you get an error from applying the manifests, run the `make deploy` or `kubectl apply -f manifests/` again. Sometimes the resources required to apply the CRDs are not deployed yet.

## Customizing for K3s

To have your [K3s](https://github.com/rancher/k3s) cluster and the monitoring stack on it, deploy K3s with `curl -sfL https://get.k3s.io | sh -`.

Now to deploy the monitoring stack on your K3s cluster, there are three parameters to be configured on `vars.jsonnet`:

1. Set `k3s.enabled` to `true`.
2. Change your K3s master node IP(your VM or host IP) on `k3s.master_ip`.
3. Edit `suffixDomain` to have your node IP with the `.nip.io` suffix. This will be your ingress URL suffix.

After changing these values, run `make` to build the manifests and `k3s kubectl apply -f manifests/` to apply the stack to your cluster. In case of errors on some resources, re-run the command.

Now you can open the applications:

* Grafana on [https://grafana.[your_node_ip].nip.io](https://grafana.[your_node_ip].nip.io), 
* Prometheus on [https://prometheus.[your_node_ip].nip.io](https://prometheus.[your_node_ip].nip.io) 
* Alertmanager on [https://alertmanager.[your_node_ip].nip.io](https://alertmanager.[your_node_ip].nip.io)

There are some dashboards that shows no values due to some cadvisor metrics not having the complete metadata. Check the open issues for more information.

## Updating the ingress suffixes

To avoid rebuilding all manifests, there is a make target to update the Ingress URL suffix to a different suffix (using nip.io) to match your host IP. Run `make change_suffix IP="[IP-ADDRESS]"` to change the ingress route IP for Grafana, Prometheus and Alertmanager and reapply the manifests. If you have a K3s cluster, run `make change_suffix IP="[IP-ADDRESS] K3S=k3s`.

## Customizing

The content of this project consists of a set of jsonnet files making up a library to be consumed.

### Pre-reqs

The project requires json-bundler and the jsonnet compiler. The Makefile does the heavy-lifting of installing them. You need [Go](https://golang.org/dl/) already installed:

```bash
git clone https://github.com/carlosedp/prometheus-operator-ARM
cd prometheus-operator-ARM
make vendor
# Change the jsonnet files...
make
```

After this, a new customized set of manifests is built into the `manifests` dir. To apply to your cluster, run:

```bash
make deploy
```

To uninstall, run:

```bash
make teardown
```

## Images

This project depends on the following images (all supports ARM, ARM64 and AMD64 thru manifests):

**Alertmanager**
**Blackbox_exporter**
**Node_exporter**
**Snmp_exporter**
**Prometheus**

* Source: https://github.com/carlosedp/prometheus-ARM
* Autobuild: https://travis-ci.org/carlosedp/prometheus-ARM
* Images:
    * https://hub.docker.com/r/carlosedp/prometheus/
    * https://hub.docker.com/r/carlosedp/alertmanager/
    * https://hub.docker.com/r/carlosedp/blackbox_exporter/
    * https://hub.docker.com/r/carlosedp/node_exporter/
    * https://hub.docker.com/r/carlosedp/snmp_exporter/

**ARM_exporter**

* Source: https://github.com/carlosedp/docker-arm_exporter
* Autobuild: https://travis-ci.org/carlosedp/docker-arm_exporter
* Images: https://hub.docker.com/r/carlosedp/arm_exporter/

**Prometheus-operator**

* Source: https://github.com/carlosedp/prometheus-operator
* Autobuild: No autobuild yet. Use provided `build_images.sh` script.
* Images: https://hub.docker.com/r/carlosedp/prometheus-operator

**Prometheus-adapter**

* Source: https://github.com/DirectXMan12/k8s-prometheus-adapter
* Autobuild: No autobuild yet. Use provided `build_images.sh` script.
* Images: https://hub.docker.com/r/carlosedp/k8s-prometheus-adapter

**Grafana**

* Source: https://github.com/carlosedp/grafana-ARM
* Autobuild: https://travis-ci.org/carlosedp/grafana-ARM
* Images: https://hub.docker.com/r/grafana/grafana/

**Kube-state-metrics**

* Source: https://github.com/kubernetes/kube-state-metrics
* Autobuild: No autobuild yet. Use provided `build_images.sh` script.
* Images: https://hub.docker.com/r/carlosedp/kube-state-metrics

**Addon-resizer**

* Source: https://github.com/kubernetes/autoscaler/tree/master/addon-resizer
* Autobuild: No autobuild yet. Use provided `build_images.sh` script.
* Images: https://hub.docker.com/r/carlosedp/addon-resizer

*Obs.* This image is a clone of [AMD64](https://console.cloud.google.com/gcr/images/google-containers/GLOBAL/addon-resizer-amd64), [ARM64](https://console.cloud.google.com/gcr/images/google-containers/GLOBAL/addon-resizer-arm64) and [ARM](https://console.cloud.google.com/gcr/images/google-containers/GLOBAL/addon-resizer-arm64) with a manifest. It's cloned and generated by the `build_images.sh` script

**configmap_reload**

* Source: https://github.com/carlosedp/configmap-reload
* Autobuild: https://travis-ci.org/carlosedp/configmap-reload
* Images: https://hub.docker.com/r/carlosedp/configmap-reload

**prometheus-config-reloader**

* Source: https://github.com/coreos/prometheus-operator/tree/master/contrib/prometheus-config-reloader
* Autobuild: No autobuild yet. Use provided `build_images.sh` script.
* Images: https://hub.docker.com/r/carlosedp/prometheus-config-reloader

**SMTP-server**

* Source: https://github.com/carlosedp/docker-smtp
* Autobuild: https://travis-ci.org/carlosedp/docker-smtp
* Images: https://hub.docker.com/r/carlosedp/docker-smtp

**Kube-rbac-proxy**

* Source: https://github.com/brancz/kube-rbac-proxy
* Autobuild: No autobuild yet. Use provided `build_images.sh` script.
* Images: https://hub.docker.com/r/carlosedp/kube-rbac-proxy
Update Readme.md 2019-04-23 16:39:59 +02:00			`# Cluster Monitoring stack for ARM / X86-64 platforms`
Initial import 2018-03-02 01:03:53 +01:00
			`The Prometheus Operator for Kubernetes provides easy monitoring definitions for Kubernetes services and deployment and management of Prometheus instances.`

Updated README 2019-02-05 20:41:01 +01:00			`This have been tested on a hybrid ARM64 / X84-64 Kubernetes cluster deployed as [this article](https://medium.com/@carlosedp/building-a-hybrid-x86-64-and-arm-kubernetes-cluster-e7f94ff6e51d).`
Initial import 2018-03-02 01:03:53 +01:00
Updated README 2019-02-05 20:41:01 +01:00			`This repository collects Kubernetes manifests, Grafana dashboards, and Prometheus rules combined with documentation and scripts to provide easy to operate end-to-end Kubernetes cluster monitoring with Prometheus using the Prometheus Operator.`
Initial import 2018-03-02 01:03:53 +01:00
Updated README 2019-02-05 20:41:01 +01:00			`The content of this project is written in jsonnet and is an extension of the fantastic [kube-prometheus](https://github.com/coreos/prometheus-operator/blob/master/contrib/kube-prometheus) project.`
Initial import 2018-03-02 01:03:53 +01:00
Updated readme 2019-02-22 16:43:19 +01:00			`To continue using my previous stack with manifests and previous versions of the operator and components, use the legacy repo tag from: https://github.com/carlosedp/prometheus-operator-ARM/tree/legacy.`

Updated README 2019-02-05 20:41:01 +01:00			`Components included in this package:`
Initial import 2018-03-02 01:03:53 +01:00
Updated README 2019-02-05 20:41:01 +01:00			`* The Prometheus Operator`
			`* Highly available Prometheus`
			`* Highly available Alertmanager`
			`* Prometheus node-exporter`
Update readme 2019-03-13 22:15:37 +01:00			`* kube-state-metrics`
			`* CoreDNS`
			`* Grafana`
			`* SMTP relay to Gmail for Grafana notifications`

Fix issue #11. Arm-exporter didn't have a ServiceAccount and TLS params 2019-04-23 17:03:34 +02:00			There are additional modules (disabled by default) to monitor other components of the infra-structure. These can be disabled on `vars.jsonnet` file by setting the module in `installModules` to `false`.
Update readme 2019-03-13 22:15:37 +01:00
			`The additional modules are:`

Updated README 2019-02-05 20:41:01 +01:00			`* ARM_exporter to generate temperature metrics`
			`* MetalLB metrics`
			`* Traefik metrics`
Update readme 2019-03-13 22:15:37 +01:00			`* ElasticSearch metrics`
			`* APC UPS metrics`
Updated readme 2018-05-29 19:33:05 +02:00
Updated readme 2019-04-08 22:57:06 +02:00			`There are also options to set the ingress domain suffix and enable persistence for Grafana and Prometheus.`

Fix issue #11. Arm-exporter didn't have a ServiceAccount and TLS params 2019-04-23 17:03:34 +02:00			After changing these parameters, rebuild the manifests with `make`.

Updated README 2019-02-05 20:41:01 +01:00			`## Quickstart`

			`The repository already provides a set of compiled manifests to be applied into the cluster. The deployment can be customized thru the jsonnet files.`

			`To simply deploy the stack, run:`
Updated readme 2018-05-29 19:33:05 +02:00
Update readme 2019-03-13 22:15:37 +01:00			```bash
			`$ make deploy`

			`# Or manually:`

Updated README 2019-02-05 20:41:01 +01:00			`$ kubectl apply -f manifests/`
Initial import 2018-03-02 01:03:53 +01:00
Updated README 2019-02-05 20:41:01 +01:00			`# It can take a few seconds for the above 'create manifests' command to fully create the following resources, so verify the resources are ready before proceeding.`
			`$ until kubectl get customresourcedefinitions servicemonitors.monitoring.coreos.com ; do date; sleep 1; echo ""; done`
			`$ until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done`
Initial import 2018-03-02 01:03:53 +01:00
Updated README 2019-02-05 20:41:01 +01:00			`$ kubectl apply -f manifests/ # This command sometimes may need to be done twice (to workaround a race condition).`
			```

Update readme and manifest generation for K3s 2019-08-21 02:46:29 +02:00			If you get an error from applying the manifests, run the `make deploy` or `kubectl apply -f manifests/` again. Sometimes the resources required to apply the CRDs are not deployed yet.

			`## Customizing for K3s`

			To have your [K3s](https://github.com/rancher/k3s) cluster and the monitoring stack on it, deploy K3s with `curl -sfL https://get.k3s.io \| sh -`.

			Now to deploy the monitoring stack on your K3s cluster, there are three parameters to be configured on `vars.jsonnet`:

			1. Set `k3s.enabled` to `true`.
			2. Change your K3s master node IP(your VM or host IP) on `k3s.master_ip`.
			3. Edit `suffixDomain` to have your node IP with the `.nip.io` suffix. This will be your ingress URL suffix.

			After changing these values, run `make` to build the manifests and `k3s kubectl apply -f manifests/` to apply the stack to your cluster. In case of errors on some resources, re-run the command.

Update readme and regenerate manifests for default Kubernetes 2019-08-22 00:13:13 +02:00			`Now you can open the applications:`
Update readme and manifest generation for K3s 2019-08-21 02:46:29 +02:00
			`* Grafana on [https://grafana.[your_node_ip].nip.io](https://grafana.[your_node_ip].nip.io),`
			`* Prometheus on [https://prometheus.[your_node_ip].nip.io](https://prometheus.[your_node_ip].nip.io)`
			`* Alertmanager on [https://alertmanager.[your_node_ip].nip.io](https://alertmanager.[your_node_ip].nip.io)`

Update readme and regenerate manifests for default Kubernetes 2019-08-22 00:13:13 +02:00			`There are some dashboards that shows no values due to some cadvisor metrics not having the complete metadata. Check the open issues for more information.`

			`## Updating the ingress suffixes`

			To avoid rebuilding all manifests, there is a make target to update the Ingress URL suffix to a different suffix (using nip.io) to match your host IP. Run `make change_suffix IP="[IP-ADDRESS]"` to change the ingress route IP for Grafana, Prometheus and Alertmanager and reapply the manifests. If you have a K3s cluster, run `make change_suffix IP="[IP-ADDRESS] K3S=k3s`.

Updated README 2019-02-05 20:41:01 +01:00			`## Customizing`

			`The content of this project consists of a set of jsonnet files making up a library to be consumed.`

			`### Pre-reqs`

Update readme 2019-03-13 22:15:37 +01:00			`The project requires json-bundler and the jsonnet compiler. The Makefile does the heavy-lifting of installing them. You need [Go](https://golang.org/dl/) already installed:`
Updated README 2019-02-05 20:41:01 +01:00
Update readme 2019-03-13 22:15:37 +01:00			```bash
Updated README 2019-02-05 20:41:01 +01:00			`git clone https://github.com/carlosedp/prometheus-operator-ARM`
			`cd prometheus-operator-ARM`
			`make vendor`
			`# Change the jsonnet files...`
			`make`
			```
Update readme 2019-03-13 22:15:37 +01:00
Updated README 2019-02-05 20:41:01 +01:00			After this, a new customized set of manifests is built into the `manifests` dir. To apply to your cluster, run:

Update readme 2019-03-13 22:15:37 +01:00			```bash
Updated README 2019-02-05 20:41:01 +01:00			`make deploy`
			```

			`To uninstall, run:`

Update readme 2019-03-13 22:15:37 +01:00			```bash
Updated README 2019-02-05 20:41:01 +01:00			`make teardown`
Updated readme 2018-05-29 19:33:05 +02:00			```

			`## Images`

Readme fixes 2019-02-22 16:52:20 +01:00			`This project depends on the following images (all supports ARM, ARM64 and AMD64 thru manifests):`
Updated readme 2018-05-29 19:33:05 +02:00
			`Alertmanager`
			`Blackbox_exporter`
			`Node_exporter`
			`Snmp_exporter`
			`Prometheus`

			`* Source: https://github.com/carlosedp/prometheus-ARM`
			`* Autobuild: https://travis-ci.org/carlosedp/prometheus-ARM`
			`* Images:`
			`* https://hub.docker.com/r/carlosedp/prometheus/`
			`* https://hub.docker.com/r/carlosedp/alertmanager/`
			`* https://hub.docker.com/r/carlosedp/blackbox_exporter/`
			`* https://hub.docker.com/r/carlosedp/node_exporter/`
			`* https://hub.docker.com/r/carlosedp/snmp_exporter/`

			`ARM_exporter`

			`* Source: https://github.com/carlosedp/docker-arm_exporter`
			`* Autobuild: https://travis-ci.org/carlosedp/docker-arm_exporter`
			`* Images: https://hub.docker.com/r/carlosedp/arm_exporter/`

			`Prometheus-operator`

			`* Source: https://github.com/carlosedp/prometheus-operator`
Update image build script 2018-06-11 17:31:49 +02:00			* Autobuild: No autobuild yet. Use provided `build_images.sh` script.
Updated readme 2018-05-29 19:33:05 +02:00			`* Images: https://hub.docker.com/r/carlosedp/prometheus-operator`

Updated README 2019-02-05 20:41:01 +01:00			`Prometheus-adapter`

			`* Source: https://github.com/DirectXMan12/k8s-prometheus-adapter`
			* Autobuild: No autobuild yet. Use provided `build_images.sh` script.
			`* Images: https://hub.docker.com/r/carlosedp/k8s-prometheus-adapter`

Updated readme 2018-05-29 19:33:05 +02:00			`Grafana`

			`* Source: https://github.com/carlosedp/grafana-ARM`
			`* Autobuild: https://travis-ci.org/carlosedp/grafana-ARM`
Updated README 2019-02-05 20:41:01 +01:00			`* Images: https://hub.docker.com/r/grafana/grafana/`
Updated readme 2018-05-29 19:33:05 +02:00
			`Kube-state-metrics`

			`* Source: https://github.com/kubernetes/kube-state-metrics`
Update image build script 2018-06-11 17:31:49 +02:00			* Autobuild: No autobuild yet. Use provided `build_images.sh` script.
Updated readme 2018-05-29 19:33:05 +02:00			`* Images: https://hub.docker.com/r/carlosedp/kube-state-metrics`

			`Addon-resizer`

Updated README 2019-02-05 20:41:01 +01:00			`* Source: https://github.com/kubernetes/autoscaler/tree/master/addon-resizer`
Update image build script 2018-06-11 17:31:49 +02:00			* Autobuild: No autobuild yet. Use provided `build_images.sh` script.
Updated readme 2018-05-29 19:33:05 +02:00			`* Images: https://hub.docker.com/r/carlosedp/addon-resizer`

Readme fixes 2019-02-22 16:52:20 +01:00			Obs. This image is a clone of [AMD64](https://console.cloud.google.com/gcr/images/google-containers/GLOBAL/addon-resizer-amd64), [ARM64](https://console.cloud.google.com/gcr/images/google-containers/GLOBAL/addon-resizer-arm64) and [ARM](https://console.cloud.google.com/gcr/images/google-containers/GLOBAL/addon-resizer-arm64) with a manifest. It's cloned and generated by the `build_images.sh` script
Updated readme 2018-05-29 19:33:05 +02:00
			`configmap_reload`

			`* Source: https://github.com/carlosedp/configmap-reload`
			`* Autobuild: https://travis-ci.org/carlosedp/configmap-reload`
			`* Images: https://hub.docker.com/r/carlosedp/configmap-reload`
Initial import 2018-03-02 01:03:53 +01:00
Update image build script 2018-06-11 17:31:49 +02:00			`prometheus-config-reloader`

			`* Source: https://github.com/coreos/prometheus-operator/tree/master/contrib/prometheus-config-reloader`
			* Autobuild: No autobuild yet. Use provided `build_images.sh` script.
			`* Images: https://hub.docker.com/r/carlosedp/prometheus-config-reloader`

Updated readme 2018-05-29 19:33:05 +02:00			`SMTP-server`
Initial import 2018-03-02 01:03:53 +01:00
Readme fixes 2019-02-22 16:52:20 +01:00			`* Source: https://github.com/carlosedp/docker-smtp`
			`* Autobuild: https://travis-ci.org/carlosedp/docker-smtp`
			`* Images: https://hub.docker.com/r/carlosedp/docker-smtp`
Updated README 2019-02-05 20:41:01 +01:00
			`Kube-rbac-proxy`

Readme fixes 2019-02-22 16:52:20 +01:00			`* Source: https://github.com/brancz/kube-rbac-proxy`
			* Autobuild: No autobuild yet. Use provided `build_images.sh` script.
			`* Images: https://hub.docker.com/r/carlosedp/kube-rbac-proxy`