path: root/kubernetes/namespaces/monitoring
Commit message (author, date, lines removed/added)
* Exclude 401 from NGINX 4xx alerts (Joe Banks, 2025-07-18, -1/+1)
* Bump dns cache miss alert time (Johannes Christ, 2025-05-16, -2/+3)
* Update ingresses with NGINX ingress upgrade (Joe Banks, 2025-04-05, -3/+3)
* Update quay.io/prometheus/node-exporter Docker tag to v1.9.0 (renovate[bot], 2025-02-23, -1/+1)
    | datasource | package                          | from   | to     |
    | ---------- | -------------------------------- | ------ | ------ |
    | docker     | quay.io/prometheus/node-exporter | v1.8.2 | v1.9.0 |
* Update registry.k8s.io/kube-state-metrics/kube-state-metrics Docker tag to v2.15.0 (renovate[bot], 2025-02-06, -1/+1)
    | datasource | package                                                | from    | to      |
    | ---------- | ------------------------------------------------------ | ------- | ------- |
    | docker     | registry.k8s.io/kube-state-metrics/kube-state-metrics | v2.13.0 | v2.15.0 |
* Remove prestashop from alert.d nginx exception for p99 timing alert (Chris Lovering, 2024-10-01, -1/+1)
* Disable GitHub authentication in Grafana (Joe Banks, 2024-09-19, -11/+0)
* Further optimisations to AlertManager initscript (Joe Banks, 2024-09-19, -1/+1)
* AlertManager init use Alpine (Joe Banks, 2024-09-04, -2/+2)
    Smaller image and faster startup
* Raise time threshold for 4xx alerts (Johannes Christ, 2024-09-01, -1/+1)
    At present we get plenty of unactionable, flapping alarms. So far, they
    have shown us nothing of value. Raise the time consecutive errors need
    to be seen before we alert.
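The duration raised here is the "for" window of a Prometheus alerting rule. A rough sketch of such a rule follows; the group name, alert name, expression and threshold are illustrative assumptions, not the repository's actual values:

    # Hypothetical NGINX 4xx alerting rule; names, expression and threshold
    # are assumptions for illustration only.
    groups:
      - name: nginx-alerts
        rules:
          - alert: NginxHigh4xxRate
            # Share of 4xx responses, ignoring 401s (cf. the 2025-07-18 commit above).
            expr: |
              sum(rate(nginx_ingress_controller_requests{status=~"4..", status!="401"}[5m]))
                / sum(rate(nginx_ingress_controller_requests[5m])) > 0.05
            # Raising this window means flapping bursts of errors must persist
            # for longer before the alert fires.
            for: 30m
            labels:
              severity: warning
            annotations:
              summary: NGINX is serving an elevated rate of 4xx responses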
* Show status code in nginx alerts (Johannes Christ, 2024-08-29, -4/+4)
* Install prometheus-postfix-exporter (Johannes Christ, 2024-08-26, -0/+22)
    As a data-obsessed administrator I want to have more data such that I
    can widen my sense of power. This also installs rsyslog, because
    prometheus-postfix-exporter doesn't work with journald's binary log
    format.
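For context, prometheus-postfix-exporter reads Postfix's log output and exposes queue and delivery metrics over HTTP, usually on port 9154. A minimal, assumed scrape job for it; the target address is a placeholder and the real deployment may differ:

    # Hypothetical Prometheus scrape job for prometheus-postfix-exporter.
    # The target is a placeholder; 9154 is the exporter's default port.
    scrape_configs:
      - job_name: postfix
        static_configs:
          - targets: ["mail.example.org:9154"]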
* Improve alertmanager text e-mail format (Johannes Christ, 2024-08-26, -1/+12)
    Already deployed.
* Update alertmanager config with mail port (Joe Banks, 2024-08-25, -1/+1)
* Configure alertmanager to send e-mails (Johannes Christ, 2024-08-25, -0/+11)
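Alertmanager sends e-mail through an email_configs receiver. A hedged sketch with placeholder addresses and SMTP host; the mail port mentioned in the commit above would be part of smarthost:

    # Hypothetical Alertmanager e-mail receiver; addresses and SMTP host are
    # placeholders, not the values used in this repository.
    receivers:
      - name: email
        email_configs:
          - to: admins@example.org
            from: alertmanager@example.org
            smarthost: mail.example.org:587
            require_tls: true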
* Unify alertmanager naming (Johannes Christ, 2024-08-25, -31/+31)
    Closes #451.
* Mount new config for LDAP to Grafana and add IPA CA cert (Joe Banks, 2024-07-26, -2/+14)
* Add LDAP bind user password for Grafana (Joe Banks, 2024-07-26, -0/+0)
* Add new Grafana LDAP config and ldap.toml config (Joe Banks, 2024-07-26, -0/+65)
* chore(deps): update registry.k8s.io/kube-state-metrics/kube-state-metrics docker tag to v2.13.0 (#412) (renovate[bot], 2024-07-24, -1/+1)
    | datasource | package                                                | from    | to      |
    | ---------- | ------------------------------------------------------ | ------- | ------- |
    | docker     | registry.k8s.io/kube-state-metrics/kube-state-metrics | v2.12.0 | v2.13.0 |
    Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
* Update node_exporter daemonset to 1.27+ featureset (Joe Banks, 2024-07-18, -3/+3)
* chore(deps): update quay.io/prometheus/node-exporter docker tag to v1.8.2 (renovate[bot], 2024-07-18, -1/+1)
    | datasource | package                          | from   | to     |
    | ---------- | -------------------------------- | ------ | ------ |
    | docker     | quay.io/prometheus/node-exporter | v1.2.0 | v1.8.2 |
* Add Admins to Grafana authorized Team IDs (Joe Banks, 2024-07-14, -1/+1)
* Allow new kube-state-metrics image to watch ingresses (Joe Banks, 2024-07-01, -0/+1)
* Move away from vendored kube-state-metrics (Joe Banks, 2024-07-01, -1/+1)
* Scale AM back to 3 replicas (Chris Lovering, 2024-06-24, -1/+1)
* Add Kubernetes volume alerts (Joe Banks, 2024-06-16, -0/+11)
    It seems that Linode has added storage reporting info to the CSI driver,
    allowing us to pick up on the storage use of persistent volume claims
    within the cluster. This creates and deploys an alert that will report
    if any volume has under 10% of space left. I have excluded Prometheus
    as our TSDB retention settings mean that it will always stay just below
    its volume size by design.
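The kubelet exposes per-PVC capacity and availability metrics, so a rule of roughly this shape would implement the behaviour described above; the alert name, the 15-minute window and the exact Prometheus exclusion pattern are assumptions:

    # Hypothetical low-space alert for persistent volume claims, excluding the
    # Prometheus claim, which stays near full by design (see commit message).
    - alert: PersistentVolumeLowSpace
      expr: |
        kubelet_volume_stats_available_bytes{persistentvolumeclaim!~"prometheus.*"}
          / kubelet_volume_stats_capacity_bytes < 0.10
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: PVC {{ $labels.persistentvolumeclaim }} has less than 10% free space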
* Update Prometheus deployment with a tmpfs for the reloader (Joe Banks, 2024-06-10, -0/+9)
* Add secrets for reloader webhook (Joe Banks, 2024-06-10, -0/+0)
* Add sidecar container to reload Prometheus config on change (Joe Banks, 2024-06-10, -0/+25)
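A common way to do this is a small sidecar that watches the mounted ConfigMap directory and calls Prometheus' /-/reload endpoint when files change. A sketch under that assumption; the image, tag, volume name and paths are placeholders and not necessarily what this deployment uses:

    # Hypothetical reloader sidecar for the Prometheus pod. When the mounted
    # ConfigMap changes on disk, it calls the Prometheus reload endpoint.
    # Image and tag are placeholders.
    - name: config-reloader
      image: ghcr.io/jimmidyson/configmap-reload:v0.13.1
      args:
        - --volume-dir=/etc/prometheus
        - --webhook-url=http://localhost:9090/-/reload
      volumeMounts:
        - name: prometheus-config
          mountPath: /etc/prometheus
          readOnly: true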
* Add reloader hook configmap to reload prometheus on change (Joe Banks, 2024-06-10, -0/+38)
* Add Alert for Prometheus config reload failure (Joe Banks, 2024-06-10, -0/+9)
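Prometheus reports the outcome of its last configuration reload as a metric, which such an alert can check directly; the rule name and timing below are assumptions:

    # Hypothetical alert that fires when the last Prometheus config reload failed.
    - alert: PrometheusConfigReloadFailed
      expr: prometheus_config_last_reload_successful == 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: Prometheus failed to reload its configuration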
* Enable scraping of Prometheus pods (Joe Banks, 2024-06-10, -0/+3)
* Remove PostgreSQL Exporter from Kubernetes (Joe Banks, 2024-06-02, -55/+0)
* Remove Kubernetes PostgreSQL Alerts (Joe Banks, 2024-06-02, -29/+0)
* Fix AlertManager Discord instance formatting (Joe Banks, 2024-05-27, -1/+1)
    We made a change to include the instance in alerts sent to Discord, but
    not all of our configured alerts send this field. As a result, we would
    have incorrectly formatted alerts being sent through to Discord which
    were tricky to read. The format template has now been changed to only
    conditionally render the instance label if it is present on a triggered
    alert.
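Alertmanager notification templates use Go templating, where this kind of conditional rendering looks roughly like the fragment below; the surrounding message layout and the enclosing field name are illustrative, not the repository's actual template:

    # Hypothetical notification template fragment: the instance label is only
    # rendered when the alert actually carries it.
    message: >-
      {{ range .Alerts }}
      **{{ .Labels.alertname }}**{{ if .Labels.instance }} ({{ .Labels.instance }}){{ end }}:
      {{ .Annotations.description }}
      {{ end }}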
* Take 15 minutes before alerting on high latency (Johannes Christ, 2024-05-20, -2/+2)
* Annotations.instance => Labels.instance (Joe Banks, 2024-05-18, -1/+1)
* Add instance to AlertManager Discord embeds (Joe Banks, 2024-05-17, -1/+1)
* Move AlertManager to 4 replicas (Joe Banks, 2024-05-16, -1/+1)
* Move AlertManager to pydis.wtf (Joe Banks, 2024-05-14, -4/+5)
* Move prometheus to pydis.wtf (Joe Banks, 2024-05-14, -3/+4)
* Update Grafana configmap to grafana.pydis.wtf (Joe Banks, 2024-05-14, -2/+2)
* Update Grafana ingress to grafana.pydis.wtf (Joe Banks, 2024-05-14, -3/+3)
* Stop alerting for slow GitHub webhook filter endpoint calls (#235) (jchristgit, 2024-04-29, -2/+2)
    These are directly forwarded to GitHub with no time-consuming processing
    done on the site. We would therefore be alerting for GitHub's slowness,
    which is rather useless.
* Update all secrets to new PostgreSQL service (Joe Banks, 2024-04-27, -0/+0)
* Exclude home and tag views from latency alerts (Johannes Christ, 2024-04-24, -2/+2)
    These are known issues and we probably won't do anything about them, so
    stop alerting us about them.
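Exclusions like this one (and the GitHub webhook filter exclusion above) typically appear as a negative regex match on the route label inside the latency alert's expression. A sketch with assumed metric, label names, excluded paths and threshold, none of which are taken from the repository:

    # Hypothetical p99 latency alert with known-slow routes excluded via a
    # negative regex match; metric, labels, paths and threshold are
    # illustrative only.
    - alert: SiteHighRequestLatency
      expr: |
        histogram_quantile(0.99,
          sum by (le, path) (
            rate(nginx_ingress_controller_request_duration_seconds_bucket{path!~"/|/pages/tags/.*"}[5m])
          )
        ) > 1
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: p99 request latency above 1s for {{ $labels.path }}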
* Update ContainerOOMEvent alert (Joe Banks, 2024-04-17, -4/+4)
* Move Redis to databases namespace (Joe Banks, 2024-04-15, -0/+0)
* Move Grafana to monitoring namespace (Joe Banks, 2024-04-15, -0/+151)