  | Commit message | Date | Lines (-/+) | |
---|---|---|---|---|
* | Exclude 401 from NGINX 4xx alerts | 2025-07-18 | -1/+1 | |
| | ||||
* | Bump dns cache miss alert time | 2025-05-16 | -2/+3 | |
| | ||||
* | Update ingresses with NGINX ingress upgrade | 2025-04-05 | -3/+3 | |
| | ||||
* | Update quay.io/prometheus/node-exporter Docker tag to v1.9.0 | 2025-02-23 | -1/+1 | |
| | | | | Bumps docker image quay.io/prometheus/node-exporter from v1.8.2 to v1.9.0. | |||
* | Update registry.k8s.io/kube-state-metrics/kube-state-metrics Docker tag to v2.15.0 | 2025-02-06 | -1/+1 | |
| | | | | Bumps docker image registry.k8s.io/kube-state-metrics/kube-state-metrics from v2.13.0 to v2.15.0. | |||
* | Remove prestashop from alert.d nginx exception for p99 timing alert | 2024-10-01 | -1/+1 | |
| | ||||
* | Disable GitHub authentication in Grafana | 2024-09-19 | -11/+0 | |
| | ||||
* | Further optimisations to AlertManager initscript | 2024-09-19 | -1/+1 | |
| | ||||
* | AlertManager init use Alpine | 2024-09-04 | -2/+2 | |
| | | | | Smaller image and faster startup | |||
* | Raise time threshold for 4xx alerts | 2024-09-01 | -1/+1 | |
| | | | | | | At present we get plenty of unactionable, flapping alarms. So far, they have shown us nothing of value. Raise the length of time that consecutive errors must be seen before we alert. | |||
* | Show status code in nginx alerts | 2024-08-29 | -4/+4 | |
| | ||||
* | Install prometheus-postfix-exporter | 2024-08-26 | -0/+22 | |
| | | | | | | | | As a data-obsessed administrator I want to have more data such that I can widen my sense of power. This also installs rsyslog, because prometheus-postfix-exporter doesn't work with journald's binary log format. | |||
* | Improve alertmanager text e-mail format | 2024-08-26 | -1/+12 | |
| | | | | Already deployed. | |||
* | Update alertmanager config with mail port | 2024-08-25 | -1/+1 | |
| | ||||
* | Configure alertmanager to send e-mails | 2024-08-25 | -0/+11 | |
| | ||||
* | Unify alertmanager naming | 2024-08-25 | -31/+31 | |
| | | | | Closes #451. | |||
* | Mount new config for LDAP to Grafana and add IPA CA cert | 2024-07-26 | -2/+14 | |
| | ||||
* | Add LDAP bind user password for Grafana | 2024-07-26 | -0/+0 | |
| | ||||
* | Add new Grafana LDAP config and ldap.toml config | 2024-07-26 | -0/+65 | |
| | ||||
* | chore(deps): update registry.k8s.io/kube-state-metrics/kube-state-metrics docker tag to v2.13.0 (#412) | 2024-07-24 | -1/+1 | |
| | | | | Bumps docker image registry.k8s.io/kube-state-metrics/kube-state-metrics from v2.12.0 to v2.13.0. Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> | |||
* | Update node_exporter daemonset to 1.27+ featureset | 2024-07-18 | -3/+3 | |
| | ||||
* | chore(deps): update quay.io/prometheus/node-exporter docker tag to v1.8.2 | 2024-07-18 | -1/+1 | |
| | | | | Bumps docker image quay.io/prometheus/node-exporter from v1.2.0 to v1.8.2. | |||
* | Add Admins to Grafana authorized Team IDs | 2024-07-14 | -1/+1 | |
| | ||||
* | Allow new kube-state-metrics image to watch ingresses | 2024-07-01 | -0/+1 | |
| | ||||
* | Move away from vendored kube-state-metrics | 2024-07-01 | -1/+1 | |
| | ||||
* | Scale AM back to 3 replicas | 2024-06-24 | -1/+1 | |
| | ||||
* | Add Kubernetes volume alerts | 2024-06-16 | -0/+11 | |
| | | | | | | | | | | It seems that Linode has added storage reporting to the CSI driver, allowing us to pick up on the storage use of persistent volume claims within the cluster. This creates and deploys an alert that fires if any volume has less than 10% of its space left (a rule sketch follows the log below). I have excluded Prometheus, as our TSDB retention settings mean it will always stay just below its volume size by design. | |||
* | Update Prometheus deployment with a tmpfs for the reloader | 2024-06-10 | -0/+9 | |
| | ||||
* | Add secrets for reloader webhook | 2024-06-10 | -0/+0 | |
| | ||||
* | Add sidecar container to reload Prometheus config on change | 2024-06-10 | -0/+25 | |
| | ||||
* | Add reloader hook configmap to reload prometheus on change | 2024-06-10 | -0/+38 | |
| | ||||
* | Add Alert for Prometheus config reload failure | 2024-06-10 | -0/+9 | |
| | ||||
* | Enable scraping of Prometheus pods | 2024-06-10 | -0/+3 | |
| | ||||
* | Remove PostgreSQL Exporter from Kubernetes | 2024-06-02 | -55/+0 | |
| | ||||
* | Remove Kubernetes PostgreSQL Alerts | 2024-06-02 | -29/+0 | |
| | ||||
* | Fix AlertManager Discord instance formatting | 2024-05-27 | -1/+1 | |
| | | | | | | | | | | We made a change to include the instance in alerts sent to Discord, but not all of our configured alerts carry this field. As a result, incorrectly formatted alerts were being sent to Discord and were tricky to read. The format template has now been changed to render the instance label only when it is present on a triggered alert (a template sketch follows the log below). | |||
* | Take 15 minutes before alerting on high latency | 2024-05-20 | -2/+2 | |
| | ||||
* | Annotations.instance => Labels.instance | 2024-05-18 | -1/+1 | |
| | ||||
* | Add instance to AlertManager Discord embeds | 2024-05-17 | -1/+1 | |
| | ||||
* | Move AlertManager to 4 replicas | 2024-05-16 | -1/+1 | |
| | ||||
* | Move AlertManager to pydis.wtf | 2024-05-14 | -4/+5 | |
| | ||||
* | Move prometheus to pydis.wtf | 2024-05-14 | -3/+4 | |
| | ||||
* | Update Grafana configmap to grafana.pydis.wtf | 2024-05-14 | -2/+2 | |
| | ||||
* | Update Grafana ingress to grafana.pydis.wtf | 2024-05-14 | -3/+3 | |
| | ||||
* | Stop alerting for slow GitHub webhook filter endpoint calls (#235) | 2024-04-29 | -2/+2 | |
| | | | | | These are directly forwarded to GitHub with no time-consuming processing done on the site. We would therefore be alerting for GitHub's slowness, which is rather useless. | |||
* | Update all secrets to new PostgreSQL service | 2024-04-27 | -0/+0 | |
| | ||||
* | Exclude home and tag views from latency alerts | 2024-04-24 | -2/+2 | |
| | | | | These are known issues and we probably won't do anything about them, so stop alerting us about them. | |||
* | Update ContainerOOMEvent alert | 2024-04-17 | -4/+4 | |
| | ||||
* | Move Redis to databases namespace | 2024-04-15 | -0/+0 | |
| | ||||
* | Move Grafana to monitoring namespace | 2024-04-15 | -0/+151 | |
| |
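
The "Add Kubernetes volume alerts" entry above describes a rule that fires when a persistent volume claim drops below 10% free space, with Prometheus excluded. A minimal sketch of such a rule, assuming the standard kubelet volume metrics; the group name, alert name, `for` duration, and label matchers are illustrative, not the repository's actual rule:

```yaml
# Sketch only: names, duration, and matchers are assumptions.
groups:
  - name: kubernetes-volumes
    rules:
      - alert: PersistentVolumeSpaceLow
        # Fire when a PVC has less than 10% of its capacity available,
        # skipping Prometheus volumes, which sit near-full by design.
        expr: |
          kubelet_volume_stats_available_bytes{persistentvolumeclaim!~"prometheus.*"}
            / kubelet_volume_stats_capacity_bytes{persistentvolumeclaim!~"prometheus.*"}
          < 0.10
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "PVC {{ $labels.persistentvolumeclaim }} in {{ $labels.namespace }} has less than 10% space remaining"
```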
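
The "Fix AlertManager Discord instance formatting" entry describes rendering the instance label only when an alert carries it. A minimal sketch of that idea using Alertmanager's Go templating, wrapped in a hypothetical ConfigMap; the names and the way the template is wired into the Discord notifier are assumptions and may differ from the repository's actual setup:

```yaml
# Sketch only: ConfigMap and template names are hypothetical.
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-templates
  namespace: monitoring
data:
  discord.tmpl: |
    {{ define "discord.alert.text" }}
    {{ range .Alerts }}
    {{ .Annotations.summary }}{{ if .Labels.instance }} (instance: {{ .Labels.instance }}){{ end }}
    {{ end }}
    {{ end }}
```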