infra - The PyDis DevOps boiler house, the engine room, where the magic happens.

	Commit message	Author	Age	Lines
...
*	Update Sir Robin to CJ11 (#399)	Boris Muratov	2024-07-03	-1/+1
\|
*	Move noqa definition required in latest ruff version	Chris Lovering	2024-07-01	-2/+2
\|
*	Allow new kube-state-metrics image to watch ingresses	Joe Banks	2024-07-01	-0/+1
\|
*	Move away from vendored kube-state-metrics	Joe Banks	2024-07-01	-1/+1
\|
*	Add issuer for Vault certificates in tooling namespace	Joe Banks	2024-06-27	-0/+5
\|	We will use this to deploy internal TLS certificates from a self-signed CA that allows for TLS traffic within the cluster.
*	Add deployment of Keycloak	Joe Banks	2024-06-27	-0/+122
\|
*	Scale AM back to 3 replicas	Chris Lovering	2024-06-24	-1/+1
\|
*	Add ff-bot deployment	Joe Banks	2024-06-16	-0/+82
\|
*	Add Kubernetes volume alerts	Joe Banks	2024-06-16	-0/+11
\|	It seems that Linode has added storage reporting info to the CSI driver, allowing us to pick up on the storage use of persistent volume claims within the cluster. This creates and deploys an alert that will report if any volume has under 10% of space left. I have excluded Prometheus, as our TSDB retention settings mean that it will always stay just below its volume size by design.
*	Update Loki config with new compactor preferences for retention modes	Joe Banks	2024-06-13	-1/+6
\|	* `retention_enabled`: enable retention mode within the compactor
\|	* `delete_request_store`: store deletion requests within the S3 cluster that is also used to house log chunks
\|	* `delete_request_cancel_period`: do not exercise log deletion instructions until at least one hour has passed, to prevent accidental deletion
*	Update Prometheus deployment with a tmpfs for the reloader	Joe Banks	2024-06-10	-0/+9
\|
*	Add secrets for reloader webhook	Joe Banks	2024-06-10	-0/+0
\|
*	Add sidecar container to reload Prometheus config on change	Joe Banks	2024-06-10	-0/+25
\|
*	Add reloader hook configmap to reload prometheus on change	Joe Banks	2024-06-10	-0/+38
\|
*	Add Alert for Prometheus config reload failure	Joe Banks	2024-06-10	-0/+9
\|
*	Enable scraping of Prometheus pods	Joe Banks	2024-06-10	-0/+3
\|
*	Update Pinnwand logo to square image	Joe Banks	2024-06-09	-1/+1
\|
*	Update from command to args in site deployment	Joe Banks	2024-06-07	-1/+1
\|	Kubernetes refers to a Docker image's ENTRYPOINT as `command`, and any additional arguments go in `args` (confusing, I know!). This ensures that we run within the context of Poetry, so we can reach Django and other installed requirements when running migrations.
*	Remove unnecessary shell execution for migration initContainer	Joe Banks	2024-06-07	-3/+3
\|
*	Update site to run migrations in an init container	Joe Banks	2024-06-07	-0/+13
\|	In accordance with updates from python-discord/site#1338, this changes the way migrations are run. Previously, all migrations were run from within the manage.py execution process, with the command being manually spawned using Django internals. After python-discord/site#1338 merges, the Dockerfile will directly invoke gunicorn and bypass manage.py to simplify the process and avoid problems with shared database contexts. Hence, we need to run migrations manually using an init container. In testing, this adds no additional delay, as spinning up an init container is cheap and we don't cut over any traffic until the site passes a healthcheck anyway.
*	Rename relabelledpods to just pods	Joe Banks	2024-06-07	-1/+1
\|	This was a redundant rename and reduced the clarity of jobs when querying from inside Grafana. This rectifies that by renaming the stream to just `pods`.
*	Reflect pydis.wtf certificate into Loki namespace	Joe Banks	2024-06-07	-2/+2
\|
*	Add secret for Loki authentication	Joe Banks	2024-06-07	-0/+0
\|
*	Add new Ingress for Loki gateway	Joe Banks	2024-06-07	-0/+25
\|
*	Add Metricity manifest	Joe Banks	2024-06-06	-0/+30
\|	Copies the Metricity deployment manifest from the Metricity repo.
*	Add tmpfs to King Arthur	Joe Banks	2024-06-05	-0/+9
\|
*	Remove PostgreSQL Exporter from Kubernetes	Joe Banks	2024-06-02	-55/+0
\|
*	Remove Kubernetes PostgreSQL Alerts	Joe Banks	2024-06-02	-29/+0
\|
*	Remove Kubernetes PostgreSQL backup from Blackbox	Joe Banks	2024-06-02	-6/+1
\|
*	Remove PostgreSQL deployment from Kubernetes	Joe Banks	2024-06-02	-127/+0
\|
*	Update pixels environment variable	Joe Banks	2024-06-02	-0/+0
\|
*	Update Metabase configuration secret	Joe Banks	2024-06-02	-0/+0
\|
*	Update site secret with new database address	Joe Banks	2024-06-01	-0/+0
\|
*	Update site and metricity with new metricity db user credentials	Joe Banks	2024-05-28	-0/+0
\|
*	Update kube-system namespace docs with new metrics-server details	Joe Banks	2024-05-28	-4/+5
\|
*	Add Helm deployment info for metrics-server	Joe Banks	2024-05-28	-0/+24
\|	Due to the way Linode seems to issue certificates for our nodes, we need to disable TLS verification for communications to fetch metric information. It's unfortunate but non-critical, and it does restore metrics-server functionality.
*	Add documentation on services deployed to the kube-system namespace	Joe Banks	2024-05-28	-0/+33
\|
*	Add new ServiceAccount for cert issuance	Joe Banks	2024-05-27	-0/+5
\|
*	Update mTLS bundle for ingress-nginx	Joe Banks	2024-05-27	-36/+46
\|
*	Add Helm instructions for Vault	Joe Banks	2024-05-27	-0/+54
\|
*	Add pydis.wtf cert to vault namespace	Joe Banks	2024-05-27	-2/+2
\|
*	Fix AlertManager Discord instance formatting	Joe Banks	2024-05-27	-1/+1
\|	We made a change to include the instance in alerts sent to Discord, but not all of our configured alerts send this field. As a result, incorrectly formatted alerts that were tricky to read were being sent through to Discord. The format template has now been changed to render the instance label only if it is present on a triggered alert.
*	Take 15 minutes before alerting on high latency	Johannes Christ	2024-05-20	-2/+2
\|
*	Instruct code jam management to connect to lovelace	Johannes Christ	2024-05-18	-0/+0
\|
*	Instruct black knight to connect to lovelace	Johannes Christ	2024-05-18	-0/+0
\|
*	Annotations.instance => Labels.instance	Joe Banks	2024-05-18	-1/+1
\|
*	Add instance to AlertManager Discord embeds	Joe Banks	2024-05-17	-1/+1
\|
*	Update Bitwarden Kubernetes secret with new database location	Joe Banks	2024-05-17	-0/+0
\|
*	Bump limits and requests for bots that have been OOMing recently	Chris Lovering	2024-05-16	-3/+3
\|
*	Move AlertManager to 4 replicas	Joe Banks	2024-05-16	-1/+1
\|