| Commit message | Age | Lines |
|---|---|---|
| ... | | |
| Update Sir Robin to CJ11 (#399) | 2024-07-03 | -1/+1 |
| Move noqa definition required in latest ruff version | 2024-07-01 | -2/+2 |
| Allow new kube-state-metrics image to watch ingresses | 2024-07-01 | -0/+1 |
| Move away from vendored kube-state-metrics | 2024-07-01 | -1/+1 |
| Add issuer for Vault certificates in tooling namespace | 2024-06-27 | -0/+5 |
| We will use this to deploy internal TLS certificates from a self-signed CA, allowing TLS traffic within the cluster. | | |
| Add deployment of Keycloak | 2024-06-27 | -0/+122 |
| Scale AM back to 3 replicas | 2024-06-24 | -1/+1 |
| Add ff-bot deployment | 2024-06-16 | -0/+82 |
| Add Kubernetes volume alerts | 2024-06-16 | -0/+11 |
| It seems that Linode has added storage reporting to its CSI driver, which lets us monitor the storage use of persistent volume claims within the cluster. This creates and deploys an alert that fires if any volume has less than 10% of its space left. I have excluded Prometheus because our TSDB retention settings mean that, by design, it always stays just below its volume size. | | |
| Update Loki config with new compactor preferences for retention modes | 2024-06-13 | -1/+6 |
| `retention_enabled`: enable retention mode within the compactor. `delete_request_store`: store deletion requests in the S3 cluster that also houses log chunks. `delete_request_cancel_period`: do not execute log deletion requests until at least one hour has passed, to prevent accidental deletion. | | |
| Update Prometheus deployment with a tmpfs for the reloader | 2024-06-10 | -0/+9 |
| Add secrets for reloader webhook | 2024-06-10 | -0/+0 |
| Add sidecar container to reload Prometheus config on change | 2024-06-10 | -0/+25 |
| Add reloader hook configmap to reload prometheus on change | 2024-06-10 | -0/+38 |
| Add Alert for Prometheus config reload failure | 2024-06-10 | -0/+9 |
| Enable scraping of Prometheus pods | 2024-06-10 | -0/+3 |
| Update Pinnwand logo to square image | 2024-06-09 | -1/+1 |
| Update from command to args in site deployment | 2024-06-07 | -1/+1 |
| In Kubernetes, a container's `command` overrides the image's ENTRYPOINT, while any additional arguments go in `args`, which overrides the image's CMD (confusing, I know!). Using `args` keeps the image's ENTRYPOINT, which ensures that we run within the context of Poetry and can therefore reach Django and the other installed requirements when running migrations. | | |
| Remove unnecessary shell execution for migration initContainer | 2024-06-07 | -3/+3 |
| Update site to run migrations in an init container | 2024-06-07 | -0/+13 |
| In line with the changes in python-discord/site#1338, this changes the way migrations are run. Previously, migrations were run from within the manage.py process, with the command spawned manually through Django internals. Once python-discord/site#1338 is merged, the Dockerfile will invoke gunicorn directly and bypass manage.py, which simplifies the process and avoids problems with shared database contexts. Hence, we need to run migrations manually in an init container. In testing, this added no delay: spinning up an init container is cheap, and we don't cut over any traffic until the site passes a health check anyway. | | |
| Rename relabelledpods to just pods | 2024-06-07 | -1/+1 |
| This was a redundant rename that made jobs less clear when querying from inside Grafana. This rectifies that by renaming the stream to just `pods`. | | |
| Reflect pydis.wtf certificate into Loki namespace | 2024-06-07 | -2/+2 |
| Add secret for Loki authentication | 2024-06-07 | -0/+0 |
| Add new Ingress for Loki gateway | 2024-06-07 | -0/+25 |
| Add Metricity manifest | 2024-06-06 | -0/+30 |
| Copies the Metricity deployment manifest from the Metricity repo. | | |
| Add tmpfs to King Arthur | 2024-06-05 | -0/+9 |
| Remove PostgreSQL Exporter from Kubernetes | 2024-06-02 | -55/+0 |
| Remove Kubernetes PostgreSQL Alerts | 2024-06-02 | -29/+0 |
| Remove Kubernetes PostgreSQL backup from Blackbox | 2024-06-02 | -6/+1 |
| Remove PostgreSQL deployment from Kubernetes | 2024-06-02 | -127/+0 |
| Update pixels environment variable | 2024-06-02 | -0/+0 |
| Update Metabase configuration secret | 2024-06-02 | -0/+0 |
| Update site secret with new database address | 2024-06-01 | -0/+0 |
| Update site and metricity with new metricity db user credentials | 2024-05-28 | -0/+0 |
| Update kube-system namespace docs with new metrics-server details | 2024-05-28 | -4/+5 |
| Add Helm deployment info for metrics-server | 2024-05-28 | -0/+24 |
| Because of the way Linode seems to issue certificates for our nodes, we need to disable TLS verification on the connections used to fetch metric information. It's unfortunate but non-critical, and it does restore metrics-server functionality. | | |
| Add documentation on services deployed to the kube-system namespace | 2024-05-28 | -0/+33 |
| Add new ServiceAccount for cert issuance | 2024-05-27 | -0/+5 |
| Update mTLS bundle for ingress-nginx | 2024-05-27 | -36/+46 |
| Add Helm instructions for Vault | 2024-05-27 | -0/+54 |
| Add pydis.wtf cert to vault namespace | 2024-05-27 | -2/+2 |
| Fix AlertManager Discord instance formatting | 2024-05-27 | -1/+1 |
| We made a change to include the instance in alerts sent to Discord, but not all of our configured alerts send this field. As a result, incorrectly formatted alerts that were hard to read were being sent to Discord. The format template now renders the instance label only if it is present on a triggered alert. | | |
| Take 15 minutes before alerting on high latency | 2024-05-20 | -2/+2 |
| Instruct code jam management to connect to lovelace | 2024-05-18 | -0/+0 |
| Instruct black knight to connect to lovelace | 2024-05-18 | -0/+0 |
| Annotations.instance => Labels.instance | 2024-05-18 | -1/+1 |
| Add instance to AlertManager Discord embeds | 2024-05-17 | -1/+1 |
| Update Bitwarden Kubernetes secret with new database location | 2024-05-17 | -0/+0 |
| Bump limits and requests for bots that have been OOMing recently | 2024-05-16 | -3/+3 |
| Move AlertManager to 4 replicas | 2024-05-16 | -1/+1 |
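The "Update from command to args in site deployment" commit depends on how Kubernetes combines a container's `command` and `args` with the image's ENTRYPOINT and CMD. The Python sketch below models those documented rules. The image values (a `poetry run` ENTRYPOINT and a `gunicorn` CMD) are hypothetical and are not taken from the site's actual Dockerfile.

```python
from typing import Optional, Sequence


def effective_argv(
    image_entrypoint: Sequence[str],
    image_cmd: Sequence[str],
    command: Optional[Sequence[str]] = None,
    args: Optional[Sequence[str]] = None,
) -> list[str]:
    """Return the argv a container runs under the Kubernetes command/args rules."""
    if command is None and args is None:
        # Nothing overridden: the image's ENTRYPOINT runs with its CMD.
        return [*image_entrypoint, *image_cmd]
    if command is not None and args is None:
        # `command` replaces ENTRYPOINT, and the image's CMD is ignored as well.
        return list(command)
    if command is None:
        # Only `args` given: the image's ENTRYPOINT is kept and CMD is replaced.
        return [*image_entrypoint, *args]
    # Both given: the image defaults are ignored entirely.
    return [*command, *args]


# Hypothetical image: ENTRYPOINT ["poetry", "run"], CMD ["gunicorn"]
entrypoint, cmd = ["poetry", "run"], ["gunicorn"]
migrate = ["python", "manage.py", "migrate"]

print(effective_argv(entrypoint, cmd, command=migrate))
# ['python', 'manage.py', 'migrate']  (runs outside Poetry)
print(effective_argv(entrypoint, cmd, args=migrate))
# ['poetry', 'run', 'python', 'manage.py', 'migrate']
```

Setting `command` discards the image's ENTRYPOINT entirely, so a wrapper such as `poetry run` is only preserved when the migration command is passed through `args`.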
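The "Add Kubernetes volume alerts" commit describes an alert that fires when a persistent volume claim has less than 10% of its space left, with Prometheus excluded. The real alert is a Prometheus rule that is not reproduced in this log. Such rules are commonly written against the kubelet's `kubelet_volume_stats_available_bytes` and `kubelet_volume_stats_capacity_bytes` metrics, but the exact expression used here is not shown. The Python sketch below models only the condition. The `VolumeUsage` type, the PVC names and the `"prometheus"` name prefix are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VolumeUsage:
    """Free and total space for one persistent volume claim (hypothetical model)."""

    pvc: str
    available_bytes: int
    capacity_bytes: int


def volumes_to_alert(
    volumes: list[VolumeUsage],
    threshold: float = 0.10,
    excluded_prefixes: tuple[str, ...] = ("prometheus",),
) -> list[str]:
    """Return names of PVCs with less than `threshold` of their capacity free."""
    alerting = []
    for volume in volumes:
        # Prometheus sits just below its volume size by design (TSDB retention),
        # so it would alert permanently; skip it.
        if volume.pvc.startswith(excluded_prefixes):
            continue
        # Skip volumes reporting zero capacity to avoid dividing by zero.
        if volume.capacity_bytes <= 0:
            continue
        if volume.available_bytes / volume.capacity_bytes < threshold:
            alerting.append(volume.pvc)
    return alerting


usage = [
    VolumeUsage("postgres-data", available_bytes=5, capacity_bytes=100),
    VolumeUsage("loki-chunks", available_bytes=40, capacity_bytes=100),
    VolumeUsage("prometheus-storage", available_bytes=1, capacity_bytes=100),
]
print(volumes_to_alert(usage))  # ['postgres-data']
```

The comparison is strict, so a volume with exactly 10% free does not alert. Excluding by name keeps the rule from firing permanently on a volume that is meant to run close to full.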
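The "Fix AlertManager Discord instance formatting" and "Annotations.instance => Labels.instance" commits both concern rendering `instance`, which is a label rather than an annotation and is present on only some alerts. The real fix lives in an AlertManager (Go) template, where a guard along the lines of `{{ if .Labels.instance }} ({{ .Labels.instance }}){{ end }}` skips the field when it is missing. The Python function below is a stand-in that illustrates the same conditional rendering. It reads alerts as dicts with `labels` and `annotations` keys, following AlertManager's alert shape. The alert names, the `summary` annotation and the output format are hypothetical.

```python
def render_alert(alert: dict) -> str:
    """Render one alert as a single line, mentioning the instance only when present."""
    labels = alert.get("labels", {})
    name = labels.get("alertname", "<unnamed>")
    summary = alert.get("annotations", {}).get("summary", "")
    # Only some alerts carry an `instance` label, so render it conditionally
    # instead of leaving an empty placeholder in the message.
    instance = labels.get("instance")
    header = f"{name} on {instance}" if instance else name
    return f"{header}: {summary}" if summary else header


print(render_alert({
    "labels": {"alertname": "HighLatency", "instance": "node-1"},
    "annotations": {"summary": "p99 above 1s"},
}))
# HighLatency on node-1: p99 above 1s
print(render_alert({"labels": {"alertname": "VolumeLow"}}))
# VolumeLow
```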