author | 2024-07-24 20:09:42 +0200
---|---
committer | 2024-07-25 20:06:54 +0200
commit | a4d7e92d544aeb43dbe1fcd8648d97e0dbf7b9d3
tree | 183318852234388654c99514e45f095af8c21676 /docs/onboarding
parent | Add link to DevOps Kanban board in meeting template (#420)
Improve documentation
This commit ports our documentation to Sphinx.
The reason for this is straightforward. We need to improve both the
quality and the accessibility of our documentation. Hugo is not capable
of doing this, as its primary output format is HTML. Sphinx builds
plenty of high-quality output formats out of the box, and incentivizes
writing good documentation.
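The point about output formats can be made concrete. Below is a minimal Python sketch, not part of this commit, that runs a handful of Sphinx's bundled builders one after another through `python -m sphinx`. It assumes the Sphinx project (including `conf.py`) lives in `docs/` and writes to a hypothetical `docs/_build/` directory; both paths are illustrative assumptions.

```python
import subprocess
import sys

# A few builders that ship with Sphinx itself (no extensions required).
# "latex" emits .tex sources; turning them into a PDF is a separate step.
BUILDERS = ("html", "epub", "latex", "man", "text")


def sphinx_build_cmd(builder: str, source_dir: str = "docs",
                     build_dir: str = "docs/_build") -> list[str]:
    """Return the command line for a single builder.

    The default paths are assumptions for illustration only.
    """
    return [sys.executable, "-m", "sphinx", "-b", builder,
            source_dir, f"{build_dir}/{builder}"]


def build_all(source_dir: str = "docs", build_dir: str = "docs/_build") -> None:
    """Run each builder in turn; check=True raises on the first failed build."""
    for builder in BUILDERS:
        subprocess.run(sphinx_build_cmd(builder, source_dir, build_dir), check=True)
```

With Sphinx installed, calling `build_all()` from the repository root should leave one output directory per builder. The loop exists because `-b` selects a single builder per invocation.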
Diffstat (limited to 'docs/onboarding')
-rw-r--r-- | docs/onboarding/access.rst | 50
-rw-r--r-- | docs/onboarding/index.rst | 17
-rw-r--r-- | docs/onboarding/resources.rst | 28
-rw-r--r-- | docs/onboarding/rules.rst | 16
-rw-r--r-- | docs/onboarding/tools.rst | 50
5 files changed, 161 insertions, 0 deletions
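The summary line can be cross-checked against the per-file counts. A trivial Python sketch, with the numbers copied by hand from the diffstat rows (each also matches the `+1,N` range in the corresponding hunk header):

```python
# Insertions per file, copied from the diffstat rows. Every file is new,
# so there are no deletions to account for.
insertions_per_file = {
    "docs/onboarding/access.rst": 50,
    "docs/onboarding/index.rst": 17,
    "docs/onboarding/resources.rst": 28,
    "docs/onboarding/rules.rst": 16,
    "docs/onboarding/tools.rst": 50,
}

files_changed = len(insertions_per_file)
insertions = sum(insertions_per_file.values())
summary = f"{files_changed} files changed, {insertions} insertions, 0 deletions"
print(summary)  # 5 files changed, 161 insertions, 0 deletions
```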
diff --git a/docs/onboarding/access.rst b/docs/onboarding/access.rst
new file mode 100644
index 0000000..940cd8b
--- /dev/null
+++ b/docs/onboarding/access.rst
@@ -0,0 +1,50 @@
+Access table
+============
+
++--------------------+-------------------------+-----------------------+
+| **Resource**       | **Description**         | **Keyholders**        |
++====================+=========================+=======================+
+| Linode Kubernetes  | The primary cluster     | Hassan, Joe, Chris,   |
+| Cluster            | where all resources are | Leon, Sebastiaan,     |
+|                    | deployed.               | Johannes              |
++--------------------+-------------------------+-----------------------+
+| Linode Dashboard   | The online dashboard    | Joe, Chris            |
+|                    | for managing and        |                       |
+|                    | allocating resources    |                       |
+|                    | from Linode.            |                       |
++--------------------+-------------------------+-----------------------+
+| Netcup Dashboard   | The dashboard for       | Joe, Chris            |
+|                    | managing and allocating |                       |
+|                    | resources from Netcup.  |                       |
++--------------------+-------------------------+-----------------------+
+| Netcup servers     | Root servers provided   | Joe, Chris, Bella,    |
+|                    | by the Netcup           | Johannes              |
+|                    | partnership.            |                       |
++--------------------+-------------------------+-----------------------+
+| Grafana            | The primary aggregation | Admins, Moderators,   |
+|                    | dashboard for most      | Core Developers and   |
+|                    | resources.              | DevOps (with varying  |
+|                    |                         | permissions)          |
++--------------------+-------------------------+-----------------------+
+| Prometheus         | The Prometheus query    | Hassan, Joe,          |
+| Dashboard          | dashboard. Access is    | Johannes, Chris       |
+|                    | controlled via          |                       |
+|                    | Cloudflare Access.      |                       |
++--------------------+-------------------------+-----------------------+
+| Alertmanager       | The Alertmanager        | Hassan, Joe,          |
+| Dashboard          | control dashboard.      | Johannes, Chris       |
+|                    | Access is controlled    |                       |
+|                    | via Cloudflare Access.  |                       |
++--------------------+-------------------------+-----------------------+
+| ``git-crypt``\ ed  | ``git-crypt`` is used   | Chris, Joe, Hassan,   |
+| files in infra     | to encrypt certain      | Johannes, Xithrius    |
+| repository         | files within the        |                       |
+|                    | repository. At the time |                       |
+|                    | of writing this is      |                       |
+|                    | limited to Kubernetes   |                       |
+|                    | secret files.           |                       |
++--------------------+-------------------------+-----------------------+
+| Ansible Vault      | Used to store sensitive | Chris, Joe, Johannes, |
+|                    | data for the Ansible    | Bella                 |
+|                    | deployment.             |                       |
++--------------------+-------------------------+-----------------------+
diff --git a/docs/onboarding/index.rst b/docs/onboarding/index.rst
new file mode 100644
index 0000000..3929d7e
--- /dev/null
+++ b/docs/onboarding/index.rst
@@ -0,0 +1,17 @@
+Onboarding
+==========
+
+This section documents who holds access to which of our DevOps resources,
+and how that access is managed.
+
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Contents:
+
+   access
+   resources
+   rules
+   tools
+
+.. vim: set textwidth=80 sw=2 ts=2:
diff --git a/docs/onboarding/resources.rst b/docs/onboarding/resources.rst
new file mode 100644
index 0000000..0ec846b
--- /dev/null
+++ b/docs/onboarding/resources.rst
@@ -0,0 +1,28 @@
+Resources
+=========
+
+The following is a collection of important reference documents for the
+DevOps team.
+
+`Infra Repo <https://github.com/python-discord/infra>`__
+--------------------------------------------------------
+
+This GitHub repo contains most of the manifests and configuration
+applied to our cluster. It’s kept up to date manually and is considered
+a source of truth for what we should have in the cluster.
+
+It is mostly documented, but improvements to unclear or outdated
+aspects are always welcome.
+
+`Knowledge base <https://python-discord.github.io/infra/>`__
+------------------------------------------------------------
+
+Deployed using GitHub Pages; the source can be found in the docs
+directory of the k8s repo.
+
+This includes:
+
+- Changelogs
+- Post-mortems
+- Common queries
+- Runbooks
diff --git a/docs/onboarding/rules.rst b/docs/onboarding/rules.rst
new file mode 100644
index 0000000..bd0ea0e
--- /dev/null
+++ b/docs/onboarding/rules.rst
@@ -0,0 +1,16 @@
+Rules
+=====
+
+The rules any DevOps team member must follow.
+
+1. LMAO - **L**\ ogging, **M**\ onitoring, **A**\ lerting,
+   **O**\ bservability
+2. Modmail is the greatest piece of software ever written
+3. Modmail needs at least 5 minutes to gather all its greatness at
+   startup
+4. We never blame Chris, it’s always <@233481908342882304>’s fault
+5. LKE isn’t bad, it’s your fault for not paying for the high
+   availability control plane
+6. Our software is never legacy, it’s merely well-aged
+7. Ignore these rules (however maybe not 1, 1 seems important to
+   remember)
diff --git a/docs/onboarding/tools.rst b/docs/onboarding/tools.rst
new file mode 100644
index 0000000..811f1ad
--- /dev/null
+++ b/docs/onboarding/tools.rst
@@ -0,0 +1,50 @@
+Tools
+=====
+
+We use a few tools to manage, monitor, and interact with our
+infrastructure. Some of these tools are not unique to the DevOps team,
+and may be shared by other teams.
+
+Most of these are gated behind a Cloudflare Access system, which is
+accessible to the `DevOps
+Team <https://github.com/orgs/python-discord/teams/devops>`__ on GitHub.
+These are marked with the ☁️ emoji. If you don’t have access, please
+contact Chris or Joe.
+
+`Grafana <https://grafana.pydis.wtf/>`__
+----------------------------------------
+
+Grafana provides access to some of the most important resources at your
+disposal. It acts as an aggregator and frontend for a large amount of
+data. These range from metrics, to logs, to stats. Some of the most
+important are listed below:
+
+- Service Logs/All App Logs Dashboard
+
+  Service Logs is a simple log viewer which gives you access to a large
+  majority of the applications deployed in the default namespace. The
+  All App Logs dashboard is an expanded version of that which gives you
+  access to all apps in all namespaces, and allows some more in-depth
+  querying.
+
+- Kubernetes Dashboard
+
+  This dashboard gives quick overviews of all the most important
+  metrics of the Kubernetes system. For more detailed information,
+  check out other dashboards such as Resource Usage, NGINX, and Redis.
+
+Grafana is accessed via a GitHub login, with permission for anyone in the
+dev-core or dev-ops team.
+
+`Prometheus Dashboard <https://prometheus.pydis.wtf/>`__ (☁️)
+--------------------------------------------------------------
+
+This provides access to the Prometheus query console. You may also enjoy
+the `Alertmanager Console <https://alertmanager.pydis.wtf/>`__.
+
+`King Arthur <https://github.com/python-discord/king-arthur/>`__
+----------------------------------------------------------------
+
+King Arthur is a Discord bot that provides information about, and
+access to, our cluster directly in Discord. Invoke its help command for
+more information (``M-x help``).
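The Prometheus query console described in tools.rst is a frontend over Prometheus' HTTP API, so the same PromQL can also be requested as raw JSON from the `/api/v1/query` endpoint. The Python sketch below only builds such a URL and does not send it: the dashboard sits behind Cloudflare Access, and an unauthenticated client would typically be sent to the Access login instead of reaching Prometheus. It assumes Prometheus is served at the root of `prometheus.pydis.wtf`; the expressions `up` and `rate(http_requests_total[5m])` are only examples.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Dashboard URL from tools.rst. Assumes Prometheus is served at the root of
# this host; Cloudflare Access sits in front, so the URL is only built here,
# never requested.
PROMETHEUS_URL = "https://prometheus.pydis.wtf"


def instant_query_url(expr: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build a GET URL for Prometheus' instant-query endpoint, /api/v1/query."""
    # urlencode escapes PromQL punctuation such as (), [], {} and quotes.
    query = urlencode({"query": expr})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


url = instant_query_url("up")
print(url)  # https://prometheus.pydis.wtf/api/v1/query?query=up

# Round trip: decoding the query string yields the original expression.
example = "rate(http_requests_total[5m])"
decoded = parse_qs(urlsplit(instant_query_url(example)).query)["query"][0]
print(decoded == example)  # True
```

Opening such a URL in a browser that has already passed the Cloudflare Access login should return Prometheus' JSON response for that expression.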