Diffstat (limited to 'docs/onboarding')
-rw-r--r-- docs/onboarding/access.rst    | 50
-rw-r--r-- docs/onboarding/index.rst     | 17
-rw-r--r-- docs/onboarding/resources.rst | 28
-rw-r--r-- docs/onboarding/rules.rst     | 16
-rw-r--r-- docs/onboarding/tools.rst     | 50
5 files changed, 161 insertions, 0 deletions
diff --git a/docs/onboarding/access.rst b/docs/onboarding/access.rst
new file mode 100644
index 0000000..940cd8b
--- /dev/null
+++ b/docs/onboarding/access.rst
@@ -0,0 +1,50 @@
+Access table
+============
+
++--------------------+-------------------------+-----------------------+
+| **Resource**       | **Description**         | **Keyholders**        |
++====================+=========================+=======================+
+| Linode Kubernetes  | The primary cluster     | Hassan, Joe, Chris,   |
+| Cluster            | where all resources are | Leon, Sebastiaan,     |
+|                    | deployed.               | Johannes              |
++--------------------+-------------------------+-----------------------+
+| Linode Dashboard   | The online dashboard    | Joe, Chris            |
+|                    | for managing and        |                       |
+|                    | allocating resources    |                       |
+|                    | from Linode.            |                       |
++--------------------+-------------------------+-----------------------+
+| Netcup Dashboard   | The dashboard for       | Joe, Chris            |
+|                    | managing and allocating |                       |
+|                    | resources from Netcup.  |                       |
++--------------------+-------------------------+-----------------------+
+| Netcup servers     | Root servers provided   | Joe, Chris, Bella,    |
+|                    | by the Netcup           | Johannes              |
+|                    | partnership.            |                       |
++--------------------+-------------------------+-----------------------+
+| Grafana            | The primary aggregation | Admins, Moderators,   |
+|                    | dashboard for most      | Core Developers and   |
+|                    | resources.              | DevOps (with varying  |
+|                    |                         | permissions)          |
++--------------------+-------------------------+-----------------------+
+| Prometheus         | The Prometheus query    | Hassan, Joe,          |
+| Dashboard          | dashboard. Access is    | Johannes, Chris       |
+|                    | controlled via          |                       |
+|                    | Cloudflare Access.      |                       |
++--------------------+-------------------------+-----------------------+
+| Alertmanager       | The alertmanager        | Hassan, Joe,          |
+| Dashboard          | control dashboard.      | Johannes, Chris       |
+|                    | Access is controlled    |                       |
+|                    | via Cloudflare Access.  |                       |
++--------------------+-------------------------+-----------------------+
+| ``git-crypt``\ ed  | ``git-crypt`` is used   | Chris, Joe, Hassan,   |
+| files in infra     | to encrypt certain      | Johannes, Xithrius    |
+| repository         | files within the        |                       |
+|                    | repository. At the time |                       |
+|                    | of writing this is      |                       |
+|                    | limited to kubernetes   |                       |
+|                    | secret files.           |                       |
++--------------------+-------------------------+-----------------------+
+| Ansible Vault      | Used to store sensitive | Chris, Joe, Johannes, |
+|                    | data for the Ansible    | Bella                 |
+|                    | deployment              |                       |
++--------------------+-------------------------+-----------------------+
diff --git a/docs/onboarding/index.rst b/docs/onboarding/index.rst
new file mode 100644
index 0000000..3929d7e
--- /dev/null
+++ b/docs/onboarding/index.rst
@@ -0,0 +1,17 @@
+Onboarding
+==========
+
+This section documents who manages which access to our DevOps resources,
+and how access is managed.
+
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Contents:
+
+   access
+   resources
+   rules
+   tools
+
+.. vim: set textwidth=80 sw=2 ts=2: -->
diff --git a/docs/onboarding/resources.rst b/docs/onboarding/resources.rst
new file mode 100644
index 0000000..0ec846b
--- /dev/null
+++ b/docs/onboarding/resources.rst
@@ -0,0 +1,28 @@
+Resources
+=========
+
+The following is a collection of important reference documents for the
+DevOps team.
+
+`Infra Repo <https://github.com/python-discord/infra>`__
+--------------------------------------------------------
+
+This GitHub repo contains most of the manifests and configuration
+applies to our cluster. It’s kept up to date manually and is considered
+a source of truth for what we should have in the cluster.
+
+It is mostly documented, but improvements for unclear or outdated
+aspects is always welcome.
+
+`Knowledge base <https://python-discord.github.io/infra/>`__
+------------------------------------------------------------
+
+Deployed using GH pages, source can be found in the docs directory of
+the k8s repo.
+
+This includes:
+
+- Changelogs
+- Post-mortems
+- Common queries
+- Runbooks
diff --git a/docs/onboarding/rules.rst b/docs/onboarding/rules.rst
new file mode 100644
index 0000000..bd0ea0e
--- /dev/null
+++ b/docs/onboarding/rules.rst
@@ -0,0 +1,16 @@
+Rules
+=====
+
+The rules any DevOps team member must follow.
+
+1. LMAO - **L**\ ogging, **M**\ onitoring, **A**\ lerting,
+   **O**\ bservability
+2. Modmail is the greatest piece of software ever written
+3. Modmail needs at least 5 minutes to gather all its greatness at
+   startup
+4. We never blame Chris, it’s always <@233481908342882304>’s fault
+5. LKE isn’t bad, it’s your fault for not paying for the high
+   availability control plane
+6. Our software is never legacy, it’s merely well-aged
+7. Ignore these rules (however maybe not 1, 1 seems important to
+   remember)
diff --git a/docs/onboarding/tools.rst b/docs/onboarding/tools.rst
new file mode 100644
index 0000000..811f1ad
--- /dev/null
+++ b/docs/onboarding/tools.rst
@@ -0,0 +1,50 @@
+Tools
+=====
+
+We use a few tools to manage, monitor, and interact with our
+infrastructure. Some of these tools are not unique to the DevOps team,
+and may be shared by other teams.
+
+Most of these are gated behind a Cloudflare Access system, which is
+accessible to the `DevOps
+Team <https://github.com/orgs/python-discord/teams/devops>`__ on GitHub.
+These are marked with the ☁️ emoji. If you don’t have access, please
+contact Chris or Joe.
+
+`Grafana <https://grafana.pydis.wtf/>`__
+----------------------------------------
+
+Grafana provides access to some of the most important resources at your
+disposal. It acts as an aggregator and frontend for a large amount of
+data. These range from metrics, to logs, to stats. Some of the most
+important are listed below:
+
+- Service Logs/All App Logs Dashboard
+
+  Service logs is a simple log viewer which gives you access to a large
+  majority of the applications deployed in the default namespace. The
+  All App logs dashboard is an expanded version of that which gives you
+  access to all apps in all namespaces, and allows some more in-depth
+  querying.
+
+- Kubernetes Dashboard
+
+  This dashboard gives quick overviews of all the most important
+  metrics of the Kubernetes system. For more detailed information,
+  check out other dashboard such as Resource Usage, NGINX, and Redis.
+
+Accessed via a GitHub login, with permission for anyone in the dev-core
+or dev-ops team.
+
+`Prometheus Dashboard <https://prometheus.pydis.wtf/>`__ (☁️))
+--------------------------------------------------------------
+
+This provides access to the Prometheus query console. You may also enjoy
+the `Alertmanager Console <https://alertmanager.pydis.wtf/>`__.
+
+`King Arthur <https://github.com/python-discord/king-arthur/>`__
+----------------------------------------------------------------
+
+King Arthur is a discord bot which provides information about, and
+access to our cluster directly in discord. Invoke its help command for
+more information (``M-x help``).