Diffstat (limited to 'docs/onboarding')
-rw-r--r-- docs/onboarding/access.rst 50
-rw-r--r-- docs/onboarding/index.rst 17
-rw-r--r-- docs/onboarding/resources.rst 28
-rw-r--r-- docs/onboarding/rules.rst 16
-rw-r--r-- docs/onboarding/tools.rst 50
5 files changed, 161 insertions, 0 deletions
diff --git a/docs/onboarding/access.rst b/docs/onboarding/access.rst
new file mode 100644
index 0000000..940cd8b
--- /dev/null
+++ b/docs/onboarding/access.rst
@@ -0,0 +1,50 @@
+Access table
+============
+
++--------------------+-------------------------+-----------------------+
+| **Resource**       | **Description**         | **Keyholders**        |
++====================+=========================+=======================+
+| Linode Kubernetes  | The primary cluster     | Hassan, Joe, Chris,   |
+| Cluster            | where all resources are | Leon, Sebastiaan,     |
+|                    | deployed.               | Johannes              |
++--------------------+-------------------------+-----------------------+
+| Linode Dashboard   | The online dashboard    | Joe, Chris            |
+|                    | for managing and        |                       |
+|                    | allocating resources    |                       |
+|                    | from Linode.            |                       |
++--------------------+-------------------------+-----------------------+
+| Netcup Dashboard   | The dashboard for       | Joe, Chris            |
+|                    | managing and allocating |                       |
+|                    | resources from Netcup.  |                       |
++--------------------+-------------------------+-----------------------+
+| Netcup servers     | Root servers provided   | Joe, Chris, Bella,    |
+|                    | by the Netcup           | Johannes              |
+|                    | partnership.            |                       |
++--------------------+-------------------------+-----------------------+
+| Grafana            | The primary aggregation | Admins, Moderators,   |
+|                    | dashboard for most      | Core Developers and   |
+|                    | resources.              | DevOps (with varying  |
+|                    |                         | permissions)          |
++--------------------+-------------------------+-----------------------+
+| Prometheus         | The Prometheus query    | Hassan, Joe,          |
+| Dashboard          | dashboard. Access is    | Johannes, Chris       |
+|                    | controlled via          |                       |
+|                    | Cloudflare Access.      |                       |
++--------------------+-------------------------+-----------------------+
+| Alertmanager       | The Alertmanager        | Hassan, Joe,          |
+| Dashboard          | control dashboard.      | Johannes, Chris       |
+|                    | Access is controlled    |                       |
+|                    | via Cloudflare Access.  |                       |
++--------------------+-------------------------+-----------------------+
+| ``git-crypt``\ ed  | ``git-crypt`` is used   | Chris, Joe, Hassan,   |
+| files in infra     | to encrypt certain      | Johannes, Xithrius    |
+| repository         | files within the        |                       |
+|                    | repository. At the time |                       |
+|                    | of writing this is      |                       |
+|                    | limited to Kubernetes   |                       |
+|                    | secret files.           |                       |
++--------------------+-------------------------+-----------------------+
+| Ansible Vault      | Used to store sensitive | Chris, Joe, Johannes, |
+|                    | data for the Ansible    | Bella                 |
+|                    | deployment.             |                       |
++--------------------+-------------------------+-----------------------+
diff --git a/docs/onboarding/index.rst b/docs/onboarding/index.rst
new file mode 100644
index 0000000..3929d7e
--- /dev/null
+++ b/docs/onboarding/index.rst
@@ -0,0 +1,17 @@
+Onboarding
+==========
+
+This section documents who manages which access to our DevOps resources,
+and how access is managed.
+
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Contents:
+
+   access
+   resources
+   rules
+   tools
+
+.. vim: set textwidth=80 sw=2 ts=2:
diff --git a/docs/onboarding/resources.rst b/docs/onboarding/resources.rst
new file mode 100644
index 0000000..0ec846b
--- /dev/null
+++ b/docs/onboarding/resources.rst
@@ -0,0 +1,28 @@
+Resources
+=========
+
+The following is a collection of important reference documents for the
+DevOps team.
+
+`Infra Repo <https://github.com/python-discord/infra>`__
+--------------------------------------------------------
+
+This GitHub repo contains most of the manifests and configuration
+applied to our cluster. It’s kept up to date manually and is considered
+a source of truth for what we should have in the cluster.
+
+It is mostly documented, but improvements to unclear or outdated
+aspects are always welcome.
+
+`Knowledge base <https://python-discord.github.io/infra/>`__
+------------------------------------------------------------
+
+Deployed using GitHub Pages. The source can be found in the docs
+directory of the k8s repo.
+
+This includes:
+
+- Changelogs
+- Post-mortems
+- Common queries
+- Runbooks
diff --git a/docs/onboarding/rules.rst b/docs/onboarding/rules.rst
new file mode 100644
index 0000000..bd0ea0e
--- /dev/null
+++ b/docs/onboarding/rules.rst
@@ -0,0 +1,16 @@
+Rules
+=====
+
+The rules any DevOps team member must follow.
+
+1. LMAO - **L**\ ogging, **M**\ onitoring, **A**\ lerting,
+   **O**\ bservability
+2. Modmail is the greatest piece of software ever written
+3. Modmail needs at least 5 minutes to gather all its greatness at
+   startup
+4. We never blame Chris, it’s always <@233481908342882304>’s fault
+5. LKE isn’t bad, it’s your fault for not paying for the high
+   availability control plane
+6. Our software is never legacy, it’s merely well-aged
+7. Ignore these rules (however maybe not 1, 1 seems important to
+   remember)
diff --git a/docs/onboarding/tools.rst b/docs/onboarding/tools.rst
new file mode 100644
index 0000000..811f1ad
--- /dev/null
+++ b/docs/onboarding/tools.rst
@@ -0,0 +1,50 @@
+Tools
+=====
+
+We use a few tools to manage, monitor, and interact with our
+infrastructure. Some of these tools are not unique to the DevOps team,
+and may be shared by other teams.
+
+Most of these are gated behind a Cloudflare Access system, which is
+accessible to the `DevOps
+Team <https://github.com/orgs/python-discord/teams/devops>`__ on GitHub.
+These are marked with the ☁️ emoji. If you don’t have access, please
+contact Chris or Joe.
+
+`Grafana <https://grafana.pydis.wtf/>`__
+----------------------------------------
+
+Grafana provides access to some of the most important resources at your
+disposal. It acts as an aggregator and frontend for a large amount of
+data, ranging from metrics to logs to stats. Some of the most
+important dashboards are listed below:
+
+- Service Logs/All App Logs Dashboard
+
+  Service Logs is a simple log viewer which gives you access to logs
+  for the large majority of applications deployed in the default
+  namespace. The All App Logs dashboard is an expanded version which
+  gives you access to all apps in all namespaces, and allows some more
+  in-depth querying.
+
+- Kubernetes Dashboard
+
+  This dashboard gives quick overviews of the most important metrics
+  of the Kubernetes system. For more detailed information, check out
+  other dashboards such as Resource Usage, NGINX, and Redis.
+
+Grafana is accessed via a GitHub login, with permission granted to
+anyone in the dev-core or dev-ops team.
+
+`Prometheus Dashboard <https://prometheus.pydis.wtf/>`__ (☁️)
+--------------------------------------------------------------
+
+This provides access to the Prometheus query console. You may also enjoy
+the `Alertmanager Console <https://alertmanager.pydis.wtf/>`__.
+
+`King Arthur <https://github.com/python-discord/king-arthur/>`__
+----------------------------------------------------------------
+
+King Arthur is a Discord bot which provides information about, and
+access to, our cluster directly in Discord. Invoke its help command for
+more information (``M-x help``).