feat(docs): Added ADR for logging levels

2025-11-24 20:06:55 +01:00 · 2025-03-19 14:43:18 +01:00 · 2025-03-19 14:43:18 +01:00 · 0eeef18a6e
commit 0eeef18a6e
parent f5b26340e7
2 changed files with 37 additions and 0 deletions
--- a/website/docs/contributing/ADRs/ADRs.md
+++ b/website/docs/contributing/ADRs/ADRs.md
@@ -15,6 +15,8 @@ These ADRs describe decisions that concern the entire codebase. They apply to ba

 * [Domain language](./overarching/domain-language.md)
 * [Separation of request and response schemas](./overarching/separation-request-response-schemas.md)
+* [Error Logging stack traces](./overarching/logging.md)
+* [Logging levels](./overarching/logging-levels.md)

 ## Back-end ADRs

--- a/website/docs/contributing/ADRs/overarching/logging-levels.md
+++ b/website/docs/contributing/ADRs/overarching/logging-levels.md
@@ -0,0 +1,35 @@
+---
+title: "ADR: Logging levels"
+---
+## Date: 2025-03-20
+
+## Background
+
+Our log levels carry semantic information.
+Log lines at the ERROR level trigger SRE alerts when more than 1 is logged per hour. Although we are generally good at not logging excessively at ERROR, we do have cases where an SRE alert is triggered but, by the time SRE can log on and check the deployment, everything is fine again. In those cases there was never an ERROR; the message should have been logged at WARN.
+
+This ADR aims to establish a shared understanding that using levels correctly matters: it avoids mental load and on-call alerts for things we can't do anything about.
+
+## Decision
+
+We should agree on the semantic information carried by each level, and on which levels are OK to ignore while scanning logs from running applications.
+
+Current suggestion:
+
+| Log level | Frequency in healthy application | Standard availability                     | Configurable               |
+|-----------|----------------------------------|-------------------------------------------|----------------------------|
+| ERROR     | 0                                | All environments                          | NO                         |
+| WARN      | 1-10                             | All environments                          | NO                         |
+| INFO      | 10-100                           | Default deploy config sets LOG_LEVEL=info | YES                        |
+| DEBUG     | 100-1000                         | Local development                         | YES (specific deployments) |
+| TRACE     | 1000-10000                       | NO                                        | YES (specific deployments) |
+
+
+
+
+### Change
+
+Previously we might have logged an ERROR for a self-healing issue. Such issues should now be logged at WARN, not ERROR.
+
+The only thing that should be logged at ERROR is exceptional behaviour that we need to fix immediately;
+everything else should be downgraded to WARN. To keep the number of WARN messages down, some messages logged at WARN today may in turn need to be downgraded to INFO.
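
A minimal sketch of how the rule in the Change section might look in code. It is not taken from the codebase: `LogLevel`, `Failure`, `levelForFailure`, and `isEnabled` are hypothetical names chosen for illustration. It also assumes that "Configurable: NO" in the table means ERROR and WARN are always emitted, regardless of the configured `LOG_LEVEL`.

```typescript
// Hypothetical sketch: none of these names come from the actual codebase.
type LogLevel = "error" | "warn" | "info" | "debug" | "trace";

// Lower number = more severe, in the same order as the table in the ADR.
const SEVERITY: Record<LogLevel, number> = {
  error: 0,
  warn: 1,
  info: 2,
  debug: 3,
  trace: 4,
};

// Hypothetical description of a failure that the caller has just handled.
interface Failure {
  selfHealing: boolean; // e.g. a retry or reconnect is expected to recover
  needsImmediateFix: boolean; // someone has to act on this right away
}

// ERROR is reserved for exceptional behaviour that must be fixed immediately.
// Anything that heals itself, or that nobody can act on, is logged at WARN.
function levelForFailure(failure: Failure): LogLevel {
  if (failure.needsImmediateFix && !failure.selfHealing) {
    return "error";
  }
  return "warn";
}

// Assumption: ERROR and WARN are not configurable, so they are always emitted.
// INFO, DEBUG and TRACE are emitted only if the configured level allows them.
function isEnabled(level: LogLevel, configured: LogLevel): boolean {
  if (level === "error" || level === "warn") {
    return true;
  }
  return SEVERITY[level] <= SEVERITY[configured];
}
```

Under these assumptions, a dropped connection that is retried successfully would be classified as `levelForFailure({ selfHealing: true, needsImmediateFix: false })`, which yields `"warn"` and therefore does not count towards the ERROR alert threshold.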