mirror of https://github.com/Unleash/unleash.git synced 2025-11-10 01:19:53 +01:00

107 Commits

Author SHA1 Message Date
Gastón Fournier
d11f39e401
chore: expose custom strategy metrics in prometheus (#9657)
## About the changes
These metrics are sent to version info but not exposed in prometheus and
they can provide valuable data about their usage
2025-03-31 16:02:50 +02:00
Christopher Kolstad
efcf04487d
chore: make it build with strict null checks set to true (#9554)
As part of preparation for ESM and node/TSC updates, this PR will make
Unleash build with strictNullChecks set to true, since that's what's in
our tsconfig file. Hence, this PR also removes the `--strictNullChecks
false` flag in our compile tasks in package.json.

TL;DR - Clean up your code rather than turning off compiler security
features :)
2025-03-19 10:01:49 +01:00
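To illustrate the kind of cleanup that building with `strictNullChecks` requires (#9554 above), here is a minimal sketch. The `Strategy` type and `sortOrderOf` function are hypothetical and not taken from the Unleash codebase; they only show how the compiler forces the `undefined` case to be handled.

```typescript
// Hypothetical type, for illustration only.
interface Strategy {
    name: string;
    sortOrder?: number;
}

// With strictNullChecks enabled, `find` returns `Strategy | undefined`,
// so the missing case has to be handled before reading any property.
function sortOrderOf(strategies: Strategy[], name: string): number {
    const match = strategies.find((s) => s.name === name);
    if (match === undefined) {
        return 0; // no such strategy: fall back to a default
    }
    return match.sortOrder ?? 0; // optional field: default when absent
}
```

Without the explicit check, `match.sortOrder` would be a compile error under strict null checks rather than a possible runtime crash.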
sjaanus
c09afa3e99
git status 2025-03-12 11:03:54 +02:00
Ivar Conradi Østhus
3f730bb7f3
fix: add a metric to track client registrations (#9314)
Adding a counter to track every time a client registers with Unleash.
2025-02-17 09:01:19 +01:00
Mateusz Kwasniewski
77cb30a82f
feat: drop x- header prefix (#9175) 2025-01-30 16:09:26 +01:00
Mateusz Kwasniewski
cbe0ac475c
feat: separate frontend backend counting (#9167) 2025-01-29 13:31:37 +01:00
Mateusz Kwasniewski
05e608ab09
chore: gather metrics every hour (#9163) 2025-01-29 10:01:38 +01:00
Mateusz Kwasniewski
ce73190241
feat: unique connection gauge metric (#9089) 2025-01-13 14:06:09 +01:00
Mateusz Kwasniewski
161fa131c7
chore: remove connection id from tracking (#9072) 2025-01-09 09:46:04 +01:00
Mateusz Kwasniewski
cef10eee02
feat: Unique connection tracking (#9067) 2025-01-08 13:36:40 +01:00
Jaanus Sellin
b701fec75d
feat: store memory footprints to grafana (#9001)
When there is a new revision, we will start storing the memory footprint
for the old client-api and the new delta-api.
We will be sending it as prometheus metrics.

The memory size will only be recalculated if revision changes, which
does not happen very often.
2024-12-19 13:15:30 +02:00
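The "only recalculate when the revision changes" behaviour described in #9001 above can be sketched as a small cache keyed on the revision id. This is an assumption-laden illustration: `RevisionSizeCache` and its `measure` callback are hypothetical names, and the real implementation may differ.

```typescript
// Hypothetical helper: remembers the last measured size per revision id.
class RevisionSizeCache {
    private lastRevisionId: number | undefined = undefined;
    private lastSizeBytes = 0;
    private readonly measure: () => number;

    // `measure` stands in for whatever expensive footprint calculation is used.
    constructor(measure: () => number) {
        this.measure = measure;
    }

    // Re-measures only when the revision id differs from the last one seen.
    sizeFor(revisionId: number): number {
        if (revisionId !== this.lastRevisionId) {
            this.lastSizeBytes = this.measure();
            this.lastRevisionId = revisionId;
        }
        return this.lastSizeBytes;
    }
}
```

Repeated calls with the same revision id return the cached value, so the cost of measuring is paid once per revision.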
Jaanus Sellin
fdb20e94e1
chore: rename to seats used (#8993)
Instead of licensed users/used, we will use seats used.
2024-12-17 12:39:54 +02:00
Fredrik Strand Oseberg
39ca516823
feat: add prom metrics (#8980)
This PR adds prometheus metrics that allow us to see whether or not
tags and namePrefix are used at all in our cloud offering.
2024-12-16 10:48:33 +01:00
Jaanus Sellin
7906bfb177
chore: rename toggle to flag (#8854) 2024-11-26 09:57:43 +02:00
gitar-bot[bot]
6844984610
[Gitar] Cleaning up stale flag: trackLifecycleMetrics with value false (#8833)
This automated PR permanently removes the `trackLifecycleMetrics`
feature flag.
  
  ---
This automated PR was generated by [Gitar](https://gitar.ai). View
[docs](https://gitar.ai/docs).

Co-authored-by: Gitar <noreply@gitar.ai>
2024-11-22 11:54:28 +02:00
Jaanus Sellin
d2daae5857
feat: prometheus now gets licensed users data (#8740)
We have the licensed users service implemented, this just spreads the
data to prometheus.
2024-11-13 16:00:47 +02:00
Mateusz Kwasniewski
61e297dd22
fix: password auth metrics (#8735) 2024-11-13 12:10:53 +01:00
Mateusz Kwasniewski
bb0403d551
feat: metrics for password and scim enabled (#8730) 2024-11-13 10:07:06 +01:00
gitar-bot[bot]
0ff0b2dbd0
[Gitar] Cleaning up stale flag: onboardingMetrics with value true (#8550) 2024-10-28 11:47:58 +01:00
Jaanus Sellin
677ec9cee8
feat: send traffic info to prometheus (#8541) 2024-10-25 15:43:14 +03:00
Gastón Fournier
15f55c7662
chore: Prometheus metrics refactor (#8484)
Migrate some prometheus metrics to use the new and sequential metric
updater
2024-10-22 15:11:57 +02:00
Gastón Fournier
a9f9be1efa
chore: add a class to handle aggregation queries (#8446)
## About the changes
We have many aggregation queries that run on a schedule:
f63496d47f/src/lib/metrics.ts (L714-L719)

These staticCounters are usually doing db query aggregations that
traverse tables and we run all of them in parallel:
f63496d47f/src/lib/metrics.ts (L410-L412)

This can add strain to the db. This PR suggests a way of handling these
queries in a more structured way, allowing us to run them sequentially
(therefore spreading the load):
f02fe87835/src/lib/metrics-gauge.ts (L38-L40)

As an additional benefit, we get both the gauge definition and the
queries in a single place:
f02fe87835/src/lib/metrics.ts (L131-L141)

This PR only tackles 1 metric, and it only focuses on gauges to gather
initial feedback. The plan is to migrate these metrics and eventually
incorporate more types (e.g. counters)

---------

Co-authored-by: Nuno Góis <github@nunogois.com>
2024-10-18 11:11:22 +02:00
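The idea in #8446 above, running scheduled aggregation queries one after another instead of all in parallel, can be sketched roughly as follows. `GaugeTask` and `refreshSequentially` are hypothetical names; `query` stands in for a database aggregation and `record` for setting a gauge value. This is not the code in `metrics-gauge.ts`.

```typescript
// Hypothetical task shape: one query plus the place its result is recorded.
type GaugeTask = {
    name: string;
    query: () => Promise<number>; // stand-in for a db aggregation
    record: (value: number) => void; // stand-in for setting a gauge
};

// Runs tasks strictly one at a time so the database never sees them all at once.
// Returns the names of tasks whose query failed; a failure does not stop the rest.
async function refreshSequentially(tasks: GaugeTask[]): Promise<string[]> {
    const failed: string[] = [];
    for (const task of tasks) {
        try {
            const value = await task.query(); // next query starts only after this resolves
            task.record(value);
        } catch {
            failed.push(task.name);
        }
    }
    return failed;
}
```

Compared with `Promise.all`, total wall-clock time goes up, but at most one aggregation is in flight at any moment, which is the load-spreading trade-off the PR describes.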
David Leek
24b9e4987b
chore:origin middleware flag cleanup (#8402) 2024-10-10 14:26:35 +02:00
Jaanus Sellin
93883b3767
chore: update debugging lifecycle format (#8371) 2024-10-07 14:54:53 +03:00
Jaanus Sellin
9a64dfbfbe
feat: add logging for lifecycle prom metrics (#8341) 2024-10-02 15:49:15 +03:00
Ivar Conradi Østhus
01afe87302
fix: extend feature_toggle_update counter with details about action (#8202)
Ideally `feature_lifecycle_stage_entered{stage="archived"}` would allow
me to see how many flags are archived per week.
It seems like the numbers for this are a bit off, so I wanted to extend
our current `feature_toggle_update` counter with action details.
2024-09-30 12:16:03 +00:00
Mateusz Kwasniewski
c5d6bdecac
feat: projects onboarding metrics (#8014) 2024-08-29 14:57:27 +02:00
Jaanus Sellin
e61f016c8c
feat: start collecting prometheus metrics for onboarding events (#8012)
We start collecting prometheus metrics for onboarding events.

Co-authored-by: @kwasniew
2024-08-29 12:46:23 +03:00
Jaanus Sellin
7ad686e14e
fix: change .inc calls to .increment (#8000)
We are observing incorrect data in Prometheus, which is consistently
non-reproducible. After a restart, the issue does not occur, but if the
pods run for an extended period, they seem to enter a strange state
where the counters become entangled and start sharing arbitrary values
that are added to the counters.

For example, the `feature_lifecycle_stage_entered` counter gets an
arbitrary value, such as 12, added when `inc()` is called. The
`exceedsLimitErrorCounter` shows the same behavior, and the code
implementation is identical.

We also tested some existing `increase()` counters, and they do not
suffer from this issue.

All calls to `counter.labels(labels).inc()` will be replaced by
`counter.increment()` to try to mitigate the issue.
2024-08-28 12:50:36 +03:00
Mateusz Kwasniewski
4e11e57f7f
feat: project actions count metric (#7929) 2024-08-20 09:46:39 +02:00
David Leek
e714a7fe2b
feat:metrics for outgoing integrations (#7921) 2024-08-20 09:00:28 +02:00
David Leek
cf83043d8a
feat: resolve useragent source and add as source label to metrics (#7883) 2024-08-15 13:25:42 +02:00
Thomas Heartman
0c53f7d21b
feat: create gauges for all resource limits (#7718)
This PR adds Grafana gauges for all the existing resource limits. The
primary purpose is to be able to use this in alerting. Secondarily, we
can also use it to get better insights into how many customers have
increased their limits, as well as how many people are approaching their
limit, regardless of whether it's been increased or not.

## Discussion points

### Implementation

The first approach I took (in
87528b4c67),
was to add a new gauge for each resource limit. However, there's a lot
of boilerplate for it.

I thought doing it like this (the current implementation) would make it
easier. We should still be able to use the labelName to collate this in
Grafana, as far as I understand? As a bonus, we'd automatically get new
resource limits when we add them to the schema.

``` tsx
        const resourceLimit = createGauge({
            name: 'resource_limit',
            help: 'The maximum number of resources allowed.',
            labelNames: ['resource'],
        });

        // ...

        for (const [resource, limit] of Object.entries(config.resourceLimits)) {
            resourceLimit.labels({ resource }).set(limit);
        }
```

That way, when checking the stats, we should be able to do something
like this:

``` promql
resource_limit{resource="constraintValues"}
```

### Do we need to reset gauges?

I noticed that we reset gauges before setting values in them all over
the place. I don't know if that's necessary. I'd like to get that double
clarified before merging this.
2024-08-01 09:59:25 +02:00
Nuno Góis
49fecb2005
chore: request origin prom metrics (#7709)
https://linear.app/unleash/issue/2-2501/adapt-origin-middleware-to-stop-logging-ui-requests-and-start

This adapts the new origin middleware to stop logging UI requests (too
noisy) and adds new Prometheus metrics.

This should allow us to better analyze this data. If we see a lot of API
requests, we can dive into the logs for that instance and check the
logged data, like the user agent.

This PR adds some helper methods to make listening and emitting metric
events more strict in terms of types. I think it's a positive change
aligned with our scouting principle, but if you think it's complex and
does not belong here I'm happy with dropping it.
2024-07-31 13:52:39 +01:00
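The "helper methods to make listening and emitting metric events more strict in terms of types" mentioned in #7709 above could look something like the sketch below. Everything here is hypothetical (the `MetricEvents` map, the `request_origin` event name, and the `TypedEvents` class); it uses a tiny in-memory emitter rather than Unleash's actual event bus.

```typescript
// Hypothetical event map: event name -> payload type.
type MetricEvents = {
    request_origin: { type: 'UI' | 'API'; method: string };
};

type Listener<T> = (payload: T) => void;

// Minimal emitter whose `on` and `emit` are checked against the event map.
class TypedEvents<Events extends Record<string, unknown>> {
    private listeners = new Map<keyof Events, Listener<unknown>[]>();

    on<K extends keyof Events>(event: K, listener: Listener<Events[K]>): void {
        const current = this.listeners.get(event) ?? [];
        // Stored untyped internally; the public signatures keep callers honest.
        this.listeners.set(event, [...current, listener as Listener<unknown>]);
    }

    emit<K extends keyof Events>(event: K, payload: Events[K]): void {
        for (const listener of this.listeners.get(event) ?? []) {
            listener(payload);
        }
    }
}
```

With this shape, emitting `request_origin` with a payload missing `method`, or listening on an event name that is not in the map, fails at compile time.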
Thomas Heartman
f15bcdc2a6
chore: send prometheus metrics when someone tries to exceed resource limits (#7617)
This PR adds prometheus metrics for when users attempt to exceed the
limits for a given resource.

The implementation sets up a second function exported from the
ExceedsLimitError file that records metrics and then throws the error.
This could also be a static method on the class, but I'm not sure that'd
be better.
2024-07-18 13:35:45 +02:00
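The "record metrics and then throw" function described in #7617 above can be sketched as follows. The class body, the `RecordAttempt` callback type, and the `throwExceedsLimitError` signature are assumptions made for illustration; the real function in the ExceedsLimitError file may take different arguments.

```typescript
// Hypothetical error class; the message format is an assumption.
class ExceedsLimitError extends Error {
    constructor(resource: string, limit: number) {
        super(`Cannot create more than ${limit} of resource "${resource}".`);
        this.name = 'ExceedsLimitError';
    }
}

// Stand-in for incrementing a Prometheus counter labelled by resource and limit.
type RecordAttempt = (resource: string, limit: number) => void;

// Records the attempt first, then throws, so every rejected request is counted.
function throwExceedsLimitError(
    record: RecordAttempt,
    resource: string,
    limit: number,
): never {
    record(resource, limit);
    throw new ExceedsLimitError(resource, limit);
}
```

Keeping this as a free function, rather than a static method on the class, leaves the error class itself free of any metrics dependency.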
Tymoteusz Czech
b9c3d101ba
feat: statistics for orphaned tokens (#7568)
Added metrics for orphaned tokens and modified `createTokenRowReducer` to exclude tokens in v1 format.
2024-07-11 11:39:38 +02:00
Simon Hornby
2e205fc14e
chore: make sdk metrics snake case (#7547) 2024-07-05 12:29:00 +02:00
Simon Hornby
30073d527a
feat: extended SDK metrics (#7527)
This adds an extended metrics format to the metrics ingested by Unleash
and sent by running SDKs in the wild. Notably, we don't store this
information anywhere new in this PR, this is just streamed out to
Victoria metrics - the point of this project is insight, not analysis.

Two things to look out for in this PR:

- I've chosen to extend the registration event and also send that
when we receive metrics. This means that the new data is received on
startup and on heartbeat. This takes us in the direction of collapsing
these two calls into one at a later point.
- I've wrapped the existing metrics events in some "type safety", it
ain't much because we have 0 type safety on the event emitter so this
also has some if checks that look funny in TS that actually check if the
data shape is correct. Existing tests that check this are more or less
preserved
2024-07-04 08:51:27 +02:00
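The "if checks that look funny in TS that actually check if the data shape is correct" from #7527 above are, in general terms, runtime type guards. A minimal sketch, assuming a hypothetical `SdkMetricsEvent` shape (the field names are illustrative and not the actual extended metrics format):

```typescript
// Hypothetical payload shape, for illustration only.
interface SdkMetricsEvent {
    appName: string;
    instanceId: string;
    sdkVersion?: string;
}

// Runtime check: the event emitter gives no compile-time guarantee about
// what was emitted, so the shape is verified before the data is used.
function isSdkMetricsEvent(data: unknown): data is SdkMetricsEvent {
    if (typeof data !== 'object' || data === null) {
        return false;
    }
    const candidate = data as Record<string, unknown>;
    return (
        typeof candidate.appName === 'string' &&
        typeof candidate.instanceId === 'string' &&
        (candidate.sdkVersion === undefined ||
            typeof candidate.sdkVersion === 'string')
    );
}
```

Inside an `if (isSdkMetricsEvent(data))` branch the compiler narrows `data` to `SdkMetricsEvent`, so a listener can drop malformed events without casting.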
Mateusz Kwasniewski
72de574012
feat: largest projects and features metric (#7459) 2024-06-26 16:09:08 +02:00
Mateusz Kwasniewski
3a3b6a29ff
feat: lifecycle stage entered counter (#7449) 2024-06-25 14:40:16 +02:00
Mateusz Kwasniewski
388fe2dbd3
fix: change lifecycle stage duration metric type (#7444) 2024-06-25 12:42:43 +02:00
Mateusz Kwasniewski
c3fa468a9d
refactor: lifecycle stage duration outside instance stats (#7442) 2024-06-25 11:22:26 +02:00
Mateusz Kwasniewski
6a9a2c687d
feat: stage count by project metric (#7441) 2024-06-25 09:54:26 +02:00
Thomas Heartman
c4e2159401
chore: add metrics/gauges for "max constraint values" and "max constraints" (#7398)
This PR adds metrics tracking for:
- "maxConstraintValues": the highest number of constraint values that
are in use
- "maxConstraintsPerStrategy": the highest number of constraints used on
a strategy

It updates the existing feature strategy read model that returns max
metrics for other strategy-related things.

It also moves one test into a more fitting describe block.
2024-06-17 11:13:13 +02:00
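The two values tracked in #7398 above can be computed from in-memory data as in the sketch below. The PR itself updates a read model, so the real values presumably come from a database query; the types and the `maxConstraintStats` function here are hypothetical and only pin down what each metric means.

```typescript
// Hypothetical shapes, for illustration only.
interface ConstraintLike {
    contextName: string;
    values?: string[];
}

interface StrategyWithConstraints {
    constraints: ConstraintLike[];
}

interface ConstraintStats {
    maxConstraintsPerStrategy: number; // most constraints on any one strategy
    maxConstraintValues: number; // most values on any one constraint
}

function maxConstraintStats(
    strategies: StrategyWithConstraints[],
): ConstraintStats {
    let maxConstraintsPerStrategy = 0;
    let maxConstraintValues = 0;
    for (const strategy of strategies) {
        maxConstraintsPerStrategy = Math.max(
            maxConstraintsPerStrategy,
            strategy.constraints.length,
        );
        for (const constraint of strategy.constraints) {
            maxConstraintValues = Math.max(
                maxConstraintValues,
                constraint.values?.length ?? 0, // constraints without values count as 0
            );
        }
    }
    return { maxConstraintsPerStrategy, maxConstraintValues };
}
```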
Mateusz Kwasniewski
0c79b36b74
feat: Max strategies metrics (#7392) 2024-06-14 09:20:43 +02:00
Jaanus Sellin
d17ae37800
feat: now CLIENT_METRICS event will be emitted with new structure (#7210)
1. CLIENT_METRICS event will be emitted with new structure
2. CLIENT_METRICS event will be emitted from bulkMetrics endpoint
2024-05-31 12:40:46 +03:00
Mateusz Kwasniewski
99403e481b
feat: add prometheus metrics error logging (#7105) 2024-05-22 10:08:31 +02:00
Jaanus Sellin
2fb95339ef
chore: change toggle to flag #3 (#7101) 2024-05-22 09:58:53 +03:00
Christopher Kolstad
8aa0616698
feat: expose postgres version (#7041)
Adds a postgres_version gauge to allow us to see postgres_version in
prometheus and to post it upstream when version checking. Depends on
https://github.com/bricks-software/version-function/pull/20 to be merged
first to ensure our version-function doesn't crash when given the
postgres-version data.
2024-05-13 14:41:28 +02:00
Jaanus Sellin
958ccabb54
feat: lifecycle prometheus metrics per project (#7032)
When we pushed metrics per feature, there were too many datapoints and
grafana could not handle it. Now I am taking the median per project.
2024-05-10 15:24:27 +03:00
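The switch in #7032 above from one datapoint per feature to one median per project can be sketched as follows. The `StageSample` shape and both function names are hypothetical, and the actual change may compute the median elsewhere (for example in SQL); this only illustrates the aggregation.

```typescript
// Hypothetical sample: how long one feature spent in a lifecycle stage.
interface StageSample {
    project: string;
    durationMinutes: number;
}

// Median of a list of numbers; undefined for an empty list.
function median(values: number[]): number | undefined {
    if (values.length === 0) {
        return undefined;
    }
    const sorted = [...values].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 === 1
        ? sorted[mid]
        : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Collapses per-feature samples into a single median value per project.
function medianByProject(samples: StageSample[]): Map<string, number> {
    const groups = new Map<string, number[]>();
    for (const sample of samples) {
        const durations = groups.get(sample.project) ?? [];
        durations.push(sample.durationMinutes);
        groups.set(sample.project, durations);
    }
    const result = new Map<string, number>();
    groups.forEach((durations, project) => {
        const value = median(durations);
        if (value !== undefined) {
            result.set(project, value);
        }
    });
    return result;
}
```

The number of series then grows with the number of projects rather than the number of features, which is the cardinality reduction the commit message is after.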