# Summary
Add optional lazy collection with TTL to our createGauge wrapper,
allowing a gauge to fetch its value on scrape and cache it for a
configurable duration. This lets us register a collect function directly
at gauge declaration without changing existing call sites or behavior.
We're experimenting with this approach, which is why we're only applying it
to `users_total`; we'll evaluate afterwards.
# Problem
- Some gauges should be computed on scrape (e.g., expensive or external
lookups) instead of being pushed continuously.
- Our current `createGauge` helper doesn’t make it easy to attach a
`collect` with caching. Each caller has to reimplement timing, caching,
and error handling.
- This leads to repeated costly work, inconsistent handling of unknown
values, and boilerplate.
# What changed
- `createGauge` now accepts two optional options in addition to the usual prom-client options:
  - `fetchValue?: () => Promise<number | null>`
  - `ttlMs?: number`
- When `fetchValue` is provided:
  - We install a `collect` that fetches on scrape.
  - Successful values are cached for `ttlMs` milliseconds (if `ttlMs` > 0).
  - If `ttlMs` is 0 or omitted, we fetch on every scrape.
  - If `fetchValue` returns null or throws, we set the gauge to `NaN` (indicates `"unknown"`); see the usage sketch below.
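As a usage sketch, declaring `users_total` (mentioned in the summary) as a lazily collected gauge could look like the following; the store and query helper are assumptions for illustration, only `fetchValue` and `ttlMs` are the new options:
``` ts
// Hypothetical usage sketch: `userStore.count()` is an assumed query helper;
// `fetchValue` and `ttlMs` are the new createGauge options described above.
const usersTotal = createGauge({
    name: 'users_total',
    help: 'Total number of users.',
    fetchValue: async () => {
        // Expensive aggregation that should only run on scrape.
        const count = await userStore.count();
        return count ?? null; // null means "unknown" and sets the gauge to NaN
    },
    ttlMs: 60_000, // cache a successful value for one minute
});
```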
# Behavior details
## Caching:
- A value is “fresh” when successfully fetched within `ttlMs`.
- Only numeric successes are cached. null and errors are not cached;
we’ll refetch on the next scrape.
## Unknown values:
- null or thrown errors set the gauge to `NaN` so Prometheus won’t treat
it as zero.
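A minimal sketch of the caching and unknown-value behavior, assuming the wrapper closes over `gauge`, `fetchValue`, and `ttlMs`; variable names are illustrative, not the actual implementation:
``` ts
// Sketch only: fresh cached values are reused, everything else refetches,
// and null/errors surface as NaN ("unknown").
let cachedValue: number | undefined;
let cachedAt = 0;

const collect = async () => {
    const fresh =
        cachedValue !== undefined && ttlMs > 0 && Date.now() - cachedAt < ttlMs;
    if (!fresh) {
        try {
            const value = await fetchValue();
            // Only numeric successes are cached; null falls through to NaN.
            cachedValue = value ?? undefined;
            if (value !== null) cachedAt = Date.now();
        } catch {
            cachedValue = undefined; // errors are not cached either
        }
    }
    gauge.set(cachedValue ?? NaN);
};
```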
## Compatibility:
- Backward compatible. Existing uses of `createGauge` are unchanged.
If a user-supplied `collect` exists, it still runs after the TTL logic
(can overwrite the value by design).
- API remains the same for the returned wrapper: `{ gauge, labels,
reset, set }`.
https://linear.app/unleash/issue/2-3738/clear-unknown-flags-every-24h-instead-of-every-7d
Clears unknown flags every 24h instead of every 7d.
This ensures the list stays more relevant by removing stale entries
sooner, allowing users to focus on actively reported unknown flags.
Also includes small improvements, such as a new paragraph on the unknown
flags page that better explains the concept of unknown flag reports.
**BREAKING CHANGE**: `DEFAULT_ENV` changed from `default` (which should no
longer be used) to `development`.
## About the changes
- Only delete the `default` environment if the installation is brand new.
- Consider `development` the new default. The main consequence of this
change is that the default environment is no longer a `type=production`
environment; this also affects frontend tokens due to this assumption:
724c4b78a2/src/lib/schema/api-token-schema.test.ts (L54-L59)
(I believe this is mostly due to the [support for admin
tokens](https://github.com/Unleash/unleash/pull/10080#discussion_r2126871567))
- The `feature_toggle_update_total` metric reports `n/a` for environment and
environment type, as it is not environment-specific.
We're migrating to ESM, which will allow us to import the latest
versions of our dependencies.
Co-Authored-By: Christopher Kolstad <chriswk@getunleash.io>
https://linear.app/unleash/issue/2-3406/hold-unknown-flags-in-memory-and-show-them-in-the-ui-somehow
This PR introduces a suggestion for an “unknown flags” feature.
When clients report metrics for flags that don’t exist in Unleash (e.g.
due to typos), we now track a limited set of these unknown flag names
along with the appnames that reported them. The goal is to help users
identify and clean up incorrect flag usage across their apps.
We store up to 10 unknown flag + appName combinations, keeping only the
most recent reports. Data is collected in-memory and flushed
periodically to the DB, with deduplication and merging to ensure we
don’t exceed the cap even across pods.
We were especially careful to make this implementation defensive, as
unknown flags could be reported in very high volumes. Writes are
batched, deduplicated, and hard-capped to avoid DB pressure.
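As an illustration of the dedupe-and-cap approach, a hypothetical sketch (class, method, and constant names are not the actual implementation):
``` ts
// Hypothetical sketch: dedupe by flag + app, keep only the most recent
// reports, and hard-cap the batch before it is flushed to the DB.
const MAX_UNKNOWN_FLAGS = 10;

type UnknownFlagReport = { flagName: string; appName: string; seenAt: Date };

class UnknownFlagsCollector {
    private reports = new Map<string, UnknownFlagReport>();

    record(flagName: string, appName: string): void {
        // A repeated flag + app combination just refreshes its timestamp.
        this.reports.set(`${flagName}:${appName}`, {
            flagName,
            appName,
            seenAt: new Date(),
        });
    }

    // Called periodically; returns at most MAX_UNKNOWN_FLAGS of the newest entries.
    flush(): UnknownFlagReport[] {
        const batch = [...this.reports.values()]
            .sort((a, b) => b.seenAt.getTime() - a.seenAt.getTime())
            .slice(0, MAX_UNKNOWN_FLAGS);
        this.reports.clear();
        return batch;
    }
}
```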
No UI has been added yet — this is backend-only for now and intended as
a step toward better visibility into client misconfigurations.
I would suggest starting with a simple banner that opens a dialog
showing the list of unknown flags and which apps reported them.
<img width="497" alt="image"
src="https://github.com/user-attachments/assets/b7348e0d-0163-4be4-a7f8-c072e8464331"
/>
As part of preparation for ESM and node/TSC updates, this PR will make
Unleash build with strictNullChecks set to true, since that's what's in
our tsconfig file. Hence, this PR also removes the `--strictNullChecks
false` flag in our compile tasks in package.json.
TL;DR - Clean up your code rather than turning off compiler security
features :)
When there is a new revision, we will start storing the memory footprint of
the old client-api and the new delta-api.
We will report it as Prometheus metrics.
The memory size will only be recalculated when the revision changes, which
does not happen very often.
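A rough sketch of what this could look like; the gauge name, label values, and the size helper are assumptions:
``` ts
// Hypothetical sketch: recompute the in-memory footprint only when the
// revision id changes, and expose it per API as a Prometheus gauge.
const apiMemoryFootprint = createGauge({
    name: 'api_memory_footprint_bytes', // metric name is an assumption
    help: 'Approximate in-memory size of the client-api and delta-api caches.',
    labelNames: ['api'],
});

let lastRevisionId = -1;

const onRevisionUpdate = (revisionId: number) => {
    if (revisionId === lastRevisionId) return; // revisions change rarely
    lastRevisionId = revisionId;
    apiMemoryFootprint
        .labels({ api: 'client-api' })
        .set(approximateSize(clientApiCache)); // size helper + caches assumed
    apiMemoryFootprint
        .labels({ api: 'delta-api' })
        .set(approximateSize(deltaApiCache));
};
```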
## About the changes
We have many aggregation queries that run on a schedule:
f63496d47f/src/lib/metrics.ts (L714-L719)
These staticCounters usually perform DB aggregation queries that traverse
tables, and we run all of them in parallel:
f63496d47f/src/lib/metrics.ts (L410-L412)
This can add strain to the DB. This PR suggests a more structured way of
handling these queries, allowing us to run them sequentially (and therefore
spread the load):
f02fe87835/src/lib/metrics-gauge.ts (L38-L40)
As an additional benefit, we get both the gauge definition and the
queries in a single place:
f02fe87835/src/lib/metrics.ts (L131-L141)
This PR only tackles one metric and focuses on gauges to gather initial
feedback. The plan is to migrate these metrics and eventually incorporate
more types (e.g. counters).
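For illustration, the gist of the structured approach could look like the sketch below; the class and method names are assumptions, and the referenced lines in `metrics-gauge.ts` hold the real implementation:
``` ts
import type { Gauge } from 'prom-client';

// Illustrative sketch: register the gauge definition and its DB query
// together, then refresh them sequentially to spread the load.
type GaugeDbMetric = { gauge: Gauge; query: () => Promise<number> };

class DbMetricsMonitor {
    private tasks: GaugeDbMetric[] = [];

    registerGaugeDbMetric(def: {
        name: string;
        help: string;
        query: () => Promise<number>;
    }): void {
        // createGauge is the wrapper discussed above.
        const { gauge } = createGauge({ name: def.name, help: def.help });
        this.tasks.push({ gauge, query: def.query });
    }

    // Run the aggregation queries one at a time instead of in parallel.
    async refreshDbMetrics(): Promise<void> {
        for (const { gauge, query } of this.tasks) {
            gauge.set(await query());
        }
    }
}
```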
---------
Co-authored-by: Nuno Góis <github@nunogois.com>
Ideally `feature_lifecycle_stage_entered{stage="archived"}` would allow
me to see how many flags are archived per week.
It seems like the numbers for this are a bit off, so I wanted to extend
our current `feature_toggle_update` counter with action details.
We are observing incorrect data in Prometheus, which is consistently
non-reproducible. After a restart, the issue does not occur, but if the
pods run for an extended period, they seem to enter a strange state
where the counters become entangled and start sharing arbitrary values
that are added to the counters.
For example, the `feature_lifecycle_stage_entered` counter gets an
arbitrary value, such as 12, added when `inc()` is called. The
`exceedsLimitErrorCounter` shows the same behavior, and the code
implementation is identical.
We also tested some existing `increase()` counters, and they do not
suffer from this issue.
All calls to `counter.labels(labels).inc()` will be replaced by
`counter.increment()` to try to mitigate the issue.
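For illustration, the mitigation amounts to something like the following; the counter variable is illustrative, and whether `increment` takes the labels directly is an assumption here:
``` ts
// Before: chained label lookup on the underlying prom-client counter.
featureToggleUpdateTotal.labels(labels).inc();

// After: go through the wrapper's increment method instead (signature assumed).
featureToggleUpdateTotal.increment(labels);
```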
This PR adds Grafana gauges for all the existing resource limits. The
primary purpose is to be able to use this in alerting. Secondarily, we
can also use it to get better insights into how many customers have
increased their limits, as well as how many people are approaching their
limit, regardless of whether it's been increased or not.
## Discussion points
### Implementation
The first approach I took (in
87528b4c67)
was to add a new gauge for each resource limit. However, there's a lot
of boilerplate for it.
I thought doing it like this (the current implementation) would make it
easier. We should still be able to use the labelName to collate this in
Grafana, as far as I understand? As a bonus, we'd automatically get new
resource limits when we add them to the schema.
``` tsx
const resourceLimit = createGauge({
    name: 'resource_limit',
    help: 'The maximum number of resources allowed.',
    labelNames: ['resource'],
});

// ...

for (const [resource, limit] of Object.entries(config.resourceLimits)) {
    resourceLimit.labels({ resource }).set(limit);
}
```
That way, when checking the stats, we should be able to do something
like this:
``` promql
resource_limit{resource="constraintValues"}
```
### Do we need to reset gauges?
I noticed that we reset gauges before setting values in them all over
the place. I don't know if that's necessary, and I'd like to get it
double-checked before merging this.
https://linear.app/unleash/issue/2-2501/adapt-origin-middleware-to-stop-logging-ui-requests-and-start
This adapts the new origin middleware to stop logging UI requests (too
noisy) and adds new Prometheus metrics.
<img width="745" alt="image"
src="https://github.com/user-attachments/assets/d0c7f51d-feb6-4ff5-b856-77661be3b5a9">
This should allow us to better analyze this data. If we see a lot of API
requests, we can dive into the logs for that instance and check the
logged data, like the user agent.
This PR adds some helper methods to make listening for and emitting metric
events more strictly typed. I think it's a positive change aligned with our
scouting principle, but if you think it's too complex or doesn't belong
here, I'm happy to drop it.
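A small sketch of what stricter typing could mean here; the event name and payload shape are made up for illustration:
``` ts
import type { EventEmitter } from 'events';

// Hypothetical sketch: map each metric event name to its payload type so
// that both emitting and listening are checked by the compiler.
type MetricEventPayloads = {
    REQUEST_ORIGIN: { type: 'UI' | 'API'; userAgent?: string };
};
type MetricEvent = keyof MetricEventPayloads;

export const emitMetricEvent = <T extends MetricEvent>(
    eventBus: EventEmitter,
    event: T,
    payload: MetricEventPayloads[T],
): void => {
    eventBus.emit(event, payload);
};

export const onMetricEvent = <T extends MetricEvent>(
    eventBus: EventEmitter,
    event: T,
    handler: (payload: MetricEventPayloads[T]) => void,
): void => {
    eventBus.on(event, handler);
};
```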
This PR adds Prometheus metrics for when users attempt to exceed the
limits for a given resource.
The implementation sets up a second function exported from the
ExceedsLimitError file that records metrics and then throws the error.
This could also be a static method on the class, but I'm not sure that'd
be better.
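A hedged sketch of the described shape; the function name, event name, and error message are assumptions based on the description, not the actual code:
``` ts
import type { EventEmitter } from 'events';

// Hypothetical sketch: record a metric event, then throw, so callers get
// both behaviors from a single call.
export class ExceedsLimitError extends Error {
    constructor(
        public readonly resource: string,
        public readonly limit: number,
    ) {
        super(`Exceeded limit for ${resource} (limit: ${limit}).`); // message assumed
        this.name = 'ExceedsLimitError';
    }
}

export const throwExceedsLimitError = (
    eventBus: EventEmitter,
    { resource, limit }: { resource: string; limit: number },
): never => {
    eventBus.emit('exceeds-limit', { resource, limit }); // event name assumed
    throw new ExceedsLimitError(resource, limit);
};
```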