unleash.unleash

mirror of https://github.com/Unleash/unleash.git synced 2025-07-07 01:16:28 +02:00

Author	SHA1	Message	Date
Jaanus Sellin	7906bfb177	chore: rename toggle to flag (#8854 )	2024-11-26 09:57:43 +02:00
gitar-bot[bot]	6844984610	[Gitar] Cleaning up stale flag: trackLifecycleMetrics with value false (#8833 ) This automated PR permanently removes the `trackLifecycleMetrics` feature flag. --- This automated PR was generated by [Gitar](https://gitar.ai). View [docs](https://gitar.ai/docs). Co-authored-by: Gitar <noreply@gitar.ai>	2024-11-22 11:54:28 +02:00
Jaanus Sellin	d2daae5857	feat: prometheus now gets licensed users data (#8740 ) We have the licensed users service implemented; this just spreads the data to Prometheus.	2024-11-13 16:00:47 +02:00
Mateusz Kwasniewski	61e297dd22	fix: password auth metrics (#8735 )	2024-11-13 12:10:53 +01:00
Mateusz Kwasniewski	bb0403d551	feat: metrics for password and scim enabled (#8730 )	2024-11-13 10:07:06 +01:00
gitar-bot[bot]	0ff0b2dbd0	[Gitar] Cleaning up stale flag: onboardingMetrics with value true (#8550 )	2024-10-28 11:47:58 +01:00
Jaanus Sellin	677ec9cee8	feat: send traffic info to prometheus (#8541 )	2024-10-25 15:43:14 +03:00
Gastón Fournier	15f55c7662	chore: Prometheus metrics refactor (#8484 ) Migrate some prometheus metrics to use the new and sequential metric updater	2024-10-22 15:11:57 +02:00
Gastón Fournier	a9f9be1efa	chore: add a class to handle aggregation queries (#8446 ) ## About the changes We have many aggregation queries that run on a schedule: `f63496d47f/src/lib/metrics.ts (L714-L719)` These staticCounters are usually doing db query aggregations that traverse tables and we run all of them in parallel: `f63496d47f/src/lib/metrics.ts (L410-L412)` This can add strain to the db. This PR suggests a way of handling these queries in a more structured way, allowing us to run them sequentially (therefore spreading the load): `f02fe87835/src/lib/metrics-gauge.ts (L38-L40)` As an additional benefit, we get both the gauge definition and the queries in a single place: `f02fe87835/src/lib/metrics.ts (L131-L141)` This PR only tackles 1 metric, and it only focuses on gauges to gather initial feedback. The plan is to migrate these metrics and eventually incorporate more types (e.g. counters) --------- Co-authored-by: Nuno Góis <github@nunogois.com>	2024-10-18 11:11:22 +02:00
David Leek	24b9e4987b	chore: origin middleware flag cleanup (#8402 )	2024-10-10 14:26:35 +02:00
Jaanus Sellin	93883b3767	chore: update debugging lifecycle format (#8371 )	2024-10-07 14:54:53 +03:00
Jaanus Sellin	9a64dfbfbe	feat: add logging for lifecycle prom metrics (#8341 )	2024-10-02 15:49:15 +03:00
Ivar Conradi Østhus	01afe87302	fix: extend feature_toggle_update counter with details about action (#8202 ) Ideally `feature_lifecycle_stage_entered{stage="archived"}` would allow me to see how many flags are archived per week. The numbers for this seem to be a bit off, so I wanted to extend our current `feature_toggle_update` counter with action details.	2024-09-30 12:16:03 +00:00
Mateusz Kwasniewski	c5d6bdecac	feat: projects onboarding metrics (#8014 )	2024-08-29 14:57:27 +02:00
Jaanus Sellin	e61f016c8c	feat: start collecting prometheus metrics for onboarding events (#8012 ) We start collecting prometheus metrics for onboarding events. Co-authored-by: @kwasniew	2024-08-29 12:46:23 +03:00
Jaanus Sellin	7ad686e14e	fix: change .inc calls to .increment (#8000 ) We are observing incorrect data in Prometheus, which is consistently non-reproducible. After a restart, the issue does not occur, but if the pods run for an extended period, they seem to enter a strange state where the counters become entangled and start sharing arbitrary values that are added to the counters. For example, the `feature_lifecycle_stage_entered` counter gets an arbitrary value, such as 12, added when `inc()` is called. The `exceedsLimitErrorCounter` shows the same behavior, and the code implementation is identical. We also tested some existing `increase()` counters, and they do not suffer from this issue. All calls to `counter.labels(labels).inc()` will be replaced by `counter.increment()` to try to mitigate the issue.	2024-08-28 12:50:36 +03:00
Mateusz Kwasniewski	4e11e57f7f	feat: project actions count metric (#7929 )	2024-08-20 09:46:39 +02:00
David Leek	e714a7fe2b	feat: metrics for outgoing integrations (#7921 )	2024-08-20 09:00:28 +02:00
David Leek	cf83043d8a	feat: resolve useragent source and add as source label to metrics (#7883 )	2024-08-15 13:25:42 +02:00
Thomas Heartman	0c53f7d21b	feat: create gauges for all resource limits (#7718 ) This PR adds Grafana gauges for all the existing resource limits. The primary purpose is to be able to use this in alerting. Secondarily, we can also use it to get better insights into how many customers have increased their limits, as well as how many people are approaching their limit, regardless of whether it's been increased or not. ## Discussion points ### Implementation The first approach I took (in `87528b4c67`), was to add a new gauge for each resource limit. However, there's a lot of boilerplate for it. I thought doing it like this (the current implementation) would make it easier. We should still be able to use the labelName to collate this in Grafana, as far as I understand? As a bonus, we'd automatically get new resource limits when we add them to the schema. ``` tsx const resourceLimit = createGauge({ name: 'resource_limit', help: 'The maximum number of resources allowed.', labelNames: ['resource'], }); // ... for (const [resource, limit] of Object.entries(config.resourceLimits)) { resourceLimit.labels({ resource }).set(limit); } ``` That way, when checking the stats, we should be able to do something like this: ``` promql resource_limit{resource="constraintValues"} ``` ### Do we need to reset gauges? I noticed that we reset gauges before setting values in them all over the place. I don't know if that's necessary. I'd like to get that double clarified before merging this.	2024-08-01 09:59:25 +02:00
Nuno Góis	49fecb2005	chore: request origin prom metrics (#7709 ) https://linear.app/unleash/issue/2-2501/adapt-origin-middleware-to-stop-logging-ui-requests-and-start This adapts the new origin middleware to stop logging UI requests (too noisy) and adds new Prometheus metrics. This should allow us to better analyze this data. If we see a lot of API requests, we can dive into the logs for that instance and check the logged data, like the user agent. This PR adds some helper methods to make listening and emitting metric events more strict in terms of types. I think it's a positive change aligned with our scouting principle, but if you think it's complex and does not belong here I'm happy with dropping it.	2024-07-31 13:52:39 +01:00
Thomas Heartman	f15bcdc2a6	chore: send prometheus metrics when someone tries to exceed resource limits (#7617 ) This PR adds prometheus metrics for when users attempt to exceed the limits for a given resource. The implementation sets up a second function exported from the ExceedsLimitError file that records metrics and then throws the error. This could also be a static method on the class, but I'm not sure that'd be better.	2024-07-18 13:35:45 +02:00
Tymoteusz Czech	b9c3d101ba	feat: statistics for orphaned tokens (#7568 ) Added metrics for orphaned tokens and modified `createTokenRowReducer` to exclude tokens in v1 format.	2024-07-11 11:39:38 +02:00
Simon Hornby	2e205fc14e	chore: make sdk metrics snake case (#7547 )	2024-07-05 12:29:00 +02:00
Simon Hornby	30073d527a	feat: extended SDK metrics (#7527 ) This adds an extended metrics format to the metrics ingested by Unleash and sent by running SDKs in the wild. Notably, we don't store this information anywhere new in this PR, this is just streamed out to VictoriaMetrics - the point of this project is insight, not analysis. Two things to look out for in this PR: - I've chosen to extend the registration event and also send that when we receive metrics. This means that the new data is received on startup and on heartbeat. This takes us in the direction of collapsing these two calls into one at a later point - I've wrapped the existing metrics events in some "type safety", it ain't much because we have 0 type safety on the event emitter so this also has some if checks that look funny in TS that actually check if the data shape is correct. Existing tests that check this are more or less preserved	2024-07-04 08:51:27 +02:00
Mateusz Kwasniewski	72de574012	feat: largest projects and features metric (#7459 )	2024-06-26 16:09:08 +02:00
Mateusz Kwasniewski	3a3b6a29ff	feat: lifecycle stage entered counter (#7449 )	2024-06-25 14:40:16 +02:00
Mateusz Kwasniewski	388fe2dbd3	fix: change lifecycle stage duration metric type (#7444 )	2024-06-25 12:42:43 +02:00
Mateusz Kwasniewski	c3fa468a9d	refactor: lifecycle stage duration outside instance stats (#7442 )	2024-06-25 11:22:26 +02:00
Mateusz Kwasniewski	6a9a2c687d	feat: stage count by project metric (#7441 )	2024-06-25 09:54:26 +02:00
Thomas Heartman	c4e2159401	chore: add metrics/gauges for "max constraint values" and "max constraints" (#7398 ) This PR adds metrics tracking for: - "maxConstraintValues": the highest number of constraint values that are in use - "maxConstraintsPerStrategy": the highest number of constraints used on a strategy It updates the existing feature strategy read model that returns max metrics for other strategy-related things. It also moves one test into a more fitting describe block.	2024-06-17 11:13:13 +02:00
Mateusz Kwasniewski	0c79b36b74	feat: Max strategies metrics (#7392 )	2024-06-14 09:20:43 +02:00
Jaanus Sellin	d17ae37800	feat: now CLIENT_METRICS event will be emitted with new structure (#7210 ) 1. CLIENT_METRICS event will be emitted with new structure 2. CLIENT_METRICS event will be emitted from bulkMetrics endpoint	2024-05-31 12:40:46 +03:00
Mateusz Kwasniewski	99403e481b	feat: add prometheus metrics error logging (#7105 )	2024-05-22 10:08:31 +02:00
Jaanus Sellin	2fb95339ef	chore: change toggle to flag #3 (#7101 )	2024-05-22 09:58:53 +03:00
Christopher Kolstad	8aa0616698	feat: expose postgres version (#7041 ) Adds a postgres_version gauge to allow us to see postgres_version in prometheus and to post it upstream when version checking. Depends on https://github.com/bricks-software/version-function/pull/20 to be merged first to ensure our version-function doesn't crash when given the postgres-version data.	2024-05-13 14:41:28 +02:00
Jaanus Sellin	958ccabb54	feat: lifecycle prometheus metrics per project (#7032 ) When we pushed metrics per feature, there were too many datapoints and Grafana could not handle it. Now I am taking the median per project.	2024-05-10 15:24:27 +03:00
Jaanus Sellin	cd49ae2a26	feat: add project id to prometheus and feature flag (#7008 ) Now we are also sending project id to Prometheus, also querying it from the database. This sets us up for a Grafana dashboard. Also put the metrics behind a flag, just in case it causes cpu/memory issues.	2024-05-08 15:19:23 +03:00
Jaanus Sellin	02440dfed2	feat: duration in stage, add feature lifecycle prometheus metrics (#6973 ) Introduce a new concept: duration in stage. Also add it as a Prometheus metric.	2024-05-08 11:33:51 +03:00
Gastón Fournier	0a2d40fb8b	feat: allow schedulers to run in a single node (#6794 ) ## About the changes This PR provides a service that allows a scheduled function to run in a single instance. It's currently not in use but tests show how to wrap a function to make it single-instance: `65b7080e05/src/lib/features/scheduler/job-service.test.ts (L26-L32)` The key `'test'` is used to identify the group and most likely should have the same name as the scheduled job. --------- Co-authored-by: Christopher Kolstad <chriswk@getunleash.io>	2024-04-10 11:47:22 +02:00
Thomas Heartman	cfd9e4894a	chore: Establish a baseline for the number of envs disabled per project (#6807 ) This PR adds a counter in Prometheus for counting the number of "environment disabled" events we get per project. The purpose of this is to establish a baseline for one of the "project management UI" project's key results. ## On gauges vs counters This PR uses a counter. Using a gauge would give you the total number of envs disabled, not the number of disable events. The difference is subtle, but important. For projects that were created before the new feature, the gauge might be appropriate. Because each disabled env would require at least one disabled event, we can get a floor of how many events were triggered for each project. However, for projects created after we introduce the planned change, we're not interested in the total envs anymore, because you can disable a hundred envs on creation with a single action. In this case, a gauge showing 100 disabled envs would be misleading, because it didn't take 100 events to disable them. So the interesting metric here is how many times did you specifically disable an environment in project settings, hence the counter. ## Assumptions and future plans To make this easier on ourselves, we make the following assumption: people primarily disable envs when creating a project. This means that there might be a few lagging indicators granting some projects a smaller number of events than expected, but we may be able to filter those out. Further, if we had a metric for each project and its creation date, we could correlate that with the metrics to answer the question "how many envs do people disable in the first week? Two weeks? A month?". Or worded differently: after creating a project, how long does it take for people to configure environments? Similarly, if we gather that data, it will also make filtering out the number of events for projects created after the new changes have been released much easier. The good news: Because the project creation metric with dates is a static aggregate, it can be applied at any time, even retroactively, to see the effects.	2024-04-10 08:49:15 +02:00
Jaanus Sellin	d3847fd8ee	feat: collect prometheus data about archived features (#6728 )	2024-03-28 13:40:30 +02:00
Ivar Conradi Østhus	a6643e4721	Revert "fix: Add metrics for old proxy forward (#6695 )" This reverts commit `d065905e73`.	2024-03-26 14:13:18 +01:00
Ivar Conradi Østhus	d065905e73	fix: Add metrics for old proxy forward (#6695 ) This change adds a new prometheus counter to allow us to capture when we automatically forward traffic from old /proxy paths to the /api/frontend path.	2024-03-26 12:25:15 +01:00
Christopher Kolstad	53354224fc	chore: Bump biome and configure husky (#6589 ) Upgrades biome to 1.6.1, and updates husky pre-commit hook. Most changes here are making type imports explicit.	2024-03-18 13:58:05 +01:00
Jaanus Sellin	2a57acca41	feat: start monitoring total time to update cache (#6517 )	2024-03-12 14:27:04 +02:00
Jaanus Sellin	b7915171ff	feat: start tracking operation duration (#6514 )	2024-03-12 12:30:30 +02:00
Mateusz Kwasniewski	bc83a4d66e	refactor: rename proxy to frontend api in openapi schemas (#6511 )	2024-03-12 10:15:24 +01:00
Nuno Góis	68729333e0	chore: rename incoming webhooks to signals (#6415 ) https://linear.app/unleash/issue/2-1994/ui-feature-rename-adapt-the-signals-ui https://linear.app/unleash/issue/2-1996/rename-feature-in-the-code-base Implements the feature rename to Signals by adapting the code base and UI.	2024-03-04 12:08:05 +00:00
David Leek	adb6f61015	chore: proxy repository load features metrics (#6314 ) ## About the changes - Adds createHistogram - Adds histogram metrics for the proxy repository's feature loading	2024-02-22 14:29:21 +01:00
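
The gauge-versus-counter reasoning in the message for #6807 can be illustrated with a small, self-contained sketch. It uses neither Unleash's metrics code nor a Prometheus client: `EnvTracker`, `disableEnvs`, and `enableEnvs` are hypothetical names invented for this illustration, and the numbers are made up to mirror the "hundred envs with a single action" case from the message.

```typescript
// Hypothetical model, not Unleash code: tracks one project's environments.
class EnvTracker {
    // Gauge-style value: how many environments are disabled right now.
    disabledEnvs = 0;
    // Counter-style value: how many "disable" actions happened; never decreases.
    disableEvents = 0;

    // One user action may disable several environments at once.
    disableEnvs(count: number): void {
        this.disabledEnvs += count;
        this.disableEvents += 1;
    }

    // Re-enabling lowers the gauge-style value but leaves the counter untouched.
    enableEnvs(count: number): void {
        this.disabledEnvs = Math.max(0, this.disabledEnvs - count);
    }
}

const tracker = new EnvTracker();
tracker.disableEnvs(100); // a project created with 100 envs disabled in one action
tracker.disableEnvs(1); // a later, individual disable in project settings
tracker.enableEnvs(1); // that environment is switched back on
```

After these three calls the gauge-style value reads 100 while the counter-style value reads 2, which is the gap between "total envs disabled" and "number of disable events" that the message describes.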


94 Commits
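
The message for #8446 contrasts running scheduled aggregation queries in parallel with running them sequentially to spread database load. The following is a minimal sketch of that idea only, not the code in `src/lib/metrics-gauge.ts`: the `GaugeTask` shape and the `refreshSequentially` and `refreshInParallel` functions are assumptions made for illustration, and `query` stands in for a real database aggregation.

```typescript
// Hypothetical shapes for illustration; not the API of Unleash's metrics-gauge.ts.
interface GaugeTask {
    name: string;
    // Stands in for a database aggregation query that returns one number.
    query: () => Promise<number>;
}

type SetGauge = (name: string, value: number) => void;

// Sequential: each query starts only after the previous one has finished,
// so at most one aggregation is in flight at any time.
async function refreshSequentially(
    tasks: GaugeTask[],
    setGauge: SetGauge,
): Promise<void> {
    for (const task of tasks) {
        const value = await task.query();
        setGauge(task.name, value);
    }
}

// Parallel, for contrast: every query is started immediately,
// so all aggregations hit the database at the same time.
async function refreshInParallel(
    tasks: GaugeTask[],
    setGauge: SetGauge,
): Promise<void> {
    await Promise.all(
        tasks.map(async (task) => {
            setGauge(task.name, await task.query());
        }),
    );
}
```

Keeping the task name next to its query also mirrors the stated side benefit of having the gauge definition and its query in a single place, although the real definitions carry more fields than this sketch does.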