As part of more telemetry on the usage of Unleash.
This PR adds a new `stat_` prefixed table as well as a trigger on the
events table trigger on each insert to increment a counter per
environment per day.
The trigger will trigger on every insert into the events base, but will
filter and only increment the counter for events that actually have the
environment set. (there are events, like user-created, that does not
relate to a specific environment).
Bit wary on this, but since we truncate down to row per (day,
environment) combo, finding conflict and incrementing shouldn't take too
long here.
@ivarconr was it something like this you were considering?
This PR cleans up and refactors the feature-strategy-store method
getFeatureOverview to join on the new table and attempts to make the
function more readable by extracting some of the logic into separate
functions. Keeping the LastSeenMapper for now in case there is a reason
to use it for the other endpoints.
Fixes an issue where SSO group sync would delete a syncable group that a
user was manually added to
## Discussion points
Is this the longterm fix for this? Or would we want another column in
the mapping table for future-proofing this?
## About the changes
This transactional implementation decorates a service with a
transactional method that removes the need to start transactions in the
method using the service.
This is a gradual rollout with a feature toggle, just because
transactions are not easy.
## About the changes
In our staging setup, we create ad-hoc environments and import Unleash
state from production. After unleash is deployed on such environment,
the import job kicks in and feeds Unleash instance with feature flags
from production.
Between Unleash being up and running and the import job running, some
applications start polling Unleash. They get an empty feature toggle
list with `meta.revisionId=0`. Then apps use this as part of `eTag`
header in subsequent requests. Even though after import Unleash server
finally has toggles to serve, it doesn't because it calculates _max
revision id_ based on toggle updates (not null `feature_name` column in
query) or `SEGMENT_UPDATED`.
This change adds an extra condition to query so feature toggles import
is considered something that should invalidate the cache.
This commit changes our linter/formatter to biome (https://biomejs.dev/)
Causing our prehook to run almost instantly, and our "yarn lint" task to
run in sub 100ms.
Some trade-offs:
* Biome isn't quite as well established as ESLint
* Are we ready to install a different vscode plugin (the biome plugin)
instead of the prettier plugin
The configuration set for biome also has a set of recommended rules,
this is turned on by default, in order to get to something that was
mergeable I have turned off a couple the rules we seemed to violate the
most, that we also explicitly told eslint to ignore.
## About the changes
Add partial index on events by announced. This should help avoid `Seq
Scan on events` when the majority of events are announced=true
---
Co-authored-by: Ivar Østhus <ivar@getunleash.io>
Co-authored-by: Gard Rimestad <gard@getunleash.io>
## About the changes
When the events table is large we might be doing a full table scan
searching for unannounced events. We spotted it due to a performance
alert and confirmed in AWS performance insights
![image](https://github.com/Unleash/unleash/assets/455064/8e815fa3-7a1b-4453-881a-98a148eae119)
The proposal is to limit this operation to 500 events (rule of thumb)
per round
f82ae354eb/src/lib/services/index.ts (L141-L147)
and also ignore the events older than a day (because it seems
reasonable)
## Discussion points
**Idea**: split the `events` table into `recent_events` and
`historical_events`. Recent can be anything from a day/week/month. This
would help with recurrent queries that rely on recent data from the
event's table such as optimal 304 calculation or event this scheduled
task that sends unannounced events.
## About the changes
This enables us to use names instead of permission ids across all our
APIs at the computational cost of searching for the ids in the DB but
improving the API user experience
## Open topics
We're using methods that are test-only and circumvent our business
logic. This makes our test to rely on assumptions that are not always
true because these assumptions are not validated frequently.
i.e. We are expecting that after removing a permission it's no longer
there, but to test this, the permission has to be there before:
78273e4ff3/src/test/e2e/services/access-service.e2e.test.ts (L367-L375)
But it seems that's not the case.
We'll look into improving this later.
## About the changes
- `getActiveUsers` is using multiple stores, so it is refactored into
read-model
- Refactored Instance stats service into `features` to co-locate related
code
Closes https://linear.app/unleash/issue/UNL-230/active-users-prometheus
### Important files
`src/lib/features/instance-stats/getActiveUsers.ts`
## Discussion points
`getActiveUsers` is coded less _class-based_ then previous similar
read-models. In one file instead of 3 (read-model interface, fake read
model, sql read model). I find types and functions way more readable,
but I'm ready to refactor it to interfaces and classes if consistency is
more important.
Seems like when 2 pods are trying to POST lastSeen metrics, the db gets
into a deadlock state.
This is an attempt to fix the deadlock by sorting the toggleNames before
the update.
The hypothesis is that sorted toggle names will reduce the chance of
working on the same row at the same exact time
Closes #
[1-1382](https://linear.app/unleash/issue/1-1382/order-data-before-updating-the-lastseen-to-reduce-change-of-deadlock)
Signed-off-by: andreas-unleash <andreas@getunleash.ai>
This change makes it so that you can save flag naming data on project
creation.
There was a mismatch between flattened and nested data in the creation
endpoint. Likely because we went back and forth a bit when we created
the feature originally.
This PR adds a feature naming pattern description to the project form.
It's rendered as a multi-line input field. The description is also
stored in the db.
This adapts most of @andreas-unleash's PR #4599 with some minor changes
(using description instead of prompt). Actually displaying this data to
the users will come in a later PR.
![image](https://github.com/Unleash/unleash/assets/17786332/b96d2dbb-2b90-4adf-bc83-cdc534c507ea)
Does what it says on the tin, should help with cleaning up
https://github.com/Unleash/unleash/pull/4512 and respective schema
changes.
---------
Co-authored-by: Gastón Fournier <gaston@getunleash.io>
Adds a first iteration of feature flag naming patterns. Currently behind a flag.
Signed-off-by: andreas-unleash <andreas@getunleash.ai>
Co-authored-by: Thomas Heartman <thomas@getunleash.io>
Co-authored-by: andreas-unleash <andreas@getunleash.ai>
Co-authored-by: Thomas Heartman <thomas@getunleash.ai>
https://linear.app/unleash/issue/2-1128/change-the-api-to-support-adding-multiple-roles-to-a-usergroup-on-ahttps://linear.app/unleash/issue/2-1125/be-able-to-fetch-all-roles-for-a-user-in-a-projecthttps://linear.app/unleash/issue/2-1127/adapt-the-ui-to-be-able-to-do-a-multi-select-on-role-permissions-for
- Allows assigning project roles to groups with root roles
- Implements new methods that support assigning, editing, removing and
retrieving multiple project roles in project access, along with other
auxiliary methods
- Adds new events for updating and removing assigned roles
- Adapts `useProjectApi` to new methods that use new endpoints that
support multiple roles
- Adds the `multipleRoles` feature flag that controls the possibility of
selecting multiple roles on the UI
- Adapts `ProjectAccessAssign` to support multiple role, using the new
methods
- Adds a new `MultipleRoleSelect` component that allows you to select
multiple roles based on the `RoleSelect` component
- Adapts the `RoleCell` component to support either a single role or
multiple roles
- Updates the `access.spec.ts` Cypress e2e test to reflect our new logic
- Updates `access-service.e2e.test.ts` with tests covering the multiple
roles logic and covering some corner cases
- Updates `project-service.e2e.test.ts` to adapt to the new logic,
adding a test that covers adding access with `[roles], [groups],
[users]`
- Misc refactors and boy scouting
![image](https://github.com/Unleash/unleash/assets/14320932/d1cc7626-9387-4ab8-9860-cd293a0d4f62)
---------
Co-authored-by: David Leek <david@getunleash.io>
Co-authored-by: Mateusz Kwasniewski <kwasniewski.mateusz@gmail.com>
Co-authored-by: Nuno Góis <github@nunogois.com>
## About the changes
<!-- Describe the changes introduced. What are they and why are they
being introduced? Feel free to also add screenshots or steps to view the
changes if they're visual. -->
Adds projects user and group -usage information to the dialog shown when
user wants to delete a project role
<img width="670" alt="Skjermbilde 2023-08-10 kl 08 28 40"
src="https://github.com/Unleash/unleash/assets/707867/a1df961b-2d0f-419d-b9bf-fedef896a84e">
---------
Co-authored-by: Nuno Góis <github@nunogois.com>
<!-- Thanks for creating a PR! To make it easier for reviewers and
everyone else to understand what your changes relate to, please add some
relevant content to the headings below. Feel free to ignore or delete
sections that you don't think are relevant. Thank you! ❤️ -->
## About the changes
<!-- Describe the changes introduced. What are they and why are they
being introduced? Feel free to also add screenshots or steps to view the
changes if they're visual. -->
Fixes an issue where project role deletion validation didn't validate
against project roles being connected to groups
https://linear.app/unleash/issue/2-1311/add-a-new-prometheus-metric-with-custom-root-roles-in-use
As a follow-up to https://github.com/Unleash/unleash/pull/4435, this PR
adds a metric for total custom root roles in use by at least one entity:
users, service accounts, groups.
`custom_root_roles_in_use_total`
Output from `http://localhost:4242/internal-backstage/prometheus`:
```
# HELP process_cpu_user_seconds_total Total user CPU time spent in seconds.
# TYPE process_cpu_user_seconds_total counter
process_cpu_user_seconds_total 0.060755
# HELP process_cpu_system_seconds_total Total system CPU time spent in seconds.
# TYPE process_cpu_system_seconds_total counter
process_cpu_system_seconds_total 0.01666
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0.077415
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1691420275
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 199196672
# HELP nodejs_eventloop_lag_seconds Lag of event loop in seconds.
# TYPE nodejs_eventloop_lag_seconds gauge
nodejs_eventloop_lag_seconds 0
# HELP nodejs_eventloop_lag_min_seconds The minimum recorded event loop delay.
# TYPE nodejs_eventloop_lag_min_seconds gauge
nodejs_eventloop_lag_min_seconds 0.009076736
# HELP nodejs_eventloop_lag_max_seconds The maximum recorded event loop delay.
# TYPE nodejs_eventloop_lag_max_seconds gauge
nodejs_eventloop_lag_max_seconds 0.037683199
# HELP nodejs_eventloop_lag_mean_seconds The mean of the recorded event loop delays.
# TYPE nodejs_eventloop_lag_mean_seconds gauge
nodejs_eventloop_lag_mean_seconds 0.011063251638989169
# HELP nodejs_eventloop_lag_stddev_seconds The standard deviation of the recorded event loop delays.
# TYPE nodejs_eventloop_lag_stddev_seconds gauge
nodejs_eventloop_lag_stddev_seconds 0.0013618102764025837
# HELP nodejs_eventloop_lag_p50_seconds The 50th percentile of the recorded event loop delays.
# TYPE nodejs_eventloop_lag_p50_seconds gauge
nodejs_eventloop_lag_p50_seconds 0.011051007
# HELP nodejs_eventloop_lag_p90_seconds The 90th percentile of the recorded event loop delays.
# TYPE nodejs_eventloop_lag_p90_seconds gauge
nodejs_eventloop_lag_p90_seconds 0.011321343
# HELP nodejs_eventloop_lag_p99_seconds The 99th percentile of the recorded event loop delays.
# TYPE nodejs_eventloop_lag_p99_seconds gauge
nodejs_eventloop_lag_p99_seconds 0.013688831
# HELP nodejs_active_resources Number of active resources that are currently keeping the event loop alive, grouped by async resource type.
# TYPE nodejs_active_resources gauge
nodejs_active_resources{type="FSReqCallback"} 1
nodejs_active_resources{type="TTYWrap"} 3
nodejs_active_resources{type="TCPSocketWrap"} 5
nodejs_active_resources{type="TCPServerWrap"} 1
nodejs_active_resources{type="Timeout"} 1
nodejs_active_resources{type="Immediate"} 1
# HELP nodejs_active_resources_total Total number of active resources.
# TYPE nodejs_active_resources_total gauge
nodejs_active_resources_total 12
# HELP nodejs_active_handles Number of active libuv handles grouped by handle type. Every handle type is C++ class name.
# TYPE nodejs_active_handles gauge
nodejs_active_handles{type="WriteStream"} 2
nodejs_active_handles{type="ReadStream"} 1
nodejs_active_handles{type="Socket"} 5
nodejs_active_handles{type="Server"} 1
# HELP nodejs_active_handles_total Total number of active handles.
# TYPE nodejs_active_handles_total gauge
nodejs_active_handles_total 9
# HELP nodejs_active_requests Number of active libuv requests grouped by request type. Every request type is C++ class name.
# TYPE nodejs_active_requests gauge
nodejs_active_requests{type="FSReqCallback"} 1
# HELP nodejs_active_requests_total Total number of active requests.
# TYPE nodejs_active_requests_total gauge
nodejs_active_requests_total 1
# HELP nodejs_heap_size_total_bytes Process heap size from Node.js in bytes.
# TYPE nodejs_heap_size_total_bytes gauge
nodejs_heap_size_total_bytes 118587392
# HELP nodejs_heap_size_used_bytes Process heap size used from Node.js in bytes.
# TYPE nodejs_heap_size_used_bytes gauge
nodejs_heap_size_used_bytes 89642552
# HELP nodejs_external_memory_bytes Node.js external memory size in bytes.
# TYPE nodejs_external_memory_bytes gauge
nodejs_external_memory_bytes 1601594
# HELP nodejs_heap_space_size_total_bytes Process heap space size total from Node.js in bytes.
# TYPE nodejs_heap_space_size_total_bytes gauge
nodejs_heap_space_size_total_bytes{space="read_only"} 0
nodejs_heap_space_size_total_bytes{space="old"} 70139904
nodejs_heap_space_size_total_bytes{space="code"} 3588096
nodejs_heap_space_size_total_bytes{space="map"} 2899968
nodejs_heap_space_size_total_bytes{space="large_object"} 7258112
nodejs_heap_space_size_total_bytes{space="code_large_object"} 1146880
nodejs_heap_space_size_total_bytes{space="new_large_object"} 0
nodejs_heap_space_size_total_bytes{space="new"} 33554432
# HELP nodejs_heap_space_size_used_bytes Process heap space size used from Node.js in bytes.
# TYPE nodejs_heap_space_size_used_bytes gauge
nodejs_heap_space_size_used_bytes{space="read_only"} 0
nodejs_heap_space_size_used_bytes{space="old"} 66992120
nodejs_heap_space_size_used_bytes{space="code"} 2892640
nodejs_heap_space_size_used_bytes{space="map"} 2519280
nodejs_heap_space_size_used_bytes{space="large_object"} 7026824
nodejs_heap_space_size_used_bytes{space="code_large_object"} 983200
nodejs_heap_space_size_used_bytes{space="new_large_object"} 0
nodejs_heap_space_size_used_bytes{space="new"} 9236136
# HELP nodejs_heap_space_size_available_bytes Process heap space size available from Node.js in bytes.
# TYPE nodejs_heap_space_size_available_bytes gauge
nodejs_heap_space_size_available_bytes{space="read_only"} 0
nodejs_heap_space_size_available_bytes{space="old"} 1898360
nodejs_heap_space_size_available_bytes{space="code"} 7328
nodejs_heap_space_size_available_bytes{space="map"} 327888
nodejs_heap_space_size_available_bytes{space="large_object"} 0
nodejs_heap_space_size_available_bytes{space="code_large_object"} 0
nodejs_heap_space_size_available_bytes{space="new_large_object"} 16495616
nodejs_heap_space_size_available_bytes{space="new"} 7259480
# HELP nodejs_version_info Node.js version info.
# TYPE nodejs_version_info gauge
nodejs_version_info{version="v18.16.0",major="18",minor="16",patch="0"} 1
# HELP nodejs_gc_duration_seconds Garbage collection duration by kind, one of major, minor, incremental or weakcb.
# TYPE nodejs_gc_duration_seconds histogram
# HELP http_request_duration_milliseconds App response time
# TYPE http_request_duration_milliseconds summary
# HELP db_query_duration_seconds DB query duration time
# TYPE db_query_duration_seconds summary
db_query_duration_seconds{quantile="0.1",store="api-tokens",action="getAllActive"} 0.03091475
db_query_duration_seconds{quantile="0.5",store="api-tokens",action="getAllActive"} 0.03091475
db_query_duration_seconds{quantile="0.9",store="api-tokens",action="getAllActive"} 0.03091475
db_query_duration_seconds{quantile="0.95",store="api-tokens",action="getAllActive"} 0.03091475
db_query_duration_seconds{quantile="0.99",store="api-tokens",action="getAllActive"} 0.03091475
db_query_duration_seconds_sum{store="api-tokens",action="getAllActive"} 0.03091475
db_query_duration_seconds_count{store="api-tokens",action="getAllActive"} 1
# HELP feature_toggle_update_total Number of times a toggle has been updated. Environment label would be "n/a" when it is not available, e.g. when a feature toggle is created.
# TYPE feature_toggle_update_total counter
# HELP feature_toggle_usage_total Number of times a feature toggle has been used
# TYPE feature_toggle_usage_total counter
# HELP feature_toggles_total Number of feature toggles
# TYPE feature_toggles_total gauge
feature_toggles_total{version="5.3.0"} 31
# HELP users_total Number of users
# TYPE users_total gauge
users_total 1011
# HELP projects_total Number of projects
# TYPE projects_total gauge
projects_total 4
# HELP environments_total Number of environments
# TYPE environments_total gauge
environments_total 10
# HELP groups_total Number of groups
# TYPE groups_total gauge
groups_total 5
# HELP roles_total Number of roles
# TYPE roles_total gauge
roles_total 11
# HELP custom_root_roles_total Number of custom root roles
# TYPE custom_root_roles_total gauge
custom_root_roles_total 3
# HELP custom_root_roles_in_use_total Number of custom root roles in use
# TYPE custom_root_roles_in_use_total gauge
custom_root_roles_in_use_total 2
# HELP segments_total Number of segments
# TYPE segments_total gauge
segments_total 5
# HELP context_total Number of context
# TYPE context_total gauge
context_total 7
# HELP strategies_total Number of strategies
# TYPE strategies_total gauge
strategies_total 5
# HELP client_apps_total Number of registered client apps aggregated by range by last seen
# TYPE client_apps_total gauge
client_apps_total{range="allTime"} 0
client_apps_total{range="30d"} 0
client_apps_total{range="7d"} 0
# HELP saml_enabled Whether SAML is enabled
# TYPE saml_enabled gauge
saml_enabled 1
# HELP oidc_enabled Whether OIDC is enabled
# TYPE oidc_enabled gauge
oidc_enabled 0
# HELP client_sdk_versions Which sdk versions are being used
# TYPE client_sdk_versions counter
# HELP optimal_304_diffing Count the Optimal 304 diffing with status
# TYPE optimal_304_diffing counter
# HELP db_pool_min Minimum DB pool size
# TYPE db_pool_min gauge
db_pool_min 0
# HELP db_pool_max Maximum DB pool size
# TYPE db_pool_max gauge
db_pool_max 4
# HELP db_pool_free Current free connections in DB pool
# TYPE db_pool_free gauge
db_pool_free 0
# HELP db_pool_used Current connections in use in DB pool
# TYPE db_pool_used gauge
db_pool_used 4
# HELP db_pool_pending_creates how many asynchronous create calls are running in DB pool
# TYPE db_pool_pending_creates gauge
db_pool_pending_creates 0
# HELP db_pool_pending_acquires how many acquires are waiting for a resource to be released in DB pool
# TYPE db_pool_pending_acquires gauge
db_pool_pending_acquires 24
```
In a new fresh Unleash instance with cache enabled this can cause
feature toggles to never get updated.
We saw in our client that the ETag was ETag: "60e35fba:null" Which
looked incorrect for us.
I also did manual testing and if the andWhere had a value of largerThan
higher than whatever the id was then we would get back { max: null }.
This should fix that issue.
## About the changes
We are losing some events because of not having "created by" which is
required by a constraint in the DB. One scenario where this can happen
is with the default user admin, because it doesn't have a username or an
email (unless configured by the administrator).
Our code makes assumptions on the existence of one of these 2 attributes
(e.g.
248118af7c/src/lib/services/user-service.ts (L220-L222)).
Event lost metrics:
![Screenshot from 2023-07-24
14-17-20](https://github.com/Unleash/unleash/assets/455064/9b406ad0-bbcb-4263-98dc-74ddd307a5a2)
## Discussion points
The solution proposed here is falling back to a default value. I've
chosen `"admin"` because it covers one of the use cases, but it can also
be `"system"` mimicking
248118af7c/src/lib/services/user-service.ts (L32)
which is used as a default, or `"unknown"` which is sometimes used as a
default:
248118af7c/src/lib/services/project-service.ts (L57)
Anyway, I believe it's better not to lose the event rather than be
accurate with the "created by" that can be fixed later