In preparation for v6, this PR removes usage and references to
`error.description` instead favoring `error.message` (as mentioned
#4380)
I found no references in the front end, so this might be (I believe it
to be) all the required changes.
This PR is part of #4380 - Remove legacy `/api/feature` endpoint.
## About the changes
### Frontend
- Removes the useFeatures hook
- Removes the part of StrategyView that displays features using this
strategy (not been working since v4.4)
- Removes 2 unused features entries from routes
### Backend
- Removes the /api/admin/features endpoint
- Moves a couple of non-feature related tests (auth etc) to use
/admin/projects endpoint instead
- Removes a test that was directly related to the removed endpoint
- Moves a couple of tests to the projects/features endpoint
- Reworks some tests to fetch features from projects features endpoint
and strategies from project strategies
## About the changes
EdgeService is the only place where we use active tokens validation in
bulk. By switching to validating from the cache, we no longer need a
method to return all active tokens from the DB.
We talked with @nunogois that this test is testing migration from 2023,
but time has passed and the migration is working properly, so we think
to cut down on test run time, we can remove it.
Adds a postgres_version gauge to allow us to see postgres_version in
prometheus and to post it upstream when version checking. Depends on
https://github.com/bricks-software/version-function/pull/20 to be merged
first to ensure our version-function doesn't crash when given the
postgres-version data.
I've tried to use/add the audit info to all events I could see/find.
This makes this PR necessarily huge, because we do store quite a few
events.
I realise it might not be complete yet, but tests
run green, and I think we now have a pattern to follow for other events.
We encountered an issue with a customer because this query was returning
3 million rows. The problem arose from each instance reporting
approximately 100 features, with a total of 30,000 instances. The query
was joining these, thus multiplying the data. This approach was fine for
a reasonable number of instances, but in this extreme case, it did not
perform well.
This PR modifies the logic; instead of performing outright joins, we are
now grouping features by environment into an array, resulting in just
one row returned per instance.
I tested locally with the same dataset. Previously, loading this large
instance took about 21 seconds; now it has reduced to 2 seconds.
Although this is still significant, the dataset is extensive.
Previously, we were extracting the project from the token, but now we
will retrieve it from the session, which contains the full list of
projects.
This change also resolves an issue we encountered when the token was a
multi-project token, formatted as []:dev:token. Previously, it was
unable to display the exact list of projects. Now, it will show the
exact project names.
## About the changes
- Removes the feature flag for the created_by migrations.
- Adds a configuration option in IServerOption for
`ENABLE_SCHEDULED_CREATED_BY_MIGRATION` that defaults to `false`
- the new configuration option when set on startup enables scheduling of
the two created_by migration services (features+events)
- Removes the dependency on flag provider in EventStore as it's no
longer needed
- Adds a brief description of the new configuration option in
`configuring-unleash.md`
- Sets the events created_by migration interval to 15 minutes, up from
2.
---------
Co-authored-by: Gastón Fournier <gaston@getunleash.io>
## About the changes
This PR establishes a simple yet effective mechanism to avoid DDoS
against our DB while also protecting against memory leaks.
This will enable us to release the flag `queryMissingTokens` to make our
token validation consistent across different nodes
---------
Co-authored-by: Nuno Góis <github@nunogois.com>
The changes to arbitraries here is to make typescript agree with our
schema types. Seems like somewhere between 4.8.4 and 5.4.2, typescript
got stricter.
This column has not been used for 1.5 years and was replace by
**archived_at** column and people still get confused of why this is not
working as name suggests. Removing this column to remove technical debt.
## About the changes
Some of our metrics are not labeled correctly, one example is
`<base-path>/api/frontend/client/metrics` is labeled as
`/client/metrics`. We can see that in internal-backstage/prometheus:
![image](https://github.com/Unleash/unleash/assets/455064/0d8f1f40-8b5b-49d4-8a88-70b523e9be09)
This issue affects all endpoints that fail to validate the request body.
Also, endpoints that are rejected by the authorization-middleware or the
api-token-middleware are reported as `(hidden)`.
To gain more insights on our api usage but being protective of metrics
cardinality we're prefixing `(hidden)` with some well known base urls:
https://github.com/Unleash/unleash/pull/6400/files#diff-1ed998ca46ffc97c9c0d5d400bfd982dbffdb3004b78a230a8a38e7644eee9b6R17-R33
## How to reproduce:
Make an invalid call to metrics (e.g. stop set to null), then check
/internal-backstage/prometheus and find the 400 error. Expected to be at
`path="/api/client/metrics"` but will have `path=""`:
```shell
curl -H"Authorization: *:development.unleash-insecure-client-api-token" -H'Content-type: application/json' localhost:4242/api/client/metrics -d '{
"appName": "bash-test",
"instanceId": "application-name-dacb1234",
"environment": "development",
"bucket": {
"start": "2023-07-27T11:23:44Z",
"stop": null,
"toggles": {
"myCoolToggle": {
"yes": 25,
"no": 42,
"variants": {
"blue": 6,
"green": 15,
"red": 46
}
},
"myOtherToggle": {
"yes": 0,
"no": 100
}
}
}
}'
```
In order to stop privilege escalation via
`/api/admin/projects/:project/users/:userId/roles` and
`/api/admin/projects/:project/groups/:groupId/roles` this PR adds the
same check we added to setAccess methods to the methods updating access
for these two methods.
Also adds tests that verify that we throw an exception if you try to
assign roles you do not have.
Thank you @nunogois for spotting this during testing.
## About the changes
When edge is configured to automatically generate tokens, it requires
the token to be present in all unleash instances.
It's behind a flag which enables us to turn it on on a case by case
scenario.
The risk of this implementation is that we'd be adding load to the
database in the middleware that evaluates tokens (which are present in
mostly all our API calls. We only query when the token is missing but
because the /client and /frontend endpoints which will be the affected
ones are high throughput, we want to be extra careful to avoid DDoSing
ourselves
## Alternatives:
One alternative would be that we merge the two endpoints into one.
Currently, Edge does the following:
If the token is not valid, it tries to create a token using a service
account token and /api/admin/create-token endpoint. Then it uses the
token generated (which is returned from the prior endpoint) to query
/api/frontend. What if we could call /api/frontend with the same service
account we use to create the token? It may sound risky but if the same
application holding the service account token with permission to create
a token, can call /api/frontend via the generated token, shouldn't it be
able to call the endpoint directly?
The purpose of the token is authentication and authorization. With the
two tokens we are authenticating the same app with 2 different
authorization scopes, but because it's the same app we are
authenticating, can't we just use one token and assume that the app has
both scopes?
If the service account already has permissions to create a token and
then use that token for further actions, allowing it to directly call
/api/frontend does not necessarily introduce new security risks. The
only risk is allowing the app to generate new tokens. Which leads to the
third alternative: should we just remove this option from edge?
This PR adds a property issues to application schema, and also adds all
the missing features that have been reported by SDK, but do not exist in
Unleash.
In order to prevent users from being able to assign roles/permissions
they don't have, this PR adds a check that the user performing the
action either is Admin, Project owner or has the same role they are
trying to grant/add.
This addAccess method is only used from Enterprise, so there will be a
separate PR there, updating how we return the roles list for a user, so
that our frontend can only present the roles a user is actually allowed
to grant.
This adds the validation to the backend to ensure that even if the
frontend thinks we're allowed to add any role to any user here, the
backend can be smart enough to stop it.
We should still update frontend as well, so that it doesn't look like we
can add roles we won't be allowed to.
Fixes ##5799 and #5785
When you do not provide a token we should resolve to the "default"
environment to maintain backward compatibility. If you actually provide
a token we should prefer that and even block the request if it is not
valid.
An interesting fact is that "default" environment is not available on a
fresh installation of Unleash. This means that you need to provide a
token to actually get access to toggle configurations.
---------
Co-authored-by: Thomas Heartman <thomas@getunleash.io>
## About the changes
Implements a new store for collected traffic data usage that connects to
the new table `stat_traffic_data` primary key'd on [day, trafficGroup,
status_code_series].
Day being a date
Traffic group being which endpoint is being counted for, ie /api/admin,
/api/frontend etc
Status code series grouping 2xx status responses and 304 into their
respective 200 / 300 series.
No service here, this is for pro/enterprise
## About the changes
App stats is mainly used to cap the number of applications reported to
Unleash based on the last 7 days information:
cc2ccb1134/src/lib/middleware/response-time-metrics.ts (L24-L28)
Instead of getting all stats, just calculate appCount statistics
Use scheduler service instead of setInterval
## About the changes
getAllActive from api-tokens store is the second most frequent query
![image](https://github.com/Unleash/unleash/assets/455064/63c5ae76-bb62-41b2-95b4-82aca59a7c16)
To prevent starving our db connections, we can cache this data that
rarely changes and clear the cache when we see changes. Because we will
only clear changes in the node receiving the change we're only caching
the data for 1 minute.
This should give us some room to test if this solution will work
---------
Co-authored-by: Nuno Góis <github@nunogois.com>
Adds a new Inactive Users list component to admin/users for easier cleanup of users that are counted as inactive: No sign of activity (logins or api token usage) in the last 180 days.
---------
Co-authored-by: David Leek <david@getunleash.io>
## About the changes
the created_by_user_id data migration from resolving events.created_by
(for both events and features) now emits events on how many rows were
updated.
Adds listeners for these events that records these metrics with
prometheus
![image](https://github.com/Unleash/unleash/assets/707867/3bb02645-0919-4a9a-83fe-a07383ac0be1)
## About the changes
Schedules a best-effort task setting the value of
events.created_by_user_id based on what is found in the created_by
column and if it's capable of resolving that to a userid/a system id.
The process is executed in the events-store, it takes a chunk of events
that haven't been processed yet, attempts to join users and api_tokens
tables on created_by = username/email, loops through and tries to figure
out an id to set. Then updates the record.
---------
Co-authored-by: Gastón Fournier <gaston@getunleash.io>
## About the changes
Sets data migration of features and events created_by_user_id to
disabled by default
Map to promise and await all in created by user id migration for features
## About the changes
Adds a scheduled task that every 5 seconds updates 500 entries in the
features table setting `created_by_user_id`.
It does this by looking at the related event, checks created_by and
joins users table for match on username or email, and joins api_tokens
table on username matches. Then picks either a users id if set, or uses
-42 (admin token user)
Only triggers if there is any rows in client instances that have
sdk_version: unleash-edge with version < 17.0.0
The function that checks this memoizes the check for 10 minutes to avoid
scanning the client instances table too often.
## About the changes
Whenever we get a call from an admin token we want to associate it with
the [admin token
user](4d42093a07/src/lib/types/core.ts (L34-L41)).
This should give us the needed audit for this type of calls that
currently were lacking a user id (we only stored a string with the token
name in the event log).
We consciously decided not to use `id` as the property to prevent any
unforeseen side effects. The reason is that only `IUser` type has an id
and adding an id to `IApiUser` might lead to confusion.
Lots of work here, mostly because I didn't want to turn off the
`noImplicitAnyLet` lint. This PR tries its best to type all the untyped
lets biome complained about (Don't ask me how many hours that took or
how many lints that was >200...), which in the future will force test
authors to actually type their global variables setup in `beforeAll`.
---------
Co-authored-by: Gastón Fournier <gaston@getunleash.io>
Was having some trouble running these migration tests locally due to
`dbm` not correctly picking up the passed in config. This fixes it by
setting the custom config property after it has been initialized, always
overriding any wrong values.
PS: I think I found the issue. `dbm` was prioritizing my `DATABASE_URL`
for some reason, as I started having issues when it was set, and stopped
having issues when I unset it.
I still think this is a good change, as it prevents similar
hard-to-debug issues in the future.
To help clarify this, running this locally:
- `export
DATABASE_URL=postgres://unleash_user:passord@localhost:5432/unleash`
- `yarn test dedupe-permissions`
Fails on `main`, but passes on this branch. For some reason the `dbm`
instance prioritizes whatever is set in `DATABASE_URL` instead of the
options that are passed in `getInstance`.
## About the changes
Migrations for:
- Adds column is_system to users
- Inserts unleash_system_user id -1337 to users
includes `is_system: false` in the activeUsers and activeAccounts where filter
Tested by running:
`
select * into users_pre_check from users where id > -1;
delete from users where id > -1;
`
before starting unleash, then inspecting users table after unleash has
started and verifying that an 'admin' user has been created.
---------
Co-authored-by: Christopher Kolstad <chriswk@getunleash.ai>
## About the changes
Adds the new nullable column created_by_user_id to the data used by
feature-tag-store and feature-tag-service. Also updates openapi schemas.