## About the changes
This PR establishes a simple yet effective mechanism to avoid DDoS
against our DB while also protecting against memory leaks.
This will enable us to release the flag `queryMissingTokens` to make our
token validation consistent across different nodes
---------
Co-authored-by: Nuno Góis <github@nunogois.com>
Converts `newContextFieldUI` release flag to
`disableShowContextFieldSelectionValues` kill switch.
The kill switch controls whether we show the value selection above the
search filed when > 100 values
---------
Signed-off-by: andreas-unleash <andreas@getunleash.ai>
Adds a bearer token middleware that adds support for tokens prefixed
with "Bearer" scheme. Prefixing with "Bearer" is optional and the old
way of authenticating still works, so we now support both ways.
Also, added as part of our OpenAPI spec which now displays authorization
as follows:
![image](https://github.com/Unleash/unleash/assets/455064/77b17342-2315-4c08-bf34-4655e12a1cc3)
Related to #4630. Doesn't fully close the issue as we're still using
some invalid characters for the RFC, in particular `*` and `[]`
For safety reasons this is behind a feature flag
---------
Co-authored-by: Gastón Fournier <gaston@getunleash.io>
This change fixes the OpenAPI schema to disallow non-string properties
on the top level of the context (except, of course, the `properties`
object).
This means that we'll no longer be seeing issues with rendering
invalid contexts, because we don't accept them in the first place.
This solution comes with some tradeoffs discussed in the [PR](https://github.com/Unleash/unleash/pull/6676). Following on from that, this solution isn't optimal, but it's a good stop gap. A better solution (proposed in the PR discussion) has been added as an idea for future projects.
The bulk of the discussion around the solution is included here for reference:
@kwasniew:
Was it possible to pass non string properties with our UI before?
Is there a chance that something will break after this change?
@thomasheartman:
Good question and good looking out 😄
You **could** pass non-string, top-level properties into the API before. In other words, this would be allowed:
```js
{
appName: "my-app",
nested: { object: "accepted" }
}
```
But notably, non-string values under `properties` would **not** be accepted:
```js
{
appName: "my-app",
properties: {
nested: { object: "not accepted" }
}
}
```
**However**, the values would not contribute to the evaluation of any constraints (because their type is invalid), so they would effectively be ignored.
Now, however, you'll instead get a 400 saying that the "nested" value must be a string.
I would consider this a bug fix because:
- if you sent a nested object before, it was most likely an oversight
- if you sent the nested object on purpose, expecting it to work, you would be perplexed as to why it didn't work, as the API accepted it happily
Furthermore, the UI will also tell you that the property must be a string now if you try to do it from the UI.
On the other hand, this does mean that while you could send absolute garbage in before and we would just ignore it, we don't do that anymore. This does go against how we allow you to send anything for pretty much all other objects in our API.
However, the SDK context is special. Arbitrary keys aren't ignored, they're actually part of the context itself and as such should have a valid value.
So if anything breaks, I think it breaks in a way that tells you why something wasn't working before. However, I'd love to hear your take on it and we can re-evaluate whether this is the right fix, if you think it isn't.
@kwasniew:
Coming from the https://en.wikipedia.org/wiki/Robustness_principle mindset I'm thinking if ignoring the fields that are incorrect wouldn't be a better option. So we'd accept incorrect value and drop it instead of:
* failing with client error (as this PR) or
* saving incorrect value (as previous code we had)
@thomasheartman:
Yeah, I considered that too. In fact, that was my initial idea (for the reason you stated). However, there's a couple tradeoffs here (as always):
1. If we just ignore those values, the end user doesn't know what's happened unless they go and dig through the responses. And even then, they don't necessarily know why the value is gone.
2. As mentioned, for the context, arbitrary keys can't be ignored, because we use them to build the context. In other words, they're actually invalid input.
Now, I agree that you should be liberal in what you accept and try to handle things gracefully, but that means you need to have a sensible default to fall back to. Or, to quote the Wikipedia article (selectively; with added emphasis):
> programs that receive messages should accept non-conformant input **as long as the meaning is clear**.
In this case, the meaning isn't clear when you send extra context values that aren't strings.
For instance, what's the meaning here:
```js
{
appName: "my-app",
nested: { object: "accepted", more: { further: "nesting" } }
}
```
If you were trying to use the `nested` value as an object, then that won't work. Ideally, you should be alerted.
Should we "unwind" the object and add all string keys as context values? That doesn't sound very feasible **or** necessarily like the right thing.
Did you just intend to use the `appName` and for the `nested` object to be ignored?
And it's because of this caveat that I'm not convinced just ignoring the keys are the right thing to do. Because if you do, the user never knows they were ignored or why.
----
**However**, I'd be in favor of ignoring they keys if we could **also** give the users warnings at the same time. (Something like what we do in the CR api, right? Success with warnings?)
If we can tell the user that "we ignored the `a`, `b`, and `c` keys in the context you sent because they are invalid values. Here is the result of the evaluation without taking those keys into account: [...]", then I think that's the ideal solution.
But of course, the tradeoff is that that increases the complexity of the API and the complexity of the task. It also requires UI adjustments etc. This means that it's not a simple fix anymore, but more of a mini-project.
But, in the spirit of the playground, I think it would be a worthwhile thing to do because it helps people learn and understand how Unleash works.
Via the API you can currently create gradualRollout strategies without
any parameters set, when visiting the UI afterwards, you can edit this,
because the UI reads the parameter list from the database and sees that
some parameters are required, and refuses to accept the data. This PR
adds defaults for gradualRollout strategies created from the API, making
sure gradual rollout strategies always have `rollout`, `groupId` and
`stickiness` set.
Provides store method for retrieving traffic usage data based on
period parameter, and UI + ui hook with the new chart for displaying
traffic usage data spread out over selectable month.
![Skjermbilde 2024-03-21 kl 12 40
38](https://github.com/Unleash/unleash/assets/707867/539c6c98-b6f6-488a-97fb-baf4fccec687)
In this PR we copied and adapted a plugin written by DX for highlighting
a column in the chart:
![image](https://github.com/Unleash/unleash/assets/707867/70532b22-44ed-44c0-a9b4-75f65ed6a63d)
There are some minor improvements planned which will come in a separate
PR, reversing the order in legend and tooltip so the colors go from
light to dark, and adding a month -sum below the legend
## Discussion points
- Should any of this be extracted as a separate reusable component?
---------
Co-authored-by: Nuno Góis <github@nunogois.com>
## About the changes
There seems to be a typo in the authorization header. We're keeping the
old typo as preferred just in case, but if not present we'll default to
the authorization header (not authorisation).
Not sure about the impact of this bug, as all registrations might be
using default project.
The changes to arbitraries here is to make typescript agree with our
schema types. Seems like somewhere between 4.8.4 and 5.4.2, typescript
got stricter.
## About the changes
We see some logs with: `Failed to store events: Error: The query is
empty` which suggests we're not sending events to batchStore. This will
help us confirm that and will give us better insights
An entrypoint for enterprise to register a hook which will be called
before the database and scheduler services are torn down. That way
enterprise can also perform graceful shutdown.
There was no need to join the entire metrics table, as it is a huge
table. We only needed all combinations of app_name, environment, and
feature_name. The new query retrieves all this data, which will then be
joined into the main query.
To check that users do indeed have permissions to update the roles from
project-service, we've been depending on req.user.id.
We had one error on Friday March 8th, where we managed to send
undefined/null to a method that requires a number. This PR assumes that
if we have an API token, and we have admin permissions and userId is not
set we're a legacy admin token.
It uses the util method for extractUserId(req: IAuthRequest | IApiRequest), so if we've passed through the apiTokenMiddleware first, we'll have userId -42, if we haven't, we'll get -1337.
Now frontend API requests will be counted separately under
getAllByfrontend. We are already tracking new FE db calls, so we can
build grafana dashboard.
https://linear.app/unleash/issue/2-2022/improve-actions-validation
Improves our current actions form validation.
Empty actions are now ignored on the payload and we get errors in
actions where any of the required fields are empty.
Also refactored our current actions into a constant map that can be
shared across frontend and backend.
## About the changes
This ports the CRUD store into OSS which is an abstraction to reduce the
amount of boilerplate code we have to write in stores.
By extending CRUDStore, the store becomes simply the type definition:
```typescript
type ActionModel = {
actionSetId: number;
action: string;
executionParams: Record<string, unknown>;
createdByUserId: number;
sortOrder: number;
};
export class ActionStore extends CRUDStore<
ActionModel & { id: number; createdAt: Date },
ActionModel
> {
}
```
And eventually specific mappings between those types can be provided (if
the mapping is more complex than camelCase -> snake_case):
```typescript
toRow: ({ project, name, actor, match, createdByUserId }) => ({
created_by_user_id: createdByUserId,
project,
name,
actor_id: actor,
source: match.source,
source_id: match.sourceId,
payload: match.payload,
}),
fromRow: ({
id,
created_at,
created_by_user_id,
project,
name,
actor_id,
source,
source_id,
payload,
}) => ({
id,
createdAt: created_at,
createdByUserId: created_by_user_id,
project,
name,
actor: actor_id,
match: {
source,
sourceId: source_id,
payload,
},
}),
```
Stores can also be extended to include additional functionality in case
you need to join with another table or do an aggregation, but it
significantly reduces the amount of boilerplate code needed to create a
basic store
We should not try to release the migration lock if where unable to
acquire it. By trying to close it when we have not successfully
connected to the database we end up hanging for a while before the
process is eventually killed.
I did not add a better error-message, as Unleash now gives a better
error stack and crashes immediate if you start without a database
password. We should still consider if you need to specify db credentials
or not. Technically it is possible to have a postgres without a password
(but it is likely not common).
Closes: #6408
## About the changes
We don't have a meaningful error for limits established by the
application. This could be a good starting point.
The error code is 400 cause I couldn't find anything better.
The name of the error was picked from UnleashApiErrorTypes:
2d8e9f87ff/src/lib/error/unleash-error.ts (L4-L34)
This column has not been used for 1.5 years and was replace by
**archived_at** column and people still get confused of why this is not
working as name suggests. Removing this column to remove technical debt.
## About the changes
Some of our metrics are not labeled correctly, one example is
`<base-path>/api/frontend/client/metrics` is labeled as
`/client/metrics`. We can see that in internal-backstage/prometheus:
![image](https://github.com/Unleash/unleash/assets/455064/0d8f1f40-8b5b-49d4-8a88-70b523e9be09)
This issue affects all endpoints that fail to validate the request body.
Also, endpoints that are rejected by the authorization-middleware or the
api-token-middleware are reported as `(hidden)`.
To gain more insights on our api usage but being protective of metrics
cardinality we're prefixing `(hidden)` with some well known base urls:
https://github.com/Unleash/unleash/pull/6400/files#diff-1ed998ca46ffc97c9c0d5d400bfd982dbffdb3004b78a230a8a38e7644eee9b6R17-R33
## How to reproduce:
Make an invalid call to metrics (e.g. stop set to null), then check
/internal-backstage/prometheus and find the 400 error. Expected to be at
`path="/api/client/metrics"` but will have `path=""`:
```shell
curl -H"Authorization: *:development.unleash-insecure-client-api-token" -H'Content-type: application/json' localhost:4242/api/client/metrics -d '{
"appName": "bash-test",
"instanceId": "application-name-dacb1234",
"environment": "development",
"bucket": {
"start": "2023-07-27T11:23:44Z",
"stop": null,
"toggles": {
"myCoolToggle": {
"yes": 25,
"no": 42,
"variants": {
"blue": 6,
"green": 15,
"red": 46
}
},
"myOtherToggle": {
"yes": 0,
"no": 100
}
}
}
}'
```
SCIM synchronizations requires a stable id no matter how many changes
are made to username and email (our other unique fields). In addition,
exposing internal incremented database ids to an external service (our
current id field) feels insecure. Our plan is to create either a uuidv7
or ulid when scim operations are performed against the user, so the
external scim provisioner has a stable globally unique id to use to
refer to the users they're modifying.
In order to stop privilege escalation via
`/api/admin/projects/:project/users/:userId/roles` and
`/api/admin/projects/:project/groups/:groupId/roles` this PR adds the
same check we added to setAccess methods to the methods updating access
for these two methods.
Also adds tests that verify that we throw an exception if you try to
assign roles you do not have.
Thank you @nunogois for spotting this during testing.
So, since our assumption about client instances ended up being wrong (or, less than stable).
This PR moves the EdgeUpgradeBanner to be displayed if the featureflag
displayEdgeBanner is enabled. That way, if customers comes back and says
they have upgraded but still get the banner, we can remove them from the
segment.
## About the changes
When edge is configured to automatically generate tokens, it requires
the token to be present in all unleash instances.
It's behind a flag which enables us to turn it on on a case by case
scenario.
The risk of this implementation is that we'd be adding load to the
database in the middleware that evaluates tokens (which are present in
mostly all our API calls. We only query when the token is missing but
because the /client and /frontend endpoints which will be the affected
ones are high throughput, we want to be extra careful to avoid DDoSing
ourselves
## Alternatives:
One alternative would be that we merge the two endpoints into one.
Currently, Edge does the following:
If the token is not valid, it tries to create a token using a service
account token and /api/admin/create-token endpoint. Then it uses the
token generated (which is returned from the prior endpoint) to query
/api/frontend. What if we could call /api/frontend with the same service
account we use to create the token? It may sound risky but if the same
application holding the service account token with permission to create
a token, can call /api/frontend via the generated token, shouldn't it be
able to call the endpoint directly?
The purpose of the token is authentication and authorization. With the
two tokens we are authenticating the same app with 2 different
authorization scopes, but because it's the same app we are
authenticating, can't we just use one token and assume that the app has
both scopes?
If the service account already has permissions to create a token and
then use that token for further actions, allowing it to directly call
/api/frontend does not necessarily introduce new security risks. The
only risk is allowing the app to generate new tokens. Which leads to the
third alternative: should we just remove this option from edge?
Since we're polling for updates to max revision id every second, and
listening for update events for revision id in the proxy repository then
running a refresh interval of 20secs in the proxy repo refresh seems
excessive.
This PR changes the frequency of the refresh to once per 45mins.
This PR adds a property issues to application schema, and also adds all
the missing features that have been reported by SDK, but do not exist in
Unleash.
Adds a migration to create and fill the `project_metrics_summary_trends`
This table is to be used in enterprise to update the metrics data daily
per project (after the aggregation of the hourly data)
Driving force for this was doing performance testing on the executive
dashboard.
This will allow to remove the expensive query to aggregate the data on
demand and will drop the total response time from 2.5sec to 125ms
(measurements done with 100 Projects, 10000 features and over 1M rows in
`client_metrics_env_daily`
Closes #
[1-2080](https://linear.app/unleash/issue/1-2080/create-the-project-metrics-summary-trends-table)
---------
Signed-off-by: andreas-unleash <andreas@getunleash.ai>
Follow up to: https://github.com/Unleash/unleash/pull/6298
We no longer need this table, since it was superseded by `action_events`
and is no longer used.
I believe it's safe to drop this table right away since the feature is
not being used yet.
https://linear.app/unleash/issue/2-1962/implement-new-action-events-logic
Adds a new `action_set_events` table for the new action events logic.
Even though observable events are technically immutable, we're storing
the observable event along with the action set event. This allows us to
avoid 1 join while allowing us to persist action set event information
after deleting observable events, if we wish to do so at a later stage.
In order to prevent users from being able to assign roles/permissions
they don't have, this PR adds a check that the user performing the
action either is Admin, Project owner or has the same role they are
trying to grant/add.
This addAccess method is only used from Enterprise, so there will be a
separate PR there, updating how we return the roles list for a user, so
that our frontend can only present the roles a user is actually allowed
to grant.
This adds the validation to the backend to ensure that even if the
frontend thinks we're allowed to add any role to any user here, the
backend can be smart enough to stop it.
We should still update frontend as well, so that it doesn't look like we
can add roles we won't be allowed to.
## About the changes
Our frontend API creates new instances of unleash-client-proxy. Because
this is by-design, we don't want to log a warning that was designed to
warn users about potential misconfiguration of Unleash Proxy.
As an extra, I'm renaming ProxyController to FrontendAPIController to
better reflect the intent of this controller.
## About the changes
This is a rough initial version as a PoC for a permission matrix.
This is only available after enabling the flag `userAccessUIEnabled`
that is set to true by default in local development.
The access was added to the users' admin page but could be embedded in
different contexts (e.g. when assigning a role to a user):
![image](https://github.com/Unleash/unleash/assets/455064/3f541f46-99bb-409b-a0fe-13f5d3f9572a)
This is how the matrix looks like
![screencapture-localhost-3000-admin-users-3-access-2024-02-13-12_15_44](https://github.com/Unleash/unleash/assets/455064/183deeb6-a0dc-470f-924c-f435c6196407)
---------
Co-authored-by: Nuno Góis <github@nunogois.com>
Fixes ##5799 and #5785
When you do not provide a token we should resolve to the "default"
environment to maintain backward compatibility. If you actually provide
a token we should prefer that and even block the request if it is not
valid.
An interesting fact is that "default" environment is not available on a
fresh installation of Unleash. This means that you need to provide a
token to actually get access to toggle configurations.
---------
Co-authored-by: Thomas Heartman <thomas@getunleash.io>
Created a build script that generates orval schemas with automatic
cleanup. Also generating new ones.
1. yarn gen:api **(generates schemas)**
2. rm -rf src/openapi/apis **(remove apis)**
3. sed -i '1q' src/openapi/index.ts **(remove all rows except first)**
This change takes the (now rather involved) type used to send CR
schedule suspension emails and extracts it into a proper exported type.
This will allow us to import it in enterprise as well instead of
redefining it.
## About the changes
Implements a new store for collected traffic data usage that connects to
the new table `stat_traffic_data` primary key'd on [day, trafficGroup,
status_code_series].
Day being a date
Traffic group being which endpoint is being counted for, ie /api/admin,
/api/frontend etc
Status code series grouping 2xx status responses and 304 into their
respective 200 / 300 series.
No service here, this is for pro/enterprise
This PR adds an endpoint to Unleash that accepts an error message and
option error stack and logs it as an error. This allows us to leverage
errors in logs observability to catch UI errors consistently.
Considered a test, but this endpoint only accepts and logs input, so I'm
not sure how useful it would be.
## About the changes
Adds migration for creating table `stat_traffic_usage`.
This table primary-keys on day, traffic_group, and status_code_series.
Adds individual indexes for day, traffic_group, and status_code_series.
Traffic group is the grouping for API endpoints for which traffic is
counted.
status_code_series is 200/202 etc = 200, 304 etc = 300
## About the changes
App stats is mainly used to cap the number of applications reported to
Unleash based on the last 7 days information:
cc2ccb1134/src/lib/middleware/response-time-metrics.ts (L24-L28)
Instead of getting all stats, just calculate appCount statistics
Use scheduler service instead of setInterval
On all pods and instances, we run the same revision update query every
second. It is relatively fast when the application has started. This is
the single most ran query in unleash.
Benchmarks:
1. Running pod with existing revisionID:
- old 5.5ms
- new 0.028ms
2. New pod without existing revisionID
- old 9.329ms
- new 0.033ms
This query is getting optimized
7e66a79f9f/src/lib/features/events/event-store.ts (L161)
We have customers with tens or hundreds of thousands of applications,
and we have a scheduler running that sets application fields to
`announced` as true. However, every time it runs, it queries the entire
table, which is slow and causes database connection acquisition issues.
To make it faster, we added a partial index to the table.
This PR updates the change request email sending method to handle the
recent changes we have made. That means that the email now:
- says that change requests have been suspended instead of saying that
application will fail.
- handles cases where segments or strategies have been updated causing
potential conflicts.
I have updated the email templates and made some adjustments to the
email sending method. To make the transition from one to the other
easier, I have kept the original method as an interim solution until
enterprise has switched over.
## About the changes
getAllActive from api-tokens store is the second most frequent query
![image](https://github.com/Unleash/unleash/assets/455064/63c5ae76-bb62-41b2-95b4-82aca59a7c16)
To prevent starving our db connections, we can cache this data that
rarely changes and clear the cache when we see changes. Because we will
only clear changes in the node receiving the change we're only caching
the data for 1 minute.
This should give us some room to test if this solution will work
---------
Co-authored-by: Nuno Góis <github@nunogois.com>
Adds a new Inactive Users list component to admin/users for easier cleanup of users that are counted as inactive: No sign of activity (logins or api token usage) in the last 180 days.
---------
Co-authored-by: David Leek <david@getunleash.io>
In the beginning we used process.nextTick() as a trick to load some data
initally in the constructor of a service.
This is a bad pattern and we should generally avoid any async operations
in the constructor. Today we have two alternatives:
1. Defer loading until data is needed (wrap it in async)
2. Use the schdule-service.
When a stop signal is sent to Unleash the scheduler-service should
cancel any scheduled jobs. This also applies to the job scheduled for
initial execution with jitter.
We observed that initial jobs was executed after the database
connections are terminated. This appears after v5.9.0 of Unleash.
```
Error: aborted
at Object.queryBuilder (/unleash/node_modules/knex/lib/knex-builder/make-knex.js:112:26)
at createQueryBuilder (/unleash/node_modules/knex/lib/knex-builder/make-knex.js:320:26)
at EventStore.knex [as db] (/unleash/node_modules/knex/lib/knex-builder/make-knex.js:101:12)
at EventStore.setUnannouncedToAnnounced (/unleash/node_modules/unleash-server/dist/lib/features/events/event-store.js:286:33)
at EventStore.publishUnannouncedEvents (/unleash/node_modules/unleash-server/dist/lib/features/events/event-store.js:293:35)
at EventAnnouncer.publishUnannouncedEvents (/unleash/node_modules/unleash-server/dist/lib/services/event-announcer-service.js:9:32)
at runScheduledFunctionWithEvent (/unleash/node_modules/unleash-server/dist/lib/features/scheduler/scheduler-service.js:30:23)
at Timeout.<anonymous> (/unleash/node_modules/unleash-server/dist/lib/features/scheduler/scheduler-service.js:50:27)
at runNextTicks (node:internal/process/task_queues:60:5)
at process.processTimers (node:internal/timers:509:9)
```
## About the changes
Queries on client-feature-toggle store have many purposes depending on
the requestType, making the query more complex or not depending on the
use case. Also, each use case has different frequencies (i.e. playground
is expected to be used rarely).
The name for the store metrics was wrong, copy&pasted from:
7b04db0547/src/lib/features/feature-toggle/feature-toggle-store.ts (L107)
Which was also present in feature-tag metrics:
7b04db0547/src/lib/db/feature-tag-store.ts (L37)
With this, we'll have more granularity to understand the execution time
and frequency of each
Usually maintenance mode is disabled. If the call throws, which we see a
lot of when a unleash instance is in terminating state, we should return
a default value.
By having it throw inside of the memoizee function, the response is not
cached, and it will trigger new calls until it return a cachable result.
## About the changes
Every schedule job will now check if maintenance is enabled. This ends
up querying the settings table in the db at least once per second per
running unleash instance. This small fix caches this query for 60
seconds to reduce the load somewhat.
We should reconsider this solution for the long term, but this will be a
great improvement on the short term.
**Logs after this fix running locally.**
We can observe that we resolve settings from the DB once per minute.
![image](https://github.com/Unleash/unleash/assets/158948/c313cf38-8d86-4b86-a0ba-4f4df60d50d6)
Also we should consider giving a warning in section where you enable
maintenance mode that it can take up to a minute to propagate.
## About the changes
the created_by_user_id data migration from resolving events.created_by
(for both events and features) now emits events on how many rows were
updated.
Adds listeners for these events that records these metrics with
prometheus
![image](https://github.com/Unleash/unleash/assets/707867/3bb02645-0919-4a9a-83fe-a07383ac0be1)
Improving the email template design
![image](https://github.com/Unleash/unleash/assets/103567375/0c62c1de-6d13-42b8-9898-4567be6ff2aa)
- @andreas-unleash i need you to fix the button for the change request
in case it's not correct like this
- also removing some leftover style from the "scheduled change conflict"
email
---------
Co-authored-by: andreas-unleash <andreas@getunleash.ai>
We were sending `user.id` to the service, but if an admin token is used,
there is no `user.id.` Instead, there is
`user.internalAdminTokenUserId`. so we need to use the special method
`extractUserIdFromUser`.
This PR adds this implementation, and now the service correctly
retrieves the appropriate ID for admins.
Related to: https://github.com/Unleash/unleash/pull/5924
## About the changes
Schedules a best-effort task setting the value of
events.created_by_user_id based on what is found in the created_by
column and if it's capable of resolving that to a userid/a system id.
The process is executed in the events-store, it takes a chunk of events
that haven't been processed yet, attempts to join users and api_tokens
tables on created_by = username/email, loops through and tries to figure
out an id to set. Then updates the record.
---------
Co-authored-by: Gastón Fournier <gaston@getunleash.io>
## About the changes
Change the sorting of features to migrate created_by_user_id for, and
filter out unresolvable feature/users
Query tested manually in enterprise
## About the changes
Resets created_by_user_id on events incorrectly marked as -1337 when an actual user has been set in created_by column, to clean up after a bug
## About the changes
Sets data migration of features and events created_by_user_id to
disabled by default
Map to promise and await all in created by user id migration for features
This escape with `??` double escaped the LIKE query causing no results.
This updates to using whereLike, which does the correct escaping for
string query.
## About the changes
Adds a scheduled task that every 5 seconds updates 500 entries in the
features table setting `created_by_user_id`.
It does this by looking at the related event, checks created_by and
joins users table for match on username or email, and joins api_tokens
table on username matches. Then picks either a users id if set, or uses
-42 (admin token user)
## About the changes
This PR replaces the old systemUser -1 in user-service.ts with the new
SYSTEM_USER -1337 and adds a migration to move events created_by = -1 to
-1337
## Discussion points
Does it make sense to do both of these things? Or should we skip the
migration? How would this behave in a large system with hundreds of
thousands of events, should this be split up?
Only triggers if there is any rows in client instances that have
sdk_version: unleash-edge with version < 17.0.0
The function that checks this memoizes the check for 10 minutes to avoid
scanning the client instances table too often.
Previously we used a killswitch and returned 404 if the feature was
enabled. This flips that to a default disabled toggle, that has to be
turned on to handle old Edge (pre 17.0.0) posting bulk metrics
We should use the enhanced flagResolver
Tested locally:
```
9:44:13 AM - Starting compilation in watch mode...
[dev:backend]
[dev:backend]
[dev:backend] 9:44:26 AM - Found 0 errors. Watching for file changes.
[dev:backend] [2024-01-23T09:44:27.498] [INFO] server-impl.js - DB migration: start
[dev:backend] [2024-01-23T09:44:27.499] [INFO] server-impl.js - Running migration with lock
[dev:backend] [2024-01-23T09:44:29.884] [INFO] server-impl.js - DB migration: end
```
This PR will allow us to use a feature flag with variants to control
whether or not we should show the comments field of the feedback form.
This will allow us to see whether we can increase feedback collection if
we reduce the load on the customer.
## About the changes
This was spotted while testing automated actions. Steps to reproduce:
1. Add an editor user
2. Get a PAT for the editor user
3. As Admin create a feature in a project where the editor user is not a
member and enable the feature
4. Try using the editor's PAT to modify the feature
5. As the editor create a project (you'd be made owner) and try the same
request but just change the project name for the new project just
created (don't change anything else)
**Expected behavior**: you can't disable the feature
**Actual behavior**: the feature is disabled
This does not happen when trying to turn on a flag because during the
turn-on process we do validate if the feature belongs to project when we
call updateStrategy:
c18a7c0dc2/src/lib/features/feature-toggle/feature-toggle-service.ts (L1751-L1764)
https://linear.app/unleash/issue/2-1856/add-typesafe-wrappers-over-prom-clients-metrics
As discussed on the latest knowledge sharing session, this adds typesafe
wrappers over prom client's metrics, requiring us to specify all the
configured labels for each metric.
This uses a functional approach and only exposes the methods that are
currently relevant to us, while also exposing the underlying instance of
the metric for an easy access if needed.
Since we often chain `labels` with `inc` in counters, this adds a
convenience `increment` method for counters which does both in a single
call.
Uses a new `URL_SAFE_BASIC` regex constant that checks for characters
that are commonly used in URL path sections: alphanumeric lowercase
characters, dashes and underscores.
This will allow us to re-use this constant in our server-side
validation.
Follow up of https://github.com/Unleash/unleash/issues/4303
We are adding primary keys to all tables missing them, currently
**role_permission**, **api_token_project**, and **project_stats**.
By adding primary keys, the issue with migrations failing during
upgrades in replicated database setups will be resolved.
So, this was causing a lot of ERROR in our logs, due to the metric
having gotten an extra label the last month.
Two things for this fix.
1. add the missing label to the two calls that did not have it added
2. update the log line to include the error as another argument to the
logger, so we actually get a stacktrace from the error.
### What
Adds Read and Write permissions for project administration settings
(user access, change request settings, default strategy, other).
### Why
On request from two large customers that wanted our RBAC controls to be
more granulated to easier be able to limit the access they granted their
users.
## About the changes
Whenever we get a call from an admin token we want to associate it with
the [admin token
user](4d42093a07/src/lib/types/core.ts (L34-L41)).
This should give us the needed audit for this type of calls that
currently were lacking a user id (we only stored a string with the token
name in the event log).
We consciously decided not to use `id` as the property to prevent any
unforeseen side effects. The reason is that only `IUser` type has an id
and adding an id to `IApiUser` might lead to confusion.
Since we've now added PAT's we really do recommend switching to those,
or for enterprises, we recommend using service accounts.
Admin tokens have an obvious disadvantage in that they're not connected
to any user, so actions performed by them are harder to audit.
This PR adds a killswitch for turning it off, in preparation for
deprecating them and ultimately removing them in the future.
## About the changes
This admin token user will help us differentiate actions performed by
the system from actions performed with an admin token.
Events created with an admin token should have the id of this user as
createdByUserId property and the username of the token used as the
createdBy property. i.e.
```json
{
"id": 11,
"type": "pat-created",
"createdBy": "admin-token",
"createdAt": "2024-01-16T13:16:27.887Z",
"createdByUserId": -42,
"data": {
"description": "admin-pat",
"expiresAt": "2024-02-15T13:16:25.586Z",
"secret": "***",
"userId": 1
},
"preData": null,
"tags": [],
"featureName": null,
"project": null,
"environment": null
}
```
## About the changes
EventsService is a dependency in most of our services. This creates
helper methods to create them easily and replace a few places where
we're creating them manually
This change removes the system user's email from the definition, instead
setting it to `null`. It also changes the name to "Unleash System".
The IUser interface doesn't allow `null` email addresses, so we change
the type definition of the system user to get around it. However, using
`null` (instead of just removing the property entirely) is useful
because when you get the system user from the DB, it's email value will
be null (after it has been nulled out).
As of today, there is nowhere in the Unleash system (OSS or Enterprise)
where we use the system user as an IUser (we only use username and ID).
So this change shouldn't break anything.
This should follow https://github.com/Unleash/unleash/pull/5849.
Updates it from 'system@getunleash.io' to `null`. We don't have that
address registered (and probably don't want it), so we'll leave it
empty.
This is a companion PR to
https://github.com/Unleash/unleash/pull/5893. With both of those
merged, the system user in the DB should match the one defined in
`core.ts`
Lots of work here, mostly because I didn't want to turn off the
`noImplicitAnyLet` lint. This PR tries its best to type all the untyped
lets biome complained about (Don't ask me how many hours that took or
how many lints that was >200...), which in the future will force test
authors to actually type their global variables setup in `beforeAll`.
---------
Co-authored-by: Gastón Fournier <gaston@getunleash.io>
This change adjusts the exported `SYSTEM_USER` constant in `core.ts` to
match the one created in the migration in
`src/migrations/20231222071533-unleash-system-user.js`
The slight discrepancy between these two caused me some minor headache
when trying to write a test in enterprise.
It also removes the email because we have no inbox at that address (and
we probably don't want one).
For reference, the migration looks like this:
``` sql
ALTER TABLE users ADD COLUMN IF NOT EXISTS is_system BOOLEAN NOT NULL DEFAULT FALSE;
INSERT INTO users
(id, name, username, email, created_by_user_id, is_system)
VALUES
(-1337, 'Unleash System', 'unleash_system_user', 'system@getunleash.io', -1337, true);
```
This adds a bulk endpoint under `/api/client/metrics`. Accessible under
`/api/client/metrics/bulk`.
This allows us to piggyback on the need for an API user with access.
This PR mostly copies the behaviour from our `/edge/metrics` endpoint,
but it filters metrics to only include the environment that the token
has access to.
So a client token that has access to the `production` will not be
allowed to report metrics for the `development` environment. More
importantly, a `development` token will not be allowed to post metrics
for the `production` environment.
This PR adds the schedule suspended event to the slack-app and webhook
definitions.
It also slightly tweaks the markdown formatting of change requests to
add a definite article. This means the snapshot also needs to be
updated.
This PR adds a new `reason` column to the change request schedules table
and populates it with the data that is in the `failure_reason` column.
This is the expand phase of the expand/contract pattern. The code in
enterprise will be updated to try and use the new column name, but fall
back to the old one if no value is present.
The old column can be removed later.
This metric was used while developing the optimal304 feature. The
feature flag has been removed and this data is not longer being
collected and this will remove the metric from Prometheus.