## About the changes
Some of our metrics are not labeled correctly, one example is
`<base-path>/api/frontend/client/metrics` is labeled as
`/client/metrics`. We can see that in internal-backstage/prometheus:
![image](https://github.com/Unleash/unleash/assets/455064/0d8f1f40-8b5b-49d4-8a88-70b523e9be09)
This issue affects all endpoints that fail to validate the request body.
Also, endpoints that are rejected by the authorization-middleware or the
api-token-middleware are reported as `(hidden)`.
To gain more insights on our api usage but being protective of metrics
cardinality we're prefixing `(hidden)` with some well known base urls:
https://github.com/Unleash/unleash/pull/6400/files#diff-1ed998ca46ffc97c9c0d5d400bfd982dbffdb3004b78a230a8a38e7644eee9b6R17-R33
## How to reproduce:
Make an invalid call to metrics (e.g. stop set to null), then check
/internal-backstage/prometheus and find the 400 error. Expected to be at
`path="/api/client/metrics"` but will have `path=""`:
```shell
curl -H"Authorization: *:development.unleash-insecure-client-api-token" -H'Content-type: application/json' localhost:4242/api/client/metrics -d '{
"appName": "bash-test",
"instanceId": "application-name-dacb1234",
"environment": "development",
"bucket": {
"start": "2023-07-27T11:23:44Z",
"stop": null,
"toggles": {
"myCoolToggle": {
"yes": 25,
"no": 42,
"variants": {
"blue": 6,
"green": 15,
"red": 46
}
},
"myOtherToggle": {
"yes": 0,
"no": 100
}
}
}
}'
```
## About the changes
When edge is configured to automatically generate tokens, it requires
the token to be present in all unleash instances.
It's behind a flag which enables us to turn it on on a case by case
scenario.
The risk of this implementation is that we'd be adding load to the
database in the middleware that evaluates tokens (which are present in
mostly all our API calls. We only query when the token is missing but
because the /client and /frontend endpoints which will be the affected
ones are high throughput, we want to be extra careful to avoid DDoSing
ourselves
## Alternatives:
One alternative would be that we merge the two endpoints into one.
Currently, Edge does the following:
If the token is not valid, it tries to create a token using a service
account token and /api/admin/create-token endpoint. Then it uses the
token generated (which is returned from the prior endpoint) to query
/api/frontend. What if we could call /api/frontend with the same service
account we use to create the token? It may sound risky but if the same
application holding the service account token with permission to create
a token, can call /api/frontend via the generated token, shouldn't it be
able to call the endpoint directly?
The purpose of the token is authentication and authorization. With the
two tokens we are authenticating the same app with 2 different
authorization scopes, but because it's the same app we are
authenticating, can't we just use one token and assume that the app has
both scopes?
If the service account already has permissions to create a token and
then use that token for further actions, allowing it to directly call
/api/frontend does not necessarily introduce new security risks. The
only risk is allowing the app to generate new tokens. Which leads to the
third alternative: should we just remove this option from edge?
Fixes ##5799 and #5785
When you do not provide a token we should resolve to the "default"
environment to maintain backward compatibility. If you actually provide
a token we should prefer that and even block the request if it is not
valid.
An interesting fact is that "default" environment is not available on a
fresh installation of Unleash. This means that you need to provide a
token to actually get access to toggle configurations.
---------
Co-authored-by: Thomas Heartman <thomas@getunleash.io>
Lots of work here, mostly because I didn't want to turn off the
`noImplicitAnyLet` lint. This PR tries its best to type all the untyped
lets biome complained about (Don't ask me how many hours that took or
how many lints that was >200...), which in the future will force test
authors to actually type their global variables setup in `beforeAll`.
---------
Co-authored-by: Gastón Fournier <gaston@getunleash.io>
## About the changes
This allows us to encrypt emails at signup for demo users to further
secure our demo instance. Currently, emails are anonymized before
displaying events performed by demo users. But this means that emails
are stored at rest in our DB. By encrypting the emails at login, we're
adding another layer of protection.
This can be enabled with a flag and requires the encryption key and the
initialization vector (IV for short) to be present as environment
variables.
### What
Adds `createdByUserId` to all events exposed by unleash. In addition
this PR updates all tests and usages of the methods in this codebase to
include the required number.
Today we include a lot of "secutiry headers" for all API calls. Quite a
lot of them are only relevent when we return a HTML document for the
browser.
This PR removes and simplify these headers for API calls, so that we do
not include unecessary data in the HTTP headers.
Each header have been carfully examied by following best practices from
these source:
-
https://cheatsheetseries.owasp.org/cheatsheets/REST_Security_Cheat_Sheet.html
- https://owasp.org/www-project-secure-headers/
This feature is protected with feature flag named 'stripHeadersOnAPI'.
## PR Description
https://linear.app/unleash/issue/2-1645/address-post-mortem-action-point-all-flags-should-be-runtime
Refactor with the goal of ensuring that flags are runtime controllable,
mostly focused on the current scheduler logic.
This includes the following changes:
- Moves scheduler into its own "scheduler" feature folder
- Reverts dependency: SchedulerService takes in the MaintenanceService,
not the other way around
- Scheduler now evaluates maintenance mode at runtime instead of relying
only on its mode state (active / paused)
- Favors flag checks to happen inside the scheduled methods, instead of
controlling whether the method is scheduled at all (favor runtime over
startup)
- Moves "account last seen update" to scheduler
- Updates tests accordingly
- Boyscouting
Here's a manual test showing this behavior, where my local instance was
controlled by a remote instance. Whenever I toggle `maintenanceMode`
through a flag remotely, my scheduled functions stop running:
https://github.com/Unleash/unleash/assets/14320932/ae0a7fa9-5165-4c0b-9b0b-53b9fb20de72
Had a look through all of our current flags and it *seems to me* that
they are all used in a runtime controllable way, but would still feel
more comfortable if this was double checked, since it can be complex to
ensure this.
The only exception to this was `migrationLock`, which I believe is OK,
since the migration only happens at the start anyways.
## Discussion / Questions
~~Scheduler `mode` (active / paused) is currently not *really* being
used, along with its respective methods, except in tests. I think this
could be a potential footgun. Should we remove it in favor of only
controlling the scheduler state through maintenance mode?~~ Addressed in
7c52e3f638
~~The config property `disableScheduler` is still a startup
configuration, but perhaps that makes sense to leave as is?~~
[Answered](https://github.com/Unleash/unleash/pull/5363#issuecomment-1819005445)
by @FredrikOseberg, leaving as is.
Are there any other tests we should add?
Is there anything I missed?
Identified some `setInterval` and `setTimeout` that may make sense to
leave as is instead of moving over to the scheduler service:
- ~~`src/lib/metrics` - This is currently considered a `MetricsMonitor`.
Should this be refactored to a service instead and adapt these
setIntervals to use the scheduler instead? Is there anything special
with this we need to take into account? @chriswk @ivarconr~~
[Answered](https://github.com/Unleash/unleash/pull/5363#issuecomment-1820501511)
by @ivarconr, leaving as is.
- ~~`src/lib/proxy/proxy-repository.ts` - This seems to have a complex
and specific logic currently. Perhaps we should leave it alone for now?
@FredrikOseberg~~
[Answered](https://github.com/Unleash/unleash/pull/5363#issuecomment-1819005445)
by @FredrikOseberg, leaving as is.
- `src/lib/services/user-service.ts` - This one also seems to be a bit
more specific, where we generate new timeouts for each receiver id.
Might not belong in the scheduler service. @Tymek
This fixes an edge case not caught originally in
https://github.com/Unleash/unleash/pull/5304 - When creating a new
segment on the global level:
- There is no `projectId`, either in the params or body
- The `UPDATE_PROJECT_SEGMENT` is still a part of the permissions
checked on the endpoint
- There is no `id` on the params
This made it so that we would run `segmentStore.get(id)` with an
undefined `id`, causing issues.
The fix was simply checking for the presence of `params.id` before
proceeding.
https://linear.app/unleash/issue/SR-164/ticket-1106-user-with-createedit-project-segment-is-not-able-to-edit-a
Fixes a bug where the `UPDATE_PROJECT_SEGMENT` permission is not
respected, both on the UI and on the API. The original intention was
stated
[here](https://github.com/Unleash/unleash/pull/3346#discussion_r1140434517).
This was easy to fix on the UI, since we were simply missing the extra
permission on the button permission checks.
Unfortunately the API can be tricky. Our auth middleware tries to grab
the `project` information from either the params or body object, but our
`DELETE` method does not contain this information. There is no body and
the endpoint looks like `/admin/segments/:id`, only including the
segment id.
This means that, in the rbac middleware when we check the permissions,
we need to figure out if we're in such a scenario and fetch the project
information from the DB, which feels a bit hacky, but it's something
we're seemingly already doing for features, so at least it's somewhat
consistent.
Ideally what we could do is leave this API alone and create a separate
one for project segments, with endpoints where we would have project as
a param, like so:
`http://localhost:4242/api/admin/projects/:projectId/segments/1`.
This PR opts to go with the quick and hacky solution for now since this
is an issue we want to fix quickly, but this is something that we should
be aware of. I'm also unsure if we want to create a new API for project
segments. If we decide that we want a different solution I don't mind
either adapting this PR or creating a follow up.
Expose new interface while also getting rid of unneeded compiler ignores
None of the changes should add new security risks, despite this report:
> Code scanning results / CodeQL Failing after 4s — 2 new alerts
including 2 high severity security vulnerabilities
Not sure what that means, maybe a removed ignore...
This commit changes our linter/formatter to biome (https://biomejs.dev/)
Causing our prehook to run almost instantly, and our "yarn lint" task to
run in sub 100ms.
Some trade-offs:
* Biome isn't quite as well established as ESLint
* Are we ready to install a different vscode plugin (the biome plugin)
instead of the prettier plugin
The configuration set for biome also has a set of recommended rules,
this is turned on by default, in order to get to something that was
mergeable I have turned off a couple the rules we seemed to violate the
most, that we also explicitly told eslint to ignore.
https://linear.app/unleash/issue/2-1403/consider-refactoring-the-way-tags-are-fetched-for-the-events
This adds 2 methods to `EventService`:
- `storeEvent`;
- `storeEvents`;
This allows us to run event-specific logic inside these methods. In the
case of this PR, this means fetching the feature tags in case the event
contains a `featureName` and there are no tags specified in the event.
This prevents us from having to remember to fetch the tags in order to
store feature-related events except for very specific cases, like the
deletion of a feature - You can't fetch tags for a feature that no
longer exists, so in that case we need to pre-fetch the tags before
deleting the feature.
This also allows us to do any event-specific post-processing to the
event before reaching the DB layer.
In general I think it's also nicer that we reference the event service
instead of the event store directly.
There's a lot of changes and a lot of files touched, but most of it is
boilerplate to inject the `eventService` where needed instead of using
the `eventStore` directly.
Hopefully this will be a better approach than
https://github.com/Unleash/unleash/pull/4729
---------
Co-authored-by: Gastón Fournier <gaston@getunleash.io>
Fix issues uncovered when reviewing integrations list and form.
- YouTube CSP
- Text content and formatting
- Margins
- Update old integration icons
- Fix headers in dark theme
If apiTokens are enabled breaks middleware chain with a 401 if no token
is found for requests to client and frontend apis. Previously the
middleware allowed the chain to process.
Removes the regex search for multiple slashes, and instead configures
the apiTokenMiddleware to reject unauthorized requests.
This PR attempts to improve the error handling introduced in #3607.
## About the changes
## **tl;dr:**
- Make `UnleashError` constructor protected
- Make all custom errors inherit from `UnleashError`.
- Add tests to ensure that all special error cases include their
relevant data
- Remove `PasswordMismatchError` and `BadRequestError`. These don't
exist.
- Add a few new error types: `ContentTypeError`, `NotImplementedError`,
`UnauthorizedError`
- Remove the `...rest` parameter from error constructor
- Add an unexported `GenericUnleashError` class
- Move OpenAPI conversion function to `BadDataError` clas
- Remove explicit `Error.captureStackTrace`. This is done automatically.
- Extract `getPropFromString` function and add tests
### **In a more verbose fashion**
The main thing is that all our internal errors now inherit
from`UnleashError`. This allows us to simplify the `UnleashError`
constructor and error handling in general while still giving us the
extra benefits we added to that class. However, it _does_ also mean that
I've had to update **all** existing error classes.
The constructor for `UnleashError` is now protected and all places that
called that constructor directly have been updated. Because the base
error isn't available anymore, I've added three new errors to cover use
cases that we didn't already have covered: `NotImplementedError`,
`UnauthorizedError`, `ContentTypeError`. This is to stay consistent in
how we report errors to the user.
There is also an internal class, `GenericUnleashError` that inherits
from the base error. This class is only used in conversions for cases
where we don't know what the error is. It is not exported.
In making all the errors inherit, I've also removed the `...rest`
parameter from the `UnleashError` constructor. We don't need this
anymore.
Following on from the fixes with missing properties in #3638, I have
added tests for all errors that contain extra data.
Some of the error names that were originally used when creating the list
don't exist in the backend. `BadRequestError` and
`PasswordMismatchError` have been removed.
The `BadDataError` class now contains the conversion code for OpenAPI
validation errors. In doing so, I extracted and tested the
`getPropFromString` function.
### Main files
Due to the nature of the changes, there's a lot of files to look at. So
to make it easier to know where to turn your attention:
The changes in `api-error.ts` contain the main changes: protected
constructor, removal of OpenAPI conversion (moved into `BadDataError`.
`api-error.test.ts` contains tests to make sure that errors work as
expected.
Aside from `get-prop-from-string.ts` and the tests, everything else is
just the required updates to go through with the changes.
## Discussion points
I've gone for inheritance of the Error type over composition. This is in
large part because throwing actual Error instances instead of just
objects is preferable (because they collect stack traces, for instance).
However, it's quite possible that we could solve the same thing in a
more elegant fashion using composition.
## For later / suggestions for further improvements
The `api-error` files still contain a lot of code. I think it might be
beneficial to break each Error into a separate folder that includes the
error, its tests, and its schema (if required). It would help decouple
it a bit.
We don't currently expose the schema anywhere, so it's not available in
the openapi spec. We should look at exposing it too.
Finally, it would be good to go through each individual error message
and update each one to be as helpful as possible.
When using PATs if the user that the PAT is for has been removed, we
currently log the missing user at ERROR level. Since this is not
something our SREs can fix, this PR downgrades the NotFoundError to
WARN, instead of ERROR.
<!-- Thanks for creating a PR! To make it easier for reviewers and
everyone else to understand what your changes relate to, please add some
relevant content to the headings below. Feel free to ignore or delete
sections that you don't think are relevant. Thank you! ❤️ -->
## About the changes
<!-- Describe the changes introduced. What are they and why are they
being introduced? Feel free to also add screenshots or steps to view the
changes if they're visual. -->
This deprecates the `username` properties on api-token schemas, and adds
a `tokenName` property.
DB field `username` has been renamed to `token_name`, migration added
for the rename.
Both `username` and `tokenName` can be used when consuming the service,
but only one of them.
## Discussion points
<!-- Anything about the PR you'd like to discuss before it gets merged?
Got any questions or doubts? -->
There's a couple of things I'd like to get opinions on and discuss:
- Frontend still uses the deprecated `username` property
- ApiTokenSchema is used both for input and output of `Create`
controller endpoints and should be split out into separate schemas. I'll
set up a task for this
---------
Co-authored-by: Thomas Heartman <thomas@getunleash.ai>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
This PR implements the first version of a suggested unification (and
documentation) of the errors that we return from the API today.
The goal is for this to be the first step towards the error type defined
in this internal [linear
task](https://linear.app/unleash/issue/1-629/define-the-error-type
'Define the new API error type').
## The state of things today
As things stand, we currently have no (or **very** little) documentation
of the errors that are returned from the API. We mention error codes,
but never what the errors may contain.
Second, there is no specified format for errors, so what they return is
arbitrary, and based on ... Who knows? As a result, we have multiple
different errors returned by the API depending on what operation you're
trying to do. What's more, with OpenAPI validation in the mix, it's
absolutely possible for you to get two completely different error
objects for operations to the same endpoint.
Third, the errors we do return are usually pretty vague and don't really
provide any real help to the user. "You don't have the right
permissions". Great. Well what permissions do I need? And how would I
know? "BadDataError". Sick. Why is it bad?
... You get it.
## What we want to achieve
The ultimate goal is for error messages to serve both humans and
machines. When the user provides bad data, we should tell them what
parts of the data are bad and what they can do to fix it. When they
don't have the right permissions, we should tell them what permissions
they need.
Additionally, it would be nice if we could provide an ID for each error
instance, so that you (or an admin) can look through the logs and locate
he incident.
## What's included in **this** PR?
This PR does not aim to implement everything above. It's not intended to
magically fix everything. Its goal is to implement the necessary
**breaking** changes, so that they can be included in v5. Changing error
messages is a slightly grayer area than changing APIs directly, but
changing the format is definitely something I'd consider breaking.
So this PR:
- defines a minimal version of the error type defined in the [API error
definition linear
task](https://linear.app/unleash/issue/1-629/define-the-error-type).
- aims to catch all errors we return today and wrap them in the error
type
- updates tests to match the new expectations.
An important point: because we are cutting v5 very soon and because work
for this wasn't started until last week, the code here isn't necessarily
very polished. But it doesn't need to be. The internals can be as messy
as we want, as long as the API surface is stable.
That said, I'm very open to feedback about design and code completeness,
etc, but this has intentionally been done quickly.
Please also see my inline comments on the changes for more specific
details.
### Proposed follow-ups
As mentioned, this is the first step to implementing the error type. The
public API error type only exposes `id`, `name`, and `message`. This is
barely any more than most of the previous messages, but they are now all
using the same format. Any additional properties, such as `suggestion`,
`help`, `documentationLink` etc can be added as features without
breaking the current format. This is an intentional limitation of this
PR.
Regarding additional properties: there are some error responses that
must contain extra properties. Some of these are documented in the types
of the new error constructor, but not all. This includes `path` and
`type` properties on 401 errors, `details` on validation errors, and
more.
Also, because it was put together quickly, I don't yet know exactly how
we (as developers) would **prefer** to use these new error messages
within the code, so the internal API (the new type, name, etc), is just
a suggestion. This can evolve naturally over time if (based on feedback
and experience) without changing the public API.
## Returning multiple errors
Most of the time when we return errors today, we only return a single
error (even if many things are wrong). AJV, the OpenAPI integration we
use does have a setting that allows it to return all errors in a request
instead of a single one. I suggest we turn that on, but that we do it in
a separate PR (because it updates a number of other snapshots).
When returning errors that point to `details`, the objects in the
`details` now contain a new `description` property. This "deprecates"
the `message` property. Due to our general deprecation policy, this
should be kept around for another full major and can be removed in v6.
```json
{
"name": "BadDataError",
"message": "Something went wrong. Check the `details` property for more information."
"details": [{
"message": "The .params property must be an object. You provided an array.",
"description": "The .params property must be an object. You provided an array.",
}]
}
```
## About the changes
This makes response time with app names enabled for everyone but also
kept a way of turning it off (kill switch) in case it cause some issues
because of misconfigured app names
## About the changes
This fixes response time metrics with app names when the app just starts
and has zero which is falsy. We want to compare against undefined (which
means the snapshot is not yet ready)
## About the changes
Introduce a snapshot version of instanceStats inside
instance-stats-service to provide a cached state of the statistics
without compromising the DB.
### Important notes
Some rule-of-thumb applied in the PR that can be changed:
1. The snapshot refresh time
2. The threshold to report appName with the metrics
## Discussion points
1. The snapshot could be limited to just the information needed (things
like `hasOIDC` don't change until there's a restart), to optimize the memory usage
3. metrics.ts (used to expose Prometheus metrics) has a [refresh
interval of
2hs](2d16730cc2/src/lib/metrics.ts (L189-L195)),
but with this implementation, we could remove that background task and
rely on the snapshot
4. We could additionally update the snapshot every time someone queries
the DB to fetch stats (`getStats()` method), but it may increase
complexity without a significant benefit
Co-authored-by: Mateusz Kwasniewski <kwasniewski.mateusz@gmail.com>
Co-authored-by: Simon Hornby <liquidwicked64@gmail.com>
## About the changes
This is a follow-up on #1953
This implementation generalizes how we fetch some standard parameters
from the query parameters or request body.
## Discussion points
Unfortunately, we have not used standard names for our APIs and one
example is our `projectId` (in some cases we just used `project`).
Ideally, we're only using one way of sending these parameters either
`projectId` or `project` (same applies to `environment` vs
`environmentId`).
If both parameters are present, due to historical reasons, we'll give
precedence to:
- `projectId` over `project`
- `environment` over `environmentId`
In the presence of both query parameters and body, we'll give precedence
to query parameters also for historical reasons.