## About the changes
getAllActive from api-tokens store is the second most frequent query

To prevent starving our db connections, we can cache this data that
rarely changes and clear the cache when we see changes. Because we will
only clear changes in the node receiving the change we're only caching
the data for 1 minute.
This should give us some room to test if this solution will work
---------
Co-authored-by: Nuno Góis <github@nunogois.com>
Adds a new Inactive Users list component to admin/users for easier cleanup of users that are counted as inactive: No sign of activity (logins or api token usage) in the last 180 days.
---------
Co-authored-by: David Leek <david@getunleash.io>
## PR Description
https://linear.app/unleash/issue/2-1645/address-post-mortem-action-point-all-flags-should-be-runtime
Refactor with the goal of ensuring that flags are runtime controllable,
mostly focused on the current scheduler logic.
This includes the following changes:
- Moves scheduler into its own "scheduler" feature folder
- Reverts dependency: SchedulerService takes in the MaintenanceService,
not the other way around
- Scheduler now evaluates maintenance mode at runtime instead of relying
only on its mode state (active / paused)
- Favors flag checks to happen inside the scheduled methods, instead of
controlling whether the method is scheduled at all (favor runtime over
startup)
- Moves "account last seen update" to scheduler
- Updates tests accordingly
- Boyscouting
Here's a manual test showing this behavior, where my local instance was
controlled by a remote instance. Whenever I toggle `maintenanceMode`
through a flag remotely, my scheduled functions stop running:
https://github.com/Unleash/unleash/assets/14320932/ae0a7fa9-5165-4c0b-9b0b-53b9fb20de72
Had a look through all of our current flags and it *seems to me* that
they are all used in a runtime controllable way, but would still feel
more comfortable if this was double checked, since it can be complex to
ensure this.
The only exception to this was `migrationLock`, which I believe is OK,
since the migration only happens at the start anyways.
## Discussion / Questions
~~Scheduler `mode` (active / paused) is currently not *really* being
used, along with its respective methods, except in tests. I think this
could be a potential footgun. Should we remove it in favor of only
controlling the scheduler state through maintenance mode?~~ Addressed in
7c52e3f638
~~The config property `disableScheduler` is still a startup
configuration, but perhaps that makes sense to leave as is?~~
[Answered](https://github.com/Unleash/unleash/pull/5363#issuecomment-1819005445)
by @FredrikOseberg, leaving as is.
Are there any other tests we should add?
Is there anything I missed?
Identified some `setInterval` and `setTimeout` that may make sense to
leave as is instead of moving over to the scheduler service:
- ~~`src/lib/metrics` - This is currently considered a `MetricsMonitor`.
Should this be refactored to a service instead and adapt these
setIntervals to use the scheduler instead? Is there anything special
with this we need to take into account? @chriswk @ivarconr~~
[Answered](https://github.com/Unleash/unleash/pull/5363#issuecomment-1820501511)
by @ivarconr, leaving as is.
- ~~`src/lib/proxy/proxy-repository.ts` - This seems to have a complex
and specific logic currently. Perhaps we should leave it alone for now?
@FredrikOseberg~~
[Answered](https://github.com/Unleash/unleash/pull/5363#issuecomment-1819005445)
by @FredrikOseberg, leaving as is.
- `src/lib/services/user-service.ts` - This one also seems to be a bit
more specific, where we generate new timeouts for each receiver id.
Might not belong in the scheduler service. @Tymek
This PR hooks up the changes introduced in #5301 to the API and puts
them behind a feature flag. A new test has been added and the test setup
has been slightly tweaked to allow this test.
When the flag is enabled, the API will now not let you delete a segment
that's used in any active CRs.
This PR adds a cleanup job that removes unknown feature flags from
last_seen_at_metrics table every 24 hours since we no longer have a
foreign key on the name column in the features table.
This PR is the first step in separating the client and admin stores.
Currently our feature toggle services uses the client store to serve
multiple purposes.
Admin API uses the feature toggle service to serve both the feature
toggle list and playground features, while the client API uses the
feature toggle service to serve client features. The admin API can
change often and have very different requirements than the client API,
which changes infrequently and generally keeps the same stable structure
for long periods of time. This architecture is error prone, because when
you need to make changes to the admin API, you can very easily affect
the client API.
I aim to put up a stone wall between the two APIs. Complete separation
between the two APIs, at the cost of some duplication.
In this PR I have created a feature oriented architecture for client
features and disconnected the client API from the feature toggle
service. It now goes through it's own service to it's own store. For
feature toggle service I have duplicated and replaced the functionality
that serves /api/admin/features, I have kept a lot of the ugliness in
the code and haven't removed anything in order to avoid breaking
changes.
Next steps:
* Move playground to admin API
* Remove client-feature-toggle-store from feature-toggle-service
## About the changes
This splits the interfaces for import and export, especially because the
import functionality has to be replaced in enterprise repo.
This is a breaking change because of the service renames, but I'll have
the PR for the other repository ready so we reduce the time to fix. I
intentionally avoided doing it backward compatible because of time.
As part of more telemetry on the usage of Unleash.
This PR adds a new `stat_` prefixed table as well as a trigger on the
events table trigger on each insert to increment a counter per
environment per day.
The trigger will trigger on every insert into the events base, but will
filter and only increment the counter for events that actually have the
environment set. (there are events, like user-created, that does not
relate to a specific environment).
Bit wary on this, but since we truncate down to row per (day,
environment) combo, finding conflict and incrementing shouldn't take too
long here.
@ivarconr was it something like this you were considering?
## About the changes
This transactional implementation decorates a service with a
transactional method that removes the need to start transactions in the
method using the service.
This is a gradual rollout with a feature toggle, just because
transactions are not easy.
https://linear.app/unleash/issue/2-1403/consider-refactoring-the-way-tags-are-fetched-for-the-events
This adds 2 methods to `EventService`:
- `storeEvent`;
- `storeEvents`;
This allows us to run event-specific logic inside these methods. In the
case of this PR, this means fetching the feature tags in case the event
contains a `featureName` and there are no tags specified in the event.
This prevents us from having to remember to fetch the tags in order to
store feature-related events except for very specific cases, like the
deletion of a feature - You can't fetch tags for a feature that no
longer exists, so in that case we need to pre-fetch the tags before
deleting the feature.
This also allows us to do any event-specific post-processing to the
event before reaching the DB layer.
In general I think it's also nicer that we reference the event service
instead of the event store directly.
There's a lot of changes and a lot of files touched, but most of it is
boilerplate to inject the `eventService` where needed instead of using
the `eventStore` directly.
Hopefully this will be a better approach than
https://github.com/Unleash/unleash/pull/4729
---------
Co-authored-by: Gastón Fournier <gaston@getunleash.io>
## About the changes
- `getActiveUsers` is using multiple stores, so it is refactored into
read-model
- Refactored Instance stats service into `features` to co-locate related
code
Closes https://linear.app/unleash/issue/UNL-230/active-users-prometheus
### Important files
`src/lib/features/instance-stats/getActiveUsers.ts`
## Discussion points
`getActiveUsers` is coded less _class-based_ then previous similar
read-models. In one file instead of 3 (read-model interface, fake read
model, sql read model). I find types and functions way more readable,
but I'm ready to refactor it to interfaces and classes if consistency is
more important.
This PR adds updates the potentially stale status change events whenever
the potentially stale update function is run.
No events are emitted yet. While the emission is only a few lines of
code, I'd like to do that in a separate PR so that we can give it the
attention it deserves in the form of tests, etc.
This PR also moves the potentially stale update functionality from the
`update` method to only being done in the
`updatePotentiallyStaleFeatures` method. This keeps all functionality
related to marking `potentiallyStale` in one place.
The emission implementation was removed in
4fb7cbde03
## The update queries
While it would be possible to do the state updates in a single query
instead of three separate ones, wrangling this into knex proved to be
troublesome (and would also probably be harder to understand and reason
about). The current solution uses three smaller queries (one select, two
updates), as Jaanus suggested in a private slack thread.
This PR lays most of the groundwork required for emitting events when
features are marked as potentially stale by Unleash. It does **not**
emit any events just yet. The summary is:
- periodically look for features that are potentially stale and mark
them (set to run every 10 seconds for now; can be changed)
- when features are updated, if the update data contains changes to the
feature's type or createdAt date, also update the potentially stale
status.
It is currently about 220 lines of tests and about 100 lines of
application code (primarily db migration and two new methods on the
IFeatureToggleStore interface).
The reason I wanted to put this into a single PR (instead of just the db
migration, then just the potentially stale marking, then the update
logic) is:
If users get the db migration first, but not the rest of the update
logic until the events are fired, then they could get a bunch of new
events for features that should have been marked as potentially stale
several days/weeks/months ago. That seemed undesirable to me, so I
decided to bunch those changes together. Of course, I'd be happy to
break it into smaller parts.
## Rules
A toggle will be marked as potentially stale iff:
- it is not already stale
- its createdAt date is older than its feature type's expected lifetime
would dictate
## Migration
The migration adds a new `potentially_stale` column to the features
table and sets this to true for any toggles that have exceeded their
expected lifetime and that have not already been marked as `stale`.
## Discussion
### The `currentTime` parameter of `markPotentiallyStaleFeatures`
The `markPotentiallyStaleFetaures` method takes an optional
`currentTime` parameter. This was added to make it easier to test (so
you can test "into the future"), but it's not used in the application.
We can rewrite the tests to instead update feature toggles manually, but
that wouldn't test the actual marking method. Happy to discuss.
This PR fixes an issue where events generated during a db transaction
would get published before the transaction was complete. This caused
errors in some of our services that expected the data to be stored
before the transaction had been commited. Refer to [linear issue
1-1049](https://linear.app/unleash/issue/1-1049/event-emitter-should-emit-events-after-db-transaction-is-commited-not)
for more info.
Fixes 1-1049.
## Changes
The most important change here is that the `eventStore` no longer emits
events when they happen (because that can be in the middle of a
transaction). Instead, events are stored with a new `announced` column.
The new event announcer service runs on a schedule (every second) and
publishes any new events that have not been published.
Parts of the code have largely been lifted from the
`client-application-store`, which uses a similar logic.
I have kept the emitting of the event within the event store because a
lot of other services listen to events from this store, so removing that
would require a large rewrite. It's something we could look into down
the line, but it seems like too much of a change to do right now.
## Discussion
### Terminology:
Published vs announced? We should settle on one or the other. Announced
is consistent with the client-application store, but published sounds
more fitting for events.
### Publishing and marking events as published
The current implementation fetches all events that haven't been marked
as announced, sets them as announced, and then emits them. It's possible
that Unleash would crash in the interim or something else might happen,
causing the events not to get published. Maybe it would make sense to
just fetch the events and only mark them as published after the
announcement? On the other hand, that might get us into other problems.
Any thoughts on this would be much appreciated.