Added more about Ruby 3 upgrades

2023-05-12 09:57:29 -04:00
parent 35975e7fda
commit 03694ee62d


@@ -18,11 +18,11 @@ The API deployment consists of:
** Release Candidate Deployment Strategy
This is a form of canary deployment. The strategy involves diverting
just a small amount of traffic to the new version while looking for an
increased error rate. After some time, we assess how the candidate has
been performing. If things look bad, then we scale back and address the
issues. Otherwise we ramp up the amount of traffic that the pods see.
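To make the ramp concrete, here is a rough sketch of the idea in Ruby;
the stage names and percentages are made up for illustration, and in
practice the split happens at the load balancer rather than in
application code.

#+begin_src ruby
# Illustrative only: a weighted routing decision driven by a manual ramp
# schedule. The stages and percentages are hypothetical, not our real plan.
RAMP_SCHEDULE = {
  "rc-initial"  => 5,    # divert a small slice of traffic to the candidate
  "rc-expanded" => 25,   # ramp up once error rates still look healthy
  "rc-majority" => 75,
  "released"    => 100,
}.freeze

# Route a single request: true means "send it to the release candidate pods".
def route_to_candidate?(stage)
  rand(100) < RAMP_SCHEDULE.fetch(stage)
end

# During the first stage roughly 5% of requests hit the candidate.
hits = 10_000.times.count { route_to_candidate?("rc-initial") }
puts "candidate share: #{hits / 100.0}%"
#+end_src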
Doing things this way allows us to build confidence in the release but
it does not come without drawbacks. The most important thing to be
@@ -47,30 +47,18 @@ the two versions are compatible, and can run side-by-side.
* Lessons from Previous Rails Upgrades
We have telemetry set up to monitor the system as a whole, so
identifying whether something looks like an issue related to the
upgrade or is unrelated has been left to SMEs' intuition.

In the Rails 5.2->6.0 upgrade we hit a couple of issues:
- Rails 6 jobs were not able to be served by Rails 5 workers
  - We addressed this before rolling forward
- The prometheus-client upgrade meant that all the cron jobs succeeded
  but failed to report their status (the reporting pattern is sketched
  below)
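For context on the cron reporting piece, the pattern those jobs rely on
is roughly the one below: push a success metric through the
prometheus-client gem when the task finishes. The metric name, labels,
and gateway host are hypothetical, and the Push API has shifted between
gem versions, so treat this as a sketch rather than our exact code.

#+begin_src ruby
require "prometheus/client"
require "prometheus/client/push"

# Hypothetical metric recording when a cron task last finished successfully.
registry = Prometheus::Client.registry
last_success = Prometheus::Client::Gauge.new(
  :cron_last_success_timestamp_seconds,
  docstring: "Unix time of the last successful run",
  labels: [:task]
)
registry.register(last_success)

# ... the actual cron work runs here ...

last_success.set(Time.now.to_i, labels: { task: "nightly_sync" })

# Push to a Pushgateway so a short-lived job still gets scraped. The keyword
# arguments match recent prometheus-client releases; older versions used a
# different constructor, which is the sort of change that lets the job itself
# succeed while the reporting silently breaks.
Prometheus::Client::Push.new(
  job: "nightly_sync",
  gateway: "http://pushgateway.internal:9091"
).add(registry)
#+end_src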
In the Rails 6.1 upgrade we observed a new issue: users were seeing
404s through the portal after hitting the =/organizations= endpoint.
- I decided that the scope of the bug was small enough that we were
  okay to roll forward.
- Error rates looked largely the same because the symptom we observed
  was an increased number of 403s on the Projects Controller
* Defining key performance indicators
Typically, what I would do (and what I assume Lucas does) is just keep
an eye on Rollbar. Rollbar would capture anything fundamentally broken
enough to cause exceptions or errors in Rails. Additionally, I would
keep a broad view on errors by span kind in Honeycomb to see if we were
seeing a spike associated with the release candidate.

- What we were looking at in the previous releases
  - Error rates by span kind per version (sketched below)

    This helps us know if the error rate for requests is higher in one
    version or the other, or if we're failing specifically in
    processing background jobs.
  - No surprises in Rollbar

Instead, ideally we'd be tracking some stable signals that the system
reports.
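As a sketch of the error-rate-by-span-kind-per-version comparison
referenced above, the snippet below aggregates an in-memory list of
span-like events in plain Ruby (this is not the Honeycomb query API;
the field names and the 2x threshold are assumptions) and flags span
kinds where the candidate looks meaningfully worse than stable.

#+begin_src ruby
# Illustrative aggregation: error rate per (version, span kind). The events
# stand in for spans exported to our telemetry backend; field names are
# assumptions for the example.
events = [
  { version: "v2023.05-rc1", span_kind: "server",   error: false },
  { version: "v2023.05-rc1", span_kind: "consumer", error: true  },
  { version: "v2023.04",     span_kind: "server",   error: false },
]

def error_rates(events)
  events.group_by { |e| [e[:version], e[:span_kind]] }
        .transform_values { |es| es.count { |e| e[:error] }.fdiv(es.size) }
end

# Flag span kinds where the candidate's error rate is notably worse than the
# stable version's. The 2x factor is arbitrary, just to make it concrete.
def regressions(rates, candidate:, stable:, factor: 2.0)
  rates.filter_map do |(version, kind), rate|
    next unless version == candidate
    baseline = rates[[stable, kind]] || 0.0
    [kind, rate, baseline] if rate.positive? && rate > baseline * factor
  end
end

rates = error_rates(events)
regressions(rates, candidate: "v2023.05-rc1", stable: "v2023.04").each do |kind, rc, base|
  puts "#{kind}: candidate #{(rc * 100).round(1)}% vs stable #{(base * 100).round(1)}%"
end
#+end_src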