Added more about ruby3 upgrades
@@ -18,11 +18,11 @@ The API deployment consists of:
|
|||||||
** Release Candidate Deployment Strategy
|
** Release Candidate Deployment Strategy
|
||||||
|
|
||||||
This is a form of a canary deployment strategy. This strategy involves
|
This is a form of a canary deployment strategy. This strategy involves
|
||||||
diverting just a small amout of traffic to the new version, while looking
|
diverting just a small amount of traffic to the new version, while
|
||||||
for an increased error rate. After some time, we assess how the
|
looking for an increased error rate. After some time, we assess how
|
||||||
candidate has been performing. If things look bad, then we scale back
|
the candidate has been performing. If things look bad, then we scale
|
||||||
and address the issues. Otherwise we ramp up the amount of traffic
|
back and address the issues. Otherwise we ramp up the amount of
|
||||||
that the pods see.
|
traffic that the pods see.
|
||||||
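The assess-then-ramp-or-scale-back step described above can be sketched as a small decision helper. This is a minimal illustration, not the team's actual tooling: the method name =canary_decision=, the hash keys, the 1% tolerance, and the minimum sample size are all assumptions chosen for the example.

```ruby
# Hypothetical helper: decide the next step for a release candidate by
# comparing its error rate against the stable version's error rate.
# The tolerance and minimum sample size are illustrative assumptions.
def canary_decision(stable:, candidate:, tolerance: 0.01, min_requests: 500)
  # Too little candidate traffic to judge: keep the current traffic split.
  return :hold if candidate[:requests].zero? || candidate[:requests] < min_requests

  stable_rate = stable[:requests].zero? ? 0.0 : stable[:errors].to_f / stable[:requests]
  candidate_rate = candidate[:errors].to_f / candidate[:requests]

  # Scale back when the candidate is worse than stable by more than the tolerance.
  candidate_rate - stable_rate > tolerance ? :scale_back : :ramp_up
end

stable    = { requests: 20_000, errors: 100 } # 0.5% errors
candidate = { requests: 1_000,  errors: 40 }  # 4.0% errors
puts canary_decision(stable: stable, candidate: candidate)
```

Comparing against the stable version's rate, rather than a fixed threshold, keeps the check meaningful when both versions are affected by the same unrelated incident.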
|
|
||||||
Doing things this way allows us to build confidence in the release but
|
Doing things this way allows us to build confidence in the release but
|
||||||
it does not come without drawbacks. The most important thing to be
|
it does not come without drawbacks. The most important thing to be
|
||||||
@@ -47,30 +47,18 @@ the two versions are compatible, and can run side-by-side.
|
|||||||
|
|
||||||
* Lessons from Previous Rails Upgrades
|
* Lessons from Previous Rails Upgrades
|
||||||
|
|
||||||
We have telemetry set up to monitor the system as a whole, so
|
|
||||||
identifying whether or not something looks like an issue related to
|
|
||||||
the upgrade or is unrelated has been left to SMEs' intuition.
|
|
||||||
|
|
||||||
In the Rails 5.2 -> 6.0 upgrade we hit a couple of issues:
|
|
||||||
- Rails 6 jobs were not able to be served with 5 workers
|
|
||||||
- We addressed this before rolling forwards
|
|
||||||
- Prometheus-client upgrade meant that all the cron jobs succeeded but
|
|
||||||
failed to report their status.
|
|
||||||
|
|
||||||
In the Rails 6.1 upgrade we observed a new issue with respect to users
|
|
||||||
seeing 404s through the portal, after hitting the =/organizations=
|
|
||||||
endpoint.
|
|
||||||
- I decided that the scope of the bug was small enough that we were
|
|
||||||
okay to roll forward.
|
|
||||||
- Error rates looked largely the same because the symptom that we
|
|
||||||
observed was an increased number of 403s on the Projects Controller
|
|
||||||
|
|
||||||
|
|
||||||
* Defining key performance indicators
|
* Defining key performance indicators
|
||||||
|
|
||||||
Typically, what I would do (and what I assume Lucas does) is just keep
|
Typically, what I would do (and what I assume Lucas does) is just keep an eye on Rollbar. Rollbar captures anything broken badly enough to raise exceptions or errors in Rails. Additionally, I would keep a broad view of errors by span kind in Honeycomb to see whether there was a spike associated with the release candidate.
|
||||||
an eye on Rollbar. Rollbar would capture things that are at least
|
|
||||||
fundamentally broken that would cause exceptions or errors in
|
- What we were looking at in the previous releases
|
||||||
Rails. Additionally, I would keep a broad view on errors by span kind
|
- Error rates by span kind per version
|
||||||
in honeycomb to see if we were seeing a spike associated with the
|
|
||||||
release candidate.
|
This helps us know whether the error rate for requests is higher in one version than the other, or whether we're failing specifically in processing background jobs.
|
||||||
|
|
||||||
|
- No surprises in Rollbar
|
||||||
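The "error rates by span kind per version" view from the list above can be sketched in a few lines. This is an illustration only: the =Span= struct, its field names, and the version labels are assumptions made for the example and do not reflect Honeycomb's actual schema or query API.

```ruby
# Hypothetical span record; field names are assumptions for this sketch.
Span = Struct.new(:version, :kind, :error, keyword_init: true)

# Group spans by [version, kind] and compute the fraction that errored.
def error_rates(spans)
  spans.group_by { |s| [s.version, s.kind] }.transform_values do |group|
    group.count(&:error).to_f / group.size
  end
end

spans = [
  Span.new(version: "stable",    kind: "server",   error: false),
  Span.new(version: "stable",    kind: "server",   error: false),
  Span.new(version: "candidate", kind: "server",   error: false),
  Span.new(version: "candidate", kind: "consumer", error: true),
  Span.new(version: "candidate", kind: "consumer", error: false)
]

error_rates(spans).each do |(version, kind), rate|
  puts format("%-10s %-9s %.2f", version, kind, rate)
end
```

Splitting by span kind is what separates a request-path regression (server spans) from a background-job regression (consumer spans) in the same release.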
|
|
||||||
|
Instead, ideally we'd be tracking a set of stable indicators that the system itself reports.
|
||||||
|
|||||||