Added more about ruby3 upgrades

2023-05-12 09:57:29 -04:00
parent 35975e7fda
commit 03694ee62d
1 changed files with 15 additions and 27 deletions
--- a/ruby3-upgrades.org
+++ b/ruby3-upgrades.org
@@ -18,11 +18,11 @@ The API deployment consists of:
 ** Release Candidate Deployment Strategy

 This is a form of a canary deployment strategy. This strategy involves
-diverting just a small amout of traffic to the new version, while looking
-for an increased error rate. After some time, we assess how the
-candidate has been performing. If things look bad, then we scale back
-and address the issues. Otherwise we ramp up the amount of traffic
-that the pods see.
+diverting just a small amount of traffic to the new version, while
+looking for an increased error rate. After some time, we assess how
+the candidate has been performing. If things look bad, then we scale
+back and address the issues. Otherwise we ramp up the amount of
+traffic that the pods see.

 Doing things this way allows us to build confidence in the release but
 it does not come without drawbacks. The most important thing to be
@@ -47,30 +47,18 @@ the two versions are compatible, and can run side-by-side.

 * Lessons from Previous Rails Upgrades

-We have telemetry set up to monitor the system as a whole, so
-identifying whether or not something looks like an issue related to
-the upgrade or is unrelated has been left to SMEs intution.

-In the rails 5.2->6.0 upgrade we hit a couple issues:
- Rails 6 jobs were not able to be served with 5 workers
-  - We addressed this before rolling forwards
- Prometheus-client upgrade meant that all the cron jobs succeeded but
-  failed to report their status.
-
-In the rails 6.1 upgrade we observed a new issue with respect to users
-seeing 404s through the portal, after hitting the =/organizations=
-endpoint.
- I decided that the scope of the bug was small enough that we were
-  okay to roll forward.
- Error rates looked largely the same because the symptom that we
-  observed was an increased number of 403s on the Projects Controller


 * Defining key performance indicators

-Typically, what I would do (and what I assume Lucas does) is just keep
-an eye on Rollbar. Rollbar would capture things that are at least
-fundamentally broken that would cause exceptions or errors in
-Rails. Additionally, I would keep a broad view on errors by span kind
-in honeycomb to see if we were seeing a spike associated with the
-release candidate.
+Typically, what I would do (and what I assume Lucas does) is just keep an eye on Rollbar. Rollbar would capture things that are at least fundamentally broken that would cause exceptions or errors in Rails. Additionally, I would keep a broad view on errors by span kind in honeycomb to see if we were seeing a spike associated with the release candidate.
+
+- What we were looking at in the previous releases
+- Error rates by span kind per version
+
+  This tells us whether the error rate for requests is higher in one version than the other, or whether we're failing specifically in processing background jobs.
+
+- No surprises in Rollbar
+
+Instead, ideally we'd be tracking stable information that the system itself reports.
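
The "error rates by span kind per version" check in the notes above could be sketched roughly as follows. This is a minimal illustration, not how Honeycomb or the deployment computes it: the span fields (`version`, `kind`, `error`), the function names, and the `tolerance` threshold are all hypothetical, and real span data would come from the tracing backend rather than an in-memory list.

```python
from collections import defaultdict


def error_rates(spans):
    """Return {(version, span_kind): error_rate} for an iterable of span dicts.

    Each span is assumed (hypothetically) to carry "version", "kind",
    and a boolean "error" field.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for span in spans:
        key = (span["version"], span["kind"])
        totals[key] += 1
        if span["error"]:
            errors[key] += 1
    return {key: errors[key] / totals[key] for key in totals}


def regressions(rates, baseline, candidate, tolerance=0.01):
    """List span kinds whose candidate error rate exceeds the baseline's
    by more than `tolerance` (an assumed, arbitrary threshold)."""
    flagged = []
    kinds = sorted({kind for (version, kind) in rates if version == candidate})
    for kind in kinds:
        base = rates.get((baseline, kind), 0.0)
        cand = rates[(candidate, kind)]
        if cand - base > tolerance:
            flagged.append(kind)
    return flagged
```

Splitting by span kind is what separates a regression in request handling from one in background job processing, which a single overall error rate would blur together.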