From 03694ee62da52b34f06fdb0bae5c482911903943 Mon Sep 17 00:00:00 2001
From: Adam Mohammed
Date: Fri, 12 May 2023 09:57:29 -0400
Subject: [PATCH] Added more about ruby3 upgrades

---
 ruby3-upgrades.org | 42 +++++++++++++++---------------------------
 1 file changed, 15 insertions(+), 27 deletions(-)

diff --git a/ruby3-upgrades.org b/ruby3-upgrades.org
index 64ec63c..fc71312 100644
--- a/ruby3-upgrades.org
+++ b/ruby3-upgrades.org
@@ -18,11 +18,11 @@ The API deployment consists of:
 ** Release Candidate Deployment Strategy

 This is a form of a canary deployment strategy. This strategy involves
-diverting just a small amout of traffic to the new version, while looking
-for an increased error rate. After some time, we assess how the
-candidate has been performing. If things look bad, then we scale back
-and address the issues. Otherwise we ramp up the amount of traffic
-that the pods see.
+diverting just a small amount of traffic to the new version, while
+looking for an increased error rate. After some time, we assess how
+the candidate has been performing. If things look bad, then we scale
+back and address the issues. Otherwise we ramp up the amount of
+traffic that the pods see.

 Doing things this way allows us to build confidence in the release but
 it does not come without drawbacks. The most important thing to be
@@ -47,30 +47,18 @@ the two versions are compatible, and can run side-by-side.

 * Lessons from Previous Rails Upgrades

-We have telemetry set up to monitor the system as a whole, so
-identifying whether or not something looks like an issue related to
-the upgrade or is unrelated has been left to SMEs intution.

-In the rails 5.2->6.0 upgrade we hit a couple issues:
-- Rails 6 jobs were not able to be served with 5 workers
-  - We addressed this before rolling forwards
-- Prometheus-client upgrade meant that all the cron jobs succeeded but
-  failed to report their status.
-
-In the rails 6.1 upgrade we observed a new issue with respect to users
-seeing 404s through the portal, after hitting the =/organizations=
-endpoint.
-- I decided that the scope of the bug was small enough that we were
-  okay to roll forward.
-- Error rates looked largely the same because the symptom that we
-  observed was an increased number of 403s on the Projects Controller

 * Defining key performance indicators

-Typically, what I would do (and what I assume Lucas does) is just keep
-an eye on Rollbar. Rollbar would capture things that are at least
-fundamentally broken that would cause exceptions or errors in
-Rails. Additionally, I would keep a broad view on errors by span kind
-in honeycomb to see if we were seeing a spike associated with the
-release candidate.
+Typically, what I would do (and what I assume Lucas does) is just keep an eye on Rollbar. Rollbar would capture things that are at least fundamentally broken that would cause exceptions or errors in Rails. Additionally, I would keep a broad view on errors by span kind in Honeycomb to see if we were seeing a spike associated with the release candidate.
+
+- What we were looking at in the previous releases
+- Error rates by span kind per version
+
+  This helps us know if the error rate for requests is higher in one version or the other, or if we're failing specifically in processing background jobs.
+
+- No surprises in Rollbar
+
+Instead, ideally we'd be tracking some information the system reports that is stable.