Ruby upgradessss

2023-05-10 16:03:37 -04:00
parent 4fc9046c8e
commit 7e576eb71a
1 changed files with 76 additions and 0 deletions
--- a/ruby3-upgrades.org
+++ b/ruby3-upgrades.org
@@ -0,0 +1,76 @@
+#+TITLE: Ruby 3 Upgrades
+#+AUTHOR: Adam Mohammed
+#+DATE: May 10, 2023
+
+
+* Agenda
+- Recap: API deployment architecture
+- Lessons from the Rails 6.0/6.1 upgrade
+- Defining key performance indicators
+
+* Recap: API Deployment
+
+The API deployment consists of:
+- *frontend pods* - 10 pods dedicated to serving HTTP traffic
+- *worker pods* - 8 pods dedicated to job processing
+- *cron jobs* - various rake tasks executed to perform periodic upkeep necessary for the API context
+
+** Release Candidate Deployment Strategy
+
+This is a form of canary deployment. We divert a small amount of
+traffic to the new version and watch for an increased error rate.
+After some time, we assess how the candidate has been performing.
+If things look bad, we scale back and address the issues.
+Otherwise, we ramp up the amount of traffic that the candidate pods
+see.
+
+Doing things this way allows us to build confidence in the release but
+it does not come without drawbacks. The most important thing to be
+aware of is that we're relying on the k8s service to load balance
+between the two versions of the application. That means that we're not
+doing any tricks to make sure that a customer is only ever hitting a
+single app version.
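As a rough illustration of why a customer can end up on both versions, here is a minimal sketch. It assumes the Service spreads requests evenly and independently across all ready frontend pods, which is a simplification. The method names and the 1-candidate/9-stable split are hypothetical, not taken from our deployment config.

```ruby
# Rough model, not a measurement: assumes requests are spread evenly and
# independently across all ready frontend pods.

# Approximate share of requests served by the release candidate.
def candidate_traffic_share(candidate_pods:, stable_pods:)
  candidate_pods.fdiv(candidate_pods + stable_pods)
end

# Probability that a client making `requests` calls is served by both
# versions at least once (1 minus the chance that every call lands on
# the same version).
def mixed_version_probability(share, requests)
  1.0 - (share**requests + (1.0 - share)**requests)
end

# Hypothetical split: 1 candidate pod alongside 9 stable pods.
share = candidate_traffic_share(candidate_pods: 1, stable_pods: 9)
puts format("candidate share: %.0f%%", share * 100)
puts format("chance of seeing both versions in 5 requests: %.2f",
            mixed_version_probability(share, 5))
```

Even with a small candidate share, a client that makes several requests is fairly likely to touch both versions under this model, which is why the side-by-side compatibility assumption matters.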
+
+We accept this risk because issues with HTTP requests are mostly
+confined to the request itself, and each span records the Rails
+version that processed that portion of the request.
+
+Some HTTP requests are not fully completed within the
+request/response cycle. For these endpoints, we queue up background
+jobs that the workers eventually process. This means that a request
+may be handled by the release candidate while its background job is
+processed by the older application version.
+
+Because of this, when using this release strategy, we're assuming that
+the two versions are compatible, and can run side-by-side.
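One way to make that compatibility assumption explicit is to stamp each job with the version that enqueued it and let the worker check whether it can handle it. The sketch below is plain Ruby with a made-up payload format. It is not how our job framework serializes jobs, and every name in it (=build_job_payload=, =processable?=, =enqueued_by=) is hypothetical.

```ruby
require "json"

# Hypothetical payload format: the enqueuing side records which app
# version produced the job.
def build_job_payload(job_class:, args:, app_version:)
  JSON.generate("job_class" => job_class, "args" => args, "enqueued_by" => app_version)
end

# Hypothetical worker-side check: only process jobs enqueued by a version
# this worker is known to be compatible with.
def processable?(raw_payload, supported_versions:)
  payload = JSON.parse(raw_payload)
  supported_versions.include?(payload["enqueued_by"])
end

payload = build_job_payload(job_class: "ExampleJob", args: [42], app_version: "candidate")
puts processable?(payload, supported_versions: %w[stable candidate])
puts processable?(payload, supported_versions: %w[stable])
```

A check like this would turn a silent incompatibility into something we can count and alert on, instead of discovering it from failed jobs.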
+
+
+* Lessons from Previous Rails Upgrades
+
+Our telemetry monitors the system as a whole, so deciding whether an
+issue is related to the upgrade or not has been left to the
+intuition of SMEs.
+
+In the Rails 5.2 -> 6.0 upgrade we hit a couple of issues:
+- Jobs enqueued by Rails 6 could not be processed by Rails 5 workers
+  - We addressed this before rolling forward
+- A prometheus-client upgrade meant that all the cron jobs succeeded but
+  failed to report their status.
+
+In the Rails 6.1 upgrade we observed a new issue: users saw 404s in
+the portal after hitting the =/organizations= endpoint.
+- I decided that the scope of the bug was small enough that we were
+  okay to roll forward.
+- Error rates looked largely the same, because the symptom we observed
+  was an increased number of 403s on the Projects Controller.
+
+
+* Defining Key Performance Indicators
+
+Typically, what I would do (and what I assume Lucas does) is just keep
+an eye on Rollbar. Rollbar captures anything broken badly enough to
+raise exceptions or errors in Rails. Additionally, I would keep a
+broad view of errors by span kind in Honeycomb to see whether there
+was a spike associated with the release candidate.
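To make "a spike associated with the release candidate" more concrete, here is a minimal sketch of the comparison I do by eye: error rate grouped by Rails version and span kind. The span records and their field names (=rails_version=, =kind=, =error=) are made up for illustration and are not our actual telemetry schema.

```ruby
# Compute the error rate for each (rails_version, span kind) pair.
# Field names are assumptions for this sketch.
def error_rates_by_version_and_kind(spans)
  spans
    .group_by { |span| [span[:rails_version], span[:kind]] }
    .transform_values { |group| group.count { |span| span[:error] }.fdiv(group.size) }
end

# Made-up sample data, only to show the shape of the result.
sample_spans = [
  { rails_version: "6.0", kind: "server", error: false },
  { rails_version: "6.0", kind: "server", error: false },
  { rails_version: "6.1", kind: "server", error: true },
  { rails_version: "6.1", kind: "server", error: false }
]

error_rates_by_version_and_kind(sample_spans).each do |(version, kind), rate|
  puts format("rails %s / %s: %.1f%% errors", version, kind, rate * 100)
end
```

Comparing these rates between the stable version and the candidate gives a number that can be tracked per release, rather than relying on someone noticing a change in a graph.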