Compare commits


2 Commits

Author         SHA1        Message                Date
Adam Mohammed  35975e7fda  Update notes for 5/10  2023-05-10 16:06:51 -04:00
Adam Mohammed  7e576eb71a  Ruby upgradessss       2023-05-10 16:03:37 -04:00
3 changed files with 116 additions and 8 deletions


@@ -1,13 +1,6 @@
* Tasks
** TODO Look at why we'd be getting request bodies for GET ips avail
[2023-04-19 Wed]
** TODO Upgrade CRDB to 22.2.7
** TODO Put together POC for micro-caching RAILS
** TODO Look at "compiling" krakend configs from OpenAPI
** TODO Meeting with DevRel to talk about Provisioning Failures
** DONE Meeting with DevRel to talk about Provisioning Failures
Chris:
Cluster api - failed provision
it shows up with a 403 - moving the project to a new project
@@ -32,3 +25,4 @@ Cluster api - failed provision
check on rescue and reinstall operations
** TODO Create a ticket to deal with 403s for provisioning failures


@@ -215,3 +215,41 @@ cc817f6e-f56f-4cae-91f2-eb1a85049847
:ARCHIVE_CATEGORY: notes
:ARCHIVE_TODO: DONE
:END:
* DONE Audit Spot Market Bids
:PROPERTIES:
:ARCHIVE_TIME: 2023-05-10 Wed 16:03
:ARCHIVE_FILE: ~/notes/org-notes/notes.org
:ARCHIVE_OLPATH: Tasks
:ARCHIVE_CATEGORY: notes
:ARCHIVE_TODO: DONE
:END:
#+begin_src sql :name max_bids per facility
SELECT p.slug, array_agg(f.code), array_agg(cl.max_allowed_bid)
FROM capacity_levels cl
JOIN plans p ON cl.plan_id = p.id
JOIN facilities f ON cl.facility_id = f.id
JOIN metros m ON f.metro_id = m.id
GROUP BY p.slug
ORDER BY p.slug ASC;
#+end_src
#+begin_src sql :name checking for distinct prices
SELECT cl.plan_id, cl.max_allowed_bid, COUNT(DISTINCT cl.max_allowed_bid)
FROM capacity_levels cl
WHERE cl.deleted_at < 'January 1, 1970'
GROUP BY plan_id, max_allowed_bid;
#+end_src
Results [[file:capacity_levels_pricing.csv][capacity_levels_pricing.csv]]
* DONE Upgrade CRDB to 22.2.7
:PROPERTIES:
:ARCHIVE_TIME: 2023-05-10 Wed 16:03
:ARCHIVE_FILE: ~/notes/org-notes/notes.org
:ARCHIVE_OLPATH: Tasks
:ARCHIVE_CATEGORY: notes
:ARCHIVE_TODO: DONE
:END:

ruby3-upgrades.org (new file, +76 lines)

@@ -0,0 +1,76 @@
#+TITLE: Ruby 3 Upgrades
#+AUTHOR: Adam Mohammed
#+DATE: May 10, 2023
* Agenda
- Recap: API deployment architecture
- Lessons from the Rails 6.0/6.1 upgrade
- Defining key performance indicators
* Recap: API Deployment
The API deployment consists of:
- *frontend pods* - 10 pods dedicated to serving HTTP traffic
- *worker pods* - 8 pods dedicated to job processing
- *cron jobs* - various rake tasks executed to perform periodic upkeep necessary for the API
** Release Candidate Deployment Strategy
This is a form of canary deployment. The strategy involves
diverting just a small amount of traffic to the new version while watching
for an increased error rate. After some time, we assess how the
candidate has been performing. If things look bad, we scale back
and address the issues. Otherwise, we ramp up the amount of traffic
that the candidate pods see.
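As a rough illustration, with even load balancing the candidate's share of
traffic is just its fraction of the total pods (the single release-candidate
replica below is an assumed starting point, not a prescribed count):
#+begin_src ruby
# Back-of-the-envelope traffic split under round-robin load balancing:
# the candidate receives roughly its fraction of the total frontend pods.
stable_replicas = 10 # current frontend pod count
rc_replicas     = 1  # a single release-candidate pod to start
share = rc_replicas.to_f / (stable_replicas + rc_replicas)
puts "release candidate sees ~#{(share * 100).round}% of requests" # ~9%
#+end_src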
Doing things this way allows us to build confidence in the release, but
it does not come without drawbacks. The most important thing to be
aware of is that we're relying on the k8s Service to load balance
between the two versions of the application. That means we're not
doing anything special to make sure that a customer is only ever hitting a
single app version.
We accept this risk because issues with HTTP requests are mostly
confined to the request itself, and each span stamps the Rails version that
processed that portion of the request.
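A minimal sketch of what that stamping could look like, assuming the
Honeycomb beeline is the instrumentation in use (the field name and the
=before_action= hook are illustrative, not necessarily how it's wired up today):
#+begin_src ruby
class ApplicationController < ActionController::Base
  before_action :stamp_rails_version

  private

  # Attach the running Rails version to the current Honeycomb span so that
  # errors and latency can be sliced by which version served the request.
  def stamp_rails_version
    Honeycomb.add_field("app.rails_version", Rails::VERSION::STRING)
  end
end
#+end_src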
Some HTTP requests are not completed entirely within the
request/response cycle. For these endpoints, we queue up background
jobs that the workers eventually process. This means that some
requests will be handled by the release candidate while the resulting
background job is processed by the older application version.
Because of this, when using this release strategy, we're assuming that
the two versions are compatible and can run side by side.
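A hypothetical example of what that constraint means in practice: if the
release candidate changes a job's arguments, the older workers can still
dequeue jobs it enqueues, so the change has to be one the existing
signature tolerates (the job and argument names below are made up):
#+begin_src ruby
class ProvisionDeviceJob < ApplicationJob
  queue_as :default

  # Workers on the old version may process jobs enqueued by the release
  # candidate, so the candidate must not enqueue arguments the old
  # signature can't accept. The usual pattern: ship a tolerant signature
  # like this one first (the default keeps old payloads working), and only
  # start enqueuing the new option once every worker understands it.
  def perform(device_id, options = {})
    device = Device.find(device_id)
    device.provision!(dry_run: options.fetch("dry_run", false))
  end
end
#+end_src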
* Lessons from Previous Rails Upgrades
We have telemetry set up to monitor the system as a whole, so
deciding whether something looks like an issue related to the
upgrade or is unrelated has been left to SMEs' intuition.
In the Rails 5.2 -> 6.0 upgrade we hit a couple of issues:
- Jobs enqueued by the Rails 6 code could not be processed by the Rails 5 workers
  - We addressed this before rolling forward
- The prometheus-client upgrade meant that all the cron jobs succeeded but
  failed to report their status (see the sketch below)
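The kind of push-gateway reporting involved looks roughly like the sketch
below. The exact =Prometheus::Client::Push= interface has changed across
prometheus-client versions, which is exactly the sort of thing the upgrade
broke; the gateway URL, metric, and job name here are made up:
#+begin_src ruby
require "prometheus/client"
require "prometheus/client/push"

registry = Prometheus::Client.registry
runs = registry.counter(:cron_runs_total, docstring: "Completed cron runs")
runs.increment

# Pushed at the end of the rake task; if this call quietly stops working,
# the cron job still "succeeds" but never reports its status.
Prometheus::Client::Push.new(job: "nightly-upkeep",
                             gateway: "http://pushgateway.example:9091").add(registry)
#+end_src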
In the Rails 6.1 upgrade we observed a new issue: users were seeing
404s through the portal after hitting the =/organizations= endpoint.
- I decided that the scope of the bug was small enough that we were
  okay to roll forward.
- Error rates looked largely the same because the symptom that we
  observed was an increased number of 403s on the Projects Controller.
* Defining key performance indicators
Typically, what I would do (and what I assume Lucas does) is keep
an eye on Rollbar. Rollbar captures anything that is fundamentally
broken enough to cause exceptions or errors in
Rails. Additionally, I would keep a broad view of errors by span kind
in Honeycomb to see whether there was a spike associated with the
release candidate.
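One way to turn that into an actual KPI rather than intuition (the counts
and tolerance below are invented) is to compare the candidate's error rate
against the stable pods over the same window and only ramp up while it
stays within some agreed tolerance:
#+begin_src ruby
# Hypothetical canary gate: ramp up only if the candidate's error rate is
# no more than `tolerance` above the stable pods' error rate.
def canary_healthy?(stable:, candidate:, tolerance: 0.005)
  stable_rate    = stable[:errors].to_f    / stable[:requests]
  candidate_rate = candidate[:errors].to_f / candidate[:requests]
  candidate_rate <= stable_rate + tolerance
end

canary_healthy?(stable:    { requests: 120_000, errors: 240 },
                candidate: { requests: 12_000,  errors: 30 })
# => true (0.25% candidate vs 0.20% stable + 0.5% tolerance)
#+end_src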