org-notes/equinix/api-team/proposals/ruby3-upgrades.org
2024-04-20 10:21:42 -04:00

Ruby 3 Upgrades

Agenda

  • Recap: API deployment architecture
  • Lessons from the Rails 6.0/6.1 upgrade
  • Defining key performance indicators

Recap: API Deployment

The API deployment consists of:

  • frontend pods - 10 pods dedicated to serving HTTP traffic
  • worker pods - 8 pods dedicated to job processing
  • cron jobs - various rake tasks executed to perform periodic upkeep necessary for the API context

Release Candidate Deployment Strategy

This is a form of canary deployment. We divert just a small amount of traffic to the new version and watch for an increased error rate. After some time, we assess how the candidate has been performing. If things look bad, we scale back and address the issues. Otherwise, we ramp up the amount of traffic that the candidate pods see.
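
The assess-then-ramp loop described above can be sketched as a small decision function. This is only an illustration: the function name, the tolerance, and the minimum sample size are assumptions, not values we have agreed on.

```ruby
# Decide the next step for a release candidate from observed error rates.
# tolerance and min_requests are illustrative assumptions, not agreed values.
def canary_decision(stable_error_rate:, candidate_error_rate:, candidate_requests:,
                    tolerance: 0.005, min_requests: 1_000)
  # Too little candidate traffic to judge: keep the current split and wait.
  return :keep_observing if candidate_requests < min_requests

  if candidate_error_rate > stable_error_rate + tolerance
    :scale_back # things look bad: reduce candidate traffic and investigate
  else
    :ramp_up    # candidate is holding up: send it more traffic
  end
end

puts canary_decision(stable_error_rate: 0.010, candidate_error_rate: 0.030,
                     candidate_requests: 5_000)
```

Comparing against the stable version's rate, rather than a fixed threshold, keeps the decision meaningful when both versions are affected by the same upstream problem.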

Doing things this way allows us to build confidence in the release, but it does not come without drawbacks. The most important thing to be aware of is that we rely on the k8s Service to load balance between the two versions of the application. That means we do nothing to ensure that a given customer only ever hits a single app version.
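
Because the Service balances across all ready pods, the share of traffic the candidate sees is roughly its share of the pods. A rough sketch, assuming connections are spread about evenly across pods and that we keep 10 frontend pods in total while shifting them between versions (the specific splits are hypothetical):

```ruby
# Approximate fraction of traffic reaching release-candidate pods when one
# Service selects both versions. Assumes a roughly even spread across pods;
# long-lived connections can skew this in practice.
def candidate_traffic_share(stable_pods:, candidate_pods:)
  total = stable_pods + candidate_pods
  raise ArgumentError, "no pods to serve traffic" if total.zero?

  candidate_pods.to_f / total
end

# Hypothetical ramp steps as [stable, candidate] pod counts.
[[9, 1], [7, 3], [5, 5], [0, 10]].each do |stable, candidate|
  share = candidate_traffic_share(stable_pods: stable, candidate_pods: candidate)
  puts format("%2d stable / %2d candidate -> ~%.0f%% of traffic", stable, candidate, share * 100)
end
```

This also shows the granularity we get: with 10 pods, the smallest non-zero step is about a tenth of the traffic.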

We accept this risk because issues with HTTP requests are mostly confined to the request itself, and each span is stamped with the Rails version that processed that portion of the request.

Some HTTP requests are not fully completed within the request/response cycle. For these endpoints, we queue up background jobs that the workers eventually process. This means that a request can be handled by the release candidate while its background job is processed by the older application version.

Because of this, when using this release strategy, we're assuming that the two versions are compatible, and can run side-by-side.
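
To make "compatible" concrete, here is a minimal sketch of one way job payloads can stay readable across versions: each side ignores keys it does not know and defaults keys that are missing. The job name, the keys, and the function names are hypothetical, and plain JSON stands in for whatever serialization the job queue actually uses.

```ruby
require "json"

# Older version enqueues only the fields it knows about.
def enqueue_on_stable(device_id)
  JSON.generate({ "job" => "SyncDeviceJob", "args" => { "device_id" => device_id } })
end

# Release candidate adds a hypothetical "priority" field.
def enqueue_on_candidate(device_id)
  JSON.generate({ "job" => "SyncDeviceJob",
                  "args" => { "device_id" => device_id, "priority" => "low" } })
end

# Older worker: reads only the keys it knows and ignores the rest.
def perform_on_stable(raw_payload)
  args = JSON.parse(raw_payload).fetch("args")
  "synced device #{args.fetch('device_id')}"
end

# Candidate worker: falls back to a default when an older payload lacks the key.
def perform_on_candidate(raw_payload)
  args = JSON.parse(raw_payload).fetch("args")
  "synced device #{args.fetch('device_id')} (priority=#{args.fetch('priority', 'normal')})"
end

puts perform_on_stable(enqueue_on_candidate(42))
puts perform_on_candidate(enqueue_on_stable(42))
```

If either side instead required a key that the other version never writes, the job would fail only when the versions happen to cross, which is exactly the case this release strategy produces.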

Lessons from Previous Rails Upgrades

Defining key performance indicators

Typically, what I would do (and what I assume Lucas does) is just keep an eye on Rollbar. Rollbar captures anything broken badly enough to raise exceptions or errors in Rails. Additionally, I would keep a broad view of errors by span kind in Honeycomb to see whether a spike was associated with the release candidate.

  • What we were looking at in the previous releases
      • Error rates by span kind per version. This helps us know whether the error rate for requests is higher in one version or the other, or whether we're failing specifically in processing background jobs.
      • No surprises in Rollbar
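
The "error rates by span kind per version" view can be sketched in plain Ruby as a group-and-count over span records. The Span struct, its fields, and the "stable"/"candidate" labels are assumptions for illustration; in practice this breakdown comes from a Honeycomb query rather than application code.

```ruby
# Hypothetical, simplified span record: kind ("http" or "job"), the version
# stamped on the span, and whether the span ended in an error.
Span = Struct.new(:kind, :version, :error, keyword_init: true)

# Returns { [kind, version] => error_rate }, where error_rate is in 0.0..1.0.
def error_rates_by_kind_and_version(spans)
  spans.group_by { |span| [span.kind, span.version] }
       .transform_values { |group| group.count(&:error).to_f / group.size }
end

# Made-up sample data, only to exercise the function.
spans =
  Array.new(3) { Span.new(kind: "http", version: "stable", error: false) } +
  [Span.new(kind: "http", version: "stable", error: true)] +
  Array.new(3) { Span.new(kind: "http", version: "candidate", error: false) } +
  [Span.new(kind: "http", version: "candidate", error: true)] +
  Array.new(2) { Span.new(kind: "job", version: "stable", error: false) } +
  [Span.new(kind: "job", version: "candidate", error: false),
   Span.new(kind: "job", version: "candidate", error: true)]

error_rates_by_kind_and_version(spans).sort.each do |(kind, version), rate|
  puts format("%-4s %-9s %.1f%%", kind, version, rate * 100)
end
```

In this made-up data the HTTP error rate is the same in both versions while the job error rate is higher for the candidate, which is the kind of difference the per-kind breakdown is meant to surface.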

Instead, ideally we'd be tracking a stable set of signals that the system itself reports.