Compare commits
2 Commits
4fc9046c8e...35975e7fda
| Author | SHA1 | Date |
|---|---|---|
| | 35975e7fda | |
| | 7e576eb71a | |
10
notes.org
@@ -1,13 +1,6 @@
* Tasks

** TODO Look at why we'd be getting request bodies for GET ips avail
[2023-04-19 Wed]
** TODO Upgrade CRDB to 22.2.7
** TODO Put together POC for micro-caching RAILS
** TODO Look at "compiling" krakend configs from OpenAPI
** TODO Meeting with DevRel to talk about Provisioning Failures


** DONE Meeting with DevRel to talk about Provisioning Failures
Chris:
Cluster api - failed provision
it shows up with a 403 - moving the project to a new project
@@ -32,3 +25,4 @@ Cluster api - failed provision
check on rescue and reinstall operations
** TODO Create a ticket to deal with 403s for provisioning failures
@@ -215,3 +215,41 @@ cc817f6e-f56f-4cae-91f2-eb1a85049847
:ARCHIVE_CATEGORY: notes
:ARCHIVE_TODO: DONE
:END:

* DONE Audit Spot Market Bids
:PROPERTIES:
:ARCHIVE_TIME: 2023-05-10 Wed 16:03
:ARCHIVE_FILE: ~/notes/org-notes/notes.org
:ARCHIVE_OLPATH: Tasks
:ARCHIVE_CATEGORY: notes
:ARCHIVE_TODO: DONE
:END:

#+begin_src sql :name max_bids per facility
SELECT p.slug, array_agg(f.code), array_agg(cl.max_allowed_bid)
FROM capacity_levels cl
JOIN plans p ON cl.plan_id = p.id
JOIN facilities f ON cl.facility_id = f.id
JOIN metros m ON f.metro_id = m.id
GROUP BY p.slug
ORDER BY p.slug ASC;
#+end_src
#+begin_src sql :name checking for distinct prices

-- Each output row is one distinct (plan_id, max_allowed_bid) pair.
-- Because max_allowed_bid is also a grouping key, the COUNT(DISTINCT ...)
-- column is 1 on every row with a non-NULL bid.
SELECT cl.plan_id, cl.max_allowed_bid, COUNT(DISTINCT cl.max_allowed_bid)
FROM capacity_levels cl
WHERE cl.deleted_at < 'January 1, 1970'
GROUP BY plan_id, max_allowed_bid;
#+end_src
Results [[file:capacity_levels_pricing.csv][capacity_levels_pricing.csv]]
* DONE Upgrade CRDB to 22.2.7
:PROPERTIES:
:ARCHIVE_TIME: 2023-05-10 Wed 16:03
:ARCHIVE_FILE: ~/notes/org-notes/notes.org
:ARCHIVE_OLPATH: Tasks
:ARCHIVE_CATEGORY: notes
:ARCHIVE_TODO: DONE
:END:
76
ruby3-upgrades.org
Normal file
@@ -0,0 +1,76 @@
#+TITLE: Ruby 3 Upgrades
#+AUTHOR: Adam Mohammed
#+DATE: May 10, 2023


* Agenda
- Recap: API deployment architecture
- Lessons from the Rails 6.0/6.1 upgrade
- Defining key performance indicators

* Recap: API Deployment

The API deployment consists of:
- **frontend pods** - 10 pods dedicated to serving HTTP traffic
- **worker pods** - 8 pods dedicated to job processing
- **cron jobs** - various rake tasks executed to perform periodic upkeep necessary for the API context
** Release Candidate Deployment Strategy

This is a form of canary deployment. The strategy involves
diverting just a small amount of traffic to the new version, while looking
for an increased error rate. After some time, we assess how the
candidate has been performing. If things look bad, then we scale back
and address the issues. Otherwise we ramp up the amount of traffic
that the pods see.
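
As a rough illustration of the ramp-up, the sketch below estimates what share of requests the candidate would see if traffic were spread evenly across ready pods. The even-spread model, the helper name =candidate_share=, and the ramp steps are assumptions made for illustration; only the count of 10 frontend pods comes from the recap above.

#+begin_src ruby
# Hypothetical sketch: expected share of requests reaching the release
# candidate, assuming traffic is spread evenly across all ready pods.
def candidate_share(stable_pods:, candidate_pods:)
  total = stable_pods + candidate_pods
  raise ArgumentError, "need at least one pod" if total.zero?

  candidate_pods.fdiv(total)
end

# Illustrative ramp over the 10 frontend pods (the step sizes are made up).
[1, 3, 5, 10].each do |candidate|
  share = candidate_share(stable_pods: 10 - candidate, candidate_pods: candidate)
  puts format("%2d candidate pods -> ~%.0f%% of traffic", candidate, share * 100)
end
#+end_src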

Doing things this way allows us to build confidence in the release, but
it does not come without drawbacks. The most important thing to be
aware of is that we're relying on the k8s service to load balance
between the two versions of the application. That means that we're not
doing any tricks to make sure that a customer is only ever hitting a
single app version.

We accept this risk because issues with HTTP requests are mostly
confined to the request, and each span is stamped with the Rails version
that processed that portion of the request.

Some HTTP requests are not fully completed within the
request/response cycle. For these endpoints, we queue up background
jobs that the workers eventually process. This means that a request
may be handled by the release candidate while the background job it
enqueues is processed by the older application version.

Because of this, when using this release strategy, we're assuming that
the two versions are compatible, and can run side-by-side.
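
One way to picture that compatibility assumption is a worker that tolerates payloads written by either version. The sketch below is plain Ruby and not tied to any job framework; the =enqueue= and =perform= helpers, the payload keys, and the default action are all hypothetical.

#+begin_src ruby
require "json"

# Hypothetical sketch: the enqueuing side stamps which app version wrote
# the payload, so mixed-version processing is visible when debugging.
def enqueue(payload, app_version:)
  JSON.generate(payload.merge("enqueued_by" => app_version))
end

# The processing side keeps only the keys it understands and fills in a
# default for keys that the other version may not send.
def perform(raw)
  args = JSON.parse(raw).slice("device_id", "action")
  args["action"] ||= "provision"
  args
end

# A candidate-enqueued payload with an extra key, handled by an older worker.
raw = enqueue({ "device_id" => "d-1", "action" => "reinstall", "retry_policy" => "fast" },
              app_version: "candidate")
p perform(raw)
#+end_src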
* Lessons from Previous Rails Upgrades

We have telemetry set up to monitor the system as a whole, so
identifying whether something looks like an issue related to
the upgrade or is unrelated has been left to SMEs' intuition.

In the Rails 5.2->6.0 upgrade we hit a couple of issues:
- Rails 6 jobs could not be processed by Rails 5 workers
  - We addressed this before rolling forward
- The prometheus-client upgrade meant that all the cron jobs succeeded but
  failed to report their status.

In the Rails 6.1 upgrade we observed a new issue: users were
seeing 404s through the portal after hitting the =/organizations=
endpoint.
- I decided that the scope of the bug was small enough that we were
  okay to roll forward.
- Error rates looked largely the same because the symptom that we
  observed was an increased number of 403s on the Projects Controller.
* Defining key performance indicators

Typically, what I would do (and what I assume Lucas does) is just keep
an eye on Rollbar. Rollbar would at least capture anything broken badly
enough to cause exceptions or errors in Rails. Additionally, I would
keep a broad view of errors by span kind in Honeycomb to see if we were
seeing a spike associated with the
release candidate.
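
To make that comparison less dependent on eyeballing a graph, a check along the following lines could compare error rates per span kind between the two versions. This is only a sketch: the span fields, the version labels, and the 5% tolerance are illustrative assumptions, not Honeycomb's schema or an agreed threshold.

#+begin_src ruby
# Hypothetical sketch: error rate for each [rails_version, span kind] pair.
def error_rates(spans)
  spans.group_by { |s| [s[:rails_version], s[:kind]] }
       .transform_values { |group| group.count { |s| s[:error] }.fdiv(group.size) }
end

# Span kinds where the candidate's error rate exceeds the stable version's
# rate by more than `tolerance` (an arbitrary illustrative threshold).
def regressions(spans, stable:, candidate:, tolerance: 0.05)
  rates = error_rates(spans)
  spans.map { |s| s[:kind] }.uniq.select do |kind|
    rates.fetch([candidate, kind], 0.0) - rates.fetch([stable, kind], 0.0) > tolerance
  end
end

spans = [
  { kind: "server", rails_version: "6.0", error: false },
  { kind: "server", rails_version: "6.0", error: false },
  { kind: "server", rails_version: "6.1", error: true },
  { kind: "server", rails_version: "6.1", error: false },
  { kind: "client", rails_version: "6.0", error: false },
  { kind: "client", rails_version: "6.1", error: false },
]
p regressions(spans, stable: "6.0", candidate: "6.1")
#+end_src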