org-notes/equinix/year-end-reviews/workday-notes.org

* Goal: Expand our Market - Lay the foundation for product-led growth
On the Nautilus team, our biggest responsibility is the monolith, and as we've added people to the team, we've started to put new logic into services outside of it. To keep this simple and reduce the maintenance burden, I created exoskeleton and algolyzer, two Go libraries that we can use to develop Go services more quickly.

Exoskeleton provides a type-safe routing layer built on top of Gin and bakes in OTEL, so it's easy for us to take our services from local development to production-ready.

Algolyzer makes it easier to update Algolia indexes outside of the request span, which keeps latency low while still making sure relevant objects are easy to search for in our UIs.
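
The general pattern of moving index updates out of the request path can be sketched as below. This is a minimal illustration and not algolyzer's actual API: `IndexUpdate`, `BackgroundIndexer`, and the `push` callback are hypothetical names, and a real `push` would call the search provider's client.

```go
package main

import (
	"fmt"
	"sync"
)

// IndexUpdate is a hypothetical record describing one object to (re)index.
type IndexUpdate struct {
	ObjectID string
	Body     string
}

// BackgroundIndexer hands updates to a worker goroutine so that the
// request handler never waits on the search provider.
type BackgroundIndexer struct {
	updates chan IndexUpdate
	wg      sync.WaitGroup
}

// NewBackgroundIndexer starts one worker that calls push for every update.
// push is an assumption standing in for the real index client call.
func NewBackgroundIndexer(buffer int, push func(IndexUpdate)) *BackgroundIndexer {
	b := &BackgroundIndexer{updates: make(chan IndexUpdate, buffer)}
	b.wg.Add(1)
	go func() {
		defer b.wg.Done()
		for u := range b.updates {
			push(u)
		}
	}()
	return b
}

// Submit queues an update without blocking the caller. It returns false
// when the buffer is full so the caller can decide how to handle the drop.
func (b *BackgroundIndexer) Submit(u IndexUpdate) bool {
	select {
	case b.updates <- u:
		return true
	default:
		return false
	}
}

// Shutdown stops accepting updates and waits for the worker to drain.
func (b *BackgroundIndexer) Shutdown() {
	close(b.updates)
	b.wg.Wait()
}

func main() {
	indexer := NewBackgroundIndexer(16, func(u IndexUpdate) {
		fmt.Println("indexing", u.ObjectID)
	})
	// In a request handler this call returns immediately.
	indexer.Submit(IndexUpdate{ObjectID: "project-1", Body: "example"})
	indexer.Shutdown()
}
```

Returning `false` on a full buffer keeps the request fast at the cost of possibly dropping an update, so a real service would need a retry or reconciliation strategy.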

Additionally, I have made a number of improvements to our core infrastructure:

- Improving monitoring of our application to make major upgrades less scary
- Upgrading from Rails 5 to Rails 6
- Upgrading from Ruby 2 to Ruby 3
- Deploying and performing regular maintenance on our CockroachDB cluster
- Diagnosing anycast routing issues with our CRDB deployment that led to unexpectedly high latency, which resulted in changing the network from equal-path routing to prefer-local.

These changes let us keep the lights on while experimenting cheaply with the common infra needed for smaller services.


* Goal: Build the foundation - A market-leading end-to-end user experience


As we started to deliver LBaaS, Infratographer had an entirely
different opinion on how to manage users and resource ownership, so I
created a GraphQL service to bridge the gap between Infratographer
concepts and Metal concepts. That way the product feels familiar to
customers when they use it. The Metal API also emits events that can
be subscribed to over NATS to get updates for things such as
organization and project membership changes.

Accomplishing this meant close collaboration with the identity team to
establish the interfaces and decide who is responsible for which
parts. Load balancers can now be provisioned and act as if they belong
to a project, even though the system of record lies completely outside
of the Metal API.

VMC-E exposed ordering issues in the VLAN assignment portion of our
networking stack. I worked with my teammates and SWNet to improve the
situation. I designed and implemented a queuing solution that lets us
run order-dependent asynchronous tasks on queues with a single
consumer. We've already gotten feedback from VMC-E and other customers
that the correctness issues with VLAN assignment have been solved, and
we don't need to wait for a complete networking overhaul from Orca to
fix it. There are more opportunities to use this solution for other
parts of our networking stack that suffer from ordering issues.
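
The single-consumer idea can be sketched roughly as follows. This is a minimal in-process illustration in Go, not the solution we shipped: `Task`, `OrderedQueue`, and `RunInOrder` are hypothetical names, and a buffered channel stands in for whatever queue backend is actually used.

```go
package main

import "fmt"

// Task is a hypothetical order-dependent unit of work, such as a VLAN
// assign or unassign for a given port.
type Task struct {
	Key string // what the task applies to (assumed to be something like a port ID)
	Op  string // operation name, e.g. "assign" or "unassign"
}

// OrderedQueue processes tasks strictly in submission order because
// exactly one goroutine ever consumes from the channel.
type OrderedQueue struct {
	tasks chan Task
	done  chan struct{}
}

// NewOrderedQueue starts the single consumer, which calls handle for
// each task in the order it was enqueued.
func NewOrderedQueue(buffer int, handle func(Task)) *OrderedQueue {
	q := &OrderedQueue{
		tasks: make(chan Task, buffer),
		done:  make(chan struct{}),
	}
	go func() {
		defer close(q.done)
		for t := range q.tasks {
			handle(t)
		}
	}()
	return q
}

// Enqueue adds a task to the back of the queue.
func (q *OrderedQueue) Enqueue(t Task) { q.tasks <- t }

// Close stops accepting tasks and waits for the consumer to finish.
func (q *OrderedQueue) Close() {
	close(q.tasks)
	<-q.done
}

// RunInOrder pushes tasks through a queue and returns the order in
// which the consumer applied them.
func RunInOrder(tasks []Task) []string {
	var applied []string
	q := NewOrderedQueue(8, func(t Task) {
		applied = append(applied, fmt.Sprintf("%s:%s", t.Key, t.Op))
	})
	for _, t := range tasks {
		q.Enqueue(t)
	}
	q.Close()
	return applied
}

func main() {
	for _, line := range RunInOrder([]Task{
		{Key: "port-1", Op: "assign"},
		{Key: "port-1", Op: "unassign"},
		{Key: "port-1", Op: "assign"},
	}) {
		fmt.Println(line)
	}
}
```

With several consumers on the same queue, a quick assign followed by an unassign could be applied in either order. A single consumer trades throughput on that queue for a guaranteed order.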

For federated SSO, I helped keep communication between Platform
Identity, Nautilus, and Portals flowing smoothly by documenting
exactly what was needed to get us in a position to onboard our first
set of customers using SSO. I used my knowledge of OAuth2 and OpenID
Connect to break down the integration points in a document shared
between these teams, so it was clear what we needed to do. This made
it easier to commit and deliver within the timeframe we set.

not networking specific
nano metal
audit logging


* Goal: DS FunctionalPriorities - Build, socialize, and execute on plan to improve engineering experience

Throughout this year, I've been circulating ideas in writing and in
shared forums more often. Within the Nautilus team I gave 8 tech talks
to share ideas and information with the team and to solicit
feedback. I also wrote documents for collaborating with other teams,
mainly for LBaaS (specifically around how it integrates with the
EMAPI) and federated SSO.

- CRDB performance troubleshooting

  I discussed how I determined that anycast routing was not properly
  weighted, and my methodology for designing tests to diagnose the issue.

- Monitoring strategy for the API Rails/Ruby Upgrades

  Here I discussed how we intended to do these upgrades in a way that
  built confidence on top of the confidence we got from our test
  suites by measuring indicators of performance.

- Recorded deployment and monitoring of API

  As we added more people to the team, recording this just made it
  easier to have something we could point to for an API deployment. We
  also have this process documented in the repo.

- Deep diving caching issues from #_incent-1564

  We ran into a very hard-to-reproduce error where different users
  accessing the same organization were returned the same list of
  organizations/projects regardless of access. Although the API
  prevented actual reads of objects that the user didn't have proper
  access to, serving the wrong set of IDs produced unexpected
  behavior in the Portal. It took a long time to diagnose this, and
  then I discussed the results with the team.

- API monitoring by thinking about what we actually deliver

  Related to the Rails upgrades, being able to accurately measure the
  health of the monolith requires periodically re-evaluating whether
  we're measuring what matters.

- API auth discussion on using identity-api

  Discussion of the potential uses for identity-api in a
  service-to-service context, which the API relies on quite frequently
  as we build functionality outside of the API.

- Static analysis on Ruby

  With a dynamically typed language, runtime exceptions are no fun,
  but some static analysis goes a long way. In this talk I explained
  how it works at the AST level and how we can use it to enforce
  conventions that we have adopted in the API. As an action item, I
  started enabling useful "cops" to prevent common logic errors in
  Ruby.

- Session Scheduler

  Here I discussed the problem and the solution that we implemented
  to prevent VLANs from being in inconsistent states when assigned and
  unassigned quickly. The solution we delivered was generic, and
  solved the problem simply, and this talk was to shine some light on
  the new tool that the team has to use for ordering problems.


* Twilio account


always assisting the team
help new joinees to ramp up fast
participate in interviews
easy to work with across teams
clear communication
able to navigate
relations with delivery
not only engineering - product, devrel