Some new design docs

2024-04-20 10:13:39 -04:00
parent 12cf3967ee
commit 0e9dbebd6a
4 changed files with 225 additions and 0 deletions
--- a/design/interconnection-models.org
+++ b/design/interconnection-models.org
@@ -0,0 +1,34 @@
 #+TITLE: How do Interconnections work for dummiez
 #+Author: Adam Mohammed
 *  User Flows
 A user starts by making an API call to ~POST
 /projects/:id/connections~. In this request they choose either a
 full dedicated port, on which they get the full bandwidth, or a
 shared port. The dedicated port guarantees the full bandwidth, but
 is more costly.
 A user can also select whether the connection at Metal is
 the A-side or the Z-side. If it's the A-side, Metal does the
 billing; if it's the Z-side, Fabric takes care of the billing.
 A-side/Z-side is telecom terminology, where the A-side is the
 requester and the Z-side is the destination. So in the case of
 connecting to a CSP, we're concerned with the A-side from Metal because
 that means we're using Fabric as a service provider to give us a
 connection to the CSP within the metro.
 If we were making Z-side connections, we'd be granting someone else
 in the DC access to our networks.
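The choices described above can be sketched as a small request model. This is only an illustration: the field names =port_kind= and =metal_side= and their string values are hypothetical and are not taken from the actual API schema.

```python
from dataclasses import dataclass

VALID_PORT_KINDS = {"dedicated", "shared"}
VALID_SIDES = {"a", "z"}


@dataclass(frozen=True)
class ConnectionRequest:
    project_id: str
    port_kind: str   # hypothetical field: "dedicated" or "shared"
    metal_side: str  # hypothetical field: "a" or "z"

    def __post_init__(self):
        # Reject values outside the two choices the design describes.
        if self.port_kind not in VALID_PORT_KINDS:
            raise ValueError(f"unknown port kind: {self.port_kind!r}")
        if self.metal_side not in VALID_SIDES:
            raise ValueError(f"unknown side: {self.metal_side!r}")


def request_path(request: ConnectionRequest) -> str:
    """Path of the POST call for this project."""
    return f"/projects/{request.project_id}/connections"


def billing_party(request: ConnectionRequest) -> str:
    """Metal bills when it is the A-side; Fabric bills when Metal is the Z-side."""
    return "Metal" if request.metal_side == "a" else "Fabric"
```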
 * Under the Hood
 When the request comes in, we create:
 - An interconnection object to represent the request
 - Virtual Ports
 - Virtual circuits associated with each port
 - A service token for each
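As a rough sketch, the objects listed above could relate to each other as follows. The class names, the port and circuit counts, and the choice of one service token per virtual circuit are all assumptions made for illustration; the list above does not say what "each" refers to.

```python
import uuid
from dataclasses import dataclass, field
from typing import List


@dataclass
class ServiceToken:
    id: str


@dataclass
class VirtualCircuit:
    id: str
    service_token: ServiceToken  # assumption: one token per circuit


@dataclass
class VirtualPort:
    id: str
    circuits: List[VirtualCircuit] = field(default_factory=list)


@dataclass
class Interconnection:
    id: str
    project_id: str
    ports: List[VirtualPort] = field(default_factory=list)


def _new_id() -> str:
    return str(uuid.uuid4())


def create_interconnection(project_id: str, port_count: int,
                           circuits_per_port: int) -> Interconnection:
    """Build the interconnection, its ports, their circuits, and the tokens."""
    interconnection = Interconnection(id=_new_id(), project_id=project_id)
    for _ in range(port_count):
        port = VirtualPort(id=_new_id())
        for _ in range(circuits_per_port):
            token = ServiceToken(id=_new_id())
            port.circuits.append(VirtualCircuit(id=_new_id(), service_token=token))
        interconnection.ports.append(port)
    return interconnection
```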
--- a/design/metal-fabric-message-bus.org
+++ b/design/metal-fabric-message-bus.org
@@ -0,0 +1,43 @@
 #+TITLE: Metal Event Entrypoint
 #+AUTHOR: Adam Mohammed
 * Problem
 We would like other parts of the company to be able to notify Metal about
 changes to infrastructure that cross out of Metal's business
 domain. The concrete example here is Fabric telling Metal about
 the state of interconnections.
 * Solution
 Metal's API team would like to expose a message bus to receive events
 from the rest of the organization.
 Metal's API currently sits on top of a RabbitMQ cluster, and we'd like
 to leverage that infrastructure. There are a couple of problems we
 need to solve before we can expose the RabbitMQ cluster.
 1. RabbitMQ is currently only available within the cluster.
 2. Fabric (and other interested parties) exist outside of Metal
   firewalls that allow traffic into the K8s clusters.
 3. We need to limit the blast radius if something were to happen on this shared
   infrastructure; we don't want the main operations on Rabbit that Metal
   relies on to be impacted.
 For 1, the answer is simple: expose a path under
 ~api.core-a.ny5.metalkube.net~ that points to the rabbit service.
 For 2, we leverage the fact that CF and Akamai are whitelisted to the
 Metal K8s clusters for the domains ~api.packet.net~ and
 ~api.equinix.com/metal/v1~. This covers getting the cluster exposed to
 the internet.
 For 3, we can make use of RabbitMQ [[https://www.rabbitmq.com/vhosts.html][Virtual Hosts]] to isolate the
 /foreign/ traffic to that host. This lets us set up separate
 authentication and authorization policies (such as using Identity-API
 via the [[https://www.rabbitmq.com/oauth2.html][OAuth]] plugin), which are absolutely
 necessary since the core infrastructure is now on the internet. We are
 also able to limit resource usage per vhost to prevent attackers from
 affecting the core API workload.
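To make the vhost isolation concrete, here is a minimal sketch of how an outside publisher would address a dedicated vhost and what a per-vhost limits definition might look like. The vhost name, host name, and limit values are hypothetical. The limit keys =max-connections= and =max-queues= are the per-vhost limits RabbitMQ supports; the JSON could be applied with something like ~rabbitmqctl set_vhost_limits~.

```python
import json
from urllib.parse import quote

FOREIGN_VHOST = "fabric-events"  # hypothetical vhost name
FOREIGN_VHOST_LIMITS = {"max-connections": 64, "max-queues": 16}  # example values


def amqp_uri(host: str, vhost: str, user: str, password: str,
             port: int = 5671, scheme: str = "amqps") -> str:
    """Build an AMQP URI; the vhost and credentials are percent-encoded."""
    return (
        f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(vhost, safe='')}"
    )


def limits_definition() -> str:
    """JSON body describing the limits for the foreign vhost."""
    return json.dumps(FOREIGN_VHOST_LIMITS)
```

Because clients outside Metal only ever receive a URI for the foreign vhost, their queues and connections stay separate from the vhost that the core API uses.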
--- a/design/multi-cloud-networking.org
+++ b/design/multi-cloud-networking.org
@@ -0,0 +1,26 @@
 Ok, so I met with Sangeetha and Bob from MCNS and I think I have an
 idea of what needs to happen for our integrated network for us to
 build things like MCNS and VMaaS.
 First, you just need two things to be able to integrate at the
 boundaries of Metal and Fabric: a VNI and a USE
 port.  Metal already has a service which allocates VNIs, so I was
 wondering why Jarrod might not have told MCNS about it. Since VNIs and
 USE ports are both shared resources that we want a single bookkeeper
 over, there's only one logical point to do that today, and that's the
 Metal API.
 In a perfect world though, the Metal API doesn't orchestrate our
 internal network state so specifically, at least I think. It'd be nice
 if we could rip out the USE port management from the API and push that
 down a layer away from the customer-facing API. The end result is that we
 have internal services (Metal API, MCNS, VMaaS) all building on our
 integrated network, but we still have a single source of truth
 for allocating the shared resources.
 Sangeetha got a slice of VNIs and (eventually will have) USE ports for
 them to build the initial MCNS product, but eventually we'll want to
 bring those VNIs and ports under control of a single service, so we
 don't have multiple bookkeeping spots for the same resources.
 Jarrod's initial plan was to just build that in to the Metal API, but
 if we can,
--- a/design/nimf-m2.org
+++ b/design/nimf-m2.org
@@ -0,0 +1,122 @@
 #+TITLE: NIMF Milestone 2
 #+SUBTITLE: Authentication and Authorization
 #+AUTHOR: Adam Mohammed
 * Overview
 This document discusses the authentication and authorization between Metal
 and Fabric, focused on the customer's experience. We want to deliver a
 seamless user experience that allows users to set up connections
 directly from Metal to any of the Cloud Service Providers (CSPs) they
 leverage.
 * Authentication
 ** Metal
 There are a number of ways to authenticate to Metal, but ultimately it
 comes down to the mode that the customer wishes to use to access their
 resources. The main methods are signing in as a user through the web
 portal and calling the API directly.
 Portal access uses an OAuth flow that lets the browser
 obtain a JWT that can be used to authenticate against the Metal
 APIs. It's important to understand that the Portal doesn't make calls
 as itself on behalf of the user; the user is making
 the calls by way of their browser.
 Direct API access is done through static API keys issued either to a
 user or to a project. Integrations through tooling and
 language-specific libraries are also provided.
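The two access modes above differ mainly in which credential goes on the request. A minimal sketch follows, assuming the JWT is sent as a bearer token and the static API key is sent in an =X-Auth-Token= header; neither header name is stated in this document, so treat both as assumptions.

```python
from typing import Dict, Optional


def auth_headers(jwt: Optional[str] = None,
                 api_key: Optional[str] = None) -> Dict[str, str]:
    """Return request headers for exactly one of the two credential types."""
    if (jwt is None) == (api_key is None):
        raise ValueError("provide exactly one of jwt or api_key")
    if jwt is not None:
        # Portal flow: the browser holds a JWT obtained through OAuth.
        return {"Authorization": f"Bearer {jwt}"}
    # Direct API flow: static key issued to a user or a project.
    return {"X-Auth-Token": api_key}
```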
 ** Fabric
 * Authorization
 ** Metal
 ** Fabric
 Option 4 - Asynchronous Events
 Highlights:
 - Fabric no longer makes direct calls to Metal, it only announces that the connection is ready
 - Messages are authenticated with JWT
 - Metal consumes the events and modifies the state of resources as a controller
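A minimal sketch of the consuming side of Option 4. It assumes the event travels as the claims of an HS256-signed JWT so that it runs with only the standard library. The document does not say which algorithm or key distribution is used, and a real deployment would more likely use asymmetric keys and a maintained JWT library. The claim names =connection_id= and =state= are hypothetical, and expiry is not checked here.

```python
import base64
import hashlib
import hmac
import json


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _signature(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Producer side (Fabric in this sketch): wrap the event in a signed token."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    signature = _b64url_encode(_signature(f"{header}.{payload}", secret))
    return f"{header}.{payload}.{signature}"


def verify_hs256(token: str, secret: bytes) -> dict:
    """Consumer side: return the claims only if the signature checks out."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _signature(f"{header_b64}.{payload_b64}", secret)
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    return json.loads(_b64url_decode(payload_b64))


def handle_event(token: str, secret: bytes, connections: dict) -> dict:
    """Controller step: apply the announced state to Metal's view of the connection."""
    claims = verify_hs256(token, secret)
    connections[claims["connection_id"]] = claims["state"]
    return connections
```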
 Option 5 - Callback/Webhook
 Highlights:
 - Similar to Option 4, though the infrastructure is provided by Metal
 - Fabric instead emits a similarly shaped event that says a connection's state has changed
 - It's Metal's responsibility to consume that and respond accordingly
 Changes Required:
 - Fabric sends updates to this webhook URL
 - Metal consumes messages on that URL and handles them accordingly
 - Metal provides a way to see current and desired state
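The webhook handling and the current/desired state view could look roughly like this. The event keys (=connection_id=, =state=) and the state names are hypothetical placeholders, not an agreed event format.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ConnectionStatus:
    desired: str              # what the customer asked Metal for
    current: str = "pending"  # what Fabric last reported

    @property
    def in_sync(self) -> bool:
        return self.desired == self.current


def handle_webhook(store: Dict[str, ConnectionStatus], event: dict) -> ConnectionStatus:
    """Record the state reported by Fabric for a known connection."""
    connection_id = event["connection_id"]
    if connection_id not in store:
        raise KeyError(f"unknown connection: {connection_id}")
    status = store[connection_id]
    status.current = event["state"]
    return status
```

Keeping the desired and current states side by side lets Metal show whether a connection has converged without calling Fabric.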
 Advantages
 Disadvantages
 * Documents
 ** Equinix Interconnections
 Metal provided interconnections early on to give customers access to the
 network capabilities provided by Fabric and Network Edge.
 There are currently two basic types of interconnections: a dedicated
 interconnection and a shared one. The dedicated version, as it sounds,
 uses dedicated port infrastructure that the customer owns. This is
 often cost-prohibitive, so interconnections over Equinix-owned shared
 infrastructure fill that space.
 The dedicated interconnection types have relatively simple logic in
 the API relative to shared interconnections. A dedicated
 interconnection gives you a layer 2 connection and that's all; the
 rest is on the customer to manage.
 Shared connections connect Metal to other networks either through
 layer 2 or layer 3.
 Layer 2 interconnections are created using either the
 =VlanFabricVCCreateInput= or the =SharedPortVCVlanCreateInput=. The
 former provides the interconnection using service tokens, used by
 Metal to poll the status of the interconnections. This allowed us to
 provide customers with connectivity, but with a poor experience: if
 you look at the connection in Fabric, it's not clear how it relates to
 Metal resources.
 The =SharedPortVCVlanCreateInput= allows Fabric access to the related
 network resources on the Metal side, which means managing these network
 resources on Fabric is a little bit easier. This type of
 interconnection did some groundwork to bring our physical and logical
 networks between Metal and Fabric closer together. That work is mostly
 invisible to the customer, but it enables us to build products on our
 network infrastructure that weren't previously possible.
 Currently, both methods of creating these interconnections exist,
 until we can deprecate the =VlanFabricVCCreateInput=. The
 =SharedPortVCVlanCreateInput= type is currently only capable of layer 2
 interconnections to Amazon Web Services. This new input type allows
 Fabric to start supporting more layer 2 connectivity without requiring
 any work on the Metal side. Once it reaches parity with the connection
 destinations of =VlanFabricVCCreateInput=, we can deprecate
 =VlanFabricVCCreateInput=.
 Layer 3 interconnections are created by passing the
 =VrfFabricVCCreateInput= to the interconnections endpoint. These
 isolate customer traffic by routing table instead of through VLAN
 tags.
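The rules above for picking an input type can be summarized in a short sketch. The input type names come from this document; the =layer= and =destination= parameters and the ="aws"= destination string are hypothetical.

```python
# Destinations the shared-port layer 2 input supports today (AWS only, per the text above).
SHARED_PORT_DESTINATIONS = {"aws"}


def pick_input_type(layer: int, destination: str = "") -> str:
    """Choose the create-input type for a shared interconnection."""
    if layer == 3:
        # Layer 3 isolates traffic by routing table (VRF).
        return "VrfFabricVCCreateInput"
    if layer == 2:
        if destination.lower() in SHARED_PORT_DESTINATIONS:
            return "SharedPortVCVlanCreateInput"
        # Everything else still goes through the service-token based input.
        return "VlanFabricVCCreateInput"
    raise ValueError(f"unsupported layer: {layer}")
```

As more destinations become available through =SharedPortVCVlanCreateInput=, only the destination set would need to grow, until the fallback branch is no longer needed.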