Some new design docs

2024-04-20 10:13:39 -04:00
parent 12cf3967ee
commit 0e9dbebd6a
4 changed files with 225 additions and 0 deletions


@@ -0,0 +1,34 @@
#+TITLE: How do Interconnections work for dummiez
#+Author: Adam Mohammed
* User Flows
A user starts by making an API call to ~POST
/projects/:id/connections~. In this request they choose either a
dedicated port or a shared port. A dedicated port guarantees the
full bandwidth, but is more costly.
A user can also select whether the connection at Metal is the A-side
or the Z-side. If it's the A-side, Metal does the billing; if it's
the Z-side, Fabric takes care of the billing.
A-side/Z-side is telecom terminology, where the A-side is the
requester and the Z-side is the destination. So in the case of
connecting to a CSP, we're concerned with the A-side from Metal because
that means we're making use of Fabric as a service provider to give us
a connection to the CSP within the metro.
If we were making Z-side connections, we'd be granting someone else
in the DC access to our networks.
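A minimal sketch of the choices a user makes in this flow. The class and field names here are hypothetical and are not the real request schema; only the endpoint path and the billing rule come from the text above.

```python
from dataclasses import dataclass
from enum import Enum


class PortType(Enum):
    DEDICATED = "dedicated"  # full bandwidth, more costly
    SHARED = "shared"


class Side(Enum):
    A = "a"  # Metal is the requester
    Z = "z"  # Metal is the destination


@dataclass(frozen=True)
class ConnectionRequest:
    """Hypothetical shape of a connection request; not the real API schema."""
    project_id: str
    port_type: PortType
    side: Side

    def path(self) -> str:
        # Endpoint named in the doc: POST /projects/:id/connections
        return f"/projects/{self.project_id}/connections"

    def billed_by(self) -> str:
        # A-side connections are billed by Metal, Z-side by Fabric.
        return "Metal" if self.side is Side.A else "Fabric"


req = ConnectionRequest("my-project", PortType.SHARED, Side.A)
print("POST", req.path(), "billed by", req.billed_by())
```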
* Under the Hood
When the request comes in, we create:
- An interconnection object to represent the request
- Virtual Ports
- Virtual circuits associated with each port
- A service token for each
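The objects above could be modelled roughly as follows. This is only a sketch: the class names are hypothetical, and it assumes one virtual circuit per port and one service token per virtual circuit, which the list above does not spell out.

```python
import uuid
from dataclasses import dataclass, field
from typing import List


@dataclass
class ServiceToken:
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class VirtualCircuit:
    port_name: str
    # Assumption: each virtual circuit gets its own service token.
    token: ServiceToken = field(default_factory=ServiceToken)


@dataclass
class VirtualPort:
    name: str
    circuits: List[VirtualCircuit] = field(default_factory=list)


@dataclass
class Interconnection:
    project_id: str
    ports: List[VirtualPort] = field(default_factory=list)


def create_interconnection(project_id: str, port_names: List[str]) -> Interconnection:
    """Create the interconnection plus its ports, circuits and tokens."""
    conn = Interconnection(project_id)
    for name in port_names:
        port = VirtualPort(name)
        # Assumption: a single virtual circuit per port.
        port.circuits.append(VirtualCircuit(port_name=name))
        conn.ports.append(port)
    return conn


conn = create_interconnection("my-project", ["primary", "secondary"])
print([p.name for p in conn.ports])
```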


@@ -0,0 +1,43 @@
#+TITLE: Metal Event Entrypoint
#+AUTHOR: Adam Mohammed
* Problem
We would like other parts of the company to be able to notify Metal about
changes to infrastructure that cross out of Metal's business
domain. The concrete example here is Fabric telling Metal about
the state of interconnections.
* Solution
Metal's API team would like to expose a message bus to receive events
from the rest of the organization.
Metal's API currently sits on top of a RabbitMQ cluster, and we'd like
to leverage that infrastructure. There are a few problems we
need to solve before we can expose the RabbitMQ cluster.
1. RabbitMQ is currently only available within the cluster.
2. Fabric (and other interested parties) exist outside of the Metal
firewalls that gate traffic into the K8s clusters.
3. We need to limit the blast radius if something were to happen on this
shared infrastructure. We don't want the main operations on Rabbit that
Metal relies on to be impacted.
For 1, the answer is simple: expose a path under
`api.core-a.ny5.metalkube.net` that points to the rabbit service.
For 2, we leverage the fact that CF and Akamai are whitelisted to the
Metal K8s clusters for the domains `api.packet.net` and
`api.equinix.com/metal/v1`. This covers getting the cluster exposed to
the internet.
For 3, we can make use of RabbitMQ [[https://www.rabbitmq.com/vhosts.html][Virtual Hosts]] to isolate the
/foreign/ traffic to that host. This lets us set up separate
authentication and authorization policies (such as using Identity-API
via the [[https://www.rabbitmq.com/oauth2.html][OAuth]] plugin), which are absolutely
necessary since the core infrastructure is now on the internet. We are
also able to limit resource usage by vhost to prevent attackers from
affecting the core API workload.
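As a rough illustration of point 3, the sketch below builds (but does not send) the RabbitMQ management HTTP API requests that create a separate vhost and cap its resource usage. The management URL, the vhost name =foreign= and the limit values are placeholders, not decisions from this design, and credentials are left out.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Hypothetical value: the design does not name a management URL.
MGMT_URL = "http://localhost:15672"


def _put(path: str, body: dict) -> Request:
    """Build (but do not send) a JSON PUT against the management API."""
    return Request(
        f"{MGMT_URL}{path}",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def create_vhost(name: str) -> Request:
    # Vhost names are URL-encoded in the path, so "/" becomes "%2F".
    return _put(f"/api/vhosts/{quote(name, safe='')}", {})


def limit_vhost(name: str, limit: str, value: int) -> Request:
    # limit is "max-connections" or "max-queues".
    return _put(f"/api/vhost-limits/{quote(name, safe='')}/{limit}", {"value": value})


for req in (
    create_vhost("foreign"),
    limit_vhost("foreign", "max-connections", 100),
    limit_vhost("foreign", "max-queues", 20),
):
    print(req.get_method(), req.full_url, req.data.decode())
```

Keeping the limits on the vhost rather than on individual users means any credential issued for the foreign traffic is bounded by the same ceiling.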


@@ -0,0 +1,26 @@
Ok, so I met with Sangeetha and Bob from MCNS and I think I have an
idea of what needs to happen for our integrated network for us to
build things like MCNS and VMaaS.
First, you need just two things to integrate at the boundaries of
Metal and Fabric: a VNI and a USE port. Metal already has a service
which allocates VNIs, so I was wondering why Jarrod might not have
told MCNS about it. Since VNIs and USE ports are both shared resources
that we want a single bookkeeper over, there's only one logical point
to do that today, and that's the Metal API.
In a perfect world though, the Metal API doesn't orchestrate our
internal network state so specifically, at least I think. It'd be nice
if we could rip out the USE port management from the API and push that
down a layer, away from the customer-facing API. The end result is we
have internal services (Metal API, MCNS, VMaaS) all building on our
integrated network, but we still just have a single source of truth
for allocating the shared resources.
Sangeetha got a slice of VNIs and (eventually will have) USE ports for
them to build the initial MCNS product, but eventually we'll want to
bring those VNIs and ports under control of a single service, so we
don't have multiple bookkeeping spots for the same resources.
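To make the "single bookkeeper" idea concrete, here is a toy sketch of one allocator that every internal service would ask for VNIs. The class name, the range and the owner labels are made up for illustration; this is not how the existing Metal service is implemented.

```python
from typing import Dict, List, Optional


class VniAllocator:
    """Single bookkeeper for VNIs; each VNI has at most one owner."""

    def __init__(self, first: int, last: int) -> None:
        # Stored in reverse so pop() hands out the lowest free VNI first.
        self._free: List[int] = list(range(last, first - 1, -1))
        self._owner: Dict[int, str] = {}

    def allocate(self, owner: str) -> int:
        if not self._free:
            raise RuntimeError("VNI pool exhausted")
        vni = self._free.pop()
        self._owner[vni] = owner
        return vni

    def release(self, vni: int) -> None:
        # Releasing an unallocated VNI is a no-op, so it can't be freed twice.
        if self._owner.pop(vni, None) is not None:
            self._free.append(vni)

    def owner_of(self, vni: int) -> Optional[str]:
        return self._owner.get(vni)


pool = VniAllocator(100, 102)
a = pool.allocate("MCNS")
b = pool.allocate("VMaaS")
print(a, pool.owner_of(a), b, pool.owner_of(b))
```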
Jarrod's initial plan was to just build that into the Metal API, but
if we can,

design/nimf-m2.org Normal file

@@ -0,0 +1,122 @@
#+TITLE: NIMF Milestone 2
#+SUBTITLE: Authentication and Authorization
#+AUTHOR: Adam Mohammed
* Overview
This document discusses the authentication and authorization between Metal
and Fabric, focussed on the customer's experience. We want to deliver a
seamless user experience that allows users to set up connections
directly from Metal to any of the Cloud Service Providers (CSPs) they
leverage.
* Authentication
** Metal
There are a number of ways to authenticate to Metal, but ultimately it
comes down to the mode that the customer wishes to use to access their
resources. The main methods are as a user signed in to the web
portal, or directly against the API.
Portal access uses an OAuth flow which lets the browser
obtain a JWT that can be used to authenticate against the Metal
APIs. It's important to understand that the Portal doesn't make calls
as itself on behalf of the user; the user makes the calls themselves
by way of their browser.
Direct API access is done through static API keys issued either to a
user or to a project. Integrations through tooling and
language-specific libraries are also provided.
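A small sketch of how the two modes differ on the wire. The header names are assumptions: a bearer token for the JWT and an =X-Auth-Token= header for static API keys, which is our understanding of the public Metal API but is not stated in this document.

```python
from typing import Dict, Optional


def auth_headers(*, jwt: Optional[str] = None, api_key: Optional[str] = None) -> Dict[str, str]:
    """Return the request headers for exactly one authentication mode."""
    if (jwt is None) == (api_key is None):
        raise ValueError("provide exactly one of jwt or api_key")
    if jwt is not None:
        # Portal flow: the browser holds a JWT obtained through OAuth.
        return {"Authorization": f"Bearer {jwt}"}
    # Direct API flow: static key issued to a user or a project.
    # Assumed header name; check the API reference before relying on it.
    return {"X-Auth-Token": api_key}


print(auth_headers(jwt="example.jwt.value"))
print(auth_headers(api_key="example-key"))
```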
** Fabric
* Authorization
** Metal
** Fabric
Option 4 - Asynchronous Events
Highlights:
- Fabric no longer makes direct calls to Metal, it only announces that the connection is ready
- Messages are authenticated with JWT
- Metal consumes the events and modifies the state of resources as a controller
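A sketch of what consuming such an event could look like. It assumes the event is carried as the JWT's claims and signed with a shared HS256 secret, purely to keep the example standard-library only; a real deployment would more likely verify tokens against Identity-API with asymmetric keys. The claim names =connection_id= and =state= are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Dict


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def _b64url_decode(part: str) -> bytes:
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def sign(claims: dict, secret: bytes) -> str:
    """Producer side (Fabric in this option): build an HS256 JWT."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def verify(token: str, secret: bytes) -> dict:
    """Return the claims, or raise ValueError if the token is not acceptable."""
    header, payload, sig = token.split(".")
    if json.loads(_b64url_decode(header)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload))
    if claims.get("exp", 0) < time.time():
        raise ValueError("token expired or missing exp")
    return claims


def consume(token: str, secret: bytes, connections: Dict[str, str]) -> None:
    """Controller step: apply an authenticated event to Metal's view of state."""
    claims = verify(token, secret)
    connections[claims["connection_id"]] = claims["state"]


secret = b"demo-secret"
state: Dict[str, str] = {}
event = {"connection_id": "conn-1", "state": "ready", "exp": time.time() + 60}
consume(sign(event, secret), secret, state)
print(state)
```

Messages that fail verification raise before any state is touched, so a forged announcement cannot move a connection to ready.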
Option 5 - Callback/Webhook
Highlights:
- Similar to Option 4, though the infrastructure is provided by Metal
- Fabric instead emits a similarly shaped event that says the connection's state has changed
- It's Metal's responsibility to consume that and respond accordingly
Changes Required:
- Fabric sends updates to this webhook URL
- Metal consumes messages on that URL and handles them accordingly
- Metal provides a way to see current and desired state
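The webhook option could be handled along these lines. This is a sketch with a hypothetical payload shape (=connection_id=, =state=) and an in-memory store; it shows only how a delivery updates the current state while the desired state stays visible next to it.

```python
import json
from dataclasses import dataclass
from typing import Dict


@dataclass
class ConnectionState:
    desired: str
    current: str = "pending"

    @property
    def in_sync(self) -> bool:
        return self.current == self.desired


def handle_webhook(body: bytes, store: Dict[str, ConnectionState]) -> int:
    """Process one webhook delivery and return an HTTP-style status code."""
    try:
        event = json.loads(body)
        conn_id, state = event["connection_id"], event["state"]
    except (ValueError, KeyError, TypeError):
        return 400  # malformed payload
    if conn_id not in store:
        return 404  # Metal does not know this connection
    store[conn_id].current = state
    return 204


store = {"conn-1": ConnectionState(desired="active")}
status = handle_webhook(b'{"connection_id": "conn-1", "state": "active"}', store)
print(status, store["conn-1"], store["conn-1"].in_sync)
```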
Advantages
Disadvantages
* Documents
** Equinix Interconnections
Metal provided interconnections early on to give customers access to the
network capabilities provided by Fabric and Network Edge.
There are currently two basic types of interconnections: a dedicated
interconnection and a shared one. The dedicated version, as it sounds,
uses dedicated port infrastructure that the customer owns. This is
often cost prohibitive, so interconnections over Equinix-owned shared
infrastructure fill that space.
The dedicated interconnection types have simpler logic in
the API than shared interconnections. A dedicated
interconnection gives you a layer 2 connection and that's all; the
rest is on the customer to manage.
Shared connections connect Metal to other networks either through
layer 2 or layer 3.
Layer 2 interconnections are created using either the
=VlanFabricVCCreateInput= or the =SharedPortVCVlanCreateInput=. The
former provides the interconnection using service tokens, which
Metal uses to poll the status of the interconnections. These allowed us to
provide customers with connectivity, but with a poor experience, because
if you look at the connection in Fabric, it's not clear how it relates to
Metal resources.
The =SharedPortVCVlanCreateInput= allows Fabric access to the related
network resources on the Metal side, which means managing these network
resources on Fabric is a little bit easier. This type of
interconnection did some groundwork to bring the physical and logical
networks of Metal and Fabric closer together. That is mostly
invisible to the customer, but it enables us to build products on our
network infrastructure that weren't previously possible.
Currently, both methods of creating these interconnections exist,
until we can deprecate the =VlanFabricVCCreateInput=. The
=SharedPortVCVlanCreateInput= type is currently only capable of layer 2
interconnections to Amazon Web Services. This new input type allows
Fabric to start supporting more layer 2 connectivity without requiring
any work on the Metal side. Once it reaches parity with the connection
destinations of =VlanFabricVCCreateInput=, we can deprecate the older
input type.
Layer 3 interconnections are created by passing the
=VrfFabricVCCreateInput= to the interconnections endpoint. These
isolate customer traffic by routing table instead of through VLAN
tags.
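As a quick summary of the three input types described above, the sketch below maps each one to its layer and isolation mechanism. The input type names come from this document; the enum, the tables and the =describe= helper are illustrative only and do not reflect how the API dispatches internally.

```python
from enum import Enum


class InputType(Enum):
    VLAN_FABRIC = "VlanFabricVCCreateInput"           # layer 2, service tokens
    SHARED_PORT_VLAN = "SharedPortVCVlanCreateInput"  # layer 2, shared port
    VRF_FABRIC = "VrfFabricVCCreateInput"             # layer 3


LAYER = {
    InputType.VLAN_FABRIC: 2,
    InputType.SHARED_PORT_VLAN: 2,
    InputType.VRF_FABRIC: 3,
}

ISOLATION = {
    InputType.VLAN_FABRIC: "VLAN tags",
    InputType.SHARED_PORT_VLAN: "VLAN tags",
    InputType.VRF_FABRIC: "routing table",
}


def describe(type_name: str) -> str:
    """Summarise an input type by name; raises ValueError if it is unknown."""
    kind = InputType(type_name)
    return f"{kind.value}: layer {LAYER[kind]}, isolated by {ISOLATION[kind]}"


for kind in InputType:
    print(describe(kind.value))
```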