Bootstrapping trust in a capability model

There are two basic ways to start the chain of trust in a capability model: either the resource server is started with a set of root capabilities that govern all the resources, or ambient authority is used to provide the initial trust.

Let's take the IP example further. Some IPAM service is supposed to govern the RFC1918 space for Equinix. It provides an API for downstream services to request blocks of arbitrary size, so they can further allocate smaller blocks from those blocks.

I think the easiest way is to use ACLs for the initial set of capabilities; once the service is live, the majority of requests would use wrapped resources. Let's say this IPAM service allows creation of "root" ranges through a create range API. An operator could create the range for 10.0.0.0/8, and then create a wrapped resource to delegate to downstream services.

If MCN, Metal and Fabric all want to share this IP space, each service could request an IP range of a specific size. The operator could then create wrapped resources for larger ranges, one per business unit, and hand those to the operators of the MCN, Metal and Fabric services.

Once a dependent service has its wrapped resource, it can further divide the range if it has multiple services that want to allocate from distinct pools within that space, or those services can all share the capability as-is.

The dependent service could then call the IPAM service directly to make "assignments", which mark an IP as currently in use within the larger range.
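The delegation chain above can be sketched with Python's standard `ipaddress` module. This is only an illustration of how a root range is carved into per-unit blocks and then into smaller pools; the prefix lengths (/12 per business unit, /16 per pool) and the `can_assign` helper are assumptions, not part of the design.

```python
import ipaddress

# Root range created by the operator.
root = ipaddress.ip_network("10.0.0.0/8")

# The operator carves one block per business unit.
# The /12 size is an illustrative assumption.
unit_blocks = dict(zip(["mcn", "metal", "fabric"], root.subnets(new_prefix=12)))

# A business unit subdivides its block into distinct pools for its own services.
metal_pools = list(unit_blocks["metal"].subnets(new_prefix=16))[:2]


def can_assign(block, candidate):
    """An assignment is only valid if it falls inside the delegated block."""
    return ipaddress.ip_network(candidate).subnet_of(block)
```

For example, `can_assign(metal_pools[0], "10.16.4.7/32")` holds, while an address from the neighbouring pool does not pass the check.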

Eventually, we want to get away from this operator X does operation for operator Y, because it means that

Let's assume we made an IPAM service that has the following endpoints:

  • CREATE IP Range: Adds an entry to allow the IPAM service to govern the range. Returns a resource ID.
  • LIST IP Ranges: Lists all the ranges governed by the IPAM service.
  • GET IP Range: Shows details about the IP range, such as how much of the range is allocated. Can be accessed either by ACL or by capability.
  • DELETE IP Range: Removes an IP range from being governed by the IPAM service.
  • CREATE IP Range Request: Requests a capability which lets a service allocate from this IP range.
  • GET/LIST IP Range Request: Shows the status of a request.
  • PUT IP Range Request: Allows approving or denying the request.
  • DELETE IP Range Request: Removes an IP range request.
  • CREATE IP Assignment: Only accepts a wrapped resource. Marks an IP address or subnet as allocated.

Now consider how we get to the point of using capabilities. Initially, an operator needs to start the service by creating some IP ranges that the IPAM service is responsible for. This endpoint can use ACLs to check that the operator is authorized to create ranges, and then the service can begin accepting requests.

Next, some service, like the Metal Provisioner, needs to assign IPs to instances so they can talk to each other over the private network. Initially the provisioner doesn't have access to any IP ranges, so it sends a request for a /16. That request is then approved by an IPAM operator, and the provisioner receives a capability that allows manipulating assignments on that range.
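The request lifecycle described here (a request is created, an operator approves or denies it, and a capability is attached on approval) could be modelled roughly as below. All names (`RangeRequest`, `Status`, `decide`, `issue_capability`) are hypothetical and only illustrate the state transitions, not an actual API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    DENIED = "denied"


@dataclass
class RangeRequest:
    requester: str                     # authenticated service identity
    prefix_len: int                    # e.g. 16 for a /16
    status: Status = Status.PENDING
    capability: Optional[str] = None   # set only once the request is approved


def decide(req: RangeRequest, approve: bool,
           issue_capability: Callable[[RangeRequest], str]) -> RangeRequest:
    """Operator decision (the PUT IP Range Request step).

    Only pending requests can be decided; a capability is issued on approval.
    """
    if req.status is not Status.PENDING:
        raise ValueError("request already decided")
    if approve:
        req.status = Status.APPROVED
        req.capability = issue_capability(req)
    else:
        req.status = Status.DENIED
    return req
```

Passing `issue_capability` in as a callback keeps the approval step separate from however the wrapped resource is actually produced.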

The IAM operator portion could be removed

-----

IPAM Worked Example

Let's assume we have an IPAM system which governs 10.0.0.0/8 and other IP blocks. We have a service, such as LBaaS, which needs to assign unique private IPs to customer load balancer instances so that customers can route traffic to their metal instances.

The LB service needs to reach out to the IPAM service to pull an IP, and to do that, it must request it within a block represented by a wrapped resource. So how does the service initially obtain this wrapped resource?

On first startup, the LBaaS service knows it doesn't have the capability to assign IPs because it doesn't have a wrapped resource for the range. It reaches out to the IPAM service, authenticated as itself, and requests a /16. That request is authorized simply by the fact that the LB service has the correct audience to talk to the IPAM service.

The request is recorded, and approval is either granted by the IPAM operators or determined by business logic. Once approved, the wrapped resource for the requested range is issued to the LBaaS service, which stores it. Now, whenever an IP is needed, the service makes an assignment under that wrapped resource.

Internally, the IPAM service needs to record that a block is currently active and that the capability sent to the LB service references it. As an example, let's say 10.0.0.0/8 is represented by the root resource identifier `ntwkblk-a1b2c3`. When the LB service requests a /16, a new IP reservation resource, `ntwkipr-xyzxyz`, is created. Once the request is approved, a capability is created by calling WrapResource(ntwkipr-xyzxyz, [create_assignment, read_assignment, delete_assignment], {}), which produces a wrapped resource with ID `ntwkipr-u8e82i.qeoalf`. The IPAM service then sends this back to the LB service.

When the LB service wishes to record an assignment in that block, it makes a request to the IPAM service's assignment endpoint (e.g. POST /ip-reservations/ntwkipr-u8e82i.qeoalf/assignments). The IPAM service then calls UnwrapResource(ntwkipr-u8e82i.qeoalf, [create_assignment], {}), which succeeds because the wrapped resource is valid, the verifier matches, and the operation is allowed for that ID. The assignment is then created.
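To make the wrap and unwrap steps concrete, here is a minimal in-memory sketch. It assumes that the part after the dot in a wrapped ID is an HMAC-based verifier and that the service keeps a private table mapping wrapped IDs to the underlying resource; neither detail is specified in these notes. The function names mirror WrapResource and UnwrapResource, and the third argument is treated as an opaque set of caveats or request context that this sketch stores but does not evaluate.

```python
import hashlib
import hmac
import secrets

# Assumption: a service-private key used to derive verifiers.
SECRET = secrets.token_bytes(32)

# Assumption: wrapped ID body -> (resource_id, allowed operations, caveats).
_wrapped = {}


def _verifier(body: str) -> str:
    """Derive a short verifier for a wrapped ID body."""
    return hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()[:12]


def wrap_resource(resource_id, operations, caveats):
    """Issue a new wrapped ID of the form <prefix>-<random>.<verifier>."""
    prefix = resource_id.split("-", 1)[0]
    body = f"{prefix}-{secrets.token_hex(3)}"
    _wrapped[body] = (resource_id, frozenset(operations), dict(caveats))
    return f"{body}.{_verifier(body)}"


def unwrap_resource(wrapped_id, operations, context):
    """Return the underlying resource ID if the capability permits the operations."""
    body, _, verifier = wrapped_id.partition(".")
    if not hmac.compare_digest(verifier.encode(), _verifier(body).encode()):
        raise PermissionError("verifier mismatch")
    if body not in _wrapped:
        raise PermissionError("unknown or revoked capability")
    resource_id, allowed, _caveats = _wrapped[body]
    if not set(operations) <= allowed:
        raise PermissionError("operation not permitted")
    return resource_id
```

Because the wrapped ID is freshly generated rather than derived from the reservation ID, the holder never learns `ntwkipr-xyzxyz` itself, and deleting the table entry would revoke the capability.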

This example describes a manual approval process, and doesn't describe how the async process for yielding the capability back to the requesting service is implemented. The manual approval process could easily be replaced by setting limits per identity and requiring manual approval only above those limits. For example, any product can request up to a /24, but anything larger needs manual approval by the governing team. In that case, the system becomes more dynamic and teams can self-serve their requests. The capability must also be distributed over a secure channel, such as a NATS topic that only the requesting service has access to, or a direct callback API.
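The per-identity limit idea can be expressed as a small policy check. The /24 default comes from the example above; the `LIMITS` table and the `lbaas` override are hypothetical and only show how a per-identity exception might look.

```python
# Default from the example above: any product may self-serve up to a /24.
DEFAULT_LIMIT = 24

# Hypothetical per-identity overrides (smallest prefix length granted automatically).
LIMITS = {"lbaas": 20}


def needs_manual_approval(identity: str, prefix_len: int) -> bool:
    """Return True if the requested block is larger than the identity's limit.

    Larger blocks have numerically smaller prefix lengths, so a /16 exceeds
    a /24 limit.
    """
    if not 0 <= prefix_len <= 32:
        raise ValueError("prefix length must be between 0 and 32")
    return prefix_len < LIMITS.get(identity, DEFAULT_LIMIT)
```

With this policy, a /24 request from any service is approved automatically, while a /16 still goes to the governing team.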

Further delegation is possible as well: the LB service could ask the IPAM service to wrap `ntwkipr-u8e82i.qeoalf` another time, but this time only for `read_assignment`. The LB team can then build operator tools that look up details about assignments from the IPAM service without having the ability to do damage.
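The key property of this kind of re-wrapping is attenuation: a derived capability may only carry a subset of its parent's operations. A minimal sketch of that rule follows; the `Capability` type and `attenuate` function are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Capability:
    resource_id: str
    operations: FrozenSet[str]
    parent: Optional["Capability"] = None   # the capability this one was derived from


def attenuate(cap: Capability, operations: Iterable[str]) -> Capability:
    """Derive a narrower capability; operations can be dropped but never added."""
    requested = frozenset(operations)
    extra = requested - cap.operations
    if extra:
        raise PermissionError(f"cannot add operations: {sorted(extra)}")
    return Capability(cap.resource_id, requested, parent=cap)
```

A read-only capability derived this way cannot be widened again, so tooling built on it stays limited to `read_assignment`.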