Add the rambling on capability systems
152
equinix/design/capability-systems.org
Normal file
@@ -0,0 +1,152 @@
* Bootstrapping trust in a capability model

There are two basic ways to start the chain of trust with a capability
model: either the resource server is started with a set of root
capabilities that govern all the resources, or ambient authority is
used to provide the initial trust.

Let's take the IP example further: some IPAM service is supposed to
govern the RFC 1918 space for Equinix. It provides an API for
downstream services to request blocks of arbitrary size, so they can
further allocate smaller blocks from those blocks.

I think the easiest way is just to use ACLs for the initial set of
capabilities; once the service is live, the majority of requests
would use wrapped resources. Let's say this IPAM service allows
creation of "root" ranges through a create range API.
An operator could create the range for 10.0.0.0/8 and then create a
wrapped resource to delegate to downstream services.

If MCN, Metal and Fabric are all interested in sharing this IP space,
we could have each service request an IP range of a specific size. Then
the operator could create wrapped resources for larger ranges for each
of the business units, and hand those to the operators of the
MCN, Metal and Fabric services.

Once a dependent service gets its wrapped resource, it can
further divide the resource if it has multiple services that want
to allocate from distinct pools within that space, or they can all
share the capability as-is.
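
To make the subdivision concrete, here is a rough sketch using Python's
standard =ipaddress= module. The prefix lengths (a /10 per business
unit, /16 pools inside Metal's block) are made up for illustration; the
text above doesn't commit to any particular sizes.

```python
import ipaddress

# Root range governed by the IPAM service (from the text above).
root = ipaddress.ip_network("10.0.0.0/8")

# Assumption: one /10 per business unit. zip() stops after three units,
# so the fourth /10 simply stays unallocated.
units = ["MCN", "Metal", "Fabric"]
delegated = dict(zip(units, root.subnets(new_prefix=10)))

# A business unit can carve its block further, e.g. Metal into /16 pools.
metal_pools = list(delegated["Metal"].subnets(new_prefix=16))

for unit, block in delegated.items():
    print(unit, block)
print("Metal /16 pools:", len(metal_pools), "first:", metal_pools[0])
```

Each delegated block would be handed over as its own wrapped resource,
so a unit can only allocate inside the block it was given.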

The dependent service could then call the IPAM service directly
to make "assignments", which mark an IP as
currently in use within the larger range.

Eventually, we want to get away from this "operator X does an operation
for operator Y" pattern, because it means that

Let's assume we made an IPAM service that has the following endpoints:

- CREATE IP Range
  Adds an entry to allow the IPAM service to govern the range.
  Returns a resource ID.
- LIST IP Ranges
  Lists all the ranges governed by the IPAM service.
- GET IP Range
  Shows details about the IP range, such as how much of the range is
  allocated.

  Can be accessed either by ACL or by capability.

- DELETE IP Range
  Removes an IP range from being governed by the IPAM service.

- CREATE IP Range Request
  Requests a capability which lets a service allocate from this IP range.

- GET/LIST IP Range Request
  Shows the status of a request.

- PUT IP Range Request
  Allows approving or denying the request.

- DELETE IP Range Request
  Removes an IP range request.

- CREATE IP Assignment
  Only accepts a wrapped resource; marks an IP address or subnet as allocated.
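
As a rough illustration of the range and assignment endpoints above,
here is an in-memory sketch. The class name =Ipam=, the method names and
the generated ID format are all assumptions, and the ACL and capability
checks are deliberately left out so that only the bookkeeping is shown.

```python
import ipaddress
import secrets


class Ipam:
    """In-memory stand-in for the range and assignment endpoints."""

    def __init__(self):
        self.ranges = {}       # resource ID -> ip_network
        self.assignments = {}  # resource ID -> set of assigned addresses

    def create_range(self, cidr):
        # CREATE IP Range: start governing the range, return a resource ID.
        range_id = "ntwkblk-" + secrets.token_hex(3)  # ID format is a guess
        self.ranges[range_id] = ipaddress.ip_network(cidr)
        self.assignments[range_id] = set()
        return range_id

    def list_ranges(self):
        # LIST IP Ranges: everything currently governed.
        return dict(self.ranges)

    def get_range(self, range_id):
        # GET IP Range: details, including how much is allocated.
        net = self.ranges[range_id]
        used = len(self.assignments[range_id])
        return {"cidr": str(net), "size": net.num_addresses, "allocated": used}

    def delete_range(self, range_id):
        # DELETE IP Range: stop governing the range.
        del self.ranges[range_id]
        del self.assignments[range_id]

    def create_assignment(self, range_id, address):
        # CREATE IP Assignment: mark one address as in use.
        # A real service would only accept a wrapped resource here.
        addr = ipaddress.ip_address(address)
        if addr not in self.ranges[range_id]:
            raise ValueError(f"{addr} is outside {self.ranges[range_id]}")
        if addr in self.assignments[range_id]:
            raise ValueError(f"{addr} is already assigned")
        self.assignments[range_id].add(addr)


ipam = Ipam()
rid = ipam.create_range("10.0.0.0/8")
ipam.create_assignment(rid, "10.1.2.3")
print(ipam.get_range(rid))
```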

Now consider how we get to the point of using
capabilities. Initially, an operator needs to bootstrap the service by
creating some IP ranges that the IPAM service is responsible for. The
create endpoint can use ACLs to check that the operator is authorized
to create ranges, and after that the service can accept requests.

Next, some service, such as the Metal Provisioner, needs to assign IPs to
instances so they can talk to each other over the private
network. Initially the provisioner doesn't have access to any IP
ranges, so it sends a request for a /16. That /16 request is then
approved by an IPAM operator, and the provisioner receives a
capability that allows manipulating assignments on that range.
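
The request and approval step can be sketched as a small state machine.
Everything named here (=RangeRequest=, =create_request=,
=decide_request=, the =cap-= token format) is hypothetical; the token is
just a placeholder for whatever capability the service would really
issue.

```python
import secrets
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    DENIED = "denied"


@dataclass
class RangeRequest:
    requester: str
    prefix_len: int
    status: Status = Status.PENDING
    capability: Optional[str] = None  # only set once approved


requests = {}  # request ID -> RangeRequest


def create_request(requester, prefix_len):
    # CREATE IP Range Request: record the request as pending.
    request_id = secrets.token_hex(4)
    requests[request_id] = RangeRequest(requester, prefix_len)
    return request_id


def decide_request(request_id, approve):
    # PUT IP Range Request: an operator approves or denies exactly once.
    req = requests[request_id]
    if req.status is not Status.PENDING:
        raise ValueError("request already decided")
    if approve:
        req.status = Status.APPROVED
        req.capability = "cap-" + secrets.token_urlsafe(8)  # placeholder
    else:
        req.status = Status.DENIED
    return req


rid = create_request("metal-provisioner", 16)
print(requests[rid].status)          # pending until an operator acts
print(decide_request(rid, approve=True).status)
```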

The IPAM operator portion could be removed.

----

* IPAM Worked Example

Let's assume we have an IPAM system which governs 10.0.0.0/8 and
other IP blocks. We have a service, such as LBaaS, which needs to
assign private IPs to customer load balancer instances. The LBaaS
service needs to assign unique IPs to the load balancer instances so
that customers can route traffic to their Metal instances.

The LB service needs to reach out to the IPAM service to pull an IP,
and to do that, it must request it within a block represented by a
wrapped resource. So how does the service initially obtain this
wrapped resource?

On first startup, the LBaaS service knows it doesn't have the
capability to assign IPs because it doesn't have a wrapped resource
for the range. It reaches out to the IPAM service, authenticated as
itself, and requests a =/16=. That request is authorized just by the
fact that the LB service has the correct audience to talk to the IPAM
service.

The request is recorded, and it is either approved by the IPAM
operators or decided by business logic. Once approved, the
wrapped resource for the requested range is issued to the LBaaS
service, which stores it. Now, whenever an IP is needed, it makes an
assignment under that wrapped resource.
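
For a feel of what "pulling an IP" from the block could look like, here
is a naive sketch. It assumes the IPAM side picks the next free host
address in the reserved block; the text doesn't say which side chooses
the address, and the helper =next_free= and the =10.20.0.0/16= block are
invented for the example.

```python
import ipaddress


def next_free(block, assigned):
    # Linear scan over usable host addresses; fine for a sketch,
    # too slow for a real allocator on large blocks.
    for host in ipaddress.ip_network(block).hosts():
        if host not in assigned:
            return host
    raise RuntimeError(f"{block} is exhausted")


assigned = set()
for _ in range(3):
    ip = next_free("10.20.0.0/16", assigned)
    assigned.add(ip)  # this is the "assignment" being recorded
    print(ip)
```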

Internally, the IPAM service needs to record that a block is currently
active, and that the capability sent to the LB service references
it. As an example, let's say 10.0.0.0/8 is represented by the root
resource identifier `ntwkblk-a1b2c3`. When the LB service requests a
=/16=, a new IP reservation resource, `ntwkipr-xyzxyz`, is created.
Once approved, a capability is created by calling
WrapResource(ntwkipr-xyzxyz, [create_assignment, read_assignment, delete_assignment],
{}), which produces a wrapped resource with ID
`ntwkipr-u8e82i.qeoalf`. The IPAM service then distributes this back to
the LB service.

When the LB service wishes to record an assignment to that block, it
can make a request to the IPAM service's assignment endpoint
(e.g. POST /ip-reservations/ntwkipr-u8e82i.qeoalf/assignments). From
there, the IPAM service calls UnwrapResource(ntwkipr-u8e82i.qeoalf,
[create_assignment], {}), which succeeds because the wrapped resource
is valid, the verifier matches, and the operation is allowed for that
ID. The assignment is then created.
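
This note doesn't pin down how WrapResource and UnwrapResource work, so
the following is only one possible shape: a server-side table keyed by a
fresh wrapped ID, with a random verifier after the dot. The snake_case
function names, the table, and the longer verifier are assumptions. The
third argument is always ={}= in the examples above and its meaning
isn't specified, so the sketch stores it without interpreting it.

```python
import hmac
import secrets

# wrapped ID -> (verifier, underlying resource ID, allowed ops, extra)
_wrapped = {}


def wrap_resource(resource_id, operations, extra):
    # Mint a new ID with the same type prefix, plus a secret verifier.
    prefix = resource_id.split("-", 1)[0]
    wrapped_id = f"{prefix}-{secrets.token_hex(3)}"
    verifier = secrets.token_hex(16)
    _wrapped[wrapped_id] = (verifier, resource_id, frozenset(operations), dict(extra))
    return f"{wrapped_id}.{verifier}"


def unwrap_resource(token, operations, extra):
    # Returns the underlying resource ID, or raises PermissionError.
    wrapped_id, _, verifier = token.partition(".")
    entry = _wrapped.get(wrapped_id)
    if entry is None:
        raise PermissionError("unknown wrapped resource")
    expected, resource_id, allowed, _stored_extra = entry
    # Constant-time comparison so the verifier can't be guessed by timing.
    if not hmac.compare_digest(verifier.encode(), expected.encode()):
        raise PermissionError("verifier mismatch")
    if not set(operations) <= allowed:
        raise PermissionError("operation not allowed for this capability")
    return resource_id


token = wrap_resource(
    "ntwkipr-xyzxyz",
    ["create_assignment", "read_assignment", "delete_assignment"],
    {},
)
print(unwrap_resource(token, ["create_assignment"], {}))
```

The point of the indirection is that the caller never learns
`ntwkipr-xyzxyz` itself, only the wrapped ID, and the IPAM service can
invalidate the capability by dropping one table entry.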

This example describes a manual approval process and doesn't
necessarily describe how the async process of yielding
the capability back to the requesting service is implemented. The
manual approval process could easily be replaced by setting limits per
identity and requiring manual approval only for higher limits, e.g. any
product can request up to a /24, but anything larger needs
manual approval by the governing team. In that case, the system
becomes more dynamic and teams can self-serve their requests. The
distribution of the capability must also happen over a secure channel,
such as a NATS topic that only the requesting service has access
to, or a direct callback API.
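
The per-identity limit could be as small as the check below. The /24
default comes from the example above; the =LIMITS= table, the
=metal-provisioner= override and the function name are invented for
illustration.

```python
# A smaller prefix length means a larger block, so "up to a /24"
# translates to prefix_len >= 24.
DEFAULT_LIMIT = 24
LIMITS = {"metal-provisioner": 20}  # hypothetical per-identity override


def decide(identity, prefix_len):
    limit = LIMITS.get(identity, DEFAULT_LIMIT)
    if prefix_len >= limit:
        return "auto-approve"
    return "manual-approval"


print(decide("lbaas", 24))
print(decide("lbaas", 16))
print(decide("metal-provisioner", 22))
```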

Further delegation is possible as well: the LB service could ask
the IPAM service to wrap `ntwkipr-u8e82i.qeoalf` another time, but
this time only to perform `read_assignment`. The LB team can then
create operator tools that look up assignment details in the
IPAM service without having the ability to do damage.
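
The important property of re-wrapping is that it can only narrow the
allowed operations, never widen them. A minimal sketch of that rule,
with hypothetical =wrap=, =attenuate= and =allows= helpers and opaque
random tokens standing in for the real wrapped IDs:

```python
import secrets

_caps = {}  # token -> (underlying resource ID, allowed operations)


def wrap(resource_id, operations):
    token = secrets.token_urlsafe(16)
    _caps[token] = (resource_id, frozenset(operations))
    return token


def attenuate(token, operations):
    # Re-wrap an existing capability with a subset of its operations.
    resource_id, allowed = _caps[token]
    requested = frozenset(operations)
    if not requested <= allowed:
        raise PermissionError("re-wrapping cannot add operations")
    return wrap(resource_id, requested)


def allows(token, operation):
    entry = _caps.get(token)
    return entry is not None and operation in entry[1]


full = wrap("ntwkipr-xyzxyz",
            ["create_assignment", "read_assignment", "delete_assignment"])
read_only = attenuate(full, ["read_assignment"])
print(allows(read_only, "read_assignment"))
print(allows(read_only, "delete_assignment"))
```

Operator tooling would be given =read_only=, so even a buggy tool can't
create or delete assignments.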