Add the rambling on capability systems
equinix/design/capability-systems.org
@@ -0,0 +1,152 @@
* Bootstrapping trust in a capability model

There are two basic ways to start the chain of trust in a capability
model: either the resource server is started with a set of root
capabilities that govern all the resources, or ambient authority is
used to provide the initial trust.

Let's take the IP example further. Some IPAM service is supposed to
govern the RFC1918 space for Equinix. It provides an API for
downstream services to request blocks of arbitrary size, so they can
further allocate smaller blocks from those blocks.

I think the easiest way is just to use ACLs for the initial set of
capabilities; once the service is live, the majority of requests
would use wrapped resources. Let's say this IPAM service allows
creation of "root" ranges through a create range API.
An operator could create the range for 10.0.0.0/8, and then create a
wrapped resource to delegate to downstream services.

If MCN, Metal and Fabric are all interested in sharing this IP space,
we could have each service request an IP range of a specific size. Then
the operator could create wrapped resources for larger ranges for each
of the business units, and then hand those to the operators for the
MCN, Metal and Fabric services.

Once a dependent service gets its wrapped resource, it can
further divide the range if it has multiple services that want
to allocate from distinct pools within that space, or they can all
share the capability as-is.
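
The carving described above can be sketched with Python's standard
=ipaddress= module. This is only an illustration: the choice of one
=/10= per business unit and =/16= pools inside Metal's block are
assumptions made for the example, not sizes taken from this design.

```python
import ipaddress

# Root range governed by the IPAM service (from the example above).
root = ipaddress.ip_network("10.0.0.0/8")

# Hypothetical split: the operator hands one /10 to each business unit.
units = ["MCN", "Metal", "Fabric"]
delegated = dict(zip(units, root.subnets(new_prefix=10)))

# A business unit can subdivide its own block again, for example Metal
# carving distinct /16 pools for its internal services.
metal_pools = list(delegated["Metal"].subnets(new_prefix=16))

for unit, block in delegated.items():
    print(unit, block)
print("Metal pools:", len(metal_pools), "first:", metal_pools[0])
```

Every delegated block stays inside the root range, which is the
property the wrapped resources are meant to enforce.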

The dependent service could then make direct calls to the IPAM service
to make "assignments", which mark an IP as currently in use within the
larger range.

Eventually, we want to get away from this "operator X does an
operation for operator Y" pattern, because it means that

Let's assume we made an IPAM service that has the following endpoints:

- CREATE IP Range
  Adds an entry to allow the IPAM service to govern the range.
  Returns a resource ID.
- LIST IP Ranges
  Lists all the ranges governed by the IPAM service.
- GET IP Range
  Shows details about the IP range, such as how much of the range is
  allocated.

  Can be accessed either by ACL or by capability.

- DELETE IP Range
  Removes an IP range from being governed by the IPAM service.

- CREATE IP Range Request
  Requests a capability which lets a service allocate from this IP range.

- GET/LIST IP Range Request
  Shows the status of a request.

- PUT IP Range Request
  Allows approving or denying the request.

- DELETE IP Range Request
  Removes an IP range request.

- CREATE IP Assignment
  Only accepts a wrapped resource; marks an IP address or subnet as allocated.
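
To make the range and assignment endpoints a bit more concrete, here is
a rough in-memory sketch. The class name =Ipam=, the method names and
the =range-N= resource IDs are all made up for illustration, and
authorization (ACLs, wrapped resources) is deliberately left out so the
bookkeeping is easier to see.

```python
import ipaddress
import itertools


class Ipam:
    """Hypothetical in-memory model of the range and assignment endpoints."""

    def __init__(self):
        self._ranges = {}    # resource ID -> governed network
        self._assigned = {}  # resource ID -> list of assigned networks
        self._ids = itertools.count(1)

    def create_range(self, cidr):
        # CREATE IP Range: start governing the range, return a resource ID.
        rid = f"range-{next(self._ids)}"
        self._ranges[rid] = ipaddress.ip_network(cidr)
        self._assigned[rid] = []
        return rid

    def list_ranges(self):
        # LIST IP Ranges.
        return sorted(self._ranges)

    def get_range(self, rid):
        # GET IP Range: include how much of the range is allocated.
        used = sum(a.num_addresses for a in self._assigned[rid])
        total = self._ranges[rid].num_addresses
        return {"cidr": str(self._ranges[rid]), "allocated": used, "total": total}

    def delete_range(self, rid):
        # DELETE IP Range: stop governing the range.
        del self._ranges[rid]
        del self._assigned[rid]

    def create_assignment(self, rid, cidr):
        # CREATE IP Assignment: mark an address or subnet as allocated.
        net = ipaddress.ip_network(cidr)
        if not net.subnet_of(self._ranges[rid]):
            raise ValueError(f"{net} is outside {self._ranges[rid]}")
        if any(net.overlaps(a) for a in self._assigned[rid]):
            raise ValueError(f"{net} is already assigned")
        self._assigned[rid].append(net)
        return net
```

A single address such as =10.0.0.5= is treated as a =/32=, so addresses
and subnets go through the same overlap check.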

Now consider how we get to the point of using capabilities. Initially,
an operator needs to bootstrap the service by creating some IP ranges
that the IPAM service is responsible for. This endpoint can use ACLs to
check that the operator is authorized to create ranges, and then the
service can start accepting range requests.

Next, some service, like the Metal Provisioner, needs to assign IPs to
instances so they can talk to each other over the private
network. Initially the provisioner doesn't have access to any IP
ranges, so it sends a request for a /16. That /16 request is then
approved by an IPAM operator, and the provisioner receives a
capability that allows manipulating assignments on that range.
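
One way to picture the request step is a small record that moves from
pending to approved or denied. The sketch below is an assumption-laden
illustration: =RangeRequest=, =approve= and the "first free block"
selection strategy are invented here, and issuing the capability itself
is out of scope.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class RangeRequest:
    requester: str                 # e.g. "metal-provisioner"
    prefix_len: int                # requested block size, e.g. 16 for a /16
    status: str = "pending"
    granted: Optional[ipaddress.IPv4Network] = None


def approve(request, root, reserved):
    """Grant the first free block of the requested size, or deny.

    `reserved` is the list of blocks already handed out from `root`;
    it is updated in place when a block is granted.
    """
    for candidate in root.subnets(new_prefix=request.prefix_len):
        if not any(candidate.overlaps(r) for r in reserved):
            reserved.append(candidate)
            request.status = "approved"
            request.granted = candidate
            return request
    request.status = "denied"
    return request


root = ipaddress.ip_network("10.0.0.0/8")
reserved = [ipaddress.ip_network("10.0.0.0/16")]
req = approve(RangeRequest("metal-provisioner", 16), root, reserved)
print(req.status, req.granted)
```

In the flow described above, the approved block would then be wrapped
and the resulting capability returned to the provisioner.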

The IAM operator portion could be removed

----

* IPAM Worked Example

Let's assume we have an IPAM system which governs 10.0.0.0/8 and
other IP blocks. We have a service, such as LBaaS, which needs to
assign private IPs to customer load balancer instances. The LBaaS
service needs to assign unique IPs to the load balancer instances so
that customers can route traffic to their metal instances.

The LB service needs to reach out to the IPAM service to pull an IP,
and to do that, it must request it within a block represented by a
wrapped resource. So how does the service initially obtain this
wrapped resource?

On first startup, the LBaaS service knows it doesn't have the
capability to assign IPs because it doesn't have a wrapped resource
for the range. It reaches out to the IPAM service, authenticated as
itself, and requests a =/16=. That request is authorized just by the
fact that the LB service has the correct audience to talk to the IPAM
service.

The request is recorded, and it is either approved by the IPAM
operators or decided by business logic. Once approved, the
wrapped resource for the requested range is issued to the LBaaS
service, which stores it. Now, whenever an IP is needed, the LBaaS
service makes an assignment under that wrapped resource.

Internally, the IPAM service needs to record that a block is currently
active, and that the capability sent to the LB service references
it. As an example, let's say 10.0.0.0/8 is represented by the root
resource identifier =ntwkblk-a1b2c3=. When the LB service requests a
=/16=, a new IP reservation resource, =ntwkipr-xyzxyz=, is created.
Once approved, a capability is created by calling
=WrapResource(ntwkipr-xyzxyz, [create_assignment, read_assignment, delete_assignment], {})=,
which produces a wrapped resource with ID =ntwkipr-u8e82i.qeoalf=, and
the IPAM service distributes this back to the LB service.

When the LB service wishes to record an assignment to that block, it
can make a request to the IPAM service's assignment endpoint
(e.g. =POST /ip-reservations/ntwkipr-u8e82i.qeoalf/assignments=). From
there, the IPAM service calls
=UnwrapResource(ntwkipr-u8e82i.qeoalf, [create_assignment], {})=,
which succeeds because the wrapped resource is valid, the verifier
matches, and the operation is allowed for that ID. The assignment is
then created.
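
The wrap and unwrap calls could look roughly like the sketch below.
This is not the real =WrapResource= / =UnwrapResource= implementation:
the =CapabilityStore= class, the server-side record, the random
verifier, and the reading of the third argument as a set of constraints
that the caller's context must match are all assumptions made to
illustrate the checks named above (valid wrapped resource, matching
verifier, allowed operation).

```python
import hmac
import secrets


class CapabilityStore:
    """Hypothetical server-side store for wrapped resources."""

    def __init__(self):
        # handle -> (resource_id, allowed operations, constraints, verifier)
        self._records = {}

    def wrap_resource(self, resource_id, operations, constraints):
        # Keep the resource type prefix (e.g. "ntwkipr") and mint a new handle.
        prefix = resource_id.split("-", 1)[0]
        handle = f"{prefix}-{secrets.token_hex(3)}"
        verifier = secrets.token_hex(8)
        self._records[handle] = (
            resource_id, frozenset(operations), dict(constraints), verifier,
        )
        return f"{handle}.{verifier}"

    def unwrap_resource(self, wrapped_id, operations, context):
        handle, _, verifier = wrapped_id.partition(".")
        record = self._records.get(handle)
        if record is None:
            raise PermissionError("unknown wrapped resource")
        resource_id, allowed, constraints, expected = record
        # Constant-time comparison of the presented verifier.
        if not hmac.compare_digest(verifier, expected):
            raise PermissionError("verifier mismatch")
        if not set(operations) <= allowed:
            raise PermissionError("operation not allowed")
        if any(context.get(k) != v for k, v in constraints.items()):
            raise PermissionError("constraint not satisfied")
        return resource_id


store = CapabilityStore()
wrapped = store.wrap_resource(
    "ntwkipr-xyzxyz",
    ["create_assignment", "read_assignment", "delete_assignment"],
    {},
)
print(store.unwrap_resource(wrapped, ["create_assignment"], {}))
```

The caller only ever sees the wrapped ID; the underlying reservation ID
is resolved on the IPAM side after the checks pass.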

This example describes a manual approval process and doesn't
necessarily describe how the async process is implemented for yielding
the capability back to the requesting service. The manual approval
process could easily be replaced by setting limits per identity and
requiring manual approval for higher limits, e.g. any product can
request up to a /24, but anything larger needs manual approval by the
governing team. In that case, the system becomes more dynamic and
teams can self-serve their requests. The distribution of the
capability must also happen over a secure channel, such as a NATS
topic that only the requesting service has access to, or a direct
callback API.
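
The per-identity limit idea can be sketched as a tiny policy function.
The =/24= default comes from the example above; the function name, the
status strings and the shape of the per-identity override table are
assumptions.

```python
# Default from the example above: anything up to a /24 is self-service.
AUTO_APPROVE_MIN_PREFIX = 24


def decide(identity, prefix_len, per_identity_limits=None):
    """Return "approved" or "needs_manual_approval" for a range request.

    A larger prefix length means a smaller block, so a request is
    auto-approved when its prefix length is at least the threshold
    configured for the requesting identity.
    """
    limits = per_identity_limits or {}
    threshold = limits.get(identity, AUTO_APPROVE_MIN_PREFIX)
    if prefix_len >= threshold:
        return "approved"
    return "needs_manual_approval"


print(decide("lbaas", 24))
print(decide("lbaas", 16))
print(decide("metal-provisioner", 16, {"metal-provisioner": 16}))
```

Requests that fall through to manual approval would follow the
operator-driven flow described earlier.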

Further delegation is possible as well: the LB service could ask
the IPAM service to wrap =ntwkipr-u8e82i.qeoalf= another time, but
this time only to perform =read_assignment=. The LB team can then
create operator tools to find details about assignments from the
IPAM service without having the ability to do damage.
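
The important property of re-wrapping is that it can only narrow what
the parent capability allows. A minimal sketch of that check, with a
made-up =attenuate= helper, might look like this:

```python
def attenuate(parent_operations, requested_operations):
    """Return the operation set for a re-wrapped capability.

    The re-wrapped capability may only carry operations that the parent
    capability already allows; asking for anything else is rejected.
    """
    parent = frozenset(parent_operations)
    requested = frozenset(requested_operations)
    extra = requested - parent
    if extra:
        raise PermissionError(f"cannot add operations: {sorted(extra)}")
    return requested


lb_ops = ["create_assignment", "read_assignment", "delete_assignment"]
tooling_ops = attenuate(lb_ops, ["read_assignment"])
print(sorted(tooling_ops))
```

The read-only operation set would then be passed to the wrap call, so
the operator tooling never holds create or delete rights.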