Add the rambling on capability systems

2024-07-30 08:55:11 -04:00
parent 1ddc9f19f1
commit 1da31679cb

@@ -0,0 +1,152 @@
* Bootstrapping trust in a capability model
There are two basic ways to start the chain of trust with a capability
model: either the resource server is started with a set of root
capabilities that govern all the resources, or ambient authority is
used to provide the initial trust.
Let's take the IP example further. Some IPAM service is supposed to
govern the RFC1918 space for Equinix. It provides an API for
downstream services to request blocks of arbitrary size, so they can
further allocate smaller blocks from those blocks.
I think the easiest way is just to use ACLs for the initial set of
capabilities, and once the service is live, the majority of requests
would be using wrapped resources. Let's say this IPAM service allows
creation of "root" ranges through a create range API.
An operator could create the range for 10.0.0.0/8, and then create a
wrapped resource to delegate to downstream services.
If MCN, Metal and Fabric are all interested in sharing this IP space,
we could have each service request an IP range of a specific size. Then
the operator could create wrapped resources for larger ranges for each
of the business units, and then hand those to the operators for the
MCN, Metal and Fabric services.
Once the dependent service gets their wrapped resource, they can
further divide the resources if they have multiple services that want
to allocate from distinct pools within that space, or they can all
share the capability as-is.
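The subdivision described above can be sketched with Python's standard
ipaddress module. The /10 and /16 sizes and the one-block-per-unit
mapping are assumptions made for illustration, not sizes the IPAM
service prescribes.

```python
import ipaddress

# Root range created by the operator.
root = ipaddress.ip_network("10.0.0.0/8")

# Hypothetical split: one /10 per business unit (the size is an assumption).
units = ["MCN", "Metal", "Fabric"]
blocks = dict(zip(units, root.subnets(new_prefix=10)))

# A dependent service can carve its own block into distinct pools,
# or keep using the whole block as-is.
metal_pools = list(blocks["Metal"].subnets(new_prefix=16))

for unit, block in blocks.items():
    print(unit, block)
print("first Metal pool:", metal_pools[0])
```

Each level only ever hands out ranges contained in the range it was
given, which is the property the wrapped resources are meant to enforce.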
The dependent service could then make direct calls to the IPAM service
to make "assignments" that mark an IP as currently in use within the
larger range.
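A minimal sketch of what tracking assignments inside one delegated
range might look like. The class and method names are hypothetical, and
the state lives in memory only; a real service would persist it.

```python
import ipaddress


class RangeAssignments:
    """Tracks which addresses inside one delegated range are in use."""

    def __init__(self, cidr):
        self.network = ipaddress.ip_network(cidr)
        self.assigned = set()  # addresses currently marked as in use

    def assign(self, address):
        ip = ipaddress.ip_address(address)
        if ip not in self.network:
            raise ValueError(f"{ip} is outside {self.network}")
        if ip in self.assigned:
            raise ValueError(f"{ip} is already assigned")
        self.assigned.add(ip)

    def release(self, address):
        # Releasing an address that was never assigned is a no-op.
        self.assigned.discard(ipaddress.ip_address(address))
```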
Eventually, we want to get away from this operator X does operation
for operator Y, because it means that
Let's assume we made an IPAM service that has the following endpoints:
- CREATE IP Range
  Adds an entry to allow the IPAM service to govern the range.
  Returns a resource ID.
- LIST IP Ranges
  Lists all the ranges governed by the IPAM service.
- GET IP Range
  Shows details about the IP range, such as how much of the range is
  allocated.
  Can be accessed either by ACL or by capability.
- DELETE IP Range
  Removes an IP range from being governed by the IPAM service.
- CREATE IP Range Request
  Requests a capability which lets a service allocate from this IP range.
- GET/LIST IP Range Request
  Shows the status of a request.
- PUT IP Range Request
  Allows approving/denying the request.
- DELETE IP Range Request
  Removes an IP range request.
- CREATE IP Assignment
  Only accepts a wrapped resource; marks an IP address or subnet as allocated.
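The IP Range Request endpoints imply a small lifecycle: a request is
created, then approved or denied, and its status can be read back. A
rough sketch, where the field names, the status values and the
requester name are all assumptions:

```python
from dataclasses import dataclass
from enum import Enum


class RequestStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    DENIED = "denied"


@dataclass
class RangeRequest:
    requester: str      # identity of the calling service
    range_id: str       # range the caller wants to allocate from
    prefix_length: int  # requested block size, e.g. 16 for a /16
    status: RequestStatus = RequestStatus.PENDING

    def decide(self, approve):
        """PUT IP Range Request: approve or deny a pending request."""
        if self.status is not RequestStatus.PENDING:
            raise ValueError(f"request already {self.status.value}")
        self.status = RequestStatus.APPROVED if approve else RequestStatus.DENIED


req = RangeRequest("metal-provisioner", "ntwkblk-a1b2c3", 16)
req.decide(approve=True)
print(req.status.value)
```

Making the decision a one-way transition keeps an approved request from
being flipped later without issuing or revoking a capability.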
Now consider how we get to the point of using capabilities.
Initially, an operator needs to start the service by creating some IP
ranges that the IPAM service is responsible for. This endpoint can use
ACLs to check that the operator has the authorization to create
ranges, and then the service can start accepting requests.
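As a sketch of that bootstrap step, the create range endpoint could
check the caller against an ACL before registering the range. The
identity name, the in-memory table and the overlap check are
assumptions; only the ACL check itself comes from the text above.

```python
import ipaddress
import uuid

# Hypothetical ACL: identities allowed to create root ranges.
RANGE_ADMINS = {"operator@ipam"}

ranges = {}  # resource ID -> network governed by the service


def create_range(identity, cidr):
    """CREATE IP Range, authorized by ACL rather than by capability."""
    if identity not in RANGE_ADMINS:
        raise PermissionError(f"{identity} may not create ranges")
    network = ipaddress.ip_network(cidr)
    # Assumption: overlapping root ranges are rejected.
    for existing in ranges.values():
        if network.overlaps(existing):
            raise ValueError(f"{network} overlaps {existing}")
    resource_id = f"ntwkblk-{uuid.uuid4().hex[:6]}"
    ranges[resource_id] = network
    return resource_id
```

This is the only place ambient authority is needed; everything
downstream of it can be driven by wrapped resources.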
Next, some service, like the Metal Provisioner, needs to assign IPs to
instances so they can talk to each other over the private
network. Initially the provisioner doesn't have access to any IP
ranges, so it sends a request for a /16. That /16 request is then
approved by an IPAM operator, and the provisioner receives a
capability that allows manipulating assignments on that range.
The IPAM operator portion could be removed
----
* IPAM Worked Example
Let's assume we have an IPAM system which governs 10.0.0.0/8 and
other IP blocks. We have a service, such as LBaaS, which needs to
assign private IPs to customer load balancer instances. The LBaaS
service needs to assign unique IPs to the load balancer instances so
that customers can route traffic to their metal instances.
The LB service needs to reach out to the IPAM service to pull an IP,
and to do that, it must request it within a block represented by a
wrapped resource. So how does the service initially obtain this
wrapped resource?
On first startup, the LBaaS service knows it doesn't have the
capability to assign IPs because it doesn't have a wrapped resource
for the range. It reaches out authenticated as itself to the IPAM
service, and requests a =/16=. That request is authorized just by the
fact that the LB service has the correct audience to talk to the IPAM
service.
The request is recorded, and some approval process is done by the IPAM
operators, or is determined by business logic. Once approved, the
wrapped resource for the requested range is issued to the LBaaS
service, which it stores. Now, whenever an IP is needed, it makes an
assignment under that wrapped resource.
Internally, the IPAM service needs to record that a block is currently
active, and that the capability sent to the LB service references
it. As an example, let's say 10.0.0.0/8 is represented by the root
resource identifier `ntwkblk-a1b2c3`. When the LB service requests a
=/16=, a new IP reservation resource, `ntwkipr-xyzxyz`, is created.
Once approved, a capability is created by calling
WrapResource(ntwkipr-xyzxyz, [create_assignment, read_assignment, delete_assignment], {}),
which produces a wrapped resource with ID `ntwkipr-u8e82i.qeoalf`. The
IPAM service distributes this back to the LB service.
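One way WrapResource could work is sketched below. This is a guess at
the mechanics, not the actual implementation: it assumes the wrapped ID
is a fresh key plus a random verifier joined by a dot, that the service
keeps a server-side record for each key, and that the third argument is
a set of caveats. The function and field names are hypothetical.

```python
import secrets
from dataclasses import dataclass


@dataclass
class WrapRecord:
    resource_id: str       # underlying resource, e.g. ntwkipr-xyzxyz
    operations: frozenset  # operations the holder may perform
    caveats: dict          # extra constraints; empty in this example
    verifier: str          # secret half of the wrapped ID


# In-memory stand-in for whatever the IPAM service would persist.
wraps = {}


def wrap_resource(resource_id, operations, caveats):
    """Create a capability for resource_id and return its wrapped ID."""
    prefix = resource_id.split("-", 1)[0]
    key = f"{prefix}-{secrets.token_hex(8)}"
    # token_urlsafe never emits ".", so the dot stays a safe separator.
    verifier = secrets.token_urlsafe(16)
    wraps[key] = WrapRecord(resource_id, frozenset(operations), dict(caveats), verifier)
    return f"{key}.{verifier}"


wrapped = wrap_resource(
    "ntwkipr-xyzxyz",
    ["create_assignment", "read_assignment", "delete_assignment"],
    {},
)
```

The record is what lets the service tie the capability back to the
active block. Storing only a hash of the verifier would likely be
preferable in practice.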
When the LB service wishes to record an assignment to that block, it
can make a request to the IPAM service's assignment endpoint
(e.g. POST /ip-reservations/ntwkipr-u8e82i.qeoalf/assignments). From
there, the IPAM service calls UnwrapResource(ntwkipr-u8e82i.qeoalf,
[create_assignment], {}), which succeeds because the wrapped resource
is valid, the verifier matches, and the operation is allowed for that
ID. The assignment is then created.
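The three checks named above (valid wrapped resource, matching
verifier, allowed operation) might look like this. The record layout
and the meaning of the third argument are assumptions; the IDs are the
ones from the example.

```python
import hmac

# Hypothetical server-side record for the wrapped ID in the example.
wraps = {
    "ntwkipr-u8e82i": {
        "resource_id": "ntwkipr-xyzxyz",
        "operations": {"create_assignment", "read_assignment", "delete_assignment"},
        "verifier": "qeoalf",
    }
}


def unwrap_resource(wrapped_id, operations, context):
    """Return the underlying resource ID, or raise PermissionError."""
    key, sep, verifier = wrapped_id.partition(".")
    record = wraps.get(key)
    if not sep or record is None:
        raise PermissionError("unknown wrapped resource")
    # Constant-time comparison so timing does not leak the verifier.
    if not hmac.compare_digest(verifier.encode(), record["verifier"].encode()):
        raise PermissionError("verifier mismatch")
    if not set(operations) <= record["operations"]:
        raise PermissionError("operation not allowed")
    # `context` would feed caveat checks; unused here because the
    # example capability was wrapped with no caveats.
    return record["resource_id"]


resource = unwrap_resource("ntwkipr-u8e82i.qeoalf", ["create_assignment"], {})
```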
This example describes a manual approval process and doesn't
necessarily describe how the async process is implemented for yielding
the capability back to the requesting service. The manual approval
process could easily be replaced by setting limits per identity, and
requiring manual approval for higher limits, e.g. any product can
request up to a /24, but anything larger needs
manual approval by the governing team. In that case, the system
becomes more dynamic and teams can self-serve their requests. The
distribution of the capability must happen over a secure channel as
well, such as a NATS topic that only the requesting service has access
to, or a direct callback API.
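The per-identity limit could be as small as the sketch below. The /24
default comes from the example above; the override table and the
identity names are made up for illustration.

```python
# Largest block any identity may self-serve, as a prefix length.
DEFAULT_MIN_PREFIX = 24

# Hypothetical per-identity overrides: this identity may self-serve a /20.
MIN_PREFIX_BY_IDENTITY = {"lbaas": 20}


def needs_manual_approval(identity, prefix_length):
    """True when the requested block is larger than the identity's limit."""
    if not 0 <= prefix_length <= 32:
        raise ValueError("not a valid IPv4 prefix length")
    limit = MIN_PREFIX_BY_IDENTITY.get(identity, DEFAULT_MIN_PREFIX)
    # A smaller prefix length means a larger block.
    return prefix_length < limit
```

Requests that do not need manual approval can be approved immediately
by the service, while the rest fall back to the operator flow.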
Further delegation is possible as well, where the LB service could ask
the IPAM service to wrap `ntwkipr-u8e82i.qeoalf` another time, but
this time only to perform `read_assignment`. The LB team can then
create operator tools to find details about the assignment from the
IPAM service without having the ability to do damage.
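A sketch of that second wrap, under the same assumptions as before (a
server-side record per wrapped ID, hypothetical names). The important
rule is that re-wrapping can only narrow the set of operations.

```python
import hmac
import secrets

# Hypothetical server-side record for the LB service's capability.
wraps = {
    "ntwkipr-u8e82i": {
        "resource_id": "ntwkipr-xyzxyz",
        "operations": {"create_assignment", "read_assignment", "delete_assignment"},
        "verifier": "qeoalf",
    }
}


def rewrap(wrapped_id, operations):
    """Wrap an existing capability again with a narrower operation set."""
    key, _, verifier = wrapped_id.partition(".")
    parent = wraps.get(key)
    if parent is None or not hmac.compare_digest(
        verifier.encode(), parent["verifier"].encode()
    ):
        raise PermissionError("invalid wrapped resource")
    requested = set(operations)
    # Delegation may only narrow authority, never widen it.
    if not requested <= parent["operations"]:
        raise PermissionError("cannot delegate operations the parent lacks")
    prefix = parent["resource_id"].split("-", 1)[0]
    child_key = f"{prefix}-{secrets.token_hex(8)}"
    child_verifier = secrets.token_urlsafe(16)
    wraps[child_key] = {
        "resource_id": parent["resource_id"],
        "operations": requested,
        "verifier": child_verifier,
        "parent": key,  # assumption: kept so revocation could cascade
    }
    return f"{child_key}.{child_verifier}"


read_only = rewrap("ntwkipr-u8e82i.qeoalf", ["read_assignment"])
```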