more cleanup

2024-04-20 10:23:31 -04:00
parent b4f4565894
commit 4daae4e756
10 changed files with 0 additions and 75 deletions


@@ -0,0 +1,47 @@
#+TITLE: K8s concepts review
#+AUTHOR: Adam Mohammed
#+DATE: September 18, 2023
At one of the meetings I brought up how similar nano-metal felt to a
collection of K8s specifications that make standing up and managing
K8s clusters easier. In this document I'll cover the following topics
at a high level: Cluster API, CNI, CCM, and CSI.

First is the Cluster API, which came about as a means of creating and
managing Kubernetes clusters using Kubernetes itself. The Cluster API
allows an operator to use a so-called "management cluster" to create
other K8s clusters known as "workload clusters." The Cluster API is
NOT part of the core K8s resources; it is implemented as a set of
custom resource definitions and controllers that carry out the
desired actions.

A cluster operator can use the Cluster API to create workload
clusters by relying on three components: a bootstrap provider, an
infrastructure provider, and a control plane provider. Nanometal aims
to make provisioning of bare metal machines extensible and scalable
by enabling facilities to carry out the desired operations requested
by the EMAPI. We can think of the EMAPI as the "management cluster"
in this world.
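
As a minimal sketch, this is roughly how a Cluster API Cluster object
ties the infrastructure and control plane providers together by
reference (the bootstrap provider is referenced from the individual
machine specs). The Metal* kinds below are hypothetical placeholders
for whatever a Nanometal provider would define:

#+BEGIN_SRC yaml
# Cluster API "Cluster" sketch (v1beta1 types). MetalCluster and
# MetalControlPlane are hypothetical stand-ins for a Nanometal provider.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: workload-1
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
  # Control plane provider: manages the workload cluster's control plane.
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: MetalControlPlane        # hypothetical
    name: workload-1-control-plane
  # Infrastructure provider: asks for machines with a certain configuration.
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: MetalCluster             # hypothetical
    name: workload-1
#+END_SRC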

What Metal has today maps well to the infrastructure provider, since
all the Cluster API has to do is ask for machines with a certain
configuration and the provider is responsible for making that
happen. I think the bulk of the work in this project is figuring out
how to build the infrastructure provider out of our existing
components, but let's put that aside for now and consider the rest of
the components.

The bootstrap and control plane providers are concepts that also seem
important to our goal. We want it to be simple for us to enter a new
facility and set up the components we need to start provisioning
hardware. The bootstrap provider, in Cluster API terms, turns a
server provisioned with a base OS into an operating K8s node. For us,
we would probably also want some process that turns any facility or
existing datacenter into an Equinix Metal-managed facility.

Once we know about the facility we need to manage, the concept of the
control plane provider maps well onto the diagrams from Nanometal so
far. We'd want some component that installs the required agent and
supporting components in the facility so we can begin to provide
metal services there.
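
For reference, here is a sketch of what the bootstrap and control
plane pieces look like with the stock kubeadm-based Cluster API
providers; the MetalMachineTemplate kind is again a hypothetical
placeholder for a Nanometal machine template:

#+BEGIN_SRC yaml
# Sketch of a kubeadm-based control plane that applies a bootstrap
# (kubeadm) config to turn base-OS machines into K8s control plane nodes.
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: workload-1-control-plane
spec:
  replicas: 3
  version: v1.27.3
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: MetalMachineTemplate   # hypothetical
      name: workload-1-control-plane
  # Bootstrap configuration applied to each provisioned machine.
  kubeadmConfigSpec:
    initConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: external
#+END_SRC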


@@ -0,0 +1,44 @@
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc: # port 4317
      http: # port 4318

processors:
  batch:
  # Only pass spans explicitly marked as auditable.
  filter/auditable:
    spans:
      include:
        match_type: strict
        attributes:
          - key: auditable
            value: "true"
  # Trim resource/scope/span attributes down to the customer-facing fields.
  transform/customer-facing:
    trace_statements:
      - context: resource
        statements:
          - 'keep_keys(attributes, ["service.name"])'
      - context: scope
        statements:
          - 'set(name, "equinixWatch")'
          - 'set(version, "1.0.0")'
      - context: span
        statements:
          - 'keep_keys(attributes, ["http.route", "http.method", "http.status_code", "http.scheme", "http.host", "user.id", "http.user_agent"])'
          - 'set(name, attributes["http.route"])'

exporters:
  file:
    path: /data/metrics.json

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors:
        - filter/auditable
        - transform/customer-facing
      exporters: [file]


@@ -0,0 +1,94 @@
#+TITLE: Integrating Equinix Metal API with Equinix Watch
#+AUTHOR: Adam Mohammed
* Problem
Equinix Watch has defined the format in which they want to ingest
auditable events. They chose OTLP as the protocol for ingesting these
events from services, restricting ingestion to just the logging
signal.

Normally, when sending data to a collector, you would use the
OpenTelemetry libraries to grab metadata about the request and the
surrounding environment without needing to manually cobble that data
together. Unfortunately, making OTEL logging the only signal that
Equinix Watch accepts makes adoption needlessly painful: Ruby does
not have a stable client library for OTEL logs, and neither does
Golang.

Most of the spec provided by Equinix Watch does not actually relate
to the log that we would like to provide to the customer. OTEL
Logging aims to make this simple by using the Baggage and Context
APIs to enrich log records with information about the surrounding
environment and context. Again, the implementations for these are
incomplete and not production ready.

Until the OTEL libraries support context and baggage propagation in
the Logs API/SDK, this data will need to be extracted and formatted
specifically for Equinix Watch, meaning the burden of integration is
higher than it needs to be. If we end up doing this, we'll probably
just fetch the same data from the span attributes anyway, to keep
things consistent.

There's absolutely no reason to do this work when we can attach the
log data to the trace in a structured way and pass it through to
their custom collector. By doing this we don't need to wait for the
OTEL libraries to provide logging implementations that do what traces
already provide.

The only reason I can see not to do this is that it makes Equinix
Watch have to handle translating trace information into a format that
can be delivered to their end targets. I'd argue that's going to need
to happen anyway, so why not make use of all the wonderful tools we
have to enrich the input data and build complete and interesting
audit logs for your end user.
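
As a minimal sketch of what that hand-off could look like, the
collector config above could export the filtered, customer-facing
traces to the Equinix Watch collector over OTLP instead of (or
alongside) the file exporter. The endpoint name is a hypothetical
placeholder:

#+BEGIN_SRC yaml
# Sketch only: route the already-filtered auditable traces to the
# Equinix Watch collector over OTLP. The endpoint is a placeholder.
exporters:
  otlp/equinix-watch:
    endpoint: equinix-watch-collector:4317 # hypothetical address

service:
  pipelines:
    traces/auditable:
      receivers: [otlp]
      processors:
        - filter/auditable
        - transform/customer-facing
      exporters: [otlp/equinix-watch]
#+END_SRC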
* Concerns
- Alex: Yeahhhh I've gotta say I'm uncomfortable taking our existing
  OTEL collector, which is right now part of our internal tooling, and
  making it part of the critical path for customer data with Equinix
  Watch.

I don't understand this; of course you're going to be in the critical
path. I'm not saying to use your collector as the ONLY collector;
this is why we even have collectors. We are able to configure where
the data are exported.

- Alex: IMO internal traces are implementation details that are
  subject to change and there are too many things that could go
  wrong. What happens if the format of those traces changes due to
  some library upgrade, or if there's memory pressure and we start
  sampling events or something?

Traces being implementation details, like audit logs? There's a
reason we use standard libraries to instrument our traces. These
libraries follow the OTEL Semantic Conventions, so we have stable and
consistent span attributes that track data across services.

Memory pressure isn't solved by OTLP logs either; in fact, collectors
will refuse spans if they're experiencing memory pressure to prevent
getting OOMKilled. This is not an application concern, this is a
monitoring concern. You should know if your collector is under memory
pressure.

- Alex: In my experience, devs in general have a higher tolerance for
  gaps and breakage in their internal tooling than what I'm willing
  to have for customer-facing audit logs.

This is just poor form. If you don't trust the applications that
integrate with your application, what do you trust?

- Alex: I think customer-facing observability is net-new
  functionality and, for the time being, I'm OK with putting a higher
  burden on applications producing that data than "flip a flag in the
  collector to redirect part of the firehose to Equinix Watch."

Net-new: sure, I agree.

Higher burden on applications producing the data: why, though? We can
already provide you a higher quality data source instead of
hand-rolling an implementation of the logs signal.

"Flip a flag in the collector": I think this just shows a lack of
familiarity with how collectors work, but we are able to control what
parts are shipped to your fragile collector.