more cleanup
47
equinix/api-team/nanometal/k8s-concept-review.org
Normal file
@@ -0,0 +1,47 @@
#+TITLE: K8s concepts review
#+AUTHOR: Adam Mohammed
#+DATE: September 18, 2023

At one of the meetings I brought up how similar Nanometal felt to a
collection of K8s specifications that make standing up and managing
K8s clusters easier. In this document I'll cover the following topics
at a high level: Cluster API, CNI, CCM, and CSI.

First is the Cluster API, which came about as a means of creating and
managing Kubernetes clusters using Kubernetes itself. The Cluster API
allows an operator to use a so-called "management cluster" to create
other K8s clusters, known as "workload clusters." The Cluster API is
NOT part of the core K8s resources; it is implemented as a set of
custom resource definitions and controllers that actually carry out
the desired actions.

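To make this concrete, here is a minimal sketch of the kind of
manifest an operator applies to the management cluster. The API
groups and kinds are Cluster API's own; the names and the choice of
the Packet (Equinix Metal) infrastructure provider are illustrative:

#+BEGIN_SRC yaml
# Applied to the *management* cluster; the Cluster API controllers
# then reconcile a matching workload cluster into existence.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: workload-1                 # illustrative name
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
  controlPlaneRef:                 # satisfied by a control plane provider
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: workload-1-control-plane
  infrastructureRef:               # satisfied by an infrastructure provider
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: PacketCluster            # e.g. the Equinix Metal (Packet) provider
    name: workload-1
#+END_SRC
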
A cluster operator can use the Cluster API to create workload
clusters by relying on three components: a bootstrap provider, an
infrastructure provider, and a control plane provider. Nanometal aims
to make provisioning of bare metal machines extensible and scalable
by enabling facilities to carry out the desired operations requested
by the EMAPI. We can think of the EMAPI as the "management cluster"
in this world.

What Metal has today maps well to the infrastructure provider, since
all the Cluster API has to do is ask for machines with a certain
configuration and the provider is responsible for making that
happen. I think the bulk of the work for this project is figuring out
how we make an infrastructure provider out of our existing
components, but let's put that aside for right now and consider the
rest of the components.

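As a sketch of "asking for machines with a certain configuration": a
MachineDeployment pairs a generic machine spec with a
provider-specific template, and the infrastructure provider is
responsible for realizing it. The template kind below comes from the
existing Equinix Metal (Packet) provider and stands in for wherever
Nanometal would plug in; all names are illustrative:

#+BEGIN_SRC yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: workload-1-workers         # illustrative
spec:
  clusterName: workload-1
  replicas: 3                      # "give me three machines like this"
  template:
    spec:
      clusterName: workload-1
      version: v1.27.3
      bootstrap:
        configRef:                 # handled by the bootstrap provider
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: workload-1-workers
      infrastructureRef:           # handled by the infrastructure provider
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: PacketMachineTemplate
        name: workload-1-workers
#+END_SRC
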
The bootstrap and control plane providers are concepts that also seem
important to our goal. We want it to be simple for us to enter a new
facility and set up the components we need to start provisioning
hardware. The bootstrap provider, in Cluster API terms, turns a
server provisioned with a base OS into an operating K8s node. For us,
we would probably also want some process that turns any facility or
existing datacenter into an Equinix Metal managed facility.

Once we know about the facility that we need to manage, the concept
of the control plane provider maps well to the diagrams from
Nanometal so far. We'd want some component that installs the required
agent and supporting components in the facility so we can start
providing Metal services there.
44
equinix/watch/collector.yaml
Normal file
@@ -0,0 +1,44 @@
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc: # port 4317
      http: # port 4318

processors:
  batch: # defined but not used in the pipeline below

  # Drop every span that is not explicitly marked auditable.
  filter/auditable:
    spans:
      include:
        match_type: strict
        attributes:
          - key: auditable
            value: "true"

  # Reduce the resource, scope, and span attributes to the
  # customer-facing set before handing the data to Equinix Watch.
  transform/customer-facing:
    trace_statements:
      - context: resource
        statements:
          - 'keep_keys(attributes, ["service.name"])'
      - context: scope
        statements:
          - 'set(name, "equinixWatch")'
          - 'set(version, "1.0.0")'
      - context: span
        statements:
          - 'keep_keys(attributes, ["http.route", "http.method", "http.status_code", "http.scheme", "http.host", "user.id", "http.user_agent"])'
          - 'set(name, attributes["http.route"])'

exporters:
  file:
    path: /data/metrics.json

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors:
        - filter/auditable
        - transform/customer-facing
      exporters: [file]
94
equinix/watch/integration.org
Normal file
@@ -0,0 +1,94 @@
#+TITLE: Integrating Equinix Metal API with Equinix Watch
#+AUTHOR: Adam Mohammed

* Problem

Equinix Watch has defined the format in which they want to ingest
auditable events. They chose OTLP as the protocol for ingesting these
events from services, restricting ingestion to just the logs signal.

Normally, when sending data to a collector, you would make use of the
OpenTelemetry libraries to make it easy to grab metadata about the
request and surrounding environment without needing to manually
cobble that data together. Unfortunately, making OTEL logging the
only signal that Equinix Watch accepts makes adoption needlessly
painful: Ruby does not have a stable client library for OTEL logs,
and neither does Golang.

Most of the spec provided by Equinix Watch does not actually relate
to the log that we would like to provide to the customer. OTEL
Logging aims to make this simple by using the Baggage and Context
APIs to enrich the log records with information about the surrounding
environment and context. Again, the implementations for these are
incomplete and not production ready.

Until the OTEL libraries support context and baggage propagation in
the Logs API/SDK, this data will need to be extracted and formatted
specifically for Equinix Watch, meaning the burden of integration is
higher than it needs to be. If we end up doing this, we'll probably
just fetch the same data from the span attributes anyway, to keep
things consistent.

There's absolutely no reason to do this work when we can add the logs
in a structured way to the trace and pass that through to their
custom collector. By doing this we don't need to wait for the OTEL
libraries to provide logging implementations that do what traces
already provide.

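For illustration, after a collector transform like the
transform/customer-facing processor in collector.yaml, the data
handed to Equinix Watch would carry roughly this shape. The field
names follow the OTEL semantic conventions already on our spans; the
values here are made up:

#+BEGIN_SRC yaml
resource:
  attributes:
    service.name: metal-api        # made-up service
scope:
  name: equinixWatch
  version: 1.0.0
span:
  name: /v1/devices/{id}           # set from http.route
  attributes:
    http.route: /v1/devices/{id}
    http.method: DELETE
    http.status_code: 200
    user.id: usr_123               # made-up value
#+END_SRC
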
The only reason I can see not to do this is that it makes Equinix
Watch have to handle translating trace information to a format that
can be delivered to their end targets. I'd argue that's going to need
to happen anyway, so why not make use of all the wonderful tools we
have to enrich the input data, so you can build complete and
interesting audit logs for your end users.

* Concerns

- Alex: Yeahhhh, I've gotta say I'm uncomfortable taking our existing
  OTEL collector, which is right now part of our internal tooling,
  and making it part of the critical path for customer data with
  Equinix Watch.

  I don't understand this; of course you're going to be in the
  critical path. I'm not saying to use your collector as the ONLY
  collector. This is why we even have collectors: we are able to
  configure where the data are exported, as sketched below.

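  A minimal sketch of that fan-out, assuming Equinix Watch exposes an
  OTLP endpoint (both endpoints and the internal exporter name are
  illustrative): the internal pipeline stays untouched while a second
  pipeline filters and reshapes what goes to Watch.

  #+BEGIN_SRC yaml
  exporters:
    otlp/internal:                   # existing internal backend
      endpoint: tempo.internal:4317  # illustrative
    otlp/equinix-watch:              # Equinix Watch's collector
      endpoint: watch.example.com:4317  # illustrative

  service:
    pipelines:
      traces/internal:
        receivers: [otlp]
        processors: [batch]
        exporters: [otlp/internal]
      traces/watch:
        receivers: [otlp]
        processors: [filter/auditable, transform/customer-facing]
        exporters: [otlp/equinix-watch]
  #+END_SRC
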
- Alex: IMO internal traces are implementation details that are
  subject to change, and there are too many things that could go
  wrong. What happens if the format of those traces changes due to
  some library upgrade, or if there's memory pressure and we start
  sampling events or something?

  Traces being implementation details - like audit logs? There's a
  reason we use standard libraries to instrument our traces. These
  libraries follow the OTEL Semantic Conventions, so we have stable
  and consistent span attributes that track data across services.

  As for memory pressure, this isn't solved by OTLP at all; in fact,
  collectors will refuse spans if they're experiencing memory
  pressure, to prevent getting OOMKilled. This is not an application
  concern, this is a monitoring concern: you should know if your
  collector is under memory pressure. The standard memory_limiter
  processor is what enforces this; see the sketch below.

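  A sketch of that processor's configuration (the limits are
  illustrative); above the soft limit the collector starts refusing
  incoming data instead of being OOMKilled:

  #+BEGIN_SRC yaml
  processors:
    memory_limiter:
      check_interval: 1s    # how often memory usage is checked
      limit_mib: 400        # hard limit on collector memory
      spike_limit_mib: 100  # headroom; soft limit is 400 - 100 MiB
  #+END_SRC
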
- Alex: In my experience, devs in general have a higher tolerance for
  gaps and breakage in their internal tooling than what I'm willing
  to have for customer-facing audit logs.

  This is just poor form. If you don't trust the applications that
  integrate with your application, what do you trust?

- Alex: I think customer-facing observability is net-new
  functionality and, for the time being, I'm OK with putting a higher
  burden on applications producing that data than "flip a flag in the
  collector to redirect part of the firehose to Equinix Watch."

  Net-new: sure, I agree.

  Higher burden on applications producing the data: why, though? We
  can already provide you a higher-quality data source instead of
  hand-rolling an implementation of the logs signal.

"flip a flag in the collector" - I think this just shows illiteracy,
|
||||
but we are able to control what parts are shipped to your fragile
|
||||
collector.
|
||||