more cleanup
47
equinix/api-team/nanometal/k8s-concept-review.org
Normal file
@@ -0,0 +1,47 @@
#+TITLE: K8s concepts review
#+AUTHOR: Adam Mohammed
#+DATE: September 18, 2023

At one of the meetings I brought up how similar Nanometal felt to a
collection of K8s specifications that make standing up and managing
K8s clusters easier. In this document I'll cover the following topics
at a high level: Cluster API, CNI, CCM, and CSI.

First is the Cluster API, which came about as a means of creating and
managing Kubernetes clusters using Kubernetes itself. The Cluster API
allows an operator to use a so-called "management cluster" to create
other K8s clusters, known as "workload clusters." The Cluster API is
NOT part of the core K8s resources; it is implemented as a set of
custom resource definitions and controllers that actually carry out
the desired actions.

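To make this concrete, here is a minimal sketch of the kind of
manifest an operator applies to the management cluster. The API
groups and kinds are Cluster API's own; the names and the choice of
the Packet (Equinix Metal) infrastructure provider are illustrative:

#+BEGIN_SRC yaml
# Applied to the *management* cluster; the Cluster API controllers
# then reconcile a matching workload cluster into existence.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: workload-1                 # illustrative name
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
  controlPlaneRef:                 # satisfied by a control plane provider
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: workload-1-control-plane
  infrastructureRef:               # satisfied by an infrastructure provider
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: PacketCluster            # e.g. the Equinix Metal (Packet) provider
    name: workload-1
#+END_SRC
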
A cluster operator can use the Cluster API to create workload
clusters by relying on three components: a bootstrap provider, an
infrastructure provider, and a control plane provider. Nanometal aims
to make provisioning of bare metal machines extensible and scalable
by enabling facilities to carry out the desired operations requested
by the EMAPI. We can think of the EMAPI as the "management cluster"
in this world.

What Metal has today maps well to the infrastructure provider, since
all the Cluster API has to do is ask for machines with a certain
configuration and the provider is responsible for making that
happen. I think the bulk of the work for this project is figuring out
how we make an infrastructure provider out of our existing
components, but let's put that aside for right now and consider the
rest of the components.

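As a sketch of "asking for machines with a certain configuration": a
MachineDeployment pairs a generic machine spec with a
provider-specific template, and the infrastructure provider is
responsible for realizing it. The template kind below comes from the
existing Equinix Metal (Packet) provider and stands in for wherever
Nanometal would plug in; all names are illustrative:

#+BEGIN_SRC yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: workload-1-workers         # illustrative
spec:
  clusterName: workload-1
  replicas: 3                      # "give me three machines like this"
  template:
    spec:
      clusterName: workload-1
      version: v1.27.3
      bootstrap:
        configRef:                 # handled by the bootstrap provider
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: workload-1-workers
      infrastructureRef:           # handled by the infrastructure provider
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: PacketMachineTemplate
        name: workload-1-workers
#+END_SRC
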
The bootstrap and control plane providers are concepts that also seem
important to our goal. We want it to be simple for us to enter a new
facility and set up the components we need to start provisioning
hardware. The bootstrap provider, in Cluster API terms, turns a
server provisioned with a base OS into an operating K8s node. For us,
we would probably also want some process that turns any facility or
existing datacenter into an Equinix Metal managed facility.

Once we know about the facility that we need to manage, the concept
of the control plane provider maps well to the diagrams from
Nanometal so far. We'd want some component that installs the required
agent and supporting components in the facility so we can start
providing Metal services there.
44
equinix/watch/collector.yaml
Normal file
@@ -0,0 +1,44 @@
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc: # port 4317
      http: # port 4318

processors:
  batch: # defined but not used in the pipeline below

  # Drop every span that is not explicitly marked auditable.
  filter/auditable:
    spans:
      include:
        match_type: strict
        attributes:
          - key: auditable
            value: "true"

  # Reduce the resource, scope, and span attributes to the
  # customer-facing set before handing the data to Equinix Watch.
  transform/customer-facing:
    trace_statements:
      - context: resource
        statements:
          - 'keep_keys(attributes, ["service.name"])'
      - context: scope
        statements:
          - 'set(name, "equinixWatch")'
          - 'set(version, "1.0.0")'
      - context: span
        statements:
          - 'keep_keys(attributes, ["http.route", "http.method", "http.status_code", "http.scheme", "http.host", "user.id", "http.user_agent"])'
          - 'set(name, attributes["http.route"])'

exporters:
  file:
    path: /data/metrics.json

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors:
        - filter/auditable
        - transform/customer-facing
      exporters: [file]
94
equinix/watch/integration.org
Normal file
@@ -0,0 +1,94 @@
#+TITLE: Integrating Equinix Metal API with Equinix Watch
#+AUTHOR: Adam Mohammed

* Problem

Equinix Watch has defined the format in which they want to ingest
auditable events. They chose OTLP as the protocol for ingesting these
events from services, restricting ingestion to just the logs signal.

Normally, when sending data to a collector, you would make use of the
OpenTelemetry libraries to make it easy to grab metadata about the
request and surrounding environment without needing to manually
cobble that data together. Unfortunately, making OTEL logging the
only signal that Equinix Watch accepts makes adoption needlessly
painful: Ruby does not have a stable client library for OTEL logs,
and neither does Golang.

Most of the spec provided by Equinix Watch does not actually relate
to the log that we would like to provide to the customer. OTEL
Logging aims to make this simple by using the Baggage and Context
APIs to enrich the log records with information about the surrounding
environment and context. Again, the implementations for these are
incomplete and not production ready.

Until the OTEL libraries support context and baggage propagation in
the Logs API/SDK, this data will need to be extracted and formatted
specifically for Equinix Watch, meaning the burden of integration is
higher than it needs to be. If we end up doing this, we'll probably
just fetch the same data from the span attributes anyway, to keep
things consistent.

There's absolutely no reason to do this work when we can add the logs
in a structured way to the trace and pass that through to their
custom collector. By doing this we don't need to wait for the OTEL
libraries to provide logging implementations that do what traces
already provide.

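For illustration, after a collector transform like the
transform/customer-facing processor in collector.yaml, the data
handed to Equinix Watch would carry roughly this shape. The field
names follow the OTEL semantic conventions already on our spans; the
values here are made up:

#+BEGIN_SRC yaml
resource:
  attributes:
    service.name: metal-api        # made-up service
scope:
  name: equinixWatch
  version: 1.0.0
span:
  name: /v1/devices/{id}           # set from http.route
  attributes:
    http.route: /v1/devices/{id}
    http.method: DELETE
    http.status_code: 200
    user.id: usr_123               # made-up value
#+END_SRC
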
The only reason I can see not to do this is that it makes Equinix
Watch have to handle translating trace information to a format that
can be delivered to their end targets. I'd argue that's going to need
to happen anyway, so why not make use of all the wonderful tools we
have to enrich the input data, so you can build complete and
interesting audit logs for your end users.

* Concerns

- Alex: Yeahhhh, I've gotta say I'm uncomfortable taking our existing
  OTEL collector, which is right now part of our internal tooling,
  and making it part of the critical path for customer data with
  Equinix Watch.

  I don't understand this; of course you're going to be in the
  critical path. I'm not saying to use your collector as the ONLY
  collector. This is why we even have collectors: we are able to
  configure where the data are exported, as sketched below.

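  A minimal sketch of that fan-out, assuming Equinix Watch exposes an
  OTLP endpoint (both endpoints and the internal exporter name are
  illustrative): the internal pipeline stays untouched while a second
  pipeline filters and reshapes what goes to Watch.

  #+BEGIN_SRC yaml
  exporters:
    otlp/internal:                   # existing internal backend
      endpoint: tempo.internal:4317  # illustrative
    otlp/equinix-watch:              # Equinix Watch's collector
      endpoint: watch.example.com:4317  # illustrative

  service:
    pipelines:
      traces/internal:
        receivers: [otlp]
        processors: [batch]
        exporters: [otlp/internal]
      traces/watch:
        receivers: [otlp]
        processors: [filter/auditable, transform/customer-facing]
        exporters: [otlp/equinix-watch]
  #+END_SRC
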
- Alex: IMO internal traces are implementation details that are
  subject to change, and there are too many things that could go
  wrong. What happens if the format of those traces changes due to
  some library upgrade, or if there's memory pressure and we start
  sampling events or something?

  Traces being implementation details - like audit logs? There's a
  reason we use standard libraries to instrument our traces. These
  libraries follow the OTEL Semantic Conventions, so we have stable
  and consistent span attributes that track data across services.

  As for memory pressure, this isn't solved by OTLP at all; in fact,
  collectors will refuse spans if they're experiencing memory
  pressure, to prevent getting OOMKilled. This is not an application
  concern, this is a monitoring concern: you should know if your
  collector is under memory pressure. The standard memory_limiter
  processor is what enforces this; see the sketch below.

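  A sketch of that processor's configuration (the limits are
  illustrative); above the soft limit the collector starts refusing
  incoming data instead of being OOMKilled:

  #+BEGIN_SRC yaml
  processors:
    memory_limiter:
      check_interval: 1s    # how often memory usage is checked
      limit_mib: 400        # hard limit on collector memory
      spike_limit_mib: 100  # headroom; soft limit is 400 - 100 MiB
  #+END_SRC
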
- Alex: In my experience, devs in general have a higher tolerance for
  gaps and breakage in their internal tooling than what I'm willing
  to have for customer-facing audit logs.

  This is just poor form. If you don't trust the applications that
  integrate with your application, what do you trust?

- Alex: I think customer-facing observability is net-new
  functionality and, for the time being, I'm OK with putting a higher
  burden on applications producing that data than "flip a flag in the
  collector to redirect part of the firehose to Equinix Watch."

  Net-new: sure, I agree.

  Higher burden on applications producing the data: why, though? We
  can already provide you a higher-quality data source instead of
  hand-rolling an implementation of the logs signal.

"flip a flag in the collector" - I think this just shows illiteracy,
|
||||
but we are able to control what parts are shipped to your fragile
|
||||
collector.
|
||||