Move equinix watch things

2023-10-11 09:16:47 -04:00
parent 1ee5ac51e7
commit 6f8d6220fa
2 changed files with 94 additions and 0 deletions
--- a/equinix-watch/collector.yaml
+++ b/equinix-watch/collector.yaml
--- a/equinix-watch/integration.org
+++ b/equinix-watch/integration.org
@@ -0,0 +1,94 @@
 #+TITLE: Integrating Equinix Metal API with Equinix Watch
 #+AUTHOR: Adam Mohammed
 * Problem
 Equinix Watch has defined the format in which they want to ingest
 auditable events. They chose OTLP as the protocol for ingesting these
 events from services, restricting ingestion to just the logging
 signal.
 Normally when sending data to a collector, you would make use of the
 OpenTelemetry libraries to make it easy to grab metadata about the
 request and surrounding environment, without needing to manually
 cobble that data together. Unfortunately, having OTEL logging as the
 only signal that Equinix Watch accepts makes adoption needlessly
 painful. Ruby does not have a stable client library for OTEL logs, and
 neither does Golang.
 Most of the spec provided by Equinix Watch does not actually relate to
 the log that we would like to provide to the customer. OTEL Logging
 aims to make this simple by using the Baggage and Context APIs to
 enrich the log records with information about the surrounding
 environment and context. Again, the implementations for these are
 incomplete and not production ready.
 Until the OTEL libraries provide support for context and baggage
 propagation in the Logs API/SDK, this data will need to be
 extracted and formatted specifically for Equinix Watch, meaning the
 burden of integration is higher than it needs to be. If we end up
 doing this, we'll probably just fetch the same data from the span
 attributes anyway, to keep things consistent.
 There's absolutely no reason to do this work when we can add the logs
 in a structured way to the trace and pass that through to their custom
 collector. By doing this we don't need to wait for the OTEL libraries
 to provide logging implementations that do what traces already
 provide.
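 As a rough illustration of what adding the logs to the trace in a
 structured way could look like, the sketch below models a span as a
 plain Python dataclass and records an audit entry as a span event.
 The Span class, the audit.* attribute names, and the
 record_audit_event helper are all assumptions made up for this
 example. Real code would go through the OpenTelemetry SDK rather than
 this stand-in.

```python
from dataclasses import dataclass, field
import time


@dataclass
class Span:
    """Stand-in for a tracing span; this is not the OpenTelemetry API."""
    name: str
    trace_id: str
    attributes: dict = field(default_factory=dict)
    events: list = field(default_factory=list)

    def add_event(self, name, attributes):
        # Events are timestamped, structured records attached to the span.
        self.events.append(
            {
                "name": name,
                "time_unix_nano": time.time_ns(),
                "attributes": dict(attributes),
            }
        )


def record_audit_event(span, actor, action, resource):
    """Attach a structured audit entry to the span (hypothetical attribute names)."""
    span.add_event(
        "audit",
        {"audit.actor": actor, "audit.action": action, "audit.resource": resource},
    )


# Illustrative request span; IDs and attribute values are made up.
span = Span(
    name="DELETE /devices/{id}",
    trace_id="4bf92f3577b34da6a3ce929d0e0e4736",
    attributes={"http.method": "DELETE", "http.status_code": 204},
)
record_audit_event(span, actor="user-123", action="device.delete", resource="device-456")
```

 The audit entry travels with the request metadata already on the
 span, so the application does not have to gather that context a
 second time for a separate log record.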
 The only reason I can see not to do this is that it makes Equinix
 Watch have to handle translating trace information to a format that
 can be delivered to their end targets. I'd argue that's going to need
 to happen anyway, so why not make use of all the wonderful tools we
 have to enrich the data you have as input, so you can build complete
 and interesting audit logs for your end user.
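 The translation step described above could be sketched roughly as
 follows. The span is represented as a plain dict, and the convention
 that audit entries are span events named "audit" is an assumption
 for this example, as are all field names. The format Equinix Watch
 actually delivers to its end targets is not specified here.

```python
import json


def span_to_audit_records(span):
    """Flatten each "audit" event on a span into one log-style record.

    Request context comes from the span attributes, so it does not have
    to be assembled separately for the audit log.
    """
    records = []
    for event in span.get("events", []):
        if event["name"] != "audit":
            continue  # skip events that are not audit entries
        record = {
            "timestamp": event["time_unix_nano"],
            "trace_id": span["trace_id"],
            "span_name": span["name"],
        }
        record.update(span.get("attributes", {}))  # request context
        record.update(event["attributes"])         # audit-specific fields win
        records.append(record)
    return records


# Illustrative span; every value here is made up.
example_span = {
    "name": "DELETE /devices/{id}",
    "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
    "attributes": {"http.method": "DELETE", "http.status_code": 204},
    "events": [
        {
            "name": "audit",
            "time_unix_nano": 1697030207000000000,
            "attributes": {"audit.actor": "user-123", "audit.action": "device.delete"},
        },
        {"name": "cache.miss", "time_unix_nano": 1697030207000500000, "attributes": {}},
    ],
}

for record in span_to_audit_records(example_span):
    print(json.dumps(record, sort_keys=True))
```

 Only the audit event becomes a record. The rest of the span stays an
 internal detail that never reaches the customer.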
 * Concerns
 - Alex: Yeahhhh I've gotta say I'm uncomfortable making our existing
  OTEL collector, which is right now part of our internal tooling, and
  making it part of the critical path for customer data with Equinix
  Watch.
  I don't understand this. Of course you're going to be in your
  critical path. I'm not saying to use your collector as the ONLY
  collector; this is why we even have collectors. We are able to
  configure where the data are exported.
 - Alex: IMO internal traces are implementation details that are
  subject to change and there are too many things that could go
  wrong. What happens if the format of those traces changes due to
  some library upgrade, or if there's memory pressure and we start
  sampling events or something?
  Traces being implementation details - like audit logs? There's a
  reason we use standard libraries to instrument our traces. These
  libraries follow OTEL Semantic Conventions so we have stable and
  consistent span attributes that track data across services.
  Memory pressure: this isn't solved by OTLP at all. In fact,
  collectors will refuse spans if they're experiencing memory pressure
  to prevent getting OOMKilled. This is not an application concern,
  this is a monitoring concern. You should know if your collector is
 - Alex: In my experience, devs in general have a higher tolerance for
  gaps and breakage in their internal tooling than what I'm willing to
  have for customer-facing audit logs.
  This is just poor form. If you don't trust the applications that
  integrate with your application, what do you trust?
 - Alex: I think customer-facing observability is net-new functionality
  and, for the time being, I'm OK with putting a higher burden on
  applications producing that data than "flip a flag in the collector
  to redirect part of the firehose to Equinix Watch"
  Net-new - sure, I agree
  Higher burden on applications producing the data - why though? We
  can already provide you a higher quality data source instead of
  hand-rolling an implementation of the logs signal.
  "flip a flag in the collector" - I think this just shows illiteracy,
  but we are able to control what parts are shipped to your fragile
  collector.
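 To make the collector points above concrete, here is a toy model of
 a pipeline that fans data out to more than one destination, ships
 only selected spans to Equinix Watch, and refuses new data when its
 buffer is full. The Pipeline class, the has_audit_event predicate,
 and the buffer-size limit are assumptions invented for this sketch.
 In the real OpenTelemetry Collector this behaviour is set up through
 its configuration (receivers, processors such as the memory limiter,
 and exporters), not through code like this.

```python
class MemoryLimitExceeded(Exception):
    """Raised when the pipeline refuses data instead of risking an OOM kill."""


class Pipeline:
    def __init__(self, exporters, max_buffered):
        self.exporters = exporters        # list of (predicate, export_fn) pairs
        self.max_buffered = max_buffered  # crude stand-in for a memory limit
        self.buffer = []
        self.refused = 0                  # counter that monitoring would alert on

    def receive(self, span):
        if len(self.buffer) >= self.max_buffered:
            self.refused += 1
            raise MemoryLimitExceeded("buffer full; sender should retry")
        self.buffer.append(span)

    def flush(self):
        # Each exporter sees only the spans its predicate selects.
        for span in self.buffer:
            for predicate, export in self.exporters:
                if predicate(span):
                    export(span)
        self.buffer.clear()


def has_audit_event(span):
    return any(event["name"] == "audit" for event in span.get("events", []))


internal, equinix_watch = [], []
pipeline = Pipeline(
    exporters=[
        (lambda span: True, internal.append),     # internal tooling gets everything
        (has_audit_event, equinix_watch.append),  # Equinix Watch gets audit spans only
    ],
    max_buffered=2,
)

pipeline.receive({"name": "GET /health", "events": []})
pipeline.receive({"name": "DELETE /devices/{id}", "events": [{"name": "audit"}]})
try:
    pipeline.receive({"name": "GET /devices", "events": []})
except MemoryLimitExceeded:
    pass  # refused explicitly and counted, not dropped silently
pipeline.flush()
```

 The refusal is visible to the sender and shows up in a counter, which
 is what makes memory pressure something to monitor rather than
 something each application has to work around.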