Move equinix watch things
94
equinix-watch/integration.org
Normal file
@@ -0,0 +1,94 @@
#+TITLE: Integrating Equinix Metal API with Equinix Watch
#+AUTHOR: Adam Mohammed

* Problem

Equinix Watch has defined the format in which they want to ingest
auditable events. They chose OTLP as the protocol for ingesting these
events from services, restricting their ingestion to just the logging
signal.

Normally when sending data to a collector, you would make use of the
OpenTelemetry libraries to make it easy to grab metadata about the
request and surrounding environment, without needing to manually
cobble that data together. Unfortunately, accepting OTEL logging as
the only signal makes adopting Equinix Watch needlessly painful. Ruby
does not have a stable client library for OTEL logs, and neither does
Golang.

Most of the spec provided by Equinix Watch does not actually relate to
the log that we would like to provide to the customer. OTEL Logging
aims to make this simple by using the Baggage and Context APIs to
enrich the log records with information about the surrounding
environment and context. Again, the implementations for these are
incomplete and not production ready.

Until the OTEL libraries provide support for context and baggage
propagation in the Logs API/SDK, this data will need to be extracted
and formatted specifically for Equinix Watch, meaning the burden of
integration is higher than it needs to be. If we end up doing this,
we'll probably just fetch the same data from the span attributes
anyway, to keep things consistent.

There's absolutely no reason to do this work when we can add the logs
in a structured way to the trace and pass that through to their custom
collector. By doing this we don't need to wait for the OTEL libraries
to provide logging implementations that do what traces already
provide.

The only reason I can see not to do this is that it makes Equinix
Watch have to handle translating trace information to a format that
can be delivered to their end targets. I'd argue that's going to need
to happen anyway, so why not make use of all the wonderful tools we
have to enrich the data you have as input, so you can build complete
and interesting audit logs for your end user.
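
To make the translation step concrete, here is a minimal sketch of
turning a finished span into an audit log record. It is only an
illustration under stated assumptions: the =audit_record= function,
the dict shape used for the span, and the output field names are all
hypothetical and are not the Equinix Watch schema. The keys
=http.request.method=, =url.path= and =http.response.status_code=
follow the OpenTelemetry HTTP semantic conventions; =user.id= is
assumed to be set by the application.

```python
from datetime import datetime, timezone


def audit_record(span: dict) -> dict:
    """Build an audit log record from a finished span (hypothetical shape)."""
    attrs = span.get("attributes", {})

    # Assumption: the span end time is nanoseconds since the Unix epoch.
    seconds = span["end_time_unix_nano"] // 1_000_000_000
    timestamp = datetime.fromtimestamp(seconds, tz=timezone.utc)

    method = attrs.get("http.request.method", "?")
    path = attrs.get("url.path", "?")
    status = attrs.get("http.response.status_code")

    return {
        "timestamp": timestamp.isoformat(),
        "trace_id": span["trace_id"],
        "actor": attrs.get("user.id", "unknown"),
        "action": f"{method} {path}",
        # Treat a missing status code as a failure rather than guessing.
        "outcome": "success" if status is not None and status < 400 else "failure",
    }


# Illustrative span; every value here is made up for the example.
example_span = {
    "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
    "end_time_unix_nano": 1_700_000_000_000_000_000,
    "attributes": {
        "http.request.method": "DELETE",
        "url.path": "/devices/abc",
        "http.response.status_code": 204,
        "user.id": "user-123",
    },
}

record = audit_record(example_span)
```

Everything the record needs is already on the span, so the application
does no extra work beyond its normal tracing instrumentation.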

* Concerns

- Alex: Yeahhhh I've gotta say I'm uncomfortable making our existing
  OTEL collector, which is right now part of our internal tooling, and
  making it part of the critical path for customer data with Equinix
  Watch.

  I don't understand this. Of course you're going to be in your own
  critical path. I'm not saying to use your collector as the ONLY
  collector; this is why we even have collectors. We are able to
  configure where the data are exported.

- Alex: IMO internal traces are implementation details that are
  subject to change and there are too many things that could go
  wrong. What happens if the format of those traces changes due to
  some library upgrade, or if there's memory pressure and we start
  sampling events or something?

  Traces being implementation details - like audit logs? There's a
  reason we use standard libraries to instrument our traces. These
  libraries follow OTEL Semantic Conventions so we have stable and
  consistent span attributes that track data across services.

  Memory pressure: this isn't solved by OTLP at all. In fact,
  collectors will refuse spans if they're experiencing memory pressure
  to prevent getting OOMKilled. This is not an application concern,
  this is a monitoring concern. You should know if your collector is

- Alex: In my experience, devs in general have a higher tolerance for
  gaps and breakage in their internal tooling than what I'm willing to
  have for customer-facing audit logs.

  This is just poor form. If you don't trust the applications that
  integrate with your application, what do you trust?

- Alex: I think customer-facing observability is net-new functionality
  and, for the time being, I'm OK with putting a higher burden on
  applications producing that data than "flip a flag in the collector
  to redirect part of the firehose to Equinix Watch"

  Net-new - sure, I agree.

  Higher burden on applications producing the data - why though? We
  can already provide you a higher quality data source instead of
  hand-rolling an implementation of the logs signal.

  "flip a flag in the collector" - I think this just shows illiteracy,
  but we are able to control what parts are shipped to your fragile
  collector.
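
As a rough illustration of controlling what gets shipped where, the
sketch below models a pipeline that fans spans out to several
destinations, each with its own filter. In a real OpenTelemetry
Collector this is expressed in the pipeline configuration rather than
in application code. The names =route= and =is_audit_relevant=, the
destination labels, and the =audit.enabled= attribute are all
hypothetical.

```python
from typing import Callable, Dict, List

Span = dict
Filter = Callable[[Span], bool]


def is_audit_relevant(span: Span) -> bool:
    # Hypothetical marker attribute the application sets on auditable requests.
    return span.get("attributes", {}).get("audit.enabled") is True


def route(spans: List[Span], destinations: Dict[str, Filter]) -> Dict[str, List[Span]]:
    """Fan spans out to every destination whose filter accepts them."""
    return {
        name: [span for span in spans if accept(span)]
        for name, accept in destinations.items()
    }


# Illustrative spans; the names and attributes are made up.
spans = [
    {"name": "DELETE /devices/abc", "attributes": {"audit.enabled": True}},
    {"name": "GET /healthz", "attributes": {}},
]

batches = route(
    spans,
    {
        "internal-tracing": lambda span: True,  # internal tooling keeps everything
        "equinix-watch": is_audit_relevant,     # only auditable spans leave
    },
)
```

Here the internal tracing destination still receives every span, while
the Equinix Watch destination only sees the spans the application
marked as auditable.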