more cleanup
94
equinix/watch/integration.org
Normal file
#+TITLE: Integrating Equinix Metal API with Equinix Watch
#+AUTHOR: Adam Mohammed
* Problem
Equinix Watch has defined the format in which it wants to ingest
auditable events. They chose OTLP as the protocol for ingesting these
events from services, restricting ingestion to just the logs signal.

Normally when sending data to a collector, you would make use of the
OpenTelemetry libraries to make it easy to grab metadata about the
request and surrounding environment, without needing to manually
cobble that data together. Unfortunately, because OTEL logging is the
only signal that Equinix Watch accepts, adoption is needlessly
painful. Ruby does not have a stable client library for OTEL logs, and
neither does Go.

Most of the spec provided by Equinix Watch does not actually relate to
the log that we would like to provide to the customer. OTEL Logging
aims to make this simple by using the Baggage and Context APIs to
enrich the log records with information about the surrounding
environment and context. Again, the implementations of these are
incomplete and not production-ready.

Until the OTEL libraries support context and baggage propagation in
the Logs API/SDK, this data will need to be extracted and formatted
specifically for Equinix Watch, meaning the burden of integration is
higher than it needs to be. If we end up doing this, we'll probably
just fetch the same data from the span attributes anyway, to keep
things consistent.

There's absolutely no reason to do this work when we can add the logs
in a structured way to the trace and pass that through to their custom
collector. By doing this, we don't need to wait for the OTEL libraries
to provide logging implementations that do what traces already
provide.

The only reason I can see not to do this is that it requires Equinix
Watch to translate trace information into a format that can be
delivered to their end targets. I'd argue that's going to need to
happen anyway, so why not make use of all the wonderful tools we have
to enrich the input data, so you can build complete and interesting
audit logs for your end user?
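
To make the translation step concrete, here is a minimal Python
sketch of turning a span that carries a structured audit event into a
flat audit record. It is not Equinix Watch's implementation or schema:
the dictionary shape of the span, the =audit= event name, the
=audit.*= attribute keys, and the output field names are all
hypothetical and chosen only for illustration. The
=http.request.method= and =url.path= keys are standard OTEL semantic
convention attribute names.

```python
from typing import Any, Dict, Optional

# Hypothetical event name a service would use to mark an auditable action.
AUDIT_EVENT_NAME = "audit"


def span_to_audit_record(span: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Build a flat audit record from a span, or return None if the
    span carries no audit event."""
    event = next(
        (e for e in span.get("events", []) if e.get("name") == AUDIT_EVENT_NAME),
        None,
    )
    if event is None:
        return None

    span_attrs = span.get("attributes", {})
    event_attrs = event.get("attributes", {})
    return {
        # Correlation identifiers come for free from the trace.
        "trace_id": span["trace_id"],
        "span_id": span["span_id"],
        "timestamp": event["timestamp"],
        # Request metadata already recorded by standard instrumentation.
        "http_method": span_attrs.get("http.request.method"),
        "path": span_attrs.get("url.path"),
        # Audit-specific fields (hypothetical keys) set by the application.
        "actor": event_attrs.get("audit.actor"),
        "action": event_attrs.get("audit.action"),
        "resource": event_attrs.get("audit.resource"),
    }


# Illustrative input only; real spans would arrive via OTLP.
example_span = {
    "trace_id": "0af7651916cd43dd8448eb211c80319c",
    "span_id": "b7ad6b7169203331",
    "attributes": {"http.request.method": "DELETE", "url.path": "/devices/42"},
    "events": [
        {
            "name": "audit",
            "timestamp": "2024-01-01T00:00:00Z",
            "attributes": {
                "audit.actor": "user-123",
                "audit.action": "device.delete",
                "audit.resource": "device/42",
            },
        }
    ],
}

record = span_to_audit_record(example_span)
```

The point of the sketch is that the request metadata and correlation
IDs are read from what the trace already carries, so the application
only has to supply the audit-specific fields.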
* Concerns
- Alex: Yeahhhh I've gotta say I'm uncomfortable taking our existing
  OTEL collector, which is right now part of our internal tooling, and
  making it part of the critical path for customer data with Equinix
  Watch.

  I don't understand this. Of course you're going to be in your
  critical path. I'm not saying to use your collector as the ONLY
  collector; this is why we even have collectors. We are able to
  configure where the data is exported.

- Alex: IMO internal traces are implementation details that are
  subject to change and there are too many things that could go
  wrong. What happens if the format of those traces changes due to
  some library upgrade, or if there's memory pressure and we start
  sampling events or something?

  Traces being implementation details - like audit logs? There's a
  reason we use standard libraries to instrument our traces. These
  libraries follow OTEL Semantic Conventions so we have stable and
  consistent span attributes that track data across services.

  Memory pressure: this isn't solved by OTLP at all. In fact,
  collectors will refuse spans if they're experiencing memory pressure,
  to avoid getting OOMKilled. This is not an application concern; it
  is a monitoring concern. You should know if your collector is

- Alex: In my experience, devs in general have a higher tolerance for
  gaps and breakage in their internal tooling than what I'm willing to
  have for customer-facing audit logs.

  This is just poor form. If you don't trust the applications that
  integrate with your application, what do you trust?

- Alex: I think customer-facing observability is net-new functionality
  and, for the time being, I'm OK with putting a higher burden on
  applications producing that data than "flip a flag in the collector
  to redirect part of the firehose to Equinix Watch".

  Net-new - sure, I agree.

  Higher burden on applications producing the data - why, though? We
  can already provide you a higher-quality data source instead of
  hand-rolling an implementation of the logs signal.

  "flip a flag in the collector" - I think this just shows illiteracy,
  but we are able to control which parts are shipped to your fragile
  collector.
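
The routing argued for in the responses above can be sketched as an
OpenTelemetry Collector configuration. This is an illustrative
fragment, not a deployed configuration: the exporter endpoints, the
pipeline names, the memory limits, and the =audit.action= attribute
key are placeholders. It assumes a collector distribution that
includes the =filter= processor (for example the contrib build).

```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  # Refuses incoming data when memory use gets too high, instead of
  # letting the collector be OOM-killed.
  memory_limiter:
    check_interval: 1s
    limit_mib: 512        # placeholder value
    spike_limit_mib: 128  # placeholder value
  batch:
  # Drops every span that does not carry the (hypothetical) audit
  # attribute, so only audit-relevant spans leave this pipeline.
  filter/audit-only:
    error_mode: ignore
    traces:
      span:
        - 'attributes["audit.action"] == nil'

exporters:
  otlp/internal:
    endpoint: internal-backend.example:4317        # placeholder
  otlp/equinix-watch:
    endpoint: equinix-watch-collector.example:4317 # placeholder

service:
  pipelines:
    # Existing internal tracing is untouched.
    traces/internal:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/internal]
    # Only the filtered subset is shipped to Equinix Watch.
    traces/audit:
      receivers: [otlp]
      processors: [memory_limiter, filter/audit-only, batch]
      exporters: [otlp/equinix-watch]
```

Two pipelines fed by the same receiver keep the internal backend and
the Equinix Watch export independent, and the filter makes explicit
which part of the data is shipped rather than redirecting the whole
firehose.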