diff --git a/equinix-watch.org b/equinix-watch/collector.yaml
similarity index 100%
rename from equinix-watch.org
rename to equinix-watch/collector.yaml
diff --git a/equinix-watch/integration.org b/equinix-watch/integration.org
new file mode 100644
index 0000000..31c8f44
--- /dev/null
+++ b/equinix-watch/integration.org
@@ -0,0 +1,94 @@
+#+TITLE: Integrating Equinix Metal API with Equinix Watch
+#+AUTHOR: Adam Mohammed
+
+* Problem
+
+Equinix Watch has defined the format in which they want to ingest
+auditable events. They chose OTLP as the protocol for
+ingesting these events from services, restricting their ingestion to
+just the logging signal.
+
+Normally when sending data to a collector, you would make use of the
+OpenTelemetry libraries to make it easy to grab metadata about the
+request and surrounding environment, without needing to manually
+cobble that data together. Unfortunately, making OTEL logging the
+only signal that Equinix Watch accepts makes adoption needlessly
+painful. Ruby does not have a stable client library for OTEL logs, and
+neither does Golang.
+
+Most of the spec provided by Equinix Watch does not actually relate to
+the log that we would like to provide to the customer. OTEL Logging
+aims to make this simple by using the Baggage and Context APIs to
+enrich the log records with information about the surrounding
+environment and context. Again, the implementations for these are
+incomplete and not production-ready.
+
+Until the OTEL libraries provide support for context and baggage
+propagation in the Logs API/SDK, this data will need to be
+extracted and formatted specifically for Equinix Watch, meaning the
+burden of integration is higher than it needs to be. If we end up
+doing this, we'll probably just fetch the same data from the span
+attributes anyway, to keep things consistent.
+
+There's absolutely no reason to do this work when we can add the logs
+in a structured way to the trace and pass that through to their custom
+collector. By doing this we don't need to wait for the OTEL libraries
+to provide logging implementations that do what traces already
+provide.
+
+The only reason I can see not to do this is that it makes Equinix
+Watch have to handle translating trace information to a format that
+can be delivered to their end targets. I'd argue that's going to need
+to happen anyway, so why not make use of all the wonderful tools we
+have to enrich the data you have as input, so you can build complete
+and interesting audit logs for your end user.
+
+* Concerns
+
+- Alex: Yeahhhh I've gotta say I'm uncomfortable making our existing
+  OTEL collector, which is right now part of our internal tooling, and
+  making it part of the critical path for customer data with Equinix
+  Watch.
+
+  I don't understand this; of course you're going to be in your
+  critical path. I'm not saying to use your collector as the ONLY
+  collector; this is why we even have collectors. We are able to
+  configure where the data are exported.
+
+- Alex: IMO internal traces are implementation details that are
+  subject to change and there are too many things that could go
+  wrong. What happens if the format of those traces changes due to
+  some library upgrade, or if there's memory pressure and we start
+  sampling events or something?
+
+  Traces being implementation details - like audit logs? There's a
+  reason we use standard libraries to instrument our traces. These
+  libraries follow OTEL Semantic Conventions so we have stable and
+  consistent span attributes that track data across services.
+
+  Memory pressure - this isn't solved by OTLP at all. In fact,
+  collectors will refuse spans if they're experiencing memory pressure
+  to prevent getting OOMKilled. This is not an application concern,
+  this is a monitoring concern. You should know if your collector is
+
+- Alex: In my experience, devs in general have a higher tolerance for
+  gaps and breakage in their internal tooling than what I'm willing to
+  have for customer-facing audit logs.
+
+  This is just poor form. If you don't trust the applications that
+  integrate with your application, what do you trust?
+
+- Alex: I think customer-facing observability is net-new functionality
+  and, for the time being, I'm OK with putting a higher burden on
+  applications producing that data than "flip a flag in the collector
+  to redirect part of the firehose to Equinix Watch"
+
+  Net-new - sure, I agree.
+
+  Higher burden on applications producing the data - why though? We
+  can provide you a higher-quality data source already instead of
+  hand-rolling an implementation for the logs signal.
+
+  "flip a flag in the collector" - I think this just shows illiteracy,
+  but we are able to control what parts are shipped to your fragile
+  collector.