#+TITLE: Integrating Equinix Metal API with Equinix Watch
#+AUTHOR: Adam Mohammed

* Problem

Equinix Watch has defined the format in which they want to ingest auditable events. They chose OTLP as the protocol for ingesting these events from services, restricting ingestion to just the logging signal.

Normally, when sending data to a collector, you would use the OpenTelemetry libraries to grab metadata about the request and surrounding environment, without needing to manually cobble that data together.

Unfortunately, accepting OTEL logging as the only signal makes adoption needlessly painful. Ruby does not have a stable client library for OTEL logs, and neither does Golang.

Most of the spec provided by Equinix Watch does not actually relate to the log that we would like to provide to the customer. OTEL Logging aims to make this simple by using the Baggage and Context APIs to enrich the log records with information about the surrounding environment and context. Again, the implementations for these are incomplete and not production ready.

Until the OTEL libraries provide support for context and baggage propagation in the Logs API/SDK, this data will need to be extracted and formatted specifically for Equinix Watch, meaning the burden of integration is higher than it needs to be. If we end up doing this, we'll probably just fetch the same data from the span attributes anyway, to keep things consistent.

There's absolutely no reason to do this work when we can add the logs in a structured way to the trace and pass that through to their custom collector. By doing this we don't need to wait for the OTEL libraries to provide logging implementations that do what traces already provide.

The only reason I can see not to do this is that Equinix Watch would have to handle translating trace information to a format that can be delivered to their end targets.
I'd argue that's going to need to happen anyway, so why not make use of all the wonderful tools we have to enrich the data you have as input, so you can build complete and interesting audit logs for your end user.

* Concerns

- Alex: Yeahhhh I've gotta say I'm uncomfortable making our existing OTEL collector, which is right now part of our internal tooling, and making it part of the critical path for customer data with Equinix Watch.
  - I don't understand this; of course you're going to be in your critical path. I'm not saying to use your collector as the ONLY collector; this is why we even have collectors. We are able to configure where the data are exported.
- Alex: IMO internal traces are implementation details that are subject to change and there are too many things that could go wrong. What happens if the format of those traces changes due to some library upgrade, or if there's memory pressure and we start sampling events or something?
  - Traces being implementation details - like audit logs? There's a reason we use standard libraries to instrument our traces. These libraries follow OTEL Semantic Conventions, so we have stable and consistent span attributes that track data across services.
  - Memory pressure - this isn't solved by OTLP at all. In fact, collectors will refuse spans if they're experiencing memory pressure, to prevent getting OOMKilled. This is not an application concern, this is a monitoring concern. You should know if your collector is
- Alex: In my experience, devs in general have a higher tolerance for gaps and breakage in their internal tooling than what I'm willing to have for customer-facing audit logs.
  - This is just poor form. If you don't trust the applications that integrate with your application, what do you trust?
- Alex: I think customer-facing observability is net-new functionality and, for the time being, I'm OK with putting a higher burden on applications producing that data than "flip a flag in the collector to redirect part of the firehose to Equinix Watch"
  - Net-new - sure, I agree.
  - Higher burden on applications producing the data - why though? We can already provide you a higher quality data source instead of hand-rolling an implementation for the logs signal.
  - "flip a flag in the collector" - I think this just shows illiteracy, but we are able to control what parts are shipped to your fragile collector.
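To make the "control what parts are shipped" point concrete, here is a minimal sketch of the selection-and-translation step in plain Python. It does not use the OTEL SDK or the collector's real configuration. The span layout (a dict with =trace_id=, =span_id=, =attributes= and =events=), the =audit= event name and the =audit.*= attribute keys are all assumptions made for illustration; only =http.request.method= and =http.route= are taken from the OTEL semantic conventions. The idea is that only events explicitly marked as audit events leave the internal pipeline, and each one is flattened into a log-style record enriched with the attributes of the span it was recorded on.

```python
from typing import Any

# Hypothetical marker for audit events; not defined by OTEL or Equinix Watch.
AUDIT_EVENT_NAME = "audit"


def audit_records(spans: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Flatten audit events found on spans into log-style records.

    Spans without an audit event produce nothing, so the rest of the
    trace data never leaves the internal pipeline.
    """
    records = []
    for span in spans:
        for event in span.get("events", []):
            if event.get("name") != AUDIT_EVENT_NAME:
                continue  # ordinary span events stay internal
            records.append({
                "trace_id": span["trace_id"],
                "span_id": span["span_id"],
                "timestamp": event["timestamp"],
                # Span attributes supply the request context;
                # event attributes win if a key appears in both.
                "attributes": {
                    **span.get("attributes", {}),
                    **event.get("attributes", {}),
                },
            })
    return records


# Made-up sample data in the assumed span layout.
EXAMPLE_SPANS = [
    {
        "trace_id": "t1",
        "span_id": "s1",
        "attributes": {"http.request.method": "POST", "http.route": "/devices"},
        "events": [
            {"name": "cache.miss", "timestamp": 1, "attributes": {}},
            {
                "name": "audit",
                "timestamp": 2,
                "attributes": {"audit.action": "device.create", "audit.actor": "user-123"},
            },
        ],
    },
    # A span with no audit event: nothing is shipped for it.
    {"trace_id": "t1", "span_id": "s2", "attributes": {}, "events": []},
]

if __name__ == "__main__":
    for record in audit_records(EXAMPLE_SPANS):
        print(record)
```

Whether this step lives in our collector, in theirs, or in a small processor in between is a deployment decision. The sketch only shows that the translation is mechanical once the audit data is attached to the span in a structured way.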