At work we've started to lean into the Cloudflare stack more, and by now we're using (almost) every offering in the Developer Platform. One thing has been a consistent problem for us though... observability. In a more "traditional" stack where you control everything from compute up, making sure you've got all the metrics you want isn't a huge problem: you can throw up an OpenTelemetry collector, install some packages, and suddenly you have traces & logs for everything. That isn't something you can do in the Cloudflare ecosystem, because of the restrictions Workers run under. Cloudflare does offer logs and traces for Workers, but they come with a few restrictions:
1. Non-I/O operations may report time of 0 ms
2. Trace context propagation not yet supported
3. Incomplete span attributes
4. No support (yet) for custom spans and attributes
5. Span and attribute names subject to change
6. Traces show in-progress when the request is processing
Not all of those are deal-breakers, and for many people none of them are. For us, though:
- we needed trace propagation, since we're often calling out from CF Workers to legacy infrastructure or vice versa, and having to jump between lots of platforms was just going to make DX worse.
- no custom spans... well, we need to instrument our own code; it's not super helpful to only see the Cloudflare actions we perform :/
- having traces and logs not show up until after the trace is closed was also a problem, since sometimes we have a Durable Object holding a long-lived (> 1 hr!) WebSocket connection that we want logging for.
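For the first point, the glue between services is the W3C Trace Context `traceparent` header (`version-traceid-parentid-flags`). A minimal sketch of building and parsing it by hand, assuming only version `00` headers; the helper names are hypothetical, and in a real setup an OpenTelemetry propagator would normally do this for you:

```typescript
// Hypothetical helpers for the W3C Trace Context `traceparent` header.
// Only version 00 is accepted; production code should use an OTel propagator.

interface TraceParent {
  traceId: string // 32 lowercase hex chars
  spanId: string // 16 lowercase hex chars (the caller's span)
  sampled: boolean
}

const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/

// Serialize a trace context into a `traceparent` header value.
function formatTraceparent(tp: TraceParent): string {
  return `00-${tp.traceId}-${tp.spanId}-${tp.sampled ? '01' : '00'}`
}

// Parse an incoming header; returns undefined if missing or malformed.
function parseTraceparent(header: string | null | undefined): TraceParent | undefined {
  if (!header) return undefined
  const match = TRACEPARENT_RE.exec(header.trim())
  if (!match) return undefined
  const [, traceId, spanId, flags] = match
  // The spec treats all-zero trace or span IDs as invalid.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return undefined
  // Bit 0 of the trace flags is the "sampled" flag.
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 0x01 }
}

// Attach the current context to headers for an outgoing call (e.g. from a
// Worker to legacy infrastructure) so the callee can continue the same trace.
function injectTraceparent(headers: Record<string, string>, tp: TraceParent): Record<string, string> {
  return { ...headers, traceparent: formatTraceparent(tp) }
}
```

On the receiving side, a service would call `parseTraceparent(request.headers.get('traceparent'))` and use the result as the parent of its own span.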
With all that in mind I went looking for a solution. We should just be able to export to OpenTelemetry, right? Wellll, not as easily as I wanted. The trace has to be flushed after everything is done, but by then the execution context is gone, so the spans never make it out. And since we wanted logs in real time, those have to be flushed "behind the scenes" while the work is still running. The only thing I could find was evanderkoogh/otel-cf-workers, which looks like an abandoned library. It did solve the tracing side of what we needed, but not the logs side. So I forked it.
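The standard Workers tool for work that has to outlive the response is `ctx.waitUntil`. Below is a minimal sketch of both flushing styles, assuming hypothetical `BatchBuffer`/`withDeferredFlush` helpers and a simplified stand-in for `ExecutionContext` (this is not how otel-cf-workers or the fork is actually implemented): spans get one final flush after the handler finishes, while logs can be flushed early whenever a batch fills up.

```typescript
// Simplified stand-in for the Workers ExecutionContext (hypothetical type).
interface WaitUntilContext {
  waitUntil(promise: Promise<unknown>): void
}

// Buffers items (spans or log records) and ships them in batches.
// `send` is a hypothetical exporter, e.g. a fetch() to an OTLP endpoint.
class BatchBuffer<T> {
  private items: T[] = []
  private readonly send: (batch: T[]) => Promise<void>
  private readonly ctx: WaitUntilContext
  private readonly maxBatch: number

  constructor(send: (batch: T[]) => Promise<void>, ctx: WaitUntilContext, maxBatch = Infinity) {
    this.send = send
    this.ctx = ctx
    this.maxBatch = maxBatch
  }

  // Queue an item. Once maxBatch items are waiting, flush in the background
  // so long-running work streams its logs instead of holding them to the end.
  add(item: T): void {
    this.items.push(item)
    if (this.items.length >= this.maxBatch) {
      this.ctx.waitUntil(this.flush())
    }
  }

  // Drain the queue and export it as one batch.
  async flush(): Promise<void> {
    if (this.items.length === 0) return
    const batch = this.items
    this.items = []
    await this.send(batch)
  }
}

// Runs a handler and, whether it succeeds or throws, hands a final flush to
// waitUntil, so the export can keep running after the response is returned
// instead of being cancelled along with the request.
async function withDeferredFlush<R>(
  ctx: WaitUntilContext,
  buffer: { flush(): Promise<void> },
  handler: () => Promise<R>,
): Promise<R> {
  try {
    return await handler()
  } finally {
    ctx.waitUntil(buffer.flush())
  }
}
```

Setting `maxBatch` trades export overhead against latency: a small batch size gets logs out of a long-lived connection quickly, while the default of `Infinity` behaves like the "flush everything at the end" approach used for traces.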
I added support for all the Cloudflare products we use (and as much else as I could while I had the chance), then added logging support, trace propagation between Workers and Durable Objects, and multiple exporters for logs and traces. We basically ticked off every item on Cloudflare's list of limitations except the 0 ms one, which comes from Workers only advancing timers after I/O as a timing-attack mitigation (you can read this from CF for more info).
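Supporting multiple exporters mostly comes down to fanning each batch out to several backends without letting one failure take down the rest. A sketch with a hypothetical `LogTransport` interface, loosely in the spirit of the `OTLPTransport`/`ConsoleTransport` pair used in the example later in this post (the fork's real transport API may look quite different):

```typescript
// Hypothetical shapes for illustration; not the fork's actual API.
interface LogRecordLite {
  level: 'debug' | 'info' | 'warn' | 'error'
  message: string
  attributes?: Record<string, string>
}

interface LogTransport {
  readonly name: string
  send(records: LogRecordLite[]): Promise<void>
}

// Sends every batch to all configured transports (e.g. OTLP + console).
class FanOutTransport implements LogTransport {
  readonly name = 'fan-out'
  private readonly transports: LogTransport[]

  constructor(transports: LogTransport[]) {
    this.transports = transports
  }

  async send(records: LogRecordLite[]): Promise<void> {
    // allSettled: one unreachable backend must not stop the others.
    const results = await Promise.allSettled(this.transports.map((t) => t.send(records)))
    results.forEach((result, i) => {
      if (result.status === 'rejected') {
        // Report and move on; logging should never fail the request it observes.
        console.error(`log transport "${this.transports[i].name}" failed:`, result.reason)
      }
    })
  }
}
```

Using `Promise.allSettled` rather than `Promise.all` is the design choice that matters here: with `Promise.all`, the first rejected export would reject the combined promise and hide the outcome of the others.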
Our fork is up here and also available on npm. There's more I want to do with it, including finishing a rewrite of the library that I'm testing right now (slightly faster, with better tracing into Durable Objects). But for the most part this works super well!
Here is a quick example:
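```typescript
// Standard OpenTelemetry API, e.g. trace.getTracer('my-app') for custom spans
// (not used in this minimal example)
import { trace } from '@opentelemetry/api'
import {
  instrument,
  getLogger,
  OTLPTransport,
  ConsoleTransport,
  // Assumed to be exported by the fork, as it is by upstream otel-cf-workers
  type ResolveConfigFn,
} from '@inference-net/otel-cf-workers'

// Bindings this example relies on; `wrangler types` can generate this for you.
// KVNamespace, Request, Response and ExecutionContext are Workers runtime types.
interface Env {
  MY_KV: KVNamespace
  OTEL_ENDPOINT: string
  API_KEY: string
}

const handler = {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const logger = getLogger('my-app')

    // Structured log with attributes, shipped by the transports in `config`
    logger.info('Processing request', {
      'http.url': request.url,
      'user.id': '123',
    })

    try {
      await env.MY_KV.get('key')
      logger.debug('KV operation complete')
      return new Response('OK')
    } catch (error) {
      logger.error(error as Error)
      return new Response('Error', { status: 500 })
    }
  },
}

// A function rather than a static object, so settings can be read from env bindings
const config: ResolveConfigFn = (env: Env, _trigger) => ({
  service: { name: 'my-worker' },
  trace: {
    // Spans go to the OTLP/HTTP traces endpoint
    exporter: {
      url: `${env.OTEL_ENDPOINT}/v1/traces`,
      headers: { 'x-api-key': env.API_KEY },
    },
  },
  logs: {
    // Multiple transports: OTLP export plus pretty console output
    transports: [
      new OTLPTransport({
        url: `${env.OTEL_ENDPOINT}/v1/logs`,
        headers: { 'x-api-key': env.API_KEY },
      }),
      new ConsoleTransport({ pretty: true }),
    ],
  },
})

// Wraps the handler so its traces and logs are captured and exported
export default instrument(handler, config)
```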