How I Simplified LLM Telemetry Using Dual-Destination Observability Without Performance Degradation

12 min read
Note: I haven’t written a blog post in over a year, so tread carefully! But this story felt worth sharing.
The Story
Picture this: You’re an engineer trying to convince (or shall I say, sell) your enterprise architect on Langfuse for LLM observability. You’ve done extensive homework and concluded that Langfuse has incredible AI-specific features, an active community, and exactly the granular insights your GenAI applications need.
But then comes the inevitable response:
“That’s nice, but all our end-to-end observability goes through Instana. We’re not adding another monitoring platform to our stack, and adding an additional tool means training costs, security reviews, and maintenance overhead. Will you be taking support calls at 2 AM when Langfuse goes down?”
Cue internal screaming 😤
Here’s the thing – The architect wasn’t wrong. Enterprise observability strategies matter. Standardization makes sense. But Instana’s recent GenAI Observability support was… let’s be diplomatic… still finding its footing.
Out of the box, it simply didn’t provide the level of detail we desperately needed:
- No granular prompt analysis – How do you optimize prompts when you can’t see what’s actually being sent?
- Limited token behavior insights – Understanding token consumption patterns? Good luck with that.
- Shallow cost attribution – “The LLM calls are expensive” is not actionable feedback.
- Poor agent orchestration visibility – Multi-agent workflows were essentially black boxes.
Meanwhile, Langfuse was sitting there like that perfect tool you can’t use because “enterprise standards.”
So what do you do when you’re caught between enterprise compliance and engineering reality?
You build a bridge.
TL;DR
When enterprise architecture demands Instana APM but your GenAI applications need Langfuse’s granular AI observability, you don’t have to choose. I built a dual-export telemetry bridge, a Python module that simultaneously sends traces to both platforms using OpenTelemetry and the Traceloop SDK, achieving comprehensive observability with minimal performance overhead.
Key outcomes:
- Simultaneous export to Instana and Langfuse
- < 10ms latency overhead per trace
- Zero data loss with circuit breaker resilience
- Minimal resource footprint (< 10MB memory)
TS;RE
The fundamental challenge wasn’t technical – it was organizational. We had:
Team A (Enterprise Architecture): “Everything must go through Instana. No exceptions. We need centralized monitoring, standardized alerting, and integration with our existing dashboards.”
Team B (GenAI Engineering): “But Instana can’t show us prompt templates, token consumption patterns, or LLM conversation flows. We’re flying blind trying to optimize our AI models!”
Team C (Platform and Observability Engineering): “Can we all just… get along? And maybe not break production while figuring this out?”
Classic enterprise tension. And honestly? Everyone had valid points.
What we actually needed from a GenAI observability platform:
- Granular token usage analysis and cost attribution
- LLM conversation flow visualization across multi-agent systems
- Prompt engineering insights and A/B testing capabilities
- Agent orchestration visibility (who called what, when, and why)
- Community-driven patterns and best practices
What Instana gave us at the time:
- Great traditional APM (response times, error rates, throughput)
- Solid infrastructure monitoring
- Enterprise-grade alerting and dashboards
- Compliance with security and operational standards
- But… pretty basic GenAI support 😬
The gap was real, and it was hurting our GenAI engineering velocity.
My first instinct?
Use an OpenTelemetry Collector as a centralized hub. This seemed elegant – one collector, multiple destinations, clean architecture diagrams for the PowerPoint presentations.
On paper, this approach had some compelling advantages:
The Case FOR OTEL Collector:
- Centralized everything: sampling, redaction, enrichment all in one place
- Efficient egress: one channel with intelligent batching & retry logic
- Vendor flexibility: decouple app lifecycle from observability vendor changes
- Smart routing: traces, metrics, logs → OneUptime, logs → S3, whatever you need
- Cost optimization: reduce noisy telemetry before it hits paid ingestion tiers
- Security win: no direct outbound internet calls from app nodes
The Case AGAINST (The Reality Check):
- Operational overhead: extra component to deploy, monitor, and inevitably troubleshoot
- Scaling complexity: potential chokepoint that must be sized and scaled properly
- Single point of failure: misconfiguration can silently drop ALL telemetry
- Latency tax: every trace pays the network hop penalty
It looked good in the architecture review, but something felt off. Managing and deploying yet another service was a complete no-no for me.

After some soul-searching, I realized that the added complexity wasn’t worth the potential benefits. So I asked the question that changed everything: why not just create a lightweight telemetry bridge directly in the application code that exports to both Instana and Langfuse simultaneously?
The solution: a telemetry bridge that leverages the Traceloop SDK for its minimal setup and familiar usage patterns, with OpenTelemetry’s flexible exporter architecture as the backend for sending traces to both platforms simultaneously. Same trace data, two destinations, and everyone gets what they want without extra infrastructure.

Performance results:
- Latency impact: < 0.6% (5ms overhead on ~850ms LLM calls)
- Memory footprint: ~10MB for 10,000 traces (On paper)
- Export success rates: 99.7%+ with circuit breaker protection
- Zero data loss events in production
The How
The solution consists of three key components:
1. TelemetryManager: The Orchestra Conductor
The TelemetryManager acts as the central coordination point, handling three critical responsibilities:
- Configuration Management: Loads settings from environment variables, validates exporter credentials, and manages SSL/TLS certificates for corporate environments
- Exporter Lifecycle: Initializes, configures, and manages the lifecycle of multiple exporters simultaneously
- Instrumentation Coordination: Automatically instruments AI libraries (Bedrock, Google ADK, LiteLLM) using the Traceloop SDK foundation
Key design decision: Built on top of Traceloop SDK rather than raw OpenTelemetry. This gives us AI-specific instrumentation out of the box – automatic token counting, model identification, and conversation threading - while adding our dual-export layer on top.
Handles initialization, configuration, and coordinates between different exporters without breaking existing code:
import os
import json
import boto3
from genai_telemetry_bridge import TelemetryManager

# Configure environment variables for service identification.
# This would typically live in your .env or deployment config.
os.environ['OTEL_SERVICE_NAME'] = 'my-ai-app'
os.environ['OTEL_SERVICE_VERSION'] = '1.0.0'
os.environ['DEPLOYMENT_ENVIRONMENT'] = 'production'

# Enable dual exporters
os.environ['OTEL_EXPORTER_LANGFUSE_ENABLED'] = 'true'
os.environ['LANGFUSE_ENDPOINT'] = 'https://cloud.langfuse.com'
os.environ['OTEL_EXPORTER_INSTANA_ENABLED'] = 'true'

# Initialize once at app startup
manager = TelemetryManager.configure()

def call_ai_model(prompt: str, model_id: str) -> str:
    # Your AI logic here (Bedrock, OpenAI, etc.)
    bedrock = boto3.client('bedrock-runtime')
    response = bedrock.invoke_model(
        modelId=model_id,
        body=json.dumps({"prompt": prompt})
    )
    return response['body'].read()

# Automatic tracing and dual export to Instana + Langfuse
result = call_ai_model("Hello, AI!", "anthropic.claude-4-5-sonnet")

# Cleanup on app shutdown
manager.shutdown()
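If you want to prototype the dual-export idea without the bridge module at all, vanilla OpenTelemetry already supports it: register two span processors on one tracer provider. A minimal sketch, assuming Langfuse’s public OTLP ingest endpoint and an Instana agent accepting OTLP on port 4318 (both endpoints and the auth scheme are assumptions here, not part of the module):

```python
import base64
import os

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

provider = TracerProvider(resource=Resource.create({"service.name": "my-ai-app"}))

# Destination 1: Instana agent (assumed to accept OTLP over HTTP on port 4318)
provider.add_span_processor(BatchSpanProcessor(
    OTLPSpanExporter(endpoint="http://instana-agent:4318/v1/traces")))

# Destination 2: Langfuse (assumed OTLP ingest; auth is Basic public_key:secret_key)
auth = base64.b64encode(
    f"{os.environ['LANGFUSE_PUBLIC_KEY']}:{os.environ['LANGFUSE_SECRET_KEY']}".encode()
).decode()
provider.add_span_processor(BatchSpanProcessor(
    OTLPSpanExporter(
        endpoint="https://cloud.langfuse.com/api/public/otel/v1/traces",
        headers={"Authorization": f"Basic {auth}"},
    )))

# Every span recorded from here on goes to both destinations
trace.set_tracer_provider(provider)
```

The bridge wraps essentially this pattern, plus the AI-specific instrumentation and resilience layers described below.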
2. Platform Exporters: Speaking Multiple Languages
Each platform expects different data formats and semantic conventions. Our exporters handle these translations:
Instana Exporter:
- Converts spans to OTLP (OpenTelemetry Protocol) format
- Maps AI attributes to standard APM metrics (operation.name, service.name, etc.)
- Maintains distributed tracing context for service mesh correlation
- Uses HTTP transport with automatic retry and backoff
Langfuse Exporter:
- Transforms spans into Langfuse’s trace/observation model
- Maps AI-specific attributes: model_name, input_tokens, output_tokens, estimated_cost
- Preserves conversation threading and parent-child relationships
- Handles nested generations (agent → tool → LLM call chains)
- Uses Langfuse SDK with custom trace correlation
The translation layer is critical – same OpenTelemetry span, different semantic meaning depending on the destination.
The dual-export coordination logic handles platform-specific translation:
class TelemetryManager:
    def _setup_dual_exporters(self):
        """Configure exporters that speak different platform languages"""
        exporters = []

        # Instana: expects OTLP format with APM semantic conventions
        if self.config.instana.enabled:
            instana_exporter = self._create_otlp_exporter(
                endpoint=self.config.instana.endpoint,
                headers={"Authorization": f"Bearer {self.config.instana.api_key}"}
            )
            exporters.append(('instana', instana_exporter))

        # Langfuse: expects trace/generation/observation hierarchy
        if self.config.langfuse.enabled:
            langfuse_exporter = self._create_langfuse_exporter(
                public_key=self.config.langfuse.public_key,
                secret_key=self.config.langfuse.secret_key
            )
            exporters.append(('langfuse', langfuse_exporter))

        # Wrap all exporters with resilience patterns
        return MultiDestinationExporter(exporters)

    def _create_otlp_exporter(self, endpoint, headers):
        """OTLP exporter for traditional APM platforms like Instana"""
        return OTLPSpanExporter(endpoint=endpoint, headers=headers, timeout=30)

    def _create_langfuse_exporter(self, public_key, secret_key):
        """Langfuse-native exporter for AI observability"""
        return LangfuseSpanExporter(public_key, secret_key)
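The `MultiDestinationExporter` referenced above is the heart of the fan-out. A simplified sketch of what it does (the real module adds circuit breakers and diagnostics, and the exact names here are illustrative):

```python
from typing import Sequence

from opentelemetry.sdk.trace import ReadableSpan
from opentelemetry.sdk.trace.export import SpanExporter, SpanExportResult


class MultiDestinationExporter(SpanExporter):
    """Fans the same span batch out to every configured destination."""

    def __init__(self, exporters):
        # exporters: list of (name, SpanExporter) tuples, e.g. [("instana", ...), ("langfuse", ...)]
        self.exporters = exporters

    def export(self, spans: Sequence[ReadableSpan]) -> SpanExportResult:
        overall = SpanExportResult.FAILURE
        for name, exporter in self.exporters:
            try:
                if exporter.export(spans) == SpanExportResult.SUCCESS:
                    overall = SpanExportResult.SUCCESS
            except Exception:
                # One platform being down must never block the other
                continue
        return overall

    def shutdown(self) -> None:
        for _, exporter in self.exporters:
            exporter.shutdown()
```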
3. Resilience Layer: Circuit Breaker Pattern
The hard truth about observability in production:
- External services fail.
- Networks partition.
- Langfuse goes down during your critical demo.
- Instana times out during peak traffic.
- Whatever can go wrong, will go wrong!
Without circuit breakers, a failed exporter becomes a cascading failure:
- Export timeout → thread blocking
- Thread pool exhaustion → app performance degradation
- User requests start timing out → your GenAI service goes down
Circuit breaker states:
- CLOSED: Normal operation, all exports attempted
- OPEN: Exporter failed repeatedly, exports bypassed to prevent cascade failures
- HALF-OPEN: Testing recovery, limited export attempts
Failure thresholds: 5 consecutive failures opens circuit, 30-second recovery window, exponential backoff up to 5 minutes.
Because observability should never break your application:
class CircuitBreakerExporter:
    def export(self, spans):
        """Export with circuit breaker protection"""
        results = []
        for exporter in self.exporters:
            if self.circuit_states[exporter] == "OPEN":
                if self._should_attempt_recovery(exporter):
                    self.circuit_states[exporter] = "HALF_OPEN"
                else:
                    continue
            try:
                result = exporter.export(spans)
                if result.is_success():
                    self._reset_failure_count(exporter)   # success closes the circuit again
                else:
                    self._handle_failure(exporter, None)  # non-success counts toward the threshold
                results.append(result)
            except Exception as e:
                self._handle_failure(exporter, e)
        return results
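The failure-count and recovery helpers behind that loop are mostly time math. A hedged sketch of how they could implement the thresholds above (the real module’s internals may differ):

```python
import time

FAILURE_THRESHOLD = 5        # consecutive failures before the circuit opens
BASE_RECOVERY_SECONDS = 30   # first recovery attempt after 30 seconds
MAX_RECOVERY_SECONDS = 300   # exponential backoff capped at 5 minutes


class CircuitState:
    def __init__(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= FAILURE_THRESHOLD:
            self.opened_at = time.monotonic()   # open the circuit

    def record_success(self):
        self.failures = 0
        self.opened_at = None                   # close the circuit

    def should_attempt_recovery(self) -> bool:
        if self.opened_at is None:
            return True                         # circuit is closed, export normally
        # Back off exponentially with each failure beyond the threshold
        wait = min(BASE_RECOVERY_SECONDS * 2 ** (self.failures - FAILURE_THRESHOLD),
                   MAX_RECOVERY_SECONDS)
        return time.monotonic() - self.opened_at >= wait
```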
Architecture Decision: Why Direct Export

Why direct export over OTEL Collector?
| Factor | OTEL Collector | Direct Export |
|---|---|---|
| Deployment Complexity | Additional service to deploy, configure, scale | Single library dependency |
| Operational Overhead | Monitoring, logging, alerting for collector | Self-contained, minimal ops burden |
| Latency Impact | Network hop + processing time | Direct HTTP calls, <5ms overhead |
| Failure Modes | Single point of failure for ALL telemetry | Per-exporter circuit breakers, graceful degradation |
| Development Velocity | Complex configuration, environment-specific tuning | Environment variables, works everywhere |
The genius of this approach: It leverages OpenTelemetry’s flexible exporter architecture to simultaneously send traces to both platforms while maintaining the simplicity of a single library.
Data flow specifics:
- Traceloop SDK captures AI operations (LLM calls, agent workflows)
- Our bridge intercepts spans before export
- Each exporter transforms spans into platform-native format
- Circuit breakers protect against cascading failures
- Both platforms receive correlated traces with identical trace IDs
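That last point is what makes cross-platform debugging workable: you can grab the active trace ID inside a request and look it up in either UI. A small sketch using the standard OpenTelemetry API (the logging convention is just a suggestion):

```python
from opentelemetry import trace


def current_trace_id() -> str:
    """Return the active trace ID as the 32-char hex string both platforms display."""
    ctx = trace.get_current_span().get_span_context()
    return format(ctx.trace_id, "032x")


# Log it alongside your request ID so a single identifier can be
# looked up in both Instana and Langfuse
print(f"trace_id={current_trace_id()}")
```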
Enterprise Architecture Team: “All our telemetry goes through Instana!”
GenAI Engineering Team: “We have granular LLM insights in Langfuse!”
Platform Engineering Team: “It’s one simple library with no additional infrastructure!”
The Walk-through
Configuration: Environment-Driven Flexibility
The bridge uses a comprehensive configuration system that supports both platforms. Create a .env file for proper configuration:
# Service identification
OTEL_SERVICE_NAME=my-ai-app
OTEL_SERVICE_VERSION=1.0.0
DEPLOYMENT_ENVIRONMENT=production
# Enable exporters (choose one or both)
OTEL_EXPORTER_LANGFUSE_ENABLED=true
LANGFUSE_ENDPOINT=https://cloud.langfuse.com
LANGFUSE_API_KEY=your-langfuse-key
LANGFUSE_SECRET_KEY=your-secret-key
LANGFUSE_PUBLIC_KEY=your-public-key
OTEL_EXPORTER_INSTANA_ENABLED=true
INSTANA_ENDPOINT=http://instana-agent:4318
INSTANA_API_KEY=your-instana-key
# Optional: SSL/TLS Configuration for Corporate Environments
INSTANA_SSL_VERIFY=true
INSTANA_SSL_CA_BUNDLE=/path/to/ca-bundle.crt
LANGFUSE_SSL_VERIFY=true
LANGFUSE_SSL_CA_BUNDLE=/path/to/ca-bundle.crt
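On the Python side, those variables can be folded into a typed config object once at startup, so the rest of the bridge never reads os.environ directly. A simplified sketch of the idea; the real module’s field and function names may differ:

```python
import os
from dataclasses import dataclass


def _env_bool(name: str, default: str = "false") -> bool:
    return os.getenv(name, default).strip().lower() == "true"


@dataclass(frozen=True)
class ExporterConfig:
    enabled: bool
    endpoint: str
    api_key: str | None = None
    ssl_verify: bool = True
    ssl_ca_bundle: str | None = None


def load_instana_config() -> ExporterConfig:
    # Fail fast at startup instead of deep inside an export call
    return ExporterConfig(
        enabled=_env_bool("OTEL_EXPORTER_INSTANA_ENABLED"),
        endpoint=os.getenv("INSTANA_ENDPOINT", "http://localhost:4318"),
        api_key=os.getenv("INSTANA_API_KEY"),
        ssl_verify=_env_bool("INSTANA_SSL_VERIFY", "true"),
        ssl_ca_bundle=os.getenv("INSTANA_SSL_CA_BUNDLE"),
    )
```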
Real-World FastAPI Integration
Here’s how it looks in a production FastAPI application:
from fastapi import FastAPI
from genai_telemetry_bridge import TelemetryManager
from genai_telemetry_bridge.diagnostics import get_diagnostic_router
import boto3
import json

# Initialize telemetry before app creation
manager = TelemetryManager.configure()

app = FastAPI(title="GenAI Service")

# Add built-in diagnostic endpoints
diagnostic_router = get_diagnostic_router()
app.include_router(diagnostic_router, prefix="/telemetry", tags=["telemetry"])

async def call_ai_model(prompt: str, model_id: str) -> str:
    # Automatic instrumentation for Bedrock calls
    bedrock = boto3.client('bedrock-runtime')
    response = bedrock.invoke_model(
        modelId=model_id,
        body=json.dumps({
            "prompt": prompt,
            "max_tokens": 1000,
            "temperature": 0.7
        })
    )
    return json.loads(response['body'].read())['completion']

@app.post("/chat")
async def chat_endpoint(message: str, model: str = "eu.anthropic.claude-sonnet-4-20250514-v1:0"):
    # Automatic tracing and dual export to Instana + Langfuse
    result = await call_ai_model(message, model)
    return {
        "response": result,
        "model": model,
        "trace_available": True  # Check /telemetry/exported_traces for trace data
    }

@app.on_event("shutdown")
async def shutdown():
    manager.shutdown()

# The telemetry bridge handles the rest:
# - Instana gets APM metrics and distributed traces
# - Langfuse gets detailed LLM conversation flows
# - Both get correlated via OpenTelemetry trace IDs
# - Built-in diagnostic API available at /telemetry/* endpoints
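One note for newer FastAPI versions: on_event is deprecated in favor of a lifespan handler. The same wiring, sketched with a lifespan context manager (a sketch, not part of the bridge module itself):

```python
from contextlib import asynccontextmanager

from fastapi import FastAPI
from genai_telemetry_bridge import TelemetryManager


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Start telemetry before the app begins serving traffic
    manager = TelemetryManager.configure()
    app.state.telemetry = manager
    yield
    # Flush buffers and close exporters on shutdown
    manager.shutdown()


app = FastAPI(title="GenAI Service", lifespan=lifespan)
```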
Performance Benchmarks
Latency Impact (The Good News)
Empirical results from a corporate environment:
- ✅ Baseline (no telemetry): 11.94ms
- ✅ Single export (Langfuse): 12.58ms (+5.4% overhead)
- ✅ Single export (Instana): 12.80ms (+7.2% overhead)
- ✅ Dual export: 13.06ms (+9.4% overhead)
For GenAI applications where LLM calls take 1-10 seconds, ~1ms of telemetry overhead is a 0.01-0.1% impact – essentially noise level.
Memory Footprint (Also Good News)
Memory usage after 10,000 traces:
- Buffer storage: ~8.2MB
- Export processing: ~1.8MB
- Total overhead: ~10MB
- That’s less than a single Chrome tab 🙃
Export Success Rates (The Really Good News)
- ✅ Instana Export Success: 99.7%
- ✅ Langfuse Export Success: 99.9%
- ✅ Circuit Breaker Activations: 0.1%
- ✅ Data Loss Events: 0%
Hard-Earned Lessons
1. Simplicity Beats Elegance (Every. Single. Time.)
I spent days going back and forth with Google Gemini to sanity-check my thinking, and produced a stack of Mermaid architecture diagrams full of message queues, processing pipelines, and microservices, all to figure out how I would productionize an OTEL Collector with resource-attribute mapping. The solution that actually worked? A single library that exports to two places.
The lesson: Your architecture should solve problems, not create them. If you need a whiteboard to explain how observability works, you’re probably overthinking it.
2. Circuit Breakers Aren’t Optional – They’re Insurance
Here’s what happens without circuit breakers: One bad Langfuse or ClickHouse deployment takes down our entire GenAI application because trace exports start timing out. Yes, I learned this the hard way at 2 AM after deploying this in our ephemeral environment.
The lesson: Observability failures should never cascade to user-facing functionality. Build defensive patterns from day one.
3. Configuration Is the Ultimate Peace Offering
Different teams have different needs. Development wants everything logged, staging wants moderate sampling, production wants performance-first:
# Development: See everything
TELEMETRY_SAMPLE_RATE=1.0
LANGFUSE_ENABLED=true
# Production: Performance first
TELEMETRY_SAMPLE_RATE=0.1
LANGFUSE_ENABLED=false # Instana only for critical paths
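A sample-rate knob like this maps almost directly onto OpenTelemetry’s built-in samplers. A sketch of how TELEMETRY_SAMPLE_RATE could be translated, assuming you control the tracer provider:

```python
import os

from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# 1.0 in development (keep everything), 0.1 in production (keep ~10% of traces)
rate = float(os.getenv("TELEMETRY_SAMPLE_RATE", "1.0"))

# ParentBased keeps child spans consistent with their parent's sampling decision
provider = TracerProvider(sampler=ParentBased(root=TraceIdRatioBased(rate)))
```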
The lesson: Flexibility prevents religious wars about the “right” configuration.
4. Contributing Back to the Ecosystem
After building our dual-export solution, I realized this wasn’t just our problem. The entire community could benefit from native multi-exporter support in the Traceloop SDK. So I created a GitHub issue proposing built-in support for additional exporters beyond just Langfuse and Instana (our use case).
The response was positive, and discussions are ongoing about the best approach to implement this upstream. Sometimes the solution you build for your specific needs can spark broader improvements in the open source ecosystem.
The lesson: When you solve a real problem, consider giving back. Your specific enterprise constraint might be a common pain point that deserves a community solution.
5. Evidence-Based Engineering Wins Technical Debates
After sharing my approach on LinkedIn, the Traceloop CEO expressed legitimate concerns about performance impact, suggesting that dual export would “slow down your entire app” and calling direct dual export a “bad practice” that affects production apps for observability.
Valid concerns from someone who built the framework! But rather than argue theoretically, I ran comprehensive benchmarks using the actual test suite to measure real-world impact:
Empirical Results (Corporate Environment):
Note: I used Jaeger running in a Docker container to mock the Instana export, since I lacked write-capable API key access for Instana and would have needed a K8s deployment to run the Instana agent properly.
- Baseline (no telemetry): 11.94ms
- Single export (Langfuse): 12.58ms (+5.4% overhead)
- Single export (Jaeger - Instana Mock): 12.80ms (+7.2% overhead)
- Dual export: 13.06ms (+9.4% overhead)
The key insight: For GenAI applications where LLM calls typically take 1-10 seconds, a 1ms telemetry overhead represents 0.01-0.1% impact—essentially noise level.
The lesson: When experts challenge your approach, respond with data, not opinions. Evidence-based engineering beats theoretical arguments every time. Sometimes the “unconventional” solution is actually the pragmatic one.
6. Diagnostic APIs Are Not Optional – They’re Your Troubleshooting Lifeline
Building 13 diagnostic endpoints (/telemetry/exported_traces, /trace_analyzer/{trace_id}, etc.) felt like over-engineering at first. In production, they became indispensable. When traces weren’t appearing in Langfuse, a quick /telemetry/exported_traces check revealed a network timeout issue that would have taken hours to debug otherwise.
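The endpoints themselves don’t need to be fancy; even a bounded in-memory buffer of recent export attempts answers the key question, “did anything actually leave the process?”. A hedged sketch of what an exported_traces endpoint could look like (the bridge’s real implementation differs, and record_export is a hypothetical hook):

```python
from collections import deque

from fastapi import APIRouter

# Keep only the most recent export attempts so the buffer can't grow without bound
_recent_exports = deque(maxlen=200)


def record_export(destination: str, trace_id: str, span_count: int, success: bool):
    """Hypothetical hook the exporter calls after every export attempt."""
    _recent_exports.append({
        "destination": destination,
        "trace_id": trace_id,
        "span_count": span_count,
        "success": success,
    })


router = APIRouter()


@router.get("/exported_traces")
async def exported_traces(limit: int = 50):
    recent = list(_recent_exports)[-limit:]
    return {
        "count": len(recent),
        "failures": sum(1 for e in recent if not e["success"]),
        "traces": recent,
    }
```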
The lesson: Always build observability into your observability. You’ll need to debug your monitoring system just as much as your application code.
7. Corporate SSL/TLS Is a Special Kind of Hell
“Just set SSL_VERIFY=false” works in development. In production with corporate proxies, custom CA bundles, and security policies, you need proper certificate handling. SSL configuration consumed 40% of our integration debugging time.
The lesson: Test with production-like SSL/TLS configurations early. Corporate networking complexity is not optional – plan for it from day one.
8. Optional Dependencies Are Enterprise-Friendly Dependencies
Not everyone needs Bedrock or Google ADK instrumentation. Not everyone uses FastAPI. Making instrumentation optional through dependency groups (uv add --group otel-bedrock otel-google-adk otel-fastapi) kept the core library lightweight while supporting diverse use cases.
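The pattern behind those optional groups is plain lazy importing: instrumentation is attempted only if the matching dependency was installed, and skipped quietly otherwise. A simplified sketch, assuming the OpenLLMetry Bedrock instrumentor package is the optional dependency in question:

```python
import logging

logger = logging.getLogger(__name__)


def instrument_bedrock() -> bool:
    """Enable Bedrock instrumentation only if the optional dependency is installed."""
    try:
        # Shipped separately by the Traceloop/OpenLLMetry ecosystem
        from opentelemetry.instrumentation.bedrock import BedrockInstrumentor
    except ImportError:
        logger.info("Bedrock instrumentation not installed; skipping")
        return False
    BedrockInstrumentor().instrument()
    return True
```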
The lesson: Design for modularity from the start. Optional features should be optional dependencies, not forced complexity for everyone.
9. Observability Needs Observability – Monitor Your Monitoring
Building telemetry infrastructure without monitoring the telemetry system itself is like building a fire alarm without testing if it can actually make sound. We learned this when traces silently stopped flowing to Langfuse due to a network timeout, and we only discovered it hours later during a demo.
# Built-in health checks and self-monitoring
@app.get("/telemetry/health")
async def telemetry_health():
    return {
        "exporters": {
            "langfuse": await check_langfuse_connection(),
            "instana": await check_instana_connection()
        },
        "circuit_breakers": get_circuit_breaker_states(),
        "last_successful_export": get_last_export_timestamp()
    }
The lesson: Your observability tools need observability too. Build health checks, circuit breaker status endpoints, and export success monitoring from day one. You’ll debug your monitoring system just as much as your application code.
10. Graceful Startup and Shutdown Are Not Optional Features
Nothing screams “amateur hour” like observability code that crashes during application startup because Langfuse is temporarily unavailable, or leaves hanging connections during shutdown. Enterprise applications start and stop frequently – deployments, scaling events, maintenance windows.
We learned this during a Kubernetes rolling deployment when half our pods were stuck in terminating state because telemetry exporters weren’t properly closing connections.
# Proper lifecycle management
class TelemetryManager:
    async def __aenter__(self):
        await self.initialize()
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        self.shutdown(timeout=30)  # Hard timeout

    def shutdown(self, timeout: int = 30):
        """Graceful shutdown with timeout"""
        # Close exporters, flush buffers, release resources
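In practice this lets application code lean on async with and still get a bounded shutdown even when an exporter hangs. A short usage sketch, assuming the manager can be constructed directly:

```python
import asyncio


async def run_my_agents():
    ...  # your agent workflow goes here (hypothetical placeholder)


async def main():
    # Buffers are flushed and exporters closed even if the workflow raises
    async with TelemetryManager() as telemetry:
        await run_my_agents()


asyncio.run(main())
```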
The lesson: Design for the lifecycle, not just the happy path. Applications that can’t start cleanly when dependencies are unavailable, or can’t shut down gracefully under load, create operational headaches that compound over time.
Outcome
For Infrastructure Teams (Instana):
- Full APM visibility into GenAI services
- Integration with existing alerting and dashboards
- Compliance with enterprise monitoring standards
- Distributed tracing across entire service mesh
For GenAI Engineering Teams (Langfuse):
- Granular LLM conversation analysis
- Token usage and cost optimization insights
- Prompt engineering experiment tracking
- Rich community-driven analytics features
For Platform Teams:
- Zero additional infrastructure to maintain
- Minimal performance impact (< 1% latency overhead)
- High reliability (99.7%+ export success rate)
- Simple configuration and deployment
Why “Both” Is Sometimes the Right Answer
A few months ago, I was stuck between teams with valid but conflicting requirements.
- Enterprise architects needed Instana compliance.
- GenAI engineers needed Langfuse insights.
- Platform engineering needed something that wouldn’t break production.
The traditional enterprise approach? Pick one tool, make everyone else adapt. Someone always loses.
The startup approach? Use whatever tool you want, worry about enterprise compliance later. Technical debt accumulates.
The solution that actually worked? Build a bridge. Give everyone what they need without forcing anyone to compromise on their core requirements.
The key insight: You don’t have to choose between enterprise compliance and developer productivity. You don’t have to sacrifice AI observability for operational standards. You don’t have to pick sides in tool wars.
With the right architectural approach, you can have both. And sometimes, “both” is the most pragmatic answer to an enterprise dilemma.
P.S. – That enterprise architect who initially shut down Langfuse? He’s now one of the biggest advocates for our dual-export approach. Turns out, when you solve the compliance problem, people are surprisingly open to better tooling. 😊
Conclusion
What started as an enterprise constraint — “use only Instana” — turned into an opportunity to build something that benefits everyone. The Python module genai-telemetry-bridge isn’t just a technical solution; it’s proof that pragmatic engineering can bridge organizational divides.
The real win wasn’t the 9.4% performance overhead (which turned out to be negligible for GenAI applications) or even the 99.7% export success rate. It was watching teams that were previously at odds now collaborate on observability strategies. Infrastructure teams got their standardization, GenAI engineers got their insights, and platform teams got their simplicity.
A few months later, this dual-export approach has become our standard for GenAI observability. More importantly, it sparked conversations about how enterprise tooling decisions don’t have to be zero-sum games.
Sometimes the best architecture isn’t the most elegant one in the textbook — it’s the one that solves real problems for real teams without forcing anyone to compromise their core needs.
And hey, if you’re stuck in a similar enterprise tool conflict, remember: you don’t always have to choose sides. Sometimes you just need to build a bridge.