LangSmith for Debugging and Monitoring LLMs — Trace Every AI Call • Meteora Web Agency

Your LLM application sometimes hallucinates, hangs on identical inputs, or burns tokens for no reason. Classic logs only show input and output, not the intermediate reasoning, tool calls, or parsing failures. The problem is you're paying for every wasted token with no visibility into where the value leaks. At Meteora Web, after years debugging complex systems (ERP, e-commerce, SaaS platforms), we've brought the same discipline to AI pipelines. Here's why LangSmith isn't optional — it's the dashboard that turns your LLM from a black box into a controllable system.

Why Use LangSmith for Debugging LLM Applications?

An LLM application isn't a pure function. Each run can involve a chain of steps: prompt template, retrieval from a vector store, external API call, response generation, structured output parsing. If one step fails, the error propagates silently. Traditional logging captures only the final result; LangSmith captures every single trace — timing, tokens spent, intermediate inputs/outputs, exceptions. We use it to quickly answer questions like: "why did the customer get an empty response?" — instead of reproducing the bug locally, we open the trace and see exactly where the retrieval returned zero results.

The hidden cost of manual debugging

Imagine a customer support AI. A user writes "cancel order 12345" and the AI replies "order not found". Why? It could be a bad embedding, overly aggressive chunking, or a misconfigured tool. Without tracing, you lose hours reproducing the issue and adding logs. With LangSmith, the trace shows the context retrieved from the vector store: maybe the chunk ended with "...", cutting off the order number. We've seen that happen, and it was fixed in ten minutes instead of two days.

How Does LangSmith Tracing Work?

LangSmith integrates with LangChain, and also supports custom frameworks through its Python and JS SDKs. Each call to a model, tool, or retriever generates a run. Runs nest into a trace that represents an entire request or job. Everything is sent via API to LangSmith Cloud (or to a self-hosted deployment for enterprise). The web interface shows an interactive timeline with latency, token cost, inputs/outputs, and errors.


from langchain_openai import ChatOpenAI
from langsmith import trace

# Credentials are read from the environment (LANGCHAIN_API_KEY)
llm = ChatOpenAI(model="gpt-4o-mini")
question = "What is the status of my order?"

# Open one trace around a single call; the context manager yields the run
with trace(
    name="customer-support",
    run_type="chain",
    project_name="my-app",
    inputs={"question": question},
) as rt:
    result = llm.invoke(question)
    rt.end(outputs={"response": result.content})
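To make the idea of runs nesting into a trace concrete, here is a toy tracer written with the standard library only. It is a sketch of the concept, not LangSmith's implementation: `ToyRun` and `ToyTracer` are hypothetical names, and the retrieval and generation steps are stubbed with fixed values.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ToyRun:
    """One step of work: a model call, a tool call, a retrieval, and so on."""
    name: str
    run_type: str
    inputs: dict = field(default_factory=dict)
    outputs: dict = field(default_factory=dict)
    error: Optional[str] = None
    latency_s: float = 0.0
    children: list = field(default_factory=list)


class ToyTracer:
    """Keeps a stack of open runs so a new run nests under the current one."""

    def __init__(self) -> None:
        self.roots = []
        self._stack = []

    @contextmanager
    def run(self, name: str, run_type: str, **inputs: Any):
        current = ToyRun(name=name, run_type=run_type, inputs=inputs)
        parent = self._stack[-1] if self._stack else None
        (parent.children if parent else self.roots).append(current)
        self._stack.append(current)
        start = time.perf_counter()
        try:
            yield current
        except Exception as exc:
            current.error = repr(exc)  # record the failure on the step that raised
            raise
        finally:
            current.latency_s = time.perf_counter() - start
            self._stack.pop()


tracer = ToyTracer()
with tracer.run("customer-support", "chain", question="Where is order 12345?") as root:
    with tracer.run("vector-search", "retriever", query="order 12345") as retrieval:
        retrieval.outputs = {"documents": []}  # the silent failure: nothing retrieved
    with tracer.run("answer", "llm", context=retrieval.outputs["documents"]) as generation:
        generation.outputs = {"response": "order not found"}
    root.outputs = generation.outputs

# Walk the tree the way a trace view would present it
for child in root.children:
    print(child.run_type, child.name, child.outputs)
```

Reading the children of the root run shows the empty retrieval right next to the "order not found" answer, which is the kind of cause-and-effect a final-output log cannot show.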

Set up environment in two minutes

You need three environment variables: LANGCHAIN_API_KEY (get it at smith.langchain.com), LANGCHAIN_PROJECT (your project name), and LANGCHAIN_TRACING_V2=true, which switches tracing on. Then import trace from langsmith and use it as a context manager. If you use LangChain, tracing is automatic: once those variables are set, LangChain sends traces without any code changes. We recommend creating one project per environment (dev, staging, prod) and enabling tracing selectively via configuration flags.
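One way to wire up "one project per environment, tracing behind a flag" is a small helper that builds the variables before any traced code runs. This is a minimal sketch: the project names and the `TRACING_ENABLED` flag are hypothetical, and the API key is assumed to come from your secret manager rather than from this code.

```python
import os

# Hypothetical project names: one LangSmith project per environment.
PROJECT_BY_ENV = {
    "dev": "support-chat-dev",
    "staging": "support-chat-staging",
    "prod": "support-chat-prod",
}


def tracing_env(environment: str, tracing_enabled: bool) -> dict:
    """Return the tracing-related environment variables for one deployment."""
    if environment not in PROJECT_BY_ENV:
        raise ValueError(f"unknown environment: {environment!r}")
    return {
        "LANGCHAIN_TRACING_V2": "true" if tracing_enabled else "false",
        "LANGCHAIN_PROJECT": PROJECT_BY_ENV[environment],
    }


# TRACING_ENABLED is a hypothetical flag read from your own configuration.
flag = os.environ.get("TRACING_ENABLED", "false").lower() == "true"
settings = tracing_env("staging", tracing_enabled=flag)
os.environ.update(settings)  # must run before the first traced call
```

Keeping the mapping in one place makes it harder for a staging deployment to write traces into the production project by accident.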

What Metrics Should You Monitor in Production with LangSmith?

LangSmith is not just post-hoc debugging: it offers real-time monitoring with customizable dashboards. Key metrics:

Latency per step — identify bottlenecks (e.g., slow retrieval vs generation)
Tokens consumed per run — compare models and prompts to optimize costs
Error rate by run type — failed tool calls, parsing exceptions
User feedback — send ratings (like/dislike) and correlate with traces
Cost distribution by endpoint — if you use multiple providers, see where you spend
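Two of the metrics above, latency per step and error rate by run type, can be sketched as plain aggregations over run records. The record shape below (dicts with `name`, `run_type`, `latency_s`, `error`) and the sample values are assumptions for illustration, not LangSmith's export format.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical run records, e.g. built from whatever your tracing export returns.
runs = [
    {"name": "vector-search", "run_type": "retriever", "latency_s": 0.42, "error": None},
    {"name": "answer", "run_type": "llm", "latency_s": 1.80, "error": None},
    {"name": "lookup-order", "run_type": "tool", "latency_s": 0.30, "error": "TimeoutError"},
    {"name": "lookup-order", "run_type": "tool", "latency_s": 0.10, "error": None},
]


def mean_latency_by_step(records: list) -> dict:
    """Average latency in seconds for each step name."""
    latencies = defaultdict(list)
    for record in records:
        latencies[record["name"]].append(record["latency_s"])
    return {name: mean(values) for name, values in latencies.items()}


def error_rate_by_run_type(records: list) -> dict:
    """Fraction of runs with a recorded error, per run type."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for record in records:
        totals[record["run_type"]] += 1
        if record["error"] is not None:
            failures[record["run_type"]] += 1
    return {run_type: failures[run_type] / totals[run_type] for run_type in totals}


print(mean_latency_by_step(runs))
print(error_rate_by_run_type(runs))
```

In this sample the tool runs fail half the time while retrieval and generation are clean, which points the investigation at the order lookup rather than at the model.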

Building a cost monitoring loop with feedback

We built a system that, for every generated response, sends feedback to LangSmith with the computed cost. Filtering by project then gives us a daily report, for example: "gpt-4o-mini generated 90% of responses at acceptable cost, but 10% exceeded it due to long prompts." We reduced costs by 30% simply by trimming unnecessary prompt content.


# Example: sending cost feedback
# Assumes `client` is a langsmith Client and `run` is the run being scored
client.create_feedback(
    run_id=run.id,
    key="cost_usd",
    score=0.0025,  # dollars spent on that run
)
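The cost value itself has to be computed before it is sent. A minimal sketch, assuming you can read prompt and completion token counts from the model response: the price table holds placeholder figures (replace them with your provider's current rates), and `run_cost_usd` and `share_over_budget` are hypothetical helper names.

```python
# Placeholder prices in USD per one million tokens. These are assumptions:
# replace them with your provider's current price list.
PRICE_PER_MILLION_TOKENS = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}


def run_cost_usd(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost of one run: tokens multiplied by the per-token price, per direction."""
    prices = PRICE_PER_MILLION_TOKENS[model]
    total = prompt_tokens * prices["input"] + completion_tokens * prices["output"]
    return total / 1_000_000


def share_over_budget(costs: list, budget_usd: float) -> float:
    """Fraction of runs whose cost exceeded the per-run budget."""
    if not costs:
        return 0.0
    return sum(1 for cost in costs if cost > budget_usd) / len(costs)


cost = run_cost_usd("gpt-4o-mini", prompt_tokens=10_000, completion_tokens=1_000)
daily_costs = [0.0004, 0.0006, 0.0021, 0.0005]  # sample per-run costs for one day
over = share_over_budget(daily_costs, budget_usd=0.002)
print(f"run cost: ${cost:.4f}, share over budget: {over:.0%}")
```

The value returned by `run_cost_usd` is what you would pass as `score` in the `create_feedback` call above, and `share_over_budget` gives the "what fraction exceeded" figure for a daily report.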

How to Integrate LangSmith into Your LangChain Project?

If you use LangChain (Python or TypeScript), integration is transparent. Just set environment variables and LangChain will automatically send traces. For custom projects (without LangChain), use the client SDK directly. Let's see both scenarios.

Automatic integration with LangChain


import os
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "support-chat"

llm = ChatOpenAI(model="gpt-4o-mini")
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Answer concisely."),
    ("human", "{input}")
])
chain = prompt | llm
result = chain.invoke({"input": "Where is my package?"})
print(result.content)

Every chain.invoke call produces a full trace. You can see duration, tokens, and even the prompt content sent to the API.

Manual integration without LangChain (native SDK)


import openai
from langsmith import trace

openai.api_key = "sk-..."  # better: set OPENAI_API_KEY in the environment
messages = [{"role": "user", "content": "Hello"}]

# Inputs are passed when the trace opens; outputs are recorded when it ends
with trace(
    name="gpt-call",
    run_type="llm",
    project_name="custom-app",
    inputs={"messages": messages},
) as rt:
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
    )
    rt.end(outputs={"response": response.choices[0].message.content})

Practical Debugging Examples with LangSmith: Trace Sessions and Feedback

A real case: a chatbot for a fashion company. The AI had to answer about sizes and availability. Sometimes it answered correctly, other times it said "product not found". We enabled tracing and discovered the retriever used an empty description field for some products. The trace showed the retrieved context: zero documents. Fixed by populating missing descriptions.

Using sessions to debug multi-turn flows

LangSmith supports sessions: you can group multiple runs under a single identifier (e.g., user conversation). This way you see the entire conversation: each message is a run nested under the session. We use it to analyze where the AI loses context after three turns.


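from langchain_openai import ChatOpenAI
from langsmith import trace

llm = ChatOpenAI(model="gpt-4o-mini")

# Parent run for the whole conversation ("session" is not a valid run_type,
# so the parent is a "chain")
with trace(
    name="user-1234-session",
    run_type="chain",
    project_name="conversation-demo",
) as session:
    for i, turn in enumerate(["Hello", "I want a red sweater", "Size M"], start=1):
        # A trace opened inside another trace is nested under it
        with trace(name=f"turn-{i}", run_type="chain", inputs={"message": turn}) as run:
            response = llm.invoke(turn)
            run.end(outputs={"response": response.content})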

What To Do Next

Enable LangSmith today — Go to smith.langchain.com, create a free account (the free tier caps trace volume, so check the current limits) and get your API key. Set LANGCHAIN_API_KEY and LANGCHAIN_PROJECT in your development environment.
Identify a critical flow — Pick a pipeline you use in production or testing. Add tracing with the context manager or set LANGCHAIN_TRACING_V2=true. Run one call and watch the trace appear in the LangSmith UI.
Set up error alerts — LangSmith supports webhooks. Connect Slack or email to receive notifications when the error rate exceeds a threshold. We use it to intervene before the customer notices.
Send cost feedback — After each run, calculate the cost (tokens * price per token) and send it as feedback. In a few days you'll have a cost dashboard.
Deepen your knowledge with the Pillar Guide — To master LangChain from A to Z, read our Pillar Guide on LangChain and LLM for Developers.

LangSmith gives you the visibility you need when running an AI system in production. We no longer consider it an extra — it's the dashboard that lets us sleep soundly, knowing every token spent is tracked and every anomaly is under control.

Ing. Calogero Bono
