Office Hours 5: LangChain/LangSmith
This Q&A session on LangChain/LangSmith covers product differentiation, features, use cases, agent workflows, dataset creation, and full-stack development for ML engineers.
This post is part of the following series:
- Mastering LLMs Course Notes: My notes from the course Mastering LLMs: A Conference For Developers & Data Scientists by Hamel Husain and Dan Becker.
LangSmith’s Position in the Observability Market
- Question: How does LangSmith differentiate itself from other observability tools in the market?
- Answer:
  - LLM Application Focus: LangSmith is specifically designed for LLM applications, offering specialized features like message and document visualization for debugging.
    - Guide: Log retriever traces
  - Chains of LLM Calls: It emphasizes visualizing and analyzing entire chains of LLM calls and retrieval steps, which is crucial for complex applications.
  - Human-in-the-Loop Features: LangSmith prioritizes human interaction with features like:
    - Data visualization
    - Annotation queues for collaboration with subject matter experts
    - Side-by-side comparisons for evaluating improvements
    - Alignment of evaluators with human preferences
    - How-to guides: Human feedback
  - Pairwise Evaluation: LangSmith enables pairwise evaluation of models, leading to more stable results.
    - Guide: Run pairwise evaluations
  - Strong Support and Openness: LangSmith is praised for its excellent support, responsive team, and open APIs that allow integration with other tools.
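Pairwise evaluation asks a judge which of two outputs is better for the same input, instead of scoring each output on an absolute scale. The sketch below illustrates the idea in plain Python and does not use LangSmith's API: the `judge` callable is a hypothetical stand-in for an LLM-as-judge prompt or a human annotator, and `longer_answer_judge` is a toy judge for illustration only. Each pair is shown in a random order so that a judge with position bias does not systematically favor one system.

```python
import random
from typing import Callable, Sequence

# Hypothetical judge: given (input, first, second), return 0 if the first
# candidate is preferred and 1 if the second is.
Judge = Callable[[str, str, str], int]


def pairwise_win_rate(
    inputs: Sequence[str],
    outputs_a: Sequence[str],
    outputs_b: Sequence[str],
    judge: Judge,
    seed: int = 0,
) -> float:
    """Return the fraction of examples on which system A beats system B."""
    if not inputs or not (len(inputs) == len(outputs_a) == len(outputs_b)):
        raise ValueError("inputs and both output lists must be non-empty and equal length")
    rng = random.Random(seed)
    wins_a = 0
    for question, a, b in zip(inputs, outputs_a, outputs_b):
        # Randomize presentation order so a position-biased judge
        # does not systematically favor either system.
        if rng.random() < 0.5:
            a_won = judge(question, a, b) == 0
        else:
            a_won = judge(question, b, a) == 1
        if a_won:
            wins_a += 1
    return wins_a / len(inputs)


def longer_answer_judge(question: str, first: str, second: str) -> int:
    """Toy judge for illustration only: prefers the longer answer."""
    return 0 if len(first) >= len(second) else 1


if __name__ == "__main__":
    questions = ["q1", "q2", "q3", "q4"]
    system_a = ["a detailed answer", "short", "another detailed answer", "ok"]
    system_b = ["brief", "a much longer reply", "meh", "fine"]
    print(pairwise_win_rate(questions, system_a, system_b, longer_answer_judge))  # 0.5
```

Comparing two outputs side by side is usually an easier decision for a judge than assigning an absolute score, which is one reason pairwise results tend to be more stable.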
LangSmith’s Support for Human Annotation and Action Items
- Question: What support does LangSmith offer for human annotation, annotation queues, and taking action on user feedback?
- Answer:
  - Annotation Queues:
    - Data can be sent to annotation queues programmatically (e.g., when a user gives a thumbs down) or manually.
    - Annotators can provide feedback, edit outputs, and add corrected examples to datasets.
    - Guide: Use annotation queues
  - Datasets for Improvement: Corrected examples in datasets can be used for testing and future model improvement.
    - Concept: Evaluation
  - Few-Shot Learning: LangSmith aims to be a platform for gathering few-shot example datasets, which can be used for personalization by pulling down the most similar examples during runtime.
Understanding the LangChain “Lang” Namespace
- Question: What’s the difference between LangChain, LangSmith, LangGraph, Langflow, and LangServe?
- Answer:
  - LangChain: The foundational open-source package for building LLM apps, offering a runtime, abstractions, integrations, and off-the-shelf chains.
  - Langflow (not made by the LangChain company): A low-code/no-code UI built on top of LangChain.
  - LangServe: Simplifies deploying LangChain applications by exposing them as FastAPI endpoints.
  - LangGraph: An extension of LangChain specifically designed for building and managing highly controllable agent-based workflows.
  - LangSmith: A standalone observability and testing tool for LLM apps, usable with or without LangChain.
When to Use LangChain vs. LangGraph
- Question: When would you choose LangChain, and when is LangGraph the better option?
- Answer:
  - LangChain: Ideal for beginners and for rapidly prototyping simple LLM applications with single LLM calls.
  - LangGraph: Suited for advanced teams building complex, agentic workflows that require:
    - Cyclical agent execution
    - Fine-grained control
    - Built-in persistence
    - Streaming and background modes
Popularity of TypeScript vs. Python in LLM Tools
- Question: How does the usage of TypeScript APIs compare to Python APIs in LangChain and related tools?
- Answer:
  - Python Dominates: Python remains more popular overall, possibly due to:
    - A larger community focused on LLM application prototyping.
    - A stronger ecosystem for data engineering tasks related to retrieval.
  - TypeScript for Generative UI: TypeScript is gaining traction, especially for applications involving generative UI, which is more challenging to implement in Python.
Generative UI Explained
- Question: What is generative UI, and how does it work?
- Answer:
  - Beyond Simple Chat: Generative UI lets an LLM return more than text: it can also send UI components, creating richer interfaces.
    - Example: Instead of a list of weather data, an LLM might return a dynamic graph component with zoom and interaction capabilities.
  - Vercel AI SDK Integration: LangChain now integrates with Vercel’s AI SDK for easier development of generative UI experiences.
    - Vercel AI SDK: A TypeScript toolkit designed to help developers build AI-powered applications with React, Next.js, Vue, Svelte, Node.js, and more.
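One way to picture generative UI is that the model emits a structured description of a component (a name plus props) instead of prose, and the application maps that description onto a real component. In practice this is typically done in TypeScript with the Vercel AI SDK and a frontend framework; the Python sketch below only illustrates the dispatch pattern. The JSON shape, the `WeatherChart` component name, and the renderer functions are all hypothetical, and the model output is simulated rather than produced by a real LLM call.

```python
import json
from typing import Callable


# Hypothetical renderers; a real app would return UI components, not strings.
def render_weather_chart(props: dict) -> str:
    temps = props["temperatures"]
    return f"<WeatherChart city={props['city']!r} points={len(temps)} zoomable>"


def render_text(props: dict) -> str:
    return props["text"]


RENDERERS: dict[str, Callable[[dict], str]] = {
    "WeatherChart": render_weather_chart,
    "Text": render_text,
}


def render_model_output(raw: str) -> str:
    """Parse a JSON component spec from the model and dispatch to a renderer.

    Falls back to showing the raw text if the output is not a valid,
    known component spec.
    """
    try:
        spec = json.loads(raw)
        renderer = RENDERERS[spec["component"]]
        return renderer(spec["props"])
    except (json.JSONDecodeError, KeyError, TypeError):
        return raw


# Simulated model output standing in for a real LLM response.
model_output = json.dumps(
    {"component": "WeatherChart", "props": {"city": "Berlin", "temperatures": [18, 21, 19]}}
)
print(render_model_output(model_output))  # <WeatherChart city='Berlin' points=3 zoomable>
```

The fallback branch matters in practice: a model will sometimes answer in plain text, and the interface should degrade to a normal chat message rather than fail.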
Defining “Agentic” in the Context of LLMs
- Question: What does “agentic” mean in the context of LLMs, and is it a significant distinction?
- Answer:
  - LLM in Control: An agentic system is one where the LLM controls the application’s control flow and decision-making process.
  - More Than Function Calling: While related to function calling, agentic systems go further by enabling LLMs to loop, adapt, and make dynamic decisions about the next steps.
  - Implications for Development: This distinction introduces new challenges and considerations in UX design, observability, and testing.
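The difference from a single function call is easiest to see as a loop: the model picks the next step, the application executes it, and the result is fed back until the model decides it is done. In this sketch `fake_llm` is a hypothetical, hard-coded policy standing in for a real model with tool calling, the two tools are toys, and the `max_steps` cap is a safeguard against runaway loops.

```python
from typing import Callable

# Tools the model is allowed to call (toy implementations).
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda q: f"3 results for {q!r}",
    "summarize": lambda text: f"summary of ({text})",
}


def fake_llm(history: list[str]) -> tuple[str, str]:
    """Hypothetical stand-in for an LLM: returns (action, argument).

    A real agent would prompt a model with the history and parse a tool call.
    """
    if not any(h.startswith("search -> ") for h in history):
        return "search", "LangGraph persistence"
    if not any(h.startswith("summarize -> ") for h in history):
        return "summarize", history[-1].split(" -> ", 1)[1]
    return "finish", history[-1].split(" -> ", 1)[1]


def run_agent(llm: Callable[[list[str]], tuple[str, str]], max_steps: int = 5) -> tuple[str, list[str]]:
    history: list[str] = []
    for _ in range(max_steps):
        action, arg = llm(history)  # the model, not the code, chooses the next step
        if action == "finish":
            return arg, history
        result = TOOLS[action](arg)
        history.append(f"{action} -> {result}")
    raise RuntimeError("agent did not finish within max_steps")


answer, trace = run_agent(fake_llm)
print(answer)  # summary of (3 results for 'LangGraph persistence')
```

The trace returned alongside the answer is exactly the kind of step-by-step record that makes observability and testing harder, and more important, for agentic systems than for single calls.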
LangChain/LangSmith Features for Agentic Workflows
- Question: What features in LangChain/LangSmith specifically aid in developing and managing agentic workflows?
- Answer:
  - LangGraph’s Strengths:
    - Controllability: LangGraph’s low-level design provides a high degree of control, which is essential for managing complex agents.
    - Persistence and Human-in-the-Loop: Built-in persistence and easy access to execution history enable checkpointing, resuming from specific states, and human intervention when needed.
  - LangSmith’s Role: While not agent-specific, LangSmith’s observability features are particularly valuable for debugging and understanding complex, agentic applications.
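Checkpointing means saving the workflow's state after each step so a run can be paused, inspected or edited by a human, and resumed from that point instead of starting over. LangGraph provides this through its built-in persistence (checkpointers); the sketch below does not use LangGraph's API and only shows the underlying idea, using a hypothetical in-memory `CheckpointStore`, a hypothetical three-step workflow, and a step that pauses until a human approves it.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class CheckpointStore:
    """Hypothetical in-memory store mapping a thread id to its saved states."""
    history: dict[str, list[dict]] = field(default_factory=dict)

    def save(self, thread_id: str, state: dict) -> None:
        # Deep-copy so later mutations don't alter saved checkpoints.
        self.history.setdefault(thread_id, []).append(copy.deepcopy(state))

    def latest(self, thread_id: str) -> dict:
        return copy.deepcopy(self.history[thread_id][-1])


STEPS = ["draft", "review", "send"]


def run(store: CheckpointStore, thread_id: str, state: dict) -> dict:
    """Advance through STEPS, checkpointing after each one.

    Pauses before 'send' until a human sets state['approved'] = True.
    """
    while state["step"] < len(STEPS):
        name = STEPS[state["step"]]
        if name == "send" and not state.get("approved"):
            state["status"] = "waiting_for_human"
            store.save(thread_id, state)
            return state
        state["log"].append(name)
        state["step"] += 1
        store.save(thread_id, state)
    state["status"] = "done"
    store.save(thread_id, state)
    return state


store = CheckpointStore()
first = run(store, "thread-1", {"step": 0, "log": []})
print(first["status"])  # waiting_for_human

# Later: a human approves, and the run resumes from the latest checkpoint.
resumed = store.latest("thread-1")
resumed["approved"] = True
final = run(store, "thread-1", resumed)
print(final["status"], final["log"])  # done ['draft', 'review', 'send']
```

Because every intermediate state is kept, the same store also supports resuming from an earlier checkpoint (for example, re-running from "review" with edited state), which is the "resuming from specific states" capability described above.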
Multiple LLM Collaboration in Practice
- Question: Is the idea of using multiple LLMs with different strengths in a single application realistic?
- Answer:
  - Planning and Execution: A common pattern involves a powerful LLM (e.g., GPT-4) for high-level planning and decision-making, while specialized or more cost-effective models (e.g., specialized code generation models) handle specific tasks.
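The planning/execution split can be sketched as a router: one model produces a plan of typed subtasks, and each subtask is dispatched to whichever model is configured for that task type. All model names, the `ROUTES` table, and `call_model` below are hypothetical placeholders, and the plan is hard-coded instead of coming from a real planner call. The point is that the routing table, not the planner, decides which (possibly cheaper or more specialized) model runs each step.

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    kind: str  # e.g. "code" or "text"
    prompt: str


# Hypothetical routing table: task type -> model identifier.
ROUTES = {
    "code": "specialized-code-model",
    "text": "small-general-model",
}
PLANNER_MODEL = "large-planner-model"


def call_model(model: str, prompt: str) -> str:
    """Placeholder for a real API call; echoes which model handled the prompt."""
    return f"[{model}] {prompt}"


def plan(task: str) -> list[Subtask]:
    # A real planner would ask PLANNER_MODEL for structured subtasks and parse
    # its reply; this stub ignores the reply and returns a fixed plan.
    call_model(PLANNER_MODEL, f"Plan: {task}")
    return [
        Subtask("code", "write a function that parses the CSV"),
        Subtask("text", "explain the function in one paragraph"),
    ]


def execute(task: str) -> list[str]:
    # Each subtask goes to the model registered for its kind.
    return [call_model(ROUTES[s.kind], s.prompt) for s in plan(task)]


for line in execute("build a CSV parser with docs"):
    print(line)
```

Keeping the routing in a table makes it easy to swap a subtask type to a different model and re-run the same evaluation set to compare cost and quality.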
Building Evaluation Sets with LangSmith
- Question: What’s the most effective way to use LangSmith for creating evaluation sets?
- Answer:
  - Manual Seeding: Begin with a small set (5-10) of manually crafted examples.
  - Production Feedback Loop: Integrate with production logs to capture real-user interactions and identify edge cases.
  - Iterative Refinement:
    - Manually add challenging or interesting cases to the dataset.
    - Encourage user feedback and incorporate relevant examples.
    - Consider synthetic data generation to expand the dataset, but prioritize human review and labeling.
  - Concepts: Evaluation
  - How-to guides: Evaluation
  - Tutorial: Evaluate your LLM application
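The seeding-plus-feedback workflow can be sketched as a small dataset builder: hand-written seed examples go in directly, production runs with negative feedback are queued for review, and synthetic examples are held back until a human approves them. The `EvalSetBuilder` class and its methods are hypothetical; in LangSmith the equivalent steps would roughly correspond to datasets and annotation queues rather than this code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Example:
    prompt: str
    expected: str  # for queued production runs, holds the model's output until corrected
    source: str    # "seed", "production", or "synthetic"


@dataclass
class EvalSetBuilder:
    dataset: list[Example] = field(default_factory=list)
    review_queue: list[Example] = field(default_factory=list)

    def add_seed(self, prompt: str, expected: str) -> None:
        # Hand-written seed examples go straight into the dataset.
        self._add(Example(prompt, expected, "seed"))

    def log_production(self, prompt: str, output: str, thumbs_up: bool) -> None:
        # Only runs with negative feedback are queued for human review.
        if not thumbs_up:
            self.review_queue.append(Example(prompt, output, "production"))

    def propose_synthetic(self, prompt: str, expected: str) -> None:
        # Synthetic examples are never added directly; a human reviews them first.
        self.review_queue.append(Example(prompt, expected, "synthetic"))

    def review(self, index: int, approved: bool, corrected: Optional[str] = None) -> None:
        # The reviewer accepts (optionally correcting the expected answer) or rejects.
        ex = self.review_queue.pop(index)
        if approved:
            expected = corrected if corrected is not None else ex.expected
            self._add(Example(ex.prompt, expected, ex.source))

    def _add(self, ex: Example) -> None:
        # Skip duplicate prompts so a single edge case is not over-weighted.
        if all(existing.prompt != ex.prompt for existing in self.dataset):
            self.dataset.append(ex)


builder = EvalSetBuilder()
builder.add_seed("What is 2 + 2?", "4")
builder.log_production("Capital of Australia?", "Sydney", thumbs_up=False)
builder.log_production("Capital of France?", "Paris", thumbs_up=True)
builder.propose_synthetic("Capital of Atlantis?", "Poseidonia")
builder.review(0, approved=True, corrected="Canberra")  # fix the bad production answer
builder.review(0, approved=False)  # reject the unanswerable synthetic candidate
for ex in builder.dataset:
    print(ex.source, ex.prompt, "->", ex.expected)
```

Run as written, the dataset ends up with the seed example and the corrected production failure, while the rejected synthetic candidate never reaches it.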
Recommended Stack for Full-Stack ML Engineers
- Question: What’s a good technology stack for Python-centric ML engineers who want to build and ship full-stack applications?
- Answer:
  - Python-First Options:
    - Streamlit/Gradio: Excellent for rapid prototyping and simpler applications.
    - LangChain Templates: Explore and adapt existing LangChain repositories with Python backends.
    - LangServe: Easily deploy LangChain apps.
  - Long-Term Goal: Aim to become proficient in 2-3 languages (Python, JavaScript/TypeScript, SQL) for greater flexibility and control over the entire application stack.
  - Tips:
    - Leverage LLMs (like ChatGPT) to assist with JavaScript/TypeScript code generation and understanding.
    - Don’t shy away from forking and modifying existing repositories to learn and adapt.