Conference Talk 13: When to Fine-Tune with Paige Bailey
Tags: notes, llms
In this talk, Paige Bailey, Generative AI Developer Relations lead at Google, discusses Google’s AI landscape with a focus on Gemini models and their applications.
This post is part of the following series:
- Mastering LLMs Course Notes: My notes from the course Mastering LLMs: A Conference For Developers & Data Scientists by Hamel Husain and Dan Becker.
- Google AI Landscape and Gemini
- Understanding Context Windows
- Fine-tuning vs. Prompting vs. Retrieval
- Prompting Strategies and Examples
- Retrieval Augmented Generation
- Fine-tuning Considerations and Gemma
Google AI Landscape and Gemini
Vertex AI:
- Vertex AI: A collection of APIs, compute infrastructure, and model deployment tools available through Google Cloud, geared toward enterprise use. Comparable to Azure OpenAI Service.
- Gemini Developer API (through AI Studio): Easier path for rapid prototyping and personal projects. Comparable to OpenAI APIs.
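For rapid prototyping against the Developer API, Google ships an official Python client (`google-generativeai`), but the underlying REST call is simple enough to sketch with the standard library. The snippet below is a minimal sketch, assuming the `v1beta` `generateContent` endpoint and its single-turn request shape; only the payload builder runs without a key.

```python
import json
import urllib.request

API_URL = ("https://generativelanguage.googleapis.com/v1beta/"
           "models/{model}:generateContent?key={key}")

def build_request(prompt: str) -> dict:
    # Minimal generateContent payload: one user turn with one text part.
    return {"contents": [{"parts": [{"text": prompt}]}]}

def generate(model: str, api_key: str, prompt: str) -> str:
    # POST the payload; requires an API key from AI Studio.
    req = urllib.request.Request(
        API_URL.format(model=model, key=api_key),
        data=json.dumps(build_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["candidates"][0]["content"]["parts"][0]["text"]
```

In practice the official client wraps this same call and adds streaming, safety settings, and file uploads.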
Gemini Flash Fine-Tuning:
- Gemini 1.5 Flash: Google’s fastest, most efficient, and most cost-effective model, with a 1 million token context window (and growing).
- Supports fine-tuning and is part of an early tester program.
Gemini Nano & Gemma:
- Gemini Nano: Brief mention of its planned integration into Chrome and Pixel/Android devices (details deferred).
- Gemma:
- Open-source versions of Gemini, available on Hugging Face, Kaggle, and Ollama, making local experimentation easy.
- Kaggle hosts checkpoints, code samples, and runnable notebooks.
Generative AI and Google
- Google’s history in machine learning: TensorFlow, transformer models (BERT, AlphaFold, AlphaStar, AlphaGo, T5), and now Gemini.
- Generative AI at Google extends beyond text and code; the talk highlights:
- Gemini: Google’s flagship model (currently on version 1.5)
Gemini Model Features
- Multimodal Understanding: Processes images, audio, text, code, video, and more simultaneously.
- State-of-the-art Performance: Excels across various tasks, though the claims rest on academic benchmarks (limitations discussed later).
- Embedded Reasoning: Strong capabilities in chain-of-thought and step-by-step reasoning.
- Scalable Deployment: Optimized for both large-scale (Google products) and small-scale (edge devices) use cases.
- Efficiency and Privacy: Focus on cost-effective token analysis, reduced inference compute, and on-device processing for privacy preservation.
- Model Options:
- Gemini 1.5 Pro: High-performance, efficient model.
- Gemini Nano: Ultra-small model for edge deployments.
- Gemma: Open-sourced models (2B and 7B parameters)
- Key considerations for integration: user experience, performance, and cost trade-offs.
- Available Options:
- Gemini 1.5 Flash: Fast, 1 million token context window.
- Gemini 1.5 Pro: 2 million token context window
- Gemini Flash for Code:
- Performs well for code generation and structured outputs like JSON out-of-the-box.
- Fine-tuning and using code examples in the context window further enhance results.
- Applicable to code generation, translation, debugging, code review, etc.
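Since the talk stresses that Gemini Flash handles structured outputs like JSON well out-of-the-box, a practical pattern is to ask for JSON explicitly and then parse the reply defensively. This is a hedged sketch of that pattern, not an official API feature; the prompt wording and `{"language", "issues"}` schema are illustrative assumptions, and the fake reply stands in for a real model response.

```python
import json
import re

# Illustrative schema: ask the model for machine-readable review output.
PROMPT_TEMPLATE = (
    "Review the following function and respond ONLY with a JSON object "
    'of the form {{"language": str, "issues": [str]}}.\n\n{code}'
)

def build_review_prompt(code: str) -> str:
    return PROMPT_TEMPLATE.format(code=code)

def parse_json_reply(reply: str) -> dict:
    # Models often wrap JSON in ``` fences; strip them before parsing.
    cleaned = re.sub(r"^```(?:json)?|```$", "", reply.strip(), flags=re.M).strip()
    return json.loads(cleaned)

# Stand-in for a model reply (a real call would return something similar).
fake_reply = '```json\n{"language": "python", "issues": ["no docstring"]}\n```'
result = parse_json_reply(fake_reply)
```

Parsing failures here are a useful signal: they can trigger a retry with a stricter instruction.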
Understanding Context Windows
- Importance of Context Window Size:
- Historically limited to 2,000-8,000 tokens, hindering model capability.
- Current models: GPT-4 Turbo (128,000), Claude (200,000), Gemini (2 million).
- Impact of Larger Context Windows:
- Can handle massive amounts of data (emails, texts, videos, codebases, research papers).
- Reduces the need for fine-tuning, as more information can be provided at inference time.
- Allows for more complex and nuanced outputs.
Fine-tuning vs. Prompting vs. Retrieval
Common Questions & Trade-offs
- Key decision points when working with large language models.
- Considerations:
- Prompt Design: Simple, cost-effective, but may require detailed prompts.
- Fine-Tuning:
- Increasingly difficult to justify due to maintenance overhead and rapid release of new open-source models.
- Recommended only when other options fail or for on-premise/local data requirements.
- Recommendations:
- Start with Closed-Source APIs: Rapid iteration, prove product-market fit, focus on UX.
- Hire ML Team When Necessary: If highly specialized fine-tuning becomes essential.
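The decision flow above can be sketched as a small function. This is my own encoding of the talk's heuristic (prompt first, retrieve when you need data the model lacks, fine-tune only as a last resort or for on-prem requirements), not anything prescribed by the Gemini docs.

```python
def choose_approach(fits_in_prompt: bool,
                    needs_private_data: bool,
                    prompting_exhausted: bool) -> str:
    """Prompt first, retrieve next, fine-tune last."""
    if prompting_exhausted:
        # Only after prompt design and retrieval have failed.
        return "fine-tuning"
    if needs_private_data or not fits_in_prompt:
        return "retrieval augmented generation"
    return "prompt design"

# A task whose context fits in the window and uses no private data:
first_choice = choose_approach(True, False, False)
```

The boolean inputs are deliberately coarse; in practice each is itself a judgment call about cost, latency, and data freshness.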
Model Evaluation & Its Importance
- Limitations of Academic Benchmarks:
- Example: HumanEval
- Often misinterpreted as involving human evaluation (it doesn’t).
- Tests a narrow scope of Python function completion with simplistic tasks.
- Not representative of real-world software engineering or other programming languages.
- HumanEval X: Created to address some limitations of HumanEval, but still has limitations.
- Key Takeaways:
- Carefully consider the relevance and limitations of evaluation metrics.
- Prioritize custom evaluations tailored to your specific use case and business needs.
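A custom code-generation eval need not be elaborate. The sketch below shows the core of a HumanEval-style harness: execute each candidate completion, run assert-based tests against it, and report a pass rate. (Real harnesses sandbox the `exec` call; this simplified version does not.)

```python
def passes(candidate_code: str, tests: list[str]) -> bool:
    """Run candidate code, then each assert-style test, in a shared namespace."""
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)
        for test in tests:
            exec(test, namespace)
    except Exception:
        return False
    return True

def pass_rate(candidates: list[str], tests: list[str]) -> float:
    results = [passes(c, tests) for c in candidates]
    return sum(results) / len(results)

# Two model completions for the same task; one is wrong.
good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\n    return a - b"
rate = pass_rate([good, bad], ["assert add(2, 3) == 5"])
```

The point of the talk stands: the tests you write here should reflect your actual use case (your languages, your codebase's idioms), not generic Python puzzles.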
Prompting Strategies and Examples
Power of Prompting & Video Understanding
- Detailed Example:
- Using Gemini in AI Studio to analyze a 44-minute video.
- Asking the model to find a specific event (paper removed from a pocket), identify information on the paper, and provide the timestamp.
- Demonstrates the ability to understand and extract information from lengthy video content, potentially revolutionizing video analysis workflows.
- Implications:
- Transforms how we interact with video content, making it searchable and analyzable at scale.
- Also applicable to large text documents (PDFs with images, graphs, code) for summarization, analysis, and research.
- Prefix Caching:
- Optimizes API calls for repeated analysis of the same codebase or repository.
- Improves latency and grounds responses within a consistent context.
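The idea behind prefix caching can be modeled in a few lines: the expensive work (processing a huge shared context such as a whole repository) happens once per unique prefix, and subsequent questions against the same prefix are cache hits. This toy class is an illustration of the concept only; the Gemini API exposes the real mechanism as explicit context caching.

```python
import hashlib

class PrefixCache:
    """Toy model of prefix caching: process a long shared context once,
    then reuse it across many follow-up questions."""

    def __init__(self) -> None:
        self._processed: dict[str, str] = {}

    def _key(self, prefix: str) -> str:
        return hashlib.sha256(prefix.encode()).hexdigest()

    def ask(self, prefix: str, question: str) -> tuple[str, bool]:
        key = self._key(prefix)
        cache_hit = key in self._processed
        if not cache_hit:
            # In a real API this is the expensive step (tokenizing and
            # attending over the whole codebase); here we just record it.
            self._processed[key] = prefix
        return f"{prefix}\n\n{question}", cache_hit

cache = PrefixCache()
_, hit1 = cache.ask("<entire repo>", "Where is auth handled?")
_, hit2 = cache.ask("<entire repo>", "List the public endpoints.")
```

The latency and cost win comes entirely from the second call onward, which is why it suits repeated analysis of the same codebase.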
AI Studio Overview & Examples
- Key Features:
- Adjust stop sequences, top-k configurations, and temperature.
- Toggle between Gemini models (Pro, Flash, etc.).
- Access prompt gallery, cookbook, and getting started resources.
- View past prompts and outputs.
- Examples:
- Scraping GitHub issues and Stack Overflow questions for analysis.
- Converting COBOL code to Java with specific instructions and architecture preferences.
- Key Takeaway: With detailed instructions, models can achieve impressive results, much like a skilled contractor team.
Retrieval Augmented Generation
Retrieval in Google Products
- gemini.google.com (formerly Bard):
- Example: Querying for information about the San Francisco Ferry Building and requesting recommendations.
- Results are grounded in Google Search, with an option to view source citations and confidence levels.
- Personalized Retrieval: The concept can be extended to internal corporate data and codebases.
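Extending grounding to your own data follows the same shape as the Bard/Gemini example: retrieve relevant documents, then instruct the model to answer only from them, with citations. Below is a deliberately minimal sketch using naive word-overlap retrieval (a real system would use embeddings and a vector store); the corpus and document ids are made up for illustration.

```python
def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda doc_id: len(q & set(corpus[doc_id].lower().split())),
        reverse=True,
    )
    return scored[:k]

def grounded_prompt(query: str, corpus: dict[str, str]) -> str:
    doc_ids = retrieve(query, corpus)
    sources = "\n".join(f"[{d}] {corpus[d]}" for d in doc_ids)
    return ("Answer using ONLY the sources below and cite them by id.\n"
            f"{sources}\n\nQuestion: {query}")

corpus = {
    "doc1": "The Ferry Building hosts a farmers market on Saturdays.",
    "doc2": "Gemma models come in 2B and 7B parameter sizes.",
}
prompt = grounded_prompt("When is the Ferry Building farmers market?", corpus)
```

The "cite them by id" instruction is what makes the answer auditable, mirroring the source citations shown in gemini.google.com.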
Fine-tuning Considerations and Gemma
- Fine-Tuning:
- Should be approached with caution and a clear understanding of the maintenance commitment.
- Consider the rapid evolution of open-source models.
- Gemma Family:
- Solid starting point for open-source fine-tuning.
- Available in 2B and 7B parameter sizes, with both instruction-tuned and non-instruction-tuned variants.
- CodeGemma: For code-related tasks.
- RecurrentGemma: For sequential data.
- PaliGemma: Open-vision language model.
- Resources:
- Deployment: Easy one-click deployment to Google Cloud.
- Model Builders: Provides automatic comparisons and prompt management.
About Me:
I’m Christian Mills, a deep learning consultant specializing in practical AI implementations. I help clients leverage cutting-edge AI technologies to solve real-world problems.
Interested in working together? Fill out my Quick AI Project Assessment form or learn more about me.