Conference Talk 13: When to Fine-Tune with Paige Bailey

In this talk, Paige Bailey, Generative AI Developer Relations lead at Google, discusses Google’s AI landscape with a focus on Gemini models and their applications.
Author: Christian Mills
Published: July 25, 2024

This post is part of the following series:
  • Mastering LLMs Course Notes: My notes from the course Mastering LLMs: A Conference For Developers & Data Scientists by Hamel Husain and Dan Becker.

Google AI Landscape and Gemini

  • Vertex AI vs. Gemini Developer API:

    • Vertex AI: A collection of APIs, compute infrastructure, and model deployment tools available through Google Cloud, geared toward enterprise use. Comparable to the Azure OpenAI Service.
    • Gemini Developer API (through AI Studio): An easier path for rapid prototyping and personal projects. Comparable to the OpenAI APIs (see the sketch after this list).
  • Gemini Flash Fine-Tuning:

    • Gemini 1.5 Flash: Google’s fastest, most efficient, and most cost-effective model, with a 1 million token context window (and growing).
    • Supports fine-tuning, currently available through an early tester program.
  • Gemini Nano & Gemma:

    • Gemini Nano: Brief mention of its planned integration into Chrome and Pixel/Android devices (details deferred).
    • Gemma:
      • Open models built from the same research and technology as Gemini, available on Hugging Face, Kaggle, and Ollama, making local experimentation easy.
      • Kaggle hosts checkpoints, code samples, and runnable notebooks.
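
A minimal sketch of the rapid-prototyping path mentioned above, assuming the `google-generativeai` Python SDK and an API key created in AI Studio; the model name and prompt are illustrative.

```python
# Minimal sketch: calling the Gemini Developer API with an AI Studio key.
# Assumes `pip install google-generativeai` and a GOOGLE_API_KEY env variable.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

model = genai.GenerativeModel("gemini-1.5-flash")  # fast, long-context model
response = model.generate_content(
    "Summarize the difference between Vertex AI and the Gemini Developer API "
    "in two sentences."
)
print(response.text)
```

The same Gemini models are also served through Vertex AI for enterprise deployments; the client library and authentication differ.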

Generative AI and Google

  • Google’s history in machine learning: TensorFlow, the Transformer architecture and models built on it (BERT, T5), research milestones such as AlphaGo, AlphaStar, and AlphaFold, and now Gemini.
  • Generative AI extends beyond text and code, mentioning:
    • Imagen 2: Detailed image generation.
    • Chirp: Speech-to-text with multilingual capabilities and a small model footprint.
  • Gemini: Google’s flagship model (currently on version 1.5)

Gemini Model Features

  • Multimodal Understanding: Processes images, audio, text, code, video, and more simultaneously.
  • State-of-the-art Performance: Excels across various tasks, though the claims rest largely on academic benchmarks (limitations discussed later).
  • Embedded Reasoning: Strong capabilities in chain-of-thought and step-by-step reasoning.
  • Scalable Deployment: Optimized for both large-scale (Google products) and small-scale (edge devices) use cases.
  • Efficiency and Privacy: Focus on cost-effective token analysis, reduced inference compute, and on-device processing for privacy preservation.
  • Model Options:
    • Gemini 1.5 Pro: High-performance, efficient model.
    • Gemini Nano: Ultra-small model for edge deployments.
    • Gemma: Open-sourced models (2B and 7B parameters)
  • Key considerations for integration: user experience, performance, and cost trade-offs.
  • Available Options:
    • Gemini 1.5 Flash: Fast, 1 million token context window.
    • Gemini 1.5 Pro: 2 million token context window
  • Gemini Flash for Code:
    • Performs well out of the box for code generation and for structured outputs such as JSON (see the sketch after this list).
    • Fine-tuning and using code examples in the context window further enhance results.
    • Applicable to code generation, translation, debugging, code review, etc.
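
A hedged sketch of the structured-output point above: asking Gemini 1.5 Flash for JSON directly via the SDK's `response_mime_type` setting. The review task and JSON keys are illustrative assumptions, not the talk's example.

```python
# Hedged sketch: JSON-structured output from Gemini 1.5 Flash.
# Assumes genai.configure() has already been called with an API key.
import json
import google.generativeai as genai

model = genai.GenerativeModel(
    "gemini-1.5-flash",
    generation_config={"response_mime_type": "application/json"},
)

prompt = """Review the following Python function. Return JSON with the keys
"summary" (one sentence) and "issues" (a list of strings).

def add(a, b):
    return a - b
"""

review = json.loads(model.generate_content(prompt).text)
print(json.dumps(review, indent=2))
```

Adding a few in-context code examples, as noted above, typically tightens the output further.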

Understanding Context Windows

  • Importance of Context Window Size:
    • Historically limited to 2,000-8,000 tokens, hindering model capability.
    • Current models: GPT-4 Turbo (128,000), Claude (200,000), Gemini 1.5 Pro (2 million).
  • Impact of Larger Context Windows:
    • Can handle massive amounts of data (emails, texts, videos, codebases, research papers).
    • Reduces the need for fine-tuning, since more information can be provided at inference time (see the sketch after this list).
    • Allows for more complex and nuanced outputs.
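
To make the "more at inference time" point concrete, here is a minimal sketch that checks a document's token count and then passes it whole into the prompt; the SDK calls are standard, but the file path is hypothetical.

```python
# Hedged sketch: using a long context window instead of fine-tuning.
# Assumes genai.configure() has been called; the file path is hypothetical.
from pathlib import Path
import google.generativeai as genai

model = genai.GenerativeModel("gemini-1.5-pro")  # up to a 2M-token context window

corpus = Path("docs/design_notes.md").read_text()
print(model.count_tokens(corpus))  # check how much of the window this consumes

response = model.generate_content(
    [corpus, "Using only the material above, list the open design questions."]
)
print(response.text)
```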

Fine-tuning vs. Prompting vs. Retrieval

Common Questions & Trade-offs

  • Key decision points when working with large language models.
  • Considerations:
    • Prompt Design: Simple and cost-effective, though it may require detailed prompts (see the sketch after this list).
    • Fine-Tuning:
      • Increasingly difficult to justify due to maintenance overhead and rapid release of new open-source models.
      • Recommended only when other options fail or for on-premise/local data requirements.
  • Recommendations:
    • Start with Closed-Source APIs: Rapid iteration, prove product-market fit, focus on UX.
    • Hire ML Team When Necessary: If highly specialized fine-tuning becomes essential.
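
A minimal sketch of the "start with prompt design" recommendation: a detailed prompt with inline examples sent to a hosted model, no fine-tuning involved. The classification task is a hypothetical stand-in for your own use case.

```python
# Hedged sketch: detailed prompting with inline examples instead of fine-tuning.
# Assumes genai.configure() has been called; the task and labels are hypothetical.
import google.generativeai as genai

model = genai.GenerativeModel("gemini-1.5-flash")

prompt = """Classify each support ticket as BUG, FEATURE_REQUEST, or QUESTION.
Reply with the label only.

Ticket: "The export button crashes the app on Android."
Label: BUG

Ticket: "Could you add dark mode?"
Label: FEATURE_REQUEST

Ticket: "How do I reset my password?"
Label:"""

print(model.generate_content(prompt).text)
```

If prompts like this (plus retrieval) stop being enough, that is the point at which fine-tuning and a dedicated ML team become easier to justify.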

Model Evaluation & Its Importance

  • Limitations of Academic Benchmarks:
    • Example: HumanEval
      • Often misinterpreted as involving human evaluation (it doesn’t).
      • Tests a narrow scope of Python function completion with simplistic tasks.
      • Not representative of real-world software engineering or other programming languages.
  • HumanEval-X: Created to address some of HumanEval’s limitations, but still narrow in scope.
  • Key Takeaways:
    • Carefully consider the relevance and limitations of evaluation metrics.
    • Prioritize custom evaluations tailored to your specific use case and business needs.
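
A hedged sketch of what a custom evaluation might look like in practice: a handful of use-case-specific test cases scored automatically. The cases and the substring-match criterion are hypothetical placeholders for real business metrics.

```python
# Hedged sketch: a tiny use-case-specific eval loop.
# Assumes genai.configure() has been called; cases and scoring are hypothetical.
import google.generativeai as genai

model = genai.GenerativeModel("gemini-1.5-flash")

# Each case pairs a prompt with a substring the answer must contain.
cases = [
    ("Which HTTP status code means 'not found'? Answer with the number only.", "404"),
    ("Rewrite 'SELECT * FROM users' to count rows instead.", "COUNT"),
]

passed = 0
for prompt, expected in cases:
    answer = model.generate_content(prompt).text
    passed += int(expected.lower() in answer.lower())

print(f"passed {passed}/{len(cases)}")
```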

Prompting Strategies and Examples

Power of Prompting & Video Understanding

  • Detailed Example:
    • Using Gemini in AI Studio to analyze a 44-minute video.
    • Asking the model to find a specific event (paper removed from a pocket), identify information on the paper, and provide the timestamp.
    • Demonstrates the ability to understand and extract information from lengthy video content, potentially transforming video analysis workflows (see the sketch after this list).
  • Implications:
    • Transforms how we interact with video content, making it searchable and analyzable at scale.
    • Also applicable to large text documents (PDFs with images, graphs, code) for summarization, analysis, and research.
  • Prefix Caching:
    • Optimizes API calls for repeated analysis of the same codebase or repository.
    • Improves latency and grounds responses within a consistent context.
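
A hedged sketch of the video example above using the SDK's File API; the file name and prompt are hypothetical, and the prefix/context caching mentioned above would use the separate caching API (not shown).

```python
# Hedged sketch: asking Gemini 1.5 about a long video via the File API.
# Assumes genai.configure() has been called; the file name is hypothetical.
import time
import google.generativeai as genai

video = genai.upload_file("talk_recording.mp4")

# Video files are processed asynchronously; poll until the upload is ready.
while video.state.name == "PROCESSING":
    time.sleep(5)
    video = genai.get_file(video.name)

model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content([
    video,
    "Find the moment where a piece of paper is taken out of a pocket. "
    "What is written on it, and at what timestamp does it happen?",
])
print(response.text)
```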

AI Studio Overview & Examples

  • Key Features:
    • Adjust stop sequences, top-k configurations, and temperature (see the sketch after this list for the same settings in code).
    • Toggle between Gemini models (Pro, Flash, etc.).
    • Access prompt gallery, cookbook, and getting started resources.
    • View past prompts and outputs.
  • Examples:
    • Scraping GitHub issues and Stack Overflow questions for analysis.
    • Converting COBOL code to Java with specific instructions and architecture preferences.
  • Key Takeaway: With detailed instructions, models can achieve impressive results, much like a skilled contractor team.
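
The sampling controls exposed in AI Studio map onto the SDK's generation config; a minimal sketch with illustrative values:

```python
# Hedged sketch: setting AI Studio's sampling controls in code.
# Assumes genai.configure() has been called; the values are illustrative.
import google.generativeai as genai

model = genai.GenerativeModel(
    "gemini-1.5-flash",
    generation_config=genai.GenerationConfig(
        temperature=0.2,           # lower = more deterministic output
        top_k=40,                  # sample from the 40 most likely tokens
        max_output_tokens=2048,
        stop_sequences=["<END>"],  # stop if the model emits this marker
    ),
)

print(model.generate_content("Outline a plan for converting a COBOL batch job to Java.").text)
```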

Retrieval Augmented Generation

Retrieval in Google Products

  • gemini.google.com (formerly Bard):
    • Example: Querying for information about the San Francisco Ferry Building and requesting recommendations.
    • Results are grounded in Google Search, with an option to view source citations and confidence levels.
  • Personalized Retrieval: The concept can be extended to internal corporate data and codebases.
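
A hedged sketch of extending that idea to private data: embed a few documents, retrieve the closest match for a query, and ground the answer in it. The documents are illustrative, and this is not the grounding mechanism gemini.google.com uses internally.

```python
# Hedged sketch: minimal retrieval-augmented generation over private notes.
# Assumes genai.configure() has been called; the documents are illustrative.
import numpy as np
import google.generativeai as genai

docs = [
    "The Ferry Building farmers market runs Tuesday, Thursday, and Saturday.",
    "All new backend services must target Java 17 per the internal style guide.",
]

def embed(text: str, task_type: str) -> np.ndarray:
    result = genai.embed_content(
        model="models/text-embedding-004", content=text, task_type=task_type
    )
    return np.array(result["embedding"])

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

doc_vecs = [embed(d, "retrieval_document") for d in docs]
query = "Which days is the farmers market open?"
q_vec = embed(query, "retrieval_query")

# Retrieve the document with the highest cosine similarity to the query.
best = max(range(len(docs)), key=lambda i: cosine(q_vec, doc_vecs[i]))

model = genai.GenerativeModel("gemini-1.5-flash")
answer = model.generate_content(
    f"Answer using only this context:\n{docs[best]}\n\nQuestion: {query}"
)
print(answer.text)
```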

Fine-tuning Considerations and Gemma


About Me:
  • I’m Christian Mills, a deep learning consultant specializing in computer vision and practical AI implementations.
  • I help clients leverage cutting-edge AI technologies to solve real-world problems.
  • Learn more about me or reach out via email at [email protected] to discuss your project.