Office Hours 1: Axolotl Q&A with Wing Lian
This Q&A session covered various topics, including template-free prompt construction, data type selection for HuggingFace datasets, DPO and RLHF, understanding chat templates and dataset types, ensuring consistent tokenization, multimodal fine-tuning, and future directions for Axolotl.
This post is part of the following series:
- Mastering LLMs Course Notes: My notes from the course Mastering LLMs: A Conference For Developers & Data Scientists by Hamel Husain and Dan Becker.
1. Template-Free Prompt Construction in Axolotl
- Purpose: Offers flexibility in defining custom chat roles and formats.
- Format: Uses simple input-output pairs with labels (true for model output, false for user input).
- Advantages:
- Easier to understand for some users.
- Translates well to platforms with existing input-output data.
- Disadvantages:
- Less flexible for changing chat templates later.
- Requires more manual string handling during inference.
- Recommendation: Use only if existing chat templates are insufficient.
- Documentation: https://openaccess-ai-collective.github.io/axolotl/docs/input_output.html
- Example:
{ "segments": [ { "label": true, "text": "<s>Hello\n" }, { "label": true, "text": "hi there!. " }, { "label": false, "text": "goodbye " }, { "label": true, "text": "farewell</s>" } ] }
2. How to Decide Data Type for Datasets on HuggingFace?
- Dataset Types:
- Alpaca: Instruction-based, with separate fields for instruction, input, and output.
- ShareGPT: Conversation-based, with variations in field names (e.g., human/user, GPT/assistant, value/content). Axolotl's `type: sharegpt` handles most variations.
- DPO: Includes chosen and rejected responses for preference learning. Variations exist in field names and prompt construction.
- KTO: Similar to DPO but without explicit preference pairs.
- User Defined: Allows custom formatting defined in the YAML file.
- Recommendation: Choose the type that best matches the dataset structure and fine-tuning objective.
- Examples:
- argilla/dpo-mix-7k: Chosen, rejected fields.
- Intel/orca_dpo_pairs: Question, chosen, rejected fields.
- argilla/kto-mix-15k: Prompt, completion, label, rating.
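For orientation, here is roughly what a single row looks like for a few of these types. The field names vary between datasets, so these rows are illustrative assumptions rather than excerpts from the datasets above:

```python
# Illustrative rows showing which Axolotl dataset type each structure maps to.

alpaca_row = {  # -> type: alpaca (instruction / input / output)
    "instruction": "Summarize the passage.",
    "input": "Axolotl is a framework for fine-tuning LLMs...",
    "output": "Axolotl streamlines LLM fine-tuning.",
}

sharegpt_row = {  # -> type: sharegpt (multi-turn conversations)
    "conversations": [
        {"from": "human", "value": "What is Axolotl?"},
        {"from": "gpt", "value": "A fine-tuning framework for LLMs."},
    ],
}

dpo_row = {  # -> DPO-style preference data (chosen vs. rejected)
    "question": "What is Axolotl?",
    "chosen": "A fine-tuning framework for LLMs.",
    "rejected": "A type of salamander.",
}
```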
3. DPO and RLHF
- DPO (Direct Preference Optimization):
- Trains a model to maximize the probability of choosing preferred responses over rejected ones.
- Simpler to implement than RLHF but limited to single-turn preference learning.
- RLHF (Reinforcement Learning from Human Feedback):
- Uses reinforcement learning to optimize a model based on human feedback.
- More complex but potentially leads to higher quality alignment and multi-turn capabilities.
- Future Direction: Both DPO and RLHF are important, with a potential shift towards more robust RL-based methods.
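For reference, the DPO objective from the original paper (Rafailov et al., 2023): it widens the margin between the log-probability of the chosen response $y_w$ and the rejected response $y_l$, each measured relative to a frozen reference model $\pi_{\text{ref}}$, with $\beta$ controlling the strength of the constraint:

$$
\mathcal{L}_{\text{DPO}}(\pi_\theta;\ \pi_{\text{ref}}) = -\,\mathbb{E}_{(x,\, y_w,\, y_l)\sim \mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right]
$$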
4. Difference Between Chat Template and Datasets Type Parameters
- Chat Template: Defines the specific format of the chat conversation (e.g., Llama 3, Mistral, ChatML). It gets added to the tokenizer config.
- Datasets Type: Specifies how Axolotl should parse and structure the input dataset (e.g., sharegpt, alpaca, dpo).
- Interaction:
- Setting a chat template can automatically set the output format for certain dataset types.
- The `chat_template` parameter in the YAML file overrides any default settings.
5. No Ops for Validation
- Currently, Axolotl lacks built-in validation checks for potential issues like:
- Rounding errors when saving models in different precision formats (e.g., float32 vs. bfloat16).
- Tokenization discrepancies between training and inference.
- Recommendation:
- Carefully manage model precision during saving.
- Implement custom checks to compare tokenization between training and inference pipelines.
6. Trust No One for Tokenization
- Key Takeaway: Always verify the actual tokens being fed into the model, as string handling and YAML parsing can introduce subtle errors.
- Example: YAML can remove trailing spaces in non-quoted strings, potentially affecting tokenization.
- Recommendation: Implement rigorous checks to ensure consistent tokenization between training and inference.
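A quick way to see the trailing-space gotcha for yourself (the keys here are hypothetical, not Axolotl settings):

```python
import yaml

# YAML strips trailing whitespace from plain (unquoted) scalars, so two values
# that look identical in the config file can tokenize differently.
cfg = yaml.safe_load(
    'quoted_suffix: "Answer "\n'  # quoted scalar: trailing space preserved
    "plain_suffix: Answer \n"     # plain scalar: trailing space dropped
)

assert cfg["quoted_suffix"] == "Answer "
assert cfg["plain_suffix"] == "Answer"  # the space is silently gone
```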
7. Ensuring Consistent Tokenization
- Challenge: Tokenization differences can arise from:
- Separate vs. concatenated tokenization of input and output strings.
- Special token handling in different chat templates.
- Recommendations:
- Use the `chat_template` parameter in Axolotl to enforce consistent formatting.
- Implement tests to compare tokenization between fine-tuning and inference setups (a sketch follows this list).
- Consider introducing minor tokenization variations during training as a form of data augmentation.
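A hedged sketch of such a test, comparing the token IDs your training pipeline produced against what the inference-time chat template yields for the same conversation. The model name is a placeholder, and `training_ids` would in practice come from your preprocessed Axolotl dataset:

```python
from transformers import AutoTokenizer

# Placeholder model; swap in the tokenizer of the checkpoint you fine-tuned.
tok = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

messages = [
    {"role": "user", "content": "What is Axolotl?"},
    {"role": "assistant", "content": "A fine-tuning framework for LLMs."},
]

# Token ids an inference engine will see via the tokenizer's chat template.
inference_ids = tok.apply_chat_template(messages, tokenize=True)

# In practice, pull this from the input_ids your Axolotl preprocessing produced
# for the same conversation; the assignment below is only a placeholder.
training_ids = inference_ids

assert training_ids == inference_ids, (
    "Tokenization mismatch:\n"
    f"train:     {tok.convert_ids_to_tokens(training_ids)}\n"
    f"inference: {tok.convert_ids_to_tokens(inference_ids)}"
)
```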
8. Tokenizer Configs from Training to Inference
- Importance: Consistent tokenizer configurations are crucial for seamless transition from training to inference.
- Axolotl’s Approach: Setting the `chat_template` parameter in the YAML file updates the tokenizer config, which is then used by inference engines.
- Challenge: Not all inference engines may fully support or utilize the chat template information.
- Recommendation: Verify that the chosen inference engine correctly interprets and applies the tokenizer config, including the chat template.
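One simple check before deploying: load the fine-tuned checkpoint and confirm the chat template is actually present in the saved tokenizer config (the output path below is a placeholder):

```python
from transformers import AutoTokenizer

# Path to your Axolotl output directory (placeholder).
tok = AutoTokenizer.from_pretrained("./outputs/my-finetune")

# If the YAML chat_template setting took effect, this prints the Jinja template
# (e.g., ChatML) that engines reading tokenizer_config.json should pick up;
# None means no template was saved with the checkpoint.
print(tok.chat_template)
```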
9. Multimodal Fine-tuning
- Current Status: Axolotl lacks native support for multimodal datasets and models.
- Challenges:
- Handling image data and integrating it with text data.
- Adapting to evolving approaches for multimodal tokenization and model architectures.
- Future Direction:
- Implementing dataset handling for images and other modalities.
- Potentially supporting both LLaVA-like approaches and native multimodal models.
- Call for Contributions: Help is needed in developing and implementing multimodal capabilities.
10. Is RLHF Still a Common Fine-tuning Technique?
- Answer: Yes, RLHF and other preference-based tuning methods (like DPO) are becoming increasingly common.
- Reasoning:
- Supervised fine-tuning has limitations in achieving high-quality alignment.
- RLHF and DPO enable learning from human preferences, leading to better model behavior.
- Future Trend: Expect to see wider adoption of both RL-based and non-RL preference optimization techniques.
11. DPO Limitations and RL Advantages
- DPO Limitation: Primarily designed for single-turn preference learning.
- RL Advantages:
- Supports multi-turn conversations and intermediate rewards.
- Can lead to better alignment and more nuanced model behavior.
- Trade-offs:
- RLHF is more complex and data-intensive than DPO.
- DPO is simpler to implement and doesn’t require a separate reward model.
12. Sample Files for PaliGemma and Phi-3
- Phi-3:
- Should work with existing Phi-2 configurations by swapping the baseline model.
- May require setting `trust_remote_code: true` in the YAML file.
- PaliGemma:
- No specific examples available yet.
- LLM fine-tuning might be possible, but full support requires multimodal dataset handling.
13. Conversational Datasets vs. QA Pairs
- Assumption: Conversational datasets are always more effective for fine-tuning.
- Clarification: The choice depends on the specific use case and desired model behavior.
- Recommendations:
- QA Pairs: Suitable for single-turn interactions or when mimicking a retrieval-based system.
- Conversational Datasets: Beneficial for training models to engage in multi-turn dialogue.
- Instruction Tuning: Recommended for gaining intuition about conversational datasets and fine-tuning.
14. Training Datasets for Completion Models
- Dataset Characteristics: Typically similar to pre-training datasets, often with a single “text” field.
- Examples:
- Story generation datasets.
- Any dataset focused on text completion or continuation.
15. Prompt Template for Llama 3 and LlamaIndex
- Goal: Fine-tune Llama 3 for use with LlamaIndex, which uses an OpenAI-like message abstraction.
- Recommendation:
- Choose a chat template that aligns with the message-based format (e.g., ChatML).
- Avoid instruction-based templates as they might not be suitable for multi-turn interactions.
16. Future Directions of Axolotl
- Areas for Contribution:
- Join the Axolotl Discord server and contribute to discussions.
- Explore the GitHub repository for open issues and feature requests.
- Develop new features and improve existing ones.
- Ongoing Development:
- Building a turnkey platform for simplified fine-tuning and deployment (similar to Modal).
- Integrating DPO, PPO, and enhanced dataset pipelines.
- Creating a user-friendly CLI and cloud integration.
17. VRAM Estimation
- Need: A tool for accurate VRAM estimation based on Axolotl configurations.
- Challenges:
- Complexities introduced by techniques like FSDP and DeepSpeed.
- Variations in VRAM usage based on batch sizes and model parallelism.
- Potential Approach: Leverage existing LLM math estimations and account for the impact of distributed training techniques.
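As a rough illustration of the "LLM math" starting point (full fine-tuning with Adam in mixed precision; the numbers are assumptions and ignore activations, LoRA, and FSDP/DeepSpeed sharding):

```python
# Back-of-the-envelope VRAM for full fine-tuning with Adam in mixed precision.
# Ignores activations, CUDA context, and sharding across GPUs, so treat it as a
# floor for the single-GPU case, not a prediction.
params_billion = 7  # model size in billions of parameters (assumption)
bytes_per_param = (
    2            # bf16 weights
    + 2          # bf16 gradients
    + 4 + 4 + 4  # fp32 Adam first/second moments + fp32 master weights
)
vram_gb = params_billion * bytes_per_param
print(f"~{vram_gb} GB before activations and overhead")  # ~112 GB for a 7B model
```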
18. Vibe Checks During Training
- Goal: Evaluate model performance and “vibes” during training.
- Options:
- Periodic Checkpointing: Pause training, run inference on the checkpoint, and resume.
- Dedicated Evaluation: Use a separate process to run inference on an eval dataset and log the results.
- Callbacks: Implement callbacks to trigger inference on demand during training (sketched below).
- Challenges:
- VRAM limitations might make it difficult to run inference alongside training.
- Ensuring consistent tokenization and prompt handling between training and evaluation.
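A hypothetical sketch of the callback option using the Hugging Face Trainer API (not an Axolotl built-in; generating at save time may need extra care with gradient checkpointing and VRAM):

```python
from transformers import TrainerCallback

class VibeCheckCallback(TrainerCallback):
    """Generate one sample completion whenever a checkpoint is saved."""

    def __init__(self, tokenizer, prompt="### Instruction: Tell me a joke.\n### Response:"):
        self.tokenizer = tokenizer
        self.prompt = prompt

    def on_save(self, args, state, control, model=None, **kwargs):
        if model is None:
            return
        inputs = self.tokenizer(self.prompt, return_tensors="pt").to(model.device)
        output = model.generate(**inputs, max_new_tokens=64, do_sample=True)
        text = self.tokenizer.decode(output[0], skip_special_tokens=True)
        print(f"[step {state.global_step}] {text}")
```

Since Axolotl drives training through the Hugging Face Trainer, wiring in a callback like this would likely require a small change to its trainer setup; the checkpoint-plus-separate-inference options above avoid touching the training loop.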
19. Familiarizing with Prompt Templates
- Recommendation: Use the `axolotl.cli.preprocess` command with the `--debug` flag to visualize how Axolotl processes and tokenizes prompts.
- Output: Displays the tokenized prompt with color-coding to distinguish between input, output, and masked tokens.
20. Axolotl vs. Unsloth
- Unsloth:
- Specialized for LoRA fine-tuning.
- Offers memory optimizations but may be limited in multi-GPU scalability.
- Axolotl:
- Provides a more comprehensive framework for fine-tuning, including prompt management and dataset handling.
- Focuses on performance optimizations like sample packing.
- Recommendation: Choose the tool that best aligns with your specific needs and priorities.
21. Quick and Dirty Fine-tuning
- Recommendation:
- Start with a small TinyLlama example for faster iteration.
- Use Gradio inference for quick model evaluation.
22. Function Calling Fine-Tunes
- Dataset Example: Glaive function-calling datasets from the Nous team.
- Configuration: Might require specific role handling and a compatible version of the ShareGPT dataset type.
23. Visualizing Tokenization in Batches
- Challenge: Axolotl’s sample packing happens at runtime, making it difficult to visualize tokenization in batches during pre-processing.
- Potential Approach: Modify the Transformers or Llama modeling code to print or log input IDs during the forward pass.
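A lighter-weight alternative sketch that avoids editing modeling code: register a PyTorch forward pre-hook that prints the packed `input_ids` of each batch (the model name is a placeholder):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

def log_batch(module, args, kwargs):
    # Peek at the first packed sequence of every forward pass.
    input_ids = kwargs.get("input_ids")
    if input_ids is not None:
        print(tokenizer.convert_ids_to_tokens(input_ids[0].tolist()))

hook = model.register_forward_pre_hook(log_batch, with_kwargs=True)
# ... run training or a forward pass here, then clean up:
# hook.remove()
```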
About Me:
I’m Christian Mills, a deep learning consultant specializing in practical AI implementations. I help clients leverage cutting-edge AI technologies to solve real-world problems.
Interested in working together? Fill out my Quick AI Project Assessment form or learn more about me.