Latest
In this lecture, Bill Dally discusses the historical progress of deep learning, driven by hardware advancements, especially GPUs, and explores future directions focusing on improving performance and efficiency through techniques like optimized number representation, sparsity, and specialized hardware accelerators.
CUDA MODE Lecture 8: CUDA Performance Checklist
Lecture #8 provides a comprehensive guide to CUDA performance optimization techniques, covering key concepts like memory coalescing, occupancy, control divergence, tiling, privatization, thread coarsening, and algorithm rewriting with better math, illustrated with practical examples and profiling using NCU to improve kernel performance.
CUDA MODE Lecture 7: Advanced Quantization
Lecture #7 discusses GPU quantization techniques in PyTorch, focusing on performance optimizations using Triton and CUDA kernels for dynamic and weight-only quantization, including challenges and future directions.
Deploying YOLOX for Real-Time Object Tracking on Jetson Orin Nano
Learn how to deploy a quantized YOLOX model on an NVIDIA Jetson Orin Nano for real-time object detection and tracking using ONNX Runtime with TensorRT.
Notes on Atomic Awakening: A New Look at the History and Future of Nuclear Power
My notes from the book Atomic Awakening: A New Look at the History and Future of Nuclear Power by James Mahaffey.
Tutorials
Step-by-step tutorials for setting up essential tools and platforms, designed to provide a solid foundation for a diverse range of projects.
Fine-Tuning Image Classifiers with PyTorch and the timm library for Beginners
Learn how to fine-tune image classification models with PyTorch and the timm library by creating a hand gesture recognizer in this easy-to-follow guide for beginners.
Training YOLOX Models for Real-Time Object Detection in PyTorch
Learn how to train YOLOX models for real-time object detection in PyTorch by creating a hand gesture detection model.
ONNX Runtime in Unity
Tutorials for integrating ONNX Runtime into the Unity game engine.
TensorFlow.js in Unity
In this tutorial series, we explore how to create TensorFlow.js plugins for the Unity game engine.
Notes
My notes from various books.
CUDA MODE Lecture Notes
My notes from the CUDA MODE reading group lectures run by Andreas Kopf and Mark Saroufim.
Education
My notes from resources on education.
History
My notes from resources on history.
Mastering LLMs Course Notes
My notes from the course Mastering LLMs: A Conference For Developers & Data Scientists by Hamel Husain and Dan Becker.
About Me
I’m Christian Mills, an independent deep learning consultant specializing in computer vision, synthetic data generation, and practical AI implementations. My mission is to help clients transform ideas into real-world solutions.
I combine hands-on experience with technical expertise and clear communication to guide projects from conception to deployment.
My Expertise
- Custom AI solution development
- Automated synthetic data pipelines
- Real-time object detection and tracking systems
- LLM integration and fine-tuning
- AI strategy consulting
Ready to leverage AI for your business? Reach out via email at [email protected] to discuss your project.