Primary Options
- Achieve a high-ranking score in an ML competition.
- Contribute to a popular ML open-source project.
- Write a paper that gets published (this is mostly for transitioning STEM researchers).
Christian Mills
May 27, 2022
Create a solid portfolio that requires high-effort focus and develop rigorous work habits.
The safest portfolio projects involve publishing papers, machine learning competitions, and contributing to open-source projects.
The second-best projects are creating live ML products, collaborating with people in the industry, and developing ML content with high engagement.
Result-based portfolio projects have metrics or testimonials, a context, and third-party validation.
Improve promising existing projects instead of coming up with project ideas based on gut feeling.
Schools & Courses | Description |
---|---|
Codecademy | Learn to code for free. |
Scrimba | Interactive courses for frontend development. |
freeCodeCamp | Learn to code for free, building projects. |
42 | 42 is a tuition-free, peer-to-peer, project-based, online computer science training program. |
Holberton School | Learn software development in a collaborative, project-based environment. |
Boot Camps | Description |
---|---|
Bloom Institute of Technology | Bloom Institute of Technology is an online tech school that offers a deferred tuition program. |
Tools | Description |
---|---|
TensorFlow.js | Develop ML models in JavaScript, and use ML directly in the browser or in Node.js. |
ONNX Runtime Web | ONNX Runtime Web is a JavaScript library for running ONNX models in browsers and on Node.js. |
Eigen (C++) compiled to WebAssembly | Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms. |
PyScript | PyScript is a framework that allows users to create rich Python applications in the browser using HTML’s interface and the power of Pyodide, WASM, and modern web technologies. |
Courses | Description |
---|---|
Fast.ai: Practical Deep Learning for Coders | Fast.ai provides a practical, application-first approach to deep learning. |
Kaggle: 30 Days of ML | Machine learning beginner → Kaggle competitor in 30 days. |
Made With ML Course | Learn how to responsibly deliver value with ML. |
Skills |
---|
Get comfortable working with many tools at once: mixing off-the-shelf library calls with dabbling in source code, context switching, and debugging. |
Spot potential risks and weaknesses with your solutions and how to mitigate them. |
Learn the types of problems machine learning can and cannot solve. |
Learn when to use paid APIs, open-source, or custom solutions. |
Develop a rudimentary awareness of how your model impacts a business, including privacy, UI/UX, legal, ethics, and the business model. |
Communicate expectations and timelines to technical and non-technical stakeholders. |
Learn how and when to mitigate risk from your inexperience. |
Understand what data is available and how to get more. |
Extract, visualize, clean, and load data. |
Understand the data and use it to make informed decisions. |
Understand the type of problem and how to find a solution. |
Set and measure appropriate objectives and success criteria. |
Develop baseline models. |
Train models with state-of-the-art results. |
Quickly and efficiently debug models. |
Visualize model performance. |
Deploy models and understand memory, cost, queries-per-second, and latency. |
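
The skills above include setting a success criterion and developing a baseline model. As a minimal sketch (the labels, features, and function names are made up for illustration and are not from the original notes), a majority-class baseline gives any trained model a number to beat:

```python
from collections import Counter


def majority_baseline(train_labels):
    """Return a predictor that always outputs the most common training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return lambda _features: most_common_label


def accuracy(predict, features, labels):
    """Fraction of examples where the prediction matches the label."""
    correct = sum(predict(x) == y for x, y in zip(features, labels))
    return correct / len(labels)


# Toy, made-up data purely for illustration.
train_labels = ["benign", "benign", "benign", "malignant"]
test_features = [[0.1], [0.4], [0.9], [0.7]]
test_labels = ["benign", "malignant", "benign", "benign"]

baseline = majority_baseline(train_labels)
baseline_accuracy = accuracy(baseline, test_features, test_labels)
print(f"baseline accuracy: {baseline_accuracy:.2f}")  # prints "baseline accuracy: 0.75"
```

A model that cannot clearly beat this kind of baseline on the chosen metric is probably not ready to present as a portfolio result.
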
Open-Source Projects | Description |
---|---|
FFCV | FFCV is a drop-in data loading system that dramatically increases data throughput in model training. |
EleutherAI | EleutherAI is a grassroots collective of researchers working to open source AI research. |
Hugging Face | The AI community building the future. |
PyTorch Lightning | Scale your PyTorch models, without the boilerplate. |
LAION | The Large-scale Artificial Intelligence Open Network |
Replicate | Replicate makes it easy to share your machine learning model. |
timm | PyTorch image models, scripts, pretrained weights |
Segmentation Models | Segmentation models with pretrained backbones. |
OpenAI Gym | Gym is a standard API for reinforcement learning, and a diverse collection of reference environments. |
Albumentations | Albumentations is a computer vision tool that boosts the performance of deep convolutional neural networks. |
einops | einops provides flexible and powerful tensor operations for readable and reliable code. Supports numpy, pytorch, tensorflow, jax, and others. |
FLAX | Flax is a neural network library for JAX that is designed for flexibility. |
fast.ai | fastai simplifies training fast and accurate neural nets using modern best practices. |
ONNX Runtime | ONNX Runtime is a cross-platform inference and training machine-learning accelerator. |
Best-of Machine Learning with Python | A ranked list of awesome machine learning Python libraries. Updated weekly. |
Title: Industry ML problems
Hi Jane,
I’m self-studying deep learning [Link to github] and I’m looking
for problems I can tackle for my portfolio.
Given your interesting work on Twitter’s recommendation
system [link to their blog], I thought you might have exposure to
other unique industry problems.
I’m thinking of using Twitter’s API to do an NLP analysis to
detect the percentage of bots on Twitter. Is that a good entry-
level problem to tackle or can you think of something else?
Cheers,
Bob
Talent projects are open-ended, result-driven projects that take 1-4 weeks and help you stand out once you are in an interview.
Talent projects focus on novelty and result in a demo, blog post, or visual.
Talent projects indicate a passion for a particular topic, establish personal branding, and help create a developer advocacy skillset.
Talent projects are hard to execute and introduce more noise for recruiters.
X-factor Projects | Description |
---|---|
DIY Self Driving - A Holiday Side Project | DIY Self Driving - A Holiday Side Project |
lucidrains GitHub repositories | How to turn novel papers into prototypes |
How to collect data in the wild and create IRL demos | The cold start problem: how to build your machine learning portfolio |
A skin cancer classification model with 90% accuracy on
Benchmark X with a previous SOTA of 85%. Published in
Machine Learning Conference X as the first author.
A released open-source contribution to PyTorch, the LAMB
optimizer [link], and a blog post [link].
You can also request a quote from the framework’s team.
"X made a fast and well-documented implementation of
the LAMB optimizer in PyTorch." (Employee X at
Meta) [endorsement link], [commit link], and a
blog post [link]
Endorsements posted publicly on Twitter, LinkedIn, or GitHub are verifiable.
A super-resolution model in production and a live UI. [link]
Optimized deployment taking the original RAM footprint
from 1 GB to 150 MB, and the CPU inference from 4
seconds to 30 ms. [Google Colab benchmark link]. 100
weekly users [Stats screenshot], 250 stars on GitHub [link],
featured on Hacker News [link], and recommended by X
at Famous Company. [link to tweet]
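
Figures like the RAM footprint and CPU inference time quoted above come from benchmarking. The following is a rough sketch of such a benchmark using only the Python standard library. `fake_model` is a hypothetical stand-in for a real model's predict call, and the input batch is made up. Note that `tracemalloc` only tracks memory allocated by Python itself, so memory held by native tensor libraries would need a different tool.

```python
import statistics
import time
import tracemalloc


def fake_model(batch):
    """Hypothetical stand-in for a real model's predict call."""
    return [sum(x * x for x in row) for row in batch]


def benchmark(predict, batch, warmup=3, runs=20):
    """Measure median latency, throughput, and peak Python heap for predict(batch)."""
    for _ in range(warmup):  # warm-up calls are not timed
        predict(batch)

    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(batch)
        latencies.append(time.perf_counter() - start)
    median_s = max(statistics.median(latencies), 1e-9)  # guard against a zero reading

    # Memory is measured in a separate pass because tracing slows execution.
    tracemalloc.start()
    predict(batch)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return {
        "median_latency_ms": median_s * 1000,
        # Each row in the batch is treated as one query.
        "queries_per_second": len(batch) / median_s,
        "peak_python_heap_mb": peak_bytes / 1e6,
    }


# Made-up input: 256 rows of 64 features each.
batch = [[float(i + j) for j in range(64)] for i in range(256)]
stats = benchmark(fake_model, batch)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Running the same function before and after an optimization, on the same hardware and input, gives the kind of before/after comparison a portfolio entry can link to.
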
Title: Entry-level ML positions
Hi John,
I hope you’ve had an excellent week so far!
I first saw your product on Product Hunt. I loved the user
interface, and I was impressed by the quality of the generative
model. I’m currently looking for an entry-level ML position.
I’ve made open source contributions to PyTorch and ranked in
the top 5% in a popular image segmentation competition on
Kaggle. You can find more details in my portfolio [github] and
on [linkedin].
If you have any opportunities at [company] or know anyone else
hiring, please let me know.
Cheers,
Jane