Notes on No ML Degree Book

professional-growth
notes
My notes on Emil Wallner’s guide on how to land your first machine learning job without a degree.
Author: Christian Mills
Published: May 27, 2022

Key Points

Self-learning ML

  • Start with software engineering, then transition to machine learning.
  • Use free peer-to-peer CS schools to learn programming and fast.ai to learn machine learning.

Hireability

  • Search for small companies, companies with specific needs, and organizations with practical interviews.
  • Employers hire self-learners based on validated real-world results.

Portfolio

  • Creating a solid portfolio requires high-effort focus and rigorous work habits.

  • The safest portfolio projects involve publishing papers, machine learning competitions, and contributing to open-source projects.

  • The second-best projects are creating live ML products, collaborating with people in the industry, and developing ML content with high engagement.

  • Result-based portfolio projects have metrics or testimonials, a context, and third-party validation.

  • Improve promising existing projects instead of coming up with gut project ideas.

Programming

Start with Programming

  • There are far more entry-level positions in software development.
  • Aim for at least six months to two years of study and work experience.

No-degree Tech Schools and Online Courses

Schools & Courses | Description
Codecademy | Learn to code for free.
Scrimba | Interactive courses for frontend development.
freeCodeCamp | Learn to code for free, building projects.
42 | 42 is a tuition-free, peer-to-peer, project-based, online computer science training program.
Holberton School | Learn software development in a collaborative, project-based environment.

Boot Camps

Boot Camps | Description
Bloom Institute of Technology | Bloom Institute of Technology is an online tech school that offers a deferred tuition program.

Computer Science

  • 90% of today’s models train on and deploy to servers.
  • Most work focuses on making the data, training, and production process faster by improving efficiency and organization.
  • A practical computer science curriculum that focuses on projects and programming provides a solid base.

Front-end and Mobile

  • Running ML models on personal computers and phones provides compelling cost, latency, and privacy benefits.
  • Major shifts on the client side include human-in-the-loop, prompt engineering, and active learning.
  • Creating smaller intermediate models, workflows, and programs to interact with server-side models is crucial.
Tools | Description
TensorFlow.js | Develop ML models in JavaScript, and use ML directly in the browser or in Node.js.
ONNX Runtime Web | ONNX Runtime Web is a JavaScript library for running ONNX models in browsers and on Node.js.
Eigen (C++) compiled with WebAssembly | Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.
PyScript | PyScript is a framework that allows users to create rich Python applications in the browser using HTML’s interface and the power of Pyodide, WASM, and modern web technologies.

Machine Learning

Learning Machine Learning

  • Prioritize creating a great resume instead of building competitive interview skills.
    • A strong portfolio carries more weight than graduating from a boot camp.
  • Focus on ML opportunities that have practical interviews and light theory requirements.
  • Learn data-centric problem-solving tools.
  • Identify, scope, communicate and solve problems.
  • Build a portfolio with externally validated results.
  • Gain a light overview of ML and statistics.
  • Many companies look for strong programmers and offer on-the-job ML training.

Practical ML Courses

  • Pick a practical ML course and study it for one month.
    • After the first month, 90% of your focus should be on your portfolio.
  • Classic machine learning is still prevalent in the industry and often shows up in interviews.
Courses | Description
Fast.ai: Practical Deep Learning for Coders | Fast.ai provides a practical, application-first approach to deep learning.
Kaggle: 30 Days of ML | Machine learning beginner → Kaggle competitor in 30 days.
Made With ML Course | Learn how to responsibly deliver value with ML.
Skills

  • Get comfortable working with lots of tools: mixing off-the-shelf library calls with dabbling in the source code, context switching, and debugging.
  • Spot potential risks and weaknesses in your solutions and learn how to mitigate them.
  • Learn the types of problems machine learning can and cannot solve.
  • Learn when to use paid APIs, open-source, or custom solutions.
  • Develop a rudimentary awareness of how your model impacts a business, including privacy, UI/UX, legal, ethics, and the business model.
  • Communicate expectations and timelines to technical and non-technical stakeholders.
  • Learn how and when to mitigate risk from your inexperience.
  • Understand what data is available and how to get more.
  • Extract, visualize, clean, and load data.
  • Understand the data and use it to make informed decisions.
  • Understand the type of problem and how to find a solution.
  • Set and measure appropriate objectives and success criteria.
  • Develop baseline models.
  • Train models with state-of-the-art results.
  • Quickly and efficiently debug models.
  • Visualize model performance.
  • Deploy models and understand memory, cost, queries-per-second, and latency.
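
Two of the skills above, setting a success criterion and developing a baseline model, can be illustrated with a short sketch. This is not from the guide: it assumes a classification task, uses a majority-class baseline with accuracy as the metric, and the labels and function names are hypothetical placeholders.

```python
from collections import Counter


def fit_majority_baseline(train_labels):
    """Return the most frequent label in the training set."""
    if not train_labels:
        raise ValueError("need at least one training label")
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(predictions, targets):
    """Fraction of predictions that match the targets."""
    if not targets or len(predictions) != len(targets):
        raise ValueError("predictions and targets must be non-empty and equal in length")
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


# Hypothetical toy labels, for illustration only.
train_labels = ["benign", "benign", "malignant", "benign"]
test_labels = ["benign", "malignant", "benign", "benign"]

# The baseline always predicts the majority class seen during training.
majority = fit_majority_baseline(train_labels)
baseline_preds = [majority] * len(test_labels)
baseline_acc = accuracy(baseline_preds, test_labels)
print(f"majority-class baseline accuracy: {baseline_acc:.2f}")
```

Any trained model should beat this number by a clear margin before its results are worth reporting.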

A Base Portfolio

Weak Portfolio Projects

  • Don’t include toy projects like MNIST on your resume.
  • ML projects that are too hard for the recruiter to evaluate or lack results don’t help your resume.
  • Self-learners need to differentiate themselves from fake and low-effort portfolios.

Degree Equivalent Portfolio Projects

  • Degree-equivalent projects are results-driven projects lasting 1-3 months that provide evidence you can do the job and are easy for recruiters to understand.
  • A non-expert recruiter needs third-party validation that you didn’t copy-paste your projects.

Primary Options

  • Achieve a high-ranking score in an ML competition.
  • Contribute to a popular ML open-source project.
  • Write a paper that gets published (this is mostly for transitioning STEM researchers).

Secondary Options (require more effort for recruiters)

  • Create an ML project with real users (ideally, a deployed model with a UI).
  • Create an industry-specific solution with a mentor that provides testimonials.
  • Create ML content with high engagement, such as blogging, podcasts, and videos.

High-effort Projects

  • Smaller ML competitions like niche competitions on Kaggle, Numerai, ML conference competitions, or company competitions are great portfolio projects.
  • Open-source contributions to up-and-coming projects are often the best way to collaborate and get to know people in ML.
Open-Source Projects | Description
FFCV | FFCV is a drop-in data loading system that dramatically increases data throughput in model training.
EleutherAI | EleutherAI is a grassroots collective of researchers working to open source AI research.
Hugging Face | The AI community building the future.
PyTorch Lightning | Scale your PyTorch models, without the boilerplate.
LAION | The Large-scale Artificial Intelligence Open Network
Replicate | Replicate makes it easy to share your machine learning model.
timm | PyTorch image models, scripts, pretrained weights
Segmentation Models | Segmentation models with pretrained backbones.
OpenAI Gym | Gym is a standard API for reinforcement learning, and a diverse collection of reference environments.
Albumentations | Albumentations is a computer vision tool that boosts the performance of deep convolutional neural networks.
einops | einops provides flexible and powerful tensor operations for readable and reliable code. Supports numpy, pytorch, tensorflow, jax, and others.
FLAX | Flax is a neural network library for JAX that is designed for flexibility.
fast.ai | fastai simplifies training fast and accurate neural nets using modern best practices.
ONNX Runtime | ONNX Runtime is a cross-platform inference and training machine-learning accelerator.
Best-of Machine Learning with Python | A ranked list of awesome machine learning Python libraries. Updated weekly.

Industry Portfolio Projects

  • Portfolio projects that solve a real problem need someone who vouches for your solution.
  • Email ten to twenty ML engineers at startups you respect and ask them for industry problems with accessible data you can tackle.
  • Try people on Twitter with fewer than 10k followers and a blog.
  • ML engineers can give you a good starting point, scope the project, help you when you get stuck, and potentially hire or recommend you later.
  • Approach people who post data-related freelance projects on freelancer marketplaces and look at sites that post pro-bono data projects.

Email Example

Title: Industry ML problems
Hi Jane,
I’m self-studying deep learning [Link to github] and I’m looking
for problems I can tackle for my portfolio.
Given your interesting work on Twitter’s recommendation
system [link to their blog], I thought you could have exposure to
other unique industry problems.
I’m thinking of using Twitter’s API to do an NLP analysis to
detect the percentage of bots on Twitter. Is that a good entry-
level problem to tackle or can you think of something else?
Cheers,
Bob

Talent Projects

  • Talent projects are open-ended, result-driven projects lasting 1-4 weeks that help you stand out once you are in an interview.

  • Talent projects focus on novelty and result in a demo, blog post, or visual.

  • Talent projects indicate a passion for a particular topic, establish personal branding, and help create a developer advocacy skillset.

  • Talent projects are hard to execute and introduce more noise for recruiters.

X-factor Projects | Description
DIY Self Driving - A Holiday Side Project | DIY Self Driving - A Holiday Side Project
lucidrains GitHub repositories | How to turn novel papers into prototypes
How to collect data in the wild and create irl demos | The cold start problem: how to build your machine learning portfolio

Developer Advocacy Projects

  • Developer advocacy roles focus on engagement through content and can work as a transition into more technical roles.
  • External excitement is an indicator that you made a unique contribution.

Self Evaluation

Metric-based Portfolio Items

  • Portfolio items need metrics, context, and third-party validation.
  • Third-party validation reduces doubt that you made a mistake or made things up.

Example Portfolio Item

A skin cancer classification model with 90% accuracy on
Benchmark X with a previous SOTA of 85%. Published in
Machine Learning Conference X as the first author.

Testimonial-Based Portfolio Items

  • Contributing to popular frameworks demonstrates that you understand what they need and know the framework well enough to improve it and pass a technical review.

Example Portfolio Item

A released open-source contribution to PyTorch, the LAMB
optimizer [link], and a blog post [link].
  • You can also request a quote from the framework’s team.

    "X made a fast and well-documented implementation of
    the LAMB optimizer in PyTorch." Employee X at
    Meta. [endorsement link], [commit link], and a
    blog post [link]
  • Public endorsements on Twitter, LinkedIn, or GitHub make them verifiable.

Product Portfolio Items

  • Live demos allow recruiters to test projects without technical expertise.
  • Using a recent or custom model is more likely to convince cynical recruiters your project is original.
  • Open-ended projects need a cluster of validation to compensate for less interpretable results.
  • When applicable, deploy the model on a scalable back-end on a large cloud provider and show evidence it supports at least 100 QPS.
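
One way to produce evidence for a queries-per-second claim is a small timing script. The sketch below is an assumption-laden illustration, not the guide's method: `fake_predict` is a hypothetical stand-in for a real model call, and the loop measures sequential, single-process throughput of a local function. Showing 100 QPS for a deployed back-end would need a load test that sends concurrent requests to the live endpoint.

```python
import statistics
import time


def fake_predict(x):
    """Hypothetical stand-in for a real model call."""
    return x * 2


def benchmark(predict, inputs):
    """Call `predict` once per input; return (queries per second, median latency in ms)."""
    if not inputs:
        raise ValueError("need at least one input")
    latencies = []
    start = time.perf_counter()
    for x in inputs:
        t0 = time.perf_counter()
        predict(x)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    qps = len(inputs) / elapsed
    median_ms = statistics.median(latencies) * 1000
    return qps, median_ms


qps, median_ms = benchmark(fake_predict, list(range(1000)))
print(f"{qps:.0f} queries/s, median latency {median_ms:.4f} ms")
```

Publishing the script alongside the numbers (for example in a notebook) lets a reviewer rerun the measurement instead of taking it on trust.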

Example Portfolio Item

A super-resolution model in production and a live UI. [link]
Optimized deployment taking the original RAM footprint
from 1 GB to 150 MB, and the CPU inference from 4
seconds to 30 ms. [Google Colab benchmark link]. 100
weekly users [Stats screenshot], 250 stars on GitHub [link],
and seen on Hacker News [link] and recommended by X,
at Famous company. [link to tweet]

Ideas

Base Portfolio Ideas

  • Avoid gut ideas until you have a few years of experience and can tell whether they are good.
  • Good base portfolio ideas are validated problems and allow you to translate hard work into outcomes with as few risks as possible.

Talent Project Ideas

  • It is better to overdeliver on a tiny real problem than to make a vague attempt on an ambitious one.
  • Successful talent projects lead to short and clear stories.
  • Try ranking a few dozen project areas rather than deciding between a few specific ideas. Ideas that appear in any mainstream channel are likely overused.

Sourcing Ideas

Ranking Ideas

  • Try to gather at least 20-30 project ideas before ranking them.
  • Can you impress a non-technical person in less than 30 seconds?
  • Can you find a quick way to run the model?
  • Do you have enough computing resources and knowledge for the project?
  • Is there an apparent angle to improve the project?
  • Does the project excite you?
  • Ranking a project often depends on whether you will use the project for your resume, personal marketing, getting a developer advocacy role, or something else.
  • Narrow down the list to five projects and pick the most exciting one.
  • If you can’t create a baseline within the first week, move on to something else.
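
The ranking questions above can be turned into a simple scoring sheet. The sketch below is only one possible approach: the criterion names, the 0-5 scale, the equal weighting, and the example ideas are all hypothetical, and only six ideas are shown for brevity rather than the 20-30 suggested above.

```python
# Criteria adapted from the ranking questions; names and 0-5 scale are assumptions.
CRITERIA = (
    "impressive_in_30s",
    "quick_to_run",
    "enough_resources",
    "clear_angle",
    "excitement",
)


def total_score(scores):
    """Sum the per-criterion scores with equal weights."""
    return sum(scores[c] for c in CRITERIA)


def shortlist(ideas, k=5):
    """Return the names of the k highest-scoring ideas, best first."""
    return sorted(ideas, key=lambda name: total_score(ideas[name]), reverse=True)[:k]


# Hypothetical ideas and scores, for illustration only.
ideas = {
    "idea A": dict(zip(CRITERIA, (4, 5, 5, 3, 2))),
    "idea B": dict(zip(CRITERIA, (5, 3, 4, 4, 5))),
    "idea C": dict(zip(CRITERIA, (2, 2, 3, 1, 3))),
    "idea D": dict(zip(CRITERIA, (3, 4, 2, 4, 4))),
    "idea E": dict(zip(CRITERIA, (1, 5, 5, 2, 1))),
    "idea F": dict(zip(CRITERIA, (4, 4, 3, 5, 4))),
}

top_five = shortlist(ideas, k=5)
# From the shortlist, pick the idea that excites you most.
pick = max(top_five, key=lambda name: ideas[name]["excitement"])
print(top_five, "->", pick)
```

If the project is meant for personal marketing or a developer advocacy role rather than a resume, the criteria could be weighted differently instead of summed equally.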

Promoting Projects

  • Have something highly visual or a few-click online demo to make it more shareable on social media.
  • Spread good vibes on Twitter and have a good reply game to build an audience.
  • Share projects that use specific tools/products in their Slack channels and Discord groups.
  • Add models to model platforms like Hugging Face Spaces, Replicate, Modelplace, and Runway models.
  • Look into using Google’s Keyword tool, other mainstream SEO tools, Google’s trending topics, and YouTube’s Keyword tool to increase search traffic.

Workflow

High-effort Focus

  • Starting with a blank page and trying to build your first project requires high-effort focus.
  • Resource for building high-effort focus
  • Building high-effort focus requires good sleep, exercise, and food routines.
  • Sleep at least 8 hours, ideally 9 hours, in a quiet, dark, and cool environment.
  • Exercise at least 20 minutes per day to elevate your heart rate.
  • Eat healthy food that does not spike your sugar levels.
  • Use tiny habits to gradually increase your capacity for high-effort focus to 1-3 hours per day.

Learning Schedule

  • 8:00am-2:00pm: High-effort focus (scoping, coding, major refactoring)
  • 2:00pm-6:00pm: Low-effort focus (light debugging and simple refactoring)
  • 6:00pm-10:00pm: Mid-effort focus (learning gaps + skimming)
  • Try taking long breaks during lunch for exercise and leisure to reenergize for another high-effort session.

Job Hunting

  • The ideal hiring process for self-learners is specialized, practical, or small-scale.
    • This hiring process is more prevalent with smaller organizations, startups, companies with specific cultures, or specialized teams within larger organizations.
    • Hiring managers are technical, and questions cater to each candidate’s work and reflect the skills needed for the job.

High-growth Startups and Small Organizations

  • Smaller companies that have technical hiring managers, technical founders, and few applicants are good choices for self-learners.
  • These companies need people who can add value on day one.
  • The hiring process can vary more between small companies, and you often need to do more adjacent work related to ML.
  • Places to look for startups
  • Reach out to your professional, social, and personal networks.
  • Research institutions often need people who can do the more engineering-heavy side of their ML research.

Midsized and Large Companies

  • Search for no-degree graduates on LinkedIn and see where they work or ask them directly.
  • Look for companies that actively look for non-degree candidates.
  • Attract companies with online and social media presence.
  • Use interview-as-a-service companies like TripleByte.
  • Browse Glassdoor and look for companies with practical interviews.

Resume

  • Aim for half a page with essential contact information, tech jobs, and one or two bullet points with your most impressive ML projects.
  • Don’t refer to yourself as an ML enthusiast, add jargon, or make the resume more than one page.

Email Templates for Contacting Startups

  • Email the company’s founder and CEO and send two follow-up emails one week apart if they don’t reply.
  • Email companies regardless of whether they have open positions.

Email Template

Title: Entry-level ML positions
Hi John,
I hope you’ve had an excellent week so far!
I first saw your product on Product Hunt. I loved the user
interface, and I was impressed by the quality of the generative
model. I’m currently looking for an entry-level ML position.
I’ve made open source contributions to PyTorch and ranked in
the top 5% in a popular image segmentation competition on
Kaggle. You can find more details in my portfolio [github] and
[linkedin] here.
If you have any opportunities at [company] or know anyone else
hiring, please let me know.
Cheers,
Jane

Interview Prep

  • Practice describing yourself concisely in 30 seconds and giving a 30-second overview of each critical project.
  • Do some of the easy LeetCode questions in Python and practice mock interviews with friends.
  • Invest a few hours researching the problems companies you like are working on and get data on how they are likely to approach them.
    • Have specific questions and the ability to discuss their problems in detail.

Plan B

  • Apply for software roles closely related to machine learning.
  • Apply for software roles in companies that do a lot of ML.
  • Apply for developer advocacy and content marketing roles in ML companies.
  • Apply for product manager and analytics roles related to ML.
  • Bid on ML contracting opportunities or software projects related to ML.

References