Conference Talk 15: Modal - Simple Scalable Serverless Services

notes

llms

In this talk, Charles Frye provides a deeper dive into Modal, exploring its capabilities beyond fine-tuning LLMs and demonstrating how it empowers users to build and deploy scalable, cost-efficient, and serverless applications with simplicity using Python.

Author

Christian Mills

Published

August 25, 2024

This post is part of the following series:

Mastering LLMs Course Notes: My notes from the course Mastering LLMs: A Conference For Developers & Data Scientists by Hamel Husain and Dan Becker.

Introduction

Speaker: Charles Frye (@charles_irl on Twitter)
Topic: A deeper dive into Modal, focusing on its broader applications beyond fine-tuning LLMs.
Slides: Simple Scalable Serverless Services

Q&A Session 1

Database Service Availability

Modal does not currently offer a managed database service, particularly serverless Postgres.

Challenges of Serverless Postgres

Two main types of databases:
- OLTP (Online Transaction Processing): Difficult to scale due to row-level operations and complex joins.
- OLAP (Online Analytical Processing): More straightforward to run on Modal, for example using DuckDB to query Parquet files stored in S3.

Challenges of Scaling Transaction Processing

Distributed transaction processing databases are more challenging to build and scale effectively.

Recommendations for Serverless Postgres: Neon, Supabase

For serverless Postgres, Modal recommends using external services like Neon or Supabase, which integrate well with Modal’s serverless API apps.

Q&A Session 2

Storage Pricing

While Modal does not currently charge for storage, it plans to implement pricing eventually.
The goal is to price storage at a rate comparable to S3, Modal’s underlying storage provider.

Q&A Session 3

DDoS Attack Prevention

Modal does not currently have built-in DDoS protection but acknowledges its importance and plans to offer it in the future.

Current Mitigation Strategies

Developers can implement authentication middleware in FastAPI or Flask to restrict access.
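
As a rough illustration of the middleware idea, here is a minimal sketch of a bearer-token check written as plain ASGI middleware, so it needs only the standard library to run. The names `BearerAuthMiddleware`, `EXPECTED_TOKEN`, `hello_app`, and `call_once` are hypothetical and not from the talk; a real service would load the token from a secret store and would likely use its framework's own authentication utilities instead.

```python
import asyncio
import hmac

# Hypothetical shared secret for illustration only; load real secrets from a secret store.
EXPECTED_TOKEN = "example-token"


class BearerAuthMiddleware:
    """Plain ASGI middleware that rejects HTTP requests lacking a valid bearer token."""

    def __init__(self, app, token):
        self.app = app
        self.expected = b"Bearer " + token.encode()

    async def __call__(self, scope, receive, send):
        # This sketch only guards HTTP requests; other scope types pass through unchanged.
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return
        # ASGI delivers headers as (name, value) byte pairs with lowercased names.
        headers = dict(scope.get("headers", []))
        supplied = headers.get(b"authorization", b"")
        # Constant-time comparison avoids leaking token contents through timing.
        if not hmac.compare_digest(supplied, self.expected):
            await send({
                "type": "http.response.start",
                "status": 401,
                "headers": [(b"content-type", b"text/plain"),
                            (b"www-authenticate", b"Bearer")],
            })
            await send({"type": "http.response.body", "body": b"Unauthorized"})
            return
        await self.app(scope, receive, send)


async def hello_app(scope, receive, send):
    """Stand-in for the real application being protected."""
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"text/plain")]})
    await send({"type": "http.response.body", "body": b"hello"})


def call_once(app, headers):
    """Drive an ASGI app with one fake GET request and return the response status."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": "/", "headers": headers}
    asyncio.run(app(scope, receive, send))
    return sent[0]["status"]


protected_app = BearerAuthMiddleware(hello_app, EXPECTED_TOKEN)
```

Because the class follows the standard ASGI middleware shape (the wrapped app as the first constructor argument), it should be attachable to a FastAPI app with `app.add_middleware(BearerAuthMiddleware, token=...)`. Flask is WSGI-based and would need an equivalent written against that interface.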

Importance of Authentication and Rate Limiting

Authentication and rate limiting are crucial for preventing unauthorized access and mitigating DDoS attacks.
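
To make the rate-limiting half concrete, the following is a minimal token-bucket sketch. It illustrates the general technique, not anything Modal provides, and the class names `TokenBucket` and `PerClientLimiter` are hypothetical. It also assumes a single process: state lives in memory, so a service that scales out to several containers would need a shared store for the buckets.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the behavior can be tested without sleeping
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        # Refill according to elapsed time, capped at the bucket capacity.
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        # Spend one token per request if one is available.
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerClientLimiter:
    """Keeps one bucket per client identifier (for example an API key or IP address)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow(self, client_id):
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[client_id] = bucket
        return bucket.allow()
```

A request handler or middleware would call `limiter.allow(client_id)` before doing any work and return an HTTP 429 response when it comes back `False`.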

Potential for Cloudflare DDoS Protection

Integrating with services like Cloudflare for DDoS protection is worth exploring.

WebSockets and Max Execution Time

For questions related to WebSockets and maximum execution time, Modal recommends reaching out on their Slack channel for more specific guidance.

Clarification on Django’s Async Support

A participant clarifies that Django supports asynchronous views and requests when running under ASGI.

Introduction

Modal Overview

Modal Vision: Scalable, Cost-Efficient, Serverless Services

Defining “Scalable Services”

Three Key Service Requirements: Input/Output, Storage, Compute

Scalability as Table Stakes

Challenges and Importance of Scalability

Distributed Systems for Scalability

Modal’s Solution: Simple Scalable Services with Python

Pythonic Tools for Building Services

Modal Dashboard Overview

Model Inference Function Example

Scaling Up and Down Resources

Handling Multiple Inputs

Cron Jobs with Modal

Q&A Session 1

Database Service Availability

Challenges of Serverless Postgres

Running Analytical Workloads on Modal

Challenges of Scaling Transaction Processing

Recommendations for Serverless Postgres: Neon, Supabase

Storage in Modal

Importance of Data Storage

Focus on Long-Term Storage

File System Abstractions: Volumes

Use Cases: Storing Weights, Datasets

Examples of Stored Volumes

Handling Large Datasets

Volumes: Optimized for Write Once, Read Many Workloads

Explanation of Write Once, Read Many

Benefits for Scaling

Q&A Session 2

Storage Pricing

Addressing Other Storage-Related Questions

Input and Output in Modal

FastAPI Integration

Benefits of Using FastAPI with Modal

Asynchronous Python, Documentation, Scalability, Performance

Modal’s Handling of Synchronous and Asynchronous Functions

Flexibility and Performance with Async

Web Endpoints for Exposing Services

FastAPI as a Dependency

Creating URLs from Python Functions

Asynchronous Server Gateway Interface (ASGI)

Flexibility Beyond FastAPI

WSGI and Flask Support

Comparison of WSGI and ASGI

Potential Trade-offs with WSGI

Running Arbitrary Web Servers

Q&A Session 3

DDoS Attack Prevention

Current Mitigation Strategies

Importance of Authentication and Rate Limiting

Potential for Cloudflare DDoS Protection

WebSockets and Max Execution Time

Clarification on Django’s Async Support

Addressing Storage-Related Questions

Serverless Nature of Modal

Importance of Serverless Architecture

Serverless for Cost Efficiency and Developer Experience

Variable Resource Utilization

Provisioning for Peaks and Cost Implications

Resource Utilization Challenges

The Rise of Cloud Computing

Manual Provisioning and Its Drawbacks

Handling Traffic Spikes

Reducing Costs but Potentially Sacrificing User Experience

Automatic Provisioning and Autoscaling

Kubernetes and Autoscaling

Lag in Autoscaling

Granularity of Autoscaling

Achieving Serverless: Matching Costs to Resource Utilization

Scaling to Zero

Functions as a Service (FaaS)

Benefits of Serverless: Cost Savings, Improved User Experience

Why Use a Serverless Platform Like Modal?

Amortizing Engineering Complexity

Smoothing Fluctuations with Multiple Users

Economics of Serverless Computing