Serverless has been hyped to death. "No servers to manage!" "Infinite scale!" "Pay only for what you use!" The marketing is loud, but the reality is more nuanced. Serverless APIs are genuinely great for certain workloads and genuinely painful for others.
This guide cuts through the noise. We'll look at how serverless APIs actually work, how to build them on AWS Lambda, Azure Functions, and Cloudflare Workers, how to deal with cold starts, and when serverless is the wrong choice entirely.
## What Serverless Actually Means
"Serverless" doesn't mean no servers. It means you don't manage servers. The cloud provider handles provisioning, scaling, patching, and capacity planning. You write functions, deploy them, and pay per invocation.
For APIs, this means:
- Each HTTP request triggers a function execution
- Functions spin up, handle the request, and shut down
- You're billed for execution time, not idle time
- Scaling is automatic (within limits)
The PetStore API is a good example to think through. A traditional deployment runs a server 24/7 waiting for requests. A serverless deployment runs code only when someone actually calls GET /api/pets. At low traffic, serverless is dramatically cheaper. At high, sustained traffic, the math changes.
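To make "the math changes" concrete, here is a back-of-envelope sketch comparing an always-on server to Lambda at different traffic levels. The $30/month server price and the 128MB/100ms request profile are illustrative assumptions; the Lambda rates are the published per-request and per-GB-second prices discussed later in this guide.

```python
# Back-of-envelope: always-on server vs Lambda, by monthly request volume.
# Assumed: $30/month VM; Lambda at $0.20 per 1M requests plus
# $0.0000166667 per GB-second, for a 128MB function running 100ms.

def lambda_monthly_cost(requests, memory_gb=0.128, duration_s=0.1):
    request_cost = requests / 1_000_000 * 0.20
    compute_cost = requests * memory_gb * duration_s * 0.0000166667
    return request_cost + compute_cost

SERVER_COST = 30.0  # assumed flat monthly cost of a small always-on server

for monthly_requests in (100_000, 1_000_000, 10_000_000, 100_000_000):
    cost = lambda_monthly_cost(monthly_requests)
    cheaper = 'serverless' if cost < SERVER_COST else 'server'
    print(f'{monthly_requests:>11,} req/mo: Lambda ~${cost:,.2f} -> {cheaper} wins')
```

At 100K requests/month Lambda costs pennies; around 70-80M requests/month of this profile, the flat-rate server pulls ahead.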
## AWS Lambda: The Most Mature Option
AWS Lambda is the most widely used serverless platform. It supports Node.js, Python, Java, Go, Ruby, and .NET.
### Basic Lambda Function for a REST API

```python
import json
import uuid
import boto3
from datetime import datetime

# Initialize outside the handler so connections are reused across warm invocations
dynamodb = boto3.resource('dynamodb')
pets_table = dynamodb.Table('pets')

def handler(event, context):
    """Main Lambda handler: routes requests to the appropriate function."""
    http_method = event['httpMethod']
    path = event['path']
    path_params = event.get('pathParameters') or {}

    try:
        if path == '/api/pets' and http_method == 'GET':
            return list_pets(event)
        elif path.startswith('/api/pets/') and http_method == 'GET':
            return get_pet(path_params['id'])
        elif path == '/api/pets' and http_method == 'POST':
            return create_pet(event)
        else:
            return response(404, {'error': 'Not found'})
    except Exception as e:
        print(f'Error: {e}')
        return response(500, {'error': 'Internal server error'})

def list_pets(event):
    query_params = event.get('queryStringParameters') or {}
    limit = int(query_params.get('limit', 20))
    result = pets_table.scan(Limit=limit)
    return response(200, {'data': result['Items']})

def get_pet(pet_id):
    result = pets_table.get_item(Key={'id': pet_id})
    pet = result.get('Item')
    if not pet:
        return response(404, {'error': 'Pet not found'})
    return response(200, pet)

def create_pet(event):
    body = json.loads(event['body'])
    pet = {
        'id': str(uuid.uuid4()),
        'name': body['name'],
        'species': body['species'],
        'created_at': datetime.utcnow().isoformat()
    }
    pets_table.put_item(Item=pet)
    return response(201, pet)

def response(status_code, body):
    return {
        'statusCode': status_code,
        'headers': {
            'Content-Type': 'application/json',
            'Access-Control-Allow-Origin': '*'
        },
        'body': json.dumps(body)
    }
```
### Deploying with API Gateway

Lambda functions need an HTTP trigger. AWS API Gateway connects HTTP requests to Lambda:

```yaml
# serverless.yml (using Serverless Framework)
service: petstore-api

provider:
  name: aws
  runtime: python3.11
  region: us-east-1
  environment:
    PETS_TABLE: ${self:service}-pets-${sls:stage}
  iam:
    role:
      statements:
        - Effect: Allow
          Action:
            - dynamodb:GetItem
            - dynamodb:PutItem
            - dynamodb:Scan
            - dynamodb:UpdateItem
            - dynamodb:DeleteItem
          Resource: !GetAtt PetsTable.Arn

functions:
  api:
    handler: src/handler.handler
    events:
      - http:
          path: /api/{proxy+}
          method: ANY
          cors: true

resources:
  Resources:
    PetsTable:
      Type: AWS::DynamoDB::Table
      Properties:
        TableName: ${self:provider.environment.PETS_TABLE}
        BillingMode: PAY_PER_REQUEST
        AttributeDefinitions:
          - AttributeName: id
            AttributeType: S
        KeySchema:
          - AttributeName: id
            KeyType: HASH
```

Deploy with:

```bash
npm install -g serverless
serverless deploy --stage prod
```
### Using AWS SAM Instead

AWS SAM (Serverless Application Model) is AWS's native option:

```yaml
# template.yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Globals:
  Function:
    Runtime: python3.11
    Timeout: 30
    Environment:
      Variables:
        PETS_TABLE: !Ref PetsTable

Resources:
  PetStoreFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/
      Handler: handler.handler
      Events:
        ApiEvent:
          Type: Api
          Properties:
            Path: /api/{proxy+}
            Method: ANY

  PetsTable:
    Type: AWS::DynamoDB::Table
    Properties:
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: id
          AttributeType: S
      KeySchema:
        - AttributeName: id
          KeyType: HASH
```

```bash
sam build && sam deploy --guided
```
## Azure Functions
Azure Functions is Microsoft's serverless platform. It integrates tightly with the Azure ecosystem.
```python
import azure.functions as func
import json
import logging

app = func.FunctionApp()

@app.route(route="pets", methods=["GET"])
def list_pets(req: func.HttpRequest) -> func.HttpResponse:
    limit = int(req.params.get('limit', 20))
    # Fetch from Cosmos DB or Azure SQL (fetch_pets_from_db is your data layer)
    pets = fetch_pets_from_db(limit)
    return func.HttpResponse(
        json.dumps({'data': pets}),
        status_code=200,
        mimetype='application/json'
    )

@app.route(route="pets/{pet_id}", methods=["GET"])
def get_pet(req: func.HttpRequest) -> func.HttpResponse:
    # Route parameters come from req.route_params in the v2 Python model
    pet_id = req.route_params.get('pet_id')
    pet = fetch_pet_by_id(pet_id)
    if not pet:
        return func.HttpResponse(
            json.dumps({'error': 'Pet not found'}),
            status_code=404,
            mimetype='application/json'
        )
    return func.HttpResponse(
        json.dumps(pet),
        status_code=200,
        mimetype='application/json'
    )

@app.route(route="pets", methods=["POST"])
def create_pet(req: func.HttpRequest) -> func.HttpResponse:
    try:
        body = req.get_json()
    except ValueError:
        return func.HttpResponse(
            json.dumps({'error': 'Invalid JSON'}),
            status_code=400,
            mimetype='application/json'
        )
    pet = save_pet(body)
    return func.HttpResponse(
        json.dumps(pet),
        status_code=201,
        mimetype='application/json'
    )
```
Deploy with Azure CLI:
```bash
az login
az functionapp create \
  --resource-group petstore-rg \
  --consumption-plan-location eastus \
  --runtime python \
  --runtime-version 3.11 \
  --functions-version 4 \
  --name petstore-api \
  --storage-account petstorage

func azure functionapp publish petstore-api
```
## Cloudflare Workers: Edge Serverless
Cloudflare Workers run at the edge — in data centers close to your users. This eliminates most cold start issues and provides very low latency globally.
```javascript
// worker.js
export default {
  async fetch(request, env, ctx) {
    const url = new URL(request.url);
    const path = url.pathname;
    const method = request.method;

    // Route requests
    if (path === '/api/pets' && method === 'GET') {
      return listPets(request, env);
    } else if (path.match(/^\/api\/pets\/\d+$/) && method === 'GET') {
      const id = path.split('/').pop();
      return getPet(id, env);
    } else if (path === '/api/pets' && method === 'POST') {
      return createPet(request, env);
    }

    return new Response(JSON.stringify({ error: 'Not found' }), {
      status: 404,
      headers: { 'Content-Type': 'application/json' }
    });
  }
};

async function listPets(request, env) {
  const url = new URL(request.url);
  const limit = parseInt(url.searchParams.get('limit') || '20');
  // Use Cloudflare KV or D1 (SQLite at the edge)
  const { results } = await env.DB.prepare(
    'SELECT * FROM pets LIMIT ?'
  ).bind(limit).all();
  return jsonResponse(200, { data: results });
}

async function getPet(id, env) {
  const pet = await env.DB.prepare(
    'SELECT * FROM pets WHERE id = ?'
  ).bind(id).first();
  if (!pet) {
    return jsonResponse(404, { error: 'Pet not found' });
  }
  return jsonResponse(200, pet);
}

async function createPet(request, env) {
  const body = await request.json();
  const result = await env.DB.prepare(
    'INSERT INTO pets (name, species, created_at) VALUES (?, ?, ?) RETURNING *'
  ).bind(body.name, body.species, new Date().toISOString()).first();
  return jsonResponse(201, result);
}

function jsonResponse(status, body) {
  return new Response(JSON.stringify(body), {
    status,
    headers: {
      'Content-Type': 'application/json',
      'Access-Control-Allow-Origin': '*'
    }
  });
}
```
Wrangler config:
```toml
# wrangler.toml
name = "petstore-api"
main = "src/worker.js"
compatibility_date = "2026-01-01"

[[d1_databases]]
binding = "DB"
database_name = "petstore"
database_id = "your-database-id"
```

Deploy:

```bash
npx wrangler deploy
```
## Cold Starts: The Real Problem
Cold starts are the biggest pain point in serverless. When a function hasn't been called recently, the cloud provider needs to spin up a new container, load your runtime, and initialize your code. This adds latency — sometimes hundreds of milliseconds.
### Cold Start Times by Platform
| Platform | Typical Cold Start | With Large Dependencies |
|---|---|---|
| Cloudflare Workers | < 5ms | < 5ms |
| AWS Lambda (Node.js) | 100-500ms | 500ms-2s |
| AWS Lambda (Python) | 200-800ms | 1-3s |
| AWS Lambda (Java) | 1-5s | 3-10s |
| Azure Functions | 200ms-2s | 1-4s |
Cloudflare Workers have virtually no cold starts because they use V8 isolates instead of containers.
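A quick back-of-envelope calculation shows why cold starts hurt tail latency far more than averages. The 50ms warm / 800ms cold figures below are illustrative assumptions, not measurements:

```python
# If a fraction of requests hit a cold container, the mean barely moves,
# but the tail (p99 and beyond) absorbs nearly the full cold-start penalty.

def mean_latency(warm_ms, cold_ms, cold_rate):
    """Expected latency when cold_rate of requests pay the cold-start cost."""
    return warm_ms * (1 - cold_rate) + cold_ms * cold_rate

warm, cold = 50, 800  # assumed warm latency and Lambda cold start, in ms
for rate in (0.001, 0.01, 0.05):
    print(f'cold rate {rate:.1%}: mean {mean_latency(warm, cold, rate):.1f}ms')

# With a 1% cold rate, roughly 1 request in 100 takes ~800ms instead of ~50ms,
# so p99 latency sits near the cold-start time even though the mean stays low.
```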
### Strategies to Reduce Cold Starts
**1. Keep functions warm with scheduled pings:**

```python
# Lambda function that pings your API function
import boto3

def warmer_handler(event, context):
    lambda_client = boto3.client('lambda')
    # Invoke the API function with a warm-up event
    lambda_client.invoke(
        FunctionName='petstore-api',
        InvocationType='Event',  # Async
        Payload='{"source": "warmer"}'
    )

# In your API handler, skip processing for warm-up events
def handler(event, context):
    if event.get('source') == 'warmer':
        return {'statusCode': 200, 'body': 'warm'}
    # Normal request handling...
```
Schedule with EventBridge:
```yaml
# serverless.yml
functions:
  warmer:
    handler: src/warmer.warmer_handler
    events:
      - schedule: rate(5 minutes)
```
**2. Use Provisioned Concurrency (AWS Lambda):**

```yaml
# serverless.yml
functions:
  api:
    handler: src/handler.handler
    provisionedConcurrency: 5  # Keep 5 instances warm
    events:
      - http:
          path: /api/{proxy+}
          method: ANY
```
This eliminates cold starts but costs more — you pay for the provisioned instances even when idle.
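How much more? A rough sketch: the per-GB-second provisioned concurrency price below is an assumption in the ballpark of published us-east-1 rates, so check the current AWS pricing page before budgeting from this.

```python
# Estimating the standing cost of provisioned concurrency.
# PC_PRICE_PER_GB_S is an assumed, illustrative price, not an official quote.

PC_PRICE_PER_GB_S = 0.0000041667
SECONDS_PER_MONTH = 30 * 24 * 3600

def provisioned_cost(instances, memory_gb):
    """Monthly cost of keeping `instances` warm, before any invocation charges."""
    return instances * memory_gb * SECONDS_PER_MONTH * PC_PRICE_PER_GB_S

print(f'5 x 128MB: ~${provisioned_cost(5, 0.128):.2f}/month')
print(f'5 x 1GB:   ~${provisioned_cost(5, 1.0):.2f}/month')
```

At small memory sizes the standing cost is modest; at 1GB and above it starts to rival a small always-on server, which undercuts the serverless cost advantage.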
**3. Minimize package size:**

```text
# requirements.txt — only include what you need
boto3==1.34.0
# Don't include pandas, numpy, or scipy unless you actually need them
```
Use Lambda layers for shared dependencies:
```yaml
layers:
  commonDeps:
    path: layer
    compatibleRuntimes:
      - python3.11

functions:
  api:
    handler: src/handler.handler
    layers:
      - !Ref CommonDepsLambdaLayer
```
**4. Move initialization outside the handler:**

```python
import boto3

# This runs once per container, not per request
dynamodb = boto3.resource('dynamodb')
pets_table = dynamodb.Table('pets')

# Pre-load config (load_config is a placeholder for your own setup code)
config = load_config()

def handler(event, context):
    # dynamodb and config are already initialized, so this path is fast
    result = pets_table.get_item(Key={'id': event['id']})
    return response(200, result['Item'])
```
## Stateless Design
Serverless functions are stateless by nature. Each invocation is independent — you can't store data in memory between requests.
### What This Means in Practice
```python
# WRONG: in-memory caching doesn't work reliably in serverless
cache = {}  # Per-container, not shared across instances

def handler(event, context):
    pet_id = event['pathParameters']['id']
    if pet_id in cache:  # Misses whenever a different container handles the request
        return response(200, cache[pet_id])
    pet = fetch_from_db(pet_id)
    cache[pet_id] = pet  # Only cached in this container
    return response(200, pet)
```

```python
# RIGHT: use an external cache (Redis/ElastiCache)
import json
import os
import redis

redis_client = redis.Redis(host=os.environ['REDIS_HOST'])

def handler(event, context):
    pet_id = event['pathParameters']['id']
    # Check the shared cache first
    cached = redis_client.get(f'pet:{pet_id}')
    if cached:
        return response(200, json.loads(cached))
    pet = fetch_from_db(pet_id)
    # Store in the shared cache with a 5-minute TTL
    redis_client.setex(f'pet:{pet_id}', 300, json.dumps(pet))
    return response(200, pet)
```
### Session Management
Traditional sessions (stored in memory or local files) don't work. Use JWTs or external session stores:
```python
import os
import jwt  # PyJWT

SECRET_KEY = os.environ['JWT_SECRET']

def authenticate(event):
    auth_header = event['headers'].get('Authorization', '')
    if not auth_header.startswith('Bearer '):
        return None
    token = auth_header[7:]
    try:
        payload = jwt.decode(token, SECRET_KEY, algorithms=['HS256'])
        return payload
    except jwt.ExpiredSignatureError:
        return None
    except jwt.InvalidTokenError:
        return None

def handler(event, context):
    user = authenticate(event)
    if not user:
        return response(401, {'error': 'Unauthorized'})
    # User data is in the JWT, so no session lookup is needed
    pets = fetch_pets_for_user(user['user_id'])
    return response(200, {'data': pets})
```
## Cost Optimization
Serverless can be very cheap or surprisingly expensive depending on how you use it.
### Understanding the Pricing Model
AWS Lambda pricing (as of early 2026):

- $0.20 per 1 million requests
- $0.0000166667 per GB-second of compute

For a function using 128MB of memory running for 100ms:

- Compute cost per request: 0.128 GB × 0.1 s × $0.0000166667 ≈ $0.000000213
- 1 million requests: ~$0.41 (compute plus the per-request charge)
That's cheap. But it adds up with high traffic or long-running functions.
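You can sanity-check these numbers, and the memory trade-off covered under the optimization strategies, with a few lines of arithmetic using the prices quoted above:

```python
# Lambda cost per request = flat request charge + memory x duration x GB-s rate.

REQUEST_PRICE = 0.20 / 1_000_000
GB_SECOND_PRICE = 0.0000166667

def cost_per_request(memory_gb, duration_s):
    return REQUEST_PRICE + memory_gb * duration_s * GB_SECOND_PRICE

# 128MB for 100ms, scaled to 1M requests: roughly $0.41
print(f'${cost_per_request(0.128, 0.1) * 1_000_000:.2f} per million requests')

# More memory also means more CPU, so duration usually drops.
# Compare 128MB at 500ms vs 512MB at 100ms:
print(cost_per_request(0.128, 0.5))  # slower, and slightly more expensive
print(cost_per_request(0.512, 0.1))  # faster, and slightly cheaper
```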
### Optimization Strategies
**Right-size memory allocation:**

```text
# Test different memory sizes.
# More memory = more CPU = faster execution = potentially lower cost.
# Example: 128MB at 500ms burns 0.064 GB-s per request; 512MB at 100ms
# burns 0.0512 GB-s, so the larger size is both faster and slightly cheaper.
# Use AWS Lambda Power Tuning to find the optimal setting:
# https://github.com/alexcasalboni/aws-lambda-power-tuning
```
**Reduce execution time:**

```python
# Use connection pooling with RDS Proxy.
# Direct RDS connections are slow to establish.
import psycopg2

# SLOW: new connection per invocation
def handler(event, context):
    conn = psycopg2.connect(DATABASE_URL)  # ~100ms
    # ...

# FAST: use RDS Proxy (connection pooling);
# configure RDS_PROXY_ENDPOINT in the environment
def handler(event, context):
    conn = psycopg2.connect(RDS_PROXY_URL)  # ~5ms via the proxy
    # ...
```
**Use async processing for non-critical work:**

```python
import json
import os
import boto3

sqs = boto3.client('sqs')

def handler(event, context):
    body = json.loads(event['body'])

    # Save the pet synchronously (the user needs this)
    pet = save_pet(body)

    # Queue non-critical work asynchronously
    sqs.send_message(
        QueueUrl=os.environ['NOTIFICATIONS_QUEUE'],
        MessageBody=json.dumps({
            'type': 'pet_created',
            'pet_id': pet['id'],
            'owner_email': body['owner_email']
        })
    )

    # Return immediately; don't wait for the email to send
    return response(201, pet)
```
**Cache aggressively at the API Gateway level:**

```yaml
# serverless.yml (this caching syntax comes from the
# serverless-api-gateway-caching plugin)
functions:
  api:
    handler: src/handler.handler
    events:
      - http:
          path: /api/pets
          method: GET
          caching:
            enabled: true
            ttlInSeconds: 300  # Cache the list for 5 minutes
```
## Serverless Frameworks Comparison
| Framework | Platforms | Language | Best For |
|---|---|---|---|
| Serverless Framework | AWS, Azure, GCP | Any | Multi-cloud, mature ecosystem |
| AWS SAM | AWS only | Any | AWS-native, CloudFormation integration |
| AWS CDK | AWS only | TypeScript/Python/Java | Infrastructure as code |
| Pulumi | Multi-cloud | TypeScript/Python/Go | Modern IaC |
| Wrangler | Cloudflare only | JS/TS | Edge computing |
For most teams starting out, Serverless Framework or AWS SAM are the pragmatic choices.
## Real Limitations of Serverless
Be honest about the downsides:
**Execution time limits:**

- AWS Lambda: 15 minutes max
- Azure Functions: 10 minutes (Consumption plan)
- Cloudflare Workers: 30 seconds (CPU time)
Long-running operations (video processing, large data exports) don't fit serverless well.
**Payload size limits:**

- API Gateway: 10MB request/response
- Lambda: 6MB synchronous, 256KB async
**Concurrency limits:**

- AWS Lambda: 1,000 concurrent executions per region (default; can be increased)
- Sudden traffic spikes can hit this limit
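The default limit is easy to reason about with Little's law: concurrent executions are roughly requests per second times average duration. A sketch with illustrative numbers:

```python
# Little's law applied to Lambda: concurrency ~= arrival rate x service time.

def concurrency(rps, avg_duration_s):
    """Approximate steady-state concurrent executions."""
    return rps * avg_duration_s

print(concurrency(500, 0.2))   # 100 concurrent executions: well within the default
print(concurrency(5000, 0.5))  # 2500: a spike like this blows past the 1,000 default
```

This is also why shaving execution time (faster code, connection pooling) buys concurrency headroom, not just lower cost.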
**Vendor lock-in:** Each platform has its own event format, SDK, and services. Migrating from Lambda to Azure Functions isn't trivial.
**Debugging is harder:** Local development requires emulation tools (SAM local, Wrangler dev). Distributed tracing is essential but adds complexity.
**Database connections:** Traditional databases (PostgreSQL, MySQL) have connection limits, and serverless functions can exhaust them quickly. Use connection poolers (RDS Proxy, PgBouncer) or serverless-native databases (DynamoDB, Cosmos DB, D1).
## When to Use Serverless (and When Not To)
**Good fit:**

- APIs with variable or unpredictable traffic
- Event-driven processing (webhooks, file uploads)
- Background jobs and scheduled tasks
- Prototypes and MVPs
- APIs with low sustained traffic
**Poor fit:**

- APIs with consistently high, sustained traffic (traditional servers are cheaper)
- Long-running operations (>15 minutes)
- Applications requiring persistent connections (WebSockets need special handling)
- Latency-sensitive applications where cold starts are unacceptable
- Teams without cloud expertise (operational complexity is real)
## Conclusion
Serverless APIs are a genuine shift in how you build and operate software. The operational benefits are real — no servers to patch, automatic scaling, pay-per-use pricing. But so are the limitations — cold starts, stateless constraints, vendor lock-in, and debugging complexity.
The PetStore API is a good candidate for serverless: it has variable traffic, simple CRUD operations, and doesn't need persistent connections. A high-frequency trading API or a video processing pipeline would be a poor fit.
Start with Cloudflare Workers if you need low latency globally and simple use cases. Use AWS Lambda with Serverless Framework for more complex applications that need the full AWS ecosystem. Avoid serverless for workloads with consistently high traffic or long-running operations.
The best architecture is the one that fits your actual workload, team skills, and budget — not the one with the best marketing.