Handling Long-Running Operations in APIs

Some operations just take time. Generating a report with millions of rows, processing a video upload, training a machine learning model — these can't finish in the 30 seconds your API gateway allows for a request.

The naive approach is to make the client wait. The better approach is to return immediately with a job ID and let the client check back later. This guide covers the patterns that make async operations work: job creation, status polling, webhooks for completion, progress reporting, and cancellation.

The Problem With Synchronous Long Operations

Let's say you're building a feature in the PetStore API that generates a PDF report of all orders for a given month. For a busy store, this might take 2 minutes.

Here's what happens if you try to do it synchronously:

// Client makes a request
const response = await fetch('https://petstore.example.com/reports/orders?month=2024-03', {
  method: 'POST'
});

// Client waits... and waits... and waits...
// After 30 seconds, the API gateway times out
// The server is still generating the report, but the client gets a 504 Gateway Timeout

Even if you increase the timeout, you've got other problems:

  • The client can't do anything else while waiting
  • If the connection drops, the work is lost
  • You can't show progress
  • The user can't cancel the operation

The Async Job Pattern

The solution: return immediately with a job ID, then let the client poll for status.

Step 1: Create the Job

POST /reports/orders
Content-Type: application/json

{
  "month": "2024-03",
  "format": "pdf"
}

The server creates a job record and returns immediately:

HTTP/1.1 202 Accepted
Location: /jobs/550e8400-e29b-41d4-a716-446655440000
Content-Type: application/json

{
  "jobId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending",
  "createdAt": "2024-03-13T10:00:00Z",
  "statusUrl": "/jobs/550e8400-e29b-41d4-a716-446655440000"
}

The 202 Accepted status code means "I've accepted your request, but I haven't finished processing it yet."

Step 2: Poll for Status

The client polls the status URL:

GET /jobs/550e8400-e29b-41d4-a716-446655440000

While the job is running:

HTTP/1.1 200 OK
Content-Type: application/json

{
  "jobId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "processing",
  "progress": 45,
  "createdAt": "2024-03-13T10:00:00Z",
  "startedAt": "2024-03-13T10:00:05Z"
}

When it's done:

HTTP/1.1 303 See Other
Location: /reports/550e8400-e29b-41d4-a716-446655440000
Content-Type: application/json

{
  "jobId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "progress": 100,
  "createdAt": "2024-03-13T10:00:00Z",
  "startedAt": "2024-03-13T10:00:05Z",
  "completedAt": "2024-03-13T10:02:30Z",
  "resultUrl": "/reports/550e8400-e29b-41d4-a716-446655440000"
}

The 303 See Other status tells the client to fetch the result from the Location header.
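One wrinkle: fetch() follows redirects automatically by default, so a client that wants to handle the 303 itself has to opt out. Node's built-in fetch (v18+) exposes the raw 303 when you pass redirect: 'manual'; browser fetch returns an opaque redirect instead, which is one reason the response body also carries resultUrl. A minimal sketch (nextStep and checkJob are illustrative helper names, not part of any API):

```javascript
// Decide what to do next from a raw status response.
function nextStep(statusCode, location) {
  if (statusCode === 303 && location) {
    return { done: true, resultUrl: location }; // finished: fetch the result
  }
  return { done: false }; // still running: poll again later
}

// Node 18+: { redirect: 'manual' } lets us see the 303 and its Location header
async function checkJob(statusUrl) {
  const res = await fetch(statusUrl, { redirect: 'manual' });
  return nextStep(res.status, res.headers.get('location'));
}
```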

Implementing the Server Side

Here's a Node.js implementation using Bull (a Redis-based job queue):

const express = require('express');
const Queue = require('bull');
const { v4: uuidv4 } = require('uuid');

const app = express();
app.use(express.json());

// Create a job queue
const reportQueue = new Queue('reports', {
  redis: { host: 'localhost', port: 6379 }
});

// Job processor (in production, run this in a separate worker process)
reportQueue.process(async (job) => {
  const { month, format } = job.data;

  // Update progress as we go
  await job.progress(10);

  // Fetch orders for the month
  const orders = await fetchOrdersForMonth(month);
  await job.progress(50);

  // Generate the PDF
  const pdfBuffer = await generatePDF(orders, format);
  await job.progress(90);

  // Upload to S3 or save to disk
  const url = await saveReport(job.id, pdfBuffer);
  await job.progress(100);

  return { url };
});

// Create a job
app.post('/reports/orders', async (req, res) => {
  const { month, format = 'pdf' } = req.body;

  const jobId = uuidv4();
  const job = await reportQueue.add(
    { month, format },
    { jobId, attempts: 3, backoff: { type: 'exponential', delay: 2000 } }
  );

  res.status(202).json({
    jobId: job.id,
    status: 'pending',
    createdAt: new Date().toISOString(),
    statusUrl: `/jobs/${job.id}`
  });
});

// Check job status
app.get('/jobs/:jobId', async (req, res) => {
  const job = await reportQueue.getJob(req.params.jobId);

  if (!job) {
    return res.status(404).json({ error: 'Job not found' });
  }

  const state = await job.getState();
  const progress = job.progress();

  if (state === 'completed') {
    const result = job.returnvalue;
    return res.status(303)
      .location(result.url)
      .json({
        jobId: job.id,
        status: 'completed',
        progress: 100,
        createdAt: new Date(job.timestamp).toISOString(),
        startedAt: new Date(job.processedOn).toISOString(),
        completedAt: new Date(job.finishedOn).toISOString(),
        resultUrl: result.url
      });
  }

  if (state === 'failed') {
    return res.status(200).json({
      jobId: job.id,
      status: 'failed',
      error: job.failedReason,
      createdAt: new Date(job.timestamp).toISOString()
    });
  }

  res.json({
    jobId: job.id,
    status: state, // 'waiting', 'active', 'delayed'
    progress: progress || 0,
    createdAt: new Date(job.timestamp).toISOString()
  });
});

Client-Side Polling

The client needs to poll the status endpoint until the job completes:

async function createReportAndWait(month, format) {
  // Create the job
  const createRes = await fetch('https://petstore.example.com/reports/orders', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ month, format })
  });

  const { jobId, statusUrl } = await createRes.json();

  // Poll for completion
  while (true) {
    await sleep(2000); // Poll every 2 seconds

    const statusRes = await fetch(`https://petstore.example.com${statusUrl}`);
    const status = await statusRes.json();

    console.log(`Job ${jobId}: ${status.status} (${status.progress}%)`);

    if (status.status === 'completed') {
      // Fetch the result
      const reportRes = await fetch(`https://petstore.example.com${status.resultUrl}`);
      return reportRes.blob();
    }

    if (status.status === 'failed') {
      throw new Error(`Job failed: ${status.error}`);
    }
  }
}

function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

Exponential Backoff

Polling every 2 seconds is fine for short jobs, but wasteful for long ones. Use exponential backoff:

async function pollWithBackoff(statusUrl, maxDelay = 30000) {
  let delay = 1000; // Start with 1 second

  while (true) {
    await sleep(delay);

    const res = await fetch(statusUrl);
    const status = await res.json();

    if (status.status === 'completed' || status.status === 'failed') {
      return status;
    }

    // Double the delay between polls, capped at maxDelay
    delay = Math.min(delay * 2, maxDelay);
  }
}
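When many clients start jobs at the same time, plain doubling can still produce synchronized bursts of polls. Adding jitter spreads them out; one common variant ("full jitter") picks a random delay up to the current backoff cap. A sketch (the function name is ours, not from any library):

```javascript
// "Full jitter": random delay between 0 and the current backoff cap,
// so clients that started together don't all poll at the same instant.
function jitteredDelay(attempt, base = 1000, max = 30000) {
  const cap = Math.min(base * 2 ** attempt, max);
  return Math.floor(Math.random() * cap);
}
```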

Webhooks for Completion

Polling is simple but inefficient. Webhooks let the server notify the client when the job finishes.

Client Provides a Callback URL

POST /reports/orders
Content-Type: application/json

{
  "month": "2024-03",
  "format": "pdf",
  "callbackUrl": "https://client.example.com/webhooks/report-complete"
}

Server Calls the Webhook

When the job completes, the server POSTs to the callback URL:

reportQueue.on('completed', async (job, result) => {
  const { callbackUrl } = job.data;

  if (callbackUrl) {
    try {
      await fetch(callbackUrl, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          jobId: job.id,
          status: 'completed',
          resultUrl: result.url,
          completedAt: new Date().toISOString()
        })
      });
    } catch (err) {
      console.error(`Failed to call webhook ${callbackUrl}:`, err);
      // Optionally retry with exponential backoff
    }
  }
});
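The "optionally retry" comment above deserves a sketch: webhook endpoints go down, so production systems typically retry delivery a few times with growing delays before giving up. The helpers below are an illustration (deliverWebhook and retryDelay are hypothetical names, not Bull APIs):

```javascript
// Delay before retry attempt n (1-based): base * 2^(n-1), capped at 60s
function retryDelay(attempt, base = 1000, cap = 60000) {
  return Math.min(base * 2 ** (attempt - 1), cap);
}

// Try to POST the payload; retry on failure with exponential backoff.
// Returns true if delivered, false if all attempts failed.
async function deliverWebhook(url, payload, maxAttempts = 5) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const res = await fetch(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(payload)
      });
      if (res.ok) return true; // delivered
    } catch (err) {
      // Network error: fall through and retry
    }
    if (attempt < maxAttempts) {
      await new Promise(r => setTimeout(r, retryDelay(attempt)));
    }
  }
  return false; // give up; consider logging to a dead-letter store
}
```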

Client Receives the Webhook

app.post('/webhooks/report-complete', (req, res) => {
  const { jobId, status, resultUrl } = req.body;

  console.log(`Job ${jobId} completed. Result at ${resultUrl}`);

  // Download the report
  downloadReport(resultUrl);

  // Acknowledge receipt
  res.status(200).send('OK');
});

Webhook Security

Always verify webhooks to prevent spoofing:

const crypto = require('crypto');

function signWebhook(payload, secret) {
  return crypto
    .createHmac('sha256', secret)
    .update(JSON.stringify(payload))
    .digest('hex');
}

// Server side
const signature = signWebhook(webhookPayload, process.env.WEBHOOK_SECRET);
await fetch(callbackUrl, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-Webhook-Signature': signature
  },
  body: JSON.stringify(webhookPayload)
});

// Client side - verify before trusting the payload.
// Important: verify against the RAW request body bytes; re-serializing the
// parsed JSON may not byte-match what the server signed. With Express:
//   app.use(express.json({ verify: (req, res, buf) => { req.rawBody = buf; } }));
app.post('/webhooks/report-complete', (req, res) => {
  const signature = req.headers['x-webhook-signature'] || '';
  const expected = crypto
    .createHmac('sha256', process.env.WEBHOOK_SECRET)
    .update(req.rawBody)
    .digest('hex');

  // Constant-time comparison prevents timing attacks
  const valid = signature.length === expected.length &&
    crypto.timingSafeEqual(Buffer.from(signature), Buffer.from(expected));

  if (!valid) {
    return res.status(401).send('Invalid signature');
  }

  // Process the webhook
  // ...
});

Progress Reporting

For long jobs, users want to know how far along things are. The job processor should update progress as it goes:

reportQueue.process(async (job) => {
  const { month } = job.data;

  await job.progress(0);

  // Step 1: Fetch orders (30% of the work)
  const orders = await fetchOrdersForMonth(month);
  await job.progress(30);

  // Step 2: Process each order (50% of the work)
  const processedOrders = [];
  for (let i = 0; i < orders.length; i++) {
    processedOrders.push(await processOrder(orders[i]));
    await job.progress(30 + (50 * (i + 1) / orders.length));
  }

  // Step 3: Generate PDF (20% of the work)
  const pdf = await generatePDF(processedOrders);
  await job.progress(100);

  return { url: await saveReport(job.id, pdf) };
});

The client can display this progress:

async function pollWithProgress(statusUrl, onProgress) {
  while (true) {
    const res = await fetch(statusUrl);
    const status = await res.json();

    onProgress(status.progress, status.status);

    if (status.status === 'completed' || status.status === 'failed') {
      return status;
    }

    await sleep(2000);
  }
}

// Usage
await pollWithProgress('/jobs/123', (progress, status) => {
  console.log(`${status}: ${progress}%`);
  updateProgressBar(progress);
});

Cancellation

Users should be able to cancel long-running jobs:

DELETE /jobs/550e8400-e29b-41d4-a716-446655440000

Server side:

app.delete('/jobs/:jobId', async (req, res) => {
  const job = await reportQueue.getJob(req.params.jobId);

  if (!job) {
    return res.status(404).json({ error: 'Job not found' });
  }

  const state = await job.getState();

  if (state === 'completed' || state === 'failed') {
    return res.status(400).json({ error: 'Cannot cancel a finished job' });
  }

  await job.remove();

  res.json({
    jobId: job.id,
    status: 'cancelled'
  });
});

For jobs that are already running, you need cooperative cancellation: the processor periodically checks whether it should stop. The simplest check is whether the job still exists in the queue (some queues won't remove a locked active job, in which case a separate cancellation flag in Redis is the more robust option):

reportQueue.process(async (job) => {
  const { month } = job.data;

  const orders = await fetchOrdersForMonth(month);

  for (let i = 0; i < orders.length; i++) {
    // Re-fetch the job: getJob() returns null once it has been removed,
    // which is our signal that the client cancelled it
    const current = await reportQueue.getJob(job.id);
    if (!current) {
      throw new Error('Job cancelled');
    }

    await processOrder(orders[i]);
    await job.progress(30 + (50 * (i + 1) / orders.length));
  }

  // ...
});

PetStore API Example: Batch Pet Import

Let's put it all together with a realistic example: importing a CSV of 10,000 pets.

Create the Job

const formData = new FormData();
formData.append('file', csvFile);
formData.append('callbackUrl', 'https://client.example.com/webhooks/import-complete');

const res = await fetch('https://petstore.example.com/pets/import', {
  method: 'POST',
  body: formData
});

const { jobId, statusUrl } = await res.json();

Server Processes the Import

const multer = require('multer');
const upload = multer({ dest: 'uploads/' }); // stores uploaded CSVs on disk

const importQueue = new Queue('pet-imports');

app.post('/pets/import', upload.single('file'), async (req, res) => {
  const jobId = uuidv4();
  const job = await importQueue.add({
    filePath: req.file.path,
    callbackUrl: req.body.callbackUrl
  }, { jobId });

  res.status(202).json({
    jobId: job.id,
    statusUrl: `/jobs/${job.id}`
  });
});

importQueue.process(async (job) => {
  const { filePath, callbackUrl } = job.data;

  const rows = await parseCSV(filePath);
  await job.progress(10);

  const results = { created: 0, failed: 0, errors: [] };

  for (let i = 0; i < rows.length; i++) {
    try {
      await createPet(rows[i]);
      results.created++;
    } catch (err) {
      results.failed++;
      results.errors.push({ row: i + 1, error: err.message });
    }

    await job.progress(10 + (90 * (i + 1) / rows.length));
  }

  // Call the webhook (don't let a delivery failure mark the import as failed)
  if (callbackUrl) {
    try {
      await fetch(callbackUrl, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          jobId: job.id,
          status: 'completed',
          results
        })
      });
    } catch (err) {
      console.error(`Failed to call webhook ${callbackUrl}:`, err);
    }
  }

  return results;
});

Client Receives Webhook

app.post('/webhooks/import-complete', (req, res) => {
  const { jobId, results } = req.body;

  console.log(`Import ${jobId} completed:`);
  console.log(`  Created: ${results.created}`);
  console.log(`  Failed: ${results.failed}`);

  if (results.errors.length > 0) {
    console.log('Errors:');
    results.errors.forEach(e => {
      console.log(`  Row ${e.row}: ${e.error}`);
    });
  }

  res.status(200).send('OK');
});

Summary

Long-running operations need async patterns:

  • Return 202 Accepted immediately with a job ID
  • Let clients poll a status endpoint for progress
  • Use webhooks to notify clients when jobs complete
  • Report progress as a percentage so users know what's happening
  • Allow cancellation for jobs that haven't finished
  • Use a proper job queue (Bull, BullMQ, Celery) for reliability
  • Implement exponential backoff for polling to reduce load

These patterns turn a blocking 2-minute request into a responsive async flow that doesn't tie up connections, shows progress, and can be cancelled if needed.