Pagination That Scales to Millions of Records

Offset-based pagination breaks at scale. Learn why cursor-based pagination is better and how to implement it in REST APIs with real examples.


Your API returns all results in one response. It works fine with 100 records. But what happens when you have 10,000? Or 1 million?

Without pagination, your API will:
- Consume excessive memory
- Take too long to respond
- Overwhelm clients with data they can't process
- Crash under load

You need pagination. But not all pagination is equal. The approach you choose affects performance, scalability, and user experience.

The Three Pagination Approaches

1. Offset-Based Pagination (Page Numbers)

This is the most common approach. Clients request pages by number:

GET /pets?page=1&limit=20
GET /pets?page=2&limit=20
GET /pets?page=3&limit=20

The API translates this to SQL:

-- Page 1
SELECT * FROM pets LIMIT 20 OFFSET 0;

-- Page 2
SELECT * FROM pets LIMIT 20 OFFSET 20;

-- Page 3
SELECT * FROM pets LIMIT 20 OFFSET 40;

Pros:
- Simple to implement
- Easy to understand
- Supports jumping to any page
- Shows total page count

Cons:
- Performance degrades with large offsets
- Inconsistent results when data changes
- Doesn't work well with real-time data
- Database must scan and skip rows

2. Cursor-Based Pagination (Keyset Pagination)

Clients request pages using a cursor (pointer to a specific record):

GET /pets?limit=20
GET /pets?limit=20&cursor=eyJpZCI6IjAxOWI0MTMyIn0
GET /pets?limit=20&cursor=eyJpZCI6IjAxOWI0MTUzIn0

The cursor encodes the last record's ID. The API uses it to fetch the next page:

-- First page
SELECT * FROM pets ORDER BY id LIMIT 20;

-- Next page (cursor contains id='019b4132')
SELECT * FROM pets WHERE id > '019b4132' ORDER BY id LIMIT 20;

Pros:
- Consistent performance regardless of page depth
- Handles real-time data correctly
- No duplicate or missing records
- Efficient database queries

Cons:
- Can't jump to arbitrary pages
- No total page count
- More complex to implement
- Cursor format must be stable

3. Seek-Based Pagination (Timestamp/ID Range)

Similar to cursor-based but uses explicit parameters:

GET /pets?limit=20&since_id=019b4132
GET /pets?limit=20&before_id=019b4153

Pros:
- Same performance benefits as cursor-based
- More transparent (no encoded cursor)
- Easy to bookmark specific ranges

Cons:
- Exposes internal IDs
- Requires clients to track IDs
- Can't jump to arbitrary pages
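A seek-style endpoint can be sketched as a small query builder. This is illustrative only; the `buildSeekQuery` helper and its parameter names are my own, not part of any particular framework, and the actual database call is omitted:

```javascript
// Build a seek-based query from since_id / before_id parameters.
// Returns the SQL text plus bound parameters for a parameterized query.
function buildSeekQuery({ limit = 20, sinceId = null, beforeId = null } = {}) {
  const clauses = [];
  const params = [];

  if (sinceId) {
    clauses.push('id > ?'); // records strictly after this ID
    params.push(sinceId);
  }
  if (beforeId) {
    clauses.push('id < ?'); // records strictly before this ID
    params.push(beforeId);
  }

  const where = clauses.length ? ` WHERE ${clauses.join(' AND ')}` : '';
  const sql = `SELECT * FROM pets${where} ORDER BY id ASC LIMIT ?`;
  params.push(limit);

  return { sql, params };
}
```

Because the comparison is on an indexed `id` column, the query stays fast at any depth, just like the cursor variant.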

Why Offset Pagination Breaks at Scale

Offset pagination seems simple, but it has serious problems.

Problem 1: Performance Degradation

When you request page 1000 with 20 items per page:

SELECT * FROM pets LIMIT 20 OFFSET 19980;

The database must:
1. Scan 19,980 rows
2. Skip all of them
3. Return the next 20

This gets slower as the offset increases. At page 10,000, the database scans 200,000 rows just to return 20.

Benchmark (1 million records):
- Page 1: 5ms
- Page 100: 50ms
- Page 1,000: 500ms
- Page 10,000: 5,000ms (5 seconds!)

Cursor-based pagination maintains constant performance: any page depth returns in 5-10ms.

Problem 2: Inconsistent Results

Imagine a user is browsing page 2 of pet listings. While they're viewing it, someone adds a new pet at the top of the list.

When the user clicks "Next" to page 3, they either:
- See the last record of page 2 again (every record shifted down one position)
- Or, if a record was deleted instead, skip a record entirely

This happens because offset pagination uses absolute positions. When data changes, positions shift.

Cursor-based pagination doesn't have this problem. The cursor points to a specific record, not a position. New records don't affect pagination.

Problem 3: No Total Count

To show "Page 1 of 500," you need the total count:

SELECT COUNT(*) FROM pets;

This query gets expensive with large tables. On a table with 10 million rows, COUNT(*) can take seconds.

You have three options:
1. Don't show total count (best for large datasets)
2. Cache the count (update periodically, accept staleness)
3. Estimate the count (use database statistics)
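For option 3, PostgreSQL keeps a row-count estimate in its statistics catalog. A hedged, Postgres-specific sketch (the figure is refreshed by `ANALYZE`/autovacuum, so it can lag the true count):

```sql
-- Fast approximate row count from planner statistics (PostgreSQL only).
-- Accurate only as of the last ANALYZE / autovacuum run.
SELECT reltuples::bigint AS estimated_count
FROM pg_class
WHERE relname = 'pets';
```

This reads a single catalog row instead of scanning the table, so it stays fast regardless of table size.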

Cursor-based pagination avoids this problem by not showing total pages. Instead, it shows "Load More" or "Next Page."

Implementing Cursor-Based Pagination

Here's how to implement cursor-based pagination correctly.

Step 1: Choose a Cursor Field

The cursor field must be:
- Unique (or unique when combined with another field)
- Sequential (ordered)
- Immutable (doesn't change)
- Indexed (for performance)

Good choices:
- Auto-incrementing ID
- UUID v7 (time-ordered)
- Timestamp + ID (for non-unique timestamps)

Bad choices:
- Non-unique fields (status, category)
- Mutable fields (updated_at can change)
- Unindexed fields (slow queries)

Step 2: Return Cursor in Response

Include pagination metadata in the response:

{
  "data": [
    {
      "id": "019b4132-70aa-764f-b315-e2803d882a24",
      "name": "Max",
      "species": "DOG"
    },
    // ... 19 more records
  ],
  "pagination": {
    "nextCursor": "eyJpZCI6IjAxOWI0MTUzLThmM2EtN2MyZC1hMWI1LWY0ZThkOWMzYTdiMiJ9",
    "hasMore": true,
    "limit": 20
  }
}

The nextCursor is a base64-encoded JSON object:

// Encode
const cursor = Buffer.from(JSON.stringify({
  id: "019b4153-8f3a-7c2d-a1b5-f4e8d9c3a7b2"
})).toString('base64');

// Decode
const decoded = JSON.parse(Buffer.from(cursor, 'base64').toString());

Encoding the cursor hides implementation details and allows you to change the format later.
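Since cursors travel in query strings, it also helps to make them URL-safe and to reject malformed input up front. A minimal sketch (the helper names are my own; Node's `'base64url'` encoding requires Node 15+):

```javascript
// Encode a cursor payload as URL-safe base64 (no '+', '/', or '=' to escape).
function encodeCursor(payload) {
  return Buffer.from(JSON.stringify(payload)).toString('base64url');
}

// Decode a cursor, returning null for anything malformed so the API
// can respond with 400 Bad Request instead of crashing on JSON.parse.
function decodeCursor(cursor) {
  try {
    const decoded = JSON.parse(Buffer.from(cursor, 'base64url').toString());
    return decoded && typeof decoded.id === 'string' ? decoded : null;
  } catch {
    return null;
  }
}
```

A `null` return signals an invalid cursor; the route handler decides whether that becomes a 400 response or a fresh first page.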

Step 3: Query Using Cursor

When a cursor is provided, use it to fetch the next page:

async function getPets(limit = 20, cursor = null) {
  let query = 'SELECT * FROM pets WHERE 1=1';
  const params = [];

  if (cursor) {
    let decoded;
    try {
      decoded = JSON.parse(Buffer.from(cursor, 'base64').toString());
    } catch {
      throw new Error('Invalid cursor'); // surface as a 400 Bad Request
    }
    query += ' AND id > ?';
    params.push(decoded.id);
  }

  query += ' ORDER BY id ASC LIMIT ?';
  params.push(limit + 1); // Fetch one extra to check if there's more

  const results = await db.query(query, params);

  const hasMore = results.length > limit;
  const data = hasMore ? results.slice(0, limit) : results;

  const nextCursor = hasMore
    ? Buffer.from(JSON.stringify({ id: data[data.length - 1].id })).toString('base64')
    : null;

  return {
    data,
    pagination: {
      nextCursor,
      hasMore,
      limit
    }
  };
}

This approach:
- Fetches limit + 1 records to check if more exist
- Returns only limit records
- Generates nextCursor from the last record
- Sets hasMore based on whether extra records exist

Step 4: Handle Sorting

Cursor pagination works with custom sorting, but the cursor must include all sort fields:

GET /pets?sort=age,id&order=asc&limit=20

The cursor includes both age and id:

{
  "age": 3,
  "id": "019b4153-8f3a-7c2d-a1b5-f4e8d9c3a7b2"
}

The query uses both fields:

SELECT * FROM pets
WHERE (age > 3) OR (age = 3 AND id > '019b4153-8f3a-7c2d-a1b5-f4e8d9c3a7b2')
ORDER BY age ASC, id ASC
LIMIT 21;

This ensures correct ordering and no duplicate/missing records.
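The compound comparison above can be generated from the cursor. A sketch of a pure query builder (the helper name is my own; it assumes an (age, id) ascending sort and parameterized queries):

```javascript
// Build the keyset WHERE clause for a two-field sort (age ASC, id ASC).
// The cursor must carry both fields from the last row of the previous page.
function buildSortedPageQuery(cursor, limit = 20) {
  const params = [];
  let where = '';

  if (cursor) {
    // Rows strictly after (cursor.age, cursor.id) in (age, id) order.
    where = ' WHERE (age > ?) OR (age = ? AND id > ?)';
    params.push(cursor.age, cursor.age, cursor.id);
  }

  params.push(limit + 1); // fetch one extra row to compute hasMore
  return {
    sql: `SELECT * FROM pets${where} ORDER BY age ASC, id ASC LIMIT ?`,
    params,
  };
}
```

Databases that support row-value comparison (e.g. PostgreSQL) can express the same condition more compactly as `WHERE (age, id) > (?, ?)`, which the planner can match to a composite index.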

Best Practices

1. Set Reasonable Limits

Don't let clients request unlimited records:

const limit = Math.min(parseInt(req.query.limit) || 20, 100);

Cap the maximum at 100 (or whatever makes sense for your API). This prevents abuse and ensures consistent performance.
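The one-liner above can be hardened against non-numeric and negative input. A small sketch (the `clampLimit` name and the 1-100 bounds are illustrative, not prescribed):

```javascript
// Normalize a raw ?limit= value: default 20, floor 1, cap 100.
// Handles missing, non-numeric, and negative input without throwing.
function clampLimit(raw, { fallback = 20, max = 100 } = {}) {
  const n = parseInt(raw, 10);
  if (Number.isNaN(n)) return fallback;
  return Math.min(Math.max(n, 1), max);
}
```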

2. Provide Both Forward and Backward Pagination

Support both nextCursor and previousCursor:

{
  "data": [...],
  "pagination": {
    "previousCursor": "eyJpZCI6IjAxOWI0MTAwIn0",
    "nextCursor": "eyJpZCI6IjAxOWI0MTUzIn0",
    "hasMore": true,
    "hasPrevious": true
  }
}

This allows clients to navigate in both directions.
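Backward pagination can reuse the same keyset idea with the comparison and sort direction flipped, then the fetched rows reversed back into display order. A sketch (helper names are my own; it assumes the previousCursor carries the ID of the first record on the current page):

```javascript
// Build the query for the page *before* the given cursor ID:
// scan backwards (id < cursor, DESC), then reverse rows into ASC order.
function buildPreviousPageQuery(cursorId, limit = 20) {
  return {
    sql: 'SELECT * FROM pets WHERE id < ? ORDER BY id DESC LIMIT ?',
    params: [cursorId, limit + 1], // one extra row to compute hasPrevious
  };
}

// Restore ascending display order after the descending scan.
function toDisplayOrder(rows) {
  return rows.slice().reverse();
}
```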

3. Make Cursors Opaque

Don't expose raw IDs as cursors:

Bad:

GET /pets?since_id=019b4153

Good:

GET /pets?cursor=eyJpZCI6IjAxOWI0MTUzIn0

Opaque cursors let you change the implementation without breaking clients.

4. Include Metadata

Help clients understand pagination state:

{
  "data": [...],
  "pagination": {
    "nextCursor": "...",
    "hasMore": true,
    "limit": 20,
    "returnedCount": 20
  }
}

The returnedCount helps clients detect the last page (when returnedCount < limit).

5. Document Cursor Format

Even though cursors are opaque, document what they contain:

Cursors are base64-encoded JSON objects containing:
- id: The last record's ID
- Additional fields based on sort parameters

Cursors are valid for 24 hours. After that, start a new pagination session.

This helps developers debug issues.
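A validity window like the 24 hours above can be enforced by stamping the cursor with an issue time at encode time and checking it on decode. A sketch (the field names and TTL are illustrative):

```javascript
const CURSOR_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours

// Stamp the cursor payload with its issue time.
function encodeExpiringCursor(payload, now = Date.now()) {
  return Buffer.from(JSON.stringify({ ...payload, issuedAt: now })).toString('base64');
}

// Return the payload if the cursor is well-formed and fresh, otherwise null.
function decodeExpiringCursor(cursor, now = Date.now()) {
  try {
    const decoded = JSON.parse(Buffer.from(cursor, 'base64').toString());
    if (typeof decoded.issuedAt !== 'number') return null;
    if (now - decoded.issuedAt > CURSOR_TTL_MS) return null; // expired
    return decoded;
  } catch {
    return null;
  }
}
```

Note this only detects stale cursors; clients can still forge the timestamp. If tampering matters, sign the cursor (e.g. with an HMAC) rather than relying on encoding alone.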

When to Use Each Approach

Use offset pagination when:
- Dataset is small (< 10,000 records)
- Users need to jump to specific pages
- Total count is required
- Data rarely changes

Use cursor pagination when:
- Dataset is large (> 10,000 records)
- Performance matters
- Data changes frequently
- Real-time consistency is important

Use seek pagination when:
- You want cursor benefits with transparent parameters
- Clients need to bookmark specific ranges
- You're okay exposing IDs

Real-World Example: Modern PetStore API

The Modern PetStore API uses cursor-based pagination:

GET /v1/pets?limit=20

Response:
{
  "data": [
    {
      "id": "019b4132-70aa-764f-b315-e2803d882a24",
      "name": "Max",
      "species": "DOG",
      "breed": "Golden Retriever",
      "age": 3,
      "status": "AVAILABLE"
    },
    // ... 19 more
  ],
  "pagination": {
    "nextCursor": "eyJpZCI6IjAxOWI0MTUzLThmM2EtN2MyZC1hMWI1LWY0ZThkOWMzYTdiMiJ9",
    "hasMore": true,
    "limit": 20
  }
}

Next page:

GET /v1/pets?limit=20&cursor=eyJpZCI6IjAxOWI0MTUzLThmM2EtN2MyZC1hMWI1LWY0ZThkOWMzYTdiMiJ9

This approach scales to millions of pets without performance degradation.

Conclusion

Pagination is not optional for production APIs. Choose the right approach:

  • Small datasets: Offset pagination works fine
  • Large datasets: Use cursor-based pagination
  • Real-time data: Cursor pagination prevents inconsistencies
  • Performance-critical: Cursor pagination maintains constant speed

The Modern PetStore API demonstrates cursor-based pagination with proper error handling, sorting support, and clear documentation. Use it as a reference for your own APIs.

Stop using ?page=1000 and start using cursors. Your database will thank you.

Try It Yourself: Explore the Modern PetStore API