Error Handling
Error response structure, retry strategies, idempotency, timeouts, and graceful degradation. You'll hit this when clients can't tell a validation error from a server crash, or retries cause duplicate orders.
```javascript
// Simple error response
app.use((err, req, res, next) => {
  res.status(err.status || 500).json({
    error: err.message,
  });
});
```

```javascript
// RFC 7807 Problem Details
app.use((err, req, res, next) => {
  res
    .status(err.status || 500)
    .set("Content-Type", "application/problem+json") // media type defined by RFC 7807
    .json({
      type: "https://api.example.com/errors/validation",
      title: "Validation Error",
      status: err.status || 500,
      detail: err.message,
      instance: req.originalUrl,
    });
});
```

Ad-hoc error objects with a single message string give clients no structured way to distinguish error types, map them to UI states, or build reliable retry logic. Every API ends up inventing its own format, forcing clients to write custom parsing for each one.
RFC 7807 Problem Details provides a standardized error format with well-defined fields like type, title, status, detail, and instance. Clients can parse errors consistently across different APIs without guessing the shape of the response body.
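For consumers, the payoff is uniform parsing. A minimal sketch of a client-side normalizer (the field names come from RFC 7807; the fallback values, other than the `about:blank` default for `type`, are this example's own choices):

```typescript
// Normalize any RFC 7807 body into a uniform shape, filling in
// defaults for fields the server omitted.
interface ProblemDetails {
  type: string;
  title: string;
  status: number;
  detail?: string;
  instance?: string;
}

function parseProblem(body: unknown, fallbackStatus: number): ProblemDetails {
  const obj = (body ?? {}) as Partial<ProblemDetails>;
  return {
    type: obj.type ?? "about:blank", // RFC 7807's default for `type`
    title: obj.title ?? "Unknown Error",
    status: obj.status ?? fallbackStatus,
    detail: obj.detail,
    instance: obj.instance,
  };
}
```

Because the shape is stable, one such helper can sit in front of every API the client talks to.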
```javascript
// Return 500 for validation errors
app.post("/users", (req, res) => {
  const errors = validate(req.body);
  if (errors.length > 0) {
    return res.status(500).json({
      title: "Server Error",
      errors,
    });
  }
  // ... create user
});
```

```javascript
// Return 400 for validation errors
app.post("/users", (req, res) => {
  const errors = validate(req.body);
  if (errors.length > 0) {
    return res.status(400).json({
      title: "Validation Error",
      errors,
    });
  }
  // ... create user
});
```

Using 500 Internal Server Error for validation failures conflates client mistakes with actual server bugs. Alerting systems will fire false alarms, retry logic will pointlessly retry requests that can never succeed, and clients have no signal that they need to fix their input.
400 Bad Request tells the client that the problem is with its input, not the server. This distinction matters because clients know they should fix the request before retrying. Monitoring systems also rely on 4xx vs 5xx to separate client mistakes from server failures.
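This split is exactly what client retry logic keys off. A small illustrative helper (treating 408 and 429 as retryable alongside 5xx is a common convention, not part of any spec):

```typescript
// Decide from the status code alone whether retrying could help.
// 4xx means "fix the request first"; 5xx means "the server had a
// problem, trying again may succeed".
function isRetryable(status: number): boolean {
  if (status === 408 || status === 429) return true; // timeout / rate limit
  return status >= 500 && status < 600;
}
```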
```javascript
// Full error details in response
app.use((err, req, res, next) => {
  res.status(500).json({
    message: err.message,
    stack: err.stack,
    query: err.sql,
    host: process.env.DB_HOST,
  });
});
```

```javascript
// Generic error with trace ID
app.use((err, req, res, next) => {
  console.error(err); // log full details
  res.status(500).json({
    title: "Internal Server Error",
    detail: "Something went wrong. Please try again.",
    traceId: req.id,
  });
});
```

Exposing stack traces, SQL statements, and database hostnames in API responses is a security vulnerability. Attackers can use this information to map your infrastructure, identify vulnerable dependencies, and craft targeted attacks against your database.
Safe error responses hide implementation details (stack traces, SQL queries, hostnames) while providing a trace ID so support teams can look up the full error in server logs. This prevents attackers from gathering information about your infrastructure.
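The split can be sketched as a single function: full details go to the server log, only a trace ID crosses the wire. The Express handler above assumes request-ID middleware has populated `req.id`; this sketch generates the ID inline instead.

```typescript
import { randomUUID } from "node:crypto";

// Build a safe response body while logging the full error,
// keyed by the same trace ID the client receives.
function toSafeResponse(err: Error) {
  const traceId = randomUUID();
  console.error(traceId, err.stack); // internals stay server-side
  return {
    title: "Internal Server Error",
    detail: "Something went wrong. Please try again.",
    traceId,
  };
}
```

A support engineer given the trace ID from a bug report can grep the logs for the matching entry and see the full stack trace.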
```typescript
// Retry without idempotency
async function charge(amount: number) {
  for (let i = 0; i < 3; i++) {
    try {
      return await fetch("/api/charge", {
        method: "POST",
        body: JSON.stringify({ amount }),
      });
    } catch {
      await delay(1000 * 2 ** i);
    }
  }
}
```

```typescript
// Retry with idempotency key
async function charge(amount: number) {
  const key = crypto.randomUUID(); // one key shared by all attempts
  for (let i = 0; i < 3; i++) {
    try {
      return await fetch("/api/charge", {
        method: "POST",
        headers: { "Idempotency-Key": key },
        body: JSON.stringify({ amount }),
      });
    } catch {
      await delay(1000 * 2 ** i);
    }
  }
}
```

Retrying a POST request without an idempotency key risks duplicate side effects. If the first request succeeded but the response was lost due to a network error, the retry will create a second charge. For financial operations this can mean double-billing a customer.
An idempotency key generated once before the retry loop ensures the server processes the payment exactly once, even if the client sends multiple requests. The server checks the key and returns the original response for duplicate requests instead of charging again.
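The server-side half is not shown above. One minimal, in-memory sketch of how a server might honor the key (a production store would be shared state such as Redis or a database, with a TTL and handling for concurrent in-flight requests; `handleCharge` and the `Charge` shape are this example's own names):

```typescript
// Keep the result of each processed key; replays return the
// original result instead of performing the side effect again.
type Charge = { id: string; amount: number };

const processed = new Map<string, Charge>();
let nextId = 0;

function handleCharge(key: string, amount: number): Charge {
  const existing = processed.get(key);
  if (existing) return existing; // duplicate request: replay stored response
  const charge = { id: `ch_${++nextId}`, amount }; // first request: charge once
  processed.set(key, charge);
  return charge;
}
```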
```typescript
// Race with a timeout promise
async function fetchData(url: string) {
  const timeout = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error("Timeout")), 5000)
  );
  const res = await Promise.race([fetch(url), timeout]);
  return res.json();
}
```

```typescript
// AbortController with signal
async function fetchData(url: string) {
  const controller = new AbortController();
  const id = setTimeout(() => controller.abort(), 5000);
  try {
    const res = await fetch(url, { signal: controller.signal });
    return await res.json();
  } finally {
    clearTimeout(id); // cancel the timer once the fetch settles
  }
}
```

Promise.race resolves the outer promise on timeout, but the fetch request keeps running in the background. The connection stays open, the response body is still being downloaded, and the callback will eventually resolve with no one listening. This wastes bandwidth and can cause memory leaks.
AbortController actually cancels the underlying HTTP request when the timeout fires. The browser tears down the TCP connection and frees resources immediately. The finally block cleans up the timer if the fetch completes before the timeout.
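If you target modern runtimes (Node 17.3+ and current browsers ship it), `AbortSignal.timeout()` collapses the controller-plus-timer boilerplate into one line:

```typescript
// Same cancellation behavior as the AbortController version,
// without managing the controller or the timer manually.
async function fetchData(url: string) {
  const res = await fetch(url, {
    signal: AbortSignal.timeout(5000), // aborts the request after 5s
  });
  return res.json();
}
```

On timeout the fetch rejects with a `TimeoutError` DOMException rather than a plain abort, which lets callers distinguish timeouts from deliberate cancellation.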
```typescript
// Direct service call
async function getPrice(id: string) {
  try {
    const res = await fetch(`${PRICING_API}/items/${id}`);
    return await res.json();
  } catch {
    return { price: null, source: "error" };
  }
}
```

```typescript
// Service call with state tracking
const breaker = new CircuitBreaker({
  threshold: 5,
  resetTimeout: 30_000,
});

async function getPrice(id: string) {
  if (breaker.isOpen()) {
    return { price: null, source: "cache" };
  }
  try {
    const res = await fetch(`${PRICING_API}/items/${id}`);
    breaker.recordSuccess();
    return await res.json();
  } catch {
    breaker.recordFailure();
    return { price: null, source: "fallback" };
  }
}
```

Calling a failing service on every request wastes time waiting for inevitable timeouts. If the service is down, every request adds load to an already struggling system, increases response times for your users, and can cause cascading failures across your infrastructure.
A circuit breaker stops sending requests to a service that has failed repeatedly. After a threshold of failures, the circuit opens and subsequent calls return a fallback immediately. This prevents cascading failures, reduces latency for users, and gives the downstream service time to recover.
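The `CircuitBreaker` used above is not a built-in. A minimal way to implement the interface it assumes (closed/open states only; a production breaker such as opossum also adds a half-open state that lets a single probe request through before fully closing):

```typescript
// Tracks consecutive failures; once `threshold` is reached the
// circuit opens, and it stays open until `resetTimeout` elapses.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(private opts: { threshold: number; resetTimeout: number }) {}

  isOpen(): boolean {
    if (this.failures < this.opts.threshold) return false;
    // After the reset window, let traffic through again.
    if (Date.now() - this.openedAt >= this.opts.resetTimeout) {
      this.failures = 0;
      return false;
    }
    return true;
  }

  recordSuccess(): void {
    this.failures = 0; // any success closes the circuit
  }

  recordFailure(): void {
    this.failures++;
    if (this.failures === this.opts.threshold) {
      this.openedAt = Date.now(); // circuit just opened
    }
  }
}
```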
```javascript
// All-or-nothing bulk response
app.post("/api/users/bulk", async (req, res) => {
  try {
    const users = await Promise.all(req.body.users.map(createUser));
    res.status(201).json({ users });
  } catch (err) {
    res.status(500).json({ error: "Bulk operation failed" });
  }
});
```

```javascript
// Per-item status in bulk response
app.post("/api/users/bulk", async (req, res) => {
  const results = await Promise.allSettled(req.body.users.map(createUser));
  const response = results.map((r, i) => ({
    index: i,
    status: r.status === "fulfilled" ? "created" : "failed",
    data: r.status === "fulfilled" ? r.value : undefined,
    error: r.status === "rejected" ? r.reason.message : undefined,
  }));
  const allOk = response.every((r) => r.status === "created");
  res.status(allOk ? 201 : 207).json(response);
});
```

Promise.all rejects on the first failure and discards all results, including items that succeeded. The client has no way to know which items were created and which were not. Re-submitting the entire batch risks duplicating the items that already succeeded.
Promise.allSettled processes every item regardless of individual failures, and the response includes per-item status. HTTP 207 Multi-Status signals that the response contains mixed results. Clients can identify which items succeeded and retry only the failures.
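On the client, a 207 body in the per-item shape above makes the retry set mechanical to compute. A sketch (the `BulkItemResult` type mirrors the response built by the handler above):

```typescript
// Given the original inputs and the per-item results, return only
// the inputs whose items failed, ready for re-submission.
interface BulkItemResult {
  index: number;
  status: "created" | "failed";
  data?: unknown;
  error?: string;
}

function itemsToRetry<T>(inputs: T[], results: BulkItemResult[]): T[] {
  return results
    .filter((r) => r.status === "failed")
    .map((r) => inputs[r.index]);
}
```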
```javascript
// Basic 429 response
app.use(rateLimiter({
  max: 100,
  handler: (req, res) => {
    res.status(429).json({ error: "Too many requests" });
  },
}));
```

```javascript
// 429 with headers
app.use(rateLimiter({
  max: 100,
  handler: (req, res) => {
    res.set({
      "Retry-After": "30",
      "X-RateLimit-Limit": "100",
      "X-RateLimit-Remaining": "0",
      "X-RateLimit-Reset": String(Math.ceil(Date.now() / 1000) + 30),
    });
    res.status(429).json({
      title: "Rate Limit Exceeded",
      detail: "Try again in 30 seconds.",
      retryAfter: 30,
    });
  },
}));
```

A bare 429 response without Retry-After or rate limit headers forces clients to guess when they can retry. Most will use aggressive exponential backoff or fixed intervals, leading to either thundering herd problems when all clients retry simultaneously or unnecessarily long delays.
Including the Retry-After header and rate limit metadata (limit, remaining, reset) lets clients implement smart backoff automatically. Well-behaved clients read these headers to schedule their next request precisely, reducing unnecessary retries and server load.
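The client side of that contract can be one small function: honor the server's hint when present, fall back to exponential backoff when it is not. A sketch (`retryDelayMs` is this example's own name; the headers parameter accepts anything with a `Headers`-style `get`):

```typescript
// Pick a retry delay: Retry-After wins, exponential backoff is
// the fallback. Retry-After may also be an HTTP-date; this
// sketch handles only the delta-seconds form.
function retryDelayMs(
  headers: { get(name: string): string | null },
  attempt: number,
): number {
  const retryAfter = headers.get("Retry-After");
  if (retryAfter !== null) {
    const seconds = Number(retryAfter);
    if (Number.isFinite(seconds)) return seconds * 1000;
  }
  return 1000 * 2 ** attempt; // no server hint: back off exponentially
}
```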
```typescript
// Exponential backoff without jitter
async function fetchWithRetry(url: string) {
  for (let attempt = 0; attempt < 5; attempt++) {
    try {
      return await fetch(url);
    } catch {
      const delay = 1000 * 2 ** attempt;
      await new Promise((r) => setTimeout(r, delay));
    }
  }
  throw new Error("All retries failed");
}
```

```typescript
// Exponential backoff with full jitter
async function fetchWithRetry(url: string) {
  for (let attempt = 0; attempt < 5; attempt++) {
    try {
      return await fetch(url);
    } catch {
      const base = 1000 * 2 ** attempt;
      const jitter = Math.random() * base; // random point in [0, base)
      await new Promise((r) => setTimeout(r, jitter));
    }
  }
  throw new Error("All retries failed");
}
```

Pure exponential backoff without jitter causes a thundering herd problem. All clients that failed at the same moment will retry at the same moment (after 1s, then 2s, then 4s). Each synchronized retry wave can re-overwhelm the recovering server, potentially causing a cycle of failures that takes much longer to resolve.
Adding random jitter spreads retry attempts across time. When a server goes down and 1,000 clients all start retrying, pure exponential backoff makes them all retry at exactly the same intervals (1s, 2s, 4s), creating repeated traffic spikes. Jitter randomizes the timing so retries arrive gradually, giving the server a smooth recovery window instead of repeated bursts.
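The two schedules are easy to compare directly. A small illustration (the `random` parameter is injected so the jittered variant can be pinned in tests; `Math.random() * base` is the "full jitter" strategy):

```typescript
// Delay for attempt n under each strategy, in milliseconds.
function pureBackoff(attempt: number): number {
  return 1000 * 2 ** attempt; // every client computes the same value
}

function fullJitter(
  attempt: number,
  random: () => number = Math.random,
): number {
  return random() * pureBackoff(attempt); // spread across [0, base)
}
```

With pure backoff, 1,000 clients all sleep exactly 1s, 2s, 4s; with full jitter their retries land uniformly across each window.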