Exponential Backoff with Jitter — Part 2: The Implementation
19 March 2026
In Part 1 we covered the theory: why naive retries cause a thundering herd, and how timeouts, exponential backoff, and jitter solve it. Here we build the full implementation.
Prerequisites
- Node.js 18+ (native fetch support)
- Basic familiarity with async/await and REST APIs
- Understanding of HTTP status codes
Step 1 — Constants and configuration
const BASE_DELAY_MS = 1_000;
const MAX_DELAY_MS = 8_000;
const MAX_ATTEMPTS = 5;
const REQUEST_TIMEOUT = 10_000;
// Only these status codes are worth retrying.
// 4xx client errors (except 408, 429) won't fix themselves on retry.
const RETRIABLE_CODES = new Set([408, 429, 500, 502, 503, 504]);
MAX_ATTEMPTS, not MAX_RETRIES: naming matters. MAX_ATTEMPTS = 5 means 5 total tries (first attempt + 4 retries). MAX_RETRIES = 5 would mean 6 total tries. Counting attempts makes that off-by-one error much harder to introduce.
new Set() for RETRIABLE_CODES: Set.has() is O(1) versus O(n) for Array.includes(). The difference is negligible for 6 elements, but a Set communicates intent better: this is a membership check, not a sequence.
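To make the attempts-versus-retries point concrete, here is a minimal sketch that counts how often an always-failing operation gets called. The names runWithAttempts and alwaysFails are made up for this illustration and are not part of the implementation built in this post; there is also no delay between tries, since only the count matters here.

```javascript
const MAX_ATTEMPTS = 5;

// Hypothetical helper: runs `operation` until it succeeds or
// MAX_ATTEMPTS total tries have been used up.
const runWithAttempts = (operation) => {
  let lastError;
  for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
    try {
      return operation(attempt);
    } catch (error) {
      lastError = error; // remember the failure and try again
    }
  }
  throw lastError;
};

// Count how many times an always-failing operation is invoked.
let calls = 0;
const alwaysFails = () => {
  calls++;
  throw new Error('boom');
};

try {
  runWithAttempts(alwaysFails);
} catch {
  console.log(`total tries: ${calls}`); // total tries: 5
}
```

The loop bound reads directly as "total tries", so there is no +1 to remember anywhere.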
Step 2 — Sleep helper
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
Step 3 — Full jitter delay
const getBackoffDelay = (attempt) => {
  const exponential = BASE_DELAY_MS * Math.pow(2, attempt);
  const capped = Math.min(MAX_DELAY_MS, exponential);
  return Math.random() * capped;
};
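It can help to see the numbers this produces. The sketch below recomputes the upper bound of the jitter window for each zero-based attempt index, using the constants from Step 1. The helper names backoffCeiling and fullJitterDelay, and the injectable random parameter, are assumptions added for illustration.

```javascript
const BASE_DELAY_MS = 1_000;
const MAX_DELAY_MS = 8_000;

// Upper bound of the full-jitter window for a zero-based attempt index.
const backoffCeiling = (attempt) =>
  Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** attempt);

// Full jitter: a uniform random delay in [0, ceiling).
// `random` is injectable (an assumption) so the function can be checked deterministically.
const fullJitterDelay = (attempt, random = Math.random) =>
  random() * backoffCeiling(attempt);

// With 5 total attempts there are 4 sleeps, after attempts 0 to 3.
const ceilings = [0, 1, 2, 3].map(backoffCeiling);
const worstCaseTotal = ceilings.reduce((sum, ms) => sum + ms, 0);

console.log(ceilings); // [ 1000, 2000, 4000, 8000 ]
console.log(worstCaseTotal); // 15000
```

So under these settings a request that fails every time sleeps for at most 15 seconds in total, and about half that on average, because each delay is drawn uniformly from its window.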
Step 4 — Request timeout with AbortController
const fetchWithTimeout = (url, options = {}) => {
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), REQUEST_TIMEOUT);
  return fetch(url, { ...options, signal: controller.signal })
    .finally(() => clearTimeout(timeoutId));
};
Always call clearTimeout in finally — if the request succeeds quickly, you don’t want a ghost timeout firing later.
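One caveat worth knowing: the promise returned by fetch settles as soon as the response headers arrive, so a timer cleared in that promise's finally covers the wait for a response but not the time spent reading the body. If a slow body is a concern, one option is to keep the timer armed until the body has been parsed. The sketch below is a hypothetical variant, not the function used in the rest of this post; the name fetchJsonWithTimeout and the timeoutMs and fetchImpl parameters are assumptions, with fetchImpl there only so the function can be exercised with a stub.

```javascript
const REQUEST_TIMEOUT = 10_000;

// Hypothetical variant: the abort timer stays armed until the JSON body is parsed.
const fetchJsonWithTimeout = async (
  url,
  options = {},
  { timeoutMs = REQUEST_TIMEOUT, fetchImpl = fetch } = {},
) => {
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetchImpl(url, { ...options, signal: controller.signal });
    // Only parse successful responses; the caller still gets the raw response
    // so it can inspect the status code and headers.
    const body = response.ok ? await response.json() : null;
    return { response, body };
  } finally {
    // Runs on success, HTTP error, network failure, and timeout alike.
    clearTimeout(timeoutId);
  }
};
```

The trade-off is that the function now owns body parsing, so it suits JSON APIs and would need adjusting for streamed or binary responses.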
Step 5 — Respecting Retry-After headers
Many APIs tell you exactly how long to wait. Ignoring this header is wasteful and disrespectful to the API:
const getDelay = (attempt, response) => {
  const retryAfter = response?.headers?.get('Retry-After');
  if (retryAfter) {
    // Form 1: a delay in seconds, e.g. "120".
    const parsed = parseInt(retryAfter, 10);
    if (!isNaN(parsed)) return parsed * 1_000;
    // Form 2: an HTTP-date, e.g. "Wed, 21 Oct 2026 07:28:00 GMT".
    const date = new Date(retryAfter);
    if (!isNaN(date)) return Math.max(0, date - Date.now());
  }
  // No usable header: fall back to full-jitter backoff.
  return getBackoffDelay(attempt);
};
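The header comes in two forms, a number of seconds or an HTTP-date, and a server can ask for a very long wait. The sketch below shows one way to parse both forms strictly and clamp the result. It is an illustration under assumptions rather than the code used in this post: parseRetryAfter and the MAX_RETRY_AFTER_MS ceiling are hypothetical, and whether to cap at all depends on how long your caller can afford to wait.

```javascript
// Hypothetical ceiling; the getDelay function above does not cap Retry-After.
const MAX_RETRY_AFTER_MS = 60_000;

// Returns a delay in milliseconds, or null when the header is absent or unparseable.
// `now` is injectable (an assumption) so date handling can be checked deterministically.
const parseRetryAfter = (value, now = Date.now()) => {
  if (value == null || value.trim() === '') return null;
  const trimmed = value.trim();

  // Form 1: delay-seconds, a non-negative integer such as "120".
  if (/^\d+$/.test(trimmed)) {
    return Math.min(MAX_RETRY_AFTER_MS, Number(trimmed) * 1_000);
  }

  // Form 2: HTTP-date such as "Wed, 21 Oct 2026 07:28:00 GMT".
  const timestamp = Date.parse(trimmed);
  if (Number.isNaN(timestamp)) return null;
  return Math.min(MAX_RETRY_AFTER_MS, Math.max(0, timestamp - now));
};

const now = Date.parse('Wed, 21 Oct 2026 07:28:00 GMT');
console.log(parseRetryAfter('5')); // 5000
console.log(parseRetryAfter('3600')); // 60000 (clamped)
console.log(parseRetryAfter('Wed, 21 Oct 2026 07:28:30 GMT', now)); // 30000
console.log(parseRetryAfter('soon')); // null
```

A null result is the signal to fall back to the jittered backoff delay.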
Step 6 — Structured logging
const log = (level, message, meta = {}) => {
  console.log(JSON.stringify({
    timestamp: new Date().toISOString(),
    level,
    message,
    ...meta,
  }));
};
JSON-structured logs can be ingested and queried by most log and observability platforms. Never use console.log('retrying...') in production: free text is hard to query.
Log retriable errors at WARN, not ERROR: a retriable error that eventually succeeds is not an error. Logged at ERROR, it is noise that fires your alerts on transient blips.
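As a rough sketch of that policy, the level can be derived from how many attempts remain. The helper names formatLog and levelForFailure are assumptions made for this example; formatLog mirrors the shape of the log helper above but returns the string instead of printing it.

```javascript
// Builds one JSON log line with the same fields as the log() helper above.
const formatLog = (level, message, meta = {}) =>
  JSON.stringify({ timestamp: new Date().toISOString(), level, message, ...meta });

// Hypothetical policy helper: WARN while retries remain, ERROR once they are exhausted.
// `attempt` is zero-based, matching the rest of this post.
const levelForFailure = (attempt, maxAttempts) =>
  attempt + 1 >= maxAttempts ? 'ERROR' : 'WARN';

// Second try of five failed with a 503: still a WARN.
const line = formatLog(levelForFailure(1, 5), 'Retriable error, backing off', {
  status: 503,
  attempt: 2,
});
console.log(line);
```

Because each line is a single JSON object, a query such as "level = WARN and status = 503" works without any text parsing.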
Complete implementation
import 'dotenv/config';
const BASE_DELAY_MS = 1_000;
const MAX_DELAY_MS = 8_000;
const MAX_ATTEMPTS = 5;
const REQUEST_TIMEOUT = 10_000;
const RETRIABLE_CODES = new Set([408, 429, 500, 502, 503, 504]);
const BASE_URL = 'https://rickandmortyapi.com/api/character';
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const log = (level, message, meta = {}) =>
  console.log(JSON.stringify({ timestamp: new Date().toISOString(), level, message, ...meta }));
const getBackoffDelay = (attempt) => {
  const cap = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * Math.pow(2, attempt));
  return Math.random() * cap;
};
const getDelay = (attempt, response) => {
  const retryAfter = response?.headers?.get('Retry-After');
  if (retryAfter) {
    const seconds = parseInt(retryAfter, 10);
    if (!isNaN(seconds)) return seconds * 1_000;
    const date = new Date(retryAfter);
    if (!isNaN(date)) return Math.max(0, date - Date.now());
  }
  return getBackoffDelay(attempt);
};
const fetchWithTimeout = (url, options = {}) => {
  const controller = new AbortController();
  const id = setTimeout(() => controller.abort(), REQUEST_TIMEOUT);
  return fetch(url, { ...options, signal: controller.signal }).finally(() => clearTimeout(id));
};
const fetchWithRetry = async (url, options = {}) => {
  const headers = {
    'Content-Type': 'application/json',
    ...(process.env.API_TOKEN && { Authorization: `Bearer ${process.env.API_TOKEN}` }),
    ...options.headers,
  };
  let attempt = 0;
  while (attempt < MAX_ATTEMPTS) {
    const start = Date.now();
    let response;
    try {
      response = await fetchWithTimeout(url, { ...options, headers });
    } catch (networkError) {
      const isTimeout = networkError.name === 'AbortError';
      if (attempt + 1 >= MAX_ATTEMPTS) throw networkError;
      const delay = getBackoffDelay(attempt);
      log('WARN', isTimeout ? 'Request timed out, retrying' : 'Network error, retrying', {
        error: networkError.message, attempt: attempt + 1, retryInMs: Math.round(delay),
      });
      await sleep(delay);
      attempt++;
      continue;
    }
    if (response.ok) return response.json();
    if ([401, 403].includes(response.status)) throw new Error(`Auth error ${response.status}`);
    if (response.status === 404) throw new Error(`Not found: ${url}`);
    if (!RETRIABLE_CODES.has(response.status)) throw new Error(`Non-retriable HTTP ${response.status}`);
    if (attempt + 1 >= MAX_ATTEMPTS) {
      throw new Error(`All ${MAX_ATTEMPTS} attempts failed (last status: ${response.status})`);
    }
    const delay = getDelay(attempt, response);
    log('WARN', 'Retriable error, backing off', {
      status: response.status, attempt: attempt + 1, retryInMs: Math.round(delay),
    });
    await sleep(delay);
    attempt++;
  }
};
Paginated fetching — retries at one layer only
const fetchAllPages = async (baseUrl) => {
  const results = [];
  let page = 1;
  while (true) {
    const data = await fetchWithRetry(`${baseUrl}?page=${page}`);
    if (!data?.results?.length) break;
    results.push(...data.results);
    log('INFO', `Page ${page} complete`, { count: data.results.length, total: results.length });
    if (!data.info?.next) break;
    page++;
  }
  return results;
};
fetchAllPages has no retry logic of its own: all retries happen inside fetchWithRetry. This is intentional. If both layers retry with 5 attempts each, a single failing page can fire up to 25 requests. In a microservices stack with 3 layers independently retrying, that becomes 5³ = 125 requests for what the user sees as one page load.
The rule: retry at one point in the call stack.
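The multiplication is easy to verify with a small simulation. The sketch below stacks retry wrappers around a dependency that always fails and counts how many calls actually reach it. Both withRetries and countRequests are hypothetical helpers written for this illustration, and delays are left out because only the request count matters.

```javascript
// Wraps `operation` so it is tried up to `attempts` times (no delays, counting only).
const withRetries = (operation, attempts) => () => {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return operation();
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
};

// Counts the requests that reach a permanently failing dependency
// when `layers` nested wrappers each retry independently.
const countRequests = (layers, attemptsPerLayer) => {
  let requests = 0;
  let call = () => {
    requests++;
    throw new Error('503');
  };
  for (let i = 0; i < layers; i++) call = withRetries(call, attemptsPerLayer);
  try {
    call();
  } catch {
    // Expected: the dependency never recovers in this simulation.
  }
  return requests;
};

console.log(countRequests(1, 5)); // 5
console.log(countRequests(2, 5)); // 25
console.log(countRequests(3, 5)); // 125
```

Retrying at a single layer keeps the worst case linear in the attempt count instead of exponential in the depth of the stack.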
Summary
Five things that separate production-grade retry logic from naive retries:
- Timeout every request: use AbortController, never let fetch hang
- Cap your backoff: Math.min(MAX_DELAY_MS, ...) prevents absurd wait times
- Add full jitter: Math.random() * cappedDelay prevents thundering herds
- Respect Retry-After: the API knows better than you how long to wait
- Retry at one layer only: don’t multiply retries across your call stack