Article · Habr

Performance budgets a team will actually keep

How to gate Core Web Vitals and bundle size in CI from field data, why INP is the hard one, and how to keep the build fair.

Every team I have worked on eventually adds a performance budget, and most of them quietly stop honoring it within a quarter. The problem is rarely the metric — it is that the gate is built on the wrong data and breaks the wrong person’s build. Here is how I set budgets that survive contact with a real team.

Tie budgets to vitals and bytes, not vibes

A budget is only useful if it fails a pull request before the regression reaches users. That means it lives in CI, and it measures things that map to user pain: the three Core Web Vitals — LCP, INP, CLS — plus the JS bytes you ship to get there. Vitals tell you how the page feels; bundle size is the lever you actually pull in code review.

Keep the two layers separate. Bundle size is cheap and deterministic, so check it on every commit. Vitals are noisy and depend on the environment, so treat them as a slower signal. A minimal byte gate looks like this:

# fail the build if any first-party entry chunk crosses its gzip budget
npx size-limit --json | node ./scripts/assert-budgets.mjs --route=/app --max-gzip=170kb

The per-route budgets can live in a typed table next to that script:

type Budget = { route: string; lcpMs: number; inpMs: number; cls: number; jsGzipKb: number };
const budgets: Record<string, Budget> = {
  '/app':     { route: '/app',     lcpMs: 2500, inpMs: 200, cls: 0.1, jsGzipKb: 170 },
  '/app/doc': { route: '/app/doc', lcpMs: 2800, inpMs: 200, cls: 0.1, jsGzipKb: 240 },
};
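The article does not show assert-budgets.mjs, so here is a minimal sketch of the comparison such a script might perform. The `SizeEntry` shape (a `name` plus a gzipped `size` in bytes) is an assumption about what the size tool emits, `findByteViolations` is a hypothetical helper, and treating 1 kB as 1024 bytes is a choice you should align with your own tooling.

```typescript
// Assumed shape of one measured chunk: name plus gzipped size in bytes.
type SizeEntry = { name: string; size: number };
type ByteBudget = { route: string; jsGzipKb: number };
type Violation = { route: string; chunk: string; actualKb: number; budgetKb: number };

// Return every chunk whose gzipped size exceeds the route's byte budget.
function findByteViolations(entries: SizeEntry[], budget: ByteBudget): Violation[] {
  return entries
    .map((e) => ({ chunk: e.name, actualKb: e.size / 1024 }))
    .filter((e) => e.actualKb > budget.jsGzipKb)
    .map((e) => ({
      route: budget.route,
      chunk: e.chunk,
      actualKb: Math.round(e.actualKb * 10) / 10,
      budgetKb: budget.jsGzipKb,
    }));
}

// Example: one chunk under budget, one over.
const violations = findByteViolations(
  [
    { name: 'app.js', size: 150 * 1024 },
    { name: 'editor.js', size: 181 * 1024 },
  ],
  { route: '/app', jsGzipKb: 170 },
);

for (const v of violations) {
  console.error(`${v.route}: ${v.chunk} is ${v.actualKb} kB gzip, budget is ${v.budgetKb} kB`);
}
// A real CI script would exit non-zero here when violations.length > 0,
// because the exit code is what blocks the merge.
```

Returning a list of violations rather than a boolean keeps the failure message specific: the build log can name the chunk and the overage instead of just turning red.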

A budget nobody can fail is a poster, not a gate.

Set the numbers from the field, not the lab

The most common mistake is copying the lab thresholds from a Lighthouse run on a developer laptop. Lab numbers are reproducible but fictional: a fast machine on fast Wi-Fi tells you almost nothing about a mid-range Android on a flaky connection. Budgets have to come from field data — CrUX for the public web, your own RUM if you have it.

The field gives you a distribution, so commit to a percentile. The standard is the 75th: a vitals metric “passes” when 75% of real visits clear the threshold. Pick the percentile once, write it down, and budget against that number — not against your median, which always looks great and protects no one.

Read the current p75 for each route from CrUX or RUM, then set the budget slightly tighter — a ratchet, not a wish.
Segment by device class; a single global number hides the slow phones that are doing the suffering.
Re-baseline on a schedule, not in a panic the morning a build goes red.
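The three steps above can be sketched as follows, assuming you can export raw LCP samples from your own RUM. The `Sample` shape, the 5% tightening factor, and the function names are all hypothetical; CrUX already reports p75 directly, so there you would skip the percentile step.

```typescript
type DeviceClass = 'mobile' | 'desktop';
type Sample = { deviceClass: DeviceClass; lcpMs: number };

// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error('no samples');
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Propose a budget slightly tighter than the current field p75,
// and never loosen an existing budget automatically.
function ratchet(currentP75: number, existingBudget: number, tightenBy = 0.05): number {
  const proposed = Math.round(currentP75 * (1 - tightenBy));
  return Math.min(existingBudget, proposed);
}

// Segment by device class before taking the percentile.
function p75ByDevice(samples: Sample[]): Record<DeviceClass, number> {
  const pick = (d: DeviceClass) =>
    percentile(samples.filter((s) => s.deviceClass === d).map((s) => s.lcpMs), 75);
  return { mobile: pick('mobile'), desktop: pick('desktop') };
}

const samples: Sample[] = [
  ...[1800, 2100, 2400, 2600, 3000, 3200, 3900, 4100].map(
    (lcpMs): Sample => ({ deviceClass: 'mobile', lcpMs }),
  ),
  ...[900, 1100, 1200, 1500].map((lcpMs): Sample => ({ deviceClass: 'desktop', lcpMs })),
];

const p75 = p75ByDevice(samples);
const nextMobileBudget = ratchet(p75.mobile, 3500);   // tightens toward the field number
const nextDesktopBudget = ratchet(p75.desktop, 1000); // existing budget is already tighter, keep it
```

Running this on a schedule, rather than on demand, is what keeps re-baselining from becoming a way to make a red build green.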

INP is the one that will hurt

LCP and CLS are mostly solved by now: preload the hero, reserve space for images, done. INP is different. It replaced FID as a Core Web Vital in March 2024. FID only measured the input delay of the first interaction; INP looks at every interaction during the visit, measures each one from input through to the next paint, and reports roughly the slowest. You cannot game it with a fast first tap.
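To make "every interaction counts" concrete, here is a rough sketch of how a page-level INP value can be derived from per-interaction latencies. It follows the publicly documented rule of discarding one worst interaction per 50, but `estimateInp` is a hypothetical helper and an approximation, not the browser's implementation.

```typescript
// Assumed input: one latency in ms per interaction, measured from input to next paint.
function estimateInp(durationsMs: number[]): number | undefined {
  if (durationsMs.length === 0) return undefined; // no interactions, no INP value
  const worstFirst = [...durationsMs].sort((a, b) => b - a);
  // Skip one outlier per 50 interactions so a long session is not defined by a single fluke.
  const skip = Math.min(worstFirst.length - 1, Math.floor(durationsMs.length / 50));
  return worstFirst[skip];
}

// Short visit: the single slow click is the INP.
const shortVisit = estimateInp([40, 320, 80]);

// Long visit with 60 interactions: the worst one (900) is skipped, the next worst counts.
const longVisit = estimateInp([...Array.from({ length: 58 }, () => 50), 900, 300]);
```

The practical consequence for a budget: on a short visit, one slow handler anywhere on the page is enough to fail the route.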

INP regressions almost always trace back to the main thread being busy when the user clicks. The usual suspects:

long tasks — any single block over 50ms; a fat reducer or a synchronous layout read on click is enough;
heavy event handlers that do real work inline instead of scheduling it;
hydration, where a framework re-attaches listeners to the whole tree and blocks input for hundreds of milliseconds right after load.

The fix is almost never “optimize the function” — it is “stop blocking the main thread.” Yield after visual feedback, push non-urgent work behind scheduler.postTask or at least a setTimeout(0), and break long tasks so the browser can paint the interaction:

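button.addEventListener('click', async () => {
  applyVisualFeedback();             // cheap DOM update the user should see first
  // Hand the main thread back so the browser can paint. Where scheduler.yield
  // is missing, fall back to a macrotask: awaiting undefined would not let it paint.
  await (globalThis.scheduler?.yield?.() ?? new Promise((resolve) => setTimeout(resolve, 0)));
  await runExpensiveWork();           // the heavy part now runs after the yield
});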
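Yielding once helps only if the work after the yield is not itself one long task. A minimal sketch of breaking a loop into slices follows; `processInChunks`, `yieldToMain`, and the 40ms slice budget are hypothetical names and numbers, and the `setTimeout` fallback is used here only because it runs everywhere.

```typescript
// Portable yield. Where available, scheduler.yield() or scheduler.postTask()
// are the better-prioritized options.
const yieldToMain = (): Promise<void> => new Promise((resolve) => setTimeout(resolve, 0));

// Process items in slices, yielding whenever a slice has used up its time budget.
async function processInChunks<T, R>(
  items: readonly T[],
  work: (item: T) => R,
  sliceMs = 40, // assumed budget, kept under the 50ms long-task line
): Promise<R[]> {
  const results: R[] = [];
  let sliceStart = Date.now();
  for (const item of items) {
    results.push(work(item));
    if (Date.now() - sliceStart >= sliceMs) {
      await yieldToMain(); // input and paint can run here
      sliceStart = Date.now();
    }
  }
  return results;
}

// Usage: the results arrive in order, however many times the loop yielded.
const doubled = processInChunks([1, 2, 3], (n) => n * 2);
```

Checking elapsed time instead of yielding every N items keeps the slice length stable when individual items vary a lot in cost.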

Make the gate fair, or it gets disabled

The technical part is the easy half. The hard half is social: at some point the gate turns a teammate’s perfectly reasonable feature PR red, they did not write the slow code, and now they are blocked. Do that twice and someone adds // budget: skip and the whole thing dies.

A fair gate has three properties. Budgets are per-route, so a heavy editor page does not impose its limits on the marketing landing. There is a small tolerance: a few percent of slack, checked against a rolling baseline rather than an absolute line, so ordinary noise does not flip the build. And every budget has a named owner, so when a route goes over, the gate pings the person who can actually decide, not the unlucky author of the triggering commit.
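A minimal sketch of the tolerance and ownership rules, assuming the CI job keeps a short history of accepted measurements per route. The type names, the five-build window, the 3% tolerance, and the `@editor-team` handle are all hypothetical.

```typescript
type RouteBudget = { route: string; owner: string; tolerance: number };
type Verdict = { ok: true } | { ok: false; notify: string; overBy: number };

// Rolling baseline: mean of the last few accepted measurements.
function rollingBaseline(history: number[], window = 5): number {
  const recent = history.slice(-window);
  if (recent.length === 0) throw new Error('no baseline measurements');
  return recent.reduce((sum, v) => sum + v, 0) / recent.length;
}

// Fail only when the measurement exceeds baseline plus tolerance,
// and route the failure to the budget owner, not the commit author.
function checkAgainstBaseline(measured: number, history: number[], budget: RouteBudget): Verdict {
  const limit = rollingBaseline(history) * (1 + budget.tolerance);
  if (measured <= limit) return { ok: true };
  return { ok: false, notify: budget.owner, overBy: measured - limit };
}

const appBudget: RouteBudget = { route: '/app', owner: '@editor-team', tolerance: 0.03 };
const history = [168, 170, 169, 171, 170]; // gzip kB on the last five accepted builds

const within = checkAgainstBaseline(173, history, appBudget); // noise, passes
const over = checkAgainstBaseline(181, history, appBudget);   // real growth, fails
```

Feed the history only from builds that were accepted on the main branch; otherwise the baseline drifts upward with every regression that slips through.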

When a budget is genuinely exceeded, the failure message should say what grew, by how much, and who owns it — and offer a one-line, logged, expiring waiver. A waiver that needs a human’s name and disappears in two weeks is honest; a permanent silent skip is how budgets rot.
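The failure message and the expiring waiver might look like the sketch below. The `Overage` and `Waiver` shapes, the message wording, and the example names are hypothetical; the point is only that a waiver carries a person's name and an expiry date, and that the check ignores it once that date has passed.

```typescript
type Overage = { route: string; metric: string; unit: string; budget: number; actual: number; owner: string };
type Waiver = { route: string; metric: string; grantedBy: string; expiresAt: string }; // ISO timestamp

// Say what grew, by how much, and who owns it.
function describeOverage(o: Overage): string {
  const overBy = o.actual - o.budget;
  return `${o.route}: ${o.metric} is ${o.actual}${o.unit}, ${overBy}${o.unit} over the ${o.budget}${o.unit} budget. Owner: ${o.owner}.`;
}

// A waiver counts only for its own route and metric, and only until it expires.
function activeWaiver(o: Overage, waivers: Waiver[], now: Date): Waiver | undefined {
  return waivers.find(
    (w) => w.route === o.route && w.metric === o.metric && Date.parse(w.expiresAt) > now.getTime(),
  );
}

const overage: Overage = {
  route: '/app/doc', metric: 'JS gzip', unit: ' kB', budget: 240, actual: 252, owner: '@doc-owner',
};
const waivers: Waiver[] = [
  { route: '/app/doc', metric: 'JS gzip', grantedBy: '@doc-owner', expiresAt: '2025-03-15T00:00:00Z' },
];

const message = describeOverage(overage);
const beforeExpiry = activeWaiver(overage, waivers, new Date('2025-03-10T00:00:00Z'));
const afterExpiry = activeWaiver(overage, waivers, new Date('2025-03-20T00:00:00Z'));
```

Keeping waivers as data in the repository, rather than as a skip comment in code, means each one shows up in review and in history.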

What I check before I trust a budget

Before I call a performance budget real rather than decorative, I run down this list:

the gate runs in CI and can actually block a merge, not just print a warning;
thresholds come from field p75, not a lab run on someone’s M-series laptop;
there is a per-route budget and a documented tolerance against a rolling baseline;
INP has its own line, and long-task count is tracked, not just the aggregate score;
every budget names an owner, and waivers are explicit, logged, and expiring;
the numbers ratchet down over time instead of drifting up to whatever shipped last.

On the local-first editor I work on, the budget that stuck was not the strictest one — it was the one with honest per-route numbers and a two-week waiver. It caught a hydration regression that had quietly added tens of milliseconds to INP, and nobody had to be the villain to get it fixed.