Cutting the initial bundle by 25%, without lazy-loading everything
A measured approach to bundle auditing — webpack-bundle-analyzer and source-map-explorer, where the bytes actually hide, route- vs component-level splitting, and the over-splitting trap.
The initial JavaScript bundle on the Mail.ru Cloud file manager had grown the way every bundle grows: one reasonable import at a time, none of them obviously wrong. By the time I profiled it, the app shell shipped 412 KB of compressed JS — brotli, the bytes a real user actually downloads — before a single file appeared on screen. I got it to 306 KB, a hair over 25% off. Almost none of that came from React.lazy. It came from three measured changes I’ll break down at the end, and from one rule I held to the whole way through: don’t touch anything you haven’t profiled first.
Measure before you cut
Profiling comes first. Every bundle-size story that opens with “I added React.lazy everywhere” ends with a slower app and a developer who can’t explain why. Before changing a line, get a picture of where the bytes are.
Two tools, two questions. webpack-bundle-analyzer tells you what is in each chunk and how big — it reads the stats output and draws the treemap where a 90 KB date library is impossible to miss. source-map-explorer answers a sharper question: what actually shipped. It works off the production source maps, after minification and tree-shaking, so it counts the bytes that survived rather than the ones you imported.
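# treemap from the real production build, not a dev bundle
npx webpack --config webpack.prod.js --profile --json > stats.json
# pass the output directory (assumed to be dist here) so parsed and gzip sizes come from the emitted files
npx webpack-bundle-analyzer stats.json dist -m static -r report.html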
# what actually shipped, per module, from the emitted source maps
npx source-map-explorer dist/assets/*.js
Out of the box, both tools report uncompressed and gzip sizes (source-map-explorer only with --gzip), but your CDN almost certainly serves brotli, which is meaningfully smaller. So treat the analyzer’s number as a ranking signal for which modules are fat, and take the real baseline from the Network panel on a production build. On the file manager that baseline was 412 KB brotli, and four items owned most of it: a PDF rendering stack, an in-browser image editor, a date library carrying every locale, and a polyfill that showed up twice.
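If you want a number closer to the wire size than the analyzer’s gzip column, one option is to compress the emitted files yourself. The sketch below assumes Node.js and its built-in zlib module. The function names are illustrative, and the quality level is an assumption: your CDN may compress at a different setting, so read the result as an approximation and keep the Network panel as the source of truth.

```typescript
import { brotliCompressSync, gzipSync, constants } from "node:zlib";

// Compressed size in bytes at maximum brotli quality.
// Assumption: the CDN compresses static assets at or near this level.
export function brotliSize(source: Buffer | string): number {
  const input = typeof source === "string" ? Buffer.from(source) : source;
  return brotliCompressSync(input, {
    params: { [constants.BROTLI_PARAM_QUALITY]: constants.BROTLI_MAX_QUALITY },
  }).length;
}

// gzip size, for comparison with what the analyzers report.
export function gzipSize(source: Buffer | string): number {
  return gzipSync(source, { level: 9 }).length;
}
```

Summing brotliSize(readFileSync(path)) over the entry chunks of a production build gives a baseline you can write down before the first change.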
Where the bytes actually hide
The treemap almost always tells three kinds of story, and the file manager had one of each.
Duplicated dependencies. Two copies of core-js rode in through different transitive ranges — one injected by the Babel preset, another pinned by a UI-kit dependency that had never bumped its range. You ship the polyfills twice and run them once. Forcing a single version with an overrides entry needed no code change and gave back 28 KB.
One heavy library doing a small job. That date library carried every locale it ships with for exactly one feature: the “modified 2 minutes ago” labels in file lists. Swapping it for date-fns plus the platform Intl.RelativeTimeFormat covered the same feature and took 47 KB off the entry. The biggest single wins tend to live here, and they’re usually a replacement or a narrower import — not a split.
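To show how little code the “modified 2 minutes ago” label needs once the platform does the wording, here is a minimal sketch built on Intl.RelativeTimeFormat alone. It is not the file manager’s implementation: the function name, the unit table, and the rounding rule are assumptions, and the real swap also used date-fns.

```typescript
// Largest unit first; the loop picks the first one the difference fills.
const UNITS: Array<[Intl.RelativeTimeFormatUnit, number]> = [
  ["day", 86400],
  ["hour", 3600],
  ["minute", 60],
  ["second", 1],
];

export function modifiedLabel(modified: Date, now: Date, locale = "en"): string {
  const rtf = new Intl.RelativeTimeFormat(locale, { numeric: "always" });
  // Negative when `modified` is in the past, which is what produces "... ago".
  const diffSeconds = Math.round((modified.getTime() - now.getTime()) / 1000);
  for (const [unit, secondsPerUnit] of UNITS) {
    if (Math.abs(diffSeconds) >= secondsPerUnit || unit === "second") {
      return rtf.format(Math.trunc(diffSeconds / secondsPerUnit), unit);
    }
  }
  return rtf.format(0, "second"); // unreachable; keeps the return type total
}
```

The locale data comes from the runtime, so none of it ships in the bundle.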
Modules that load eagerly but run rarely. The PDF viewer and the image editor were both top-level imports in the shell, yet well under a tenth of sessions ever opened either one. They had no business sitting in the bytes that block first paint.
- Deduplicate first: npm ls <pkg> finds the conflicting ranges, and an override or resolution collapses them to one copy for free.
- Replace the heavy-for-the-job libraries before you split anything; that’s where the largest and cheapest wins are.
- Then split, and split the rarely-run modules, not whatever happens to sit at the top of the treemap.
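The deduplication step can be partly automated. The sketch below assumes you have a list of module paths, for example collected from the stats output, and flags any package installed in more than one place. The function name and the ui-kit path in the usage note are hypothetical. It only sees paths, so it cannot tell you which versions conflict; npm ls <pkg> is still the tool for that.

```typescript
// Groups module paths by package and reports packages that are installed
// in more than one node_modules directory.
export function findDuplicatePackages(modulePaths: string[]): Map<string, string[]> {
  const installs = new Map<string, Set<string>>();
  // Captures "name" or "@scope/name" after each node_modules segment.
  const pattern = /node_modules[\\/]((?:@[^\\/]+[\\/])?[^\\/]+)/g;

  for (const path of modulePaths) {
    let match: RegExpExecArray | null;
    let last: RegExpExecArray | null = null;
    pattern.lastIndex = 0;
    // The innermost node_modules segment is the package that owns the file.
    while ((match = pattern.exec(path)) !== null) last = match;
    if (last === null) continue; // application code, not a dependency
    const name = last[1].replace(/\\/g, "/");
    const installDir = path.slice(0, last.index + last[0].length).replace(/\\/g, "/");
    if (!installs.has(name)) installs.set(name, new Set<string>());
    installs.get(name)!.add(installDir);
  }

  const duplicates = new Map<string, string[]>();
  installs.forEach((dirs, name) => {
    if (dirs.size > 1) duplicates.set(name, Array.from(dirs).sort());
  });
  return duplicates;
}
```

Fed ./node_modules/core-js/... and ./node_modules/ui-kit/node_modules/core-js/..., it reports core-js with both install directories, which is the shape of the duplicate described above.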
Split by route first, by component second
Code splitting has two natural seams, and they are not equal.
Route-level splitting is the high-leverage one. Each top-level route becomes its own chunk, loaded when the user navigates to it, so the entry bundle carries only the shell plus the landing route. With a router this is mechanical:
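// route-level: the settings area and its deps never load until you go there
import { lazy, Suspense } from 'react';
import { Routes, Route } from 'react-router-dom'; // React Router v6 API
// RouteSkeleton is your own placeholder component; import it from wherever it lives
const Settings = lazy(() => import('./routes/Settings'));
const PdfViewer = lazy(() => import('./routes/PdfViewer'));
export function AppRoutes() {
  return (
    <Suspense fallback={<RouteSkeleton />}>
      <Routes>
        <Route path="/settings/*" element={<Settings />} />
        <Route path="/file/:id/pdf" element={<PdfViewer />} />
      </Routes>
    </Suspense>
  );
}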
Component-level splitting is the scalpel — for a heavy thing inside an otherwise light route: a modal, a rich editor, a chart below the fold. The rule I hold to is that a component earns a split only when it is both heavy and off the critical path. A 120 KB editor that opens behind a button, yes. A 4 KB dropdown, never — the chunk overhead and the loading flicker cost more than you save.
React.lazy plus Suspense covers the React tree; for non-component code — a parser, a formatter, a worker payload — a bare dynamic import() does the same job and resolves to the module:
// defer the heavy validator until the user actually submits
async function validateUpload(file: File) {
  const { scan } = await import('./heavy/scan');
  return scan(file);
}
Every lazy boundary needs a real fallback and an error boundary around it. A chunk can fail to load on a flaky connection, and a split that white-screens on a network blip is worse than the bytes it saved.
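One way to soften the flaky-connection case is to retry the import a couple of times before the error boundary takes over. The helper below is a sketch, not part of React or webpack: its name, retry count, and backoff are assumptions. It also assumes a loader that allows a second attempt, as webpack’s chunk loader does. A native ES module import() may remember a failed fetch, so verify the behavior in your setup.

```typescript
// Retry a dynamic import a few times before giving up, so one dropped
// request doesn't immediately trip the error boundary.
export async function importWithRetry<T>(
  load: () => Promise<T>,
  retries = 2,
  delayMs = 300,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await load();
    } catch (error) {
      lastError = error;
      if (attempt < retries) {
        // Linear backoff: delayMs, 2 * delayMs, ...
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs * (attempt + 1)));
      }
    }
  }
  throw lastError; // still failing: let the error boundary render its fallback
}
```

It wraps the loader you already pass to lazy, for example lazy(() => importWithRetry(() => import('./routes/PdfViewer'))). The error boundary is still required, because the last attempt can fail too.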
Tune splitChunks so caches survive
Import-level splitting decides what loads when. splitChunks decides how those bytes group into files, and good grouping is mostly about cache lifetime. Application code changes every deploy; third-party code changes monthly. Mix them in one file and every deploy busts the vendor cache for no reason.
// webpack.prod.js: separate vendor from app, give the big stable libs their own file
optimization: {
  runtimeChunk: 'single', // keep the webpack runtime out of every chunk
  splitChunks: {
    chunks: 'all',
    cacheGroups: {
      // big, independently-versioned libs get their own long-lived file
      react: { test: /[\\/]node_modules[\\/](react|react-dom|scheduler)[\\/]/, name: 'react', priority: 20 },
      // the rest of node_modules that the entry needs; 'initial' keeps lazy-only
      // libraries (the PDF stack, the image editor) in their own async chunks
      // instead of merging them into a file the shell has to download
      vendor: { test: /[\\/]node_modules[\\/]/, name: 'vendor', chunks: 'initial', priority: 10 },
      // code imported by 2+ chunks, hoisted so it loads once
      shared: { minChunks: 2, name: 'shared', priority: 5, reuseExistingChunk: true },
    },
  },
},
The react group is deliberate: it almost never changes, so it gets its own file with a stable hash that survives most deploys. The shared group catches utilities two lazy routes both pull in, so they download once instead of riding along in both chunks. Content-hashed filenames make the whole thing safe to cache for a year.
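A quick way to sanity-check the patterns before running a build is to test them against a few module paths. The sketch below models only the test and priority fields of the two node_modules groups: the highest-priority match wins. It ignores everything else splitChunks considers (minChunks, minSize, chunk type), and the function name and the "app" fallback label are assumptions.

```typescript
type Group = { name: string; test: RegExp; priority: number };

// Same patterns as the react and vendor cache groups in the config.
const GROUPS: Group[] = [
  { name: "react", test: /[\\/]node_modules[\\/](react|react-dom|scheduler)[\\/]/, priority: 20 },
  { name: "vendor", test: /[\\/]node_modules[\\/]/, priority: 10 },
];

// Returns the group a module path would land in, or "app" when nothing matches.
export function groupFor(modulePath: string): string {
  const matches = GROUPS.filter((group) => group.test.test(modulePath));
  if (matches.length === 0) return "app";
  return matches.reduce((best, group) => (group.priority > best.priority ? group : best)).name;
}
```

The trailing separator in the react pattern matters: react-router does not match it, so it lands in vendor and a router upgrade does not invalidate the react file.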
The over-splitting trap
Here is the failure mode nobody warns you about until you cause it. Split too aggressively and opening one route fires a waterfall of dependent requests: the route chunk loads, the browser parses it and only then discovers it imports a shared chunk, parses that, and only then reaches the vendor chunk. Each step is a round trip that can’t begin until the previous one has finished.
It’s tempting to blame connection overhead, but that diagnosis is a decade out of date. On HTTP/2 and HTTP/3 the browser reuses a single multiplexed connection, so setup is not the cost. The cost is discovery order: a chunk deep in the import graph can’t be requested until its parent has downloaded and been parsed, so a deep chunk tree becomes a deep, serial waterfall no matter how many requests the connection carries in parallel. You’ve traded one 200 KB download for six chained 30 KB ones, and on a mid-range phone the route is slower than before you “optimized” it.
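The discovery cost is easy to model. In the sketch below, a chunk’s imports become visible only after the chunk itself has arrived, so the number of sequential round trips equals the longest chain in the chunk graph. The graph shape and function name are assumptions, and the model ignores bandwidth and parse time.

```typescript
// chunk name -> chunks it imports (discovered only once the chunk is parsed)
type ChunkGraph = Record<string, string[]>;

// Sequential round trips needed to load `entry` with no preloading:
// one for the chunk itself plus the deepest chain below it.
export function roundTrips(
  graph: ChunkGraph,
  entry: string,
  seen: Set<string> = new Set<string>(),
): number {
  if (seen.has(entry)) return 0; // guard against cycles
  const visited = new Set<string>(seen).add(entry);
  const deps = graph[entry] ?? [];
  const deepest = deps.reduce((max, dep) => Math.max(max, roundTrips(graph, dep, visited)), 0);
  return 1 + deepest;
}
```

A chain route → shared → vendor costs three round trips. The same three chunks imported directly by the route cost two, and requesting all of them up front costs one.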
- Watch the request waterfall in DevTools, not just the chunk sizes — a long diagonal staircase is chained discovery, the thing to kill.
- Keep the graph shallow: a few meaningful chunks beat a deep tree of tiny interdependent ones.
- Flatten the chain with preload. A preload hint lets the browser fetch a route’s whole chunk set in parallel instead of discovering it one level at a time: <link rel="modulepreload"> when the build emits native ES modules, <link rel="preload" as="script"> for the classic script chunks webpack emits by default. The catch is the content-hashed filename: you can’t hand-write vendor.a1b2c3.js, because the hash only exists after the build. In practice you read the chunk-to-file mapping from the manifest the bundler emits and inject the tags per route, or let the router fire the dynamic import() on hover or route intent so the fetch is already in flight by the time the user commits.
- Set a floor with splitChunks.minSize so trivial modules never earn their own file.
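The manifest-driven preload can be sketched as a small function that turns a route’s file list into link tags. The manifest shape here (route id mapped to hashed file names) is hypothetical. Real bundler manifests differ, so you would adapt the lookup. The kind parameter is there because the right hint depends on whether the chunks are native ES modules or classic scripts, and the function assumes a trusted manifest, since it does no HTML escaping.

```typescript
// Hypothetical manifest shape: route id -> hashed files that route needs.
type RouteManifest = Record<string, string[]>;
type PreloadKind = "modulepreload" | "preload";

// Builds one <link> tag per JavaScript file of the given route.
export function preloadTags(
  manifest: RouteManifest,
  route: string,
  kind: PreloadKind,
  publicPath = "/",
): string[] {
  const files = manifest[route] ?? [];
  return files
    .filter((file) => file.endsWith(".js"))
    .map((file) => {
      const href = publicPath + file;
      return kind === "modulepreload"
        ? `<link rel="modulepreload" href="${href}">`
        : `<link rel="preload" as="script" href="${href}">`;
    });
}
```

Injecting these tags into the HTML served for a route lets the browser request the whole chunk set at once, while the hashed names still come from the build rather than from hand-written markup.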
Did it actually get faster?
Bundle size is a proxy. The thing you are actually moving is how fast the page becomes usable, and that lives in Core Web Vitals and the loading milestones around them.
- FCP (First Contentful Paint) moves first when you shrink the entry chunk — less JS to download and parse before the first render.
- LCP (Largest Contentful Paint) follows when the blocking script that delayed the main content is deferred out of the critical path.
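- TTI / TBT (Time to Interactive and Total Blocking Time) are where deferred JS pays off most: less script on the main thread at startup means the page stops being a pretty, frozen screenshot sooner. Lighthouse dropped TTI from its performance score in version 10, so in current lab runs TBT is the one to track.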
Measure them the way you measured the bytes: a production build, a throttled mid-range device profile, three runs, take the median. Lab numbers from Lighthouse for the before/after delta; field data from CrUX or RUM to confirm it held for real users. A 25% smaller entry bundle that doesn’t move FCP on a real device didn’t help anyone.
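The “three runs, take the median” step is simple enough to write down. This is a minimal sketch with assumed function names. The inputs would be one metric, say FCP in milliseconds, from repeated lab runs before and after a change.

```typescript
// Median of a list of measurements; the input array is left untouched.
export function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of an empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Relative change of the median; negative when the metric got smaller.
export function relativeChange(before: number[], after: number[]): number {
  const baseline = median(before);
  return (median(after) - baseline) / baseline;
}
```

The median matters because a single run with a cold cache or a background task can swing a lab metric far more than the change you are trying to measure.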
What I check before I call a bundle “audited”
Before I trust a bundle result, I run down this list:
- there is a written baseline — real wire size and the top contributors — from a production build, taken before any change;
- duplicates are gone and heavy-for-their-job libraries are replaced or narrowed, before a single lazy was added;
- splitting is route-first, component-second, and every component split is both heavy and off the critical path;
- splitChunks separates app code from vendor code, so a deploy does not needlessly bust the third-party cache;
- the request waterfall on a throttled connection is shallow, with no diagonal staircase of dependent chunks;
- FCP, LCP and TBT moved the right way on a mid-range device profile, not just the number in the analyzer.
So, the numbers. 412 KB brotli down to 306 on the Mail.ru Cloud entry — about a quarter. The deduplication gave back 28 KB, the date-library swap another 47, and moving the PDF viewer and the image editor into route chunks took 31 off the path to first paint. No single lazy() carried this — it was three measured changes, each checked against the baseline before and after. And I didn’t trust any of them until FCP moved on a throttled mid-range phone, because a smaller number in the analyzer that a real user can’t feel is just a smaller number.