Virtualizing thousands of files, and why the grid is the hard part

How windowing actually works, why a grid is harder than a list, and the React render work that gets you the rest of the way to 60fps: memo, memoized selectors, and knowing when it’s cargo cult.

A folder with twelve thousand files isn’t an edge case in a cloud drive — it’s a normal Tuesday for anyone who’s ever dumped a camera roll into one place. On the Mail.ru Cloud file manager those folders were where the UI fell apart: the first render of a big folder held the main thread for the better part of a second, scrolling sat somewhere in the low twenties of frames per second, and the tab’s memory climbed with every folder you opened and never quite came back down. The cause was the most boring thing imaginable: we rendered every file. Twelve thousand rows, or twelve thousand grid cards, all in the DOM at once — whether or not you’d ever scroll to them.

The fix is virtualization, and the principle fits in one sentence: render only what’s on screen, plus a small buffer. The difficulty isn’t the principle; it’s the details. A list is easy to virtualize; a grid is markedly harder; and even a working window doesn’t reach 60fps on its own — the React render work around it covers the rest of the distance. I’ll go through both parts. The discipline here is the same as everywhere in performance work: don’t optimize anything you haven’t profiled first.

How windowing actually works

The bottleneck is the DOM. The size of your data barely matters: a large array sits in memory just fine. What the browser struggles with is something else — twelve thousand live DOM nodes, because each one carries layout boxes, style resolution, paint, and a slot in the tree that React reconciles on every update. What counts is the node count, not the row count.

Windowing breaks that link. You keep a single scroll container with one tall inner element: a spacer whose height equals the full content height, so the scrollbar behaves as if everything were rendered. Inside it you render only the rows whose vertical range intersects the viewport, absolutely positioned at their computed offset, and you recompute that slice as scrollTop changes. Set the offset with transform: translateY() rather than with top; unlike a change to top, a transform can be handled by the compositor without touching layout or paint. The result is thirty nodes on screen instead of twelve thousand, whatever the folder size. Memory goes flat, the first render is bounded by the viewport, and reconciliation only ever touches the handful of rows in the window.

content-visibility: auto suggests itself here, so it’s worth saying upfront why it isn’t the tool for this. The property skips layout and paint for off-screen content, but it leaves the nodes in the DOM. What hurts us is precisely their count and the memory behind them — so content-visibility doesn’t close this gap: it’s about the render cost of off-screen content, not the size of the tree.

Two details separate a smooth windower from a flickery one. The first is overscan: a few extra rows past each edge of the viewport, so a fast flick doesn’t outrun the renderer and show a band of blank space. Set it too low and a quick scroll flashes blank; set it too high and you render rows nobody reaches. The right number is small, and you won’t guess it on paper — you find it by scrolling.

The second detail is keys, and this one bites quietly. Key the rows by a stable file id, not by their index. On a pure scroll the difference barely shows: the window slides under either scheme, and React reuses most of the nodes, swapping only the rows at the edges. It all goes wrong the moment the list reorders. Sort by date, flip a filter — and with index keys React pairs the new items to the existing DOM by position, reusing each node for whatever file now lands at that index. The node stays; the file under it changes. Whatever the row was holding locally — a selection highlight, a half-typed rename, the height you just measured for it — is now stuck to the wrong file. Stable ids make React carry the node along with its file instead of leaving it in place.

A list is easy. A grid is not.

A fixed-height list is the easy case, and it’s worth seeing why before the hard case makes sense. Every row is the same height, so an item’s offset is just index * rowHeight, the total height is count * rowHeight, and the visible slice is two divisions: the first index is floor(scrollTop / rowHeight), and the end of the slice (exclusive) is ceil((scrollTop + viewportHeight) / rowHeight). Nothing to measure, no per-item state to keep.
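
As a minimal sketch of that arithmetic, assuming zero-based indices and an exclusive end index (the function and field names here are illustrative, not taken from any library):

```typescript
// Visible slice for a fixed-height list. All names are hypothetical.
interface Slice {
  first: number; // index of the first row to render (inclusive)
  last: number;  // index one past the last row to render (exclusive)
}

function fixedHeightSlice(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  count: number,
  overscan = 0,
): Slice {
  // Row that contains the top edge of the viewport.
  const firstVisible = Math.floor(scrollTop / rowHeight);
  // One past the row that contains the bottom edge.
  const endVisible = Math.ceil((scrollTop + viewportHeight) / rowHeight);
  return {
    first: Math.max(0, firstVisible - overscan),
    last: Math.min(count, endVisible + overscan),
  };
}

// 48px rows, a 600px viewport scrolled to 1000px, 4 rows of overscan.
const slice = fixedHeightSlice(1000, 600, 48, 12000, 4);
// Each rendered row sits at index * rowHeight inside a spacer of count * rowHeight.
const firstOffset = slice.first * 48;
```

The clamping to 0 and count is what keeps overscan from asking for rows that don’t exist at either end of the list.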

Three things break that, and a real grid breaks all three at once.

Variable heights. The moment a row can be tall or short — a file name that wraps to two lines, a preview thumbnail that may or may not be there — you no longer know an item’s offset without knowing the height of every item above it. You estimate, render, measure what actually landed, and store it. Offsets become a running sum of known-or-estimated heights, and the total height stays your best guess until everything has been measured at least once.

Measurement. You only learn a row’s real height after it renders, so you measure it in a layout effect — or with a ResizeObserver, since a thumbnail loading later changes the height — and write it into a cache. Cache by file id, not by index: index-keyed measurements go stale the instant the list sorts or filters, and you end up positioning rows with another row’s height. When a measured height differs from the estimate you used, every offset below it shifts; and if that row was above the viewport, the content under the user’s cursor jumps unless you correct scrollTop by the delta in the same frame. Getting that scroll anchoring right is most of what makes variable-height windowing usable. Get it wrong and the list twitches every time something above the fold finishes measuring.
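
The anchoring correction itself is small. A sketch, assuming the policy that only rows lying entirely above the viewport’s top edge trigger a correction (the helper name and that policy are my assumptions; libraries differ on how they treat a row straddling the edge):

```typescript
// Hypothetical helper: the scrollTop to apply after a row's height is corrected.
function anchorScrollTop(
  scrollTop: number,
  rowOffset: number, // top edge of the re-measured row, before the correction
  oldHeight: number, // height used so far (estimate or earlier measurement)
  newHeight: number, // height just measured
): number {
  const delta = newHeight - oldHeight;
  // Only a row that ends at or above the viewport's top edge shifts visible content.
  const rowIsAbove = rowOffset + oldHeight <= scrollTop;
  return rowIsAbove ? scrollTop + delta : scrollTop;
}
```

Applying the returned value in the same frame as the new offsets is what keeps the content under the cursor still.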

Two dimensions. A grid adds columns on top of all of that. With fixed cells it’s still arithmetic — items per row is floor(containerWidth / cellWidth), a card’s row is floor(index / perRow) — but perRow changes on every resize, so the whole layout is a function of width and has to recompute when the container does. Let the cells vary in height and the columns stop sharing a baseline: each one fills independently, their bottoms drift apart, and “which row is at this scroll position” is no longer a single division. You track an offset per column and ask which cells across all columns intersect the viewport. That’s the masonry problem, and it’s where hand-rolled virtualization usually starts leaking edge cases.
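
Both cases can be sketched in a few lines. The fixed-cell half is the arithmetic from the paragraph above; the masonry half assumes each card goes into the currently shortest column, which is one common placement rule and not necessarily the one a given library uses. All names are illustrative.

```typescript
interface GridSlice { perRow: number; firstIndex: number; endIndex: number }

// Fixed cells: the whole layout is a function of container width.
function fixedGridSlice(
  containerWidth: number, cellWidth: number, cellHeight: number,
  scrollTop: number, viewportHeight: number, count: number,
): GridSlice {
  const perRow = Math.max(1, Math.floor(containerWidth / cellWidth));
  const rowCount = Math.ceil(count / perRow);
  const firstRow = Math.floor(scrollTop / cellHeight);
  const endRow = Math.min(rowCount, Math.ceil((scrollTop + viewportHeight) / cellHeight));
  return {
    perRow,
    firstIndex: Math.min(count, firstRow * perRow),
    endIndex: Math.min(count, endRow * perRow), // exclusive
  };
}

interface Placed { index: number; column: number; top: number; height: number }

// Variable heights: one running offset per column.
function masonryLayout(heights: number[], columns: number): Placed[] {
  const columnBottoms = new Array<number>(columns).fill(0);
  return heights.map((height, index) => {
    // Assumption: place the card in the shortest column so far.
    let column = 0;
    for (let c = 1; c < columns; c++) {
      if (columnBottoms[c] < columnBottoms[column]) column = c;
    }
    const top = columnBottoms[column];
    columnBottoms[column] = top + height;
    return { index, column, top, height };
  });
}

// Cells whose vertical range intersects the viewport, across all columns.
function visibleCells(placed: Placed[], scrollTop: number, viewportHeight: number): Placed[] {
  const bottom = scrollTop + viewportHeight;
  return placed.filter((cell) => cell.top < bottom && cell.top + cell.height > scrollTop);
}
```

The linear filter in visibleCells is only there to show the intersection test; on a large folder you would search each column’s sorted offsets instead of scanning every cell per scroll.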

// offsets derived from measured-or-estimated heights, keyed by file id
const measured = new Map<string, number>();        // id -> real height once seen

function rowOffsets(ids: string[], estimate: number) {
  const offsets = new Array<number>(ids.length);     // typed, so offsets is number[] rather than any[]
  let running = 0;
  for (let i = 0; i < ids.length; i++) {
    offsets[i] = running;
    running += measured.get(ids[i]) ?? estimate;     // fall back to the estimate
  }
  return { offsets, totalHeight: running };
}

That O(n) pass is fine as long as you run it when a measurement lands or the list changes — not on every scroll event. The offsets are derived state: memoize them, read them on scroll, and recompute only when an input actually moves. Run the loop per frame and you’ve rebuilt the per-frame cost the whole exercise was meant to delete.

None of this is unprecedented: react-window, react-virtualized, and TanStack Virtual have solved it — down to the parts that never make it into a blog example: sub-pixel rounding, momentum scroll on iOS, and a dozen smaller things you meet only by shipping them. I’ve hand-rolled a windower exactly once, to understand it. In production I reach for a maintained library and spend the effort on the measurement cache and the cell components, where the product-specific pain actually lives.

The render work around it

Windowing gets the node count down. On its own it doesn’t get you to 60fps: every scroll recomputes the slice and re-renders the windowed container, and if that re-render drags its rows along with it, the cost has simply moved from “twelve thousand nodes once” to “thirty nodes sixty times a second.” From here the React render work takes over — here’s what it’s made of.

Memo the rows. Wrap the row or cell in React.memo so that when the window shifts, only the rows that entered or left actually reconcile — the ones that stayed put bail out at the memo check. This is the single highest-leverage optimization in a virtualized list, and it’s also the one most often defeated by accident: memo compares props by reference, so the moment you hand a row an inline onClick={() => select(id)} or a fresh style={{}} object, it re-renders on every scroll frame regardless. The memo and the referential stability of its props are really one optimization in two parts.

Memoize the selectors. The windowed list reads a derived slice of state: the current folder’s files, sorted and filtered. If that selector returns a new array every render, the list re-renders every render, and your memoized rows are diffing against fresh references for nothing. A memoized selector (reselect, RTK’s createSelector) over a normalized store returns the same reference while its inputs haven’t changed, which is what lets the whole chain below it stay still. It’s an easy step to skip, and skipping it is why a perfectly good React.memo downstream never gets the chance to bail out: the list above it re-renders on every fresh array, and anything derived from that array arrives as a new reference no matter what you did to the rows.
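
To make the reference-stability point concrete without pulling in a dependency, here is a hand-rolled single-entry memo standing in for what createSelector does. The state shape, memoizeByInputs, and selectSortedFiles are all assumptions for the example; in a real app you would use the library.

```typescript
interface FileEntry { id: string; name: string; modified: number }
interface DriveState { files: FileEntry[]; sortBy: 'name' | 'modified' }

// Recomputes only when one of the two inputs changes by reference.
function memoizeByInputs<A, B, R>(
  inputA: (s: DriveState) => A,
  inputB: (s: DriveState) => B,
  combine: (a: A, b: B) => R,
): (s: DriveState) => R {
  let lastA: A | undefined;
  let lastB: B | undefined;
  let lastResult: R | undefined;
  let hasRun = false;
  return (state: DriveState): R => {
    const a = inputA(state);
    const b = inputB(state);
    if (hasRun && Object.is(a, lastA) && Object.is(b, lastB)) {
      return lastResult as R; // same inputs: hand back the same reference
    }
    lastA = a;
    lastB = b;
    hasRun = true;
    lastResult = combine(a, b);
    return lastResult;
  };
}

const selectSortedFiles = memoizeByInputs(
  (s: DriveState) => s.files,
  (s: DriveState) => s.sortBy,
  (files: FileEntry[], sortBy: 'name' | 'modified') =>
    [...files].sort((x, y) =>
      sortBy === 'name' ? x.name.localeCompare(y.name) : x.modified - y.modified,
    ),
);
```

Two calls with the same files array and the same sortBy return the identical array, so a component reading it sees no change; an unmemoized version would sort and allocate on every call.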

Let updates batch. A scroll or a multi-select can trigger several state updates in a row, and you want them to commit once, not once each. React 18 auto-batches updates in event handlers, timeouts, and promises, so most of this is free now. But the moment your scroll position lives in an external store or a raw subscription, you’re back to making sure a burst of updates collapses into a single commit instead of a render per event.
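
For the external-store case, one way to collapse a burst is to keep only the latest value and commit it once per scheduled tick. A sketch, with the scheduler injected so it can be tested outside a browser; coalesce and Schedule are hypothetical names, and in a browser the scheduler would typically wrap requestAnimationFrame.

```typescript
type Schedule = (run: () => void) => void;

// Collapses a burst of values into one commit per scheduled tick, keeping the latest.
function coalesce<T>(commit: (value: T) => void, schedule: Schedule): (value: T) => void {
  let pending = false;
  let latest: T | undefined;
  return (value: T): void => {
    latest = value;
    if (pending) return; // a commit is already queued for this tick
    pending = true;
    schedule(() => {
      pending = false;
      commit(latest as T);
    });
  };
}

// In a browser, assuming a setScrollTop state setter:
//   const pushScrollTop = coalesce(setScrollTop, (run) => requestAnimationFrame(() => run()));
```

However many scroll events arrive between two frames, the store’s subscribers see one update carrying the final position.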

And the senior half of all three is knowing when it’s cargo cult. useMemo and useCallback aren’t free: they allocate, they hold a dependency array, and they run a comparison on every render. Wrapping a leaf that re-renders twice a session in memo, or memoizing a value that never crosses a memo boundary, costs more than it saves and buys you nothing but noise in the diff. The rule I hold to: memoization earns its place on the hot path — the row that renders sixty times a second — and almost nowhere else. Everywhere else, prove it with the Profiler before you reach for it.

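import { memo, useCallback } from 'react';

// assumed prop shape for this example; the real file record carries more fields
type FileRowProps = { file: { id: string; name: string }; onSelect: (id: string) => void };

// stable identity: the row only re-renders when its own data changes
const FileRow = memo(function FileRow({ file, onSelect }: FileRowProps) {
  return (
    <div className="row" onClick={() => onSelect(file.id)}>
      {file.name}
    </div>
  );
});

// inside the parent component (hooks can't run at module scope):
// onSelect is stable across renders, so memo on FileRow actually holds
const onSelect = useCallback((id: string) => dispatch(select(id)), [dispatch]);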

The inline arrow inside FileRow is fine, by the way — it’s created during the row’s own render, which only happens when the row’s props change. What defeats memo is a fresh reference handed into a memoized component, like onSelect. What that component does with an inline handler on a plain <div> internally costs nothing.

It’s worth seeing the pieces in one place, because apart they look like more than they are. Offsets feed the window, the window renders memoized rows, and the scroll handler does nothing but move a number:

// upperBound is a standard binary search; useDispatch is from react-redux;
// .viewport carries height + overflow-y: auto in CSS.
function VirtualFileList({ ids, files }: VirtualListProps) {
  const [scrollTop, setScrollTop] = useState(0);
  const dispatch = useDispatch();
  const viewportH = 600;                  // measured from the container in real code
  const OVERSCAN = 4;

  // derived per list change, not per scroll frame. NB: rowOffsets reads the mutable
  // `measured` cache — once you add the measurement effect below, a new height won't
  // change `ids`, so add a version counter to these deps or the memo goes stale.
  const { offsets, totalHeight } = useMemo(() => rowOffsets(ids, 48), [ids]);

  // first/last visible row, by binary search into the cumulative offsets
  const first = Math.max(0, upperBound(offsets, scrollTop) - 1 - OVERSCAN);
  const last = Math.min(ids.length, upperBound(offsets, scrollTop + viewportH) + OVERSCAN);

  const onSelect = useCallback((id: string) => dispatch(select(id)), [dispatch]);

  return (
    <div className="viewport" onScroll={(e) => setScrollTop(e.currentTarget.scrollTop)}>
      <div style={{ height: totalHeight, position: 'relative' }}>
        {ids.slice(first, last).map((id, i) => (
          <div key={id} style={{ position: 'absolute', top: 0, transform: `translateY(${offsets[first + i]}px)`, width: '100%' }}>
            <FileRow file={files[id]} onSelect={onSelect} />
          </div>
        ))}
      </div>
    </div>
  );
}
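
The upperBound helper that the component assumes could look like this. It is a sketch of the usual definition (first index whose value is strictly greater than the target), which is the contract the first/last arithmetic above relies on.

```typescript
// First index whose value is strictly greater than target; sorted.length if none.
// Expects `sorted` in ascending order, which cumulative offsets always are.
function upperBound(sorted: number[], target: number): number {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sorted[mid] <= target) {
      lo = mid + 1; // everything up to mid starts at or before target
    } else {
      hi = mid;
    }
  }
  return lo;
}
```

With offsets holding each row’s top edge, upperBound(offsets, scrollTop) - 1 is the row that contains scrollTop, and the lookup costs O(log n) per scroll event instead of a scan.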

What’s not in those thirty lines is the measurement effect that fills measured as rows render — a ResizeObserver per row, writing into the cache by id — and the scroll-anchoring correction for when a height above the viewport changes. Wire that effect in and you trip the trap flagged in the comment: it writes through the cache without touching ids, so the offsets memo has to key on a version that ticks per write, or it never sees the new heights. And this is still the single-column list. The masonry grid from the title, with its independent column offsets, is the part I said I take a library for: these pieces are fiddly enough on their own that I’d rather reach for one than ship the block above.
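
One way to get that version tick is to put it inside the cache, so a write and the bump can’t drift apart. A sketch, with MeasuredHeights as a hypothetical replacement for the bare Map:

```typescript
// Id-keyed height cache that counts real changes.
class MeasuredHeights {
  private heights = new Map<string, number>();
  private currentVersion = 0;

  // Ticks once per write that actually changed a height.
  get version(): number {
    return this.currentVersion;
  }

  get(id: string): number | undefined {
    return this.heights.get(id);
  }

  // Returns true when the stored height changed, false for a no-op write.
  set(id: string, height: number): boolean {
    if (this.heights.get(id) === height) return false;
    this.heights.set(id, height);
    this.currentVersion += 1;
    return true;
  }
}
```

In the component you would mirror version into React state when set returns true and add it to the useMemo dependency list next to ids. Skipping the bump on a no-op write matters, because a ResizeObserver can report the same height repeatedly and each bump is an O(n) offsets pass.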

What virtualization breaks in accessibility

The window keeps thirty rows of twelve thousand in the DOM, and that’s exactly why it breaks a few things a plain list gets for free.

A screen reader only sees what’s mounted. To assistive tech, a list of twelve thousand items looks like a list of thirty. The fix is explicit semantics: give the container the right role for the meaning (listbox, grid, table) and tell it the real size. For a listbox or list, each item takes an aria-setsize with the full count and an aria-posinset with its real, 1-based position; for a grid or table, the equivalents are aria-rowcount on the container and aria-rowindex on each row. Then the screen reader announces “3 of 12,000,” not “3 of 30.”
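
A sketch of the attributes as plain prop objects, so they can be spread onto whatever element the row renders. The helper names are mine; the one detail worth encoding is that ARIA positions are 1-based while array indices are not, and that grid and table roles use aria-rowindex under an aria-rowcount on the container.

```typescript
interface OptionA11yProps {
  role: 'option';
  'aria-setsize': number;
  'aria-posinset': number;
}

// For rows inside a role="listbox" container.
function optionA11yProps(index: number, total: number): OptionA11yProps {
  return {
    role: 'option',
    'aria-setsize': total,      // size of the full set, not of the window
    'aria-posinset': index + 1, // 1-based position in the full set
  };
}

interface GridRowA11yProps {
  role: 'row';
  'aria-rowindex': number;
}

// For rows inside a role="grid" container that carries aria-rowcount={totalRows}.
function gridRowA11yProps(rowIndex: number): GridRowA11yProps {
  return { role: 'row', 'aria-rowindex': rowIndex + 1 };
}
```

The index passed in is the item’s index in the full list (first + i in a windowed map), never its position inside the rendered slice.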

Focus lives on a node. When a focused row scrolls past the edge and unmounts, focus falls back to <body>, document.activeElement goes with it, and the user drops out of keyboard navigation. You have to manage this: a roving tabindex, restoring focus when the row mounts again, or keeping the active row in the window on purpose.

Keyboard navigation sits on top of that. Arrow-down on the last visible row has to move scrollTop first, so the target row enters the window and mounts, and only then move focus to it — it may not be in the DOM at all right now. On a plain list you never think about this. On a virtualized one it’s separate work, and without it you can’t get through the list from the keyboard.
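
The “move scrollTop first” step reduces to one pure function. A sketch, assuming the minimal-movement policy of aligning the row to whichever viewport edge it is past (the name and the policy are assumptions):

```typescript
// scrollTop that brings the row fully into view; unchanged if it already is.
function scrollTopToReveal(
  rowTop: number,
  rowHeight: number,
  scrollTop: number,
  viewportHeight: number,
): number {
  // Row starts above the viewport: align its top edge with the viewport top.
  if (rowTop < scrollTop) return rowTop;
  // Row ends below the viewport: align its bottom edge with the viewport bottom.
  const rowBottom = rowTop + rowHeight;
  if (rowBottom > scrollTop + viewportHeight) return rowBottom - viewportHeight;
  return scrollTop;
}
```

The arrow-key handler would set the container’s scrollTop to this value, wait for the target row to mount, and only then call focus() on it, for example from an effect keyed on the active id.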

Did the frames actually land?

The node count in the inspector isn’t the goal. The goal is scrolling that doesn’t stutter on a mid-range phone, and to know it does, you have to watch it with instruments.

React Profiler tells you what re-rendered and how long each commit took. Record a scroll, and the flamegraph says it immediately: if every visible row lights up on each frame, your memoization or your selector is leaking; if only the entering and leaving rows commit, the windowing is doing its job.
The Performance panel tells you whether you’re hitting frame budget. At 60fps you have ~16.6ms per frame for script, layout, and paint together. Record a scripted scroll: long tasks and forced reflows (reading layout in the same tick you wrote to it) are where the dropped frames come from, and the run of dropped frames in the frames track is the thing to kill.
The honest number is dropped frames under a fixed scroll, not an average FPS. The average hides the stutter; the 95th-percentile frame time is what the user feels. Throttle the CPU 4–6×, scroll a genuinely large folder at a steady velocity, and count the frames that blew the budget.
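
If you collect per-frame durations yourself, for instance as deltas between successive requestAnimationFrame timestamps during the scripted scroll, both numbers fall out of a few lines. A sketch using the nearest-rank percentile; frameStats is a hypothetical helper and the sampling method is an assumption.

```typescript
interface FrameStats {
  dropped: number; // frames that took longer than the budget
  p95: number;     // 95th-percentile frame time, nearest-rank method
}

function frameStats(frameTimesMs: number[], budgetMs = 1000 / 60): FrameStats {
  const n = frameTimesMs.length;
  if (n === 0) return { dropped: 0, p95: 0 };
  const sorted = [...frameTimesMs].sort((a, b) => a - b);
  // Nearest rank: the smallest sample with at least 95% of samples at or below it.
  const rank = Math.ceil((95 * n) / 100);
  return {
    dropped: frameTimesMs.filter((t) => t > budgetMs).length,
    p95: sorted[rank - 1],
  };
}
```

Nine 10ms frames and one 40ms frame average 13ms, comfortably under budget, while the p95 is the 40ms frame. That gap is the stutter the average hides.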

What I check before I call a list “smooth”

the DOM node count is flat as the folder grows — a 200-item and a 20,000-item folder hold the same handful of rows on screen;
rows are keyed by file id, so a sort or filter carries each node along with its file instead of leaving row state stuck to the wrong one;
overscan is tuned by scrolling, not guessed — no blank bands on a fast flick, no rendering rows nobody reaches;
variable heights are measured into an id-keyed cache, and a late measurement above the viewport anchors the scroll instead of jumping it;
the row is memoized and every prop it takes is referentially stable, so a scroll commits only the rows that changed;
accessibility survives virtualization: rows carry aria-setsize and aria-posinset (or aria-rowindex under an aria-rowcount, for a grid or table) for the full set, focus isn’t lost when a row unmounts on scroll, and the keyboard can reach rows that aren’t in the window right now;
the Profiler shows only entering and leaving rows committing on scroll, and the frames track holds budget on a throttled mid-range profile.

So, the result — measured the way the section above prescribes: Chrome’s 6× CPU throttle, the heaviest real folders, a scripted scroll at a fixed velocity. That twelve-thousand-file folder used to spend about 740ms on the main thread before first paint and then crawl through the scroll at roughly 22fps. Afterward the first render was bounded by the viewport at ~64ms, and the scroll held 60 with a 95th-percentile frame time under the 16.6ms budget; the DOM node count and the tab’s memory both went flat regardless of folder size. The Profiler told the cleanest version of the story: the windowed list’s commit on a scroll frame fell from ~41ms — every visible row re-rendering — to about 3ms once the rows were memoized and the selector stopped handing down fresh arrays. Virtualization did the structural half; the memo and the stable selector did the rest. And I didn’t trust the numbers until the frames also held on an actual mid-range phone, not just the throttled profile — a list that’s “virtualized” on paper and still janks on real hardware is just a more complicated way to stutter.