The Biggest UX Win on FilmFlux Wasn’t UI. It Was Caching.

March 2026 · 7 min read

I cut FilmFlux's YouTube API costs by 100x without changing a single piece of UI.

Most product wins look like a button. The win that scaled FilmFlux from a side project into something used by ten thousand people a month wasn't a button. It was an architecture decision I almost didn't make, because as a designer, caching felt like someone else's problem.

What I assumed

When I started, I assumed two things: the YouTube API was the bottleneck, and there was nothing I could do about it without paying more for quota.

Wrong on both counts. The bottleneck wasn't the API. It was the call pattern. And the call pattern was a design decision masquerading as an engineering one.

The 100x cost cut

Every time FilmFlux refreshed a channel's videos, it called YouTube's search.list endpoint. That endpoint costs 100 quota units per page. Across forty channels and two pages each, that's 8,000 units per scan. At roughly three scans a day, the daily quota burn was about 24,000 units. We were going to exhaust YouTube's daily quota by mid-afternoon.

The fix wasn't more quota. It was a different endpoint. playlistItems.list does the same job at 1 quota unit per page. Same forty channels, same two pages: 80 units per scan instead of 8,000. Daily burn dropped from 24,000 to 240.

A hundredfold cost reduction from picking the right endpoint.
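The arithmetic is easy to check in a few lines of TypeScript. This is a minimal sketch, not FilmFlux's actual code: the per-page costs are the ones quoted above, the three-scans-a-day figure is inferred from 24,000 / 8,000, and every name here (QUOTA_COST, unitsPerScan, playlistItemsUrl) is hypothetical. The URL helper assumes each channel's uploads playlist ID is already known (the YouTube Data API exposes it through channels.list under contentDetails.relatedPlaylists.uploads) and assumes a page size of 50.

```typescript
// Quota cost per page, as quoted in this post.
const QUOTA_COST = {
  "search.list": 100,
  "playlistItems.list": 1,
} as const;

type Endpoint = keyof typeof QUOTA_COST;

// Units burned by one full scan: every channel, every page.
function unitsPerScan(endpoint: Endpoint, channels: number, pagesPerChannel: number): number {
  return QUOTA_COST[endpoint] * channels * pagesPerChannel;
}

// Assumption: three scans a day, inferred from 24,000 / 8,000.
const SCANS_PER_DAY = 3;

const before = unitsPerScan("search.list", 40, 2);
const after = unitsPerScan("playlistItems.list", 40, 2);

console.log(before * SCANS_PER_DAY, after * SCANS_PER_DAY); // 24000 240

// Builds the request URL for one page of a channel's uploads playlist.
// The playlist ID and API key are placeholders supplied by the caller.
function playlistItemsUrl(uploadsPlaylistId: string, apiKey: string, pageToken?: string): string {
  const params = new URLSearchParams({
    part: "snippet",
    playlistId: uploadsPlaylistId,
    maxResults: "50", // assumed page size
    key: apiKey,
  });
  if (pageToken !== undefined) params.set("pageToken", pageToken);
  return `https://www.googleapis.com/youtube/v3/playlistItems?${params.toString()}`;
}
```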

The reason it took me weeks to find: it wasn't a bug. The original endpoint worked. It just cost a hundred times what it needed to. Bugs scream. Bad architecture stays quiet.

Three-layer cache

The API fix bought me runway. The cache made the app feel like a different product.

I built three layers, each catching the same request at a different stage. Most views resolve before the database is ever called.

In-memory cache

First stop. 5-minute TTL on stable data, 1-minute on dynamic content like trending. Tap a movie, tap back, no loading state. The data is still in memory.
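A minimal sketch of what such a layer can look like, assuming a plain Map keyed by string. The two TTL values come from this post; the class name, the "stable" / "dynamic" labels, and the injectable clock are illustrative assumptions rather than FilmFlux's real implementation.

```typescript
type Kind = "stable" | "dynamic";

// TTLs from this post: 5 minutes for stable data, 1 minute for dynamic content.
const TTL_MS: Record<Kind, number> = {
  stable: 5 * 60_000,
  dynamic: 60_000,
};

interface Entry<T> {
  value: T;
  expiresAt: number; // epoch milliseconds
}

class MemoryCache<T> {
  private entries = new Map<string, Entry<T>>();
  private now: () => number;

  // The clock is injectable so expiry can be tested without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  set(key: string, value: T, kind: Kind): void {
    this.entries.set(key, { value, expiresAt: this.now() + TTL_MS[kind] });
  }

  // Returns the value while it is fresh; evicts it and returns undefined once expired.
  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}
```

Tapping a movie and tapping back then becomes a get() that returns synchronously, which is why no loading state is needed.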

localStorage cache

Persists across sessions with the same TTL logic. Reopen the app the next day and your previous browse state hydrates back into memory before you can blink. Anything past its TTL still shows up at once and gets refreshed in the background.
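Here is one way the persistent layer could be sketched. It is written against a small KeyValueStore interface (an assumption, chosen so the code also runs outside a browser); window.localStorage satisfies it. The function names and the choice to return expired entries with a fresh flag, rather than dropping them, are illustrative assumptions.

```typescript
// The subset of the Web Storage API this sketch needs.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

interface StoredEntry<T> {
  value: T;
  expiresAt: number; // epoch milliseconds
}

function persist<T>(store: KeyValueStore, key: string, value: T, ttlMs: number, now: number = Date.now()): void {
  const entry: StoredEntry<T> = { value, expiresAt: now + ttlMs };
  try {
    store.setItem(key, JSON.stringify(entry));
  } catch {
    // Storage may be full or disabled. The cache is an optimization, so carry on.
  }
}

// Returns the stored value plus whether it is still within its TTL.
// Expired values are returned too, so the caller can paint them and refresh.
function restore<T>(store: KeyValueStore, key: string, now: number = Date.now()): { value: T; fresh: boolean } | undefined {
  const raw = store.getItem(key);
  if (raw === null) return undefined;
  try {
    const entry = JSON.parse(raw) as StoredEntry<T>;
    return { value: entry.value, fresh: now < entry.expiresAt };
  } catch {
    store.removeItem(key); // corrupt entry: drop it
    return undefined;
  }
}
```

In a browser, the first argument would simply be window.localStorage.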

SSR on first paint

The very first visit to FilmFlux gets data baked into the HTML. No spinner-then-content gap on cold start.

SSR → Memory → localStorage → Database

Most views never reach the database.
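The lookup order can be sketched as a loop over layers that falls through to the database only when every layer misses. The Layer interface, the resolveView name, and the back-fill step are assumptions made for illustration.

```typescript
interface Layer<T> {
  name: string;
  read(key: string): T | undefined;
  write?(key: string, value: T): void;
}

interface Resolved<T> {
  value: T;
  source: string; // which layer answered
}

// Tries each layer in order; only a miss in all of them reaches the database.
async function resolveView<T>(
  key: string,
  layers: Layer<T>[],
  loadFromDatabase: (key: string) => Promise<T>,
): Promise<Resolved<T>> {
  for (const layer of layers) {
    const value = layer.read(key);
    if (value !== undefined) return { value, source: layer.name };
  }
  const value = await loadFromDatabase(key);
  // Back-fill the writable layers so the next request stops earlier.
  for (const layer of layers) {
    if (layer.write !== undefined) layer.write(key, value);
  }
  return { value, source: "database" };
}
```

With layers passed in the order SSR payload, memory, localStorage, the second request for any key is answered before the database function is called.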

The second thing I was wrong about

I assumed users wanted fresh data and would wait for it. Wrong again. Users want instant data, and they want it to be right.

Stale-while-revalidate gives them both: show the cached version immediately, fetch the fresh version in the background, swap it in when it's ready. The user feels speed; the data eventually catches up. Nobody notices the swap.

Once the pattern was in place, there was no loading state for content that had been seen before. The app started feeling like it remembered where you'd left off.
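A minimal sketch of the pattern, assuming a Map as the cache and a render callback that paints whatever it is given. The function name and parameters are hypothetical. The part that matters is the ordering: render the cached value synchronously, then fetch, then render again.

```typescript
interface Cached<T> {
  value: T;
  fetchedAt: number; // epoch milliseconds
}

// Paints the cached value immediately (if any), then refreshes in the background.
// The returned promise settles when revalidation finishes; callers need not await it.
async function staleWhileRevalidate<T>(
  cache: Map<string, Cached<T>>,
  key: string,
  fetchFresh: () => Promise<T>,
  render: (value: T) => void,
  maxAgeMs: number,
  now: () => number = () => Date.now(),
): Promise<void> {
  const cached = cache.get(key);
  if (cached !== undefined) {
    render(cached.value); // runs before the first await, so there is no loading state
    if (now() - cached.fetchedAt < maxAgeMs) return; // still fresh: skip the fetch
  }
  try {
    const fresh = await fetchFresh();
    cache.set(key, { value: fresh, fetchedAt: now() });
    render(fresh); // the swap
  } catch (err) {
    // With a cached copy on screen, a failed refresh is not fatal.
    if (cached === undefined) throw err;
  }
}
```

Only a key that has never been cached waits on the network, which matches the "no loading state for content that had been seen before" behavior described above.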

Smarter TTLs

Not all data changes at the same rate. Trending and homepage rows get a 1-minute TTL. Stable collections live for hours. The app stays fresh where freshness matters and stays cheap everywhere else.
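One way to express this is a small rule table that maps cache keys to TTLs. Everything specific here is an assumption: the key prefixes are made up, the 6-hour value is only a placeholder for "hours", and the 5-minute default borrows the in-memory figure from earlier in this post.

```typescript
interface TtlRule {
  prefix: string;
  ttlMs: number;
}

const MINUTE = 60_000;
const HOUR = 60 * MINUTE;

// Hypothetical key prefixes; first matching rule wins.
const TTL_RULES: TtlRule[] = [
  { prefix: "trending:", ttlMs: 1 * MINUTE },
  { prefix: "home:", ttlMs: 1 * MINUTE },
  { prefix: "collection:", ttlMs: 6 * HOUR }, // placeholder for "hours"
];

const DEFAULT_TTL_MS = 5 * MINUTE;

// Looks up how long a cache entry for this key may be served without a refresh.
function ttlFor(key: string): number {
  const rule = TTL_RULES.find((r) => key.startsWith(r.prefix));
  return rule !== undefined ? rule.ttlMs : DEFAULT_TTL_MS;
}
```

Keeping the rules in one table means freshness can be tuned per content type without touching the cache code itself.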

Prefetching for intent

When users hover over a link to a route or open the PWA, key data preloads in the background. By the time the navigation actually happens, the data is already there. Wait time and backend load both drop without the user ever seeing a spinner.
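A hedged sketch of intent-based prefetching: a small class that starts loading a route's data on the first signal of intent and shares that one request with every later caller. The Prefetcher name and the load function are assumptions. A fuller version would hand results to the TTL cache instead of holding promises indefinitely.

```typescript
class Prefetcher<T> {
  private inFlight = new Map<string, Promise<T>>();
  private load: (route: string) => Promise<T>;

  constructor(load: (route: string) => Promise<T>) {
    this.load = load;
  }

  // Call on hover or focus of a link, or when the PWA opens.
  // Repeated calls for the same route share a single request.
  prefetch(route: string): Promise<T> {
    let pending = this.inFlight.get(route);
    if (pending === undefined) {
      pending = this.load(route);
      // Forget failed requests so a later hover or the real navigation can retry.
      pending.catch(() => {
        this.inFlight.delete(route);
      });
      this.inFlight.set(route, pending);
    }
    return pending;
  }
}
```

In a browser this might be wired up with something like link.addEventListener("mouseenter", () => prefetcher.prefetch("/movie/42")), and the navigation handler would await the same prefetch call to pick up the result.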

The real impact

YouTube API quota burn dropped 100x (24,000/day → 240/day).
Most page views resolve from cache before a DB call is made.
No loading states on any content the user has seen before.
The app stayed responsive as monthly users grew past ten thousand.

Most users never think about caching. They just feel that the app is fast.

A takeaway

As a designer who builds, this taught me something I'd been missing for years: performance is UX. Caching is product design. The most senior decision a designer can make is often the one that doesn't show up in the mockup at all.

If you only ever review designs in Figma, you'll never catch a 100x cost win. The decisions that matter most in a product like this live in the call pattern, not the canvas.

I now start every new product with a question I never used to ask, before I draw a single screen: where will the speed live?

filmflux.app

Tags: Caching, Performance, UX, PWA, SSR, Architecture
