The Biggest UX Win on FilmFlux Wasn’t UI. It Was Caching.
I cut FilmFlux's YouTube API costs by 100x without changing a single piece of UI.
Most product wins look like a button. The win that scaled FilmFlux from a side project into something used by ten thousand people a month wasn't a button. It was an architecture decision I almost didn't make, because as a designer, caching felt like someone else's problem.
What I assumed
When I started, I assumed two things: the YouTube API was the bottleneck, and there was nothing I could do about it without paying more for quota.
Wrong on both counts. The bottleneck wasn't the API. It was the call pattern. And the call pattern was a design decision masquerading as an engineering one.
The 100x cost cut
Every time FilmFlux refreshed a channel's videos, it called YouTube's search.list endpoint. That endpoint costs 100 quota units per page. Across forty channels at two pages each, that's 8,000 units per scan; at about three scans a day, the daily quota burn was over 24,000 units. We were going to exhaust YouTube's daily quota by mid-afternoon.
The fix wasn't more quota. It was a different endpoint. playlistItems.list, pointed at each channel's uploads playlist, returns the same list of videos at 1 quota unit per page. Same forty channels, same two pages: 80 units per scan instead of 8,000. Daily burn dropped from 24,000 to 240.
A hundredfold cost reduction from picking the right endpoint.
The reason it took me weeks to find: it wasn't a bug. The original endpoint worked. It just cost a hundred times what it needed to. Bugs scream. Bad architecture stays quiet.
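The arithmetic behind the endpoint swap can be sketched in a few lines. This is an illustration, not FilmFlux's actual code: the quota costs, channel count, and page count come from the numbers above, while `scanCost`, `playlistItemsUrl`, the `maxResults` value of 50, and the `apiKey` parameter are hypothetical names and choices made for the example.

```typescript
// Quota units per request, as quoted in the text above.
const QUOTA_COST = {
  "search.list": 100,
  "playlistItems.list": 1,
} as const;

type Endpoint = keyof typeof QUOTA_COST;

// Units burned by one scan: one request per page, per channel.
function scanCost(endpoint: Endpoint, channels: number, pagesPerChannel: number): number {
  return QUOTA_COST[endpoint] * channels * pagesPerChannel;
}

// Hypothetical helper that builds a playlistItems.list request URL.
// `uploadsPlaylistId` is assumed to be the channel's uploads playlist;
// `apiKey` is a placeholder for your own key.
function playlistItemsUrl(uploadsPlaylistId: string, apiKey: string, pageToken?: string): string {
  const params = new URLSearchParams({
    part: "snippet",
    playlistId: uploadsPlaylistId,
    maxResults: "50", // assumed page size for this sketch
    key: apiKey,
  });
  if (pageToken) params.set("pageToken", pageToken);
  return `https://www.googleapis.com/youtube/v3/playlistItems?${params.toString()}`;
}

const costBefore = scanCost("search.list", 40, 2); // 8000
const costAfter = scanCost("playlistItems.list", 40, 2); // 80
console.log(costBefore, costAfter, costBefore / costAfter);
```

Writing the cost down as a function makes the call pattern visible: the multiplier that matters is the per-request price, not the number of channels.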
Three-layer cache
The API fix bought me runway. The cache made the app feel like a different product.
I built three layers, each catching the same request at a different stage. Most views resolve before the database is ever called.
In-memory cache
First stop. 5-minute TTL on stable data, 1-minute on dynamic content like trending. Tap a movie, tap back, no loading state. The data is still in memory.
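A minimal sketch of an in-memory layer like this, assuming a plain `Map` with an expiry timestamp per entry. The class name `MemoryCache`, the keys, and the injectable clock are hypothetical; only the 5-minute and 1-minute TTLs come from the text.

```typescript
const FIVE_MINUTES = 5 * 60 * 1000;
const ONE_MINUTE = 60 * 1000;

class MemoryCache<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>();
  private now: () => number;

  // The clock is injectable so expiry can be exercised without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  set(key: string, value: T, ttlMs: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
  }

  // Returns the value while it is within its TTL, otherwise undefined.
  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: drop it so the map does not grow forever
      return undefined;
    }
    return entry.value;
  }
}

const movieCache = new MemoryCache<string>();
movieCache.set("movie:42", "cached movie payload", FIVE_MINUTES); // stable data
movieCache.set("trending", "cached trending payload", ONE_MINUTE); // dynamic data
console.log(movieCache.get("movie:42"));
```

Tapping a movie and tapping back maps to a `get` on a key that was `set` moments earlier, which is why no loading state is needed.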
localStorage cache
Persists across sessions with the same TTL logic. Reopen the app the next day and your previous browse state hydrates back into memory before you can blink, even though it has expired by then and gets refreshed in the background.
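One way a persistent layer like this could be sketched, assuming each record is stored as JSON alongside its expiry time. The `KeyValueStore` interface, `saveToStore`, and `loadFromStore` are hypothetical names; in a browser, `window.localStorage` would be passed as the store. Expired records are returned with a `stale` flag rather than discarded, which is an assumption about how next-day hydration might work.

```typescript
// The slice of the Web Storage API this sketch needs, so it also runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

type StoredRecord<T> = { value: T; expiresAt: number };

function saveToStore<T>(store: KeyValueStore, key: string, value: T, ttlMs: number, now: number = Date.now()): void {
  const record: StoredRecord<T> = { value, expiresAt: now + ttlMs };
  try {
    store.setItem(key, JSON.stringify(record));
  } catch {
    // Storage can be full or disabled; a failed cache write is not fatal.
  }
}

// Returns the stored value plus whether its TTL has passed, or undefined if absent.
function loadFromStore<T>(store: KeyValueStore, key: string, now: number = Date.now()): { value: T; stale: boolean } | undefined {
  const raw = store.getItem(key);
  if (raw === null) return undefined;
  try {
    const record = JSON.parse(raw) as StoredRecord<T>;
    return { value: record.value, stale: now >= record.expiresAt };
  } catch {
    store.removeItem(key); // corrupted entry: drop it
    return undefined;
  }
}
```

On startup, the app would call `loadFromStore` for the keys it cares about and copy the results into the in-memory layer before rendering.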
SSR on first paint
The very first visit to FilmFlux gets data baked into the HTML. No spinner-then-content gap on cold start.
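"Baked into the HTML" can be illustrated with a small framework-free sketch: the server serializes the initial data into a JSON script tag, and the client reads it before making any request. The source does not say which framework FilmFlux uses, so `renderPage`, `serializeForHtml`, and the `initial-data` id are all hypothetical.

```typescript
// Escape characters that could close the <script> element early.
function serializeForHtml(data: object): string {
  return JSON.stringify(data)
    .replace(/</g, "\\u003c")
    .replace(/>/g, "\\u003e")
    .replace(/&/g, "\\u0026");
}

// Server side: embed the first screen's data directly in the document.
function renderPage(initialData: object): string {
  return [
    "<!doctype html>",
    "<html><body>",
    '<div id="app"></div>',
    `<script id="initial-data" type="application/json">${serializeForHtml(initialData)}</script>`,
    "</body></html>",
  ].join("\n");
}

// Client side (browser only), shown as a comment because it needs a DOM:
//   const el = document.getElementById("initial-data");
//   const data = el ? JSON.parse(el.textContent ?? "null") : null;

console.log(renderPage({ rows: [{ title: "Example row" }] }));
```

Because the escapes are valid JSON string escapes, `JSON.parse` on the client restores the original characters.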
Most views never reach the database.
The second thing I was wrong about
I assumed users wanted fresh data and would wait for it. Wrong again. Users want instant data, and they want it to be right.
Stale-while-revalidate gives them both: show the cached version immediately, fetch the fresh version in the background, swap it in when it's ready. The user feels speed; the data eventually catches up. Nobody notices the swap.
Once the pattern was in place, there was no loading state for content that had been seen before. The app started feeling like it remembered where you'd left off.
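The pattern described above can be sketched as a single function: return whatever is cached right away, start a fetch in the background, and hand the fresh value to a callback when it arrives. This is a simplified illustration that always revalidates; `staleWhileRevalidate`, `swrCache`, and `onFresh` are hypothetical names, not FilmFlux's implementation.

```typescript
const swrCache = new Map<string, unknown>();

// Returns the cached value immediately (undefined on a cold cache) and always
// starts a background fetch. `onFresh` fires when the new value lands, and
// `done` resolves once the refresh has finished or failed.
function staleWhileRevalidate<T>(
  key: string,
  fetcher: () => Promise<T>,
  onFresh: (value: T) => void,
): { cached: T | undefined; done: Promise<void> } {
  const cached = swrCache.get(key) as T | undefined;
  const done = fetcher()
    .then((fresh) => {
      swrCache.set(key, fresh);
      onFresh(fresh); // swap the fresh data into the UI
    })
    .catch(() => {
      // Refresh failed: keep serving the cached value.
    });
  return { cached, done };
}

// Usage: render `cached` right away if present, re-render inside `onFresh`.
const homeResult = staleWhileRevalidate(
  "home",
  async () => ["row A", "row B"],
  (rows) => console.log("fresh rows:", rows),
);
console.log("cached rows:", homeResult.cached);
```

A production version would likely skip the background fetch while the entry is still within its TTL; that check is left out here to keep the flow readable.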
Smarter TTLs
Not all data changes at the same rate. Trending and homepage rows get a 1-minute TTL. Stable collections live for hours. The app stays fresh where freshness matters and stays cheap everywhere else.
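A per-kind TTL policy can be as small as a lookup table. In the sketch below the 1-minute values come from the text, while the `DataKind` names, the `isFresh` helper, and the 6-hour figure standing in for "hours" are assumptions made for illustration.

```typescript
const MINUTE_MS = 60 * 1000;
const HOUR_MS = 60 * MINUTE_MS;

type DataKind = "trending" | "homepageRow" | "collection";

const TTL_MS: Record<DataKind, number> = {
  trending: 1 * MINUTE_MS,
  homepageRow: 1 * MINUTE_MS,
  collection: 6 * HOUR_MS, // placeholder: the text only says "hours"
};

// True while an entry stored at `storedAt` is still within its kind's TTL.
function isFresh(kind: DataKind, storedAt: number, now: number = Date.now()): boolean {
  return now - storedAt < TTL_MS[kind];
}

console.log(isFresh("trending", Date.now() - 30 * 1000)); // stored 30 seconds ago
```

Keeping the policy in one table means a freshness decision is a data change, not a code change scattered across fetch calls.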
Prefetching for intent
When users hover over a link to a route or open the PWA, key data preloads in the background. By the time the navigation actually happens, the data is already there. Wait time and backend load both drop without the user ever seeing a spinner.
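Hover prefetching can be sketched as a deduplicating loader wired to a `mouseenter` listener. The names `prefetch`, `prefetchOnHover`, and the `Hoverable` interface are hypothetical; `Hoverable` is a minimal stand-in that a real link element is expected to satisfy, so the sketch can run without a DOM.

```typescript
// Minimal shape of an element for this sketch.
interface Hoverable {
  addEventListener(type: string, listener: () => void): void;
}

const inFlight = new Map<string, Promise<unknown>>();

// Starts loading `key` once; repeated calls reuse the same promise.
function prefetch<T>(key: string, fetcher: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key);
  if (existing) return existing as Promise<T>;
  const request = fetcher();
  inFlight.set(key, request);
  // Forget failed requests so a later hover or the navigation itself can retry.
  request.catch(() => inFlight.delete(key));
  return request;
}

// Kick off the load when the pointer enters the link, before any click.
function prefetchOnHover<T>(link: Hoverable, key: string, fetcher: () => Promise<T>): void {
  link.addEventListener("mouseenter", () => {
    void prefetch(key, fetcher);
  });
}
```

The same `prefetch` call could run once at PWA startup for the routes most likely to be opened first; the navigation handler then awaits the promise that is already in flight instead of starting a new request.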
The real impact
- YouTube API quota burn dropped 100x (24,000/day → 240/day).
- Most page views resolve from cache before a DB call is made.
- No loading states on any content the user has seen before.
- The app stayed responsive as monthly users grew past ten thousand.
Most users never think about caching. They just feel that the app is fast.
A takeaway
As a designer who builds, this taught me something I'd been missing for years: performance is UX. Caching is product design. The most senior decision a designer can make is often the one that doesn't show up in the mockup at all.
If you only ever review designs in Figma, you'll never catch a 100x cost win. The decisions that matter most in a product like this live in the call pattern, not the canvas.
I'm starting every new product now with a question I never used to ask, before I draw a single screen: where will the speed live?