# Data Caching
Overscore can cache your BigQuery results as Parquet files and load them locally using DuckDB-WASM. This means your dashboards can filter, sort, and aggregate data instantly in the browser — without hitting BigQuery on every interaction.
## How it works (plain English)
When caching is enabled for a query, here's what happens:
- Your query runs against BigQuery as usual
- The results are saved as a Parquet file (a compact, columnar data format) and stored alongside your dashboard
- When someone opens the dashboard, the Parquet file is downloaded to their browser
- DuckDB-WASM — a full SQL database running in the browser — loads the Parquet file
- Any filtering, sorting, or drill-downs run as local SQL queries against DuckDB, with no network requests
The result: your dashboard feels instant. Users can slice and dice the data without waiting for BigQuery round trips.
## When to use caching
Caching works best when:
- Your data doesn't change every minute. If hourly or daily freshness is fine, caching is a great fit.
- Users filter and explore the data. Caching shines when a single query result gets sliced many ways in the UI.
- You want to reduce BigQuery costs. Cached queries only hit BigQuery when the cache expires, not on every page load.
- Your result sets are moderately sized. Parquet is efficient, but caching a 500MB result set will slow down the initial page load.
Caching is less useful when:
- Data must be real-time (sub-minute freshness)
- Each query is unique with different parameters every time
- Result sets are very large (hundreds of megabytes)
## Cache modes
In the Hub, go to your dashboard's settings page and choose a cache mode. Caching is configured per-dashboard — all queries share the same mode and refresh interval.
### None (default)
Every page load hits BigQuery directly. Simple, always fresh, but slower for large datasets.
### Lazy
Results are cached after the first request. Subsequent visitors get instant loads from the cache until the TTL expires. The next visitor after expiry triggers a fresh BigQuery query and waits for it — then the cache is warm again.
Best for: dashboards with moderate traffic where occasional cold loads are acceptable.
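The lazy-mode decision above can be sketched as a pure function. This is a minimal illustration, assuming a cache entry records when it was last refreshed — the field and function names are made up for the sketch, not Overscore internals:

```typescript
// Illustrative shape: a cache entry carrying its last successful refresh time.
interface CacheEntry {
  refreshedAtMs: number; // when the Parquet file was last generated
}

// A cached result is fresh while its age is below the TTL.
function isFresh(entry: CacheEntry, ttlSeconds: number, nowMs: number): boolean {
  return nowMs - entry.refreshedAtMs < ttlSeconds * 1000;
}

// Lazy mode: serve the cache while fresh; the first request after expiry
// pays for a BigQuery round trip, after which the cache is warm again.
function lazyAction(
  entry: CacheEntry | undefined,
  ttlSeconds: number,
  nowMs: number
): "serve-cache" | "query-bigquery" {
  return entry && isFresh(entry, ttlSeconds, nowMs) ? "serve-cache" : "query-bigquery";
}
```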
### Scheduled
A background worker automatically refreshes the cache on the TTL interval, even when nobody is viewing the dashboard. Every visitor always gets instant loads — there's never a cold start.
Best for: ops dashboards on TVs, frequently-viewed dashboards, or anywhere the first-load speed matters.
Cost note: Scheduled mode runs BigQuery on every interval whether or not anyone is viewing the dashboard. If your dashboard is only checked once a day, a 15-minute scheduled refresh means 96 BigQuery runs per day. Match the interval to your actual needs.
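The cost note's arithmetic is just seconds-per-day divided by the refresh interval — a one-line helper, named here only for illustration:

```typescript
// Scheduled mode runs once per TTL interval regardless of traffic, so the
// daily BigQuery run count is 86,400 seconds divided by the TTL.
function scheduledRunsPerDay(ttlSeconds: number): number {
  return Math.floor(86_400 / ttlSeconds);
}
```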
## TTL (time-to-live)
The TTL controls how often cached data is refreshed:
| TTL (seconds) | Good for |
|-----|----------|
| 900 (15 min) | Frequently changing data, ops dashboards |
| 3600 (1 hour) | Operational dashboards refreshed throughout the day |
| 86400 (24 hours) | Daily reports, historical data |
| 604800 (7 days) | Stable reference data that rarely changes |
In lazy mode, the TTL is how long a cached result is considered fresh — after it expires, the next request triggers a new BigQuery query. In scheduled mode, the TTL is the refresh interval — the background worker checks every 5 minutes and refreshes any query whose cache is older than the TTL.
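The scheduled-mode semantics can be illustrated with a sketch of one worker pass — the worker wakes periodically and refreshes any query whose cache age exceeds its TTL. Field names are hypothetical, not the real worker code:

```typescript
// Illustrative shape of a tracked query and its cache state.
interface CachedQuery {
  name: string;
  refreshedAtMs: number; // last successful refresh
  ttlSeconds: number;
}

// One worker pass: return the names of queries whose cache is older than
// their TTL and therefore due for a refresh.
function dueForRefresh(queries: CachedQuery[], nowMs: number): string[] {
  return queries
    .filter((q) => nowMs - q.refreshedAtMs > q.ttlSeconds * 1000)
    .map((q) => q.name);
}
```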
## Monitoring and alerts
The Hub tracks every cache refresh — success, failure, duration, row count, and file size. You can see the full history on the dashboard detail page under Cache Refresh History.
If a query fails to refresh 3 times in a row, the Hub shows a red alert banner on the dashboard page. The old cache remains available (stale but usable) while the issue is investigated. You can click Retry now to manually trigger a refresh.
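The alert rule — a banner after three consecutive failed refreshes, reset by any success — can be sketched like this. The history shape is an assumption for the example:

```typescript
// Illustrative refresh-history entry; the real Hub tracks more fields
// (duration, row count, file size).
type RefreshResult = { ok: boolean };

// Alert only when the most recent `threshold` refreshes all failed.
// Any success inside that window resets the streak.
function shouldShowAlert(history: RefreshResult[], threshold = 3): boolean {
  const recent = history.slice(-threshold);
  return recent.length === threshold && recent.every((r) => !r.ok);
}
```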
## How it works (technical details)
### Parquet storage
When a cached query runs, the server:
- Executes the SQL against BigQuery
- Serializes the result rows into Apache Parquet format (columnar, compressed)
- Uploads the `.parquet` file to Cloudflare R2, keyed by project slug, dashboard slug, and query name
- Returns the query results along with a `parquet_url` pointing to the Parquet file
Parquet was chosen because it's compact (typically 5-10x smaller than JSON for tabular data) and DuckDB can query it directly without loading the entire file into memory.
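Since files are keyed by project slug, dashboard slug, and query name, the object key could be derived as below. The exact path scheme is an assumption for illustration, not the documented R2 layout:

```typescript
// Hypothetical key scheme: one Parquet object per (project, dashboard, query).
function parquetKey(projectSlug: string, dashboardSlug: string, queryName: string): string {
  return `${projectSlug}/${dashboardSlug}/${queryName}.parquet`;
}
```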
### DuckDB-WASM in the browser
The `@overscore/client` package includes DuckDB-WASM, which initializes when your dashboard loads. When the `useQuery` hook detects a cached query:
- It fetches the Parquet file from the `parquet_url`
- Registers it as a virtual table in DuckDB
- Runs any local filtering or aggregation as SQL against the in-browser database
- Returns the results to your component — all without a network request
```tsx
// In your dashboard component
const { data, loading, query, cachedAt, refetch } = useQuery("revenue_by_month");

// All rows (works with or without cache)
const allRows = data;

// Filter locally via SQL (only when cached)
if (query) {
  const filtered = await query(
    "SELECT * FROM revenue_by_month WHERE region = 'US' ORDER BY total_revenue DESC"
  );
}

// Force refresh this query from BigQuery
<button onClick={refetch}>Refresh Data</button>

// Show when data was last refreshed
{cachedAt && <span>Data as of {cachedAt.toLocaleString()}</span>}
```
To refresh all queries on a dashboard at once — for example a global refresh button in the header:
```tsx
import { useRefreshAll } from "@overscore/client";

const refreshAll = useRefreshAll();

<button onClick={refreshAll}>↻ Refresh All</button>
```
### Automatic live updates
When the background worker refreshes a cache, open dashboards detect the change automatically and reload the data — no page refresh needed. This happens via a lightweight status poll every 60 seconds.
You don't need to write any code for this. As long as `dashboardSlug` is passed to the `OverscoreProvider` (which the starter template does by default), auto-polling is active. When caching is disabled, the poll frequency drops to every 5 minutes, just to detect if caching gets turned on.
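The polling cadence and reload decision described above reduce to two small functions. These mirror the prose (60s vs 5min, reload when the server's data is newer); the names and the idea that the poll response carries a refresh timestamp are assumptions, not actual client source:

```typescript
// Poll every 60s while caching is enabled; every 5 minutes otherwise.
function pollIntervalMs(cachingEnabled: boolean): number {
  return cachingEnabled ? 60_000 : 300_000;
}

// Reload when the server reports a refresh newer than the data we loaded.
function shouldReload(loadedAtMs: number, serverRefreshedAtMs: number): boolean {
  return serverRefreshedAtMs > loadedAtMs;
}
```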
## Cache invalidation
Caches are invalidated in three ways:
- TTL expiry — the standard mechanism. After the TTL passes, the next request fetches fresh data.
- Manual refresh — in the Hub, click Refresh cache on any query to force a new BigQuery execution.
- Redeployment — running `npx @overscore/cli deploy` automatically invalidates all caches for that dashboard. The next request or scheduled refresh generates fresh data matching your new code.
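The invalidation triggers above can be combined into one check. The deploy-version comparison and the field names here are assumed mechanisms for the sketch, not documented internals:

```typescript
// Illustrative cache state: when it was written, which deploy produced it,
// and whether someone clicked "Refresh cache" in the Hub.
interface CacheState {
  refreshedAtMs: number;
  deployVersion: string;
  manuallyInvalidated: boolean;
}

// A cache is invalid if its TTL expired, the dashboard was redeployed since
// it was written, or it was manually invalidated.
function isInvalid(
  state: CacheState,
  ttlSeconds: number,
  currentDeployVersion: string,
  nowMs: number
): boolean {
  const expired = nowMs - state.refreshedAtMs > ttlSeconds * 1000;
  const redeployed = state.deployVersion !== currentDeployVersion;
  return expired || redeployed || state.manuallyInvalidated;
}
```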
## Fallback behavior
If the Parquet file fails to load (network error, corrupted file), the client automatically falls back to a live BigQuery request via the `/api/query` endpoint. Your dashboard keeps working — it just won't have the local filtering speed until the cache is restored.
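The fallback logic amounts to a try/catch over the two data paths. This sketch injects both fetchers so the logic stands alone; in the real client they would be wired to the `parquet_url` and the `/api/query` endpoint:

```typescript
type Rows = Record<string, unknown>[];

// Try the cached Parquet path first; on any error (network failure,
// corrupted file), fall back to a live query.
async function loadWithFallback(
  fetchParquet: () => Promise<Rows>,
  fetchLive: () => Promise<Rows>
): Promise<{ rows: Rows; source: "cache" | "live" }> {
  try {
    return { rows: await fetchParquet(), source: "cache" };
  } catch {
    return { rows: await fetchLive(), source: "live" };
  }
}
```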
## Row count and performance
Dashboard performance depends on how much data each query returns. Cached results are transferred as Parquet files and loaded into DuckDB-WASM in the browser, so row count directly impacts load time and memory usage.
| Row count | Experience |
|-----------|------------|
| Under 25K rows | Fast — loads instantly, smooth interactions |
| 25K–100K rows | Good — loads in 1-2 seconds, still responsive |
| 100K–500K rows | Slower — noticeable load time, may lag on mobile devices |
| 500K+ rows | Not supported — queries are rejected above 500K rows |
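The tiers in the table reduce to a few thresholds. Only the 500K cap is stated as a hard limit; the tier labels here are just for illustration:

```typescript
// Classify a result-set size into the experience tiers described above.
// Queries above 500K rows are rejected outright.
function rowCountTier(rows: number): "fast" | "good" | "slower" | "rejected" {
  if (rows > 500_000) return "rejected";
  if (rows > 100_000) return "slower";
  if (rows > 25_000) return "good";
  return "fast";
}
```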
The most important optimization is pre-aggregating in SQL. A dashboard powered by a query returning 200 rows of daily metrics will always be faster than one pulling 200K rows of individual events and aggregating in the browser.
For example, instead of:
```sql
SELECT * FROM events WHERE date >= '2026-01-01'
-- Returns 350K rows
```
Write:
```sql
SELECT date, channel, COUNT(*) AS events, SUM(revenue) AS revenue
FROM events
WHERE date >= '2026-01-01'
GROUP BY date, channel
-- Returns 1,200 rows
```
If users need to drill into individual records, pull a moderate dataset (under 100K rows) and use the `query()` function for local filtering — that keeps the initial load fast while still supporting detailed exploration.
## Performance tips
- Pre-aggregate in SQL. Let BigQuery do the heavy lifting. Send summary data to the dashboard, not raw event tables.
- Select only the columns you need. Fewer columns mean smaller Parquet files and faster transfers. Avoid `SELECT *`.
- Keep cached result sets under 10MB for the best experience. Parquet is compact, but large files still take time to download and parse.
- Use broader queries for caching, then filter locally. Instead of caching `WHERE region = 'US'`, cache the full dataset and let DuckDB filter in the browser.
- Set appropriate TTLs. Longer TTLs reduce BigQuery costs but mean staler data. Match the TTL to how often your underlying data changes.
## Next steps
- useQuery Hook — how to query data in your dashboard components
- BigQuery Setup — connect your data source
- API Reference — query endpoint details