Building a $0 data warehouse

Most startups don't need a warehouse bill — they need a warehouse habit. I built one on DuckDB + dbt + GitHub Actions: 4,090,836 NYC taxi trips, modeled and tested, re-running monthly for $0.

Why is $0 actually possible?

The data fits on one machine, DuckDB is embarrassingly fast there, dbt is open source, and GitHub Actions gives public repos free scheduled compute. The full transform-and-test cycle ran in 6.3 seconds; the entire cloud job, from checkout to committed receipts, took 29 seconds.

What does dbt add over raw SQL?

Contracts. Every layer carries tests (all 10 dbt builds and tests passed on this run), and the staging rules caught 67,018 junk rows before they touched a mart, including 52,063 zero-duration trips where the drop-off timestamp equals the pick-up timestamp. When next month's data lands, the same tests decide whether it ships.
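
To make the zero-duration rule concrete, here is a minimal sketch of a staging filter plus a dbt-style test that passes only when no violating rows remain. It is not the project's actual model: it uses Python's built-in sqlite3 as a stand-in for DuckDB so it runs with no installs, and the table names, column names, and sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical raw rows: (pickup_at, dropoff_at) as ISO-8601 strings.
RAW_TRIPS = [
    ("2024-01-01 08:00:00", "2024-01-01 08:12:00"),  # normal trip
    ("2024-01-01 09:00:00", "2024-01-01 09:00:00"),  # zero duration
    ("2024-01-01 10:30:00", "2024-01-01 10:05:00"),  # drop-off before pick-up
    ("2024-01-01 11:00:00", "2024-01-01 11:45:00"),  # normal trip
]


def build_staging(conn):
    """Create raw_trips and a filtered stg_trips; return (kept, dropped)."""
    conn.execute("CREATE TABLE raw_trips (pickup_at TEXT, dropoff_at TEXT)")
    conn.executemany("INSERT INTO raw_trips VALUES (?, ?)", RAW_TRIPS)
    # Staging rule: keep only trips with a positive duration.
    # Same-format ISO-8601 strings compare correctly as text.
    conn.execute(
        "CREATE TABLE stg_trips AS "
        "SELECT * FROM raw_trips WHERE dropoff_at > pickup_at"
    )
    raw = conn.execute("SELECT COUNT(*) FROM raw_trips").fetchone()[0]
    kept = conn.execute("SELECT COUNT(*) FROM stg_trips").fetchone()[0]
    return kept, raw - kept


def failing_rows(conn):
    """dbt-style test: select rows that break the contract (expect none)."""
    return conn.execute(
        "SELECT * FROM stg_trips WHERE dropoff_at <= pickup_at"
    ).fetchall()


conn = sqlite3.connect(":memory:")
kept, dropped = build_staging(conn)
violations = failing_rows(conn)
print(f"kept={kept} dropped={dropped} violations={len(violations)}")
```

The test is written the way dbt tests generally work: a query that returns the offending rows, where an empty result means the contract holds. In a real project the filter would live in a staging model and the check in a test file, both as SQL.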

How does it run itself?

A cron workflow runs on the 5th of each month: load the newest TLC file, run dbt build, then commit fresh receipts and mart exports. The first run on the scheduled infrastructure is public: green in 29 seconds, with row counts identical to the local run.
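
The order of those steps matters more than the scheduler, so here is a rough sketch of the sequence as a small Python driver. It is an illustration, not the workflow file itself: the loader script `load_tlc.py`, its `--month` flag, the `receipts` and `exports` paths, and the commit message are hypothetical, while `dbt build`, `git add`, and `git commit -m` are the standard commands.

```python
import subprocess


def refresh_commands(month):
    """Return the refresh steps, in order, for a month such as '2024-01'."""
    return [
        ["python", "load_tlc.py", "--month", month],  # hypothetical loader
        ["dbt", "build"],                             # run models and tests
        ["git", "add", "receipts", "exports"],        # hypothetical paths
        ["git", "commit", "-m", f"Monthly refresh {month}"],
    ]


def run_refresh(month, dry_run=True):
    """Walk the steps in order; with dry_run=False, stop at the first failure."""
    planned = []
    for cmd in refresh_commands(month):
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit,
            # so a failed dbt build never reaches the commit step.
            subprocess.run(cmd, check=True)
        planned.append(" ".join(cmd))
    return planned


# Dry run: print the plan without executing anything.
for line in run_refresh("2024-01"):
    print(line)
```

Putting the commit after `dbt build` is what turns the tests into a gate: if any model or test fails, nothing new is committed. In GitHub Actions the same sequence would be workflow steps under a `schedule` trigger, which takes a POSIX cron expression evaluated in UTC.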

Key takeaways

4,090,836 rows, $0/month — laptop-scale data doesn't need cloud-warehouse pricing.
Tests are the product: 10 dbt builds and tests gate every refresh; 67,018 bad rows never reached a mart.
Schedule it or it's a demo — the cron run is the difference between a pipeline and a habit.

Keep reading: How much does it cost to outsource data analysis? and the full case study.