Building a $0 data warehouse
Most startups don't need a warehouse bill — they need a warehouse habit. I built one on DuckDB + dbt + GitHub Actions: 4,090,836 NYC taxi trips, modeled and tested, re-running monthly for $0.
Why is $0 actually possible?
The data fits on one machine, DuckDB is embarrassingly fast there, dbt is open source, and GitHub Actions gives public repos free scheduled compute. The full transform-and-test cycle runs in 6.3 seconds; the entire cloud job — checkout to committed receipts — took 29 seconds.
What does dbt add over raw SQL?
Contracts. Every layer carries tests, and all 10 builds and tests passed on this run. The staging rules caught 67,018 junk rows before they touched a mart, including 52,063 zero-duration trips (drop-off timestamp equal to pick-up), roughly 78% of the rejects. When next month's data lands, the same tests decide whether it ships.
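To make the zero-duration rule concrete, here is a minimal sketch of a staging filter paired with a dbt-style test, where the test is a query that passes when it returns no rows. It is not the project's actual code. It uses Python's built-in sqlite3 as a stand-in for DuckDB so it runs with no installs, and the table, view, and column names (raw_trips, stg_trips, pickup_at, dropoff_at) and the three sample trips are made up for illustration.

```python
import sqlite3

# In-memory database standing in for the warehouse (assumption: the real project uses DuckDB).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_trips (trip_id INTEGER, pickup_at TEXT, dropoff_at TEXT)"
)
conn.executemany(
    "INSERT INTO raw_trips VALUES (?, ?, ?)",
    [
        (1, "2024-01-01 08:00:00", "2024-01-01 08:12:00"),
        (2, "2024-01-01 09:30:00", "2024-01-01 09:30:00"),  # zero duration
        (3, "2024-01-01 10:05:00", "2024-01-01 10:40:00"),
    ],
)

# Staging layer: keep only trips whose drop-off differs from the pick-up.
conn.execute(
    "CREATE VIEW stg_trips AS "
    "SELECT * FROM raw_trips WHERE dropoff_at <> pickup_at"
)


def zero_duration_rows(relation):
    """dbt-style test: select the offending rows; an empty result means the test passes."""
    query = f"SELECT trip_id FROM {relation} WHERE dropoff_at = pickup_at ORDER BY trip_id"
    return conn.execute(query).fetchall()


print(zero_duration_rows("raw_trips"))  # [(2,)] -> the raw layer would fail the test
print(zero_duration_rows("stg_trips"))  # []     -> the staging layer passes
```

The filter and the test are deliberately separate queries: the filter defines what staging keeps, and the test independently checks that nothing slipped through when new data arrives.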
How does it run itself?
A cron workflow runs on the 5th of each month: it loads the newest TLC file, runs dbt build, and commits fresh receipts and mart exports. The first run on the scheduled infrastructure is public: green in 29 seconds, with row counts identical to the local run.
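The post does not show what a receipt looks like, so the following is only a sketch of one way to check that row counts are identical between a local run and a cloud run. It assumes a receipt is a JSON file of per-model row counts. The function names, model names, and counts are hypothetical, not the pipeline's real values.

```python
import json


def make_receipt(row_counts):
    """Serialize per-model row counts with sorted keys so identical runs yield identical text."""
    return json.dumps(row_counts, sort_keys=True, indent=2)


def diff_receipts(local_text, cloud_text):
    """Return {model: (local_count, cloud_count)} for every model whose counts differ."""
    local, cloud = json.loads(local_text), json.loads(cloud_text)
    return {
        model: (local.get(model), cloud.get(model))
        for model in sorted(set(local) | set(cloud))
        if local.get(model) != cloud.get(model)
    }


# Hypothetical model names and toy counts, not the real pipeline's numbers.
local_receipt = make_receipt({"stg_trips": 97, "mart_daily_trips": 31})
cloud_receipt = make_receipt({"mart_daily_trips": 31, "stg_trips": 97})
drifted_receipt = make_receipt({"stg_trips": 95, "mart_daily_trips": 31})

print(diff_receipts(local_receipt, cloud_receipt))    # {}
print(diff_receipts(local_receipt, drifted_receipt))  # {'stg_trips': (97, 95)}
```

Sorting the keys makes the file text deterministic, so a committed receipt only changes in version control when a count actually changes.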
Key takeaways
- 4,090,836 rows, $0/month — laptop-scale data doesn't need cloud-warehouse pricing.
- Tests are the product: 10 dbt builds and tests gate every refresh; 67,018 bad rows never reached a mart.
- Schedule it or it's a demo — the cron run is the difference between a pipeline and a habit.
Keep reading: How much does it cost to outsource data analysis? and the full case study.