A $0 warehouse, run by GitHub Actions
4,090,836 taxi trips modeled, tested, and re-run monthly — for $0/month.
Jul 4, 2026
Problem
Startups pay warehouse bills for data that fits on a laptop. May 2026's NYC yellow-taxi data is 4,090,836 rows — real warehouse scale — and it needs exactly zero paid infrastructure.
What I built
An ELT warehouse that runs itself. A loader pulls the newest TLC month into DuckDB. Then dbt models it in three layers (staging, a daily fact table, a month summary), with tests on every layer. GitHub Actions re-runs the whole pipeline on the 5th of each month and commits fresh receipts.
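To make the "newest TLC month" step concrete, here is a minimal sketch of how a loader might pick its target file. It is not the project's actual loader. The two-month publication lag is an assumption inferred from May data appearing in a July post, the URL pattern is the one TLC has used for its public Parquet files and should be verified before use, and the names newest_month and parquet_url are hypothetical.

```python
from datetime import date

# Assumed URL pattern for TLC yellow-taxi Parquet files; verify before relying on it.
BASE_URL = "https://d37ci6vzurychx.cloudfront.net/trip-data"


def newest_month(run_date: date, lag_months: int = 2) -> tuple[int, int]:
    """Return the (year, month) that lies lag_months before run_date's month.

    lag_months=2 is an assumption about how far TLC publication trails the calendar.
    """
    # Count months from year 0 so year boundaries need no special case.
    index = run_date.year * 12 + (run_date.month - 1) - lag_months
    return index // 12, index % 12 + 1


def parquet_url(year: int, month: int) -> str:
    """Build the download URL for one month of yellow-taxi trips."""
    return f"{BASE_URL}/yellow_tripdata_{year:04d}-{month:02d}.parquet"


if __name__ == "__main__":
    # A run on the 5th of July 2026 would target May 2026 under the assumed lag.
    year, month = newest_month(date(2026, 7, 5))
    print(parquet_url(year, month))
```

A real loader would likely also check that the file exists before loading, since the lag is not guaranteed.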
Result
4,090,836 raw rows in; 4,023,818 after staging filters. Explicit rules caught 67,018 junk rows, 52,063 of them zero-duration trips whose drop-off timestamp equals their pick-up. All 10 dbt builds and tests are green, with transform plus test taking 6.3 seconds. The first run on the scheduled GitHub Actions workflow went green in 29 seconds and reproduced the local row counts exactly; the run history is public.
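The zero-duration rule and the row arithmetic above can be sketched in a few lines. The project applies its rules in dbt SQL; this Python version is only an illustration, the toy rows are made up, and the names is_zero_duration and stage are hypothetical. Only the zero-duration rule is shown, because it is the only one described here.

```python
from datetime import datetime

# Made-up sample rows as (pickup, dropoff) pairs; not real TLC data.
TRIPS = [
    (datetime(2026, 5, 1, 8, 0, 0), datetime(2026, 5, 1, 8, 14, 30)),
    (datetime(2026, 5, 1, 9, 5, 0), datetime(2026, 5, 1, 9, 5, 0)),  # zero duration
    (datetime(2026, 5, 2, 17, 40, 0), datetime(2026, 5, 2, 18, 2, 10)),
]


def is_zero_duration(pickup: datetime, dropoff: datetime) -> bool:
    """A trip is junk under this rule when drop-off equals pick-up."""
    return dropoff == pickup


def stage(trips):
    """Return the kept rows and the number of rows the rule dropped."""
    kept = [t for t in trips if not is_zero_duration(*t)]
    return kept, len(trips) - len(kept)


# Row counts quoted in the result above.
RAW_ROWS = 4_090_836
STAGED_ROWS = 4_023_818
ZERO_DURATION_ROWS = 52_063

dropped_rows = RAW_ROWS - STAGED_ROWS  # rows removed by all staging rules
other_rule_rows = dropped_rows - ZERO_DURATION_ROWS  # rows removed by the remaining rules

if __name__ == "__main__":
    kept, dropped = stage(TRIPS)
    print(f"toy sample: kept {len(kept)}, dropped {dropped}")
    print(f"all rules: {dropped_rows:,}; other rules: {other_rule_rows:,}")
```

The subtraction confirms the published counts are consistent: 67,018 rows dropped in total, leaving 14,955 for the rules other than zero duration.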
What this costs you
A build like this is the automation tier: $1,000–$2,000, 3–5 days. Running it costs you nothing, forever.
Buy this build: $1,000–$2,000, 3–5 days. Work with freddyxai →
Read the full writeup → Building a $0 data warehouse