
A $0 warehouse, run by GitHub Actions

4,090,836 taxi trips modeled, tested, and re-run monthly — for $0/month.

Jul 4, 2026

$0/mo to run (4,090,836 rows)
4,090,836 rows loaded
10 dbt builds + tests passing
6.3 s transform + test runtime
$0/mo infrastructure cost
DuckDB, dbt, GitHub Actions

Problem

Startups pay warehouse bills for data that fits on a laptop. May 2026's NYC yellow-taxi data is 4,090,836 rows — real warehouse scale — and it needs exactly zero paid infrastructure.

What I built

An ELT warehouse that runs itself. A loader pulls the newest TLC month into DuckDB; dbt models it in three layers (staging, a daily fact table, and a month summary) with tests on every layer; and GitHub Actions re-runs the whole pipeline on the 5th of each month and commits fresh receipts.
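The loader step could look roughly like the sketch below. It is a minimal illustration, not the code from this build: the URL pattern, the two-month publication lag, the table name `raw_yellow_trips`, and the function names are all assumptions made for the example.

```python
from datetime import date

# Assumption: the public TLC parquet URL pattern. Not taken from this build.
TLC_URL = (
    "https://d37ci6vzurychx.cloudfront.net/trip-data/"
    "yellow_tripdata_{year:04d}-{month:02d}.parquet"
)


def newest_month(today: date, lag_months: int = 2) -> tuple[int, int]:
    """Return (year, month) lying lag_months before today's month.

    The two-month default is an assumption about how late TLC publishes.
    """
    index = today.year * 12 + (today.month - 1) - lag_months
    return index // 12, index % 12 + 1


def parquet_url(year: int, month: int) -> str:
    """Build the download URL for one month of yellow-taxi trips."""
    return TLC_URL.format(year=year, month=month)


def load_month(db_path: str, year: int, month: int) -> int:
    """Load one month into DuckDB and return the raw row count."""
    import duckdb  # third-party: pip install duckdb

    url = parquet_url(year, month)
    con = duckdb.connect(db_path)
    con.execute("INSTALL httpfs")  # extension for reading over HTTPS
    con.execute("LOAD httpfs")
    # url is built from integers only, so inlining it here is safe.
    con.execute(
        "CREATE OR REPLACE TABLE raw_yellow_trips AS "
        f"SELECT * FROM read_parquet('{url}')"
    )
    rows = con.execute("SELECT count(*) FROM raw_yellow_trips").fetchone()[0]
    con.close()
    return rows
```

Under these assumptions, a run on July 5, 2026 would call `load_month("warehouse.duckdb", *newest_month(date(2026, 7, 5)))` and target May 2026. Rebuilding the raw table with `CREATE OR REPLACE` keeps each monthly run idempotent, so a re-run does not double-load rows.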

Result

4,090,836 raw rows in; 4,023,818 after staging filters. Explicit rules caught 67,018 junk rows, 52,063 of them zero-duration trips whose drop-off timestamp equals their pick-up. All 10 dbt builds and tests are green, with transform plus test taking 6.3 seconds. The first run on the scheduled GitHub Actions workflow went green in 29 seconds and reproduced the local row counts exactly. The run history is public.
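The row counts above reconcile with simple arithmetic, sketched here along with the one staging rule this page names. The function name `is_zero_duration` is hypothetical; the actual filters live in the dbt staging model and are not reproduced here.

```python
from datetime import datetime

# Figures reported above.
RAW_ROWS = 4_090_836
STAGED_ROWS = 4_023_818
ZERO_DURATION_ROWS = 52_063

# Rows removed by all staging rules combined.
dropped = RAW_ROWS - STAGED_ROWS

# Rows removed by rules other than the zero-duration check.
other_rules = dropped - ZERO_DURATION_ROWS

# Share of the raw month that staging rejects.
drop_rate = dropped / RAW_ROWS


def is_zero_duration(pickup: datetime, dropoff: datetime) -> bool:
    """Hypothetical helper: a trip whose drop-off equals its pick-up."""
    return dropoff == pickup


print(f"dropped={dropped:,} other_rules={other_rules:,} rate={drop_rate:.2%}")
```

The difference works out to 67,018 dropped rows, which leaves 14,955 caught by the remaining rules, or roughly 1.6% of the raw month rejected in total.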

What this costs you

A build like this is the automation tier: $1,000–$2,000, 3–5 days. Running it costs you nothing, forever.

Buy this build: $1,000–$2,000, 3–5 days. Work with freddyxai →

Read the full writeup → Building a $0 data warehouse
