Results
January 15, 2026
10 min read

From 0 to TestFlight in 7 Days: The Numbers

A detailed breakdown of how AI agents helped me ship a complete iOS app to TestFlight in just one week.

Everyone talks about AI making development "faster." But what does that actually mean in practice? Let me show you the numbers from a real project.

The Project: Parlay

Parlay is a golf partner matching app. Think "dating app for golfers" but with location-based matching, handicap verification, and course availability. Not a toy example—a production app with real complexity.

The Goal: Ship to TestFlight in one week.

The Result: 10 calendar days, 18 hours of active orchestration, ~35 hours of agent work.

Let me break down exactly what that means.

The Numbers

Time Investment

| Phase | Traditional Time | Agent Time | Orchestration Time |
| --- | --- | --- | --- |
| Architecture & Tech Spec | 20-30 hours | 15 minutes | 15 minutes |
| Planning & Story Breakdown | 15-20 hours | 15 minutes | 15 minutes |
| Implementation | 60-80 hours | ~30 hours | ~15 hours |
| Code Review | 10-15 hours | 10 minutes | 2 hours |
| Testing & Bug Fixes | 15-20 hours | ~5 hours | ~1 hour |
| **Total** | **120-165 hours** | **~35 hours** | **~18 hours** |

Traditional timeline: 8-10 weeks (assuming 20 hours/week)
Agent timeline: 10 calendar days

Cost Analysis

Traditional: $17,250 vs. Agentic: $153.80 — a 99.1% cost reduction.
Same MVP. Same features. Same quality. $153.80 in API tokens vs $17,250 in developer salaries.

Traditional Approach:

  • Senior developer @ $150/hr × 120 hours = $18,000
  • Or: Mid-level developer @ $100/hr × 165 hours = $16,500
  • Average: $17,250

Agent Approach:

  • Claude API tokens: $153.80
  • My orchestration time: 18 hours (but I was also product managing)
  • Total: $153.80

Savings: 99.1%

Output Metrics

  • Lines of code written: ~6,500
  • Epics completed: 4 of 10 (40% of total scope)
  • Issues closed: 28
  • Components created: 47
  • API endpoints: 12
  • Test coverage: 65%
  • Production bugs found post-TestFlight: 2 (both UI edge cases)

What Actually Happened: Day-by-Day

[Chart: 10-day build timeline, human vs. agent work. Five parallel workstreams (Auth, Profiles, Location, Matching, DevOps) across days 1-10, with agent working time, completion points, and human touchpoints marked.]

5 parallel workstreams. 18 hours of human time. Agents worked while I slept.

Day 1: Saturday (4 hours)

Morning: Architecture (30 minutes)

  • Launched Architect agent with product requirements
  • Agent generated 15-page technical specification
  • Included: tech stack decisions, data models, API design, screen flows
  • I reviewed and made 3 architectural adjustments
  • Traditional time: 20-30 hours

Afternoon: Planning (30 minutes)

  • Planning agent broke down spec into 10 epics
  • Generated 52 GitHub issues with acceptance criteria
  • I prioritized and marked 4 epics as MVP
  • Traditional time: 15-20 hours

Evening: Kickoff (3 hours)

  • Set up GitHub repo, Supabase project, Expo environment
  • Launched first 3 implementation agents in parallel
    • Epic 1: Authentication & Onboarding
    • Epic 2: User Profile Management
    • Epic 3: Location Services
  • Agents worked while I made dinner
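To make "launched 3 implementation agents in parallel" concrete, here is a minimal sketch of the pattern. This is hypothetical: `runAgent` and `kickoff` are illustrative names I made up, not the orchestrator's real API. The key point is that epics with no shared files can run concurrently, so kickoff cost is the slowest epic, not the sum of all of them.

```typescript
// Hypothetical sketch of launching independent epic agents in parallel.
// `runAgent` stands in for whatever the real orchestrator invokes.
type EpicResult = { epic: string; status: "complete" | "in_progress" };

async function runAgent(epic: string): Promise<EpicResult> {
  // Placeholder: a real agent would branch, implement issues, and open PRs.
  return { epic, status: "in_progress" };
}

async function kickoff(epics: string[]): Promise<EpicResult[]> {
  // Independent epics touch disjoint files, so they can run concurrently.
  return Promise.all(epics.map(runAgent));
}

kickoff([
  "Authentication & Onboarding",
  "User Profile Management",
  "Location Services",
]).then((results) => console.log(results.map((r) => r.epic).join(", ")));
```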

Day 1 Result:

  • 4 hours of my time
  • 12 agent-hours of work
  • Architecture + Planning complete
  • 3 epics in progress

Days 2-3: Sunday-Monday (agents working, 2 hours monitoring)

Agents continued building while I:

  • Checked in every few hours
  • Reviewed PRs and approved merges
  • Caught one peer dependency conflict (agents tried to install incompatible versions of React Navigation)
  • Launched 4th epic: Matching Algorithm

Weekend Result:

  • 2 hours of my time
  • 28 agent-hours of work
  • Epics 1-3: 80% complete
  • Epic 4: 40% complete

Days 4-5: Tuesday-Wednesday (4 hours orchestration)

This is where things got interesting. Agents handle isolated tasks reliably, but coordinating multiple agents on a shared codebase requires human oversight.

Issues Encountered:

  1. Case Sensitivity Collision

    • Two agents independently created GolferMultiSelect.tsx and GolferMultiselect.tsx
    • Worked fine on macOS, crashed on Linux CI
    • Fix: Added component registry validation (15 minutes)
  2. Missing Peer Dependencies

    • Implementation agent installed @react-navigation/native but missed peer deps
    • Review agent caught it before merge (this is why we have a review layer!)
    • Fix: Updated agent instructions with peer dependency checks (20 minutes)
  3. Styling Inconsistencies

    • Agents followed Tailwind patterns but made different spacing choices
    • Not wrong, just inconsistent
    • Fix: Created design system documentation for agents (45 minutes)
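The case-sensitivity fix above is easy to automate. Here is a sketch of what a "component registry validation" check could look like; the function name is mine, not the orchestrator's. It flags filenames that are distinct on a case-sensitive filesystem (Linux CI) but collide on a case-insensitive one (macOS default).

```typescript
// Detect filenames that collide when compared case-insensitively,
// e.g. GolferMultiSelect.tsx vs. GolferMultiselect.tsx.
function findCaseCollisions(paths: string[]): string[][] {
  const byLower = new Map<string, string[]>();
  for (const p of paths) {
    const key = p.toLowerCase();
    const group = byLower.get(key) ?? [];
    group.push(p);
    byLower.set(key, group);
  }
  // Keep only groups where two or more spellings map to the same key.
  return [...byLower.values()].filter((g) => g.length > 1);
}

const collisions = findCaseCollisions([
  "components/GolferMultiSelect.tsx",
  "components/GolferMultiselect.tsx",
  "components/ScoreCard.tsx",
]);
console.log(collisions);
// The two GolferMultiSelect spellings are reported as one colliding group.
```

Run against the repo's file listing in CI, a non-empty result fails the build before the mismatch ever reaches a Linux runner.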

Days 4-5 Result:

  • 4 hours debugging and refining agent instructions
  • 18 agent-hours of work
  • Epics 1-3: 95% complete
  • Epic 4: 85% complete

Days 6-7: Thursday-Friday (3 hours testing & deployment)

  • Ran full test suite (agents wrote tests alongside features)
  • Fixed 2 UI edge cases manually (faster than instructing agents)
  • Set up TestFlight deployment
  • Launched DevOps agent to configure CI/CD
  • First TestFlight build deployed

Days 6-7 Result:

  • 3 hours of my time
  • 6 agent-hours (mostly DevOps work)
  • 4 epics complete and in TestFlight
  • 6 remaining epics ready for next sprint

Days 8-10: Weekend Testing (2 hours)

  • Distributed TestFlight to 10 alpha users
  • Collected feedback
  • Fixed 2 production bugs (both UI edge cases)
  • Iterated on UX based on user feedback

Final Result:

  • 18 total hours of my time
  • ~35 agent-hours of work
  • Production-ready MVP in TestFlight
  • Real users testing real features

Breaking Down the 99% Savings

Some people ask: "But isn't your time worth money too?"

Fair question. Let's do the math:

If I valued my 18 hours at $200/hr:

  • My time: 18 × $200 = $3,600
  • Agent API costs: $153.80
  • Total: $3,753.80

Traditional cost: $17,250

Savings: 78.2%
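For anyone who wants to check the arithmetic, here is the savings math spelled out, using only the numbers from this post:

```typescript
// Inputs from the post's own figures.
const traditionalCost = 17_250; // average of $18,000 and $16,500
const apiCost = 153.8; // Claude API tokens
const myTime = 18 * 200; // 18 orchestration hours valued at $200/hr

const savingsApiOnly = 1 - apiCost / traditionalCost;
const savingsWithMyTime = 1 - (apiCost + myTime) / traditionalCost;

console.log((savingsApiOnly * 100).toFixed(1) + "%"); // 99.1%
console.log((savingsWithMyTime * 100).toFixed(1) + "%"); // 78.2%
```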

But here's the crucial point: Those 18 hours weren't equivalent to coding hours.

I was:

  • Making product decisions
  • Reviewing architecture
  • Prioritizing features
  • Testing UX flows
  • Making architectural trade-offs

Those are high-leverage activities that I'd need to do anyway, even with a full development team. The difference is that I wasn't also:

  • Writing boilerplate code
  • Setting up auth flows
  • Building CRUD endpoints
  • Styling components
  • Writing tests

The agents handled the 80% of work that's pattern matching. I focused on the 20% that requires product judgment.

What Took the Most Time?

Breaking down my 18 hours:

| Activity | Time | Percentage |
| --- | --- | --- |
| Product decisions & UX review | 6 hours | 33% |
| Architecture decisions | 3 hours | 17% |
| Debugging agent issues | 4 hours | 22% |
| Code review & merges | 2 hours | 11% |
| DevOps & deployment | 2 hours | 11% |
| Testing & bug fixes | 1 hour | 6% |

The biggest surprise? Only 22% was dealing with agent issues. And those issues were solvable—once I encoded the fix into agent instructions, it never happened again.

[Chart: where my 18 hours actually went. Product & UX: 6h (33%), Agent debugging: 4h (22%), Architecture: 3h (17%), Code review: 2h, DevOps: 2h, Testing: 1h.]

78% of my time went to work other than debugging agents (product, architecture, review, deployment, and testing). Zero time writing code.

The Compounding Effect

Here's where this gets really interesting. This was my second project with the orchestrator. My first project (PlayTrack) took 3 weeks.

PlayTrack (Project #1):

  • 3 weeks calendar time
  • ~25 hours orchestration
  • $200 in API costs
  • Building the orchestrator while building the app

Parlay (Project #2):

  • 10 days calendar time
  • ~18 hours orchestration
  • $154 in API costs
  • Using battle-tested orchestrator

Project #3 (hypothetical):

  • 1 week calendar time?
  • ~12 hours orchestration?
  • Sub-$150 API costs?

Each project makes the orchestrator smarter. Each mistake caught gets encoded into validation. Each pattern learned becomes reusable.

[Chart: API cost and time per project. PlayTrack: $200, 3 weeks, 25 hrs. Parlay: $154, 10 days, 18 hrs. Project #3 (projected): ~$100?, 1 week?, ~12 hrs?]

The orchestrator gets smarter with each project. Cost and time decrease while quality increases.

What the Numbers Don't Show

Metrics are useful, but they don't capture everything:

Things Agents Did Better Than Expected:

  • Wrote more consistent code than most junior developers
  • Created comprehensive tests without being prompted
  • Followed patterns precisely once established
  • Worked 24/7 (I woke up to completed PRs)
  • Zero complaints, zero meetings, zero onboarding

Things That Still Required Human Judgment:

  • "Should this be a modal or a new screen?" (UX decision)
  • "Optimize for speed or maintainability here?" (Architecture trade-off)
  • "Is this feature MVP or nice-to-have?" (Product prioritization)
  • "Does this feel right?" (Intuition and taste)

The agents execute perfectly. But someone needs to define "perfect."

The Real ROI

The ROI isn't just about saving money on development. It's about:

1. Speed to Market

  • 10 days vs 8-10 weeks
  • Test market fit 8x faster
  • Iterate on user feedback immediately

2. Resource Efficiency

  • $154 vs $17,250
  • 99%+ cost savings
  • Makes micro-SaaS experiments viable

3. Focus

  • I spent time on product, not plumbing
  • 80% of my time on user experience and business logic
  • 20% on orchestration and review

4. Learning

  • Each project improves the orchestrator
  • Mistakes become systematically prevented
  • Patterns become reusable across projects

The Skeptic's Questions

Q: "What about maintenance?" A: Agents can maintain code as well as they can write it. Bug fixes are just more tasks.

Q: "What about technical debt?" A: Agents follow patterns consistently. Less copy-paste coding, more systematic approaches. Technical debt comes from inconsistency, not automation.

Q: "What about edge cases?" A: Agents miss edge cases humans miss. The difference is you can encode edge cases into validation once discovered. Humans forget, systems don't.

Q: "Is this realistic for complex apps?" A: I'm testing limits. So far, standard SaaS/mobile patterns work great. Novel algorithms or complex legacy systems? Not yet.

What's Next

These numbers are from Project #2. I'm currently on Project #3 and tracking even more detailed metrics:

  • Tokens per feature
  • Agent error rates by epic type
  • Time to first working build
  • Bug escape rate
  • Test coverage by agent

I'm also open-sourcing the orchestrator and documenting every decision, every mistake, every lesson learned.

In the next article: How to teach agents your workflow and encode 20 years of experience into agent instructions.


This is part 2 of a 6-part series on building production software with AI agents. ← Part 1: The Journey | Part 3: Teaching Agents →

© 2026 David Shak