From 0 to TestFlight in 7 Days: The Numbers
A detailed breakdown of how AI agents helped me ship a complete iOS app to TestFlight in just one week.
Everyone talks about AI making development "faster." But what does that actually mean in practice? Let me show you the numbers from a real project.
The Project: Parlay
Parlay is a golf partner matching app. Think "dating app for golfers" but with location-based matching, handicap verification, and course availability. Not a toy example—a production app with real complexity.
The Goal: Ship to TestFlight in one week.
The Result: 10 calendar days, 18 hours of active orchestration, ~35 hours of agent work.
Let me break down exactly what that means.
The Numbers
Time Investment
| Phase | Traditional Time | Agent Time | Orchestration Time |
|---|---|---|---|
| Architecture & Tech Spec | 20-30 hours | 15 minutes | 15 minutes |
| Planning & Story Breakdown | 15-20 hours | 15 minutes | 15 minutes |
| Implementation | 60-80 hours | ~30 hours | ~15 hours |
| Code Review | 10-15 hours | 10 minutes | 2 hours |
| Testing & Bug Fixes | 15-20 hours | ~5 hours | ~1 hour |
| Total | 120-165 hours | ~35 hours | ~18 hours |
Traditional timeline: 8-10 weeks (assuming 20 hours/week)
Agent timeline: 10 calendar days
Cost Analysis
Traditional Approach:
- Senior developer @ $150/hr × 120 hours = $18,000
- Or: Mid-level developer @ $100/hr × 165 hours = $16,500
- Average: $17,250
Agent Approach:
- Claude API tokens: $153.80
- My orchestration time: 18 hours (but I was also product managing)
- Total: $153.80
Savings: 99.1%
Output Metrics
- Lines of code written: ~6,500
- Epics completed: 4 of 10 (40% of total scope)
- Issues closed: 28
- Components created: 47
- API endpoints: 12
- Test coverage: 65%
- Production bugs found post-TestFlight: 2 (both UI edge cases)
What Actually Happened: Day-by-Day
Day 1: Saturday (4 hours)
Morning: Architecture (30 minutes)
- Launched Architect agent with product requirements
- Agent generated 15-page technical specification
- Included: tech stack decisions, data models, API design, screen flows
- I reviewed and made 3 architectural adjustments
- Traditional time: 20-30 hours
Afternoon: Planning (30 minutes)
- Planning agent broke down spec into 10 epics
- Generated 52 GitHub issues with acceptance criteria
- I prioritized and marked 4 epics as MVP
- Traditional time: 15-20 hours
Evening: Kickoff (3 hours)
- Set up GitHub repo, Supabase project, Expo environment
- Launched first 3 implementation agents in parallel
- Epic 1: Authentication & Onboarding
- Epic 2: User Profile Management
- Epic 3: Location Services
- Agents worked while I made dinner
Day 1 Result:
- 4 hours of my time
- 12 agent-hours of work
- Architecture + Planning complete
- 3 epics in progress
Days 2-3: Sunday-Monday (Agents Working, 2 hours monitoring)
Agents continued building while I:
- Checked in every few hours
- Reviewed PRs and approved merges
- Caught one peer dependency conflict (agents tried to install incompatible versions of React Navigation)
- Launched 4th epic: Matching Algorithm
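That peer dependency conflict is the kind of check that is easy to automate once you've been bitten by it. Here's a minimal sketch of the idea: compare an installed package's declared peer dependencies against the project's own dependencies. The manifest shapes and version strings are illustrative assumptions, not Parlay's actual setup.

```typescript
type Manifest = {
  dependencies?: Record<string, string>;
  peerDependencies?: Record<string, string>;
};

// Return peer dependencies declared by an installed package that the
// project itself has not declared anywhere.
function missingPeerDeps(project: Manifest, installed: Manifest): string[] {
  const declared = project.dependencies ?? {};
  return Object.keys(installed.peerDependencies ?? {}).filter(
    (peer) => !(peer in declared)
  );
}

// Illustrative manifests: the agent installed @react-navigation/native
// but forgot the peers it requires.
const project: Manifest = {
  dependencies: { "@react-navigation/native": "^6.0.0" },
};
const navManifest: Manifest = {
  peerDependencies: {
    "react-native-screens": "*",
    "react-native-safe-area-context": "*",
  },
};

console.log(missingPeerDeps(project, navManifest));
// ["react-native-screens", "react-native-safe-area-context"]
```

A check like this can run in CI or as a pre-merge gate, so the review agent doesn't have to catch the same class of mistake by reading diffs.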
Weekend Result:
- 2 hours of my time
- 28 agent-hours of work
- Epics 1-3: 80% complete
- Epic 4: 40% complete
Days 4-5: Tuesday-Wednesday (4 hours orchestration)
This is where things got interesting. Agents work perfectly on isolated tasks, but coordination requires human oversight.
Issues Encountered:
Case Sensitivity Collision
- Two agents independently created GolferMultiSelect.tsx and GolferMultiselect.tsx
- Worked fine on macOS, crashed on Linux CI
- Fix: Added component registry validation (15 minutes)
Missing Peer Dependencies
- Implementation agent installed @react-navigation/native but missed its peer dependencies
- Review agent caught it before merge (this is why we have a review layer!)
- Fix: Updated agent instructions with peer dependency checks (20 minutes)
Styling Inconsistencies
- Agents followed Tailwind patterns but made different spacing choices
- Not wrong, just inconsistent
- Fix: Created design system documentation for agents (45 minutes)
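The component registry validation behind the first fix boils down to a case-insensitive filename collision check: group filenames by their lowercased form and flag any group with more than one member. This is a hypothetical sketch of that idea, not the actual registry code.

```typescript
// Flag files whose names collide when compared case-insensitively.
// macOS's default filesystem is case-insensitive, so both files "work"
// locally, but a case-sensitive Linux CI box treats them as two modules.
function caseCollisions(filenames: string[]): string[][] {
  const byLower = new Map<string, string[]>();
  for (const name of filenames) {
    const key = name.toLowerCase();
    byLower.set(key, [...(byLower.get(key) ?? []), name]);
  }
  return Array.from(byLower.values()).filter((group) => group.length > 1);
}

const files = [
  "GolferMultiSelect.tsx",
  "GolferMultiselect.tsx",
  "ProfileCard.tsx",
];
console.log(caseCollisions(files));
// [["GolferMultiSelect.tsx", "GolferMultiselect.tsx"]]
```

Run it over the component directory on every PR and the collision is caught before CI ever sees it.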
Days 4-5 Result:
- 4 hours debugging and refining agent instructions
- 18 agent-hours of work
- Epics 1-3: 95% complete
- Epic 4: 85% complete
Days 6-7: Thursday-Friday (3 hours testing & deployment)
- Ran full test suite (agents wrote tests alongside features)
- Fixed 2 UI edge cases manually (faster than instructing agents)
- Set up TestFlight deployment
- Launched DevOps agent to configure CI/CD
- First TestFlight build deployed
Days 6-7 Result:
- 3 hours of my time
- 6 agent-hours (mostly DevOps work)
- 4 epics complete and in TestFlight
- 6 remaining epics ready for next sprint
Days 8-10: Weekend Testing (2 hours)
- Distributed TestFlight to 10 alpha users
- Collected feedback
- Fixed 2 production bugs (both UI edge cases)
- Iterated on UX based on user feedback
Final Result:
- 18 total hours of my time
- ~35 agent-hours of work
- Production-ready MVP in TestFlight
- Real users testing real features
Breaking Down the 99% Savings
Some people ask: "But isn't your time worth money too?"
Fair question. Let's do the math:
If I valued my 18 hours at $200/hr:
- My time: 18 × $200 = $3,600
- Agent API costs: $153.80
- Total: $3,753.80
Traditional cost: $17,250
Savings: 78.2%
But here's the crucial point: Those 18 hours weren't equivalent to coding hours.
I was:
- Making product decisions
- Reviewing architecture
- Prioritizing features
- Testing UX flows
- Making architectural trade-offs
Those are high-leverage activities that I'd need to do anyway, even with a full development team. The difference is that I wasn't also:
- Writing boilerplate code
- Setting up auth flows
- Building CRUD endpoints
- Styling components
- Writing tests
The agents handled the 80% of work that's pattern matching. I focused on the 20% that requires product judgment.
What Took the Most Time?
Breaking down my 18 hours:
| Activity | Time | Percentage |
|---|---|---|
| Product decisions & UX review | 6 hours | 33% |
| Architecture decisions | 3 hours | 17% |
| Debugging agent issues | 4 hours | 22% |
| Code review & merges | 2 hours | 11% |
| DevOps & deployment | 2 hours | 11% |
| Testing & bug fixes | 1 hour | 6% |
The biggest surprise? Only 22% was dealing with agent issues. And those issues were solvable—once I encoded the fix into agent instructions, it never happened again.
The Compounding Effect
Here's where this gets really interesting. This was my second project with the orchestrator. My first project (PlayTrack) took 3 weeks.
PlayTrack (Project #1):
- 3 weeks calendar time
- ~25 hours orchestration
- $200 in API costs
- Building the orchestrator while building the app
Parlay (Project #2):
- 10 days calendar time
- ~18 hours orchestration
- $154 in API costs
- Using battle-tested orchestrator
Project #3 (hypothetical):
- 1 week calendar time?
- ~12 hours orchestration?
- Sub-$150 API costs?
Each project makes the orchestrator smarter. Each mistake caught gets encoded into validation. Each pattern learned becomes reusable.
What the Numbers Don't Show
Metrics are useful, but they don't capture everything:
Things Agents Did Better Than Expected:
- Wrote more consistent code than most junior developers
- Created comprehensive tests without being prompted
- Followed patterns precisely once established
- Worked 24/7 (I woke up to completed PRs)
- Zero complaints, zero meetings, zero onboarding
Things That Still Required Human Judgment:
- "Should this be a modal or a new screen?" (UX decision)
- "Optimize for speed or maintainability here?" (Architecture trade-off)
- "Is this feature MVP or nice-to-have?" (Product prioritization)
- "Does this feel right?" (Intuition and taste)
The agents execute perfectly. But someone needs to define "perfect."
The Real ROI
The ROI isn't just about saving money on development. It's about:
1. Speed to Market
- 10 days vs 8-10 weeks
- Test market fit 8x faster
- Iterate on user feedback immediately
2. Resource Efficiency
- $154 vs $17,250
- 99%+ cost savings
- Makes micro-SaaS experiments viable
3. Focus
- I spent time on product, not plumbing
- 80% of my time on user experience and business logic
- 20% on orchestration and review
4. Learning
- Each project improves the orchestrator
- Mistakes become systematically prevented
- Patterns become reusable across projects
The Skeptic's Questions
Q: "What about maintenance?"
A: Agents can maintain code as well as they can write it. Bug fixes are just more tasks.
Q: "What about technical debt?"
A: Agents follow patterns consistently. Less copy-paste coding, more systematic approaches. Technical debt comes from inconsistency, not automation.
Q: "What about edge cases?"
A: Agents miss edge cases humans miss. The difference is you can encode edge cases into validation once discovered. Humans forget, systems don't.
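What "encoding edge cases into validation" can look like in practice: a rule list that only ever grows, run against every agent PR before merge. The PR shape and rule names below are illustrative assumptions drawn from the two issues this project actually hit, not the orchestrator's real schema.

```typescript
// Each discovered edge case becomes a permanent validation rule, so the
// same mistake can never silently reach the merge step again.
type PullRequest = { files: string[]; dependenciesAdded: string[] };
type Rule = { name: string; check: (pr: PullRequest) => boolean };

const rules: Rule[] = [
  {
    // Encoded after the GolferMultiSelect/GolferMultiselect collision.
    name: "no case-insensitive filename collisions",
    check: (pr) =>
      new Set(pr.files.map((f) => f.toLowerCase())).size === pr.files.length,
  },
  {
    // Encoded after the missing React Navigation peer dependency.
    name: "navigation packages must ship with their screen peers",
    check: (pr) =>
      !pr.dependenciesAdded.includes("@react-navigation/native") ||
      pr.dependenciesAdded.includes("react-native-screens"),
  },
];

// Return the names of every rule a PR violates.
function failedRules(pr: PullRequest): string[] {
  return rules.filter((r) => !r.check(pr)).map((r) => r.name);
}

// A PR that repeats both past mistakes fails both rules.
const pr: PullRequest = {
  files: ["GolferMultiSelect.tsx", "GolferMultiselect.tsx"],
  dependenciesAdded: ["@react-navigation/native"],
};
console.log(failedRules(pr));
```

The point isn't these two rules; it's that the list is append-only, so the system's judgment compounds where a human reviewer's attention resets.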
Q: "Is this realistic for complex apps?"
A: I'm testing limits. So far, standard SaaS/mobile patterns work great. Novel algorithms or complex legacy systems? Not yet.
What's Next
These numbers are from Project #2. I'm currently on Project #3 and tracking even more detailed metrics:
- Tokens per feature
- Agent error rates by epic type
- Time to first working build
- Bug escape rate
- Test coverage by agent
I'm also open-sourcing the orchestrator and documenting every decision, every mistake, every lesson learned.
In the next article: How to teach agents your workflow and encode 20 years of experience into agent instructions.
This is part 2 of a 6-part series on building production software with AI agents. ← Part 1: The Journey | Part 3: Teaching Agents →