Building Real Features: What Works vs What Does Not
Honest insights into which features AI agents excel at building and where they still struggle.
Let's talk about failures.
Not the "oops, we shipped a typo" failures. The "I wasted 4 hours trying to get agents to do something they fundamentally can't do yet" failures.
After building two production apps with AI agents, I've learned where the boundaries are. This is the honest assessment—what works brilliantly, what works with caveats, and what doesn't work at all (yet).
The Scorecard
| Task Type | Verdict | Time Saved | Key Caveat |
|---|---|---|---|
| CRUD Operations | Works brilliantly | 85-90% | None — this is agents' sweet spot |
| UI Components | Works brilliantly | 80-85% | Need established patterns first |
| Authentication | Works brilliantly | 80-85% | None — better than most humans |
| API Integration | Works brilliantly | 75-80% | None — agents add retry logic you'd skip |
| Testing | Works brilliantly | 75-80% | None — agents don't get bored writing tests |
| Complex State Mgmt | Works with caveats | 60-70% | Specify the approach or agents over-engineer |
| Algorithms | Works with caveats | 50-60% | Correct but not optimized — specify Big O |
| UI/UX Design | Works with caveats | 50-60% | Functional but no taste — plan to iterate |
| Refactoring | Works with caveats | 50-60% | Needs direction — won't spot tech debt alone |
| Architecture Decisions | Doesn't work yet | 0% | Requires human judgment about the future |
| Complex Debugging | Doesn't work yet | 0% | Gets stuck in loops on novel issues |
| Cross-Agent Coordination | Doesn't work yet | 0% | Agents can't see each other's work |
| Product Decisions | Doesn't work yet | 0% | No business/user context |
| Novel Problem Solving | Doesn't work yet | 0% | Pattern matchers struggle without patterns |
What Works Brilliantly
1. Standard CRUD Operations
The Task: Build a user profile management system—create, read, update, delete user data.
What Agents Did:
- Created data models in Supabase
- Built API endpoints
- Implemented forms with validation
- Added error handling
- Wrote unit tests
Time: 45 minutes (agent time), 10 minutes (review time)
Traditional Time: 4-6 hours
Quality: Production-ready on first review. Zero bugs in two months.
Why It Works: Agents have seen thousands of CRUD implementations in their training data. The patterns are well-established. There's a "right way" to do it.
2. UI Components (Following Patterns)
The Task: Build a reusable card component for displaying user profiles—photo, name, bio, action buttons.
What Agents Did:
- Created component following React Native patterns
- Implemented responsive layout
- Added loading and error states
- Made it configurable with props
- Wrote Storybook documentation
- Added unit tests
Time: 30 minutes (agent time), 5 minutes (review time)
Traditional Time: 2-3 hours
Quality: Pixel-perfect, reusable, well-documented.
Why It Works: Once you establish a component pattern, agents follow it precisely. They don't get creative (unless you want them to). They don't take shortcuts. They implement exactly what you specify.
3. Authentication Flows
The Task: Implement full authentication—sign up, sign in, password reset, session management.
What Agents Did:
- Set up Supabase auth
- Created auth screens
- Implemented form validation
- Added error handling for all cases
- Set up protected routes
- Configured session persistence
- Wrote integration tests
Time: 2 hours (agent time), 20 minutes (review time)
Traditional Time: 8-12 hours
Quality: Rock solid. Handles edge cases I didn't even think to specify.
Why It Works: Authentication is well-documented with established best practices. Agents know the security patterns. They don't cut corners because they don't get tired or impatient.
4. API Integration
The Task: Integrate Google Maps API for location services—geocoding, reverse geocoding, distance calculation.
What Agents Did:
- Set up API keys and environment config
- Created service layer with proper error handling
- Implemented retry logic for failed requests
- Added response caching
- Created TypeScript types for responses
- Wrote mocks for testing
Time: 1 hour (agent time), 15 minutes (review time)
Traditional Time: 4-6 hours
Quality: Better than I would have written manually (I wouldn't have added retry logic or caching without being prompted).
Why It Works: API patterns are consistent. Agents understand request/response cycles, error handling, and async operations. They implement defensive code by default.
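The retry logic is a good example of the defensive code agents add by default. Here's a minimal sketch of the pattern with exponential backoff; `withRetry` and its options are illustrative names for this article, not the project's actual service layer:

```typescript
// Minimal retry helper with exponential backoff (illustrative,
// not the project's real code).
interface RetryOptions {
  retries: number;      // max attempts beyond the first
  baseDelayMs: number;  // delay doubles after each failure
}

async function withRetry<T>(
  fn: () => Promise<T>,
  { retries, baseDelayMs }: RetryOptions
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        // Wait baseDelayMs * 2^attempt before the next try.
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}
```

A geocoding call would then be wrapped as something like `withRetry(() => geocode(address), { retries: 3, baseDelayMs: 200 })`, so transient network failures never reach the UI.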
5. Testing
The Task: Write comprehensive tests for a matching algorithm feature.
What Agents Did:
- Created unit tests for all functions
- Added edge case testing
- Wrote integration tests
- Created test fixtures
- Achieved 85% code coverage
- Added helpful test descriptions
Time: 45 minutes (agent time), 10 minutes (review time)
Traditional Time: 3-4 hours (let's be honest, we often skip comprehensive testing)
Quality: More thorough than most human-written tests.
Why It Works: Agents aren't bored by repetitive test writing. They don't skip edge cases because they're tired. They write tests with the same care as production code.
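Table-driven tests are where this shows most clearly: agents will enumerate every boundary without getting bored. A sketch of the style, using a hypothetical `isHandicapMatch` helper rather than the app's real algorithm:

```typescript
// Hypothetical helper: do two golfers fall within an allowed handicap gap?
function isHandicapMatch(a: number, b: number, maxGap: number): boolean {
  return Math.abs(a - b) <= maxGap;
}

// Table-driven edge cases, the kind agents enumerate exhaustively.
const cases: Array<[number, number, number, boolean]> = [
  [10, 10, 0, true],   // identical handicaps, zero tolerance
  [10, 15, 5, true],   // exactly at the boundary
  [10, 16, 5, false],  // just past the boundary
  [0, 5, 5, true],     // scratch golfer at the edge
  [15, 10, 5, true],   // order shouldn't matter
];

for (const [a, b, gap, expected] of cases) {
  const got = isHandicapMatch(a, b, gap);
  if (got !== expected) {
    throw new Error(`isHandicapMatch(${a}, ${b}, ${gap}) = ${got}, want ${expected}`);
  }
}
```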
What Works (With Caveats)
1. Complex State Management
The Task: Build a multi-step form with validation, state persistence, and conditional logic.
What Worked:
- Agents built the state machine correctly
- Form validation worked perfectly
- State persistence worked
What Didn't:
- First attempt used overcomplicated state structure
- Needed one iteration to simplify
- Agent chose useState when Zustand would have been better
Time: 2 hours (agent time), 1 hour (review and iteration)
Traditional Time: 6-8 hours
The Caveat: Agents can build complex state management, but you need to specify the approach. They'll make reasonable choices, but not always optimal ones. Architecture review is critical.
Lesson Learned: Be very specific about state management patterns in your agent instructions. Include examples of when to use local vs global vs server state.
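One way to spell out that guidance is to hand the agent a shape to follow: a small reducer instead of a pile of `useState` hooks. A minimal sketch, with step names and fields invented for illustration:

```typescript
// Minimal multi-step form state machine (steps and fields are
// illustrative, not the app's real form).
type Step = "profile" | "preferences" | "confirm";

interface FormState {
  step: Step;
  values: Record<string, string>;
}

type Action =
  | { type: "SET_FIELD"; field: string; value: string }
  | { type: "NEXT" }
  | { type: "BACK" };

const order: Step[] = ["profile", "preferences", "confirm"];

function formReducer(state: FormState, action: Action): FormState {
  const i = order.indexOf(state.step);
  switch (action.type) {
    case "SET_FIELD":
      return {
        ...state,
        values: { ...state.values, [action.field]: action.value },
      };
    case "NEXT":
      // Clamp at the last step instead of walking off the end.
      return { ...state, step: order[Math.min(i + 1, order.length - 1)] };
    case "BACK":
      return { ...state, step: order[Math.max(i - 1, 0)] };
  }
}
```

The same reducer plugs into React's `useReducer` locally or a Zustand store globally, which turns the local-vs-global decision into a one-line change instead of a rewrite.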
2. Algorithm Implementation
The Task: Build a golf partner matching algorithm—match users by handicap range, location proximity, and availability.
What Worked:
- Logic was correct
- Edge cases handled
- Performance was reasonable
What Didn't:
- First version was O(n²) when O(n log n) was possible
- Didn't optimize database queries
- Needed refinement for scale
Time: 3 hours (agent time), 2 hours (optimization)
Traditional Time: 8-12 hours
The Caveat: Agents will implement working algorithms, but won't automatically optimize for performance. They solve the problem correctly, but not necessarily efficiently.
Lesson Learned: Specify performance requirements upfront. Include Big O expectations. Request database query optimization explicitly.
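To make the O(n²) vs O(n log n) point concrete: the naive version compares every pair of players. Sorting by handicap first means each player only scans forward while neighbours are still in range. A simplified sketch of the second approach (handicap only; the real algorithm also weighs location and availability):

```typescript
// Illustrative sketch: find all pairs of players within maxGap handicap.
// Naive approach: compare every pair, O(n²).
// This version: sort by handicap (O(n log n)), then each player scans
// forward only while the gap is still within range.
interface Player {
  id: string;
  handicap: number;
}

function matchPairs(players: Player[], maxGap: number): Array<[string, string]> {
  const sorted = [...players].sort((a, b) => a.handicap - b.handicap);
  const pairs: Array<[string, string]> = [];
  for (let i = 0; i < sorted.length; i++) {
    for (let j = i + 1; j < sorted.length; j++) {
      // Sorted order guarantees later players only get further away.
      if (sorted[j].handicap - sorted[i].handicap > maxGap) break;
      pairs.push([sorted[i].id, sorted[j].id]);
    }
  }
  return pairs;
}
```

The inner loop exits early thanks to the sort, so total work is O(n log n + k) for k matching pairs, rather than always touching all n² combinations.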
3. UI/UX Design
The Task: Design and implement a dashboard screen with charts, stats, and navigation.
What Worked:
- Layout was functional
- Components were properly structured
- Accessibility was handled
What Didn't:
- Spacing was off (too cramped)
- Color choices were "fine" but not great
- Interactions felt mechanical
Time: 2 hours (agent time), 1 hour (UX refinement)
Traditional Time: 6-10 hours (including design iteration)
The Caveat: Agents can implement functional UIs, but they don't have taste. They need detailed design specs or will produce "generic but acceptable" UIs.
Lesson Learned: Either provide detailed design specs (Figma, etc.) or plan to iterate on UX after implementation. Agents are great at implementation, weak at aesthetic judgment.
4. Refactoring
The Task: Refactor a monolithic component into smaller, reusable pieces.
What Worked:
- Extracted components correctly
- Maintained functionality
- Improved code organization
What Didn't:
- Didn't identify all refactoring opportunities
- Sometimes over-abstracted (created tiny components that weren't reusable)
- Needed guidance on what to extract
Time: 1.5 hours (agent time), 1 hour (review and refinement)
Traditional Time: 4-6 hours
The Caveat: Agents can refactor with direction, but won't spontaneously identify tech debt. You need to point out what needs refactoring and why.
Lesson Learned: Treat refactoring as explicit tasks with clear goals. Don't expect agents to "clean up code" without specific instructions.
What Doesn't Work (Yet)
1. Architectural Decision-Making
The Problem: Agent was building a feature and needed to choose between two approaches—optimizing for speed vs maintainability.
What Happened:
- Agent asked for clarification (good!)
- When told "your choice," picked the simpler option
- Didn't consider future implications or trade-offs
Why It Doesn't Work: Architecture requires judgment about future requirements, team dynamics, and business context. Agents lack this broader perspective.
The Workaround: Make architectural decisions yourself. Give agents the architecture, not the responsibility to create it.
2. Debugging Complex Issues
The Problem: A feature worked in development but crashed in production with a cryptic error.
What Happened:
- Agent tried standard debugging steps
- Suggested common fixes (none worked)
- Got stuck in a loop of trying the same approaches
- Couldn't reason through environment differences
Why It Doesn't Work: Debugging requires intuition, pattern recognition across different contexts, and creative hypothesis generation. Agents are great at systematic debugging but weak at "hmm, that's weird" moments.
The Workaround: I debugged it in 20 minutes (it was a timezone issue). Some things still need human debugging intuition.
3. Contextual Coordination
The Problem: Two agents worked on related features that needed to integrate.
What Happened:
- Agent A built a matching algorithm
- Agent B built a notification system
- Both worked perfectly in isolation
- Integration required manual coordination
Why It Doesn't Work: Agents don't have institutional memory. They can't see what other agents are building. They work in isolated contexts.
The Workaround: This is where orchestration matters. Human coordinates the integration, agents implement it.
4. Product Decisions
The Problem: Agent asked "Should this be a required field or optional?"
What Happened:
- Agent couldn't make the decision
- Needed product context about user experience
- Needed business context about data requirements
Why It Doesn't Work: Product decisions require understanding user needs, business goals, and strategic priorities. Agents lack this context.
The Workaround: Don't expect agents to make product decisions. Define requirements clearly. When in doubt, agents should ask (and they usually do).
5. Novel Problem Solving
The Problem: Needed to implement a feature with no clear precedent—a novel matching algorithm based on specific business rules.
What Happened:
- Agent tried to apply standard approaches
- Got stuck when patterns didn't fit
- Needed significant human guidance
Why It Doesn't Work: Agents are pattern matchers. When there's no pattern to match, they struggle. Novel work still needs human creativity.
The Workaround: Break down novel problems into smaller, pattern-based pieces. Or just build it yourself and have agents implement the patterns you create.
The 80/20 Reality
After two projects, here's my honest assessment:
80% of software development is pattern matching:
- CRUD operations
- API integration
- Form validation
- Authentication
- UI components
- Testing
- Deployment
Agents handle this 80% brilliantly.
20% of software development is creative problem-solving:
- Architecture decisions
- Complex debugging
- Product trade-offs
- Novel algorithms
- Cross-system coordination
Humans still need to handle this 20%.
But here's the key insight: That 20% is the high-leverage work.
Before agents, I spent:
- 80% of my time on boilerplate and implementation
- 20% on architecture and product decisions
With agents, I spend:
- 20% reviewing agent work
- 80% on architecture and product decisions
My time shifted from low-leverage to high-leverage work.
The Failure Gallery
Let me share some real failures (with lessons learned):
Failure #1: The Case-Sensitive Component
What Happened:
Two agents independently created UserMultiSelect.tsx and UserMultiselect.tsx. Imports resolved fine on macOS, where the filesystem is case-insensitive. On Linux CI, where paths are case-sensitive, both crashed.
The Lesson: Implemented component registry. Required agents to check for existing components before creating new ones. Never happened again.
Pattern Encoded: Pre-creation checks for all components.
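The core of that registry check can be a pre-creation scan for names that collide when case is ignored. A sketch of the idea (the real registry presumably tracks more than filenames):

```typescript
// Sketch of a pre-creation check: find component filenames that collide
// on a case-insensitive filesystem (macOS) but not a case-sensitive one (Linux).
function findCaseCollisions(filenames: string[]): string[][] {
  const byLower = new Map<string, string[]>();
  for (const name of filenames) {
    const key = name.toLowerCase();
    const group = byLower.get(key) ?? [];
    group.push(name);
    byLower.set(key, group);
  }
  // Any lowercase key with more than one spelling is a collision.
  return [...byLower.values()].filter((group) => group.length > 1);
}
```

Running this over the components directory in CI, and before an agent creates a file, catches `UserMultiSelect.tsx` vs `UserMultiselect.tsx` immediately.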
Failure #2: The Peer Dependency Hell
What Happened:
Agent installed @react-navigation/native but missed 5 peer dependencies. App crashed on startup.
The Lesson: Created 5-layer validation stack for dependency management. Added peer dependency checks to agent protocols, review process, and pre-commit hooks.
Pattern Encoded: Multi-layer dependency validation.
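One layer of that stack can be a plain comparison of each installed package's declared peers against what's actually in the tree. A simplified sketch that only checks presence, not version-range compatibility:

```typescript
// Simplified peer-dependency check: which declared peers are missing
// entirely from the installed dependency map? (Ignores version ranges.)
function missingPeers(
  installed: Record<string, string>,
  peerDeps: Record<string, string>
): string[] {
  return Object.keys(peerDeps).filter((name) => !(name in installed));
}
```

Wired into a pre-commit hook, this turns "app crashed on startup" into "commit rejected with a list of packages to install."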
Failure #3: The Overengineered Abstraction
What Happened: Agent created a "flexible, reusable configuration system" for something that needed a simple object. Added unnecessary complexity.
The Lesson: Added to agent instructions: "Prefer simple solutions. Only abstract when reuse is guaranteed, not hypothetical."
Pattern Encoded: Simplicity-first principle.
Failure #4: The Missing Error Boundary
What Happened: Agent built a perfect feature but forgot error boundary. One edge case crashed the entire app.
The Lesson: Updated component checklist to require error boundaries for all screens. Made it a review requirement.
Pattern Encoded: Error boundary requirement.
The Learning Loop
Here's the pattern I've developed:
1. Agent builds feature
2. Issue discovered in review or production
3. Analyze root cause
4. Encode fix into agent instructions
5. Issue never happens again
Each failure makes the system stronger. That's the compounding effect in action.
What to Expect
If you're starting with agent-driven development:
Month 1:
- Agents will make mistakes
- You'll spend time debugging
- You'll wonder if it's worth it
Month 2:
- Patterns emerge
- Instructions improve
- Error rate drops
Month 3:
- Agents feel reliable
- Review is quick
- Productivity multiplies
The key is surviving Month 1 and encoding the lessons.
The Honest ROI
Let me be completely transparent about what "works" means:
With agents:
- Features that worked first try: ~60%
- Features that needed one iteration: ~30%
- Features that needed significant rework: ~10%
Compare to traditional development:
- Features that worked first try: ~40%
- Features that needed one iteration: ~40%
- Features that needed significant rework: ~20%
Agents aren't perfect. But they're better than average, and they're improving every day (as I encode more patterns).
What's Next
In the next article, I'll dive deep into multi-agent orchestration—how to coordinate multiple agents building features in parallel, how to prevent conflicts, and how to scale from one agent to ten.
This is part 4 of a 6-part series on building production software with AI agents. ← Part 3: Teaching Agents | Part 5: Multi-Agent Orchestration →