Debugging a TestFlight Crash with AI Agents: A 392MB Invisible Image
How a single oversized PNG brought down our entire app - and what the debugging process reveals about agentic development.
The Setup
I'm building a React Native social app using Agentic-First Development. We'd just shipped a batch of features - block/report functionality, conversation filtering, push notifications - and submitted a fresh build to TestFlight.
I opened the app on my iPhone 11. Splash screen appeared. Login screen flashed briefly. Then: black screen, an iOS spinner, and the phone dumped me back to the lock screen.
Every. Single. Time.
The Misleading Clues
Working with Claude, we started peeling back layers. The first thing we found: a debug flag (CLEAR_SESSION_ON_STARTUP = true) was nuking the auth session on every launch. Easy fix, but the crash persisted.
Then we found a race condition. The RootNavigator was resolving auth state and onboarding status independently, and for a brief window, the app thought the user was logged out. It would flash the login screen, then realize there was a session, then flip to the app navigator - a jarring white flash every few seconds. The onboarding polling was tearing down the entire navigation tree on a 2-second interval, resetting user selections mid-flow.
We fixed all of that. The app was smoother. But it still crashed on TestFlight.
Here's the thing that made this brutal: it didn't crash in development mode. Expo Go was fine. Even npx expo run:ios --device was fine. Only production/release builds triggered the crash. That ruled out most of our debugging tools.
Anatomy of a Crash
Here's the kill chain — how a single oversized PNG made it from an innocent asset folder to a hard crash on a real device:
| Step | What Happened | Why It Matters |
|---|---|---|
| 1. Asset created | Designer (or AI) generated logo.png at 11,605 x 8,445px | Looks fine in Finder — only 852KB on disk |
| 2. Bundled into build | Expo/EAS packages asset into the production binary | No size validation in the build pipeline |
| 3. App launches | iOS loads the splash → login → main screen | App appears to work for a split second |
| 4. Image decode | UIKit decodes PNG to raw bitmap on the main thread | 11,605 x 8,445 x 4 bytes = 392MB in RAM |
| 5. Memory spike | iPhone 11 (3GB total) hits 83%+ memory pressure | OS issues a memory warning — app doesn't respond fast enough |
| 6. Watchdog kill | UIKit-runloop timeout > 1,000ms | iOS terminates the app. User sees a black screen and is dumped to the lock screen |
The brutal part: steps 1-3 work perfectly in development. Expo Go lazy-loads and downsamples. The bitmap bomb only detonates in production builds where iOS handles assets natively.
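The memory math in step 4 is worth making concrete. A minimal sketch of the calculation, using the dimensions from this crash (the helper name `decodedMB` is invented for illustration):

```typescript
// Decoded bitmap cost: width * height * 4 bytes per pixel (RGBA),
// independent of the compressed file size on disk.
const decodedMB = (width: number, height: number): number =>
  (width * height * 4) / 1_000_000;

console.log(decodedMB(11605, 8445).toFixed(0)); // the offending logo: 392 MB
console.log(decodedMB(1200, 873).toFixed(0));   // after resizing to 1200px wide: 4 MB
```

An 852KB file on disk becomes a ~392MB allocation in RAM; the compression ratio is exactly what makes the bug invisible in Finder.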
Finding the Real Root Cause
The Debug Trail
This wasn't a clean "find the bug" story. It was a layered investigation where every tool gave us a piece — but not the answer — until the crash report cracked it open.
Round 1: Xcode console. We ran the production build through Xcode with the device connected. The console showed memory warnings and a UIKit-runloop timeout, but the stack traces were partially symbolicated. We could see ImageIO and libz in the call stack — image decompression — but couldn't pinpoint which image.
Round 2: Expo debugger. No help. The crash only happened in production builds, and the Expo development tools don't attach to release binaries the same way. This is the trap: the app ran perfectly in expo start and even npx expo run:ios --device in debug mode.
Round 3: Strategic debug logging. Claude Code inserted console.log breadcrumbs at every lifecycle stage — app init, auth resolution, navigation mount, screen render. We rebuilt, deployed to TestFlight, and watched the Xcode console. The logs showed the app was getting through auth and navigation setup before dying. That narrowed it to something in the initial render — not the auth logic we'd already fixed.
Round 4: The crash report. This is what broke it open. I pulled the actual .ips crash report from my iPhone (Settings → Privacy → Analytics → Analytics Data) and sent it directly to Claude Code. The full symbolicated crash report showed the exact kill chain:
UIKit-runloop-<AppName>: timeout 1103ms
The call stack pointed through ImageIO and libz — image decompression. The iOS watchdog had killed our app because the main thread was blocked for over a second. Claude immediately asked me to check the image dimensions:
sips --getProperty pixelWidth --getProperty pixelHeight assets/logo.png
The result:
pixelWidth: 11605
pixelHeight: 8445
Our logo - displayed at maybe 120 pixels wide on screen - was an 11,605 x 8,445 pixel PNG. iOS decodes images to full-resolution bitmaps in memory on the main thread. That single image decompressed to roughly 392MB of RAM. On an iPhone 11 with 3GB total, already under 83% memory pressure, this was instant death.
The fix took 10 seconds:
sips --resampleWidth 1200 assets/logo.png --out assets/logo.png
sips --resampleWidth 1200 assets/logo_dark.png --out assets/logo_dark.png
sips --resampleWidth 600 assets/hero.png --out assets/hero.png
11,605 pixels down to 1,200. 852KB down to 103KB. And more importantly: ~392MB decoded bitmap down to ~4MB.
The crash was gone.
Why Development Mode Didn't Catch It
In Expo Go and debug builds, Metro serves assets differently. Images may be loaded lazily, cached differently, or not decoded at full resolution on the main thread. The production build bundles assets into the binary and iOS handles them natively - which means full bitmap decode on the UI thread at launch.
This is why you can't skip TestFlight testing. Development builds are a different animal.
What We Actually Fixed (The Full List)
The debugging session uncovered six separate issues. The image was the crash, but the others were real problems too:
| Issue | Impact | Fix |
|---|---|---|
| Oversized images (11605x8445) | iOS watchdog kill, ~392MB bitmap | Resized to 1200px max width |
| Auth race condition | Flash of login screen on launch | isReady guard with native splash screen |
| Onboarding polling teardown | Navigation tree unmounting every 2s, resetting user selections | initialCheckDone ref to skip setProfileLoading(true) on polls |
| Silent login failures | No error shown to user on bad credentials | Added Alert.alert for returned errors |
| Require cycle (AuthContext <-> usePushNotifications) | Potential Hermes production crash | Extracted removePushToken to separate service |
| Non-functional Google Sign-In button | Button visible but did nothing | Hidden, created issue for later implementation |
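The `isReady` guard from the auth fix deserves a sketch. This is a minimal illustration of the pattern, not the app's actual code (the state names are invented): the navigator mounts nothing, leaving the native splash screen visible, until both async checks have resolved, so a transient "logged out" state can never flash the login screen.

```typescript
// Illustrative sketch: gate rendering on BOTH async startup states.
type StartupState = {
  authResolved: boolean;       // has the session check finished?
  onboardingResolved: boolean; // has the onboarding-status check finished?
};

function isReady(state: StartupState): boolean {
  return state.authResolved && state.onboardingResolved;
}

// During the race window, auth has resolved but onboarding has not:
// keep the splash up instead of guessing "logged out".
isReady({ authResolved: true, onboardingResolved: false }); // false -> stay on splash
isReady({ authResolved: true, onboardingResolved: true });  // true  -> mount navigator
```

The point of the guard is that "not yet known" and "logged out" are different states, and only the second should ever render the login screen.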
The Agentic Development Angle
Here's what's interesting about this debugging session from an agentic development perspective.
Claude read the crash report and identified the root cause in one pass. The UIKit-runloop timeout plus ImageIO in the stack trace was enough. It knew that iOS decodes to full bitmap, knew to check dimensions with sips, and knew the math: width x height x 4 bytes per pixel = decoded size.
The layered debugging was methodical. We didn't just shotgun fixes. Each issue was identified, fixed, tested, and committed before moving to the next. The agent maintained context across all six issues without losing track.
The agent caught things I wouldn't have checked. The require cycle between AuthContext and usePushNotifications was a ticking time bomb - it worked fine in development but could cause random native crashes in Hermes production builds. That's deep React Native knowledge that most developers learn the hard way.
Total time from "it crashes on TestFlight" to "it works on TestFlight": about 3 hours. That includes building, submitting, and testing on device multiple times. The actual diagnosis-to-fix for the crash itself was maybe 15 minutes once we had the crash report.
The Lesson
Always check your image dimensions. Not file size - pixel dimensions. A 500KB PNG can decode to hundreds of megabytes. iOS doesn't care that you're displaying it at 120 pixels wide. It decodes the full thing.
Add this to your build checklist:
find assets -name "*.png" -exec sips --getProperty pixelWidth --getProperty pixelHeight {} \;
If anything is over 1200px wide and you're displaying it in a mobile app, resize it. Your users' phones will thank you.
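To make this a check that runs itself rather than one you remember to run, it can live in a pre-build script. A sketch in TypeScript that reads dimensions straight from each PNG's IHDR header (the `assets` path and 1200px threshold are assumptions; adjust for your project):

```typescript
import { existsSync, readdirSync, readFileSync } from "fs";
import { join } from "path";

const MAX_EDGE = 1200; // longest-edge limit from this post

// PNG stores IHDR width and height as big-endian uint32s at byte
// offsets 16 and 20, so no image library is needed to read them.
function pngDimensions(buf: Buffer): { width: number; height: number } {
  return { width: buf.readUInt32BE(16), height: buf.readUInt32BE(20) };
}

function findOversized(dir: string): string[] {
  if (!existsSync(dir)) return [];
  return readdirSync(dir)
    .filter((name) => name.endsWith(".png"))
    .filter((name) => {
      const { width, height } = pngDimensions(readFileSync(join(dir, name)));
      return Math.max(width, height) > MAX_EDGE;
    });
}

const offenders = findOversized("assets");
if (offenders.length > 0) {
  console.error(`Oversized PNGs: ${offenders.join(", ")}`);
  process.exitCode = 1; // fail the build before the bitmap bomb ships
}
```

Wired up as an npm script that CI runs before the build, this turns the lesson into a gate instead of a habit.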
Lessons Learned: The CLAUDE.md Rules This Crash Created
After this incident, I added explicit rules to my project's CLAUDE.md — the instruction file that governs how AI agents work in the codebase. These now fire on every build:
Image Asset Rules (added post-crash)
No image asset shall exceed 1200px on its longest edge. Before committing any new image, run sips --getProperty pixelWidth --getProperty pixelHeight and resize if needed. Non-negotiable.
Always calculate decoded memory cost. Width x Height x 4 bytes = decoded bitmap size. If it exceeds 10MB decoded, it must be resized or converted to a progressive format. A 500KB file on disk can be a 400MB bomb in RAM.
Never trust development mode for crash validation. Expo Go and debug builds handle assets differently from production. Any crash fix must be verified on a real device via TestFlight or a release build.
Pull the actual crash report. When a production build crashes, the device crash report (Settings → Privacy → Analytics → Analytics Data) contains the symbolicated stack trace. This is the source of truth — not console logs, not guesswork.
This is the agentic development feedback loop in action: a production incident becomes a permanent rule that every future agent session inherits. The agent that caused the oversized image will never make that mistake again — not because it "learned," but because the CLAUDE.md now explicitly forbids it. Human judgment encoded as agent instructions.
Built with Agentic-First Development. The debugging session that produced this post was a collaboration between me and Claude - from crash report analysis to root cause fix to this writeup.
If you're a founder or CTO interested in implementing Agentic-First Development at your company, let's talk.