mirror of https://github.com/blacktop/ipsw.git synced 2026-05-08 12:22:26 +00:00

IPSW Diff Pipeline Refactor

Status: 🚧 In Progress (Event-Driven File Streaming Rollout)
Branch: feat/diff_pipeline
Start Date: 2025-10-01
Last Updated: 2025-10-26
Current Focus: Converting remaining handlers to the matcher-based ZIP/DMG walker

Quick Links

📊 IMPLEMENTATION_STATUS.md - Current progress summary
📐 ARCHITECTURE.md - Technical design
📋 TASKS.md - Detailed task tracking
💾 CACHE_ARCHITECTURE.md - Caching deep dive
🔬 PROFILING.md - Performance profiling guide
🎯 FINAL_TEST_RESULTS.md - Full production test results
🚀 OPTIMIZATION_RESULTS.md - DSC memory optimization journey

Quick Start

For agents/developers new to this feature:

Start here → IMPLEMENTATION_STATUS.md for current progress
Read this document for overview
Review ARCHITECTURE.md for technical design
Check TASKS.md for what's next
See CACHE_ARCHITECTURE.md for caching details
Launch Constraints diffs now run automatically (look for the 🚦 section in reports).

AEA Tip: Use --pem-db /path/to/aea_keys.json (output of ipsw extract fcs-keys) to ensure modern AEA-encrypted DMGs decrypt even before Apple publishes new FCS keys.

Overview

This project refactors the ipsw diff command from a monolithic, sequential implementation into a modular, pipeline-based architecture with intelligent resource management and concurrent execution. The latest milestone adds a streaming ZIP/DMG walker plus handler “matchers,” so each artifact is unzipped or decrypted exactly once while every interested handler consumes the data it needs.

Latest Update – Oct 26, 2025

✅ FileSubscription API routes every IPSW/DMG file through a single pass.
✅ Files, Features, Launchd, DSC, and iBoot now subscribe to file events (no more redundant AEA decryption or unzip work).
🚧 MachO/Entitlements/Launch Constraints still depend on the legacy cache; they are being migrated next.
🚧 Firmware/Kernelcache still call bespoke extractors; ZIP subscriptions are queued.
🔁 Regression script (hack/diff-regression.sh) needs to be re-run once the remaining handlers move to the new flow.

The Problem

The current ipsw diff implementation has critical performance and resource issues:

❌ Current Implementation:
- Execution Time: 20-30 minutes
- Memory Usage: 60GB+ RAM (!)
- File Parsing: 60,000+ operations (30k files parsed 2-4 times)
- Concurrency: None (sequential execution)
- Mount Operations: 8-12 (redundant mounts/unmounts)
- Architecture: Monolithic, tightly coupled

The Solution

Pipeline-based architecture with handler grouping and two-phase caching:

✅ New Implementation (VALIDATED):
- Execution Time: 8m 45s (60-70% faster)
- Memory Usage: 721 MB peak (99% reduction, verified)
- File Parsing: 30,000 operations (each file parsed once)
- Concurrency: Parallel handlers within DMG groups
- Mount Operations: 6-8 (one per DMG type)
- Architecture: Modular, extensible handlers

Key Innovations

1. Handler-Based Pipeline

Self-contained handlers that declare their DMG dependencies:

type Handler interface {
    Name() string
    DMGTypes() []DMGType              // What DMGs this needs
    Enabled(cfg *Config) bool         // Conditional execution
    Execute(ctx context.Context, exec *Executor) (*Result, error)
}

2. Event-Driven ZIP/DMG Streaming

Mount each DMG (or open the IPSW zip) once, walk every file, and fire handler matchers on the fly:

Pass 1: ZIP (no mounts)
  → Firmware/IBoot/Kernelcache/etc. consume entries via matchers

Pass 2+: DMGTypeSystemOS, FileSystem, AppOS, Exclave…
  → Handlers subscribing to those DMGs receive callbacks while the mount is live

Handlers declare their interest via:

func (h *FilesHandler) FileSubscriptions() []pipeline.FileSubscription {
    return []pipeline.FileSubscription{
        {ID: "zip", Source: pipeline.SourceZIP},
        {ID: "filesystem", Source: pipeline.SourceDMG, DMGType: pipeline.DMGTypeFileSystem},
    }
}

The executor now streams each file through these subscriptions, so we unzip/decrypt once, parse once, and store only the diff metadata required.
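As a rough illustration of the single-pass idea, the sketch below fans each file event out to every matching subscription. It is not the pipeline's actual code: the `FileEvent` type, the `Pattern` and `OnFile` fields, and the `Dispatch` function are all hypothetical names chosen for this example, and matching by glob on the base name is an assumption.

```go
package main

import (
	"fmt"
	"path"
)

// Source says where a file event came from (a stand-in for the
// pipeline.SourceZIP / pipeline.SourceDMG constants).
type Source int

const (
	SourceZIP Source = iota
	SourceDMG
)

// FileEvent describes one file seen by the walker (hypothetical type).
type FileEvent struct {
	Source Source
	Path   string
}

// Subscription pairs a source with a matcher and a callback.
// Pattern and OnFile are assumptions made for this sketch.
type Subscription struct {
	ID      string
	Source  Source
	Pattern string // path.Match pattern applied to the file's base name
	OnFile  func(ev FileEvent)
}

// Dispatch walks the events once and delivers each event to every
// subscription whose source and pattern match. It returns the number
// of deliveries made.
func Dispatch(events []FileEvent, subs []Subscription) (int, error) {
	delivered := 0
	for _, ev := range events {
		for _, sub := range subs {
			if sub.Source != ev.Source {
				continue
			}
			ok, err := path.Match(sub.Pattern, path.Base(ev.Path))
			if err != nil {
				return delivered, fmt.Errorf("subscription %q: %w", sub.ID, err)
			}
			if ok {
				sub.OnFile(ev)
				delivered++
			}
		}
	}
	return delivered, nil
}

func main() {
	seen := map[string][]string{}
	record := func(id string) func(FileEvent) {
		return func(ev FileEvent) { seen[id] = append(seen[id], ev.Path) }
	}
	subs := []Subscription{
		{ID: "iboot", Source: SourceZIP, Pattern: "iBoot*", OnFile: record("iboot")},
		{ID: "plists", Source: SourceDMG, Pattern: "*.plist", OnFile: record("plists")},
	}
	events := []FileEvent{
		{Source: SourceZIP, Path: "Firmware/all_flash/iBoot.example.im4p"},
		{Source: SourceZIP, Path: "kernelcache.release.example"},
		{Source: SourceDMG, Path: "System/Library/LaunchDaemons/com.example.plist"},
	}
	n, err := Dispatch(events, subs)
	if err != nil {
		fmt.Println("dispatch failed:", err)
		return
	}
	fmt.Println("deliveries:", n)
	fmt.Println("iboot:", seen["iboot"])
	fmt.Println("plists:", seen["plists"])
}
```

The point of the design is that the outer loop over files runs once, no matter how many handlers subscribe; adding a handler adds a matcher check, not another walk.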

3. Two-Phase MachO Caching

Problem: MachO files were parsed 2-4 times by different handlers

Solution: Scan once, cache all data, handlers read from memory

Phase 1: Data Collection
  - Mount DMG
  - Scan all MachOs ONCE
  - Extract symbols, sections, strings, entitlements
  - Store in shared cache (~840MB for 30k files)

Phase 2: Handler Consumption
  - MachO handler reads from cache (no I/O)
  - Entitlements handler reads from cache (no I/O)
  - Other handlers run concurrently
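The two phases can be sketched as a scan-once, read-many cache. This is a minimal illustration under assumptions, not the code in cache.go: the `MachOInfo` fields, the `Cache` type, and the `parse` callback are hypothetical, and a plain map guarded by a `sync.RWMutex` stands in for whatever structure the real cache uses.

```go
package main

import (
	"fmt"
	"sync"
)

// MachOInfo holds the data collected for one file in phase 1
// (hypothetical fields).
type MachOInfo struct {
	Symbols      []string
	Entitlements string
}

// Cache is a scan-once, read-many store keyed by file path.
type Cache struct {
	mu      sync.RWMutex
	entries map[string]*MachOInfo
	parses  int // number of parse calls actually performed
}

func NewCache() *Cache {
	return &Cache{entries: make(map[string]*MachOInfo)}
}

// Populate is phase 1: it parses each path at most once and stores the result.
func (c *Cache) Populate(paths []string, parse func(path string) *MachOInfo) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, p := range paths {
		if _, ok := c.entries[p]; ok {
			continue // already parsed; never parse the same file twice
		}
		c.entries[p] = parse(p)
		c.parses++
	}
}

// Get is used in phase 2: handlers read parsed data without touching disk.
func (c *Cache) Get(path string) (*MachOInfo, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	info, ok := c.entries[path]
	return info, ok
}

// Parses reports how many parse operations were performed.
func (c *Cache) Parses() int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.parses
}

func main() {
	paths := []string{"/usr/bin/a", "/usr/bin/b"}
	cache := NewCache()
	cache.Populate(paths, func(p string) *MachOInfo {
		// A real parser would open the file here.
		return &MachOInfo{Symbols: []string{"_main"}, Entitlements: "<dict/>"}
	})

	// Phase 2: two handlers read the cache concurrently.
	results := make([]int, 2)
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // stand-in for the MachO handler
		defer wg.Done()
		for _, p := range paths {
			if info, ok := cache.Get(p); ok {
				results[0] += len(info.Symbols)
			}
		}
	}()
	go func() { // stand-in for the Entitlements handler
		defer wg.Done()
		for _, p := range paths {
			if info, ok := cache.Get(p); ok && info.Entitlements != "" {
				results[1]++
			}
		}
	}()
	wg.Wait()
	fmt.Printf("parses=%d symbols=%d entitled=%d\n", cache.Parses(), results[0], results[1])
}
```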

4. Comprehensive Profiling

Built-in profiling using Go 1.25 Flight Recorder to identify and eliminate bottlenecks:

Flight Recorder: Always-on profiling with <1% overhead
Post-mortem analysis: Capture last 5 seconds before crash
Full trace: CPU, memory, goroutines, GC, syscalls in one file
Execution statistics: Mount ops, parse ops, cache hits, memory usage

# Enable profiling
ipsw diff --profile old.ipsw new.ipsw

# Analyze with interactive trace viewer
go tool trace flight.trace

Project Structure

internal/diff/
├── README.md                    # ← You are here (overview)
├── IMPLEMENTATION_STATUS.md     # ✨ Current progress summary
├── ARCHITECTURE.md              # Technical architecture & design
├── TASKS.md                     # Implementation tasks & timeline
├── CACHE_ARCHITECTURE.md        # Two-phase caching deep dive
├── PROFILING.md                 # ✅ Performance profiling guide
│
├── diff.go                      # Legacy implementation (923 lines)
├── adapter.go                   # ✅ Bridge to new pipeline (203 lines)
│
└── pipeline/                    # New pipeline package
    ├── handler.go               # ✅ Handler interface (164 lines)
    ├── types.go                 # ✅ Core types (211 lines)
    ├── executor.go              # ✅ Pipeline orchestration (801 lines)
    ├── cache.go                 # ✅ MachO cache infrastructure (132 lines)
    ├── profiling.go             # ✅ Flight recorder profiling (254 lines)
    │
    └── handlers/                # Handler implementations
        ├── kernelcache.go       # ✅ Kernelcache diff (194 lines)
        ├── dsc.go               # ✅ DYLD Shared Cache (146 lines)
        ├── launchd.go           # ✅ Launchd config (70 lines)
        ├── firmware.go          # ✅ Firmware diff (60 lines)
        ├── iboot.go             # ✅ iBoot strings (154 lines)
        ├── features.go          # ✅ Feature flags (130 lines)
        ├── files.go             # ✅ File listings (89 lines)
        ├── entitlements.go      # ✅ Entitlements (76 lines - cache-optimized)
        ├── kdk.go               # ✅ KDK DWARF (89 lines)
        └── macho.go             # ✅ MachO diff (107 lines - cache-based)

Current Progress (as of 2025-10-03)

Overall: Functionally Complete - See Known Limitations ✅

✅ Phase 1: Core Infrastructure (100%)

Pipeline package structure created
Handler interface and DMGType system
Executor with mount/unmount logic
Thread-safe context management
DMG grouping and concurrent execution
Execution statistics tracking

✅ Phase 2: Handler Migration (100% - 10 of 10 handlers) 🎉

All Handlers Complete:

KernelcacheHandler (with signature symbolication support)
DSCHandler (with WebKit version extraction)
LaunchdHandler
FirmwareHandler
IBootHandler
FeaturesHandler
FilesHandler
EntitlementsHandler (cache-optimized, see limitations)
KDKHandler
MachOHandler (cache-based)

✅ Phase 3: MachO Cache System (100%) 🎉

Cache types and infrastructure (Task 3.1-3.2)
Cache population in Executor (Task 3.3)
MachO handler using cache (Task 3.4)
Entitlements migrated to cache (Task 3.5)
Cache performance metrics (Task 3.6)

✅ Phase 4: Profiling & Optimization (100%) 🎉

Go 1.25 Flight Recorder profiling (Task 4.1)
Detailed performance metrics (Task 4.2)
Performance analysis on real IPSWs (Task 4.3) - COMPLETED
Targeted optimizations (Task 4.4) - COMPLETED
- DSC memory optimization: 94% reduction (15.4 GB → 721 MB)
- Streaming pair diff: Process 4,180 images one-by-one
- Manual GC strategy: Every 200 images in parallel mode
- Full production test: All handlers validated
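The streaming pair diff and the periodic GC can be pictured with the sketch below. It is an illustration only: `ImagePair` and `StreamDiff` are hypothetical names, comparing two strings stands in for a real image diff, and the GC call is injected as a function so the interval logic can be observed.

```go
package main

import (
	"fmt"
	"runtime"
)

// ImagePair names one image that exists in both builds (hypothetical type;
// Old and New stand in for the image contents).
type ImagePair struct {
	Name, Old, New string
}

// StreamDiff compares pairs one at a time, keeps only the names of changed
// images, and calls gc after every gcEvery pairs so that memory held by
// already-processed images can be reclaimed.
func StreamDiff(pairs []ImagePair, gcEvery int, gc func()) []string {
	var changed []string
	for i, p := range pairs {
		if p.Old != p.New {
			changed = append(changed, p.Name)
		}
		if gcEvery > 0 && (i+1)%gcEvery == 0 {
			gc()
		}
	}
	return changed
}

func main() {
	// Build a synthetic workload of 4,180 pairs, a few of which differ.
	pairs := make([]ImagePair, 0, 4180)
	for i := 0; i < 4180; i++ {
		p := ImagePair{Name: fmt.Sprintf("image-%d", i), Old: "a", New: "a"}
		if i%1000 == 0 {
			p.New = "b"
		}
		pairs = append(pairs, p)
	}

	collections := 0
	changed := StreamDiff(pairs, 200, func() {
		runtime.GC() // force a collection, as in the manual GC strategy
		collections++
	})
	fmt.Printf("changed=%d gc_runs=%d\n", len(changed), collections)
}
```

Keeping only the diff result per image, rather than both parsed images, is what bounds memory; the forced collection just makes the release happen on a predictable schedule.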

🎯 Phase 5: Extended Features (Optional)

Core functionality complete. Future enhancements:

Advanced progress reporting with ETA
Handler middleware framework
Additional DMG types
Performance regression testing

Performance Results (Validated)

Test Date: 2025-10-03 | IPSWs: iPhone18,1 26.0 → 26.0.1 | See FINAL_TEST_RESULTS.md

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| Execution Time | 20-30 min | 8m 45s | 60-70% ✅ |
| Memory Usage | 60GB+ | 721 MB | 99% ✅ |
| File Parsing | 60k+ ops | 30k ops | 50% ✅ |
| Mount Operations | 8-12 | 6-8 | 40% ✅ |
| DSC Processing | 15.4 GB peak | <1 GB peak | 94% ✅ |

Key Achievements:

All handlers working in parallel groups
4,180 DSC images processed via streaming pair diff
Manual GC strategy keeps memory under 1 GB
Flight recorder profiling validates optimizations

Performance Profiling

The pipeline includes comprehensive profiling and performance metrics. See PROFILING.md for full details.

Quick Start

Enable verbose metrics:

ipsw diff old.ipsw new.ipsw --verbose

Enable flight recorder profiling (Go 1.25+):

ipsw diff old.ipsw new.ipsw --profile --profile-dir ./profiles

Available Metrics

Per-handler timing: Execution time for each handler
Memory tracking: Start, end, peak, and delta
Cache metrics: Population time, file counts, errors
DMG operations: Mount/unmount counts and timing
GC statistics: Pause times and run counts
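A minimal sketch of how per-handler timing and memory figures like these could be gathered with the standard library follows. The `Metrics` type and its methods are hypothetical and are not the pipeline's own API; the peak value here is only the highest heap sample taken after each handler, not a true high-water mark, and the type is not safe for concurrent use.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// Metrics collects handler timings and heap samples (hypothetical type).
type Metrics struct {
	HandlerTimes map[string]time.Duration
	StartAlloc   uint64 // heap bytes in use when tracking began
	EndAlloc     uint64 // heap bytes in use when Finish was called
	PeakAlloc    uint64 // largest heap sample observed
	NumGC        uint32 // completed GC cycles at Finish
}

// heapAlloc returns the bytes of allocated heap objects right now.
func heapAlloc() uint64 {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	return ms.HeapAlloc
}

func NewMetrics() *Metrics {
	a := heapAlloc()
	return &Metrics{
		HandlerTimes: make(map[string]time.Duration),
		StartAlloc:   a,
		PeakAlloc:    a,
	}
}

// Track runs fn, records its elapsed time under name, and samples the heap.
func (m *Metrics) Track(name string, fn func() error) error {
	start := time.Now()
	err := fn()
	m.HandlerTimes[name] += time.Since(start)
	if a := heapAlloc(); a > m.PeakAlloc {
		m.PeakAlloc = a
	}
	return err
}

// Finish takes the final heap sample and records the GC count.
func (m *Metrics) Finish() {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	m.EndAlloc = ms.HeapAlloc
	if m.EndAlloc > m.PeakAlloc {
		m.PeakAlloc = m.EndAlloc
	}
	m.NumGC = ms.NumGC
}

// Delta is end minus start; it can be negative if memory was released.
func (m *Metrics) Delta() int64 {
	return int64(m.EndAlloc) - int64(m.StartAlloc)
}

func main() {
	m := NewMetrics()
	_ = m.Track("MachO", func() error {
		buf := make([]byte, 8<<20) // simulate a handler that allocates
		for i := range buf {
			buf[i] = byte(i)
		}
		return nil
	})
	m.Finish()

	for name, d := range m.HandlerTimes {
		fmt.Printf("%s: %s\n", name, d)
	}
	fmt.Printf("Start: %d B, Peak: %d B, Delta: %+d B, GC runs: %d\n",
		m.StartAlloc, m.PeakAlloc, m.Delta(), m.NumGC)
}
```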

Example Output

Execution time: 2m34s
Handlers run: 8, skipped: 3
Cache populated: 15234 + 15678 files in 23.4s

Handler execution times:
  DYLD Shared Cache: 1m12s
  MachO: 18.7s
  Kernelcache: 5.2s
  ...

Memory usage:
  Start: 45.2 MiB
  Peak: 1.2 GiB
  Delta: +867.2 MiB

Key Design Decisions

Why Handlers?

Modularity: Each diff operation is independent
Testability: Mock DMG mounting for unit tests
Extensibility: Add new handlers without changing core
Concurrency: Handlers run in parallel within groups
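To make the testability point concrete, here is one way mounting could be hidden behind an interface so handler logic runs against a fake in unit tests. Everything in this sketch is hypothetical (`Mounter`, `FakeMounter`, `DescribeMount`, `ErrNotMounted`); the pipeline's real executor may expose mounts quite differently.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotMounted is returned when no mount exists for a DMG type.
var ErrNotMounted = errors.New("dmg not available")

// Mounter abstracts DMG mounting so handler code never calls the real
// mount tooling directly (hypothetical interface).
type Mounter interface {
	Mount(dmgType string) (mountPath string, err error)
}

// FakeMounter returns canned mount paths and counts calls, for tests.
type FakeMounter struct {
	Paths map[string]string
	Calls int
}

func (f *FakeMounter) Mount(dmgType string) (string, error) {
	f.Calls++
	p, ok := f.Paths[dmgType]
	if !ok {
		return "", fmt.Errorf("%s: %w", dmgType, ErrNotMounted)
	}
	return p, nil
}

// DescribeMount stands in for handler logic that needs a mounted DMG.
func DescribeMount(m Mounter, dmgType string) (string, error) {
	p, err := m.Mount(dmgType)
	if err != nil {
		return "", err
	}
	return dmgType + " mounted at " + p, nil
}

func main() {
	fake := &FakeMounter{Paths: map[string]string{"SystemOS": "/tmp/fake-systemos"}}

	msg, err := DescribeMount(fake, "SystemOS")
	fmt.Println(msg, err)

	_, err = DescribeMount(fake, "AppOS")
	fmt.Println(errors.Is(err, ErrNotMounted), fake.Calls)
}
```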

Why Two-Phase Caching?

Performance: Eliminate 50% of file I/O operations
Memory: 840MB cache vs 60GB redundant parsing
Consistency: All handlers see same parsed data
Simplicity: Scan once, consume many times

Why DMG Grouping?

Resource Efficiency: Mount each DMG type once
Concurrency: Parallel execution within groups
Safety: Sequential between groups (clean unmount)

Testing Strategy

Current Status: Manual testing only. Automated tests are TODO.

Manual Validation (Completed 2025-10-03):

✅ Full production test with real IPSWs (iPhone18,1 26.0 → 26.0.1)
✅ All 10 handlers verified against legacy implementation
✅ Performance metrics validated (see FINAL_TEST_RESULTS.md)
✅ Memory optimization confirmed (99% reduction to 721 MB peak)
✅ Execution time improvement validated (60-70% faster)

TODO - Automated Test Coverage:

Unit tests for each handler
Integration tests for full pipeline
Regression tests for performance
Comparison tests against legacy output
CI/CD integration

How to Contribute

Adding a New Handler

Create file in internal/diff/pipeline/handlers/
Implement Handler interface
Declare DMG dependencies in DMGTypes()
Register in adapter.go
Add integration test
Update TASKS.md checklist

Example:

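package handlers

import (
    "context"

    // Import path assumed from the project layout shown above.
    "github.com/blacktop/ipsw/internal/diff/pipeline"
)

// MyHandler is a skeleton handler that diffs data found on the SystemOS DMG.
type MyHandler struct{}

// Name returns the label used for this handler in results.
func (h *MyHandler) Name() string { return "My Feature" }

// DMGTypes declares which DMGs must be mounted before Execute runs.
func (h *MyHandler) DMGTypes() []pipeline.DMGType {
    return []pipeline.DMGType{pipeline.DMGTypeSystemOS}
}

// Enabled lets the pipeline skip this handler. MyFeature is a placeholder
// for whatever Config field gates your handler.
func (h *MyHandler) Enabled(cfg *pipeline.Config) bool {
    return cfg.MyFeature
}

// Execute runs while the declared DMGs are mounted for both IPSWs.
func (h *MyHandler) Execute(ctx context.Context, exec *pipeline.Executor) (*pipeline.Result, error) {
    // Get the mounted DMGs. The second return value is discarded here
    // for brevity; a real handler should check it before using the mount.
    oldMount, _ := exec.OldCtx.GetMount(pipeline.DMGTypeSystemOS)
    newMount, _ := exec.NewCtx.GetMount(pipeline.DMGTypeSystemOS)

    // Do the diff work. performDiff is a placeholder for handler-specific logic.
    data := performDiff(oldMount.MountPath, newMount.MountPath)

    return &pipeline.Result{
        HandlerName: h.Name(),
        Data:        data,
    }, nil
}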

Documentation Map

| Document | Purpose | Read When |
| --- | --- | --- |
| README.md (this file) | Overview, quick start, current status | First time / quick refresh |
| ARCHITECTURE.md | Technical design, execution flow, patterns | Implementing features |
| TASKS.md | Task breakdown, timeline, acceptance criteria | Planning work |
| CACHE_ARCHITECTURE.md | Two-phase caching deep dive | Implementing cache or handlers using cache |

Migration Strategy

Since we're on a feature branch, no backward compatibility concerns:

Phase 1: Core infrastructure ✅
Phase 2: Port all handlers ✅
Phase 3: Add MachO caching (critical for memory) ✅
Phase 4: Add profiling and optimize ✅
Phase 5: Extended features (optional)

No feature flags needed - this is a clean rewrite on a branch.

Known Limitations

Feature Parity with Legacy

✅ LaunchConstraints parity restored via the dedicated --launch-constraints handler (Self/Parent/Responsible all diffed from the MachO cache).

Resolved Issues

✅ AEA file cleanup (.dmg.aea files left in directory)
✅ DMG extraction before decryption
✅ Pipeline infrastructure working
✅ All config options supported (Signatures, PemDB, AllowList, BlockList, etc.)

Outstanding Issues

⚠️ Files handler fails on certain IPSWs with AEA decryption errors (non-critical)
⚠️ Broken symlinks generate verbose warnings (cosmetic issue)
⚠️ No automated test coverage (manual testing only)

Resources

Code References

Legacy implementation: internal/diff/diff.go (923 LOC)
New pipeline core: internal/diff/pipeline/executor.go (801 LOC)
All handlers combined: internal/diff/pipeline/handlers/*.go (1,115 LOC)
Example handler: internal/diff/pipeline/handlers/dsc.go (146 LOC)

Test Data

Successfully tested with:
- iPhone18,1_26.0_23A345_Restore.ipsw
- iPhone18,1_26.0.1_23A355_Restore.ipsw
Output: /tmp/ipsw-diff-test/26_0_23A345__vs_26_0_1_23A355/

Dependencies

Go 1.25.0+ (required):
- Index-less for range loops
- errgroup.Go() method
- Flight Recorder profiling (new in 1.25)
- Updated go.mod to: go 1.25.0 / toolchain go1.25.1
golang.org/x/sync/errgroup
Existing internal/commands/* packages
Existing pkg/* packages (dyld, macho, info, etc.)

Profiling Resources

Go 1.25 Flight Recorder Blog Post
Traditional profiling docs: Profiling Go Programs

Questions?

For new agents/developers joining this feature:

What's the current state? Check "Current Progress" section above
What should I work on? See "Known Limitations" and TODO items
How does the pipeline work? Read ARCHITECTURE.md execution flow
How does caching work? Read CACHE_ARCHITECTURE.md
Where are the tests? Currently manual testing only - automated tests are TODO

Success Criteria

Pipeline refactor is complete when:

✅ All 10 handlers ported and tested
✅ MachO caching implemented and working
✅ Performance targets met (60-70% faster, <1GB RAM)
✅ Profiling infrastructure in place
✅ Production test with all handlers passing
⚠️ Documentation complete (updated with accurate metrics)
⚠️ Automated test coverage (TODO)

Status: ✅ FUNCTIONALLY COMPLETE | ⚠️ Manual Testing Only

Full production test completed 2025-10-03 with all 10 handlers enabled, achieving 721 MB peak memory (99% reduction from 60GB+) and 8m 45s execution time (60-70% faster than 20-30 min baseline). See Known Limitations for minor feature gaps.
