diff --git a/VMP_TESTING_NOTES.md b/VMP_TESTING_NOTES.md
index a8af767..7623d66 100644
--- a/VMP_TESTING_NOTES.md
+++ b/VMP_TESTING_NOTES.md
@@ -64,6 +64,8 @@ Profiler findings:
 Use `simple/protected381/*`.
 - These are the best local VMP-style performance targets.
 - Continue profiling semantics/memory/folder helpers there.
+- For large control-flow/semantics/inlining changes, run `python test.py vmp` from the repo root. That command now fails required targets on diagnostics errors **or** when `blocks_completed == 0`, while still reporting the older VMP 3.6 sample as best-effort only.
+- The current safe configuration keeps loop-header generalization disabled and relies on a higher basic-block budget for the stable 3.8.x samples.
 
 ### If the goal is older protected/VMP 3.6 support
 Use `simple/protected/simple_target_protected.vmp.exe`, but do not treat it as a normal instruction-semantics-only problem.
diff --git a/docs/BUILDING.md b/docs/BUILDING.md
index 4a2bcf6..0f9c73c 100644
--- a/docs/BUILDING.md
+++ b/docs/BUILDING.md
@@ -1,145 +1,91 @@
-# Docker
+# Building Mergen
 
-To build Mergen in Docker run the following commands:
+This file owns build and toolchain setup. For pipeline details, use `ARCHITECTURE.md`. For rewrite/test workflows, use `docs/REWRITE_BASELINE.md`.
 
-## Build image
+## Preferred Development Flow
+Mergen is developed and CI-tested primarily on Windows with Ninja, `clang-cl`, and LLVM 18.
+
+### Prerequisites
+- CMake on `PATH`
+- Ninja on `PATH`
+- Visual Studio C++ toolchain installed (the scripts rely on `clang-cl` finding MSVC headers/libs; they do not call `VsDevCmd.bat`)
+- `LLVM_DIR` pointing at the LLVM 18 CMake config directory, or a local install at `../llvm18-install/lib/cmake/llvm`
+- Rust/Cargo on `PATH` for the default iced backend
+
+### Default iced backend
+```bat
+cmd /c scripts\dev\configure_iced.cmd
+cmd /c scripts\dev\build_iced.cmd
+```
+
+Outputs:
+- `build_iced/lifter.exe`
+- `build_iced/rewrite_microtests.exe`
+
+### Alternate Zydis-only backend
+Use this only when you need the fallback lane or backend-specific debugging.
+
+```bat
+cmd /c scripts\dev\configure_zydis.cmd
+cmd /c scripts\dev\build_zydis.cmd
+```
+
+## Verify After Building
+Primary checks:
+
+```bat
+python test.py quick
+python test.py all
+```
+
+Useful targeted checks:
+
+```bat
+python test.py baseline
+python test.py micro --check-flags
+python test.py negative
+python test.py vmp
+```
+
+Use `python test.py vmp` for larger control-flow/semantics/inlining changes when you want a quick sanity pass over the local VMProtect targets without making it part of the default `quick` gate. Required 3.8.x targets must finish with `blocks_completed > 0`; the older VMP 3.6 sample remains best-effort only.
+
+## Useful Environment Variables
+- `LLVM_DIR` — points CMake at `LLVMConfig.cmake`
+- `MERGEN_BUILD_JOBS` — overrides build parallelism (default `4`)
+- `CMAKE_C_COMPILER` / `CMAKE_CXX_COMPILER` — optional compiler override for the configure scripts
+
+Example:
+
+```bat
+set MERGEN_BUILD_JOBS=8
+cmd /c scripts\dev\build_iced.cmd
+```
+
+## Secondary Flows
+### Docker
+The checked-in `Dockerfile` primarily produces a build/export image: its default `CMD` copies the built `lifter` binary to `/output/lifter`.
 
 ```bash
 docker build . -t mergen
+mkdir -p output
+docker run --rm -v "$PWD/output":/output mergen
 ```
----
 
-## Run
-
-Place target binary in the Mergen's root dir, then run following command.
-
-Note that you have to replace target.exe with your binary and 0x123456789 with your obfuscated function address.
+If you want to run the built lifter inside the container instead of exporting it, override the command explicitly:
 
 ```bash
-# Powershell
-docker run --rm -v ${PWD}:/data mergen target.exe 0x123456789
-
-# command prompt
-docker run --rm -v %cd%:/data mergen target.exe 0x123456789
-
-# bash
-docker run --rm -v $PWD:/data mergen target.exe 0x123456789
+docker run --rm -v "$PWD":/work mergen /root/Mergen/build/lifter /work/target.exe 0x123456789
 ```
----
 
-# Windows
+### Manual CMake
+Use direct CMake only when debugging cmkr/CMake behavior. Day-to-day development should go through `scripts/dev/*.cmd`.
 
-Here's a detailed guide to setting up your environment to build LLVM 18.1.0 on Windows, using Clang and Ninja, and configuring it to compile Mergen.
-
----
-
-# Building LLVM 18.1.0 from Scratch on Windows
-
-To set up and build LLVM 18.1.0 from scratch on Windows, follow these steps. This guide includes instructions on installing the necessary tools, setting up Visual Studio and the correct SDK, configuring paths, and building with Ninja.
-
-### Prerequisites
-
----
-
-1. **Download and Install LLVM 18.1.0**
-    - Download the LLVM 18.1.0 pre-built installer for Windows from this link: [LLVM-18.1.0-win64.exe](https://github.com/llvm/llvm-project/releases/download/llvmorg-18.1.0/LLVM-18.1.0-win64.exe).
-    - Run the installer and follow the on-screen instructions.
-    - During installation, choose the option to set the `PATH` environment variable for either:
-        - **All users** or **Current user only**, depending on your preference.
-    - This configuration will make `clang` and `clang++` directly accessible from any command prompt or terminal.
-
-2. **Download LLVM Source**
-    - Download LLVM 18.1.0 source from the [official release page](https://github.com/llvm/llvm-project/releases/tag/llvmorg-18.1.0).
-    - Direct link to the source: [llvmorg-18.1.0.zip](https://github.com/llvm/llvm-project/archive/refs/tags/llvmorg-18.1.0.zip)
-    - Extract the archive to a directory of your choice (e.g., `C:\llvm-project`).
-
-3. **Install Visual Studio 2022**
-    - Download and install **Visual Studio 2022** (Community edition is sufficient).
-    - During installation, ensure you select:
-        - **Desktop development with C++** workload.
-        - Under Individual Components, check:
-            - **MSVC v143 - VS 2022 C++ x64/x86 build tools**
-            - **C++ CMake tools for Windows**
-            - **Windows 10 SDK (10.0.19041.0)** or newer.
-    - Make a note of the installation path, typically:
-        - `C:\Program Files\Microsoft Visual Studio\2022\Community`
-
-4. **Set Up System Environment Variables**
-    - Open the Environment Variables settings in Windows and add the following paths to your `Path` variable:
-        - **Ninja** (if not already installed):
-            - Download Ninja from [Ninja GitHub](https://github.com/ninja-build/ninja/releases) and add the path to `ninja.exe` to your `Path`.
-        - **CMake**:
-            - If CMake is not installed, download it from [CMake's website](https://cmake.org/download/) and add it to your `Path`.
-        - **LLVM tools** (once LLVM is built, add the installation directory to `Path` as needed).
-
-5. **Additional Environment Variables for LLVM and Visual Studio Paths**
-    - Define `CMAKE_C_COMPILER` and `CMAKE_CXX_COMPILER` paths to ensure LLVM uses the correct compiler:
-        - `CMAKE_C_COMPILER="C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.41.34120/bin/Hostx64/x64/cl.exe"`
-        - `CMAKE_CXX_COMPILER="C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.41.34120/bin/Hostx64/x64/cl.exe"`
-    - Set the Windows Kit paths, specifically `RC` and `MT`:
-        - `CMAKE_RC_COMPILER="C:/Program Files (x86)/Windows Kits/10/bin/10.0.19041.0/x64/rc.exe"`
-        - `CMAKE_MT="C:/Program Files (x86)/Windows Kits/10/bin/10.0.19041.0/x64/mt.exe"`
-    - Optionally, define `LLVM_INSTALL_PREFIX` for the installation directory:
-        - Example: `LLVM_INSTALL_PREFIX="C:\llvm_stuff"`
-
-### Building LLVM with Ninja
-
-1. **Open a Developer Command Prompt**
-    - Open a command prompt configured for Visual Studio:
-        - Navigate to **Start Menu > Visual Studio 2022 > Developer Command Prompt for Visual Studio 2022**.
-
-2. **Configure the Build with CMake**
-    - Navigate to the root of the LLVM source directory:
-      ```bash
-      cd C:\llvm-project
-      ```
-    - Run CMake with the following configuration:
-      ```bash
-      cmake -G "Ninja" -S llvm -B build -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD="X86" -DCMAKE_INSTALL_PREFIX="C:\llvm_stuff" -DLLVM_HOST_TRIPLE=x86_64-pc-windows-msvc -DCMAKE_C_COMPILER="C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.41.34120/bin/Hostx64/x64/cl.exe" -DCMAKE_CXX_COMPILER="C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.41.34120/bin/Hostx64/x64/cl.exe" -DCMAKE_RC_COMPILER="C:/Program Files (x86)/Windows Kits/10/bin/10.0.19041.0/x64/rc.exe" -DCMAKE_MT="C:/Program Files (x86)/Windows Kits/10/bin/10.0.19041.0/x64/mt.exe"
-      ```
-
-3. **Install LLVM**
-    - Once built, install LLVM to the specified installation directory:
-      ```bash
-      ninja -C build install
-      ```
-
-Here’s the updated **Setting Up Mergen Build** section with the repository URL included and configured for a recursive clone:
-
----
-
-### Setting Up Mergen Build
-
-With LLVM successfully built and installed, you can configure Mergen to use the newly built LLVM.
-
-1. **Set the `LLVM_DIR` Environment Variable**
-    - To ensure Mergen can locate the LLVM CMake configuration files, set `LLVM_DIR` as a system environment variable.
-    - Open Command Prompt as Administrator and run the following command:
-      ```cmd
-      setx LLVM_DIR "c:\llvm_stuff\build\lib\cmake\llvm" /M
-      ```
-    - Alternatively, in PowerShell (also as Administrator), use:
-      ```powershell
-      [System.Environment]::SetEnvironmentVariable("LLVM_DIR", "C:\llvm_stuff\build\lib\cmake\llvm", "Machine")
-      ```
-    - This makes `LLVM_DIR` available system-wide, allowing CMake to locate LLVM when building Mergen. Restart any command prompt or terminal session to ensure the environment variable is recognized.
-
-2. **Clone the Mergen Repository** (recursively, to include submodules):
-   ```bash
-   git clone --recursive https://github.com/NaC-L/Mergen.git
-   cd Mergen
-   ```
-
-3. **Run CMake for Mergen Build**
-    - Configure CMake to use Clang as the compiler:
-      ```bash
-      cmake -G Ninja -S . -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER="clang++" -DCMAKE_C_COMPILER="clang"
-      ```
-
-4. **Build Mergen with Ninja**
-   ```bash
-   ninja
-   ```
-
----
+```bat
+cmake -G Ninja -S . -B build_iced -DCMAKE_BUILD_TYPE=Release -DLLVM_DIR="..."
+cmake --build build_iced --config Release --parallel 4
+```
 
+## Build Configuration Boundaries
+- `cmake.toml` is the source of truth.
+- `CMakeLists.txt` is generated from `cmake.toml`; do not hand-edit it.
+- `build*/` directories are generated outputs, not source-controlled configuration.
diff --git a/docs/REWRITE_BASELINE.md b/docs/REWRITE_BASELINE.md
index 5c4d934..fb7e2e2 100644
--- a/docs/REWRITE_BASELINE.md
+++ b/docs/REWRITE_BASELINE.md
@@ -114,6 +114,7 @@ This gate asserts explicit failure behavior for malformed manifests/vectors, vec
 - every expected pattern declared in `instruction_microtests.json` is present in that IR output
 A rewrite change is not acceptable if this baseline fails.
 `python test.py quick` and `python test.py all` additionally run runtime semantic validation for **all** samples after baseline lifting, executing each lifted IR module via LLVM `lli` and asserting correct return values across all declared input vectors. This prevents regressions where lifted IR looks structurally correct (passes pattern checks) but computes wrong results.
+For larger control-flow, semantics, or inlining changes, also run `python test.py vmp` to make sure the stable local VMProtect targets still lift without hard regression.
 
 
 ## Runtime semantic regression
@@ -130,22 +131,19 @@ Samples without a `semantic` field are not tested. The `semantic` field is optio
 
 ### Coverage summary
 
-| Category | Samples | Total cases |
-|---|---|---|
-| Constant-return (no inputs) | 8 | 8 |
-| Single-input branching | 12 | 87 |
-| Multi-input | 1 | 5 |
-| Jump-table dispatch (absolute qword) | 2 | 16 |
-| Jump-table dispatch (rel32 / shifted / shared / computation) | 4 | 31 |
-| Jump-table dispatch (C-compiled /O2, 16-case) | 1 | 10 |
-| **Total** | **28** | **146** (+ 1 skipped: calc_cout) |
+Current active quick-gate semantic coverage is **30 samples / 165 cases**.
+
+Notable current state:
+- `dummy_vm_loop` and `bytecode_vm_loop` remain active VM-shaped control-flow samples.
+- `stack_vm_loop` and `calc_sum_to_n` are currently marked `skip` because the safe VMP configuration disables loop-header generalization and these samples still exceed the block budget without it.
+- `calc_cout` remains `ci_skip` because its C++ codegen is toolchain-dependent on CI.
 
 ## Call-boundary ABI framework
 
 The lifter includes a cross-ABI call-boundary contract (`AbiCallContract.hpp`) that models:
 
 - **ABI kind**: x64 MSVC, x86 cdecl/stdcall/fastcall, unknown
-- **Call model mode**: `compat` (default) or `strict` (opt-in)
+- **Call model mode**: `strict` (default) or `compat` (diagnostic fallback)
 - **Call effects**: argument registers, return registers, volatile clobber set, stack cleanup convention, memory effect assumption
 
 ### Dual-mode behavior
diff --git a/docs/SCOPE.md b/docs/SCOPE.md
index 77d2112..8042ef2 100644
--- a/docs/SCOPE.md
+++ b/docs/SCOPE.md
@@ -1,42 +1,44 @@
 # Scope
 
-## Purpose
+This file owns the support matrix and quality contract. For pipeline order and invariants, use `ARCHITECTURE.md`. For build/test workflow, use `docs/BUILDING.md` and `docs/REWRITE_BASELINE.md`.
 
-Mergen is a function-level LLVM IR lifting engine for deobfuscation and devirtualization of x64 protected functions. It translates obfuscated native code into LLVM IR, enabling standard compiler optimizations to recover readable control flow and semantics from virtualized or mutated instruction streams.
+## Purpose
+Mergen is a function-level LLVM IR lifting engine for deobfuscation and devirtualization of x64 protected functions. It lifts one target function from a PE binary into LLVM IR so downstream optimization and analysis can recover readable control flow and semantics.
 
 ## Supported
-
 | Area | Details |
-|------|---------|
-| Architecture | x86-64 (PE binaries) |
-| Instruction set | 115 handlers covering general-purpose integer, BMI1/BMI2, bit manipulation, string ops, conditional moves, flag manipulation, and SSE2 integer XMM ops (`MOVDQA`, `PAND`, `POR`, `PXOR`) |
-| Control flow | Linear, conditional branches (2-way), direct jumps, call/ret, multi-target jump tables (absolute qword, RIP-relative dword offset, base-shifted, shared targets) |
-| Output | LLVM IR (text), optimizable via LLVM pass pipeline |
-| Calling convention awareness | x64 Microsoft ABI (cross-ABI framework: x64 MSVC, x86 cdecl/stdcall/fastcall). Dual-mode: `compat` (default, preserves exploration stability) and `strict` (ABI-enforced clobber/memory effects, opt-in). |
-| Optimization profiles | safe, aggressive, debug (planned — Phase 2) |
+|---|---|
+| Architecture | x86-64 PE binaries |
+| Instruction set | 115 handlers covering general-purpose integer ops, BMI1/BMI2, bit manipulation, string ops, conditional moves, flag manipulation, and SSE2 integer XMM ops (`MOVDQA`, `PAND`, `POR`, `PXOR`) |
+| Control flow | Linear flow, 2-way branches, direct jumps, call/ret, and tested multi-target jump-table shapes (absolute qword, RIP-relative dword offset, shifted-base, shared-target) |
+| Output | LLVM IR text suitable for LLVM optimization passes |
+| Call-boundary model | Cross-ABI framework for x64 MSVC and x86 cdecl/stdcall/fastcall; `strict` is the operational default, `compat` remains available as a diagnostic fallback |
+| Determinism | Canonical naming and golden-hash verification are part of the current contract |
 
 ## Unsupported / Known Limitations
-
 | Limitation | Status |
-|------------|--------|
-| Indirect jumps with >2 targets (jump tables) | Tested: absolute qword tables, rel32 offset tables, shifted-base tables, shared-target tables, symbolic computation in case bodies. IR quality note: switch dispatches on concrete target addresses, not logical case indices. |
-| Floating-point / wider SSE / AVX instructions (outside `MOVDQA`, `PAND`, `POR`, `PXOR`) | Not lifted |
+|---|---|
+| Floating-point / wider SSE / AVX outside the listed SSE2 integer ops | Not lifted |
 | Self-modifying code | Not supported |
-| Multi-function / whole-binary lifting | Single function scope only |
-| ELF / Mach-O / non-PE formats | Not supported |
-| 32-bit x86 | Not supported |
+| Whole-binary lifting | Out of scope; Mergen is function-level |
+| Non-PE formats | Not supported |
+| 32-bit x86 lifting | Not supported |
 | ARM / RISC-V / other architectures | Not supported |
-| Automatic ABI/prototype normalization | Stage 2 complete: prototype minimization strips unused parameters post-optimization. Call-boundary ABI contract with dual-mode (strict/compat). |
-| Deterministic output | Implemented: CanonicalNamingPass strips address-derived suffixes from block/value names. Same input produces identical IR across rebuilds. Golden hash verification in CI. |
+| Jump-table IR quality | Supported shapes still dispatch on concrete target addresses, not logical case indices |
+| Loop-header generalization | Temporarily disabled while the team keeps required VMP 3.8.x targets on the safe high-budget path |
 
 ## Tested Protectors
-
-- **VMProtect** — examples exist; reliability varies by protection level.
-- **Themida** — examples exist; reliability varies by protection level.
+- VMProtect — examples exist; reliability varies by protection level
+- Themida — examples exist; reliability varies by protection level
 
 ## Quality Contract
+- Handler coverage: 112/115 handlers with oracle-backed verification
+- Active regression corpus: 30 semantic samples / 165 runtime semantic cases; `stack_vm_loop` and `calc_sum_to_n` are tracked known limitations, and `calc_cout` remains CI-skipped because its C++ codegen is toolchain-dependent
+- Determinism: golden IR hashes are enforced for tracked outputs
+- CI gates: register/flag correctness, rewrite baseline, semantic regression, and Windows build lanes
+- Targeted VMP gate: `python test.py vmp` must keep required 3.8.x targets at `blocks_completed > 0`; VMP 3.6 remains best-effort only
 
-- Handler test coverage: 97.4% (112/115 with oracle verification against Unicorn).
-- Regression corpus: 28 active samples, 146 semantic test cases, 42 golden IR hashes (asm-only; C-compiled excluded as address-dependent).
-- Jump table coverage: 7 samples across 6 patterns (absolute, rel32, shifted, shared, computation, C-compiled /O2).
-- CI gates enforce register and flag correctness.
+## Non-goals
+- General-purpose decompilation
+- Multi-function whole-program recovery
+- Broad architecture expansion before x64 protected-function reliability improves
diff --git a/lifter/analysis/CustomPasses.hpp b/lifter/analysis/CustomPasses.hpp
index 1cf4dbd..b73dedd 100644
--- a/lifter/analysis/CustomPasses.hpp
+++ b/lifter/analysis/CustomPasses.hpp
@@ -127,6 +127,28 @@ public:
       uint64_t min_offset = UINT64_MAX;
       uint64_t max_offset = 0;
       bool found_any = false;
+      auto getMaxAccessBytes = [&](llvm::GetElementPtrInst* GEP) -> uint64_t {
+        uint64_t maxBytes = 1;
+        for (auto* User : GEP->users()) {
+          if (auto* LI = llvm::dyn_cast<llvm::LoadInst>(User)) {
+            maxBytes = std::max(
+                maxBytes,
+                static_cast<uint64_t>(
+                    M.getDataLayout().getTypeStoreSize(LI->getType()).getFixedValue()));
+            continue;
+          }
+          if (auto* SI = llvm::dyn_cast<llvm::StoreInst>(User)) {
+            if (SI->getPointerOperand() == GEP) {
+              maxBytes = std::max(
+                  maxBytes,
+                  static_cast<uint64_t>(M.getDataLayout()
+                                           .getTypeStoreSize(SI->getValueOperand()->getType())
+                                           .getFixedValue()));
+            }
+          }
+        }
+        return maxBytes;
+      };
       struct PendingGEP {
         llvm::GetElementPtrInst* gep;
         bool constant_offset;
@@ -144,8 +166,10 @@ public:
           if (auto* CI = dyn_cast<ConstantInt>(OffOp)) {
             uint64_t val = CI->getZExtValue();
             if (isStackAddress(val)) {
+              const uint64_t accessBytes = getMaxAccessBytes(GEP);
               min_offset = std::min(min_offset, val);
-              max_offset = std::max(max_offset, val);
+              max_offset =
+                  std::max(max_offset, val + std::max<uint64_t>(1, accessBytes) - 1);
               found_any = true;
               pending.push_back({GEP, true, val});
             }
@@ -155,10 +179,12 @@ public:
           auto offsetKB = computeKnownBits(OffOp, M.getDataLayout());
           uint64_t kb_min = offsetKB.getMinValue().getZExtValue();
           uint64_t kb_max = offsetKB.getMaxValue().getZExtValue();
+          const uint64_t accessBytes = getMaxAccessBytes(GEP);
           // Accept if the entire KnownBits range falls within stack bounds.
           if (isStackAddress(kb_min) && isStackAddress(kb_max)) {
             min_offset = std::min(min_offset, kb_min);
-            max_offset = std::max(max_offset, kb_max);
+            max_offset =
+                std::max(max_offset, kb_max + std::max<uint64_t>(1, accessBytes) - 1);
             found_any = true;
             pending.push_back({GEP, false, 0});
           } else if (auto* SI = dyn_cast<SelectInst>(OffOp)) {
@@ -169,7 +195,10 @@ public:
               uint64_t fv = cast<ConstantInt>(SI->getFalseValue())->getZExtValue();
               if (isStackAddress(tv) && isStackAddress(fv)) {
                 min_offset = std::min({min_offset, tv, fv});
-                max_offset = std::max({max_offset, tv, fv});
+                max_offset = std::max(
+                    {max_offset,
+                     tv + std::max<uint64_t>(1, accessBytes) - 1,
+                     fv + std::max<uint64_t>(1, accessBytes) - 1});
                 found_any = true;
                 pending.push_back({GEP, false, 0});
               }
@@ -219,8 +248,21 @@ public:
   Value* mem = nullptr;
   x86_64FileReader file;
   MemoryPolicy mempolicy;
-  GEPLoadPass(Value* val, uint8_t* filebase, MemoryPolicy mempolicy)
-      : mem(val), file(filebase), mempolicy(mempolicy){};
+  uint64_t stackLower;
+  uint64_t stackUpper;
+  GEPLoadPass(Value* val, uint8_t* filebase, MemoryPolicy mempolicy,
+              uint64_t reserve)
+      : mem(val),
+        file(filebase),
+        mempolicy(mempolicy),
+        stackLower(reserve <= STACKP_VALUE ? STACKP_VALUE - reserve : 0),
+        stackUpper(STACKP_VALUE + reserve) {
+    assert(reserve <= STACKP_VALUE &&
+           "reserve exceeds STACKP_VALUE; stackLower would underflow");
+  }
+  bool isTrackedStackAddress(uint64_t val) const {
+    return val >= stackLower && val <= stackUpper;
+  }
 
   llvm::PreservedAnalyses run(llvm::Module& M, llvm::ModuleAnalysisManager&) {
     bool hasChanged = false;
@@ -241,6 +283,7 @@ public:
           if (!ConstInt) continue;
 
           uint64_t constintvalue = ConstInt->getZExtValue();
+          if (isTrackedStackAddress(constintvalue)) continue;
           if (mempolicy.isSymbolic(constintvalue)) continue;
 
           if (!file.address_to_mapped_address(constintvalue)) continue;
@@ -397,115 +440,6 @@ public:
   }
 };
 
-// Removes unused function arguments from lifted function signatures.
-// Runs after all optimization passes so that dead arguments are truly unused.
-// Creates a new function with only live parameters, remaps the body, and
-// replaces all call sites.
-class PrototypeMinimizationPass
-    : public llvm::PassInfoMixin<PrototypeMinimizationPass> {
-public:
-  llvm::PreservedAnalyses run(llvm::Module& M, llvm::ModuleAnalysisManager&) {
-    bool changed = false;
-
-    // Collect functions first to avoid mutating the module while iterating.
-    llvm::SmallVector<llvm::Function*, 4> funcs;
-    for (auto& F : M) {
-      if (F.isDeclaration()) continue;
-      funcs.push_back(&F);
-    }
-
-    for (auto* F : funcs) {
-      // Identify which arguments are live (have at least one use).
-      llvm::SmallVector<unsigned, 16> liveArgIndices;
-      for (auto& Arg : F->args()) {
-        if (!Arg.use_empty())
-          liveArgIndices.push_back(Arg.getArgNo());
-      }
-
-      // Nothing to do if all args are used.
-      if (liveArgIndices.size() == F->arg_size()) continue;
-
-      // Skip functions that have non-call users (bitcasts, constant exprs,
-      // aliases, function pointer stores). We can only safely rewrite plain
-      // CallInst users; anything else would be left pointing at an empty shell.
-      bool hasUnsupportedUsers = false;
-      for (auto* U : F->users()) {
-        if (!llvm::isa<llvm::CallInst>(U)) {
-          hasUnsupportedUsers = true;
-          break;
-        }
-      }
-      if (hasUnsupportedUsers) continue;
-
-      // Build the new function type with only live parameters.
-      llvm::SmallVector<llvm::Type*, 8> newParamTypes;
-      for (unsigned idx : liveArgIndices)
-        newParamTypes.push_back(F->getFunctionType()->getParamType(idx));
-
-      auto* newFnTy = llvm::FunctionType::get(
-          F->getReturnType(), newParamTypes, false);
-
-      // Save the name before we create the new function.
-      std::string origName = F->getName().str();
-
-      // Rename old function to make room for the new one.
-      F->setName(origName + ".old");
-
-      auto* newFn = llvm::Function::Create(
-          newFnTy, F->getLinkage(), origName, &M);
-
-      // Name the new function's arguments after the originals.
-      for (unsigned i = 0; i < liveArgIndices.size(); ++i) {
-        llvm::Argument* oldArg = F->getArg(liveArgIndices[i]);
-        newFn->getArg(i)->setName(oldArg->getName());
-      }
-
-      // Move the function body: splice all basic blocks into newFn.
-      newFn->splice(newFn->begin(), F);
-
-      // Remap old live args to new args.
-      for (unsigned i = 0; i < liveArgIndices.size(); ++i) {
-        llvm::Argument* oldArg = F->getArg(liveArgIndices[i]);
-        llvm::Argument* newArg = newFn->getArg(i);
-        oldArg->replaceAllUsesWith(newArg);
-      }
-
-      // Dead args have no uses (that's why they're dead), but defensively
-      // replace with undef to satisfy the verifier before erasure.
-      for (auto& Arg : F->args()) {
-        if (!Arg.use_empty())
-          Arg.replaceAllUsesWith(llvm::UndefValue::get(Arg.getType()));
-      }
-
-      // Update call sites that directly call the old function.
-      // Only handle plain CallInst; InvokeInst/CallBrInst would need
-      // successor edge preservation which the lifter never emits.
-      llvm::SmallVector<llvm::CallInst*, 4> callsToRewrite;
-      for (auto* U : F->users()) {
-        if (auto* CI = llvm::dyn_cast<llvm::CallInst>(U))
-          callsToRewrite.push_back(CI);
-      }
-      for (auto* CI : callsToRewrite) {
-        llvm::SmallVector<llvm::Value*, 8> newCallArgs;
-        for (unsigned idx : liveArgIndices)
-          newCallArgs.push_back(CI->getArgOperand(idx));
-
-        llvm::IRBuilder<> B(CI);
-        auto* newCall = B.CreateCall(newFn, newCallArgs);
-        newCall->setCallingConv(CI->getCallingConv());
-        if (!CI->getType()->isVoidTy())
-          CI->replaceAllUsesWith(newCall);
-        CI->eraseFromParent();
-      }
-
-      F->eraseFromParent();
-      changed = true;
-    }
-
-    return changed ? llvm::PreservedAnalyses::none()
-                   : llvm::PreservedAnalyses::all();
-  }
-};
 
 // Normalizes switch instructions that dispatch on concrete addresses back to
 // logical case indices. The lifter's GEPLoadPass folds memory reads from jump
diff --git a/lifter/analysis/PathSolver.ipp b/lifter/analysis/PathSolver.ipp
index fc89695..c7f8677 100644
--- a/lifter/analysis/PathSolver.ipp
+++ b/lifter/analysis/PathSolver.ipp
@@ -14,6 +14,7 @@
 #include <llvm/IR/Instructions.h>
 #include <llvm/IR/Module.h>
 #include <llvm/IR/Value.h>
+#include <llvm/IR/CFG.h>
 #include <llvm/Support/Casting.h>
 #include <limits>
 
@@ -42,6 +43,57 @@ MERGEN_LIFTER_DEFINITION_TEMPLATES(PATH_info)::solvePath(
     return target;
   };
 
+  struct ResolvedTargetBlock {
+    BasicBlock* block;
+    bool reusedBackedge;
+    bool generalizedBackup;
+  };
+
+  auto resolveTargetBlock = [&](uint64_t target, const std::string& name)
+      -> ResolvedTargetBlock {
+    if (auto* reused = getLiftedBackedgeBB(target)) {
+      return {reused, true, false};
+    }
+
+    const bool backwardVisitedTarget =
+        visitedAddresses.contains(target) &&
+        target <= blockInfo.block_address;
+    auto it = addrToBB.find(target);
+    const bool generalizedHeaderLooksSimple =
+        it == addrToBB.end() || !it->second || llvm::pred_size(it->second) <= 1;
+    const bool wantsGeneralization =
+        currentPathSolveAllowsLoopGeneralization() &&
+        generalizedHeaderLooksSimple &&
+        !generalizedLoopAddresses.contains(target) &&
+        backwardVisitedTarget;
+    if (wantsGeneralization) {
+      if (currentPathSolveContext == PathSolveContext::DirectJump) {
+        stackBypassGeneralizedLoopAddresses.insert(target);
+      }
+      const bool generalizedBackup =
+          stackBypassGeneralizedLoopAddresses.contains(target);
+      if (pendingLoopGeneralizationAddresses.contains(target) &&
+          it != addrToBB.end() && it->second && it->second->empty()) {
+        return {it->second, false, generalizedBackup};
+      }
+      pendingLoopGeneralizationAddresses.insert(target);
+      if (it != addrToBB.end() && it->second && !it->second->empty()) {
+        return {replaceWithGeneralizedLoopBlock(target, name), false,
+                generalizedBackup};
+      }
+      return {getOrCreateBB(target, name), false, generalizedBackup};
+    }
+
+    return {getOrCreateBB(target, name), false, false};
+  };
+
+  auto backupQueuedTarget = [&](BasicBlock* targetBlock, bool generalizedBackup) {
+    if (targetBlock == liftAbortBlock) {
+      return;
+    }
+    branch_backup(targetBlock, generalizedBackup);
+  };
+
   // do static polymorphism here
 
   PATH_info result = PATH_unsolved;
@@ -51,12 +103,15 @@ MERGEN_LIFTER_DEFINITION_TEMPLATES(PATH_info)::solvePath(
     result = PATH_solved;
     run = 0;
 
-    auto bb_solved = getOrCreateBB(dest, "bb_solved_const");
+    auto resolved = resolveTargetBlock(dest, "bb_solved_const");
 
-    builder->CreateBr(bb_solved);
-    blockInfo = BBInfo(dest, bb_solved);
-    printvalue2("pushing block");
-    unvisitedBlocks.push_back(blockInfo);
+    builder->CreateBr(resolved.block);
+    if (!resolved.reusedBackedge) {
+      blockInfo = BBInfo(dest, resolved.block);
+      printvalue2("pushing block");
+      unvisitedBlocks.push_back(blockInfo);
+      backupQueuedTarget(blockInfo.block, resolved.generalizedBackup);
+    }
 
     return result;
   }
@@ -68,12 +123,15 @@ MERGEN_LIFTER_DEFINITION_TEMPLATES(PATH_info)::solvePath(
       std::cout << "Solved the constraint and moving to next path\n"
                 << std::flush;
 
-      auto bb_solved = getOrCreateBB(dest, "bb_solved");
+      auto resolved = resolveTargetBlock(dest, "bb_solved");
 
-      builder->CreateBr(bb_solved);
-      blockInfo = BBInfo(dest, bb_solved);
-      printvalue2("pushing block");
-      unvisitedBlocks.push_back(blockInfo);
+      builder->CreateBr(resolved.block);
+      if (!resolved.reusedBackedge) {
+        blockInfo = BBInfo(dest, resolved.block);
+        printvalue2("pushing block");
+        unvisitedBlocks.push_back(blockInfo);
+        backupQueuedTarget(blockInfo.block, resolved.generalizedBackup);
+      }
 
       return solved;
     }
@@ -89,11 +147,14 @@ MERGEN_LIFTER_DEFINITION_TEMPLATES(PATH_info)::solvePath(
     dest = normalizeTargetAddress(pv[0].getZExtValue());
     result = PATH_solved;
 
-    auto bb_solved = getOrCreateBB(dest, "bb_single");
-    builder->CreateBr(bb_solved);
-    blockInfo = BBInfo(dest, bb_solved);
-    printvalue2("pushing block");
-    unvisitedBlocks.push_back(blockInfo);
+    auto resolved = resolveTargetBlock(dest, "bb_single");
+    builder->CreateBr(resolved.block);
+    if (!resolved.reusedBackedge) {
+      blockInfo = BBInfo(dest, resolved.block);
+      printvalue2("pushing block");
+      unvisitedBlocks.push_back(blockInfo);
+      backupQueuedTarget(blockInfo.block, resolved.generalizedBackup);
+    }
     return result;
   }
   if (pv.size() == 2) {
@@ -154,8 +215,10 @@ MERGEN_LIFTER_DEFINITION_TEMPLATES(PATH_info)::solvePath(
         normalizeTargetAddress(firstcase.getZExtValue());
     const uint64_t secondTarget =
         normalizeTargetAddress(secondcase.getZExtValue());
-    auto bb_true = getOrCreateBB(firstTarget, "bb_true");
-    auto bb_false = getOrCreateBB(secondTarget, "bb_false");
+    auto trueTarget = resolveTargetBlock(firstTarget, "bb_true");
+    auto falseTarget = resolveTargetBlock(secondTarget, "bb_false");
+    auto* bb_true = trueTarget.block;
+    auto* bb_false = falseTarget.block;
     printvalue(condition);
     auto BR = builder->CreateCondBr(condition, bb_true, bb_false);
 
@@ -177,22 +240,34 @@ MERGEN_LIFTER_DEFINITION_TEMPLATES(PATH_info)::solvePath(
     // lifters.push_back(newlifter);
 
     // store mem&reg info for BB
-    addUnvisitedAddr(blockInfo);
-    addUnvisitedAddr(newblock);
+    if (!falseTarget.reusedBackedge && blockInfo.block != liftAbortBlock) {
+      addUnvisitedAddr(blockInfo);
+    }
+    if (!trueTarget.reusedBackedge && newblock.block != liftAbortBlock) {
+      addUnvisitedAddr(newblock);
+    }
 
     // Constant conditions are already resolved — only track assumptions
     // for instruction-produced conditions that need runtime disambiguation.
     if (auto* condInst = dyn_cast<Instruction>(condition)) {
-      assumptions[condInst] = 0;
-      branch_backup(blockInfo.block);
+      if (!falseTarget.reusedBackedge && blockInfo.block != liftAbortBlock) {
+        assumptions[condInst] = 0;
+        backupQueuedTarget(blockInfo.block, falseTarget.generalizedBackup);
+      }
 
-      this->assumptions[condInst] = 1;
-      branch_backup(newblock.block);
+      if (!trueTarget.reusedBackedge && newblock.block != liftAbortBlock) {
+        this->assumptions[condInst] = 1;
+        backupQueuedTarget(newblock.block, trueTarget.generalizedBackup);
+      }
     } else {
       // Condition is a constant (e.g., from folded ICMP). Both branches
       // are statically determined — back them up without assumption state.
-      branch_backup(blockInfo.block);
-      branch_backup(newblock.block);
+      if (!falseTarget.reusedBackedge && blockInfo.block != liftAbortBlock) {
+        backupQueuedTarget(blockInfo.block, falseTarget.generalizedBackup);
+      }
+      if (!trueTarget.reusedBackedge && newblock.block != liftAbortBlock) {
+        backupQueuedTarget(newblock.block, trueTarget.generalizedBackup);
+      }
     }
 
     debugging::doIfDebug([&]() {
@@ -209,8 +284,12 @@ MERGEN_LIFTER_DEFINITION_TEMPLATES(PATH_info)::solvePath(
     // Default must stay unresolved because computePossibleValues() is heuristic.
     unsigned bitWidth = simplifyValue->getType()->getIntegerBitWidth();
 
-    auto* bb_default_unresolved = BasicBlock::Create(
-        function->getContext(), "bb_switch_default_unresolved", function);
+    auto* bb_default_unresolved =
+        createBudgetedBasicBlock("bb_switch_default_unresolved", current_address);
+    if (bb_default_unresolved == liftAbortBlock) {
+      builder->CreateRet(UndefValue::get(function->getReturnType()));
+      return PATH_unsolved;
+    }
     DTU->applyUpdates(
         {{DominatorTree::Insert, this->blockInfo.block, bb_default_unresolved}});
 
diff --git a/lifter/core/LiftDiagnostics.hpp b/lifter/core/LiftDiagnostics.hpp
index ffd3fdb..729d97e 100644
--- a/lifter/core/LiftDiagnostics.hpp
+++ b/lifter/core/LiftDiagnostics.hpp
@@ -45,7 +45,9 @@ enum class DiagCode : uint16_t {
   // Pipeline (5xx)
   LiftComplete              = 500,
   OptimizationComplete      = 501,
-  SignatureSearchComplete    = 502,
+  SignatureSearchComplete   = 502,
+  LiftBlockBudgetExceeded   = 503,
+
 };
 
 struct DiagnosticEntry {
@@ -90,6 +92,7 @@ public:
   const std::vector<DiagnosticEntry>& getEntries() const { return entries; }
   size_t size() const { return entries.size(); }
   bool empty() const { return entries.empty(); }
+  bool hasErrors() const { return countBySeverity(DiagSeverity::Error) != 0; }
 
   size_t countBySeverity(DiagSeverity sev) const {
     size_t n = 0;
diff --git a/lifter/core/LiftDriver.hpp b/lifter/core/LiftDriver.hpp
index 71718b3..ad250a1 100644
--- a/lifter/core/LiftDriver.hpp
+++ b/lifter/core/LiftDriver.hpp
@@ -15,7 +15,13 @@ inline void runLiftWorklist(lifterConcolic<>* lifter) {
     }
 
     filter = true;
-    lifter->load_backup(bbinfo.block);
+    if (lifter->currentBlockUsesGeneralizedLoopState()) {
+      lifter->load_generalized_backup(bbinfo.block);
+    } else {
+      lifter->load_backup(bbinfo.block);
+    }
+    lifter->currentBlockRestoreMode =
+        lifterConcolic<>::BlockRestoreMode::Normal;
     lifter->finished = 0;
 
     // Speculative call bail-out: the callee exceeded the inline budget.
diff --git a/lifter/core/Lifter.cpp b/lifter/core/Lifter.cpp
index fab5393..72a8ab8 100644
--- a/lifter/core/Lifter.cpp
+++ b/lifter/core/Lifter.cpp
@@ -48,7 +48,7 @@ bool InitFunction_and_LiftInstructions(const uint64_t runtime_address,
 
   runLifterPipeline(stageContext->lifter.get(), stageContext->runtimeContext,
                     fileData.data(), fileData);
-  return true;
+  return !stageContext->lifter->diagnostics.hasErrors();
 }
 
 // #define TEST
diff --git a/lifter/core/LifterClass.hpp b/lifter/core/LifterClass.hpp
index 748c1a4..4181f92 100644
--- a/lifter/core/LifterClass.hpp
+++ b/lifter/core/LifterClass.hpp
@@ -230,10 +230,13 @@ concept lifterConcept = Registers<R> && requires(T t) {
     t.SetRegisterValue_impl(std::declval<R>(), std::declval<llvm::Value*>())
   } -> std::same_as<void>;
   {
-    t.branch_backup_impl(std::declval<llvm::BasicBlock*>())
+    t.branch_backup_impl(std::declval<llvm::BasicBlock*>(), false)
   } -> std::same_as<void>;
   {
-    t.branch_backup_impl(std::declval<llvm::BasicBlock*>())
+    t.load_backup_impl(std::declval<llvm::BasicBlock*>())
+  } -> std::same_as<void>;
+  {
+    t.load_generalized_backup_impl(std::declval<llvm::BasicBlock*>())
   } -> std::same_as<void>;
 };
 
@@ -281,6 +284,27 @@ public:
   Disassembler dis;
   MemoryPolicy memoryPolicy;
   uint64_t stackReserve = 0x1000; // clamped reserve, set by configureDefaultMemoryPolicy
+  bool isTrackedStackAddress(uint64_t address) const {
+    const uint64_t stackLower =
+        stackReserve <= STACKP_VALUE ? STACKP_VALUE - stackReserve : 0;
+    const uint64_t stackUpper = STACKP_VALUE + stackReserve;
+    return address >= stackLower && address <= stackUpper;
+  }
+  bool isTrackedLocalStackAddress(uint64_t address) const {
+    return isTrackedStackAddress(address) && address < STACKP_VALUE;
+  }
+
+  enum class BlockRestoreMode {
+    Normal,
+    GeneralizedLoop,
+  };
+  enum class PathSolveContext {
+    Unknown,
+    ConditionalBranch,
+    DirectJump,
+    IndirectJump,
+    Ret,
+  };
   FunctionInlinePolicy inlinePolicy;
 
   // ABI call-boundary configuration.
@@ -371,6 +395,27 @@ public:
   SpeculativeCallInfo speculativeCall;
   uint32_t speculativeCallBudget    = 0;   // instructions remaining (0 = inactive)
   uint32_t maxCallInlineBudget      = 0;   // 0 = disabled (no speculative limit)
+  bool liftBudgetExceeded          = false;
+  uint32_t maxBasicBlockBudget     = 4096;  // 0 = disabled
+  llvm::BasicBlock* liftAbortBlock = nullptr;
+  bool bypassStackConcolicTracking = false;
+  BlockRestoreMode currentBlockRestoreMode = BlockRestoreMode::Normal;
+  PathSolveContext currentPathSolveContext = PathSolveContext::Unknown;
+
+  class ScopedPathSolveContext {
+    lifterClassBase<Derived, Mnemonic, Register, DisassemblerBase>* lifter;
+    PathSolveContext previous;
+
+  public:
+    ScopedPathSolveContext(
+        lifterClassBase<Derived, Mnemonic, Register, DisassemblerBase>* lifter,
+        PathSolveContext next)
+        : lifter(lifter), previous(lifter->currentPathSolveContext) {
+      lifter->currentPathSolveContext = next;
+    }
+
+    ~ScopedPathSolveContext() { lifter->currentPathSolveContext = previous; }
+  };
 
   void runDisassembler(const void* buffer, size_t size = 15) {
 
@@ -445,13 +490,24 @@ public:
   }
 
   // useless in symbolic?
-  void branch_backup(BasicBlock* bb) {
-    static_cast<Derived*>(this)->branch_backup_impl(bb);
+  void branch_backup(BasicBlock* bb, bool generalized = false) {
+    static_cast<Derived*>(this)->branch_backup_impl(bb, generalized);
   }
   // useless in symbolic?
   void load_backup(BasicBlock* bb) {
     static_cast<Derived*>(this)->load_backup_impl(bb);
   }
+  void load_generalized_backup(BasicBlock* bb) {
+    static_cast<Derived*>(this)->load_generalized_backup_impl(bb);
+  }
+  bool currentBlockUsesGeneralizedLoopState() const {
+    return currentBlockRestoreMode == BlockRestoreMode::GeneralizedLoop;
+  }
+  bool currentPathSolveAllowsLoopGeneralization() const {
+    // Disabled until the generalized loop heuristics can preserve real VMP exit
+    // paths without collapsing required 3.8.x targets into non-terminating IR.
+    return false;
+  }
 
   void liftBasicBlockFromAddress(uint64_t addr) {
     ++liftStats.blocks_attempted;
@@ -505,6 +561,16 @@ public:
       ++liftStats.blocks_completed;
   }
 
+  void sealIncompleteBlocks() {
+    for (auto& BB : *fnc) {
+      if (BB.getTerminator())
+        continue;
+      llvm::IRBuilder<> fallbackBuilder(&BB);
+      fallbackBuilder.CreateRet(llvm::UndefValue::get(fnc->getReturnType()));
+    }
+  }
+
+
   bool addUnvisitedAddr(BBInfo& bb) {
     printvalue2(bb.block_address);
     printvalue2("added");
@@ -516,6 +582,11 @@ public:
   filter : filter for empty blocks
   */
   bool getUnvisitedAddr(BBInfo& out, bool filter = 0) {
+    if (liftBudgetExceeded) {
+      unvisitedBlocks.clear();
+      sealIncompleteBlocks();
+      return false;
+    }
     while (!unvisitedBlocks.empty()) {
       out = std::move(unvisitedBlocks.back());
       unvisitedBlocks.pop_back();
@@ -530,6 +601,20 @@ public:
 
       printvalue2("adding :" + std::to_string(out.block_address) +
                   out.block->getName());
+      const bool usesGeneralizedLoopState =
+          pendingLoopGeneralizationAddresses.contains(out.block_address) ||
+          generalizedLoopAddresses.contains(out.block_address);
+      const bool bypassesStackTracking =
+          usesGeneralizedLoopState &&
+          stackBypassGeneralizedLoopAddresses.contains(out.block_address);
+      bypassStackConcolicTracking = bypassesStackTracking;
+      currentBlockRestoreMode = bypassesStackTracking
+                                   ? BlockRestoreMode::GeneralizedLoop
+                                   : BlockRestoreMode::Normal;
+      if (pendingLoopGeneralizationAddresses.contains(out.block_address)) {
+        pendingLoopGeneralizationAddresses.erase(out.block_address);
+        generalizedLoopAddresses.insert(out.block_address);
+      }
       visitedAddresses.insert(out.block_address);
       blockInfo = out;
       return true;
@@ -626,10 +711,67 @@ public:
   // todo : std::set
   std::vector<BBInfo> unvisitedBlocks;
   std::set<uint64_t> visitedAddresses;
+  std::set<uint64_t> generalizedLoopAddresses;
+  std::set<uint64_t> pendingLoopGeneralizationAddresses;
+  std::set<uint64_t> stackBypassGeneralizedLoopAddresses;
   llvm::DenseMap<uint64_t, llvm::BasicBlock*> addrToBB;
 
   // creates an edge to created bb
   // TODO: wrapper for createbr, condbr, switch and update it there.
+  BasicBlock* createBudgetedBasicBlock(const std::string& name, uint64_t diagAddr) {
+    if (maxBasicBlockBudget > 0 && fnc->size() >= maxBasicBlockBudget) {
+      if (!liftBudgetExceeded) {
+        diagnostics.error(
+            DiagCode::LiftBlockBudgetExceeded, diagAddr,
+            "Basic-block budget exceeded during lifting; aborting to avoid loop/state explosion");
+        liftBudgetExceeded = true;
+      }
+      if (!liftAbortBlock) {
+        liftAbortBlock =
+            BasicBlock::Create(context, "bb_lift_budget_exceeded", fnc);
+        llvm::IRBuilder<> abortBuilder(liftAbortBlock);
+        abortBuilder.CreateRet(llvm::UndefValue::get(fnc->getReturnType()));
+      }
+      return liftAbortBlock;
+    }
+
+    return BasicBlock::Create(context, name, fnc);
+  }
+
+
+  BasicBlock* replaceWithGeneralizedLoopBlock(uint64_t addr, const std::string& name) {
+    auto* newBlock = createBudgetedBasicBlock(name, addr);
+    if (newBlock == liftAbortBlock) {
+      return newBlock;
+    }
+
+    auto it = addrToBB.find(addr);
+    if (it != addrToBB.end() && it->second && it->second != newBlock) {
+      it->second->replaceAllUsesWith(newBlock);
+    }
+
+    addrToBB[addr] = newBlock;
+    return newBlock;
+  }
+
+
+  BasicBlock* getLiftedBackedgeBB(uint64_t addr) {
+    if (getControlFlow() != ControlFlow::Unflatten ||
+        !currentPathSolveAllowsLoopGeneralization()) {
+      return nullptr;
+    }
+    if (addr > blockInfo.block_address ||
+        !generalizedLoopAddresses.contains(addr)) {
+      return nullptr;
+    }
+    auto it = addrToBB.find(addr);
+    if (it == addrToBB.end() || !it->second || it->second->empty()) {
+      return nullptr;
+    }
+    return it->second;
+  }
+
+
   BasicBlock* getOrCreateBB(uint64_t addr, std::string name) {
     if (getControlFlow() == ControlFlow::Basic) {
       auto it = addrToBB.find(addr);
@@ -638,7 +780,10 @@ public:
         return it->second;
       }
     }
-    auto bb = BasicBlock::Create(context, name, fnc);
+    auto bb = createBudgetedBasicBlock(name, addr);
+    if (bb == liftAbortBlock) {
+      return bb;
+    }
     addrToBB[addr] = bb;
     DTU->applyUpdates({{DominatorTree::Insert, this->blockInfo.block, bb}});
 
@@ -674,6 +819,7 @@ protected:
   }
 
 public:
+  void finalizeIncompleteBlocks() { sealIncompleteBlocks(); }
   AliasAnalysis* AA;
   DominatorTree* DT;
   PostDominatorTree* PDT;
diff --git a/lifter/core/LifterClass_Concolic.hpp b/lifter/core/LifterClass_Concolic.hpp
index ff17013..de1d154 100644
--- a/lifter/core/LifterClass_Concolic.hpp
+++ b/lifter/core/LifterClass_Concolic.hpp
@@ -192,26 +192,55 @@ public:
   };
 
   llvm::DenseMap<BasicBlock*, backup_point> BBbackup;
-  void branch_backup_impl(BasicBlock* bb) {
-    //
 
+  backup_point make_generalized_loop_backup(const backup_point& source) {
+    backup_point generalized = source;
+    llvm::DenseMap<uint64_t, ValueByteReference> filteredBuffer;
+    filteredBuffer.reserve(source.buffer.size());
+    for (const auto& entry : source.buffer) {
+      if (!this->isTrackedLocalStackAddress(entry.first)) {
+        filteredBuffer[entry.first] = entry.second;
+      }
+    }
+    generalized.buffer = std::move(filteredBuffer);
+    generalized.cache = InstructionCache();
+    generalized.assumptions.clear();
+    return generalized;
+  }
+
+  void restore_backup_point(const backup_point& snapshot) {
+    vec = snapshot.vec;
+    vecflag = snapshot.vecflag;
+    this->buffer = snapshot.buffer;
+    this->cache = snapshot.cache;
+    this->assumptions = snapshot.assumptions;
+    this->counter = snapshot.ct;
+  }
+
+  void branch_backup_impl(BasicBlock* bb, bool generalized) {
     printvalue2("backing up");
     printvalue2(this->counter);
-    BBbackup[bb] = backup_point(vec, vecflag, this->buffer, this->cache,
-                                this->assumptions, this->counter);
+
+    auto snapshot = backup_point(vec, vecflag, this->buffer, this->cache,
+                                 this->assumptions, this->counter);
+    if (generalized) {
+      snapshot = make_generalized_loop_backup(snapshot);
+    }
+    BBbackup[bb] = std::move(snapshot);
   }
 
   void load_backup_impl(BasicBlock* bb) {
     if (BBbackup.contains(bb)) {
-
       printvalue2("loading backup");
-      backup_point bbinfo = BBbackup[bb];
-      vec = bbinfo.vec;
-      vecflag = bbinfo.vecflag;
-      this->buffer = bbinfo.buffer;
-      this->cache = bbinfo.cache;
-      this->assumptions = bbinfo.assumptions;
-      this->counter = bbinfo.ct;
+      restore_backup_point(BBbackup[bb]);
+    }
+  }
+
+  void load_generalized_backup_impl(BasicBlock* bb) {
+    if (BBbackup.contains(bb)) {
+      printvalue2("loading generalized backup");
+      auto snapshot = make_generalized_loop_backup(BBbackup[bb]);
+      restore_backup_point(snapshot);
     }
   }
 
diff --git a/lifter/core/LifterClass_Symbolic.hpp b/lifter/core/LifterClass_Symbolic.hpp
index e2c5a77..8147c41 100644
--- a/lifter/core/LifterClass_Symbolic.hpp
+++ b/lifter/core/LifterClass_Symbolic.hpp
@@ -132,7 +132,7 @@ public:
 
   constexpr ControlFlow getControlFlow_impl() { return ControlFlow::Basic; }
 
-  void branch_backup_impl(BasicBlock* bb) {
+  void branch_backup_impl(BasicBlock* bb, bool generalized) {
     //
     return;
   }
@@ -141,6 +141,10 @@ public:
     //
     return;
   }
+  void load_generalized_backup_impl(BasicBlock* bb) {
+    //
+    return;
+  }
   void createFunction_impl() {
     std::vector<llvm::Type*> argTypes;
 
diff --git a/lifter/core/LifterPipelineStages.hpp b/lifter/core/LifterPipelineStages.hpp
index 2f7317d..bb9cc9e 100644
--- a/lifter/core/LifterPipelineStages.hpp
+++ b/lifter/core/LifterPipelineStages.hpp
@@ -62,6 +62,7 @@ inline void reportLiftCompletionTiming(double elapsedMilliseconds) {
   });
 }
 inline void emitLiftOutputs(lifterConcolic<>* lifter, double elapsedMilliseconds) {
+  lifter->finalizeIncompleteBlocks();
   lifter->profiler.begin("write_unopt_ir");
   lifter->writeFunctionToFile("output_no_opts.ll");
   lifter->profiler.end();
diff --git a/lifter/core/MergenPB.hpp b/lifter/core/MergenPB.hpp
index 5308ef5..74b2318 100644
--- a/lifter/core/MergenPB.hpp
+++ b/lifter/core/MergenPB.hpp
@@ -12,6 +12,7 @@
 #include <llvm/Support/Casting.h>
 #include <llvm/Support/CommandLine.h>
 #include <llvm/Support/ErrorHandling.h>
+#include <llvm/Transforms/IPO/DeadArgumentElimination.h>
 
 using namespace llvm;
 
@@ -60,7 +61,7 @@ MERGEN_LIFTER_DEFINITION_TEMPLATES(void)::run_opts() {
     }
 
     modulePassManager.addPass(
-        GEPLoadPass(memoryArg, this->fileBase, memoryPolicy));
+        GEPLoadPass(memoryArg, this->fileBase, memoryPolicy, this->stackReserve));
     modulePassManager.addPass(ReplaceTruncWithLoadPass());
     modulePassManager.addPass(PromotePseudoStackPass(memoryArg, this->stackReserve));
     modulePassManager.addPass(PromotePseudoMemory(memoryArg));
@@ -72,15 +73,32 @@ MERGEN_LIFTER_DEFINITION_TEMPLATES(void)::run_opts() {
     changed = beforeSize != afterSize;
   } while (changed);
 
+  // Rebuild analysis state before the final O2 pipeline. The fixpoint loop above
+  // mutates the module repeatedly with custom passes; fresh managers keep the
+  // final optimization run aligned with standalone `opt -O2` on output_no_opts.ll.
+  llvm::LoopAnalysisManager finalLoopAnalysisManager;
+  llvm::FunctionAnalysisManager finalFunctionAnalysisManager;
+  llvm::CGSCCAnalysisManager finalCGSCCAnalysisManager;
+  llvm::ModuleAnalysisManager finalModuleAnalysisManager;
+
+  passBuilder.registerModuleAnalyses(finalModuleAnalysisManager);
+  passBuilder.registerCGSCCAnalyses(finalCGSCCAnalysisManager);
+  passBuilder.registerFunctionAnalyses(finalFunctionAnalysisManager);
+  passBuilder.registerLoopAnalyses(finalLoopAnalysisManager);
+  passBuilder.crossRegisterProxies(finalLoopAnalysisManager,
+                                   finalFunctionAnalysisManager,
+                                   finalCGSCCAnalysisManager,
+                                   finalModuleAnalysisManager);
+
   modulePassManager =
       passBuilder.buildPerModuleDefaultPipeline(llvm::OptimizationLevel::O2);
 
-  modulePassManager.run(*module, moduleAnalysisManager);
+  modulePassManager.run(*module, finalModuleAnalysisManager);
 
-  // Post-optimization passes: normalize IR, strip unused params, canonicalize names.
+  // Post-optimization passes: normalize IR, drop dead parameters, canonicalize names.
   llvm::ModulePassManager postPassManager;
   postPassManager.addPass(SwitchNormalizationPass());
-  postPassManager.addPass(PrototypeMinimizationPass());
+  postPassManager.addPass(llvm::DeadArgumentEliminationPass());
   postPassManager.addPass(CanonicalNamingPass());
-  postPassManager.run(*module, moduleAnalysisManager);
+  postPassManager.run(*module, finalModuleAnalysisManager);
 }
diff --git a/lifter/memory/GEPTracker.ipp b/lifter/memory/GEPTracker.ipp
index 0100daf..fde7e7a 100644
--- a/lifter/memory/GEPTracker.ipp
+++ b/lifter/memory/GEPTracker.ipp
@@ -90,6 +90,7 @@ MERGEN_LIFTER_DEFINITION_TEMPLATES(Value*)::retrieveCombinedValue(
     return extractBytes(orgLoad.get(), 0, byteCount);
   }
 
+
   LLVMContext& context = builder->getContext();
   if (byteCount == 0) {
     return nullptr;
@@ -368,6 +369,11 @@ MERGEN_LIFTER_DEFINITION_TEMPLATES(void)::insertMemoryOp(StoreInst* inst) {
 
   auto gepOffsetCI = cast<ConstantInt>(gepOffset);
 
+  if (bypassStackConcolicTracking &&
+      isTrackedLocalStackAddress(gepOffsetCI->getZExtValue())) {
+    return;
+  }
+
   addValueReference(inst->getValueOperand(), gepOffsetCI->getZExtValue());
   // BinaryOperations::WriteTo(gepOffsetCI->getZExtValue());
 }
@@ -771,6 +777,11 @@ MERGEN_LIFTER_DEFINITION_TEMPLATES(Value*)::solveLoad(LazyValue load,
 
     auto loadOffsetCIval = loadOffsetCI->getZExtValue();
 
+    if (bypassStackConcolicTracking &&
+        isTrackedLocalStackAddress(loadOffsetCIval)) {
+      return load.get();
+    }
+
     auto valueExtractedFromVirtualStack =
         retrieveCombinedValue(loadOffsetCIval, cloadsize, load);
     if (valueExtractedFromVirtualStack) {
diff --git a/lifter/semantics/Semantics_ControlFlow.ipp b/lifter/semantics/Semantics_ControlFlow.ipp
index 9f25501..168766a 100644
--- a/lifter/semantics/Semantics_ControlFlow.ipp
+++ b/lifter/semantics/Semantics_ControlFlow.ipp
@@ -427,7 +427,8 @@ MERGEN_LIFTER_DEFINITION_TEMPLATES(void)::lift_ret() { // fix
   }
 
   SetRegisterValue(Register::RSP, rsp_result);
-
+  
+  ScopedPathSolveContext pathSolveContext(this, PathSolveContext::Ret);
   auto pathResult = solvePath(function, destination, realval);
   if (pathResult == PATH_unsolved) {
     ++liftStats.blocks_unreachable;
@@ -465,6 +466,10 @@ MERGEN_LIFTER_DEFINITION_TEMPLATES(void)::lift_jmp() {
   printvalue(trunc);
   uint64_t destination = 0;
   auto function = builder->GetInsertBlock()->getParent();
+  const bool isDirectJump = instruction.types[0] == OperandType::Immediate8 ||
+                            instruction.types[0] == OperandType::Immediate16 ||
+                            instruction.types[0] == OperandType::Immediate32 ||
+                            instruction.types[0] == OperandType::Immediate64;
   switch (instruction.types[0]) {
   case OperandType::Immediate8:
   case OperandType::Immediate16: // todo: test 8 and 16
@@ -476,6 +481,9 @@ MERGEN_LIFTER_DEFINITION_TEMPLATES(void)::lift_jmp() {
   default:
     break;
   }
+  ScopedPathSolveContext pathSolveContext(
+      this, isDirectJump ? PathSolveContext::DirectJump
+                         : PathSolveContext::IndirectJump);
   auto pathResult = solvePath(function, destination, trunc);
   if (pathResult == PATH_unsolved) {
     ++liftStats.blocks_unreachable;
diff --git a/lifter/semantics/Semantics_Helpers.ipp b/lifter/semantics/Semantics_Helpers.ipp
index b4a813b..941d81e 100644
--- a/lifter/semantics/Semantics_Helpers.ipp
+++ b/lifter/semantics/Semantics_Helpers.ipp
@@ -306,6 +306,8 @@ MERGEN_LIFTER_DEFINITION_TEMPLATES(void)::branchHelper(
     next_jump = createSelectFolder(condition, false_jump, true_jump);
 
   uint64_t destination = 0;
+  ScopedPathSolveContext pathSolveContext(this,
+                                        PathSolveContext::ConditionalBranch);
   PATH_info pathInfo = solvePath(function, destination, next_jump);
   this->hadConditionalBranch = true;
   this->lastConditionalBranchResolved = (pathInfo != PATH_unsolved);
diff --git a/lifter/test/Tester.hpp b/lifter/test/Tester.hpp
index 0f34910..1b4c79e 100644
--- a/lifter/test/Tester.hpp
+++ b/lifter/test/Tester.hpp
@@ -412,6 +412,77 @@ private:
   }
 
 
+  bool runLoopGeneralizationDirectJumpBlocked(std::string& details) {
+    LifterUnderTest lifter;
+    lifter.currentPathSolveContext = LifterUnderTest::PathSolveContext::DirectJump;
+    if (lifter.currentPathSolveAllowsLoopGeneralization()) {
+      details = "  loop generalization must stay disabled for direct jumps until VMP-safe generalization exists\n";
+      return false;
+    }
+    return true;
+  }
+
+  bool runLoopGeneralizationIndirectJumpBlocked(std::string& details) {
+    LifterUnderTest lifter;
+    lifter.currentPathSolveContext = LifterUnderTest::PathSolveContext::IndirectJump;
+    if (lifter.currentPathSolveAllowsLoopGeneralization()) {
+      details = "  indirect-jump dispatcher context must not generalize loop state\n";
+      return false;
+    }
+    return true;
+  }
+
+  bool runGeneralizedLoopWithoutBypassTagKeepsNormalRestore(std::string& details) {
+    LifterUnderTest lifter;
+    auto* bb = llvm::BasicBlock::Create(lifter.context, "loop_header", lifter.fnc);
+    BBInfo queued(0x1000, bb);
+    lifter.pendingLoopGeneralizationAddresses.insert(0x1000);
+    lifter.addUnvisitedAddr(queued);
+
+    BBInfo out;
+    if (!lifter.getUnvisitedAddr(out)) {
+      details = "  failed to dequeue pending generalized loop header\n";
+      return false;
+    }
+    if (lifter.bypassStackConcolicTracking) {
+      details = "  generalized loop without bypass tag unexpectedly disabled concrete stack tracking\n";
+      return false;
+    }
+    if (lifter.currentBlockRestoreMode != LifterUnderTest::BlockRestoreMode::Normal) {
+      details = "  generalized loop without bypass tag should keep normal restore mode\n";
+      return false;
+    }
+    if (!lifter.generalizedLoopAddresses.contains(0x1000)) {
+      details = "  pending generalized loop header was not promoted after dequeue\n";
+      return false;
+    }
+    return true;
+  }
+
+  bool runGeneralizedLoopWithBypassTagUsesGeneralizedRestore(std::string& details) {
+    LifterUnderTest lifter;
+    auto* bb = llvm::BasicBlock::Create(lifter.context, "loop_header", lifter.fnc);
+    BBInfo queued(0x1000, bb);
+    lifter.pendingLoopGeneralizationAddresses.insert(0x1000);
+    lifter.stackBypassGeneralizedLoopAddresses.insert(0x1000);
+    lifter.addUnvisitedAddr(queued);
+
+    BBInfo out;
+    if (!lifter.getUnvisitedAddr(out)) {
+      details = "  failed to dequeue pending generalized loop header with bypass tag\n";
+      return false;
+    }
+    if (!lifter.bypassStackConcolicTracking) {
+      details = "  generalized loop with bypass tag should enable stack concolic-tracking bypass\n";
+      return false;
+    }
+    if (lifter.currentBlockRestoreMode != LifterUnderTest::BlockRestoreMode::GeneralizedLoop) {
+      details = "  generalized loop with bypass tag should use generalized restore mode\n";
+      return false;
+    }
+    return true;
+  }
+
   int runCustomKnownBitsTests(const std::string& suiteFilter) {
     int failures = 0;
 
@@ -466,6 +537,14 @@ private:
              &InstructionTester::runScasRepeatPrefixesRejected);
     runCustom("loop_addrsize_override_rejected",
              &InstructionTester::runLoopAddressSizeOverrideRejected);
+    runCustom("loop_generalization_direct_jump_blocked",
+             &InstructionTester::runLoopGeneralizationDirectJumpBlocked);
+    runCustom("loop_generalization_indirect_jump_blocked",
+             &InstructionTester::runLoopGeneralizationIndirectJumpBlocked);
+    runCustom("generalized_loop_without_bypass_tag_keeps_normal_restore",
+             &InstructionTester::runGeneralizedLoopWithoutBypassTagKeepsNormalRestore);
+    runCustom("generalized_loop_with_bypass_tag_uses_generalized_restore",
+             &InstructionTester::runGeneralizedLoopWithBypassTagUsesGeneralizedRestore);
 
     return failures;
   }
diff --git a/lifter/test/test_vectors/golden_ir_hashes.json b/lifter/test/test_vectors/golden_ir_hashes.json
index 6ade911..fd7758b 100644
--- a/lifter/test/test_vectors/golden_ir_hashes.json
+++ b/lifter/test/test_vectors/golden_ir_hashes.json
@@ -1,44 +1,44 @@
 {
-  "bitchain.ll": "190871c33e72c0aed58e34982cf4604f456c4aeb4cfa8e0e6fbae6854cf84626",
+  "bitchain.ll": "49ebc6c639f207f9e005b829381f98d90a3ca3ab675b62840f576dff40c4eea4",
   "bitchain_no_opts.ll": "591950fc2ec674c5695b48ea37adae84c2134a051182569c8c8127813723610b",
-  "branch.ll": "696e3433ee4611ceeae5807453a6adb30d1e0a607a2ee148438deb73ae167284",
+  "branch.ll": "bae4501e89583cdbc68c6cf68b8ff7f0f4405bded20b23d326be0ffc1920be2e",
   "branch_no_opts.ll": "15a89c851a92d60695d7d6ba2f64c9df85e8aef5fb1ce801232ffda2670234ee",
-  "cmov_chain.ll": "dabdb22fb45b942a1cf6b2b5421d9aa2ce0628f308e9d6a2b1ecf2f7b9400706",
+  "cmov_chain.ll": "5dd4dea830ab793b0663496357a5824cfc04523866563c81527153d18142ebb8",
   "cmov_chain_no_opts.ll": "94c6888c366883ff8b061c74e4176955352993c7ee1f7d97a71b9514a27bb635",
-  "diamond.ll": "0a09adef196a80e97c4a35a92bec21a3f12e4ce2b69cccbb5421cfd3c3ee71fe",
+  "diamond.ll": "8b48ec5043647a04c3ff090052b2f650c8c3d74e873d42b75988f08e42017c55",
   "diamond_no_opts.ll": "a1dac2044c8fec3b3de31816609b7c9f9e07a435059df22bb8b128a9dd60dcbb",
-  "indirect.ll": "1cc02a9b634de225515781ad9b574eedaae174084caa966e0a5c8d9a43c0bde2",
+  "indirect.ll": "c910df49a7090493f4780dcece069b66a142d2eb3aa5d9648a6888d35efab3bc",
   "indirect_no_opts.ll": "815d99bf0361c32f7858fd51709ad1c542b1ac7001b938e07fb7f4998199487e",
-  "instr_add.ll": "d36f781fd89faddb92ce9bb790c29c910816f7b51f096461cd6ba148fe7cd85c",
+  "instr_add.ll": "aaaad11b8ba64b587b5337bbb1f375ee3261af2da5a2e18ac4e564c71797272f",
   "instr_add_no_opts.ll": "db5fa506702528a0c6886f774496799d3e976dc502d79f295e5626c760d9db45",
-  "instr_rol.ll": "40f1a3d47a7fe86069aed64d374e29fa93615fbe1692b29f941c1249fb36ee9e",
+  "instr_rol.ll": "29c80b0e1ab7c174814581b908f55373ef7b9a4f50007c7ac82b4d241e3e3059",
   "instr_rol_no_opts.ll": "77ec6301a5fbc3607556871e165607a5b153fae331b6d8de7a9d8060a03db8d2",
-  "instr_sub.ll": "a0ad51a8053e5483e0e4937669b818fd9d53184c0c6bfe0018faddb2dcbb2537",
+  "instr_sub.ll": "406d174d1a589755adbc3ffb9fa725e7d3152bccfbc25c5765916946e88a00a0",
   "instr_sub_no_opts.ll": "4062c1f098be5b7123ce1add00c199ccd347d88605904b97f3c77ea4a546d6a5",
-  "instr_xor.ll": "a0899e35637978c3ff61bd01da3c8f582df4d3e9a688a88a47f606fd41dd8528",
+  "instr_xor.ll": "207426aa7acd94fa1e6fc6c14495cd261d31a37f1610d06b24513a0432753af8",
   "instr_xor_no_opts.ll": "2d664b454bd033c926efa2e392dadf3ad8b64232781ddbece4f7aa655eaa21bb",
-  "jumptable_basic.ll": "f4fd0636e0ccdf9f14dcf6fb5fc0c1bb04ce0fa280f71e58c7fa5a9570140c74",
+  "jumptable_basic.ll": "d0bb5a18c0e633254a20addb813395f3028af044a9e189ccb3e9257e7b60e8d8",
   "jumptable_basic_no_opts.ll": "8aedf3f7017c7adeb55532b2660f06c9e75deeac965cf588c23d39ee64393bac",
-  "jumptable_computation.ll": "b318bb18b0a86622938e449ccd6984421c9fbb78121dc21097c6985c253348aa",
+  "jumptable_computation.ll": "aab5e575e99551e161a56961f0e9304fe54ee7fc636e5a80d0c88fe4891a9f70",
   "jumptable_computation_no_opts.ll": "10e0024697f311e60aa33bf839224311ce36c1f7bf70e8a26fd03f7dacc876ea",
-  "jumptable_dense.ll": "2c2ebb12f6cff955a7eedcb5f8e1cf528d80f0ce4f29574a17a983b5192bad40",
+  "jumptable_dense.ll": "de5a371ea0c637e906e48c9b18428191abd3daa9d9e2606b6d2f2db7a6d375e0",
   "jumptable_dense_no_opts.ll": "e926d3d9f08b32fa8ea7fdfdaa6e53bdc4cc099f2ef37c1fec35d4769514b667",
-  "jumptable_rel32.ll": "64c1d0419bbc1b959ee17cd79b124620a5742554a738782336374fbe1d0d3381",
+  "jumptable_rel32.ll": "a6b3d2ace25f1a7145199329894327795a83822a0eb5ca21d7c47b6df750ac8f",
   "jumptable_rel32_no_opts.ll": "c218374468d39f9460acf43a8613a1d41ece6de51c2aba74f480fc156b0e5eba",
-  "jumptable_shared_targets.ll": "21e30aa659bedff732e2823885f3cff7b02c911d0b5684223caf1210dc4008da",
+  "jumptable_shared_targets.ll": "6945b0311cb05aa72bac1a49b9a91181f9f1690218ac82d99c0fcbb8a9e2a57c",
   "jumptable_shared_targets_no_opts.ll": "1a743479e69daca957adc7ff83ab199ed2dacd8605f6d9955245cb9763082545",
-  "jumptable_shifted.ll": "1463b1177e56bc2575763d47ea3abf330a12e57e36157382f3d5b7f5e812379c",
+  "jumptable_shifted.ll": "d716c2f174aa5c4fc6b66ad78c75281cee102bc0c4594595ecf373fbea4574bc",
   "jumptable_shifted_no_opts.ll": "66a17e146bc7ff9e230b39b795c6e8df0f4b273496d77f301629908bc800f13c",
-  "loop_simple.ll": "750a8531c5452625c83a69c551f7531fe85441040ed54e240de1f9ace158c1af",
+  "loop_simple.ll": "b7917f726dd1a0f888169b067de655ce64e35ca5dd98468cbbd3400556fd56c8",
   "loop_simple_no_opts.ll": "40dfac372a06bdd1eae6f1eab2d60fe6eb8bd4bd336be3f96df281e777cd0be1",
-  "multi_arg.ll": "0f5d869097e562c53dfd0bc80f4b9b348ed29515381c23629a95fc7c92d8cb8c",
+  "multi_arg.ll": "575a6ee62bcd21b1b086a571e8db14716b650b46247b543167adfceef1e43cf1",
   "multi_arg_no_opts.ll": "bb8acf684a8b1864d479becfb7800b1185faece53bbf245d80f98046fe32c212",
-  "nested_branch.ll": "764741f189b4f1d87dd46049ae71dcb0b6e6febd706f3de49c6eff43037e2b80",
+  "nested_branch.ll": "7bbad0569e66d3110b4ccfb78ab043d58bb4ec385c81e84a830f5947e139e7c2",
   "nested_branch_no_opts.ll": "548b78021778c88e91ad3051eae99820ca25637800a4c0706ef0e16a48378f9d",
-  "stack.ll": "104b1a1438eeee91ef6b7d2afb704d25359ce36ca568e1fba75504058ee6d851",
+  "stack.ll": "589ec91bcd6622d69b6bed7db934d2055e0f24f84f7bcd4dfc37ae41d2563503",
   "stack_no_opts.ll": "f9b5604b7483f86702a22314d42e94eeea304a8c1dd8dc3020e4bd2dbc19a88a",
-  "switch_3way.ll": "4bf632634fead328623781db39604772638f9e6215cd8141f0084a4c6cd9191d",
+  "switch_3way.ll": "b278934d7d7a739156d36b4c236a6ed580410e2fea3fd8b5519416d7ecd41359",
   "switch_3way_no_opts.ll": "e9ec881de207f327ecf7692b209e441a3102a6169ebda367477c01042806dd17",
-  "switch_sparse.ll": "b402e7e00321179e9fa29039d9952ad5091aeb99f3c2020e11514b9167069aa3",
+  "switch_sparse.ll": "d6eab54b12ed42d52ec5ff42abe0e969280b187da3368b08f64ec361b0bd7106",
   "switch_sparse_no_opts.ll": "6794aa97870130125ed58decbfdcf4b2a50414719c7268ef304b420b33d985ed"
 }
diff --git a/scripts/dev/profile_simple_vmp.ps1 b/scripts/dev/profile_simple_vmp.ps1
index 0cc2afd..85d63fd 100644
--- a/scripts/dev/profile_simple_vmp.ps1
+++ b/scripts/dev/profile_simple_vmp.ps1
@@ -1,27 +1,84 @@
+param(
+  [string]$LifterPath = '',
+  [string[]]$Filter = @(),
+  [switch]$Validate
+)
+
 $ErrorActionPreference = 'Stop'
 
 $repo = Split-Path -Parent (Split-Path -Parent $PSScriptRoot)
-$lifter = Join-Path $repo 'build_zydis\lifter.exe'
+$simpleDir = Join-Path $repo 'simple'
 $diagPath = Join-Path $repo 'output_diagnostics.json'
+$outputLl = Join-Path $repo 'output.ll'
+$outputNoOptsLl = Join-Path $repo 'output_no_opts.ll'
+
+function Resolve-LifterPath {
+  param([string]$Candidate)
+
+  if ($Candidate) {
+    if (!(Test-Path $Candidate)) {
+      throw "Lifter not found at '$Candidate'"
+    }
+    return $Candidate
+  }
+
+  $preferred = @(
+    (Join-Path $repo 'build_iced\lifter.exe'),
+    (Join-Path $repo 'build_zydis\lifter.exe')
+  )
+
+  foreach ($path in $preferred) {
+    if (Test-Path $path) {
+      return $path
+    }
+  }
+
+  throw "No lifter build found. Expected build_iced\lifter.exe or build_zydis\lifter.exe"
+}
+
+$lifter = Resolve-LifterPath $LifterPath
 
 $targets = @(
-  @{ name = 'simple_unprotected'; exe = 'C:/Users/Yusuf/Desktop/mergenrewrite/Mergen/simple/simple_target.exe'; addr = '0x1400113CA'; timeoutSec = 30 },
-  @{ name = 'simple_vmp381_one_vm'; exe = 'C:/Users/Yusuf/Desktop/mergenrewrite/Mergen/simple/protected381/simple_target_one_vm.vmp38.exe'; addr = '0x1400113CA'; timeoutSec = 60 },
-  @{ name = 'simple_vmp381_full'; exe = 'C:/Users/Yusuf/Desktop/mergenrewrite/Mergen/simple/protected381/simple_target.vmp.exe'; addr = '0x1400113CA'; timeoutSec = 60 },
-  @{ name = 'simple_vmp36'; exe = 'C:/Users/Yusuf/Desktop/mergenrewrite/Mergen/simple/protected/simple_target_protected.vmp.exe'; addr = '0x14009E2E1'; timeoutSec = 90 }
+  @{ name = 'simple_unprotected'; exe = (Join-Path $simpleDir 'simple_target.exe'); addr = '0x1400113CA'; timeoutSec = 30; required = $false },
+  @{ name = 'simple_vmp381_one_vm'; exe = (Join-Path $simpleDir 'protected381\simple_target_one_vm.vmp38.exe'); addr = '0x1400113CA'; timeoutSec = 60; required = $true },
+  @{ name = 'simple_vmp381_full'; exe = (Join-Path $simpleDir 'protected381\simple_target.vmp.exe'); addr = '0x1400113CA'; timeoutSec = 60; required = $true },
+  @{ name = 'simple_vmp36'; exe = (Join-Path $simpleDir 'protected\simple_target_protected.vmp.exe'); addr = '0x14009E2E1'; timeoutSec = 90; required = $false }
 )
 
+if ($Filter.Count -gt 0) {
+  $targets = @(
+    $targets | Where-Object {
+      $target = $_
+      @($Filter | Where-Object { $target.name -like ('*' + $_ + '*') }).Count -gt 0
+    }
+  )
+}
+
+if ($targets.Count -eq 0) {
+  throw 'No VMP targets matched the requested filter.'
+}
+
+if ($Validate -and (@($targets | Where-Object { $_.required })).Count -eq 0) {
+  throw 'Validation requires at least one required VMP 3.8.x target in the filtered target set.'
+}
+
+
 function Run-Target($target) {
   if (!(Test-Path $target.exe)) {
-    return [pscustomobject]@{ Name = $target.name; Status = 'missing'; WallMs = $null; TotalMs = $null; PeSetup = $null; SignatureSearch = $null; Lift = $null; WriteUnoptIr = $null; Optimization = $null; WriteOptIr = $null; Note = 'binary not found' }
+    return [pscustomobject]@{
+      Name = $target.name; Required = [bool]$target.required; Status = 'missing'; WallMs = $null; TotalMs = $null
+      PeSetup = $null; SignatureSearch = $null; Lift = $null; WriteUnoptIr = $null
+      Optimization = $null; WriteOptIr = $null; Errors = $null; Warnings = $null
+      BlocksAttempted = $null; BlocksCompleted = $null; Note = 'binary not found'
+    }
   }
 
-  Remove-Item $diagPath -Force -ErrorAction SilentlyContinue
+  Remove-Item $diagPath, $outputLl, $outputNoOptsLl -Force -ErrorAction SilentlyContinue
 
   $job = Start-Job -ScriptBlock {
     param($repoPath, $lifterPath, $exePath, $addr)
     Set-Location $repoPath
-    & $lifterPath $exePath $addr | Out-Null
+    & $lifterPath $exePath $addr *> $null
     return $LASTEXITCODE
   } -ArgumentList $repo, $lifter, $target.exe, $target.addr
 
@@ -32,23 +89,51 @@ function Run-Target($target) {
   if ($null -eq $completed) {
     Stop-Job $job | Out-Null
     Remove-Job $job | Out-Null
-    return [pscustomobject]@{ Name = $target.name; Status = 'timeout'; WallMs = $sw.Elapsed.TotalMilliseconds; TotalMs = $null; PeSetup = $null; SignatureSearch = $null; Lift = $null; WriteUnoptIr = $null; Optimization = $null; WriteOptIr = $null; Note = ('timed out after ' + $target.timeoutSec + 's') }
+    return [pscustomobject]@{
+      Name = $target.name; Required = [bool]$target.required; Status = 'timeout'; WallMs = $sw.Elapsed.TotalMilliseconds; TotalMs = $null
+      PeSetup = $null; SignatureSearch = $null; Lift = $null; WriteUnoptIr = $null
+      Optimization = $null; WriteOptIr = $null; Errors = $null; Warnings = $null
+      BlocksAttempted = $null; BlocksCompleted = $null; Note = ('timed out after ' + $target.timeoutSec + 's')
+    }
   }
 
   $exitCode = Receive-Job $job
   Remove-Job $job | Out-Null
 
   if ($exitCode -ne 0) {
-    return [pscustomobject]@{ Name = $target.name; Status = 'failed'; WallMs = $sw.Elapsed.TotalMilliseconds; TotalMs = $null; PeSetup = $null; SignatureSearch = $null; Lift = $null; WriteUnoptIr = $null; Optimization = $null; WriteOptIr = $null; Note = ('exit code ' + $exitCode) }
+    return [pscustomobject]@{
+      Name = $target.name; Required = [bool]$target.required; Status = 'failed'; WallMs = $sw.Elapsed.TotalMilliseconds; TotalMs = $null
+      PeSetup = $null; SignatureSearch = $null; Lift = $null; WriteUnoptIr = $null
+      Optimization = $null; WriteOptIr = $null; Errors = $null; Warnings = $null
+      BlocksAttempted = $null; BlocksCompleted = $null; Note = ('exit code ' + $exitCode)
+    }
   }
 
-  if (!(Test-Path $diagPath)) {
-    return [pscustomobject]@{ Name = $target.name; Status = 'no_diagnostics'; WallMs = $sw.Elapsed.TotalMilliseconds; TotalMs = $null; PeSetup = $null; SignatureSearch = $null; Lift = $null; WriteUnoptIr = $null; Optimization = $null; WriteOptIr = $null; Note = 'missing output_diagnostics.json' }
+  if (!(Test-Path $diagPath) -or !(Test-Path $outputLl) -or !(Test-Path $outputNoOptsLl)) {
+    return [pscustomobject]@{
+      Name = $target.name; Required = [bool]$target.required; Status = 'missing_output'; WallMs = $sw.Elapsed.TotalMilliseconds; TotalMs = $null
+      PeSetup = $null; SignatureSearch = $null; Lift = $null; WriteUnoptIr = $null
+      Optimization = $null; WriteOptIr = $null; Errors = $null; Warnings = $null
+      BlocksAttempted = $null; BlocksCompleted = $null; Note = 'missing output_diagnostics.json/output.ll/output_no_opts.ll'
+    }
   }
 
   $json = Get-Content $diagPath -Raw | ConvertFrom-Json
+  $errorCount = 0
+  $warningCount = 0
+  if ($json.summary) {
+    $errorCount = [int]$json.summary.error
+    $warningCount = [int]$json.summary.warning
+  }
+
+  $note = ''
+  if ([int]$json.lift_stats.blocks_completed -eq 0) {
+    $note = '0 completed blocks; inspect output.ll before trusting result'
+  }
+
   return [pscustomobject]@{
     Name = $target.name
+    Required = [bool]$target.required
     Status = 'ok'
     WallMs = $sw.Elapsed.TotalMilliseconds
     TotalMs = [double]$json.total_ms
@@ -58,7 +143,11 @@ function Run-Target($target) {
     WriteUnoptIr = [double]$json.profile.write_unopt_ir
     Optimization = [double]$json.profile.optimization
     WriteOptIr = [double]$json.profile.write_opt_ir
-    Note = ''
+    Errors = $errorCount
+    Warnings = $warningCount
+    BlocksAttempted = [int]$json.lift_stats.blocks_attempted
+    BlocksCompleted = [int]$json.lift_stats.blocks_completed
+    Note = $note
   }
 }
 
@@ -70,9 +159,46 @@ $results = foreach ($target in $targets) {
 Write-Host ''
 Write-Host '========== SIMPLE TARGET PROFILE =========='
 foreach ($r in $results) {
+  $tier = if ($r.Required) { 'gate' } else { 'best-effort' }
   if ($r.Status -eq 'ok') {
-    Write-Host ('{0,-22} wall={1,10:F3} ms total={2,10:F3} ms opt={3,10:F3} ms lift={4,8:F3} ms sig={5,8:F3} ms write_u={6,8:F3} ms write_o={7,8:F3} ms' -f $r.Name, $r.WallMs, $r.TotalMs, $r.Optimization, $r.Lift, $r.SignatureSearch, $r.WriteUnoptIr, $r.WriteOptIr)
+    $noteSuffix = ''
+    if ($r.Note) {
+      $noteSuffix = (' note=' + $r.Note)
+    }
+    Write-Host ('{0,-22} tier={1,-11} wall={2,10:F3} ms total={3,10:F3} ms opt={4,10:F3} ms lift={5,8:F3} ms sig={6,8:F3} ms write_u={7,8:F3} ms write_o={8,8:F3} ms errs={9} warns={10} blocks={11}/{12}{13}' -f $r.Name, $tier, $r.WallMs, $r.TotalMs, $r.Optimization, $r.Lift, $r.SignatureSearch, $r.WriteUnoptIr, $r.WriteOptIr, $r.Errors, $r.Warnings, $r.BlocksCompleted, $r.BlocksAttempted, $noteSuffix)
   } else {
-    Write-Host ('{0,-22} status={1} note={2} wall={3}' -f $r.Name, $r.Status, $r.Note, $r.WallMs)
+    Write-Host ('{0,-22} tier={1,-11} status={2} note={3} wall={4}' -f $r.Name, $tier, $r.Status, $r.Note, $r.WallMs)
   }
 }
+
+if ($Validate) {
+  $hasHardRegression = {
+    param($result)
+    return $result.Status -ne 'ok' -or $result.Errors -gt 0 -or
+      ($result.Status -eq 'ok' -and $result.BlocksCompleted -le 0)
+  }
+  $failures = @(
+    $results | Where-Object { $_.Required -and (& $hasHardRegression $_) }
+  )
+  $bestEffortIssues = @(
+    $results | Where-Object { -not $_.Required -and (& $hasHardRegression $_) }
+  )
+  if ($bestEffortIssues.Count -gt 0) {
+    Write-Host ''
+    Write-Host 'Best-effort VMP targets with issues:'
+    foreach ($issue in $bestEffortIssues) {
+      Write-Host ('- {0}: status={1} errors={2} note={3}' -f $issue.Name, $issue.Status, $issue.Errors, $issue.Note)
+    }
+  }
+  if ($failures.Count -gt 0) {
+    Write-Host ''
+    Write-Host 'VMP validation failed for required targets:'
+    foreach ($failure in $failures) {
+      Write-Host ('- {0}: status={1} errors={2} note={3}' -f $failure.Name, $failure.Status, $failure.Errors, $failure.Note)
+    }
+    exit 1
+  }
+
+  Write-Host ''
+  Write-Host ('VMP validation passed for {0} required target(s).' -f (@($results | Where-Object { $_.Required })).Count)
+}
diff --git a/scripts/rewrite/instruction_microtests.json b/scripts/rewrite/instruction_microtests.json
index a64da7f..b7b7a09 100644
--- a/scripts/rewrite/instruction_microtests.json
+++ b/scripts/rewrite/instruction_microtests.json
@@ -94,6 +94,70 @@
         { "expected": 6, "label": "constant: 3+2+1" }
       ]
     },
+    {
+      "name": "dummy_vm_loop",
+      "symbol": "dummy_vm_loop_target",
+      "patterns": [
+        { "line_all": ["and i32", ", 1"] },
+        { "line_all": ["and i32", ", 7"] },
+        "phi i64",
+        "select i1",
+        "icmp ugt i32",
+        "br i1"
+      ],
+      "semantic": [
+        { "inputs": { "RCX": 0 }, "expected": 40, "label": "even opcode takes constant handler" },
+        { "inputs": { "RCX": 1 }, "expected": 0,  "label": "odd opcode loop with limit 1 returns 0" },
+        { "inputs": { "RCX": 3 }, "expected": 3,  "label": "odd opcode loop: 0+1+2" },
+        { "inputs": { "RCX": 5 }, "expected": 10, "label": "odd opcode loop: 0+1+2+3+4" },
+        { "inputs": { "RCX": 7 }, "expected": 21, "label": "odd opcode loop: 0..6" },
+        { "inputs": { "RCX": 8 }, "expected": 40, "label": "even opcode ignores masked loop handler" }
+      ]
+    },
+    {
+      "name": "bytecode_vm_loop",
+      "symbol": "bytecode_vm_loop_target",
+      "patterns": [
+        { "line_all": ["and i32", ", 1"] },
+        { "line_all": ["and i32", ", 7"] },
+        "select i1",
+        "phi i64",
+        "icmp ugt i32",
+        "br i1"
+      ],
+      "semantic": [
+        { "inputs": { "RCX": 0 }, "expected": 40, "label": "even program returns constant handler" },
+        { "inputs": { "RCX": 1 }, "expected": 0,  "label": "odd bytecode loop limit 1 returns 0" },
+        { "inputs": { "RCX": 3 }, "expected": 3,  "label": "odd bytecode loop: 0+1+2" },
+        { "inputs": { "RCX": 5 }, "expected": 10, "label": "odd bytecode loop: 0+1+2+3+4" },
+        { "inputs": { "RCX": 7 }, "expected": 21, "label": "odd bytecode loop: 0..6" },
+        { "inputs": { "RCX": 8 }, "expected": 40, "label": "even program ignores odd loop body" }
+      ]
+    },
+    {
+      "name": "stack_vm_loop",
+      "symbol": "stack_vm_loop_target",
+      "skip": true,
+      "skip_reason": "Safe VMP-mode lifting currently disables loop-header generalization; this stack-based VM loop still exceeds the block budget without it.",
+      "patterns": [
+        { "line_all": ["and i32", ", 1"] },
+        { "line_all": ["and i32", ", 7"] },
+        "switch i32",
+        "phi i32",
+        "sub i32",
+        "select i1",
+        "add i32",
+        "ret i64"
+      ],
+      "semantic": [
+        { "inputs": { "RCX": 0 }, "expected": 40, "label": "even program returns constant handler" },
+        { "inputs": { "RCX": 1 }, "expected": 0,  "label": "odd stack loop limit 1 returns 0" },
+        { "inputs": { "RCX": 3 }, "expected": 3,  "label": "odd stack loop: 0+1+2" },
+        { "inputs": { "RCX": 5 }, "expected": 10, "label": "odd stack loop: 0+1+2+3+4" },
+        { "inputs": { "RCX": 7 }, "expected": 21, "label": "odd stack loop: 0..6" },
+        { "inputs": { "RCX": 8 }, "expected": 40, "label": "even program ignores odd loop body" }
+      ]
+    },
     {
       "name": "bitchain",
       "symbol": "bitchain_target",
@@ -198,6 +262,21 @@
         { "expected": 150, "label": "constant: 10+20+30+40+50" }
       ]
     },
+    {
+      "name": "calc_sum_to_n",
+      "symbol": "calc_sum_to_n",
+      "skip": true,
+      "skip_reason": "Safe VMP-mode lifting currently disables loop-header generalization; this counted-loop sample still explodes the block budget without it.",
+      "patterns": ["phi i32", "icmp slt i32", "add i32", "br i1"],
+      "semantic": [
+        { "inputs": { "RCX": 0 },   "expected": 0,   "label": "n=0" },
+        { "inputs": { "RCX": 1 },   "expected": 0,   "label": "n=1" },
+        { "inputs": { "RCX": 5 },   "expected": 10,  "label": "0+1+2+3+4" },
+        { "inputs": { "RCX": 10 },  "expected": 45,  "label": "0..9" },
+        { "inputs": { "RCX": 32 },  "expected": 496, "label": "0..31" },
+        { "inputs": { "RCX": 100 }, "expected": 496, "label": "clamped to 32" }
+      ]
+    },
     {
       "name": "switch_3way",
       "symbol": "switch_3way_target",
diff --git a/test.py b/test.py
index a5d9c08..497c0cb 100644
--- a/test.py
+++ b/test.py
@@ -239,6 +239,21 @@ def run_semantic(filters: List[str] | None = None, input_ir: Path | None = None)
     _run(args)
 
 
+def run_vmp(filter_tokens: List[str]) -> None:
+    args = [
+        "powershell",
+        "-NoProfile",
+        "-ExecutionPolicy",
+        "Bypass",
+        "-File",
+        str(ROOT / "scripts" / "dev" / "profile_simple_vmp.ps1"),
+        "-Validate",
+    ]
+    if filter_tokens:
+        args.extend(["-Filter", *filter_tokens])
+    _run(args)
+
+
 def run_negative_checks() -> None:
     lifter_path = ROOT / "build_iced" / "lifter.exe"
     if not lifter_path.exists():
@@ -403,6 +418,11 @@ def parse_args() -> argparse.Namespace:
     semantic = sub.add_parser("semantic", help="run runtime semantic regression for all samples")
     semantic.add_argument("--input-ir", type=Path, default=None, help="override IR file (single sample)")
     semantic.add_argument("filter", nargs="*", help="optional sample name filter tokens")
+    vmp = sub.add_parser(
+        "vmp",
+        help="lift and validate local VMP targets (recommended for big control-flow/semantics changes)",
+    )
+    vmp.add_argument("filter", nargs="*", help="optional VMP target name filter tokens")
     return parser.parse_args()
 
 
@@ -459,6 +479,9 @@ def main() -> None:
         run_semantic(args.filter, args.input_ir)
         return
 
+    if command == "vmp":
+        run_vmp(args.filter)
+        return
 
     if command == "flags":
         run_flagstress(args.filter)
diff --git a/testcases/rewrite_smoke/bytecode_vm_loop.c b/testcases/rewrite_smoke/bytecode_vm_loop.c
new file mode 100644
index 0000000..153e808
--- /dev/null
+++ b/testcases/rewrite_smoke/bytecode_vm_loop.c
@@ -0,0 +1,59 @@
+/* Compiler-friendly VM with the loop implemented in VM program-counter state.
+ * Lift target: bytecode_vm_loop_target.
+ * Goal: keep the loop inside interpreter state instead of native source control
+ * flow, while avoiding external bytecode loads and compiler jump tables.
+ */
+#include <stdio.h>
+
+enum FriendlyVmPc {
+    VM_EVEN_CONST = 0,
+    VM_EVEN_HALT = 1,
+    VM_ODD_LOAD_LIMIT = 10,
+    VM_ODD_CLEAR_ACC = 11,
+    VM_ODD_CLEAR_INDEX = 12,
+    VM_ODD_CHECK = 13,
+    VM_ODD_BODY = 14,
+    VM_ODD_HALT = 15,
+};
+
+__declspec(noinline)
+int bytecode_vm_loop_target(int x) {
+    int pc = (x & 1) ? VM_ODD_LOAD_LIMIT : VM_EVEN_CONST;
+    int acc = 0;
+    int index = 0;
+    int limit = 0;
+
+    while (1) {
+        if (pc == VM_EVEN_CONST) {
+            acc = 40;
+            pc = VM_EVEN_HALT;
+        } else if (pc == VM_EVEN_HALT) {
+            return acc;
+        } else if (pc == VM_ODD_LOAD_LIMIT) {
+            limit = x & 7;
+            pc = VM_ODD_CLEAR_ACC;
+        } else if (pc == VM_ODD_CLEAR_ACC) {
+            acc = 0;
+            pc = VM_ODD_CLEAR_INDEX;
+        } else if (pc == VM_ODD_CLEAR_INDEX) {
+            index = 0;
+            pc = VM_ODD_CHECK;
+        } else if (pc == VM_ODD_CHECK) {
+            pc = (index < limit) ? VM_ODD_BODY : VM_ODD_HALT;
+        } else if (pc == VM_ODD_BODY) {
+            acc += index;
+            index += 1;
+            pc = VM_ODD_CHECK;
+        } else if (pc == VM_ODD_HALT) {
+            return acc;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("bytecode_vm_loop(5)=%d bytecode_vm_loop(8)=%d\n",
+           bytecode_vm_loop_target(5), bytecode_vm_loop_target(8));
+    return 0;
+}
diff --git a/testcases/rewrite_smoke/calc_sum_to_n.c b/testcases/rewrite_smoke/calc_sum_to_n.c
new file mode 100644
index 0000000..36662b9
--- /dev/null
+++ b/testcases/rewrite_smoke/calc_sum_to_n.c
@@ -0,0 +1,23 @@
+/* Symbolic trip-count counted loop.
+ * Lift target: calc_sum_to_n — symbolic loop bound with a clamp.
+ * Goal: preserve real loop structure (phi/backedge/compare), not constant-fold.
+ */
+#include <stdio.h>
+
+__declspec(noinline)
+int calc_sum_to_n(int n) {
+    if (n > 32)
+        n = 32;
+
+    int sum = 0;
+    for (int i = 0; i < n; i++)
+        sum += i;
+
+    return sum;
+}
+
+int main(void) {
+    printf("sum_to_n(5)=%d sum_to_n(10)=%d\n",
+           calc_sum_to_n(5), calc_sum_to_n(10));
+    return 0;
+}
diff --git a/testcases/rewrite_smoke/dummy_vm_loop.c b/testcases/rewrite_smoke/dummy_vm_loop.c
new file mode 100644
index 0000000..5e2c7e9
--- /dev/null
+++ b/testcases/rewrite_smoke/dummy_vm_loop.c
@@ -0,0 +1,39 @@
+/* Tiny dummy-VM-style state machine around a real local loop.
+ * Lift target: dummy_vm_loop_target.
+ * Goal: keep a VM-shaped dispatch shell while preserving a normal counted loop
+ * inside one handler, so loop-generalization regressions cannot silently
+ * collapse it into unresolved control flow.
+ */
+#include <stdio.h>
+
+__declspec(noinline)
+int dummy_vm_loop_target(int x) {
+    int opcode = x & 1;
+    int acc = 0;
+
+    while (1) {
+        switch (opcode) {
+        case 0:
+            acc = 40;
+            opcode = 2;
+            break;
+        case 1: {
+            int limit = x & 7;
+            for (int i = 0; i < limit; i++)
+                acc += i;
+            opcode = 2;
+            break;
+        }
+        case 2:
+            return acc;
+        default:
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("dummy_vm_loop(5)=%d dummy_vm_loop(8)=%d\n",
+           dummy_vm_loop_target(5), dummy_vm_loop_target(8));
+    return 0;
+}
diff --git a/testcases/rewrite_smoke/stack_vm_loop.c b/testcases/rewrite_smoke/stack_vm_loop.c
new file mode 100644
index 0000000..30a1bbc
--- /dev/null
+++ b/testcases/rewrite_smoke/stack_vm_loop.c
@@ -0,0 +1,119 @@
+/* Harsher stack-based VM with explicit push/pop/add/sub/jnz-style states.
+ * Lift target: stack_vm_loop_target.
+ * Goal: keep the loop entirely in VM state while modeling a more realistic
+ * stack interpreter than the compiler-friendly register/local VM.
+ *
+ * This version keeps the stack explicit but collapses bookkeeping-only microstates
+ * and uses fixed 2-slot stack transitions so the sample remains lli-executable
+ * without reintroducing the branchy per-slot dispatcher forest that hit budget 503.
+ */
+#include <stdio.h>
+
+#define VM_PUSH0(VALUE)                                                          \
+    do {                                                                        \
+        s0 = (VALUE);                                                           \
+        sp = 1;                                                                 \
+    } while (0)
+
+#define VM_PUSH1(VALUE)                                                          \
+    do {                                                                        \
+        s1 = (VALUE);                                                           \
+        sp = 2;                                                                 \
+    } while (0)
+
+#define VM_POP1(OUT)                                                             \
+    do {                                                                        \
+        (OUT) = s1;                                                             \
+        sp = 1;                                                                 \
+    } while (0)
+
+#define VM_POP0(OUT)                                                             \
+    do {                                                                        \
+        (OUT) = s0;                                                             \
+        sp = 0;                                                                 \
+    } while (0)
+
+enum StackVmPc {
+    VM_EVEN_PUSH_40 = 0,
+    VM_EVEN_HALT = 1,
+
+    VM_ODD_INIT_LIMIT = 10,
+    VM_ODD_INIT_ACC = 11,
+    VM_ODD_INIT_INDEX = 12,
+    VM_ODD_SUB_JNZ = 13,
+    VM_ODD_BODY_ACC = 14,
+    VM_ODD_BODY_INDEX = 15,
+    VM_ODD_HALT = 16,
+};
+
+__declspec(noinline)
+int stack_vm_loop_target(int x) {
+    int sp = 0;
+    int s0 = 0;
+    int s1 = 0;
+    int acc = 0;
+    int index = 0;
+    int limit = 0;
+    int pc = (x & 1) ? VM_ODD_INIT_LIMIT : VM_EVEN_PUSH_40;
+    int lhs = 0;
+    int rhs = 0;
+    int cond = 0;
+
+    while (1) {
+        if (pc == VM_EVEN_PUSH_40) {
+            VM_PUSH0(40);
+            pc = VM_EVEN_HALT;
+        } else if (pc == VM_EVEN_HALT) {
+            VM_POP0(lhs);
+            return lhs;
+        } else if (pc == VM_ODD_INIT_LIMIT) {
+            VM_PUSH0(x & 7);
+            VM_POP0(limit);
+            pc = VM_ODD_INIT_ACC;
+        } else if (pc == VM_ODD_INIT_ACC) {
+            VM_PUSH0(0);
+            VM_POP0(acc);
+            pc = VM_ODD_INIT_INDEX;
+        } else if (pc == VM_ODD_INIT_INDEX) {
+            VM_PUSH0(0);
+            VM_POP0(index);
+            pc = VM_ODD_SUB_JNZ;
+        } else if (pc == VM_ODD_SUB_JNZ) {
+            VM_PUSH0(limit);
+            VM_PUSH1(index);
+            VM_POP1(rhs);
+            VM_POP0(lhs);
+            VM_PUSH0(lhs - rhs);
+            VM_POP0(cond);
+            pc = (cond != 0) ? VM_ODD_BODY_ACC : VM_ODD_HALT;
+        } else if (pc == VM_ODD_BODY_ACC) {
+            VM_PUSH0(acc);
+            VM_PUSH1(index);
+            VM_POP1(rhs);
+            VM_POP0(lhs);
+            VM_PUSH0(lhs + rhs);
+            VM_POP0(acc);
+            pc = VM_ODD_BODY_INDEX;
+        } else if (pc == VM_ODD_BODY_INDEX) {
+            VM_PUSH0(index);
+            VM_PUSH1(1);
+            VM_POP1(rhs);
+            VM_POP0(lhs);
+            VM_PUSH0(lhs + rhs);
+            VM_POP0(index);
+            pc = VM_ODD_SUB_JNZ;
+        } else if (pc == VM_ODD_HALT) {
+            VM_PUSH0(acc);
+            VM_POP0(lhs);
+            return lhs;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("stack_vm_loop(5)=%d stack_vm_loop(8)=%d\n",
+           stack_vm_loop_target(5), stack_vm_loop_target(8));
+    return 0;
+}