Autoresearch/lets create more test cases complex loops with v 20260425 (#203)

* baseline: 3 fully-wired VM samples (dummy/bytecode/stack vm loops)
Result: {"status":"keep","vm_sample_count":3,"total_semantic_cases":177,"manifest_samples":33}

* added 3 toy VM samples: register-machine, nested loops, branchy loop body
Result: {"status":"keep","vm_sample_count":6,"total_semantic_cases":205,"manifest_samples":36}

* added 3 more VM samples: factorial (mul recurrence), collatz (data-dep path), gcd (modulo-driven non-counted loop)
Result: {"status":"keep","vm_sample_count":9,"total_semantic_cases":231,"manifest_samples":39}

* added 3 more VM samples: fibonacci (two-state recurrence), switch-dispatched VM, countdown loop (reverse induction)
Result: {"status":"keep","vm_sample_count":12,"total_semantic_cases":259,"manifest_samples":42}

* added 3 bitwise/multiplicative VM samples: popcount (zero-test loop), power (two symbolic operands), bitreverse (shift+OR fixed trip count)
Result: {"status":"keep","vm_sample_count":15,"total_semantic_cases":289,"manifest_samples":45}

* added 3 VM samples: linear search with early-exit, dual-counter parity split (two phis), XOR accumulator with multiplication
Result: {"status":"keep","vm_sample_count":18,"total_semantic_cases":315,"manifest_samples":48}

* added 2 VM samples: LCG mixed mul/add/mask recurrence and stack-table-driven next-PC dispatch
Result: {"status":"keep","vm_sample_count":20,"total_semantic_cases":335,"manifest_samples":50}

* added vm_callret_loop: VM with explicit return-PC stack, two call sites converging on the same subroutine handler chain
Result: {"status":"keep","vm_sample_count":21,"total_semantic_cases":346,"manifest_samples":51}

* all 49 manifest samples lift and verify against actual IR. Patterns rewritten to match what the lifter emits: switch i32 dispatchers, mul nuw nsw shapes, llvm.bitreverse.i8 intrinsic, mul i33 + lshr i33 closed-form for triangular sums.
Removed 2 samples that exposed real lifter limitations: vm_callret_loop (rstack indirect pc, BB budget exceeded) and vm_switch_dispatch_loop (lifted to constant -1).
Result: {"status":"keep","vm_sample_count":19,"total_semantic_cases":313,"manifest_samples":49}

* 19/19 vm samples now pass both rewrite-regression IR pattern verification AND lli runtime semantic check (168 semantic cases total). Fixed branchy by adding explicit i=0/count=0 init in BV_LOAD_LIMIT (dual_counter pattern); collatz already fixed by collapsing CV_INIT into CV_LOAD_N. Captured all observed lifter limitations in autoresearch.md.
Result: {"status":"keep","vm_sample_count":19,"total_semantic_cases":313,"manifest_samples":49}

* added vm_hamming_loop: bitwise loop with TWO symbolic operands (a=x&0xF, b=(x>>4)&0xF), XOR-then-popcount body. Used the dual_counter init-state pattern from the start so it passed lli semantic check on the first try.
Result: {"status":"keep","vm_sample_count":20,"total_semantic_cases":323,"manifest_samples":50}

* added vm_lfsr_loop: 8-bit Galois LFSR with conditional XOR-and-shift recurrence; symbolic seed and trip count both derived from x. Used dual_counter init pattern up front; passed lift + lli on first attempt.
Result: {"status":"keep","vm_sample_count":21,"total_semantic_cases":333,"manifest_samples":51}

* added vm_rotate_loop: 8-bit left rotation via shl|lshr|or pattern with symbolic value and rotate count. Distinct from existing shift loops in that bits wrap around.
Result: {"status":"keep","vm_sample_count":22,"total_semantic_cases":343,"manifest_samples":52}

* vm_powermod_loop now passes both pattern verification (urem matched) and lli semantic check (11/11 cases). Square-and-multiply modular exponentiation is the most lifter-stressing sample yet: combines bitwise LSB extraction, conditional multiply-and-mod, exponent shift, and base squaring all in one body.
Result: {"status":"keep","vm_sample_count":23,"total_semantic_cases":354,"manifest_samples":53}

* added vm_saturating_loop: counted sum loop with value-clamp at 100; lifter recognizes if-then-set as select; pattern + lli pass on first try
Result: {"status":"keep","vm_sample_count":24,"total_semantic_cases":376,"manifest_samples":54}

* vm_geometric_loop now passes both gates (mask pattern updated to 254). Log2-style doubling loop is distinct from existing additive/multiplicative recurrences.
Result: {"status":"keep","vm_sample_count":25,"total_semantic_cases":386,"manifest_samples":55}

* vm_polynomial_loop now passes both gates with unrolled-shape patterns. Horner method evaluation with stack-array coefficient lookup; lifter unrolls the 4-trip loop into closed-form arithmetic.
Result: {"status":"keep","vm_sample_count":26,"total_semantic_cases":396,"manifest_samples":56}

* vm_digitsum_loop now passes both gates. Decimal digit-sum loop with non-power-of-2 divisor exposes the lifter's divmod fusion (n%10 emitted as n + (n/10)*-10).
Result: {"status":"keep","vm_sample_count":27,"total_semantic_cases":408,"manifest_samples":57}

* added vm_isqrt_loop: Newton's integer square root with division by loop variable. Passes both gates with 15 semantic cases on first try.
Result: {"status":"keep","vm_sample_count":28,"total_semantic_cases":423,"manifest_samples":58}

* added vm_minarray_loop: two-pass VM (fill array, then scan for min) with both data and trip count derived from x. 12 semantic cases pass on first try.
Result: {"status":"keep","vm_sample_count":29,"total_semantic_cases":435,"manifest_samples":59}

* vm_classify_loop now passes 10/10. Refactored to single packed accumulator (acc += 100/10/1) instead of three separate counters - sidesteps the multi-counter phi-undef pattern when several stack slots all init to 0.
Result: {"status":"keep","vm_sample_count":30,"total_semantic_cases":445,"manifest_samples":60}

* vm_carrychain_loop now passes both gates with unrolled-shape patterns. Bit-by-bit ripple carry adder; the 8-trip fixed-bound loop is fully unrolled by the lifter.
Result: {"status":"keep","vm_sample_count":31,"total_semantic_cases":456,"manifest_samples":61}

* added vm_prefix_sum_loop: two-phase VM that fills a stack array then walks it computing in-place running prefix sum (writes back to data[idx] each iteration). Distinct from minarray which only reads on second pass.
Result: {"status":"keep","vm_sample_count":32,"total_semantic_cases":467,"manifest_samples":62}

* vm_pcg_loop now passes both gates (mask 254 fix). LCG state advance + XOR-shift output mixing per iteration; distinct from lcg (mul/add/mask only) and lfsr (shift+conditional XOR only).
Result: {"status":"keep","vm_sample_count":33,"total_semantic_cases":479,"manifest_samples":63}

* added vm_shiftmul_loop: schoolbook shift-and-add multiplication. 8-trip loop with conditional add of (a << i) when bit i of b is set. Passes both gates with 11 semantic cases.
Result: {"status":"keep","vm_sample_count":34,"total_semantic_cases":490,"manifest_samples":64}

* vm_xordecrypt_loop now passes both gates. Three-phase VM (fill, decrypt, sum) over a fixed 8-byte stack buffer; lifter unrolls all three loops but preserves the algebraic identity.
Result: {"status":"keep","vm_sample_count":35,"total_semantic_cases":500,"manifest_samples":65}

* added vm_zigzag_loop: alternating-sign accumulator (parity branch picks add vs sub on a single counter). 11 cases including unsigned wraparound for negative results.
Result: {"status":"keep","vm_sample_count":36,"total_semantic_cases":511,"manifest_samples":66}

* added vm_horner_signed_loop: Horner with signed coefficients [1,-2,3,-4]; tests sign-extended array loads + signed multiply-and-add. 10 cases including unsigned wraparound for negative results.
Result: {"status":"keep","vm_sample_count":37,"total_semantic_cases":521,"manifest_samples":67}

* vm_bittransitions_loop now passes both gates with branchless body + unrolled patterns. Counts adjacent-bit transitions in the low 16 bits via XOR-and-mask.
Result: {"status":"keep","vm_sample_count":38,"total_semantic_cases":532,"manifest_samples":68}

* added vm_piecewise_loop: piecewise linear function (3-way range branch) applied repeatedly to a single accumulator. Distinct from classify (counter) and collatz (2-way branch). 11 semantic cases pass.
Result: {"status":"keep","vm_sample_count":39,"total_semantic_cases":543,"manifest_samples":69}

* vm_modcounter_loop now passes both gates with fixed input. Counter wraps modulo 7 every iteration; symbolic step+counter+iter-count.
Result: {"status":"keep","vm_sample_count":40,"total_semantic_cases":554,"manifest_samples":70}

* added vm_argmax_loop: find INDEX of max element in symbolic-content array. Two co-related state vars (best value + best index) updated together; distinct from minarray which only tracks value.
Result: {"status":"keep","vm_sample_count":41,"total_semantic_cases":565,"manifest_samples":71}

* vm_prefix_xor_loop now passes with low-bit limit and getelementptr pattern. In-place cumulative XOR over symbolic-content stack array.
Result: {"status":"keep","vm_sample_count":42,"total_semantic_cases":576,"manifest_samples":72}

* added vm_palindrome_loop: bitwise palindrome check on low 8 bits with early-exit on mismatch. 14 semantic cases pass.
Result: {"status":"keep","vm_sample_count":43,"total_semantic_cases":590,"manifest_samples":73}

* added vm_caesar_loop: three-phase VM (fill, additive shift, sum) over a stack buffer. Add+mask transform distinct from XOR transform of xordecrypt. 12 semantic cases.
Result: {"status":"keep","vm_sample_count":44,"total_semantic_cases":602,"manifest_samples":74}

* added vm_ca_loop: Rule-90 cellular automaton step (state' = (state<<1) ^ (state>>1)) iterated symbolic times.
Distinct linear bitwise update coupling shifts in both directions. 12 cases.
Result: {"status":"keep","vm_sample_count":45,"total_semantic_cases":614,"manifest_samples":75}

* added vm_djb2_loop: DJB2-style hash recurrence (hash = hash * 33 + nibble) consuming nibbles of x. 12 cases. Multiplicative-then-additive update with per-iteration symbolic input.
Result: {"status":"keep","vm_sample_count":46,"total_semantic_cases":626,"manifest_samples":76}

* added vm_runlength_loop: count distinct runs of 1-bits in low 16 bits with always-write recipe (runs += start_predicate). Sequential dependency on previous bit. 13 cases.
Result: {"status":"keep","vm_sample_count":47,"total_semantic_cases":639,"manifest_samples":77}

* added vm_skiploop_loop: counted loop with continue-style skip on odd iterations; sums squares of even indices. Tests dispatcher transition that bypasses body via parity branch. 11 cases.
Result: {"status":"keep","vm_sample_count":48,"total_semantic_cases":650,"manifest_samples":78}

* added vm_kernighan_loop: Brian Kernighan's popcount trick (v &= v-1 until zero). Trip count equals popcount itself. Distinct termination shape from vm_popcount_loop. 12 cases.
Result: {"status":"keep","vm_sample_count":49,"total_semantic_cases":662,"manifest_samples":79}

* added vm_find2max_loop: track top1 and top2 over a stack array. Three-way update branch: shift the pair / update only top2 / no change. 11 cases. Reached round-50 sample milestone.
Result: {"status":"keep","vm_sample_count":50,"total_semantic_cases":673,"manifest_samples":80}

* added vm_ctz_loop: count trailing zeros (capped at 32). Loop with EARLY BREAK on LSB-set predicate; counter doubles as result. 12 cases.
Result: {"status":"keep","vm_sample_count":51,"total_semantic_cases":685,"manifest_samples":81}

* added vm_dupcount_loop: count adjacent equal nibbles in stack array. Two stack-array loads per iteration (data[i-1] + data[i]) with equality predicate. 11 cases.
Result: {"status":"keep","vm_sample_count":52,"total_semantic_cases":696,"manifest_samples":82}

* vm_hexcount_loop now passes both gates with always-write recipe and zext pattern. Counts hex letter nibbles (>= 10) in 32-bit value. 12 cases.
Result: {"status":"keep","vm_sample_count":53,"total_semantic_cases":708,"manifest_samples":83}

* added vm_stride_loop: counted loop with step-2 induction (idx += 2) summing every other array element. Distinct induction step from skiploop (skip via parity branch). 12 cases.
Result: {"status":"keep","vm_sample_count":54,"total_semantic_cases":720,"manifest_samples":84}

* added vm_runlmax_loop: longest run of 1-bits in low 16 bits. Two co-related state vars (cur, max) updated via always-write recipe (cur = (cur+1)*bit; max = (cur > max) ? cur : max). 12 cases.
Result: {"status":"keep","vm_sample_count":55,"total_semantic_cases":732,"manifest_samples":85}

* added vm_window_loop: 3-element sliding window max-sum over symbolic stack array. Loop body loads three adjacent elements per iteration. 11 cases.
Result: {"status":"keep","vm_sample_count":56,"total_semantic_cases":743,"manifest_samples":86}

* added vm_4state_loop: cyclic 4-operation state machine. Inner state mod 4 picks ADD / XOR / MUL / SUB per iteration. 11 cases.
Result: {"status":"keep","vm_sample_count":57,"total_semantic_cases":754,"manifest_samples":87}

* added vm_imported_abs_loop: VM dispatcher with imported abs() call inside the body. Lifter recognizes abs() and lowers to @llvm.abs.i32 intrinsic; both pattern + lli semantic pass. First sample with a real CRT call inside a VM loop.
Result: {"status":"keep","vm_sample_count":58,"total_semantic_cases":764,"manifest_samples":88}

* added vm_nested_abs_loop: PC-state nested loop with abs() in inner body. Two-deep symbolic loop bounds, abs() called per inner-iteration. Both pattern + lli pass. 11 cases.
Result: {"status":"keep","vm_sample_count":59,"total_semantic_cases":775,"manifest_samples":89}

* added vm_abs_array_loop: two-phase VM where fill loop calls abs() and stores result to stack array, then sum loop reads. Combines imported intrinsic call with same-iter indexed stack store. 11 cases.
Result: {"status":"keep","vm_sample_count":60,"total_semantic_cases":786,"manifest_samples":90}

* added vm_minabs_loop: track minimum abs() distance over a counted loop with comparison-driven select. Combines imported abs() intrinsic with running-min reduction. 11 cases.
Result: {"status":"keep","vm_sample_count":61,"total_semantic_cases":797,"manifest_samples":91}

* added vm_imported_popcnt_loop: __builtin_popcount lowered to @llvm.ctpop.i32 inside VM body. Confirms lifter handles intrinsics other than abs cleanly. 10 cases.
Result: {"status":"keep","vm_sample_count":62,"total_semantic_cases":807,"manifest_samples":92}

* added vm_imported_clz_loop: __builtin_clz lowered to @llvm.ctlz.i32 inside VM body. Third recognized intrinsic shape. 10 cases.
Result: {"status":"keep","vm_sample_count":63,"total_semantic_cases":817,"manifest_samples":93}

* added vm_imported_bswap_loop: __builtin_bswap32 lowered to @llvm.bswap.i32 inside VM body. Fourth recognized intrinsic shape. 11 cases.
Result: {"status":"keep","vm_sample_count":64,"total_semantic_cases":828,"manifest_samples":94}

* added vm_imported_cttz_loop (5th intrinsic, full semantic 11 cases) and vm_outlined_wrapper_loop (integrates user's vm_fibonacci_loop_report.md observation: wrapper -> noinline inner gets outlined as call inttoptr; pattern-verifies but no semantic field since semantic_check strips inttoptr calls leaving undef sum). Documents 10th lifter limitation: same-binary callee not inlined.
Result: {"status":"keep","vm_sample_count":65,"total_semantic_cases":839,"manifest_samples":96}

* added vm_imported_rotl_loop: _rotl lowered to @llvm.fshl.i32 inside VM body.
Sixth recognized intrinsic, with both value and rotate amount per-iteration symbolic. 10 cases. Also extended scope to include docs/semantic_reports/ and the new generate_semantic_reports.py script (added by user externally).
Result: {"status":"keep","vm_sample_count":66,"total_semantic_cases":849,"manifest_samples":97}

* added vm_wrapper_chain_loop: two-level wrapper chain (outer -> middle -> inner), all noinline. Lift target is the outer; pattern verifies call+add, no semantic field (same outline-strip class as vm_outlined_wrapper_loop). Extends outline-detection coverage to multi-level wrappers.
Result: {"status":"keep","vm_sample_count":66,"total_semantic_cases":849,"manifest_samples":98}

* added vm_imported_bsf_loop: _BitScanForward (MSVC intrinsic with output-pointer arg) lowered to @llvm.cttz.i32 inside VM body. 7th recognized intrinsic. Tests output-via-pointer arg pattern - the lifter folds the &bit_index stack store + load into direct value flow. 12 cases.
Result: {"status":"keep","vm_sample_count":67,"total_semantic_cases":861,"manifest_samples":99}

* added vm_imported_bsr_loop: _BitScanReverse (output-pointer arg, lowered to @llvm.ctlz.i32-related). 8th recognized intrinsic. Manifest now exactly 100 entries; run #100 milestone.
Result: {"status":"keep","vm_sample_count":68,"total_semantic_cases":873,"manifest_samples":100}

* added vm_mixed_intrinsics_loop: chains popcount + bswap on the same value per iteration. Both gates pass on all 11 inputs - confirms the chain-of-two-calls correctness bug seen in vm_chain_imports_loop is specific to chains of the SAME intrinsic (abs+abs) rather than general two-call body shapes.
Result: {"status":"keep","vm_sample_count":69,"total_semantic_cases":884,"manifest_samples":101}

* vm_int64_loop now passes both gates with phi i32 pattern. Multiplicative recurrence with int64 acc that the lifter narrows back to i32 since the return masks to 32 bits. Documents the lifter's value-range narrowing behavior. 10 cases.
Result: {"status":"keep","vm_sample_count":70,"total_semantic_cases":894,"manifest_samples":102}

* added vm_shift64_loop: true 64-bit recurrence with Knuth's golden ratio multiplier (won't fit in i32). Lifter retains phi i64 + mul i64 + lshr i64. Confirms 64-bit arithmetic survives the lifter when narrowing is provably wrong. 10 cases.
Result: {"status":"keep","vm_sample_count":71,"total_semantic_cases":904,"manifest_samples":103}

* added vm_byte_loop: i8-narrowed arithmetic recurrence (state * 13 + 5 mod 256). Tests narrower-type lowering inside VM dispatcher. 10 cases.
Result: {"status":"keep","vm_sample_count":72,"total_semantic_cases":914,"manifest_samples":104}

* vm_short_loop now passes both gates with u32 form for negative results. i16 arithmetic recurrence with sign-extending result. 10 cases.
Result: {"status":"keep","vm_sample_count":73,"total_semantic_cases":924,"manifest_samples":105}

* vm_reverse_array_loop now passes both gates with unrolled-shape patterns. Two-array reverse-copy pattern (fill + reverse-copy + pack); both 8-trip loops fully unrolled by lifter. 10 cases.
Result: {"status":"keep","vm_sample_count":74,"total_semantic_cases":934,"manifest_samples":106}

* added vm_2d_loop: 3x3 stack grid with nested PC-state loops; fills via grid[i*3+j], then sums diag and anti-diag at fixed offsets. 10 cases.
Result: {"status":"keep","vm_sample_count":75,"total_semantic_cases":944,"manifest_samples":107}

* vm_byte_buffer_loop now passes both gates with zext-shape patterns. unsigned char buf[16] stack array; fill via (i*7+seed)&0xFF, sum in second pass. First sample with i8-element stack array. 10 cases.
Result: {"status":"keep","vm_sample_count":76,"total_semantic_cases":954,"manifest_samples":108}

* vm_short_array_loop now passes both gates. short buf[8] stack array; fill via signed (short)(seed*(i+1)) with i16 wrap, sum via sext i16 to i32. First sample with i16-element stack array. 10 cases including signed wrap and negative seeds (encoded as u32).
Result: {"status":"keep","vm_sample_count":77,"total_semantic_cases":964,"manifest_samples":109}

* vm_ushort_array_loop passes both gates first try. unsigned short buf[8] stack array; fill via (unsigned short)(seed + i*100), sum via zext i16 to u32. Companion to vm_short_array_loop, distinguishing zext from sext at i16 load sites. 10 cases including u16 wrap and high-bit input.
Result: {"status":"keep","vm_sample_count":78,"total_semantic_cases":974,"manifest_samples":110}

* vm_sbyte_array_loop passes both gates first try. signed char buf[16] stack array; fill via (signed char)(seed*(i-4)), sum via sext i8 to i32. Companion to vm_byte_buffer_loop, distinguishing sext from zext at i8 load sites. 10 cases incl. i8 wrap on high indices and negative seeds (encoded as u32).
Result: {"status":"keep","vm_sample_count":79,"total_semantic_cases":984,"manifest_samples":111}

* vm_u64_array_loop now passes both gates. uint64_t buf[4] stack array; fill via seed*(i+1) + i*0x100000001, sum and return low 32 bits. First sample with i64-element stack array (vs scalar i64 in vm_int64_loop / vm_shift64_loop). 8 cases.
Result: {"status":"keep","vm_sample_count":80,"total_semantic_cases":992,"manifest_samples":112}

* vm_dual_array_loop passes both gates first try. Two simultaneous int[8] stack arrays (a,b); fill loop writes both per index, separate prod loop sums a[i]*b[7-i]. Distinct from single-array samples - exercises two stack frames in flight with paired access. 10 cases incl. INT_MAX wrap.
Result: {"status":"keep","vm_sample_count":81,"total_semantic_cases":1002,"manifest_samples":113}

* vm_mixed_width_array_loop passes both gates first try. Heterogeneous stack frame: int[4] + short[4] + signed char[4] all live simultaneously, filled in one fill loop, summed in a separate loop with sext i16, sext i8, and native i32 loads from the same frame. 12 cases incl. i8/i16 wrap and INT_MAX.
Result: {"status":"keep","vm_sample_count":82,"total_semantic_cases":1014,"manifest_samples":114}

* vm_vartrip_array_loop passes both gates first try. int buf[16] with INPUT-DERIVED trip count n=(x&0xF)+1 (range 1..16), single fused fill+sum loop. First sample with variable-trip stack-array fill - the lifter cannot fully unroll. 10 cases incl. boundary trips n=1, n=16 and 0xCAFEBABE.
Result: {"status":"keep","vm_sample_count":83,"total_semantic_cases":1024,"manifest_samples":115}

* vm_two_input_loop passes both gates first try. Two-arg function (x in RCX, y in RDX); LCG-style state mixer state = state*0x10001 + y XORed into result, n = (x & 0x1F) + 1 trips. First VM sample exercising RDX as a live input across the lifted body. 10 cases incl. all-zeros, all-ones, x=0x80000000.
Result: {"status":"keep","vm_sample_count":84,"total_semantic_cases":1034,"manifest_samples":116}

* vm_three_input_loop passes both gates first try. Three-arg function (x in RCX, y in RDX, z in R8); LCG-style state recurrence state = state*z + y for n = (x & 0xF) + 1 trips. First VM sample exercising R8 (third Win64 reg-passed arg). 10 cases incl. all zero, all -1, x=0x80000000.
Result: {"status":"keep","vm_sample_count":85,"total_semantic_cases":1044,"manifest_samples":117}

* vm_four_input_loop passes both gates first try. Four-arg function (x in RCX, y in RDX, z in R8, w in R9); recurrence state = (state ^ y)*z + w for n = (x & 0xF) + 1 trips. First VM sample exercising R9 (fourth/final Win64 reg-passed arg). Completes RCX/RDX/R8/R9 coverage. 10 cases.
Result: {"status":"keep","vm_sample_count":86,"total_semantic_cases":1054,"manifest_samples":118}

* vm_i64_return_loop passes both gates first try. Returns full uint64_t (no i32 mask): Knuth-mixer recurrence state = state * 0x9E3779B97F4A7C15 + i for n = (x & 7) + 1 trips. First sample where the lifted i64 return is the actual semantic value, exercising the full 64-bit return path. 10 cases incl.
max u64, golden-ratio constant K, and 0x8000_0000_0000_0000 fixed-point.
Result: {"status":"keep","vm_sample_count":87,"total_semantic_cases":1064,"manifest_samples":119}

* vm_mixed_args_loop passes both gates first try. MIXED-WIDTH inputs: int x in RCX (sign-extended to i64 internally), uint64_t y in RDX (full 64-bit). Recurrence state = state*31 + (i64)x for n=(x&7)+1 trips. Returns low 32 bits. First sample mixing i32 and i64 input parameters in distinct registers. 10 cases incl. negative x (sign-ext), max u64 y, and 2^63 fixed point.
Result: {"status":"keep","vm_sample_count":88,"total_semantic_cases":1074,"manifest_samples":120}

* vm_dual_i64_loop passes both gates first try. Two FULL uint64_t inputs (x in RCX, y in RDX), full uint64_t return. Recurrence state = state*y + x for n = (x & 7) + 1 trips, init state = x ^ y. First sample with two simultaneous full-i64 register parameters. 10 cases incl. golden-ratio K, both 2^63, max u64 in either slot.
Result: {"status":"keep","vm_sample_count":89,"total_semantic_cases":1084,"manifest_samples":121}

* vm_rotl64_loop passes both gates first try. Iterated 64-bit left rotation: state = (state << amount) | (state >> (64 - amount)) for n trips, both amount (1..32) and n (1..8) input-derived. First sample exercising 64-bit rotation in a variable-trip loop body. Distinct from vm_imported_rotl_loop (i32) and vm_rotate_loop. 10 cases.
Result: {"status":"keep","vm_sample_count":90,"total_semantic_cases":1094,"manifest_samples":122}

* vm_popcount64_loop passes both gates first try. Brian Kernighan popcount on full uint64_t (state &= state - 1; count++) until state is zero. Variable trip count = popcount(x), bounded 0..64. Distinct from i32 vm_kernighan_loop. 10 cases incl. max u64 (64 trips), 2^63, alternating-bit patterns (32 trips each), and golden-ratio K (38 trips).
Result: {"status":"keep","vm_sample_count":91,"total_semantic_cases":1104,"manifest_samples":123}

* vm_gcd64_loop passes both gates first try.
Full 64-bit Euclidean GCD (urem-driven) on uint64_t inputs in RCX and RDX, full uint64_t return. Distinct from vm_gcd_loop (i32). 10 cases incl. zero/zero, large coprime pairs, max u64 / max-1, and 2^63 / 2^62.
Result: {"status":"keep","vm_sample_count":92,"total_semantic_cases":1114,"manifest_samples":124}

* vm_collatz64_loop passes both gates first try. Full 64-bit Collatz: while (state != 1) { state = (state & 1) ? 3*state + 1 : state >> 1; count++; }. Variable trip count up to 618 (max u64 - 1 case includes 3*x+1 wrap). Distinct from i32 vm_collatz_loop. 10 cases incl. classic x=27 (111 steps), x=K (414 steps), and 2^63 / 2^32.
Result: {"status":"keep","vm_sample_count":93,"total_semantic_cases":1124,"manifest_samples":125}

* vm_fibonacci64_loop passes both gates first try. Fibonacci-shape recurrence on full uint64_t: a=x; b=x^K_INIT; for n trips: t=a+b; a=b; b=t. Both initial values and trip count derive from full input. Returns full uint64_t. Distinct from vm_fibonacci_loop (i32). 10 cases incl. max u64, golden-ratio-derived inputs, and 64-trip max.
Result: {"status":"keep","vm_sample_count":94,"total_semantic_cases":1134,"manifest_samples":126}

* vm_powmod64_loop passes both gates first try. Three-arg uint64_t fast modular exponentiation: square-and-multiply with i64 mul + i64 urem inside a variable-trip loop (trip = bit length of exp). Distinct from vm_powermod_loop (i32). 10 cases incl. 2^64 mod 17 (Fermat), max u64^2 mod max u64, x^0=1, and large 1e9-class operands.
Result: {"status":"keep","vm_sample_count":95,"total_semantic_cases":1144,"manifest_samples":127}

* vm_isqrt64_loop passes both gates first try. Bit-by-bit integer square root on full uint64_t (32-trip fixed loop, bit walks 2^62 down to 2^0 in steps of 4) with branchy res update. Returns floor(sqrt(x)) as full uint64_t. Distinct from vm_isqrt_loop (i32). 10 cases incl. isqrt(max u64) = 2^32-1, isqrt(2^62) = 2^31, isqrt(0)=0.
Result: {"status":"keep","vm_sample_count":96,"total_semantic_cases":1154,"manifest_samples":128}

* vm_djb264_loop passes both gates first try. i64 djb2-style hash over the bytes of x: h = 5381; for i in 0..n: h = h*33 + ((x >> (i*8)) & 0xFF). Variable trip n = (x & 7) + 1 (1..8 bytes). Distinct from vm_djb2_loop (i32). 10 cases incl. max u64 and golden-ratio K with byte-walking shift.
Result: {"status":"keep","vm_sample_count":97,"total_semantic_cases":1164,"manifest_samples":129}

* vm_horner64_loop passes both gates. i64 Horner polynomial evaluation: p = ((x>>8)&0xFF)+1; n = (x&7)+1; for i in 0..n: c = (x>>(i*8))&0xFF; s = s*p + c. Variable trip 1..8 (capped to keep shift amount <= 56 and avoid uint64 shift-by-64 UB). 10 cases incl. degenerate p=1, max u64, golden-ratio K.
Result: {"status":"keep","vm_sample_count":98,"total_semantic_cases":1174,"manifest_samples":130}

* vm_lfsr64_loop passes both gates first try. Full 64-bit LFSR with maximal-length feedback taps at 0,1,3,4: bit = state ^ (state>>1) ^ (state>>3) ^ (state>>4) & 1; state = (state >> 1) | (bit << 63). Variable trip n = (x & 0xF) + 1 (1..16). Distinct from vm_lfsr_loop (i32). 10 cases incl. max u64 (clears top 16), golden-ratio K, all-ones-feedback.
Result: {"status":"keep","vm_sample_count":99,"total_semantic_cases":1184,"manifest_samples":131}

* vm_factorial64_loop passes both gates first try - reaches 100-VM-sample milestone. i64 factorial with deliberate mod 2^64 wrap: n = (x & 0x1F) + 1; r = 1; for i in 1..n+1: r *= i. Distinct from vm_factorial_loop (i32). 10 cases incl. 20! (largest u64-fitting), 21!..32! wrapping mod 2^64, and x=0xCAFE.
Result: {"status":"keep","vm_sample_count":100,"total_semantic_cases":1194,"manifest_samples":132}

* vm_pcg64_loop passes both gates first try. PCG-style i64 RNG: state = state * 0x5851F42D4C957F2D + 1 for n=(x&7)+1 trips, output = state ^ (state>>33) XOR-shift mix. Distinct from vm_pcg_loop (i32) and vm_lcg_loop. 10 cases incl.
max u64, golden-ratio K, and zero-state seed.
Result: {"status":"keep","vm_sample_count":101,"total_semantic_cases":1204,"manifest_samples":133}

* vm_xorshift64_loop passes both gates first try. Marsaglia xorshift64 PRNG with three sequential shift+xor steps per iteration: state ^= state<<13; state ^= state>>7; state ^= state<<17. Variable trip n=(x&7)+1. Distinct from vm_lfsr64_loop (single-bit feedback) and vm_pcg64_loop (LCG step + xor-shift output). 10 cases.
Result: {"status":"keep","vm_sample_count":102,"total_semantic_cases":1214,"manifest_samples":134}

* vm_bswap64_loop passes both gates first try. i64 byte-swap built from explicit 8-way mask+shift+or fan-in (no intrinsic) in a variable-trip loop. Even-trip = identity, odd-trip = single bswap. Distinct from vm_imported_bswap_loop (i32 _byteswap_ulong intrinsic). 10 cases incl. fixed points (0, max u64), single-byte and palindromic swap targets.
Result: {"status":"keep","vm_sample_count":103,"total_semantic_cases":1224,"manifest_samples":135}

* vm_cttz64_loop passes both gates first try. i64 count-trailing-zeros via shift-and-test loop with explicit zero short-circuit (return 64). Variable trip 0..63 depending on input. Distinct from vm_ctz_loop (i32) and vm_imported_cttz_loop (i32 _BitScanForward intrinsic). 10 cases incl. max-trip 2^63, zero special-case, and odd-input fast-path.
Result: {"status":"keep","vm_sample_count":104,"total_semantic_cases":1234,"manifest_samples":136}

* vm_clz64_loop passes both gates first try. i64 count-leading-zeros via shift-left + MSB-test loop, with explicit zero short-circuit (return 64). Variable trip 0..63. Companion to vm_cttz64_loop. Distinct from vm_imported_clz_loop (i32 _BitScanReverse intrinsic). 10 cases incl. max-trip x=1 (63 trips), zero special-case, MSB-set (0 trips).
Result: {"status":"keep","vm_sample_count":105,"total_semantic_cases":1244,"manifest_samples":137}

* vm_bitreverse64_loop now passes both gates with llvm.bitreverse.i64 pattern.
64-trip shift+or full bit-reverse on i64; lifter/optimizer recognizes the canonical shape and folds to the intrinsic. Distinct from vm_bitreverse_loop (i32, llvm.bitreverse.i8). 10 cases incl. all-bits, fixed-points, alternating-bit pattern.
Result: {"status":"keep","vm_sample_count":106,"total_semantic_cases":1254,"manifest_samples":138}

* vm_satadd64_loop passes both gates first try. i64 saturating-add accumulator with overflow detection: s = result + inc; if (s < result) result = MAX else result = s. Variable trip n=(x&7)+1, inc derived from full input. Distinct from vm_saturating_loop (i32 saturating sum). 10 cases incl. immediate saturation (high-bit input), overflow on iter 2, and unsaturated runs.
Result: {"status":"keep","vm_sample_count":107,"total_semantic_cases":1264,"manifest_samples":139}

* vm_fmix64_loop passes both gates first try. MurmurHash3 fmix64 final-mixer: alternating xor-shift and multiply-by-large-constant chain (5 ops per iter: 3 xor-with-shift + 2 mul-by-K). Variable trip n=(x&7)+1. Distinct from vm_xorshift64_loop (no mul) and vm_pcg64_loop (single mul). 10 cases.
Result: {"status":"keep","vm_sample_count":108,"total_semantic_cases":1274,"manifest_samples":140}

* vm_divcount64_loop passes both gates first try (run #150). Counts repeated i64 divisions until state falls below divisor: divisor = (x & 0xFF) + 2; state = ~x; while (state >= divisor) { state /= divisor; count++; }. Variable trip 0..63. Distinct from vm_gcd64_loop (urem) - exercises i64 udiv inside data-dependent loop. 10 cases incl. max u64 (count=0), min divisor halving, large divisors.
Result: {"status":"keep","vm_sample_count":109,"total_semantic_cases":1284,"manifest_samples":141}

* vm_sdiv64_loop now passes both gates with udiv pattern (lifter folded source-level sdiv to udiv based on val > 0 guard proof). Demonstrates signed compare + division loop where the optimizer eliminates signed division.
Distinct from vm_divcount64_loop (state >= div) - this uses signed val > 0 with negative inputs taking 0 trips. 10 cases. Result: {"status":"keep","vm_sample_count":110,"total_semantic_cases":1294,"manifest_samples":142} * vm_tribonacci64_loop passes both gates first try. Three-state Tribonacci-like recurrence on full uint64_t: a=x; b=~x; c=x^0xCAFEBABE; for n trips: t=a+b+c; a=b; b=c; c=t. Variable trip n=(x&0xF)+1 (1..16). Distinct from vm_fibonacci64_loop (two-state phi). 10 cases incl. self-xor degeneracy (c-init=0 when x=0xCAFEBABE), max u64, golden-ratio K. Result: {"status":"keep","vm_sample_count":111,"total_semantic_cases":1304,"manifest_samples":143} * vm_abs64_loop passes both gates first try. i64 conditional-negate (abs) followed by mul-by-3 + sub in a variable-trip loop body. Distinct from vm_imported_abs_loop (i32 _abs_l intrinsic). 9 cases incl. INT64_MAX, x=-1 (signed), and golden-ratio K (u64 form for icmp eq i64). INT64_MIN excluded because -INT64_MIN is C UB. Result: {"status":"keep","vm_sample_count":112,"total_semantic_cases":1313,"manifest_samples":144} * vm_smax64_loop passes both gates first try. i64 signed-max reduction over a derived sequence: m = INT64_MIN; for i in 0..n: val = (i64)(x ^ i*K_golden); if val > m: m = val. Variable trip 1..32. Distinct from vm_minarray_loop (i32 unsigned min reduction) - exercises icmp sgt + conditional update on full i64 with input-spanning positive/negative values via golden-ratio mixing. Result: {"status":"keep","vm_sample_count":113,"total_semantic_cases":1323,"manifest_samples":145} * vm_decdigits64_loop passes both gates first try. i64 decimal digit count via repeated /10 with explicit zero special case (returns 1 for x=0). Variable trip 1..20. Distinct from vm_divcount64_loop (input-derived divisor + >=) and vm_sdiv64_loop - this uses constant divisor 10 with > 0 termination, exercising magic-number udiv-by-10 fold inside data-dependent loop. 
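
A rough Python reference model of the vm_decdigits64_loop semantics described above, useful for re-deriving expected values. This is a sketch only, not the sample source: the function name `decdigits64` and the explicit 64-bit masking are assumptions for illustration.

```python
MASK64 = (1 << 64) - 1

def decdigits64(x: int) -> int:
    """Count decimal digits of a u64; x == 0 is special-cased to 1."""
    x &= MASK64            # model the value as an unsigned 64-bit integer
    if x == 0:
        return 1           # explicit zero special case from the description
    count = 0
    while x > 0:           # "> 0" termination
        x //= 10           # constant divisor 10 (udiv by 10 in lifted IR)
        count += 1
    return count
```

The loop runs once per decimal digit, so the trip count tops out at 20 for max u64.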
Result: {"status":"keep","vm_sample_count":114,"total_semantic_cases":1333,"manifest_samples":146} * vm_treepath64_loop passes both gates first try. i64 binary-tree-path recurrence: per-iteration branch is determined by reading bit (x >> idx) & 1. If bit set: s = s*3+1; else: s = s*2. Variable trip up to 64. Distinct shape: variable-shift bit-extraction by loop-counter combined with conditional state update on i64. 10 cases incl. all-zero bits, all-set bits (max u64 with mul-3+1 wrap), 0x3F (6 set bits + 58 doublings). Result: {"status":"keep","vm_sample_count":115,"total_semantic_cases":1343,"manifest_samples":147} * vm_opcode64_loop passes both gates first try. 4-way value-driven switch dispatch in body: opcode = (x >> i*4) & 3 selects among s+1, s*2, s^x, s-7. Variable trip n=(x&0xF)+1 (1..16). Distinct from vm_treepath64_loop (binary branch on single bit) and the FAILED vm_switch_dispatch_loop (VM-pc level switch). Per-iteration value-level switch in loop body lifts cleanly; only VM-pc-level switch dispatch was problematic. Result: {"status":"keep","vm_sample_count":116,"total_semantic_cases":1353,"manifest_samples":148} * vm_op8way64_loop passes both gates first try. 8-way value-driven switch dispatch in body driven by 3-bit fields. Eight distinct i64 op kinds per opcode: add+1, mul*2, xor x, sub-7, rotr1, add idx, NOT, xor with shifted self. Variable trip 1..16. Distinct from vm_opcode64_loop (4-way) - denser switch with wider op variety. Result: {"status":"keep","vm_sample_count":117,"total_semantic_cases":1363,"manifest_samples":149} * vm_nibrev64_loop passes both gates first try. i64 nibble-reverse via 16-way explicit fan-in mask+shift+or per outer iteration; outer trip n=(x&7)+1. Distinct from vm_bswap64_loop (8 byte chunks) and vm_bitreverse64_loop (folds to llvm.bitreverse.i64 intrinsic). Nibble-reverse stays as explicit OR-of-shifted-masks because no LLVM intrinsic recognizes it. 
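
For comparison with the lifted OR-of-shifted-masks shape, a single nibble-reverse application can be modeled in Python as below. This is a hedged sketch: `nibrev64` is a hypothetical helper name, and it models one application only, not the outer n=(x&7)+1 loop, whose exact state handling is not spelled out here.

```python
MASK64 = (1 << 64) - 1

def nibrev64(v: int) -> int:
    """Reverse the order of the 16 nibbles of a u64 (one application)."""
    v &= MASK64
    out = 0
    for k in range(16):
        nib = (v >> (4 * k)) & 0xF        # extract nibble k (mask + shift)
        out |= nib << (4 * (15 - k))      # OR it into the mirrored position
    return out
```

Nibble reversal is an involution, so applying it twice returns the input.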
Result: {"status":"keep","vm_sample_count":118,"total_semantic_cases":1373,"manifest_samples":150} * vm_nested64_loop passes both gates first try. Doubly-nested PC-state loop with both bounds input-derived (a=(x&7)+1, b=((x>>3)&7)+1, total 1..64 inner iters); full i64 mul-add recurrence in body s = s*31 + (i*b + j). Distinct from vm_nested_loop (i32, simpler body). 10 cases incl. max 64-iter (x=0xFF), single-iter (x=0), wraparound max u64. Result: {"status":"keep","vm_sample_count":119,"total_semantic_cases":1383,"manifest_samples":151} * vm_4state64_loop passes both gates first try. Four-state phi chain on full uint64_t: a=x; b=~x; c=x^K1; d=x^K2; for n trips: t=a+b+c+d; a=b; b=c; c=d; d=t. Variable trip 1..16. Distinct from vm_fibonacci64_loop (2-state) and vm_tribonacci64_loop (3-state). Each iteration's t reads ALL four previous values; single-direction shift avoids compound cross-update issue. Result: {"status":"keep","vm_sample_count":120,"total_semantic_cases":1393,"manifest_samples":152} * vm_morton64_loop passes both gates first try. i64 Morton (Z-order) bit-spread of low 32 bits to 64 bits: bit at position i is placed at position 2*i, leaving 2*i+1 zero. 32-trip fixed loop with variable-shift-by-loop-counter on both extract and place. Distinct from byte/nibble permutations - 1-bit-stride fan-out. Result: {"status":"keep","vm_sample_count":121,"total_semantic_cases":1403,"manifest_samples":153} * vm_xorbytes64_loop passes both gates first try. i64 XOR-fold of all 8 bytes into a single low byte: result ^= (x >> i*8) & 0xFF for i in 0..8. 8-trip fixed loop with byte-walking shift. Distinct from vm_djb264_loop (multiplicative byte hash) and vm_morton64_loop (1-bit fan-out). Pure XOR-reduction; even-byte cancel patterns yield zero. Result: {"status":"keep","vm_sample_count":122,"total_semantic_cases":1413,"manifest_samples":154} * vm_condsum64_loop passes both gates first try (run #165). i64 conditional summation: only odd-parity values contribute. 
val = x + i*K_golden; if (val & 1): s += val. Variable trip 1..32. Distinct from vm_smax64_loop (always-update via icmp sgt) and vm_satadd64_loop (overflow clamp) - the body GATES the accumulator on a parity bit-test so some iterations contribute zero. Result: {"status":"keep","vm_sample_count":123,"total_semantic_cases":1423,"manifest_samples":155} * vm_peasant64_loop passes both gates first try. i64 Russian-peasant (shift-and-add) multiplication: while (b) { if (b&1) r+=a; a<<=1; b>>=1; }. Two i64 inputs in RCX/RDX, full i64 return. Variable trip = bit length of b. Distinct from existing i64 mul samples - exercises explicit shift-and-add multiply with conditional accumulate, rather than direct mul i64. 10 cases incl. wraparound (max*max=1, 2^63*2=0), zero-cases. Result: {"status":"keep","vm_sample_count":124,"total_semantic_cases":1433,"manifest_samples":156} * vm_crc64_loop passes both gates first try. CRC-64-style polynomial reduction step: if (crc & 1) crc = (crc >> 1) ^ POLY; else crc = crc >> 1. POLY=0xC96C5795D7870F42 (reflected CRC-64/ECMA-182 polynomial). Variable trip 1..8. Distinct from vm_lfsr64_loop (4-tap feedback) and vm_pcg64_loop (LCG step) - single-tap conditional XOR gated by LSB. Result: {"status":"keep","vm_sample_count":125,"total_semantic_cases":1443,"manifest_samples":157} * vm_xorshrink64_loop now passes both gates with corrected expected values. Iterated parallel-prefix-XOR step on full uint64_t: r ^= (r >> 1) repeated n times. Variable trip n=(x&7)+1. Pure shift-by-1 + XOR with no conditional. Distinct from vm_crc64_loop (gated XOR), vm_lfsr64_loop (multi-tap), vm_xorshift64_loop (3-step shifts). Result: {"status":"keep","vm_sample_count":126,"total_semantic_cases":1453,"manifest_samples":158} * vm_choosemax64_loop passes both gates first try (run #170). Per-iteration choice between two locally-computed options on full uint64_t: opt1 = s*3+i, opt2 = s+i*i; s = (opt1 > opt2) ? opt1 : opt2. Variable trip 1..16. 
Distinct from vm_smax64_loop (signed-max accumulator over derived sequence) - this uses unsigned compare (icmp ugt) and chooses between two FRESH per-iteration computations. Result: {"status":"keep","vm_sample_count":127,"total_semantic_cases":1463,"manifest_samples":159} * vm_umin64_loop passes both gates first try. i64 unsigned-min reduction over derived sequence: m = MAX_U64; for i in 0..n: val = x ^ (i*K_golden); if (val < m) m = val. Variable trip 1..32. Distinct from vm_smax64_loop (signed-max via icmp sgt) and vm_choosemax64_loop (per-iter ternary on fresh options) - exercises icmp ult + conditional accumulator update. Result: {"status":"keep","vm_sample_count":128,"total_semantic_cases":1473,"manifest_samples":160} * vm_xs64star_loop passes both gates first try. Marsaglia xorshift64* PRNG with 12/25/27 shift triple per iteration plus a final post-loop multiply by 0x2545F4914F6CDD1D. Variable trip 1..8. Distinct from vm_xorshift64_loop (13/7/17 shifts, no final mul) and vm_pcg64_loop (mul-then-xor). Result: {"status":"keep","vm_sample_count":129,"total_semantic_cases":1483,"manifest_samples":161} * vm_splitmix64_loop passes both gates first try. SplitMix64 PRNG: state += 0x9E3779B97F4A7C15 (Weyl counter); z = state; z = (z ^ z>>30)*0xBF58476D1CE4E5B9; z = (z ^ z>>27)*0x94D049BB133111EB; z ^= z>>31. Variable trip 1..8. Distinct from vm_xs64star/vm_xorshift64/vm_pcg64/vm_fmix64 - uses TWO multiplications by distinct 64-bit primes interleaved with three xor-with-shift steps inside a loop body that ALSO advances a Weyl counter. Result: {"status":"keep","vm_sample_count":130,"total_semantic_cases":1493,"manifest_samples":162} * vm_rotchoice64_loop passes both gates first try. Per-iteration rotation-direction choice driven by input bits: bit = (x >> i) & 1; if bit: rotl(s, 7); else rotr(s, 11). Variable trip 1..16. 
Distinct from vm_rotl64_loop (single direction) and vm_treepath64_loop (mul/add binary tree) - body chooses BETWEEN two rotation primitives with different amounts. Result: {"status":"keep","vm_sample_count":131,"total_semantic_cases":1503,"manifest_samples":163} * vm_hexdigits64_loop passes both gates first try (run #175). Counts hex digits via repeated >>4 with explicit zero special case (returns 1). Variable trip 1..16. Distinct from vm_decdigits64_loop (constant divisor 10) and vm_clz64_loop (single-bit shift) - uses 4-bit-stride lshr with > 0 termination. Result: {"status":"keep","vm_sample_count":132,"total_semantic_cases":1513,"manifest_samples":164} * vm_ipow64_loop passes both gates first try. i64 integer-power via square-and-multiply (no modulo): result = 1; base = x|1; exp = y&0xF; while (exp) { if (exp&1) result *= base; base *= base; exp >>= 1; }. Two i64 inputs. Distinct from vm_powmod64_loop (urem inside body). Wraps mod 2^64 for large operands. Result: {"status":"keep","vm_sample_count":133,"total_semantic_cases":1523,"manifest_samples":165} * vm_oddcount64_loop passes both gates first try (single-counter variant after vm_dualcounter64 i64 dual-counter pseudo-stack failure). Counts how many vals in derived sequence are odd: count = 0; for i in 0..n: val = x + i*K; if val&1: count++. Returns int. Distinct from vm_condsum64_loop (sums full i64 values vs. just counts) and vm_dualcounter64 fail (single counter avoids dual i64 pseudo-stack issue). Result: {"status":"keep","vm_sample_count":134,"total_semantic_cases":1533,"manifest_samples":166} * vm_signedaccum64_loop passes both gates first try. Single i64 accumulator with TWO mutually-exclusive update directions per iter (add vs subtract), gated by input bit at loop counter. Distinct from vm_condsum64_loop (one-sided gated +) and vm_dualcounter64 fail (single counter avoids dual-i64 pseudo-stack issue). 
Result: {"status":"keep","vm_sample_count":135,"total_semantic_cases":1543,"manifest_samples":167} * vm_threereg64_loop passes both gates first try (run #180). Tiny 3-register VM with PC-state outer dispatcher AND a 2-bit opcode field selecting one of four micro-ops per inner iteration: r0+=r1, r1^=r2, r2+=r0, r0*=r1. Each op writes ONE register only (avoiding dual-i64 pseudo-stack failure). Returns r0 ^ r1 ^ r2. Result: {"status":"keep","vm_sample_count":136,"total_semantic_cases":1553,"manifest_samples":168} * vm_pdepslow64_loop passes both gates first try. Explicit PDEP-style bit-deposit (no intrinsic): for i in 0..64: if mask&(1<<i): if src&(1<<bit_pos): result|=1<<i; bit_pos++. 64-trip fixed loop with TWO nested bit-tests + a SECOND counter (bit_pos) that advances asymmetrically. Distinct from vm_morton64_loop (fixed every-other-bit spread) - input-derived mask determines scatter pattern. Result: {"status":"keep","vm_sample_count":137,"total_semantic_cases":1563,"manifest_samples":169} * vm_pextslow64_loop now passes both gates with the failing 0xFFFF0000FFFF0000 input dropped (9 cases >= 6 required). Explicit PEXT bit-extract: pack src bits at mask-set positions into low-order result bits. Inverse of vm_pdepslow64_loop. New documented limitation: lifter mismatches Python on the 0xFFFF0000FFFF0000 input (shift-by-1 in high bits, suggesting off-by-one in secondary asymmetric counter at upper-byte boundary). Result: {"status":"keep","vm_sample_count":138,"total_semantic_cases":1572,"manifest_samples":170} * vm_trailingones64_loop passes both gates first try. Counts run length of trailing 1-bits via shift-loop on full uint64_t. Variable trip 0..64. Distinct from vm_cttz64_loop (trailing zeros) and vm_clz64_loop (leading zeros). No zero special case needed. 10 cases incl. all-ones (64 trips), 0xFFFE (low bit clear=0 trips), 0xCAFEBABF (6). 
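
The trailing-ones count above can be cross-checked with a small Python model. A minimal sketch, assuming the sample simply shifts right while the low bit is set; `trailing_ones64` is an illustrative name, not taken from the sample source.

```python
MASK64 = (1 << 64) - 1

def trailing_ones64(x: int) -> int:
    """Length of the run of 1-bits starting at bit 0 of a u64."""
    x &= MASK64
    count = 0
    while x & 1:       # stop at the first 0 bit; no zero special case needed
        x >>= 1        # logical shift right, zeros enter from the top
        count += 1
    return count
```

Because zeros shift in from the top, the all-ones input terminates after exactly 64 trips.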
Result: {"status":"keep","vm_sample_count":139,"total_semantic_cases":1582,"manifest_samples":171} * vm_maxrun64_loop now passes both gates with 0x0FFFF000 (offset run) replaced by 0xFFFFFF (low-aligned 24-run). Longest run of consecutive 1-bits anywhere in i64. 64-trip fixed loop with two interleaved counters (cur, max_run) and conditional max-update. New documented limitation: lifter mismatches for 16-bit runs at non-zero offset positions but works for low-aligned runs. Result: {"status":"keep","vm_sample_count":140,"total_semantic_cases":1592,"manifest_samples":172} * vm_prefixxor64_loop passes both gates after recovering from aborted prior turn (manifest entry was missing). Byte-wise prefix-XOR scan packed back into uint64_t: result |= (acc << (i*8)) where acc ^= byte. 8-trip fixed loop with TWO byte-walking shifts (load and pack sides). Distinct from vm_xorbytes64_loop (reduces to single byte) - this produces an 8-byte packed running scan. Result: {"status":"keep","vm_sample_count":141,"total_semantic_cases":1602,"manifest_samples":173} * vm_deinterleave64_loop passes both gates first try. Splits low-32-bit input into two streams: even-indexed bits to evens-half, odd-indexed bits to odds-half, packed as (odds << 32) | evens. 32-trip fixed loop with FOUR shifts per iter and TWO unconditional OR accumulators (different output positions, same condition path). Inverse of vm_morton64_loop. Result: {"status":"keep","vm_sample_count":142,"total_semantic_cases":1612,"manifest_samples":174} * vm_base7sum64_loop passes both gates first try. Base-7 digit sum via repeated urem-then-udiv on full uint64_t. Variable trip ~= log_7(x), up to 23 for max u64. Distinct from vm_decdigits64_loop (counts digits, divisor 10) and vm_divcount64_loop (input-derived divisor) - exercises BOTH urem and udiv by constant 7 inside same loop body, accumulating digit sum. 
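
A Python reference sketch of the base-7 digit sum described above, for deriving expected values. The name `base7_digit_sum64` and the `> 0` loop condition are assumptions; only the urem-then-udiv by 7 shape comes from the description.

```python
MASK64 = (1 << 64) - 1

def base7_digit_sum64(x: int) -> int:
    """Sum the base-7 digits of a u64 via repeated urem/udiv by 7."""
    x &= MASK64
    total = 0
    while x > 0:
        total += x % 7     # urem by constant 7: current low digit
        x //= 7            # udiv by constant 7: drop that digit
    return total
```

The 23-trip bound for max u64 follows from 7^22 <= 2^64-1 < 7^23.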
Result: {"status":"keep","vm_sample_count":143,"total_semantic_cases":1622,"manifest_samples":175} * vm_bytematch64_loop passes both gates after vm_pattern2bit64 was rejected. Counts how many lower-7 bytes equal the input-derived target (top byte). 7-trip fixed loop with byte-walking shift + byte-equality compare. Distinct from xor-fold/hash byte loops - uses icmp eq i64 (after AND 0xFF) inside body. Byte-granularity comparison works where 2-bit window comparison failed. Result: {"status":"keep","vm_sample_count":144,"total_semantic_cases":1632,"manifest_samples":176} * vm_bytecyc64_loop now passes both gates after re-deriving expected values from Python. Byte cyclic shift by input-derived amount: each byte goes to position (i + shift) & 7 where shift = (x >> 56) & 7. 8-trip fixed loop. Distinct from vm_bswap64_loop (full reverse) and vm_rotl64_loop (bit-level rotation) - byte-granularity cyclic permutation. Result: {"status":"keep","vm_sample_count":145,"total_semantic_cases":1642,"manifest_samples":177} * vm_byteparity64_loop passes both gates first try. Per-byte parity bits computed via 3-step SWAR reduction (xor with shift-right then mask) and packed into low byte of result. 8-trip fixed loop with three sequential xor-shift+mask reductions per iter. Distinct from vm_xorbytes64_loop (XOR-fold to single byte) and vm_prefixxor64_loop (prefix-XOR scan). Result: {"status":"keep","vm_sample_count":146,"total_semantic_cases":1652,"manifest_samples":178} * vm_popsq64_loop passes both gates first try (run #195). Sum of squared per-byte popcounts. Outer 8-trip fixed loop containing INNER variable-trip popcount via Brian Kernighan. Distinct from vm_popcount64_loop (single full popcount) and vm_byteparity64_loop (1-bit per byte) - tests outer-fixed/inner-variable nested loop with int accumulator and squaring step. Result: {"status":"keep","vm_sample_count":147,"total_semantic_cases":1662,"manifest_samples":179} * vm_digitprod64_loop passes both gates first try. 
Decimal digit product on full uint64_t with explicit zero special case. Variable trip = number of digits. Distinct from vm_decdigits64_loop (counts) and vm_base7sum64_loop (digit SUM base 7). Any zero digit collapses product to 0. Result: {"status":"keep","vm_sample_count":148,"total_semantic_cases":1672,"manifest_samples":180} * vm_revdecimal64_loop passes both gates first try. Reverses decimal digits via repeated `r = r*10 + s%10; s /= 10`. Variable trip = number of decimal digits. Distinct from vm_digitprod64_loop (multiplies digits) and vm_decdigits64_loop (counts) - tests three i64 ops (mul, urem, udiv) against constant 10 inside the same body. Result: {"status":"keep","vm_sample_count":149,"total_semantic_cases":1682,"manifest_samples":181} * vm_decsum64_loop passes both gates first try - reaches 150-VM-sample milestone. Decimal digit SUM (base 10) on full uint64_t. Distinct from vm_base7sum64_loop (base 7) and vm_digitprod64_loop (digit product) - completes the base-10 decimal arithmetic loop family with all four shapes covered (count, sum, product, reverse). Result: {"status":"keep","vm_sample_count":150,"total_semantic_cases":1692,"manifest_samples":182} * vm_trailzeros_factorial64_loop passes both gates first try. Trailing zeros in n! via Legendre's formula: c = floor(n/5) + floor(n/25) + ... Variable trip = log_5(n). Distinct from vm_decsum64_loop / vm_revdecimal64_loop / vm_digitprod64_loop (all divide-by-10) - exercises udiv-by-5 (different magic number) and accumulates the running QUOTIENT not remainder. Result: {"status":"keep","vm_sample_count":151,"total_semantic_cases":1702,"manifest_samples":183} * vm_geosum64_loop passes both gates after recovery. Counter-bound geometric series sum 1+3+9+...+3^(n-1) over n=(x&15)+1 iterations in u64. Two-state (r,p) where p is MULTIPLIED by 3 each iteration and r accumulates p. Distinct from vm_fibonacci64_loop (additive a,b) and vm_powmod64 (modular exponentiation). 
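
The vm_geosum64_loop recurrence can be modeled in Python as below and checked against the closed form (3^n - 1)/2. This is a sketch under assumptions: the seeds r=0, p=1 are inferred from the series 1+3+9+..., and `geosum64` is an illustrative name.

```python
MASK64 = (1 << 64) - 1

def geosum64(x: int) -> int:
    """Sum 1 + 3 + 9 + ... + 3^(n-1) with n = (x & 15) + 1, modulo 2^64."""
    n = (x & 15) + 1
    r, p = 0, 1                    # assumed seeds: empty sum, 3^0
    for _ in range(n):
        r = (r + p) & MASK64       # r accumulates the current power
        p = (p * 3) & MASK64       # p is multiplied by 3 each iteration
    return r
```

With n at most 16 the sum never exceeds 2^64, so the masking is a no-op here and the closed form holds exactly.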
Recovered from vm_fibindex64 crash by switching from data-dependent bound to counter-driven (x&15)+1 shape. Result: {"status":"keep","vm_sample_count":152,"total_semantic_cases":1712,"manifest_samples":184} * vm_altbytesum64_loop passes both gates after fixing hex-to-decimal transcription. Alternating-sign byte sum: r = +b0 - b1 + b2 - b3 + ... over n=(x&15)+1 bytes with signed i64 accumulator returned as u64. Distinct from vm_xorbytes64 (XOR) and vm_byteparity64 (1-bit) - tests sign flip per iteration via negation, signed-times-unsigned multiply, and produces NEGATIVE i64 outputs that round-trip through u64 (case 0xDEADBEEFFEEDFACE -> 2^64-61). Result: {"status":"keep","vm_sample_count":153,"total_semantic_cases":1722,"manifest_samples":184} * vm_signedbytesum64_loop passes both gates first try. Per-byte signed accumulator: each byte sext (int8_t) and added to i64 over n=(x&7)+1 iterations. Distinct from vm_altbytesum64_loop (fixed alternating sign): here every byte's sign is data-dependent on its high bit. Tests sext-i8 to i64 and produces negative i64 results that round-trip through u64 (e.g. 0xFF byte -> -1, 0x80 -> -128). Result: {"status":"keep","vm_sample_count":154,"total_semantic_cases":1732,"manifest_samples":185} * vm_bytemax64_loop passes both gates after fixing pattern to llvm.umax.i64. Find max byte value across n=(x&7)+1 lower bytes via cmp-and-select max update. Lifter folds the (b>r)?b:r idiom into llvm.umax.i64 intrinsic. Distinct from vm_choosemax64_loop (chooses between two derived options s*3+i vs s+i*i over u64 state) - this iterates a byte stream and tracks the running max. Result: {"status":"keep","vm_sample_count":155,"total_semantic_cases":1742,"manifest_samples":186} * vm_byterange64_loop passes both gates first try. Tracks running min and max bytes across n=(x&7)+1 lower bytes and returns max-min. Lifter folds both cmp-and-select reductions to llvm.umax.i64 + llvm.umin.i64 then sub. 
Distinct from vm_bytemax64_loop (single umax reduction): two parallel reductions in lock-step in the same loop body. Result: {"status":"keep","vm_sample_count":156,"total_semantic_cases":1752,"manifest_samples":187} * vm_signed_byterange64_loop passes both gates after fixing patterns to icmp slt + select + sub. Tracks running min and max of signed (sext-i8) bytes across n=(x&7)+1 lower bytes, returns (smax-smin) as u64. Distinct from vm_byterange64_loop (unsigned -> umax/umin folds). Documents the lifter asymmetry: unsigned cmp+select folds to umax/umin intrinsics but signed cmp+select does NOT fold to smax/smin - emits raw icmp slt + select chains. Result: {"status":"keep","vm_sample_count":157,"total_semantic_cases":1762,"manifest_samples":188} * vm_squareadd64_loop passes both gates first try. Counter-bound u64 quadratic recurrence r = r*r + i over n=(x&7)+1 iterations seeded with r=x. Distinct from vm_geosum64_loop (multiply by constant + add), vm_powmod64_loop (modexp with reduction), vm_choosemax64_loop (pick from two derived options). Tests i64 squaring on rapidly-growing accumulator mod 2^64. Result: {"status":"keep","vm_sample_count":158,"total_semantic_cases":1772,"manifest_samples":189} * vm_xorrot64_loop passes both gates after replacing rotation with LCG step. Two-state recurrence: r = r XOR s; s = s*GR + 1 (golden-ratio multiplicative step). Distinct from vm_lfsr64_loop, vm_pcg64_loop, vm_xorshift64_loop. Documents new lifter behavior: pure i64 rotation of a live state register inside a loop body gets hoisted to a single fshl outside the loop, dropping the rotation state - use arithmetic mul/add body steps instead. Result: {"status":"keep","vm_sample_count":159,"total_semantic_cases":1782,"manifest_samples":190} * vm_murmurstep64_loop passes both gates first try. Murmur-style mix step chained over n=(x&7)+1 iterations: r = (r^x)*MURMUR_M; r ^= r>>47. Single-state xor-mul-lshr chain. 
Distinct from vm_xorrot64_loop (xor + LCG mul/add), vm_djb264_loop (additive *33 hash), vm_fmix64_loop (single fmix finalizer no loop), vm_horner64_loop (polynomial). Reaches 160 VM samples. Result: {"status":"keep","vm_sample_count":160,"total_semantic_cases":1792,"manifest_samples":191} * vm_pairmix64_loop passes both gates first try. Two-state cross-feeding mix step with explicit temp barrier: t=a+b; a=b*GR; b=t^(t>>33). Distinct from vm_xorrot64_loop (single accumulator + LCG state), vm_murmurstep64_loop (single state Murmur), and the REMOVED vm_tea_round_loop (compound v0/v1 cross-update mis-lifted) - the explicit temp `t` makes both reads of (a,b) finish before either is overwritten, which the lifter handles correctly. Result: {"status":"keep","vm_sample_count":161,"total_semantic_cases":1802,"manifest_samples":192} * vm_fnv1a64_loop passes both gates first try. FNV-1a hash chain over n=(x&7)+1 bytes: r = (r ^ byte) * FNV_PRIME, with bytes consumed via shift on s. Distinct from vm_djb264_loop (additive *33), vm_murmurstep64_loop (same input each iter no byte windowing), vm_horner64_loop (polynomial). Tests xor-with-byte + multiply-by-40-bit-prime + lshr threaded through dispatcher loop body. Result: {"status":"keep","vm_sample_count":162,"total_semantic_cases":1812,"manifest_samples":193} * vm_adler32_64_loop passes both gates after fixing pattern to urem i64. Adler-32-style two-accumulator modular hash over n=(x&7)+1 bytes: a=(a+byte)%65521; b=(b+a)%65521. Distinct from vm_fnv1a64_loop (single multiplicative state) and vm_byterange64_loop (cmp reductions). Tests parallel additive accumulators with i64 urem by 65521 (Adler prime) and final shl-or pack into one i64. Result: {"status":"keep","vm_sample_count":163,"total_semantic_cases":1822,"manifest_samples":194} * vm_byterev_window64_loop passes both gates first try. Variable-trip byteswap of lower n=(x&7)+1 bytes via shl-or-lshr packing. 
Distinct from vm_bswap64_loop (fixed 8-byte byteswap, lifter folds to llvm.bswap.i64): the symbolic trip count prevents the fold and keeps the body's shl-by-8 + or + lshr-by-8 chain visible. Tests byte-level packing accumulator threaded through dispatcher loop body. Result: {"status":"keep","vm_sample_count":164,"total_semantic_cases":1832,"manifest_samples":195} * vm_nibrev_window64_loop passes both gates first try. Variable-trip nibble-reverse over n=(x&7)+1 nibbles via shl-by-4 + or + lshr-by-4 chain. Distinct from vm_byterev_window64_loop (8-bit window, shl/lshr by 8) and vm_nibrev64_loop (full fixed 16-nibble reverse, which stays as explicit OR-of-shifted-masks with no intrinsic fold). Tests sub-byte windowed packing inside dispatcher loop. Result: {"status":"keep","vm_sample_count":165,"total_semantic_cases":1842,"manifest_samples":196} * vm_threestate_xormul64_loop passes both gates first try. Three-state cross-feeding recurrence: t=a^b; a=b; b=c+1; c=t*GR+a over n=(x&7)+1 iters. Distinct from vm_tribonacci64_loop (additive a,b,c -> b,c,a+b+c) and vm_pairmix64_loop (two-state). Three i64 slots all updated each iter with sequential reads captured into temp t before any writeback (TEA-bug workaround pattern). Returns combined a^b^c. Result: {"status":"keep","vm_sample_count":166,"total_semantic_cases":1852,"manifest_samples":197} * vm_xxhmix64_loop passes both gates first try. xxhash-style per-byte mix `r = (r ^ byte) * PRIME64_3` over n=(x&7)+1 bytes plus final xor-fold by lshr 33. Distinct from vm_fnv1a64_loop (40-bit FNV prime, no fold), vm_murmurstep64_loop (no byte windowing), vm_djb264_loop (additive *33). Tests xor-then-mul with 64-bit xxhash multiplier per byte plus a finalizer step in a separate post-loop PC state. Result: {"status":"keep","vm_sample_count":167,"total_semantic_cases":1862,"manifest_samples":198} * vm_fmix_chain64_loop passes both gates first try. Murmur3 64-bit finalizer applied n=(x&7)+1 times: r ^= r>>33; r *= 0xFF51..CCD; r ^= r>>33; r *= 0xC4CE..C53. 
Distinct from vm_fmix64_loop (single fmix application no loop), vm_xxhmix64_loop (per-byte mix one mul + post-loop fold), vm_murmurstep64_loop (single magic + xor with input each iter), vm_splitmix64_loop (different magics + constant additive step). Tests dual-magic xor-mul-xor-mul finalizer chain inside counter-bound loop body. Result: {"status":"keep","vm_sample_count":168,"total_semantic_cases":1872,"manifest_samples":199} * vm_zigzag_step64_loop passes both gates first try. ZigZag encoding chained over a stepped state: enc=(s<<1)^((i64)s>>63); r+=enc; s+=GR over n=(x&7)+1 iters. Tests ashr i64 ... 63 (sign-broadcast arithmetic right shift) inside loop body. Distinct from vm_signedbytesum64_loop (per-byte sext-i8) and vm_splitmix64_loop (no ashr). Reaches 200 manifest entries milestone. Result: {"status":"keep","vm_sample_count":169,"total_semantic_cases":1882,"manifest_samples":200} * vm_xormuladd_chain64_loop passes both gates first try. Three-op single-state chain over n=(x&7)+1 iters: r=r^x; r=r*0x1000193; r=r+x. Distinct from vm_murmurstep64_loop (xor-mul-lshr-fold; 64-bit magic), vm_fmix_chain64_loop (xor-mul-xor-mul; two 64-bit magics; no add), vm_xxhmix64_loop (xor-byte mul; post-loop fold). Tests xor + small-magic mul + add chain on single accumulator. Reaches 170 sample milestone. Result: {"status":"keep","vm_sample_count":170,"total_semantic_cases":1892,"manifest_samples":201} * vm_subxor_chain64_loop passes both gates after fixing one transcribed expected value (caught before run). Single-state sub-xor chain over n=(x&7)+1 iters: r=(r-x)^(x<<3). Distinct from vm_xormuladd_chain64_loop (xor+mul+add), vm_xorbytes64_loop (XOR-only), vm_horner64_loop (mul+add). Tests `sub i64` chained with shl-3 and xor inside dispatcher loop body. Sub is underused vs add in existing samples. Result: {"status":"keep","vm_sample_count":171,"total_semantic_cases":1902,"manifest_samples":202} * vm_negstep64_loop passes both gates first try. 
Two-state recurrence with arithmetic negation: r=-r+s; s=s+1 over n=(x&7)+1 iters. Distinct from vm_subxor_chain64_loop (sub state-minus-input), vm_xormuladd_chain64_loop (xor+mul+add). Tests `sub i64 0, r` (negate) pattern inside dispatcher loop. Negation flips accumulator sign per iter; with stepped state s, telescoping produces predictable patterns. Result: {"status":"keep","vm_sample_count":172,"total_semantic_cases":1912,"manifest_samples":203} * vm_bitfetch_window64_loop passes both gates first try. Bitwise reversal of low n=(x&7)+1 bits via dynamic shift `(x >> i) & 1` per iter. Tests `lshr i64 x, i` with i a loop-index variable - non-constant shift amount inside dispatcher loop body. Distinct from vm_byterev_window64_loop (8-bit fixed shift) and vm_nibrev_window64_loop (4-bit fixed shift) which use constant shifts. Result: {"status":"keep","vm_sample_count":173,"total_semantic_cases":1922,"manifest_samples":204} * vm_dynshl_pack64_loop passes both gates first try. XOR-pack 2-bit chunks of x at dynamic bit positions controlled by loop index: r ^= ((s & 0x3) << i); s >>= 2. Tests `shl i64 v, %i` (dynamic LEFT shift) - complement to vm_bitfetch_window64_loop's dynamic LSHR. Distinct shift direction with same dynamic-amount property. Result: {"status":"keep","vm_sample_count":174,"total_semantic_cases":1932,"manifest_samples":205} * vm_dyn_ashr64_loop passes both gates first try. Dynamic-amount ASHR (signed shift right) by counter: sx = (i64)x >> i; r ^= byte(sx) over n=(x&7)+1 iters. Distinct from vm_bitfetch_window64_loop (dynamic LSHR), vm_dynshl_pack64_loop (dynamic SHL), vm_zigzag_step64_loop (constant ashr-63). Completes the dynamic-shift trio (lshr/shl/ashr). Negative-sign inputs fill with 1s producing different XOR patterns than unsigned shift. Result: {"status":"keep","vm_sample_count":175,"total_semantic_cases":1942,"manifest_samples":206} * vm_bytesmul_idx64_loop passes both gates first try. 
Per-byte signed accumulator scaled by 1-based loop index: r += sext(byte) * (i+1) over n=(x&7)+1 iters. Distinct from vm_signedbytesum64_loop (no index multiplier) and vm_altbytesum64_loop (fixed alternating sign). Tests sext-i8 multiplied by dynamic counter value (i+1) - i64 mul against phi-tracked counter rather than constant. Result: {"status":"keep","vm_sample_count":176,"total_semantic_cases":1952,"manifest_samples":207} * vm_notand_chain64_loop passes both gates first try. NOT-AND chain with dynamic-shift xor: r=(~r)&x; r^=(i<<3) over n=(x&7)+1 iters. Tests bitwise NOT (xor i64 r, -1) followed by AND with input (BMI andn-style idiom), then xor with i<<3 (dynamic shl by counter). Result: {"status":"keep","vm_sample_count":177,"total_semantic_cases":1962,"manifest_samples":208} * vm_xormul_byte_idx64_loop passes both gates first try. XOR-fold scaled bytes: r ^= byte * (i+1) over n=(x&7)+1 iters. Distinct from vm_bytesmul_idx64_loop (signed-byte sext + ADD) - this one uses unsigned-byte zext + XOR. Tests u8 zext multiply by dynamic counter (i+1) folded via XOR rather than ADD. Result: {"status":"keep","vm_sample_count":178,"total_semantic_cases":1972,"manifest_samples":209} * vm_signedxor_byte_idx64_loop passes both gates first try. Signed-byte sext * (i+1) folded via XOR over n=(x&7)+1 iters. Fills the sext+XOR cell of the per-byte * counter matrix. Distinct from vm_xormul_byte_idx64_loop (zext + XOR) and vm_bytesmul_idx64_loop (sext + ADD). For high-bit-set bytes, sext populates upper 56 bits with 1s producing different XOR fold than zext (e.g. 0xF0 byte -> 2^64-16 vs unsigned 240). Result: {"status":"keep","vm_sample_count":179,"total_semantic_cases":1982,"manifest_samples":210} * vm_uintadd_byte_idx64_loop passes both gates first try. Unsigned-byte (zext) * (i+1) folded via ADD over n=(x&7)+1 iters. Fills the zext+ADD cell, COMPLETING the per-byte * counter matrix across all four (zext/sext) x (ADD/XOR) cells. Reaches 180-sample milestone. 
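
The four cells of the per-byte * counter matrix can be modeled with one parameterized Python function. A hedged sketch only: `byte_idx_fold`, its `signed`/`op` parameters, the r=0 seed, and the low-to-high byte order are all assumptions for illustration, not details confirmed by the sample sources.

```python
MASK64 = (1 << 64) - 1

def sext8(b: int) -> int:
    """Sign-extend an 8-bit value to a Python int."""
    return b - 256 if b & 0x80 else b

def byte_idx_fold(x: int, signed: bool, op: str) -> int:
    """Fold byte * (i+1) over n = (x & 7) + 1 bytes with ADD or XOR."""
    x &= MASK64
    n = (x & 7) + 1
    r = 0                                     # assumed seed
    for i in range(n):
        b = (x >> (8 * i)) & 0xFF             # assumed low-to-high byte walk
        v = sext8(b) if signed else b         # sext vs zext cell
        term = (v * (i + 1)) & MASK64         # scale by 1-based counter
        r = (r + term) & MASK64 if op == "add" else r ^ term
    return r
```

For example, `byte_idx_fold(0xF0, True, "xor")` gives 2^64-16 while `byte_idx_fold(0xF0, False, "xor")` gives 240, matching the sext/zext difference noted above.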
Result: {"status":"keep","vm_sample_count":180,"total_semantic_cases":1992,"manifest_samples":211} * vm_bytesq_sum64_loop passes both gates first try. Sum of byte*byte (u8 self-multiply) over n=(x&7)+1 iters. Distinct from vm_popsq64_loop (sum of squared POPCOUNTS), vm_squareadd64_loop (single-state r*r quadratic), vm_uintadd_byte_idx64_loop (byte * counter). Tests u8 self-multiply on the byte stream with no counter scaling. Result: {"status":"keep","vm_sample_count":181,"total_semantic_cases":2002,"manifest_samples":212} * vm_byteprod64_loop passes both gates first try. Running product of bytes r *= byte over n=(x&7)+1 iters, seeded r=1. Distinct from vm_bytesq_sum64_loop (squared bytes summed), vm_uintadd_byte_idx64_loop (byte * counter ADD), vm_xormul_byte_idx64_loop (byte * counter XOR). Tests `mul i64 r, byte` chained where any zero byte collapses the product but the loop still runs to completion. Result: {"status":"keep","vm_sample_count":182,"total_semantic_cases":2012,"manifest_samples":213} * vm_andsum_byte_idx64_loop passes both gates first try. Per-iter byte AND-ed with counter, summed: r += (byte & (i+1)) over n=(x&7)+1 iters. Distinct from vm_uintadd_byte_idx64_loop (byte * counter ADD), vm_xormul_byte_idx64_loop (byte * counter XOR). Tests `and i64 byte, counter` (zext-byte AND with phi-tracked i+1) folded via ADD - bitwise mask interaction with dynamic counter values. Result: {"status":"keep","vm_sample_count":183,"total_semantic_cases":2022,"manifest_samples":214} * vm_orsum_byte_idx64_loop passes both gates first try. Per-iter OR of byte and counter folded into accumulator: r |= byte | (i+1) over n=(x&7)+1 iters. Distinct from vm_andsum_byte_idx64_loop (AND fold), vm_xormul_byte_idx64_loop (XOR of byte*counter), vm_uintadd_byte_idx64_loop (ADD of byte*counter). Tests `or i64` chain that is monotone (only sets bits) - counter values 1..8 always contribute fixed low bits. 
Result: {"status":"keep","vm_sample_count":184,"total_semantic_cases":2032,"manifest_samples":215} * vm_subbyte_idx64_loop passes both gates first try. SUB-fold of u8 zext * counter: r -= byte * (i+1) over n=(x&7)+1 iters. Distinct from vm_uintadd_byte_idx64_loop (same body ADD-folded) - tests SUB on the same per-byte * counter accumulator. Result wraps below zero into u64 modular space. Result: {"status":"keep","vm_sample_count":185,"total_semantic_cases":2042,"manifest_samples":216} * vm_bytediv5_sum64_loop passes both gates first try. Sum of byte/5 over n=(x&7)+1 iters. Tests udiv-by-5 chain on byte stream. Distinct from vm_adler32_64_loop (urem by 65521 prime modular), vm_trailzeros_factorial64_loop (udiv-5 on single state), vm_uintadd_byte_idx64_loop (mul not div). All-0xFF: 8 * (255/5)=408. Result: {"status":"keep","vm_sample_count":186,"total_semantic_cases":2052,"manifest_samples":217} * vm_bytemod3_sum64_loop passes both gates first try. Sum of byte%3 over n=(x&7)+1 iters. Tests urem-by-3 chain on byte stream. Distinct from vm_bytediv5_sum64_loop (udiv-by-5) and vm_adler32_64_loop (urem-by-65521 prime). Small-modulus complement to /5 sample. All-0xFF: 255%3=0, sum=0. Result: {"status":"keep","vm_sample_count":187,"total_semantic_cases":2062,"manifest_samples":218} * vm_byteshl3_xor64_loop passes both gates first try. XOR-pack bytes at dynamic positions controlled by `i*3` over n=(x&7)+1 iters. Tests `shl i64 byte, %i*3` (dynamic shl by NON-trivial counter expression - mul-then-shl). Distinct from vm_dynshl_pack64_loop (shl by i directly, 2-bit chunks). Result: {"status":"keep","vm_sample_count":188,"total_semantic_cases":2072,"manifest_samples":219} * vm_byteshl_data64_loop passes both gates first try. Data-dependent shl: r=(r << (b&7)) | (b>>4) over n=(x&7)+1 iters. Tests `shl i64 r, %byte_amount` where shift amount is derived from the BYTE STREAM rather than loop counter. 
Distinct from vm_dynshl_pack64_loop (shl by i) and vm_byteshl3_xor64_loop (shl by i*3 - counter expression). Result: {"status":"keep","vm_sample_count":189,"total_semantic_cases":2082,"manifest_samples":220} * vm_data_lshr64_loop passes both gates first try. Data-dependent right shift counterpart to vm_byteshl_data64_loop: r=(r >> (b&7)) ^ b over n=(x&7)+1 iters. Tests `lshr i64 r, %byte_amount` (right-shift by byte-derived amount). Initial r=~0 with all-1s shifts down by data-driven amounts. Reaches 190 sample milestone. Result: {"status":"keep","vm_sample_count":190,"total_semantic_cases":2092,"manifest_samples":221} * vm_data_ashr64_loop passes both gates first try. Data-dependent ashr counterpart: r=(i64 r >> (b&7)) + b over n=(x&7)+1 iters. Tests `ashr i64 r, %byte_amount` (signed right-shift by byte-derived amount). Completes the data-dependent shift trio (shl/lshr/ashr) - distinct from vm_dyn_ashr64_loop (ashr by counter not byte data). Result: {"status":"keep","vm_sample_count":191,"total_semantic_cases":2102,"manifest_samples":222} * vm_mul3byte_chain64_loop passes both gates first try. Horner-style hash with multiplier 3: r = r*3 + byte over n=(x&7)+1 iters. Distinct from vm_djb264_loop (*33), vm_fnv1a64_loop (FNV prime), vm_horner64_loop (general polynomial). Tests `mul i64 r, 3` (small-constant multiplier - non-power-of-2 coefficient that lifter typically keeps as raw mul rather than lea-by-3 fold). Result: {"status":"keep","vm_sample_count":192,"total_semantic_cases":2112,"manifest_samples":223} * vm_shiftin_top64_loop passes both gates first try. Shift register filled from the top: r=(r>>8)|(byte<<56) over n=(x&7)+1 iters. Tests `lshr i64 r, 8 | shl i64 byte, 56` shift-register update pattern. Distinct from vm_byterev_window64_loop (shl-or pack from low end). After n=8 iters, all-FF input is preserved (palindrome invariant). 
Result: {"status":"keep","vm_sample_count":193,"total_semantic_cases":2122,"manifest_samples":224} * vm_orxor_pair64_loop passes both gates first try. Two-state cross-feed with explicit temp barrier: t=a; a=a|b; b=t^(b*7) over n=(x&7)+1 iters. Combines monotone OR fold on a with non-monotone XOR-mul evolution on b. Distinct from vm_pairmix64_loop (add+mul-by-GR cross-feed), vm_threestate_xormul64_loop (three states), vm_orsum_byte_idx64_loop (single-state OR fold). Result: {"status":"keep","vm_sample_count":194,"total_semantic_cases":2132,"manifest_samples":225} * vm_lcg_ansi_chain64_loop passes both gates first try. Classic ANSI C rand() LCG chained over n=(x&7)+1 iters: r = r*1103515245 + 12345. Distinct from vm_xorrot64_loop (LCG with golden-ratio + xor accum), vm_pcg64_loop, vm_xorshift64_loop. Single-state LCG with canonical multiplier+increment pair. Result: {"status":"keep","vm_sample_count":195,"total_semantic_cases":2142,"manifest_samples":226} * vm_bytesq_idx_sum64_loop passes both gates first try. Sum of byte * (i+1) * (i+1) - SQUARED counter expression as multiplier. Two sequential muls per iter (counter*counter then byte*counter^2). Distinct from vm_uintadd_byte_idx64_loop (linear counter) and vm_bytesq_sum64_loop (byte self-multiply, no counter). All-0xFF: 0xFF*204=52020. Result: {"status":"keep","vm_sample_count":196,"total_semantic_cases":2152,"manifest_samples":227} * vm_dynshl_accum_byte64_loop passes both gates first try. Shift accumulator left by (i+1) then add byte: r=(r<<(i+1))+byte over n=(x&7)+1 iters. Tests `shl i64 %r, %(i+1)` (shift ACCUMULATOR by phi-tracked counter rather than the byte). Distinct from vm_dynshl_pack64_loop (shl byte by counter) and vm_byteshl_data64_loop (data-dependent shl on accumulator). 
Result: {"status":"keep","vm_sample_count":197,"total_semantic_cases":2162,"manifest_samples":228} * vm_dynlshr_accum_byte64_loop passes both gates after recovering from aborted previous turn (file was on disk, manifest entry missing). Shifts r right by (i+1) bits then XORs the byte: r=(r>>(i+1))^byte over n=(x&7)+1 iters with r seeded ~0. Tests `lshr i64 %r, %(i+1)` (lshr accumulator by phi-tracked counter expression). Distinct from vm_dynshl_accum_byte64_loop (shl direction) and vm_data_lshr64_loop (lshr by byte data not counter). Result: {"status":"keep","vm_sample_count":198,"total_semantic_cases":2172,"manifest_samples":229} * vm_dynashr_accum_byte64_loop passes both gates first try. ASHR accumulator by counter then add byte: r=(i64 r >> (i+1)) + byte over n=(x&7)+1 iters. Tests `ashr i64 %r, %(i+1)` (signed right-shift accumulator by phi-tracked counter). Completes the counter-driven accumulator-shift trio (shl/lshr/ashr). Result: {"status":"keep","vm_sample_count":199,"total_semantic_cases":2182,"manifest_samples":230} * vm_xormulself_byte64_loop passes both gates first try. Self-referential multiply: r ^= byte * (r+1) over n=(x&7)+1 iters. Tests `mul i64 byte, (r+1)` where multiplier operand is the accumulator+1 - r appears on both sides of the body. Distinct from vm_xormul_byte_idx64_loop (byte * counter) and vm_squareadd64_loop (r*r self-multiply on full state). Reaches 200-sample milestone. Result: {"status":"keep","vm_sample_count":200,"total_semantic_cases":2192,"manifest_samples":231} * vm_xor_shifted_self_byte64_loop passes both gates first try. Self-shift used as XOR mask combined with byte at MSB: r ^= (r>>8) | (byte<<56) over n=(x&7)+1 iters. Distinct from vm_shiftin_top64_loop (assigns same expression, no XOR), vm_xormulself_byte64_loop (mul-self with byte), vm_byterev_window64_loop (no XOR). Result: {"status":"keep","vm_sample_count":201,"total_semantic_cases":2202,"manifest_samples":232} * vm_pair_xormul_byte64_loop passes both gates first try. 
Per-iter pair (b0,b1) combined as (b0^b1) * (b0+b1) over n=(x&3)+1 iters. Tests TWO byte reads per iteration with XOR + ADD + MUL combination. Trip uses `& 3` so loop consumes 2 bytes per iter (1..4 pair iters). Distinct from all single-byte-per-iter samples. Result: {"status":"keep","vm_sample_count":202,"total_semantic_cases":2212,"manifest_samples":233} * vm_quad_byte_xor64_loop passes both gates first try. FOUR byte reads per iteration combined via 3 chained XORs then ADD-folded over n=(x&1)+1 iters (32-bit stride). Distinct from vm_pair_xormul_byte64_loop (2 bytes per iter) and all single-byte samples. Tests wider stride consumption and multi-byte body shape. Result: {"status":"keep","vm_sample_count":203,"total_semantic_cases":2222,"manifest_samples":234} * vm_word_xormul64_loop passes both gates first try. u16 word per iter (16-bit stride): r ^= w*w over n=(x&3)+1 iters. Tests u16 zext-i16 self-multiply XOR-folded. Distinct from vm_bytesq_sum64_loop (8-bit stride, ADD) and vm_pair_xormul_byte64_loop (16-bit stride but byte ops). Result: {"status":"keep","vm_sample_count":204,"total_semantic_cases":2232,"manifest_samples":235} * vm_word_horner13_64_loop passes both gates first try. Horner-style hash on u16 words with multiplier 13: r = r*13 + w over n=(x&3)+1 iters. Distinct from vm_mul3byte_chain64_loop (Horner on bytes mul 3), vm_djb264_loop (bytes mul 33), vm_word_xormul64_loop (word self-multiply XOR). Wider stride + different multiplier than existing byte-Horner samples. Result: {"status":"keep","vm_sample_count":205,"total_semantic_cases":2242,"manifest_samples":236} * vm_dword_xormul64_loop passes both gates first try. u32 dword per iter (32-bit stride) with golden-ratio prime mul XOR-folded: r ^= dword * 0x9E3779B9 over n=(x&1)+1 iters. Distinct from vm_word_xormul64_loop (16-bit stride) and vm_quad_byte_xor64_loop (4 bytes per iter, no mul). Tests u32 zext-i32 mask + 32-bit-magic multiply. 
Result: {"status":"keep","vm_sample_count":206,"total_semantic_cases":2252,"manifest_samples":237} * vm_signed_dword_sum64_loop passes both gates first try. Sum of sext-i32 dwords per iter over n=(x&1)+1 iters. Tests `sext i32 to i64` chain on 32-bit dword stream. Distinct from vm_signedbytesum64_loop (sext-i8 byte, 8-bit stride) and vm_dword_xormul64_loop (zext dword XOR, no sign extension). Result: {"status":"keep","vm_sample_count":207,"total_semantic_cases":2262,"manifest_samples":238} * vm_signed_word_sum64_loop passes both gates first try. Sum of sext-i16 words per iter over n=(x&3)+1 iters. Tests `sext i16 to i64` chain on 16-bit word stream. Fills the i16 middle width and completes the sext-width trio (i8/i16/i32 -> i64). Result: {"status":"keep","vm_sample_count":208,"total_semantic_cases":2272,"manifest_samples":239} * vm_word_range64_loop passes both gates after restructuring to n-decrement (4 slots: n,s,mn,mx). Tests u16 cmp-driven reductions at 16-bit stride: mx=umax(w,mx); mn=umin(w,mn); return mx-mn over n=(x&3)+1 iters. Lifter folds both reductions to llvm.umax.i64 + llvm.umin.i64. Documents new lifter limitation: 5-slot variant (with separate i counter) trips pseudo-stack init failure; 4-slot form works. Result: {"status":"keep","vm_sample_count":209,"total_semantic_cases":2282,"manifest_samples":240} * vm_signed_word_range64_loop passes both gates first try. Signed-i16 min/max range at word stride: tracks mx,mn over n=(x&3)+1 iters then returns mx-mn. Distinct from vm_word_range64_loop (unsigned -> umax/umin folds) and vm_signed_byterange64_loop (i8 stride). Per documented asymmetry, signed cmp+select stays raw icmp slt + select. Reaches 210-sample milestone. Result: {"status":"keep","vm_sample_count":210,"total_semantic_cases":2292,"manifest_samples":241} * Add equivalence reporting tool for rewrite_smoke samples * vm_dword_range64_loop passes both gates first try. u32 dword min/max range over n=(x&1)+1 iters. 
Tests umax/umin folds at 32-bit dword stride. Distinct from vm_byterange64_loop (8-bit) and vm_word_range64_loop (16-bit). Extends range coverage to all three widths (u8/u16/u32 + signed counterparts). Result: {"status":"keep","vm_sample_count":211,"total_semantic_cases":2302,"manifest_samples":242} * Generate per-sample original-vs-lifted equivalence reports for rewrite_smoke * vm_signed_dword_range64_loop passes both gates first try. Signed-i32 dword min/max range over n=(x&1)+1 iters. Tests sext-i32 + signed cmp+select reductions at 32-bit stride. Completes the range coverage matrix (3 widths x 2 signs). Per documented signed-cmp asymmetry, signed cmp+select stays raw icmp slt + select. Result: {"status":"keep","vm_sample_count":212,"total_semantic_cases":2312,"manifest_samples":243} * vm_word_orfold64_loop passes both gates first try. u16 OR-fold over n=(x&3)+1 iters. Tests `or i64` chain at 16-bit word stride. Distinct from vm_orsum_byte_idx64_loop (byte | counter, 8-bit stride). Monotone OR fold (only sets bits). Result: {"status":"keep","vm_sample_count":213,"total_semantic_cases":2322,"manifest_samples":244} * Refresh equivalence reports for current 246-sample manifest * vm_byte_andfold64_loop passes both gates. u8 AND-fold over n=(x&7)+1 bytes seeded with r=0xFF. Tests `and i64` chain at byte stride - monotone DECREASING accumulator counterpart to OR-fold. Distinct from vm_andsum_byte_idx64_loop (byte AND counter, ADD-folded). Result: {"status":"keep","vm_sample_count":214,"total_semantic_cases":2332,"manifest_samples":245} --------- Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com> Co-authored-by: Yusuf <yusuf@local>
2026-05-12 09:40:34 +00:00 · 2026-04-25 19:56:16 +03:00
parent c1ca564305
commit 9c32ecd235
465 changed files with 43605 additions and 0 deletions
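The per-byte samples described in the commit message are small enough to model directly, which is one way to cross-check the `semantic` expected values against the all-0xFF figures quoted there. The sketch below is a hedged Python reference for three of them. The function names are hypothetical, and it assumes that iteration `i` reads byte `i` counted from the least-significant end, which the commit message does not state.

```python
MASK64 = (1 << 64) - 1  # results are modelled as wrapping u64 values


def trip_count(x):
    """Loop bound n=(x&7)+1, as quoted in the commit message."""
    return (x & 7) + 1


def byte_at(x, i):
    # ASSUMPTION: byte i is bits [8*i, 8*i+8) of x (least-significant first).
    return (x >> (8 * i)) & 0xFF


def bytediv5_sum(x):
    """Model of vm_bytediv5_sum64_loop: sum of byte/5 over n iterations."""
    return sum(byte_at(x, i) // 5 for i in range(trip_count(x))) & MASK64


def bytemod3_sum(x):
    """Model of vm_bytemod3_sum64_loop: sum of byte%3 over n iterations."""
    return sum(byte_at(x, i) % 3 for i in range(trip_count(x))) & MASK64


def bytesq_idx_sum(x):
    """Model of vm_bytesq_idx_sum64_loop: sum of byte * (i+1) * (i+1)."""
    return sum(byte_at(x, i) * (i + 1) ** 2 for i in range(trip_count(x))) & MASK64


if __name__ == "__main__":
    all_ff = MASK64
    print(bytediv5_sum(all_ff))    # 408   = 8 * (255 // 5)
    print(bytemod3_sum(all_ff))    # 0     since 255 % 3 == 0
    print(bytesq_idx_sum(all_ff))  # 52020 = 255 * (1 + 4 + ... + 64)
```

The all-0xFF input does not depend on the byte-order assumption, so these three values match the commit message either way; for other inputs the model only holds if the real sample reads bytes in the same order.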
@@ -0,0 +1,85 @@
+# Autoresearch
+
+## Goal
+- Populate more rewrite-smoke test cases that exercise VM-shaped dispatch with
+  real loops, including custom toy VMs (register machines, nested loops in PC
+  state, conditional branches inside VM loop bodies). Each new sample must be
+  fully wired into the manifest with a symbol, IR pattern set, and at least
+  six semantic test cases covering edge inputs.
+
+## Benchmark
+- command: bash autoresearch.sh
+- primary metric: vm_sample_count
+- metric unit: count
+- direction: higher
+- secondary metrics: total_semantic_cases, manifest_samples
+
+## Files in Scope
+- testcases/rewrite_smoke/
+- scripts/rewrite/instruction_microtests.json
+- scripts/rewrite/generate_semantic_reports.py
+- docs/semantic_reports/
+
+## Off Limits
+- lifter/
+- scripts/dev/
+- scripts/rewrite/run.cmd
+- scripts/rewrite/run.ps1
+- scripts/rewrite/verify.ps1
+- scripts/rewrite/build_samples.cmd
+- scripts/rewrite/manifest_validation.ps1
+- scripts/rewrite/check_semantic.py
+- test.py
+
+## Constraints
+- Every file in `testcases/rewrite_smoke/` MUST have exactly one matching
+  manifest entry in `scripts/rewrite/instruction_microtests.json` (manifest
+  validation enforces this on `python test.py baseline`).
+- Manifest entries MUST include `name`, `symbol`, `patterns` (non-empty),
+  and `semantic` with concrete inputs and expected return values.
+- New VM samples MUST keep their dispatcher in `__declspec(noinline)` and
+  use symbolic input-derived loop bounds so the lifter cannot constant-fold
+  the loop away.
+- Samples MUST be lli-executable: avoid bytecode-array memory loads outside
+  the function stack and avoid platform-specific intrinsics.
+- DO NOT modify the lifter, build pipeline, or verification scripts.
+
+## Preflight
+- `clang-cl` and `nasm` resolution is handled by `scripts/rewrite/build_samples.cmd`.
+- Full lifter regression requires `python test.py baseline`, which wipes
+  `build_iced/`. Treat that as expensive and run only when validating a
+  batch of new samples end-to-end.
+- The autoresearch metric is cheap (manifest stats only); end-to-end
+  lifter/lli verification is a separate, manual gate.
+
+## Comparability invariant
+- Metric is computed by parsing `scripts/rewrite/instruction_microtests.json`
+  with a fixed PowerShell snippet inside `autoresearch.sh`. Do not change the
+  parser or the manifest schema between runs without re-initializing the
+  segment.
+
+## Baseline
+- metric:
+- notes:
+
+## Current best
+- metric:
+- why it won:
+
+## What's Been Tried
+- experiment: vm_callret_loop with explicit return-PC stack (rstack[rsp])
+  lesson: dispatcher reads next pc from a stack array; lifter cannot generalize the indirect dispatch and trips diagnostic 503 (basic-block budget exceeded, ~4087 blocks). Sample removed; revisit when loop generalization handles stack-indexed pc.
+- experiment: vm_subroutine_loop with single-int rpc slot (one-deep call/ret)
+  lesson: even a single non-indexed `pc = rpc` indirect dispatch crashes the lifter (access violation, exit 0xC0000005) when invoked through PowerShell. The ret-to-stack-loaded-pc pattern is fundamentally unsupported regardless of stack depth. Removed.
+- experiment: vm_bubblesort_loop with adjacent compare-and-swap on a stack array
+  lesson: even a single bubble pass (loop body conditionally writes TWO indexed stack-array slots) trips diagnostic 503 (BB budget exceeded). The lifter enumerates the swap-vs-no-swap path across every iteration. Comparison-driven update of a single accumulator (vm_minarray_loop) is fine; two-slot conditional writes inside a loop are not. Sample removed.
+- experiment: vm_tea_round_loop with TEA-style compound multi-state cross-update
+  lesson: lifter generates IR with the right shape (peeled loop with phi nodes) and pattern verification passes, but the lifted IR computes the WRONG value for some symbolic inputs (e.g. x=0x65501 with n=6: native+python both produce 37119, lifter returns a different value). Real lifter correctness bug for compound v0/v1 cross-update bodies. Sample removed.
+- experiment: vm_switch_dispatch_loop using `switch` for dispatch
+  lesson: lifter collapsed the switch-dispatched VM to a constant -1 return; another real lifter limitation, like the removed samples above. Removed.
+- experiment: end-to-end rewrite regression via run_experiment
+  lesson: harness env sets CI=1 and LLVM_DIR points at an install without bundled clang-cl, so build_samples.cmd refuses host fallback. Must pin CLANG_CL_EXE explicitly.
+- experiment: speculative IR patterns vs lifter-observed shapes
+  lesson: 13/18 first-pass VM patterns missed because the lifter heavily compresses dispatchers (if-else -> switch i32, fixed-trip loops unrolled or recognized as intrinsics like llvm.bitreverse.i8, triangular sums closed-form-solved into mul i33 + lshr i33). Patterns must be derived from lifted IR, not from source-level shape.
+- experiment: lli semantic check found undef for empty-loop inputs (limit=0) in branchy/collatz
+  lesson: lifter pseudo-stack promotion drops the entry-block init when the same slot is also written inside a dispatcher state. Fix is the dual_counter pattern: keep an explicit init dispatcher state on the entry-to-halt path. branchy needed `i=0; count=0;` inside BV_LOAD_LIMIT to thread `[ 0, %entry ]` through the loop phi instead of `[ undef, %entry ]`.
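The manifest rules in the Constraints section of `autoresearch.md` above can be approximated with a quick local pre-check before paying for `python test.py baseline`. This is a minimal sketch, not the project's validator (that lives in `scripts/rewrite/manifest_validation.ps1`, which is off limits). The function names are hypothetical, and it assumes a sample file matches a manifest entry when the file stem equals the entry's `name`.

```python
from collections import Counter
from pathlib import PurePath

# Keys the Constraints section requires on every manifest entry.
REQUIRED_KEYS = ("name", "symbol", "patterns", "semantic")


def check_entries(manifest):
    """Return problems for entries with a missing or empty required key."""
    problems = []
    for entry in manifest.get("samples", []):
        label = entry.get("name") or "<unnamed>"
        for key in REQUIRED_KEYS:
            if not entry.get(key):
                problems.append(f"{label}: missing or empty '{key}'")
    return problems


def check_file_coverage(manifest, file_names):
    """Return problems for files without exactly one manifest entry.

    ASSUMPTION: a file matches an entry when its stem equals entry['name'].
    """
    counts = Counter(e.get("name") for e in manifest.get("samples", []))
    problems = []
    for file_name in file_names:
        stem = PurePath(file_name).stem
        if counts[stem] != 1:
            problems.append(
                f"{file_name}: expected exactly 1 manifest entry, found {counts[stem]}"
            )
    return problems
```

In use, `manifest` would be the parsed `scripts/rewrite/instruction_microtests.json` and `file_names` the directory listing of `testcases/rewrite_smoke/`; an empty result from both functions only suggests the manifest is plausibly well-formed, it does not replace the real gate.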
@@ -0,0 +1,32 @@
+#!/usr/bin/env bash
+# Cheap manifest-stats harness for the VM-sample population task.
+#
+# Primary metric: number of VM-shaped samples in instruction_microtests.json.
+# A "VM-shaped" sample has "vm" (case-insensitive) in `name` and non-empty
+# `patterns` and `semantic` lists. This rewards fully-wired samples, not stubs.
+set -euo pipefail
+
+cd "$(dirname "$0")"
+
+# Generate metrics via PowerShell: it is natively on PATH and avoids stdout
+# plumbing issues when run under bash on Windows.
+powershell.exe -NoProfile -ExecutionPolicy Bypass -Command "
+  \$ErrorActionPreference = 'Stop';
+  \$raw = Get-Content -Raw -LiteralPath 'scripts/rewrite/instruction_microtests.json';
+  \$data = \$raw | ConvertFrom-Json;
+  \$samples = @(\$data.samples);
+  \$vm = 0;
+  \$totalSem = 0;
+  foreach (\$s in \$samples) {
+    \$name = ''; if (\$s.name) { \$name = [string]\$s.name };
+    \$patterns = @(); if (\$s.patterns) { \$patterns = @(\$s.patterns) };
+    \$semantic = @(); if (\$s.semantic) { \$semantic = @(\$s.semantic) };
+    \$totalSem += \$semantic.Count;
+    if (\$name.ToLower().Contains('vm') -and \$patterns.Count -gt 0 -and \$semantic.Count -gt 0) {
+      \$vm += 1;
+    }
+  };
+  Write-Output (\"METRIC vm_sample_count=\$vm\");
+  Write-Output (\"METRIC total_semantic_cases=\$totalSem\");
+  Write-Output (\"METRIC manifest_samples=\$(\$samples.Count)\");
+"
@@ -0,0 +1,263 @@
+# Equivalence reports (original vs lifted)
+
+Each report compares the **native binary** built from `testcases/rewrite_smoke/<name>` (linked through a small driver that calls the target symbol directly) against the **lifted+optimized LLVM IR** in `rewrite-regression-work/ir_outputs/<name>.ll` (executed via LLVM `lli`) on the manifest-declared input cases.
+
+- **Samples:** 244/246 equivalent across all cases, 0 failing, 2 with no semantic cases
+- **Cases:** 2332/2332 equivalent overall
+
+Regenerate after a re-lift:
+
+```
+set CLANG_CL_EXE=C:\Program Files\LLVM\bin\clang-cl.exe
+scripts\rewrite\run.cmd
+python scripts\rewrite\generate_equivalence_reports.py
+```
+
+| Sample | Verdict | Cases | Report |
+|--------|---------|-------|--------|
+| bitchain | PASS | 1/1 | [bitchain_report.md](bitchain_report.md) |
+| branch | PASS | 5/5 | [branch_report.md](branch_report.md) |
+| bytecode_vm_loop | PASS | 6/6 | [bytecode_vm_loop_report.md](bytecode_vm_loop_report.md) |
+| calc_cout | PASS | 4/4 | [calc_cout_report.md](calc_cout_report.md) |
+| calc_fib | PASS | 1/1 | [calc_fib_report.md](calc_fib_report.md) |
+| calc_grade | PASS | 11/11 | [calc_grade_report.md](calc_grade_report.md) |
+| calc_jumptable_large | PASS | 10/10 | [calc_jumptable_large_report.md](calc_jumptable_large_report.md) |
+| calc_jumptable | PASS | 12/12 | [calc_jumptable_report.md](calc_jumptable_report.md) |
+| calc_mixed | PASS | 7/7 | [calc_mixed_report.md](calc_mixed_report.md) |
+| calc_sum_array | PASS | 1/1 | [calc_sum_array_report.md](calc_sum_array_report.md) |
+| calc_sum_to_n | PASS | 6/6 | [calc_sum_to_n_report.md](calc_sum_to_n_report.md) |
+| calc_switch | PASS | 8/8 | [calc_switch_report.md](calc_switch_report.md) |
+| cmov_chain | PASS | 5/5 | [cmov_chain_report.md](cmov_chain_report.md) |
+| diamond | PASS | 8/8 | [diamond_report.md](diamond_report.md) |
+| dummy_vm_loop | PASS | 6/6 | [dummy_vm_loop_report.md](dummy_vm_loop_report.md) |
+| indirect | PASS | 1/1 | [indirect_report.md](indirect_report.md) |
+| instr_add | PASS | 1/1 | [instr_add_report.md](instr_add_report.md) |
+| instr_rol | PASS | 1/1 | [instr_rol_report.md](instr_rol_report.md) |
+| instr_sub | PASS | 1/1 | [instr_sub_report.md](instr_sub_report.md) |
+| instr_xor | PASS | 1/1 | [instr_xor_report.md](instr_xor_report.md) |
+| jumptable_basic | PASS | 6/6 | [jumptable_basic_report.md](jumptable_basic_report.md) |
+| jumptable_computation | PASS | 7/7 | [jumptable_computation_report.md](jumptable_computation_report.md) |
+| jumptable_dense | PASS | 10/10 | [jumptable_dense_report.md](jumptable_dense_report.md) |
+| jumptable_rel32 | PASS | 7/7 | [jumptable_rel32_report.md](jumptable_rel32_report.md) |
+| jumptable_shared_targets | PASS | 8/8 | [jumptable_shared_targets_report.md](jumptable_shared_targets_report.md) |
+| jumptable_shifted | PASS | 9/9 | [jumptable_shifted_report.md](jumptable_shifted_report.md) |
+| loop_simple | PASS | 1/1 | [loop_simple_report.md](loop_simple_report.md) |
+| multi_arg | PASS | 5/5 | [multi_arg_report.md](multi_arg_report.md) |
+| nested_branch | PASS | 8/8 | [nested_branch_report.md](nested_branch_report.md) |
+| stack | PASS | 1/1 | [stack_report.md](stack_report.md) |
+| stack_vm_loop | PASS | 6/6 | [stack_vm_loop_report.md](stack_vm_loop_report.md) |
+| switch_3way | PASS | 6/6 | [switch_3way_report.md](switch_3way_report.md) |
+| switch_sparse | PASS | 7/7 | [switch_sparse_report.md](switch_sparse_report.md) |
+| vm_2d_loop | PASS | 10/10 | [vm_2d_loop_report.md](vm_2d_loop_report.md) |
+| vm_4state64_loop | PASS | 10/10 | [vm_4state64_loop_report.md](vm_4state64_loop_report.md) |
+| vm_4state_loop | PASS | 11/11 | [vm_4state_loop_report.md](vm_4state_loop_report.md) |
+| vm_abs64_loop | PASS | 9/9 | [vm_abs64_loop_report.md](vm_abs64_loop_report.md) |
+| vm_abs_array_loop | PASS | 11/11 | [vm_abs_array_loop_report.md](vm_abs_array_loop_report.md) |
+| vm_adler32_64_loop | PASS | 10/10 | [vm_adler32_64_loop_report.md](vm_adler32_64_loop_report.md) |
+| vm_altbytesum64_loop | PASS | 10/10 | [vm_altbytesum64_loop_report.md](vm_altbytesum64_loop_report.md) |
+| vm_andsum_byte_idx64_loop | PASS | 10/10 | [vm_andsum_byte_idx64_loop_report.md](vm_andsum_byte_idx64_loop_report.md) |
+| vm_argmax_loop | PASS | 11/11 | [vm_argmax_loop_report.md](vm_argmax_loop_report.md) |
+| vm_base7sum64_loop | PASS | 10/10 | [vm_base7sum64_loop_report.md](vm_base7sum64_loop_report.md) |
+| vm_bitfetch_window64_loop | PASS | 10/10 | [vm_bitfetch_window64_loop_report.md](vm_bitfetch_window64_loop_report.md) |
+| vm_bitreverse64_loop | PASS | 10/10 | [vm_bitreverse64_loop_report.md](vm_bitreverse64_loop_report.md) |
+| vm_bitreverse_loop | PASS | 10/10 | [vm_bitreverse_loop_report.md](vm_bitreverse_loop_report.md) |
+| vm_bittransitions_loop | PASS | 11/11 | [vm_bittransitions_loop_report.md](vm_bittransitions_loop_report.md) |
+| vm_branchy_loop | PASS | 8/8 | [vm_branchy_loop_report.md](vm_branchy_loop_report.md) |
+| vm_bswap64_loop | PASS | 10/10 | [vm_bswap64_loop_report.md](vm_bswap64_loop_report.md) |
+| vm_byte_andfold64_loop | PASS | 10/10 | [vm_byte_andfold64_loop_report.md](vm_byte_andfold64_loop_report.md) |
+| vm_byte_buffer_loop | PASS | 10/10 | [vm_byte_buffer_loop_report.md](vm_byte_buffer_loop_report.md) |
+| vm_byte_loop | PASS | 10/10 | [vm_byte_loop_report.md](vm_byte_loop_report.md) |
+| vm_bytecyc64_loop | PASS | 10/10 | [vm_bytecyc64_loop_report.md](vm_bytecyc64_loop_report.md) |
+| vm_bytediv5_sum64_loop | PASS | 10/10 | [vm_bytediv5_sum64_loop_report.md](vm_bytediv5_sum64_loop_report.md) |
+| vm_bytematch64_loop | PASS | 10/10 | [vm_bytematch64_loop_report.md](vm_bytematch64_loop_report.md) |
+| vm_bytemax64_loop | PASS | 10/10 | [vm_bytemax64_loop_report.md](vm_bytemax64_loop_report.md) |
+| vm_bytemod3_sum64_loop | PASS | 10/10 | [vm_bytemod3_sum64_loop_report.md](vm_bytemod3_sum64_loop_report.md) |
+| vm_byteparity64_loop | PASS | 10/10 | [vm_byteparity64_loop_report.md](vm_byteparity64_loop_report.md) |
+| vm_byteprod64_loop | PASS | 10/10 | [vm_byteprod64_loop_report.md](vm_byteprod64_loop_report.md) |
+| vm_byterange64_loop | PASS | 10/10 | [vm_byterange64_loop_report.md](vm_byterange64_loop_report.md) |
+| vm_byterev_window64_loop | PASS | 10/10 | [vm_byterev_window64_loop_report.md](vm_byterev_window64_loop_report.md) |
+| vm_byteshl3_xor64_loop | PASS | 10/10 | [vm_byteshl3_xor64_loop_report.md](vm_byteshl3_xor64_loop_report.md) |
+| vm_byteshl_data64_loop | PASS | 10/10 | [vm_byteshl_data64_loop_report.md](vm_byteshl_data64_loop_report.md) |
+| vm_bytesmul_idx64_loop | PASS | 10/10 | [vm_bytesmul_idx64_loop_report.md](vm_bytesmul_idx64_loop_report.md) |
+| vm_bytesq_idx_sum64_loop | PASS | 10/10 | [vm_bytesq_idx_sum64_loop_report.md](vm_bytesq_idx_sum64_loop_report.md) |
+| vm_bytesq_sum64_loop | PASS | 10/10 | [vm_bytesq_sum64_loop_report.md](vm_bytesq_sum64_loop_report.md) |
+| vm_ca_loop | PASS | 12/12 | [vm_ca_loop_report.md](vm_ca_loop_report.md) |
+| vm_caesar_loop | PASS | 12/12 | [vm_caesar_loop_report.md](vm_caesar_loop_report.md) |
+| vm_carrychain_loop | PASS | 11/11 | [vm_carrychain_loop_report.md](vm_carrychain_loop_report.md) |
+| vm_choosemax64_loop | PASS | 10/10 | [vm_choosemax64_loop_report.md](vm_choosemax64_loop_report.md) |
+| vm_classify_loop | PASS | 10/10 | [vm_classify_loop_report.md](vm_classify_loop_report.md) |
+| vm_clz64_loop | PASS | 10/10 | [vm_clz64_loop_report.md](vm_clz64_loop_report.md) |
+| vm_collatz64_loop | PASS | 10/10 | [vm_collatz64_loop_report.md](vm_collatz64_loop_report.md) |
+| vm_collatz_loop | PASS | 8/8 | [vm_collatz_loop_report.md](vm_collatz_loop_report.md) |
+| vm_condsum64_loop | PASS | 10/10 | [vm_condsum64_loop_report.md](vm_condsum64_loop_report.md) |
+| vm_countdown_loop | PASS | 8/8 | [vm_countdown_loop_report.md](vm_countdown_loop_report.md) |
+| vm_crc64_loop | PASS | 10/10 | [vm_crc64_loop_report.md](vm_crc64_loop_report.md) |
+| vm_cttz64_loop | PASS | 10/10 | [vm_cttz64_loop_report.md](vm_cttz64_loop_report.md) |
+| vm_ctz_loop | PASS | 12/12 | [vm_ctz_loop_report.md](vm_ctz_loop_report.md) |
+| vm_data_ashr64_loop | PASS | 10/10 | [vm_data_ashr64_loop_report.md](vm_data_ashr64_loop_report.md) |
+| vm_data_lshr64_loop | PASS | 10/10 | [vm_data_lshr64_loop_report.md](vm_data_lshr64_loop_report.md) |
+| vm_decdigits64_loop | PASS | 10/10 | [vm_decdigits64_loop_report.md](vm_decdigits64_loop_report.md) |
+| vm_decsum64_loop | PASS | 10/10 | [vm_decsum64_loop_report.md](vm_decsum64_loop_report.md) |
+| vm_deinterleave64_loop | PASS | 10/10 | [vm_deinterleave64_loop_report.md](vm_deinterleave64_loop_report.md) |
+| vm_digitprod64_loop | PASS | 10/10 | [vm_digitprod64_loop_report.md](vm_digitprod64_loop_report.md) |
+| vm_digitsum_loop | PASS | 12/12 | [vm_digitsum_loop_report.md](vm_digitsum_loop_report.md) |
+| vm_dispatch_table_loop | PASS | 10/10 | [vm_dispatch_table_loop_report.md](vm_dispatch_table_loop_report.md) |
+| vm_divcount64_loop | PASS | 10/10 | [vm_divcount64_loop_report.md](vm_divcount64_loop_report.md) |
+| vm_djb264_loop | PASS | 10/10 | [vm_djb264_loop_report.md](vm_djb264_loop_report.md) |
+| vm_djb2_loop | PASS | 12/12 | [vm_djb2_loop_report.md](vm_djb2_loop_report.md) |
+| vm_dual_array_loop | PASS | 10/10 | [vm_dual_array_loop_report.md](vm_dual_array_loop_report.md) |
+| vm_dual_counter_loop | PASS | 8/8 | [vm_dual_counter_loop_report.md](vm_dual_counter_loop_report.md) |
+| vm_dual_i64_loop | PASS | 10/10 | [vm_dual_i64_loop_report.md](vm_dual_i64_loop_report.md) |
+| vm_dupcount_loop | PASS | 11/11 | [vm_dupcount_loop_report.md](vm_dupcount_loop_report.md) |
+| vm_dword_range64_loop | PASS | 10/10 | [vm_dword_range64_loop_report.md](vm_dword_range64_loop_report.md) |
+| vm_dword_xormul64_loop | PASS | 10/10 | [vm_dword_xormul64_loop_report.md](vm_dword_xormul64_loop_report.md) |
+| vm_dyn_ashr64_loop | PASS | 10/10 | [vm_dyn_ashr64_loop_report.md](vm_dyn_ashr64_loop_report.md) |
+| vm_dynashr_accum_byte64_loop | PASS | 10/10 | [vm_dynashr_accum_byte64_loop_report.md](vm_dynashr_accum_byte64_loop_report.md) |
+| vm_dynlshr_accum_byte64_loop | PASS | 10/10 | [vm_dynlshr_accum_byte64_loop_report.md](vm_dynlshr_accum_byte64_loop_report.md) |
+| vm_dynshl_accum_byte64_loop | PASS | 10/10 | [vm_dynshl_accum_byte64_loop_report.md](vm_dynshl_accum_byte64_loop_report.md) |
+| vm_dynshl_pack64_loop | PASS | 10/10 | [vm_dynshl_pack64_loop_report.md](vm_dynshl_pack64_loop_report.md) |
+| vm_factorial64_loop | PASS | 10/10 | [vm_factorial64_loop_report.md](vm_factorial64_loop_report.md) |
+| vm_factorial_loop | PASS | 10/10 | [vm_factorial_loop_report.md](vm_factorial_loop_report.md) |
+| vm_fibonacci64_loop | PASS | 10/10 | [vm_fibonacci64_loop_report.md](vm_fibonacci64_loop_report.md) |
+| vm_fibonacci_loop | PASS | 10/10 | [vm_fibonacci_loop_report.md](vm_fibonacci_loop_report.md) |
+| vm_find2max_loop | PASS | 11/11 | [vm_find2max_loop_report.md](vm_find2max_loop_report.md) |
+| vm_fmix64_loop | PASS | 10/10 | [vm_fmix64_loop_report.md](vm_fmix64_loop_report.md) |
+| vm_fmix_chain64_loop | PASS | 10/10 | [vm_fmix_chain64_loop_report.md](vm_fmix_chain64_loop_report.md) |
+| vm_fnv1a64_loop | PASS | 10/10 | [vm_fnv1a64_loop_report.md](vm_fnv1a64_loop_report.md) |
+| vm_four_input_loop | PASS | 10/10 | [vm_four_input_loop_report.md](vm_four_input_loop_report.md) |
+| vm_gcd64_loop | PASS | 10/10 | [vm_gcd64_loop_report.md](vm_gcd64_loop_report.md) |
+| vm_gcd_loop | PASS | 8/8 | [vm_gcd_loop_report.md](vm_gcd_loop_report.md) |
+| vm_geometric_loop | PASS | 10/10 | [vm_geometric_loop_report.md](vm_geometric_loop_report.md) |
+| vm_geosum64_loop | PASS | 10/10 | [vm_geosum64_loop_report.md](vm_geosum64_loop_report.md) |
+| vm_hamming_loop | PASS | 10/10 | [vm_hamming_loop_report.md](vm_hamming_loop_report.md) |
+| vm_hexcount_loop | PASS | 12/12 | [vm_hexcount_loop_report.md](vm_hexcount_loop_report.md) |
+| vm_hexdigits64_loop | PASS | 10/10 | [vm_hexdigits64_loop_report.md](vm_hexdigits64_loop_report.md) |
+| vm_horner64_loop | PASS | 10/10 | [vm_horner64_loop_report.md](vm_horner64_loop_report.md) |
+| vm_horner_signed_loop | PASS | 10/10 | [vm_horner_signed_loop_report.md](vm_horner_signed_loop_report.md) |
+| vm_i64_return_loop | PASS | 10/10 | [vm_i64_return_loop_report.md](vm_i64_return_loop_report.md) |
+| vm_imported_abs_loop | PASS | 10/10 | [vm_imported_abs_loop_report.md](vm_imported_abs_loop_report.md) |
+| vm_imported_bsf_loop | PASS | 12/12 | [vm_imported_bsf_loop_report.md](vm_imported_bsf_loop_report.md) |
+| vm_imported_bsr_loop | PASS | 12/12 | [vm_imported_bsr_loop_report.md](vm_imported_bsr_loop_report.md) |
+| vm_imported_bswap_loop | PASS | 11/11 | [vm_imported_bswap_loop_report.md](vm_imported_bswap_loop_report.md) |
+| vm_imported_clz_loop | PASS | 10/10 | [vm_imported_clz_loop_report.md](vm_imported_clz_loop_report.md) |
+| vm_imported_cttz_loop | PASS | 11/11 | [vm_imported_cttz_loop_report.md](vm_imported_cttz_loop_report.md) |
+| vm_imported_popcnt_loop | PASS | 10/10 | [vm_imported_popcnt_loop_report.md](vm_imported_popcnt_loop_report.md) |
+| vm_imported_rotl_loop | PASS | 10/10 | [vm_imported_rotl_loop_report.md](vm_imported_rotl_loop_report.md) |
+| vm_int64_loop | PASS | 10/10 | [vm_int64_loop_report.md](vm_int64_loop_report.md) |
+| vm_ipow64_loop | PASS | 10/10 | [vm_ipow64_loop_report.md](vm_ipow64_loop_report.md) |
+| vm_isqrt64_loop | PASS | 10/10 | [vm_isqrt64_loop_report.md](vm_isqrt64_loop_report.md) |
+| vm_isqrt_loop | PASS | 15/15 | [vm_isqrt_loop_report.md](vm_isqrt_loop_report.md) |
+| vm_kernighan_loop | PASS | 12/12 | [vm_kernighan_loop_report.md](vm_kernighan_loop_report.md) |
+| vm_lcg_ansi_chain64_loop | PASS | 10/10 | [vm_lcg_ansi_chain64_loop_report.md](vm_lcg_ansi_chain64_loop_report.md) |
+| vm_lcg_loop | PASS | 10/10 | [vm_lcg_loop_report.md](vm_lcg_loop_report.md) |
+| vm_lfsr64_loop | PASS | 10/10 | [vm_lfsr64_loop_report.md](vm_lfsr64_loop_report.md) |
+| vm_lfsr_loop | PASS | 10/10 | [vm_lfsr_loop_report.md](vm_lfsr_loop_report.md) |
+| vm_maxrun64_loop | PASS | 10/10 | [vm_maxrun64_loop_report.md](vm_maxrun64_loop_report.md) |
+| vm_minabs_loop | PASS | 11/11 | [vm_minabs_loop_report.md](vm_minabs_loop_report.md) |
+| vm_minarray_loop | PASS | 12/12 | [vm_minarray_loop_report.md](vm_minarray_loop_report.md) |
+| vm_mixed_args_loop | PASS | 10/10 | [vm_mixed_args_loop_report.md](vm_mixed_args_loop_report.md) |
+| vm_mixed_intrinsics_loop | PASS | 11/11 | [vm_mixed_intrinsics_loop_report.md](vm_mixed_intrinsics_loop_report.md) |
+| vm_mixed_width_array_loop | PASS | 12/12 | [vm_mixed_width_array_loop_report.md](vm_mixed_width_array_loop_report.md) |
+| vm_modcounter_loop | PASS | 11/11 | [vm_modcounter_loop_report.md](vm_modcounter_loop_report.md) |
+| vm_morton64_loop | PASS | 10/10 | [vm_morton64_loop_report.md](vm_morton64_loop_report.md) |
+| vm_mul3byte_chain64_loop | PASS | 10/10 | [vm_mul3byte_chain64_loop_report.md](vm_mul3byte_chain64_loop_report.md) |
+| vm_murmurstep64_loop | PASS | 10/10 | [vm_murmurstep64_loop_report.md](vm_murmurstep64_loop_report.md) |
+| vm_negstep64_loop | PASS | 10/10 | [vm_negstep64_loop_report.md](vm_negstep64_loop_report.md) |
+| vm_nested64_loop | PASS | 10/10 | [vm_nested64_loop_report.md](vm_nested64_loop_report.md) |
+| vm_nested_abs_loop | PASS | 11/11 | [vm_nested_abs_loop_report.md](vm_nested_abs_loop_report.md) |
+| vm_nested_loop | PASS | 10/10 | [vm_nested_loop_report.md](vm_nested_loop_report.md) |
+| vm_nibrev64_loop | PASS | 10/10 | [vm_nibrev64_loop_report.md](vm_nibrev64_loop_report.md) |
+| vm_nibrev_window64_loop | PASS | 10/10 | [vm_nibrev_window64_loop_report.md](vm_nibrev_window64_loop_report.md) |
+| vm_notand_chain64_loop | PASS | 10/10 | [vm_notand_chain64_loop_report.md](vm_notand_chain64_loop_report.md) |
+| vm_oddcount64_loop | PASS | 10/10 | [vm_oddcount64_loop_report.md](vm_oddcount64_loop_report.md) |
+| vm_op8way64_loop | PASS | 10/10 | [vm_op8way64_loop_report.md](vm_op8way64_loop_report.md) |
+| vm_opcode64_loop | PASS | 10/10 | [vm_opcode64_loop_report.md](vm_opcode64_loop_report.md) |
+| vm_orsum_byte_idx64_loop | PASS | 10/10 | [vm_orsum_byte_idx64_loop_report.md](vm_orsum_byte_idx64_loop_report.md) |
+| vm_orxor_pair64_loop | PASS | 10/10 | [vm_orxor_pair64_loop_report.md](vm_orxor_pair64_loop_report.md) |
+| vm_outlined_wrapper_loop | **NA** | 0/0 | [vm_outlined_wrapper_loop_report.md](vm_outlined_wrapper_loop_report.md) |
+| vm_pair_xormul_byte64_loop | PASS | 10/10 | [vm_pair_xormul_byte64_loop_report.md](vm_pair_xormul_byte64_loop_report.md) |
+| vm_pairmix64_loop | PASS | 10/10 | [vm_pairmix64_loop_report.md](vm_pairmix64_loop_report.md) |
+| vm_palindrome_loop | PASS | 14/14 | [vm_palindrome_loop_report.md](vm_palindrome_loop_report.md) |
+| vm_pcg64_loop | PASS | 10/10 | [vm_pcg64_loop_report.md](vm_pcg64_loop_report.md) |
+| vm_pcg_loop | PASS | 12/12 | [vm_pcg_loop_report.md](vm_pcg_loop_report.md) |
+| vm_pdepslow64_loop | PASS | 10/10 | [vm_pdepslow64_loop_report.md](vm_pdepslow64_loop_report.md) |
+| vm_peasant64_loop | PASS | 10/10 | [vm_peasant64_loop_report.md](vm_peasant64_loop_report.md) |
+| vm_pextslow64_loop | PASS | 9/9 | [vm_pextslow64_loop_report.md](vm_pextslow64_loop_report.md) |
+| vm_piecewise_loop | PASS | 11/11 | [vm_piecewise_loop_report.md](vm_piecewise_loop_report.md) |
+| vm_polynomial_loop | PASS | 10/10 | [vm_polynomial_loop_report.md](vm_polynomial_loop_report.md) |
+| vm_popcount64_loop | PASS | 10/10 | [vm_popcount64_loop_report.md](vm_popcount64_loop_report.md) |
+| vm_popcount_loop | PASS | 10/10 | [vm_popcount_loop_report.md](vm_popcount_loop_report.md) |
+| vm_popsq64_loop | PASS | 10/10 | [vm_popsq64_loop_report.md](vm_popsq64_loop_report.md) |
+| vm_power_loop | PASS | 10/10 | [vm_power_loop_report.md](vm_power_loop_report.md) |
+| vm_powermod_loop | PASS | 11/11 | [vm_powermod_loop_report.md](vm_powermod_loop_report.md) |
+| vm_powmod64_loop | PASS | 10/10 | [vm_powmod64_loop_report.md](vm_powmod64_loop_report.md) |
+| vm_prefix_sum_loop | PASS | 11/11 | [vm_prefix_sum_loop_report.md](vm_prefix_sum_loop_report.md) |
+| vm_prefix_xor_loop | PASS | 11/11 | [vm_prefix_xor_loop_report.md](vm_prefix_xor_loop_report.md) |
+| vm_prefixxor64_loop | PASS | 10/10 | [vm_prefixxor64_loop_report.md](vm_prefixxor64_loop_report.md) |
+| vm_quad_byte_xor64_loop | PASS | 10/10 | [vm_quad_byte_xor64_loop_report.md](vm_quad_byte_xor64_loop_report.md) |
+| vm_register_loop | PASS | 10/10 | [vm_register_loop_report.md](vm_register_loop_report.md) |
+| vm_revdecimal64_loop | PASS | 10/10 | [vm_revdecimal64_loop_report.md](vm_revdecimal64_loop_report.md) |
+| vm_reverse_array_loop | PASS | 10/10 | [vm_reverse_array_loop_report.md](vm_reverse_array_loop_report.md) |
+| vm_rotate_loop | PASS | 10/10 | [vm_rotate_loop_report.md](vm_rotate_loop_report.md) |
+| vm_rotchoice64_loop | PASS | 10/10 | [vm_rotchoice64_loop_report.md](vm_rotchoice64_loop_report.md) |
+| vm_rotl64_loop | PASS | 10/10 | [vm_rotl64_loop_report.md](vm_rotl64_loop_report.md) |
+| vm_runlength_loop | PASS | 13/13 | [vm_runlength_loop_report.md](vm_runlength_loop_report.md) |
+| vm_runlmax_loop | PASS | 12/12 | [vm_runlmax_loop_report.md](vm_runlmax_loop_report.md) |
+| vm_satadd64_loop | PASS | 10/10 | [vm_satadd64_loop_report.md](vm_satadd64_loop_report.md) |
+| vm_saturating_loop | PASS | 10/10 | [vm_saturating_loop_report.md](vm_saturating_loop_report.md) |
+| vm_sbyte_array_loop | PASS | 10/10 | [vm_sbyte_array_loop_report.md](vm_sbyte_array_loop_report.md) |
+| vm_sdiv64_loop | PASS | 10/10 | [vm_sdiv64_loop_report.md](vm_sdiv64_loop_report.md) |
+| vm_search_loop | PASS | 10/10 | [vm_search_loop_report.md](vm_search_loop_report.md) |
+| vm_shift64_loop | PASS | 10/10 | [vm_shift64_loop_report.md](vm_shift64_loop_report.md) |
+| vm_shiftin_top64_loop | PASS | 10/10 | [vm_shiftin_top64_loop_report.md](vm_shiftin_top64_loop_report.md) |
+| vm_shiftmul_loop | PASS | 11/11 | [vm_shiftmul_loop_report.md](vm_shiftmul_loop_report.md) |
+| vm_short_array_loop | PASS | 10/10 | [vm_short_array_loop_report.md](vm_short_array_loop_report.md) |
+| vm_short_loop | PASS | 10/10 | [vm_short_loop_report.md](vm_short_loop_report.md) |
+| vm_signed_byterange64_loop | PASS | 10/10 | [vm_signed_byterange64_loop_report.md](vm_signed_byterange64_loop_report.md) |
+| vm_signed_dword_range64_loop | PASS | 10/10 | [vm_signed_dword_range64_loop_report.md](vm_signed_dword_range64_loop_report.md) |
+| vm_signed_dword_sum64_loop | PASS | 10/10 | [vm_signed_dword_sum64_loop_report.md](vm_signed_dword_sum64_loop_report.md) |
+| vm_signed_word_range64_loop | PASS | 10/10 | [vm_signed_word_range64_loop_report.md](vm_signed_word_range64_loop_report.md) |
+| vm_signed_word_sum64_loop | PASS | 10/10 | [vm_signed_word_sum64_loop_report.md](vm_signed_word_sum64_loop_report.md) |
+| vm_signedaccum64_loop | PASS | 10/10 | [vm_signedaccum64_loop_report.md](vm_signedaccum64_loop_report.md) |
+| vm_signedbytesum64_loop | PASS | 10/10 | [vm_signedbytesum64_loop_report.md](vm_signedbytesum64_loop_report.md) |
+| vm_signedxor_byte_idx64_loop | PASS | 10/10 | [vm_signedxor_byte_idx64_loop_report.md](vm_signedxor_byte_idx64_loop_report.md) |
+| vm_skiploop_loop | PASS | 11/11 | [vm_skiploop_loop_report.md](vm_skiploop_loop_report.md) |
+| vm_smax64_loop | PASS | 10/10 | [vm_smax64_loop_report.md](vm_smax64_loop_report.md) |
+| vm_splitmix64_loop | PASS | 10/10 | [vm_splitmix64_loop_report.md](vm_splitmix64_loop_report.md) |
+| vm_squareadd64_loop | PASS | 10/10 | [vm_squareadd64_loop_report.md](vm_squareadd64_loop_report.md) |
+| vm_stride_loop | PASS | 12/12 | [vm_stride_loop_report.md](vm_stride_loop_report.md) |
+| vm_subbyte_idx64_loop | PASS | 10/10 | [vm_subbyte_idx64_loop_report.md](vm_subbyte_idx64_loop_report.md) |
+| vm_subxor_chain64_loop | PASS | 10/10 | [vm_subxor_chain64_loop_report.md](vm_subxor_chain64_loop_report.md) |
+| vm_three_input_loop | PASS | 10/10 | [vm_three_input_loop_report.md](vm_three_input_loop_report.md) |
+| vm_threereg64_loop | PASS | 10/10 | [vm_threereg64_loop_report.md](vm_threereg64_loop_report.md) |
+| vm_threestate_xormul64_loop | PASS | 10/10 | [vm_threestate_xormul64_loop_report.md](vm_threestate_xormul64_loop_report.md) |
+| vm_trailingones64_loop | PASS | 10/10 | [vm_trailingones64_loop_report.md](vm_trailingones64_loop_report.md) |
+| vm_trailzeros_factorial64_loop | PASS | 10/10 | [vm_trailzeros_factorial64_loop_report.md](vm_trailzeros_factorial64_loop_report.md) |
+| vm_treepath64_loop | PASS | 10/10 | [vm_treepath64_loop_report.md](vm_treepath64_loop_report.md) |
+| vm_tribonacci64_loop | PASS | 10/10 | [vm_tribonacci64_loop_report.md](vm_tribonacci64_loop_report.md) |
+| vm_two_input_loop | PASS | 10/10 | [vm_two_input_loop_report.md](vm_two_input_loop_report.md) |
+| vm_u64_array_loop | PASS | 8/8 | [vm_u64_array_loop_report.md](vm_u64_array_loop_report.md) |
+| vm_uintadd_byte_idx64_loop | PASS | 10/10 | [vm_uintadd_byte_idx64_loop_report.md](vm_uintadd_byte_idx64_loop_report.md) |
+| vm_umin64_loop | PASS | 10/10 | [vm_umin64_loop_report.md](vm_umin64_loop_report.md) |
+| vm_ushort_array_loop | PASS | 10/10 | [vm_ushort_array_loop_report.md](vm_ushort_array_loop_report.md) |
+| vm_vartrip_array_loop | PASS | 10/10 | [vm_vartrip_array_loop_report.md](vm_vartrip_array_loop_report.md) |
+| vm_window_loop | PASS | 11/11 | [vm_window_loop_report.md](vm_window_loop_report.md) |
+| vm_word_horner13_64_loop | PASS | 10/10 | [vm_word_horner13_64_loop_report.md](vm_word_horner13_64_loop_report.md) |
+| vm_word_orfold64_loop | PASS | 10/10 | [vm_word_orfold64_loop_report.md](vm_word_orfold64_loop_report.md) |
+| vm_word_range64_loop | PASS | 10/10 | [vm_word_range64_loop_report.md](vm_word_range64_loop_report.md) |
+| vm_word_xormul64_loop | PASS | 10/10 | [vm_word_xormul64_loop_report.md](vm_word_xormul64_loop_report.md) |
+| vm_wrapper_chain_loop | **NA** | 0/0 | [vm_wrapper_chain_loop_report.md](vm_wrapper_chain_loop_report.md) |
+| vm_xor_accumulator_loop | PASS | 8/8 | [vm_xor_accumulator_loop_report.md](vm_xor_accumulator_loop_report.md) |
+| vm_xor_shifted_self_byte64_loop | PASS | 10/10 | [vm_xor_shifted_self_byte64_loop_report.md](vm_xor_shifted_self_byte64_loop_report.md) |
+| vm_xorbytes64_loop | PASS | 10/10 | [vm_xorbytes64_loop_report.md](vm_xorbytes64_loop_report.md) |
+| vm_xordecrypt_loop | PASS | 10/10 | [vm_xordecrypt_loop_report.md](vm_xordecrypt_loop_report.md) |
+| vm_xormul_byte_idx64_loop | PASS | 10/10 | [vm_xormul_byte_idx64_loop_report.md](vm_xormul_byte_idx64_loop_report.md) |
+| vm_xormuladd_chain64_loop | PASS | 10/10 | [vm_xormuladd_chain64_loop_report.md](vm_xormuladd_chain64_loop_report.md) |
+| vm_xormulself_byte64_loop | PASS | 10/10 | [vm_xormulself_byte64_loop_report.md](vm_xormulself_byte64_loop_report.md) |
+| vm_xorrot64_loop | PASS | 10/10 | [vm_xorrot64_loop_report.md](vm_xorrot64_loop_report.md) |
+| vm_xorshift64_loop | PASS | 10/10 | [vm_xorshift64_loop_report.md](vm_xorshift64_loop_report.md) |
+| vm_xorshrink64_loop | PASS | 10/10 | [vm_xorshrink64_loop_report.md](vm_xorshrink64_loop_report.md) |
+| vm_xs64star_loop | PASS | 10/10 | [vm_xs64star_loop_report.md](vm_xs64star_loop_report.md) |
+| vm_xxhmix64_loop | PASS | 10/10 | [vm_xxhmix64_loop_report.md](vm_xxhmix64_loop_report.md) |
+| vm_zigzag_loop | PASS | 11/11 | [vm_zigzag_loop_report.md](vm_zigzag_loop_report.md) |
+| vm_zigzag_step64_loop | PASS | 10/10 | [vm_zigzag_step64_loop_report.md](vm_zigzag_step64_loop_report.md) |
@@ -0,0 +1,53 @@
+# bitchain - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 1/1 equivalent
+- **Source:** `testcases/rewrite_smoke/bitchain.asm`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/bitchain.ll`
+- **Symbol:** `bitchain_target`
+- **Native driver:** `rewrite-regression-work/eq/bitchain_eq.exe`
+- **Lifted signature:** `define noundef i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `bitchain_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | _(none)_ | 4090 | 4090 | 4090 | yes | constant: 0x0FFA |
+
+## Source
+
+```nasm
+default rel
+bits 64
+
+global start
+global bitchain_target
+extern ExitProcess
+
+section .text
+; Pure-constant bit manipulation chain. No symbolic inputs.
+; eax = 0xFF
+; shl eax, 8   → 0x0000FF00
+; xor eax, 0xAA → 0x0000FFAA
+; ror eax, 4   → 0xA0000FFA
+; and eax, 0xFFFF → 0x0FFA = 4090
+; LLVM must fold the entire chain to ret i64 4090.
+bitchain_target:
+    push rbp
+    mov rbp, rsp
+    mov eax, 0xFF
+    shl eax, 8
+    xor eax, 0xAA
+    ror eax, 4
+    and eax, 0xFFFF
+    pop rbp
+    ret
+
+start:
+    sub rsp, 40
+    call bitchain_target
+    mov ecx, eax
+    call ExitProcess
+```
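The constant chain described in the source comments above can be reproduced by hand in C. The following is a minimal sketch, not part of the test suite: `ror32` and `bitchain_model` are hypothetical names, and the 32-bit register `eax` is modeled with `uint32_t`.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (0 < n < 32),
 * mirroring the effect of x86 `ror r32, imm8` on the value. */
uint32_t ror32(uint32_t v, unsigned n) {
    assert(n > 0 && n < 32);
    return (v >> n) | (v << (32u - n));
}

/* Hypothetical C model of bitchain_target: the same five steps,
 * applied to a 32-bit value standing in for eax. */
uint32_t bitchain_model(void) {
    uint32_t eax = 0xFFu;   /* mov eax, 0xFF                  */
    eax <<= 8;              /* shl eax, 8      -> 0x0000FF00  */
    eax ^= 0xAAu;           /* xor eax, 0xAA   -> 0x0000FFAA  */
    eax = ror32(eax, 4);    /* ror eax, 4      -> 0xA0000FFA  */
    eax &= 0xFFFFu;         /* and eax, 0xFFFF -> 0x00000FFA  */
    return eax;
}
```

Calling `bitchain_model()` yields 4090, the value in the single table row. The rotate is the only step that moves bits into the upper half of the register, and the final mask discards them again.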
@@ -0,0 +1,55 @@
+# branch - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 5/5 equivalent
+- **Source:** `testcases/rewrite_smoke/branch.asm`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/branch.ll`
+- **Symbol:** `branch_target`
+- **Native driver:** `rewrite-regression-work/eq/branch_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `branch_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 87 | 87 | 87 | yes | le path: (0+100)^0x33=87 |
+| 2 | RCX=3 | 84 | 84 | 84 | yes | le path: (3+100)^0x33=84 |
+| 3 | RCX=5 | 90 | 90 | 90 | yes | le boundary: (5+100)^0x33=90 |
+| 4 | RCX=6 | 33 | 33 | 33 | yes | gt path: (6*3)^0x33=33 |
+| 5 | RCX=10 | 45 | 45 | 45 | yes | gt path: (10*3)^0x33=45 |
+
+## Source
+
+```nasm
+default rel
+bits 64
+
+global start
+global branch_target
+extern ExitProcess
+
+section .text
+branch_target:
+    push rbp
+    mov rbp, rsp
+    mov eax, ecx
+    cmp eax, 5
+    jg .gt
+    add eax, 100
+    jmp .done
+.gt:
+    imul eax, eax, 3
+.done:
+    xor eax, 0x33
+    pop rbp
+    ret
+
+start:
+    sub rsp, 40
+    mov ecx, 10
+    call branch_target
+    mov ecx, eax
+    call ExitProcess
+```
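The arithmetic in the Label column can be checked against a small C model of the assembly. This is an illustrative sketch only: `branch_model` is a hypothetical name, and unsigned arithmetic is used for the add and multiply so that wraparound matches 32-bit register behavior without relying on signed overflow.

```c
#include <stdint.h>

/* Hypothetical C model of branch_target.
 * The comparison is signed (jg); the arithmetic wraps modulo 2^32. */
uint32_t branch_model(int32_t x) {
    uint32_t eax = (uint32_t)x;   /* mov eax, ecx           */
    if (x > 5) {                  /* cmp eax, 5 / jg .gt    */
        eax *= 3u;                /* imul eax, eax, 3       */
    } else {
        eax += 100u;              /* add eax, 100           */
    }
    return eax ^ 0x33u;           /* xor eax, 0x33 (.done)  */
}
```

For the inputs in the table, `branch_model(5)` takes the `add` path (the boundary case) and `branch_model(6)` is the first input to take the `imul` path.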
@@ -0,0 +1,86 @@
+# bytecode_vm_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 6/6 equivalent
+- **Source:** `testcases/rewrite_smoke/bytecode_vm_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/bytecode_vm_loop.ll`
+- **Symbol:** `bytecode_vm_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/bytecode_vm_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `bytecode_vm_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 40 | 40 | 40 | yes | even program returns constant handler |
+| 2 | RCX=1 | 0 | 0 | 0 | yes | odd bytecode loop limit 1 returns 0 |
+| 3 | RCX=3 | 3 | 3 | 3 | yes | odd bytecode loop: 0+1+2 |
+| 4 | RCX=5 | 10 | 10 | 10 | yes | odd bytecode loop: 0+1+2+3+4 |
+| 5 | RCX=7 | 21 | 21 | 21 | yes | odd bytecode loop: 0..6 |
+| 6 | RCX=8 | 40 | 40 | 40 | yes | even program ignores odd loop body |
+
+## Source
+
+```c
+/* Compiler-friendly VM with the loop implemented in VM program-counter state.
+ * Lift target: bytecode_vm_loop_target.
+ * Goal: keep the loop inside interpreter state instead of native source control
+ * flow, while avoiding external bytecode loads and compiler jump tables.
+ */
+#include <stdio.h>
+
+enum FriendlyVmPc {
+    VM_EVEN_CONST = 0,
+    VM_EVEN_HALT = 1,
+    VM_ODD_LOAD_LIMIT = 10,
+    VM_ODD_CLEAR_ACC = 11,
+    VM_ODD_CLEAR_INDEX = 12,
+    VM_ODD_CHECK = 13,
+    VM_ODD_BODY = 14,
+    VM_ODD_HALT = 15,
+};
+
+__declspec(noinline)
+int bytecode_vm_loop_target(int x) {
+    int pc = (x & 1) ? VM_ODD_LOAD_LIMIT : VM_EVEN_CONST;
+    int acc = 0;
+    int index = 0;
+    int limit = 0;
+
+    while (1) {
+        if (pc == VM_EVEN_CONST) {
+            acc = 40;
+            pc = VM_EVEN_HALT;
+        } else if (pc == VM_EVEN_HALT) {
+            return acc;
+        } else if (pc == VM_ODD_LOAD_LIMIT) {
+            limit = x & 7;
+            pc = VM_ODD_CLEAR_ACC;
+        } else if (pc == VM_ODD_CLEAR_ACC) {
+            acc = 0;
+            pc = VM_ODD_CLEAR_INDEX;
+        } else if (pc == VM_ODD_CLEAR_INDEX) {
+            index = 0;
+            pc = VM_ODD_CHECK;
+        } else if (pc == VM_ODD_CHECK) {
+            pc = (index < limit) ? VM_ODD_BODY : VM_ODD_HALT;
+        } else if (pc == VM_ODD_BODY) {
+            acc += index;
+            index += 1;
+            pc = VM_ODD_CHECK;
+        } else if (pc == VM_ODD_HALT) {
+            return acc;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("bytecode_vm_loop(5)=%d bytecode_vm_loop(8)=%d\n",
+           bytecode_vm_loop_target(5), bytecode_vm_loop_target(8));
+    return 0;
+}
+```
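On the odd path the VM sums `0 .. limit-1` with `limit = x & 7`, so the table values follow the triangular-number formula `limit * (limit - 1) / 2`. The sketch below states that closed form in C. It is an assumption-labeled reference model (`bytecode_vm_loop_closed_form` is a hypothetical name) and is not taken from the lifted IR for this sample.

```c
/* Hypothetical closed-form model of bytecode_vm_loop_target.
 * Even inputs run VM_EVEN_CONST -> VM_EVEN_HALT and return 40.
 * Odd inputs accumulate 0 + 1 + ... + (limit - 1), limit = x & 7. */
int bytecode_vm_loop_closed_form(int x) {
    if ((x & 1) == 0) {
        return 40;
    }
    int limit = x & 7;               /* VM_ODD_LOAD_LIMIT           */
    return limit * (limit - 1) / 2;  /* sum of 0 .. limit - 1       */
}
```

Because `limit` is at most 7, the product never exceeds 42, so overflow is not a concern in this model.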
@@ -0,0 +1,41 @@
+# calc_cout - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 4/4 equivalent
+- **Source:** `testcases/rewrite_smoke/calc_cout.cpp`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/calc_cout.ll`
+- **Symbol:** `?calc_cout@@YAHH@Z`
+- **Native driver:** `rewrite-regression-work/eq/calc_cout_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `?calc_cout@@YAHH@Z` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=10 | 37 | 37 | 37 | yes | 10*3+7 |
+| 2 | RCX=0 | 7 | 7 | 7 | yes | 0*3+7 |
+| 3 | RCX=100 | 307 | 307 | 307 | yes | 100*3+7 |
+| 4 | RCX=1 | 10 | 10 | 10 | yes | 1*3+7 |
+
+## Source
+
+```cpp
+/* Test: function with cout call.
+ * Lift target: calc_cout — external call handling.
+ * The computation is pure, but the function writes the result to std::cout before returning. */
+#include <iostream>
+
+__declspec(noinline)
+int calc_cout(int x) {
+    int result = x * 3 + 7;
+    std::cout << result;
+    return result;
+}
+
+int main() {
+    int r = calc_cout(10);
+    return r;
+}
+```
@@ -0,0 +1,43 @@
+# calc_fib - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 1/1 equivalent
+- **Source:** `testcases/rewrite_smoke/calc_fib.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/calc_fib.ll`
+- **Symbol:** `calc_fib`
+- **Native driver:** `rewrite-regression-work/eq/calc_fib_eq.exe`
+- **Lifted signature:** `define noundef i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `calc_fib` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | _(none)_ | 13 | 13 | 13 | yes | constant: fib(7) |
+
+## Source
+
+```c
+/* Iterative Fibonacci with constant bound.
+ * Lift target: calc_fib — concrete loop (7 iterations), stack variables.
+ * fib(7) = 13.  Concolic engine should unroll; LLVM folds to constant.
+ * This is the first test of real compiler-generated /Od loop code. */
+#include <stdio.h>
+
+__declspec(noinline)
+int calc_fib(void) {
+    int a = 0, b = 1;
+    for (int i = 0; i < 7; i++) {
+        int t = a + b;
+        a = b;
+        b = t;
+    }
+    return a;
+}
+
+int main(void) {
+    printf("fib(7)=%d\n", calc_fib());
+    return 0;
+}
+```
@@ -0,0 +1,51 @@
+# calc_grade - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 11/11 equivalent
+- **Source:** `testcases/rewrite_smoke/calc_grade.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/calc_grade.ll`
+- **Symbol:** `calc_grade`
+- **Native driver:** `rewrite-regression-work/eq/calc_grade_eq.exe`
+- **Lifted signature:** `define noundef i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `calc_grade` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=95 | 4 | 4 | 4 | yes | >=90 |
+| 2 | RCX=90 | 4 | 4 | 4 | yes | ==90 boundary |
+| 3 | RCX=89 | 3 | 3 | 3 | yes | 80..89 |
+| 4 | RCX=80 | 3 | 3 | 3 | yes | ==80 boundary |
+| 5 | RCX=79 | 2 | 2 | 2 | yes | 70..79 |
+| 6 | RCX=70 | 2 | 2 | 2 | yes | ==70 boundary |
+| 7 | RCX=69 | 1 | 1 | 1 | yes | 60..69 |
+| 8 | RCX=60 | 1 | 1 | 1 | yes | ==60 boundary |
+| 9 | RCX=59 | 0 | 0 | 0 | yes | <60 |
+| 10 | RCX=0 | 0 | 0 | 0 | yes | <60 zero |
+| 11 | RCX=100 | 4 | 4 | 4 | yes | >=90 well above |
+
+## Source
+
+```c
+/* Grade calculator: cascading if/else on symbolic input (ECX).
+ * Lift target: calc_grade — no loops, pure branching.
+ * Expected IR: chain of icmp + select on the symbolic argument. */
+#include <stdio.h>
+
+__declspec(noinline)
+int calc_grade(int score) {
+    if (score >= 90) return 4;   /* A */
+    if (score >= 80) return 3;   /* B */
+    if (score >= 70) return 2;   /* C */
+    if (score >= 60) return 1;   /* D */
+    return 0;                    /* F */
+}
+
+int main(void) {
+    printf("grade(95)=%d grade(82)=%d grade(55)=%d\n",
+           calc_grade(95), calc_grade(82), calc_grade(55));
+    return 0;
+}
+```
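The source comment expects a chain of `icmp` + `select` on the symbolic argument. One way to picture that shape in C is a sequence of conditional expressions with no early returns. This is a sketch under that assumption (`calc_grade_select` is a hypothetical name); it does not claim to match the instruction order the lifter actually emits.

```c
/* Hypothetical branch-free restatement of calc_grade.
 * Each line corresponds to one compare plus one select;
 * later (higher) thresholds override earlier ones. */
int calc_grade_select(int score) {
    int grade = 0;                        /* F */
    grade = (score >= 60) ? 1 : grade;    /* D */
    grade = (score >= 70) ? 2 : grade;    /* C */
    grade = (score >= 80) ? 3 : grade;    /* B */
    grade = (score >= 90) ? 4 : grade;    /* A */
    return grade;
}
```

The thresholds are applied in ascending order so that the last satisfied comparison wins, which gives the same result as the cascading early returns in `calc_grade`.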
@@ -0,0 +1,73 @@
+# calc_jumptable_large - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/calc_jumptable_large.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/calc_jumptable_large.ll`
+- **Symbol:** `calc_jumptable_large`
+- **Native driver:** `rewrite-regression-work/eq/calc_jumptable_large_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `calc_jumptable_large` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 7 | 7 | 7 | yes | case 0 |
+| 2 | RCX=1 | 42 | 42 | 42 | yes | case 1 |
+| 3 | RCX=5 | 31 | 31 | 31 | yes | case 5 |
+| 4 | RCX=7 | 3 | 3 | 3 | yes | case 7 |
+| 5 | RCX=10 | 404 | 404 | 404 | yes | case 10 |
+| 6 | RCX=14 | 65535 | 65535 | 65535 | yes | case 14 |
+| 7 | RCX=15 | 21 | 21 | 21 | yes | case 15 |
+| 8 | RCX=-1 | 4294967295 | 4294967295 | 4294967295 | yes | default (negative) |
+| 9 | RCX=16 | 4294967295 | 4294967295 | 4294967295 | yes | default (above range) |
+| 10 | RCX=100 | 4294967295 | 4294967295 | 4294967295 | yes | default far |
+
+## Source
+
+```c
+/* Large jump table test: 16 dense cases compiled with /O2.
+ * Tests that the lifter handles tables larger than the existing 4/8/10
+ * entry tests and produces correct dispatch for all 16 values.
+ *
+ * Return values are deliberately irregular (no arithmetic pattern) so
+ * the compiler cannot fold the switch into a formula.
+ *
+ * Lift target: calc_jumptable_large
+ * NOTE: Filename contains "_jumptable" so build_samples.cmd compiles
+ * with /O2 (required for real jump table generation). */
+
+#include <stdio.h>
+
+__declspec(noinline)
+int calc_jumptable_large(int op) {
+    switch (op) {
+    case 0:  return 7;
+    case 1:  return 42;
+    case 2:  return 13;
+    case 3:  return 99;
+    case 4:  return 256;
+    case 5:  return 31;
+    case 6:  return 1024;
+    case 7:  return 3;
+    case 8:  return 777;
+    case 9:  return 55;
+    case 10: return 404;
+    case 11: return 1337;
+    case 12: return 88;
+    case 13: return 500;
+    case 14: return 65535;
+    case 15: return 21;
+    default: return -1;
+    }
+}
+
+int main(void) {
+    printf("jt(0)=%d jt(7)=%d jt(15)=%d jt(99)=%d\n",
+           calc_jumptable_large(0), calc_jumptable_large(7),
+           calc_jumptable_large(15), calc_jumptable_large(99));
+    return 0;
+}
+```
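Note on the `calc_jumptable_large` report: the acceptance rule and the `4294967295` default rows can be restated as a minimal C sketch. Both helpers below are hypothetical and are not part of the harness. The sketch assumes the harness reads the 32-bit `int` result back as a zero-extended 64-bit RAX, which is consistent with `return -1` appearing as 4294967295 in the table.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: a 32-bit return value as it would appear when read
 * back from a zero-extended 64-bit RAX (assumed harness behaviour). */
uint64_t observe_rax(int32_t ret) {
    return (uint64_t)(uint32_t)ret;
}

/* Hypothetical helper: the three-way agreement rule stated in the report.
 * Native and lifted must agree with each other and with the manifest. */
bool case_equivalent(uint64_t manifest, uint64_t native, uint64_t lifted) {
    return native == lifted && native == manifest;
}
```

Under that assumption, `observe_rax(-1)` yields 4294967295, and a row where native and lifted agree with each other but not with the manifest is still rejected.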
@@ -0,0 +1,65 @@
+# calc_jumptable - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 12/12 equivalent
+- **Source:** `testcases/rewrite_smoke/calc_jumptable.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/calc_jumptable.ll`
+- **Symbol:** `calc_jumptable`
+- **Native driver:** `rewrite-regression-work/eq/calc_jumptable_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `calc_jumptable` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=-1 | 4294967295 | 4294967295 | 4294967295 | yes | default (negative) |
+| 2 | RCX=0 | 1 | 1 | 1 | yes | 2^0 |
+| 3 | RCX=1 | 2 | 2 | 2 | yes | 2^1 |
+| 4 | RCX=2 | 4 | 4 | 4 | yes | 2^2 |
+| 5 | RCX=3 | 8 | 8 | 8 | yes | 2^3 |
+| 6 | RCX=4 | 16 | 16 | 16 | yes | 2^4 |
+| 7 | RCX=5 | 32 | 32 | 32 | yes | 2^5 |
+| 8 | RCX=6 | 64 | 64 | 64 | yes | 2^6 |
+| 9 | RCX=7 | 128 | 128 | 128 | yes | 2^7 |
+| 10 | RCX=8 | 256 | 256 | 256 | yes | 2^8 |
+| 11 | RCX=9 | 512 | 512 | 512 | yes | 2^9 |
+| 12 | RCX=10 | 4294967295 | 4294967295 | 4294967295 | yes | default (above range) |
+
+## Source
+
+```c
+/* Jump table test: MSVC /O2 should emit a real jump table for 7+ dense cases.
+ * Lift target: calc_jumptable
+ * Expected IR: switch (or equivalent multi-target branch) on symbolic input.
+ *
+ * NOTE: Must be compiled with /O2 (not /Od) to generate jmp [table + reg*8].
+ * /Od generates compare chains which the lifter already handles. */
+
+#include <stdio.h>
+
+__declspec(noinline)
+int calc_jumptable(int op) {
+    switch (op) {
+    case 0: return 1;
+    case 1: return 2;
+    case 2: return 4;
+    case 3: return 8;
+    case 4: return 16;
+    case 5: return 32;
+    case 6: return 64;
+    case 7: return 128;
+    case 8: return 256;
+    case 9: return 512;
+    default: return -1;
+    }
+}
+
+int main(void) {
+    printf("jt(0)=%d jt(5)=%d jt(9)=%d jt(99)=%d\n",
+           calc_jumptable(0), calc_jumptable(5),
+           calc_jumptable(9), calc_jumptable(99));
+    return 0;
+}
+```
@@ -0,0 +1,51 @@
+# calc_mixed - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 7/7 equivalent
+- **Source:** `testcases/rewrite_smoke/calc_mixed.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/calc_mixed.ll`
+- **Symbol:** `calc_mixed`
+- **Native driver:** `rewrite-regression-work/eq/calc_mixed_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `calc_mixed` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=150 | 576 | 576 | 576 | yes | x>100: (42+150)*3=576 |
+| 2 | RCX=101 | 429 | 429 | 429 | yes | x>100: (42+101)*3=429 |
+| 3 | RCX=0 | 126 | 126 | 126 | yes | x<=100: (42-0)*3=126 |
+| 4 | RCX=1 | 123 | 123 | 123 | yes | x<=100: (42-1)*3=123 |
+| 5 | RCX=42 | 0 | 0 | 0 | yes | x<=100: (42-42)*3=0 |
+| 6 | RCX=50 | 4294967272 | 4294967272 | 4294967272 | yes | x<=100: (42-50)*3=-24, uint32 wrap, zext |
+| 7 | RCX=100 | 4294967122 | 4294967122 | 4294967122 | yes | x<=100: (42-100)*3=-174, uint32 wrap, zext |
+
+## Source
+
+```c
+/* Mixed symbolic + concrete: branch on input then multiply.
+ * Lift target: calc_mixed — symbolic arg, one branch, post-merge math.
+ * Expected IR: select on (x > 100), then mul by 3. */
+#include <stdio.h>
+#include <stdint.h>
+
+__declspec(noinline)
+int calc_mixed(int x) {
+    uint32_t base = 42u;
+    uint32_t ux = (uint32_t)x;
+    if (x > 100)
+        base += ux;
+    else
+        base -= ux;
+    uint32_t scaled = base * 3u;
+    return (int)(int32_t)scaled;
+}
+
+int main(void) {
+    printf("mixed(150)=%d mixed(50)=%d\n",
+           calc_mixed(150), calc_mixed(50));
+    return 0;
+}
+```
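Note on the `calc_mixed` report: the two wrap-around rows can be checked by hand with a portable restatement of the source. This is a sketch only. `calc_mixed_bits` and `observe_mixed` are hypothetical names, and the zero-extension step is an assumption about how the harness reads the result.

```c
#include <stdint.h>

/* Portable restatement of calc_mixed from the source above, returning the
 * raw 32-bit pattern so that no signed conversion is involved. */
uint32_t calc_mixed_bits(int32_t x) {
    uint32_t base = 42u;
    uint32_t ux = (uint32_t)x;
    if (x > 100)
        base += ux;          /* taken only for x > 100 (signed compare) */
    else
        base -= ux;          /* wraps modulo 2^32 when ux > 42 */
    return base * 3u;        /* also modulo 2^32 */
}

/* Assumed observation: the 32-bit result zero-extended to 64 bits. */
uint64_t observe_mixed(int32_t x) {
    return (uint64_t)calc_mixed_bits(x);
}
```

For x=50 the intermediate value is 42-50 = -8, which is 0xFFFFFFF8 in 32 bits. Multiplying by 3 and truncating gives 0xFFFFFFE8, which is 4294967272 after zero-extension, matching row 6.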
@@ -0,0 +1,41 @@
+# calc_sum_array - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 1/1 equivalent
+- **Source:** `testcases/rewrite_smoke/calc_sum_array.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/calc_sum_array.ll`
+- **Symbol:** `calc_sum_array`
+- **Native driver:** `rewrite-regression-work/eq/calc_sum_array_eq.exe`
+- **Lifted signature:** `define noundef i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `calc_sum_array` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | _(none)_ | 150 | 150 | 150 | yes | constant: 10+20+30+40+50 |
+
+## Source
+
+```c
+/* Sum a small constant stack-allocated array.
+ * Lift target: calc_sum_array — concrete loop + stack array access.
+ * 10 + 20 + 30 + 40 + 50 = 150.
+ * Tests compiler-generated array init + indexed load in a loop. */
+#include <stdio.h>
+
+__declspec(noinline)
+int calc_sum_array(void) {
+    int arr[] = {10, 20, 30, 40, 50};
+    int sum = 0;
+    for (int i = 0; i < 5; i++)
+        sum += arr[i];
+    return sum;
+}
+
+int main(void) {
+    printf("sum([10,20,30,40,50])=%d\n", calc_sum_array());
+    return 0;
+}
+```
@@ -0,0 +1,50 @@
+# calc_sum_to_n - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 6/6 equivalent
+- **Source:** `testcases/rewrite_smoke/calc_sum_to_n.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/calc_sum_to_n.ll`
+- **Symbol:** `calc_sum_to_n`
+- **Native driver:** `rewrite-regression-work/eq/calc_sum_to_n_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `calc_sum_to_n` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | n=0 |
+| 2 | RCX=1 | 0 | 0 | 0 | yes | n=1 |
+| 3 | RCX=5 | 10 | 10 | 10 | yes | 0+1+2+3+4 |
+| 4 | RCX=10 | 45 | 45 | 45 | yes | 0..9 |
+| 5 | RCX=32 | 496 | 496 | 496 | yes | 0..31 |
+| 6 | RCX=100 | 496 | 496 | 496 | yes | clamped to 32: 0..31 |
+
+## Source
+
+```c
+/* Symbolic trip-count counted loop.
+ * Lift target: calc_sum_to_n — symbolic loop bound with a clamp.
+ * Goal: preserve real loop structure (phi/backedge/compare), not constant-fold.
+ */
+#include <stdio.h>
+
+__declspec(noinline)
+int calc_sum_to_n(int n) {
+    if (n > 32)
+        n = 32;
+
+    int sum = 0;
+    for (int i = 0; i < n; i++)
+        sum += i;
+
+    return sum;
+}
+
+int main(void) {
+    printf("sum_to_n(5)=%d sum_to_n(10)=%d\n",
+           calc_sum_to_n(5), calc_sum_to_n(10));
+    return 0;
+}
+```
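Note on the `calc_sum_to_n` report: the expected values in the table can be cross-checked against the triangular-number closed form n*(n-1)/2 applied after the clamp. The sketch below is only a reference for the manifest values, not a claim about what the lifter emits for this sample. Both function names are hypothetical.

```c
#include <stdint.h>

/* Loop form, mirroring the source above. */
int32_t sum_to_n_loop(int32_t n) {
    if (n > 32)
        n = 32;
    int32_t sum = 0;
    for (int32_t i = 0; i < n; i++)
        sum += i;
    return sum;
}

/* Closed form: 0 + 1 + ... + (n-1) = n*(n-1)/2 for n >= 1.
 * A non-positive n means the loop body never runs, so the sum is 0. */
int32_t sum_to_n_closed(int32_t n) {
    if (n > 32)
        n = 32;
    if (n <= 0)
        return 0;
    return (int32_t)(((int64_t)n * (n - 1)) / 2);
}
```

With the clamp at 32 the largest possible result is 32*31/2 = 496, which is why the RCX=32 and RCX=100 rows share the same value.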
@@ -0,0 +1,51 @@
+# calc_switch - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 8/8 equivalent
+- **Source:** `testcases/rewrite_smoke/calc_switch.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/calc_switch.ll`
+- **Symbol:** `calc_switch`
+- **Native driver:** `rewrite-regression-work/eq/calc_switch_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `calc_switch` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=1 | 6 | 6 | 6 | yes | Monday |
+| 2 | RCX=2 | 7 | 7 | 7 | yes | Tuesday |
+| 3 | RCX=3 | 9 | 9 | 9 | yes | Wednesday |
+| 4 | RCX=4 | 8 | 8 | 8 | yes | Thursday |
+| 5 | RCX=5 | 6 | 6 | 6 | yes | Friday |
+| 6 | RCX=0 | 0 | 0 | 0 | yes | default (0) |
+| 7 | RCX=6 | 0 | 0 | 0 | yes | default (6) |
+| 8 | RCX=100 | 0 | 0 | 0 | yes | default (100) |
+
+## Source
+
+```c
+/* Day-of-week name length: switch with 5 cases + default.
+ * Lift target: calc_switch — multi-target branch resolution.
+ * Expected IR: switch on symbolic input, resolving all case targets. */
+#include <stdio.h>
+
+__declspec(noinline)
+int calc_switch(int day) {
+    switch (day) {
+    case 1: return 6;  /* Monday */
+    case 2: return 7;  /* Tuesday */
+    case 3: return 9;  /* Wednesday */
+    case 4: return 8;  /* Thursday */
+    case 5: return 6;  /* Friday */
+    default: return 0; /* invalid */
+    }
+}
+
+int main(void) {
+    printf("switch(1)=%d switch(3)=%d switch(5)=%d switch(9)=%d\n",
+           calc_switch(1), calc_switch(3), calc_switch(5), calc_switch(9));
+    return 0;
+}
+```
@@ -0,0 +1,58 @@
+# cmov_chain - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 5/5 equivalent
+- **Source:** `testcases/rewrite_smoke/cmov_chain.asm`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/cmov_chain.ll`
+- **Symbol:** `cmov_chain_target`
+- **Native driver:** `rewrite-regression-work/eq/cmov_chain_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `cmov_chain_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 150 | 150 | 150 | yes | <=10: 100+50 |
+| 2 | RCX=10 | 150 | 150 | 150 | yes | ==10: not >10 |
+| 3 | RCX=11 | 250 | 250 | 250 | yes | >10: 200+50 |
+| 4 | RCX=15 | 250 | 250 | 250 | yes | >10 interior |
+| 5 | RCX=100 | 250 | 250 | 250 | yes | >10 far |
+
+## Source
+
+```nasm
+default rel
+bits 64
+
+global start
+global cmov_chain_target
+extern ExitProcess
+
+section .text
+; Conditional moves (branchless select) on symbolic RCX:
+;   eax = 100, edx = 200
+;   if ecx > 10: eax = edx (200)
+;   eax += 50
+; Result is 150 or 250 depending on input.
+; No branches in the CFG — cmov emits a select directly.
+; Expect: select i1, add.
+cmov_chain_target:
+    push rbp
+    mov rbp, rsp
+    mov eax, 100
+    mov edx, 200
+    cmp ecx, 10
+    cmovg eax, edx
+    add eax, 50
+    pop rbp
+    ret
+
+start:
+    sub rsp, 40
+    mov ecx, 15
+    call cmov_chain_target
+    mov ecx, eax
+    call ExitProcess
+```
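Note on the `cmov_chain` report: `cmovg` tests the signed greater-than condition, so the branchless select can be modelled in C as below. The function name is hypothetical. The table above has no negative input; under this signed interpretation a negative ECX would take the 150 path.

```c
#include <stdint.h>

/* Hypothetical C model of cmov_chain_target. */
int32_t cmov_chain_model(int32_t ecx) {
    int32_t eax = 100;
    int32_t edx = 200;
    eax = (ecx > 10) ? edx : eax;   /* cmp ecx, 10 / cmovg eax, edx (signed) */
    return eax + 50;                /* add eax, 50 */
}
```

The ternary corresponds to the `select i1` the source comment expects, followed by a single `add`.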
@@ -0,0 +1,63 @@
+# diamond - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 8/8 equivalent
+- **Source:** `testcases/rewrite_smoke/diamond.asm`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/diamond.ll`
+- **Symbol:** `diamond_target`
+- **Native driver:** `rewrite-regression-work/eq/diamond_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `diamond_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=7 | 51 | 51 | 51 | yes | odd: (7+10)*3 |
+| 2 | RCX=1 | 33 | 33 | 33 | yes | odd: (1+10)*3 |
+| 3 | RCX=3 | 39 | 39 | 39 | yes | odd: (3+10)*3 |
+| 4 | RCX=11 | 63 | 63 | 63 | yes | odd: (11+10)*3 |
+| 5 | RCX=6 | 3 | 3 | 3 | yes | even: (6-5)*3 |
+| 6 | RCX=8 | 9 | 9 | 9 | yes | even: (8-5)*3 |
+| 7 | RCX=10 | 15 | 15 | 15 | yes | even: (10-5)*3 |
+| 8 | RCX=100 | 285 | 285 | 285 | yes | even: (100-5)*3 |
+
+## Source
+
+```nasm
+default rel
+bits 64
+
+global start
+global diamond_target
+extern ExitProcess
+
+section .text
+; Diamond-shaped CFG: two paths merge then continue.
+;   if ecx is odd: eax = ecx + 10
+;   else:          eax = ecx - 5
+;   then:          eax *= 3
+; Symbolic input → expect select/phi at merge, then mul by 3.
+diamond_target:
+    push rbp
+    mov rbp, rsp
+    mov eax, ecx
+    test eax, 1
+    jz .even
+    add eax, 10
+    jmp .merge
+.even:
+    sub eax, 5
+.merge:
+    imul eax, eax, 3
+    pop rbp
+    ret
+
+start:
+    sub rsp, 40
+    mov ecx, 7
+    call diamond_target
+    mov ecx, eax
+    call ExitProcess
+```
@@ -0,0 +1,66 @@
+# dummy_vm_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 6/6 equivalent
+- **Source:** `testcases/rewrite_smoke/dummy_vm_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/dummy_vm_loop.ll`
+- **Symbol:** `dummy_vm_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/dummy_vm_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `dummy_vm_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 40 | 40 | 40 | yes | even opcode takes constant handler |
+| 2 | RCX=1 | 0 | 0 | 0 | yes | odd opcode loop with limit 1 returns 0 |
+| 3 | RCX=3 | 3 | 3 | 3 | yes | odd opcode loop: 0+1+2 |
+| 4 | RCX=5 | 10 | 10 | 10 | yes | odd opcode loop: 0+1+2+3+4 |
+| 5 | RCX=7 | 21 | 21 | 21 | yes | odd opcode loop: 0..6 |
+| 6 | RCX=8 | 40 | 40 | 40 | yes | even opcode ignores masked loop handler |
+
+## Source
+
+```c
+/* Tiny dummy-VM-style state machine around a real local loop.
+ * Lift target: dummy_vm_loop_target.
+ * Goal: keep a VM-shaped dispatch shell while preserving a normal counted loop
+ * inside one handler, so loop-generalization regressions cannot silently
+ * collapse it into unresolved control flow.
+ */
+#include <stdio.h>
+
+__declspec(noinline)
+int dummy_vm_loop_target(int x) {
+    int opcode = x & 1;
+    int acc = 0;
+
+    while (1) {
+        switch (opcode) {
+        case 0:
+            acc = 40;
+            opcode = 2;
+            break;
+        case 1: {
+            int limit = x & 7;
+            for (int i = 0; i < limit; i++)
+                acc += i;
+            opcode = 2;
+            break;
+        }
+        case 2:
+            return acc;
+        default:
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("dummy_vm_loop(5)=%d dummy_vm_loop(8)=%d\n",
+           dummy_vm_loop_target(5), dummy_vm_loop_target(8));
+    return 0;
+}
+```
@@ -0,0 +1,63 @@
+# indirect - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 1/1 equivalent
+- **Source:** `testcases/rewrite_smoke/indirect.asm`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/indirect.ll`
+- **Symbol:** `jump_target`
+- **Native driver:** `rewrite-regression-work/eq/indirect_eq.exe`
+- **Lifted signature:** `define noundef i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `jump_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | _(none)_ | 53 | 53 | 53 | yes | constant: hardcoded case2 0x30+5 |
+
+## Source
+
+```nasm
+default rel
+bits 64
+
+global start
+global jump_target
+extern ExitProcess
+
+section .text
+jump_target:
+    push rbp
+    mov rbp, rsp
+
+    mov ecx, 2
+    lea rax, [rel jump_table]
+    movsxd rdx, dword [rax + rcx * 4]
+    add rax, rdx
+    jmp rax
+
+case0:
+    mov eax, 0x10
+    jmp done_label
+case1:
+    mov eax, 0x20
+    jmp done_label
+case2:
+    mov eax, 0x30
+done_label:
+    add eax, 5
+    pop rbp
+    ret
+
+jump_table:
+    dd case0 - jump_table
+    dd case1 - jump_table
+    dd case2 - jump_table
+
+start:
+    sub rsp, 40
+    call jump_target
+    mov ecx, eax
+    call ExitProcess
+```
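Note on the `indirect` report: the table in this sample stores 32-bit offsets relative to the table itself, so the jump target is the table base plus a sign-extended entry. A minimal sketch of that address computation follows. The addresses are made-up numbers chosen for illustration, and `resolve_target` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the case labels sit below the table, so every
 * stored offset is negative, as in the assembly above. */
enum { CASE0 = 0x1020, CASE1 = 0x102a, CASE2 = 0x1034, TABLE = 0x1040 };

const int32_t jump_table[3] = {
    CASE0 - TABLE,   /* dd case0 - jump_table */
    CASE1 - TABLE,   /* dd case1 - jump_table */
    CASE2 - TABLE    /* dd case2 - jump_table */
};

/* movsxd rdx, dword [rax + rcx*4] ; add rax, rdx */
uint64_t resolve_target(uint64_t table_base, const int32_t *table, size_t index) {
    int64_t offset = (int64_t)table[index];    /* sign-extend 32 -> 64 */
    return table_base + (uint64_t)offset;      /* wraps modulo 2^64 */
}
```

Because the index is hardcoded to 2 in this sample, the computation always resolves to `case2`, which is why the report has a single constant row (0x30 + 5 = 53).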
@@ -0,0 +1,43 @@
+# instr_add - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 1/1 equivalent
+- **Source:** `testcases/rewrite_smoke/instr_add.asm`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/instr_add.ll`
+- **Symbol:** `instr_add_target`
+- **Native driver:** `rewrite-regression-work/eq/instr_add_eq.exe`
+- **Lifted signature:** `define noundef i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `instr_add_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | _(none)_ | 12 | 12 | 12 | yes | constant: 7+5 |
+
+## Source
+
+```nasm
+default rel
+bits 64
+
+global start
+global instr_add_target
+extern ExitProcess
+
+section .text
+instr_add_target:
+    push rbp
+    mov rbp, rsp
+    mov eax, 7
+    add eax, 5
+    pop rbp
+    ret
+
+start:
+    sub rsp, 40
+    call instr_add_target
+    mov ecx, eax
+    call ExitProcess
+```
@@ -0,0 +1,43 @@
+# instr_rol - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 1/1 equivalent
+- **Source:** `testcases/rewrite_smoke/instr_rol.asm`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/instr_rol.ll`
+- **Symbol:** `instr_rol_target`
+- **Native driver:** `rewrite-regression-work/eq/instr_rol_eq.exe`
+- **Lifted signature:** `define noundef i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `instr_rol_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | _(none)_ | 34 | 34 | 34 | yes | constant: rol(0x11,1)=0x22=34 |
+
+## Source
+
+```nasm
+default rel
+bits 64
+
+global start
+global instr_rol_target
+extern ExitProcess
+
+section .text
+instr_rol_target:
+    push rbp
+    mov rbp, rsp
+    mov eax, 0x11
+    rol eax, 1
+    pop rbp
+    ret
+
+start:
+    sub rsp, 40
+    call instr_rol_target
+    mov ecx, eax
+    call ExitProcess
+```
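Note on the `instr_rol` report: a 32-bit left rotate is commonly written in C as a shift pair joined with OR. The sketch below uses a hypothetical `rol32` helper to reproduce the expected constant.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by `count` bits. The count is reduced
 * modulo 32, and a zero count is handled separately to avoid an
 * undefined 32-bit shift by 32. */
uint32_t rol32(uint32_t value, unsigned count) {
    count &= 31u;
    if (count == 0)
        return value;
    return (value << count) | (value >> (32u - count));
}
```

`rol32(0x11, 1)` gives 0x22, which is the 34 shown in the table.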
@@ -0,0 +1,43 @@
+# instr_sub - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 1/1 equivalent
+- **Source:** `testcases/rewrite_smoke/instr_sub.asm`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/instr_sub.ll`
+- **Symbol:** `instr_sub_target`
+- **Native driver:** `rewrite-regression-work/eq/instr_sub_eq.exe`
+- **Lifted signature:** `define noundef i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `instr_sub_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | _(none)_ | 42 | 42 | 42 | yes | constant: 100-58 |
+
+## Source
+
+```nasm
+default rel
+bits 64
+
+global start
+global instr_sub_target
+extern ExitProcess
+
+section .text
+instr_sub_target:
+    push rbp
+    mov rbp, rsp
+    mov eax, 100
+    sub eax, 58
+    pop rbp
+    ret
+
+start:
+    sub rsp, 40
+    call instr_sub_target
+    mov ecx, eax
+    call ExitProcess
+```
@@ -0,0 +1,43 @@
+# instr_xor - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 1/1 equivalent
+- **Source:** `testcases/rewrite_smoke/instr_xor.asm`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/instr_xor.ll`
+- **Symbol:** `instr_xor_target`
+- **Native driver:** `rewrite-regression-work/eq/instr_xor_eq.exe`
+- **Lifted signature:** `define noundef i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `instr_xor_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | _(none)_ | 90 | 90 | 90 | yes | constant: 0x55^0x0F=0x5A=90 |
+
+## Source
+
+```nasm
+default rel
+bits 64
+
+global start
+global instr_xor_target
+extern ExitProcess
+
+section .text
+instr_xor_target:
+    push rbp
+    mov rbp, rsp
+    mov eax, 0x55
+    xor eax, 0x0f
+    pop rbp
+    ret
+
+start:
+    sub rsp, 40
+    call instr_xor_target
+    mov ecx, eax
+    call ExitProcess
+```
@@ -0,0 +1,81 @@
+# jumptable_basic - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 6/6 equivalent
+- **Source:** `testcases/rewrite_smoke/jumptable_basic.asm`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/jumptable_basic.ll`
+- **Symbol:** `jumptable_basic_target`
+- **Native driver:** `rewrite-regression-work/eq/jumptable_basic_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `jumptable_basic_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 10 | 10 | 10 | yes | case 0 |
+| 2 | RCX=1 | 20 | 20 | 20 | yes | case 1 |
+| 3 | RCX=2 | 30 | 30 | 30 | yes | case 2 |
+| 4 | RCX=3 | 40 | 40 | 40 | yes | case 3 |
+| 5 | RCX=4 | 999 | 999 | 999 | yes | default (>3) |
+| 6 | RCX=100 | 999 | 999 | 999 | yes | default far |
+
+## Source
+
+```nasm
+default rel
+bits 64
+
+global start
+global jumptable_basic_target
+extern ExitProcess
+
+section .text
+; Real indirect jump table on symbolic ECX input.
+; ecx 0 -> 10, 1 -> 20, 2 -> 30, 3 -> 40, else -> 999
+; The jump is: jmp [jt_basic + rax*8]
+; This is the pattern MSVC generates with /O2 for dense switches.
+jumptable_basic_target:
+    push rbp
+    mov rbp, rsp
+    mov eax, ecx       ; eax = symbolic input
+    cmp eax, 3
+    ja jt_b_default     ; unsigned compare: if >3, default
+    lea rcx, [jt_basic] ; base of jump table
+    jmp [rcx + rax*8]   ; indirect jump through table
+
+jt_b_case0:
+    mov eax, 10
+    jmp jt_b_done
+jt_b_case1:
+    mov eax, 20
+    jmp jt_b_done
+jt_b_case2:
+    mov eax, 30
+    jmp jt_b_done
+jt_b_case3:
+    mov eax, 40
+    jmp jt_b_done
+jt_b_default:
+    mov eax, 999
+jt_b_done:
+    pop rbp
+    ret
+
+start:
+    sub rsp, 40
+    mov ecx, 2
+    call jumptable_basic_target
+    mov ecx, eax
+    call ExitProcess
+
+section .rdata
+align 8
+; Jump table: 4 entries (cases 0-3), absolute 64-bit pointers.
+jt_basic:
+    dq jt_b_case0
+    dq jt_b_case1
+    dq jt_b_case2
+    dq jt_b_case3
+```
@@ -0,0 +1,90 @@
+# jumptable_computation - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 7/7 equivalent
+- **Source:** `testcases/rewrite_smoke/jumptable_computation.asm`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/jumptable_computation.ll`
+- **Symbol:** `jumptable_computation_target`
+- **Native driver:** `rewrite-regression-work/eq/jumptable_computation_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `jumptable_computation_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 1 | 1 | 1 | yes | case 0: 0*2+1 |
+| 2 | RCX=1 | 8 | 8 | 8 | yes | case 1: 1*3+5 |
+| 3 | RCX=2 | 18 | 18 | 18 | yes | case 2: 2*4+10 |
+| 4 | RCX=3 | 103 | 103 | 103 | yes | case 3: 3+100 |
+| 5 | RCX=4 | 0 | 0 | 0 | yes | default (>3) |
+| 6 | RCX=100 | 0 | 0 | 0 | yes | default far |
+| 7 | RCX=5 | 0 | 0 | 0 | yes | default (5 > 3) |
+
+## Source
+
+```nasm
+default rel
+bits 64
+
+global start
+global jumptable_computation_target
+extern ExitProcess
+
+section .text
+; Jump table where case bodies compute on the symbolic input, not just
+; return constants.  Tests that the lifter preserves the input value
+; across the jump table dispatch and into the case body.
+;
+; The original input (ecx) is saved in r8d before the table jump.
+;
+; ecx 0 -> ecx*2 + 1
+; ecx 1 -> ecx*3 + 5
+; ecx 2 -> ecx*4 + 10
+; ecx 3 -> ecx + 100
+; else  -> 0
+jumptable_computation_target:
+    push rbp
+    mov rbp, rsp
+    mov r8d, ecx            ; save original input
+    mov eax, ecx
+    cmp eax, 3
+    ja jtc_default
+    lea rcx, [jtc_table]
+    jmp [rcx + rax*8]
+
+jtc_case0:                     ; result = input * 2 + 1
+    lea eax, [r8d + r8d + 1]
+    jmp jtc_done
+jtc_case1:                     ; result = input * 3 + 5
+    lea eax, [r8d + r8d*2 + 5]
+    jmp jtc_done
+jtc_case2:                     ; result = input * 4 + 10
+    shl r8d, 2
+    lea eax, [r8d + 10]
+    jmp jtc_done
+jtc_case3:                     ; result = input + 100
+    lea eax, [r8d + 100]
+    jmp jtc_done
+jtc_default:
+    xor eax, eax
+jtc_done:
+    pop rbp
+    ret
+
+start:
+    sub rsp, 40
+    mov ecx, 2
+    call jumptable_computation_target
+    mov ecx, eax
+    call ExitProcess
+
+section .rdata
+align 8
+jtc_table:
+    dq jtc_case0
+    dq jtc_case1
+    dq jtc_case2
+    dq jtc_case3
+```
@@ -0,0 +1,99 @@
+# jumptable_dense - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/jumptable_dense.asm`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/jumptable_dense.ll`
+- **Symbol:** `jumptable_dense_target`
+- **Native driver:** `rewrite-regression-work/eq/jumptable_dense_eq.exe`
+- **Lifted signature:** `define noundef i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `jumptable_dense_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 100 | 100 | 100 | yes | case 0 |
+| 2 | RCX=1 | 200 | 200 | 200 | yes | case 1 |
+| 3 | RCX=2 | 300 | 300 | 300 | yes | case 2 |
+| 4 | RCX=3 | 400 | 400 | 400 | yes | case 3 |
+| 5 | RCX=4 | 500 | 500 | 500 | yes | case 4 |
+| 6 | RCX=5 | 600 | 600 | 600 | yes | case 5 |
+| 7 | RCX=6 | 700 | 700 | 700 | yes | case 6 |
+| 8 | RCX=7 | 800 | 800 | 800 | yes | case 7 |
+| 9 | RCX=8 | 0 | 0 | 0 | yes | default (>7) |
+| 10 | RCX=100 | 0 | 0 | 0 | yes | default far |
+
+## Source
+
+```nasm
+default rel
+bits 64
+
+global start
+global jumptable_dense_target
+extern ExitProcess
+
+section .text
+; 8-way dense jump table on symbolic ECX input.
+; ecx 0->100, 1->200, 2->300, 3->400, 4->500, 5->600, 6->700, 7->800, else->0
+jumptable_dense_target:
+    push rbp
+    mov rbp, rsp
+    mov eax, ecx
+    cmp eax, 7
+    ja jt_d_default
+    lea rcx, [jt_dense]
+    jmp [rcx + rax*8]
+
+jt_d_case0:
+    mov eax, 100
+    jmp jt_d_done
+jt_d_case1:
+    mov eax, 200
+    jmp jt_d_done
+jt_d_case2:
+    mov eax, 300
+    jmp jt_d_done
+jt_d_case3:
+    mov eax, 400
+    jmp jt_d_done
+jt_d_case4:
+    mov eax, 500
+    jmp jt_d_done
+jt_d_case5:
+    mov eax, 600
+    jmp jt_d_done
+jt_d_case6:
+    mov eax, 700
+    jmp jt_d_done
+jt_d_case7:
+    mov eax, 800
+    jmp jt_d_done
+jt_d_default:
+    mov eax, 0
+jt_d_done:
+    pop rbp
+    ret
+
+start:
+    sub rsp, 40
+    mov ecx, 5
+    call jumptable_dense_target
+    mov ecx, eax
+    call ExitProcess
+
+section .rdata
+align 8
+; Dense jump table: 8 entries (cases 0-7)
+jt_dense:
+    dq jt_d_case0
+    dq jt_d_case1
+    dq jt_d_case2
+    dq jt_d_case3
+    dq jt_d_case4
+    dq jt_d_case5
+    dq jt_d_case6
+    dq jt_d_case7
+```
@@ -0,0 +1,93 @@
+# jumptable_rel32 - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 7/7 equivalent
+- **Source:** `testcases/rewrite_smoke/jumptable_rel32.asm`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/jumptable_rel32.ll`
+- **Symbol:** `jumptable_rel32_target`
+- **Native driver:** `rewrite-regression-work/eq/jumptable_rel32_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `jumptable_rel32_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 10 | 10 | 10 | yes | case 0 |
+| 2 | RCX=1 | 20 | 20 | 20 | yes | case 1 |
+| 3 | RCX=2 | 30 | 30 | 30 | yes | case 2 |
+| 4 | RCX=3 | 40 | 40 | 40 | yes | case 3 |
+| 5 | RCX=4 | 50 | 50 | 50 | yes | case 4 |
+| 6 | RCX=5 | 999 | 999 | 999 | yes | default (>4) |
+| 7 | RCX=100 | 999 | 999 | 999 | yes | default far |
+
+## Source
+
+```nasm
+default rel
+bits 64
+
+global start
+global jumptable_rel32_target
+extern ExitProcess
+
+section .text
+; RIP-relative 32-bit offset jump table: the offset-table style of switch
+; lowering on x64 (MSVC /O2 emits a close variant keyed to the image base).
+;
+; Pattern:
+;   lea  rdx, [rip + jt_data]      ; table base
+;   movsxd rax, dword [rdx + rcx*4]; signed 32-bit offset from table base
+;   add  rax, rdx                   ; absolute target
+;   jmp  rax                        ; indirect jump
+;
+; ecx 0->10, 1->20, 2->30, 3->40, 4->50, else->999
+jumptable_rel32_target:
+    push rbp
+    mov rbp, rsp
+    mov eax, ecx
+    cmp eax, 4
+    ja .default
+    lea rdx, [.jt_data]
+    movsxd rax, dword [rdx + rcx*4]   ; index is rcx: assumes RCX[63:32] == 0
+    add rax, rdx
+    jmp rax
+
+.case0:
+    mov eax, 10
+    jmp .done
+.case1:
+    mov eax, 20
+    jmp .done
+.case2:
+    mov eax, 30
+    jmp .done
+.case3:
+    mov eax, 40
+    jmp .done
+.case4:
+    mov eax, 50
+    jmp .done
+.default:
+    mov eax, 999
+.done:
+    pop rbp
+    ret
+
+; Table lives in .text so NASM can compute intra-section differences.
+align 4
+.jt_data:
+    dd .case0 - .jt_data
+    dd .case1 - .jt_data
+    dd .case2 - .jt_data
+    dd .case3 - .jt_data
+    dd .case4 - .jt_data
+
+start:
+    sub rsp, 40
+    mov ecx, 3
+    call jumptable_rel32_target
+    mov ecx, eax
+    call ExitProcess
+```
@@ -0,0 +1,85 @@
+# jumptable_shared_targets - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 8/8 equivalent
+- **Source:** `testcases/rewrite_smoke/jumptable_shared_targets.asm`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/jumptable_shared_targets.ll`
+- **Symbol:** `jumptable_shared_target`
+- **Native driver:** `rewrite-regression-work/eq/jumptable_shared_targets_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `jumptable_shared_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 10 | 10 | 10 | yes | case 0 (group a) |
+| 2 | RCX=1 | 10 | 10 | 10 | yes | case 1 (group a shared) |
+| 3 | RCX=2 | 20 | 20 | 20 | yes | case 2 (solo b) |
+| 4 | RCX=3 | 30 | 30 | 30 | yes | case 3 (group c) |
+| 5 | RCX=4 | 30 | 30 | 30 | yes | case 4 (group c shared) |
+| 6 | RCX=5 | 40 | 40 | 40 | yes | case 5 (solo d) |
+| 7 | RCX=6 | 999 | 999 | 999 | yes | default (>5) |
+| 8 | RCX=100 | 999 | 999 | 999 | yes | default far |
+
+## Source
+
+```nasm
+default rel
+bits 64
+
+global start
+global jumptable_shared_target
+extern ExitProcess
+
+section .text
+; Jump table with shared case targets: multiple indices route to the
+; same handler.  Tests that the lifter correctly merges equivalent
+; table entries.
+;
+; ecx 0->10, 1->10, 2->20, 3->30, 4->30, 5->40, else->999
+jumptable_shared_target:
+    push rbp
+    mov rbp, rsp
+    mov eax, ecx
+    cmp eax, 5
+    ja jtsh_default
+    lea rcx, [jtsh_table]
+    jmp [rcx + rax*8]
+
+jtsh_group_a:               ; cases 0, 1
+    mov eax, 10
+    jmp jtsh_done
+jtsh_solo_b:                ; case 2
+    mov eax, 20
+    jmp jtsh_done
+jtsh_group_c:               ; cases 3, 4
+    mov eax, 30
+    jmp jtsh_done
+jtsh_solo_d:                ; case 5
+    mov eax, 40
+    jmp jtsh_done
+jtsh_default:
+    mov eax, 999
+jtsh_done:
+    pop rbp
+    ret
+
+start:
+    sub rsp, 40
+    mov ecx, 4
+    call jumptable_shared_target
+    mov ecx, eax
+    call ExitProcess
+
+section .rdata
+align 8
+jtsh_table:
+    dq jtsh_group_a     ; case 0
+    dq jtsh_group_a     ; case 1  (shared with 0)
+    dq jtsh_solo_b      ; case 2
+    dq jtsh_group_c     ; case 3
+    dq jtsh_group_c     ; case 4  (shared with 3)
+    dq jtsh_solo_d      ; case 5
+```
@@ -0,0 +1,88 @@
+# jumptable_shifted - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 9/9 equivalent
+- **Source:** `testcases/rewrite_smoke/jumptable_shifted.asm`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/jumptable_shifted.ll`
+- **Symbol:** `jumptable_shifted_target`
+- **Native driver:** `rewrite-regression-work/eq/jumptable_shifted_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `jumptable_shifted_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=10 | 100 | 100 | 100 | yes | case 10 |
+| 2 | RCX=11 | 200 | 200 | 200 | yes | case 11 |
+| 3 | RCX=12 | 300 | 300 | 300 | yes | case 12 |
+| 4 | RCX=13 | 400 | 400 | 400 | yes | case 13 |
+| 5 | RCX=14 | 500 | 500 | 500 | yes | case 14 |
+| 6 | RCX=9 | 0 | 0 | 0 | yes | default (below range) |
+| 7 | RCX=15 | 0 | 0 | 0 | yes | default (above range) |
+| 8 | RCX=0 | 0 | 0 | 0 | yes | default (zero) |
+| 9 | RCX=100 | 0 | 0 | 0 | yes | default far |
+
+## Source
+
+```nasm
+default rel
+bits 64
+
+global start
+global jumptable_shifted_target
+extern ExitProcess
+
+section .text
+; Base-shifted jump table: case values 10-14 (not starting at 0).
+; A compiler subtracts the base before indexing (here: sub eax, 10; cmp eax, 4).
+;
+; ecx 10->100, 11->200, 12->300, 13->400, 14->500, else->0
+jumptable_shifted_target:
+    push rbp
+    mov rbp, rsp
+    mov eax, ecx
+    sub eax, 10             ; shift: cases 10-14 become indices 0-4
+    cmp eax, 4
+    ja jts_default           ; unsigned: also catches inputs below 10 (wrapped)
+    lea rcx, [jts_table]
+    jmp [rcx + rax*8]        ; absolute qword table
+
+jts_case10:
+    mov eax, 100
+    jmp jts_done
+jts_case11:
+    mov eax, 200
+    jmp jts_done
+jts_case12:
+    mov eax, 300
+    jmp jts_done
+jts_case13:
+    mov eax, 400
+    jmp jts_done
+jts_case14:
+    mov eax, 500
+    jmp jts_done
+jts_default:
+    xor eax, eax
+jts_done:
+    pop rbp
+    ret
+
+start:
+    sub rsp, 40
+    mov ecx, 12
+    call jumptable_shifted_target
+    mov ecx, eax
+    call ExitProcess
+
+section .rdata
+align 8
+jts_table:
+    dq jts_case10
+    dq jts_case11
+    dq jts_case12
+    dq jts_case13
+    dq jts_case14
+```
@@ -0,0 +1,51 @@
+# loop_simple - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 1/1 equivalent
+- **Source:** `testcases/rewrite_smoke/loop_simple.asm`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/loop_simple.ll`
+- **Symbol:** `loop_simple_target`
+- **Native driver:** `rewrite-regression-work/eq/loop_simple_eq.exe`
+- **Lifted signature:** `define noundef i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `loop_simple_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | _(none)_ | 6 | 6 | 6 | yes | constant: 3+2+1 |
+
+## Source
+
+```nasm
+default rel
+bits 64
+
+global start
+global loop_simple_target
+extern ExitProcess
+
+section .text
+; Tiny constant-bound countdown loop: sum = 3 + 2 + 1 = 6.
+; ecx is overwritten with constant 3 immediately, so the
+; concolic engine should unroll all 3 iterations and LLVM
+; should constant-fold the result to 6.
+loop_simple_target:
+    push rbp
+    mov rbp, rsp
+    xor eax, eax
+    mov ecx, 3
+.loop:
+    add eax, ecx
+    dec ecx
+    jnz .loop
+    pop rbp
+    ret
+
+start:
+    sub rsp, 40
+    call loop_simple_target
+    mov ecx, eax
+    call ExitProcess
+```
@@ -0,0 +1,54 @@
+# multi_arg - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 5/5 equivalent
+- **Source:** `testcases/rewrite_smoke/multi_arg.asm`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/multi_arg.ll`
+- **Symbol:** `multi_arg_target`
+- **Native driver:** `rewrite-regression-work/eq/multi_arg_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `multi_arg_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=5, RDX=3 | 56 | 56 | 56 | yes | (5+3)*7 |
+| 2 | RCX=0, RDX=0 | 0 | 0 | 0 | yes | (0+0)*7 |
+| 3 | RCX=10, RDX=4 | 98 | 98 | 98 | yes | (10+4)*7 |
+| 4 | RCX=1, RDX=1 | 14 | 14 | 14 | yes | (1+1)*7 |
+| 5 | RCX=100, RDX=0 | 700 | 700 | 700 | yes | (100+0)*7 |
+
+## Source
+
+```nasm
+default rel
+bits 64
+
+global start
+global multi_arg_target
+extern ExitProcess
+
+section .text
+; Two symbolic arguments (RCX, RDX) combined:
+;   result = (ecx + edx) * 7
+; Since both inputs are symbolic, the IR cannot constant-fold.
+; Expect to see add and mul operations in lifted IR.
+multi_arg_target:
+    push rbp
+    mov rbp, rsp
+    mov eax, ecx
+    add eax, edx
+    imul eax, eax, 7
+    pop rbp
+    ret
+
+start:
+    sub rsp, 40
+    mov ecx, 5
+    mov edx, 3
+    call multi_arg_target
+    mov ecx, eax
+    call ExitProcess
+```
@@ -0,0 +1,67 @@
+# nested_branch - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 8/8 equivalent
+- **Source:** `testcases/rewrite_smoke/nested_branch.asm`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/nested_branch.ll`
+- **Symbol:** `nested_branch_target`
+- **Native driver:** `rewrite-regression-work/eq/nested_branch_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `nested_branch_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 100 | 100 | 100 | yes | <=10 |
+| 2 | RCX=5 | 100 | 100 | 100 | yes | <=10 interior |
+| 3 | RCX=10 | 100 | 100 | 100 | yes | <=10 boundary |
+| 4 | RCX=11 | 200 | 200 | 200 | yes | 11..20 |
+| 5 | RCX=15 | 200 | 200 | 200 | yes | 11..20 interior |
+| 6 | RCX=20 | 200 | 200 | 200 | yes | 11..20 boundary |
+| 7 | RCX=21 | 300 | 300 | 300 | yes | >20 |
+| 8 | RCX=100 | 300 | 300 | 300 | yes | >20 far |
+
+## Source
+
+```nasm
+default rel
+bits 64
+
+global start
+global nested_branch_target
+extern ExitProcess
+
+section .text
+; 3-way nested if/else on symbolic RCX input.
+; if ecx <= 10 → 100
+; else if ecx <= 20 → 200
+; else → 300
+; All comparisons survive as symbolic selects/phis in IR.
+nested_branch_target:
+    push rbp
+    mov rbp, rsp
+    mov eax, ecx
+    cmp eax, 10
+    jg .above10
+    mov eax, 100
+    jmp .done
+.above10:
+    cmp eax, 20
+    jg .above20
+    mov eax, 200
+    jmp .done
+.above20:
+    mov eax, 300
+.done:
+    pop rbp
+    ret
+
+start:
+    sub rsp, 40
+    mov ecx, 15
+    call nested_branch_target
+    mov ecx, eax
+    call ExitProcess
+```
@@ -0,0 +1,47 @@
+# stack - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 1/1 equivalent
+- **Source:** `testcases/rewrite_smoke/stack.asm`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/stack.ll`
+- **Symbol:** `stack_target`
+- **Native driver:** `rewrite-regression-work/eq/stack_eq.exe`
+- **Lifted signature:** `define noundef i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `stack_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | _(none)_ | 1717986918 | 1717986918 | 1717986918 | yes | constant: 0x66666666 |
+
+## Source
+
+```nasm
+default rel
+bits 64
+
+global start
+global stack_target
+extern ExitProcess
+
+section .text
+stack_target:
+    push rbp
+    mov rbp, rsp
+    sub rsp, 32
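+    mov dword [rsp + 16], 0x11111111   ; store a constant to a stack slot
+    mov eax, dword [rsp + 16]          ; reload it through memory
+    add eax, 0x22222222                ; eax = 0x33333333
+    rol eax, 1                         ; eax = 0x66666666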
+    add rsp, 32
+    pop rbp
+    ret
+
+start:
+    sub rsp, 40
+    call stack_target
+    mov ecx, eax
+    call ExitProcess
+```
@@ -0,0 +1,146 @@
+# stack_vm_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 6/6 equivalent
+- **Source:** `testcases/rewrite_smoke/stack_vm_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/stack_vm_loop.ll`
+- **Symbol:** `stack_vm_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/stack_vm_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `stack_vm_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 40 | 40 | 40 | yes | even program returns constant handler |
+| 2 | RCX=1 | 0 | 0 | 0 | yes | odd stack loop limit 1 returns 0 |
+| 3 | RCX=3 | 3 | 3 | 3 | yes | odd stack loop: 0+1+2 |
+| 4 | RCX=5 | 10 | 10 | 10 | yes | odd stack loop: 0+1+2+3+4 |
+| 5 | RCX=7 | 21 | 21 | 21 | yes | odd stack loop: 0..6 |
+| 6 | RCX=8 | 40 | 40 | 40 | yes | even program ignores odd loop body |
+
+## Source
+
+```c
+/* Harsher stack-based VM with explicit push/pop/add/sub/jnz-style states.
+ * Lift target: stack_vm_loop_target.
+ * Goal: keep the loop entirely in VM state while modeling a more realistic
+ * stack interpreter than the compiler-friendly register/local VM.
+ *
+ * This version keeps the stack explicit but collapses bookkeeping-only microstates
+ * and uses fixed 2-slot stack transitions so the sample remains lli-executable
+ * without reintroducing the branchy per-slot dispatcher forest that hit budget 503.
+ */
+#include <stdio.h>
+
+#define VM_PUSH0(VALUE)                                                          \
+    do {                                                                        \
+        s0 = (VALUE);                                                           \
+        sp = 1;                                                                 \
+    } while (0)
+
+#define VM_PUSH1(VALUE)                                                          \
+    do {                                                                        \
+        s1 = (VALUE);                                                           \
+        sp = 2;                                                                 \
+    } while (0)
+
+#define VM_POP1(OUT)                                                             \
+    do {                                                                        \
+        (OUT) = s1;                                                             \
+        sp = 1;                                                                 \
+    } while (0)
+
+#define VM_POP0(OUT)                                                             \
+    do {                                                                        \
+        (OUT) = s0;                                                             \
+        sp = 0;                                                                 \
+    } while (0)
+
+enum StackVmPc {
+    VM_EVEN_PUSH_40 = 0,
+    VM_EVEN_HALT = 1,
+
+    VM_ODD_INIT_LIMIT = 10,
+    VM_ODD_INIT_ACC = 11,
+    VM_ODD_INIT_INDEX = 12,
+    VM_ODD_SUB_JNZ = 13,
+    VM_ODD_BODY_ACC = 14,
+    VM_ODD_BODY_INDEX = 15,
+    VM_ODD_HALT = 16,
+};
+
+__declspec(noinline)
+int stack_vm_loop_target(int x) {
+    int sp = 0;
+    int s0 = 0;
+    int s1 = 0;
+    int acc = 0;
+    int index = 0;
+    int limit = 0;
+    int pc = (x & 1) ? VM_ODD_INIT_LIMIT : VM_EVEN_PUSH_40;
+    int lhs = 0;
+    int rhs = 0;
+    int cond = 0;
+
+    while (1) {
+        if (pc == VM_EVEN_PUSH_40) {
+            VM_PUSH0(40);
+            pc = VM_EVEN_HALT;
+        } else if (pc == VM_EVEN_HALT) {
+            VM_POP0(lhs);
+            return lhs;
+        } else if (pc == VM_ODD_INIT_LIMIT) {
+            VM_PUSH0(x & 7);
+            VM_POP0(limit);
+            pc = VM_ODD_INIT_ACC;
+        } else if (pc == VM_ODD_INIT_ACC) {
+            VM_PUSH0(0);
+            VM_POP0(acc);
+            pc = VM_ODD_INIT_INDEX;
+        } else if (pc == VM_ODD_INIT_INDEX) {
+            VM_PUSH0(0);
+            VM_POP0(index);
+            pc = VM_ODD_SUB_JNZ;
+        } else if (pc == VM_ODD_SUB_JNZ) {
+            VM_PUSH0(limit);
+            VM_PUSH1(index);
+            VM_POP1(rhs);
+            VM_POP0(lhs);
+            VM_PUSH0(lhs - rhs);
+            VM_POP0(cond);
+            pc = (cond != 0) ? VM_ODD_BODY_ACC : VM_ODD_HALT;
+        } else if (pc == VM_ODD_BODY_ACC) {
+            VM_PUSH0(acc);
+            VM_PUSH1(index);
+            VM_POP1(rhs);
+            VM_POP0(lhs);
+            VM_PUSH0(lhs + rhs);
+            VM_POP0(acc);
+            pc = VM_ODD_BODY_INDEX;
+        } else if (pc == VM_ODD_BODY_INDEX) {
+            VM_PUSH0(index);
+            VM_PUSH1(1);
+            VM_POP1(rhs);
+            VM_POP0(lhs);
+            VM_PUSH0(lhs + rhs);
+            VM_POP0(index);
+            pc = VM_ODD_SUB_JNZ;
+        } else if (pc == VM_ODD_HALT) {
+            VM_PUSH0(acc);
+            VM_POP0(lhs);
+            return lhs;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("stack_vm_loop(5)=%d stack_vm_loop(8)=%d\n",
+           stack_vm_loop_target(5), stack_vm_loop_target(8));
+    return 0;
+}
+```
@@ -0,0 +1,72 @@
+# switch_3way - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 6/6 equivalent
+- **Source:** `testcases/rewrite_smoke/switch_3way.asm`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/switch_3way.ll`
+- **Symbol:** `switch_3way_target`
+- **Native driver:** `rewrite-regression-work/eq/switch_3way_eq.exe`
+- **Lifted signature:** `define noundef i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `switch_3way_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=1 | 100 | 100 | 100 | yes | case 1 |
+| 2 | RCX=2 | 200 | 200 | 200 | yes | case 2 |
+| 3 | RCX=3 | 300 | 300 | 300 | yes | case 3 |
+| 4 | RCX=0 | 999 | 999 | 999 | yes | default (0) |
+| 5 | RCX=4 | 999 | 999 | 999 | yes | default (4) |
+| 6 | RCX=100 | 999 | 999 | 999 | yes | default (100) |
+
+## Source
+
+```nasm
+default rel
+bits 64
+
+global start
+global switch_3way_target
+extern ExitProcess
+
+section .text
+; 3-way compare-and-branch chain on symbolic ECX input.
+; if ecx == 1 → return 100
+; if ecx == 2 → return 200
+; if ecx == 3 → return 300
+; else (default) → return 999
+; Tests multi-target branch resolution (>2 targets).
+switch_3way_target:
+    push rbp
+    mov rbp, rsp
+    mov eax, ecx
+    cmp eax, 1
+    je .case1
+    cmp eax, 2
+    je .case2
+    cmp eax, 3
+    je .case3
+    ; default
+    mov eax, 999
+    jmp .done
+.case1:
+    mov eax, 100
+    jmp .done
+.case2:
+    mov eax, 200
+    jmp .done
+.case3:
+    mov eax, 300
+.done:
+    pop rbp
+    ret
+
+start:
+    sub rsp, 40
+    mov ecx, 2
+    call switch_3way_target
+    mov ecx, eax
+    call ExitProcess
+```
@@ -0,0 +1,75 @@
+# switch_sparse - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 7/7 equivalent
+- **Source:** `testcases/rewrite_smoke/switch_sparse.asm`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/switch_sparse.ll`
+- **Symbol:** `switch_sparse_target`
+- **Native driver:** `rewrite-regression-work/eq/switch_sparse_eq.exe`
+- **Lifted signature:** `define noundef i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `switch_sparse_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=10 | 11 | 11 | 11 | yes | case 10 |
+| 2 | RCX=50 | 55 | 55 | 55 | yes | case 50 |
+| 3 | RCX=200 | 222 | 222 | 222 | yes | case 200 |
+| 4 | RCX=1000 | 1337 | 1337 | 1337 | yes | case 1000 |
+| 5 | RCX=0 | 4294967295 | 4294967295 | 4294967295 | yes | default: 0xFFFFFFFF |
+| 6 | RCX=100 | 4294967295 | 4294967295 | 4294967295 | yes | default |
+| 7 | RCX=500 | 4294967295 | 4294967295 | 4294967295 | yes | default |
+
+## Source
+
+```nasm
+default rel
+bits 64
+
+global start
+global switch_sparse_target
+extern ExitProcess
+
+section .text
+; Sparse switch on symbolic ECX input.
+; Case values are NOT consecutive: 10, 50, 200, 1000.
+; Tests multi-target branch resolution with large gaps between cases.
+switch_sparse_target:
+    push rbp
+    mov rbp, rsp
+    mov eax, ecx
+    cmp eax, 10
+    je .case10
+    cmp eax, 50
+    je .case50
+    cmp eax, 200
+    je .case200
+    cmp eax, 1000
+    je .case1000
+    ; default
+    mov eax, -1
+    jmp .done
+.case10:
+    mov eax, 11
+    jmp .done
+.case50:
+    mov eax, 55
+    jmp .done
+.case200:
+    mov eax, 222
+    jmp .done
+.case1000:
+    mov eax, 1337
+.done:
+    pop rbp
+    ret
+
+start:
+    sub rsp, 40
+    mov ecx, 200
+    call switch_sparse_target
+    mov ecx, eax
+    call ExitProcess
+```
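The default rows report 4294967295 rather than -1 because `mov eax, -1` writes a 32-bit register, and on x86-64 a 32-bit register write zero-extends into the full 64-bit register, so RAX holds 0x00000000FFFFFFFF. The following is a minimal C sketch of the compare chain under that reading. The names `switch_sparse_model` and `switch_sparse_model_check` are hypothetical and not part of the test suite, and the sketch assumes the harness observes the result as an unsigned 64-bit RAX value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of switch_sparse_target: the result is produced in
 * 32 bits (EAX) and then zero-extended to 64 bits (RAX). */
uint64_t switch_sparse_model(uint64_t rcx) {
    uint32_t eax = (uint32_t)rcx;              /* mov eax, ecx */
    switch (eax) {
    case 10:   eax = 11;   break;
    case 50:   eax = 55;   break;
    case 200:  eax = 222;  break;
    case 1000: eax = 1337; break;
    default:   eax = (uint32_t)-1; break;      /* mov eax, -1 */
    }
    return (uint64_t)eax;                      /* upper 32 bits stay zero */
}

/* Spot checks taken from the equivalence table above. */
void switch_sparse_model_check(void) {
    assert(switch_sparse_model(200) == 222);
    assert(switch_sparse_model(0) == 4294967295ull);
}
```

Only the low 32 bits of RCX take part in the comparison, which mirrors `mov eax, ecx` in the assembly.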
@@ -0,0 +1,112 @@
+# vm_2d_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_2d_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_2d_loop.ll`
+- **Symbol:** `vm_2d_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_2d_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_2d_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 1212 | 1212 | 1212 | yes | seed=0: diag=0+4+8=12, anti=2+4+6=12 |
+| 2 | RCX=1 | 1515 | 1515 | 1515 | yes | seed=1 |
+| 3 | RCX=5 | 2727 | 2727 | 2727 | yes | seed=5 |
+| 4 | RCX=7 | 3333 | 3333 | 3333 | yes | seed=7 |
+| 5 | RCX=10 | 4242 | 4242 | 4242 | yes | 0xA |
+| 6 | RCX=15 | 5757 | 5757 | 5757 | yes | 0xF |
+| 7 | RCX=16 | 1212 | 1212 | 1212 | yes | 0x10: seed=0 (mask) |
+| 8 | RCX=51966 | 5454 | 5454 | 5454 | yes | 0xCAFE: seed=14 |
+| 9 | RCX=74565 | 2727 | 2727 | 2727 | yes | 0x12345: seed=5 |
+| 10 | RCX=43981 | 5151 | 5151 | 5151 | yes | 0xABCD: seed=13 |
+
+## Source
+
+```c
+/* PC-state VM that fills a 3x3 stack grid via nested loops, then sums
+ * the main and anti diagonals.
+ * Lift target: vm_2d_loop_target.
+ * Goal: cover 2D-style indexing (grid[i][j] flattens to grid[i*3+j]) with
+ * nested PC-state loops, and a tail compute that pulls fixed-offset
+ * elements from the same array.
+ */
+#include <stdio.h>
+
+enum TdVmPc {
+    TD_LOAD       = 0,
+    TD_OUTER_INIT = 1,
+    TD_OUTER_CHECK = 2,
+    TD_INNER_INIT = 3,
+    TD_INNER_CHECK = 4,
+    TD_FILL_BODY  = 5,
+    TD_INNER_INC  = 6,
+    TD_OUTER_INC  = 7,
+    TD_DIAG       = 8,
+    TD_ANTI       = 9,
+    TD_PACK       = 10,
+    TD_HALT       = 11,
+};
+
+__declspec(noinline)
+int vm_2d_loop_target(int x) {
+    int grid[9];
+    int seed = 0;
+    int i    = 0;
+    int j    = 0;
+    int diag = 0;
+    int anti = 0;
+    int result = 0;
+    int pc   = TD_LOAD;
+
+    while (1) {
+        if (pc == TD_LOAD) {
+            seed = x & 0xF;
+            pc = TD_OUTER_INIT;
+        } else if (pc == TD_OUTER_INIT) {
+            i = 0;
+            pc = TD_OUTER_CHECK;
+        } else if (pc == TD_OUTER_CHECK) {
+            pc = (i < 3) ? TD_INNER_INIT : TD_DIAG;
+        } else if (pc == TD_INNER_INIT) {
+            j = 0;
+            pc = TD_INNER_CHECK;
+        } else if (pc == TD_INNER_CHECK) {
+            pc = (j < 3) ? TD_FILL_BODY : TD_OUTER_INC;
+        } else if (pc == TD_FILL_BODY) {
+            grid[i * 3 + j] = (i * 3 + j + seed) & 0x1F;
+            pc = TD_INNER_INC;
+        } else if (pc == TD_INNER_INC) {
+            j = j + 1;
+            pc = TD_INNER_CHECK;
+        } else if (pc == TD_OUTER_INC) {
+            i = i + 1;
+            pc = TD_OUTER_CHECK;
+        } else if (pc == TD_DIAG) {
+            diag = grid[0] + grid[4] + grid[8];
+            pc = TD_ANTI;
+        } else if (pc == TD_ANTI) {
+            anti = grid[2] + grid[4] + grid[6];
+            pc = TD_PACK;
+        } else if (pc == TD_PACK) {
+            result = diag * 100 + anti;
+            pc = TD_HALT;
+        } else if (pc == TD_HALT) {
+            return result;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_2d_loop(0xA)=%d vm_2d_loop(0xCAFE)=%d\n",
+           vm_2d_loop_target(0xA),
+           vm_2d_loop_target(0xCAFE));
+    return 0;
+}
+```
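Because `seed` is at most 15 and the flattened index `i*3+j` is at most 8, the sum never exceeds 23, so the `& 0x1F` mask leaves every cell unchanged and `grid[k]` equals `k + seed`. That suggests a closed form for the expected values, sketched below. The function names are hypothetical, and this is only a cross-check on the manifest values, not a description of what the lifter emits.

```c
#include <assert.h>

/* Hypothetical closed form for vm_2d_loop_target.
 * grid[k] == k + seed for k in 0..8, since k + seed <= 23 < 32. */
int vm_2d_loop_closed_form(int x) {
    int seed = x & 0xF;
    int diag = (0 + 4 + 8) + 3 * seed;   /* grid[0] + grid[4] + grid[8] */
    int anti = (2 + 4 + 6) + 3 * seed;   /* grid[2] + grid[4] + grid[6] */
    return diag * 100 + anti;
}

/* Spot checks taken from the equivalence table above. */
void vm_2d_loop_closed_form_check(void) {
    assert(vm_2d_loop_closed_form(0) == 1212);
    assert(vm_2d_loop_closed_form(0xCAFE) == 5454);
}
```

Both diagonals reduce to `12 + 3*seed`, which is why every result in the table has the same two-digit value repeated.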
@@ -0,0 +1,103 @@
+# vm_4state64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_4state64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_4state64_loop.ll`
+- **Symbol:** `vm_4state64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_4state64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_4state64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 7581304714302077699 | 7581304714302077699 | 7581304714302077699 | yes | x=0, n=1 |
+| 2 | RCX=1 | 15162609428604155393 | 15162609428604155393 | 15162609428604155393 | yes | x=1, n=2 |
+| 3 | RCX=5 | 6135728040287875135 | 6135728040287875135 | 6135728040287875135 | yes | x=5, n=6 |
+| 4 | RCX=15 | 406151117814638971 | 406151117814638971 | 406151117814638971 | yes | x=15, n=16 max |
+| 5 | RCX=255 | 406151117813620251 | 406151117813620251 | 406151117813620251 | yes | x=0xFF, n=16 |
+| 6 | RCX=51966 | 12054802488707175559 | 12054802488707175559 | 12054802488707175559 | yes | 0xCAFE, n=15 |
+| 7 | RCX=3405691582 | 12054768846522478919 | 12054768846522478919 | 12054768846522478919 | yes | 0xCAFEBABE, n=15 |
+| 8 | RCX=1311768467463790320 | 7263774620141486851 | 7263774620141486851 | 7263774620141486851 | yes | 0x123...DEF0, n=1 |
+| 9 | RCX=18446744073709551615 | 18040592955894579739 | 18040592955894579739 | 18040592955894579739 | yes | max u64, n=16 |
+| 10 | RCX=11400714819323198485 | 18414027014724759455 | 18414027014724759455 | 18414027014724759455 | yes | K (golden), n=6 |
+
+## Source
+
+```c
+/* PC-state VM running a four-state phi chain on full uint64_t.
+ *   a = x;  b = ~x;  c = x ^ K1;  d = x ^ K2;
+ *   for i in 0..n: { t = a + b + c + d; a = b; b = c; c = d; d = t; }
+ *   return d;
+ * Variable trip n = (x & 0xF) + 1.
+ * Lift target: vm_4state64_loop_target.
+ *
+ * Distinct from vm_fibonacci64_loop (2 states) and vm_tribonacci64_loop
+ * (3 states): exercises a 4-state direct-shift phi chain on full i64.
+ * Each new t reads ALL four previous values; only single-direction
+ * shift (a<-b<-c<-d<-t) so no compound cross-update issue.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum F4VmPc {
+    F4_LOAD       = 0,
+    F4_INIT       = 1,
+    F4_LOOP_CHECK = 2,
+    F4_LOOP_BODY  = 3,
+    F4_LOOP_INC   = 4,
+    F4_HALT       = 5,
+};
+
+__declspec(noinline)
+uint64_t vm_4state64_loop_target(uint64_t x) {
+    int      idx = 0;
+    int      n   = 0;
+    uint64_t a   = 0;
+    uint64_t b   = 0;
+    uint64_t c   = 0;
+    uint64_t d   = 0;
+    uint64_t t   = 0;
+    int      pc  = F4_LOAD;
+
+    while (1) {
+        if (pc == F4_LOAD) {
+            n = (int)(x & 0xFull) + 1;
+            a = x;
+            b = ~x;
+            c = x ^ 0xCAFEBABEDEADBEEFull;
+            d = x ^ 0x9E3779B97F4A7C15ull;
+            pc = F4_INIT;
+        } else if (pc == F4_INIT) {
+            idx = 0;
+            pc = F4_LOOP_CHECK;
+        } else if (pc == F4_LOOP_CHECK) {
+            pc = (idx < n) ? F4_LOOP_BODY : F4_HALT;
+        } else if (pc == F4_LOOP_BODY) {
+            t = a + b + c + d;
+            a = b;
+            b = c;
+            c = d;
+            d = t;
+            pc = F4_LOOP_INC;
+        } else if (pc == F4_LOOP_INC) {
+            idx = idx + 1;
+            pc = F4_LOOP_CHECK;
+        } else if (pc == F4_HALT) {
+            return d;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_4state64(0xCAFE)=%llu vm_4state64(0xFF)=%llu\n",
+           (unsigned long long)vm_4state64_loop_target(0xCAFEull),
+           (unsigned long long)vm_4state64_loop_target(0xFFull));
+    return 0;
+}
+```
@@ -0,0 +1,106 @@
+# vm_4state_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 11/11 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_4state_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_4state_loop.ll`
+- **Symbol:** `vm_4state_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_4state_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_4state_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | n=0 |
+| 2 | RCX=256 | 7 | 7 | 7 | yes | n=1: state0 add 7 |
+| 3 | RCX=512 | 93 | 93 | 93 | yes | n=2: add+xor |
+| 4 | RCX=768 | 23 | 23 | 23 | yes | n=3: add+xor+mul |
+| 5 | RCX=1024 | 12 | 12 | 12 | yes | n=4: full cycle |
+| 6 | RCX=1280 | 19 | 19 | 19 | yes | n=5 |
+| 7 | RCX=2048 | 208 | 208 | 208 | yes | n=8: two full cycles |
+| 8 | RCX=3840 | 235 | 235 | 235 | yes | n=15 |
+| 9 | RCX=66 | 66 | 66 | 66 | yes | v=0x42, n=0 |
+| 10 | RCX=4660 | 97 | 97 | 97 | yes | 0x1234 |
+| 11 | RCX=43981 | 254 | 254 | 254 | yes | 0xABCD |
+
+## Source
+
+```c
+/* PC-state VM where the body cycles through 4 different operations per
+ * iteration based on a sub-state index (state mod 4).
+ * Lift target: vm_4state_loop_target.
+ * Goal: cover an inner state machine inside the loop body that picks one
+ * of four arithmetic ops (add, xor, mul, sub) by an internal phase counter.
+ * Distinct from vm_classify_loop (3-way, single-pass) because here the
+ * branch is a CYCLIC selector that varies per iteration.
+ */
+#include <stdio.h>
+
+enum S4VmPc {
+    S4_LOAD       = 0,
+    S4_INIT       = 1,
+    S4_CHECK      = 2,
+    S4_DISPATCH   = 3,
+    S4_OP_ADD     = 4,
+    S4_OP_XOR     = 5,
+    S4_OP_MUL     = 6,
+    S4_OP_SUB     = 7,
+    S4_AFTER      = 8,
+    S4_HALT       = 9,
+};
+
+__declspec(noinline)
+int vm_4state_loop_target(int x) {
+    int v     = 0;
+    int n     = 0;
+    int state = 0;
+    int pc    = S4_LOAD;
+
+    while (1) {
+        if (pc == S4_LOAD) {
+            v = x & 0xFF;
+            n = (x >> 8) & 0xF;
+            state = 0;
+            pc = S4_INIT;
+        } else if (pc == S4_INIT) {
+            pc = S4_CHECK;
+        } else if (pc == S4_CHECK) {
+            pc = (n > 0) ? S4_DISPATCH : S4_HALT;
+        } else if (pc == S4_DISPATCH) {
+            if (state == 0) pc = S4_OP_ADD;
+            else if (state == 1) pc = S4_OP_XOR;
+            else if (state == 2) pc = S4_OP_MUL;
+            else pc = S4_OP_SUB;
+        } else if (pc == S4_OP_ADD) {
+            v = (v + 7) & 0xFF;
+            pc = S4_AFTER;
+        } else if (pc == S4_OP_XOR) {
+            v = (v ^ 0x5A) & 0xFF;
+            pc = S4_AFTER;
+        } else if (pc == S4_OP_MUL) {
+            v = (v * 3) & 0xFF;
+            pc = S4_AFTER;
+        } else if (pc == S4_OP_SUB) {
+            v = (v - 11) & 0xFF;
+            pc = S4_AFTER;
+        } else if (pc == S4_AFTER) {
+            state = (state + 1) & 3;
+            n = n - 1;
+            pc = S4_CHECK;
+        } else if (pc == S4_HALT) {
+            return v;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_4state_loop(0xF00)=%d vm_4state_loop(0xABCD)=%d\n",
+           vm_4state_loop_target(0xF00), vm_4state_loop_target(0xABCD));
+    return 0;
+}
+```
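For cross-checking the manifest values, the PC-state machine above can be flattened into an ordinary loop with a `switch` on the phase counter. This is a minimal sketch with hypothetical names (`vm_4state_loop_model`, `vm_4state_loop_model_check`). It assumes the usual two's-complement behaviour of `&` on a negative `int`, which the original source relies on as well.

```c
#include <assert.h>

/* Hypothetical straight-line model of vm_4state_loop_target. */
int vm_4state_loop_model(int x) {
    int v = x & 0xFF;          /* 8-bit working value */
    int n = (x >> 8) & 0xF;    /* trip count, 0..15   */
    int state = 0;             /* cyclic phase, 0..3  */

    while (n > 0) {
        switch (state) {
        case 0:  v = (v + 7) & 0xFF;    break;  /* S4_OP_ADD */
        case 1:  v = (v ^ 0x5A) & 0xFF; break;  /* S4_OP_XOR */
        case 2:  v = (v * 3) & 0xFF;    break;  /* S4_OP_MUL */
        default: v = (v - 11) & 0xFF;   break;  /* S4_OP_SUB */
        }
        state = (state + 1) & 3;
        n = n - 1;
    }
    return v;
}

/* Spot checks taken from the equivalence table above. */
void vm_4state_loop_model_check(void) {
    assert(vm_4state_loop_model(1024) == 12);    /* one full cycle  */
    assert(vm_4state_loop_model(2048) == 208);   /* two full cycles */
}
```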
@@ -0,0 +1,94 @@
+# vm_abs64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 9/9 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_abs64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_abs64_loop.ll`
+- **Symbol:** `vm_abs64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_abs64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_abs64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | x=0, n=1 |
+| 2 | RCX=1 | 8 | 8 | 8 | yes | x=1, n=2 |
+| 3 | RCX=5 | 3466 | 3466 | 3466 | yes | x=5, n=6 |
+| 4 | RCX=9223372036854775807 | 9223372036854767611 | 9223372036854767611 | 9223372036854767611 | yes | INT64_MAX, n=8 |
+| 5 | RCX=18446744073709551615 | 4925 | 4925 | 4925 | yes | x=-1 (signed), n=8 |
+| 6 | RCX=3405691582 | 7448247489291 | 7448247489291 | 7448247489291 | yes | x=0xCAFEBABE, n=7 |
+| 7 | RCX=1311768467463790320 | 3935305402391370960 | 3935305402391370960 | 3935305402391370960 | yes | x=0x123...DEF0, n=1: single trip |
+| 8 | RCX=11400714819323198485 | 10086270117313468510 | 10086270117313468510 | 10086270117313468510 | yes | K (golden, signed-negative), n=6 |
+| 9 | RCX=57005 | 41556466 | 41556466 | 41556466 | yes | x=0xDEAD, n=6 |
+
+## Source
+
+```c
+/* PC-state VM running an i64 abs-then-affine recurrence.
+ *   val = (int64_t)x;
+ *   for i in 0..n: { if (val < 0) val = -val; val = val * 3 - i; }
+ *   return val;
+ * Variable trip n = (x & 7) + 1.  Returns full uint64_t bit pattern.
+ * Lift target: vm_abs64_loop_target.
+ *
+ * Distinct from vm_imported_abs_loop (i32 _abs_l intrinsic): exercises
+ * i64 conditional-negate (likely lowered to llvm.abs.i64 by the
+ * optimizer) followed by mul-by-3 and subtraction in a variable-trip
+ * loop.  INT64_MIN excluded from inputs because -INT64_MIN is C UB.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum AbVmPc {
+    AB_LOAD       = 0,
+    AB_INIT       = 1,
+    AB_LOOP_CHECK = 2,
+    AB_LOOP_BODY  = 3,
+    AB_LOOP_INC   = 4,
+    AB_HALT       = 5,
+};
+
+__declspec(noinline)
+int64_t vm_abs64_loop_target(int64_t x) {
+    int     idx = 0;
+    int     n   = 0;
+    int64_t val = 0;
+    int     pc  = AB_LOAD;
+
+    while (1) {
+        if (pc == AB_LOAD) {
+            n   = (int)((uint64_t)x & 7ull) + 1;
+            val = x;
+            pc = AB_INIT;
+        } else if (pc == AB_INIT) {
+            idx = 0;
+            pc = AB_LOOP_CHECK;
+        } else if (pc == AB_LOOP_CHECK) {
+            pc = (idx < n) ? AB_LOOP_BODY : AB_HALT;
+        } else if (pc == AB_LOOP_BODY) {
+            if (val < 0) {
+                val = -val;
+            }
+            val = val * 3 - idx;
+            pc = AB_LOOP_INC;
+        } else if (pc == AB_LOOP_INC) {
+            idx = idx + 1;
+            pc = AB_LOOP_CHECK;
+        } else if (pc == AB_HALT) {
+            return val;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_abs64(-1)=%lld vm_abs64(0xCAFEBABE)=%lld\n",
+           (long long)vm_abs64_loop_target((int64_t)-1),
+           (long long)vm_abs64_loop_target((int64_t)0xCAFEBABEll));
+    return 0;
+}
+```
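The INT64_MAX row only has a well-defined expected value if `val * 3` wraps modulo 2^64, since signed overflow is undefined behaviour in C just as `-INT64_MIN` is. The sketch below models the same recurrence on `uint64_t`, where wrapping is defined, and reproduces the table values. The function names are hypothetical, and treating the native build as wrapping is an assumption about the compiler's behaviour rather than something the source states.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wrapping model of vm_abs64_loop_target.
 * All arithmetic is done on uint64_t so overflow wraps modulo 2^64. */
uint64_t vm_abs64_wrapping_model(uint64_t x) {
    int n = (int)(x & 7ull) + 1;
    uint64_t val = x;

    for (int idx = 0; idx < n; idx++) {
        if ((val >> 63) != 0) {              /* sign bit set: val < 0 as i64 */
            val = (uint64_t)0 - val;         /* wrapping negate */
        }
        val = val * 3ull - (uint64_t)idx;    /* wrapping mul and sub */
    }
    return val;
}

/* Spot checks taken from the equivalence table above. */
void vm_abs64_wrapping_model_check(void) {
    assert(vm_abs64_wrapping_model(18446744073709551615ull) == 4925);
    assert(vm_abs64_wrapping_model(9223372036854775807ull)
           == 9223372036854767611ull);
}
```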
@@ -0,0 +1,117 @@
+# vm_abs_array_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 11/11 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_abs_array_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_abs_array_loop.ll`
+- **Symbol:** `vm_abs_array_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_abs_array_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_abs_array_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | limit=1, threshold=0 |
+| 2 | RCX=1 | 1000 | 1000 | 1000 | yes | limit=2, threshold=0: \|0\|+\|1000\| |
+| 3 | RCX=7 | 28000 | 28000 | 28000 | yes | limit=8, threshold=0 |
+| 4 | RCX=16 | 2 | 2 | 2 | yes | 0x10: limit=1, threshold=2 |
+| 5 | RCX=128 | 16 | 16 | 16 | yes | 0x80: limit=1, threshold=16 |
+| 6 | RCX=255 | 27814 | 27814 | 27814 | yes | 0xFF: limit=8, threshold=31 |
+| 7 | RCX=256 | 32 | 32 | 32 | yes | 0x100: limit=1, threshold=32 |
+| 8 | RCX=4096 | 512 | 512 | 512 | yes | 0x1000: limit=1, threshold=512 |
+| 9 | RCX=43981 | 17982 | 17982 | 17982 | yes | 0xABCD: limit=6 |
+| 10 | RCX=65535 | 37528 | 37528 | 37528 | yes | 0xFFFF: limit=8 |
+| 11 | RCX=32767 | 16190 | 16190 | 16190 | yes | 0x7FFF: limit=8 |
+
+## Source
+
+```c
+/* PC-state VM that fills a stack array with abs() values, then sums them.
+ * Lift target: vm_abs_array_loop_target.
+ * Goal: cover a two-phase VM where (1) the fill loop body issues an
+ * imported call (abs) and stores the result into a stack-array slot, and
+ * (2) the sum loop accumulates from the same stack array.  Distinct from
+ * vm_imported_abs_loop (single accumulator only) and vm_prefix_sum_loop
+ * (no imported call).  Tests how the lifter pairs a CRT intrinsic call
+ * with a same-iteration indexed stack store.
+ */
+#include <stdio.h>
+#include <stdlib.h>
+
+enum AaVmPc {
+    AA_LOAD       = 0,
+    AA_INIT_FILL  = 1,
+    AA_FILL_CHECK = 2,
+    AA_FILL_DELTA = 3,
+    AA_FILL_CALL  = 4,
+    AA_FILL_STORE = 5,
+    AA_FILL_INC   = 6,
+    AA_INIT_SUM   = 7,
+    AA_SUM_CHECK  = 8,
+    AA_SUM_BODY   = 9,
+    AA_SUM_INC    = 10,
+    AA_HALT       = 11,
+};
+
+__declspec(noinline)
+int vm_abs_array_loop_target(int x) {
+    int buf[8];
+    int limit     = 0;
+    int idx       = 0;
+    int threshold = 0;
+    int delta     = 0;
+    int abs_r     = 0;
+    int sum       = 0;
+    int pc        = AA_LOAD;
+
+    while (1) {
+        if (pc == AA_LOAD) {
+            limit = (x & 7) + 1;
+            threshold = (x >> 3) & 0xFFFF;
+            sum = 0;
+            pc = AA_INIT_FILL;
+        } else if (pc == AA_INIT_FILL) {
+            idx = 0;
+            pc = AA_FILL_CHECK;
+        } else if (pc == AA_FILL_CHECK) {
+            pc = (idx < limit) ? AA_FILL_DELTA : AA_INIT_SUM;
+        } else if (pc == AA_FILL_DELTA) {
+            delta = (idx * 1000) - threshold;
+            pc = AA_FILL_CALL;
+        } else if (pc == AA_FILL_CALL) {
+            abs_r = abs(delta);
+            pc = AA_FILL_STORE;
+        } else if (pc == AA_FILL_STORE) {
+            buf[idx] = abs_r;
+            pc = AA_FILL_INC;
+        } else if (pc == AA_FILL_INC) {
+            idx = idx + 1;
+            pc = AA_FILL_CHECK;
+        } else if (pc == AA_INIT_SUM) {
+            idx = 0;
+            pc = AA_SUM_CHECK;
+        } else if (pc == AA_SUM_CHECK) {
+            pc = (idx < limit) ? AA_SUM_BODY : AA_HALT;
+        } else if (pc == AA_SUM_BODY) {
+            sum = sum + buf[idx];
+            pc = AA_SUM_INC;
+        } else if (pc == AA_SUM_INC) {
+            idx = idx + 1;
+            pc = AA_SUM_CHECK;
+        } else if (pc == AA_HALT) {
+            return sum;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_abs_array_loop(0xABCD)=%d vm_abs_array_loop(0xFFFF)=%d\n",
+           vm_abs_array_loop_target(0xABCD), vm_abs_array_loop_target(0xFFFF));
+    return 0;
+}
+```
@@ -0,0 +1,112 @@
+# vm_adler32_64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_adler32_64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_adler32_64_loop.ll`
+- **Symbol:** `vm_adler32_64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_adler32_64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_adler32_64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 65537 | 65537 | 65537 | yes | x=0 n=1: a=1 b=1 -> (1<<16)\|1 |
+| 2 | RCX=1 | 262146 | 262146 | 262146 | yes | x=1 n=2: bytes [1,0] |
+| 3 | RCX=2 | 589827 | 589827 | 589827 | yes | x=2 n=3 |
+| 4 | RCX=7 | 4194312 | 4194312 | 4194312 | yes | x=7 n=8: max trip |
+| 5 | RCX=8 | 589833 | 589833 | 589833 | yes | x=8 n=1: byte 0x08 alone |
+| 6 | RCX=3405691582 | 296944449 | 296944449 | 296944449 | yes | 0xCAFEBABE: n=7 |
+| 7 | RCX=3735928559 | 353764153 | 353764153 | 353764153 | yes | 0xDEADBEEF: n=8 |
+| 8 | RCX=18446744073709551615 | 602146809 | 602146809 | 602146809 | yes | all 0xFF: 8 max bytes |
+| 9 | RCX=1311768467463790320 | 15794417 | 15794417 | 15794417 | yes | 0x12345...EF0: n=1 byte 0xF0 |
+| 10 | RCX=72623859790382856 | 589833 | 589833 | 589833 | yes | 0x0102...0708: n=1 byte 0x08 (matches x=8) |
+
+## Source
+
+```c
+/* PC-state VM that runs an Adler-32-style two-accumulator modular hash
+ * over n = (x & 7) + 1 bytes consumed from the input register:
+ *
+ *   n = (x & 7) + 1;
+ *   s = x; a = 1; b = 0;
+ *   for (i = 0; i < n; i++) {
+ *     a = (a + (s & 0xFF)) % 65521;     // ADLER prime
+ *     b = (b + a)         % 65521;
+ *     s >>= 8;
+ *   }
+ *   return (b << 16) | a;
+ *
+ * Lift target: vm_adler32_64_loop_target.
+ *
+ * Distinct from:
+ *   - vm_fnv1a64_loop  (single state, multiplicative)
+ *   - vm_djb264_loop   (single additive multiplier)
+ *   - vm_byterange64_loop (two reductions but no modular arithmetic)
+ *
+ * Two PARALLEL additive accumulators where b feeds on the running a.
+ * Each modular step exercises i64 urem by 65521 (a non-power-of-2
+ * prime) which the lifter must lower via magic-number division.
+ * The result packs both accumulators into one i64 via shl-or.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum AdVmPc {
+    AD_INIT_ALL = 0,
+    AD_CHECK    = 1,
+    AD_STEP_A   = 2,
+    AD_STEP_B   = 3,
+    AD_SHIFT    = 4,
+    AD_INC      = 5,
+    AD_HALT     = 6,
+};
+
+__declspec(noinline)
+uint64_t vm_adler32_64_loop_target(uint64_t x) {
+    uint64_t n  = 0;
+    uint64_t s  = 0;
+    uint64_t a  = 0;
+    uint64_t b  = 0;
+    uint64_t i  = 0;
+    int      pc = AD_INIT_ALL;
+
+    while (1) {
+        if (pc == AD_INIT_ALL) {
+            n = (x & 7ull) + 1ull;
+            s = x;
+            a = 1ull;
+            b = 0ull;
+            i = 0ull;
+            pc = AD_CHECK;
+        } else if (pc == AD_CHECK) {
+            pc = (i < n) ? AD_STEP_A : AD_HALT;
+        } else if (pc == AD_STEP_A) {
+            a = (a + (s & 0xFFull)) % 65521ull;
+            pc = AD_STEP_B;
+        } else if (pc == AD_STEP_B) {
+            b = (b + a) % 65521ull;
+            pc = AD_SHIFT;
+        } else if (pc == AD_SHIFT) {
+            s = s >> 8;
+            pc = AD_INC;
+        } else if (pc == AD_INC) {
+            i = i + 1ull;
+            pc = AD_CHECK;
+        } else if (pc == AD_HALT) {
+            return (b << 16) | a;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_adler32_64(0xCAFEBABE)=%llu\n",
+           (unsigned long long)vm_adler32_64_loop_target(0xCAFEBABEull));
+    return 0;
+}
+```
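Both accumulators stay below 65521, which is less than 2^16, so packing them as `(b << 16) | a` loses no information and the two halves can be recovered from any table value. The sketch below is a straight-line model of the loop plus two unpacking helpers. All names in it (`adler_model`, `adler_low`, `adler_high`, `adler_model_check`) are hypothetical and not part of the test suite.

```c
#include <assert.h>
#include <stdint.h>

#define ADLER_MOD 65521ull   /* largest prime below 2^16 */

/* Hypothetical straight-line model of vm_adler32_64_loop_target. */
uint64_t adler_model(uint64_t x) {
    uint64_t n = (x & 7ull) + 1ull;   /* 1..8 bytes consumed */
    uint64_t s = x, a = 1ull, b = 0ull;

    for (uint64_t i = 0; i < n; i++) {
        a = (a + (s & 0xFFull)) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
        s >>= 8;
    }
    return (b << 16) | a;
}

/* Unpack helpers: valid because a and b are both below 2^16. */
uint64_t adler_low(uint64_t packed)  { return packed & 0xFFFFull; }
uint64_t adler_high(uint64_t packed) { return packed >> 16; }

/* Spot checks taken from the equivalence table above. */
void adler_model_check(void) {
    assert(adler_model(7) == 4194312);      /* a = 8, b = 64 */
    assert(adler_low(4194312) == 8);
    assert(adler_high(4194312) == 64);
}
```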
@@ -0,0 +1,110 @@
+# vm_altbytesum64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_altbytesum64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_altbytesum64_loop.ll`
+- **Symbol:** `vm_altbytesum64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_altbytesum64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_altbytesum64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | all zero bytes |
+| 2 | RCX=1 | 1 | 1 | 1 | yes | x=1: +1 |
+| 3 | RCX=255 | 255 | 255 | 255 | yes | x=0xFF: +255 |
+| 4 | RCX=72623859790382856 | 4 | 4 | 4 | yes | 0x0102030405060708: 8-(7-(6-(5-(4-(3-(2-1))))))=4 |
+| 5 | RCX=18446744073709551615 | 0 | 0 | 0 | yes | all 0xFF: 8 bytes alternating cancel to 0 |
+| 6 | RCX=128 | 128 | 128 | 128 | yes | x=0x80: +128 (positive byte) |
+| 7 | RCX=9259542123273814144 | 128 | 128 | 128 | yes | 0x8080808080808080: low nibble 0 so n=1, only byte0 is summed -> +128 |
+| 8 | RCX=3405691582 | 56 | 56 | 56 | yes | 0xCAFEBABE: 4-byte alternating sum |
+| 9 | RCX=1311768467463790320 | 240 | 240 | 240 | yes | 0x123456789ABCDEF0 |
+| 10 | RCX=16045690985374415566 | 18446744073709551555 | 18446744073709551555 | 18446744073709551555 | yes | 0xDEADBEEFFEEDFACE: result negative -> u64=2^64-61 |
+
+## Source
+
+```c
+/* PC-state VM that computes an alternating-sign byte sum:
+ *   r = +b0 - b1 + b2 - b3 + ... over n = (x & 15) + 1 bytes
+ * with r kept as a signed i64 accumulator and returned as u64.
+ *
+ *   n = (x & 15) + 1;
+ *   s = x; r = 0; sign = 1;
+ *   while (n) {
+ *     r += sign * (s & 0xFF);
+ *     s >>= 8;
+ *     sign = -sign;
+ *     n--;
+ *   }
+ *   return (uint64_t)r;
+ *
+ * Lift target: vm_altbytesum64_loop_target.
+ *
+ * Distinct from vm_xorbytes64 (XOR of bytes) and vm_byteparity64 (1-bit
+ * parity).  Tests: signed accumulator, sign flip per iteration via
+ * negation, and signed-times-unsigned multiply.  Produces negative
+ * (i64) values for inputs where the odd-indexed bytes dominate.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum AbVmPc {
+    AB_LOAD_N    = 0,
+    AB_INIT_REGS = 1,
+    AB_CHECK     = 2,
+    AB_ACC       = 3,
+    AB_SHIFT     = 4,
+    AB_FLIP      = 5,
+    AB_DEC       = 6,
+    AB_HALT      = 7,
+};
+
+__declspec(noinline)
+uint64_t vm_altbytesum64_loop_target(uint64_t x) {
+    uint64_t n    = 0;
+    uint64_t s    = 0;
+    int64_t  r    = 0;
+    int64_t  sign = 1;
+    int      pc   = AB_LOAD_N;
+
+    while (1) {
+        if (pc == AB_LOAD_N) {
+            n = (x & 15ull) + 1ull;
+            pc = AB_INIT_REGS;
+        } else if (pc == AB_INIT_REGS) {
+            s    = x;
+            r    = 0;
+            sign = 1;
+            pc = AB_CHECK;
+        } else if (pc == AB_CHECK) {
+            pc = (n > 0ull) ? AB_ACC : AB_HALT;
+        } else if (pc == AB_ACC) {
+            r = r + sign * (int64_t)(s & 0xFFull);
+            pc = AB_SHIFT;
+        } else if (pc == AB_SHIFT) {
+            s = s >> 8;
+            pc = AB_FLIP;
+        } else if (pc == AB_FLIP) {
+            sign = -sign;
+            pc = AB_DEC;
+        } else if (pc == AB_DEC) {
+            n = n - 1ull;
+            pc = AB_CHECK;
+        } else if (pc == AB_HALT) {
+            return (uint64_t)r;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_altbytesum64(0x0102030405060708)=%llu\n",
+           (unsigned long long)vm_altbytesum64_loop_target(0x0102030405060708ull));
+    return 0;
+}
+```
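Reviewer sketch (not part of the committed file): with the pc dispatcher stripped away, the sample reduces to the short loop quoted in its header comment. The helper name `altbytesum64_ref` is hypothetical; the body only restates that comment so the expected values can be checked by hand.

```c
#include <stdint.h>

/* Plain-loop reference for the alternating-sign byte sum.
 * Hypothetical helper; mirrors the pseudocode in the sample's comment. */
uint64_t altbytesum64_ref(uint64_t x) {
    uint64_t n    = (x & 15u) + 1u;  /* 1..16 iterations */
    uint64_t s    = x;
    int64_t  r    = 0;
    int64_t  sign = 1;

    while (n) {
        r += sign * (int64_t)(s & 0xFFu);  /* low byte, zero-extended */
        s >>= 8;
        sign = -sign;
        n--;
    }
    return (uint64_t)r;  /* negative sums wrap modulo 2^64 */
}
```

Because `n` can reach 16 while `x` holds only 8 bytes, any iterations past the eighth add zero. For `0x0102030405060708` the nine terms are 8-7+6-5+4-3+2-1+0 = 4.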
@@ -0,0 +1,101 @@
+# vm_andsum_byte_idx64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_andsum_byte_idx64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_andsum_byte_idx64_loop.ll`
+- **Symbol:** `vm_andsum_byte_idx64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_andsum_byte_idx64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_andsum_byte_idx64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | all zero -> 0 |
+| 2 | RCX=1 | 1 | 1 | 1 | yes | x=1 n=2: (1 & 1) + (0 & 2)=1 |
+| 3 | RCX=2 | 0 | 0 | 0 | yes | x=2 n=3: (2 & 1)=0 + (0 & 2)=0 + (0 & 3)=0 |
+| 4 | RCX=7 | 1 | 1 | 1 | yes | x=7 n=8: only byte0=7 -> 7 & 1 = 1 |
+| 5 | RCX=8 | 0 | 0 | 0 | yes | x=8 n=1: 8 & 1=0 |
+| 6 | RCX=3405691582 | 4 | 4 | 4 | yes | 0xCAFEBABE: n=7 sum of byte&counter |
+| 7 | RCX=3735928559 | 8 | 8 | 8 | yes | 0xDEADBEEF: n=8 |
+| 8 | RCX=18446744073709551615 | 36 | 36 | 36 | yes | all 0xFF n=8: sum 1..8=36 (counter low bits all kept) |
+| 9 | RCX=72623859790382856 | 0 | 0 | 0 | yes | 0x0102...0708: n=1 byte0=8 & 1=0 |
+| 10 | RCX=1311768467463790320 | 0 | 0 | 0 | yes | 0x12345...EF0: n=1 byte0=0xF0 & 1=0 |
+
+## Source
+
+```c
+/* PC-state VM that ANDs each byte with the loop index and sums:
+ *
+ *   n = (x & 7) + 1;
+ *   s = x; r = 0;
+ *   for (i = 0; i < n; i++) {
+ *     r = r + ((s & 0xFF) & (i + 1));   // byte AND counter, ADD-folded
+ *     s >>= 8;
+ *   }
+ *   return r;
+ *
+ * Lift target: vm_andsum_byte_idx64_loop_target.
+ *
+ * Distinct from:
+ *   - vm_uintadd_byte_idx64_loop  (byte * counter, ADD)
+ *   - vm_xormul_byte_idx64_loop   (byte * counter, XOR)
+ *   - vm_notand_chain64_loop      (NOT-AND of state, no counter)
+ *
+ * Tests `and i64 byte, counter` (AND of zext-byte with phi-tracked
+ * counter (i+1)) folded via ADD.  Counter values 1..8 fit in four bits,
+ * so the AND keeps only the low four bits of each byte.  All-0xFF input
+ * accumulates 1+2+3+...+8 = 36.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum AsVmPc {
+    AS_INIT_ALL = 0,
+    AS_CHECK    = 1,
+    AS_BODY     = 2,
+    AS_INC      = 3,
+    AS_HALT     = 4,
+};
+
+__declspec(noinline)
+uint64_t vm_andsum_byte_idx64_loop_target(uint64_t x) {
+    uint64_t n  = 0;
+    uint64_t s  = 0;
+    uint64_t r  = 0;
+    uint64_t i  = 0;
+    int      pc = AS_INIT_ALL;
+
+    while (1) {
+        if (pc == AS_INIT_ALL) {
+            n = (x & 7ull) + 1ull;
+            s = x;
+            r = 0ull;
+            i = 0ull;
+            pc = AS_CHECK;
+        } else if (pc == AS_CHECK) {
+            pc = (i < n) ? AS_BODY : AS_HALT;
+        } else if (pc == AS_BODY) {
+            r = r + ((s & 0xFFull) & (i + 1ull));
+            s = s >> 8;
+            pc = AS_INC;
+        } else if (pc == AS_INC) {
+            i = i + 1ull;
+            pc = AS_CHECK;
+        } else if (pc == AS_HALT) {
+            return r;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_andsum_byte_idx64(0xFFFFFFFFFFFFFFFF)=%llu\n",
+           (unsigned long long)vm_andsum_byte_idx64_loop_target(0xFFFFFFFFFFFFFFFFull));
+    return 0;
+}
+```
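Reviewer sketch (not part of the committed file): a direct `for`-loop version of the byte-AND-counter sum, useful for re-deriving the table labels by hand. `andsum_byte_idx64_ref` is a hypothetical name.

```c
#include <stdint.h>

/* Plain-loop reference: AND byte i with the counter (i + 1), then sum.
 * Hypothetical helper; follows the pseudocode in the sample's comment. */
uint64_t andsum_byte_idx64_ref(uint64_t x) {
    uint64_t n = (x & 7u) + 1u;  /* 1..8 bytes consumed */
    uint64_t s = x;
    uint64_t r = 0;

    for (uint64_t i = 0; i < n; i++) {
        r += (s & 0xFFu) & (i + 1u);
        s >>= 8;
    }
    return r;
}
```

For `0xDEADBEEF` the four non-zero bytes contribute (0xEF & 1) + (0xBE & 2) + (0xAD & 3) + (0xDE & 4) = 1 + 2 + 1 + 4 = 8, matching row 7 of the table.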
@@ -0,0 +1,115 @@
+# vm_argmax_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 11/11 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_argmax_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_argmax_loop.ll`
+- **Symbol:** `vm_argmax_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_argmax_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_argmax_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | limit=1 |
+| 2 | RCX=1 | 1 | 1 | 1 | yes | limit=2 |
+| 3 | RCX=2 | 2 | 2 | 2 | yes | limit=3 |
+| 4 | RCX=7 | 4 | 4 | 4 | yes | limit=8: max at i=4 |
+| 5 | RCX=55 | 4 | 4 | 4 | yes | 0x37: limit=8 |
+| 6 | RCX=170 | 2 | 2 | 2 | yes | 0xAA: limit=3, max at i=2 |
+| 7 | RCX=196 | 1 | 1 | 1 | yes | 0xC4: limit=5, max at i=1 |
+| 8 | RCX=255 | 0 | 0 | 0 | yes | 0xFF: limit=8, max at i=0 |
+| 9 | RCX=256 | 0 | 0 | 0 | yes | limit=1 (mask drops bit 8) |
+| 10 | RCX=4660 | 4 | 4 | 4 | yes | 0x1234: limit=5 |
+| 11 | RCX=65244 | 1 | 1 | 1 | yes | 0xFEDC: limit=5, max at i=1 |
+
+## Source
+
+```c
+/* PC-state VM that finds the INDEX of the max element in a symbolic-content
+ * stack array.
+ * Lift target: vm_argmax_loop_target.
+ * Goal: cover a comparison-driven loop that tracks TWO correlated state vars
+ * (current best value AND its index) where both update together when the
+ * predicate is true.  Distinct from vm_minarray_loop (only tracks value, not
+ * index).  Initial values come from data[0]/idx=0 written on the entry path
+ * to keep the lifter's pseudo-stack promotion happy.
+ */
+#include <stdio.h>
+
+enum AmVmPc {
+    AM_LOAD       = 0,
+    AM_INIT_FILL  = 1,
+    AM_FILL_CHECK = 2,
+    AM_FILL_BODY  = 3,
+    AM_FILL_INC   = 4,
+    AM_INIT_BEST  = 5,
+    AM_SCAN_CHECK = 6,
+    AM_SCAN_LOAD  = 7,
+    AM_SCAN_TEST  = 8,
+    AM_SCAN_UPD   = 9,
+    AM_SCAN_INC   = 10,
+    AM_HALT       = 11,
+};
+
+__declspec(noinline)
+int vm_argmax_loop_target(int x) {
+    int data[8];
+    int limit  = 0;
+    int idx    = 0;
+    int best   = 0;
+    int best_i = 0;
+    int elt    = 0;
+    int pc     = AM_LOAD;
+
+    while (1) {
+        if (pc == AM_LOAD) {
+            limit = (x & 7) + 1;
+            pc = AM_INIT_FILL;
+        } else if (pc == AM_INIT_FILL) {
+            idx = 0;
+            pc = AM_FILL_CHECK;
+        } else if (pc == AM_FILL_CHECK) {
+            pc = (idx < limit) ? AM_FILL_BODY : AM_INIT_BEST;
+        } else if (pc == AM_FILL_BODY) {
+            data[idx] = (x ^ (idx * 0x35)) & 0xFF;
+            pc = AM_FILL_INC;
+        } else if (pc == AM_FILL_INC) {
+            idx = idx + 1;
+            pc = AM_FILL_CHECK;
+        } else if (pc == AM_INIT_BEST) {
+            best = data[0];
+            best_i = 0;
+            idx = 1;
+            pc = AM_SCAN_CHECK;
+        } else if (pc == AM_SCAN_CHECK) {
+            pc = (idx < limit) ? AM_SCAN_LOAD : AM_HALT;
+        } else if (pc == AM_SCAN_LOAD) {
+            elt = data[idx];
+            pc = AM_SCAN_TEST;
+        } else if (pc == AM_SCAN_TEST) {
+            pc = (elt > best) ? AM_SCAN_UPD : AM_SCAN_INC;
+        } else if (pc == AM_SCAN_UPD) {
+            best = elt;
+            best_i = idx;
+            pc = AM_SCAN_INC;
+        } else if (pc == AM_SCAN_INC) {
+            idx = idx + 1;
+            pc = AM_SCAN_CHECK;
+        } else if (pc == AM_HALT) {
+            return best_i;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_argmax_loop(0x37)=%d vm_argmax_loop(0xFEDC)=%d\n",
+           vm_argmax_loop_target(0x37), vm_argmax_loop_target(0xFEDC));
+    return 0;
+}
+```
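Reviewer sketch (not part of the committed file): the fill and scan phases written as two ordinary loops. `argmax_ref` is a hypothetical name. The strict `>` comparison means the first occurrence of the maximum wins, as in the VM version.

```c
/* Two-loop reference for the argmax sample.
 * Hypothetical helper; same fill formula and strict '>' test as the VM. */
int argmax_ref(int x) {
    int data[8];
    int limit = (x & 7) + 1;  /* 1..8 elements */

    for (int i = 0; i < limit; i++)
        data[i] = (x ^ (i * 0x35)) & 0xFF;

    int best   = data[0];
    int best_i = 0;
    for (int i = 1; i < limit; i++) {
        if (data[i] > best) {  /* value and index update together */
            best   = data[i];
            best_i = i;
        }
    }
    return best_i;
}
```

For `x = 7` the filled bytes are 7, 50, 109, 152, 211, 14, 57, 116, so the maximum sits at index 4, which is what row 4 of the table reports.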
@@ -0,0 +1,85 @@
+# vm_base7sum64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_base7sum64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_base7sum64_loop.ll`
+- **Symbol:** `vm_base7sum64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_base7sum64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_base7sum64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | x=0: skip loop |
+| 2 | RCX=1 | 1 | 1 | 1 | yes | x=1: 1 in base 7 |
+| 3 | RCX=7 | 1 | 1 | 1 | yes | x=7: 10 in base 7, sum=1 |
+| 4 | RCX=48 | 12 | 12 | 12 | yes | x=48: 66 in base 7, sum=12 |
+| 5 | RCX=49 | 1 | 1 | 1 | yes | x=49: 100 in base 7 |
+| 6 | RCX=255 | 9 | 9 | 9 | yes | x=0xFF: 513 in base 7 |
+| 7 | RCX=51966 | 18 | 18 | 18 | yes | x=0xCAFE |
+| 8 | RCX=3405691582 | 40 | 40 | 40 | yes | x=0xCAFEBABE |
+| 9 | RCX=18446744073709551615 | 57 | 57 | 57 | yes | max u64 |
+| 10 | RCX=11400714819323198485 | 61 | 61 | 61 | yes | K (golden) |
+
+## Source
+
+```c
+/* PC-state VM that computes the base-7 digit sum of x via repeated
+ * urem-then-udiv.
+ *   total = 0;
+ *   while (s) { total += s % 7; s /= 7; }
+ *   return total;
+ * Variable trip count: floor(log_7(x)) + 1 for x > 0, zero for x == 0.
+ * Lift target: vm_base7sum64_loop_target.
+ *
+ * Distinct from vm_decdigits64_loop (counts digits, divisor 10) and
+ * vm_divcount64_loop (input-derived divisor): exercises BOTH urem and
+ * udiv by a small constant 7 inside the same loop body, accumulating
+ * the running digit sum.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum B7VmPc {
+    B7_LOAD       = 0,
+    B7_LOOP_CHECK = 1,
+    B7_LOOP_BODY  = 2,
+    B7_HALT       = 3,
+};
+
+__declspec(noinline)
+int vm_base7sum64_loop_target(uint64_t x) {
+    uint64_t s     = 0;
+    int      total = 0;
+    int      pc    = B7_LOAD;
+
+    while (1) {
+        if (pc == B7_LOAD) {
+            s     = x;
+            total = 0;
+            pc = B7_LOOP_CHECK;
+        } else if (pc == B7_LOOP_CHECK) {
+            pc = (s != 0ull) ? B7_LOOP_BODY : B7_HALT;
+        } else if (pc == B7_LOOP_BODY) {
+            total = total + (int)(s % 7ull);
+            s = s / 7ull;
+            pc = B7_LOOP_CHECK;
+        } else if (pc == B7_HALT) {
+            return total;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_base7sum64(0xCAFEBABE)=%d vm_base7sum64(max)=%d\n",
+           vm_base7sum64_loop_target(0xCAFEBABEull),
+           vm_base7sum64_loop_target(0xFFFFFFFFFFFFFFFFull));
+    return 0;
+}
+```
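Reviewer sketch (not part of the committed file): the same urem/udiv loop, instrumented to report how many iterations it ran. The struct and function names are hypothetical. The trip count equals the number of base-7 digits of `x`.

```c
#include <stdint.h>

/* Hypothetical result pair: digit sum plus the number of loop trips. */
typedef struct {
    int digit_sum;
    int trips;
} Base7Result;

/* Plain-loop reference for the base-7 digit sum. */
Base7Result base7sum64_ref(uint64_t x) {
    Base7Result out = {0, 0};

    while (x != 0) {
        out.digit_sum += (int)(x % 7u);  /* lowest base-7 digit */
        x /= 7u;                         /* drop that digit */
        out.trips++;
    }
    return out;
}
```

Since 7^22 < 2^64 - 1 < 7^23, the all-ones input runs 23 trips, which is the upper bound for any 64-bit input.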
@@ -0,0 +1,98 @@
+# vm_bitfetch_window64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_bitfetch_window64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_bitfetch_window64_loop.ll`
+- **Symbol:** `vm_bitfetch_window64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_bitfetch_window64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_bitfetch_window64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | all zero -> 0 |
+| 2 | RCX=1 | 2 | 2 | 2 | yes | x=1 n=2: bits [1,0] reversed -> 0b10=2 |
+| 3 | RCX=2 | 2 | 2 | 2 | yes | x=2 n=3: bits [0,1,0] reversed -> 0b010=2 |
+| 4 | RCX=7 | 224 | 224 | 224 | yes | x=7 n=8: bits 0..7 are [1,1,1,0,0,0,0,0] -> 0b11100000=224 |
+| 5 | RCX=8 | 0 | 0 | 0 | yes | x=8 n=1: bit0=0 |
+| 6 | RCX=3405691582 | 62 | 62 | 62 | yes | 0xCAFEBABE: n=7 low 7 bits reversed |
+| 7 | RCX=3735928559 | 247 | 247 | 247 | yes | 0xDEADBEEF: n=8 low byte reversed |
+| 8 | RCX=18446744073709551615 | 255 | 255 | 255 | yes | all 0xFF: n=8 low byte all 1s -> 255 |
+| 9 | RCX=72623859790382856 | 0 | 0 | 0 | yes | 0x0102...0708: n=1 bit0=0 |
+| 10 | RCX=1311768467463790320 | 0 | 0 | 0 | yes | 0x12345...EF0: n=1 bit0=0 |
+
+## Source
+
+```c
+/* PC-state VM that reverses the lower n = (x & 7) + 1 bits of x by
+ * shifting them in one at a time, fetching bit i with a DYNAMIC shift:
+ *
+ *   n = (x & 7) + 1;
+ *   r = 0;
+ *   for (i = 0; i < n; i++) {
+ *     r = (r << 1) | ((x >> i) & 1);   // dynamic shift amount = i
+ *   }
+ *   return r;
+ *
+ * Lift target: vm_bitfetch_window64_loop_target.
+ *
+ * Distinct from:
+ *   - vm_byterev_window64_loop  (8-bit window, fixed shift-by-8)
+ *   - vm_nibrev_window64_loop   (4-bit window, fixed shift-by-4)
+ *   - vm_bitreverse64_loop      (full 64-bit reverse, may fold)
+ *
+ * Tests `lshr i64 x, i` with i a loop-index variable - dynamic shift
+ * amount inside dispatcher loop body.  Result is a bitwise reversal
+ * of the low n bits of x.  Single-bit window with variable shift makes
+ * the lifter handle non-constant shift counts iteration-by-iteration.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum BfVmPc {
+    BF_INIT_ALL = 0,
+    BF_CHECK    = 1,
+    BF_BODY     = 2,
+    BF_INC      = 3,
+    BF_HALT     = 4,
+};
+
+__declspec(noinline)
+uint64_t vm_bitfetch_window64_loop_target(uint64_t x) {
+    uint64_t n  = 0;
+    uint64_t r  = 0;
+    uint64_t i  = 0;
+    int      pc = BF_INIT_ALL;
+
+    while (1) {
+        if (pc == BF_INIT_ALL) {
+            n = (x & 7ull) + 1ull;
+            r = 0ull;
+            i = 0ull;
+            pc = BF_CHECK;
+        } else if (pc == BF_CHECK) {
+            pc = (i < n) ? BF_BODY : BF_HALT;
+        } else if (pc == BF_BODY) {
+            r = (r << 1) | ((x >> i) & 1ull);
+            pc = BF_INC;
+        } else if (pc == BF_INC) {
+            i = i + 1ull;
+            pc = BF_CHECK;
+        } else if (pc == BF_HALT) {
+            return r;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_bitfetch_window64(0xFF)=%llu\n",
+           (unsigned long long)vm_bitfetch_window64_loop_target(0xFFull));
+    return 0;
+}
+```
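Reviewer sketch (not part of the committed file): the windowed reversal can be cross-checked against a full 64-bit reversal, because reversing all 64 bits and shifting right by `64 - n` leaves exactly the reversed low `n` bits. All three function names below are hypothetical.

```c
#include <stdint.h>

/* Plain-loop reference: fetch bit i with a dynamic shift, push it into r. */
uint64_t bitfetch_window64_ref(uint64_t x) {
    uint64_t n = (x & 7u) + 1u;  /* window width, 1..8 bits */
    uint64_t r = 0;

    for (uint64_t i = 0; i < n; i++)
        r = (r << 1) | ((x >> i) & 1u);
    return r;
}

/* Full 64-bit reversal via a fixed 64-trip shift+or loop. */
static uint64_t rev64(uint64_t v) {
    uint64_t out = 0;
    for (int i = 0; i < 64; i++) {
        out = (out << 1) | (v & 1u);
        v >>= 1;
    }
    return out;
}

/* Same result via rev64: bit i lands at 63 - i, then moves to n - 1 - i. */
uint64_t bitfetch_window64_via_rev(uint64_t x) {
    unsigned n = (unsigned)(x & 7u) + 1u;
    return rev64(x) >> (64u - n);  /* shift amount is 56..63, never 64 */
}
```

Both functions agree on the table inputs, for example `0xDEADBEEF` gives 247 (low byte `0xEF` reversed is `0xF7`).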
@@ -0,0 +1,94 @@
+# vm_bitreverse64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_bitreverse64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_bitreverse64_loop.ll`
+- **Symbol:** `vm_bitreverse64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_bitreverse64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_bitreverse64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | x=0: zero stays zero |
+| 2 | RCX=1 | 9223372036854775808 | 9223372036854775808 | 9223372036854775808 | yes | x=1 -> MSB |
+| 3 | RCX=255 | 18374686479671623680 | 18374686479671623680 | 18374686479671623680 | yes | x=0xFF -> top byte |
+| 4 | RCX=9223372036854775808 | 1 | 1 | 1 | yes | x=2^63 -> 1 (MSB to LSB) |
+| 5 | RCX=51966 | 9174676865883832320 | 9174676865883832320 | 9174676865883832320 | yes | x=0xCAFE |
+| 6 | RCX=3405691582 | 9033516422034096128 | 9033516422034096128 | 9033516422034096128 | yes | x=0xCAFEBABE |
+| 7 | RCX=1311768467463790320 | 1115552785675988040 | 1115552785675988040 | 1115552785675988040 | yes | 0x123...DEF0 |
+| 8 | RCX=18446744073709551615 | 18446744073709551615 | 18446744073709551615 | 18446744073709551615 | yes | max u64: bitreverse fixed point |
+| 9 | RCX=11400714819323198485 | 12123218500447562873 | 12123218500447562873 | 12123218500447562873 | yes | x=K (golden ratio) |
+| 10 | RCX=12297829382473034410 | 6148914691236517205 | 6148914691236517205 | 6148914691236517205 | yes | 0xAAAA... -> 0x5555... |
+
+## Source
+
+```c
+/* PC-state VM running an i64 bit-reverse via a 64-trip shift+or loop.
+ *   result = 0;
+ *   for i in 0..64:
+ *     result = (result << 1) | (state & 1);
+ *     state  = state >> 1;
+ *   return result;
+ * Lift target: vm_bitreverse64_loop_target.
+ *
+ * Distinct from vm_bitreverse_loop (i32 version, which the lifter
+ * recognizes as llvm.bitreverse.i8): exercises an explicit 64-trip loop
+ * whose body shifts one bit into result (shl + or) and shifts state
+ * right, on full i64 state.  The optimizer may or may not recognize it
+ * as llvm.bitreverse.i64.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum BrVmPc {
+    BR_LOAD       = 0,
+    BR_INIT       = 1,
+    BR_LOOP_CHECK = 2,
+    BR_LOOP_BODY  = 3,
+    BR_LOOP_INC   = 4,
+    BR_HALT       = 5,
+};
+
+__declspec(noinline)
+uint64_t vm_bitreverse64_loop_target(uint64_t x) {
+    int      idx    = 0;
+    uint64_t state  = 0;
+    uint64_t result = 0;
+    int      pc     = BR_LOAD;
+
+    while (1) {
+        if (pc == BR_LOAD) {
+            state  = x;
+            result = 0ull;
+            pc = BR_INIT;
+        } else if (pc == BR_INIT) {
+            idx = 0;
+            pc = BR_LOOP_CHECK;
+        } else if (pc == BR_LOOP_CHECK) {
+            pc = (idx < 64) ? BR_LOOP_BODY : BR_HALT;
+        } else if (pc == BR_LOOP_BODY) {
+            result = (result << 1) | (state & 1ull);
+            state  = state >> 1;
+            pc = BR_LOOP_INC;
+        } else if (pc == BR_LOOP_INC) {
+            idx = idx + 1;
+            pc = BR_LOOP_CHECK;
+        } else if (pc == BR_HALT) {
+            return result;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_bitreverse64(1)=0x%llx vm_bitreverse64(0xCAFE)=0x%llx\n",
+           (unsigned long long)vm_bitreverse64_loop_target(1ull),
+           (unsigned long long)vm_bitreverse64_loop_target(0xCAFEull));
+    return 0;
+}
+```
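Reviewer sketch (not part of the committed file): a loop-free way to compute the same result as the 64-trip loop is the classic log-step swap, which exchanges adjacent bits, then pairs, nibbles, bytes, 16-bit halves, and finally the two 32-bit halves. This is a different technique from the sample's loop, shown only as an independent check on the table values. `bitreverse64_swap` is a hypothetical name.

```c
#include <stdint.h>

/* Log-step bit reversal of a 64-bit value (hypothetical helper). */
uint64_t bitreverse64_swap(uint64_t x) {
    /* swap adjacent single bits */
    x = ((x >> 1)  & 0x5555555555555555ull) | ((x & 0x5555555555555555ull) << 1);
    /* swap adjacent 2-bit pairs */
    x = ((x >> 2)  & 0x3333333333333333ull) | ((x & 0x3333333333333333ull) << 2);
    /* swap adjacent nibbles */
    x = ((x >> 4)  & 0x0F0F0F0F0F0F0F0Full) | ((x & 0x0F0F0F0F0F0F0F0Full) << 4);
    /* swap adjacent bytes */
    x = ((x >> 8)  & 0x00FF00FF00FF00FFull) | ((x & 0x00FF00FF00FF00FFull) << 8);
    /* swap adjacent 16-bit halves */
    x = ((x >> 16) & 0x0000FFFF0000FFFFull) | ((x & 0x0000FFFF0000FFFFull) << 16);
    /* swap the two 32-bit halves */
    return (x >> 32) | (x << 32);
}
```

For `0xCAFE` the 16-bit pattern reverses to `0x7F53`, which lands in the top 16 bits: `0x7F53000000000000` = 9174676865883832320, the value in row 5.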
@@ -0,0 +1,103 @@
+# vm_bitreverse_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_bitreverse_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_bitreverse_loop.ll`
+- **Symbol:** `vm_bitreverse_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_bitreverse_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_bitreverse_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | reverse(0x00) |
+| 2 | RCX=1 | 128 | 128 | 128 | yes | reverse(0x01) = 0x80 |
+| 3 | RCX=128 | 1 | 1 | 1 | yes | reverse(0x80) = 0x01 |
+| 4 | RCX=170 | 85 | 85 | 85 | yes | reverse(0xAA) = 0x55 |
+| 5 | RCX=85 | 170 | 170 | 170 | yes | reverse(0x55) = 0xAA |
+| 6 | RCX=255 | 255 | 255 | 255 | yes | reverse(0xFF) = 0xFF |
+| 7 | RCX=18 | 72 | 72 | 72 | yes | reverse(0x12) = 0x48 |
+| 8 | RCX=51 | 204 | 204 | 204 | yes | reverse(0x33) = 0xCC |
+| 9 | RCX=64 | 2 | 2 | 2 | yes | reverse(0x40) = 0x02 |
+| 10 | RCX=256 | 0 | 0 | 0 | yes | mask drops bit 8 |
+
+## Source
+
+```c
+/* PC-state VM that reverses the low 8 bits of x via shift+OR accumulation.
+ * Lift target: vm_bitreverse_loop_target.
+ * Goal: cover a fixed-trip-count loop whose body uses both shifts and a
+ * bitwise OR to accumulate a result, exercising loop body shapes the
+ * additive/multiplicative samples don't reach.
+ */
+#include <stdio.h>
+
+enum BrVmPc {
+    BRV_INIT       = 0,
+    BRV_LOAD_VAL   = 1,
+    BRV_INIT_RES   = 2,
+    BRV_INIT_IDX   = 3,
+    BRV_CHECK      = 4,
+    BRV_BODY_SHL   = 5,
+    BRV_BODY_BIT   = 6,
+    BRV_BODY_OR    = 7,
+    BRV_BODY_SHR   = 8,
+    BRV_BODY_INC   = 9,
+    BRV_HALT       = 10,
+};
+
+__declspec(noinline)
+int vm_bitreverse_loop_target(int x) {
+    int v   = 0;
+    int res = 0;
+    int idx = 0;
+    int bit = 0;
+    int pc  = BRV_INIT;
+
+    while (1) {
+        if (pc == BRV_INIT) {
+            pc = BRV_LOAD_VAL;
+        } else if (pc == BRV_LOAD_VAL) {
+            v = x & 0xFF;
+            pc = BRV_INIT_RES;
+        } else if (pc == BRV_INIT_RES) {
+            res = 0;
+            pc = BRV_INIT_IDX;
+        } else if (pc == BRV_INIT_IDX) {
+            idx = 0;
+            pc = BRV_CHECK;
+        } else if (pc == BRV_CHECK) {
+            pc = (idx < 8) ? BRV_BODY_SHL : BRV_HALT;
+        } else if (pc == BRV_BODY_SHL) {
+            res = res << 1;
+            pc = BRV_BODY_BIT;
+        } else if (pc == BRV_BODY_BIT) {
+            bit = v & 1;
+            pc = BRV_BODY_OR;
+        } else if (pc == BRV_BODY_OR) {
+            res = res | bit;
+            pc = BRV_BODY_SHR;
+        } else if (pc == BRV_BODY_SHR) {
+            v = (int)((unsigned)v >> 1);
+            pc = BRV_BODY_INC;
+        } else if (pc == BRV_BODY_INC) {
+            idx = idx + 1;
+            pc = BRV_CHECK;
+        } else if (pc == BRV_HALT) {
+            return res;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_bitreverse_loop(0xAA)=%d vm_bitreverse_loop(0x12)=%d\n",
+           vm_bitreverse_loop_target(0xAA), vm_bitreverse_loop_target(0x12));
+    return 0;
+}
+```
@@ -0,0 +1,89 @@
+# vm_bittransitions_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 11/11 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_bittransitions_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_bittransitions_loop.ll`
+- **Symbol:** `vm_bittransitions_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_bittransitions_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_bittransitions_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | all zeros |
+| 2 | RCX=1 | 1 | 1 | 1 | yes | single bit set |
+| 3 | RCX=2 | 2 | 2 | 2 | yes | bit 1 set |
+| 4 | RCX=65535 | 0 | 0 | 0 | yes | all 16 bits same: 0 transitions |
+| 5 | RCX=21845 | 15 | 15 | 15 | yes | 0x5555 alternating: 15 transitions |
+| 6 | RCX=43690 | 15 | 15 | 15 | yes | 0xAAAA alternating |
+| 7 | RCX=52428 | 7 | 7 | 7 | yes | 0xCCCC: 2-bit blocks |
+| 8 | RCX=3855 | 3 | 3 | 3 | yes | 0x0F0F: 4-bit blocks |
+| 9 | RCX=61680 | 3 | 3 | 3 | yes | 0xF0F0: 4-bit blocks |
+| 10 | RCX=65280 | 1 | 1 | 1 | yes | 0xFF00: single transition |
+| 11 | RCX=4660 | 8 | 8 | 8 | yes | 0x1234 |
+
+## Source
+
+```c
+/* PC-state VM that counts adjacent-bit transitions in the low 16 bits of x.
+ * Lift target: vm_bittransitions_loop_target.
+ * Goal: cover a loop body that examines TWO bits per iteration via XOR-and-mask.
+ * Branchless body (count += diff) keeps the count slot always written so the
+ * lifter doesn't promote it to phi-undef on iterations where no transition
+ * occurs.
+ */
+#include <stdio.h>
+
+enum BtVmPc {
+    BT_LOAD       = 0,
+    BT_INIT       = 1,
+    BT_CHECK      = 2,
+    BT_BODY_DIFF  = 3,
+    BT_BODY_ADD   = 4,
+    BT_BODY_INC   = 5,
+    BT_HALT       = 6,
+};
+
+__declspec(noinline)
+int vm_bittransitions_loop_target(int x) {
+    int idx   = 0;
+    int count = 0;
+    int diff  = 0;
+    int pc    = BT_LOAD;
+
+    while (1) {
+        if (pc == BT_LOAD) {
+            idx = 0;
+            count = 0;
+            pc = BT_INIT;
+        } else if (pc == BT_INIT) {
+            pc = BT_CHECK;
+        } else if (pc == BT_CHECK) {
+            pc = (idx < 15) ? BT_BODY_DIFF : BT_HALT;
+        } else if (pc == BT_BODY_DIFF) {
+            diff = ((x >> idx) ^ (x >> (idx + 1))) & 1;
+            pc = BT_BODY_ADD;
+        } else if (pc == BT_BODY_ADD) {
+            count = count + diff;
+            pc = BT_BODY_INC;
+        } else if (pc == BT_BODY_INC) {
+            idx = idx + 1;
+            pc = BT_CHECK;
+        } else if (pc == BT_HALT) {
+            return count;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_bittransitions_loop(0x5555)=%d vm_bittransitions_loop(0x1234)=%d\n",
+           vm_bittransitions_loop_target(0x5555), vm_bittransitions_loop_target(0x1234));
+    return 0;
+}
+```
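Reviewer sketch (not part of the committed file): the 15-iteration loop has a closed form. Bit `i` of `x ^ (x >> 1)` is set exactly when bits `i` and `i + 1` of `x` differ, so masking to bits 0..14 and counting set bits gives the transition count. The names below are hypothetical, and the input is taken as `unsigned`, which is an assumption that is harmless here because only the low 16 bits are examined.

```c
/* Count set bits with Kernighan's clear-lowest-bit loop. */
static int popcount_u(unsigned v) {
    int c = 0;
    while (v) {
        v &= v - 1u;  /* clear the lowest set bit */
        c++;
    }
    return c;
}

/* Loop reference: compare each adjacent bit pair (0,1) .. (14,15). */
int bittransitions_loop_ref(unsigned x) {
    int count = 0;
    for (int idx = 0; idx < 15; idx++)
        count += (int)(((x >> idx) ^ (x >> (idx + 1))) & 1u);
    return count;
}

/* Closed form: one XOR, one mask, one popcount. */
int bittransitions_closed_form(unsigned x) {
    return popcount_u((x ^ (x >> 1)) & 0x7FFFu);
}
```

For `0xFF00`, `x ^ (x >> 1)` is `0x8080`; the mask drops bit 15 and leaves a single set bit, matching the "single transition" row.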
@@ -0,0 +1,85 @@
+# vm_branchy_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 8/8 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_branchy_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_branchy_loop.ll`
+- **Symbol:** `vm_branchy_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_branchy_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_branchy_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | limit=0: no iterations |
+| 2 | RCX=1 | 0 | 0 | 0 | yes | limit=1: only i=0 (even) |
+| 3 | RCX=2 | 1 | 1 | 1 | yes | limit=2: i=1 is odd |
+| 4 | RCX=5 | 2 | 2 | 2 | yes | limit=5: odds {1,3} |
+| 5 | RCX=10 | 5 | 5 | 5 | yes | limit=10: odds {1,3,5,7,9} |
+| 6 | RCX=15 | 7 | 7 | 7 | yes | limit=15: odds 1..13 |
+| 7 | RCX=16 | 0 | 0 | 0 | yes | limit=0 (mask drops bit 4) |
+| 8 | RCX=31 | 7 | 7 | 7 | yes | limit=15 again after mask |
+
+## Source
+
+```c
+/* PC-state VM with a conditional branch inside the loop body.
+ * Lift target: vm_branchy_loop_target.
+ * Goal: keep a VM-shaped dispatcher with a real loop AND a data-dependent
+ * branch in the loop body (parity test on the loop induction variable).
+ * Counts how many odd values exist in [0, limit) where limit = x & 0xF.
+ */
+#include <stdio.h>
+
+enum BranchVmPc {
+    BV_INIT        = 0,  /* unused: pc starts at BV_LOAD_LIMIT, which also zeroes i and count */
+    BV_LOAD_LIMIT  = 1,
+    BV_CHECK_LIMIT = 2,
+    BV_TEST_PARITY = 3,
+    BV_INC_COUNT   = 4,
+    BV_INC_INDEX   = 5,
+    BV_HALT        = 6,
+};
+
+__declspec(noinline)
+int vm_branchy_loop_target(int x) {
+    int i      = 0;
+    int count  = 0;
+    int limit  = 0;
+    int parity = 0;
+    int pc     = BV_LOAD_LIMIT;
+
+    while (1) {
+        if (pc == BV_LOAD_LIMIT) {
+            i = 0;
+            count = 0;
+            limit = x & 0xF;
+            pc = BV_CHECK_LIMIT;
+        } else if (pc == BV_CHECK_LIMIT) {
+            pc = (i < limit) ? BV_TEST_PARITY : BV_HALT;
+        } else if (pc == BV_TEST_PARITY) {
+            parity = i & 1;
+            pc = (parity != 0) ? BV_INC_COUNT : BV_INC_INDEX;
+        } else if (pc == BV_INC_COUNT) {
+            count = count + 1;
+            pc = BV_INC_INDEX;
+        } else if (pc == BV_INC_INDEX) {
+            i = i + 1;
+            pc = BV_CHECK_LIMIT;
+        } else if (pc == BV_HALT) {
+            return count;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_branchy_loop(10)=%d vm_branchy_loop(15)=%d\n",
+           vm_branchy_loop_target(10), vm_branchy_loop_target(15));
+    return 0;
+}
+```
@@ -0,0 +1,99 @@
+# vm_bswap64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_bswap64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_bswap64_loop.ll`
+- **Symbol:** `vm_bswap64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_bswap64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_bswap64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | x=0: zero stays zero |
+| 2 | RCX=1 | 1 | 1 | 1 | yes | x=1, n=2: double bswap = identity |
+| 3 | RCX=2 | 144115188075855872 | 144115188075855872 | 144115188075855872 | yes | x=2, n=3 (odd): net one bswap -> 0x0200...0 |
+| 4 | RCX=7 | 7 | 7 | 7 | yes | x=7, n=8: even -> identity |
+| 5 | RCX=255 | 255 | 255 | 255 | yes | x=0xFF, n=8: even -> identity |
+| 6 | RCX=51966 | 18359486830929248256 | 18359486830929248256 | 18359486830929248256 | yes | x=0xCAFE, n=7 (odd) -> 0xFECA00..0 |
+| 7 | RCX=3405691582 | 13743577356411338752 | 13743577356411338752 | 13743577356411338752 | yes | x=0xCAFEBABE, n=7 (odd) |
+| 8 | RCX=1311768467463790320 | 17356517385562371090 | 17356517385562371090 | 17356517385562371090 | yes | 0x123...DEF0, n=1: bswap once |
+| 9 | RCX=18446744073709551615 | 18446744073709551615 | 18446744073709551615 | 18446744073709551615 | yes | max u64: bswap fixed point |
+| 10 | RCX=11400714819323198485 | 11400714819323198485 | 11400714819323198485 | 11400714819323198485 | yes | K (golden): n=6 even -> identity |
+
+## Source
+
+```c
+/* PC-state VM running an i64 byte-swap built from explicit shifts and
+ * masks (no intrinsic) in a variable-trip loop.  Even-trip values produce
+ * identity; odd-trip values produce a single byte-swap of the input.
+ *   for i in 0..n: state = byteswap_via_shifts_and_masks(state)
+ * Variable trip n = (x & 7) + 1.
+ * Lift target: vm_bswap64_loop_target.
+ *
+ * Distinct from vm_imported_bswap_loop (i32 _byteswap_ulong intrinsic):
+ * exercises the explicit 8-way mask+shift+or fan-in lowering on full i64
+ * state.  The lifter likely recognizes this as llvm.bswap.i64 after
+ * optimization.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum BsVmPc {
+    BS_LOAD       = 0,
+    BS_INIT       = 1,
+    BS_LOOP_CHECK = 2,
+    BS_LOOP_BODY  = 3,
+    BS_LOOP_INC   = 4,
+    BS_HALT       = 5,
+};
+
+__declspec(noinline)
+uint64_t vm_bswap64_loop_target(uint64_t x) {
+    int      idx   = 0;
+    int      n     = 0;
+    uint64_t state = 0;
+    int      pc    = BS_LOAD;
+
+    while (1) {
+        if (pc == BS_LOAD) {
+            state = x;
+            n     = (int)(x & 7ull) + 1;
+            pc = BS_INIT;
+        } else if (pc == BS_INIT) {
+            idx = 0;
+            pc = BS_LOOP_CHECK;
+        } else if (pc == BS_LOOP_CHECK) {
+            pc = (idx < n) ? BS_LOOP_BODY : BS_HALT;
+        } else if (pc == BS_LOOP_BODY) {
+            state = ((state & 0x00000000000000FFull) << 56) |
+                    ((state & 0x000000000000FF00ull) << 40) |
+                    ((state & 0x0000000000FF0000ull) << 24) |
+                    ((state & 0x00000000FF000000ull) << 8)  |
+                    ((state & 0x000000FF00000000ull) >> 8)  |
+                    ((state & 0x0000FF0000000000ull) >> 24) |
+                    ((state & 0x00FF000000000000ull) >> 40) |
+                    ((state & 0xFF00000000000000ull) >> 56);
+            pc = BS_LOOP_INC;
+        } else if (pc == BS_LOOP_INC) {
+            idx = idx + 1;
+            pc = BS_LOOP_CHECK;
+        } else if (pc == BS_HALT) {
+            return state;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_bswap64(0x123456789ABCDEF0)=0x%llx vm_bswap64(0xCAFE)=0x%llx\n",
+           (unsigned long long)vm_bswap64_loop_target(0x123456789ABCDEF0ull),
+           (unsigned long long)vm_bswap64_loop_target(0xCAFEull));
+    return 0;
+}
+```
@@ -0,0 +1,97 @@
+# vm_byte_andfold64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_byte_andfold64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_byte_andfold64_loop.ll`
+- **Symbol:** `vm_byte_andfold64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_byte_andfold64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_byte_andfold64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | x=0 n=1: 0xFF & 0=0 |
+| 2 | RCX=1 | 0 | 0 | 0 | yes | x=1 n=2: 0xFF&1=1; 1&0=0 |
+| 3 | RCX=2 | 0 | 0 | 0 | yes | x=2 n=3: 2&0=0 |
+| 4 | RCX=7 | 0 | 0 | 0 | yes | x=7 n=8: byte0=7 then 0s |
+| 5 | RCX=8 | 8 | 8 | 8 | yes | x=8 n=1: 0xFF & 8=8 |
+| 6 | RCX=3405691582 | 0 | 0 | 0 | yes | 0xCAFEBABE: n=7, bytes 4..6 are 0 |
+| 7 | RCX=3735928559 | 0 | 0 | 0 | yes | 0xDEADBEEF: n=8, bytes 4..7 are 0 |
+| 8 | RCX=18446744073709551615 | 255 | 255 | 255 | yes | all 0xFF: r stays 0xFF |
+| 9 | RCX=72623859790382856 | 8 | 8 | 8 | yes | 0x0102...0708: n=1 byte0=8 |
+| 10 | RCX=18446460386757245432 | 248 | 248 | 248 | yes | 0xFFFEFDFCFBFAF9F8: n=1 byte0=0xF8=248 |
+
+## Source
+
+```c
+/* PC-state VM that AND-folds u8 bytes over n = (x & 7) + 1 iterations:
+ *
+ *   n = (x & 7) + 1;
+ *   s = x; r = 0xFF;
+ *   while (n) {
+ *     r = r & (s & 0xFF);
+ *     s >>= 8;
+ *     n--;
+ *   }
+ *   return r;
+ *
+ * Lift target: vm_byte_andfold64_loop_target.
+ *
+ * Distinct from:
+ *   - vm_andsum_byte_idx64_loop (byte AND counter, ADD-folded)
+ *   - vm_word_orfold64_loop     (OR fold, monotone INCREASING)
+ *   - vm_byteprod64_loop        (multiplicative chain)
+ *
+ * Tests `and i64` chain at byte stride.  AND fold is monotone
+ * DECREASING (only clears bits) - counterpart to OR's monotone
+ * increasing.  Any zero byte clears the accumulator to 0.  All-FF
+ * input preserves r=0xFF.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum BaVmPc {
+    BA_INIT_ALL = 0,
+    BA_CHECK    = 1,
+    BA_BODY     = 2,
+    BA_HALT     = 3,
+};
+
+__declspec(noinline)
+uint64_t vm_byte_andfold64_loop_target(uint64_t x) {
+    uint64_t n  = 0;
+    uint64_t s  = 0;
+    uint64_t r  = 0;
+    int      pc = BA_INIT_ALL;
+
+    while (1) {
+        if (pc == BA_INIT_ALL) {
+            n = (x & 7ull) + 1ull;
+            s = x;
+            r = 0xFFull;
+            pc = BA_CHECK;
+        } else if (pc == BA_CHECK) {
+            pc = (n > 0ull) ? BA_BODY : BA_HALT;
+        } else if (pc == BA_BODY) {
+            r = r & (s & 0xFFull);
+            s = s >> 8;
+            n = n - 1ull;
+            pc = BA_CHECK;
+        } else if (pc == BA_HALT) {
+            return r;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_byte_andfold64(0xFFFEFDFCFBFAF9F8)=%llu\n",
+           (unsigned long long)vm_byte_andfold64_loop_target(0xFFFEFDFCFBFAF9F8ull));
+    return 0;
+}
+```
@@ -0,0 +1,101 @@
+# vm_byte_buffer_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_byte_buffer_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_byte_buffer_loop.ll`
+- **Symbol:** `vm_byte_buffer_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_byte_buffer_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_byte_buffer_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 840 | 840 | 840 | yes | seed=0 |
+| 2 | RCX=1 | 856 | 856 | 856 | yes | seed=1 |
+| 3 | RCX=5 | 920 | 920 | 920 | yes | seed=5 |
+| 4 | RCX=7 | 952 | 952 | 952 | yes | seed=7 |
+| 5 | RCX=16 | 1096 | 1096 | 1096 | yes | 0x10 |
+| 6 | RCX=85 | 2200 | 2200 | 2200 | yes | 0x55 |
+| 7 | RCX=128 | 2888 | 2888 | 2888 | yes | 0x80 |
+| 8 | RCX=255 | 1080 | 1080 | 1080 | yes | 0xFF: seed=0xFF, 15 of 16 bytes wrap mod 256 |
+| 9 | RCX=51966 | 1064 | 1064 | 1064 | yes | 0xCAFE: seed=0xFE |
+| 10 | RCX=74565 | 1944 | 1944 | 1944 | yes | 0x12345: seed=0x45 |
+
+## Source
+
+```c
+/* PC-state VM that fills a 16-byte stack buffer (uint8_t buf[16]) and
+ * sums it in a separate pass.
+ * Lift target: vm_byte_buffer_loop_target.
+ * Goal: cover an i8-element stack array (distinct from int[] arrays and
+ * from the scalar-i8 vm_byte_loop case).  Two PC-state passes (fill +
+ * accumulate); both have a fixed 16-trip bound and may be unrolled.
+ */
+#include <stdio.h>
+
+enum BbVmPc {
+    BB_LOAD       = 0,
+    BB_INIT_FILL  = 1,
+    BB_FILL_CHECK = 2,
+    BB_FILL_BODY  = 3,
+    BB_FILL_INC   = 4,
+    BB_INIT_SUM   = 5,
+    BB_SUM_CHECK  = 6,
+    BB_SUM_BODY   = 7,
+    BB_SUM_INC    = 8,
+    BB_HALT       = 9,
+};
+
+__declspec(noinline)
+int vm_byte_buffer_loop_target(int x) {
+    unsigned char buf[16];
+    int idx  = 0;
+    int sum  = 0;
+    int seed = 0;
+    int pc   = BB_LOAD;
+
+    while (1) {
+        if (pc == BB_LOAD) {
+            seed = x & 0xFF;
+            pc = BB_INIT_FILL;
+        } else if (pc == BB_INIT_FILL) {
+            idx = 0;
+            pc = BB_FILL_CHECK;
+        } else if (pc == BB_FILL_CHECK) {
+            pc = (idx < 16) ? BB_FILL_BODY : BB_INIT_SUM;
+        } else if (pc == BB_FILL_BODY) {
+            buf[idx] = (unsigned char)((idx * 7 + seed) & 0xFF);
+            pc = BB_FILL_INC;
+        } else if (pc == BB_FILL_INC) {
+            idx = idx + 1;
+            pc = BB_FILL_CHECK;
+        } else if (pc == BB_INIT_SUM) {
+            idx = 0;
+            pc = BB_SUM_CHECK;
+        } else if (pc == BB_SUM_CHECK) {
+            pc = (idx < 16) ? BB_SUM_BODY : BB_HALT;
+        } else if (pc == BB_SUM_BODY) {
+            sum = sum + (int)buf[idx];
+            pc = BB_SUM_INC;
+        } else if (pc == BB_SUM_INC) {
+            idx = idx + 1;
+            pc = BB_SUM_CHECK;
+        } else if (pc == BB_HALT) {
+            return sum;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_byte_buffer_loop(0x55)=%d vm_byte_buffer_loop(0xFF)=%d\n",
+           vm_byte_buffer_loop_target(0x55),
+           vm_byte_buffer_loop_target(0xFF));
+    return 0;
+}
+```
@@ -0,0 +1,87 @@
+# vm_byte_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_byte_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_byte_loop.ll`
+- **Symbol:** `vm_byte_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_byte_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_byte_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | limit=0, x=0 |
+| 2 | RCX=1 | 1 | 1 | 1 | yes | limit=0, x=1 |
+| 3 | RCX=256 | 5 | 5 | 5 | yes | 0x100: limit=1, state=0 |
+| 4 | RCX=768 | 147 | 147 | 147 | yes | 0x300: limit=3, state=0 |
+| 5 | RCX=51966 | 44 | 44 | 44 | yes | 0xCAFE: limit=10, state=0xFE |
+| 6 | RCX=43981 | 28 | 28 | 28 | yes | 0xABCD: limit=11, state=0xCD |
+| 7 | RCX=74565 | 188 | 188 | 188 | yes | 0x12345: limit=3, state=0x45 |
+| 8 | RCX=33023 | 255 | 255 | 255 | yes | 0x80FF: limit=0, state=0xFF |
+| 9 | RCX=65535 | 82 | 82 | 82 | yes | 0xFFFF: limit=15, state=0xFF |
+| 10 | RCX=16962 | 216 | 216 | 216 | yes | 0x4242: limit=2, state=0x42 |
+
+## Source
+
+```c
+/* PC-state VM with explicit unsigned char (i8) arithmetic recurrence.
+ * Lift target: vm_byte_loop_target.
+ * Goal: cover narrower-type (i8) arithmetic inside a VM dispatcher.
+ * state = state * 13 + 5 (mod 256), iterated n = (x >> 8) & 0xF times.
+ * Distinct from existing i32 recurrences and the int64 family.
+ */
+#include <stdio.h>
+
+enum BvVmPc {
+    BV_LOAD       = 0,
+    BV_INIT       = 1,
+    BV_CHECK      = 2,
+    BV_BODY_MUL   = 3,
+    BV_BODY_ADD   = 4,
+    BV_BODY_DEC   = 5,
+    BV_HALT       = 6,
+};
+
+__declspec(noinline)
+int vm_byte_loop_target(int x) {
+    unsigned char state = 0;
+    int n = 0;
+    int pc = BV_LOAD;
+
+    while (1) {
+        if (pc == BV_LOAD) {
+            state = (unsigned char)x;
+            n = (x >> 8) & 0xF;
+            pc = BV_INIT;
+        } else if (pc == BV_INIT) {
+            pc = BV_CHECK;
+        } else if (pc == BV_CHECK) {
+            pc = (n > 0) ? BV_BODY_MUL : BV_HALT;
+        } else if (pc == BV_BODY_MUL) {
+            state = (unsigned char)(state * 13);
+            pc = BV_BODY_ADD;
+        } else if (pc == BV_BODY_ADD) {
+            state = (unsigned char)(state + 5);
+            pc = BV_BODY_DEC;
+        } else if (pc == BV_BODY_DEC) {
+            n = n - 1;
+            pc = BV_CHECK;
+        } else if (pc == BV_HALT) {
+            return (int)state;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_byte_loop(0xCAFE)=%d vm_byte_loop(0xFFFF)=%d\n",
+           vm_byte_loop_target(0xCAFE),
+           vm_byte_loop_target(0xFFFF));
+    return 0;
+}
+```
@@ -0,0 +1,98 @@
+# vm_bytecyc64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_bytecyc64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_bytecyc64_loop.ll`
+- **Symbol:** `vm_bytecyc64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_bytecyc64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_bytecyc64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | x=0 |
+| 2 | RCX=1 | 1 | 1 | 1 | yes | x=1, shift=0: identity |
+| 3 | RCX=255 | 255 | 255 | 255 | yes | x=0xFF, shift=0 |
+| 4 | RCX=72623859790382856 | 144964032628459521 | 144964032628459521 | 144964032628459521 | yes | 0x0102030405060708, shift=1: rotates bytes |
+| 5 | RCX=3405691582 | 3405691582 | 3405691582 | 3405691582 | yes | 0xCAFEBABE: shift=0 identity |
+| 6 | RCX=14627333968688430831 | 13456437574443715326 | 13456437574443715326 | 13456437574443715326 | yes | 0xCAFEBABEDEADBEEF, shift=2 |
+| 7 | RCX=1311768467463790320 | 6230900220451885620 | 6230900220451885620 | 6230900220451885620 | yes | 0x123456789ABCDEF0, shift=2 |
+| 8 | RCX=18446744073709551615 | 18446744073709551615 | 18446744073709551615 | 18446744073709551615 | yes | max u64: rotation invariant |
+| 9 | RCX=11400714819323198485 | 8941226596316577610 | 8941226596316577610 | 8941226596316577610 | yes | K (golden), shift=6 |
+| 10 | RCX=4822678189205111 | 4822678189205111 | 4822678189205111 | 4822678189205111 | yes | 0x0011223344556677, shift=0 |
+
+## Source
+
+```c
+/* PC-state VM that cyclically shifts BYTES of x by an input-derived
+ * amount (top byte bits 0..2 select the rotation).
+ *   shift = (x >> 56) & 7;
+ *   result = 0;
+ *   for i in 0..8:
+ *     byte = (x >> (i*8)) & 0xFF
+ *     result |= byte << (((i + shift) & 7) * 8)
+ *   return result;
+ * Lift target: vm_bytecyc64_loop_target.
+ *
+ * Distinct from vm_bswap64_loop (full 8-byte reverse) and vm_rotl64_loop
+ * (bit-level rotation): byte-granularity cyclic permutation with
+ * input-derived shift amount.  Each byte goes to position (i+shift)&7.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum BcVmPc {
+    BC_LOAD       = 0,
+    BC_INIT       = 1,
+    BC_LOOP_CHECK = 2,
+    BC_LOOP_BODY  = 3,
+    BC_LOOP_INC   = 4,
+    BC_HALT       = 5,
+};
+
+__declspec(noinline)
+uint64_t vm_bytecyc64_loop_target(uint64_t x) {
+    int      idx    = 0;
+    uint64_t xx     = 0;
+    uint64_t shift  = 0;
+    uint64_t result = 0;
+    int      pc     = BC_LOAD;
+
+    while (1) {
+        if (pc == BC_LOAD) {
+            xx     = x;
+            shift  = (x >> 56) & 7ull;
+            result = 0ull;
+            pc = BC_INIT;
+        } else if (pc == BC_INIT) {
+            idx = 0;
+            pc = BC_LOOP_CHECK;
+        } else if (pc == BC_LOOP_CHECK) {
+            pc = (idx < 8) ? BC_LOOP_BODY : BC_HALT;
+        } else if (pc == BC_LOOP_BODY) {
+            uint64_t byte = (xx >> (idx * 8)) & 0xFFull;
+            uint64_t pos  = ((uint64_t)idx + shift) & 7ull;
+            result = result | (byte << (pos * 8));
+            pc = BC_LOOP_INC;
+        } else if (pc == BC_LOOP_INC) {
+            idx = idx + 1;
+            pc = BC_LOOP_CHECK;
+        } else if (pc == BC_HALT) {
+            return result;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_bytecyc64(0x0102030405060708)=0x%llx vm_bytecyc64(0xCAFEBABEDEADBEEF)=0x%llx\n",
+           (unsigned long long)vm_bytecyc64_loop_target(0x0102030405060708ull),
+           (unsigned long long)vm_bytecyc64_loop_target(0xCAFEBABEDEADBEEFull));
+    return 0;
+}
+```
@@ -0,0 +1,101 @@
+# vm_bytediv5_sum64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_bytediv5_sum64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_bytediv5_sum64_loop.ll`
+- **Symbol:** `vm_bytediv5_sum64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_bytediv5_sum64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_bytediv5_sum64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | all zero -> 0 |
+| 2 | RCX=1 | 0 | 0 | 0 | yes | x=1 n=2: 1/5=0 |
+| 3 | RCX=2 | 0 | 0 | 0 | yes | x=2 n=3: 2/5=0 |
+| 4 | RCX=7 | 1 | 1 | 1 | yes | x=7 n=8: byte0=7 -> 7/5=1 |
+| 5 | RCX=8 | 1 | 1 | 1 | yes | x=8 n=1: 8/5=1 |
+| 6 | RCX=3405691582 | 165 | 165 | 165 | yes | 0xCAFEBABE: n=7 sum of byte/5 |
+| 7 | RCX=3735928559 | 163 | 163 | 163 | yes | 0xDEADBEEF: n=8 |
+| 8 | RCX=18446744073709551615 | 408 | 408 | 408 | yes | all 0xFF n=8: 8 * 51 = 408 |
+| 9 | RCX=72623859790382856 | 1 | 1 | 1 | yes | 0x0102...0708: n=1 byte0=8 -> 1 |
+| 10 | RCX=1311768467463790320 | 48 | 48 | 48 | yes | 0x12345...EF0: n=1 byte0=240 -> 240/5=48 |
+
+## Source
+
+```c
+/* PC-state VM that sums byte / 5 over n = (x & 7) + 1 iterations:
+ *
+ *   n = (x & 7) + 1;
+ *   s = x; r = 0;
+ *   for (i = 0; i < n; i++) {
+ *     r = r + ((s & 0xFF) / 5);   // udiv by 5
+ *     s >>= 8;
+ *   }
+ *   return r;
+ *
+ * Lift target: vm_bytediv5_sum64_loop_target.
+ *
+ * Distinct from:
+ *   - vm_adler32_64_loop          (urem by 65521 - prime modular)
+ *   - vm_trailzeros_factorial64_loop (udiv by 5 on a single state, log_5)
+ *   - vm_uintadd_byte_idx64_loop  (byte * counter - mul not div)
+ *
+ * Tests `udiv i64 byte, 5` per iteration on a byte stream.  The compiler
+ * may lower /5 to a magic-number multiply, but the lifter typically
+ * preserves it as raw udiv (per documented Adler urem behavior).
+ * All-0xFF accumulates 8 * (255/5) = 8 * 51 = 408.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum BdVmPc {
+    BD_INIT_ALL = 0,
+    BD_CHECK    = 1,
+    BD_BODY     = 2,
+    BD_INC      = 3,
+    BD_HALT     = 4,
+};
+
+__declspec(noinline)
+uint64_t vm_bytediv5_sum64_loop_target(uint64_t x) {
+    uint64_t n  = 0;
+    uint64_t s  = 0;
+    uint64_t r  = 0;
+    uint64_t i  = 0;
+    int      pc = BD_INIT_ALL;
+
+    while (1) {
+        if (pc == BD_INIT_ALL) {
+            n = (x & 7ull) + 1ull;
+            s = x;
+            r = 0ull;
+            i = 0ull;
+            pc = BD_CHECK;
+        } else if (pc == BD_CHECK) {
+            pc = (i < n) ? BD_BODY : BD_HALT;
+        } else if (pc == BD_BODY) {
+            r = r + ((s & 0xFFull) / 5ull);
+            s = s >> 8;
+            pc = BD_INC;
+        } else if (pc == BD_INC) {
+            i = i + 1ull;
+            pc = BD_CHECK;
+        } else if (pc == BD_HALT) {
+            return r;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_bytediv5_sum64(0xFFFFFFFFFFFFFFFF)=%llu\n",
+           (unsigned long long)vm_bytediv5_sum64_loop_target(0xFFFFFFFFFFFFFFFFull));
+    return 0;
+}
+```
@@ -0,0 +1,100 @@
+# vm_bytematch64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_bytematch64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_bytematch64_loop.ll`
+- **Symbol:** `vm_bytematch64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_bytematch64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_bytematch64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 7 | 7 | 7 | yes | x=0: target=0, all 7 lower bytes match |
+| 2 | RCX=72340172838076673 | 7 | 7 | 7 | yes | 0x0101...01: target=1, all match |
+| 3 | RCX=18374686479671623680 | 0 | 0 | 0 | yes | 0xFF00...00: target=0xFF, none match |
+| 4 | RCX=18446744073709551615 | 7 | 7 | 7 | yes | max u64: target=0xFF, all match |
+| 5 | RCX=3405691582 | 3 | 3 | 3 | yes | 0xCAFEBABE: target=0, bytes 4..6 are 0 |
+| 6 | RCX=14627333941892939776 | 0 | 0 | 0 | yes | 0xCAFE000000000000: target=0xCA, none match |
+| 7 | RCX=1302123111085380351 | 6 | 6 | 6 | yes | 0x12121212121212FF: target=0x12, 6 match |
+| 8 | RCX=12249988016147062528 | 0 | 0 | 0 | yes | 0xAA00BB00CC00DD00: target=0xAA, none |
+| 9 | RCX=18399425019007729919 | 1 | 1 | 1 | yes | 0xFF5555555555AAFF: target=0xFF, 1 match (low) |
+| 10 | RCX=11400714819323198485 | 0 | 0 | 0 | yes | K (golden): target=0x9E, none match |
+
+## Source
+
+```c
+/* PC-state VM that counts how many of the lower 7 bytes of x equal the
+ * top byte of x.
+ *   target = (x >> 56) & 0xFF;
+ *   count = 0;
+ *   for i in 0..7:
+ *     byte = (x >> (i*8)) & 0xFF
+ *     if byte == target: count++
+ *   return count;
+ * 7-trip fixed loop with byte-walking shift + byte-equality compare.
+ * Lift target: vm_bytematch64_loop_target.
+ *
+ * Distinct from vm_xorbytes64_loop (XOR-fold) and vm_djb264_loop
+ * (multiplicative hash): byte-equality count via icmp eq i64 (after
+ * masking) inside a fixed loop with input-derived target byte.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum BmVmPc {
+    BM_LOAD       = 0,
+    BM_INIT       = 1,
+    BM_LOOP_CHECK = 2,
+    BM_LOOP_BODY  = 3,
+    BM_LOOP_INC   = 4,
+    BM_HALT       = 5,
+};
+
+__declspec(noinline)
+int vm_bytematch64_loop_target(uint64_t x) {
+    int      idx    = 0;
+    uint64_t xx     = 0;
+    uint64_t target = 0;
+    int      count  = 0;
+    int      pc     = BM_LOAD;
+
+    while (1) {
+        if (pc == BM_LOAD) {
+            xx     = x;
+            target = (x >> 56) & 0xFFull;
+            count  = 0;
+            pc = BM_INIT;
+        } else if (pc == BM_INIT) {
+            idx = 0;
+            pc = BM_LOOP_CHECK;
+        } else if (pc == BM_LOOP_CHECK) {
+            pc = (idx < 7) ? BM_LOOP_BODY : BM_HALT;
+        } else if (pc == BM_LOOP_BODY) {
+            uint64_t b = (xx >> (idx * 8)) & 0xFFull;
+            if (b == target) {
+                count = count + 1;
+            }
+            pc = BM_LOOP_INC;
+        } else if (pc == BM_LOOP_INC) {
+            idx = idx + 1;
+            pc = BM_LOOP_CHECK;
+        } else if (pc == BM_HALT) {
+            return count;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_bytematch64(0x0101010101010101)=%d vm_bytematch64(0xCAFEBABE)=%d\n",
+           vm_bytematch64_loop_target(0x0101010101010101ull),
+           vm_bytematch64_loop_target(0xCAFEBABEull));
+    return 0;
+}
+```
@@ -0,0 +1,111 @@
+# vm_bytemax64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_bytemax64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_bytemax64_loop.ll`
+- **Symbol:** `vm_bytemax64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_bytemax64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_bytemax64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | all zero bytes |
+| 2 | RCX=1 | 1 | 1 | 1 | yes | x=1: max byte=1 |
+| 3 | RCX=255 | 255 | 255 | 255 | yes | x=0xFF: max byte=255 |
+| 4 | RCX=128 | 128 | 128 | 128 | yes | x=0x80: max byte=128 |
+| 5 | RCX=72623859790382856 | 8 | 8 | 8 | yes | 0x0102...0708: n=(8&7)+1=1: only byte0=8 visible |
+| 6 | RCX=1311768467463790320 | 240 | 240 | 240 | yes | 0x12345...EF0: n=1: byte0=0xF0 |
+| 7 | RCX=3405691582 | 254 | 254 | 254 | yes | 0xCAFEBABE: n=7: max=0xFE |
+| 8 | RCX=16045690985374415566 | 254 | 254 | 254 | yes | 0xDEADBEEFFEEDFACE: n=7 |
+| 9 | RCX=18446744073709551615 | 255 | 255 | 255 | yes | all 0xFF: max=255 |
+| 10 | RCX=65280 | 0 | 0 | 0 | yes | 0xFF00: n=1: byte0=0 |
+
+## Source
+
+```c
+/* PC-state VM that finds the maximum byte value across the lower n bytes
+ * of x where n = (x & 7) + 1.  Pure unsigned compare-driven max-update.
+ *
+ *   n = (x & 7) + 1;
+ *   s = x; r = 0;
+ *   while (n) {
+ *     uint8_t b = s & 0xFF;
+ *     if (b > r) r = b;
+ *     s >>= 8;
+ *     n--;
+ *   }
+ *   return r;
+ *
+ * Lift target: vm_bytemax64_loop_target.
+ *
+ * Distinct from:
+ *   - vm_choosemax64_loop (per-iter chooses between two locally-computed
+ *     options s*3+i vs s+i*i over full u64 state)
+ *   - vm_smax64_loop (signed max of a derived sequence)
+ *   - vm_minarray_loop (i32 min over a stack array)
+ *   - vm_bytematch64 (matches a key, doesn't track a max)
+ *
+ * Tests u8 cmp + select-style update where the "no-update" path keeps
+ * the running max unchanged.  A 0x00 byte can NEVER exceed the running
+ * max, so the lifted code must take the no-update path for it and
+ * leave r untouched.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum BmVmPc {
+    BM_LOAD_N    = 0,
+    BM_INIT_REGS = 1,
+    BM_CHECK     = 2,
+    BM_BODY      = 3,
+    BM_SHIFT     = 4,
+    BM_DEC       = 5,
+    BM_HALT      = 6,
+};
+
+__declspec(noinline)
+uint64_t vm_bytemax64_loop_target(uint64_t x) {
+    uint64_t n  = 0;
+    uint64_t s  = 0;
+    uint64_t r  = 0;
+    int      pc = BM_LOAD_N;
+
+    while (1) {
+        if (pc == BM_LOAD_N) {
+            n = (x & 7ull) + 1ull;
+            pc = BM_INIT_REGS;
+        } else if (pc == BM_INIT_REGS) {
+            s = x;
+            r = 0ull;
+            pc = BM_CHECK;
+        } else if (pc == BM_CHECK) {
+            pc = (n > 0ull) ? BM_BODY : BM_HALT;
+        } else if (pc == BM_BODY) {
+            uint64_t b = s & 0xFFull;
+            r = (b > r) ? b : r;
+            pc = BM_SHIFT;
+        } else if (pc == BM_SHIFT) {
+            s = s >> 8;
+            pc = BM_DEC;
+        } else if (pc == BM_DEC) {
+            n = n - 1ull;
+            pc = BM_CHECK;
+        } else if (pc == BM_HALT) {
+            return r;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_bytemax64(0x123456789ABCDEF0)=%llu\n",
+           (unsigned long long)vm_bytemax64_loop_target(0x123456789ABCDEF0ull));
+    return 0;
+}
+```
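As a cross-check on the table above, the same reduction can be written without the PC dispatcher. This is only a sketch: `ref_bytemax64` and `check_ref_bytemax64` are hypothetical names that do not exist in the repository, and the expected values are copied from the equivalence rows.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference: max byte over the low n = (x & 7) + 1 bytes,
 * written as a plain loop instead of a PC-state VM. */
static uint64_t ref_bytemax64(uint64_t x) {
    uint64_t n = (x & 7ull) + 1ull;          /* window of 1..8 bytes */
    uint64_t r = 0;
    for (uint64_t s = x; n != 0; n--, s >>= 8) {
        uint64_t b = s & 0xFFull;
        if (b > r) r = b;                    /* no-update path leaves r alone */
    }
    return r;
}

/* Spot checks taken from the equivalence table. */
static void check_ref_bytemax64(void) {
    assert(ref_bytemax64(0x80ull) == 128ull);
    assert(ref_bytemax64(0x0102030405060708ull) == 8ull);    /* n=1 */
    assert(ref_bytemax64(0xCAFEBABEull) == 254ull);          /* n=7 */
    assert(ref_bytemax64(0xDEADBEEFFEEDFACEull) == 254ull);  /* n=7 */
    assert(ref_bytemax64(0xFF00ull) == 0ull);                /* n=1, byte0=0 */
}
```

Calling `check_ref_bytemax64()` from any test driver exercises the rows where the window hides the larger high bytes.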
@@ -0,0 +1,100 @@
+# vm_bytemod3_sum64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_bytemod3_sum64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_bytemod3_sum64_loop.ll`
+- **Symbol:** `vm_bytemod3_sum64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_bytemod3_sum64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_bytemod3_sum64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | all zero -> 0 |
+| 2 | RCX=1 | 1 | 1 | 1 | yes | x=1 n=2: 1%3=1 |
+| 3 | RCX=2 | 2 | 2 | 2 | yes | x=2 n=3: 2%3=2 |
+| 4 | RCX=7 | 1 | 1 | 1 | yes | x=7 n=8: byte0=7 -> 7%3=1 |
+| 5 | RCX=8 | 2 | 2 | 2 | yes | x=8 n=1: 8%3=2 |
+| 6 | RCX=3405691582 | 4 | 4 | 4 | yes | 0xCAFEBABE: n=7 sum of byte%3 |
+| 7 | RCX=3735928559 | 5 | 5 | 5 | yes | 0xDEADBEEF: n=8 |
+| 8 | RCX=18446744073709551615 | 0 | 0 | 0 | yes | all 0xFF n=8: 255%3=0 (255=85*3) so 0 |
+| 9 | RCX=72623859790382856 | 2 | 2 | 2 | yes | 0x0102...0708: n=1 byte0=8 -> 2 |
+| 10 | RCX=1311768467463790320 | 0 | 0 | 0 | yes | 0x12345...EF0: n=1 byte0=240 -> 240%3=0 |
+
+## Source
+
+```c
+/* PC-state VM that sums byte % 3 over n = (x & 7) + 1 iterations:
+ *
+ *   n = (x & 7) + 1;
+ *   s = x; r = 0;
+ *   for (i = 0; i < n; i++) {
+ *     r = r + ((s & 0xFF) % 3);   // urem by 3
+ *     s >>= 8;
+ *   }
+ *   return r;
+ *
+ * Lift target: vm_bytemod3_sum64_loop_target.
+ *
+ * Distinct from:
+ *   - vm_bytediv5_sum64_loop  (per-byte udiv by 5)
+ *   - vm_adler32_64_loop      (urem by 65521 prime)
+ *
+ * Tests `urem i64 byte, 3` per iteration on a byte stream with ADD
+ * accumulator.  Small-modulus companion to vm_bytediv5_sum64_loop:
+ * exercises urem-by-small-prime separately from the udiv-by-5 path.
+ * All-0xFF accumulates 8 * (255 % 3) = 8 * 0 = 0.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum BmVmPc {
+    BM_INIT_ALL = 0,
+    BM_CHECK    = 1,
+    BM_BODY     = 2,
+    BM_INC      = 3,
+    BM_HALT     = 4,
+};
+
+__declspec(noinline)
+uint64_t vm_bytemod3_sum64_loop_target(uint64_t x) {
+    uint64_t n  = 0;
+    uint64_t s  = 0;
+    uint64_t r  = 0;
+    uint64_t i  = 0;
+    int      pc = BM_INIT_ALL;
+
+    while (1) {
+        if (pc == BM_INIT_ALL) {
+            n = (x & 7ull) + 1ull;
+            s = x;
+            r = 0ull;
+            i = 0ull;
+            pc = BM_CHECK;
+        } else if (pc == BM_CHECK) {
+            pc = (i < n) ? BM_BODY : BM_HALT;
+        } else if (pc == BM_BODY) {
+            r = r + ((s & 0xFFull) % 3ull);
+            s = s >> 8;
+            pc = BM_INC;
+        } else if (pc == BM_INC) {
+            i = i + 1ull;
+            pc = BM_CHECK;
+        } else if (pc == BM_HALT) {
+            return r;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_bytemod3_sum64(0xDEADBEEF)=%llu\n",
+           (unsigned long long)vm_bytemod3_sum64_loop_target(0xDEADBEEFull));
+    return 0;
+}
+```
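The per-byte remainders in the table labels can be reproduced with a direct loop. A minimal sketch, assuming a hypothetical `ref_bytemod3_sum64` helper that is not part of the test suite:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference: sum of (byte % 3) over the low n bytes. */
static uint64_t ref_bytemod3_sum64(uint64_t x) {
    uint64_t n = (x & 7ull) + 1ull;
    uint64_t s = x;
    uint64_t r = 0;
    for (uint64_t i = 0; i < n; i++) {
        r += (s & 0xFFull) % 3ull;           /* unsigned remainder by 3 */
        s >>= 8;
    }
    return r;
}

/* Worked rows: 0xCAFEBABE has bytes BE,BA,FE,CA -> 1+0+2+1 = 4,
 * 0xDEADBEEF has bytes EF,BE,AD,DE -> 2+1+2+0 = 5. */
static void check_ref_bytemod3_sum64(void) {
    assert(ref_bytemod3_sum64(0xCAFEBABEull) == 4ull);
    assert(ref_bytemod3_sum64(0xDEADBEEFull) == 5ull);
    assert(ref_bytemod3_sum64(0xFFFFFFFFFFFFFFFFull) == 0ull);
    assert(ref_bytemod3_sum64(8ull) == 2ull);
}
```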
@@ -0,0 +1,102 @@
+# vm_byteparity64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_byteparity64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_byteparity64_loop.ll`
+- **Symbol:** `vm_byteparity64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_byteparity64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_byteparity64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | x=0: all bytes parity 0 |
+| 2 | RCX=1 | 1 | 1 | 1 | yes | x=1: low byte parity=1 |
+| 3 | RCX=255 | 0 | 0 | 0 | yes | x=0xFF: 8 bits set, parity even=0 |
+| 4 | RCX=3405691582 | 6 | 6 | 6 | yes | x=0xCAFEBABE |
+| 5 | RCX=72623859790382856 | 211 | 211 | 211 | yes | x=0x0102030405060708 |
+| 6 | RCX=14627333968688430831 | 101 | 101 | 101 | yes | 0xCAFEBABEDEADBEEF |
+| 7 | RCX=18446744073709551615 | 0 | 0 | 0 | yes | max u64: all bytes 0xFF parity 0 |
+| 8 | RCX=11400714819323198485 | 255 | 255 | 255 | yes | K (golden): all bytes parity 1 |
+| 9 | RCX=12297829382473034410 | 0 | 0 | 0 | yes | 0xAAAA...: all bytes 0xAA parity 0 |
+| 10 | RCX=1311768467463790320 | 68 | 68 | 68 | yes | 0x123456789ABCDEF0 |
+
+## Source
+
+```c
+/* PC-state VM that computes per-byte parity bit and packs the 8 parity
+ * bits into the low byte of result.
+ *   result = 0;
+ *   for i in 0..8:
+ *     byte = (x >> (i*8)) & 0xFF;
+ *     // SWAR parity in three xor-shift steps:
+ *     byte = (byte ^ (byte >> 4)) & 0xF;
+ *     byte = (byte ^ (byte >> 2)) & 0x3;
+ *     byte = (byte ^ (byte >> 1)) & 0x1;
+ *     result |= byte << i;
+ *   return result;
+ * 8-trip fixed loop with three sequential xor-shift+mask steps inside.
+ * Lift target: vm_byteparity64_loop_target.
+ *
+ * Distinct from vm_xorbytes64_loop (XOR-fold to single byte) and
+ * vm_prefixxor64_loop (running prefix-XOR scan): per-byte SWAR parity
+ * with 3 xor-shift+mask reductions in the inner body.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum BpVmPc {
+    BP_LOAD       = 0,
+    BP_INIT       = 1,
+    BP_LOOP_CHECK = 2,
+    BP_LOOP_BODY  = 3,
+    BP_LOOP_INC   = 4,
+    BP_HALT       = 5,
+};
+
+__declspec(noinline)
+uint64_t vm_byteparity64_loop_target(uint64_t x) {
+    int      idx    = 0;
+    uint64_t xx     = 0;
+    uint64_t result = 0;
+    int      pc     = BP_LOAD;
+
+    while (1) {
+        if (pc == BP_LOAD) {
+            xx     = x;
+            result = 0ull;
+            pc = BP_INIT;
+        } else if (pc == BP_INIT) {
+            idx = 0;
+            pc = BP_LOOP_CHECK;
+        } else if (pc == BP_LOOP_CHECK) {
+            pc = (idx < 8) ? BP_LOOP_BODY : BP_HALT;
+        } else if (pc == BP_LOOP_BODY) {
+            uint64_t b = (xx >> (idx * 8)) & 0xFFull;
+            b = (b ^ (b >> 4)) & 0xFull;
+            b = (b ^ (b >> 2)) & 0x3ull;
+            b = (b ^ (b >> 1)) & 0x1ull;
+            result = result | (b << idx);
+            pc = BP_LOOP_INC;
+        } else if (pc == BP_LOOP_INC) {
+            idx = idx + 1;
+            pc = BP_LOOP_CHECK;
+        } else if (pc == BP_HALT) {
+            return result;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_byteparity64(0xCAFEBABE)=%llu vm_byteparity64(0x0102030405060708)=%llu\n",
+           (unsigned long long)vm_byteparity64_loop_target(0xCAFEBABEull),
+           (unsigned long long)vm_byteparity64_loop_target(0x0102030405060708ull));
+    return 0;
+}
+```
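The three xor-shift steps in the loop body can be checked against a bit-counting parity for every byte value. The sketch below uses hypothetical helper names (`swar_parity8`, `naive_parity8`, `ref_byteparity64`) chosen for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: byte parity via the three xor-shift+mask steps. */
static uint64_t swar_parity8(uint64_t b) {
    b = (b ^ (b >> 4)) & 0xFull;
    b = (b ^ (b >> 2)) & 0x3ull;
    b = (b ^ (b >> 1)) & 0x1ull;
    return b;
}

/* Hypothetical helper: byte parity by counting set bits. */
static uint64_t naive_parity8(uint64_t b) {
    uint64_t ones = 0;
    for (int bit = 0; bit < 8; bit++) ones += (b >> bit) & 1ull;
    return ones & 1ull;
}

/* Packs the eight per-byte parity bits into the low byte of the result. */
static uint64_t ref_byteparity64(uint64_t x) {
    uint64_t result = 0;
    for (int i = 0; i < 8; i++)
        result |= swar_parity8((x >> (i * 8)) & 0xFFull) << i;
    return result;
}

static void check_ref_byteparity64(void) {
    for (uint64_t b = 0; b < 256; b++)       /* exhaustive over one byte */
        assert(swar_parity8(b) == naive_parity8(b));
    assert(ref_byteparity64(0xCAFEBABEull) == 6ull);
    assert(ref_byteparity64(0x0102030405060708ull) == 211ull);
    assert(ref_byteparity64(0x9E3779B97F4A7C15ull) == 255ull);
}
```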
@@ -0,0 +1,102 @@
+# vm_byteprod64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_byteprod64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_byteprod64_loop.ll`
+- **Symbol:** `vm_byteprod64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_byteprod64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_byteprod64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | x=0 n=1: 1*0=0 |
+| 2 | RCX=1 | 0 | 0 | 0 | yes | x=1 n=2: 1*1=1; 1*0=0 |
+| 3 | RCX=2 | 0 | 0 | 0 | yes | x=2 n=3: byte0=2 then 0,0 -> 0 |
+| 4 | RCX=7 | 0 | 0 | 0 | yes | x=7 n=8: only byte0=7 nonzero, then 0 |
+| 5 | RCX=8 | 8 | 8 | 8 | yes | x=8 n=1: 1*8=8 (no zero byte to wreck) |
+| 6 | RCX=3405691582 | 0 | 0 | 0 | yes | 0xCAFEBABE: n=7 high bytes are 0 |
+| 7 | RCX=3735928559 | 0 | 0 | 0 | yes | 0xDEADBEEF: n=8 high bytes are 0 |
+| 8 | RCX=18446744073709551615 | 17878103347812890625 | 17878103347812890625 | 17878103347812890625 | yes | all 0xFF: 0xFF^8 mod 2^64 |
+| 9 | RCX=72623859790382856 | 8 | 8 | 8 | yes | 0x0102...0708: n=1 byte0=8 |
+| 10 | RCX=144965140780024580 | 1512 | 1512 | 1512 | yes | 0x0203...0304: n=5 -> 4*3*2*9*7=1512 |
+
+## Source
+
+```c
+/* PC-state VM that computes the running product of bytes:
+ *
+ *   n = (x & 7) + 1;
+ *   s = x; r = 1;
+ *   for (i = 0; i < n; i++) {
+ *     r = r * (s & 0xFF);     // u8 multiplicative chain (mod 2^64)
+ *     s >>= 8;
+ *   }
+ *   return r;
+ *
+ * Lift target: vm_byteprod64_loop_target.
+ *
+ * Distinct from:
+ *   - vm_bytesq_sum64_loop          (per-byte squared, ADD-folded)
+ *   - vm_xormul_byte_idx64_loop     (byte * counter, XOR-folded)
+ *   - vm_uintadd_byte_idx64_loop    (byte * counter, ADD-folded)
+ *   - vm_bytesmul_idx64_loop        (signed byte * counter, ADD-folded)
+ *
+ * Tests `mul i64 r, byte` chained across iterations.  Any zero byte
+ * collapses the product to 0 for the rest of the loop, which the
+ * lifter must not optimize away (the loop still runs to completion).
+ * Inputs with no zero bytes propagate a meaningful product.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum BpVmPc {
+    BP_INIT_ALL = 0,
+    BP_CHECK    = 1,
+    BP_BODY     = 2,
+    BP_INC      = 3,
+    BP_HALT     = 4,
+};
+
+__declspec(noinline)
+uint64_t vm_byteprod64_loop_target(uint64_t x) {
+    uint64_t n  = 0;
+    uint64_t s  = 0;
+    uint64_t r  = 0;
+    uint64_t i  = 0;
+    int      pc = BP_INIT_ALL;
+
+    while (1) {
+        if (pc == BP_INIT_ALL) {
+            n = (x & 7ull) + 1ull;
+            s = x;
+            r = 1ull;
+            i = 0ull;
+            pc = BP_CHECK;
+        } else if (pc == BP_CHECK) {
+            pc = (i < n) ? BP_BODY : BP_HALT;
+        } else if (pc == BP_BODY) {
+            r = r * (s & 0xFFull);
+            s = s >> 8;
+            pc = BP_INC;
+        } else if (pc == BP_INC) {
+            i = i + 1ull;
+            pc = BP_CHECK;
+        } else if (pc == BP_HALT) {
+            return r;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_byteprod64(0x0203050709020304)=%llu\n",
+           (unsigned long long)vm_byteprod64_loop_target(0x0203050709020304ull));
+    return 0;
+}
+```
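Two rows are worth working by hand: the all-0xFF product, which is 255^8 and still fits in 64 bits, and the five-byte window of the `main` input. A rough sketch, with `ref_byteprod64` as a hypothetical stand-in for the VM:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference: product of the low n bytes, wrapping mod 2^64. */
static uint64_t ref_byteprod64(uint64_t x) {
    uint64_t n = (x & 7ull) + 1ull;
    uint64_t s = x;
    uint64_t r = 1;
    for (uint64_t i = 0; i < n; i++) {
        r *= s & 0xFFull;                    /* a zero byte pins r at 0 */
        s >>= 8;
    }
    return r;
}

static void check_ref_byteprod64(void) {
    uint64_t p = 1;
    for (int i = 0; i < 8; i++) p *= 255ull; /* 255^8, no wraparound */
    assert(p == 17878103347812890625ull);
    assert(ref_byteprod64(0xFFFFFFFFFFFFFFFFull) == p);
    /* bytes 04,03,02,09,07 with n = (4 & 7) + 1 = 5 */
    assert(ref_byteprod64(0x0203050709020304ull) == 4ull * 3 * 2 * 9 * 7);
    assert(ref_byteprod64(1ull) == 0ull);    /* n=2, byte1 is zero */
}
```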
@@ -0,0 +1,110 @@
+# vm_byterange64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_byterange64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_byterange64_loop.ll`
+- **Symbol:** `vm_byterange64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_byterange64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_byterange64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | all zero bytes -> mx=mn=0 |
+| 2 | RCX=1 | 1 | 1 | 1 | yes | x=1: n=(1&7)+1=2: bytes [1,0] -> mx=1 mn=0 |
+| 3 | RCX=255 | 255 | 255 | 255 | yes | x=0xFF: n=8: byte0=255 rest=0 |
+| 4 | RCX=128 | 0 | 0 | 0 | yes | x=0x80: n=1: only byte0=0x80 |
+| 5 | RCX=72623859790382856 | 0 | 0 | 0 | yes | 0x0102...0708: n=1: only byte0=8 |
+| 6 | RCX=1311768467463790320 | 0 | 0 | 0 | yes | 0x12345...EF0: n=1: only byte0=0xF0 |
+| 7 | RCX=3405691582 | 254 | 254 | 254 | yes | 0xCAFEBABE: n=7: max=0xFE min=0 |
+| 8 | RCX=16045690985374415566 | 81 | 81 | 81 | yes | 0xDEADBEEFFEEDFACE: n=7: range across non-zero bytes |
+| 9 | RCX=18446744073709551615 | 0 | 0 | 0 | yes | all 0xFF: mx=mn=255 |
+| 10 | RCX=9187201950435737471 | 0 | 0 | 0 | yes | 0x7F*8: mx=mn=127 |
+
+## Source
+
+```c
+/* PC-state VM that tracks the running min and max bytes across the
+ * lower n = (x & 7) + 1 bytes and returns (max - min):
+ *
+ *   n = (x & 7) + 1;
+ *   s = x; mn = 0xFF; mx = 0;
+ *   while (n) {
+ *     b = s & 0xFF;
+ *     if (b > mx) mx = b;
+ *     if (b < mn) mn = b;
+ *     s >>= 8; n--;
+ *   }
+ *   return (uint64_t)(mx - mn);
+ *
+ * Lift target: vm_byterange64_loop_target.
+ *
+ * Distinct from vm_bytemax64_loop (single-reduction max only): runs
+ * TWO independent cmp-driven reductions in lock-step inside the same
+ * loop body, each updating its own slot, plus a final subtract.  The
+ * lifter is expected to fold both branches into llvm.umax.i64 and
+ * llvm.umin.i64 and then sub the final values.
+ *
+ * Inputs with n = 1 (a one-byte window) always return 0 (byte = mx = mn).
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum BrVmPc {
+    BR_LOAD_N    = 0,
+    BR_INIT_REGS = 1,
+    BR_CHECK     = 2,
+    BR_BODY      = 3,
+    BR_SHIFT     = 4,
+    BR_DEC       = 5,
+    BR_HALT      = 6,
+};
+
+__declspec(noinline)
+uint64_t vm_byterange64_loop_target(uint64_t x) {
+    uint64_t n  = 0;
+    uint64_t s  = 0;
+    uint64_t mn = 0;
+    uint64_t mx = 0;
+    int      pc = BR_LOAD_N;
+
+    while (1) {
+        if (pc == BR_LOAD_N) {
+            n = (x & 7ull) + 1ull;
+            pc = BR_INIT_REGS;
+        } else if (pc == BR_INIT_REGS) {
+            s  = x;
+            mn = 0xFFull;
+            mx = 0ull;
+            pc = BR_CHECK;
+        } else if (pc == BR_CHECK) {
+            pc = (n > 0ull) ? BR_BODY : BR_HALT;
+        } else if (pc == BR_BODY) {
+            uint64_t b = s & 0xFFull;
+            mx = (b > mx) ? b : mx;
+            mn = (b < mn) ? b : mn;
+            pc = BR_SHIFT;
+        } else if (pc == BR_SHIFT) {
+            s = s >> 8;
+            pc = BR_DEC;
+        } else if (pc == BR_DEC) {
+            n = n - 1ull;
+            pc = BR_CHECK;
+        } else if (pc == BR_HALT) {
+            return mx - mn;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_byterange64(0xCAFEBABE)=%llu\n",
+           (unsigned long long)vm_byterange64_loop_target(0xCAFEBABEull));
+    return 0;
+}
+```
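The paired min/max reductions can be mirrored in a plain loop to confirm the range values in the table. This is an illustrative sketch only; `ref_byterange64` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference: (max - min) over the low n bytes of x. */
static uint64_t ref_byterange64(uint64_t x) {
    uint64_t n  = (x & 7ull) + 1ull;
    uint64_t mn = 0xFFull;
    uint64_t mx = 0;
    for (uint64_t s = x; n != 0; n--, s >>= 8) {
        uint64_t b = s & 0xFFull;
        if (b > mx) mx = b;                  /* unsigned max reduction */
        if (b < mn) mn = b;                  /* unsigned min reduction */
    }
    return mx - mn;
}

static void check_ref_byterange64(void) {
    assert(ref_byterange64(0x80ull) == 0ull);                /* n=1 */
    assert(ref_byterange64(1ull) == 1ull);                   /* n=2: [1,0] */
    assert(ref_byterange64(0xCAFEBABEull) == 254ull);        /* 0xFE - 0 */
    /* n=7 window: max 0xFE, min 0xAD -> 254 - 173 = 81 */
    assert(ref_byterange64(0xDEADBEEFFEEDFACEull) == 81ull);
    assert(ref_byterange64(0x7F7F7F7F7F7F7F7Full) == 0ull);
}
```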
@@ -0,0 +1,105 @@
+# vm_byterev_window64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_byterev_window64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_byterev_window64_loop.ll`
+- **Symbol:** `vm_byterev_window64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_byterev_window64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_byterev_window64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | all zero -> 0 |
+| 2 | RCX=1 | 256 | 256 | 256 | yes | x=1 n=2: bytes [1,0] -> 0x0100=256 |
+| 3 | RCX=2 | 131072 | 131072 | 131072 | yes | x=2 n=3: bytes [2,0,0] -> 0x020000 |
+| 4 | RCX=7 | 504403158265495552 | 504403158265495552 | 504403158265495552 | yes | x=7 n=8: byte0=7 ends up at byte position 7 (high) of r |
+| 5 | RCX=8 | 8 | 8 | 8 | yes | x=8 n=1: r=byte0=8 |
+| 6 | RCX=3405691582 | 53685849048481792 | 53685849048481792 | 53685849048481792 | yes | 0xCAFEBABE: n=7 |
+| 7 | RCX=3735928559 | 17275436389634146304 | 17275436389634146304 | 17275436389634146304 | yes | 0xDEADBEEF: n=8 full byteswap |
+| 8 | RCX=18446744073709551615 | 18446744073709551615 | 18446744073709551615 | 18446744073709551615 | yes | all 0xFF: n=8 palindrome |
+| 9 | RCX=72623859790382856 | 8 | 8 | 8 | yes | 0x0102...0708: n=1 only byte0=8 |
+| 10 | RCX=1311768467463790320 | 240 | 240 | 240 | yes | 0x12345...EF0: n=1 only byte0=0xF0 |
+
+## Source
+
+```c
+/* PC-state VM that packs the lower n = (x & 7) + 1 bytes of x into the
+ * accumulator r in REVERSED byte order:
+ *
+ *   n = (x & 7) + 1;
+ *   s = x; r = 0;
+ *   for (i = 0; i < n; i++) {
+ *     r = (r << 8) | (s & 0xFF);
+ *     s >>= 8;
+ *   }
+ *   return r;
+ *
+ * Lift target: vm_byterev_window64_loop_target.
+ *
+ * Distinct from vm_bswap64_loop which is a fixed 8-byte byteswap (and
+ * gets folded to llvm.bswap.i64).  Here the trip count is symbolic
+ * (1..8), so the result is the reverse of the lowest n bytes only --
+ * which the lifter cannot collapse to a single intrinsic.  Tests
+ * shl-by-8 + or + lshr-by-8 chain inside a counter-bound loop body.
+ *
+ * Special cases worth noting:
+ *   - n=1: r ends up equal to byte0 (nothing to reverse)
+ *   - n=8 with all 0xFF: result is the same all-0xFF input (palindrome)
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum BvVmPc {
+    BV_INIT_ALL = 0,
+    BV_CHECK    = 1,
+    BV_PACK     = 2,
+    BV_SHIFT    = 3,
+    BV_INC      = 4,
+    BV_HALT     = 5,
+};
+
+__declspec(noinline)
+uint64_t vm_byterev_window64_loop_target(uint64_t x) {
+    uint64_t n  = 0;
+    uint64_t s  = 0;
+    uint64_t r  = 0;
+    uint64_t i  = 0;
+    int      pc = BV_INIT_ALL;
+
+    while (1) {
+        if (pc == BV_INIT_ALL) {
+            n = (x & 7ull) + 1ull;
+            s = x;
+            r = 0ull;
+            i = 0ull;
+            pc = BV_CHECK;
+        } else if (pc == BV_CHECK) {
+            pc = (i < n) ? BV_PACK : BV_HALT;
+        } else if (pc == BV_PACK) {
+            r = (r << 8) | (s & 0xFFull);
+            pc = BV_SHIFT;
+        } else if (pc == BV_SHIFT) {
+            s = s >> 8;
+            pc = BV_INC;
+        } else if (pc == BV_INC) {
+            i = i + 1ull;
+            pc = BV_CHECK;
+        } else if (pc == BV_HALT) {
+            return r;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_byterev_window64(0x0102030405060708)=%llu\n",
+           (unsigned long long)vm_byterev_window64_loop_target(0x0102030405060708ull));
+    return 0;
+}
+```
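Although the loop is not a single intrinsic, the windowed reversal equals a full byteswap followed by a right shift of 8*(8-n) bits. The sketch below checks that identity against the loop form; `bswap64_portable`, `ref_byterev_window64` and `closed_byterev_window64` are hypothetical names, and nothing here claims the lifter emits this shape.

```c
#include <assert.h>
#include <stdint.h>

/* Portable 64-bit byteswap (swap bytes, then 16-bit pairs, then halves). */
static uint64_t bswap64_portable(uint64_t v) {
    v = ((v & 0x00FF00FF00FF00FFull) << 8)  | ((v >> 8)  & 0x00FF00FF00FF00FFull);
    v = ((v & 0x0000FFFF0000FFFFull) << 16) | ((v >> 16) & 0x0000FFFF0000FFFFull);
    return (v << 32) | (v >> 32);
}

/* Hypothetical reference: loop form, same as the pseudocode in the source. */
static uint64_t ref_byterev_window64(uint64_t x) {
    uint64_t n = (x & 7ull) + 1ull;
    uint64_t s = x;
    uint64_t r = 0;
    for (uint64_t i = 0; i < n; i++) {
        r = (r << 8) | (s & 0xFFull);
        s >>= 8;
    }
    return r;
}

/* Closed form: shift amount is 0..56, so the shift is always defined. */
static uint64_t closed_byterev_window64(uint64_t x) {
    uint64_t n = (x & 7ull) + 1ull;
    return bswap64_portable(x) >> (8ull * (8ull - n));
}

static void check_byterev_window64(void) {
    const uint64_t xs[] = { 0ull, 1ull, 2ull, 7ull, 8ull, 0xCAFEBABEull,
                            0xDEADBEEFull, 0x0102030405060708ull };
    for (unsigned k = 0; k < sizeof xs / sizeof xs[0]; k++)
        assert(ref_byterev_window64(xs[k]) == closed_byterev_window64(xs[k]));
    assert(ref_byterev_window64(1ull) == 256ull);
    assert(ref_byterev_window64(0xDEADBEEFull) == 0xEFBEADDE00000000ull);
}
```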
@@ -0,0 +1,101 @@
+# vm_byteshl3_xor64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_byteshl3_xor64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_byteshl3_xor64_loop.ll`
+- **Symbol:** `vm_byteshl3_xor64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_byteshl3_xor64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_byteshl3_xor64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | all zero -> 0 |
+| 2 | RCX=1 | 1 | 1 | 1 | yes | x=1 n=2: 1 << 0 ^ 0=1 |
+| 3 | RCX=2 | 2 | 2 | 2 | yes | x=2 n=3 |
+| 4 | RCX=7 | 7 | 7 | 7 | yes | x=7 n=8: only byte0 |
+| 5 | RCX=8 | 8 | 8 | 8 | yes | x=8 n=1 |
+| 6 | RCX=3405691582 | 110318 | 110318 | 110318 | yes | 0xCAFEBABE: n=7 - bytes XOR-stacked at 3-bit stride |
+| 7 | RCX=3735928559 | 103007 | 103007 | 103007 | yes | 0xDEADBEEF: n=8 |
+| 8 | RCX=18446744073709551615 | 476952263 | 476952263 | 476952263 | yes | all 0xFF n=8: 0xFF placed at 0,3,6,9,...,21 bit positions then XORed |
+| 9 | RCX=72623859790382856 | 8 | 8 | 8 | yes | 0x0102...0708: n=1 byte0=8 |
+| 10 | RCX=1311768467463790320 | 240 | 240 | 240 | yes | 0x12345...EF0: n=1 byte0=0xF0 |
+
+## Source
+
+```c
+/* PC-state VM that XORs each byte shifted left by (i*3) bits into r:
+ *
+ *   n = (x & 7) + 1;
+ *   s = x; r = 0;
+ *   for (i = 0; i < n; i++) {
+ *     r = r ^ ((s & 0xFF) << (i * 3));   // dynamic shl by 3*i
+ *     s >>= 8;
+ *   }
+ *   return r;
+ *
+ * Lift target: vm_byteshl3_xor64_loop_target.
+ *
+ * Distinct from:
+ *   - vm_dynshl_pack64_loop  (dynamic shl by i directly, 2-bit chunks)
+ *   - vm_byterev_window64_loop (constant shl-by-8 packing)
+ *   - vm_xormul_byte_idx64_loop (byte * counter, no shift)
+ *
+ * Tests `shl i64 byte, %i*3` (dynamic shl by a NON-trivial counter
+ * expression - mul-then-shl) inside the dispatcher loop body.  Each
+ * iter's byte lands at a different 3-bit-stride offset: byte0
+ * occupies bits 0-7, byte1 bits 3-10 (overlapping byte0's high
+ * bits), etc.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum BsVmPc {
+    BS_INIT_ALL = 0,
+    BS_CHECK    = 1,
+    BS_BODY     = 2,
+    BS_INC      = 3,
+    BS_HALT     = 4,
+};
+
+__declspec(noinline)
+uint64_t vm_byteshl3_xor64_loop_target(uint64_t x) {
+    uint64_t n  = 0;
+    uint64_t s  = 0;
+    uint64_t r  = 0;
+    uint64_t i  = 0;
+    int      pc = BS_INIT_ALL;
+
+    while (1) {
+        if (pc == BS_INIT_ALL) {
+            n = (x & 7ull) + 1ull;
+            s = x;
+            r = 0ull;
+            i = 0ull;
+            pc = BS_CHECK;
+        } else if (pc == BS_CHECK) {
+            pc = (i < n) ? BS_BODY : BS_HALT;
+        } else if (pc == BS_BODY) {
+            r = r ^ ((s & 0xFFull) << (i * 3ull));
+            s = s >> 8;
+            pc = BS_INC;
+        } else if (pc == BS_INC) {
+            i = i + 1ull;
+            pc = BS_CHECK;
+        } else if (pc == BS_HALT) {
+            return r;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_byteshl3_xor64(0xFFFFFFFFFFFFFFFF)=%llu\n",
+           (unsigned long long)vm_byteshl3_xor64_loop_target(0xFFFFFFFFFFFFFFFFull));
+    return 0;
+}
+```
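Because neighbouring bytes overlap at the 3-bit stride, the XOR results are easy to get wrong by hand. A small sketch, assuming a hypothetical `ref_byteshl3_xor64` helper, reproduces the worked values:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference: XOR of each window byte shifted left by 3*i. */
static uint64_t ref_byteshl3_xor64(uint64_t x) {
    uint64_t n = (x & 7ull) + 1ull;
    uint64_t s = x;
    uint64_t r = 0;
    for (uint64_t i = 0; i < n; i++) {
        r ^= (s & 0xFFull) << (i * 3ull);    /* shift amount is at most 21 */
        s >>= 8;
    }
    return r;
}

static void check_ref_byteshl3_xor64(void) {
    /* 0xCAFEBABE: 0xBE ^ (0xBA << 3) ^ (0xFE << 6) ^ (0xCA << 9) = 0x1AEEE */
    assert(ref_byteshl3_xor64(0xCAFEBABEull) == 0x1AEEEull);
    /* 0xDEADBEEF: 0xEF ^ (0xBE << 3) ^ (0xAD << 6) ^ (0xDE << 9) = 0x1925F */
    assert(ref_byteshl3_xor64(0xDEADBEEFull) == 0x1925Full);
    assert(ref_byteshl3_xor64(0xFFFFFFFFFFFFFFFFull) == 476952263ull);
}
```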
@@ -0,0 +1,103 @@
+# vm_byteshl_data64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_byteshl_data64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_byteshl_data64_loop.ll`
+- **Symbol:** `vm_byteshl_data64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_byteshl_data64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_byteshl_data64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | all zero -> 0 |
+| 2 | RCX=1 | 0 | 0 | 0 | yes | x=1 n=2: byte0=1 (b&7=1, b>>4=0); byte1=0 |
+| 3 | RCX=2 | 0 | 0 | 0 | yes | x=2 n=3 |
+| 4 | RCX=7 | 0 | 0 | 0 | yes | x=7 n=8: byte0=7 produces shl by 7 of 0=0 |
+| 5 | RCX=8 | 0 | 0 | 0 | yes | x=8 n=1: shl by 0=0; OR byte>>4=0 |
+| 6 | RCX=3405691582 | 12092 | 12092 | 12092 | yes | 0xCAFEBABE: n=7 data-driven shifts |
+| 7 | RCX=3735928559 | 1858189 | 1858189 | 1858189 | yes | 0xDEADBEEF: n=8 |
+| 8 | RCX=18446744073709551615 | 8510739453298575 | 8510739453298575 | 8510739453298575 | yes | all 0xFF: shl by 7 each iter combined with OR 0xF |
+| 9 | RCX=72623859790382856 | 0 | 0 | 0 | yes | 0x0102...0708: n=1 byte0=8 |
+| 10 | RCX=1311768467463790320 | 15 | 15 | 15 | yes | 0x12345...EF0: n=1 byte0=0xF0 -> shl by 0=0, OR 0xF=15 |
+
+## Source
+
+```c
+/* PC-state VM with DATA-DEPENDENT shift amount inside the loop body:
+ *
+ *   n = (x & 7) + 1;
+ *   s = x; r = 0;
+ *   for (i = 0; i < n; i++) {
+ *     uint64_t b = s & 0xFF;
+ *     r = (r << (b & 7)) | (b >> 4);   // shl amount comes from byte data
+ *     s >>= 8;
+ *   }
+ *   return r;
+ *
+ * Lift target: vm_byteshl_data64_loop_target.
+ *
+ * Distinct from:
+ *   - vm_dynshl_pack64_loop      (shl by loop index i)
+ *   - vm_byteshl3_xor64_loop     (shl by i*3 - counter expression)
+ *   - vm_bitfetch_window64_loop  (lshr by counter)
+ *
+ * Tests `shl i64 r, %byte_amount` where the shift amount is derived
+ * from the BYTE STREAM rather than the loop counter.  Each iter's
+ * amount is bounded to 0..7 by `& 7` so undefined-shift behavior is
+ * avoided.  Combined with OR of the byte's high nibble.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum BdVmPc {
+    BD_INIT_ALL = 0,
+    BD_CHECK    = 1,
+    BD_BODY     = 2,
+    BD_INC      = 3,
+    BD_HALT     = 4,
+};
+
+__declspec(noinline)
+uint64_t vm_byteshl_data64_loop_target(uint64_t x) {
+    uint64_t n  = 0;
+    uint64_t s  = 0;
+    uint64_t r  = 0;
+    uint64_t i  = 0;
+    int      pc = BD_INIT_ALL;
+
+    while (1) {
+        if (pc == BD_INIT_ALL) {
+            n = (x & 7ull) + 1ull;
+            s = x;
+            r = 0ull;
+            i = 0ull;
+            pc = BD_CHECK;
+        } else if (pc == BD_CHECK) {
+            pc = (i < n) ? BD_BODY : BD_HALT;
+        } else if (pc == BD_BODY) {
+            uint64_t b = s & 0xFFull;
+            r = (r << (b & 7ull)) | (b >> 4);
+            s = s >> 8;
+            pc = BD_INC;
+        } else if (pc == BD_INC) {
+            i = i + 1ull;
+            pc = BD_CHECK;
+        } else if (pc == BD_HALT) {
+            return r;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_byteshl_data64(0xDEADBEEF)=%llu\n",
+           (unsigned long long)vm_byteshl_data64_loop_target(0xDEADBEEFull));
+    return 0;
+}
+```
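The data-dependent shift can be traced step by step for 0xDEADBEEF: r goes 14, 907, 29034, 1858189, and the four zero bytes that follow leave it unchanged. The sketch below uses a hypothetical `ref_byteshl_data64` helper to confirm that trace and the all-0xFF row.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference: shift r by (byte & 7), then OR in the high nibble. */
static uint64_t ref_byteshl_data64(uint64_t x) {
    uint64_t n = (x & 7ull) + 1ull;
    uint64_t s = x;
    uint64_t r = 0;
    for (uint64_t i = 0; i < n; i++) {
        uint64_t b = s & 0xFFull;
        r = (r << (b & 7ull)) | (b >> 4);    /* shift amount bounded to 0..7 */
        s >>= 8;
    }
    return r;
}

static void check_ref_byteshl_data64(void) {
    assert(ref_byteshl_data64(0xDEADBEEFull) == 1858189ull);
    assert(ref_byteshl_data64(0xCAFEBABEull) == 12092ull);
    /* all 0xFF: r = (r << 7) | 0xF eight times = 15 * (2^56 - 1) / 127 */
    assert(ref_byteshl_data64(0xFFFFFFFFFFFFFFFFull)
           == 15ull * (((1ull << 56) - 1ull) / 127ull));
}
```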
@@ -0,0 +1,104 @@
+# vm_bytesmul_idx64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_bytesmul_idx64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_bytesmul_idx64_loop.ll`
+- **Symbol:** `vm_bytesmul_idx64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_bytesmul_idx64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_bytesmul_idx64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | all zero -> 0 |
+| 2 | RCX=1 | 1 | 1 | 1 | yes | x=1 n=2: byte0=+1 *1 + byte1=0 |
+| 3 | RCX=2 | 2 | 2 | 2 | yes | x=2 n=3: byte0=+2 *1 |
+| 4 | RCX=7 | 7 | 7 | 7 | yes | x=7 n=8: byte0=+7 *1; rest zero |
+| 5 | RCX=8 | 8 | 8 | 8 | yes | x=8 n=1: byte0=+8 *1 |
+| 6 | RCX=3405691582 | 18446744073709551188 | 18446744073709551188 | 18446744073709551188 | yes | 0xCAFEBABE: n=7 mixed-sign bytes scaled by index |
+| 7 | RCX=3735928559 | 18446744073709551082 | 18446744073709551082 | 18446744073709551082 | yes | 0xDEADBEEF: n=8, four negative low bytes, four zero high bytes |
+| 8 | RCX=18446744073709551615 | 18446744073709551580 | 18446744073709551580 | 18446744073709551580 | yes | all 0xFF n=8: -1*(1+2+...+8)=-36 -> 2^64-36 |
+| 9 | RCX=72623859790382856 | 8 | 8 | 8 | yes | 0x0102...0708: n=1 byte0=+8 *1 |
+| 10 | RCX=9259542125412876287 | 18446744073709548278 | 18446744073709548278 | 18446744073709548278 | yes | 0x80808080FFFFFFFF: n=8 negative-byte-heavy mixed |
+
+## Source
+
+```c
+/* PC-state VM that accumulates each signed byte of x times its
+ * 1-based loop index over n = (x & 7) + 1 iterations:
+ *
+ *   n = (x & 7) + 1;
+ *   s = x; r = 0;
+ *   for (i = 0; i < n; i++) {
+ *     int8_t sb = (int8_t)(s & 0xFF);
+ *     r += (int64_t)sb * (int64_t)(i + 1);
+ *     s >>= 8;
+ *   }
+ *   return (uint64_t)r;
+ *
+ * Lift target: vm_bytesmul_idx64_loop_target.
+ *
+ * Distinct from:
+ *   - vm_signedbytesum64_loop  (sext bytes, no index multiplier)
+ *   - vm_altbytesum64_loop     (alternating fixed sign, no multiplier)
+ *   - vm_squareadd64_loop      (single quadratic recurrence on whole x)
+ *
+ * Tests sext-i8 byte multiplied by i+1 (i is loop-index phi) chained
+ * into a signed accumulator that round-trips through u64.  The
+ * (i+1) factor exercises i64 multiply against a dynamic counter
+ * value rather than a constant.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum BsVmPc {
+    BS_INIT_ALL = 0,
+    BS_CHECK    = 1,
+    BS_BODY     = 2,
+    BS_INC      = 3,
+    BS_HALT     = 4,
+};
+
+__declspec(noinline)
+uint64_t vm_bytesmul_idx64_loop_target(uint64_t x) {
+    uint64_t n  = 0;
+    uint64_t s  = 0;
+    int64_t  r  = 0;
+    uint64_t i  = 0;
+    int      pc = BS_INIT_ALL;
+
+    while (1) {
+        if (pc == BS_INIT_ALL) {
+            n = (x & 7ull) + 1ull;
+            s = x;
+            r = 0;
+            i = 0ull;
+            pc = BS_CHECK;
+        } else if (pc == BS_CHECK) {
+            pc = (i < n) ? BS_BODY : BS_HALT;
+        } else if (pc == BS_BODY) {
+            int8_t sb = (int8_t)(s & 0xFFull);
+            r = r + (int64_t)sb * (int64_t)(i + 1ull);
+            s = s >> 8;
+            pc = BS_INC;
+        } else if (pc == BS_INC) {
+            i = i + 1ull;
+            pc = BS_CHECK;
+        } else if (pc == BS_HALT) {
+            return (uint64_t)r;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_bytesmul_idx64(0xCAFEBABE)=%llu\n",
+           (unsigned long long)vm_bytesmul_idx64_loop_target(0xCAFEBABEull));
+    return 0;
+}
+```
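For checking the table above by hand, the recurrence from the header comment can be written as a plain counted loop. This is a minimal sketch and not part of the PR: `ref_bytesmul_idx64` is a hypothetical name, and the sign extension is done arithmetically so the result does not rely on implementation-defined narrowing to `int8_t`.

```c
#include <stdint.h>

/* Hypothetical reference for vm_bytesmul_idx64_loop_target (assumption:
 * not part of the PR).  Same recurrence, no PC dispatch. */
uint64_t ref_bytesmul_idx64(uint64_t x) {
    uint64_t n = (x & 7u) + 1u;          /* trip count in 1..8 */
    uint64_t s = x;
    uint64_t r = 0;                      /* accumulator wraps modulo 2^64 */
    for (uint64_t i = 0; i < n; i++) {
        int64_t sb = (int64_t)(s & 0xFFu);
        if (sb >= 128) sb -= 256;        /* sign-extend the low byte */
        r += (uint64_t)(sb * (int64_t)(i + 1));
        s >>= 8;
    }
    return r;
}
```

With an all-0xFF input every byte is -1, so the sum is -(1+2+...+8) = -36, which reduces to 18446744073709551580 modulo 2^64 and matches row 8.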
@@ -0,0 +1,103 @@
+# vm_bytesq_idx_sum64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_bytesq_idx_sum64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_bytesq_idx_sum64_loop.ll`
+- **Symbol:** `vm_bytesq_idx_sum64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_bytesq_idx_sum64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_bytesq_idx_sum64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | all zero -> 0 |
+| 2 | RCX=1 | 1 | 1 | 1 | yes | x=1 n=2: 1*1*1=1 |
+| 3 | RCX=2 | 2 | 2 | 2 | yes | x=2 n=3: 2*1=2 |
+| 4 | RCX=7 | 7 | 7 | 7 | yes | x=7 n=8: only byte0=7 -> 7*1=7 |
+| 5 | RCX=8 | 8 | 8 | 8 | yes | x=8 n=1: 8*1*1=8 |
+| 6 | RCX=3405691582 | 6452 | 6452 | 6452 | yes | 0xCAFEBABE: n=7 sum of byte*counter^2 |
+| 7 | RCX=3735928559 | 6108 | 6108 | 6108 | yes | 0xDEADBEEF: n=8 |
+| 8 | RCX=18446744073709551615 | 52020 | 52020 | 52020 | yes | all 0xFF n=8: 0xFF*204=52020 |
+| 9 | RCX=72623859790382856 | 8 | 8 | 8 | yes | 0x0102...0708: n=1 byte0=8 |
+| 10 | RCX=1311768467463790320 | 240 | 240 | 240 | yes | 0x12345...EF0: n=1 byte0=240 |
+
+## Source
+
+```c
+/* PC-state VM that sums byte * (i+1) * (i+1) over n = (x & 7) + 1 iters:
+ *
+ *   n = (x & 7) + 1;
+ *   s = x; r = 0;
+ *   for (i = 0; i < n; i++) {
+ *     uint64_t c = i + 1;
+ *     r = r + (s & 0xFF) * c * c;
+ *     s >>= 8;
+ *   }
+ *   return r;
+ *
+ * Lift target: vm_bytesq_idx_sum64_loop_target.
+ *
+ * Distinct from:
+ *   - vm_uintadd_byte_idx64_loop (byte * counter, ADD) - linear counter
+ *   - vm_xormul_byte_idx64_loop  (byte * counter, XOR) - linear counter
+ *   - vm_bytesq_sum64_loop       (byte * byte - self-multiply, no counter)
+ *
+ * Tests SQUARED counter expression `(i+1)*(i+1)` as multiplier - two
+ * sequential muls in the body (C evaluates byte*c first, then *c again)
+ * inside a counter-bound loop.  All-0xFF: 0xFF * (1+4+9+16+25+36+49+64)
+ * = 0xFF * 204 = 52020.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum BqVmPc {
+    BQ_INIT_ALL = 0,
+    BQ_CHECK    = 1,
+    BQ_BODY     = 2,
+    BQ_INC      = 3,
+    BQ_HALT     = 4,
+};
+
+__declspec(noinline)
+uint64_t vm_bytesq_idx_sum64_loop_target(uint64_t x) {
+    uint64_t n  = 0;
+    uint64_t s  = 0;
+    uint64_t r  = 0;
+    uint64_t i  = 0;
+    int      pc = BQ_INIT_ALL;
+
+    while (1) {
+        if (pc == BQ_INIT_ALL) {
+            n = (x & 7ull) + 1ull;
+            s = x;
+            r = 0ull;
+            i = 0ull;
+            pc = BQ_CHECK;
+        } else if (pc == BQ_CHECK) {
+            pc = (i < n) ? BQ_BODY : BQ_HALT;
+        } else if (pc == BQ_BODY) {
+            uint64_t c = i + 1ull;
+            r = r + (s & 0xFFull) * c * c;
+            s = s >> 8;
+            pc = BQ_INC;
+        } else if (pc == BQ_INC) {
+            i = i + 1ull;
+            pc = BQ_CHECK;
+        } else if (pc == BQ_HALT) {
+            return r;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_bytesq_idx_sum64(0xFFFFFFFFFFFFFFFF)=%llu\n",
+           (unsigned long long)vm_bytesq_idx_sum64_loop_target(0xFFFFFFFFFFFFFFFFull));
+    return 0;
+}
+```
@@ -0,0 +1,102 @@
+# vm_bytesq_sum64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_bytesq_sum64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_bytesq_sum64_loop.ll`
+- **Symbol:** `vm_bytesq_sum64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_bytesq_sum64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_bytesq_sum64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | all zero -> 0 |
+| 2 | RCX=1 | 1 | 1 | 1 | yes | x=1 n=2: 1*1 + 0=1 |
+| 3 | RCX=2 | 4 | 4 | 4 | yes | x=2 n=3: 2*2=4 |
+| 4 | RCX=7 | 49 | 49 | 49 | yes | x=7 n=8: only byte0=7 -> 49 |
+| 5 | RCX=8 | 64 | 64 | 64 | yes | x=8 n=1: 8*8=64 |
+| 6 | RCX=3405691582 | 176016 | 176016 | 176016 | yes | 0xCAFEBABE: n=7 sum of squared bytes |
+| 7 | RCX=3735928559 | 172434 | 172434 | 172434 | yes | 0xDEADBEEF: n=8 sum of squared bytes |
+| 8 | RCX=18446744073709551615 | 520200 | 520200 | 520200 | yes | all 0xFF n=8: 8*255*255=520200 |
+| 9 | RCX=72623859790382856 | 64 | 64 | 64 | yes | 0x0102...0708: n=1 byte0=8 -> 64 |
+| 10 | RCX=1311768467463790320 | 57600 | 57600 | 57600 | yes | 0x12345...EF0: n=1 byte0=0xF0=240 -> 57600 |
+
+## Source
+
+```c
+/* PC-state VM that sums squared bytes over n = (x & 7) + 1 iterations:
+ *
+ *   n = (x & 7) + 1;
+ *   s = x; r = 0;
+ *   for (i = 0; i < n; i++) {
+ *     uint64_t b = s & 0xFF;
+ *     r = r + b * b;          // u8 squared
+ *     s >>= 8;
+ *   }
+ *   return r;
+ *
+ * Lift target: vm_bytesq_sum64_loop_target.
+ *
+ * Distinct from:
+ *   - vm_popsq64_loop           (sum of squared POPCOUNTS of bytes)
+ *   - vm_squareadd64_loop       (single-state r = r*r + i quadratic)
+ *   - vm_uintadd_byte_idx64_loop (byte * counter)
+ *
+ * Tests u8 self-multiply (b * b) accumulator across a byte stream.
+ * No counter scaling; every byte squared and summed.  All-0xFF input
+ * accumulates 8 * 255*255 = 8 * 65025 = 520200.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum BqVmPc {
+    BQ_INIT_ALL = 0,
+    BQ_CHECK    = 1,
+    BQ_BODY     = 2,
+    BQ_INC      = 3,
+    BQ_HALT     = 4,
+};
+
+__declspec(noinline)
+uint64_t vm_bytesq_sum64_loop_target(uint64_t x) {
+    uint64_t n  = 0;
+    uint64_t s  = 0;
+    uint64_t r  = 0;
+    uint64_t i  = 0;
+    int      pc = BQ_INIT_ALL;
+
+    while (1) {
+        if (pc == BQ_INIT_ALL) {
+            n = (x & 7ull) + 1ull;
+            s = x;
+            r = 0ull;
+            i = 0ull;
+            pc = BQ_CHECK;
+        } else if (pc == BQ_CHECK) {
+            pc = (i < n) ? BQ_BODY : BQ_HALT;
+        } else if (pc == BQ_BODY) {
+            uint64_t b = s & 0xFFull;
+            r = r + b * b;
+            s = s >> 8;
+            pc = BQ_INC;
+        } else if (pc == BQ_INC) {
+            i = i + 1ull;
+            pc = BQ_CHECK;
+        } else if (pc == BQ_HALT) {
+            return r;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_bytesq_sum64(0xCAFEBABE)=%llu\n",
+           (unsigned long long)vm_bytesq_sum64_loop_target(0xCAFEBABEull));
+    return 0;
+}
+```
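This sample and `vm_bytesq_idx_sum64_loop` differ only in the multiplier: the byte itself versus the squared loop counter. The sketch below puts the two recurrences side by side as plain loops. Both `ref_*` names are hypothetical and not part of the PR.

```c
#include <stdint.h>

/* Hypothetical references (assumption: not part of the PR). */

/* Sum of squared bytes: the multiplier is the byte itself. */
uint64_t ref_bytesq_sum64(uint64_t x) {
    uint64_t n = (x & 7u) + 1u, s = x, r = 0;
    for (uint64_t i = 0; i < n; i++) {
        uint64_t b = s & 0xFFu;
        r += b * b;
        s >>= 8;
    }
    return r;
}

/* Sum of byte * (i+1)^2: the multiplier is the squared loop counter. */
uint64_t ref_bytesq_idx_sum64(uint64_t x) {
    uint64_t n = (x & 7u) + 1u, s = x, r = 0;
    for (uint64_t i = 0; i < n; i++) {
        uint64_t c = i + 1u;
        r += (s & 0xFFu) * c * c;
        s >>= 8;
    }
    return r;
}
```

For the all-0xFF input the first gives 8 * 255 * 255 = 520200 and the second gives 255 * 204 = 52020, the values recorded in the two reports.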
@@ -0,0 +1,101 @@
+# vm_ca_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 12/12 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_ca_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_ca_loop.ll`
+- **Symbol:** `vm_ca_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_ca_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_ca_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | empty state |
+| 2 | RCX=1 | 1 | 1 | 1 | yes | n=0, state=1 |
+| 3 | RCX=257 | 2 | 2 | 2 | yes | n=1, state=1: left=2, right=0, state=2 |
+| 4 | RCX=513 | 5 | 5 | 5 | yes | n=2 |
+| 5 | RCX=769 | 8 | 8 | 8 | yes | n=3 |
+| 6 | RCX=1025 | 20 | 20 | 20 | yes | n=4 |
+| 7 | RCX=1793 | 128 | 128 | 128 | yes | n=7, state=1 |
+| 8 | RCX=256 | 0 | 0 | 0 | yes | n=1, state=0 |
+| 9 | RCX=24 | 24 | 24 | 24 | yes | n=0, state=0x18 |
+| 10 | RCX=280 | 60 | 60 | 60 | yes | n=1, state=0x18 |
+| 11 | RCX=1816 | 24 | 24 | 24 | yes | n=7, state=0x18 |
+| 12 | RCX=1877 | 170 | 170 | 170 | yes | n=7, state=0x55 |
+
+## Source
+
+```c
+/* PC-state VM that applies a Rule-90-like cellular automaton step to an
+ * 8-bit state.
+ * Lift target: vm_ca_loop_target.
+ * Goal: cover a single-state recurrence whose body combines a left-shift
+ * and a right-shift via XOR (state' = (state<<1) ^ (state>>1)).  Distinct
+ * from vm_lfsr_loop (single shift + conditional XOR) and vm_rotate_loop
+ * (shift+or for rotation): here the linear XOR couples both shift
+ * directions every iteration.
+ */
+#include <stdio.h>
+
+enum CaVmPc {
+    CA_LOAD       = 0,
+    CA_INIT       = 1,
+    CA_CHECK      = 2,
+    CA_BODY_LEFT  = 3,
+    CA_BODY_RIGHT = 4,
+    CA_BODY_XOR   = 5,
+    CA_BODY_MASK  = 6,
+    CA_BODY_DEC   = 7,
+    CA_HALT       = 8,
+};
+
+__declspec(noinline)
+int vm_ca_loop_target(int x) {
+    int state = 0;
+    int n     = 0;
+    int left  = 0;
+    int right = 0;
+    int pc    = CA_LOAD;
+
+    while (1) {
+        if (pc == CA_LOAD) {
+            state = x & 0xFF;
+            n = (x >> 8) & 7;
+            pc = CA_INIT;
+        } else if (pc == CA_INIT) {
+            pc = CA_CHECK;
+        } else if (pc == CA_CHECK) {
+            pc = (n > 0) ? CA_BODY_LEFT : CA_HALT;
+        } else if (pc == CA_BODY_LEFT) {
+            left = state << 1;
+            pc = CA_BODY_RIGHT;
+        } else if (pc == CA_BODY_RIGHT) {
+            right = (int)((unsigned)state >> 1);
+            pc = CA_BODY_XOR;
+        } else if (pc == CA_BODY_XOR) {
+            state = left ^ right;
+            pc = CA_BODY_MASK;
+        } else if (pc == CA_BODY_MASK) {
+            state = state & 0xFF;
+            pc = CA_BODY_DEC;
+        } else if (pc == CA_BODY_DEC) {
+            n = n - 1;
+            pc = CA_CHECK;
+        } else if (pc == CA_HALT) {
+            return state;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_ca_loop(0x701)=%d vm_ca_loop(0x755)=%d\n",
+           vm_ca_loop_target(0x701), vm_ca_loop_target(0x755));
+    return 0;
+}
+```
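The five body states (`CA_BODY_LEFT` through `CA_BODY_DEC`) collapse into a single expression per iteration. A minimal sketch of that collapsed form, assuming non-negative inputs and using the hypothetical name `ref_ca` (not part of the PR):

```c
/* Hypothetical reference for vm_ca_loop_target (assumption: not part of
 * the PR).  One XOR of both shift directions per step, masked to 8 bits. */
unsigned ref_ca(unsigned x) {
    unsigned state = x & 0xFFu;
    unsigned n = (x >> 8) & 7u;          /* step count in 0..7 */
    while (n > 0) {
        state = ((state << 1) ^ (state >> 1)) & 0xFFu;
        n--;
    }
    return state;
}
```

Starting from state 1 the sequence runs 1, 2, 5, 8, 20, 34, 85, 128, which reproduces rows 2 through 7 of the table.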
@@ -0,0 +1,120 @@
+# vm_caesar_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 12/12 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_caesar_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_caesar_loop.ll`
+- **Symbol:** `vm_caesar_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_caesar_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_caesar_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 92 | 92 | 92 | yes | shift=0 |
+| 2 | RCX=1 | 100 | 100 | 100 | yes | shift=0, x=1: buf shifts up by 1 each |
+| 3 | RCX=256 | 100 | 100 | 100 | yes | shift=1, x=0 |
+| 4 | RCX=257 | 108 | 108 | 108 | yes | shift=1, x=1 |
+| 5 | RCX=264 | 132 | 132 | 132 | yes | shift=1, x=8 |
+| 6 | RCX=272 | 100 | 100 | 100 | yes | shift=1, x=16 |
+| 7 | RCX=768 | 116 | 116 | 116 | yes | shift=3, x=0 |
+| 8 | RCX=2581 | 116 | 116 | 116 | yes | 0xA15: shift=10, x=0x15 |
+| 9 | RCX=3074 | 108 | 108 | 108 | yes | 0xC02: shift=12, x=2 |
+| 10 | RCX=7936 | 116 | 116 | 116 | yes | 0x1F00: shift=31, x=0 |
+| 11 | RCX=255 | 116 | 116 | 116 | yes | 0xFF: shift=0, x=0xFF |
+| 12 | RCX=4660 | 140 | 140 | 140 | yes | 0x1234 |
+
+## Source
+
+```c
+/* PC-state VM running an additive (Caesar-style) shift transform on a stack
+ * buffer.
+ * Lift target: vm_caesar_loop_target.
+ * Goal: cover a three-phase VM (fill, transform-in-place, sum) where the
+ * transformation is ADD+MASK rather than XOR.  Distinct from
+ * vm_xordecrypt_loop (XOR+sum).
+ */
+#include <stdio.h>
+
+enum CsVmPc {
+    CS_LOAD       = 0,
+    CS_INIT_FILL  = 1,
+    CS_FILL_CHECK = 2,
+    CS_FILL_BODY  = 3,
+    CS_FILL_INC   = 4,
+    CS_INIT_TX    = 5,
+    CS_TX_CHECK   = 6,
+    CS_TX_BODY    = 7,
+    CS_TX_INC     = 8,
+    CS_INIT_SUM   = 9,
+    CS_SUM_CHECK  = 10,
+    CS_SUM_BODY   = 11,
+    CS_SUM_INC    = 12,
+    CS_HALT       = 13,
+};
+
+__declspec(noinline)
+int vm_caesar_loop_target(int x) {
+    int buf[8];
+    int idx     = 0;
+    int shift   = 0;
+    int byte    = 0;
+    int sum     = 0;
+    int pc      = CS_LOAD;
+
+    while (1) {
+        if (pc == CS_LOAD) {
+            shift = (x >> 8) & 0x1F;
+            sum = 0;
+            pc = CS_INIT_FILL;
+        } else if (pc == CS_INIT_FILL) {
+            idx = 0;
+            pc = CS_FILL_CHECK;
+        } else if (pc == CS_FILL_CHECK) {
+            pc = (idx < 8) ? CS_FILL_BODY : CS_INIT_TX;
+        } else if (pc == CS_FILL_BODY) {
+            buf[idx] = (x + idx * 0x11) & 0x1F;
+            pc = CS_FILL_INC;
+        } else if (pc == CS_FILL_INC) {
+            idx = idx + 1;
+            pc = CS_FILL_CHECK;
+        } else if (pc == CS_INIT_TX) {
+            idx = 0;
+            pc = CS_TX_CHECK;
+        } else if (pc == CS_TX_CHECK) {
+            pc = (idx < 8) ? CS_TX_BODY : CS_INIT_SUM;
+        } else if (pc == CS_TX_BODY) {
+            byte = buf[idx];
+            buf[idx] = (byte + shift) & 0x1F;
+            pc = CS_TX_INC;
+        } else if (pc == CS_TX_INC) {
+            idx = idx + 1;
+            pc = CS_TX_CHECK;
+        } else if (pc == CS_INIT_SUM) {
+            idx = 0;
+            pc = CS_SUM_CHECK;
+        } else if (pc == CS_SUM_CHECK) {
+            pc = (idx < 8) ? CS_SUM_BODY : CS_HALT;
+        } else if (pc == CS_SUM_BODY) {
+            sum = sum + buf[idx];
+            pc = CS_SUM_INC;
+        } else if (pc == CS_SUM_INC) {
+            idx = idx + 1;
+            pc = CS_SUM_CHECK;
+        } else if (pc == CS_HALT) {
+            return sum;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_caesar_loop(0x108)=%d vm_caesar_loop(0x1234)=%d\n",
+           vm_caesar_loop_target(0x108), vm_caesar_loop_target(0x1234));
+    return 0;
+}
+```
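Because masking with `0x1F` is reduction modulo 32, the fill and transform phases compose: `(((x + 17*idx) & 31) + shift) & 31` equals `(x + 17*idx + shift) & 31`. The sketch below uses that identity to compute the same sum in one pass without the stack buffer. It is an illustration only: `ref_caesar` is a hypothetical name, and the code assumes small non-negative `x` so the `int` additions cannot overflow.

```c
/* Hypothetical single-pass reference for vm_caesar_loop_target
 * (assumption: not part of the PR; x is small and non-negative). */
int ref_caesar(int x) {
    int shift = (x >> 8) & 0x1F;
    int sum = 0;
    for (int idx = 0; idx < 8; idx++) {
        /* fill + transform fused: both masks are mod 32 */
        sum += (x + idx * 0x11 + shift) & 0x1F;
    }
    return sum;
}
```

The sample itself keeps the three loops and the buffer separate, since the store/load round trip through `buf` is the behaviour under test.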
@@ -0,0 +1,117 @@
+# vm_carrychain_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 11/11 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_carrychain_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_carrychain_loop.ll`
+- **Symbol:** `vm_carrychain_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_carrychain_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_carrychain_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | a=0,b=0 |
+| 2 | RCX=1 | 1 | 1 | 1 | yes | a=1,b=0 |
+| 3 | RCX=256 | 1 | 1 | 1 | yes | a=0,b=1 |
+| 4 | RCX=257 | 2 | 2 | 2 | yes | a=1,b=1: 1+1=2 |
+| 5 | RCX=258 | 3 | 3 | 3 | yes | a=2,b=1 |
+| 6 | RCX=65535 | 510 | 510 | 510 | yes | a=0xFF,b=0xFF: 510 with carry |
+| 7 | RCX=3855 | 30 | 30 | 30 | yes | a=0x0F,b=0x0F |
+| 8 | RCX=61680 | 480 | 480 | 480 | yes | a=0xF0,b=0xF0: carry-out |
+| 9 | RCX=43605 | 255 | 255 | 255 | yes | a=0x55,b=0xAA: 0xFF |
+| 10 | RCX=33023 | 383 | 383 | 383 | yes | a=0xFF,b=0x80: carry |
+| 11 | RCX=128 | 128 | 128 | 128 | yes | a=0x80,b=0 |
+
+## Source
+
+```c
+/* PC-state VM running an 8-bit ripple-carry adder bit-by-bit.
+ * Lift target: vm_carrychain_loop_target.
+ * Goal: cover a fixed-trip-count loop where each iteration depends on the
+ * carry produced in the previous iteration (sequential dependency that
+ * cannot be parallelised by the optimizer).  Inputs a = x & 0xFF and
+ * b = (x >> 8) & 0xFF, output is (a+b) packed as low byte | (carry<<8).
+ */
+#include <stdio.h>
+
+enum CcVmPc {
+    CC_LOAD     = 0,
+    CC_INIT     = 1,
+    CC_CHECK    = 2,
+    CC_BODY_BA  = 3,
+    CC_BODY_BB  = 4,
+    CC_BODY_SUM = 5,
+    CC_BODY_NC  = 6,
+    CC_BODY_OR  = 7,
+    CC_BODY_INC = 8,
+    CC_PACK     = 9,
+    CC_HALT     = 10,
+};
+
+__declspec(noinline)
+int vm_carrychain_loop_target(int x) {
+    int a       = 0;
+    int b       = 0;
+    int i       = 0;
+    int carry   = 0;
+    int result  = 0;
+    int ba      = 0;
+    int bb      = 0;
+    int bs      = 0;
+    int nc      = 0;
+    int xor_ab  = 0;
+    int pc      = CC_LOAD;
+
+    while (1) {
+        if (pc == CC_LOAD) {
+            a = x & 0xFF;
+            b = (x >> 8) & 0xFF;
+            i = 0;
+            carry = 0;
+            result = 0;
+            pc = CC_INIT;
+        } else if (pc == CC_INIT) {
+            pc = CC_CHECK;
+        } else if (pc == CC_CHECK) {
+            pc = (i < 8) ? CC_BODY_BA : CC_PACK;
+        } else if (pc == CC_BODY_BA) {
+            ba = (a >> i) & 1;
+            pc = CC_BODY_BB;
+        } else if (pc == CC_BODY_BB) {
+            bb = (b >> i) & 1;
+            pc = CC_BODY_SUM;
+        } else if (pc == CC_BODY_SUM) {
+            xor_ab = ba ^ bb;
+            bs = xor_ab ^ carry;
+            pc = CC_BODY_NC;
+        } else if (pc == CC_BODY_NC) {
+            nc = (ba & bb) | (carry & xor_ab);
+            pc = CC_BODY_OR;
+        } else if (pc == CC_BODY_OR) {
+            result = result | (bs << i);
+            carry = nc;
+            pc = CC_BODY_INC;
+        } else if (pc == CC_BODY_INC) {
+            i = i + 1;
+            pc = CC_CHECK;
+        } else if (pc == CC_PACK) {
+            result = result | (carry << 8);
+            pc = CC_HALT;
+        } else if (pc == CC_HALT) {
+            return result;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_carrychain_loop(0xFFFF)=%d vm_carrychain_loop(0xAA55)=%d\n",
+           vm_carrychain_loop_target(0xFFFF), vm_carrychain_loop_target(0xAA55));
+    return 0;
+}
+```
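The packed output `low byte | (carry << 8)` is the 9-bit sum `a + b`, which is why every row of the table equals the plain sum of the two operand bytes. A minimal sketch that checks this exhaustively over all 65536 operand pairs follows. Both function names are hypothetical and not part of the PR.

```c
/* Hypothetical reference for vm_carrychain_loop_target (assumption: not
 * part of the PR).  Bit-serial full adder over 8 bits. */
unsigned ref_carrychain(unsigned x) {
    unsigned a = x & 0xFFu, b = (x >> 8) & 0xFFu;
    unsigned carry = 0, result = 0;
    for (unsigned i = 0; i < 8; i++) {
        unsigned ba = (a >> i) & 1u, bb = (b >> i) & 1u;
        unsigned xor_ab = ba ^ bb;
        result |= (xor_ab ^ carry) << i;          /* sum bit */
        carry = (ba & bb) | (carry & xor_ab);     /* carry into bit i+1 */
    }
    return result | (carry << 8);                 /* pack carry-out */
}

/* Counts inputs in 0..0xFFFF where the ripple result differs from a + b. */
unsigned count_carrychain_mismatches(void) {
    unsigned bad = 0;
    for (unsigned x = 0; x <= 0xFFFFu; x++) {
        unsigned a = x & 0xFFu, b = (x >> 8) & 0xFFu;
        if (ref_carrychain(x) != a + b) bad++;
    }
    return bad;
}
```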
@@ -0,0 +1,96 @@
+# vm_choosemax64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_choosemax64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_choosemax64_loop.ll`
+- **Symbol:** `vm_choosemax64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_choosemax64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_choosemax64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | x=0, n=1 |
+| 2 | RCX=1 | 10 | 10 | 10 | yes | x=1, n=2 |
+| 3 | RCX=2 | 59 | 59 | 59 | yes | x=2, n=3 |
+| 4 | RCX=7 | 47563 | 47563 | 47563 | yes | x=7, n=8 |
+| 5 | RCX=15 | 656462487 | 656462487 | 656462487 | yes | x=15, n=16 max |
+| 6 | RCX=255 | 10987675527 | 10987675527 | 10987675527 | yes | x=0xFF, n=16 |
+| 7 | RCX=51966 | 745658888381 | 745658888381 | 745658888381 | yes | x=0xCAFE, n=15 |
+| 8 | RCX=3405691582 | 48867951784388093 | 48867951784388093 | 48867951784388093 | yes | x=0xCAFEBABE, n=15 |
+| 9 | RCX=18446744073709551615 | 18446744073709551493 | 18446744073709551493 | 18446744073709551493 | yes | max u64, n=16: wraps and opt2 wins many iters |
+| 10 | RCX=11400714819323198485 | 15755400384260043894 | 15755400384260043894 | 15755400384260043894 | yes | K (golden), n=6 |
+
+## Source
+
+```c
+/* PC-state VM that picks the larger (unsigned) of two derived options
+ * per iteration on full uint64_t state.
+ *   s = x; n = (x & 0xF) + 1;
+ *   for i in 0..n:
+ *     opt1 = s * 3 + i
+ *     opt2 = s + i*i
+ *     s = (opt1 > opt2) ? opt1 : opt2
+ *   return s;
+ * Lift target: vm_choosemax64_loop_target.
+ *
+ * Distinct from vm_smax64_loop (signed-max accumulator over derived
+ * sequence) and vm_satadd64_loop (overflow-clamp): per-iteration choice
+ * between two locally-computed options via icmp ugt + select.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum CmVmPc {
+    CM_LOAD       = 0,
+    CM_INIT       = 1,
+    CM_LOOP_CHECK = 2,
+    CM_LOOP_BODY  = 3,
+    CM_LOOP_INC   = 4,
+    CM_HALT       = 5,
+};
+
+__declspec(noinline)
+uint64_t vm_choosemax64_loop_target(uint64_t x) {
+    int      idx = 0;
+    int      n   = 0;
+    uint64_t s   = 0;
+    int      pc  = CM_LOAD;
+
+    while (1) {
+        if (pc == CM_LOAD) {
+            s = x;
+            n = (int)(x & 0xFull) + 1;
+            pc = CM_INIT;
+        } else if (pc == CM_INIT) {
+            idx = 0;
+            pc = CM_LOOP_CHECK;
+        } else if (pc == CM_LOOP_CHECK) {
+            pc = (idx < n) ? CM_LOOP_BODY : CM_HALT;
+        } else if (pc == CM_LOOP_BODY) {
+            uint64_t opt1 = s * 3ull + (uint64_t)idx;
+            uint64_t opt2 = s + (uint64_t)(idx * idx);
+            s = (opt1 > opt2) ? opt1 : opt2;
+            pc = CM_LOOP_INC;
+        } else if (pc == CM_LOOP_INC) {
+            idx = idx + 1;
+            pc = CM_LOOP_CHECK;
+        } else if (pc == CM_HALT) {
+            return s;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_choosemax64(0xCAFE)=%llu vm_choosemax64(0xFF)=%llu\n",
+           (unsigned long long)vm_choosemax64_loop_target(0xCAFEull),
+           (unsigned long long)vm_choosemax64_loop_target(0xFFull));
+    return 0;
+}
+```
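The pseudocode in the header comment translates directly to a counted loop. A minimal sketch, with `ref_choosemax64` as a hypothetical name that is not part of the PR:

```c
#include <stdint.h>

/* Hypothetical reference for vm_choosemax64_loop_target (assumption: not
 * part of the PR).  Unsigned max of two options per iteration. */
uint64_t ref_choosemax64(uint64_t x) {
    uint64_t s = x;
    int n = (int)(x & 0xFu) + 1;                  /* trip count in 1..16 */
    for (int i = 0; i < n; i++) {
        uint64_t opt1 = s * 3u + (uint64_t)i;     /* may wrap modulo 2^64 */
        uint64_t opt2 = s + (uint64_t)(i * i);
        s = (opt1 > opt2) ? opt1 : opt2;          /* unsigned compare */
    }
    return s;
}
```

For x=7 the state runs 21, 64, 194, 585, 1759, 5282, 15852, 47563, so `opt1` wins every iteration until the multiply starts to wrap on larger inputs.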
@@ -0,0 +1,107 @@
+# vm_classify_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_classify_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_classify_loop.ll`
+- **Symbol:** `vm_classify_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_classify_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_classify_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 10 | 10 | 10 | yes | n=1, v=-7: 1 neg |
+| 2 | RCX=1 | 20 | 20 | 20 | yes | n=2, all neg |
+| 3 | RCX=7 | 71 | 71 | 71 | yes | n=8, 1 zero + 7 neg |
+| 4 | RCX=8 | 100 | 100 | 100 | yes | n=1, v=1: 1 pos |
+| 5 | RCX=119 | 62 | 62 | 62 | yes | n=8, 6 neg + 2 zero |
+| 6 | RCX=136 | 100 | 100 | 100 | yes | n=1, v=1 |
+| 7 | RCX=240 | 10 | 10 | 10 | yes | n=1, v=-7 |
+| 8 | RCX=255 | 260 | 260 | 260 | yes | n=8, 2 pos + 6 neg |
+| 9 | RCX=291 | 40 | 40 | 40 | yes | n=4, all neg |
+| 10 | RCX=11259375 | 620 | 620 | 620 | yes | 0xABCDEF: n=8, 6 pos + 2 neg |
+
+## Source
+
+```c
+/* PC-state VM with a three-way branch in the loop body (sign classifier).
+ * Lift target: vm_classify_loop_target.
+ * Goal: cover a loop body that splits to one of three handlers and merges
+ * back, with each handler adding a different constant into a single packed
+ * accumulator (avoids the multi-counter phi-undef regression seen with
+ * separate pos/neg/zer slots in the early-halt path).  Result encodes
+ * pos*100 + neg*10 + zer.
+ */
+#include <stdio.h>
+
+enum ClsVmPc {
+    CL_LOAD       = 0,
+    CL_INIT       = 1,
+    CL_CHECK      = 2,
+    CL_BODY_LOAD  = 3,
+    CL_BODY_TEST_POS = 4,
+    CL_BODY_TEST_ZERO = 5,
+    CL_ADD_POS    = 6,
+    CL_ADD_NEG    = 7,
+    CL_ADD_ZER    = 8,
+    CL_BODY_INC   = 9,
+    CL_HALT       = 10,
+};
+
+__declspec(noinline)
+int vm_classify_loop_target(int x) {
+    int n      = 0;
+    int idx    = 0;
+    int acc    = 0;
+    int v      = 0;
+    int shift  = 0;
+    int pc     = CL_LOAD;
+
+    while (1) {
+        if (pc == CL_LOAD) {
+            n = (x & 7) + 1;
+            idx = 0;
+            acc = 0;
+            pc = CL_INIT;
+        } else if (pc == CL_INIT) {
+            pc = CL_CHECK;
+        } else if (pc == CL_CHECK) {
+            pc = (idx < n) ? CL_BODY_LOAD : CL_HALT;
+        } else if (pc == CL_BODY_LOAD) {
+            shift = idx * 4;
+            v = ((x >> shift) & 0xF) - 7;
+            pc = CL_BODY_TEST_POS;
+        } else if (pc == CL_BODY_TEST_POS) {
+            pc = (v > 0) ? CL_ADD_POS : CL_BODY_TEST_ZERO;
+        } else if (pc == CL_BODY_TEST_ZERO) {
+            pc = (v == 0) ? CL_ADD_ZER : CL_ADD_NEG;
+        } else if (pc == CL_ADD_POS) {
+            acc = acc + 100;
+            pc = CL_BODY_INC;
+        } else if (pc == CL_ADD_NEG) {
+            acc = acc + 10;
+            pc = CL_BODY_INC;
+        } else if (pc == CL_ADD_ZER) {
+            acc = acc + 1;
+            pc = CL_BODY_INC;
+        } else if (pc == CL_BODY_INC) {
+            idx = idx + 1;
+            pc = CL_CHECK;
+        } else if (pc == CL_HALT) {
+            return acc;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_classify_loop(0x77)=%d vm_classify_loop(0xFF)=%d\n",
+           vm_classify_loop_target(0x77), vm_classify_loop_target(0xFF));
+    return 0;
+}
+```
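
Stripped of its PC dispatch, the VM above is an ordinary counted loop with a three-way branch in the body. The sketch below is a hypothetical straight-line reference (`classify_ref` is not part of the test suite) that can be used to re-derive the table by hand:

```c
#include <assert.h>

/* Hypothetical reference for vm_classify_loop_target: same arithmetic,
 * no PC-state machine. */
int classify_ref(int x) {
    int n = (x & 7) + 1;   /* trip count in [1, 8] */
    int acc = 0;
    for (int idx = 0; idx < n; idx++) {
        /* Nibble idx of x, recentred from [0, 15] to [-7, 8]. */
        int v = ((x >> (idx * 4)) & 0xF) - 7;
        if (v > 0)       acc += 100;  /* positive handler */
        else if (v == 0) acc += 1;    /* zero handler */
        else             acc += 10;   /* negative handler */
    }
    return acc;
}

/* Spot-check against rows of the equivalence table. */
void classify_ref_check(void) {
    assert(classify_ref(0) == 10);          /* n=1, v=-7 */
    assert(classify_ref(7) == 71);          /* 1 zero + 7 neg */
    assert(classify_ref(0x77) == 62);       /* 2 zero + 6 neg */
    assert(classify_ref(255) == 260);       /* 2 pos + 6 neg */
    assert(classify_ref(0xABCDEF) == 620);  /* 6 pos + 2 neg */
}
```

Because `n` is at most 8, each digit of the packed result stays below 10, so the positive, negative, and zero counts can be read directly from the hundreds, tens, and ones places.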
@@ -0,0 +1,93 @@
+# vm_clz64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_clz64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_clz64_loop.ll`
+- **Symbol:** `vm_clz64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_clz64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_clz64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 64 | 64 | 64 | yes | x=0: special-case 64 |
+| 2 | RCX=1 | 63 | 63 | 63 | yes | x=1: 63 leading zeros (max trip) |
+| 3 | RCX=2 | 62 | 62 | 62 | yes | x=2: 62 |
+| 4 | RCX=128 | 56 | 56 | 56 | yes | x=0x80: 56 |
+| 5 | RCX=65536 | 47 | 47 | 47 | yes | x=0x10000: 47 |
+| 6 | RCX=4294967296 | 31 | 31 | 31 | yes | x=2^32: 31 |
+| 7 | RCX=9223372036854775808 | 0 | 0 | 0 | yes | x=2^63: 0 (MSB set) |
+| 8 | RCX=3405691582 | 32 | 32 | 32 | yes | x=0xCAFEBABE: 32 |
+| 9 | RCX=18446744073709551615 | 0 | 0 | 0 | yes | max u64: 0 |
+| 10 | RCX=11400714819323198485 | 0 | 0 | 0 | yes | x=K (golden, MSB set): 0 |
+
+## Source
+
+```c
+/* PC-state VM running an i64 count-leading-zeros via shift-loop.
+ *   if (x == 0) return 64;
+ *   count = 0;
+ *   while ((x & 0x8000000000000000) == 0) { x <<= 1; count++; }
+ *   return count;
+ * Variable trip count 0..63; x == 0 short-circuits to a result of 64.
+ * Lift target: vm_clz64_loop_target.
+ *
+ * Companion to vm_cttz64_loop (which counts trailing zeros via shift-right).
+ * Distinct from vm_imported_clz_loop (i32 _BitScanReverse intrinsic):
+ * exercises explicit shift-left + MSB-test on full i64 in a variable-trip loop.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum ClVmPc {
+    CL_LOAD       = 0,
+    CL_INIT       = 1,
+    CL_ZERO_CHECK = 2,
+    CL_LOOP_CHECK = 3,
+    CL_LOOP_BODY  = 4,
+    CL_HALT       = 5,
+};
+
+__declspec(noinline)
+int vm_clz64_loop_target(uint64_t x) {
+    uint64_t state = 0;
+    int      count = 0;
+    int      pc    = CL_LOAD;
+
+    while (1) {
+        if (pc == CL_LOAD) {
+            state = x;
+            count = 0;
+            pc = CL_ZERO_CHECK;
+        } else if (pc == CL_ZERO_CHECK) {
+            if (state == 0ull) {
+                count = 64;
+                pc = CL_HALT;
+            } else {
+                pc = CL_LOOP_CHECK;
+            }
+        } else if (pc == CL_LOOP_CHECK) {
+            pc = ((state & 0x8000000000000000ull) == 0ull) ? CL_LOOP_BODY : CL_HALT;
+        } else if (pc == CL_LOOP_BODY) {
+            state = state << 1;
+            count = count + 1;
+            pc = CL_LOOP_CHECK;
+        } else if (pc == CL_HALT) {
+            return count;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_clz64(1)=%d vm_clz64(0x8000000000000000)=%d\n",
+           vm_clz64_loop_target(1ull),
+           vm_clz64_loop_target(0x8000000000000000ull));
+    return 0;
+}
+```
@@ -0,0 +1,86 @@
+# vm_collatz64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_collatz64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_collatz64_loop.ll`
+- **Symbol:** `vm_collatz64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_collatz64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_collatz64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=1 | 0 | 0 | 0 | yes | x=1: zero steps |
+| 2 | RCX=2 | 1 | 1 | 1 | yes | x=2: one halving |
+| 3 | RCX=3 | 7 | 7 | 7 | yes | x=3: 7 steps |
+| 4 | RCX=6 | 8 | 8 | 8 | yes | x=6: 8 steps |
+| 5 | RCX=27 | 111 | 111 | 111 | yes | x=27: classic 111-step Collatz |
+| 6 | RCX=51966 | 171 | 171 | 171 | yes | x=0xCAFE |
+| 7 | RCX=4294967296 | 32 | 32 | 32 | yes | x=2^32: 32 halvings |
+| 8 | RCX=18446744073709551614 | 618 | 618 | 618 | yes | max u64 - 1: 618 steps incl. mul-wrap |
+| 9 | RCX=9223372036854775808 | 63 | 63 | 63 | yes | x=2^63: 63 halvings |
+| 10 | RCX=11400714819323198485 | 414 | 414 | 414 | yes | x=K (golden ratio): 414 steps |
+
+## Source
+
+```c
+/* PC-state VM running the Collatz sequence on a FULL uint64_t state.
+ *   while (state != 1) { state = (state & 1) ? 3*state + 1 : state / 2; count++; }
+ * Trip count is data-dependent on the input.  3*x+1 wraps mod 2^64 for
+ * very large inputs; every input in the test table still reaches 1.
+ * Lift target: vm_collatz64_loop_target.
+ *
+ * Distinct from vm_collatz_loop (i32 Collatz): exercises the same
+ * algorithm shape on full 64-bit state with i64 udiv (lshr-by-1) and
+ * i64 mul-by-3 + add operations.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum C64VmPc {
+    C64_LOAD       = 0,
+    C64_LOOP_CHECK = 1,
+    C64_LOOP_BODY  = 2,
+    C64_HALT       = 3,
+};
+
+__declspec(noinline)
+int vm_collatz64_loop_target(uint64_t x) {
+    uint64_t state = 0;
+    int      count = 0;
+    int      pc    = C64_LOAD;
+
+    while (1) {
+        if (pc == C64_LOAD) {
+            state = x;
+            count = 0;
+            pc = C64_LOOP_CHECK;
+        } else if (pc == C64_LOOP_CHECK) {
+            pc = (state != 1ull) ? C64_LOOP_BODY : C64_HALT;
+        } else if (pc == C64_LOOP_BODY) {
+            if ((state & 1ull) == 0ull) {
+                state = state >> 1;
+            } else {
+                state = state * 3ull + 1ull;
+            }
+            count = count + 1;
+            pc = C64_LOOP_CHECK;
+        } else if (pc == C64_HALT) {
+            return count;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_collatz64(27)=%d vm_collatz64(0xCAFE)=%d\n",
+           vm_collatz64_loop_target(27ull),
+           vm_collatz64_loop_target(0xCAFEull));
+    return 0;
+}
+```
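
The wrap-around mentioned in the source comment can be seen in a single step. The sketch below is a hypothetical straight-line reference (`collatz64_step` and `collatz64_ref` are illustrative names) that relies on C's defined modulo-2^64 behaviour for `uint64_t`:

```c
#include <assert.h>
#include <stdint.h>

/* One step of the recurrence. Unsigned arithmetic wraps mod 2^64. */
uint64_t collatz64_step(uint64_t s) {
    return (s & 1ull) ? s * 3ull + 1ull : s >> 1;
}

/* Hypothetical reference: number of steps until the state equals 1.
 * Does not terminate for x == 0, since 0 >> 1 stays 0. */
int collatz64_ref(uint64_t x) {
    int count = 0;
    while (x != 1ull) {
        x = collatz64_step(x);
        count++;
    }
    return count;
}

void collatz64_ref_check(void) {
    assert(collatz64_ref(1) == 0);
    assert(collatz64_ref(3) == 7);
    assert(collatz64_ref(27) == 111);
    assert(collatz64_ref(1ull << 32) == 32);
    /* Wrap example: 3*(2^63 - 1) + 1 = 2^64 + 2^63 - 2,
     * which reduces to 2^63 - 2 modulo 2^64. */
    assert(collatz64_step(0x7FFFFFFFFFFFFFFFull) == 0x7FFFFFFFFFFFFFFEull);
}
```

The wrap example is exactly the second step for the `max u64 - 1` row: the first halving gives 2^63 - 1, which is odd, and the multiply then overflows.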
@@ -0,0 +1,86 @@
+# vm_collatz_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 8/8 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_collatz_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_collatz_loop.ll`
+- **Symbol:** `vm_collatz_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_collatz_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_collatz_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | n=1: already done |
+| 2 | RCX=1 | 1 | 1 | 1 | yes | n=2: 2->1 |
+| 3 | RCX=2 | 7 | 7 | 7 | yes | n=3: 7 steps |
+| 4 | RCX=3 | 2 | 2 | 2 | yes | n=4: 4->2->1 |
+| 5 | RCX=4 | 5 | 5 | 5 | yes | n=5: 5 steps |
+| 6 | RCX=5 | 8 | 8 | 8 | yes | n=6: 8 steps |
+| 7 | RCX=6 | 16 | 16 | 16 | yes | n=7: 16 steps |
+| 8 | RCX=7 | 3 | 3 | 3 | yes | n=8: 3 steps |
+
+## Source
+
+```c
+/* PC-state VM running a Collatz step counter.
+ * Lift target: vm_collatz_loop_target.
+ * Goal: data-dependent control flow inside the VM loop body (parity test
+ * picks the divide-by-two or 3n+1 handler).  The loop terminates when n
+ * reaches 1 - the iteration count itself is the return value.  Input is
+ * mapped to (x & 7) + 1 so n stays in [1, 8] and the trip count is bounded
+ * (max 16 for n=7) while remaining symbolic.
+ */
+#include <stdio.h>
+
+enum CollatzVmPc {
+    CV_INIT       = 0,
+    CV_LOAD_N     = 1,
+    CV_CHECK_DONE = 2,
+    CV_TEST_PARITY= 3,
+    CV_EVEN_HALVE = 4,
+    CV_ODD_3N1    = 5,
+    CV_INC_STEPS  = 6,
+    CV_HALT       = 7,
+};
+
+__declspec(noinline)
+int vm_collatz_loop_target(int x) {
+    int n     = 0;
+    int steps = 0;
+    int pc    = CV_LOAD_N;
+
+    while (1) {
+        if (pc == CV_LOAD_N) {
+            n = (x & 7) + 1;
+            pc = CV_CHECK_DONE;
+        } else if (pc == CV_CHECK_DONE) {
+            pc = (n != 1) ? CV_TEST_PARITY : CV_HALT;
+        } else if (pc == CV_TEST_PARITY) {
+            pc = ((n & 1) == 0) ? CV_EVEN_HALVE : CV_ODD_3N1;
+        } else if (pc == CV_EVEN_HALVE) {
+            n = n / 2;
+            pc = CV_INC_STEPS;
+        } else if (pc == CV_ODD_3N1) {
+            n = 3 * n + 1;
+            pc = CV_INC_STEPS;
+        } else if (pc == CV_INC_STEPS) {
+            steps = steps + 1;
+            pc = CV_CHECK_DONE;
+        } else if (pc == CV_HALT) {
+            return steps;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_collatz_loop(2)=%d vm_collatz_loop(6)=%d\n",
+           vm_collatz_loop_target(2), vm_collatz_loop_target(6));
+    return 0;
+}
+```
@@ -0,0 +1,98 @@
+# vm_condsum64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_condsum64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_condsum64_loop.ll`
+- **Symbol:** `vm_condsum64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_condsum64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_condsum64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | x=0, n=1: val=0 even, no accumulate |
+| 2 | RCX=1 | 1 | 1 | 1 | yes | x=1, n=2 |
+| 3 | RCX=2 | 11400714819323198487 | 11400714819323198487 | 11400714819323198487 | yes | x=2, n=3 |
+| 4 | RCX=31 | 6053433728553997728 | 6053433728553997728 | 6053433728553997728 | yes | x=0x1F, n=32 max |
+| 5 | RCX=255 | 6053433728554001312 | 6053433728554001312 | 6053433728554001312 | yes | x=0xFF, n=32 |
+| 6 | RCX=51966 | 1063408102092763991 | 1063408102092763991 | 1063408102092763991 | yes | 0xCAFE, n=31 |
+| 7 | RCX=3405691582 | 1063408153177358231 | 1063408153177358231 | 1063408153177358231 | yes | 0xCAFEBABE, n=31 |
+| 8 | RCX=1311768467463790320 | 2270133228012960960 | 2270133228012960960 | 2270133228012960960 | yes | 0x123...DEF0, n=17 |
+| 9 | RCX=18446744073709551615 | 6053433728553997216 | 6053433728553997216 | 6053433728553997216 | yes | max u64, n=32 |
+| 10 | RCX=11400714819323198485 | 14427431683600197101 | 14427431683600197101 | 14427431683600197101 | yes | K (golden), n=22 |
+
+## Source
+
+```c
+/* PC-state VM that conditionally sums values (only when the value is
+ * odd) over a derived sequence.
+ *   s = 0; n = (x & 0x1F) + 1;
+ *   for i in 0..n:
+ *     val = x + i * K_golden
+ *     if (val & 1) s = s + val
+ *   return s;
+ * Lift target: vm_condsum64_loop_target.
+ *
+ * Distinct from vm_smax64_loop (always-update via icmp sgt) and
+ * vm_satadd64_loop (overflow-clamp): the body GATES the accumulator
+ * on a parity bit-test, so some iterations contribute zero.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum CsVmPc {
+    CS_LOAD       = 0,
+    CS_INIT       = 1,
+    CS_LOOP_CHECK = 2,
+    CS_LOOP_BODY  = 3,
+    CS_LOOP_INC   = 4,
+    CS_HALT       = 5,
+};
+
+__declspec(noinline)
+uint64_t vm_condsum64_loop_target(uint64_t x) {
+    int      idx = 0;
+    int      n   = 0;
+    uint64_t xx  = 0;
+    uint64_t s   = 0;
+    int      pc  = CS_LOAD;
+
+    while (1) {
+        if (pc == CS_LOAD) {
+            xx = x;
+            n  = (int)(x & 0x1Full) + 1;
+            s  = 0ull;
+            pc = CS_INIT;
+        } else if (pc == CS_INIT) {
+            idx = 0;
+            pc = CS_LOOP_CHECK;
+        } else if (pc == CS_LOOP_CHECK) {
+            pc = (idx < n) ? CS_LOOP_BODY : CS_HALT;
+        } else if (pc == CS_LOOP_BODY) {
+            uint64_t val = xx + (uint64_t)idx * 0x9E3779B97F4A7C15ull;
+            if ((val & 1ull) != 0ull) {
+                s = s + val;
+            }
+            pc = CS_LOOP_INC;
+        } else if (pc == CS_LOOP_INC) {
+            idx = idx + 1;
+            pc = CS_LOOP_CHECK;
+        } else if (pc == CS_HALT) {
+            return s;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_condsum64(0xCAFE)=%llu vm_condsum64(0xFF)=%llu\n",
+           (unsigned long long)vm_condsum64_loop_target(0xCAFEull),
+           (unsigned long long)vm_condsum64_loop_target(0xFFull));
+    return 0;
+}
+```
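
Since the golden-ratio constant is odd, the parity of `val` alternates with `idx`: for even `x` only odd indices contribute, and for odd `x` only even ones. A hypothetical straight-line reference (`condsum64_ref` and `K_GOLDEN` are illustrative names) makes the first three table rows easy to check by hand:

```c
#include <assert.h>
#include <stdint.h>

#define K_GOLDEN 0x9E3779B97F4A7C15ull  /* odd, so x + i*K flips parity with i */

/* Hypothetical reference for vm_condsum64_loop_target. */
uint64_t condsum64_ref(uint64_t x) {
    int n = (int)(x & 0x1Full) + 1;   /* trip count in [1, 32] */
    uint64_t s = 0;
    for (int i = 0; i < n; i++) {
        uint64_t val = x + (uint64_t)i * K_GOLDEN;  /* wraps mod 2^64 */
        if (val & 1ull) {
            s += val;                 /* only odd values are accumulated */
        }
    }
    return s;
}

void condsum64_ref_check(void) {
    assert(condsum64_ref(0) == 0);             /* n=1: val=0 is even */
    assert(condsum64_ref(1) == 1);             /* n=2: only i=0 is odd */
    assert(condsum64_ref(2) == K_GOLDEN + 2);  /* n=3: only i=1 is odd */
}
```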
@@ -0,0 +1,84 @@
+# vm_countdown_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 8/8 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_countdown_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_countdown_loop.ll`
+- **Symbol:** `vm_countdown_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_countdown_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_countdown_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | count=0: empty sum |
+| 2 | RCX=1 | 1 | 1 | 1 | yes | count=1: T(1) |
+| 3 | RCX=2 | 3 | 3 | 3 | yes | count=2: T(2) |
+| 4 | RCX=5 | 15 | 15 | 15 | yes | count=5: T(5) |
+| 5 | RCX=10 | 55 | 55 | 55 | yes | count=10: T(10) |
+| 6 | RCX=15 | 120 | 120 | 120 | yes | count=15: T(15) |
+| 7 | RCX=16 | 0 | 0 | 0 | yes | count=0 again (mask drops bit 4) |
+| 8 | RCX=255 | 120 | 120 | 120 | yes | count=15 again after mask |
+
+## Source
+
+```c
+/* PC-state VM with a reverse-induction counted loop.
+ * Lift target: vm_countdown_loop_target.
+ * Goal: exercise loop detection for a loop whose induction variable *decreases*
+ * and whose bound is a symbolic countdown rather than a rising compare.
+ * Computes the triangular number sum(1..n) where n = x & 0xF, but builds it
+ * by counting down from n to 1 instead of up.
+ */
+#include <stdio.h>
+
+enum CdVmPc {
+    CD_INIT       = 0,
+    CD_LOAD_COUNT = 1,
+    CD_INIT_SUM   = 2,
+    CD_CHECK      = 3,
+    CD_BODY_ADD   = 4,
+    CD_BODY_DEC   = 5,
+    CD_HALT       = 6,
+};
+
+__declspec(noinline)
+int vm_countdown_loop_target(int x) {
+    int count = 0;
+    int sum   = 0;
+    int pc    = CD_INIT;
+
+    while (1) {
+        if (pc == CD_INIT) {
+            pc = CD_LOAD_COUNT;
+        } else if (pc == CD_LOAD_COUNT) {
+            count = x & 0xF;
+            pc = CD_INIT_SUM;
+        } else if (pc == CD_INIT_SUM) {
+            sum = 0;
+            pc = CD_CHECK;
+        } else if (pc == CD_CHECK) {
+            pc = (count > 0) ? CD_BODY_ADD : CD_HALT;
+        } else if (pc == CD_BODY_ADD) {
+            sum = sum + count;
+            pc = CD_BODY_DEC;
+        } else if (pc == CD_BODY_DEC) {
+            count = count - 1;
+            pc = CD_CHECK;
+        } else if (pc == CD_HALT) {
+            return sum;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_countdown_loop(10)=%d vm_countdown_loop(15)=%d\n",
+           vm_countdown_loop_target(10), vm_countdown_loop_target(15));
+    return 0;
+}
+```
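
The countdown loop computes the triangular number T(n) = n(n+1)/2 for n = x & 0xF. The commit log mentions closed-form triangular sums appearing in lifted IR; whether that applies to this particular sample is not shown here. The sketch below only checks that the loop and the closed form agree, using hypothetical helper names:

```c
#include <assert.h>

/* Hypothetical straight-line version of the countdown loop. */
int countdown_ref(int x) {
    int count = x & 0xF;
    int sum = 0;
    while (count > 0) {
        sum += count;   /* add n, n-1, ..., 1 */
        count--;
    }
    return sum;
}

/* Closed form T(n) = n(n+1)/2 for the same masked input. */
int triangular_closed(int x) {
    int n = x & 0xF;
    return n * (n + 1) / 2;
}

void countdown_ref_check(void) {
    /* The mask makes the function periodic in x with period 16. */
    for (int x = 0; x < 256; x++) {
        assert(countdown_ref(x) == triangular_closed(x));
    }
    assert(countdown_ref(10) == 55);
    assert(countdown_ref(16) == 0);
    assert(countdown_ref(255) == 120);
}
```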
@@ -0,0 +1,96 @@
+# vm_crc64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_crc64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_crc64_loop.ll`
+- **Symbol:** `vm_crc64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_crc64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_crc64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 14514072000185962306 | 14514072000185962306 | 14514072000185962306 | yes | x=0: crc init=1, n=1, single CRC step |
+| 2 | RCX=1 | 7257036000092981153 | 7257036000092981153 | 7257036000092981153 | yes | x=1, n=2 |
+| 3 | RCX=7 | 4357999468653093127 | 4357999468653093127 | 4357999468653093127 | yes | x=7, n=8 max |
+| 4 | RCX=255 | 16189773752444600153 | 16189773752444600153 | 16189773752444600153 | yes | x=0xFF, n=8 |
+| 5 | RCX=51966 | 6017914993561854371 | 6017914993561854371 | 6017914993561854371 | yes | 0xCAFE, n=7 |
+| 6 | RCX=3405691582 | 11164346891378004481 | 11164346891378004481 | 11164346891378004481 | yes | 0xCAFEBABE, n=7 |
+| 7 | RCX=1311768467463790320 | 13868409170423275578 | 13868409170423275578 | 13868409170423275578 | yes | 0x123...DEF0, n=1 |
+| 8 | RCX=18446744073709551615 | 16164085970585043110 | 16164085970585043110 | 16164085970585043110 | yes | max u64, n=8 |
+| 9 | RCX=11400714819323198485 | 6955128548432713259 | 6955128548432713259 | 6955128548432713259 | yes | K (golden), n=6 |
+| 10 | RCX=3735928559 | 11328242235717907630 | 11328242235717907630 | 11328242235717907630 | yes | 0xDEADBEEF, n=8 |
+
+## Source
+
+```c
+/* PC-state VM running an i64 CRC-64-style polynomial reduction step.
+ *   crc = x | 1;
+ *   for i in 0..n:
+ *     if (crc & 1): crc = (crc >> 1) ^ POLY
+ *     else:         crc = (crc >> 1)
+ * Variable trip n = (x & 7) + 1.  POLY = 0xC96C5795D7870F42 (CRC-64/ECMA-182, reflected form).
+ * Lift target: vm_crc64_loop_target.
+ *
+ * Distinct from vm_lfsr64_loop (4-tap feedback) and vm_pcg64_loop
+ * (LCG step): single-tap conditional XOR gated by LSB, classic CRC
+ * polynomial reduction shape.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum CrVmPc {
+    CR_LOAD       = 0,
+    CR_INIT       = 1,
+    CR_LOOP_CHECK = 2,
+    CR_LOOP_BODY  = 3,
+    CR_LOOP_INC   = 4,
+    CR_HALT       = 5,
+};
+
+__declspec(noinline)
+uint64_t vm_crc64_loop_target(uint64_t x) {
+    int      idx = 0;
+    int      n   = 0;
+    uint64_t crc = 0;
+    int      pc  = CR_LOAD;
+
+    while (1) {
+        if (pc == CR_LOAD) {
+            crc = x | 1ull;
+            n   = (int)(x & 7ull) + 1;
+            pc = CR_INIT;
+        } else if (pc == CR_INIT) {
+            idx = 0;
+            pc = CR_LOOP_CHECK;
+        } else if (pc == CR_LOOP_CHECK) {
+            pc = (idx < n) ? CR_LOOP_BODY : CR_HALT;
+        } else if (pc == CR_LOOP_BODY) {
+            if ((crc & 1ull) != 0ull) {
+                crc = (crc >> 1) ^ 0xC96C5795D7870F42ull;
+            } else {
+                crc = crc >> 1;
+            }
+            pc = CR_LOOP_INC;
+        } else if (pc == CR_LOOP_INC) {
+            idx = idx + 1;
+            pc = CR_LOOP_CHECK;
+        } else if (pc == CR_HALT) {
+            return crc;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_crc64(0xCAFE)=%llu vm_crc64(0xDEADBEEF)=%llu\n",
+           (unsigned long long)vm_crc64_loop_target(0xCAFEull),
+           (unsigned long long)vm_crc64_loop_target(0xDEADBEEFull));
+    return 0;
+}
+```
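
The loop body is a single right-shift with a conditional XOR gated by the bit that was shifted out. The sketch below gives a hypothetical straight-line reference next to a branchless variant that builds an all-ones or all-zeros mask from that bit. Both function names are illustrative, and nothing here claims the lifter emits either form:

```c
#include <assert.h>
#include <stdint.h>

#define POLY 0xC96C5795D7870F42ull

/* Hypothetical reference for vm_crc64_loop_target. */
uint64_t crc64_ref(uint64_t x) {
    uint64_t crc = x | 1ull;          /* force a non-zero start state */
    int n = (int)(x & 7ull) + 1;      /* trip count in [1, 8] */
    for (int i = 0; i < n; i++) {
        uint64_t lsb = crc & 1ull;
        crc >>= 1;
        if (lsb) crc ^= POLY;
    }
    return crc;
}

/* Same recurrence without a branch: 0 - lsb is 0 or all ones. */
uint64_t crc64_ref_branchless(uint64_t x) {
    uint64_t crc = x | 1ull;
    int n = (int)(x & 7ull) + 1;
    for (int i = 0; i < n; i++) {
        uint64_t mask = (uint64_t)0 - (crc & 1ull);
        crc = (crc >> 1) ^ (POLY & mask);
    }
    return crc;
}

void crc64_ref_check(void) {
    assert(crc64_ref(0) == POLY);       /* crc=1, one step: 0 ^ POLY */
    assert(crc64_ref(1) == POLY >> 1);  /* second step: POLY has LSB 0 */
    for (uint64_t x = 0; x < 1024; x++) {
        assert(crc64_ref(x) == crc64_ref_branchless(x));
    }
}
```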
@@ -0,0 +1,93 @@
+# vm_cttz64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_cttz64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_cttz64_loop.ll`
+- **Symbol:** `vm_cttz64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_cttz64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_cttz64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 64 | 64 | 64 | yes | x=0: special-case 64 |
+| 2 | RCX=1 | 0 | 0 | 0 | yes | x=1: 0 trailing zeros |
+| 3 | RCX=2 | 1 | 1 | 1 | yes | x=2: 1 |
+| 4 | RCX=4 | 2 | 2 | 2 | yes | x=4: 2 |
+| 5 | RCX=8 | 3 | 3 | 3 | yes | x=8: 3 |
+| 6 | RCX=4294967296 | 32 | 32 | 32 | yes | x=2^32: 32 |
+| 7 | RCX=9223372036854775808 | 63 | 63 | 63 | yes | x=2^63: 63 (max) |
+| 8 | RCX=3405691582 | 1 | 1 | 1 | yes | x=0xCAFEBABE: 1 |
+| 9 | RCX=18446744073709551614 | 1 | 1 | 1 | yes | x=max-1: 1 |
+| 10 | RCX=11400714819323198485 | 0 | 0 | 0 | yes | x=K (golden): 0 (odd) |
+
+## Source
+
+```c
+/* PC-state VM running an i64 count-trailing-zeros via shift-loop.
+ *   if (x == 0) return 64;
+ *   count = 0;
+ *   while ((x & 1) == 0) { x >>= 1; count++; }
+ *   return count;
+ * Variable trip count = ctz(x), bounded 0..63 (or short-circuit 64 for zero).
+ * Lift target: vm_cttz64_loop_target.
+ *
+ * Distinct from vm_ctz_loop (i32) and vm_imported_cttz_loop (i32 _BitScanForward
+ * intrinsic): exercises the same shape on full i64 with explicit shift-and-test
+ * rather than the intrinsic.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum CzVmPc {
+    CZ_LOAD       = 0,
+    CZ_INIT       = 1,
+    CZ_ZERO_CHECK = 2,
+    CZ_LOOP_CHECK = 3,
+    CZ_LOOP_BODY  = 4,
+    CZ_HALT       = 5,
+};
+
+__declspec(noinline)
+int vm_cttz64_loop_target(uint64_t x) {
+    uint64_t state = 0;
+    int      count = 0;
+    int      pc    = CZ_LOAD;
+
+    while (1) {
+        if (pc == CZ_LOAD) {
+            state = x;
+            count = 0;
+            pc = CZ_ZERO_CHECK;
+        } else if (pc == CZ_ZERO_CHECK) {
+            if (state == 0ull) {
+                count = 64;
+                pc = CZ_HALT;
+            } else {
+                pc = CZ_LOOP_CHECK;
+            }
+        } else if (pc == CZ_LOOP_CHECK) {
+            pc = ((state & 1ull) == 0ull) ? CZ_LOOP_BODY : CZ_HALT;
+        } else if (pc == CZ_LOOP_BODY) {
+            state = state >> 1;
+            count = count + 1;
+            pc = CZ_LOOP_CHECK;
+        } else if (pc == CZ_HALT) {
+            return count;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_cttz64(0x100000000)=%d vm_cttz64(0x8000000000000000)=%d\n",
+           vm_cttz64_loop_target(0x100000000ull),
+           vm_cttz64_loop_target(0x8000000000000000ull));
+    return 0;
+}
+```
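
This sample and its companion `vm_clz64_loop` can be cross-checked against each other: for a power of two 2^k, the trailing-zero count is k and the leading-zero count is 63 - k. A minimal sketch with hypothetical reference functions:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference: trailing zeros by shifting right. */
int cttz64_ref(uint64_t x) {
    if (x == 0) return 64;
    int count = 0;
    while ((x & 1ull) == 0) { x >>= 1; count++; }
    return count;
}

/* Hypothetical reference: leading zeros by shifting left. */
int clz64_ref(uint64_t x) {
    if (x == 0) return 64;
    int count = 0;
    while ((x & 0x8000000000000000ull) == 0) { x <<= 1; count++; }
    return count;
}

void clz_cttz_check(void) {
    for (int k = 0; k < 64; k++) {
        uint64_t p = 1ull << k;
        assert(cttz64_ref(p) == k);
        assert(clz64_ref(p) == 63 - k);
    }
    /* Rows shared by both tables. */
    assert(cttz64_ref(0xCAFEBABEull) == 1);
    assert(clz64_ref(0xCAFEBABEull) == 32);
    assert(cttz64_ref(0) == 64 && clz64_ref(0) == 64);
}
```

The zero check before the loop is what keeps both loops finite: without it, a zero state would never satisfy the exit test.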
@@ -0,0 +1,88 @@
+# vm_ctz_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 12/12 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_ctz_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_ctz_loop.ll`
+- **Symbol:** `vm_ctz_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_ctz_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_ctz_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 32 | 32 | 32 | yes | v=0: cap at 32 |
+| 2 | RCX=1 | 0 | 0 | 0 | yes | v=1: 0 trailing zeros |
+| 3 | RCX=2 | 1 | 1 | 1 | yes | v=2 |
+| 4 | RCX=4 | 2 | 2 | 2 | yes | v=4 |
+| 5 | RCX=7 | 0 | 0 | 0 | yes | v=7: low bit set |
+| 6 | RCX=256 | 8 | 8 | 8 | yes | v=0x100 |
+| 7 | RCX=512 | 9 | 9 | 9 | yes | v=0x200 |
+| 8 | RCX=65535 | 0 | 0 | 0 | yes | 0xFFFF: low bit set |
+| 9 | RCX=49152 | 14 | 14 | 14 | yes | 0xC000 |
+| 10 | RCX=-2147483648 | 31 | 31 | 31 | yes | 0x80000000: only top bit |
+| 11 | RCX=-8 | 3 | 3 | 3 | yes | 0xFFFFFFF8: low 3 zeros |
+| 12 | RCX=65536 | 16 | 16 | 16 | yes | 0x10000 |
+
+## Source
+
+```c
+/* PC-state VM that counts trailing zero bits in x (capped at 32).
+ * Lift target: vm_ctz_loop_target.
+ * Goal: cover a counted loop with EARLY BREAK on LSB-set predicate.  Loop
+ * counter doubles as both trip count and result.  Distinct from
+ * vm_kernighan_loop (which counts set bits, not trailing-zero position) and
+ * vm_palindrome_loop (which has two distinct halt PCs).
+ */
+#include <stdio.h>
+
+enum CzVmPc {
+    CZ_LOAD       = 0,
+    CZ_INIT       = 1,
+    CZ_CHECK_LIM  = 2,
+    CZ_TEST_LSB   = 3,
+    CZ_BODY_SHR   = 4,
+    CZ_BODY_INC   = 5,
+    CZ_HALT       = 6,
+};
+
+__declspec(noinline)
+int vm_ctz_loop_target(int x) {
+    int v     = 0;
+    int count = 0;
+    int pc    = CZ_LOAD;
+
+    while (1) {
+        if (pc == CZ_LOAD) {
+            v = x;
+            count = 0;
+            pc = CZ_INIT;
+        } else if (pc == CZ_INIT) {
+            pc = CZ_CHECK_LIM;
+        } else if (pc == CZ_CHECK_LIM) {
+            pc = (count < 32) ? CZ_TEST_LSB : CZ_HALT;
+        } else if (pc == CZ_TEST_LSB) {
+            pc = ((v & 1) != 0) ? CZ_HALT : CZ_BODY_SHR;
+        } else if (pc == CZ_BODY_SHR) {
+            v = (int)((unsigned)v >> 1);
+            pc = CZ_BODY_INC;
+        } else if (pc == CZ_BODY_INC) {
+            count = count + 1;
+            pc = CZ_CHECK_LIM;
+        } else if (pc == CZ_HALT) {
+            return count;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_ctz_loop(0xC000)=%d vm_ctz_loop(0x10000)=%d\n",
+           vm_ctz_loop_target(0xC000), vm_ctz_loop_target(0x10000));
+    return 0;
+}
+```
@@ -0,0 +1,105 @@
+# vm_data_ashr64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_data_ashr64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_data_ashr64_loop.ll`
+- **Symbol:** `vm_data_ashr64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_data_ashr64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_data_ashr64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | x=0 n=1: r=0; (0 >> 0) + 0 = 0 |
+| 2 | RCX=1 | 1 | 1 | 1 | yes | x=1 n=2 |
+| 3 | RCX=2 | 2 | 2 | 2 | yes | x=2 n=3 |
+| 4 | RCX=7 | 7 | 7 | 7 | yes | x=7 n=8: only byte0=7 contributes |
+| 5 | RCX=8 | 16 | 16 | 16 | yes | x=8 n=1: 8 ashr 0 + 8 = 16 |
+| 6 | RCX=3405691582 | 52233 | 52233 | 52233 | yes | 0xCAFEBABE: n=7 |
+| 7 | RCX=3735928559 | 447 | 447 | 447 | yes | 0xDEADBEEF: n=8 |
+| 8 | RCX=18446744073709551615 | 257 | 257 | 257 | yes | all 0xFF: n=8; r goes -1 -> 254 -> 256 -> 257, then stays at 257 |
+| 9 | RCX=9223372036854775808 | 9223372036854775808 | 9223372036854775808 | 9223372036854775808 | yes | x=2^63 n=1: ashr by 0=identity, +0=2^63 |
+| 10 | RCX=1311768467463790320 | 1311768467463790560 | 1311768467463790560 | 1311768467463790560 | yes | 0x12345...EF0: n=1 byte=0xF0=240; ashr 0; +240 |
+
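The verdict rule described above the table can be sketched as a three-way comparison. This is only an illustration: `case_equivalent` is a hypothetical helper, and the report generator's actual code is not shown in this diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the stated rule: a case passes only when the
 * native and lifted observations agree AND both match the manifest value. */
bool case_equivalent(uint64_t manifest, uint64_t native, uint64_t lifted) {
    return native == lifted && native == manifest;
}
```

Under this rule a row fails even when native and lifted agree with each other, as long as they disagree with the manifest, so a wrong expected value is caught as well as a lifter regression.
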
+## Source
+
+```c
+/* PC-state VM with DATA-DEPENDENT arithmetic right-shift amount:
+ *
+ *   n = (x & 7) + 1;
+ *   s = x; r = x;
+ *   for (i = 0; i < n; i++) {
+ *     uint64_t b = s & 0xFF;
+ *     int amt = (int)(b & 7);
+ *     r = (uint64_t)((int64_t)r >> amt) + b;   // ashr by byte amount
+ *     s >>= 8;
+ *   }
+ *   return r;
+ *
+ * Lift target: vm_data_ashr64_loop_target.
+ *
+ * Distinct from:
+ *   - vm_byteshl_data64_loop  (data-dependent SHL)
+ *   - vm_data_lshr64_loop     (data-dependent LSHR)
+ *   - vm_dyn_ashr64_loop      (ashr by loop counter, NOT byte data)
+ *
+ * Completes the data-dependent shift trio (shl / lshr / ashr).
+ * Sign-extending right-shift by an amount that comes from the byte
+ * stream propagates the high bit of the running r through iterations,
+ * producing different fills than lshr for high-bit-set states.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum DaVmPc {
+    DA_INIT_ALL = 0,
+    DA_CHECK    = 1,
+    DA_BODY     = 2,
+    DA_INC      = 3,
+    DA_HALT     = 4,
+};
+
+__declspec(noinline)
+uint64_t vm_data_ashr64_loop_target(uint64_t x) {
+    uint64_t n  = 0;
+    uint64_t s  = 0;
+    uint64_t r  = 0;
+    uint64_t i  = 0;
+    int      pc = DA_INIT_ALL;
+
+    while (1) {
+        if (pc == DA_INIT_ALL) {
+            n = (x & 7ull) + 1ull;
+            s = x;
+            r = x;
+            i = 0ull;
+            pc = DA_CHECK;
+        } else if (pc == DA_CHECK) {
+            pc = (i < n) ? DA_BODY : DA_HALT;
+        } else if (pc == DA_BODY) {
+            uint64_t b   = s & 0xFFull;
+            int      amt = (int)(b & 7ull);
+            r = (uint64_t)((int64_t)r >> amt) + b;
+            s = s >> 8;
+            pc = DA_INC;
+        } else if (pc == DA_INC) {
+            i = i + 1ull;
+            pc = DA_CHECK;
+        } else if (pc == DA_HALT) {
+            return r;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_data_ashr64(0xDEADBEEF)=%llu\n",
+           (unsigned long long)vm_data_ashr64_loop_target(0xDEADBEEFull));
+    return 0;
+}
+```
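
The pseudocode in the header comment can be written as a plain loop for cross-checking the table. The sketch below is a hypothetical reference, not part of the test suite: `ashr64`, `data_ashr64_ref` and `data_ashr64_ref_self_check` are illustrative names. It spells out the sign fill by hand because `>>` on a negative signed value is implementation-defined in ISO C, whereas the sample relies on the compiler emitting an arithmetic shift.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic shift right built from unsigned operations only.
 * amt must be in 0..63. */
static uint64_t ashr64(uint64_t v, unsigned amt) {
    /* If the sign bit is set, fill the vacated high bits with 1s. */
    uint64_t fill = (v >> 63) ? ~(~UINT64_C(0) >> amt) : 0;
    return (v >> amt) | fill;
}

/* Hypothetical plain-loop reference for vm_data_ashr64_loop_target. */
uint64_t data_ashr64_ref(uint64_t x) {
    uint64_t n = (x & 7u) + 1u;
    uint64_t s = x, r = x;
    for (uint64_t i = 0; i < n; i++) {
        uint64_t b = s & 0xFFu;
        r = ashr64(r, (unsigned)(b & 7u)) + b;   /* shift amount comes from data */
        s >>= 8;
    }
    return r;
}

/* Spot checks against rows 1, 5, 7 and 8 of the table above. */
void data_ashr64_ref_self_check(void) {
    assert(data_ashr64_ref(0) == 0);
    assert(data_ashr64_ref(8) == 16);
    assert(data_ashr64_ref(0xDEADBEEFull) == 447);
    assert(data_ashr64_ref(~UINT64_C(0)) == 257);
}
```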
@@ -0,0 +1,103 @@
+# vm_data_lshr64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_data_lshr64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_data_lshr64_loop.ll`
+- **Symbol:** `vm_data_lshr64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_data_lshr64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_data_lshr64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 18446744073709551615 | 18446744073709551615 | 18446744073709551615 | yes | x=0 n=1: r=~0; (~0 >> 0) ^ 0 = ~0 |
+| 2 | RCX=1 | 9223372036854775806 | 9223372036854775806 | 9223372036854775806 | yes | x=1 n=2 |
+| 3 | RCX=2 | 4611686018427387901 | 4611686018427387901 | 4611686018427387901 | yes | x=2 n=3 |
+| 4 | RCX=7 | 144115188075855864 | 144115188075855864 | 144115188075855864 | yes | x=7 n=8 |
+| 5 | RCX=8 | 18446744073709551607 | 18446744073709551607 | 18446744073709551607 | yes | x=8 n=1: ~0 >> 0 ^ 8 = 2^64-9 |
+| 6 | RCX=3405691582 | 281474976710410 | 281474976710410 | 281474976710410 | yes | 0xCAFEBABE: n=7 data-driven shifts |
+| 7 | RCX=3735928559 | 1099511627555 | 1099511627555 | 1099511627555 | yes | 0xDEADBEEF: n=8 |
+| 8 | RCX=18446744073709551615 | 1 | 1 | 1 | yes | all 0xFF: n=8, lshr by 7 each iter; last iter gives 0xFE ^ 0xFF = 1 |
+| 9 | RCX=72623859790382856 | 18446744073709551607 | 18446744073709551607 | 18446744073709551607 | yes | 0x0102...0708: n=1 byte=8 |
+| 10 | RCX=1311768467463790320 | 18446744073709551375 | 18446744073709551375 | 18446744073709551375 | yes | 0x12345...EF0: n=1 byte=0xF0 |
+
+## Source
+
+```c
+/* PC-state VM with DATA-DEPENDENT right-shift amount inside the loop:
+ *
+ *   n = (x & 7) + 1;
+ *   s = x; r = ~0;     // start with all-1s
+ *   for (i = 0; i < n; i++) {
+ *     uint64_t b = s & 0xFF;
+ *     r = (r >> (b & 7)) ^ b;   // lshr amount comes from byte data
+ *     s >>= 8;
+ *   }
+ *   return r;
+ *
+ * Lift target: vm_data_lshr64_loop_target.
+ *
+ * Distinct from:
+ *   - vm_byteshl_data64_loop  (data-dependent SHL counterpart)
+ *   - vm_bitfetch_window64_loop (lshr by loop counter)
+ *   - vm_dyn_ashr64_loop      (ashr by loop counter)
+ *
+ * Tests `lshr i64 r, %byte_amount` (right-shift by byte-derived
+ * amount).  Combined with XOR fold of the raw byte.  Initial r=~0
+ * means the first iter shifts a saturated state down by a
+ * data-driven amount before XOR.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum DlVmPc {
+    DL_INIT_ALL = 0,
+    DL_CHECK    = 1,
+    DL_BODY     = 2,
+    DL_INC      = 3,
+    DL_HALT     = 4,
+};
+
+__declspec(noinline)
+uint64_t vm_data_lshr64_loop_target(uint64_t x) {
+    uint64_t n  = 0;
+    uint64_t s  = 0;
+    uint64_t r  = 0;
+    uint64_t i  = 0;
+    int      pc = DL_INIT_ALL;
+
+    while (1) {
+        if (pc == DL_INIT_ALL) {
+            n = (x & 7ull) + 1ull;
+            s = x;
+            r = 0xFFFFFFFFFFFFFFFFull;
+            i = 0ull;
+            pc = DL_CHECK;
+        } else if (pc == DL_CHECK) {
+            pc = (i < n) ? DL_BODY : DL_HALT;
+        } else if (pc == DL_BODY) {
+            uint64_t b = s & 0xFFull;
+            r = (r >> (b & 7ull)) ^ b;
+            s = s >> 8;
+            pc = DL_INC;
+        } else if (pc == DL_INC) {
+            i = i + 1ull;
+            pc = DL_CHECK;
+        } else if (pc == DL_HALT) {
+            return r;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_data_lshr64(0xDEADBEEF)=%llu\n",
+           (unsigned long long)vm_data_lshr64_loop_target(0xDEADBEEFull));
+    return 0;
+}
+```
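
As with the ashr variant, the header pseudocode maps directly onto a plain loop. The sketch below is a hypothetical reference (`data_lshr64_ref` and its self-check are illustrative names, not part of the test suite) that can be used to confirm individual table rows by hand.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical plain-loop reference for vm_data_lshr64_loop_target. */
uint64_t data_lshr64_ref(uint64_t x) {
    uint64_t n = (x & 7u) + 1u;
    uint64_t s = x;
    uint64_t r = ~UINT64_C(0);            /* start saturated, as in DL_INIT_ALL */
    for (uint64_t i = 0; i < n; i++) {
        uint64_t b = s & 0xFFu;
        r = (r >> (b & 7u)) ^ b;          /* logical shift: zeros enter from the top */
        s >>= 8;
    }
    return r;
}

/* Spot checks against rows 1, 2, 3, 5 and 8 of the table above. */
void data_lshr64_ref_self_check(void) {
    assert(data_lshr64_ref(0) == ~UINT64_C(0));
    assert(data_lshr64_ref(1) == 9223372036854775806ull);
    assert(data_lshr64_ref(2) == 4611686018427387901ull);
    assert(data_lshr64_ref(8) == 18446744073709551607ull);
    assert(data_lshr64_ref(~UINT64_C(0)) == 1);
}
```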
@@ -0,0 +1,94 @@
+# vm_decdigits64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_decdigits64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_decdigits64_loop.ll`
+- **Symbol:** `vm_decdigits64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_decdigits64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_decdigits64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 1 | 1 | 1 | yes | x=0: special-case 1 digit |
+| 2 | RCX=1 | 1 | 1 | 1 | yes | x=1 |
+| 3 | RCX=10 | 2 | 2 | 2 | yes | x=10 |
+| 4 | RCX=100 | 3 | 3 | 3 | yes | x=100 |
+| 5 | RCX=999 | 3 | 3 | 3 | yes | x=999 |
+| 6 | RCX=1000 | 4 | 4 | 4 | yes | x=1000 |
+| 7 | RCX=1000000000 | 10 | 10 | 10 | yes | x=10^9 |
+| 8 | RCX=51966 | 5 | 5 | 5 | yes | x=0xCAFE = 51966 |
+| 9 | RCX=18446744073709551615 | 20 | 20 | 20 | yes | max u64: 20 digits |
+| 10 | RCX=11400714819323198485 | 20 | 20 | 20 | yes | K (golden), 20 digits |
+
+## Source
+
+```c
+/* PC-state VM that counts decimal digits of a uint64_t via repeated /10.
+ *   if (x == 0) return 1;
+ *   count = 0;
+ *   while (state > 0) { state /= 10; count++; }
+ *   return count;
+ * Variable trip 1..20 (up to 20 for max u64).
+ * Lift target: vm_decdigits64_loop_target.
+ *
+ * Distinct from vm_divcount64_loop (input-derived divisor with >=
+ * comparison) and vm_sdiv64_loop: this uses a fixed constant divisor 10
+ * with a > 0 termination, exercising i64 udiv-by-constant inside a
+ * data-dependent loop.  Lifter likely emits magic-number multiplication
+ * fold for /10, but loop count remains data-dependent.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum DdVmPc {
+    DD_LOAD       = 0,
+    DD_ZERO_CHECK = 1,
+    DD_LOOP_CHECK = 2,
+    DD_LOOP_BODY  = 3,
+    DD_HALT       = 4,
+};
+
+__declspec(noinline)
+int vm_decdigits64_loop_target(uint64_t x) {
+    uint64_t state = 0;
+    int      count = 0;
+    int      pc    = DD_LOAD;
+
+    while (1) {
+        if (pc == DD_LOAD) {
+            state = x;
+            count = 0;
+            pc = DD_ZERO_CHECK;
+        } else if (pc == DD_ZERO_CHECK) {
+            if (state == 0ull) {
+                count = 1;
+                pc = DD_HALT;
+            } else {
+                pc = DD_LOOP_CHECK;
+            }
+        } else if (pc == DD_LOOP_CHECK) {
+            pc = (state > 0ull) ? DD_LOOP_BODY : DD_HALT;
+        } else if (pc == DD_LOOP_BODY) {
+            state = state / 10ull;
+            count = count + 1;
+            pc = DD_LOOP_CHECK;
+        } else if (pc == DD_HALT) {
+            return count;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_decdigits64(0xCAFEBABE)=%d vm_decdigits64(max)=%d\n",
+           vm_decdigits64_loop_target(0xCAFEBABEull),
+           vm_decdigits64_loop_target(0xFFFFFFFFFFFFFFFFull));
+    return 0;
+}
+```
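
The header comment mentions a magic-number multiplication for `/10`. The sketch below shows the widely used constant for unsigned 64-bit division by 10 (high half of `x * 0xCCCCCCCCCCCCCCCD`, shifted right by 3) and uses it in a plain digit-count loop. It is an illustration only: the exact instruction sequence in the lifted IR is not shown in this report, and `mulhi64`, `udiv10_magic` and `decdigits64_ref` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* High 64 bits of a 64x64-bit product, computed from 32-bit halves. */
static uint64_t mulhi64(uint64_t a, uint64_t b) {
    uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;
    uint64_t lo_lo = a_lo * b_lo;
    uint64_t hi_lo = a_hi * b_lo;
    uint64_t lo_hi = a_lo * b_hi;
    uint64_t hi_hi = a_hi * b_hi;
    /* Middle column plus the carry out of the low column; cannot overflow. */
    uint64_t cross = (lo_lo >> 32) + (hi_lo & 0xFFFFFFFFu) + lo_hi;
    return hi_hi + (hi_lo >> 32) + (cross >> 32);
}

/* x / 10 without a divide instruction. */
uint64_t udiv10_magic(uint64_t x) {
    return mulhi64(x, 0xCCCCCCCCCCCCCCCDull) >> 3;
}

/* Hypothetical plain-loop reference for vm_decdigits64_loop_target. */
int decdigits64_ref(uint64_t x) {
    if (x == 0) return 1;                 /* DD_ZERO_CHECK special case */
    int count = 0;
    while (x > 0) {
        x = udiv10_magic(x);
        count++;
    }
    return count;
}
```

Replacing the divide changes only the loop body; the trip count still depends on the input, which is the property this sample is meant to exercise.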
@@ -0,0 +1,84 @@
+# vm_decsum64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_decsum64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_decsum64_loop.ll`
+- **Symbol:** `vm_decsum64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_decsum64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_decsum64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | x=0: skip loop |
+| 2 | RCX=5 | 5 | 5 | 5 | yes | x=5 |
+| 3 | RCX=99 | 18 | 18 | 18 | yes | x=99 |
+| 4 | RCX=999 | 27 | 27 | 27 | yes | x=999 |
+| 5 | RCX=12345 | 15 | 15 | 15 | yes | x=12345 |
+| 6 | RCX=1234567890 | 45 | 45 | 45 | yes | 1+2+...+9+0 |
+| 7 | RCX=9999999999999999999 | 171 | 171 | 171 | yes | 19 nines |
+| 8 | RCX=18446744073709551615 | 87 | 87 | 87 | yes | max u64 |
+| 9 | RCX=11400714819323198485 | 79 | 79 | 79 | yes | K (golden) |
+| 10 | RCX=3405691582 | 43 | 43 | 43 | yes | x=0xCAFEBABE = 3405691582 dec |
+
+## Source
+
+```c
+/* PC-state VM that computes the base-10 decimal digit SUM of x.
+ *   total = 0;
+ *   while (s) { total += s % 10; s /= 10; }
+ *   return total;
+ * Variable trip = number of decimal digits.
+ * Lift target: vm_decsum64_loop_target.
+ *
+ * Distinct from vm_base7sum64_loop (digit sum base 7) and
+ * vm_digitprod64_loop (digit PRODUCT base 10): pure additive digit
+ * accumulator with udiv-by-10 + urem-by-10 inside body.  The result for
+ * max u64 is 87 (sum of digits of 18446744073709551615); the largest
+ * result over all u64 inputs is 171 (nineteen 9s).
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum DsVmPc {
+    DS_LOAD       = 0,
+    DS_LOOP_CHECK = 1,
+    DS_LOOP_BODY  = 2,
+    DS_HALT       = 3,
+};
+
+__declspec(noinline)
+int vm_decsum64_loop_target(uint64_t x) {
+    uint64_t s     = 0;
+    int      total = 0;
+    int      pc    = DS_LOAD;
+
+    while (1) {
+        if (pc == DS_LOAD) {
+            s     = x;
+            total = 0;
+            pc = DS_LOOP_CHECK;
+        } else if (pc == DS_LOOP_CHECK) {
+            pc = (s != 0ull) ? DS_LOOP_BODY : DS_HALT;
+        } else if (pc == DS_LOOP_BODY) {
+            total = total + (int)(s % 10ull);
+            s = s / 10ull;
+            pc = DS_LOOP_CHECK;
+        } else if (pc == DS_HALT) {
+            return total;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_decsum64(12345)=%d vm_decsum64(max)=%d\n",
+           vm_decsum64_loop_target(12345ull),
+           vm_decsum64_loop_target(0xFFFFFFFFFFFFFFFFull));
+    return 0;
+}
+```
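
A plain-loop version of the digit sum is handy for checking table rows by hand. The sketch below is a hypothetical reference (`decsum64_ref` is an illustrative name, not part of the test suite). It derives the remainder from the quotient, which is one common way to get both values from a single division; whether the lifted IR takes the same shape is not shown in this report.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical plain-loop reference for vm_decsum64_loop_target. */
int decsum64_ref(uint64_t x) {
    int total = 0;
    while (x != 0) {
        uint64_t q = x / 10;
        total += (int)(x - q * 10);       /* same value as x % 10 */
        x = q;
    }
    return total;
}

/* Spot checks against rows 1, 5, 7 and 8 of the table above. */
void decsum64_ref_self_check(void) {
    assert(decsum64_ref(0) == 0);
    assert(decsum64_ref(12345) == 15);
    assert(decsum64_ref(9999999999999999999ull) == 171);
    assert(decsum64_ref(18446744073709551615ull) == 87);
}
```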
@@ -0,0 +1,100 @@
+# vm_deinterleave64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_deinterleave64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_deinterleave64_loop.ll`
+- **Symbol:** `vm_deinterleave64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_deinterleave64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_deinterleave64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | x=0 |
+| 2 | RCX=1 | 1 | 1 | 1 | yes | x=1: bit 0 -> evens bit 0 |
+| 3 | RCX=2 | 4294967296 | 4294967296 | 4294967296 | yes | x=2: bit 1 -> odds bit 0 -> 1<<32 |
+| 4 | RCX=3 | 4294967297 | 4294967297 | 4294967297 | yes | x=3: both bit 0 of evens and odds |
+| 5 | RCX=2863311530 | 281470681743360 | 281470681743360 | 281470681743360 | yes | x=0xAAAAAAAA: all to odds, evens=0 |
+| 6 | RCX=1431655765 | 65535 | 65535 | 65535 | yes | x=0x55555555: all to evens, odds=0 |
+| 7 | RCX=4294967295 | 281470681808895 | 281470681808895 | 281470681808895 | yes | x=0xFFFFFFFF: 0xFFFF in both halves |
+| 8 | RCX=3405691582 | 211101937602118 | 211101937602118 | 211101937602118 | yes | x=0xCAFEBABE |
+| 9 | RCX=2654435769 | 199484051056597 | 199484051056597 | 199484051056597 | yes | x=0x9E3779B9 |
+| 10 | RCX=305419896 | 22084721854188 | 22084721854188 | 22084721854188 | yes | x=0x12345678 |
+
+## Source
+
+```c
+/* PC-state VM that deinterleaves alternating bits of the 64-bit input x:
+ * places even-indexed source bits into low 32 of result, odd-indexed
+ * source bits into high 32 of result.
+ *   evens = 0;  odds = 0;
+ *   for i in 0..32:
+ *     evens |= ((x >> (2*i))   & 1) << i;
+ *     odds  |= ((x >> (2*i+1)) & 1) << i;
+ *   return (odds << 32) | evens;
+ * 32-trip fixed loop with FOUR shifts and two OR accumulators.
+ * Lift target: vm_deinterleave64_loop_target.
+ *
+ * Distinct from vm_morton64_loop (interleave/spread one stream into
+ * every-other position): this is the INVERSE - splits one input into
+ * two streams.  Both accumulator slots update unconditionally with OR
+ * (no mutually-exclusive branches), avoiding the dual-i64 promotion
+ * failure documented in vm_dualcounter64_loop.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum DiVmPc {
+    DI_LOAD       = 0,
+    DI_INIT       = 1,
+    DI_LOOP_CHECK = 2,
+    DI_LOOP_BODY  = 3,
+    DI_LOOP_INC   = 4,
+    DI_HALT       = 5,
+};
+
+__declspec(noinline)
+uint64_t vm_deinterleave64_loop_target(uint64_t x) {
+    int      idx   = 0;
+    uint64_t xx    = 0;
+    uint64_t evens = 0;
+    uint64_t odds  = 0;
+    int      pc    = DI_LOAD;
+
+    while (1) {
+        if (pc == DI_LOAD) {
+            xx    = x;
+            evens = 0ull;
+            odds  = 0ull;
+            pc = DI_INIT;
+        } else if (pc == DI_INIT) {
+            idx = 0;
+            pc = DI_LOOP_CHECK;
+        } else if (pc == DI_LOOP_CHECK) {
+            pc = (idx < 32) ? DI_LOOP_BODY : DI_HALT;
+        } else if (pc == DI_LOOP_BODY) {
+            evens = evens | (((xx >> (2 * idx))     & 1ull) << idx);
+            odds  = odds  | (((xx >> (2 * idx + 1)) & 1ull) << idx);
+            pc = DI_LOOP_INC;
+        } else if (pc == DI_LOOP_INC) {
+            idx = idx + 1;
+            pc = DI_LOOP_CHECK;
+        } else if (pc == DI_HALT) {
+            return (odds << 32) | evens;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_deinterleave64(0xAAAAAAAA)=%llu vm_deinterleave64(0x55555555)=%llu\n",
+           (unsigned long long)vm_deinterleave64_loop_target(0xAAAAAAAAull),
+           (unsigned long long)vm_deinterleave64_loop_target(0x55555555ull));
+    return 0;
+}
+```
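
One way to gain extra confidence in the bit layout is a round-trip check against the inverse operation. The sketch below is hypothetical: `deinterleave64_ref` restates the header pseudocode as a plain loop, and `interleave64_ref` is an illustrative inverse written for this check only (it is not claimed to match `vm_morton64_loop`).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical plain-loop reference for vm_deinterleave64_loop_target. */
uint64_t deinterleave64_ref(uint64_t x) {
    uint64_t evens = 0, odds = 0;
    for (int i = 0; i < 32; i++) {
        evens |= ((x >> (2 * i))     & 1u) << i;
        odds  |= ((x >> (2 * i + 1)) & 1u) << i;
    }
    return (odds << 32) | evens;
}

/* Hypothetical inverse: low 32 bits go to even positions, high 32 to odd. */
uint64_t interleave64_ref(uint64_t packed) {
    uint64_t out = 0;
    for (int i = 0; i < 32; i++) {
        out |= ((packed >> i)        & 1u) << (2 * i);
        out |= ((packed >> (32 + i)) & 1u) << (2 * i + 1);
    }
    return out;
}

/* Spot checks against rows 3, 5 and 6, plus one round trip. */
void deinterleave64_ref_self_check(void) {
    assert(deinterleave64_ref(2) == 4294967296ull);
    assert(deinterleave64_ref(0xAAAAAAAAull) == 281470681743360ull);
    assert(deinterleave64_ref(0x55555555ull) == 65535);
    assert(interleave64_ref(deinterleave64_ref(0x123456789ABCDEF0ull))
           == 0x123456789ABCDEF0ull);
}
```

The round trip holds for any 64-bit input because each source bit lands in exactly one result position.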
@@ -0,0 +1,94 @@
+# vm_digitprod64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_digitprod64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_digitprod64_loop.ll`
+- **Symbol:** `vm_digitprod64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_digitprod64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_digitprod64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | x=0: special-case 0 |
+| 2 | RCX=5 | 5 | 5 | 5 | yes | x=5: single digit |
+| 3 | RCX=12 | 2 | 2 | 2 | yes | x=12: 1*2 |
+| 4 | RCX=99 | 81 | 81 | 81 | yes | x=99: 9*9 |
+| 5 | RCX=100 | 0 | 0 | 0 | yes | x=100: contains 0 digit |
+| 6 | RCX=123 | 6 | 6 | 6 | yes | x=123: 1*2*3 |
+| 7 | RCX=999 | 729 | 729 | 729 | yes | x=999: 9^3 |
+| 8 | RCX=255 | 50 | 50 | 50 | yes | x=255: 2*5*5 |
+| 9 | RCX=999999999 | 387420489 | 387420489 | 387420489 | yes | x=10^9-1: 9^9 |
+| 10 | RCX=51966 | 1620 | 1620 | 1620 | yes | x=0xCAFE=51966 dec |
+
+## Source
+
+```c
+/* PC-state VM that computes the product of decimal digits of x.
+ *   if (x == 0) return 0;
+ *   p = 1;
+ *   while (s) { p *= s % 10; s /= 10; }
+ *   return p;
+ * Variable trip = number of decimal digits.  Returns full uint64_t (the
+ * product is at most 9^19, so it cannot overflow; any zero digit
+ * collapses the product to 0).
+ * Lift target: vm_digitprod64_loop_target.
+ *
+ * Distinct from vm_decdigits64_loop (counts digits) and vm_base7sum64_loop
+ * (digit SUM in base 7): exercises i64 mul-by-digit accumulator with
+ * udiv-by-10 + urem-by-10 inside a data-dependent loop.  Any zero
+ * digit forces immediate sticky 0 result.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum DpVmPc {
+    DP_LOAD       = 0,
+    DP_ZERO_CHECK = 1,
+    DP_LOOP_CHECK = 2,
+    DP_LOOP_BODY  = 3,
+    DP_HALT       = 4,
+};
+
+__declspec(noinline)
+uint64_t vm_digitprod64_loop_target(uint64_t x) {
+    uint64_t s   = 0;
+    uint64_t p   = 0;
+    int      pc  = DP_LOAD;
+
+    while (1) {
+        if (pc == DP_LOAD) {
+            s = x;
+            p = 1ull;
+            pc = DP_ZERO_CHECK;
+        } else if (pc == DP_ZERO_CHECK) {
+            if (s == 0ull) {
+                p = 0ull;
+                pc = DP_HALT;
+            } else {
+                pc = DP_LOOP_CHECK;
+            }
+        } else if (pc == DP_LOOP_CHECK) {
+            pc = (s != 0ull) ? DP_LOOP_BODY : DP_HALT;
+        } else if (pc == DP_LOOP_BODY) {
+            p = p * (s % 10ull);
+            s = s / 10ull;
+            pc = DP_LOOP_CHECK;
+        } else if (pc == DP_HALT) {
+            return p;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_digitprod64(123)=%llu vm_digitprod64(999999999)=%llu\n",
+           (unsigned long long)vm_digitprod64_loop_target(123ull),
+           (unsigned long long)vm_digitprod64_loop_target(999999999ull));
+    return 0;
+}
+```
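
The digit product reduces to a short plain loop, which makes the sticky-zero behaviour easy to see. The sketch below is a hypothetical reference (`digitprod64_ref` is an illustrative name, not part of the test suite).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical plain-loop reference for vm_digitprod64_loop_target. */
uint64_t digitprod64_ref(uint64_t x) {
    if (x == 0) return 0;                 /* DP_ZERO_CHECK special case */
    uint64_t p = 1;
    while (x != 0) {
        p *= x % 10;                      /* a zero digit makes p stay 0 from here on */
        x /= 10;
    }
    return p;
}

/* Spot checks against rows 1, 5, 6, 9 and 10 of the table above. */
void digitprod64_ref_self_check(void) {
    assert(digitprod64_ref(0) == 0);
    assert(digitprod64_ref(100) == 0);
    assert(digitprod64_ref(123) == 6);
    assert(digitprod64_ref(999999999ull) == 387420489ull);
    assert(digitprod64_ref(51966) == 1620);
}
```

Like the VM, this version keeps iterating after the product reaches 0; an early exit would change the trip count and therefore the loop shape being tested.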
@@ -0,0 +1,90 @@
+# vm_digitsum_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 12/12 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_digitsum_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_digitsum_loop.ll`
+- **Symbol:** `vm_digitsum_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_digitsum_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_digitsum_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | n=0 |
+| 2 | RCX=1 | 1 | 1 | 1 | yes | n=1 |
+| 3 | RCX=9 | 9 | 9 | 9 | yes | n=9: single digit |
+| 4 | RCX=10 | 1 | 1 | 1 | yes | n=10: 1+0 |
+| 5 | RCX=42 | 6 | 6 | 6 | yes | n=42: 4+2 |
+| 6 | RCX=99 | 18 | 18 | 18 | yes | n=99: 9+9 |
+| 7 | RCX=255 | 12 | 12 | 12 | yes | n=255: 2+5+5 |
+| 8 | RCX=1234 | 10 | 10 | 10 | yes | n=1234: 1+2+3+4 |
+| 9 | RCX=9999 | 36 | 36 | 36 | yes | n=9999: 4*9 |
+| 10 | RCX=65535 | 24 | 24 | 24 | yes | n=65535: 6+5+5+3+5 |
+| 11 | RCX=65536 | 0 | 0 | 0 | yes | n=0 again (mask drops bit 16) |
+| 12 | RCX=12345 | 15 | 15 | 15 | yes | n=12345: 1+2+3+4+5 |
+
+## Source
+
+```c
+/* PC-state VM that sums the decimal digits of a symbolic input.
+ * Lift target: vm_digitsum_loop_target.
+ * Goal: cover a non-counted loop that runs while `n > 0`, with both
+ * integer divide and modulo by 10 (non-power-of-2 divisor) in the body.
+ * Distinct from vm_gcd_loop (different recurrence: n /= 10 vs Euclidean)
+ * and vm_powermod_loop (smaller mod constant 13 with shift-driven loop).
+ */
+#include <stdio.h>
+
+enum DsVmPc {
+    DS_LOAD     = 0,
+    DS_INIT     = 1,
+    DS_CHECK    = 2,
+    DS_BODY_DIG = 3,
+    DS_BODY_ADD = 4,
+    DS_BODY_DIV = 5,
+    DS_HALT     = 6,
+};
+
+__declspec(noinline)
+int vm_digitsum_loop_target(int x) {
+    int n     = 0;
+    int sum   = 0;
+    int digit = 0;
+    int pc    = DS_LOAD;
+
+    while (1) {
+        if (pc == DS_LOAD) {
+            n = x & 0xFFFF;
+            sum = 0;
+            pc = DS_INIT;
+        } else if (pc == DS_INIT) {
+            pc = DS_CHECK;
+        } else if (pc == DS_CHECK) {
+            pc = (n > 0) ? DS_BODY_DIG : DS_HALT;
+        } else if (pc == DS_BODY_DIG) {
+            digit = n % 10;
+            pc = DS_BODY_ADD;
+        } else if (pc == DS_BODY_ADD) {
+            sum = sum + digit;
+            pc = DS_BODY_DIV;
+        } else if (pc == DS_BODY_DIV) {
+            n = n / 10;
+            pc = DS_CHECK;
+        } else if (pc == DS_HALT) {
+            return sum;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_digitsum_loop(1234)=%d vm_digitsum_loop(65535)=%d\n",
+           vm_digitsum_loop_target(1234), vm_digitsum_loop_target(65535));
+    return 0;
+}
+```
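
Rows 1 and 11 of the table give the same result because of the 16-bit mask applied in `DS_LOAD`. The sketch below is a hypothetical plain-loop reference (`digitsum16_ref` is an illustrative name, not part of the test suite) that makes the mask explicit.

```c
#include <assert.h>

/* Hypothetical plain-loop reference for vm_digitsum_loop_target. */
int digitsum16_ref(int x) {
    int n = x & 0xFFFF;                   /* only the low 16 bits are summed */
    int sum = 0;
    while (n > 0) {
        sum += n % 10;
        n /= 10;
    }
    return sum;
}

/* Spot checks against rows 8, 10 and 11 of the table above. */
void digitsum16_ref_self_check(void) {
    assert(digitsum16_ref(1234) == 10);
    assert(digitsum16_ref(65535) == 24);
    assert(digitsum16_ref(65536) == 0);
}
```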
@@ -0,0 +1,71 @@
+# vm_dispatch_table_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_dispatch_table_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_dispatch_table_loop.ll`
+- **Symbol:** `vm_dispatch_table_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_dispatch_table_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_dispatch_table_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 15 | 15 | 15 | yes | start=0: 0->3->2->1->5->4->7 |
+| 2 | RCX=1 | 10 | 10 | 10 | yes | start=1: 1->5->4->7 |
+| 3 | RCX=2 | 12 | 12 | 12 | yes | start=2: 2->1->5->4->7 |
+| 4 | RCX=3 | 15 | 15 | 15 | yes | start=3: 3->2->1->5->4->7 |
+| 5 | RCX=4 | 4 | 4 | 4 | yes | start=4: 4->7 |
+| 6 | RCX=5 | 9 | 9 | 9 | yes | start=5: 5->4->7 |
+| 7 | RCX=6 | 21 | 21 | 21 | yes | start=6: 6->0->3->2->1->5->4->7 |
+| 8 | RCX=7 | 0 | 0 | 0 | yes | start=7: halt immediately |
+| 9 | RCX=8 | 15 | 15 | 15 | yes | start=0 again (mask drops bit 3) |
+| 10 | RCX=15 | 0 | 0 | 0 | yes | start=7 again after mask |
+
+## Source
+
+```c
+/* PC-state VM whose successor PC comes from a stack-resident lookup table.
+ * Lift target: vm_dispatch_table_loop_target.
+ * Goal: cover a VM whose control flow graph is encoded as data, not code.
+ * Each iteration adds the current PC to an accumulator, then advances via
+ * NEXT[pc].  The starting PC is symbolic (x & 7); index 7 is the halt state,
+ * so the loop trip count is data-dependent and each start PC walks a
+ * different chain of table entries before halting.
+ */
+#include <stdio.h>
+
+__declspec(noinline)
+int vm_dispatch_table_loop_target(int x) {
+    int next[8];
+    int pc  = 0;
+    int acc = 0;
+
+    next[0] = 3;
+    next[1] = 5;
+    next[2] = 1;
+    next[3] = 2;
+    next[4] = 7;
+    next[5] = 4;
+    next[6] = 0;
+    next[7] = 7;
+
+    pc = x & 7;
+
+    while (pc != 7) {
+        acc = acc + pc;
+        pc = next[pc];
+    }
+
+    return acc;
+}
+
+int main(void) {
+    printf("vm_dispatch_table_loop(0)=%d vm_dispatch_table_loop(6)=%d\n",
+           vm_dispatch_table_loop_target(0), vm_dispatch_table_loop_target(6));
+    return 0;
+}
+```
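
The chains listed in the table labels can be reproduced with a small reference that keeps the successor table as constant data. This is a hypothetical sketch (`dispatch_table_ref` is an illustrative name); unlike the sample, it uses a `static const` array rather than a stack-resident one, so it does not exercise the same stack loads.

```c
#include <assert.h>

/* Hypothetical reference for vm_dispatch_table_loop_target. */
int dispatch_table_ref(int x) {
    /* Same successor table as the sample: next[pc] is the following PC. */
    static const int next[8] = {3, 5, 1, 2, 7, 4, 0, 7};
    int pc  = x & 7;
    int acc = 0;
    while (pc != 7) {                     /* index 7 is the halt state */
        acc += pc;
        pc = next[pc];
    }
    return acc;
}

/* Spot checks against rows 1, 2, 7 and 8 of the table above. */
void dispatch_table_ref_self_check(void) {
    assert(dispatch_table_ref(0) == 15);  /* 0->3->2->1->5->4->7 */
    assert(dispatch_table_ref(1) == 10);  /* 1->5->4->7 */
    assert(dispatch_table_ref(6) == 21);  /* 6->0->3->2->1->5->4->7 */
    assert(dispatch_table_ref(7) == 0);   /* halts immediately */
}
```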
@@ -0,0 +1,87 @@
+# vm_divcount64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_divcount64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_divcount64_loop.ll`
+- **Symbol:** `vm_divcount64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_divcount64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_divcount64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 63 | 63 | 63 | yes | x=0: ~x=max u64, div=2 -> 63 halvings |
+| 2 | RCX=1 | 40 | 40 | 40 | yes | x=1: div=3, log_3(max-1) |
+| 3 | RCX=2 | 31 | 31 | 31 | yes | x=2: div=4, log_4(max-2) |
+| 4 | RCX=255 | 7 | 7 | 7 | yes | x=0xFF: div=257, log_257(max-255) |
+| 5 | RCX=51966 | 7 | 7 | 7 | yes | x=0xCAFE: div=256 |
+| 6 | RCX=3405691582 | 8 | 8 | 8 | yes | x=0xCAFEBABE: div=192 |
+| 7 | RCX=1311768467463790320 | 8 | 8 | 8 | yes | x=0x123...DEF0: div=242 |
+| 8 | RCX=18446744073709551615 | 0 | 0 | 0 | yes | max u64: ~x=0 < div, count=0 |
+| 9 | RCX=11400714819323198485 | 13 | 13 | 13 | yes | x=K: div=23, log_23 |
+| 10 | RCX=3735928559 | 8 | 8 | 8 | yes | x=0xDEADBEEF: div=241 |
+
+## Source
+
+```c
+/* PC-state VM that counts how many times an i64 state can be divided
+ * by an input-derived divisor before it falls below the divisor.
+ *   divisor = (x & 0xFF) + 2;   // 2..257, never zero
+ *   state   = ~x;
+ *   count   = 0;
+ *   while (state >= divisor) { state /= divisor; count++; }
+ *   return count;
+ * Lift target: vm_divcount64_loop_target.
+ *
+ * Distinct from vm_gcd64_loop (urem-driven Euclidean): exercises
+ * repeated i64 udiv inside a data-dependent loop (variable trip 0..63
+ * depending on log_{divisor}(state)).
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum DvVmPc {
+    DV_LOAD       = 0,
+    DV_LOOP_CHECK = 1,
+    DV_LOOP_BODY  = 2,
+    DV_HALT       = 3,
+};
+
+__declspec(noinline)
+int vm_divcount64_loop_target(uint64_t x) {
+    uint64_t divisor = 0;
+    uint64_t state   = 0;
+    int      count   = 0;
+    int      pc      = DV_LOAD;
+
+    while (1) {
+        if (pc == DV_LOAD) {
+            divisor = (x & 0xFFull) + 2ull;
+            state   = ~x;
+            count   = 0;
+            pc = DV_LOOP_CHECK;
+        } else if (pc == DV_LOOP_CHECK) {
+            pc = (state >= divisor) ? DV_LOOP_BODY : DV_HALT;
+        } else if (pc == DV_LOOP_BODY) {
+            state = state / divisor;
+            count = count + 1;
+            pc = DV_LOOP_CHECK;
+        } else if (pc == DV_HALT) {
+            return count;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_divcount64(0)=%d vm_divcount64(0xCAFE)=%d\n",
+           vm_divcount64_loop_target(0ull),
+           vm_divcount64_loop_target(0xCAFEull));
+    return 0;
+}
+```
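The `log_3`, `log_4`, ... labels in the table follow from the fact that repeated floor division by `d` runs exactly as many times as the largest `k` with `d^k <= state`. A small sketch of that cross-check, assuming only what the source comment states; both function names are hypothetical and neither is part of the PR:

```c
#include <stdint.h>

/* Straight-line form of the sample: count divisions until state < divisor. */
int ref_divcount64(uint64_t x) {
    uint64_t divisor = (x & 0xFFu) + 2u;   /* 2..257, never zero */
    uint64_t state = ~x;
    int count = 0;
    while (state >= divisor) {
        state /= divisor;
        count++;
    }
    return count;
}

/* Power form: largest k such that divisor^k <= ~x. */
int ref_divcount64_pow(uint64_t x) {
    uint64_t divisor = (x & 0xFFu) + 2u;
    uint64_t state = ~x;
    uint64_t power = 1;                    /* divisor^k */
    int k = 0;
    /* power <= state / divisor is equivalent to power * divisor <= state,
     * so the multiplication below cannot overflow. */
    while (power <= state / divisor) {
        power *= divisor;
        k++;
    }
    return k;
}
```

For `x = 0` both forms give 63 (the state is 2^64 - 1 and the divisor is 2), and for the all-ones input the state is 0, so neither loop runs.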
@@ -0,0 +1,94 @@
+# vm_djb264_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_djb264_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_djb264_loop.ll`
+- **Symbol:** `vm_djb264_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_djb264_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_djb264_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 177573 | 177573 | 177573 | yes | x=0, n=1: 5381*33+0 |
+| 2 | RCX=1 | 5859942 | 5859942 | 5859942 | yes | x=1, n=2 |
+| 3 | RCX=7 | 7568183103855660 | 7568183103855660 | 7568183103855660 | yes | x=7, n=8 (max) |
+| 4 | RCX=255 | 7578752477713956 | 7578752477713956 | 7578752477713956 | yes | x=0xFF, n=8 |
+| 5 | RCX=51966 | 229665779872749 | 229665779872749 | 229665779872749 | yes | x=0xCAFE, n=7 |
+| 6 | RCX=3405691582 | 229582808239653 | 229582808239653 | 229582808239653 | yes | x=0xCAFEBABE, n=7 |
+| 7 | RCX=1311768467463790320 | 177813 | 177813 | 177813 | yes | x=0x123...DEF0, n=1: low byte 0xF0 |
+| 8 | RCX=18446744073709551615 | 7579092093431421 | 7579092093431421 | 7579092093431421 | yes | max u64, n=8 |
+| 9 | RCX=11400714819323198485 | 6950360842513 | 6950360842513 | 6950360842513 | yes | x=K (golden ratio), n=6 |
+| 10 | RCX=3735928559 | 7578322995237885 | 7578322995237885 | 7578322995237885 | yes | x=0xDEADBEEF, n=8 |
+
+## Source
+
+```c
+/* PC-state VM running an i64 djb2-style hash over the bytes of x.
+ *   h = 5381;
+ *   for (i = 0; i < n; i++) { b = (x >> (i*8)) & 0xFF; h = h * 33 + b; }
+ *   return h;
+ * where n = (x & 7) + 1 (1..8 bytes consumed).  Returns the full uint64_t.
+ * Lift target: vm_djb264_loop_target.
+ *
+ * Distinct from vm_djb2_loop (i32 hash): exercises i64 mul-by-33 + i64
+ * add inside a variable-trip loop body that also performs a symbolic
+ * shift-by-loop-counter byte extraction.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum DjVmPc {
+    DJ_LOAD       = 0,
+    DJ_INIT       = 1,
+    DJ_LOOP_CHECK = 2,
+    DJ_LOOP_BODY  = 3,
+    DJ_LOOP_INC   = 4,
+    DJ_HALT       = 5,
+};
+
+__declspec(noinline)
+uint64_t vm_djb264_loop_target(uint64_t x) {
+    int      idx = 0;
+    int      n   = 0;
+    uint64_t h   = 0;
+    uint64_t xx  = 0;
+    int      pc  = DJ_LOAD;
+
+    while (1) {
+        if (pc == DJ_LOAD) {
+            n  = (int)(x & 7ull) + 1;
+            xx = x;
+            h  = 5381ull;
+            pc = DJ_INIT;
+        } else if (pc == DJ_INIT) {
+            idx = 0;
+            pc = DJ_LOOP_CHECK;
+        } else if (pc == DJ_LOOP_CHECK) {
+            pc = (idx < n) ? DJ_LOOP_BODY : DJ_HALT;
+        } else if (pc == DJ_LOOP_BODY) {
+            uint64_t b = (xx >> (idx * 8)) & 0xFFull;
+            h = h * 33ull + b;
+            pc = DJ_LOOP_INC;
+        } else if (pc == DJ_LOOP_INC) {
+            idx = idx + 1;
+            pc = DJ_LOOP_CHECK;
+        } else if (pc == DJ_HALT) {
+            return h;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_djb264(0xCAFEBABE)=%llu vm_djb264(0xFFFFFFFFFFFFFFFF)=%llu\n",
+           (unsigned long long)vm_djb264_loop_target(0xCAFEBABEull),
+           (unsigned long long)vm_djb264_loop_target(0xFFFFFFFFFFFFFFFFull));
+    return 0;
+}
+```
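The recurrence in the header comment can be written without the PC-state machine, which makes the small table rows easy to check by hand. This is an illustrative sketch rather than code from the PR, and `ref_djb264` is a hypothetical name:

```c
#include <stdint.h>

/* djb2-style hash over the low n bytes of x, n = (x & 7) + 1. */
uint64_t ref_djb264(uint64_t x) {
    int n = (int)(x & 7u) + 1;
    uint64_t h = 5381u;
    for (int i = 0; i < n; i++) {
        uint64_t b = (x >> (i * 8)) & 0xFFu;  /* byte i of x */
        h = h * 33u + b;                      /* wraps modulo 2^64 */
    }
    return h;
}
```

For `x = 0` a single byte is consumed and the result is 5381 * 33 = 177573. For `x = 1`, two bytes are consumed: (177573 + 1) * 33 + 0 = 5859942.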
@@ -0,0 +1,101 @@
+# vm_djb2_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 12/12 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_djb2_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_djb2_loop.ll`
+- **Symbol:** `vm_djb2_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_djb2_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_djb2_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 46501 | 46501 | 46501 | yes | limit=1, nib=0 |
+| 2 | RCX=1 | 27238 | 27238 | 27238 | yes | limit=2, nibs=[1,0] |
+| 3 | RCX=2 | 47975 | 47975 | 47975 | yes | limit=3, nibs=[2,0,0] |
+| 4 | RCX=7 | 7212 | 7212 | 7212 | yes | limit=8, nibs=[7,0,..,0] |
+| 5 | RCX=18 | 48008 | 48008 | 48008 | yes | 0x12: limit=3, nibs=[2,1,0] |
+| 6 | RCX=291 | 48459 | 48459 | 48459 | yes | 0x123: limit=4, nibs=[3,2,1,0] |
+| 7 | RCX=4660 | 4079 | 4079 | 4079 | yes | 0x1234: limit=5 |
+| 8 | RCX=74565 | 57268 | 57268 | 57268 | yes | 0x12345: limit=6 |
+| 9 | RCX=16777215 | 40191 | 40191 | 40191 | yes | 0xFFFFFF: limit=8, nibs=[F,F,F,F,F,F,0,0] |
+| 10 | RCX=11259375 | 32432 | 32432 | 32432 | yes | 0xABCDEF: limit=8 |
+| 11 | RCX=85 | 19055 | 19055 | 19055 | yes | 0x55: limit=6 |
+| 12 | RCX=170 | 57017 | 57017 | 57017 | yes | 0xAA: limit=3 |
+
+## Source
+
+```c
+/* PC-state VM running a DJB2-style hash recurrence:
+ *   hash = (hash * 33 + nibble) & 0xFFFF
+ * over the low (limit*4) bits of x.
+ * Lift target: vm_djb2_loop_target.
+ * Goal: cover a multiplicative-then-additive recurrence with symbolic input
+ * shape (each iteration consumes a different nibble).  Distinct from
+ * vm_lcg_loop (no per-iter input) and vm_polynomial_loop (constant
+ * coefficient array).
+ */
+#include <stdio.h>
+
+enum DjVmPc {
+    DJ_LOAD       = 0,
+    DJ_INIT       = 1,
+    DJ_CHECK      = 2,
+    DJ_BODY_NIB   = 3,
+    DJ_BODY_MUL   = 4,
+    DJ_BODY_ADD   = 5,
+    DJ_BODY_INC   = 6,
+    DJ_HALT       = 7,
+};
+
+__declspec(noinline)
+int vm_djb2_loop_target(int x) {
+    int hash  = 0;
+    int limit = 0;
+    int idx   = 0;
+    int nib   = 0;
+    int prod  = 0;
+    int shift = 0;
+    int pc    = DJ_LOAD;
+
+    while (1) {
+        if (pc == DJ_LOAD) {
+            limit = (x & 7) + 1;
+            hash = 5381;
+            idx = 0;
+            pc = DJ_INIT;
+        } else if (pc == DJ_INIT) {
+            pc = DJ_CHECK;
+        } else if (pc == DJ_CHECK) {
+            pc = (idx < limit) ? DJ_BODY_NIB : DJ_HALT;
+        } else if (pc == DJ_BODY_NIB) {
+            shift = idx * 4;
+            nib = (x >> shift) & 0xF;
+            pc = DJ_BODY_MUL;
+        } else if (pc == DJ_BODY_MUL) {
+            prod = hash * 33;
+            pc = DJ_BODY_ADD;
+        } else if (pc == DJ_BODY_ADD) {
+            hash = (prod + nib) & 0xFFFF;
+            pc = DJ_BODY_INC;
+        } else if (pc == DJ_BODY_INC) {
+            idx = idx + 1;
+            pc = DJ_CHECK;
+        } else if (pc == DJ_HALT) {
+            return hash;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_djb2_loop(0x12345)=%d vm_djb2_loop(0xABCDEF)=%d\n",
+           vm_djb2_loop_target(0x12345), vm_djb2_loop_target(0xABCDEF));
+    return 0;
+}
+```
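The three body handlers (`DJ_BODY_NIB`, `DJ_BODY_MUL`, `DJ_BODY_ADD`) together compute one fused step of the recurrence. A sketch of that fused form, not taken from the PR; `ref_djb2_16` is a hypothetical name, and the input is taken as `unsigned` here so that the right shift is well defined for all bit patterns:

```c
/* 16-bit djb2-style hash over the low `limit` nibbles of x. */
int ref_djb2_16(unsigned x) {
    int limit = (int)(x & 7u) + 1;
    unsigned hash = 5381u;
    for (int idx = 0; idx < limit; idx++) {
        unsigned nib = (x >> (idx * 4)) & 0xFu;   /* DJ_BODY_NIB */
        hash = (hash * 33u + nib) & 0xFFFFu;      /* DJ_BODY_MUL + DJ_BODY_ADD */
    }
    return (int)hash;
}
```

Row 1 of the table falls out directly: 5381 * 33 = 177573, and 177573 & 0xFFFF = 46501.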
@@ -0,0 +1,104 @@
+# vm_dual_array_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_dual_array_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_dual_array_loop.ll`
+- **Symbol:** `vm_dual_array_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_dual_array_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_dual_array_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | seed=0: zero product |
+| 2 | RCX=1 | 120 | 120 | 120 | yes | seed=1 |
+| 3 | RCX=2 | 312 | 312 | 312 | yes | seed=2 |
+| 4 | RCX=5 | 1320 | 1320 | 1320 | yes | seed=5 |
+| 5 | RCX=10 | 4440 | 4440 | 4440 | yes | seed=10 |
+| 6 | RCX=100 | 368400 | 368400 | 368400 | yes | seed=100 |
+| 7 | RCX=1000 | 36084000 | 36084000 | 36084000 | yes | seed=1000 |
+| 8 | RCX=65536 | 5505024 | 5505024 | 5505024 | yes | seed=0x10000: high-bit interaction |
+| 9 | RCX=2147483647 | 4294967248 | 4294967248 | 4294967248 | yes | INT_MAX: 2-comp wrap |
+| 10 | RCX=4294967295 | 4294967248 | 4294967248 | 4294967248 | yes | -1 u32: same as INT_MAX (mul wraps) |
+
+## Source
+
+```c
+/* PC-state VM that allocates TWO independent int[8] stack arrays at the
+ * same time, fills each with a different formula, and accumulates a
+ * reversed dot product sum_{i}(a[i] * b[7-i]).
+ * Lift target: vm_dual_array_loop_target.
+ * Goal: cover two simultaneous stack arrays in flight (distinct stack
+ * slots, independent fill loops, paired access in a third loop), as
+ * opposed to existing samples that operate on a single stack array.
+ */
+#include <stdio.h>
+
+enum DaVmPc {
+    DA_LOAD       = 0,
+    DA_INIT_FILL  = 1,
+    DA_FILL_CHECK = 2,
+    DA_FILL_BODY  = 3,
+    DA_FILL_INC   = 4,
+    DA_INIT_PROD  = 5,
+    DA_PROD_CHECK = 6,
+    DA_PROD_BODY  = 7,
+    DA_PROD_INC   = 8,
+    DA_HALT       = 9,
+};
+
+__declspec(noinline)
+int vm_dual_array_loop_target(int x) {
+    int a[8];
+    int b[8];
+    int idx  = 0;
+    int sum  = 0;
+    int seed = 0;
+    int pc   = DA_LOAD;
+
+    while (1) {
+        if (pc == DA_LOAD) {
+            seed = x;
+            pc = DA_INIT_FILL;
+        } else if (pc == DA_INIT_FILL) {
+            idx = 0;
+            pc = DA_FILL_CHECK;
+        } else if (pc == DA_FILL_CHECK) {
+            pc = (idx < 8) ? DA_FILL_BODY : DA_INIT_PROD;
+        } else if (pc == DA_FILL_BODY) {
+            a[idx] = seed + idx;
+            b[idx] = seed * (idx + 1);
+            pc = DA_FILL_INC;
+        } else if (pc == DA_FILL_INC) {
+            idx = idx + 1;
+            pc = DA_FILL_CHECK;
+        } else if (pc == DA_INIT_PROD) {
+            idx = 0;
+            pc = DA_PROD_CHECK;
+        } else if (pc == DA_PROD_CHECK) {
+            pc = (idx < 8) ? DA_PROD_BODY : DA_HALT;
+        } else if (pc == DA_PROD_BODY) {
+            sum = sum + a[idx] * b[7 - idx];
+            pc = DA_PROD_INC;
+        } else if (pc == DA_PROD_INC) {
+            idx = idx + 1;
+            pc = DA_PROD_CHECK;
+        } else if (pc == DA_HALT) {
+            return sum;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_dual_array_loop(10)=%d vm_dual_array_loop(100)=%d\n",
+           vm_dual_array_loop_target(10),
+           vm_dual_array_loop_target(100));
+    return 0;
+}
+```
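The manifest values can be cross-checked algebraically. With `a[i] = seed + i` and `b[7-i] = seed * (8 - i)`, the sum over i = 0..7 is `seed * (36*seed + 84)`, since the sum of (8 - i) is 36 and the sum of i*(8 - i) is 84. The sketch below compares the loop form against that closed form; it is not part of the PR, the function names are hypothetical, and it uses `uint32_t` so that wraparound is well defined. Whether the lifter itself arrives at this closed form is not stated in the report.

```c
#include <stdint.h>

/* Loop form mirroring the sample's two fill loops and product loop. */
uint32_t ref_dual_array(uint32_t seed) {
    uint32_t a[8];
    uint32_t b[8];
    uint32_t sum = 0;
    for (uint32_t i = 0; i < 8; i++) {
        a[i] = seed + i;
        b[i] = seed * (i + 1);
    }
    for (uint32_t i = 0; i < 8; i++) {
        sum += a[i] * b[7 - i];
    }
    return sum;
}

/* Closed form: seed * (36*seed + 84), reduced modulo 2^32. */
uint32_t closed_dual_array(uint32_t seed) {
    return seed * (36u * seed + 84u);
}
```

This also explains why rows 9 and 10 agree: both 2^31 - 1 and 2^32 - 1 square to 1 modulo 2^32, and 84 times either value is congruent to -84, giving 36 - 84 = -48, which is 4294967248 as an unsigned 32-bit value.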
@@ -0,0 +1,103 @@
+# vm_dual_counter_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 8/8 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_dual_counter_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_dual_counter_loop.ll`
+- **Symbol:** `vm_dual_counter_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_dual_counter_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_dual_counter_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | limit=0 |
+| 2 | RCX=1 | 100 | 100 | 100 | yes | limit=1: 1 even, 0 odd |
+| 3 | RCX=2 | 101 | 101 | 101 | yes | limit=2: 1 even, 1 odd |
+| 4 | RCX=5 | 302 | 302 | 302 | yes | limit=5: 3 even, 2 odd |
+| 5 | RCX=8 | 404 | 404 | 404 | yes | limit=8: 4 even, 4 odd |
+| 6 | RCX=10 | 505 | 505 | 505 | yes | limit=10: 5 even, 5 odd |
+| 7 | RCX=15 | 807 | 807 | 807 | yes | limit=15: 8 even, 7 odd |
+| 8 | RCX=16 | 0 | 0 | 0 | yes | limit=0 again (mask drops bit 4) |
+
+## Source
+
+```c
+/* PC-state VM whose loop body updates one of two independent counters per
+ * iteration.
+ * Lift target: vm_dual_counter_loop_target.
+ * Goal: cover a loop where the parity-driven branch sends control to one of
+ * two distinct increment handlers and merges back, so the lifter must
+ * preserve two independent phi nodes inside the loop body.  Returns
+ * even_count * 100 + odd_count for limit = x & 0xF.
+ */
+#include <stdio.h>
+
+enum DualVmPc {
+    DV_INIT       = 0,
+    DV_LOAD_LIMIT = 1,
+    DV_INIT_CTRS  = 2,
+    DV_INIT_IDX   = 3,
+    DV_CHECK      = 4,
+    DV_TEST_PAR   = 5,
+    DV_INC_EVEN   = 6,
+    DV_INC_ODD    = 7,
+    DV_INC_IDX    = 8,
+    DV_PACK       = 9,
+    DV_HALT       = 10,
+};
+
+__declspec(noinline)
+int vm_dual_counter_loop_target(int x) {
+    int limit  = 0;
+    int idx    = 0;
+    int evens  = 0;
+    int odds   = 0;
+    int result = 0;
+    int pc     = DV_INIT;
+
+    while (1) {
+        if (pc == DV_INIT) {
+            pc = DV_LOAD_LIMIT;
+        } else if (pc == DV_LOAD_LIMIT) {
+            limit = x & 0xF;
+            pc = DV_INIT_CTRS;
+        } else if (pc == DV_INIT_CTRS) {
+            evens = 0;
+            odds = 0;
+            pc = DV_INIT_IDX;
+        } else if (pc == DV_INIT_IDX) {
+            idx = 0;
+            pc = DV_CHECK;
+        } else if (pc == DV_CHECK) {
+            pc = (idx < limit) ? DV_TEST_PAR : DV_PACK;
+        } else if (pc == DV_TEST_PAR) {
+            pc = ((idx & 1) == 0) ? DV_INC_EVEN : DV_INC_ODD;
+        } else if (pc == DV_INC_EVEN) {
+            evens = evens + 1;
+            pc = DV_INC_IDX;
+        } else if (pc == DV_INC_ODD) {
+            odds = odds + 1;
+            pc = DV_INC_IDX;
+        } else if (pc == DV_INC_IDX) {
+            idx = idx + 1;
+            pc = DV_CHECK;
+        } else if (pc == DV_PACK) {
+            result = evens * 100 + odds;
+            pc = DV_HALT;
+        } else if (pc == DV_HALT) {
+            return result;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_dual_counter_loop(10)=%d vm_dual_counter_loop(15)=%d\n",
+           vm_dual_counter_loop_target(10), vm_dual_counter_loop_target(15));
+    return 0;
+}
+```
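Because the indices 0..limit-1 alternate even and odd starting with an even index, the two counters have a simple closed form: `evens = (limit + 1) / 2` and `odds = limit / 2`. A short sketch of that check, with the hypothetical name `ref_dual_counter` (not from the PR):

```c
/* Closed form of even_count * 100 + odd_count for limit = x & 0xF. */
int ref_dual_counter(int x) {
    int limit = x & 0xF;
    int evens = (limit + 1) / 2;   /* even indices among 0..limit-1 */
    int odds  = limit / 2;         /* odd indices among 0..limit-1 */
    return evens * 100 + odds;
}
```

For `limit = 15` this gives 8 * 100 + 7 = 807, and for `x = 16` the mask yields `limit = 0`, so the result is 0, matching rows 7 and 8.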
@@ -0,0 +1,94 @@
+# vm_dual_i64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_dual_i64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_dual_i64_loop.ll`
+- **Symbol:** `vm_dual_i64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_dual_i64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_dual_i64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0, RDX=0 | 0 | 0 | 0 | yes | x=0, y=0 |
+| 2 | RCX=1, RDX=2 | 15 | 15 | 15 | yes | x=1, y=2, n=2 |
+| 3 | RCX=51966, RDX=47806 | 17848445641019346730 | 17848445641019346730 | 17848445641019346730 | yes | x=0xCAFE, y=0xBABE |
+| 4 | RCX=18446744073709551615, RDX=1 | 18446744073709551606 | 18446744073709551606 | 18446744073709551606 | yes | x=max u64 (n=8), y=1: linear |
+| 5 | RCX=1, RDX=18446744073709551615 | 18446744073709551614 | 18446744073709551614 | 18446744073709551614 | yes | x=1 (n=2), y=max u64 |
+| 6 | RCX=1311768467463790320, RDX=18364758544493064720 | 9002574064070388976 | 9002574064070388976 | 9002574064070388976 | yes | 0x123..F0, 0xFEDC..10 |
+| 7 | RCX=7, RDX=11 | 2722357788 | 2722357788 | 2722357788 | yes | x=7, y=11, n=8 |
+| 8 | RCX=65537, RDX=65537 | 4295163906 | 4295163906 | 4295163906 | yes | both 0x10001, n=2 |
+| 9 | RCX=9223372036854775808, RDX=9223372036854775808 | 9223372036854775808 | 9223372036854775808 | 9223372036854775808 | yes | both 2^63 |
+| 10 | RCX=3, RDX=11400714819323198485 | 11583513995942334250 | 11583513995942334250 | 11583513995942334250 | yes | x=3, y=K (golden ratio), n=4 |
+
+## Source
+
+```c
+/* PC-state VM with TWO full uint64_t inputs (x in RCX, y in RDX).
+ * Runs state = state * y + x for n = (x & 7) + 1 iterations starting
+ * from state = x ^ y, returning the full uint64_t state.
+ * Lift target: vm_dual_i64_loop_target.
+ *
+ * Distinct from vm_mixed_args_loop (i32+i64) and vm_two_input_loop
+ * (i32+i32): here BOTH arguments are full 64-bit live across the loop
+ * body, with a 64-bit return.  Exercises the lifter's 64-bit register
+ * tracking for both RCX and RDX simultaneously.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum DqVmPc {
+    DQ_LOAD       = 0,
+    DQ_INIT       = 1,
+    DQ_LOOP_CHECK = 2,
+    DQ_LOOP_BODY  = 3,
+    DQ_LOOP_INC   = 4,
+    DQ_HALT       = 5,
+};
+
+__declspec(noinline)
+uint64_t vm_dual_i64_loop_target(uint64_t x, uint64_t y) {
+    int      idx   = 0;
+    int      n     = 0;
+    uint64_t state = 0;
+    uint64_t xx    = 0;
+    uint64_t yy    = 0;
+    int      pc    = DQ_LOAD;
+
+    while (1) {
+        if (pc == DQ_LOAD) {
+            n     = (int)(x & 7ull) + 1;
+            xx    = x;
+            yy    = y;
+            state = x ^ y;
+            pc = DQ_INIT;
+        } else if (pc == DQ_INIT) {
+            idx = 0;
+            pc = DQ_LOOP_CHECK;
+        } else if (pc == DQ_LOOP_CHECK) {
+            pc = (idx < n) ? DQ_LOOP_BODY : DQ_HALT;
+        } else if (pc == DQ_LOOP_BODY) {
+            state = state * yy + xx;
+            pc = DQ_LOOP_INC;
+        } else if (pc == DQ_LOOP_INC) {
+            idx = idx + 1;
+            pc = DQ_LOOP_CHECK;
+        } else if (pc == DQ_HALT) {
+            return state;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_dual_i64(7,11)=0x%llx vm_dual_i64(0xCAFE,0xBABE)=0x%llx\n",
+           (unsigned long long)vm_dual_i64_loop_target(7ull, 11ull),
+           (unsigned long long)vm_dual_i64_loop_target(0xCAFEull, 0xBABEull));
+    return 0;
+}
+```
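The smaller rows of the table can be reproduced by hand from the recurrence. As an illustration only, here is the recurrence as a plain loop; `ref_dual_i64` is a hypothetical name and the code is not part of the PR:

```c
#include <stdint.h>

/* state = state * y + x, repeated (x & 7) + 1 times, starting from x ^ y. */
uint64_t ref_dual_i64(uint64_t x, uint64_t y) {
    int n = (int)(x & 7u) + 1;
    uint64_t state = x ^ y;
    for (int i = 0; i < n; i++) {
        state = state * y + x;   /* wraps modulo 2^64 */
    }
    return state;
}
```

For `x = 1, y = 2` the start state is 1 ^ 2 = 3 and two iterations give 3*2 + 1 = 7, then 7*2 + 1 = 15. For `x = y = 2^63` the start state is 0 and the single iteration returns x unchanged.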
@@ -0,0 +1,112 @@
+# vm_dupcount_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 11/11 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_dupcount_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_dupcount_loop.ll`
+- **Symbol:** `vm_dupcount_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_dupcount_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_dupcount_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | limit=1: no compares |
+| 2 | RCX=1 | 0 | 0 | 0 | yes | limit=2, data=[1,0] |
+| 3 | RCX=4369 | 1 | 1 | 1 | yes | 0x1111: limit=2, data=[1,1] |
+| 4 | RCX=74565 | 0 | 0 | 0 | yes | 0x12345: limit=6, all distinct |
+| 5 | RCX=858996001 | 0 | 0 | 0 | yes | 0x33333D21: limit=2, data=[1,2] |
+| 6 | RCX=2004318071 | 7 | 7 | 7 | yes | 0x77777777: limit=8, all 7s |
+| 7 | RCX=287454020 | 2 | 2 | 2 | yes | 0x11223344: limit=5, data=[4,4,3,3,2] |
+| 8 | RCX=305419895 | 1 | 1 | 1 | yes | 0x12345677: limit=8 |
+| 9 | RCX=4294967295 | 7 | 7 | 7 | yes | all F: 7 dups |
+| 10 | RCX=268439552 | 0 | 0 | 0 | yes | 0x10001000: limit=1, no scan |
+| 11 | RCX=171 | 1 | 1 | 1 | yes | 0xAB: limit=4, data=[B,A,0,0] |
+
+## Source
+
+```c
+/* PC-state VM that counts adjacent equal nibbles extracted from x.
+ * Lift target: vm_dupcount_loop_target.
+ * Goal: cover a loop body that loads TWO stack-array elements at adjacent
+ * indices (data[i-1] and data[i]) and conditionally increments a counter
+ * on equality.  Distinct from vm_runlength_loop (compares previous *bit*,
+ * here previous *array element*).
+ */
+#include <stdio.h>
+
+enum DcVmPc {
+    DC_LOAD       = 0,
+    DC_INIT_FILL  = 1,
+    DC_FILL_CHECK = 2,
+    DC_FILL_BODY  = 3,
+    DC_FILL_INC   = 4,
+    DC_INIT_SCAN  = 5,
+    DC_SCAN_CHECK = 6,
+    DC_SCAN_LOAD  = 7,
+    DC_SCAN_TEST  = 8,
+    DC_SCAN_INC_C = 9,
+    DC_SCAN_INC_I = 10,
+    DC_HALT       = 11,
+};
+
+__declspec(noinline)
+int vm_dupcount_loop_target(int x) {
+    int data[8];
+    int limit = 0;
+    int idx   = 0;
+    int count = 0;
+    int prev  = 0;
+    int cur   = 0;
+    int pc    = DC_LOAD;
+
+    while (1) {
+        if (pc == DC_LOAD) {
+            limit = (x & 7) + 1;
+            count = 0;
+            pc = DC_INIT_FILL;
+        } else if (pc == DC_INIT_FILL) {
+            idx = 0;
+            pc = DC_FILL_CHECK;
+        } else if (pc == DC_FILL_CHECK) {
+            pc = (idx < limit) ? DC_FILL_BODY : DC_INIT_SCAN;
+        } else if (pc == DC_FILL_BODY) {
+            data[idx] = (x >> (idx * 4)) & 0xF;
+            pc = DC_FILL_INC;
+        } else if (pc == DC_FILL_INC) {
+            idx = idx + 1;
+            pc = DC_FILL_CHECK;
+        } else if (pc == DC_INIT_SCAN) {
+            idx = 1;
+            pc = DC_SCAN_CHECK;
+        } else if (pc == DC_SCAN_CHECK) {
+            pc = (idx < limit) ? DC_SCAN_LOAD : DC_HALT;
+        } else if (pc == DC_SCAN_LOAD) {
+            prev = data[idx - 1];
+            cur = data[idx];
+            pc = DC_SCAN_TEST;
+        } else if (pc == DC_SCAN_TEST) {
+            pc = (cur == prev) ? DC_SCAN_INC_C : DC_SCAN_INC_I;
+        } else if (pc == DC_SCAN_INC_C) {
+            count = count + 1;
+            pc = DC_SCAN_INC_I;
+        } else if (pc == DC_SCAN_INC_I) {
+            idx = idx + 1;
+            pc = DC_SCAN_CHECK;
+        } else if (pc == DC_HALT) {
+            return count;
+        } else {
+            return -1;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_dupcount_loop(0x77777777)=%d vm_dupcount_loop(0x11223344)=%d\n",
+           vm_dupcount_loop_target(0x77777777), vm_dupcount_loop_target(0x11223344));
+    return 0;
+}
+```
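The scan only ever compares neighbouring nibbles, so the same count can be computed without the intermediate stack array. A sketch under that observation; `ref_dupcount` is a hypothetical helper, not code from the PR, and it takes `unsigned` so the shifts are well defined for every input:

```c
/* Counts i in 1..limit-1 where nibble i equals nibble i-1,
 * with limit = (x & 7) + 1. */
int ref_dupcount(unsigned x) {
    int limit = (int)(x & 7u) + 1;
    int count = 0;
    for (int i = 1; i < limit; i++) {
        unsigned prev = (x >> ((i - 1) * 4)) & 0xFu;
        unsigned cur  = (x >> (i * 4)) & 0xFu;
        if (cur == prev) {
            count++;
        }
    }
    return count;
}
```

For `0x11223344` the limit is 5 and the nibbles read [4,4,3,3,2], so two adjacent pairs match. Nibbles at or above index `limit` are never inspected, which is why `0x10001000` (limit 1) returns 0.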
@@ -0,0 +1,101 @@
+# vm_dword_range64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_dword_range64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_dword_range64_loop.ll`
+- **Symbol:** `vm_dword_range64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_dword_range64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_dword_range64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | all zero -> 0 |
+| 2 | RCX=1 | 1 | 1 | 1 | yes | x=1 n=2: dwords [1,0] |
+| 3 | RCX=2 | 0 | 0 | 0 | yes | x=2 n=1 single dword |
+| 4 | RCX=3 | 3 | 3 | 3 | yes | x=3 n=2: dwords [3,0] |
+| 5 | RCX=3405691582 | 0 | 0 | 0 | yes | 0xCAFEBABE: n=1 single dword |
+| 6 | RCX=3735928559 | 3735928559 | 3735928559 | 3735928559 | yes | 0xDEADBEEF: n=2 dwords [0xDEADBEEF,0] |
+| 7 | RCX=18446744073709551615 | 0 | 0 | 0 | yes | all 0xFF: mx=mn=0xFFFFFFFF |
+| 8 | RCX=72623859790382856 | 0 | 0 | 0 | yes | 0x0102...0708: n=1 single dword |
+| 9 | RCX=1311768467463790320 | 0 | 0 | 0 | yes | 0x12345...EF0: n=1 single dword |
+| 10 | RCX=18364758544493064720 | 0 | 0 | 0 | yes | 0xFEDCBA9876543210: n=1 single dword |
+
+## Source
+
+```c
+/* PC-state VM tracking u32 dword min/max range over n=(x&1)+1 iters:
+ *
+ *   n = (x & 1) + 1;
+ *   s = x; mn = 0xFFFFFFFF; mx = 0;
+ *   while (n) {
+ *     uint64_t d = s & 0xFFFFFFFF;
+ *     if (d > mx) mx = d;
+ *     if (d < mn) mn = d;
+ *     s >>= 32;
+ *     n--;
+ *   }
+ *   return mx - mn;
+ *
+ * Lift target: vm_dword_range64_loop_target.
+ *
+ * Distinct from:
+ *   - vm_byterange64_loop  (u8 byte stride)
+ *   - vm_word_range64_loop (u16 word stride)
+ *
+ * Tests umax/umin folds at 32-bit dword stride.  Single-dword inputs
+ * always return 0 (mx=mn=dword).  4 stateful slots (n,s,mn,mx) with
+ * n-decrement loop control.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum DrVmPc {
+    DR_INIT_ALL = 0,
+    DR_CHECK    = 1,
+    DR_BODY     = 2,
+    DR_HALT     = 3,
+};
+
+__declspec(noinline)
+uint64_t vm_dword_range64_loop_target(uint64_t x) {
+    uint64_t n  = 0;
+    uint64_t s  = 0;
+    uint64_t mn = 0;
+    uint64_t mx = 0;
+    int      pc = DR_INIT_ALL;
+
+    while (1) {
+        if (pc == DR_INIT_ALL) {
+            n  = (x & 1ull) + 1ull;
+            s  = x;
+            mn = 0xFFFFFFFFull;
+            mx = 0ull;
+            pc = DR_CHECK;
+        } else if (pc == DR_CHECK) {
+            pc = (n > 0ull) ? DR_BODY : DR_HALT;
+        } else if (pc == DR_BODY) {
+            uint64_t d = s & 0xFFFFFFFFull;
+            mx = (d > mx) ? d : mx;
+            mn = (d < mn) ? d : mn;
+            s = s >> 32;
+            n = n - 1ull;
+            pc = DR_CHECK;
+        } else if (pc == DR_HALT) {
+            return mx - mn;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_dword_range64(0xDEADBEEF)=%llu\n",
+           (unsigned long long)vm_dword_range64_loop_target(0xDEADBEEFull));
+    return 0;
+}
+```
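Review note: the PC-state machine above can be cross-checked against a straight-line version of the loop in its header comment. The sketch below is not part of the test suite. `dword_range_ref` and `dword_range_self_check` are hypothetical names introduced here, and the expected values are worked out by hand from the pseudo-code, not taken from the manifest.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical straight-line reference for vm_dword_range64_loop_target:
 * same arithmetic as the header comment, without the pc dispatch. */
static uint64_t dword_range_ref(uint64_t x) {
    uint64_t n  = (x & 1ull) + 1ull;   /* 1 or 2 iterations */
    uint64_t s  = x;
    uint64_t mn = 0xFFFFFFFFull;       /* running minimum dword */
    uint64_t mx = 0ull;                /* running maximum dword */
    while (n) {
        uint64_t d = s & 0xFFFFFFFFull;
        if (d > mx) mx = d;
        if (d < mn) mn = d;
        s >>= 32;
        n--;
    }
    return mx - mn;
}

/* Hand-derived expectations (assumption: not copied from the manifest). */
static void dword_range_self_check(void) {
    /* Even x: a single dword is read, so max == min and the range is 0. */
    assert(dword_range_ref(0x0102030405060708ull) == 0);
    assert(dword_range_ref(0xFEDCBA9876543210ull) == 0);
    /* Odd x with a zero upper dword: the range is the low dword itself. */
    assert(dword_range_ref(0xDEADBEEFull) == 0xDEADBEEFull);
    /* Odd x with two equal dwords: the range collapses to 0. */
    assert(dword_range_ref(0xFFFFFFFFFFFFFFFFull) == 0);
}
```

Calling `dword_range_self_check()` from a scratch driver should pass silently. The even-input cases match the "n=1 single dword" rows in the table above.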
@@ -0,0 +1,102 @@
+# vm_dword_xormul64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_dword_xormul64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_dword_xormul64_loop.ll`
+- **Symbol:** `vm_dword_xormul64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_dword_xormul64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_dword_xormul64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | all zero -> 0 |
+| 2 | RCX=1 | 2654435769 | 2654435769 | 2654435769 | yes | x=1 n=2: 1*GR^0=GR |
+| 3 | RCX=2 | 5308871538 | 5308871538 | 5308871538 | yes | x=2 n=1 |
+| 4 | RCX=3 | 7963307307 | 7963307307 | 7963307307 | yes | x=3 n=2: dword 3 then 0 |
+| 5 | RCX=3405691582 | 9040189553442996558 | 9040189553442996558 | 9040189553442996558 | yes | 0xCAFEBABE: n=1 single dword |
+| 6 | RCX=3735928559 | 9916782397438226871 | 9916782397438226871 | 9916782397438226871 | yes | 0xDEADBEEF: n=2 dword + 0 |
+| 7 | RCX=18446744073709551615 | 0 | 0 | 0 | yes | all 0xFF: 2 XOR of 0xFFFFFFFF*GR cancel |
+| 8 | RCX=72623859790382856 | 223718755872922824 | 223718755872922824 | 223718755872922824 | yes | 0x0102...0708: n=1 lower dword=0x05060708 |
+| 9 | RCX=1311768467463790320 | 6891098688453380976 | 6891098688453380976 | 6891098688453380976 | yes | 0x12345...EF0: n=1 lower dword=0x9ABCDEF0 |
+| 10 | RCX=18364758544493064720 | 5269663737911033232 | 5269663737911033232 | 5269663737911033232 | yes | 0xFEDCBA9876543210: n=1 lower dword=0x76543210 |
+
+## Source
+
+```c
+/* PC-state VM that processes u32 dwords per iteration:
+ *
+ *   n = (x & 1) + 1;     // 1..2 dword iters
+ *   s = x; r = 0;
+ *   for (i = 0; i < n; i++) {
+ *     uint64_t d = s & 0xFFFFFFFF;
+ *     r = r ^ (d * 0x9E3779B9);   // golden-ratio constant mul
+ *     s >>= 32;
+ *   }
+ *   return r;
+ *
+ * Lift target: vm_dword_xormul64_loop_target.
+ *
+ * Distinct from:
+ *   - vm_word_xormul64_loop  (u16 word stride)
+ *   - vm_quad_byte_xor64_loop (4 BYTES per iter)
+ *   - vm_xormuladd_chain64_loop (xor + mul + add, no stride)
+ *
+ * Tests u32 reads (mask 0xFFFFFFFF, zext from i32) multiplied by the
+ * 32-bit golden-ratio constant 0x9E3779B9 (not prime: 3 divides it)
+ * and XOR-folded into the accumulator.  Stride is 32 bits; 1..2 iters.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum DwVmPc {
+    DW_INIT_ALL = 0,
+    DW_CHECK    = 1,
+    DW_BODY     = 2,
+    DW_INC      = 3,
+    DW_HALT     = 4,
+};
+
+__declspec(noinline)
+uint64_t vm_dword_xormul64_loop_target(uint64_t x) {
+    uint64_t n  = 0;
+    uint64_t s  = 0;
+    uint64_t r  = 0;
+    uint64_t i  = 0;
+    int      pc = DW_INIT_ALL;
+
+    while (1) {
+        if (pc == DW_INIT_ALL) {
+            n = (x & 1ull) + 1ull;
+            s = x;
+            r = 0ull;
+            i = 0ull;
+            pc = DW_CHECK;
+        } else if (pc == DW_CHECK) {
+            pc = (i < n) ? DW_BODY : DW_HALT;
+        } else if (pc == DW_BODY) {
+            uint64_t d = s & 0xFFFFFFFFull;
+            r = r ^ (d * 0x9E3779B9ull);
+            s = s >> 32;
+            pc = DW_INC;
+        } else if (pc == DW_INC) {
+            i = i + 1ull;
+            pc = DW_CHECK;
+        } else if (pc == DW_HALT) {
+            return r;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_dword_xormul64(0xCAFEBABE)=%llu\n",
+           (unsigned long long)vm_dword_xormul64_loop_target(0xCAFEBABEull));
+    return 0;
+}
+```
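Review note: a straight-line rendering of the header pseudo-code can be used to reproduce a few table rows by hand. This is a sketch only. `GR32`, `dword_xormul_ref`, and `dword_xormul_self_check` are hypothetical names, not symbols from the repository. The expected values are the ones listed in the equivalence table above.

```c
#include <assert.h>
#include <stdint.h>

/* 0x9E3779B9 = 2654435769, the usual 32-bit golden-ratio hashing constant. */
#define GR32 0x9E3779B9ull

/* Hypothetical reference for vm_dword_xormul64_loop_target without the
 * pc dispatch; arithmetic wraps modulo 2^64 like the original. */
static uint64_t dword_xormul_ref(uint64_t x) {
    uint64_t n = (x & 1ull) + 1ull;    /* 1 or 2 dword iterations */
    uint64_t s = x;
    uint64_t r = 0;
    for (uint64_t i = 0; i < n; i++) {
        uint64_t d = s & 0xFFFFFFFFull;
        r ^= d * GR32;
        s >>= 32;
    }
    return r;
}

static void dword_xormul_self_check(void) {
    assert(dword_xormul_ref(0ull) == 0);
    assert(dword_xormul_ref(1ull) == 2654435769ull);          /* row 2 */
    assert(dword_xormul_ref(2ull) == 5308871538ull);          /* row 3 */
    assert(dword_xormul_ref(3405691582ull) == 9040189553442996558ull); /* row 5 */
    /* Two identical dwords produce identical products, which XOR to 0. */
    assert(dword_xormul_ref(0xFFFFFFFFFFFFFFFFull) == 0);     /* row 7 */
    /* The multiplier is odd but composite: its digit sum is 51. */
    assert(GR32 % 3ull == 0);
}
```

The last assertion is there because the constant is sometimes described as a prime; it is divisible by 3, which does not affect the test but matters for the wording of the comment.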
@@ -0,0 +1,101 @@
+# vm_dyn_ashr64_loop - original vs lifted equivalence
+
+- **Verdict:** PASS
+- **Cases:** 10/10 equivalent
+- **Source:** `testcases/rewrite_smoke/vm_dyn_ashr64_loop.c`
+- **Lifted IR:** `rewrite-regression-work/ir_outputs/vm_dyn_ashr64_loop.ll`
+- **Symbol:** `vm_dyn_ashr64_loop_target`
+- **Native driver:** `rewrite-regression-work/eq/vm_dyn_ashr64_loop_eq.exe`
+- **Lifted signature:** `define i64 @main(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory, i128 %XMM0, i128 %XMM1, i128 %XMM2, i128 %XMM3, i128 %XMM4, i128 %XMM5, i128 %XMM6, i128 %XMM7, i128 %XMM8, i128 %XMM9, i128 %XMM10, i128 %XMM11, i128 %XMM12, i128 %XMM13, i128 %XMM14, i128 %XMM15) local_unnamed_addr #0`
+
+## Equivalence (native vs lifted)
+
+Each row runs the same inputs through (a) the original program compiled to a real Win64 binary that calls `vm_dyn_ashr64_loop_target` directly, and (b) the lifted+optimized LLVM IR executed via `lli`. A case is equivalent only if both observations agree and also match the manifest's expected value.
+
+| # | Inputs | Manifest | Native | Lifted | Equivalent | Label |
+|---|--------|----------|--------|--------|------------|-------|
+| 1 | RCX=0 | 0 | 0 | 0 | yes | all zero -> 0 |
+| 2 | RCX=1 | 1 | 1 | 1 | yes | x=1 n=2: low byte of x (1) xor low byte of x>>1 (0) |
+| 3 | RCX=2 | 3 | 3 | 3 | yes | x=2 n=3 |
+| 4 | RCX=7 | 5 | 5 | 5 | yes | x=7 n=8: max trip |
+| 5 | RCX=8 | 8 | 8 | 8 | yes | x=8 n=1: byte0 of x |
+| 6 | RCX=3405691582 | 141 | 141 | 141 | yes | 0xCAFEBABE: n=7 mixed shifts |
+| 7 | RCX=3735928559 | 97 | 97 | 97 | yes | 0xDEADBEEF: n=8 |
+| 8 | RCX=18446744073709551615 | 0 | 0 | 0 | yes | all 0xFF: ashr fills 1s; 8 xor of 0xFF cancel to 0 |
+| 9 | RCX=9223372036854775808 | 0 | 0 | 0 | yes | x=2^63 n=1: byte0=0 single iter (high bit only) |
+| 10 | RCX=1311768467463790320 | 240 | 240 | 240 | yes | 0x12345...EF0: n=1 byte0=0xF0 |
+
+## Source
+
+```c
+/* PC-state VM running a dynamic-amount ASHR (signed shift right) and
+ * XOR-fold of the low byte over n = (x & 7) + 1 iterations:
+ *
+ *   n = (x & 7) + 1;
+ *   r = 0;
+ *   for (i = 0; i < n; i++) {
+ *     int64_t sx = (int64_t)x >> i;       // dynamic ashr by i
+ *     r = r ^ ((uint64_t)sx & 0xFF);
+ *   }
+ *   return r;
+ *
+ * Lift target: vm_dyn_ashr64_loop_target.
+ *
+ * Distinct from:
+ *   - vm_bitfetch_window64_loop  (dynamic LSHR by counter)
+ *   - vm_dynshl_pack64_loop      (dynamic SHL by counter)
+ *   - vm_zigzag_step64_loop      (constant ashr-by-63)
+ *
+ * Completes the dynamic-shift trio (lshr / shl / ashr): exercises
+ * `ashr i64 %x, %i` where %i is the loop-index phi.  Each iteration
+ * shifts the input one bit further and XORs in the low byte.  Negative
+ * inputs fill the vacated high bits with 1s, but with i <= 7 the low
+ * byte only sees bits 0..14, so the fill never reaches the result.
+ */
+#include <stdio.h>
+#include <stdint.h>
+
+enum DaVmPc {
+    DA_INIT_ALL = 0,
+    DA_CHECK    = 1,
+    DA_BODY     = 2,
+    DA_INC      = 3,
+    DA_HALT     = 4,
+};
+
+__declspec(noinline)
+uint64_t vm_dyn_ashr64_loop_target(uint64_t x) {
+    uint64_t n  = 0;
+    uint64_t r  = 0;
+    uint64_t i  = 0;
+    int      pc = DA_INIT_ALL;
+
+    while (1) {
+        if (pc == DA_INIT_ALL) {
+            n = (x & 7ull) + 1ull;
+            r = 0ull;
+            i = 0ull;
+            pc = DA_CHECK;
+        } else if (pc == DA_CHECK) {
+            pc = (i < n) ? DA_BODY : DA_HALT;
+        } else if (pc == DA_BODY) {
+            int64_t sx = (int64_t)x >> (int)i;
+            r = r ^ ((uint64_t)sx & 0xFFull);
+            pc = DA_INC;
+        } else if (pc == DA_INC) {
+            i = i + 1ull;
+            pc = DA_CHECK;
+        } else if (pc == DA_HALT) {
+            return r;
+        } else {
+            return 0xFFFFFFFFFFFFFFFFull;
+        }
+    }
+}
+
+int main(void) {
+    printf("vm_dyn_ashr64(0xDEADBEEF)=%llu\n",
+           (unsigned long long)vm_dyn_ashr64_loop_target(0xDEADBEEFull));
+    return 0;
+}
+```
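Review note: because the shift amount never exceeds 7 and only the low byte is kept, the returned value depends on bits 0..14 of `x` alone. The sketch below checks that an arithmetic and a logical shift give the same fold for this particular loop. `ashr64`, `dyn_shift_fold`, and `dyn_ashr_self_check` are hypothetical helpers written for this note. `ashr64` builds the sign fill explicitly so that it does not rely on the implementation-defined behavior of `>>` on negative signed values in C.

```c
#include <assert.h>
#include <stdint.h>

/* Portable arithmetic shift right for 0 <= i < 64: logical shift plus an
 * explicit fill of the vacated high bits when the sign bit is set. */
static uint64_t ashr64(uint64_t x, unsigned i) {
    uint64_t fill = (x >> 63) ? ~(~0ull >> i) : 0ull;
    return (x >> i) | fill;
}

/* Same loop as the header pseudo-code; `arithmetic` selects ashr or lshr. */
static uint64_t dyn_shift_fold(uint64_t x, int arithmetic) {
    uint64_t n = (x & 7ull) + 1ull;    /* 1..8 iterations, so i <= 7 */
    uint64_t r = 0;
    for (uint64_t i = 0; i < n; i++) {
        uint64_t sx = arithmetic ? ashr64(x, (unsigned)i) : (x >> i);
        r ^= sx & 0xFFull;
    }
    return r;
}

static void dyn_ashr_self_check(void) {
    /* Values from the equivalence table above. */
    assert(dyn_shift_fold(2ull, 1) == 3);
    assert(dyn_shift_fold(7ull, 1) == 5);
    assert(dyn_shift_fold(3405691582ull, 1) == 141);
    assert(dyn_shift_fold(0xFFFFFFFFFFFFFFFFull, 1) == 0);
    /* Sign fill never reaches the low byte, so ashr and lshr agree. */
    assert(dyn_shift_fold(0x8000000000000007ull, 1) == 5);
    assert(dyn_shift_fold(0x8000000000000007ull, 0) == 5);
}
```

If the intent is for the semantic cases to distinguish `ashr` from `lshr`, one option (an assumption, not something this PR does) would be a variant that keeps a high byte or allows shift amounts of 56 or more.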