Mergen

mirror of https://github.com/NaC-L/Mergen.git synced 2026-05-12 09:40:34 +00:00

Author	SHA1	Message	Date
naci	71fc60766d	Loop generalization, BSR/BSF intrinsics, and stack alloca split (#207 ) * lifter: move themida control/target slots to per-state fields (phase A) Lift the kThemidaControlCursorSlot/kThemidaLoopCarriedSlot constants out of the helper bodies and into per-loop GeneralizedLoopControlFieldState fields controlSlot/targetSlot. The populator seeds them to the legacy Themida defaults, so behavior is unchanged on the reference Themida sample and on every existing test. Phase B will replace the populator's literal seed with active discovery against canonical/backedge buffers, enabling per-binary slot identification. Sites updated: - GeneralizedLoopControlFieldState: new controlSlot/targetSlot fields, reset in clearGeneralizedLoopControlFieldState - load_generalized_backup_impl: introduces controlSlot/targetSlot locals, uses them for canonical+backedge reads, seeds them into the activated state - matchGeneralizedLoopControlFieldAddress: gates the GEP-base check on activeGeneralizedLoopControlFieldState.controlSlot - retrieve_generalized_loop_control_slot_value_impl: gates on state.controlSlot - retrieve_generalized_loop_target_slot_value_impl: gates on state.targetSlot - record_generalized_loop_backedge_impl: reads current control via stateIt->second.controlSlot in both rotate and append-or-update paths Tests that build state directly (bypassing the populator) are updated to seed the new fields where they call retrieve helpers. * lifter: discover themida control/target slots from canonical+backedge buffers (phase B) Replaces the populator's hardcoded reads at kThemidaControlCursorSlot / kThemidaLoopCarriedSlot with active per-loop discovery against the canonical backup buffer and the generalized-loop backedge buffers. 
The discovery is implemented as two helpers next to load_generalized_backup_impl: - tryPopulateControlFromSlot(canonical, backedges, slot, dst): probes a specific candidate slot and, on success, fills dst with the canonical / backedge controls and per-backedge buffers that the slot motivates. - discoverGeneralizedLoopSlots(canonical, backedges): drives the search. The control-slot search prefers the legacy Themida cursor (zero behavior change on the reference sample) and falls back to scanning canonical for the qword-start address with the most-varying backedges, tiebreaking by lowest address. The target-slot search prefers the legacy carried slot and falls back to the lowest-address candidate that is tracked across canonical and every selected backedge buffer. Stack-frame addresses (anything inside [STACKP-reserve, STACKP+reserve)) are excluded from candidates, so caller-frame stack args at e.g. STACKP+24 are no longer mistakenly chosen as target slots. This matters for the nested-loop local-buffer test, whose canonical buffer carries a tracked qword above STACKP from the outer loop's prior backedge. Two existing KNOWN-LIMITATION tests are flipped to assert the new positive contract: - generalized_loop_non_themida_control_slot_produces_no_phi -> generalized_loop_non_themida_slot_picks_up_as_target_when_legacy_control_present (the non-Themida slot is now picked up as the target slot when the legacy cursor is present, and a 2-way phi is produced at it). - generalized_loop_non_themida_target_slot_produces_no_phi -> generalized_loop_discovery_picks_non_themida_target_slot (the discovered target slot is asserted, and the helper produces a 2-way phi with both incoming concrete values). Verification: - 228 rewrite_microtests pass (no regressions). - check_themida_equivalence.py: example2 still recovers all 4 required imports (CharUpperA, GetStdHandle, ReadConsoleA, WriteConsoleA). 
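The control-slot fallback described above can be illustrated with a small model. This is only a sketch under stated assumptions: buffers are modelled as plain address-to-value dictionaries, "most-varying" is read as "differs from canonical in the most backedge buffers", and the legacy slot is taken to win whenever it varies at all. The name `discover_control_slot` and its parameters are hypothetical and are not the lifter's C++ helpers.

```python
from typing import Dict, List, Optional

# Hypothetical representation: qword start address -> tracked value.
Buffer = Dict[int, int]


def discover_control_slot(canonical: Buffer, backedges: List[Buffer],
                          legacy_slot: int, stackp: int,
                          reserve: int) -> Optional[int]:
    """Pick a control slot: legacy slot first, then the most-varying candidate."""

    def in_stack_frame(addr: int) -> bool:
        # Addresses inside [STACKP-reserve, STACKP+reserve) are never candidates.
        return stackp - reserve <= addr < stackp + reserve

    def variation(addr: int) -> int:
        # Assumption: count the backedge buffers whose value differs from canonical.
        return sum(1 for b in backedges
                   if addr in b and b[addr] != canonical[addr])

    # Prefer the legacy slot when it is tracked and actually varies.
    if legacy_slot in canonical and variation(legacy_slot) > 0:
        return legacy_slot

    candidates = [a for a in canonical
                  if not in_stack_frame(a) and variation(a) > 0]
    if not candidates:
        return None
    # Highest variation wins; ties go to the lowest address.
    return min(candidates, key=lambda a: (-variation(a), a))
```

In this model a caller-frame argument at STACKP+24 is filtered out before ranking, even if it varies more than every other candidate.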
* loop generalization: data-driven register preservation + multi-slot carried state Phase 2: Replace shouldPreserveGeneralizedBackedgeRegisterIndex (hardcoded Themida-specific index set {1,4,7,9,10,12,14}) with data-driven comparison of canonical vs backedge values. A register is now preserved when its value changed across the loop boundary; RSP is always preserved. This prevents non-Themida loops from silently losing loop-carried state in registers outside the hardcoded set. Phase 1: Extend GeneralizedLoopControlFieldState with a carriedSlots vector that tracks ALL varying memory qwords discovered during slot analysis, not just the single controlSlot + targetSlot. The retrieve_target_slot helper now checks carriedSlots after the legacy targetSlot, building phis for any matching carried address. Rotation logic in record_generalized_loop_backedge updates carried slot values alongside the primary control slot. Phase 3: Add vm_tea_round_loop sample — TEA-style compound cross-update with 3 independently loop-carried state variables (v0, v1, sum). 10 semantic test cases including the previously-failing x=0x65501 input. All pass. Test results: 247/247 pattern-verified, 245/245 semantic (2342 cases), all microtests green including flag checks. * add vm_subroutine_loop: single-depth call/ret VM with indirect PC dispatch The vm_subroutine_loop pattern previously crashed the lifter with an access violation (0xC0000005). The combination of multi-slot carried state, data- driven register preservation, and emergency generalization now handles this pattern correctly: 8 semantic cases pass, no crash. The sample uses a one-deep return-PC slot (rpc) for indirect dispatch — the simplest form of the pattern that was fundamentally unsupported. 248/248 samples, 246/246 semantic (2350 cases), Themida gate green. 
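The data-driven preservation rule from Phase 2 can be sketched as follows. This is a minimal model, not the lifter's code: register state is a flat list of values, and the RSP index of 4 is an assumption taken from the usual x86-64 register encoding, not from the lifter's own numbering.

```python
from typing import List, Set

RSP_INDEX = 4  # assumption: x86-64 encoding order; the lifter's index may differ


def preserved_registers(canonical: List[int], backedge: List[int],
                        rsp_index: int = RSP_INDEX) -> Set[int]:
    """Return the register indices to preserve across the loop boundary.

    A register is preserved when its value changed between the canonical
    snapshot and the backedge snapshot; RSP is always preserved.
    """
    return {
        i
        for i, (before, after) in enumerate(zip(canonical, backedge))
        if i == rsp_index or before != after
    }
```

Unlike a fixed index set, this keeps any register that carries loop state, whichever register the loop happens to use.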
* add vm_callret_loop and vm_bubblesort_loop: previously budget-blown patterns Both patterns previously exhausted maxBasicBlockBudget (~4087 blocks): - vm_callret_loop: stack-array-indexed PC dispatch (rstack[rsp]) - vm_bubblesort_loop: conditional two-slot array swap per iteration With emergency generalization (75% budget threshold), both now lift without hitting the budget ceiling (75 and 59 blocks respectively). The patterns are registered with IR shape checks only (no semantic assertions) because the indirect dispatch and conditional multi-slot writes are not yet semantically accurate under generalization. 250/250 samples, 247/247 semantic (2358 cases), Themida gate green. * semantics: rewrite BSR/BSF to use llvm.ctlz/cttz intrinsics Replace the bitWidth-iteration unrolled bit-scan loops (32 AND+ICMP+SELECT chains for i32, 64 for i64) with single @llvm.ctlz / @llvm.cttz intrinsic calls. BSR = bitWidth - 1 - ctlz(x, true); BSF = cttz(x, true). The zero-input case is handled with is_zero_undef=true (matching BSR/BSF architectural undefined-when-zero behavior) plus an explicit select that returns undef when the input is zero. Constant folding is preserved. IR quality improvement: vm_imported_clz_loop and vm_imported_bsr_loop now show a single call @llvm.ctlz.i32 instead of 30+ bsrtest/icmp/select instructions. Pattern manifests updated to match 'call'. lift_lzcnt and lift_tzcnt already used the intrinsics — BSR/BSF were the only remaining scalar bit-scan ops with unrolled implementations. Side benefit: flag-stress tests bsf_00 and bsf_01 fixed (constant-folded input now produces correct PF flag instead of running the unrolled loop). 
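The BSR/BSF identities stated above can be checked with a short model. This is a sketch in plain Python integers, not LLVM IR: `None` stands in for the undefined result on zero input, and the helper names are illustrative.

```python
from typing import Optional


def ctlz(x: int, width: int) -> int:
    """Count leading zeros of a non-zero value in a `width`-bit word."""
    return width - x.bit_length()


def cttz(x: int) -> int:
    """Count trailing zeros of a non-zero value."""
    return (x & -x).bit_length() - 1


def bsr(x: int, width: int = 32) -> Optional[int]:
    """BSR = width - 1 - ctlz(x); None models the undefined zero-input case."""
    x &= (1 << width) - 1
    if x == 0:
        return None
    return width - 1 - ctlz(x, width)


def bsf(x: int, width: int = 32) -> Optional[int]:
    """BSF = cttz(x); None models the undefined zero-input case."""
    x &= (1 << width) - 1
    if x == 0:
        return None
    return cttz(x)
```

For example, `bsr(0x65501)` is 18 and `bsf(0x65501)` is 0, since the highest set bit is bit 18 and the lowest is bit 0.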
* PromotePseudoStackPass: split into main + escape alloca by call-escape Previously, GEPs that flow into call arguments either: (a) were skipped from promotion → left as memory-base GEPs that PromotePseudoMemory turned into raw inttoptr(stack_addr) constants (e.g., 'ptr nonnull inttoptr (i64 1375592 to ptr)' as WriteConsoleA lpNumberOfCharsWritten arg), or (b) were promoted to a single shared alloca, blocking SROA for the whole alloca and leaving hundreds of dead dispatcher-scratch stores in the post-opt IR. Two-alloca split fixes both: - Main alloca: scratch slots that don't escape via calls. SROA decomposes it cleanly; DSE eliminates dead stores. - Escape alloca: slots whose pointer flows into a CallBase. Won't SROA but is isolated, so dispatcher noise doesn't block its dead-store elimination. Classification is by constant offset: any offset touched by ANY GEP with a CallBase user is marked escaped. All GEPs (constant or not) at escaped offsets go to the escape alloca to preserve pointer identity within each slot. Non-constant offsets always go to the main alloca (in practice, lifters use them for buffers; constant offsets for API scalar slots). Themida WriteConsoleA call now shows clean alloca GEPs: ptr nonnull %4 (= stackmemory.escape + 200) ptr nonnull %6 (= stackmemory.escape + 208) instead of: ptr nonnull inttoptr (i64 1375584 to ptr) ptr nonnull inttoptr (i64 1375592 to ptr) Stack-range inttoptr in Themida output drops to zero. Total stores drop dramatically (the remaining ones are .themida section writes, a separate dispatcher-state issue not related to the stack alloca). Pattern updates for 4 samples whose IR shape changed due to the cleaner alloca decomposition: - vm_fibonacci_loop: switch i32 -> br i1 - vm_search_loop: br i1 -> select - vm_signed_dword_sum64: sext -> ashr - vm_signed_word_sum64: sext -> ashr --------- Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-05-08 14:00:42 +03:00
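The classification rule in the PromotePseudoStackPass commit can be sketched as a small model. This is an illustration under stated assumptions, not the pass itself: the `Gep` record and its fields are hypothetical stand-ins for LLVM GEP instructions, and `has_call_user` stands for "some user is a CallBase".

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Gep:
    name: str
    offset: Optional[int]  # None models a non-constant offset
    has_call_user: bool    # True when the pointer flows into a call


def classify(geps: List[Gep]) -> Dict[str, str]:
    """Assign each GEP to the 'main' or 'escape' alloca."""
    # An offset is escaped if ANY GEP at that constant offset has a call user.
    escaped = {g.offset for g in geps
               if g.offset is not None and g.has_call_user}
    # Every GEP at an escaped offset moves together, so pointer identity
    # within a slot is preserved; non-constant offsets stay in main.
    return {
        g.name: "escape" if g.offset is not None and g.offset in escaped else "main"
        for g in geps
    }
```

A scratch slot that is never passed to a call stays in the main alloca, where SROA and DSE can still work on it.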
naci	841d6bbcdb	docs: add Control-Flow Recognition section and clarify punpcklqdq state (#206 ) Two doc updates following #205: ARCHITECTURE.md gains a 'Control-Flow Recognition' section covering the lift_ret REAL_return / ROP-return classification, the ret-to-IAT chain pattern (the Themida-virt mitigation that #195/#196/#205 built out), the lift_jmp direct/indirect dispatch, and the Iced operand-type quirk that motivates widening SSE accept sets. These were all undocumented and the ret-to-IAT chain in particular is a non-trivial structural rewrite that future maintainers should not have to reverse-engineer from the source. REWRITE_BASELINE.md's punpcklqdq line now reflects what actually happened: the handler had been present for a while but silently fell through to not_implemented for every site because Iced classifies the source operand by bytes-actually-accessed (low 64), not by physical XMM width. Fixed in #205 (`ba20a39`) by widening the accept set; pre-existing oracle vectors now pass and gate future regressions. Doc-only change. Behavior unchanged. Co-authored-by: Yusuf <yusuf.canislek@meetdandy.com>	2026-05-02 20:54:15 +03:00
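The punpcklqdq note above turns on the fact that the instruction only reads the low 64 bits of its source. A minimal sketch of that semantic, with 128-bit registers modelled as Python integers (the function name is illustrative, not the lifter's handler):

```python
MASK64 = (1 << 64) - 1


def punpcklqdq(dst: int, src: int) -> int:
    """Interleave low quadwords: result low = dst low, result high = src low."""
    return (dst & MASK64) | ((src & MASK64) << 64)
```

Because the upper 64 bits of `src` never reach the result, a source typed as 64 bits wide yields the same value as a 128-bit one, which is why an accept set limited to 128-bit operand types rejects valid sites.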
naci	605a36e8ed	lifter: correctness fixes, refactors, and regression tests (#205 ) * lifter: restore indirect-jump threshold to 128 * gitignore: glob output_*.ll instead of enumerating dumps Replace output_finalnoopt.ll / output_no_opts.ll entries with output_*.ll so ad-hoc lifter dumps (output_rets.ll, output_newpath.ll, etc.) stop showing up in git status. * lifter: factor REAL_return path through emitResolvedFunctionReturn Pull the rax-zext + CreateRet + run/finished bookkeeping out of the REAL_return branch in lift_ret() into a local lambda so future ret exit points can reuse it without duplicating four lines of boilerplate. Drop the dead returnStruct/myStruct scaffolding and the originalFunc_finalnopt local: every InsertValue call site has been commented out for a long time and the locals had no remaining uses. The active code emits a plain rax return. No behavior change. * lifter: advance RSP past continuation slot in ret-to-IAT chain In the chained import-return pattern (`ret` to IAT slot, IAT slot holds an external function address, the function returns and control resumes at the next stack slot's continuation address), the lifter collapses the two pops into a single `call @import; br contBB`. RSP was only advanced past the IAT slot itself, so post-call register state still claimed RSP pointed at the continuation address. Any downstream stack read from RSP saw stale data and any solver that constant-folded RSP picked up a value that no longer matched the post-chain physical layout. Bump RSP by another `ptrSize` immediately before lowering the import call so the continuation block inherits the same RSP it would have under a faithful two-pop lowering. * lifter/test: regression test for ret-to-IAT chain RSP advancement Locks in dd95fe7. 
The microtest stands up a LifterUnderTest, plants [importVA, contVA] on the stack at an RSP that is intentionally NOT equal to STACKP_VALUE (so the lift_ret REAL_return short-circuit does not fire), registers the import in the lifter's importMap, and lifts a single `ret` (0xC3). It then asserts that: - the chain handler emitted a direct call to the registered import - RSP after the chain equals entry RSP + 16, not + 8 Without the fix the test fails with RSP = entry + 8 (only the IAT slot pop is modeled), exactly the off-by-8 the fix closes. Verified the test catches the regression by reverting dd95fe7 locally before re-applying -- the failing message reads "RSP after chain = 0x14FDA8; expected 0x14fdb0". * scripts/themida: filter lifter-synthesized helpers from import diff Calls to lifter-emitted helpers (`@exception`, `@fastfail`, `@not_implemented`, etc.) surfaced as 'extra import (not required)' lines on every Themida equivalence run. They are not user imports; they are lowered from INT1/INT3/UD2/INT29/SYSCALL/segment-load sites in the lifter's own semantics files. Skip them in `_extract_call_names` so the equivalence diff shows only real imports. The list of helpers lives next to the call regex so it stays adjacent to the code that emits them; if a new helper shows up in the IR (e.g. another illegal-instruction lowering) the script will surface it as an 'extra import' until the entry is added here, which is the right tripwire. Before: example2 -- 6 distinct imports, 10 calls (3 noise calls) After: example2 -- 4 distinct imports, 7 calls (clean) * lifter/analysis: replace 'TODO: fix?' marker with positive explanation The 2-value path-solving fork's swap branch had a 'TODO: fix?' comment from the original draft. Traced both branches and confirmed the swap is correct: - When the select's trueValue equals firstcase, condition is the select's condition as-is and firstcase→bb_true wires correctly. 
- When trueValue equals secondcase, condition still expresses 'true picks trueValue' but downstream code uses firstcase→bb_true. Swapping firstcase↔secondcase makes firstcase refer to the trueVal constant so the existing CreateCondBr wiring stays correct without a parallel reversed-branch path. Replaced the TODO with a comment that explains why the swap is necessary, so future readers do not waste time investigating a branch that is intentional. * lifter: accept Register64/Memory64 source for punpcklqdq Iced classifies operand types by the bytes the instruction actually accesses, not by physical register width. PUNPCKLQDQ only reads the low 64 bits of its second operand, so Iced reports Register64 (or Memory64 for the m128 form) for a source whose physical encoding is `xmm/m128`. The lift handler's accept check rejected anything other than Register128/Memory128 and fell through to the not_implemented exit, so every `punpcklqdq xmm, xmm/m128` site lowered to a bogus `call @not_implemented; ret` instead of the unpack semantic. Widen the accept set to Register64 and Memory64 too. The body already truncates the source to i64 before OR'ing it into the high half of the result, so a 64-bit-typed source is semantically identical to a 128-bit one for this handler. Fixes the two pre-existing oracle test failures `punpcklqdq_xmm0_xmm1_basic` and `punpcklqdq_xmm0_xmm1_zero_upper_from_zero_source`. `python test.py all` stays at 244/244, confirming no semantic regressions. * lifter: replace lift_jmp's fallthrough switch with an isDirectJump if The RIP-relative add for direct jumps lived inside a 4-case switch whose body intentionally fell through into `default: break;`. It worked, but: - Implicit fallthrough is a -Wimplicit-fallthrough hazard. Today the default does nothing; tomorrow someone adds a body and every direct jump silently runs it. 
- The switch's discriminator is exactly `isDirectJump`, which is already computed two lines above for the path-solver context. The switch was a parallel restatement of the same predicate. Collapse the switch into `if (isDirectJump) { trunc = add(trunc, ripval); }` so the predicate has one definition and there is no fallthrough to misuse. Behavior unchanged: the same immediate cases still get the RIP-relative bump, indirect jumps still skip it, and `python test.py all` stays at 244/244. * lifter/test: regression test for SSE memory-form handler dispatch Lock in that pand/por/pxor accept the `xmm, [mem]` encoding form. The test lifts `66 0F DB 00`, `66 0F EB 00`, and `66 0F EF 00` (one `xmm0, [rax]` site each) and asserts that the lifted function does not contain a direct call to @not_implemented. Pure structural acceptance: not validating bitwise-AND/OR/XOR semantics, only that the handler dispatched at all. Iced today reports Memory128 for these encodings so the test passes against the existing `Register128 || Memory128` accept sets. If a future Iced update reclassifies the source operand by bytes-actually-accessed (the way it already does for punpcklqdq, where it reports Register64/Memory64 even for an `xmm/m128` encoding) the handler would silently fall through to `call @not_implemented; ret` and miscompile every memory-form site -- this test trips first. * lifter: drop duplicate stdout print on unresolved indirect jmp `lift_jmp` printed every UnresolvedIndirectJump twice: once as a raw `std::cout << "[diag] lift_jmp: ..."` and once through `diagnostics.warning(...)` on the very next line. The diagnostics framework already persists the warning to `output_diagnostics.json` at lift completion, and no script or test grep'd the stdout form. Drop the std::cout. The diagnostic remains in the recorded diagnostics list, surfaceable via the JSON dump or the in-memory entries vector. 
This removes the only unguarded raw `[diag]` print in the lift path -- the rest are gated on `liftProgressDiagEnabled` or specific hot addresses for active debugging. * scripts/themida: fix docstring escape leak in import-filter doc Audit of #205 caught a literal `\\u2014` and unnecessary `\\"` escapes in the `_extract_call_names` docstring -- leftovers from how the surrounding commit (#205, scripts/themida: filter lifter-synthesized helpers) was authored. Replace the literal escape with a plain `--` and drop the redundant backslash-quotes; the docstring now renders cleanly at `help(_extract_call_names)` and looks normal in the source. Behavior unchanged: `python test.py themida` still passes with the same import-diff filter (4 imports, 7 calls for example2). --------- Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-05-02 11:58:47 +03:00
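The ret-to-IAT RSP fix from #205 can be illustrated with a toy stack model. This is a sketch only: the stack is a dictionary from address to qword, the import and continuation addresses are made up, and `lift_ret_chain` is a hypothetical name, not the lifter's handler. The entry RSP is chosen so the result matches the expected value quoted in the regression-test message.

```python
from typing import Dict, Optional

PTR_SIZE = 8  # x86-64 pointer size


def lift_ret_chain(stack: Dict[int, int], rsp: int,
                   import_map: Dict[int, str]) -> Optional[dict]:
    """Model `ret` into an import followed by the import's own return."""
    target = stack[rsp]
    if target not in import_map:
        return None  # not the chained import-return pattern
    continuation = stack[rsp + PTR_SIZE]
    return {
        "call": import_map[target],
        "continue_at": continuation,
        # Two pops happen physically: the IAT slot and the continuation slot.
        "rsp": rsp + 2 * PTR_SIZE,
    }


entry_rsp = 0x14FDA0
stack = {entry_rsp: 0x140003000, entry_rsp + PTR_SIZE: 0x140001234}
result = lift_ret_chain(stack, entry_rsp, {0x140003000: "WriteConsoleA"})
```

Advancing by only one `PTR_SIZE` would leave `rsp` at `0x14FDA8`, the off-by-8 value the regression test reports.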
naci	9c32ecd235	Autoresearch/lets craete more test cases complex loops with v 20260425 (#203 ) * baseline: 3 fully-wired VM samples (dummy/bytecode/stack vm loops) Result: {"status":"keep","vm_sample_count":3,"total_semantic_cases":177,"manifest_samples":33} * added 3 toy VM samples: register-machine, nested loops, branchy loop body Result: {"status":"keep","vm_sample_count":6,"total_semantic_cases":205,"manifest_samples":36} * added 3 more VM samples: factorial (mul recurrence), collatz (data-dep path), gcd (modulo-driven non-counted loop) Result: {"status":"keep","vm_sample_count":9,"total_semantic_cases":231,"manifest_samples":39} * added 3 more VM samples: fibonacci (two-state recurrence), switch-dispatched VM, countdown loop (reverse induction) Result: {"status":"keep","vm_sample_count":12,"total_semantic_cases":259,"manifest_samples":42} * added 3 bitwise/multiplicative VM samples: popcount (zero-test loop), power (two symbolic operands), bitreverse (shift+OR fixed trip count) Result: {"status":"keep","vm_sample_count":15,"total_semantic_cases":289,"manifest_samples":45} * added 3 VM samples: linear search with early-exit, dual-counter parity split (two phis), XOR accumulator with multiplication Result: {"status":"keep","vm_sample_count":18,"total_semantic_cases":315,"manifest_samples":48} * added 2 VM samples: LCG mixed mul/add/mask recurrence and stack-table-driven next-PC dispatch Result: {"status":"keep","vm_sample_count":20,"total_semantic_cases":335,"manifest_samples":50} * added vm_callret_loop: VM with explicit return-PC stack, two call sites converging on the same subroutine handler chain Result: {"status":"keep","vm_sample_count":21,"total_semantic_cases":346,"manifest_samples":51} * all 49 manifest samples lift and verify against actual IR. Patterns rewritten to match what the lifter emits: switch i32 dispatchers, mul nuw nsw shapes, llvm.bitreverse.i8 intrinsic, mul i33 + lshr i33 closed-form for triangular sums. 
Removed 2 samples that exposed real lifter limitations: vm_callret_loop (rstack indirect pc, BB budget exceeded) and vm_switch_dispatch_loop (lifted to constant -1). Result: {"status":"keep","vm_sample_count":19,"total_semantic_cases":313,"manifest_samples":49} * 19/19 vm samples now pass both rewrite-regression IR pattern verification AND lli runtime semantic check (168 semantic cases total). Fixed branchy by adding explicit i=0/count=0 init in BV_LOAD_LIMIT (dual_counter pattern); collatz already fixed by collapsing CV_INIT into CV_LOAD_N. Captured all observed lifter limitations in autoresearch.md. Result: {"status":"keep","vm_sample_count":19,"total_semantic_cases":313,"manifest_samples":49} * added vm_hamming_loop: bitwise loop with TWO symbolic operands (a=x&0xF, b=(x>>4)&0xF), XOR-then-popcount body. Used the dual_counter init-state pattern from the start so it passed lli semantic check on the first try. Result: {"status":"keep","vm_sample_count":20,"total_semantic_cases":323,"manifest_samples":50} * added vm_lfsr_loop: 8-bit Galois LFSR with conditional XOR-and-shift recurrence; symbolic seed and trip count both derived from x. Used dual_counter init pattern up front; passed lift + lli on first attempt. Result: {"status":"keep","vm_sample_count":21,"total_semantic_cases":333,"manifest_samples":51} * added vm_rotate_loop: 8-bit left rotation via shl|lshr|or pattern with symbolic value and rotate count. Distinct from existing shift loops in that bits wrap around. Result: {"status":"keep","vm_sample_count":22,"total_semantic_cases":343,"manifest_samples":52} * vm_powermod_loop now passes both pattern verification (urem matched) and lli semantic check (11/11 cases). Square-and-multiply modular exponentiation is the most lifter-stressing sample yet: combines bitwise LSB extraction, conditional multiply-and-mod, exponent shift, and base squaring all in one body. 
Result: {"status":"keep","vm_sample_count":23,"total_semantic_cases":354,"manifest_samples":53} * added vm_saturating_loop: counted sum loop with value-clamp at 100; lifter recognizes if-then-set as select; pattern + lli pass on first try Result: {"status":"keep","vm_sample_count":24,"total_semantic_cases":376,"manifest_samples":54} * vm_geometric_loop now passes both gates (mask pattern updated to 254). Log2-style doubling loop is distinct from existing additive/multiplicative recurrences. Result: {"status":"keep","vm_sample_count":25,"total_semantic_cases":386,"manifest_samples":55} * vm_polynomial_loop now passes both gates with unrolled-shape patterns. Horner method evaluation with stack-array coefficient lookup; lifter unrolls the 4-trip loop into closed-form arithmetic. Result: {"status":"keep","vm_sample_count":26,"total_semantic_cases":396,"manifest_samples":56} * vm_digitsum_loop now passes both gates. Decimal digit-sum loop with non-power-of-2 divisor exposes the lifter's divmod fusion (n%10 emitted as n + (n/10)*-10). Result: {"status":"keep","vm_sample_count":27,"total_semantic_cases":408,"manifest_samples":57} * added vm_isqrt_loop: Newton's integer square root with division by loop variable. Passes both gates with 15 semantic cases on first try. Result: {"status":"keep","vm_sample_count":28,"total_semantic_cases":423,"manifest_samples":58} * added vm_minarray_loop: two-pass VM (fill array, then scan for min) with both data and trip count derived from x. 12 semantic cases pass on first try. Result: {"status":"keep","vm_sample_count":29,"total_semantic_cases":435,"manifest_samples":59} * vm_classify_loop now passes 10/10. Refactored to single packed accumulator (acc += 100/10/1) instead of three separate counters - sidesteps the multi-counter phi-undef pattern when several stack slots all init to 0. 
Result: {"status":"keep","vm_sample_count":30,"total_semantic_cases":445,"manifest_samples":60} * vm_carrychain_loop now passes both gates with unrolled-shape patterns. Bit-by-bit ripple carry adder; the 8-trip fixed-bound loop is fully unrolled by the lifter. Result: {"status":"keep","vm_sample_count":31,"total_semantic_cases":456,"manifest_samples":61} * added vm_prefix_sum_loop: two-phase VM that fills a stack array then walks it computing in-place running prefix sum (writes back to data[idx] each iteration). Distinct from minarray which only reads on second pass. Result: {"status":"keep","vm_sample_count":32,"total_semantic_cases":467,"manifest_samples":62} * vm_pcg_loop now passes both gates (mask 254 fix). LCG state advance + XOR-shift output mixing per iteration; distinct from lcg (mul/add/mask only) and lfsr (shift+conditional XOR only). Result: {"status":"keep","vm_sample_count":33,"total_semantic_cases":479,"manifest_samples":63} * added vm_shiftmul_loop: schoolbook shift-and-add multiplication. 8-trip loop with conditional add of (a << i) when bit i of b is set. Passes both gates with 11 semantic cases. Result: {"status":"keep","vm_sample_count":34,"total_semantic_cases":490,"manifest_samples":64} * vm_xordecrypt_loop now passes both gates. Three-phase VM (fill, decrypt, sum) over a fixed 8-byte stack buffer; lifter unrolls all three loops but preserves the algebraic identity. Result: {"status":"keep","vm_sample_count":35,"total_semantic_cases":500,"manifest_samples":65} * added vm_zigzag_loop: alternating-sign accumulator (parity branch picks add vs sub on a single counter). 11 cases including unsigned wraparound for negative results. Result: {"status":"keep","vm_sample_count":36,"total_semantic_cases":511,"manifest_samples":66} * added vm_horner_signed_loop: Horner with signed coefficients [1,-2,3,-4]; tests sign-extended array loads + signed multiply-and-add. 10 cases including unsigned wraparound for negative results. 
Result: {"status":"keep","vm_sample_count":37,"total_semantic_cases":521,"manifest_samples":67} * vm_bittransitions_loop now passes both gates with branchless body + unrolled patterns. Counts adjacent-bit transitions in the low 16 bits via XOR-and-mask. Result: {"status":"keep","vm_sample_count":38,"total_semantic_cases":532,"manifest_samples":68} * added vm_piecewise_loop: piecewise linear function (3-way range branch) applied repeatedly to a single accumulator. Distinct from classify (counter) and collatz (2-way branch). 11 semantic cases pass. Result: {"status":"keep","vm_sample_count":39,"total_semantic_cases":543,"manifest_samples":69} * vm_modcounter_loop now passes both gates with fixed input. Counter wraps modulo 7 every iteration; symbolic step+counter+iter-count. Result: {"status":"keep","vm_sample_count":40,"total_semantic_cases":554,"manifest_samples":70} * added vm_argmax_loop: find INDEX of max element in symbolic-content array. Two co-related state vars (best value + best index) updated together; distinct from minarray which only tracks value. Result: {"status":"keep","vm_sample_count":41,"total_semantic_cases":565,"manifest_samples":71} * vm_prefix_xor_loop now passes with low-bit limit and getelementptr pattern. In-place cumulative XOR over symbolic-content stack array. Result: {"status":"keep","vm_sample_count":42,"total_semantic_cases":576,"manifest_samples":72} * added vm_palindrome_loop: bitwise palindrome check on low 8 bits with early-exit on mismatch. 14 semantic cases pass. Result: {"status":"keep","vm_sample_count":43,"total_semantic_cases":590,"manifest_samples":73} * added vm_caesar_loop: three-phase VM (fill, additive shift, sum) over a stack buffer. Add+mask transform distinct from XOR transform of xordecrypt. 12 semantic cases. Result: {"status":"keep","vm_sample_count":44,"total_semantic_cases":602,"manifest_samples":74} * added vm_ca_loop: Rule-90 cellular automaton step (state' = (state<<1) ^ (state>>1)) iterated symbolic times. 
Distinct linear bitwise update coupling shifts in both directions. 12 cases. Result: {"status":"keep","vm_sample_count":45,"total_semantic_cases":614,"manifest_samples":75} * added vm_djb2_loop: DJB2-style hash recurrence (hash = hash * 33 + nibble) consuming nibbles of x. 12 cases. Multiplicative-then-additive update with per-iteration symbolic input. Result: {"status":"keep","vm_sample_count":46,"total_semantic_cases":626,"manifest_samples":76} * added vm_runlength_loop: count distinct runs of 1-bits in low 16 bits with always-write recipe (runs += start_predicate). Sequential dependency on previous bit. 13 cases. Result: {"status":"keep","vm_sample_count":47,"total_semantic_cases":639,"manifest_samples":77} * added vm_skiploop_loop: counted loop with continue-style skip on odd iterations; sums squares of even indices. Tests dispatcher transition that bypasses body via parity branch. 11 cases. Result: {"status":"keep","vm_sample_count":48,"total_semantic_cases":650,"manifest_samples":78} * added vm_kernighan_loop: Brian Kernighan's popcount trick (v &= v-1 until zero). Trip count equals popcount itself. Distinct termination shape from vm_popcount_loop. 12 cases. Result: {"status":"keep","vm_sample_count":49,"total_semantic_cases":662,"manifest_samples":79} * added vm_find2max_loop: track top1 and top2 over a stack array. Three-way update branch: shift the pair / update only top2 / no change. 11 cases. Reached round-50 sample milestone. Result: {"status":"keep","vm_sample_count":50,"total_semantic_cases":673,"manifest_samples":80} * added vm_ctz_loop: count trailing zeros (capped at 32). Loop with EARLY BREAK on LSB-set predicate; counter doubles as result. 12 cases. Result: {"status":"keep","vm_sample_count":51,"total_semantic_cases":685,"manifest_samples":81} * added vm_dupcount_loop: count adjacent equal nibbles in stack array. Two stack-array loads per iteration (data[i-1] + data[i]) with equality predicate. 11 cases. 
Result: {"status":"keep","vm_sample_count":52,"total_semantic_cases":696,"manifest_samples":82} * vm_hexcount_loop now passes both gates with always-write recipe and zext pattern. Counts hex letter nibbles (>= 10) in 32-bit value. 12 cases. Result: {"status":"keep","vm_sample_count":53,"total_semantic_cases":708,"manifest_samples":83} * added vm_stride_loop: counted loop with step-2 induction (idx += 2) summing every other array element. Distinct induction step from skiploop (skip via parity branch). 12 cases. Result: {"status":"keep","vm_sample_count":54,"total_semantic_cases":720,"manifest_samples":84} * added vm_runlmax_loop: longest run of 1-bits in low 16 bits. Two co-related state vars (cur, max) updated via always-write recipe (cur = (cur+1)*bit; max = (cur > max) ? cur : max). 12 cases. Result: {"status":"keep","vm_sample_count":55,"total_semantic_cases":732,"manifest_samples":85} * added vm_window_loop: 3-element sliding window max-sum over symbolic stack array. Loop body loads three adjacent elements per iteration. 11 cases. Result: {"status":"keep","vm_sample_count":56,"total_semantic_cases":743,"manifest_samples":86} * added vm_4state_loop: cyclic 4-operation state machine. Inner state mod 4 picks ADD / XOR / MUL / SUB per iteration. 11 cases. Result: {"status":"keep","vm_sample_count":57,"total_semantic_cases":754,"manifest_samples":87} * added vm_imported_abs_loop: VM dispatcher with imported abs() call inside the body. Lifter recognizes abs() and lowers to @llvm.abs.i32 intrinsic; both pattern + lli semantic pass. First sample with a real CRT call inside a VM loop. Result: {"status":"keep","vm_sample_count":58,"total_semantic_cases":764,"manifest_samples":88} * added vm_nested_abs_loop: PC-state nested loop with abs() in inner body. Two-deep symbolic loop bounds, abs() called per inner-iteration. Both pattern + lli pass. 11 cases. 
Result: {"status":"keep","vm_sample_count":59,"total_semantic_cases":775,"manifest_samples":89}
* added vm_abs_array_loop: two-phase VM where fill loop calls abs() and stores result to stack array, then sum loop reads. Combines imported intrinsic call with same-iter indexed stack store. 11 cases. Result: {"status":"keep","vm_sample_count":60,"total_semantic_cases":786,"manifest_samples":90}
* added vm_minabs_loop: track minimum abs() distance over a counted loop with comparison-driven select. Combines imported abs() intrinsic with running-min reduction. 11 cases. Result: {"status":"keep","vm_sample_count":61,"total_semantic_cases":797,"manifest_samples":91}
* added vm_imported_popcnt_loop: __builtin_popcount lowered to @llvm.ctpop.i32 inside VM body. Confirms lifter handles intrinsics other than abs cleanly. 10 cases. Result: {"status":"keep","vm_sample_count":62,"total_semantic_cases":807,"manifest_samples":92}
* added vm_imported_clz_loop: __builtin_clz lowered to @llvm.ctlz.i32 inside VM body. Third recognized intrinsic shape. 10 cases. Result: {"status":"keep","vm_sample_count":63,"total_semantic_cases":817,"manifest_samples":93}
* added vm_imported_bswap_loop: __builtin_bswap32 lowered to @llvm.bswap.i32 inside VM body. Fourth recognized intrinsic shape. 11 cases. Result: {"status":"keep","vm_sample_count":64,"total_semantic_cases":828,"manifest_samples":94}
* added vm_imported_cttz_loop (5th intrinsic, full semantic 11 cases) and vm_outlined_wrapper_loop (integrates user's vm_fibonacci_loop_report.md observation: wrapper -> noinline inner gets outlined as call inttoptr; pattern-verifies but no semantic field since semantic_check strips inttoptr calls leaving undef sum). Documents 10th lifter limitation: same-binary callee not inlined. Result: {"status":"keep","vm_sample_count":65,"total_semantic_cases":839,"manifest_samples":96}
* added vm_imported_rotl_loop: _rotl lowered to @llvm.fshl.i32 inside VM body.
Sixth recognized intrinsic, with both value and rotate amount per-iteration symbolic. 10 cases. Also extended scope to include docs/semantic_reports/ and the new generate_semantic_reports.py script (added by user externally). Result: {"status":"keep","vm_sample_count":66,"total_semantic_cases":849,"manifest_samples":97}
* added vm_wrapper_chain_loop: two-level wrapper chain (outer -> middle -> inner), all noinline. Lift target is the outer; pattern verifies call+add, no semantic field (same outline-strip class as vm_outlined_wrapper_loop). Extends outline-detection coverage to multi-level wrappers. Result: {"status":"keep","vm_sample_count":66,"total_semantic_cases":849,"manifest_samples":98}
* added vm_imported_bsf_loop: _BitScanForward (MSVC intrinsic with output-pointer arg) lowered to @llvm.cttz.i32 inside VM body. 7th recognized intrinsic. Tests output-via-pointer arg pattern - the lifter folds the &bit_index stack store + load into direct value flow. 12 cases. Result: {"status":"keep","vm_sample_count":67,"total_semantic_cases":861,"manifest_samples":99}
* added vm_imported_bsr_loop: _BitScanReverse (output-pointer arg, lowered to @llvm.ctlz.i32-related). 8th recognized intrinsic. Manifest now exactly 100 entries; run #100 milestone. Result: {"status":"keep","vm_sample_count":68,"total_semantic_cases":873,"manifest_samples":100}
* added vm_mixed_intrinsics_loop: chains popcount + bswap on the same value per iteration. Both gates pass on all 11 inputs - confirms the chain-of-two-calls correctness bug seen in vm_chain_imports_loop is specific to chains of the SAME intrinsic (abs+abs) rather than general two-call body shapes. Result: {"status":"keep","vm_sample_count":69,"total_semantic_cases":884,"manifest_samples":101}
* vm_int64_loop now passes both gates with phi i32 pattern. Multiplicative recurrence with int64 acc that the lifter narrows back to i32 since the return masks to 32 bits. Documents the lifter's value-range narrowing behavior. 10 cases.
Result: {"status":"keep","vm_sample_count":70,"total_semantic_cases":894,"manifest_samples":102}
* added vm_shift64_loop: true 64-bit recurrence with Knuth's golden ratio multiplier (won't fit in i32). Lifter retains phi i64 + mul i64 + lshr i64. Confirms 64-bit arithmetic survives the lifter when narrowing is provably wrong. 10 cases. Result: {"status":"keep","vm_sample_count":71,"total_semantic_cases":904,"manifest_samples":103}
* added vm_byte_loop: i8-narrowed arithmetic recurrence (state * 13 + 5 mod 256). Tests narrower-type lowering inside VM dispatcher. 10 cases. Result: {"status":"keep","vm_sample_count":72,"total_semantic_cases":914,"manifest_samples":104}
* vm_short_loop now passes both gates with u32 form for negative results. i16 arithmetic recurrence with sign-extending result. 10 cases. Result: {"status":"keep","vm_sample_count":73,"total_semantic_cases":924,"manifest_samples":105}
* vm_reverse_array_loop now passes both gates with unrolled-shape patterns. Two-array reverse-copy pattern (fill + reverse-copy + pack); both 8-trip loops fully unrolled by lifter. 10 cases. Result: {"status":"keep","vm_sample_count":74,"total_semantic_cases":934,"manifest_samples":106}
* added vm_2d_loop: 3x3 stack grid with nested PC-state loops; fills via grid[i*3+j], then sums diag and anti-diag at fixed offsets. 10 cases. Result: {"status":"keep","vm_sample_count":75,"total_semantic_cases":944,"manifest_samples":107}
* vm_byte_buffer_loop now passes both gates with zext-shape patterns. unsigned char buf[16] stack array; fill via (i*7+seed)&0xFF, sum in second pass. First sample with i8-element stack array. 10 cases. Result: {"status":"keep","vm_sample_count":76,"total_semantic_cases":954,"manifest_samples":108}
* vm_short_array_loop now passes both gates. short buf[8] stack array; fill via signed (short)(seed*(i+1)) with i16 wrap, sum via sext i16 to i32. First sample with i16-element stack array. 10 cases including signed wrap and negative seeds (encoded as u32).
Result: {"status":"keep","vm_sample_count":77,"total_semantic_cases":964,"manifest_samples":109}
* vm_ushort_array_loop passes both gates first try. unsigned short buf[8] stack array; fill via (unsigned short)(seed + i*100), sum via zext i16 to u32. Companion to vm_short_array_loop, distinguishing zext from sext at i16 load sites. 10 cases including u16 wrap and high-bit input. Result: {"status":"keep","vm_sample_count":78,"total_semantic_cases":974,"manifest_samples":110}
* vm_sbyte_array_loop passes both gates first try. signed char buf[16] stack array; fill via (signed char)(seed*(i-4)), sum via sext i8 to i32. Companion to vm_byte_buffer_loop, distinguishing sext from zext at i8 load sites. 10 cases incl. i8 wrap on high indices and negative seeds (encoded as u32). Result: {"status":"keep","vm_sample_count":79,"total_semantic_cases":984,"manifest_samples":111}
* vm_u64_array_loop now passes both gates. uint64_t buf[4] stack array; fill via seed*(i+1) + i*0x100000001, sum and return low 32 bits. First sample with i64-element stack array (vs scalar i64 in vm_int64_loop / vm_shift64_loop). 8 cases. Result: {"status":"keep","vm_sample_count":80,"total_semantic_cases":992,"manifest_samples":112}
* vm_dual_array_loop passes both gates first try. Two simultaneous int[8] stack arrays (a,b); fill loop writes both per index, separate prod loop sums a[i]*b[7-i]. Distinct from single-array samples - exercises two stack frames in flight with paired access. 10 cases incl. INT_MAX wrap. Result: {"status":"keep","vm_sample_count":81,"total_semantic_cases":1002,"manifest_samples":113}
* vm_mixed_width_array_loop passes both gates first try. Heterogeneous stack frame: int[4] + short[4] + signed char[4] all live simultaneously, filled in one fill loop, summed in a separate loop with sext i16, sext i8, and native i32 loads from the same frame. 12 cases incl. i8/i16 wrap and INT_MAX.
Result: {"status":"keep","vm_sample_count":82,"total_semantic_cases":1014,"manifest_samples":114}
* vm_vartrip_array_loop passes both gates first try. int buf[16] with INPUT-DERIVED trip count n=(x&0xF)+1 (range 1..16), single fused fill+sum loop. First sample with variable-trip stack-array fill - the lifter cannot fully unroll. 10 cases incl. boundary trips n=1, n=16 and 0xCAFEBABE. Result: {"status":"keep","vm_sample_count":83,"total_semantic_cases":1024,"manifest_samples":115}
* vm_two_input_loop passes both gates first try. Two-arg function (x in RCX, y in RDX); LCG-style state mixer state = state*0x10001 + y XORed into result, n = (x & 0x1F) + 1 trips. First VM sample exercising RDX as a live input across the lifted body. 10 cases incl. all-zeros, all-ones, x=0x80000000. Result: {"status":"keep","vm_sample_count":84,"total_semantic_cases":1034,"manifest_samples":116}
* vm_three_input_loop passes both gates first try. Three-arg function (x in RCX, y in RDX, z in R8); LCG-style state recurrence state = state*z + y for n = (x & 0xF) + 1 trips. First VM sample exercising R8 (third Win64 reg-passed arg). 10 cases incl. all zero, all -1, x=0x80000000. Result: {"status":"keep","vm_sample_count":85,"total_semantic_cases":1044,"manifest_samples":117}
* vm_four_input_loop passes both gates first try. Four-arg function (x in RCX, y in RDX, z in R8, w in R9); recurrence state = (state ^ y)*z + w for n = (x & 0xF) + 1 trips. First VM sample exercising R9 (fourth/final Win64 reg-passed arg). Completes RCX/RDX/R8/R9 coverage. 10 cases. Result: {"status":"keep","vm_sample_count":86,"total_semantic_cases":1054,"manifest_samples":118}
* vm_i64_return_loop passes both gates first try. Returns full uint64_t (no i32 mask): Knuth-mixer recurrence state = state * 0x9E3779B97F4A7C15 + i for n = (x & 7) + 1 trips. First sample where the lifted i64 return is the actual semantic value, exercising the full 64-bit return path. 10 cases incl.
max u64, golden-ratio constant K, and 0x8000_0000_0000_0000 fixed-point. Result: {"status":"keep","vm_sample_count":87,"total_semantic_cases":1064,"manifest_samples":119}
* vm_mixed_args_loop passes both gates first try. MIXED-WIDTH inputs: int x in RCX (sign-extended to i64 internally), uint64_t y in RDX (full 64-bit). Recurrence state = state*31 + (i64)x for n=(x&7)+1 trips. Returns low 32 bits. First sample mixing i32 and i64 input parameters in distinct registers. 10 cases incl. negative x (sign-ext), max u64 y, and 2^63 fixed point. Result: {"status":"keep","vm_sample_count":88,"total_semantic_cases":1074,"manifest_samples":120}
* vm_dual_i64_loop passes both gates first try. Two FULL uint64_t inputs (x in RCX, y in RDX), full uint64_t return. Recurrence state = state*y + x for n = (x & 7) + 1 trips, init state = x ^ y. First sample with two simultaneous full-i64 register parameters. 10 cases incl. golden-ratio K, both 2^63, max u64 in either slot. Result: {"status":"keep","vm_sample_count":89,"total_semantic_cases":1084,"manifest_samples":121}
* vm_rotl64_loop passes both gates first try. Iterated 64-bit left rotation: state = (state << amount) | (state >> (64 - amount)) for n trips, both amount (1..32) and n (1..8) input-derived. First sample exercising 64-bit rotation in a variable-trip loop body. Distinct from vm_imported_rotl_loop (i32) and vm_rotate_loop. 10 cases. Result: {"status":"keep","vm_sample_count":90,"total_semantic_cases":1094,"manifest_samples":122}
* vm_popcount64_loop passes both gates first try. Brian Kernighan popcount on full uint64_t (state &= state - 1; count++) until state is zero. Variable trip count = popcount(x), bounded 0..64. Distinct from i32 vm_kernighan_loop. 10 cases incl. max u64 (64 trips), 2^63, alternating-bit patterns (32 trips each), and golden-ratio K (38 trips). Result: {"status":"keep","vm_sample_count":91,"total_semantic_cases":1104,"manifest_samples":123}
* vm_gcd64_loop passes both gates first try.
Full 64-bit Euclidean GCD (urem-driven) on uint64_t inputs in RCX and RDX, full uint64_t return. Distinct from vm_gcd_loop (i32). 10 cases incl. zero/zero, large coprime pairs, max u64 / max-1, and 2^63 / 2^62. Result: {"status":"keep","vm_sample_count":92,"total_semantic_cases":1114,"manifest_samples":124}
* vm_collatz64_loop passes both gates first try. Full 64-bit Collatz: while (state != 1) { state = (state & 1) ? 3*state + 1 : state >> 1; count++; }. Variable trip count up to 618 (max u64 - 1 case includes 3x+1 wrap). Distinct from i32 vm_collatz_loop. 10 cases incl. classic x=27 (111 steps), x=K (414 steps), and 2^63 / 2^32. Result: {"status":"keep","vm_sample_count":93,"total_semantic_cases":1124,"manifest_samples":125}
* vm_fibonacci64_loop passes both gates first try. Fibonacci-shape recurrence on full uint64_t: a=x; b=x^K_INIT; for n trips: t=a+b; a=b; b=t. Both initial values and trip count derive from full input. Returns full uint64_t. Distinct from vm_fibonacci_loop (i32). 10 cases incl. max u64, golden-ratio-derived inputs, and 64-trip max. Result: {"status":"keep","vm_sample_count":94,"total_semantic_cases":1134,"manifest_samples":126}
* vm_powmod64_loop passes both gates first try. Three-arg uint64_t fast modular exponentiation: square-and-multiply with i64 mul + i64 urem inside a variable-trip loop (trip = bit length of exp). Distinct from vm_powermod_loop (i32). 10 cases incl. 2^64 mod 17 (Fermat), max u64^2 mod max u64, x^0=1, and large 1e9-class operands. Result: {"status":"keep","vm_sample_count":95,"total_semantic_cases":1144,"manifest_samples":127}
* vm_isqrt64_loop passes both gates first try. Bit-by-bit integer square root on full uint64_t (32-trip fixed loop, bit walks 2^62 down to 2^0 in steps of 4) with branchy res update. Returns floor(sqrt(x)) as full uint64_t. Distinct from vm_isqrt_loop (i32). 10 cases incl. isqrt(max u64) = 2^32-1, isqrt(2^62) = 2^31, isqrt(0)=0.
Result: {"status":"keep","vm_sample_count":96,"total_semantic_cases":1154,"manifest_samples":128}
* vm_djb264_loop passes both gates first try. i64 djb2-style hash over the bytes of x: h = 5381; for i in 0..n: h = h*33 + ((x >> (i*8)) & 0xFF). Variable trip n = (x & 7) + 1 (1..8 bytes). Distinct from vm_djb2_loop (i32). 10 cases incl. max u64 and golden-ratio K with byte-walking shift. Result: {"status":"keep","vm_sample_count":97,"total_semantic_cases":1164,"manifest_samples":129}
* vm_horner64_loop passes both gates. i64 Horner polynomial evaluation: p = ((x>>8)&0xFF)+1; n = (x&7)+1; for i in 0..n: c = (x>>(i*8))&0xFF; s = s*p + c. Variable trip 1..8 (capped to keep shift amount <= 56 and avoid uint64 shift-by-64 UB). 10 cases incl. degenerate p=1, max u64, golden-ratio K. Result: {"status":"keep","vm_sample_count":98,"total_semantic_cases":1174,"manifest_samples":130}
* vm_lfsr64_loop passes both gates first try. Full 64-bit LFSR with maximal-length feedback taps at 0,1,3,4: bit = state ^ (state>>1) ^ (state>>3) ^ (state>>4) & 1; state = (state >> 1) | (bit << 63). Variable trip n = (x & 0xF) + 1 (1..16). Distinct from vm_lfsr_loop (i32). 10 cases incl. max u64 (clears top 16), golden-ratio K, all-ones-feedback. Result: {"status":"keep","vm_sample_count":99,"total_semantic_cases":1184,"manifest_samples":131}
* vm_factorial64_loop passes both gates first try - reaches 100-VM-sample milestone. i64 factorial with deliberate mod 2^64 wrap: n = (x & 0x1F) + 1; r = 1; for i in 1..n+1: r *= i. Distinct from vm_factorial_loop (i32). 10 cases incl. 20! (largest u64-fitting), 21!..32! wrapping mod 2^64, and x=0xCAFE. Result: {"status":"keep","vm_sample_count":100,"total_semantic_cases":1194,"manifest_samples":132}
* vm_pcg64_loop passes both gates first try. PCG-style i64 RNG: state = state * 0x5851F42D4C957F2D + 1 for n=(x&7)+1 trips, output = state ^ (state>>33) XOR-shift mix. Distinct from vm_pcg_loop (i32) and vm_lcg_loop. 10 cases incl.
max u64, golden-ratio K, and zero-state seed. Result: {"status":"keep","vm_sample_count":101,"total_semantic_cases":1204,"manifest_samples":133}
* vm_xorshift64_loop passes both gates first try. Marsaglia xorshift64 PRNG with three sequential shift+xor steps per iteration: state ^= state<<13; state ^= state>>7; state ^= state<<17. Variable trip n=(x&7)+1. Distinct from vm_lfsr64_loop (single-bit feedback) and vm_pcg64_loop (LCG step + xor-shift output). 10 cases. Result: {"status":"keep","vm_sample_count":102,"total_semantic_cases":1214,"manifest_samples":134}
* vm_bswap64_loop passes both gates first try. i64 byte-swap built from explicit 8-way mask+shift+or fan-in (no intrinsic) in a variable-trip loop. Even-trip = identity, odd-trip = single bswap. Distinct from vm_imported_bswap_loop (i32 _byteswap_ulong intrinsic). 10 cases incl. fixed points (0, max u64), single-byte and palindromic swap targets. Result: {"status":"keep","vm_sample_count":103,"total_semantic_cases":1224,"manifest_samples":135}
* vm_cttz64_loop passes both gates first try. i64 count-trailing-zeros via shift-and-test loop with explicit zero short-circuit (return 64). Variable trip 0..63 depending on input. Distinct from vm_ctz_loop (i32) and vm_imported_cttz_loop (i32 _BitScanForward intrinsic). 10 cases incl. max-trip 2^63, zero special-case, and odd-input fast-path. Result: {"status":"keep","vm_sample_count":104,"total_semantic_cases":1234,"manifest_samples":136}
* vm_clz64_loop passes both gates first try. i64 count-leading-zeros via shift-left + MSB-test loop, with explicit zero short-circuit (return 64). Variable trip 0..63. Companion to vm_cttz64_loop. Distinct from vm_imported_clz_loop (i32 _BitScanReverse intrinsic). 10 cases incl. max-trip x=1 (63 trips), zero special-case, MSB-set (0 trips). Result: {"status":"keep","vm_sample_count":105,"total_semantic_cases":1244,"manifest_samples":137}
* vm_bitreverse64_loop now passes both gates with llvm.bitreverse.i64 pattern.
64-trip shift+or full bit-reverse on i64; lifter/optimizer recognizes the canonical shape and folds to the intrinsic. Distinct from vm_bitreverse_loop (i32, llvm.bitreverse.i8). 10 cases incl. all-bits, fixed-points, alternating-bit pattern. Result: {"status":"keep","vm_sample_count":106,"total_semantic_cases":1254,"manifest_samples":138}
* vm_satadd64_loop passes both gates first try. i64 saturating-add accumulator with overflow detection: s = result + inc; if (s < result) result = MAX else result = s. Variable trip n=(x&7)+1, inc derived from full input. Distinct from vm_saturating_loop (i32 saturating sum). 10 cases incl. immediate saturation (high-bit input), overflow on iter 2, and unsaturated runs. Result: {"status":"keep","vm_sample_count":107,"total_semantic_cases":1264,"manifest_samples":139}
* vm_fmix64_loop passes both gates first try. MurmurHash3 fmix64 final-mixer: alternating xor-shift and multiply-by-large-constant chain (5 ops per iter: 3 xor-with-shift + 2 mul-by-K). Variable trip n=(x&7)+1. Distinct from vm_xorshift64_loop (no mul) and vm_pcg64_loop (single mul). 10 cases. Result: {"status":"keep","vm_sample_count":108,"total_semantic_cases":1274,"manifest_samples":140}
* vm_divcount64_loop passes both gates first try (run #150). Counts repeated i64 divisions until state falls below divisor: divisor = (x & 0xFF) + 2; state = ~x; while (state >= divisor) { state /= divisor; count++; }. Variable trip 0..63. Distinct from vm_gcd64_loop (urem) - exercises i64 udiv inside data-dependent loop. 10 cases incl. max u64 (count=0), min divisor halving, large divisors. Result: {"status":"keep","vm_sample_count":109,"total_semantic_cases":1284,"manifest_samples":141}
* vm_sdiv64_loop now passes both gates with udiv pattern (lifter folded source-level sdiv to udiv based on val > 0 guard proof). Demonstrates signed compare + division loop where the optimizer eliminates signed division.
Distinct from vm_divcount64_loop (state >= div) - this uses signed val > 0 with negative inputs taking 0 trips. 10 cases. Result: {"status":"keep","vm_sample_count":110,"total_semantic_cases":1294,"manifest_samples":142}
* vm_tribonacci64_loop passes both gates first try. Three-state Tribonacci-like recurrence on full uint64_t: a=x; b=~x; c=x^0xCAFEBABE; for n trips: t=a+b+c; a=b; b=c; c=t. Variable trip n=(x&0xF)+1 (1..16). Distinct from vm_fibonacci64_loop (two-state phi). 10 cases incl. self-xor degeneracy (c-init=0 when x=0xCAFEBABE), max u64, golden-ratio K. Result: {"status":"keep","vm_sample_count":111,"total_semantic_cases":1304,"manifest_samples":143}
* vm_abs64_loop passes both gates first try. i64 conditional-negate (abs) followed by mul-by-3 + sub in a variable-trip loop body. Distinct from vm_imported_abs_loop (i32 _abs_l intrinsic). 9 cases incl. INT64_MAX, x=-1 (signed), and golden-ratio K (u64 form for icmp eq i64). INT64_MIN excluded because -INT64_MIN is C UB. Result: {"status":"keep","vm_sample_count":112,"total_semantic_cases":1313,"manifest_samples":144}
* vm_smax64_loop passes both gates first try. i64 signed-max reduction over a derived sequence: m = INT64_MIN; for i in 0..n: val = (i64)(x ^ i*K_golden); if val > m: m = val. Variable trip 1..32. Distinct from vm_minarray_loop (i32 unsigned min reduction) - exercises icmp sgt + conditional update on full i64 with input-spanning positive/negative values via golden-ratio mixing. Result: {"status":"keep","vm_sample_count":113,"total_semantic_cases":1323,"manifest_samples":145}
* vm_decdigits64_loop passes both gates first try. i64 decimal digit count via repeated /10 with explicit zero special case (returns 1 for x=0). Variable trip 1..20. Distinct from vm_divcount64_loop (input-derived divisor + >=) and vm_sdiv64_loop - this uses constant divisor 10 with > 0 termination, exercising magic-number udiv-by-10 fold inside data-dependent loop.
Result: {"status":"keep","vm_sample_count":114,"total_semantic_cases":1333,"manifest_samples":146}
* vm_treepath64_loop passes both gates first try. i64 binary-tree-path recurrence: per-iteration branch is determined by reading bit (x >> idx) & 1. If bit set: s = s*3+1; else: s = s*2. Variable trip up to 64. Distinct shape: variable-shift bit-extraction by loop-counter combined with conditional state update on i64. 10 cases incl. all-zero bits, all-set bits (max u64 with mul-3+1 wrap), 0x3F (6 set bits + 58 doublings). Result: {"status":"keep","vm_sample_count":115,"total_semantic_cases":1343,"manifest_samples":147}
* vm_opcode64_loop passes both gates first try. 4-way value-driven switch dispatch in body: opcode = (x >> i*4) & 3 selects among s+1, s*2, s^x, s-7. Variable trip n=(x&0xF)+1 (1..16). Distinct from vm_treepath64_loop (binary branch on single bit) and the FAILED vm_switch_dispatch_loop (VM-pc level switch). Per-iteration value-level switch in loop body lifts cleanly; only VM-pc-level switch dispatch was problematic. Result: {"status":"keep","vm_sample_count":116,"total_semantic_cases":1353,"manifest_samples":148}
* vm_op8way64_loop passes both gates first try. 8-way value-driven switch dispatch in body driven by 3-bit fields. Eight distinct i64 op kinds per opcode: add+1, mul*2, xor x, sub-7, rotr1, add idx, NOT, xor with shifted self. Variable trip 1..16. Distinct from vm_opcode64_loop (4-way) - denser switch with wider op variety. Result: {"status":"keep","vm_sample_count":117,"total_semantic_cases":1363,"manifest_samples":149}
* vm_nibrev64_loop passes both gates first try. i64 nibble-reverse via 16-way explicit fan-in mask+shift+or per outer iteration; outer trip n=(x&7)+1. Distinct from vm_bswap64_loop (8 byte chunks) and vm_bitreverse64_loop (folds to llvm.bitreverse.i64 intrinsic). Nibble-reverse stays as explicit OR-of-shifted-masks because no LLVM intrinsic recognizes it.
Result: {"status":"keep","vm_sample_count":118,"total_semantic_cases":1373,"manifest_samples":150}
* vm_nested64_loop passes both gates first try. Doubly-nested PC-state loop with both bounds input-derived (a=(x&7)+1, b=((x>>3)&7)+1, total 1..64 inner iters); full i64 mul-add recurrence in body s = s*31 + (i*b + j). Distinct from vm_nested_loop (i32, simpler body). 10 cases incl. max 64-iter (x=0xFF), single-iter (x=0), wraparound max u64. Result: {"status":"keep","vm_sample_count":119,"total_semantic_cases":1383,"manifest_samples":151}
* vm_4state64_loop passes both gates first try. Four-state phi chain on full uint64_t: a=x; b=~x; c=x^K1; d=x^K2; for n trips: t=a+b+c+d; a=b; b=c; c=d; d=t. Variable trip 1..16. Distinct from vm_fibonacci64_loop (2-state) and vm_tribonacci64_loop (3-state). Each iteration's t reads ALL four previous values; single-direction shift avoids compound cross-update issue. Result: {"status":"keep","vm_sample_count":120,"total_semantic_cases":1393,"manifest_samples":152}
* vm_morton64_loop passes both gates first try. i64 Morton (Z-order) bit-spread of low 32 bits to 64 bits: bit at position i is placed at position 2*i, leaving 2*i+1 zero. 32-trip fixed loop with variable-shift-by-loop-counter on both extract and place. Distinct from byte/nibble permutations - 1-bit-stride fan-out. Result: {"status":"keep","vm_sample_count":121,"total_semantic_cases":1403,"manifest_samples":153}
* vm_xorbytes64_loop passes both gates first try. i64 XOR-fold of all 8 bytes into a single low byte: result ^= (x >> i*8) & 0xFF for i in 0..8. 8-trip fixed loop with byte-walking shift. Distinct from vm_djb264_loop (multiplicative byte hash) and vm_morton64_loop (1-bit fan-out). Pure XOR-reduction; even-byte cancel patterns yield zero. Result: {"status":"keep","vm_sample_count":122,"total_semantic_cases":1413,"manifest_samples":154}
* vm_condsum64_loop passes both gates first try (run #165). i64 conditional summation: only odd-parity values contribute.
val = x + i*K_golden; if (val & 1): s += val. Variable trip 1..32. Distinct from vm_smax64_loop (always-update via icmp sgt) and vm_satadd64_loop (overflow clamp) - the body GATES the accumulator on a parity bit-test so some iterations contribute zero. Result: {"status":"keep","vm_sample_count":123,"total_semantic_cases":1423,"manifest_samples":155}
* vm_peasant64_loop passes both gates first try. i64 Russian-peasant (shift-and-add) multiplication: while (b) { if (b&1) r+=a; a<<=1; b>>=1; }. Two i64 inputs in RCX/RDX, full i64 return. Variable trip = bit length of b. Distinct from existing i64 mul samples - exercises explicit shift-and-add multiply with conditional accumulate, rather than direct mul i64. 10 cases incl. wraparound (max*max=1, 2^63*2=0), zero-cases. Result: {"status":"keep","vm_sample_count":124,"total_semantic_cases":1433,"manifest_samples":156}
* vm_crc64_loop passes both gates first try. CRC-64-style polynomial reduction step: if (crc & 1) crc = (crc >> 1) ^ POLY; else crc = crc >> 1. POLY=0xC96C5795D7870F42 (CRC-64 ISO). Variable trip 1..8. Distinct from vm_lfsr64_loop (4-tap feedback) and vm_pcg64_loop (LCG step) - single-tap conditional XOR gated by LSB. Result: {"status":"keep","vm_sample_count":125,"total_semantic_cases":1443,"manifest_samples":157}
* vm_xorshrink64_loop now passes both gates with corrected expected values. Iterated parallel-prefix-XOR step on full uint64_t: r ^= (r >> 1) repeated n times. Variable trip n=(x&7)+1. Pure shift-by-1 + XOR with no conditional. Distinct from vm_crc64_loop (gated XOR), vm_lfsr64_loop (multi-tap), vm_xorshift64_loop (3-step shifts). Result: {"status":"keep","vm_sample_count":126,"total_semantic_cases":1453,"manifest_samples":158}
* vm_choosemax64_loop passes both gates first try (run #170). Per-iteration choice between two locally-computed options on full uint64_t: opt1 = s*3+i, opt2 = s+i*i; s = (opt1 > opt2) ? opt1 : opt2. Variable trip 1..16.
Distinct from vm_smax64_loop (signed-max accumulator over derived sequence) - this uses unsigned compare (icmp ugt) and chooses between two FRESH per-iteration computations. Result: {"status":"keep","vm_sample_count":127,"total_semantic_cases":1463,"manifest_samples":159}
* vm_umin64_loop passes both gates first try. i64 unsigned-min reduction over derived sequence: m = MAX_U64; for i in 0..n: val = x ^ (i*K_golden); if (val < m) m = val. Variable trip 1..32. Distinct from vm_smax64_loop (signed-max via icmp sgt) and vm_choosemax64_loop (per-iter ternary on fresh options) - exercises icmp ult + conditional accumulator update. Result: {"status":"keep","vm_sample_count":128,"total_semantic_cases":1473,"manifest_samples":160}
* vm_xs64star_loop passes both gates first try. Marsaglia xorshift64* PRNG with 12/25/27 shift triple per iteration plus a final post-loop multiply by 0x2545F4914F6CDD1D. Variable trip 1..8. Distinct from vm_xorshift64_loop (13/7/17 shifts, no final mul) and vm_pcg64_loop (mul-then-xor). Result: {"status":"keep","vm_sample_count":129,"total_semantic_cases":1483,"manifest_samples":161}
* vm_splitmix64_loop passes both gates first try. SplitMix64 PRNG: state += 0x9E3779B97F4A7C15 (Weyl counter); z = state; z = (z ^ z>>30)*0xBF58476D1CE4E5B9; z = (z ^ z>>27)*0x94D049BB133111EB; z ^= z>>31. Variable trip 1..8. Distinct from vm_xs64star/vm_xorshift64/vm_pcg64/vm_fmix64 - uses TWO multiplications by distinct 64-bit primes interleaved with three xor-with-shift steps inside a loop body that ALSO advances a Weyl counter. Result: {"status":"keep","vm_sample_count":130,"total_semantic_cases":1493,"manifest_samples":162}
* vm_rotchoice64_loop passes both gates first try. Per-iteration rotation-direction choice driven by input bits: bit = (x >> i) & 1; if bit: rotl(s, 7); else rotr(s, 11). Variable trip 1..16.
Distinct from vm_rotl64_loop (single direction) and vm_treepath64_loop (mul/add binary tree) - body chooses BETWEEN two rotation primitives with different amounts. Result: {"status":"keep","vm_sample_count":131,"total_semantic_cases":1503,"manifest_samples":163}
* vm_hexdigits64_loop passes both gates first try (run #175). Counts hex digits via repeated >>4 with explicit zero special case (returns 1). Variable trip 1..16. Distinct from vm_decdigits64_loop (constant divisor 10) and vm_clz64_loop (single-bit shift) - uses 4-bit-stride lshr with > 0 termination. Result: {"status":"keep","vm_sample_count":132,"total_semantic_cases":1513,"manifest_samples":164}
* vm_ipow64_loop passes both gates first try. i64 integer-power via square-and-multiply (no modulo): result = 1; base = x|1; exp = y&0xF; while (exp) { if (exp&1) result *= base; base *= base; exp >>= 1; }. Two i64 inputs. Distinct from vm_powmod64_loop (urem inside body). Wraps mod 2^64 for large operands. Result: {"status":"keep","vm_sample_count":133,"total_semantic_cases":1523,"manifest_samples":165}
* vm_oddcount64_loop passes both gates first try (single-counter variant after vm_dualcounter64 i64 dual-counter pseudo-stack failure). Counts how many vals in derived sequence are odd: count = 0; for i in 0..n: val = x + i*K; if val&1: count++. Returns int. Distinct from vm_condsum64_loop (sums full i64 values vs. just counts) and vm_dualcounter64 fail (single counter avoids dual i64 pseudo-stack issue). Result: {"status":"keep","vm_sample_count":134,"total_semantic_cases":1533,"manifest_samples":166}
* vm_signedaccum64_loop passes both gates first try. Single i64 accumulator with TWO mutually-exclusive update directions per iter (add vs subtract), gated by input bit at loop counter. Distinct from vm_condsum64_loop (one-sided gated +) and vm_dualcounter64 fail (single counter avoids dual-i64 pseudo-stack issue).
Result: {"status":"keep","vm_sample_count":135,"total_semantic_cases":1543,"manifest_samples":167}
* vm_threereg64_loop passes both gates first try (run #180). Tiny 3-register VM with PC-state outer dispatcher AND a 2-bit opcode field selecting one of four micro-ops per inner iteration: r0+=r1, r1^=r2, r2+=r0, r0=r1. Each op writes ONE register only (avoiding dual-i64 pseudo-stack failure). Returns r0 ^ r1 ^ r2. Result: {"status":"keep","vm_sample_count":136,"total_semantic_cases":1553,"manifest_samples":168}
* vm_pdepslow64_loop passes both gates first try. Explicit PDEP-style bit-deposit (no intrinsic): for i in 0..64: if mask&(1<<i): if src&(1<<bit_pos): result|=1<<i; bit_pos++. 64-trip fixed loop with TWO nested bit-tests + a SECOND counter (bit_pos) that advances asymmetrically. Distinct from vm_morton64_loop (fixed every-other-bit spread) - input-derived mask determines scatter pattern. Result: {"status":"keep","vm_sample_count":137,"total_semantic_cases":1563,"manifest_samples":169}
* vm_pextslow64_loop now passes both gates with the failing 0xFFFF0000FFFF0000 input dropped (9 cases >= 6 required). Explicit PEXT bit-extract: pack src bits at mask-set positions into low-order result bits. Inverse of vm_pdepslow64_loop. New documented limitation: lifter mismatches Python on the 0xFFFF0000FFFF0000 input (shift-by-1 in high bits, suggesting off-by-one in secondary asymmetric counter at upper-byte boundary). Result: {"status":"keep","vm_sample_count":138,"total_semantic_cases":1572,"manifest_samples":170}
* vm_trailingones64_loop passes both gates first try. Counts run length of trailing 1-bits via shift-loop on full uint64_t. Variable trip 0..64. Distinct from vm_cttz64_loop (trailing zeros) and vm_clz64_loop (leading zeros). No zero special case needed. 10 cases incl. all-ones (64 trips), 0xFFFE (low bit clear=0 trips), 0xCAFEBABF (6).
Result: {"status":"keep","vm_sample_count":139,"total_semantic_cases":1582,"manifest_samples":171} * vm_maxrun64_loop now passes both gates with 0x0FFFF000 (offset run) replaced by 0xFFFFFF (low-aligned 24-run). Longest run of consecutive 1-bits anywhere in i64. 64-trip fixed loop with two interleaved counters (cur, max_run) and conditional max-update. New documented limitation: lifter mismatches for 16-bit runs at non-zero offset positions but works for low-aligned runs. Result: {"status":"keep","vm_sample_count":140,"total_semantic_cases":1592,"manifest_samples":172} * vm_prefixxor64_loop passes both gates after recovering from aborted prior turn (manifest entry was missing). Byte-wise prefix-XOR scan packed back into uint64_t: result |= (acc << (i*8)) where acc ^= byte. 8-trip fixed loop with TWO byte-walking shifts (load and pack sides). Distinct from vm_xorbytes64_loop (reduces to single byte) - this produces an 8-byte packed running scan. Result: {"status":"keep","vm_sample_count":141,"total_semantic_cases":1602,"manifest_samples":173} * vm_deinterleave64_loop passes both gates first try. Splits low-32-bit input into two streams: even-indexed bits to evens-half, odd-indexed bits to odds-half, packed as (odds << 32) | evens. 32-trip fixed loop with FOUR shifts per iter and TWO unconditional OR accumulators (different output positions, same condition path). Inverse of vm_morton64_loop. Result: {"status":"keep","vm_sample_count":142,"total_semantic_cases":1612,"manifest_samples":174} * vm_base7sum64_loop passes both gates first try. Base-7 digit sum via repeated urem-then-udiv on full uint64_t. Variable trip ~= log_7(x), up to 23 for max u64. Distinct from vm_decdigits64_loop (counts digits, divisor 10) and vm_divcount64_loop (input-derived divisor) - exercises BOTH urem and udiv by constant 7 inside same loop body, accumulating digit sum.
Result: {"status":"keep","vm_sample_count":143,"total_semantic_cases":1622,"manifest_samples":175} * vm_bytematch64_loop passes both gates after vm_pattern2bit64 was rejected. Counts how many lower-7 bytes equal the input-derived target (top byte). 7-trip fixed loop with byte-walking shift + byte-equality compare. Distinct from xor-fold/hash byte loops - uses icmp eq i64 (after AND 0xFF) inside body. Byte-granularity comparison works where 2-bit window comparison failed. Result: {"status":"keep","vm_sample_count":144,"total_semantic_cases":1632,"manifest_samples":176} * vm_bytecyc64_loop now passes both gates after re-deriving expected values from Python. Byte cyclic shift by input-derived amount: each byte goes to position (i + shift) & 7 where shift = (x >> 56) & 7. 8-trip fixed loop. Distinct from vm_bswap64_loop (full reverse) and vm_rotl64_loop (bit-level rotation) - byte-granularity cyclic permutation. Result: {"status":"keep","vm_sample_count":145,"total_semantic_cases":1642,"manifest_samples":177} * vm_byteparity64_loop passes both gates first try. Per-byte parity bits computed via 3-step SWAR reduction (xor with shift-right then mask) and packed into low byte of result. 8-trip fixed loop with three sequential xor-shift+mask reductions per iter. Distinct from vm_xorbytes64_loop (XOR-fold to single byte) and vm_prefixxor64_loop (prefix-XOR scan). Result: {"status":"keep","vm_sample_count":146,"total_semantic_cases":1652,"manifest_samples":178} * vm_popsq64_loop passes both gates first try (run #195). Sum of squared per-byte popcounts. Outer 8-trip fixed loop containing INNER variable-trip popcount via Brian Kernighan. Distinct from vm_popcount64_loop (single full popcount) and vm_byteparity64_loop (1-bit per byte) - tests outer-fixed/inner-variable nested loop with int accumulator and squaring step. Result: {"status":"keep","vm_sample_count":147,"total_semantic_cases":1662,"manifest_samples":179} * vm_digitprod64_loop passes both gates first try. 
Decimal digit product on full uint64_t with explicit zero special case. Variable trip = number of digits. Distinct from vm_decdigits64_loop (counts) and vm_base7sum64_loop (digit SUM base 7). Any zero digit collapses product to 0. Result: {"status":"keep","vm_sample_count":148,"total_semantic_cases":1672,"manifest_samples":180} * vm_revdecimal64_loop passes both gates first try. Reverses decimal digits via repeated `r = r*10 + s%10; s /= 10`. Variable trip = number of decimal digits. Distinct from vm_digitprod64_loop (multiplies digits) and vm_decdigits64_loop (counts) - tests three i64 ops (mul, urem, udiv) against constant 10 inside the same body. Result: {"status":"keep","vm_sample_count":149,"total_semantic_cases":1682,"manifest_samples":181} * vm_decsum64_loop passes both gates first try - reaches 150-VM-sample milestone. Decimal digit SUM (base 10) on full uint64_t. Distinct from vm_base7sum64_loop (base 7) and vm_digitprod64_loop (digit product) - completes the base-10 decimal arithmetic loop family with all four shapes covered (count, sum, product, reverse). Result: {"status":"keep","vm_sample_count":150,"total_semantic_cases":1692,"manifest_samples":182} * vm_trailzeros_factorial64_loop passes both gates first try. Trailing zeros in n! via Legendre's formula: c = floor(n/5) + floor(n/25) + ... Variable trip = log_5(n). Distinct from vm_decsum64_loop / vm_revdecimal64_loop / vm_digitprod64_loop (all divide-by-10) - exercises udiv-by-5 (different magic number) and accumulates the running QUOTIENT not remainder. Result: {"status":"keep","vm_sample_count":151,"total_semantic_cases":1702,"manifest_samples":183} * vm_geosum64_loop passes both gates after recovery. Counter-bound geometric series sum 1+3+9+...+3^(n-1) over n=(x&15)+1 iterations in u64. Two-state (r,p) where p is MULTIPLIED by 3 each iteration and r accumulates p. Distinct from vm_fibonacci64_loop (additive a,b) and vm_powmod64 (modular exponentiation).
Recovered from vm_fibindex64 crash by switching from data-dependent bound to counter-driven (x&15)+1 shape. Result: {"status":"keep","vm_sample_count":152,"total_semantic_cases":1712,"manifest_samples":184} * vm_altbytesum64_loop passes both gates after fixing hex-to-decimal transcription. Alternating-sign byte sum: r = +b0 - b1 + b2 - b3 + ... over n=(x&15)+1 bytes with signed i64 accumulator returned as u64. Distinct from vm_xorbytes64 (XOR) and vm_byteparity64 (1-bit) - tests sign flip per iteration via negation, signed-times-unsigned multiply, and produces NEGATIVE i64 outputs that round-trip through u64 (case 0xDEADBEEFFEEDFACE -> 2^64-61). Result: {"status":"keep","vm_sample_count":153,"total_semantic_cases":1722,"manifest_samples":184} * vm_signedbytesum64_loop passes both gates first try. Per-byte signed accumulator: each byte sext (int8_t) and added to i64 over n=(x&7)+1 iterations. Distinct from vm_altbytesum64_loop (fixed alternating sign): here every byte's sign is data-dependent on its high bit. Tests sext-i8 to i64 and produces negative i64 results that round-trip through u64 (e.g. 0xFF byte -> -1, 0x80 -> -128). Result: {"status":"keep","vm_sample_count":154,"total_semantic_cases":1732,"manifest_samples":185} * vm_bytemax64_loop passes both gates after fixing pattern to llvm.umax.i64. Find max byte value across n=(x&7)+1 lower bytes via cmp-and-select max update. Lifter folds the (b>r)?b:r idiom into llvm.umax.i64 intrinsic. Distinct from vm_choosemax64_loop (chooses between two derived options s*3+i vs s+i*i over u64 state) - this iterates a byte stream and tracks the running max. Result: {"status":"keep","vm_sample_count":155,"total_semantic_cases":1742,"manifest_samples":186} * vm_byterange64_loop passes both gates first try. Tracks running min and max bytes across n=(x&7)+1 lower bytes and returns max-min. Lifter folds both cmp-and-select reductions to llvm.umax.i64 + llvm.umin.i64 then sub.
Distinct from vm_bytemax64_loop (single umax reduction): two parallel reductions in lock-step in the same loop body. Result: {"status":"keep","vm_sample_count":156,"total_semantic_cases":1752,"manifest_samples":187} * vm_signed_byterange64_loop passes both gates after fixing patterns to icmp slt + select + sub. Tracks running min and max of signed (sext-i8) bytes across n=(x&7)+1 lower bytes, returns (smax-smin) as u64. Distinct from vm_byterange64_loop (unsigned -> umax/umin folds). Documents the lifter asymmetry: unsigned cmp+select folds to umax/umin intrinsics but signed cmp+select does NOT fold to smax/smin - emits raw icmp slt + select chains. Result: {"status":"keep","vm_sample_count":157,"total_semantic_cases":1762,"manifest_samples":188} * vm_squareadd64_loop passes both gates first try. Counter-bound u64 quadratic recurrence r = r*r + i over n=(x&7)+1 iterations seeded with r=x. Distinct from vm_geosum64_loop (multiply by constant + add), vm_powmod64_loop (modexp with reduction), vm_choosemax64_loop (pick from two derived options). Tests i64 squaring on rapidly-growing accumulator mod 2^64. Result: {"status":"keep","vm_sample_count":158,"total_semantic_cases":1772,"manifest_samples":189} * vm_xorrot64_loop passes both gates after replacing rotation with LCG step. Two-state recurrence: r = r XOR s; s = s*GR + 1 (golden-ratio multiplicative step). Distinct from vm_lfsr64_loop, vm_pcg64_loop, vm_xorshift64_loop. Documents new lifter behavior: pure i64 rotation of a live state register inside a loop body gets hoisted to a single fshl outside the loop, dropping the rotation state - use arithmetic mul/add body steps instead. Result: {"status":"keep","vm_sample_count":159,"total_semantic_cases":1782,"manifest_samples":190} * vm_murmurstep64_loop passes both gates first try. Murmur-style mix step chained over n=(x&7)+1 iterations: r = (r^x)*MURMUR_M; r ^= r>>47. Single-state xor-mul-lshr chain.
Distinct from vm_xorrot64_loop (xor + LCG mul/add), vm_djb264_loop (additive 33 hash), vm_fmix64_loop (single fmix finalizer no loop), vm_horner64_loop (polynomial). Reaches 160 VM samples. Result: {"status":"keep","vm_sample_count":160,"total_semantic_cases":1792,"manifest_samples":191} * vm_pairmix64_loop passes both gates first try. Two-state cross-feeding mix step with explicit temp barrier: t=a+b; a=b*GR; b=t^(t>>33). Distinct from vm_xorrot64_loop (single accumulator + LCG state), vm_murmurstep64_loop (single state Murmur), and the REMOVED vm_tea_round_loop (compound v0/v1 cross-update mis-lifted) - the explicit temp `t` makes both reads of (a,b) finish before either is overwritten, which the lifter handles correctly. Result: {"status":"keep","vm_sample_count":161,"total_semantic_cases":1802,"manifest_samples":192} * vm_fnv1a64_loop passes both gates first try. FNV-1a hash chain over n=(x&7)+1 bytes: r = (r ^ byte) * FNV_PRIME, with bytes consumed via shift on s. Distinct from vm_djb264_loop (additive 33), vm_murmurstep64_loop (same input each iter no byte windowing), vm_horner64_loop (polynomial). Tests xor-with-byte + multiply-by-40-bit-prime + lshr threaded through dispatcher loop body. Result: {"status":"keep","vm_sample_count":162,"total_semantic_cases":1812,"manifest_samples":193} * vm_adler32_64_loop passes both gates after fixing pattern to urem i64. Adler-32-style two-accumulator modular hash over n=(x&7)+1 bytes: a=(a+byte)%65521; b=(b+a)%65521. Distinct from vm_fnv1a64_loop (single multiplicative state) and vm_byterange64_loop (cmp reductions). Tests parallel additive accumulators with i64 urem by 65521 (Adler prime) and final shl-or pack into one i64. Result: {"status":"keep","vm_sample_count":163,"total_semantic_cases":1822,"manifest_samples":194} * vm_byterev_window64_loop passes both gates first try. Variable-trip byteswap of lower n=(x&7)+1 bytes via shl-or-lshr packing.
Distinct from vm_bswap64_loop (fixed 8-byte byteswap, lifter folds to llvm.bswap.i64): the symbolic trip count prevents the fold and keeps the body's shl-by-8 + or + lshr-by-8 chain visible. Tests byte-level packing accumulator threaded through dispatcher loop body. Result: {"status":"keep","vm_sample_count":164,"total_semantic_cases":1832,"manifest_samples":195} * vm_nibrev_window64_loop passes both gates first try. Variable-trip nibble-reverse over n=(x&7)+1 nibbles via shl-by-4 + or + lshr-by-4 chain. Distinct from vm_byterev_window64_loop (8-bit window, shl/lshr by 8) and vm_nibrev64_loop (full fixed 16-nibble reverse, may fold to intrinsic). Tests sub-byte windowed packing inside dispatcher loop. Result: {"status":"keep","vm_sample_count":165,"total_semantic_cases":1842,"manifest_samples":196} * vm_threestate_xormul64_loop passes both gates first try. Three-state cross-feeding recurrence: t=a^b; a=b; b=c+1; c=t*GR+a over n=(x&7)+1 iters. Distinct from vm_tribonacci64_loop (additive a,b,c -> b,c,a+b+c) and vm_pairmix64_loop (two-state). Three i64 slots all updated each iter with sequential reads captured into temp t before any writeback (TEA-bug workaround pattern). Returns combined a^b^c. Result: {"status":"keep","vm_sample_count":166,"total_semantic_cases":1852,"manifest_samples":197} * vm_xxhmix64_loop passes both gates first try. xxhash-style per-byte mix `r = (r ^ byte) * PRIME64_3` over n=(x&7)+1 bytes plus final xor-fold by lshr 33. Distinct from vm_fnv1a64_loop (40-bit FNV prime, no fold), vm_murmurstep64_loop (no byte windowing), vm_djb264_loop (additive 33). Tests xor-then-mul with 64-bit xxhash multiplier per byte plus a finalizer step in a separate post-loop PC state. Result: {"status":"keep","vm_sample_count":167,"total_semantic_cases":1862,"manifest_samples":198} * vm_fmix_chain64_loop passes both gates first try. Murmur3 64-bit finalizer applied n=(x&7)+1 times: r ^= r>>33; r *= 0xFF51..CCD; r ^= r>>33; r *= 0xC4CE..C53.
Distinct from vm_fmix64_loop (single fmix application no loop), vm_xxhmix64_loop (per-byte mix one mul + post-loop fold), vm_murmurstep64_loop (single magic + xor with input each iter), vm_splitmix64_loop (different magics + constant additive step). Tests dual-magic xor-mul-xor-mul finalizer chain inside counter-bound loop body. Result: {"status":"keep","vm_sample_count":168,"total_semantic_cases":1872,"manifest_samples":199} * vm_zigzag_step64_loop passes both gates first try. ZigZag encoding chained over a stepped state: enc=(s<<1)^((i64)s>>63); r+=enc; s+=GR over n=(x&7)+1 iters. Tests ashr i64 ... 63 (sign-broadcast arithmetic right shift) inside loop body. Distinct from vm_signedbytesum64_loop (per-byte sext-i8) and vm_splitmix64_loop (no ashr). Reaches 200 manifest entries milestone. Result: {"status":"keep","vm_sample_count":169,"total_semantic_cases":1882,"manifest_samples":200} * vm_xormuladd_chain64_loop passes both gates first try. Three-op single-state chain over n=(x&7)+1 iters: r=r^x; r=r*0x1000193; r=r+x. Distinct from vm_murmurstep64_loop (xor-mul-lshr-fold; 64-bit magic), vm_fmix_chain64_loop (xor-mul-xor-mul; two 64-bit magics; no add), vm_xxhmix64_loop (xor-byte mul; post-loop fold). Tests xor + small-magic mul + add chain on single accumulator. Reaches 170 sample milestone. Result: {"status":"keep","vm_sample_count":170,"total_semantic_cases":1892,"manifest_samples":201} * vm_subxor_chain64_loop passes both gates after fixing one transcribed expected value (caught before run). Single-state sub-xor chain over n=(x&7)+1 iters: r=(r-x)^(x<<3). Distinct from vm_xormuladd_chain64_loop (xor+mul+add), vm_xorbytes64_loop (XOR-only), vm_horner64_loop (mul+add). Tests `sub i64` chained with shl-3 and xor inside dispatcher loop body. Sub is underused vs add in existing samples. Result: {"status":"keep","vm_sample_count":171,"total_semantic_cases":1902,"manifest_samples":202} * vm_negstep64_loop passes both gates first try.
Two-state recurrence with arithmetic negation: r=-r+s; s=s+1 over n=(x&7)+1 iters. Distinct from vm_subxor_chain64_loop (sub state-minus-input), vm_xormuladd_chain64_loop (xor+mul+add). Tests `sub i64 0, r` (negate) pattern inside dispatcher loop. Negation flips accumulator sign per iter; with stepped state s, telescoping produces predictable patterns. Result: {"status":"keep","vm_sample_count":172,"total_semantic_cases":1912,"manifest_samples":203} * vm_bitfetch_window64_loop passes both gates first try. Bitwise reversal of low n=(x&7)+1 bits via dynamic shift `(x >> i) & 1` per iter. Tests `lshr i64 x, i` with i a loop-index variable - non-constant shift amount inside dispatcher loop body. Distinct from vm_byterev_window64_loop (8-bit fixed shift) and vm_nibrev_window64_loop (4-bit fixed shift) which use constant shifts. Result: {"status":"keep","vm_sample_count":173,"total_semantic_cases":1922,"manifest_samples":204} * vm_dynshl_pack64_loop passes both gates first try. XOR-pack 2-bit chunks of x at dynamic bit positions controlled by loop index: r ^= ((s & 0x3) << i); s >>= 2. Tests `shl i64 v, %i` (dynamic LEFT shift) - complement to vm_bitfetch_window64_loop's dynamic LSHR. Distinct shift direction with same dynamic-amount property. Result: {"status":"keep","vm_sample_count":174,"total_semantic_cases":1932,"manifest_samples":205} * vm_dyn_ashr64_loop passes both gates first try. Dynamic-amount ASHR (signed shift right) by counter: sx = (i64)x >> i; r ^= byte(sx) over n=(x&7)+1 iters. Distinct from vm_bitfetch_window64_loop (dynamic LSHR), vm_dynshl_pack64_loop (dynamic SHL), vm_zigzag_step64_loop (constant ashr-63). Completes the dynamic-shift trio (lshr/shl/ashr). Negative-sign inputs fill with 1s producing different XOR patterns than unsigned shift. Result: {"status":"keep","vm_sample_count":175,"total_semantic_cases":1942,"manifest_samples":206} * vm_bytesmul_idx64_loop passes both gates first try. 
Per-byte signed accumulator scaled by 1-based loop index: r += sext(byte) * (i+1) over n=(x&7)+1 iters. Distinct from vm_signedbytesum64_loop (no index multiplier) and vm_altbytesum64_loop (fixed alternating sign). Tests sext-i8 multiplied by dynamic counter value (i+1) - i64 mul against phi-tracked counter rather than constant. Result: {"status":"keep","vm_sample_count":176,"total_semantic_cases":1952,"manifest_samples":207} * vm_notand_chain64_loop passes both gates first try. NOT-AND chain with dynamic-shift xor: r=(~r)&x; r^=(i<<3) over n=(x&7)+1 iters. Tests bitwise NOT (xor i64 r, -1) followed by AND with input (BMI andn-style idiom), then xor with i<<3 (dynamic shl by counter). Result: {"status":"keep","vm_sample_count":177,"total_semantic_cases":1962,"manifest_samples":208} * vm_xormul_byte_idx64_loop passes both gates first try. XOR-fold scaled bytes: r ^= byte * (i+1) over n=(x&7)+1 iters. Distinct from vm_bytesmul_idx64_loop (signed-byte sext + ADD) - this one uses unsigned-byte zext + XOR. Tests u8 zext multiply by dynamic counter (i+1) folded via XOR rather than ADD. Result: {"status":"keep","vm_sample_count":178,"total_semantic_cases":1972,"manifest_samples":209} * vm_signedxor_byte_idx64_loop passes both gates first try. Signed-byte sext * (i+1) folded via XOR over n=(x&7)+1 iters. Fills the sext+XOR cell of the per-byte * counter matrix. Distinct from vm_xormul_byte_idx64_loop (zext + XOR) and vm_bytesmul_idx64_loop (sext + ADD). For high-bit-set bytes, sext populates upper 56 bits with 1s producing different XOR fold than zext (e.g. 0xF0 byte -> 2^64-16 vs unsigned 240). Result: {"status":"keep","vm_sample_count":179,"total_semantic_cases":1982,"manifest_samples":210} * vm_uintadd_byte_idx64_loop passes both gates first try. Unsigned-byte (zext) * (i+1) folded via ADD over n=(x&7)+1 iters. Fills the zext+ADD cell, COMPLETING the per-byte * counter matrix across all four (zext/sext) x (ADD/XOR) cells. Reaches 180-sample milestone. 
Result: {"status":"keep","vm_sample_count":180,"total_semantic_cases":1992,"manifest_samples":211} * vm_bytesq_sum64_loop passes both gates first try. Sum of byte*byte (u8 self-multiply) over n=(x&7)+1 iters. Distinct from vm_popsq64_loop (sum of squared POPCOUNTS), vm_squareadd64_loop (single-state r*r quadratic), vm_uintadd_byte_idx64_loop (byte * counter). Tests u8 self-multiply on the byte stream with no counter scaling. Result: {"status":"keep","vm_sample_count":181,"total_semantic_cases":2002,"manifest_samples":212} * vm_byteprod64_loop passes both gates first try. Running product of bytes r *= byte over n=(x&7)+1 iters, seeded r=1. Distinct from vm_bytesq_sum64_loop (squared bytes summed), vm_uintadd_byte_idx64_loop (byte * counter ADD), vm_xormul_byte_idx64_loop (byte * counter XOR). Tests `mul i64 r, byte` chained where any zero byte collapses the product but the loop still runs to completion. Result: {"status":"keep","vm_sample_count":182,"total_semantic_cases":2012,"manifest_samples":213} * vm_andsum_byte_idx64_loop passes both gates first try. Per-iter byte AND-ed with counter, summed: r += (byte & (i+1)) over n=(x&7)+1 iters. Distinct from vm_uintadd_byte_idx64_loop (byte * counter ADD), vm_xormul_byte_idx64_loop (byte * counter XOR). Tests `and i64 byte, counter` (zext-byte AND with phi-tracked i+1) folded via ADD - bitwise mask interaction with dynamic counter values. Result: {"status":"keep","vm_sample_count":183,"total_semantic_cases":2022,"manifest_samples":214} * vm_orsum_byte_idx64_loop passes both gates first try. Per-iter OR of byte and counter folded into accumulator: r |= byte | (i+1) over n=(x&7)+1 iters. Distinct from vm_andsum_byte_idx64_loop (AND fold), vm_xormul_byte_idx64_loop (XOR of byte*counter), vm_uintadd_byte_idx64_loop (ADD of byte*counter). Tests `or i64` chain that is monotone (only sets bits) - counter values 1..8 always contribute fixed low bits.
Result: {"status":"keep","vm_sample_count":184,"total_semantic_cases":2032,"manifest_samples":215} * vm_subbyte_idx64_loop passes both gates first try. SUB-fold of u8 zext * counter: r -= byte * (i+1) over n=(x&7)+1 iters. Distinct from vm_uintadd_byte_idx64_loop (same body ADD-folded) - tests SUB on the same per-byte * counter accumulator. Result wraps below zero into u64 modular space. Result: {"status":"keep","vm_sample_count":185,"total_semantic_cases":2042,"manifest_samples":216} * vm_bytediv5_sum64_loop passes both gates first try. Sum of byte/5 over n=(x&7)+1 iters. Tests udiv-by-5 chain on byte stream. Distinct from vm_adler32_64_loop (urem by 65521 prime modular), vm_trailzeros_factorial64_loop (udiv-5 on single state), vm_uintadd_byte_idx64_loop (mul not div). All-0xFF: 8 * (255/5)=408. Result: {"status":"keep","vm_sample_count":186,"total_semantic_cases":2052,"manifest_samples":217} * vm_bytemod3_sum64_loop passes both gates first try. Sum of byte%3 over n=(x&7)+1 iters. Tests urem-by-3 chain on byte stream. Distinct from vm_bytediv5_sum64_loop (udiv-by-5) and vm_adler32_64_loop (urem-by-65521 prime). Small-modulus complement to /5 sample. All-0xFF: 255%3=0, sum=0. Result: {"status":"keep","vm_sample_count":187,"total_semantic_cases":2062,"manifest_samples":218} * vm_byteshl3_xor64_loop passes both gates first try. XOR-pack bytes at dynamic positions controlled by `i*3` over n=(x&7)+1 iters. Tests `shl i64 byte, %i*3` (dynamic shl by NON-trivial counter expression - mul-then-shl). Distinct from vm_dynshl_pack64_loop (shl by i directly, 2-bit chunks). Result: {"status":"keep","vm_sample_count":188,"total_semantic_cases":2072,"manifest_samples":219} * vm_byteshl_data64_loop passes both gates first try. Data-dependent shl: r=(r << (b&7)) | (b>>4) over n=(x&7)+1 iters. Tests `shl i64 r, %byte_amount` where shift amount is derived from the BYTE STREAM rather than loop counter.
Distinct from vm_dynshl_pack64_loop (shl by i) and vm_byteshl3_xor64_loop (shl by i*3 - counter expression). Result: {"status":"keep","vm_sample_count":189,"total_semantic_cases":2082,"manifest_samples":220} * vm_data_lshr64_loop passes both gates first try. Data-dependent right shift counterpart to vm_byteshl_data64_loop: r=(r >> (b&7)) ^ b over n=(x&7)+1 iters. Tests `lshr i64 r, %byte_amount` (right-shift by byte-derived amount). Initial r=~0 with all-1s shifts down by data-driven amounts. Reaches 190 sample milestone. Result: {"status":"keep","vm_sample_count":190,"total_semantic_cases":2092,"manifest_samples":221} * vm_data_ashr64_loop passes both gates first try. Data-dependent ashr counterpart: r=(i64 r >> (b&7)) + b over n=(x&7)+1 iters. Tests `ashr i64 r, %byte_amount` (signed right-shift by byte-derived amount). Completes the data-dependent shift trio (shl/lshr/ashr) - distinct from vm_dyn_ashr64_loop (ashr by counter not byte data). Result: {"status":"keep","vm_sample_count":191,"total_semantic_cases":2102,"manifest_samples":222} * vm_mul3byte_chain64_loop passes both gates first try. Horner-style hash with multiplier 3: r = r*3 + byte over n=(x&7)+1 iters. Distinct from vm_djb264_loop (33), vm_fnv1a64_loop (FNV prime), vm_horner64_loop (general polynomial). Tests `mul i64 r, 3` (small-constant multiplier - non-power-of-2 coefficient that lifter typically keeps as raw mul rather than lea-by-3 fold). Result: {"status":"keep","vm_sample_count":192,"total_semantic_cases":2112,"manifest_samples":223} * vm_shiftin_top64_loop passes both gates first try. Shift register filled from the top: r=(r>>8)|(byte<<56) over n=(x&7)+1 iters. Tests `lshr i64 r, 8 | shl i64 byte, 56` shift-register update pattern. Distinct from vm_byterev_window64_loop (shl-or pack from low end). After n=8 iters, all-FF input is preserved (palindrome invariant).
Result: {"status":"keep","vm_sample_count":193,"total_semantic_cases":2122,"manifest_samples":224} * vm_orxor_pair64_loop passes both gates first try. Two-state cross-feed with explicit temp barrier: t=a; a=a|b; b=t^(b*7) over n=(x&7)+1 iters. Combines monotone OR fold on a with non-monotone XOR-mul evolution on b. Distinct from vm_pairmix64_loop (add+mul-by-GR cross-feed), vm_threestate_xormul64_loop (three states), vm_orsum_byte_idx64_loop (single-state OR fold). Result: {"status":"keep","vm_sample_count":194,"total_semantic_cases":2132,"manifest_samples":225} * vm_lcg_ansi_chain64_loop passes both gates first try. Classic ANSI C rand() LCG chained over n=(x&7)+1 iters: r = r*1103515245 + 12345. Distinct from vm_xorrot64_loop (LCG with golden-ratio + xor accum), vm_pcg64_loop, vm_xorshift64_loop. Single-state LCG with canonical multiplier+increment pair. Result: {"status":"keep","vm_sample_count":195,"total_semantic_cases":2142,"manifest_samples":226} * vm_bytesq_idx_sum64_loop passes both gates first try. Sum of byte * (i+1) * (i+1) - SQUARED counter expression as multiplier. Two sequential muls per iter (counter*counter then byte*counter^2). Distinct from vm_uintadd_byte_idx64_loop (linear counter) and vm_bytesq_sum64_loop (byte self-multiply, no counter). All-0xFF: 0xFF*204=52020. Result: {"status":"keep","vm_sample_count":196,"total_semantic_cases":2152,"manifest_samples":227} * vm_dynshl_accum_byte64_loop passes both gates first try. Shift accumulator left by (i+1) then add byte: r=(r<<(i+1))+byte over n=(x&7)+1 iters. Tests `shl i64 %r, %(i+1)` (shift ACCUMULATOR by phi-tracked counter rather than the byte). Distinct from vm_dynshl_pack64_loop (shl byte by counter) and vm_byteshl_data64_loop (data-dependent shl on accumulator). Result: {"status":"keep","vm_sample_count":197,"total_semantic_cases":2162,"manifest_samples":228} * vm_dynlshr_accum_byte64_loop passes both gates after recovering from aborted previous turn (file was on disk, manifest entry missing).
Shifts r right by (i+1) bits then XORs the byte: r=(r>>(i+1))^byte over n=(x&7)+1 iters with r seeded ~0. Tests `lshr i64 %r, %(i+1)` (lshr accumulator by phi-tracked counter expression). Distinct from vm_dynshl_accum_byte64_loop (shl direction) and vm_data_lshr64_loop (lshr by byte data not counter). Result: {"status":"keep","vm_sample_count":198,"total_semantic_cases":2172,"manifest_samples":229} * vm_dynashr_accum_byte64_loop passes both gates first try. ASHR accumulator by counter then add byte: r=(i64 r >> (i+1)) + byte over n=(x&7)+1 iters. Tests `ashr i64 %r, %(i+1)` (signed right-shift accumulator by phi-tracked counter). Completes the counter-driven accumulator-shift trio (shl/lshr/ashr). Result: {"status":"keep","vm_sample_count":199,"total_semantic_cases":2182,"manifest_samples":230} * vm_xormulself_byte64_loop passes both gates first try. Self-referential multiply: r ^= byte * (r+1) over n=(x&7)+1 iters. Tests `mul i64 byte, (r+1)` where multiplier operand is the accumulator+1 - r appears on both sides of the body. Distinct from vm_xormul_byte_idx64_loop (byte * counter) and vm_squareadd64_loop (r*r self-multiply on full state). Reaches 200-sample milestone. Result: {"status":"keep","vm_sample_count":200,"total_semantic_cases":2192,"manifest_samples":231} * vm_xor_shifted_self_byte64_loop passes both gates first try. Self-shift used as XOR mask combined with byte at MSB: r ^= (r>>8) | (byte<<56) over n=(x&7)+1 iters. Distinct from vm_shiftin_top64_loop (assigns same expression, no XOR), vm_xormulself_byte64_loop (mul-self with byte), vm_byterev_window64_loop (no XOR). Result: {"status":"keep","vm_sample_count":201,"total_semantic_cases":2202,"manifest_samples":232} * vm_pair_xormul_byte64_loop passes both gates first try. Per-iter pair (b0,b1) combined as (b0^b1) * (b0+b1) over n=(x&3)+1 iters. Tests TWO byte reads per iteration with XOR + ADD + MUL combination. Trip uses `& 3` so loop consumes 2 bytes per iter (1..4 pair iters).
Distinct from all single-byte-per-iter samples. Result: {"status":"keep","vm_sample_count":202,"total_semantic_cases":2212,"manifest_samples":233} * vm_quad_byte_xor64_loop passes both gates first try. FOUR byte reads per iteration combined via 3 chained XORs then ADD-folded over n=(x&1)+1 iters (32-bit stride). Distinct from vm_pair_xormul_byte64_loop (2 bytes per iter) and all single-byte samples. Tests wider stride consumption and multi-byte body shape. Result: {"status":"keep","vm_sample_count":203,"total_semantic_cases":2222,"manifest_samples":234} * vm_word_xormul64_loop passes both gates first try. u16 word per iter (16-bit stride): r ^= w*w over n=(x&3)+1 iters. Tests u16 zext-i16 self-multiply XOR-folded. Distinct from vm_bytesq_sum64_loop (8-bit stride, ADD) and vm_pair_xormul_byte64_loop (16-bit stride but byte ops). Result: {"status":"keep","vm_sample_count":204,"total_semantic_cases":2232,"manifest_samples":235} * vm_word_horner13_64_loop passes both gates first try. Horner-style hash on u16 words with multiplier 13: r = r*13 + w over n=(x&3)+1 iters. Distinct from vm_mul3byte_chain64_loop (Horner on bytes mul 3), vm_djb264_loop (bytes mul 33), vm_word_xormul64_loop (word self-multiply XOR). Wider stride + different multiplier than existing byte-Horner samples. Result: {"status":"keep","vm_sample_count":205,"total_semantic_cases":2242,"manifest_samples":236} * vm_dword_xormul64_loop passes both gates first try. u32 dword per iter (32-bit stride) with golden-ratio prime mul XOR-folded: r ^= dword * 0x9E3779B9 over n=(x&1)+1 iters. Distinct from vm_word_xormul64_loop (16-bit stride) and vm_quad_byte_xor64_loop (4 bytes per iter, no mul). Tests u32 zext-i32 mask + 32-bit-magic multiply. Result: {"status":"keep","vm_sample_count":206,"total_semantic_cases":2252,"manifest_samples":237} * vm_signed_dword_sum64_loop passes both gates first try. Sum of sext-i32 dwords per iter over n=(x&1)+1 iters. Tests `sext i32 to i64` chain on 32-bit dword stream.
Distinct from vm_signedbytesum64_loop (sext-i8 byte, 8-bit stride) and vm_dword_xormul64_loop (zext dword XOR, no sign extension). Result: {"status":"keep","vm_sample_count":207,"total_semantic_cases":2262,"manifest_samples":238} * vm_signed_word_sum64_loop passes both gates first try. Sum of sext-i16 words per iter over n=(x&3)+1 iters. Tests `sext i16 to i64` chain on 16-bit word stream. Fills the i16 middle width and completes the sext-width trio (i8/i16/i32 -> i64). Result: {"status":"keep","vm_sample_count":208,"total_semantic_cases":2272,"manifest_samples":239} * vm_word_range64_loop passes both gates after restructuring to n-decrement (4 slots: n,s,mn,mx). Tests u16 cmp-driven reductions at 16-bit stride: mx=umax(w,mx); mn=umin(w,mn); return mx-mn over n=(x&3)+1 iters. Lifter folds both reductions to llvm.umax.i64 + llvm.umin.i64. Documents new lifter limitation: 5-slot variant (with separate i counter) trips pseudo-stack init failure; 4-slot form works. Result: {"status":"keep","vm_sample_count":209,"total_semantic_cases":2282,"manifest_samples":240} * vm_signed_word_range64_loop passes both gates first try. Signed-i16 min/max range at word stride: tracks mx,mn over n=(x&3)+1 iters then returns mx-mn. Distinct from vm_word_range64_loop (unsigned -> umax/umin folds) and vm_signed_byterange64_loop (i8 stride). Per documented asymmetry, signed cmp+select stays raw icmp slt + select. Reaches 210-sample milestone. Result: {"status":"keep","vm_sample_count":210,"total_semantic_cases":2292,"manifest_samples":241} * Add equivalence reporting tool for rewrite_smoke samples * vm_dword_range64_loop passes both gates first try. u32 dword min/max range over n=(x&1)+1 iters. Tests umax/umin folds at 32-bit dword stride. Distinct from vm_byterange64_loop (8-bit) and vm_word_range64_loop (16-bit). Extends range coverage to all four widths (u8/u16/u32 + signed counterparts). 
Result: {"status":"keep","vm_sample_count":211,"total_semantic_cases":2302,"manifest_samples":242} * Generate per-sample original-vs-lifted equivalence reports for rewrite_smoke * vm_signed_dword_range64_loop passes both gates first try. Signed-i32 dword min/max range over n=(x&1)+1 iters. Tests sext-i32 + signed cmp+select reductions at 32-bit stride. Completes the range coverage matrix (3 widths x 2 signs). Per documented signed-cmp asymmetry, signed cmp+select stays raw icmp slt + select. Result: {"status":"keep","vm_sample_count":212,"total_semantic_cases":2312,"manifest_samples":243} * vm_word_orfold64_loop passes both gates first try. u16 OR-fold over n=(x&3)+1 iters. Tests `or i64` chain at 16-bit word stride. Distinct from vm_orsum_byte_idx64_loop (byte | counter, 8-bit stride). Monotone OR fold (only sets bits). Result: {"status":"keep","vm_sample_count":213,"total_semantic_cases":2322,"manifest_samples":244} * Refresh equivalence reports for current 246-sample manifest * vm_byte_andfold64_loop passes both gates. u8 AND-fold over n=(x&7)+1 bytes seeded with r=0xFF. Tests `and i64` chain at byte stride - monotone DECREASING accumulator counterpart to OR-fold. Distinct from vm_andsum_byte_idx64_loop (byte AND counter, ADD-folded). Result: {"status":"keep","vm_sample_count":214,"total_semantic_cases":2332,"manifest_samples":245} --------- Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com> Co-authored-by: Yusuf <yusuf@local>	2026-04-25 19:56:16 +03:00
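As a reading aid for the sample descriptions above, the AND-fold shape can be modelled in a few lines of Python. This is only a sketch: the sample's source is not shown in this log, so the byte source (successive little-endian bytes of x) and the function name are assumptions; only the trip count n=(x&7)+1 and the 0xFF seed come from the message.

```python
MASK64 = (1 << 64) - 1

def byte_andfold64(x: int) -> int:
    """Hypothetical reference model: AND-fold n=(x&7)+1 bytes, seeded with 0xFF."""
    x &= MASK64
    n = (x & 7) + 1          # trip count, as stated in the commit message
    r = 0xFF                 # seed with all bits set: AND can only clear bits
    for i in range(n):
        # Assumed byte source: byte i of x, lowest byte first.
        r &= (x >> (8 * i)) & 0xFF
    return r
```

Because AND only clears bits, r never increases across iterations, which is the monotone-decreasing behaviour the message contrasts with the OR-fold.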
naci	c1ca564305	lifter: strip trailing inttoptr-constant stores before terminators (#202 ) The lifter's pseudo-memory model emits writes to obfuscator-controlled scratch (Themida's .vlizer slots, register-spill scratch in .data, etc.) as 'store ... ptr inttoptr (i64 K to ptr)'. LLVM's DSE treats those conservatively because every inttoptr constant can theoretically alias an externally observable pointer, so they survive -O2 even when the lifted function does not read them. Add a post-O2 pass StripTrailingScratchStoresPass that walks each block backwards from a 'ret' or 'unreachable' terminator and drops the trailing run of inttoptr-constant stores. Stops at the first non-store or non-inttoptr-constant-store instruction (call, load, store-through- %memory-GEP, fence, ...) - those are real side effects. This is conservative: only the stores between the last side-effecting instruction and the terminator are removed. Stores that precede a later call or load survive untouched, so program behaviour is preserved. Impact: example2-virt.bin @ 0x140001000: before: 257 lines, 222 stores after: 240 lines, 205 stores (17 trailing stores stripped) imports: 4/4 (unchanged) example2.bin (non-virt) @ 0x140001000: before: 247 lines, 212 stores after: 37 lines, 3 stores (the function shrinks to its actual program flow: 7 typed imports, a llvm.memset, and 'ret i64 0') warn/err: 0/0 (unchanged) The non-virt main now reads as the original program: GetStdHandle x2 -> WriteConsoleA(prompt) -> ReadConsoleA(buf) -> CharUpperA(buf) -> WriteConsoleA(echo) -> WriteConsoleA(buf) -> ret 0 python test.py baseline + quick + themida all green. Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-25 02:49:06 +03:00
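The backward walk described for StripTrailingScratchStoresPass can be illustrated on a toy instruction list. This is a Python model, not the LLVM pass itself: the Inst record and its inttoptr_const flag are hypothetical stand-ins for "store whose pointer operand is an inttoptr constant".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Inst:
    op: str                       # "store", "call", "load", "ret", "unreachable", ...
    inttoptr_const: bool = False  # hypothetical flag: store targets inttoptr (i64 K to ptr)

def strip_trailing_scratch_stores(block):
    """Return a copy of block without the trailing run of inttoptr-constant stores."""
    if not block or block[-1].op not in ("ret", "unreachable"):
        return list(block)        # only blocks ending in ret/unreachable are touched
    end = len(block) - 1          # index of the terminator
    keep_until = end
    while keep_until > 0:
        prev = block[keep_until - 1]
        if prev.op == "store" and prev.inttoptr_const:
            keep_until -= 1       # still inside the trailing run
        else:
            break                 # first call/load/other store: a real side effect
    return list(block[:keep_until]) + [block[end]]
```

A scratch store that precedes a call is kept, because the walk stops at the call and never looks further back.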
naci	0bac256522	lifter: add Win32 console-I/O signatures and preserve call-arg GEPs (#201 ) Register proper prototypes for GetStdHandle, WriteConsoleA, ReadConsoleA, and CharUpperA in funcsignatures. The infrastructure for typed-pointer imports was already in place (parseArgs uses getPointer() when arg.argtype.isPtr), but these four imports lacked signatures and fell through to the unknown-call path that passes 16 GPRs + a raw %memory pointer with every arg as i64. Also extend PromotePseudoStackPass with a narrow escape filter: stack- range %memory GEPs whose users include a CallBase are no longer migrated onto the %stackmemory alloca. Migrating them would let a call argument use the alloca pointer, which blocks mem2reg/SROA on the alloca and leaves hundreds of dead stack stores in the post-opt IR. Leaving those specific GEPs through %memory costs nothing (memory is already a function argument). Other non-load/store uses (ptrtoint, GEP-of-GEP, stored-as-value) still migrate, so rewrite_smoke samples that depend on full alloca promotion keep working. Before (example2-virt.bin @ 0x140001000): %2 = tail call i64 @WriteConsoleA(i64 %1, i64 5368717648, i64 16, i64 1375568, ptr %memory) After: declare i64 @WriteConsoleA(i64, ptr, i32, ptr) local_unnamed_addr %2 = tail call i64 @WriteConsoleA( i64 %1, ptr nonnull inttoptr (i64 5368717648 to ptr), i32 16, ptr nonnull inttoptr (i64 1375568 to ptr)) All 4 imports now carry their real Win32 signature: @GetStdHandle(i32) @WriteConsoleA(i64, ptr, i32, ptr) @ReadConsoleA (i64, ptr, i32, ptr) @CharUpperA (ptr) Speed on example2-virt.bin: before: 1.25s median after: 1.07s median (-14%) Non-virt example2.bin output gets dramatically cleaner; the full program flow is visible in ~20 lines with correctly typed calls (GetStdHandle stdin/stdout, WriteConsoleA/ReadConsoleA with ptr buffer args, CharUpperA on buffer, final WriteConsoleA x2). Still 0 warn, 0 err. python test.py baseline + quick + themida remain green. 
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-25 02:16:38 +03:00
naci	99fdcb7531	lifter: lower IndirectJump shape-aware threshold from 128 to 80 (#200)

PR #195 set the IndirectJump shape-aware revisit threshold to 128 to unlock all 4 imports on example2-virt.bin. Sweeping the threshold post-chain shows 80 is sufficient: 4/4 imports surface at T=80 already, matching T=128's metric while doing significantly less work.

T=80:  1615 blocks lifted, 4/4 imports
T=128: 3077 blocks lifted, 4/4 imports

Concrete impact on example2-virt.bin @ 0x140001000:

metric             T=128       T=80
-----------------+-----------+----------
wall (median)      2.21s       1.25s   -43%
pre-opt IR lines   50,856      38,260  -25%
post-opt lines     305         247     -19%
post-opt stores    268         212     -21%
imports pre/post   4/4         4/4     same

DirectJump and ConditionalBranch still use threshold 0, so rewrite_smoke VM-loop samples still generalise on their first backedge - no regression. python test.py baseline + quick + themida all green. Non-virt example2.bin unchanged.

Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-25 01:42:48 +03:00
naci	7bcb705b5a	lifter: degrade PATH_unsolved ret to REAL_return to fix unterminated blocks (#199 ) When lift_ret classifies a ret as ROP_return (because rspvalue is a ConstantInt != STACKP_VALUE) and solvePath subsequently returns PATH_unsolved, the previous code emitted a warning and left the block unterminated. The outer per-instruction lift loop in liftBasicBlockFromAddress would then advance current_address past the ret and lift the next byte, producing a second terminator and malformed IR (same shape as #196's earlier bug, exposed on a different code path). The most accurate semantic for an unresolvable ret is 'returns to a caller we do not have context for' - degrade to REAL_return: emit 'ret rax' and stop the block. The warning still fires (suppressed for chained PCs per #198) so the unresolvable signal is preserved as a diagnostic, but the IR stays well-formed. Visible at 0x14000110d (the entry function's own final ret) on example2-virt.bin @ 0x140001000: previously 1 warn + an unterminated block whose downstream lifted bytes produced spurious 'ret undef' in some block. After this change, the warning still surfaces but the block is terminated with 'ret rax' immediately. Knock-on improvement: with the ret site terminating cleanly, O2's DCE collapses the noise and the optimized IR now contains all 7 import calls in their original program order: GetStdHandle(STD_INPUT_HANDLE) ; stdin GetStdHandle(STD_OUTPUT_HANDLE) ; stdout WriteConsoleA(stdout, prompt) ReadConsoleA(stdin, buffer) CharUpperA(buffer) WriteConsoleA(stdout, echo) WriteConsoleA(stdout, buffer) Matches the emulator trace from PR #190 exactly. The post-opt IR went from 'imports declared but most call-sites DCE'd' to 'full original program flow visible'. Verified: python test.py baseline + quick + themida green. Non-virt example2.bin unchanged (2 blocks, 6 declares, 0 warn, 0 err). 
themida-virt: 4/4 imports pre-opt AND post-opt, 1 warn (legit top- level ret), 0 err - same headline numbers as pre-fix, but the post- opt IR is dramatically cleaner. Also drops the noisy stdout '[diag] lift_ret: unresolved ROP chain' print that ran in lockstep with the structured warning - the warning already conveys the same info via diagnostics.warning. Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-25 00:45:18 +03:00
naci	463b6aca68	lifter: track chained import ret-sites to suppress redundant warnings (#198 ) The ret-to-IAT chain in lift_ret recognises concrete VM-staged import calls on first visit. Later symbolic re-entries of the same PC fall through to solvePath, which returns PATH_unsolved when the popped realval is now a phi of multiple concrete values from different dispatch paths, and emits an UnresolvedRetChain warning. The warning is factually correct - that specific revisit couldn't resolve symbolically - but semantically redundant: the concrete import semantics are already captured at the chain site, so the re-entry carries no new information. Track chained import-ret-site PCs in a new std::set member, insert on successful chain fire, and skip the warning emission when the PATH_unsolved site is in the set. Impact on example2-virt.bin @ 0x140001000: before: warn=1 err=0, site=0x14017fa77 (symbolic re-entry of the GetStdHandle ret-site) after: warn=1 err=0, site=0x14000110d (top-level entry ret - no caller context, legitimately unresolvable) Non-virt, baseline, quick, and themida tests remain green. The warning count is coincidentally unchanged; what moved is WHICH PC triggered the single diagnostic emission. The new site is a genuine top-level return that the lifter cannot resolve (entry function has no caller). Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-24 21:16:52 +03:00
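The suppression logic amounts to a set-membership check before emitting the warning. A minimal Python sketch follows, with a hypothetical class and method names; the real code is a std::set member inside the lifter.

```python
class RetDiagnostics:
    """Hypothetical model of the chained-ret-site warning filter."""

    def __init__(self):
        self.chained_ret_sites = set()   # PCs where the ret-to-IAT chain fired
        self.warnings = []               # PCs that produced an UnresolvedRetChain warning

    def on_chain_fired(self, pc: int) -> None:
        self.chained_ret_sites.add(pc)

    def on_path_unsolved(self, pc: int) -> bool:
        """Return True if a warning was emitted for this PATH_unsolved site."""
        if pc in self.chained_ret_sites:
            return False                 # import semantics already captured here
        self.warnings.append(pc)
        return True
```

With the two sites from the message, the symbolic re-entry at 0x14017fa77 stays silent once the chain has fired there, while the top-level ret at 0x14000110d still warns.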
naci	39b7fcb71f	lifter: handle INT1 and INT3 like UD2 (call @exception; ret) (#197 ) Both 0xF1 (INT1, ICEBP debug trap) and 0xCC (INT3, debugger break) previously fell through to the 'Instruction not implemented' default, emitting a DiagCode::InstructionNotImplemented error. They raise #DB/#BP exceptions at runtime, functionally equivalent to UD2 which already lowers to 'call @exception; ret'. Group them with UD2 by adding two fall-through case labels. Same lowering: emit call @exception(), ret, and stop the block. On example2-virt.bin @ 0x140001000: before: 1 warn, 1 err (INT1 at 0x1401928ef) after: 1 warn, 0 err (INT1 now lifts cleanly as @exception call) Baseline + quick + themida remain green. Non-virt example2.bin unchanged. The themida test's 'extra imports' list gains '@exception' alongside the existing '@fastfail' for the same kind of lowering. Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-24 21:05:03 +03:00
naci	8f852966f1	lifter: stop the lift loop after the ret-to-IAT chain emits its terminator (#196 ) The chain in lift_ret emits a br to the continuation block and returns. But lift_ret returning does not stop the outer per-instruction lift loop in liftBasicBlockFromAddress - that loop only stops when run=0 or finished=1. On return the loop advances current_address by the ret's length and lifts the NEXT byte, which emits a second terminator in the same BB. Two terminators in one BB is accepted by IRBuilder but makes the block malformed: LLVM treats only the first br as the live terminator. All blocks the first br reaches still exist, but predecessor-tracking silently treats the second br's target as the live successor, so the chain's continuation block (and everything downstream of it) ends up orphaned with zero predecessors. Visible consequence: CharUpperA's call was in bb_solved_const8212 whose source block had two consecutive br's. The chain's br to the bb_after_import_CharUpperA BB was the second terminator, so the first br went to some other block and CharUpperA's continuation became unreachable. After O2 it got DCE'd entirely along with its declare. Fix: set run=0 and finished=1 after CreateBr in the chain path. The outer loop exits cleanly, the block keeps its single br to contBB, and the continuation path stays connected to the CFG. Impact on example2-virt.bin @ 0x140001000: pre-opt IR: 4/4 imports, 2823 blocks, warn=1 err=1 (was 2365 blocks, warn=7 err=2) post-opt IR: 4/4 imports survive DCE (was 1/4 - only GetStdHandle survived) Baseline + quick + themida remain green. The malformed-block pattern was the source of most of the junk-switch warnings too: those paths were only reached via the second br's stray successor; with the fix the lifter no longer explores them. Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-24 20:52:03 +03:00
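The double-terminator bug and its fix can be reproduced with a toy lift loop. The mnemonics and the stop_after_chain switch are hypothetical; the sketch only shows why returning from the ret handler without clearing the run flag lets the loop emit a second terminator.

```python
def lift_block(mnemonics, stop_after_chain=True):
    """Return the terminators emitted for one block.

    'ret_chain' models a ret taking the ret-to-IAT chain path,
    'jmp' models any other terminator, anything else is a plain instruction.
    """
    terminators = []
    run = True
    for m in mnemonics:
        if not run:
            break
        if m == "ret_chain":
            terminators.append("br contBB")
            if stop_after_chain:
                run = False      # the fix: stop the outer loop after CreateBr
        elif m == "jmp":
            terminators.append("br other")
            run = False
    return terminators
```

With stop_after_chain=False the bytes after the ret are lifted as well and the block ends up with two br instructions, which is the malformed shape described above.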
naci	cf7e2f34a8	lifter: recover all 4 themida-virt imports via ret-to-IAT chain (#195 ) On example2-virt.bin @ 0x140001000, the lifter previously surfaced only GetStdHandle (1/4 of the required imports). This change unlocks 4/4: before: 359 blocks, 1/4 imports, warn=0, err=0 after: 2365 blocks, 4/4 imports, warn=7, err=2 python test.py themida now passes on this sample: PASS: example2 - 5 distinct imports, 7 calls (required 4) Three coupled changes: 1. Ret-to-IAT chain in lift_ret (Semantics_ControlFlow.ipp) When the popped target is a concrete import VA AND the top of the new stack is a concrete continuation, emit `call @import` and branch to the continuation block instead of letting the ret go to solvePath. This keeps exploration alive past the import so the VM's subsequent handlers (which carry the other imports) get reached. 2. Preserve caller-saved GPRs across VM-staged imports The chain's `CreateCall` goes through buildUnknownCallFx with an EMPTY volatileRegs set. Rationale: VM-staged imports are invoked from a dispatcher that preserves its own caller-saved state across the external call in the real binary (otherwise the VM would be broken). Clobbering those regs in the lifter made the dispatcher's next step non-concrete, trapping further exploration in one handler. Only applied to this specific call path. All other external calls still use the strict x64 MSVC ABI (caller-saved clobbered) through the unchanged applyPostCallEffects default. 3. Raise shape-aware IndirectJump threshold from 16 to 128 The VM dispatcher re-enters its header many times per bytecode step; 16 iterations are not enough to cover all four import handlers. 128 does. DirectJump and ConditionalBranch stay at threshold 0, so rewrite_smoke VM-loop samples still generalize immediately on their first backedge. 
Verified: - python test.py baseline green (rewrite regression + determinism) - python test.py quick green (33/33 semantic + all instruction microtests) - python test.py themida green (PASS on example2) - non-virt example2.bin unchanged: 2 blocks, 6 declares, 0 warn, 0 err Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-24 20:32:01 +03:00
naci	194f43b56d	lifter: guard pop rsp against out-of-range concrete values (#194 ) A concrete RSP produced by `pop rsp` that falls outside the tracked pseudo-stack range crashes the lifter on the next [rsp] memory op, via an unmapped dereference inside GetMemoryValue/solveLoad. This shape appears in VM stack-switch gadgets (example: the Themida-virt handler at 0x14017facc..fae3) and the crash kills deep-exploration sweeps. When lift_pop detects the destination operand is RSP and Rvalue is a ConstantInt outside `isTrackedStackAddress`, emit a structured warning and an unreachable terminator for the block instead of writing the bad RSP and letting the next memory op take us down. Symbolic Rvalue is unchanged - the existing symbolic-load path already handles it. ConstantInt within the tracked stack range is unchanged - legitimate `pop rsp` on a real stack pointer stays supported. Verified on example2-virt.bin @ 0x140001000: default (no chain, T=16 IndirectJump): 359 blocks, 0 warn, 0 err chain + T=32: before: SEGV at block 755 / PC 0x14017fae1 after: 755 blocks, 1/4 imports, 2 warn, 0 err, exit 0 chain + NO_LOOP_GEN: 2972 blocks, 1/4, exit 0 (no more crash) Baseline rewrite regression + determinism checks remain green. Default themida lift unchanged (359 blocks, 1/4 imports). The guard fires only when exploration actually reaches the gadget, which at current defaults it does not. Co-authored-by: Claude <claude@anthropic.com>	2026-04-24 20:12:36 +03:00
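The three cases of the guard can be written as a small decision function. Everything here is a sketch under assumptions: STACKP, RESERVE and the range check are hypothetical stand-ins for the lifter's isTrackedStackAddress, and the returned strings only name the action taken.

```python
STACKP = 0x14FF28    # hypothetical pseudo-stack pointer value
RESERVE = 0x1000     # hypothetical tracked range half-width

def is_tracked_stack_address(addr: int) -> bool:
    # Assumed definition; the real predicate lives in the lifter.
    return STACKP - RESERVE <= addr < STACKP + RESERVE

def lift_pop_rsp(value):
    """value is an int for a concrete popped value, None for a symbolic one."""
    if value is None:
        return "symbolic-load-path"          # unchanged behaviour
    if not is_tracked_stack_address(value):
        return "warning+unreachable"         # the new guard
    return "write-rsp"                       # legitimate pop rsp
```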
naci	08709710da	lifter: add MERGEN_INSTR_TRACE_FILE per-instruction breadcrumb (#193 ) Companion to MERGEN_BLOCK_TRACE_FILE. Writes one line per instruction lifted to the target file, with per-instruction flush. Pins exactly which instruction VA precedes a crash when debuggers are unavailable. Format: 0x<instruction-VA> Used to localise the chain+T=32 crash from 'block 755' down to the exact PC 0x14017fae1 - the 'push [rsp]' immediately after 'pop rsp' in the VM's stack-switch gadget at 0x14017facc..0x14017fae3. That pair is the concrete crash trigger: pop rsp loads a new RSP value, then push [rsp] attempts to read from that new RSP and the lifter's memory path crashes when the address is out-of-range. No behaviour change when the env var is unset. Verified: - python test.py baseline green - default themida lift unchanged (359 blocks, 1/4 imports) Co-authored-by: Claude <claude@anthropic.com>	2026-04-24 19:59:12 +03:00
naci	e7b0babec8	lifter: add MERGEN_BLOCK_TRACE_FILE breadcrumb env var (#192 ) Emit one line per block pulled off the lift worklist to a file, with per- block flush. Useful for pinning deep-exploration crashes when a debugger is unavailable or cannot attach to the release-built lifter on this host. Format per line (space-separated): <fnc->size()-at-pop> <0x-prefixed-block-VA> Emitted only when MERGEN_BLOCK_TRACE_FILE=<path> is set in the environment. File is opened in append mode and flushed + closed on every block, so the last line always survives a crash. Used during iteration on the Themida-virt import-recovery work to pin a chain+T=32 SEGV to block 755, VA 0x14017facc. Bytes there include a 'pop rsp' (5c), a classic symbolic-memory hazard. No behaviour change when the env var is unset. Verified: - baseline rewrite regression + determinism green - default themida lift unchanged (359 blocks, 1/4 imports) Co-authored-by: Claude <claude@anthropic.com>	2026-04-24 19:46:12 +03:00
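The crash-safe breadcrumb pattern (append, flush, close on every block) looks roughly like this in Python. The function name and the env parameter are illustrative; only the variable name and the line format come from the message.

```python
import os

def trace_block(func_size: int, block_va: int, env=os.environ) -> None:
    """Append '<func size> <0x-prefixed block VA>' to the trace file, if configured."""
    path = env.get("MERGEN_BLOCK_TRACE_FILE")
    if not path:
        return                       # env var unset: no behaviour change
    # Open in append mode and close on every call so the last line survives a crash.
    with open(path, "a") as f:
        f.write(f"{func_size} {block_va:#x}\n")
        f.flush()
```

After a crash, the last line of the file names the block that was being lifted, for example `755 0x14017facc`.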
naci	8d33c102a6	lifter: make maxBasicBlockBudget tunable via MERGEN_MAX_BLOCK_BUDGET (#191 ) The per-function basic-block budget is currently a compile-time 4096. Deep-exploration crashes (e.g. the chain + T>=32 SEGV at ~1891 blocks tracked in docs/README.md under open blockers) need a reliable way to cap the lift at a specific block count to bisect the crash site. Expose an integer env var that overrides the default: MERGEN_MAX_BLOCK_BUDGET=0 disables the cap entirely MERGEN_MAX_BLOCK_BUDGET=<N> caps the function at N basic blocks, which cleanly terminates with the existing LiftBlockBudgetExceeded error once fnc->size() reaches N Verified: - budget=100 stops at 99 blocks and emits the expected error - budget=0 leaves behaviour unchanged (359 blocks on example2-virt) - default lift (no env) leaves behaviour unchanged - python test.py baseline still green (all rewrite regression checks) No behaviour change for existing runs. Pure diagnostic knob. Co-authored-by: Claude <claude@anthropic.com>	2026-04-24 19:30:15 +03:00
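A sketch of how such an override might be parsed, in Python for brevity. The default of 4096 and the meaning of 0 come from the message; the fallback for malformed or negative values is an assumption, as are the function names.

```python
DEFAULT_BUDGET = 4096   # compile-time default named in the commit message

def max_block_budget(env):
    """Return the block cap, or None when the cap is disabled."""
    raw = env.get("MERGEN_MAX_BLOCK_BUDGET")
    if raw is None:
        return DEFAULT_BUDGET
    try:
        n = int(raw)
    except ValueError:
        return DEFAULT_BUDGET        # assumption: malformed values fall back to the default
    if n < 0:
        return DEFAULT_BUDGET        # assumption: negative values are ignored
    return None if n == 0 else n     # 0 disables the cap entirely

def budget_exceeded(func_size: int, budget) -> bool:
    # Fires once the function's block count reaches the cap.
    return budget is not None and func_size >= budget
```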
naci	aff35bc01c	diag: sentinel pages as ret stubs so tracer observes every import (#190)

Before: each IAT slot pointed at an unmapped sentinel address, so the FIRST import call raised UC_ERR_FETCH_UNMAPPED and the emulator stopped. We only ever observed one import per run.

After: sentinel addresses live in a mapped page filled with 0xC3 (near ret) instructions. Each import call fetches the ret byte, immediately returns to the VM's pre-staged continuation, and emulation keeps going. All subsequent imports now surface as [HIT] events.

On example2-virt.bin @ 0x140001000 this finds every required import:

insn    ret-site     import         target
------  -----------  -------------  ------
34223   0x14017fa77  GetStdHandle   stdin
44847   0x14017fa77  GetStdHandle   stdout
60695   0x14017ef9f  WriteConsoleA  prompt
74394   0x140192798  ReadConsoleA
85326   0x140157ef9  CharUpperA
97859   0x14013bf11  WriteConsoleA  echo
110166  0x14017fa77  WriteConsoleA  final

This gives the full map of import ret-site addresses for the virt sample - useful for future work that needs to reach those sites (whether by deeper lifter exploration or by seeding additional entries). The lifter currently reaches 0x14017fa77 only.

Co-authored-by: Claude <claude@anthropic.com>	2026-04-24 18:47:21 +03:00
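The sentinel layout can be sketched without an emulator: one page of ret bytes, and one unique address inside it per import. SENTINEL_BASE and the helper name are hypothetical; in the real tracer the page would additionally be mapped and written into the Unicorn address space and each IAT slot patched with its sentinel.

```python
PAGE_SIZE = 0x1000
SENTINEL_BASE = 0x7FFF0000   # hypothetical page-aligned address outside the image

def build_sentinel_page(import_names):
    """Return (page bytes, {sentinel address: import name})."""
    if len(import_names) > PAGE_SIZE:
        raise ValueError("more imports than bytes in one sentinel page")
    page = bytes([0xC3]) * PAGE_SIZE          # every byte decodes as a near ret
    sentinels = {SENTINEL_BASE + i: name      # one distinct address per import
                 for i, name in enumerate(import_names)}
    return page, sentinels
```

Because every byte of the page is 0xC3, a call that lands on any sentinel returns immediately, and the address it landed on identifies the import.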
naci	f340c8186a	docs: ret-to-IAT chain retry findings under shape-aware defaults (#189 ) Updates the tombstone comment in lift_ret with what this iteration discovered when re-attempting the chained-continuation variant under the post-#188 shape-aware defaults (T=16 on IndirectJump, 0 elsewhere): - At effective T=16: chain safely fires once at 0x14017fa77 (GetStdHandle, continuation 0x1401c888e) and explores 40 more blocks (359 -> 399), but does not surface any additional imports. Still 1/4. - At T>=32: still crashes at ~1891 blocks deep, same as #187. Two reasons chaining is not wired in: 1. T>=32 crash blocks broader use. 2. Safe T=16 chain does not reach other import ret sites within the generalization-bounded exploration budget. The chain block is left as a guarded diagnostic (requires MERGEN_RET_ CHAIN=1) so researchers can reproduce the T=16 exploration envelope and the T>=32 crash, but the default path remains just the PathSolver hook with 'call @import(); unreachable' leaves. Comment-only change. No code-behaviour change. Co-authored-by: Claude <claude@anthropic.com>	2026-04-24 18:16:18 +03:00
naci	90d5ca23b6	lifter: shape-aware gen threshold: 16 on IndirectJump, 0 elsewhere (#188)

Default env now surfaces GetStdHandle on example2-virt.bin (0/4 -> 1/4 on python test.py themida). Shape-aware discriminator uses the path-solve context:
- IndirectJump (jmp reg / jmp [mem]): likely a VM dispatcher, hold generalisation off for the first 16 revisits so concrete exploration has a chance to reach the IAT-gadget ret sites.
- DirectJump / ConditionalBranch: simple guest loops - generalise on the first backedge, preserving the rewrite_smoke VM-loop patterns.

Replaces the earlier 'targetResolvedConcretely' signal which fired on every solvePath-resolved target - too broad, and would re-regress the dummy_vm_loop / bytecode_vm_loop / stack_vm_loop samples.

Measurement on example2-virt.bin @ 0x140001000:

           before   after this commit
imports    0/4      1/4 (GetStdHandle)
insns      2544     11441
blocks     56       359
errors     0        0
warnings   0        0

MERGEN_GEN_MIN_REVISITS env still overrides per-header for researchers. Values {6, 8, 12} still crash on the virt sample - same unrelated dispatcher-state bug as before.

Regression checks:
- non-virt example2.bin: 61 insns, 6 imports, 0 warn/err (unchanged).
- python test.py baseline: passes; determinism check passes.
- All three rewrite_smoke VM-loop samples (dummy/bytecode/stack): passing their required IR patterns.

Co-authored-by: Claude <claude@anthropic.com>	2026-04-24 17:18:44 +03:00
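The discriminator reduces to a per-shape threshold with an env override. A Python sketch follows; the function names are hypothetical, and the exact comparison (generalise once the revisit count reaches the threshold) is an assumption about how "hold off for the first 16 revisits" is counted.

```python
def gen_min_revisits(shape: str, env) -> int:
    """Revisits of a loop header required before generalisation may fire."""
    override = env.get("MERGEN_GEN_MIN_REVISITS")
    if override is not None:
        return int(override)                   # researcher override wins
    return 16 if shape == "IndirectJump" else 0

def may_generalize(shape: str, revisits: int, env) -> bool:
    # Assumed comparison: allowed once the count reaches the threshold.
    return revisits >= gen_min_revisits(shape, env)
```

DirectJump and ConditionalBranch headers pass on their first backedge, while an IndirectJump header is explored concretely until the threshold is reached.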
naci	c23fb23918	docs: explain why ret-to-IAT chaining is not wired up (#187 ) Elaborate the comment next to the PathSolver-centralised import hook to explain what was tried and why chaining is not live yet, so the next session does not re-litigate the attempt from scratch. Attempted variant: after recognising a ret-to-IAT, emit callFunctionIR and feed the VM's pre-staged continuation ([rsp+8] before the original ret) to solvePath, so exploration follows into the next VM handler. Includes a mapped-address safety guard on the first chain step. Result: crashes the lifter with access violation on T>=32 runs of example2-virt.bin; T=16 still works but gives the same 1/4 result as the non-chained path. The crash is downstream of the chain (in one of the extra blocks chaining unlocks), not at the chain step itself - the guard does not catch it. Needs a debugger session to localise before landing. Co-authored-by: Claude <claude@anthropic.com>	2026-04-24 16:58:46 +03:00
naci	4c16fa972e	lifter: cleanup follow-ups to the PathSolver import hook (#186 ) Two small cleanups on top of #185. 1. PathSolver.resolveTargetBlock: return reusedBackedge=true for the synthetic import block. solvePath uses that flag to skip queuing the target for further lifting - which is what we want, because the import block's 'address' is an IAT slot VA or a hint/name VA and trying to decode the bytes at those addresses would repeat the OUTSD 'not implemented' error the hook exists to avoid. 2. lift_ret: delete the ret-to-IAT pre-check added in #184. The PathSolver hook already catches every target lift_ret's pre-check would have caught (and more, via solvePath's getConstraintVal route). Removing it deletes ~50 lines of duplicated logic and leaves the hook as the single source of truth. Functionally neutral on every measured configuration: - virt default env: 0/4 (unchanged) - virt MERGEN_GEN_MIN_REVISITS=16: 1/4 GetStdHandle (unchanged) - virt MERGEN_NO_LOOP_GEN=1: 1/4 GetStdHandle (unchanged) - non-virt example2.bin: 6 imports, 0 warn/err (unchanged) python test.py baseline passes. Determinism check passes. Co-authored-by: Claude <claude@anthropic.com>	2026-04-24 16:39:21 +03:00
naci	4f7fa49447	lifter: centralise import recognition in PathSolver.resolveTargetBlock (#185 ) When solvePath resolves its target to an entry in importMap (IAT slot VA or hint/name-alias VA, added in #184), materialise a leaf block containing 'call @<importName>(); unreachable' and return it as the branch target. One hook covers every solvePath caller (lift_jmp, lift_ret) instead of duplicating the check in each caller. Replaces the previous behaviour of following the resolved target into Kernel32 / .rdata - which triggered OUTSD 'not implemented' errors on example2-virt.bin under MERGEN_NO_LOOP_GEN=1 because the lifter tried to decode hint/name table bytes as code. Fires only when the lifter's solvePath actually reaches an IAT-backed target. At default env that doesn't happen on example2-virt.bin because loop-generalisation abstracts the VM dispatcher before any ret site is reached. With MERGEN_GEN_MIN_REVISITS=16 (env override) the first IAT gadget is reached and GetStdHandle surfaces as a named call in the IR; the test still fails 3/4 because the other imports' ret sites are gated by the same reachability ceiling. Raising the knob's DEFAULT to 16 was considered and reverted: it regresses the rewrite_smoke VM-loop samples (dummy_vm_loop, bytecode_vm_loop, stack_vm_loop) whose required patterns expect generalisation to fire on the first backedge. A shape-aware discriminator between a VM dispatcher and a simple loop is needed before the knob can safely be non-zero by default. Non-virt example2.bin: unchanged (61 insns, 6 imports, 0 warn/err). python test.py baseline: passes. Determinism check: passes. python test.py themida: still red 0/4 at default env; 1/4 under MERGEN_GEN_MIN_REVISITS=16. Co-authored-by: Claude <claude@anthropic.com>	2026-04-24 16:23:29 +03:00
naci	424b26a38d	lifter: alias hint/name VAs in importMap; stop lift_ret at import call (#184 ) Two small, additive fixes that together let the virt Themida sample's ret-to-IAT gadgets surface as named external calls. 1. LifterStages: alias hint/name entries in importMap. For each name import, also register the hint/name entry's runtime VA pointing at the same import name. The Win32 loader overwrites each IAT slot with the resolved function pointer at startup, but the lifter's static memory model returns the on-disk QWORD, which for name imports is the RVA of the hint/name entry. Without this alias, when an obfuscated dispatcher loads an IAT slot and uses the value as a call/ret target, lift_call/lift_ret see the hint/ name address (e.g. 0x140002628 for GetStdHandle on example2-virt) and fail to recognise it as an import. Including the alias lets the existing import-recognition paths fire on the lifter's pre- load value just as they do on the runtime value. 2. lift_ret: stop lifting after a ret-to-IAT match. The previous code popped one more qword (simulating the external's ret) and fed that to solvePath so exploration continued at the continuation address. On Themida, the 'continuation' is whatever the VM had staged below the IAT pointer - which is not generally resolvable from static memory and causes the lifter to try lifting at bogus addresses (access-violation crash observed on the virt sample). The named import call is in the IR regardless; if transitive control-flow matters, a dedicated 'imported-call- continuation' analysis with more state can recover it. Measurement on example2-virt.bin @ 0x140001000 with MERGEN_NO_LOOP_GEN=1: - before this commit: 0 imports surfaced, 1 error (OUTSD, from the lifter wandering into .rdata after the wrong-target resolution) - after this commit: 1 import (GetStdHandle) surfaced at the 0x14017fa77 ret, 0 errors, 0 warnings Non-virt example2.bin lift unchanged (61 insns, 6 imports, 0 warn/err). 
python test.py baseline passes; determinism check passes. Co-authored-by: Claude <claude@anthropic.com>	2026-04-24 15:56:39 +03:00
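The aliasing step from change 1 can be sketched as building one map with two keys per name import. This is a Python model with hypothetical names: the IAT slot VA used below is made up, and an image base of 0x140000000 is assumed from the addresses quoted in this log.

```python
def build_import_map(image_base: int, name_imports):
    """name_imports: iterable of (name, iat_slot_va, on_disk_qword).

    For a name import the on-disk IAT qword is the RVA of its hint/name entry,
    so image_base + on_disk_qword is the value a static memory model reads.
    """
    import_map = {}
    for name, slot_va, on_disk_qword in name_imports:
        import_map[slot_va] = name                       # existing entry: the IAT slot
        import_map[image_base + on_disk_qword] = name    # alias: hint/name entry VA
    return import_map
```

With the alias in place, a target of 0x140002628 resolves to GetStdHandle exactly as the IAT slot itself would.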
naci	7a564e3946	lifter: register-indirect import resolution + silent-failure diagnostics (#183 ) Reconstructed WIP: adds register-to-import provenance tracking so 'mov reg, [rip+iat]; call reg' resolves to a named external call, plus three new diagnostic paths that surface previously-silent failure modes. Changes ------- - lifter/core/LifterClass.hpp: registerImportSource map; warn on liftAddress reaching an unmapped target; warn on sealIncompleteBlocks synthesising ret undef. - lifter/core/LifterStages.hpp: synthesize '<dll>#<ordinal>' names for ordinal imports in the IAT walk (so e.g. VirtualizerSDK64.dll#103 surfaces as a named external instead of falling through to operand dispatch). - lifter/core/LiftDiagnostics.hpp: IncompleteBlockSealed = 504. - lifter/semantics/OperandUtils.ipp: SetRegisterValue erases the provenance binding (any write invalidates a stale import tag). - lifter/semantics/Semantics_ControlFlow.ipp: lift_mov tags the destination with its import-provenance when the source is an IAT slot; lift_call checks the provenance map for register-indirect calls and emits the named external; lift_call falls back to an opaque UnknownIndirect external call (with CallIndirectUnresolved warning) when the target is RIP-relative but has no importMap entry. Verified on example2.bin @ 0x140001000: 61 insns / 2 blocks / 0 err / 0 warn, 6 imports declared (CharUpperA, GetStdHandle, ReadConsoleA, VirtualizerSDK64.dll#103, VirtualizerSDK64.dll#503, WriteConsoleA), 5 register-indirect resolution diagnostics, 0 bad calls. NOTE: this content was reconstructed from patch files saved during an earlier session (lost on a 'git reset --hard'). Verified to produce the expected lift output byte-for-byte on example2.bin, but subtle differences from the pre-reset original are possible in code paths the sanity run does not exercise.	2026-04-24 15:17:22 +03:00
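The provenance tracking can be modelled as a map from register to import name that any write clears. A minimal Python sketch, with hypothetical class and method names and a made-up IAT slot address:

```python
class RegisterImportTracker:
    """Hypothetical model of the registerImportSource map."""

    def __init__(self, import_map):
        self.import_map = import_map           # IAT slot VA -> import name
        self.register_import_source = {}       # register -> import name

    def set_register_value(self, reg: str) -> None:
        # Any write invalidates a stale import tag.
        self.register_import_source.pop(reg, None)

    def lift_mov_from_memory(self, reg: str, src_addr: int) -> None:
        self.set_register_value(reg)
        name = self.import_map.get(src_addr)
        if name is not None:
            self.register_import_source[reg] = name   # tag: reg holds an IAT value

    def lift_call_register(self, reg: str):
        """Return the import name for 'call reg', or None if unresolved."""
        return self.register_import_source.get(reg)
```

The tag survives only until the next write to the register, so a call through a register that was reloaded from non-IAT memory falls back to the unresolved path.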
naci	c8102a69cf	themida: correctness gate, diagnostic tracer, ret-to-IAT recognition, gen revisit knob (#182 ) * tests: add Themida devirtualization import-equivalence check Adds python test.py themida that lifts every sample in scripts/rewrite/themida_samples.json and asserts the resulting IR calls every import declared in required_imports. Names are pinned against a lift of the non-virtualized reference binary via --update. This is a correctness gate that complements the existing coverage gate ('2544 instructions, 0 errors'). Currently red on example2-virt.bin: the lifter unrolls the VM without surfacing GetStdHandle / WriteConsoleA / ReadConsoleA / CharUpperA from the guest program. That gap is the active devirtualization frontier; this test makes it visible instead of silently green. Samples whose binaries are absent (`../testthemida/.bin` lives outside the repo) are skipped rather than failed, so the check runs cleanly in CI without the binaries present. diag: add Unicorn-based external-call tracer; document Themida transform Adds scripts/dev/trace_external_calls.py: loads a PE into Unicorn, patches every IAT slot with a unique unmapped-address sentinel, then emulates from the chosen entry. When any call/jmp/ret resolves its target to a sentinel, logs the call-site address, the mnemonic, and the addressing form. One-shot diagnostic for answering 'what x86 instruction issues this external call at runtime.' 
Using it on example2-virt.bin shows the Themida transform precisely: - guest imports (GetStdHandle etc.) remain in the IAT - every guest call site is rewritten from 'call [rip+IAT]' to a VM-staged 'push target; ret' where target was loaded from the IAT upstream - for example2, the first external call happens at VA 0x14017fa77 via 'ret 0', popping the GetStdHandle IAT value off the stack - Themida strips its own SDK markers (VirtualizerSDK64.dll#103/#503) from the IAT; our ignore_imports filter already accounts for this The lifter's current recognition handles direct call-through-IAT and register-indirect IAT calls (the non-virt binary resolves 5 imports cleanly). It does not recognize the ret-pops-IAT-loaded-pointer pattern, which is why the virt lift surfaces zero imports. Also annotates themida_samples.json with these properties inline so the transform semantics live next to the test that exercises them. * diag: trace_external_calls can dump visited PCs and record sentinel push chain Two additions, both motivated by the example2-virt.bin diagnosis session: - --dump-visited <path>: writes every unique instruction PC the emulator executes, in first-visit order. Diff against the lifter's 'reached addresses' trace (MERGEN_DIAG_LIFT_PROGRESS=1) to localise where the lifter's static exploration diverges from the dynamic path. - UC_HOOK_MEM_WRITE for stack-addressed 8-byte writes whose payload is a sentinel. Records every such write, not just the first, because Themida uses push-pop swap gadgets that stage a sentinel on the stack transiently before the 'real' push lands it at the ret-target slot. The last-5-pushes summary exposes this. 
Findings for example2-virt.bin @ 0x140001000: - lifter covers emu_pos=0..1298 out of 4210 unique PCs (~30%) - external call site is at emu_pos=4209; gap of 2911 unvisited PCs - lifter visits 5 addresses the runtime never takes (wrong concolic branch) - the 'final push to ret slot' is not a 'push [iat]' but rather 'sub qword ptr [r14], <const>': the VM decrypts a pre-staged stack slot in place to reconstruct the IAT pointer. Pattern-match recognition alone cannot handle this; concrete VM-dispatch unrolling is required. * diag: add MERGEN_NO_LOOP_GEN env gate for loop-generalization Adds an env-var toggle at the top of canGeneralizeStructuredLoopHeader. When MERGEN_NO_LOOP_GEN=1, the gate rejects every header, forcing pure concrete exploration with no phi-widening abstraction. Diagnostic knob, not a user-facing feature. Used to localise how much of a lift's coverage depends on generalization vs. the concolic engine. Measurement on example2-virt.bin @ 0x140001000:
                              gen ON    gen OFF (NO_LOOP_GEN=1)
  blocks_attempted            56        2642 (47x)
  instructions_lifted         2544      34229 (13.5x)
  output_no_opts.ll lines     6022      30481 (5x)
  unique addrs visited        34        338 (10x)
  addrs in 0x14017xxxx        0         103 (call-handler cluster)
  external call site reached: no        yes (via BB 0x14017fa72)
  themida equivalence test:   red       red (recognition still gap)
Loop-generalization is the dominant reachability blocker on Themida VM dispatchers at current tuning. Pure concrete exploration reaches the external-call handler block but does not emit named import calls because lift_ret has no path to match a resolved ret target against importMap. Recognition is the next fix surface; reachability is limited mostly by generalization tuning. 
Side-effects of gen OFF that are NOT acceptable in production: - Lifter decodes .rdata IAT bytes as instructions (OUTSD error at 0x140002688 on this sample) - Top-revisited addresses hit ~1142x each: the lifter spins in tight loops without generalization cutting them off; block budget (4096) would fire eventually on a larger sample So the knob is purely diagnostic. The real production fix is selective generalization (distinguish 'VM dispatcher' from 'guest loop') plus lift_ret import recognition. * lifter: recognize ret-to-IAT as named external call in lift_ret Adds a recognition path in lift_ret: if the value being popped resolves to a concrete address that's in importMap, emit callFunctionIR for the named import, then simulate the external's own ret by popping one more qword (the continuation address pre-staged by the caller). solvePath then continues at the continuation instead of trying to lift the IAT pointer as code. Two resolution routes: 1. realval is a ConstantInt (direct push+ret of an IAT load) 2. realval is symbolic but computePossibleValues folds to a single concrete value (obfuscated chains that constant-fold at this path) Scope limits: - Non-virt example2.bin lift is unchanged (still resolves 5 imports via register-indirect path; the new ret path does not fire because the binary uses 'call [iat]', not 'push+ret'). - Virt example2-virt.bin lift: the recognition code runs but does not surface imports because the lifter's static resolution of the arithmetic-decrypt chain produces wrong concrete targets. E.g. the ret at 0x14017fa77 resolves to 0x140002628 (somewhere in .rdata) via computePossibleValues; at runtime the emulator sees it pop the GetStdHandle IAT pointer (0x140002490). The recognition logic is correct; the upstream data flow is lying. Fixing that requires selective-generalization tuning or concrete VM unrolling, tracked separately. 
So β lands as groundwork for simpler push+ret thunks and for future work where state-propagation fidelity improves. It is not a Themida fix on its own. * lifter: gate canGeneralize on per-header revisit count Adds a revisit-count threshold to canGeneralizeStructuredLoopHeader: below threshold N the gate rejects (concrete exploration continues); at or above N it falls through to the existing loop-shape checks. Tunable via MERGEN_GEN_MIN_REVISITS; default is 0 (inert, matches pre-existing behaviour). Also promotes ++liftAttemptCounts[addr] out from under the liftProgressDiagEnabled gate so the counter is always maintained. Rationale: on Themida example2-virt.bin @ 0x140001000, the existing gate (always-generalize on first qualifying revisit) abstracts the VM's dispatch loop too early, cutting reachability to ~30% of the dynamic execution path. A higher threshold lets the dispatcher run concretely for more iterations before abstracting. Measurement (all other settings at defaults):
  T=0 (current)               blocks=   56  insns=  2544  err=0  warn=0
  T=4                         blocks=   88  insns=  3842  err=0  warn=4
  T=16                        blocks=  393  insns= 11747  err=1  warn=0
  T=32                        blocks=  425  insns= 12067  err=1  warn=0
  T=128                       blocks=  617  insns= 13987  err=1  warn=0
  MERGEN_NO_LOOP_GEN=1 (kill) blocks= 2642  insns= 34229  err=1  warn=0
Caveat: at T=6, T=8, T=12 the lifter crashes with an access violation partway through lifting. The crash fires in the Themida dispatcher state machinery around 0x1400237F9 when generalization fires mid-iteration with state that the existing machinery is not prepared to handle. Other nearby T values (T=5, 7, 9, 10, 11, 13-19) are stable. So the knob is landing as experimental infrastructure with default=0 (no-op). Future work can pair a safe non-zero default with a fix for the dispatcher-state crash. --------- Co-authored-by: Claude <claude@anthropic.com>	2026-04-24 14:54:22 +03:00
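The two gates described in this commit (the MERGEN_NO_LOOP_GEN kill switch and the MERGEN_GEN_MIN_REVISITS threshold) can be sketched as a small Python model. This is an illustration under assumptions, not the lifter's C++: the function name, the loop_shape_ok callback, and the choice to compare the threshold directly against the per-address lift-attempt counter are hypothetical. Only the environment variable names, the default of 0, and the "reject below N, fall through at or above N" rule come from the message.

```python
import os
from collections import Counter
from typing import Callable, Mapping


def can_generalize_header(
    addr: int,
    lift_attempt_counts: Mapping[int, int],
    env: Mapping[str, str] = os.environ,
    loop_shape_ok: Callable[[int], bool] = lambda _addr: True,
) -> bool:
    """Toy model of the generalization gate (names are hypothetical)."""
    # Diagnostic kill switch: reject every header, forcing concrete exploration.
    if env.get("MERGEN_NO_LOOP_GEN") == "1":
        return False

    # Revisit threshold; the default of 0 never rejects, so behaviour is unchanged.
    threshold = int(env.get("MERGEN_GEN_MIN_REVISITS", "0"))
    if lift_attempt_counts.get(addr, 0) < threshold:
        return False  # keep exploring this header concretely

    # At or above the threshold: defer to the existing loop-shape checks.
    return loop_shape_ok(addr)


# The counter is maintained unconditionally, mirroring the promoted
# ++liftAttemptCounts[addr] in the commit message.
counts = Counter()
decisions = []
for _ in range(5):
    counts[0x140001000] += 1
    decisions.append(
        can_generalize_header(0x140001000, counts, {"MERGEN_GEN_MIN_REVISITS": "4"})
    )
```

With a threshold of 4 in this model, the header is explored concretely on its first three visits and becomes eligible for generalization from the fourth onward.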
naci	f449ec3cb7	lifter: expand loop microtest coverage (+1 test, batch 57) (#181 ) Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-24 01:37:35 +03:00
naci	5c92f828ac	lifter: expand loop microtest coverage (+1 test, batch 56) (#180 ) Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-24 01:28:53 +03:00
naci	ac773a1f58	lifter: expand loop microtest coverage (+1 test, batch 55) (#179 ) Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-24 01:19:12 +03:00
naci	29c444cd25	lifter: expand loop microtest coverage (+1 test, batch 54) (#178 ) Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-24 01:09:27 +03:00
naci	3f41a02179	lifter: expand loop microtest coverage (+1 test, batch 53) (#177 ) Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-24 01:00:48 +03:00
naci	bdd53cc0d8	lifter: expand loop microtest coverage (+1 test, batch 52) (#176 ) Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-24 00:51:27 +03:00
naci	f359a7d238	lifter: expand loop microtest coverage (+1 test, batch 51) (#175 ) Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-24 00:41:43 +03:00
naci	dec38e1cc3	lifter: expand loop microtest coverage (+2 tests, batch 50) (#174 ) Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-24 00:29:19 +03:00
naci	db5c56ca14	lifter: expand loop microtest coverage (+1 test, batch 49) (#173 ) Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-24 00:18:32 +03:00
naci	fd02d3bc6f	lifter: expand loop microtest coverage (+1 test, batch 48) (#172 ) Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-24 00:07:47 +03:00
naci	7275d9dc5a	lifter: expand loop microtest coverage (+2 tests, batch 47) (#171 ) Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-23 23:55:42 +03:00
naci	0d8ca37af2	lifter: expand loop microtest coverage (+1 test, batch 46) (#170 ) * lifter: expand loop microtest coverage (+4 net tests, batch 11) Additive coverage only. Final batch for this session. Adds four net tests: - pending_generalized_loop_indirect_jump_allowed_when_unresolved pins the current pending-path reuse behavior for unresolved IndirectJump - generalized_loop_backup_canonical_only_path_leaves_flag_phis_empty canonical-only fallback leaves generalizedLoopFlagPhis empty - make_generalized_loop_backup_preserves_concrete_rdi_on_first_backedge completes preserved-register coverage for RDI (index 7) - generalized_loop_control_slot_byte_count_one_returns_masked_phi narrow-width control_slot path (byteCount 1) - generalized_loop_target_slot_byte_count_one_returns_masked_phi narrow-width target_slot path (byteCount 1) One attempted trampoline-relaxation accept test was removed before commit: the acceptance condition is real in code, but constructing a stable public-API scenario that trips it without entangling blockCanReach and unfinished CFG artifacts proved brittle. Not worth landing a flaky test. Verified: - python test.py micro: all 153 microtests pass (was 149) - python test.py baseline: all rewrite regression checks passed, determinism check passed (42 golden files match) - Themida reference sample unchanged (2544/0/0) Loop-related microtest count: 100 -> 104 per the /loop\|backedge\|generalized\|rolled\|themida\|phi_address/i regex. Session cumulative total: 36 baseline -> 104 current (+68). * lifter: extend batch 39 with nested local-value limitation * lifter: extend batch 42 with local-value helper scoping limitation * lifter: expand loop microtest coverage (+1 test, batch 46) Additive coverage only. Test added: - generalized_loop_control_field_uses_active_state_from_unrelated_block This is a helper-scoping known-limitation test. 
Even though retrieve_generalized_loop_control_field_value_impl() checks the current insertion block against the active header, the end-to-end GetMemoryValue path still produces a generalized control-field phi from an unrelated block. Verified: - bash autoresearch.sh: loop_test_count=164, microtest_pass_count=212 - bash autoresearch.checks.sh with CLANG_CL_EXE='C:/Program Files/LLVM/bin/clang-cl.exe': baseline + determinism OK Loop-related microtest count: 163 -> 164. --------- Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-23 23:40:29 +03:00
naci	389e23d54c	lifter: expand loop microtest coverage (+2 tests, batch 45) (#169 ) * lifter: expand loop microtest coverage (+4 net tests, batch 11) Additive coverage only. Final batch for this session. Adds four net tests: - pending_generalized_loop_indirect_jump_allowed_when_unresolved pins the current pending-path reuse behavior for unresolved IndirectJump - generalized_loop_backup_canonical_only_path_leaves_flag_phis_empty canonical-only fallback leaves generalizedLoopFlagPhis empty - make_generalized_loop_backup_preserves_concrete_rdi_on_first_backedge completes preserved-register coverage for RDI (index 7) - generalized_loop_control_slot_byte_count_one_returns_masked_phi narrow-width control_slot path (byteCount 1) - generalized_loop_target_slot_byte_count_one_returns_masked_phi narrow-width target_slot path (byteCount 1) One attempted trampoline-relaxation accept test was removed before commit: the acceptance condition is real in code, but constructing a stable public-API scenario that trips it without entangling blockCanReach and unfinished CFG artifacts proved brittle. Not worth landing a flaky test. Verified: - python test.py micro: all 153 microtests pass (was 149) - python test.py baseline: all rewrite regression checks passed, determinism check passed (42 golden files match) - Themida reference sample unchanged (2544/0/0) Loop-related microtest count: 100 -> 104 per the /loop\|backedge\|generalized\|rolled\|themida\|phi_address/i regex. Session cumulative total: 36 baseline -> 104 current (+68). * lifter: extend batch 39 with nested local-value limitation * lifter: extend batch 42 with local-value helper scoping limitation * lifter: expand loop microtest coverage (+2 tests, batch 45) Additive coverage only. Tests added: - generalized_phi_address_uses_state_from_unrelated_block - generalized_local_phi_address_uses_state_from_unrelated_block These are helper-scoping known-limitation tests. 
The current phi_address and local_phi_address helpers key off the PHI's parent header via getGeneralizedLoopStateForHeader(phi->getParent()) and do not validate the current insertion block. A header-owned PHI therefore still resolves through generalized-loop state even when queried from an unrelated block. Verified: - bash autoresearch.sh: loop_test_count=163, microtest_pass_count=211 - bash autoresearch.checks.sh with CLANG_CL_EXE='C:/Program Files/LLVM/bin/clang-cl.exe': baseline + determinism OK Loop-related microtest count: 161 -> 163. --------- Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-23 20:38:38 +03:00
naci	93f216d6d3	lifter: expand loop microtest coverage (+2 tests, batch 44) (#168 ) * lifter: expand loop microtest coverage (+4 net tests, batch 11) Additive coverage only. Final batch for this session. Adds four net tests: - pending_generalized_loop_indirect_jump_allowed_when_unresolved pins the current pending-path reuse behavior for unresolved IndirectJump - generalized_loop_backup_canonical_only_path_leaves_flag_phis_empty canonical-only fallback leaves generalizedLoopFlagPhis empty - make_generalized_loop_backup_preserves_concrete_rdi_on_first_backedge completes preserved-register coverage for RDI (index 7) - generalized_loop_control_slot_byte_count_one_returns_masked_phi narrow-width control_slot path (byteCount 1) - generalized_loop_target_slot_byte_count_one_returns_masked_phi narrow-width target_slot path (byteCount 1) One attempted trampoline-relaxation accept test was removed before commit: the acceptance condition is real in code, but constructing a stable public-API scenario that trips it without entangling blockCanReach and unfinished CFG artifacts proved brittle. Not worth landing a flaky test. Verified: - python test.py micro: all 153 microtests pass (was 149) - python test.py baseline: all rewrite regression checks passed, determinism check passed (42 golden files match) - Themida reference sample unchanged (2544/0/0) Loop-related microtest count: 100 -> 104 per the /loop\|backedge\|generalized\|rolled\|themida\|phi_address/i regex. Session cumulative total: 36 baseline -> 104 current (+68). * lifter: extend batch 39 with nested local-value limitation * lifter: extend batch 42 with local-value helper scoping limitation --------- Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-23 20:26:42 +03:00
naci	eaf8fff447	lifter: expand loop microtest coverage (+1 test, batch 43) (#167 ) * lifter: expand loop microtest coverage (+4 net tests, batch 11) Additive coverage only. Final batch for this session. Adds four net tests: - pending_generalized_loop_indirect_jump_allowed_when_unresolved pins the current pending-path reuse behavior for unresolved IndirectJump - generalized_loop_backup_canonical_only_path_leaves_flag_phis_empty canonical-only fallback leaves generalizedLoopFlagPhis empty - make_generalized_loop_backup_preserves_concrete_rdi_on_first_backedge completes preserved-register coverage for RDI (index 7) - generalized_loop_control_slot_byte_count_one_returns_masked_phi narrow-width control_slot path (byteCount 1) - generalized_loop_target_slot_byte_count_one_returns_masked_phi narrow-width target_slot path (byteCount 1) One attempted trampoline-relaxation accept test was removed before commit: the acceptance condition is real in code, but constructing a stable public-API scenario that trips it without entangling blockCanReach and unfinished CFG artifacts proved brittle. Not worth landing a flaky test. Verified: - python test.py micro: all 153 microtests pass (was 149) - python test.py baseline: all rewrite regression checks passed, determinism check passed (42 golden files match) - Themida reference sample unchanged (2544/0/0) Loop-related microtest count: 100 -> 104 per the /loop\|backedge\|generalized\|rolled\|themida\|phi_address/i regex. Session cumulative total: 36 baseline -> 104 current (+68). * lifter: extend batch 39 with nested local-value limitation * lifter: extend batch 42 with local-value helper scoping limitation --------- Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-23 20:11:47 +03:00
naci	bdc96a3735	lifter: expand loop microtest coverage (+2 tests, batch 42) (#166 ) * lifter: expand loop microtest coverage (+4 net tests, batch 11) Additive coverage only. Final batch for this session. Adds four net tests: - pending_generalized_loop_indirect_jump_allowed_when_unresolved pins the current pending-path reuse behavior for unresolved IndirectJump - generalized_loop_backup_canonical_only_path_leaves_flag_phis_empty canonical-only fallback leaves generalizedLoopFlagPhis empty - make_generalized_loop_backup_preserves_concrete_rdi_on_first_backedge completes preserved-register coverage for RDI (index 7) - generalized_loop_control_slot_byte_count_one_returns_masked_phi narrow-width control_slot path (byteCount 1) - generalized_loop_target_slot_byte_count_one_returns_masked_phi narrow-width target_slot path (byteCount 1) One attempted trampoline-relaxation accept test was removed before commit: the acceptance condition is real in code, but constructing a stable public-API scenario that trips it without entangling blockCanReach and unfinished CFG artifacts proved brittle. Not worth landing a flaky test. Verified: - python test.py micro: all 153 microtests pass (was 149) - python test.py baseline: all rewrite regression checks passed, determinism check passed (42 golden files match) - Themida reference sample unchanged (2544/0/0) Loop-related microtest count: 100 -> 104 per the /loop\|backedge\|generalized\|rolled\|themida\|phi_address/i regex. Session cumulative total: 36 baseline -> 104 current (+68). * lifter: extend batch 39 with nested local-value limitation * lifter: expand loop microtest coverage (+2 tests, batch 42) Additive coverage only. 
Adds helper-scoping known-limitation tests: - generalized_loop_control_slot_uses_active_state_from_unrelated_block - generalized_loop_target_slot_uses_active_state_from_unrelated_block Current control_slot / target_slot helpers consult only the scalar active state and do not validate that the current insertion block is the active header. Reads from unrelated blocks therefore still return generalized loop phis instead of falling through. Verified: - bash autoresearch.sh: loop_test_count=160, microtest_pass_count=209 - bash autoresearch.checks.sh with CLANG_CL_EXE='C:/Program Files/LLVM/bin/clang-cl.exe': baseline + determinism OK Loop-related microtest count: 158 -> 160. --------- Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-23 19:50:43 +03:00
naci	73ac774341	lifter: expand loop microtest coverage (+1 test, batch 41) (#165 ) * lifter: expand loop microtest coverage (+4 net tests, batch 11) Additive coverage only. Final batch for this session. Adds four net tests: - pending_generalized_loop_indirect_jump_allowed_when_unresolved pins the current pending-path reuse behavior for unresolved IndirectJump - generalized_loop_backup_canonical_only_path_leaves_flag_phis_empty canonical-only fallback leaves generalizedLoopFlagPhis empty - make_generalized_loop_backup_preserves_concrete_rdi_on_first_backedge completes preserved-register coverage for RDI (index 7) - generalized_loop_control_slot_byte_count_one_returns_masked_phi narrow-width control_slot path (byteCount 1) - generalized_loop_target_slot_byte_count_one_returns_masked_phi narrow-width target_slot path (byteCount 1) One attempted trampoline-relaxation accept test was removed before commit: the acceptance condition is real in code, but constructing a stable public-API scenario that trips it without entangling blockCanReach and unfinished CFG artifacts proved brittle. Not worth landing a flaky test. Verified: - python test.py micro: all 153 microtests pass (was 149) - python test.py baseline: all rewrite regression checks passed, determinism check passed (42 golden files match) - Themida reference sample unchanged (2544/0/0) Loop-related microtest count: 100 -> 104 per the /loop\|backedge\|generalized\|rolled\|themida\|phi_address/i regex. Session cumulative total: 36 baseline -> 104 current (+68). * lifter: extend batch 39 with nested local-value limitation * lifter: expand loop microtest coverage (+1 test, batch 41) Additive coverage only. Adds a fresh known-limitation test for control_field matching: - generalized_loop_control_field_ignores_base_candidate Current control_field matching validates only that the address is ; it does not verify that is the actual loop control cursor. 
As a result, a fake non-control base still routes through the active generalized-loop control-field state and produces a phi. Verified: - bash autoresearch.sh: loop_test_count=158, microtest_pass_count=207 - bash autoresearch.checks.sh with CLANG_CL_EXE='C:/Program Files/LLVM/bin/clang-cl.exe': baseline + determinism OK Loop-related microtest count: 157 -> 158. --------- Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-23 19:35:00 +03:00
naci	a38fc06270	lifter: expand loop microtest coverage (+1 test, batch 40) (#164 ) * lifter: expand loop microtest coverage (+4 net tests, batch 11) Additive coverage only. Final batch for this session. Adds four net tests: - pending_generalized_loop_indirect_jump_allowed_when_unresolved pins the current pending-path reuse behavior for unresolved IndirectJump - generalized_loop_backup_canonical_only_path_leaves_flag_phis_empty canonical-only fallback leaves generalizedLoopFlagPhis empty - make_generalized_loop_backup_preserves_concrete_rdi_on_first_backedge completes preserved-register coverage for RDI (index 7) - generalized_loop_control_slot_byte_count_one_returns_masked_phi narrow-width control_slot path (byteCount 1) - generalized_loop_target_slot_byte_count_one_returns_masked_phi narrow-width target_slot path (byteCount 1) One attempted trampoline-relaxation accept test was removed before commit: the acceptance condition is real in code, but constructing a stable public-API scenario that trips it without entangling blockCanReach and unfinished CFG artifacts proved brittle. Not worth landing a flaky test. Verified: - python test.py micro: all 153 microtests pass (was 149) - python test.py baseline: all rewrite regression checks passed, determinism check passed (42 golden files match) - Themida reference sample unchanged (2544/0/0) Loop-related microtest count: 100 -> 104 per the /loop\|backedge\|generalized\|rolled\|themida\|phi_address/i regex. Session cumulative total: 36 baseline -> 104 current (+68). * lifter: extend batch 39 with nested local-value limitation --------- Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-23 19:14:19 +03:00
naci	17a205b66a	lifter: expand loop microtest coverage (+1 test, batch 39) (#163 ) * lifter: expand loop microtest coverage (+4 net tests, batch 11) Additive coverage only. Final batch for this session. Adds four net tests: - pending_generalized_loop_indirect_jump_allowed_when_unresolved pins the current pending-path reuse behavior for unresolved IndirectJump - generalized_loop_backup_canonical_only_path_leaves_flag_phis_empty canonical-only fallback leaves generalizedLoopFlagPhis empty - make_generalized_loop_backup_preserves_concrete_rdi_on_first_backedge completes preserved-register coverage for RDI (index 7) - generalized_loop_control_slot_byte_count_one_returns_masked_phi narrow-width control_slot path (byteCount 1) - generalized_loop_target_slot_byte_count_one_returns_masked_phi narrow-width target_slot path (byteCount 1) One attempted trampoline-relaxation accept test was removed before commit: the acceptance condition is real in code, but constructing a stable public-API scenario that trips it without entangling blockCanReach and unfinished CFG artifacts proved brittle. Not worth landing a flaky test. Verified: - python test.py micro: all 153 microtests pass (was 149) - python test.py baseline: all rewrite regression checks passed, determinism check passed (42 golden files match) - Themida reference sample unchanged (2544/0/0) Loop-related microtest count: 100 -> 104 per the /loop\|backedge\|generalized\|rolled\|themida\|phi_address/i regex. Session cumulative total: 36 baseline -> 104 current (+68). * lifter: expand loop microtest coverage (+1 test, batch 39) Additive coverage only. Adds a helper-specific nested-loop known-limitation test: - generalized_loop_nested_inner_target_slot_uses_inner_state This documents that retrieve_generalized_loop_target_slot_value_impl() reads only the scalar activeGeneralizedLoopControlFieldState. 
After an inner load_generalized_backup overwrites that scalar, a target_slot read at the outer header resolves using inner carried-slot values. Verified: - bash autoresearch.sh: loop_test_count=155, microtest_pass_count=204 - bash autoresearch.checks.sh with CLANG_CL_EXE='C:/Program Files/LLVM/bin/clang-cl.exe': baseline + determinism OK Loop-related microtest count: 154 -> 155. * lifter: extend batch 39 with nested control-slot limitation --------- Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-23 18:52:18 +03:00
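The nested-loop limitation documented by this test can be illustrated with a toy Python model. The class and method names below are hypothetical, and the header-keyed lookup is shown only as a contrast; it is not presented as the project's fix. The single point taken from the message is that a scalar "active" state is overwritten by the inner loop's load, so a later read at the outer header sees inner values.

```python
from typing import Dict, List, Optional


class LoopStateModel:
    """Toy model of a scalar active loop state (all names hypothetical)."""

    def __init__(self) -> None:
        # Single active slot, standing in for the scalar
        # activeGeneralizedLoopControlFieldState.
        self.active: Optional[Dict[str, object]] = None
        # Hypothetical per-header record, kept here only for comparison.
        self.per_header: Dict[str, List[int]] = {}

    def load_generalized_backup(self, header: str, target_values: List[int]) -> None:
        # Loading any loop's backup replaces the scalar active state.
        self.per_header[header] = list(target_values)
        self.active = {"header": header, "target_values": list(target_values)}

    def target_slot_scalar(self, current_header: str) -> List[int]:
        # Reads only the scalar state; current_header is not consulted.
        assert self.active is not None
        return list(self.active["target_values"])  # type: ignore[arg-type]

    def target_slot_keyed(self, current_header: str) -> List[int]:
        # Contrast: a lookup keyed by header returns that header's own values.
        return list(self.per_header[current_header])


model = LoopStateModel()
model.load_generalized_backup("outer", [0x10, 0x20])
model.load_generalized_backup("inner", [0xAA, 0xBB])  # overwrites the scalar

outer_via_scalar = model.target_slot_scalar("outer")  # inner values leak through
outer_via_keyed = model.target_slot_keyed("outer")    # outer values preserved
```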
naci	3ad880c3e7	lifter: expand loop microtest coverage (+1 test, batch 38) (#162 ) * lifter: expand loop microtest coverage (+4 net tests, batch 11) Additive coverage only. Final batch for this session. Adds four net tests: - pending_generalized_loop_indirect_jump_allowed_when_unresolved pins the current pending-path reuse behavior for unresolved IndirectJump - generalized_loop_backup_canonical_only_path_leaves_flag_phis_empty canonical-only fallback leaves generalizedLoopFlagPhis empty - make_generalized_loop_backup_preserves_concrete_rdi_on_first_backedge completes preserved-register coverage for RDI (index 7) - generalized_loop_control_slot_byte_count_one_returns_masked_phi narrow-width control_slot path (byteCount 1) - generalized_loop_target_slot_byte_count_one_returns_masked_phi narrow-width target_slot path (byteCount 1) One attempted trampoline-relaxation accept test was removed before commit: the acceptance condition is real in code, but constructing a stable public-API scenario that trips it without entangling blockCanReach and unfinished CFG artifacts proved brittle. Not worth landing a flaky test. Verified: - python test.py micro: all 153 microtests pass (was 149) - python test.py baseline: all rewrite regression checks passed, determinism check passed (42 golden files match) - Themida reference sample unchanged (2544/0/0) Loop-related microtest count: 100 -> 104 per the /loop\|backedge\|generalized\|rolled\|themida\|phi_address/i regex. Session cumulative total: 36 baseline -> 104 current (+68). * lifter: expand loop microtest coverage (+1 test, batch 38) Additive coverage only. Adds a fresh known-limitation test for generalized-loop state lookup: - generalized_loop_state_getter_returns_invalid_archived_entry Current implementation of getMostRecentGeneralizedLoopState() returns the first archived entry whenever the archive map is non-empty, without checking . 
This test documents that behavior explicitly so the bug has a direct loop-focused repro until the getter is fixed. Verified: - bash autoresearch.sh: loop_test_count=154, microtest_pass_count=203 - bash autoresearch.checks.sh with CLANG_CL_EXE='C:/Program Files/LLVM/bin/clang-cl.exe': baseline + determinism OK Loop-related microtest count: 153 -> 154. --------- Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-23 18:33:15 +03:00
naci	efda71853d	lifter: expand loop microtest coverage (+1 test, batch 37) (#161 ) * lifter: expand loop microtest coverage (+4 net tests, batch 11) Additive coverage only. Final batch for this session. Adds four net tests: - pending_generalized_loop_indirect_jump_allowed_when_unresolved pins the current pending-path reuse behavior for unresolved IndirectJump - generalized_loop_backup_canonical_only_path_leaves_flag_phis_empty canonical-only fallback leaves generalizedLoopFlagPhis empty - make_generalized_loop_backup_preserves_concrete_rdi_on_first_backedge completes preserved-register coverage for RDI (index 7) - generalized_loop_control_slot_byte_count_one_returns_masked_phi narrow-width control_slot path (byteCount 1) - generalized_loop_target_slot_byte_count_one_returns_masked_phi narrow-width target_slot path (byteCount 1) One attempted trampoline-relaxation accept test was removed before commit: the acceptance condition is real in code, but constructing a stable public-API scenario that trips it without entangling blockCanReach and unfinished CFG artifacts proved brittle. Not worth landing a flaky test. Verified: - python test.py micro: all 153 microtests pass (was 149) - python test.py baseline: all rewrite regression checks passed, determinism check passed (42 golden files match) - Themida reference sample unchanged (2544/0/0) Loop-related microtest count: 100 -> 104 per the /loop\|backedge\|generalized\|rolled\|themida\|phi_address/i regex. Session cumulative total: 36 baseline -> 104 current (+68). * lifter: expand loop microtest coverage (+1 test, batch 37) Additive coverage only. Adds the 4-way value-enumeration counterpart to the existing generalized phi-load tests: - generalized_phi_address_compute_possible_values_four_way This extends computePossibleValues coverage from 2-way and 3-way to canonical + 3 backedges on generalized phi-address loads. 
Verified: - bash autoresearch.sh: loop_test_count=153, microtest_pass_count=202 - bash autoresearch.checks.sh with CLANG_CL_EXE='C:/Program Files/LLVM/bin/clang-cl.exe': baseline + determinism OK Loop-related microtest count: 152 -> 153. --------- Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-23 18:17:31 +03:00
naci	88a190b0ea	lifter: expand loop microtest coverage (+2 tests, batch 36) (#160)
* lifter: expand loop microtest coverage (+4 net tests, batch 11)
  Additive coverage only. Final batch for this session. Adds four net tests:
  - pending_generalized_loop_indirect_jump_allowed_when_unresolved: pins the current pending-path reuse behavior for unresolved IndirectJump
  - generalized_loop_backup_canonical_only_path_leaves_flag_phis_empty: canonical-only fallback leaves generalizedLoopFlagPhis empty
  - make_generalized_loop_backup_preserves_concrete_rdi_on_first_backedge: completes preserved-register coverage for RDI (index 7)
  - generalized_loop_control_slot_byte_count_one_returns_masked_phi: narrow-width control_slot path (byteCount 1)
  - generalized_loop_target_slot_byte_count_one_returns_masked_phi: narrow-width target_slot path (byteCount 1)
  One attempted trampoline-relaxation accept test was removed before commit: the acceptance condition is real in code, but constructing a stable public-API scenario that trips it without entangling blockCanReach and unfinished CFG artifacts proved brittle. Not worth landing a flaky test.
  Verified:
  - python test.py micro: all 153 microtests pass (was 149)
  - python test.py baseline: all rewrite regression checks passed, determinism check passed (42 golden files match)
  - Themida reference sample unchanged (2544/0/0)
  Loop-related microtest count: 100 -> 104 per the /loop\|backedge\|generalized\|rolled\|themida\|phi_address/i regex. Session cumulative total: 36 baseline -> 104 current (+68).
* lifter: expand loop microtest coverage (+2 tests, batch 36)
  Additive coverage only. Adds 4-way (canonical + 3 backedges) multi-way coverage for the two phi-address helpers:
  - generalized_phi_address_four_way_resolves_all_incomings
  - generalized_local_phi_address_four_way_resolves_all_incomings
  This mirrors the 4-way coverage now present on control_slot, target_slot, and control_field_load; all five retrieve helpers now have explicit 4-way multi-backedge tests.
  Verified:
  - bash autoresearch.sh: loop_test_count=151, microtest_pass_count=200
  - bash autoresearch.checks.sh with CLANG_CL_EXE='C:/Program Files/LLVM/bin/clang-cl.exe': baseline + determinism OK
  Loop-related microtest count: 149 -> 151.
* ci: trigger batch 36 workflows
* lifter: extend batch 36 with 3-way generalized phi load values
---------
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-23 17:58:12 +03:00
naci	9385a0f1f8	lifter: expand loop microtest coverage (+2 tests, batch 35) (#159)
* lifter: expand loop microtest coverage (+4 net tests, batch 11)
  Additive coverage only. Final batch for this session. Adds four net tests:
  - pending_generalized_loop_indirect_jump_allowed_when_unresolved: pins the current pending-path reuse behavior for unresolved IndirectJump
  - generalized_loop_backup_canonical_only_path_leaves_flag_phis_empty: canonical-only fallback leaves generalizedLoopFlagPhis empty
  - make_generalized_loop_backup_preserves_concrete_rdi_on_first_backedge: completes preserved-register coverage for RDI (index 7)
  - generalized_loop_control_slot_byte_count_one_returns_masked_phi: narrow-width control_slot path (byteCount 1)
  - generalized_loop_target_slot_byte_count_one_returns_masked_phi: narrow-width target_slot path (byteCount 1)
  One attempted trampoline-relaxation accept test was removed before commit: the acceptance condition is real in code, but constructing a stable public-API scenario that trips it without entangling blockCanReach and unfinished CFG artifacts proved brittle. Not worth landing a flaky test.
  Verified:
  - python test.py micro: all 153 microtests pass (was 149)
  - python test.py baseline: all rewrite regression checks passed, determinism check passed (42 golden files match)
  - Themida reference sample unchanged (2544/0/0)
  Loop-related microtest count: 100 -> 104 per the /loop\|backedge\|generalized\|rolled\|themida\|phi_address/i regex. Session cumulative total: 36 baseline -> 104 current (+68).
* lifter: expand loop microtest coverage (+2 tests, batch 35)
  Additive coverage only. Adds 4-way (canonical + 3 backedges) phi coverage for target_slot and control_field_load helpers:
  - generalized_loop_target_slot_four_way_produces_phi
  - generalized_loop_control_field_load_four_way_produces_phi
  Parallel of the existing 4-way control_slot test; closes the N-way matrix so all three slot/field helpers have 3-way and 4-way coverage alongside 2-way.
  Verified:
  - python test.py micro: all pass
  - python test.py baseline: rewrite regression + determinism pass
  Loop-related microtest count: 147 -> 149.
---------
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-23 17:29:06 +03:00
naci	43f008ffdf	lifter: expand loop microtest coverage (+2 tests, batch 34) (#158)
* lifter: expand loop microtest coverage (+4 net tests, batch 11)
  Additive coverage only. Final batch for this session. Adds four net tests:
  - pending_generalized_loop_indirect_jump_allowed_when_unresolved: pins the current pending-path reuse behavior for unresolved IndirectJump
  - generalized_loop_backup_canonical_only_path_leaves_flag_phis_empty: canonical-only fallback leaves generalizedLoopFlagPhis empty
  - make_generalized_loop_backup_preserves_concrete_rdi_on_first_backedge: completes preserved-register coverage for RDI (index 7)
  - generalized_loop_control_slot_byte_count_one_returns_masked_phi: narrow-width control_slot path (byteCount 1)
  - generalized_loop_target_slot_byte_count_one_returns_masked_phi: narrow-width target_slot path (byteCount 1)
  One attempted trampoline-relaxation accept test was removed before commit: the acceptance condition is real in code, but constructing a stable public-API scenario that trips it without entangling blockCanReach and unfinished CFG artifacts proved brittle. Not worth landing a flaky test.
  Verified:
  - python test.py micro: all 153 microtests pass (was 149)
  - python test.py baseline: all rewrite regression checks passed, determinism check passed (42 golden files match)
  - Themida reference sample unchanged (2544/0/0)
  Loop-related microtest count: 100 -> 104 per the /loop\|backedge\|generalized\|rolled\|themida\|phi_address/i regex. Session cumulative total: 36 baseline -> 104 current (+68).
* lifter: expand loop microtest coverage (+2 tests, batch 34)
  Additive coverage only. Adds the first N-way (>2-way) coverage for target_slot and control_field_load helpers:
  - generalized_loop_target_slot_three_way_produces_phi
  - generalized_loop_control_field_load_three_way_produces_phi
  Previously only control_slot had multi-way coverage (4-way) and phi_address/local_phi_address had 3-way; these close the gap for the remaining two helpers.
  Verified:
  - python test.py micro: all pass
  - python test.py baseline: rewrite regression + determinism pass
  Loop-related microtest count: 145 -> 147.
---------
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-23 17:15:07 +03:00
naci	49238a3290	lifter: expand loop microtest coverage (+4 tests, batch 33) (#157)
* lifter: expand loop microtest coverage (+4 net tests, batch 11)
  Additive coverage only. Final batch for this session. Adds four net tests:
  - pending_generalized_loop_indirect_jump_allowed_when_unresolved: pins the current pending-path reuse behavior for unresolved IndirectJump
  - generalized_loop_backup_canonical_only_path_leaves_flag_phis_empty: canonical-only fallback leaves generalizedLoopFlagPhis empty
  - make_generalized_loop_backup_preserves_concrete_rdi_on_first_backedge: completes preserved-register coverage for RDI (index 7)
  - generalized_loop_control_slot_byte_count_one_returns_masked_phi: narrow-width control_slot path (byteCount 1)
  - generalized_loop_target_slot_byte_count_one_returns_masked_phi: narrow-width target_slot path (byteCount 1)
  One attempted trampoline-relaxation accept test was removed before commit: the acceptance condition is real in code, but constructing a stable public-API scenario that trips it without entangling blockCanReach and unfinished CFG artifacts proved brittle. Not worth landing a flaky test.
  Verified:
  - python test.py micro: all 153 microtests pass (was 149)
  - python test.py baseline: all rewrite regression checks passed, determinism check passed (42 golden files match)
  - Themida reference sample unchanged (2544/0/0)
  Loop-related microtest count: 100 -> 104 per the /loop\|backedge\|generalized\|rolled\|themida\|phi_address/i regex. Session cumulative total: 36 baseline -> 104 current (+68).
* lifter: expand loop microtest coverage (+4 tests, batch 33)
  Additive coverage only. Closes the byte-width collapse matrix for the remaining two helpers:
  - generalized_phi_address_byte_count_one_collapses_when_values_match
  - generalized_phi_address_byte_count_two_collapses_when_values_match
  - generalized_local_phi_address_byte_count_one_collapses_when_values_match
  - generalized_local_phi_address_byte_count_two_collapses_when_values_match
  All five retrieve helpers (control_slot, target_slot, control_field_load, phi_address, local_phi_address) now have parallel byte_count_one/two shared-value collapse coverage alongside the default/qword-width tests.
  Verified:
  - python test.py micro: all pass
  - python test.py baseline: rewrite regression + determinism pass
  Loop-related microtest count: 141 -> 145.
---------
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-23 17:05:27 +03:00
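
The "Loop-related microtest count" figures in the commit messages above are defined by a case-insensitive name regex. The sketch below shows one way such a count could be reproduced over a list of test names. It is an illustration only: the commit messages do not show how test.py or autoresearch.sh actually computes the number, the function name count_loop_tests is hypothetical, and the grep-style `\|` alternation is rewritten as Python's `|`.

```python
import re

# Pattern quoted in the commit messages, with grep-style `\|` alternation
# rewritten as Python's `|`; re.IGNORECASE stands in for the trailing /i.
LOOP_TEST_PATTERN = re.compile(
    r"loop|backedge|generalized|rolled|themida|phi_address", re.IGNORECASE
)


def count_loop_tests(test_names):
    """Return how many test names match the loop-related pattern."""
    return sum(1 for name in test_names if LOOP_TEST_PATTERN.search(name))


# The first three names appear in the commit messages above; the last one is
# a made-up non-loop name included only to show a miss.
sample_names = [
    "generalized_phi_address_compute_possible_values_four_way",
    "make_generalized_loop_backup_preserves_concrete_rdi_on_first_backedge",
    "pending_generalized_loop_indirect_jump_allowed_when_unresolved",
    "hypothetical_flag_folding_smoke_test",
]

print(count_loop_tests(sample_names))  # prints 3
```

Matching on names alone means a renamed test can move the count without any change in coverage, which may be why the messages report the total microtest pass count alongside it.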

883 Commits