mirror of
https://github.com/NaC-L/Mergen.git
synced 2026-05-12 09:40:34 +00:00
main
883 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
71fc60766d |
Loop generalization, BSR/BSF intrinsics, and stack alloca split (#207)
* lifter: move themida control/target slots to per-state fields (phase A)
Lift the kThemidaControlCursorSlot/kThemidaLoopCarriedSlot constants out of
the helper bodies and into per-loop GeneralizedLoopControlFieldState fields
controlSlot/targetSlot. The populator seeds them to the legacy Themida
defaults, so behavior is unchanged on the reference Themida sample and on
every existing test. Phase B will replace the populator's literal seed with
active discovery against canonical/backedge buffers, enabling per-binary
slot identification.
Sites updated:
- GeneralizedLoopControlFieldState: new controlSlot/targetSlot fields,
reset in clearGeneralizedLoopControlFieldState
- load_generalized_backup_impl: introduces controlSlot/targetSlot locals,
uses them for canonical+backedge reads, seeds them into the activated state
- matchGeneralizedLoopControlFieldAddress: gates the GEP-base check on
activeGeneralizedLoopControlFieldState.controlSlot
- retrieve_generalized_loop_control_slot_value_impl: gates on state.controlSlot
- retrieve_generalized_loop_target_slot_value_impl: gates on state.targetSlot
- record_generalized_loop_backedge_impl: reads current control via
stateIt->second.controlSlot in both rotate and append-or-update paths
Tests that build state directly (bypassing the populator) are updated to
seed the new fields where they call retrieve helpers.
* lifter: discover themida control/target slots from canonical+backedge buffers (phase B)
Replaces the populator's hardcoded reads at kThemidaControlCursorSlot /
kThemidaLoopCarriedSlot with active per-loop discovery against the canonical
backup buffer and the generalized-loop backedge buffers. The discovery is
implemented as two helpers next to load_generalized_backup_impl:
- tryPopulateControlFromSlot(canonical, backedges, slot, dst): probes a
specific candidate slot and, on success, fills dst with the canonical /
backedge controls and per-backedge buffers that the slot motivates.
- discoverGeneralizedLoopSlots(canonical, backedges): drives the search.
The control-slot search prefers the legacy Themida cursor (zero behavior
change on the reference sample) and falls back to scanning canonical for the
qword-start address with the most-varying backedges, tiebreaking by lowest
address. The target-slot search prefers the legacy carried slot and falls
back to the lowest-address candidate that is tracked across canonical and
every selected backedge buffer.
Stack-frame addresses (anything inside [STACKP-reserve, STACKP+reserve)) are
excluded from candidates, so caller-frame stack args at e.g. STACKP+24 are
no longer mistakenly chosen as target slots. This matters for the nested-loop
local-buffer test, whose canonical buffer carries a tracked qword above
STACKP from the outer loop's prior backedge.
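A minimal sketch of the control-slot fallback described above, in Python rather than the lifter's C++. The function name, the dict-based buffers, and the reading of "most-varying" as "number of backedge buffers whose value differs from canonical" are assumptions made for illustration; the real discoverGeneralizedLoopSlots may differ.

```python
def discover_control_slot(canonical, backedges, legacy_slot, stackp, reserve):
    """Pick a control slot: legacy slot first, else the most-varying qword.

    canonical: dict mapping qword address -> value (canonical backup buffer)
    backedges: list of dicts mapping address -> value, one per backedge buffer
    """
    def varying(addr):
        # Count backedge buffers whose value at addr differs from canonical.
        return sum(1 for b in backedges
                   if addr in b and b[addr] != canonical[addr])

    # Assumption: the legacy slot wins whenever it is tracked and varies.
    if legacy_slot in canonical and varying(legacy_slot) > 0:
        return legacy_slot

    # Fallback scan; addresses in [stackp - reserve, stackp + reserve) are
    # treated as stack-frame addresses and never become candidates.
    candidates = [
        a for a in canonical
        if not (stackp - reserve <= a < stackp + reserve) and varying(a) > 0
    ]
    if not candidates:
        return None
    # Most-varying first; ties go to the lowest address.
    return min(candidates, key=lambda a: (-varying(a), a))
```

In this model a qword at STACKP+24 is skipped even when it varies more than any other address, which is the caller-frame case the exclusion is meant to cover.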
Two existing KNOWN-LIMITATION tests are flipped to assert the new positive
contract:
- generalized_loop_non_themida_control_slot_produces_no_phi ->
generalized_loop_non_themida_slot_picks_up_as_target_when_legacy_control_present
(the non-Themida slot is now picked up as the target slot when the legacy
cursor is present, and a 2-way phi is produced at it).
- generalized_loop_non_themida_target_slot_produces_no_phi ->
generalized_loop_discovery_picks_non_themida_target_slot
(the discovered target slot is asserted, and the helper produces a 2-way
phi with both incoming concrete values).
Verification:
- 228 rewrite_microtests pass (no regressions).
- check_themida_equivalence.py: example2 still recovers all 4 required
imports (CharUpperA, GetStdHandle, ReadConsoleA, WriteConsoleA).
* loop generalization: data-driven register preservation + multi-slot carried state
Phase 2: Replace shouldPreserveGeneralizedBackedgeRegisterIndex (hardcoded
Themida-specific index set {1,4,7,9,10,12,14}) with data-driven comparison
of canonical vs backedge values. A register is now preserved when its value
changed across the loop boundary; RSP is always preserved. This prevents
non-Themida loops from silently losing loop-carried state in registers
outside the hardcoded set.
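The data-driven rule can be sketched as follows. This is an illustrative Python model, not the lifter's code: the function name and the dict-of-register-values representation are hypothetical.

```python
def registers_to_preserve(canonical, backedge, always=("RSP",)):
    """Return the registers to keep as loop-carried state.

    canonical / backedge: dicts mapping register name -> value observed at
    the loop header and at the backedge respectively.
    """
    # A register is preserved when its value changed across the loop boundary.
    preserved = {
        name for name, value in canonical.items()
        if name in backedge and backedge[name] != value
    }
    # RSP is always preserved, whether or not it changed.
    preserved.update(always)
    return preserved
```

Unlike a fixed index set, this keeps any register that actually carries state, which is the failure mode the hardcoded set had on non-Themida loops.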
Phase 1: Extend GeneralizedLoopControlFieldState with a carriedSlots vector
that tracks ALL varying memory qwords discovered during slot analysis, not
just the single controlSlot + targetSlot. The retrieve_target_slot helper
now checks carriedSlots after the legacy targetSlot, building phis for any
matching carried address. Rotation logic in record_generalized_loop_backedge
updates carried slot values alongside the primary control slot.
Phase 3: Add vm_tea_round_loop sample — TEA-style compound cross-update with
3 independently loop-carried state variables (v0, v1, sum). 10 semantic test
cases including the previously-failing x=0x65501 input. All pass.
Test results: 247/247 pattern-verified, 245/245 semantic (2342 cases), all
microtests green including flag checks.
* add vm_subroutine_loop: single-depth call/ret VM with indirect PC dispatch
The vm_subroutine_loop pattern previously crashed the lifter with an access
violation (0xC0000005). The combination of multi-slot carried state, data-
driven register preservation, and emergency generalization now handles this
pattern correctly: 8 semantic cases pass, no crash.
The sample uses a one-deep return-PC slot (rpc) for indirect dispatch —
the simplest form of the pattern that was fundamentally unsupported.
248/248 samples, 246/246 semantic (2350 cases), Themida gate green.
* add vm_callret_loop and vm_bubblesort_loop: previously budget-blown patterns
Both patterns previously exhausted maxBasicBlockBudget (~4087 blocks):
- vm_callret_loop: stack-array-indexed PC dispatch (rstack[rsp])
- vm_bubblesort_loop: conditional two-slot array swap per iteration
With emergency generalization (75% budget threshold), both now lift
without hitting the budget ceiling (75 and 59 blocks respectively).
The patterns are registered with IR shape checks only (no semantic
assertions) because the indirect dispatch and conditional multi-slot
writes are not yet semantically accurate under generalization.
250/250 samples, 247/247 semantic (2358 cases), Themida gate green.
* semantics: rewrite BSR/BSF to use llvm.ctlz/cttz intrinsics
Replace the bitWidth-iteration unrolled bit-scan loops (32 AND+ICMP+SELECT
chains for i32, 64 for i64) with single @llvm.ctlz / @llvm.cttz intrinsic
calls. BSR = bitWidth - 1 - ctlz(x, true); BSF = cttz(x, true).
The zero-input case is handled with is_zero_undef=true (matching BSR/BSF
architectural undefined-when-zero behavior) plus an explicit select that
returns undef when the input is zero. Constant folding is preserved.
IR quality improvement: vm_imported_clz_loop and vm_imported_bsr_loop
now show a single call @llvm.ctlz.i32 instead of 30+ bsrtest/icmp/select
instructions. Pattern manifests updated to match 'call'.
lift_lzcnt and lift_tzcnt already used the intrinsics — BSR/BSF were the
only remaining scalar bit-scan ops with unrolled implementations.
Side benefit: flag-stress tests bsf_00 and bsf_01 fixed (constant-folded
input now produces correct PF flag instead of running the unrolled loop).
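The identities BSR = bitWidth - 1 - ctlz(x) and BSF = cttz(x) can be checked against an unrolled bit scan with a small Python model. The helper names are illustrative, and None stands in for the undefined result on zero input; none of this is the lifter's actual code.

```python
def ctlz(x, bits):
    # Count leading zeros of x within a bits-wide integer (0 <= x < 2**bits).
    return bits - x.bit_length()

def cttz(x, bits):
    # Count trailing zeros; a zero input has no set bit, so return bits.
    if x == 0:
        return bits
    return (x & -x).bit_length() - 1

def bsr(x, bits=32):
    # Index of the highest set bit; result is undefined (None) for zero.
    return None if x == 0 else bits - 1 - ctlz(x, bits)

def bsf(x, bits=32):
    # Index of the lowest set bit; result is undefined (None) for zero.
    return None if x == 0 else cttz(x, bits)

def bsr_unrolled(x, bits=32):
    # Reference model of the old per-bit scan: last matching index wins.
    result = None
    for i in range(bits):
        if x & (1 << i):
            result = i
    return result
```

For example, 0x50 has its lowest set bit at index 4 and its highest at index 6, so bsf(0x50) is 4 and bsr(0x50) is 6.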
* PromotePseudoStackPass: split into main + escape alloca by call-escape
Previously, GEPs that flow into call arguments either:
(a) were skipped from promotion → left as memory-base GEPs that
PromotePseudoMemory turned into raw inttoptr(stack_addr) constants
(e.g., 'ptr nonnull inttoptr (i64 1375592 to ptr)' as WriteConsoleA
lpNumberOfCharsWritten arg), or
(b) were promoted to a single shared alloca, blocking SROA for the
whole alloca and leaving hundreds of dead dispatcher-scratch stores
in the post-opt IR.
Two-alloca split fixes both:
- Main alloca: scratch slots that don't escape via calls. SROA
decomposes it cleanly; DSE eliminates dead stores.
- Escape alloca: slots whose pointer flows into a CallBase. Won't
SROA but is isolated, so dispatcher noise doesn't
block its dead-store elimination.
Classification is by constant offset: any offset touched by ANY GEP
with a CallBase user is marked escaped. All GEPs (constant or not) at
escaped offsets go to the escape alloca to preserve pointer identity
within each slot. Non-constant offsets always go to the main alloca
(in practice, lifters use them for buffers; constant offsets for API
scalar slots).
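The classification rule reads roughly like the sketch below. The Gep record, its fields, and the classify function are hypothetical stand-ins for LLVM GEP instructions and the pass's internals; a None offset models a non-constant offset.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Gep:
    name: str
    offset: Optional[int]   # None models a non-constant offset
    has_call_user: bool     # True when the pointer flows into a call

def classify(geps):
    """Split GEPs into (main, escape) lists of names."""
    # An offset is escaped if ANY constant-offset GEP at it feeds a call.
    escaped_offsets = {
        g.offset for g in geps
        if g.offset is not None and g.has_call_user
    }
    main, escape = [], []
    for g in geps:
        if g.offset is not None and g.offset in escaped_offsets:
            # Every GEP at an escaped offset follows it, so all users of
            # that slot see the same pointer.
            escape.append(g.name)
        else:
            # Non-escaped and non-constant offsets stay in the main alloca.
            main.append(g.name)
    return main, escape
```

Note that a GEP at an escaped offset moves to the escape alloca even when that particular GEP has no call user.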
Themida WriteConsoleA call now shows clean alloca GEPs:
ptr nonnull %4 (= stackmemory.escape + 200)
ptr nonnull %6 (= stackmemory.escape + 208)
instead of:
ptr nonnull inttoptr (i64 1375584 to ptr)
ptr nonnull inttoptr (i64 1375592 to ptr)
Stack-range inttoptr in Themida output drops to zero. Total stores
drop dramatically (the remaining ones are .themida section writes,
a separate dispatcher-state issue not related to the stack alloca).
Pattern updates for 4 samples whose IR shape changed due to the
cleaner alloca decomposition:
- vm_fibonacci_loop: switch i32 -> br i1
- vm_search_loop: br i1 -> select
- vm_signed_dword_sum64: sext -> ashr
- vm_signed_word_sum64: sext -> ashr
---------
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
|
||
|
|
841d6bbcdb |
docs: add Control-Flow Recognition section and clarify punpcklqdq state (#206)
Two doc updates following #205:
ARCHITECTURE.md gains a 'Control-Flow Recognition' section covering
the lift_ret REAL_return / ROP-return classification, the ret-to-IAT
chain pattern (the Themida-virt mitigation that #195/#196/#205 built
out), the lift_jmp direct/indirect dispatch, and the Iced
operand-type quirk that motivates widening SSE accept sets. These
were all undocumented and the ret-to-IAT chain in particular is a
non-trivial structural rewrite that future maintainers should not
have to reverse-engineer from the source.
REWRITE_BASELINE.md's punpcklqdq line now reflects what actually
happened: the handler had been present for a while but silently fell
through to not_implemented for every site because Iced classifies
the source operand by bytes-actually-accessed (low 64), not by
physical XMM width. Fixed in #205 (
|
||
|
|
605a36e8ed |
lifter: correctness fixes, refactors, and regression tests (#205)
* lifter: restore indirect-jump threshold to 128
* gitignore: glob output_*.ll instead of enumerating dumps
Replace output_finalnoopt.ll / output_no_opts.ll entries with
output_*.ll so ad-hoc lifter dumps (output_rets.ll, output_newpath.ll,
etc.) stop showing up in git status.
* lifter: factor REAL_return path through emitResolvedFunctionReturn
Pull the rax-zext + CreateRet + run/finished bookkeeping out of the
REAL_return branch in lift_ret() into a local lambda so future ret
exit points can reuse it without duplicating four lines of
boilerplate.
Drop the dead returnStruct/myStruct scaffolding and the
originalFunc_finalnopt local: every InsertValue call site has been
commented out for a long time and the locals had no remaining uses.
The active code emits a plain rax return.
No behavior change.
* lifter: advance RSP past continuation slot in ret-to-IAT chain
In the chained import-return pattern (`ret` to IAT slot, IAT slot
holds an external function address, the function returns and control
resumes at the next stack slot's continuation address), the lifter
collapses the two pops into a single `call @import; br contBB`. RSP
was only advanced past the IAT slot itself, so post-call register
state still claimed RSP pointed at the continuation address. Any
downstream stack read from RSP saw stale data and any solver that
constant-folded RSP picked up a value that no longer matched the
post-chain physical layout.
Bump RSP by another `ptrSize` immediately before lowering the
import call so the continuation block inherits the same RSP it would
have under a faithful two-pop lowering.
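The off-by-8 is easy to see in a toy model of the two pops. The function, the dict-based stack, and the addresses below are illustrative assumptions, not the lifter's data structures.

```python
PTR_SIZE = 8  # x86-64 pointer size in bytes

def lift_ret_to_iat_chain(rsp, stack, imports):
    """Model `ret` into an import followed by the import's own return.

    stack:   dict mapping address -> qword value
    imports: dict mapping import address -> import name
    Returns (import_name, continuation_address, rsp_after) or None.
    """
    target = stack[rsp]                # first pop: address the `ret` jumps to
    if target not in imports:
        return None                    # not the chained import-return pattern
    cont = stack[rsp + PTR_SIZE]       # second pop: where the import returns
    # Both slots are consumed, so RSP advances by two pointers, not one.
    return imports[target], cont, rsp + 2 * PTR_SIZE
```

With an entry RSP of 0x14FDA0 the model yields 0x14FDB0 after the chain; advancing past only the first slot would leave 0x14FDA8.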
* lifter/test: regression test for ret-to-IAT chain RSP advancement
Locks in dd95fe7. The microtest stands up a LifterUnderTest, plants
[importVA, contVA] on the stack at an RSP that is intentionally NOT
equal to STACKP_VALUE (so the lift_ret REAL_return short-circuit does
not fire), registers the import in the lifter's importMap, and lifts
a single `ret` (0xC3).
It then asserts that:
- the chain handler emitted a direct call to the registered import
- RSP after the chain equals entry RSP + 16, not + 8
Without the fix the test fails with RSP = entry + 8 (only the IAT
slot pop is modeled), exactly the off-by-8 the fix closes.
Verified the test catches the regression by reverting dd95fe7
locally before re-applying — the failing message reads
"RSP after chain = 0x14FDA8; expected 0x14fdb0".
* scripts/themida: filter lifter-synthesized helpers from import diff
Calls to lifter-emitted helpers (`@exception`, `@fastfail`,
`@not_implemented`, etc.) surfaced as 'extra import (not required)'
lines on every Themida equivalence run. They are not user imports;
they are lowered from INT1/INT3/UD2/INT29/SYSCALL/segment-load
sites in the lifter's own semantics files.
Skip them in `_extract_call_names` so the equivalence diff shows
only real imports. The list of helpers lives next to the call regex
so it stays adjacent to the code that emits them; if a new helper
shows up in the IR (e.g. another illegal-instruction lowering) the
script will surface it as an 'extra import' until the entry is added
here, which is the right tripwire.
Before: example2 -- 6 distinct imports, 10 calls (3 noise calls)
After: example2 -- 4 distinct imports, 7 calls (clean)
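A minimal sketch of such a filter, assuming the IR is available as text. The function name, the regex, and the helper set are illustrative: only the three helpers named above are listed, and the real `_extract_call_names` may be structured differently.

```python
import re
from collections import Counter

# Helpers named in the commit message; the real list may contain more.
LIFTER_HELPERS = {"exception", "fastfail", "not_implemented"}

# Matches the callee of a direct call, e.g. `call i32 @WriteConsoleA(...)`.
CALL_RE = re.compile(r"\bcall\b[^@\n]*@([A-Za-z_][\w.$]*)")

def extract_import_calls(ir_text):
    """Count direct calls per callee, skipping lifter-synthesized helpers."""
    counts = Counter()
    for name in CALL_RE.findall(ir_text):
        if name in LIFTER_HELPERS:
            continue  # emitted by the lifter itself, not a user import
        counts[name] += 1
    return counts
```

A helper missing from the set still shows up in the counts, which matches the tripwire behavior described above.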
* lifter/analysis: replace 'TODO: fix?' marker with positive explanation
The 2-value path-solving fork's swap branch had a 'TODO: fix?'
comment from the original draft. Traced both branches and confirmed
the swap is correct:
- When the select's trueValue equals firstcase, condition is the
select's condition as-is and firstcase→bb_true wires correctly.
- When trueValue equals secondcase, condition still expresses 'true
picks trueValue' but downstream code uses firstcase→bb_true.
Swapping firstcase↔secondcase makes firstcase refer to the trueVal
constant so the existing CreateCondBr wiring stays correct without
a parallel reversed-branch path.
Replaced the TODO with a comment that explains why the swap is
necessary, so future readers do not waste time investigating a
branch that is intentional.
* lifter: accept Register64/Memory64 source for punpcklqdq
Iced classifies operand types by the bytes the instruction actually
accesses, not by physical register width. PUNPCKLQDQ only reads the
low 64 bits of its second operand, so Iced reports Register64 (or
Memory64 for the m128 form) for a source whose physical encoding is
`xmm/m128`. The lift handler's accept check rejected anything other
than Register128/Memory128 and fell through to the not_implemented
exit, so every `punpcklqdq xmm, xmm/m128` site lowered to a bogus
`call @not_implemented; ret` instead of the unpack semantic.
Widen the accept set to Register64 and Memory64 too. The body
already truncates the source to i64 before OR'ing it into the high
half of the result, so a 64-bit-typed source is semantically
identical to a 128-bit one for this handler.
Fixes the two pre-existing oracle test failures
`punpcklqdq_xmm0_xmm1_basic` and
`punpcklqdq_xmm0_xmm1_zero_upper_from_zero_source`. `python test.py
all` stays at 244/244, confirming no semantic regressions.
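The unpack semantic, and why a 64-bit-typed source is equivalent, can be shown on plain integers. This is an illustrative Python model of the instruction, not the handler's LLVM IR.

```python
MASK64 = (1 << 64) - 1

def punpcklqdq(dst, src):
    """Interleave the low quadwords of two 128-bit values.

    Result low 64 bits  = low 64 bits of dst
    Result high 64 bits = low 64 bits of src
    """
    low = dst & MASK64
    high = (src & MASK64) << 64   # source is truncated to 64 bits first
    return high | low
```

Because the source is masked to 64 bits before use, its upper half never reaches the result, so passing a 64-bit or a 128-bit source gives the same value.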
* lifter: replace lift_jmp's fallthrough switch with an isDirectJump if
The RIP-relative add for direct jumps lived inside a 4-case switch
whose body intentionally fell through into `default: break;`. It
worked, but:
- Implicit fallthrough is a -Wimplicit-fallthrough hazard. Today the
default does nothing; tomorrow someone adds a body and every direct
jump silently runs it.
- The switch's discriminator is exactly `isDirectJump`, which is
already computed two lines above for the path-solver context. The
switch was a parallel restatement of the same predicate.
Collapse the switch into `if (isDirectJump) { trunc = add(trunc,
ripval); }` so the predicate has one definition and there is no
fallthrough to misuse. Behavior unchanged: the same immediate cases
still get the RIP-relative bump, indirect jumps still skip it, and
`python test.py all` stays at 244/244.
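In outline, the collapsed form amounts to the following. This is a hypothetical Python rendering on plain integers; the parameter names are made up, and the real code operates on LLVM values.

```python
MASK64 = (1 << 64) - 1

def jump_target(operand_value, rip, is_direct_jump):
    """Compute a jump target from the decoded operand.

    Direct jumps carry a RIP-relative displacement, so RIP is added;
    indirect jumps already hold an absolute target and skip the add.
    """
    target = operand_value
    if is_direct_jump:
        target = (target + rip) & MASK64   # wrap to 64 bits
    return target
```

The single predicate replaces the per-case switch, so there is no default arm left for a direct jump to fall into.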
* lifter/test: regression test for SSE memory-form handler dispatch
Lock in that pand/por/pxor accept the `xmm, [mem]` encoding form. The
test lifts `66 0F DB 00`, `66 0F EB 00`, and `66 0F EF 00` (one
`xmm0, [rax]` site each) and asserts that the lifted function does
not contain a direct call to @not_implemented.
Pure structural acceptance: not validating bitwise-AND/OR/XOR
semantics, only that the handler dispatched at all. Iced today
reports Memory128 for these encodings so the test passes against the
existing `Register128 || Memory128` accept sets. If a future Iced
update reclassifies the source operand by bytes-actually-accessed
(the way it already does for punpcklqdq, where it reports
Register64/Memory64 even for an `xmm/m128` encoding) the handler
would silently fall through to `call @not_implemented; ret` and
miscompile every memory-form site -- this test trips first.
* lifter: drop duplicate stdout print on unresolved indirect jmp
`lift_jmp` printed every UnresolvedIndirectJump twice: once as a raw
`std::cout << "[diag] lift_jmp: ..."` and once through
`diagnostics.warning(...)` on the very next line. The diagnostics
framework already persists the warning to `output_diagnostics.json`
at lift completion, and no script or test grep'd the stdout form.
Drop the std::cout. The diagnostic remains in the recorded diagnostics
list, surfaceable via the JSON dump or the in-memory entries vector.
This removes the only unguarded raw `[diag]` print in the lift path
-- the rest are gated on `liftProgressDiagEnabled` or specific hot
addresses for active debugging.
* scripts/themida: fix docstring escape leak in import-filter doc
Audit of #205 caught a literal `\\u2014` and unnecessary
`\\"` escapes in the `_extract_call_names` docstring -- leftovers
from how the surrounding commit (#205, scripts/themida: filter
lifter-synthesized helpers) was authored. Replace the literal
escape with a plain `--` and drop the redundant backslash-quotes;
the docstring now renders cleanly at `help(_extract_call_names)`
and looks normal in the source.
Behavior unchanged: `python test.py themida` still passes with
the same import-diff filter (4 imports, 7 calls for example2).
---------
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
|
||
|
|
9c32ecd235 |
Autoresearch/lets craete more test cases complex loops with v 20260425 (#203)
* baseline: 3 fully-wired VM samples (dummy/bytecode/stack vm loops)
Result: {"status":"keep","vm_sample_count":3,"total_semantic_cases":177,"manifest_samples":33}
* added 3 toy VM samples: register-machine, nested loops, branchy loop body
Result: {"status":"keep","vm_sample_count":6,"total_semantic_cases":205,"manifest_samples":36}
* added 3 more VM samples: factorial (mul recurrence), collatz (data-dep path), gcd (modulo-driven non-counted loop)
Result: {"status":"keep","vm_sample_count":9,"total_semantic_cases":231,"manifest_samples":39}
* added 3 more VM samples: fibonacci (two-state recurrence), switch-dispatched VM, countdown loop (reverse induction)
Result: {"status":"keep","vm_sample_count":12,"total_semantic_cases":259,"manifest_samples":42}
* added 3 bitwise/multiplicative VM samples: popcount (zero-test loop), power (two symbolic operands), bitreverse (shift+OR fixed trip count)
Result: {"status":"keep","vm_sample_count":15,"total_semantic_cases":289,"manifest_samples":45}
* added 3 VM samples: linear search with early-exit, dual-counter parity split (two phis), XOR accumulator with multiplication
Result: {"status":"keep","vm_sample_count":18,"total_semantic_cases":315,"manifest_samples":48}
* added 2 VM samples: LCG mixed mul/add/mask recurrence and stack-table-driven next-PC dispatch
Result: {"status":"keep","vm_sample_count":20,"total_semantic_cases":335,"manifest_samples":50}
* added vm_callret_loop: VM with explicit return-PC stack, two call sites converging on the same subroutine handler chain
Result: {"status":"keep","vm_sample_count":21,"total_semantic_cases":346,"manifest_samples":51}
* all 49 manifest samples lift and verify against actual IR. Patterns rewritten to match what the lifter emits: switch i32 dispatchers, mul nuw nsw shapes, llvm.bitreverse.i8 intrinsic, mul i33 + lshr i33 closed-form for triangular sums. Removed 2 samples that exposed real lifter limitations: vm_callret_loop (rstack indirect pc, BB budget exceeded) and vm_switch_dispatch_loop (lifted to constant -1).
Result: {"status":"keep","vm_sample_count":19,"total_semantic_cases":313,"manifest_samples":49}
* 19/19 vm samples now pass both rewrite-regression IR pattern verification AND lli runtime semantic check (168 semantic cases total). Fixed branchy by adding explicit i=0/count=0 init in BV_LOAD_LIMIT (dual_counter pattern); collatz already fixed by collapsing CV_INIT into CV_LOAD_N. Captured all observed lifter limitations in autoresearch.md.
Result: {"status":"keep","vm_sample_count":19,"total_semantic_cases":313,"manifest_samples":49}
* added vm_hamming_loop: bitwise loop with TWO symbolic operands (a=x&0xF, b=(x>>4)&0xF), XOR-then-popcount body. Used the dual_counter init-state pattern from the start so it passed lli semantic check on the first try.
Result: {"status":"keep","vm_sample_count":20,"total_semantic_cases":323,"manifest_samples":50}
* added vm_lfsr_loop: 8-bit Galois LFSR with conditional XOR-and-shift recurrence; symbolic seed and trip count both derived from x. Used dual_counter init pattern up front; passed lift + lli on first attempt.
Result: {"status":"keep","vm_sample_count":21,"total_semantic_cases":333,"manifest_samples":51}
* added vm_rotate_loop: 8-bit left rotation via shl|lshr|or pattern with symbolic value and rotate count. Distinct from existing shift loops in that bits wrap around.
Result: {"status":"keep","vm_sample_count":22,"total_semantic_cases":343,"manifest_samples":52}
* vm_powermod_loop now passes both pattern verification (urem matched) and lli semantic check (11/11 cases). Square-and-multiply modular exponentiation is the most lifter-stressing sample yet: combines bitwise LSB extraction, conditional multiply-and-mod, exponent shift, and base squaring all in one body.
Result: {"status":"keep","vm_sample_count":23,"total_semantic_cases":354,"manifest_samples":53}
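For reference, the four steps named above (LSB extraction, conditional multiply-and-mod, exponent shift, base squaring) form the standard square-and-multiply loop. This is a generic Python sketch, not the sample's VM bytecode, and it assumes mod > 0.

```python
def powermod(base, exp, mod):
    """Right-to-left square-and-multiply modular exponentiation."""
    result = 1 % mod
    base %= mod
    while exp:
        if exp & 1:                      # LSB extraction
            result = (result * base) % mod   # conditional multiply-and-mod
        exp >>= 1                        # exponent shift
        base = (base * base) % mod       # base squaring
    return result
```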
* added vm_saturating_loop: counted sum loop with value-clamp at 100; lifter recognizes if-then-set as select; pattern + lli pass on first try
Result: {"status":"keep","vm_sample_count":24,"total_semantic_cases":376,"manifest_samples":54}
* vm_geometric_loop now passes both gates (mask pattern updated to 254). Log2-style doubling loop is distinct from existing additive/multiplicative recurrences.
Result: {"status":"keep","vm_sample_count":25,"total_semantic_cases":386,"manifest_samples":55}
* vm_polynomial_loop now passes both gates with unrolled-shape patterns. Horner method evaluation with stack-array coefficient lookup; lifter unrolls the 4-trip loop into closed-form arithmetic.
Result: {"status":"keep","vm_sample_count":26,"total_semantic_cases":396,"manifest_samples":56}
* vm_digitsum_loop now passes both gates. Decimal digit-sum loop with non-power-of-2 divisor exposes the lifter's divmod fusion (n%10 emitted as n + (n/10)*-10).
Result: {"status":"keep","vm_sample_count":27,"total_semantic_cases":408,"manifest_samples":57}
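The fused form is the identity n % 10 == n + (n / 10) * -10 for unsigned n. A short Python check on non-negative integers, with an illustrative function name:

```python
def digit_sum(n):
    """Sum the decimal digits of a non-negative integer."""
    total = 0
    while n:
        q = n // 10
        total += n + q * -10   # same value as n % 10, reusing the quotient
        n = q
    return total
```

The remainder costs one multiply and one add on top of the division the loop already needs.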
* added vm_isqrt_loop: Newton's integer square root with division by loop variable. Passes both gates with 15 semantic cases on first try.
Result: {"status":"keep","vm_sample_count":28,"total_semantic_cases":423,"manifest_samples":58}
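A generic Newton iteration for the integer square root looks like this. It is a textbook sketch in Python, not the sample's source, and the starting guess is an assumption.

```python
def isqrt_newton(n):
    """Floor of the square root of a non-negative integer via Newton steps."""
    if n < 2:
        return n
    x = n                        # starting guess, always >= the true root
    y = (x + n // x) // 2        # division by the loop variable x
    while y < x:                 # iterate while the estimate still shrinks
        x = y
        y = (x + n // x) // 2
    return x
```

The divisor changes every iteration, which is what distinguishes this loop from the constant-divisor samples.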
* added vm_minarray_loop: two-pass VM (fill array, then scan for min) with both data and trip count derived from x. 12 semantic cases pass on first try.
Result: {"status":"keep","vm_sample_count":29,"total_semantic_cases":435,"manifest_samples":59}
* vm_classify_loop now passes 10/10. Refactored to single packed accumulator (acc += 100/10/1) instead of three separate counters - sidesteps the multi-counter phi-undef pattern when several stack slots all init to 0.
Result: {"status":"keep","vm_sample_count":30,"total_semantic_cases":445,"manifest_samples":60}
* vm_carrychain_loop now passes both gates with unrolled-shape patterns. Bit-by-bit ripple carry adder; the 8-trip fixed-bound loop is fully unrolled by the lifter.
Result: {"status":"keep","vm_sample_count":31,"total_semantic_cases":456,"manifest_samples":61}
* added vm_prefix_sum_loop: two-phase VM that fills a stack array then walks it computing in-place running prefix sum (writes back to data[idx] each iteration). Distinct from minarray which only reads on second pass.
Result: {"status":"keep","vm_sample_count":32,"total_semantic_cases":467,"manifest_samples":62}
* vm_pcg_loop now passes both gates (mask 254 fix). LCG state advance + XOR-shift output mixing per iteration; distinct from lcg (mul/add/mask only) and lfsr (shift+conditional XOR only).
Result: {"status":"keep","vm_sample_count":33,"total_semantic_cases":479,"manifest_samples":63}
* added vm_shiftmul_loop: schoolbook shift-and-add multiplication. 8-trip loop with conditional add of (a << i) when bit i of b is set. Passes both gates with 11 semantic cases.
Result: {"status":"keep","vm_sample_count":34,"total_semantic_cases":490,"manifest_samples":64}
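The loop body described above, as a Python sketch with an illustrative function name. With 8 trips only the low 8 bits of b contribute.

```python
def shiftmul(a, b, trips=8):
    """Schoolbook shift-and-add multiplication over the low `trips` bits of b."""
    acc = 0
    for i in range(trips):
        if (b >> i) & 1:      # bit i of b is set
            acc += a << i     # conditional add of (a << i)
    return acc
```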
* vm_xordecrypt_loop now passes both gates. Three-phase VM (fill, decrypt, sum) over a fixed 8-byte stack buffer; lifter unrolls all three loops but preserves the algebraic identity.
Result: {"status":"keep","vm_sample_count":35,"total_semantic_cases":500,"manifest_samples":65}
* added vm_zigzag_loop: alternating-sign accumulator (parity branch picks add vs sub on a single counter). 11 cases including unsigned wraparound for negative results.
Result: {"status":"keep","vm_sample_count":36,"total_semantic_cases":511,"manifest_samples":66}
* added vm_horner_signed_loop: Horner with signed coefficients [1,-2,3,-4]; tests sign-extended array loads + signed multiply-and-add. 10 cases including unsigned wraparound for negative results.
Result: {"status":"keep","vm_sample_count":37,"total_semantic_cases":521,"manifest_samples":67}
* vm_bittransitions_loop now passes both gates with branchless body + unrolled patterns. Counts adjacent-bit transitions in the low 16 bits via XOR-and-mask.
Result: {"status":"keep","vm_sample_count":38,"total_semantic_cases":532,"manifest_samples":68}
* added vm_piecewise_loop: piecewise linear function (3-way range branch) applied repeatedly to a single accumulator. Distinct from classify (counter) and collatz (2-way branch). 11 semantic cases pass.
Result: {"status":"keep","vm_sample_count":39,"total_semantic_cases":543,"manifest_samples":69}
* vm_modcounter_loop now passes both gates with fixed input. Counter wraps modulo 7 every iteration; symbolic step+counter+iter-count.
Result: {"status":"keep","vm_sample_count":40,"total_semantic_cases":554,"manifest_samples":70}
* added vm_argmax_loop: find INDEX of max element in symbolic-content array. Two co-related state vars (best value + best index) updated together; distinct from minarray which only tracks value.
Result: {"status":"keep","vm_sample_count":41,"total_semantic_cases":565,"manifest_samples":71}
* vm_prefix_xor_loop now passes with low-bit limit and getelementptr pattern. In-place cumulative XOR over symbolic-content stack array.
Result: {"status":"keep","vm_sample_count":42,"total_semantic_cases":576,"manifest_samples":72}
* added vm_palindrome_loop: bitwise palindrome check on low 8 bits with early-exit on mismatch. 14 semantic cases pass.
Result: {"status":"keep","vm_sample_count":43,"total_semantic_cases":590,"manifest_samples":73}
* added vm_caesar_loop: three-phase VM (fill, additive shift, sum) over a stack buffer. Add+mask transform distinct from XOR transform of xordecrypt. 12 semantic cases.
Result: {"status":"keep","vm_sample_count":44,"total_semantic_cases":602,"manifest_samples":74}
* added vm_ca_loop: Rule-90 cellular automaton step (state' = (state<<1) ^ (state>>1)) iterated symbolic times. Distinct linear bitwise update coupling shifts in both directions. 12 cases.
Result: {"status":"keep","vm_sample_count":45,"total_semantic_cases":614,"manifest_samples":75}
* added vm_djb2_loop: DJB2-style hash recurrence (hash = hash * 33 + nibble) consuming nibbles of x. 12 cases. Multiplicative-then-additive update with per-iteration symbolic input.
Result: {"status":"keep","vm_sample_count":46,"total_semantic_cases":626,"manifest_samples":76}
* added vm_runlength_loop: count distinct runs of 1-bits in low 16 bits with always-write recipe (runs += start_predicate). Sequential dependency on previous bit. 13 cases.
Result: {"status":"keep","vm_sample_count":47,"total_semantic_cases":639,"manifest_samples":77}
* added vm_skiploop_loop: counted loop with continue-style skip on odd iterations; sums squares of even indices. Tests dispatcher transition that bypasses body via parity branch. 11 cases.
Result: {"status":"keep","vm_sample_count":48,"total_semantic_cases":650,"manifest_samples":78}
* added vm_kernighan_loop: Brian Kernighan's popcount trick (v &= v-1 until zero). Trip count equals popcount itself. Distinct termination shape from vm_popcount_loop. 12 cases.
Result: {"status":"keep","vm_sample_count":49,"total_semantic_cases":662,"manifest_samples":79}
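The trick in plain Python, for non-negative integers. The function name is illustrative.

```python
def kernighan_popcount(v):
    """Count set bits; each iteration clears the lowest set bit."""
    trips = 0
    while v:
        v &= v - 1    # clears exactly one set bit
        trips += 1
    return trips
```

The loop runs once per set bit, so the trip count is the result itself rather than a fixed bit width.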
* added vm_find2max_loop: track top1 and top2 over a stack array. Three-way update branch: shift the pair / update only top2 / no change. 11 cases. Reached round-50 sample milestone.
Result: {"status":"keep","vm_sample_count":50,"total_semantic_cases":673,"manifest_samples":80}
* added vm_ctz_loop: count trailing zeros (capped at 32). Loop with EARLY BREAK on LSB-set predicate; counter doubles as result. 12 cases.
Result: {"status":"keep","vm_sample_count":51,"total_semantic_cases":685,"manifest_samples":81}
* added vm_dupcount_loop: count adjacent equal nibbles in stack array. Two stack-array loads per iteration (data[i-1] + data[i]) with equality predicate. 11 cases.
Result: {"status":"keep","vm_sample_count":52,"total_semantic_cases":696,"manifest_samples":82}
* vm_hexcount_loop now passes both gates with always-write recipe and zext pattern. Counts hex letter nibbles (>= 10) in 32-bit value. 12 cases.
Result: {"status":"keep","vm_sample_count":53,"total_semantic_cases":708,"manifest_samples":83}
* added vm_stride_loop: counted loop with step-2 induction (idx += 2) summing every other array element. Distinct induction step from skiploop (skip via parity branch). 12 cases.
Result: {"status":"keep","vm_sample_count":54,"total_semantic_cases":720,"manifest_samples":84}
* added vm_runlmax_loop: longest run of 1-bits in low 16 bits. Two co-related state vars (cur, max) updated via always-write recipe (cur = (cur+1)*bit; max = (cur > max) ? cur : max). 12 cases.
Result: {"status":"keep","vm_sample_count":55,"total_semantic_cases":732,"manifest_samples":85}
* added vm_window_loop: 3-element sliding window max-sum over symbolic stack array. Loop body loads three adjacent elements per iteration. 11 cases.
Result: {"status":"keep","vm_sample_count":56,"total_semantic_cases":743,"manifest_samples":86}
* added vm_4state_loop: cyclic 4-operation state machine. Inner state mod 4 picks ADD / XOR / MUL / SUB per iteration. 11 cases.
Result: {"status":"keep","vm_sample_count":57,"total_semantic_cases":754,"manifest_samples":87}
* added vm_imported_abs_loop: VM dispatcher with imported abs() call inside the body. Lifter recognizes abs() and lowers to @llvm.abs.i32 intrinsic; both pattern + lli semantic pass. First sample with a real CRT call inside a VM loop.
Result: {"status":"keep","vm_sample_count":58,"total_semantic_cases":764,"manifest_samples":88}
* added vm_nested_abs_loop: PC-state nested loop with abs() in inner body. Two-deep symbolic loop bounds, abs() called per inner-iteration. Both pattern + lli pass. 11 cases.
Result: {"status":"keep","vm_sample_count":59,"total_semantic_cases":775,"manifest_samples":89}
* added vm_abs_array_loop: two-phase VM where fill loop calls abs() and stores result to stack array, then sum loop reads. Combines imported intrinsic call with same-iter indexed stack store. 11 cases.
Result: {"status":"keep","vm_sample_count":60,"total_semantic_cases":786,"manifest_samples":90}
* added vm_minabs_loop: track minimum abs() distance over a counted loop with comparison-driven select. Combines imported abs() intrinsic with running-min reduction. 11 cases.
Result: {"status":"keep","vm_sample_count":61,"total_semantic_cases":797,"manifest_samples":91}
* added vm_imported_popcnt_loop: __builtin_popcount lowered to @llvm.ctpop.i32 inside VM body. Confirms lifter handles intrinsics other than abs cleanly. 10 cases.
Result: {"status":"keep","vm_sample_count":62,"total_semantic_cases":807,"manifest_samples":92}
* added vm_imported_clz_loop: __builtin_clz lowered to @llvm.ctlz.i32 inside VM body. Third recognized intrinsic shape. 10 cases.
Result: {"status":"keep","vm_sample_count":63,"total_semantic_cases":817,"manifest_samples":93}
* added vm_imported_bswap_loop: __builtin_bswap32 lowered to @llvm.bswap.i32 inside VM body. Fourth recognized intrinsic shape. 11 cases.
Result: {"status":"keep","vm_sample_count":64,"total_semantic_cases":828,"manifest_samples":94}
* added vm_imported_cttz_loop (5th recognized intrinsic, 11 semantic cases) and vm_outlined_wrapper_loop. The latter integrates the observation from the user's vm_fibonacci_loop_report.md: a wrapper calling a noinline inner function is lifted as an outlined call through inttoptr. It pattern-verifies but has no semantic field, because semantic_check strips inttoptr calls and leaves the sum undef. Documents the 10th lifter limitation: a same-binary callee is not inlined.
Result: {"status":"keep","vm_sample_count":65,"total_semantic_cases":839,"manifest_samples":96}
* added vm_imported_rotl_loop: _rotl lowered to @llvm.fshl.i32 inside VM body. Sixth recognized intrinsic, with both value and rotate amount per-iteration symbolic. 10 cases. Also extended scope to include docs/semantic_reports/ and the new generate_semantic_reports.py script (added by user externally).
Result: {"status":"keep","vm_sample_count":66,"total_semantic_cases":849,"manifest_samples":97}
* added vm_wrapper_chain_loop: two-level wrapper chain (outer -> middle -> inner), all noinline. Lift target is the outer; pattern verifies call+add, no semantic field (same outline-strip class as vm_outlined_wrapper_loop). Extends outline-detection coverage to multi-level wrappers.
Result: {"status":"keep","vm_sample_count":66,"total_semantic_cases":849,"manifest_samples":98}
* added vm_imported_bsf_loop: _BitScanForward (MSVC intrinsic with output-pointer arg) lowered to @llvm.cttz.i32 inside VM body. 7th recognized intrinsic. Tests output-via-pointer arg pattern - the lifter folds the &bit_index stack store + load into direct value flow. 12 cases.
Result: {"status":"keep","vm_sample_count":67,"total_semantic_cases":861,"manifest_samples":99}
* added vm_imported_bsr_loop: _BitScanReverse (output-pointer arg), lowered to an @llvm.ctlz.i32-based sequence. 8th recognized intrinsic. Manifest now has exactly 100 entries (run #100 milestone).
Result: {"status":"keep","vm_sample_count":68,"total_semantic_cases":873,"manifest_samples":100}
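The two _BitScan entries above rely on the usual relationship between the MSVC intrinsics and the LLVM bit-count intrinsics: the forward index is cttz(mask) and the reverse index is 31 - ctlz(mask). A minimal Python reference sketch of that relationship follows; the function names are hypothetical and the `None` index for a zero mask is an assumption standing in for the index MSVC leaves undefined.

```python
def bit_scan_forward32(mask: int):
    """Model of _BitScanForward: (found, index of the lowest set bit)."""
    mask &= 0xFFFFFFFF
    if mask == 0:
        return False, None  # assumption: index is undefined for a zero mask
    # mask & -mask isolates the lowest set bit; its position equals cttz(mask)
    return True, (mask & -mask).bit_length() - 1


def bit_scan_reverse32(mask: int):
    """Model of _BitScanReverse: (found, index of the highest set bit)."""
    mask &= 0xFFFFFFFF
    if mask == 0:
        return False, None  # assumption: index is undefined for a zero mask
    ctlz = 32 - mask.bit_length()  # leading-zero count of a 32-bit value
    return True, 31 - ctlz
```

For any non-zero 32-bit mask, both functions return an index in 0..31.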
* added vm_mixed_intrinsics_loop: chains popcount + bswap on the same value per iteration. Both gates pass on all 11 inputs - confirms the chain-of-two-calls correctness bug seen in vm_chain_imports_loop is specific to chains of the SAME intrinsic (abs+abs) rather than general two-call body shapes.
Result: {"status":"keep","vm_sample_count":69,"total_semantic_cases":884,"manifest_samples":101}
* vm_int64_loop now passes both gates with phi i32 pattern. Multiplicative recurrence with int64 acc that the lifter narrows back to i32 since the return masks to 32 bits. Documents the lifter's value-range narrowing behavior. 10 cases.
Result: {"status":"keep","vm_sample_count":70,"total_semantic_cases":894,"manifest_samples":102}
* added vm_shift64_loop: true 64-bit recurrence with Knuth's golden ratio multiplier (won't fit in i32). Lifter retains phi i64 + mul i64 + lshr i64. Confirms 64-bit arithmetic survives the lifter when narrowing is provably wrong. 10 cases.
Result: {"status":"keep","vm_sample_count":71,"total_semantic_cases":904,"manifest_samples":103}
* added vm_byte_loop: i8-narrowed arithmetic recurrence ((state * 13 + 5) mod 256). Tests narrower-type lowering inside VM dispatcher. 10 cases.
Result: {"status":"keep","vm_sample_count":72,"total_semantic_cases":914,"manifest_samples":104}
* vm_short_loop now passes both gates with u32 form for negative results. i16 arithmetic recurrence with sign-extending result. 10 cases.
Result: {"status":"keep","vm_sample_count":73,"total_semantic_cases":924,"manifest_samples":105}
* vm_reverse_array_loop now passes both gates with unrolled-shape patterns. Two-array reverse-copy pattern (fill + reverse-copy + pack); both 8-trip loops fully unrolled by lifter. 10 cases.
Result: {"status":"keep","vm_sample_count":74,"total_semantic_cases":934,"manifest_samples":106}
* added vm_2d_loop: 3x3 stack grid with nested PC-state loops; fills via grid[i*3+j], then sums diag and anti-diag at fixed offsets. 10 cases.
Result: {"status":"keep","vm_sample_count":75,"total_semantic_cases":944,"manifest_samples":107}
* vm_byte_buffer_loop now passes both gates with zext-shape patterns. unsigned char buf[16] stack array; fill via (i*7+seed)&0xFF, sum in second pass. First sample with i8-element stack array. 10 cases.
Result: {"status":"keep","vm_sample_count":76,"total_semantic_cases":954,"manifest_samples":108}
* vm_short_array_loop now passes both gates. short buf[8] stack array; fill via signed (short)(seed*(i+1)) with i16 wrap, sum via sext i16 to i32. First sample with i16-element stack array. 10 cases including signed wrap and negative seeds (encoded as u32).
Result: {"status":"keep","vm_sample_count":77,"total_semantic_cases":964,"manifest_samples":109}
* vm_ushort_array_loop passes both gates first try. unsigned short buf[8] stack array; fill via (unsigned short)(seed + i*100), sum via zext i16 to u32. Companion to vm_short_array_loop, distinguishing zext from sext at i16 load sites. 10 cases including u16 wrap and high-bit input.
Result: {"status":"keep","vm_sample_count":78,"total_semantic_cases":974,"manifest_samples":110}
* vm_sbyte_array_loop passes both gates first try. signed char buf[16] stack array; fill via (signed char)(seed*(i-4)), sum via sext i8 to i32. Companion to vm_byte_buffer_loop, distinguishing sext from zext at i8 load sites. 10 cases incl. i8 wrap on high indices and negative seeds (encoded as u32).
Result: {"status":"keep","vm_sample_count":79,"total_semantic_cases":984,"manifest_samples":111}
* vm_u64_array_loop now passes both gates. uint64_t buf[4] stack array; fill via seed*(i+1) + i*0x100000001, sum and return low 32 bits. First sample with i64-element stack array (vs scalar i64 in vm_int64_loop / vm_shift64_loop). 8 cases.
Result: {"status":"keep","vm_sample_count":80,"total_semantic_cases":992,"manifest_samples":112}
* vm_dual_array_loop passes both gates first try. Two simultaneous int[8] stack arrays (a,b); fill loop writes both per index, separate prod loop sums a[i]*b[7-i]. Distinct from single-array samples - exercises two stack frames in flight with paired access. 10 cases incl. INT_MAX wrap.
Result: {"status":"keep","vm_sample_count":81,"total_semantic_cases":1002,"manifest_samples":113}
* vm_mixed_width_array_loop passes both gates first try. Heterogeneous stack frame: int[4] + short[4] + signed char[4] all live simultaneously, filled in one fill loop, summed in a separate loop with sext i16, sext i8, and native i32 loads from the same frame. 12 cases incl. i8/i16 wrap and INT_MAX.
Result: {"status":"keep","vm_sample_count":82,"total_semantic_cases":1014,"manifest_samples":114}
* vm_vartrip_array_loop passes both gates first try. int buf[16] with INPUT-DERIVED trip count n=(x&0xF)+1 (range 1..16), single fused fill+sum loop. First sample with variable-trip stack-array fill - the lifter cannot fully unroll. 10 cases incl. boundary trips n=1, n=16 and 0xCAFEBABE.
Result: {"status":"keep","vm_sample_count":83,"total_semantic_cases":1024,"manifest_samples":115}
* vm_two_input_loop passes both gates first try. Two-arg function (x in RCX, y in RDX); LCG-style state mixer state = state*0x10001 + y XORed into result, n = (x & 0x1F) + 1 trips. First VM sample exercising RDX as a live input across the lifted body. 10 cases incl. all-zeros, all-ones, x=0x80000000.
Result: {"status":"keep","vm_sample_count":84,"total_semantic_cases":1034,"manifest_samples":116}
* vm_three_input_loop passes both gates first try. Three-arg function (x in RCX, y in RDX, z in R8); LCG-style state recurrence state = state*z + y for n = (x & 0xF) + 1 trips. First VM sample exercising R8 (third Win64 reg-passed arg). 10 cases incl. all zero, all -1, x=0x80000000.
Result: {"status":"keep","vm_sample_count":85,"total_semantic_cases":1044,"manifest_samples":117}
* vm_four_input_loop passes both gates first try. Four-arg function (x in RCX, y in RDX, z in R8, w in R9); recurrence state = (state ^ y)*z + w for n = (x & 0xF) + 1 trips. First VM sample exercising R9 (fourth/final Win64 reg-passed arg). Completes RCX/RDX/R8/R9 coverage. 10 cases.
Result: {"status":"keep","vm_sample_count":86,"total_semantic_cases":1054,"manifest_samples":118}
* vm_i64_return_loop passes both gates first try. Returns full uint64_t (no i32 mask): Knuth-mixer recurrence state = state * 0x9E3779B97F4A7C15 + i for n = (x & 7) + 1 trips. First sample where the lifted i64 return is the actual semantic value, exercising the full 64-bit return path. 10 cases incl. max u64, golden-ratio constant K, and 0x8000_0000_0000_0000 fixed-point.
Result: {"status":"keep","vm_sample_count":87,"total_semantic_cases":1064,"manifest_samples":119}
* vm_mixed_args_loop passes both gates first try. MIXED-WIDTH inputs: int x in RCX (sign-extended to i64 internally), uint64_t y in RDX (full 64-bit). Recurrence state = state*31 + (i64)x for n=(x&7)+1 trips. Returns low 32 bits. First sample mixing i32 and i64 input parameters in distinct registers. 10 cases incl. negative x (sign-ext), max u64 y, and 2^63 fixed point.
Result: {"status":"keep","vm_sample_count":88,"total_semantic_cases":1074,"manifest_samples":120}
* vm_dual_i64_loop passes both gates first try. Two FULL uint64_t inputs (x in RCX, y in RDX), full uint64_t return. Recurrence state = state*y + x for n = (x & 7) + 1 trips, init state = x ^ y. First sample with two simultaneous full-i64 register parameters. 10 cases incl. golden-ratio K, both 2^63, max u64 in either slot.
Result: {"status":"keep","vm_sample_count":89,"total_semantic_cases":1084,"manifest_samples":121}
* vm_rotl64_loop passes both gates first try. Iterated 64-bit left rotation: state = (state << amount) | (state >> (64 - amount)) for n trips, both amount (1..32) and n (1..8) input-derived. First sample exercising 64-bit rotation in a variable-trip loop body. Distinct from vm_imported_rotl_loop (i32) and vm_rotate_loop. 10 cases.
Result: {"status":"keep","vm_sample_count":90,"total_semantic_cases":1094,"manifest_samples":122}
* vm_popcount64_loop passes both gates first try. Brian Kernighan popcount on full uint64_t (state &= state - 1; count++) until state is zero. Variable trip count = popcount(x), bounded 0..64. Distinct from i32 vm_kernighan_loop. 10 cases incl. max u64 (64 trips), 2^63, alternating-bit patterns (32 trips each), and golden-ratio K (38 trips).
Result: {"status":"keep","vm_sample_count":91,"total_semantic_cases":1104,"manifest_samples":123}
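A minimal Python reference for the Kernighan popcount described in the vm_popcount64_loop entry, useful for checking the stated trip counts. The function name is hypothetical and this is a sketch of the recurrence, not the sample's source.

```python
MASK64 = (1 << 64) - 1


def popcount64_kernighan(x: int) -> int:
    """Count set bits by clearing the lowest set bit once per trip."""
    state = x & MASK64
    count = 0
    while state:
        state &= state - 1  # clears the lowest set bit
        count += 1
    return count
```

The trip count equals the result, so the golden-ratio constant 0x9E3779B97F4A7C15 takes 38 trips and max u64 takes 64.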
* vm_gcd64_loop passes both gates first try. Full 64-bit Euclidean GCD (urem-driven) on uint64_t inputs in RCX and RDX, full uint64_t return. Distinct from vm_gcd_loop (i32). 10 cases incl. zero/zero, large coprime pairs, max u64 / max-1, and 2^63 / 2^62.
Result: {"status":"keep","vm_sample_count":92,"total_semantic_cases":1114,"manifest_samples":124}
* vm_collatz64_loop passes both gates first try. Full 64-bit Collatz: while (state != 1) { state = (state & 1) ? 3*state + 1 : state >> 1; count++; }. Variable trip count up to 618 (max u64 - 1 case includes 3*x+1 wrap). Distinct from i32 vm_collatz_loop. 10 cases incl. classic x=27 (111 steps), x=K (414 steps), and 2^63 / 2^32.
Result: {"status":"keep","vm_sample_count":93,"total_semantic_cases":1124,"manifest_samples":125}
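The Collatz recurrence in the vm_collatz64_loop entry can be modelled in Python with explicit 64-bit wrapping. This is a hedged sketch with a hypothetical function name; note that x = 0 never reaches 1, so it must not be used as an input.

```python
MASK64 = (1 << 64) - 1


def collatz64_steps(x: int) -> int:
    """Number of Collatz steps until state == 1, with 3*state+1 wrapping mod 2^64."""
    state = x & MASK64
    count = 0
    while state != 1:
        if state & 1:
            state = (3 * state + 1) & MASK64  # wraps like uint64_t arithmetic
        else:
            state >>= 1
        count += 1
    return count
```

Powers of two only take the halving branch, so 2^63 needs 63 steps and 2^32 needs 32.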
* vm_fibonacci64_loop passes both gates first try. Fibonacci-shape recurrence on full uint64_t: a=x; b=x^K_INIT; for n trips: t=a+b; a=b; b=t. Both initial values and trip count derive from full input. Returns full uint64_t. Distinct from vm_fibonacci_loop (i32). 10 cases incl. max u64, golden-ratio-derived inputs, and 64-trip max.
Result: {"status":"keep","vm_sample_count":94,"total_semantic_cases":1134,"manifest_samples":126}
* vm_powmod64_loop passes both gates first try. Three-arg uint64_t fast modular exponentiation: square-and-multiply with i64 mul + i64 urem inside a variable-trip loop (trip = bit length of exp). Distinct from vm_powermod_loop (i32). 10 cases incl. 2^64 mod 17 (Fermat), max u64^2 mod max u64, x^0=1, and large 1e9-class operands.
Result: {"status":"keep","vm_sample_count":95,"total_semantic_cases":1144,"manifest_samples":127}
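A sketch of square-and-multiply modular exponentiation as named in the vm_powmod64_loop entry. The function name and argument order are assumptions. Python integers are unbounded, so this models the mathematical result only; any wrap of the i64 products in the real sample for large moduli is not modelled here.

```python
def powmod_square_multiply(base: int, exp: int, mod: int) -> int:
    """Compute base**exp % mod; the loop runs once per bit of exp."""
    result = 1 % mod
    base %= mod
    while exp:
        if exp & 1:
            result = (result * base) % mod  # multiply step for a set exponent bit
        base = (base * base) % mod          # square step every trip
        exp >>= 1
    return result
```

With base 2, exponent 64 and modulus 17 the result is 1, since 2^8 is congruent to 1 mod 17.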
* vm_isqrt64_loop passes both gates first try. Bit-by-bit integer square root on full uint64_t (32-trip fixed loop; bit walks from 2^62 down to 2^0, dividing by 4 each trip) with branchy res update. Returns floor(sqrt(x)) as full uint64_t. Distinct from vm_isqrt_loop (i32). 10 cases incl. isqrt(max u64) = 2^32-1, isqrt(2^62) = 2^31, isqrt(0)=0.
Result: {"status":"keep","vm_sample_count":96,"total_semantic_cases":1154,"manifest_samples":128}
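The fixed 32-trip bit-by-bit square root from the vm_isqrt64_loop entry, sketched in Python. This is the standard digit-by-digit method with a hypothetical function name; the sample's exact source may differ in detail.

```python
def isqrt64_bitwise(x: int) -> int:
    """floor(sqrt(x)) for 0 <= x < 2^64 using a fixed 32-trip loop."""
    res = 0
    bit = 1 << 62  # highest power of four that fits in 64 bits
    while bit:
        if x >= res + bit:
            x -= res + bit
            res = (res >> 1) + bit  # accept this result bit
        else:
            res >>= 1               # reject this result bit
        bit >>= 2                   # 2^62, 2^60, ..., 2^0: 32 trips
    return res
```

The loop does not skip leading zero bits, so the trip count is the same for every input.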
* vm_djb264_loop passes both gates first try. i64 djb2-style hash over the bytes of x: h = 5381; for i in 0..n: h = h*33 + ((x >> (i*8)) & 0xFF). Variable trip n = (x & 7) + 1 (1..8 bytes). Distinct from vm_djb2_loop (i32). 10 cases incl. max u64 and golden-ratio K with byte-walking shift.
Result: {"status":"keep","vm_sample_count":97,"total_semantic_cases":1164,"manifest_samples":129}
* vm_horner64_loop passes both gates. i64 Horner polynomial evaluation: p = ((x>>8)&0xFF)+1; n = (x&7)+1; for i in 0..n: c = (x>>(i*8))&0xFF; s = s*p + c. Variable trip 1..8 (capped to keep shift amount <= 56 and avoid uint64 shift-by-64 UB). 10 cases incl. degenerate p=1, max u64, golden-ratio K.
Result: {"status":"keep","vm_sample_count":98,"total_semantic_cases":1174,"manifest_samples":130}
* vm_lfsr64_loop passes both gates first try. Full 64-bit LFSR with maximal-length feedback taps at 0,1,3,4: bit = (state ^ (state>>1) ^ (state>>3) ^ (state>>4)) & 1; state = (state >> 1) | (bit << 63). Variable trip n = (x & 0xF) + 1 (1..16). Distinct from vm_lfsr_loop (i32). 10 cases incl. max u64 (clears top 16), golden-ratio K, all-ones-feedback.
Result: {"status":"keep","vm_sample_count":99,"total_semantic_cases":1184,"manifest_samples":131}
* vm_factorial64_loop passes both gates first try - reaches 100-VM-sample milestone. i64 factorial with deliberate mod 2^64 wrap: n = (x & 0x1F) + 1; r = 1; for i in 1..n+1: r *= i. Distinct from vm_factorial_loop (i32). 10 cases incl. 20! (largest u64-fitting), 21!..32! wrapping mod 2^64, and x=0xCAFE.
Result: {"status":"keep","vm_sample_count":100,"total_semantic_cases":1194,"manifest_samples":132}
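A Python model of the wrapping factorial from the vm_factorial64_loop entry, using the trip derivation n = (x & 0x1F) + 1 quoted there. The function name is hypothetical.

```python
MASK64 = (1 << 64) - 1


def factorial64_wrapping(x: int) -> int:
    """n! mod 2^64 with n = (x & 0x1F) + 1, i.e. n in 1..32."""
    n = (x & 0x1F) + 1
    r = 1
    for i in range(1, n + 1):
        r = (r * i) & MASK64  # deliberate wrap, like uint64_t multiplication
    return r
```

x = 19 selects n = 20 and returns the exact 20!; from x = 20 upward (within the low five bits) the product wraps.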
* vm_pcg64_loop passes both gates first try. PCG-style i64 RNG: state = state * 0x5851F42D4C957F2D + 1 for n=(x&7)+1 trips, output = state ^ (state>>33) XOR-shift mix. Distinct from vm_pcg_loop (i32) and vm_lcg_loop. 10 cases incl. max u64, golden-ratio K, and zero-state seed.
Result: {"status":"keep","vm_sample_count":101,"total_semantic_cases":1204,"manifest_samples":133}
* vm_xorshift64_loop passes both gates first try. Marsaglia xorshift64 PRNG with three sequential shift+xor steps per iteration: state ^= state<<13; state ^= state>>7; state ^= state<<17. Variable trip n=(x&7)+1. Distinct from vm_lfsr64_loop (single-bit feedback) and vm_pcg64_loop (LCG step + xor-shift output). 10 cases.
Result: {"status":"keep","vm_sample_count":102,"total_semantic_cases":1214,"manifest_samples":134}
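The three shift+xor steps of the vm_xorshift64_loop entry, written out in Python with explicit masking on the left shifts. The function names are hypothetical, and the trip count is passed in directly rather than derived from the input.

```python
MASK64 = (1 << 64) - 1


def xorshift64_step(state: int) -> int:
    """One xorshift64 iteration with the 13/7/17 shift triple."""
    state ^= (state << 13) & MASK64
    state ^= state >> 7
    state ^= (state << 17) & MASK64
    return state


def xorshift64(state: int, n: int) -> int:
    """Apply the step n times."""
    for _ in range(n):
        state = xorshift64_step(state)
    return state
```

Zero is a fixed point of the step, which is why xorshift generators need a non-zero seed.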
* vm_bswap64_loop passes both gates first try. i64 byte-swap built from explicit 8-way mask+shift+or fan-in (no intrinsic) in a variable-trip loop. Even-trip = identity, odd-trip = single bswap. Distinct from vm_imported_bswap_loop (i32 _byteswap_ulong intrinsic). 10 cases incl. fixed points (0, max u64), single-byte and palindromic swap targets.
Result: {"status":"keep","vm_sample_count":103,"total_semantic_cases":1224,"manifest_samples":135}
* vm_cttz64_loop passes both gates first try. i64 count-trailing-zeros via shift-and-test loop with explicit zero short-circuit (return 64). Variable trip 0..63 depending on input. Distinct from vm_ctz_loop (i32) and vm_imported_cttz_loop (i32 _BitScanForward intrinsic). 10 cases incl. max-trip 2^63, zero special-case, and odd-input fast-path.
Result: {"status":"keep","vm_sample_count":104,"total_semantic_cases":1234,"manifest_samples":136}
* vm_clz64_loop passes both gates first try. i64 count-leading-zeros via shift-left + MSB-test loop, with explicit zero short-circuit (return 64). Variable trip 0..63. Companion to vm_cttz64_loop. Distinct from vm_imported_clz_loop (i32 _BitScanReverse intrinsic). 10 cases incl. max-trip x=1 (63 trips), zero special-case, MSB-set (0 trips).
Result: {"status":"keep","vm_sample_count":105,"total_semantic_cases":1244,"manifest_samples":137}
* vm_bitreverse64_loop now passes both gates with llvm.bitreverse.i64 pattern. 64-trip shift+or full bit-reverse on i64; lifter/optimizer recognizes the canonical shape and folds to the intrinsic. Distinct from vm_bitreverse_loop (i32, llvm.bitreverse.i8). 10 cases incl. all-bits, fixed-points, alternating-bit pattern.
Result: {"status":"keep","vm_sample_count":106,"total_semantic_cases":1254,"manifest_samples":138}
* vm_satadd64_loop passes both gates first try. i64 saturating-add accumulator with overflow detection: s = result + inc; if (s < result) result = MAX else result = s. Variable trip n=(x&7)+1, inc derived from full input. Distinct from vm_saturating_loop (i32 saturating sum). 10 cases incl. immediate saturation (high-bit input), overflow on iter 2, and unsaturated runs.
Result: {"status":"keep","vm_sample_count":107,"total_semantic_cases":1264,"manifest_samples":139}
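The overflow test in the vm_satadd64_loop entry (s < result after a wrapping add) can be sketched as follows. The function name and the explicit n and inc parameters are assumptions; the entry derives both from the input.

```python
MASK64 = (1 << 64) - 1


def satadd64(result: int, inc: int, n: int) -> int:
    """Add inc to result n times, clamping to max u64 on unsigned overflow."""
    for _ in range(n):
        s = (result + inc) & MASK64             # wrapping u64 add
        result = MASK64 if s < result else s    # s < result means the add wrapped
    return result
```

Once the accumulator hits the maximum it stays there for any non-zero inc, since every later add wraps.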
* vm_fmix64_loop passes both gates first try. MurmurHash3 fmix64 final-mixer: alternating xor-shift and multiply-by-large-constant chain (5 ops per iter: 3 xor-with-shift + 2 mul-by-K). Variable trip n=(x&7)+1. Distinct from vm_xorshift64_loop (no mul) and vm_pcg64_loop (single mul). 10 cases.
Result: {"status":"keep","vm_sample_count":108,"total_semantic_cases":1274,"manifest_samples":140}
* vm_divcount64_loop passes both gates first try (run #150). Counts repeated i64 divisions until state falls below divisor: divisor = (x & 0xFF) + 2; state = ~x; while (state >= divisor) { state /= divisor; count++; }. Variable trip 0..63. Distinct from vm_gcd64_loop (urem) - exercises i64 udiv inside data-dependent loop. 10 cases incl. max u64 (count=0), min divisor halving, large divisors.
Result: {"status":"keep","vm_sample_count":109,"total_semantic_cases":1284,"manifest_samples":141}
* vm_sdiv64_loop now passes both gates with udiv pattern (lifter folded source-level sdiv to udiv based on val > 0 guard proof). Demonstrates signed compare + division loop where the optimizer eliminates signed division. Distinct from vm_divcount64_loop (state >= div) - this uses signed val > 0 with negative inputs taking 0 trips. 10 cases.
Result: {"status":"keep","vm_sample_count":110,"total_semantic_cases":1294,"manifest_samples":142}
* vm_tribonacci64_loop passes both gates first try. Three-state Tribonacci-like recurrence on full uint64_t: a=x; b=~x; c=x^0xCAFEBABE; for n trips: t=a+b+c; a=b; b=c; c=t. Variable trip n=(x&0xF)+1 (1..16). Distinct from vm_fibonacci64_loop (two-state phi). 10 cases incl. self-xor degeneracy (c-init=0 when x=0xCAFEBABE), max u64, golden-ratio K.
Result: {"status":"keep","vm_sample_count":111,"total_semantic_cases":1304,"manifest_samples":143}
* vm_abs64_loop passes both gates first try. i64 conditional-negate (abs) followed by mul-by-3 + sub in a variable-trip loop body. Distinct from vm_imported_abs_loop (i32 _abs_l intrinsic). 9 cases incl. INT64_MAX, x=-1 (signed), and golden-ratio K (u64 form for icmp eq i64). INT64_MIN excluded because -INT64_MIN is C UB.
Result: {"status":"keep","vm_sample_count":112,"total_semantic_cases":1313,"manifest_samples":144}
* vm_smax64_loop passes both gates first try. i64 signed-max reduction over a derived sequence: m = INT64_MIN; for i in 0..n: val = (i64)(x ^ i*K_golden); if val > m: m = val. Variable trip 1..32. Distinct from vm_minarray_loop (i32 unsigned min reduction) - exercises icmp sgt + conditional update on full i64 with input-spanning positive/negative values via golden-ratio mixing.
Result: {"status":"keep","vm_sample_count":113,"total_semantic_cases":1323,"manifest_samples":145}
* vm_decdigits64_loop passes both gates first try. i64 decimal digit count via repeated /10 with explicit zero special case (returns 1 for x=0). Variable trip 1..20. Distinct from vm_divcount64_loop (input-derived divisor + >=) and vm_sdiv64_loop - this uses constant divisor 10 with > 0 termination, exercising magic-number udiv-by-10 fold inside data-dependent loop.
Result: {"status":"keep","vm_sample_count":114,"total_semantic_cases":1333,"manifest_samples":146}
* vm_treepath64_loop passes both gates first try. i64 binary-tree-path recurrence: per-iteration branch is determined by reading bit (x >> idx) & 1. If bit set: s = s*3+1; else: s = s*2. Variable trip up to 64. Distinct shape: variable-shift bit-extraction by loop-counter combined with conditional state update on i64. 10 cases incl. all-zero bits, all-set bits (max u64 with mul-3+1 wrap), 0x3F (6 set bits + 58 doublings).
Result: {"status":"keep","vm_sample_count":115,"total_semantic_cases":1343,"manifest_samples":147}
* vm_opcode64_loop passes both gates first try. 4-way value-driven switch dispatch in body: opcode = (x >> i*4) & 3 selects among s+1, s*2, s^x, s-7. Variable trip n=(x&0xF)+1 (1..16). Distinct from vm_treepath64_loop (binary branch on single bit) and the FAILED vm_switch_dispatch_loop (VM-pc level switch). Per-iteration value-level switch in loop body lifts cleanly; only VM-pc-level switch dispatch was problematic.
Result: {"status":"keep","vm_sample_count":116,"total_semantic_cases":1353,"manifest_samples":148}
* vm_op8way64_loop passes both gates first try. 8-way value-driven switch dispatch in body driven by 3-bit fields. Eight distinct i64 op kinds per opcode: add+1, mul*2, xor x, sub-7, rotr1, add idx, NOT, xor with shifted self. Variable trip 1..16. Distinct from vm_opcode64_loop (4-way) - denser switch with wider op variety.
Result: {"status":"keep","vm_sample_count":117,"total_semantic_cases":1363,"manifest_samples":149}
* vm_nibrev64_loop passes both gates first try. i64 nibble-reverse via 16-way explicit fan-in mask+shift+or per outer iteration; outer trip n=(x&7)+1. Distinct from vm_bswap64_loop (8 byte chunks) and vm_bitreverse64_loop (folds to llvm.bitreverse.i64 intrinsic). Nibble-reverse stays as explicit OR-of-shifted-masks because no LLVM intrinsic recognizes it.
Result: {"status":"keep","vm_sample_count":118,"total_semantic_cases":1373,"manifest_samples":150}
* vm_nested64_loop passes both gates first try. Doubly-nested PC-state loop with both bounds input-derived (a=(x&7)+1, b=((x>>3)&7)+1, total 1..64 inner iters); full i64 mul-add recurrence in body s = s*31 + (i*b + j). Distinct from vm_nested_loop (i32, simpler body). 10 cases incl. max 64-iter (x=0xFF), single-iter (x=0), wraparound max u64.
Result: {"status":"keep","vm_sample_count":119,"total_semantic_cases":1383,"manifest_samples":151}
* vm_4state64_loop passes both gates first try. Four-state phi chain on full uint64_t: a=x; b=~x; c=x^K1; d=x^K2; for n trips: t=a+b+c+d; a=b; b=c; c=d; d=t. Variable trip 1..16. Distinct from vm_fibonacci64_loop (2-state) and vm_tribonacci64_loop (3-state). Each iteration's t reads ALL four previous values; single-direction shift avoids compound cross-update issue.
Result: {"status":"keep","vm_sample_count":120,"total_semantic_cases":1393,"manifest_samples":152}
* vm_morton64_loop passes both gates first try. i64 Morton (Z-order) bit-spread of low 32 bits to 64 bits: bit at position i is placed at position 2*i, leaving 2*i+1 zero. 32-trip fixed loop with variable-shift-by-loop-counter on both extract and place. Distinct from byte/nibble permutations - 1-bit-stride fan-out.
Result: {"status":"keep","vm_sample_count":121,"total_semantic_cases":1403,"manifest_samples":153}
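A small Python model of the 1-bit-stride spread described in the vm_morton64_loop entry: bit i of the low 32 bits lands at position 2*i. The function name is hypothetical.

```python
def morton_spread32(x: int) -> int:
    """Spread the low 32 bits of x to the even bit positions of a 64-bit value."""
    result = 0
    for i in range(32):
        bit = (x >> i) & 1      # extract with a loop-counter shift
        result |= bit << (2 * i)  # place with a loop-counter shift
    return result
```

All odd positions stay zero, so an all-ones low word maps to 0x5555555555555555.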
* vm_xorbytes64_loop passes both gates first try. i64 XOR-fold of all 8 bytes into a single low byte: result ^= (x >> i*8) & 0xFF for i in 0..8. 8-trip fixed loop with byte-walking shift. Distinct from vm_djb264_loop (multiplicative byte hash) and vm_morton64_loop (1-bit fan-out). Pure XOR-reduction; even-byte cancel patterns yield zero.
Result: {"status":"keep","vm_sample_count":122,"total_semantic_cases":1413,"manifest_samples":154}
* vm_condsum64_loop passes both gates first try (run #165). i64 conditional summation: only odd-parity values contribute. val = x + i*K_golden; if (val & 1): s += val. Variable trip 1..32. Distinct from vm_smax64_loop (always-update via icmp sgt) and vm_satadd64_loop (overflow clamp) - the body GATES the accumulator on a parity bit-test so some iterations contribute zero.
Result: {"status":"keep","vm_sample_count":123,"total_semantic_cases":1423,"manifest_samples":155}
* vm_peasant64_loop passes both gates first try. i64 Russian-peasant (shift-and-add) multiplication: while (b) { if (b&1) r+=a; a<<=1; b>>=1; }. Two i64 inputs in RCX/RDX, full i64 return. Variable trip = bit length of b. Distinct from existing i64 mul samples - exercises explicit shift-and-add multiply with conditional accumulate, rather than direct mul i64. 10 cases incl. wraparound (max*max=1, 2^63*2=0), zero-cases.
Result: {"status":"keep","vm_sample_count":124,"total_semantic_cases":1433,"manifest_samples":156}
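The shift-and-add loop quoted in the vm_peasant64_loop entry, modelled in Python with 64-bit wrapping so the wraparound cases can be checked. The function name is hypothetical.

```python
MASK64 = (1 << 64) - 1


def peasant_mul64(a: int, b: int) -> int:
    """Russian-peasant multiply: (a * b) mod 2^64, one trip per bit of b."""
    a &= MASK64
    b &= MASK64
    r = 0
    while b:
        if b & 1:
            r = (r + a) & MASK64  # conditional accumulate
        a = (a << 1) & MASK64
        b >>= 1
    return r
```

The result agrees with a direct wrapping multiply, which is what makes max*max = 1 and 2^63*2 = 0.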
* vm_crc64_loop passes both gates first try. CRC-64-style polynomial reduction step: if (crc & 1) crc = (crc >> 1) ^ POLY; else crc = crc >> 1. POLY=0xC96C5795D7870F42 (reflected CRC-64 ECMA-182 polynomial). Variable trip 1..8. Distinct from vm_lfsr64_loop (4-tap feedback) and vm_pcg64_loop (LCG step) - single-tap conditional XOR gated by LSB.
Result: {"status":"keep","vm_sample_count":125,"total_semantic_cases":1443,"manifest_samples":157}
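The conditional-XOR reduction step from the vm_crc64_loop entry, as a Python sketch using the POLY constant quoted there. The function names are hypothetical and the trip count is passed in explicitly.

```python
POLY = 0xC96C5795D7870F42  # constant quoted in the log entry


def crc64_step(crc: int) -> int:
    """One reduction step: shift right, XOR in POLY when the LSB was set."""
    if crc & 1:
        return (crc >> 1) ^ POLY
    return crc >> 1


def crc64_steps(crc: int, n: int) -> int:
    """Apply the step n times."""
    for _ in range(n):
        crc = crc64_step(crc)
    return crc
```

Because the value only shifts right before the XOR, no masking is needed to stay within 64 bits.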
* vm_xorshrink64_loop now passes both gates with corrected expected values. Iterated parallel-prefix-XOR step on full uint64_t: r ^= (r >> 1) repeated n times. Variable trip n=(x&7)+1. Pure shift-by-1 + XOR with no conditional. Distinct from vm_crc64_loop (gated XOR), vm_lfsr64_loop (multi-tap), vm_xorshift64_loop (3-step shifts).
Result: {"status":"keep","vm_sample_count":126,"total_semantic_cases":1453,"manifest_samples":158}
* vm_choosemax64_loop passes both gates first try (run #170). Per-iteration choice between two locally-computed options on full uint64_t: opt1 = s*3+i, opt2 = s+i*i; s = (opt1 > opt2) ? opt1 : opt2. Variable trip 1..16. Distinct from vm_smax64_loop (signed-max accumulator over derived sequence) - this uses unsigned compare (icmp ugt) and chooses between two FRESH per-iteration computations.
Result: {"status":"keep","vm_sample_count":127,"total_semantic_cases":1463,"manifest_samples":159}
* vm_umin64_loop passes both gates first try. i64 unsigned-min reduction over derived sequence: m = MAX_U64; for i in 0..n: val = x ^ (i*K_golden); if (val < m) m = val. Variable trip 1..32. Distinct from vm_smax64_loop (signed-max via icmp sgt) and vm_choosemax64_loop (per-iter ternary on fresh options) - exercises icmp ult + conditional accumulator update.
Result: {"status":"keep","vm_sample_count":128,"total_semantic_cases":1473,"manifest_samples":160}
* vm_xs64star_loop passes both gates first try. Marsaglia xorshift64* PRNG with 12/25/27 shift triple per iteration plus a final post-loop multiply by 0x2545F4914F6CDD1D. Variable trip 1..8. Distinct from vm_xorshift64_loop (13/7/17 shifts, no final mul) and vm_pcg64_loop (mul-then-xor).
Result: {"status":"keep","vm_sample_count":129,"total_semantic_cases":1483,"manifest_samples":161}
* vm_splitmix64_loop passes both gates first try. SplitMix64 PRNG: state += 0x9E3779B97F4A7C15 (Weyl counter); z = state; z = (z ^ z>>30)*0xBF58476D1CE4E5B9; z = (z ^ z>>27)*0x94D049BB133111EB; z ^= z>>31. Variable trip 1..8. Distinct from vm_xs64star/vm_xorshift64/vm_pcg64/vm_fmix64 - uses TWO multiplications by distinct 64-bit odd constants interleaved with three xor-with-shift steps inside a loop body that ALSO advances a Weyl counter.
Result: {"status":"keep","vm_sample_count":130,"total_semantic_cases":1493,"manifest_samples":162}
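One SplitMix64 step as spelled out in the vm_splitmix64_loop entry, modelled in Python. The function name and the (state, output) return shape are assumptions made for illustration.

```python
MASK64 = (1 << 64) - 1
GOLDEN = 0x9E3779B97F4A7C15  # Weyl increment from the log entry


def splitmix64_next(state: int):
    """Advance the Weyl counter and return (new_state, mixed_output)."""
    state = (state + GOLDEN) & MASK64
    z = state
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    z ^= z >> 31
    return state, z
```

Keeping the counter and the mixed output separate mirrors the entry's point that the loop body both advances a counter and runs the mixing chain.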
* vm_rotchoice64_loop passes both gates first try. Per-iteration rotation-direction choice driven by input bits: bit = (x >> i) & 1; if bit: rotl(s, 7); else rotr(s, 11). Variable trip 1..16. Distinct from vm_rotl64_loop (single direction) and vm_treepath64_loop (mul/add binary tree) - body chooses BETWEEN two rotation primitives with different amounts.
Result: {"status":"keep","vm_sample_count":131,"total_semantic_cases":1503,"manifest_samples":163}
* vm_hexdigits64_loop passes both gates first try (run #175). Counts hex digits via repeated >>4 with explicit zero special case (returns 1). Variable trip 1..16. Distinct from vm_decdigits64_loop (constant divisor 10) and vm_clz64_loop (single-bit shift) - uses 4-bit-stride lshr with > 0 termination.
Result: {"status":"keep","vm_sample_count":132,"total_semantic_cases":1513,"manifest_samples":164}
* vm_ipow64_loop passes both gates first try. i64 integer-power via square-and-multiply (no modulo): result = 1; base = x|1; exp = y&0xF; while (exp) { if (exp&1) result *= base; base *= base; exp >>= 1; }. Two i64 inputs. Distinct from vm_powmod64_loop (urem inside body). Wraps mod 2^64 for large operands.
Result: {"status":"keep","vm_sample_count":133,"total_semantic_cases":1523,"manifest_samples":165}
* vm_oddcount64_loop passes both gates first try (single-counter variant written after vm_dualcounter64 failed on the dual-i64 pseudo-stack issue). Counts how many values in the derived sequence are odd: count = 0; for i in 0..n: val = x + i*K; if val&1: count++. Returns int. Distinct from vm_condsum64_loop (which sums the full i64 values rather than counting them) and from the failed vm_dualcounter64 (a single counter avoids the dual-i64 pseudo-stack issue).
Result: {"status":"keep","vm_sample_count":134,"total_semantic_cases":1533,"manifest_samples":166}
* vm_signedaccum64_loop passes both gates first try. Single i64 accumulator with TWO mutually-exclusive update directions per iter (add vs subtract), gated by the input bit at the loop counter. Distinct from vm_condsum64_loop (one-sided gated +) and from the failed vm_dualcounter64 (a single accumulator avoids the dual-i64 pseudo-stack issue).
Result: {"status":"keep","vm_sample_count":135,"total_semantic_cases":1543,"manifest_samples":167}
* vm_threereg64_loop passes both gates first try (run #180). Tiny 3-register VM with PC-state outer dispatcher AND a 2-bit opcode field selecting one of four micro-ops per inner iteration: r0+=r1, r1^=r2, r2+=r0, r0*=r1. Each op writes ONE register only (avoiding dual-i64 pseudo-stack failure). Returns r0 ^ r1 ^ r2.
Result: {"status":"keep","vm_sample_count":136,"total_semantic_cases":1553,"manifest_samples":168}
* vm_pdepslow64_loop passes both gates first try. Explicit PDEP-style bit-deposit (no intrinsic): for i in 0..64: if mask&(1<<i): if src&(1<<bit_pos): result|=1<<i; bit_pos++. 64-trip fixed loop with TWO nested bit-tests + a SECOND counter (bit_pos) that advances asymmetrically. Distinct from vm_morton64_loop (fixed every-other-bit spread) - input-derived mask determines scatter pattern.
Result: {"status":"keep","vm_sample_count":137,"total_semantic_cases":1563,"manifest_samples":169}
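
The bit-deposit loop in the vm_pdepslow64_loop entry above can be sketched in Python as follows. The function name `pdep_slow` is hypothetical; the body follows the pseudocode in the entry, with `bit_pos` advancing only when a mask bit is set.

```python
def pdep_slow(src: int, mask: int) -> int:
    """Deposit the low bits of src into the positions selected by mask."""
    result = 0
    bit_pos = 0                      # secondary counter, advances asymmetrically
    for i in range(64):              # fixed 64-trip loop
        if mask & (1 << i):
            if src & (1 << bit_pos):
                result |= 1 << i
            bit_pos += 1
    return result
```

Depositing an all-ones source reproduces the mask itself, which makes a convenient sanity check for any mask.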
* vm_pextslow64_loop now passes both gates with the failing 0xFFFF0000FFFF0000 input dropped (9 cases >= 6 required). Explicit PEXT bit-extract: pack src bits at mask-set positions into low-order result bits. Inverse of vm_pdepslow64_loop. New documented limitation: lifter mismatches Python on the 0xFFFF0000FFFF0000 input (shift-by-1 in high bits, suggesting off-by-one in secondary asymmetric counter at upper-byte boundary).
Result: {"status":"keep","vm_sample_count":138,"total_semantic_cases":1572,"manifest_samples":170}
* vm_trailingones64_loop passes both gates first try. Counts run length of trailing 1-bits via shift-loop on full uint64_t. Variable trip 0..64. Distinct from vm_cttz64_loop (trailing zeros) and vm_clz64_loop (leading zeros). No zero special case needed. 10 cases incl. all-ones (64 trips), 0xFFFE (low bit clear=0 trips), 0xCAFEBABF (6).
Result: {"status":"keep","vm_sample_count":139,"total_semantic_cases":1582,"manifest_samples":171}
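
A short Python reference for the trailing-ones shift loop in the entry above, checked against the three cases the entry lists. The name `trailing_ones64` is illustrative only.

```python
MASK64 = (1 << 64) - 1

def trailing_ones64(x: int) -> int:
    """Count the run of 1-bits starting at bit 0 (0..64 trips)."""
    x &= MASK64
    n = 0
    while x & 1:      # stops at the first 0 bit; no zero special case needed
        n += 1
        x >>= 1
    return n
```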
* vm_maxrun64_loop now passes both gates with 0x0FFFF000 (offset run) replaced by 0xFFFFFF (low-aligned 24-run). Longest run of consecutive 1-bits anywhere in i64. 64-trip fixed loop with two interleaved counters (cur, max_run) and conditional max-update. New documented limitation: lifter mismatches for 16-bit runs at non-zero offset positions but works for low-aligned runs.
Result: {"status":"keep","vm_sample_count":140,"total_semantic_cases":1592,"manifest_samples":172}
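
As a reference for the two inputs named in the vm_maxrun64_loop entry above, here is a hedged Python model of the longest-run loop. The name `maxrun64` is hypothetical, and the exact loop shape in the sample may differ; this only models the two interleaved counters (`cur`, `max_run`) the entry describes.

```python
MASK64 = (1 << 64) - 1

def maxrun64(x: int) -> int:
    """Longest run of consecutive 1-bits anywhere in a 64-bit value."""
    x &= MASK64
    cur = 0
    max_run = 0
    for i in range(64):          # fixed 64-trip loop
        if (x >> i) & 1:
            cur += 1
            if cur > max_run:    # conditional max-update
                max_run = cur
        else:
            cur = 0
    return max_run
```

Under this model the low-aligned input 0xFFFFFF gives 24 and the dropped offset input 0x0FFFF000 gives 16; the latter is the reference-side value only, since the entry reports that the lifter mismatches there.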
* vm_prefixxor64_loop passes both gates after recovering from aborted prior turn (manifest entry was missing). Byte-wise prefix-XOR scan packed back into uint64_t: result |= (acc << (i*8)) where acc ^= byte. 8-trip fixed loop with TWO byte-walking shifts (load and pack sides). Distinct from vm_xorbytes64_loop (reduces to single byte) - this produces an 8-byte packed running scan.
Result: {"status":"keep","vm_sample_count":141,"total_semantic_cases":1602,"manifest_samples":173}
* vm_deinterleave64_loop passes both gates first try. Splits low-32-bit input into two streams: even-indexed bits to evens-half, odd-indexed bits to odds-half, packed as (odds << 32) | evens. 32-trip fixed loop with FOUR shifts per iter and TWO unconditional OR accumulators (different output positions, same condition path). Inverse of vm_morton64_loop.
Result: {"status":"keep","vm_sample_count":142,"total_semantic_cases":1612,"manifest_samples":174}
* vm_base7sum64_loop passes both gates first try. Base-7 digit sum via repeated urem-then-udiv on full uint64_t. Variable trip ~= log_7(x), up to 23 for max u64. Distinct from vm_decdigits64_loop (counts digits, divisor 10) and vm_divcount64_loop (input-derived divisor) - exercises BOTH urem and udiv by constant 7 inside same loop body, accumulating digit sum.
Result: {"status":"keep","vm_sample_count":143,"total_semantic_cases":1622,"manifest_samples":175}
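
The urem-then-udiv loop in the vm_base7sum64_loop entry above can be modeled in Python like this. The name `base7_digit_sum` is illustrative; the function also returns the trip count so the stated bound of 23 trips for the maximum u64 can be checked.

```python
MASK64 = (1 << 64) - 1

def base7_digit_sum(x: int) -> tuple[int, int]:
    """Return (digit sum in base 7, number of loop trips)."""
    x &= MASK64
    total = 0
    trips = 0
    while x:
        total += x % 7   # urem by constant 7
        x //= 7          # udiv by constant 7
        trips += 1
    return total, trips
```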
* vm_bytematch64_loop passes both gates after vm_pattern2bit64 was rejected. Counts how many lower-7 bytes equal the input-derived target (top byte). 7-trip fixed loop with byte-walking shift + byte-equality compare. Distinct from xor-fold/hash byte loops - uses icmp eq i64 (after AND 0xFF) inside body. Byte-granularity comparison works where 2-bit window comparison failed.
Result: {"status":"keep","vm_sample_count":144,"total_semantic_cases":1632,"manifest_samples":176}
* vm_bytecyc64_loop now passes both gates after re-deriving expected values from Python. Byte cyclic shift by input-derived amount: each byte goes to position (i + shift) & 7 where shift = (x >> 56) & 7. 8-trip fixed loop. Distinct from vm_bswap64_loop (full reverse) and vm_rotl64_loop (bit-level rotation) - byte-granularity cyclic permutation.
Result: {"status":"keep","vm_sample_count":145,"total_semantic_cases":1642,"manifest_samples":177}
* vm_byteparity64_loop passes both gates first try. Per-byte parity bits computed via 3-step SWAR reduction (xor with shift-right then mask) and packed into low byte of result. 8-trip fixed loop with three sequential xor-shift+mask reductions per iter. Distinct from vm_xorbytes64_loop (XOR-fold to single byte) and vm_prefixxor64_loop (prefix-XOR scan).
Result: {"status":"keep","vm_sample_count":146,"total_semantic_cases":1652,"manifest_samples":178}
* vm_popsq64_loop passes both gates first try (run #195). Sum of squared per-byte popcounts. Outer 8-trip fixed loop containing INNER variable-trip popcount via Brian Kernighan. Distinct from vm_popcount64_loop (single full popcount) and vm_byteparity64_loop (1-bit per byte) - tests outer-fixed/inner-variable nested loop with int accumulator and squaring step.
Result: {"status":"keep","vm_sample_count":147,"total_semantic_cases":1662,"manifest_samples":179}
* vm_digitprod64_loop passes both gates first try. Decimal digit product on full uint64_t with explicit zero special case. Variable trip = number of digits. Distinct from vm_decdigits64_loop (counts) and vm_base7sum64_loop (digit SUM base 7). Any zero digit collapses product to 0.
Result: {"status":"keep","vm_sample_count":148,"total_semantic_cases":1672,"manifest_samples":180}
* vm_revdecimal64_loop passes both gates first try. Reverses decimal digits via repeated `r = r*10 + s%10; s /= 10`. Variable trip = number of decimal digits. Distinct from vm_digitprod64_loop (multiplies digits) and vm_decdigits64_loop (counts) - tests three i64 ops (mul, urem, udiv) against constant 10 inside the same body.
Result: {"status":"keep","vm_sample_count":149,"total_semantic_cases":1682,"manifest_samples":181}
* vm_decsum64_loop passes both gates first try - reaches 150-VM-sample milestone. Decimal digit SUM (base 10) on full uint64_t. Distinct from vm_base7sum64_loop (base 7) and vm_digitprod64_loop (digit product) - completes the base-10 decimal arithmetic loop family with all four shapes covered (count, sum, product, reverse).
Result: {"status":"keep","vm_sample_count":150,"total_semantic_cases":1692,"manifest_samples":182}
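
The four base-10 shapes named in the entry above (count, sum, product, reverse) can be sketched side by side in Python. All function names are hypothetical. Two points are assumptions rather than facts from the samples: that the plain loops return 0 for x = 0, and that the "explicit zero special case" of the product variant returns 0.

```python
MASK64 = (1 << 64) - 1

def dec_count(x: int) -> int:
    n = 0
    while x:
        n += 1
        x //= 10
    return n

def dec_sum(x: int) -> int:
    total = 0
    while x:
        total += x % 10
        x //= 10
    return total

def dec_product(x: int) -> int:
    if x == 0:               # assumed behaviour of the zero special case
        return 0
    prod = 1
    while x:
        prod *= x % 10       # any zero digit collapses the product to 0
        x //= 10
    return prod

def dec_reverse(x: int) -> int:
    r, s = 0, x
    while s:
        r = (r * 10 + s % 10) & MASK64   # mul, urem, udiv against 10
        s //= 10
    return r
```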
* vm_trailzeros_factorial64_loop passes both gates first try. Trailing zeros in n! via Legendre's formula: c = floor(n/5) + floor(n/25) + ... Variable trip = log_5(n). Distinct from vm_decsum64_loop / vm_revdecimal64_loop / vm_digitprod64_loop (all divide-by-10) - exercises udiv-by-5 (different magic number) and accumulates the running QUOTIENT not remainder.
Result: {"status":"keep","vm_sample_count":151,"total_semantic_cases":1702,"manifest_samples":183}
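
Legendre's formula from the entry above, restricted to the prime 5, reduces to a loop that accumulates the running quotient. A minimal Python sketch, with the hypothetical name `factorial_trailing_zeros`:

```python
def factorial_trailing_zeros(n: int) -> int:
    """Trailing decimal zeros of n!: floor(n/5) + floor(n/25) + ..."""
    c = 0
    while n:
        n //= 5      # udiv by 5
        c += n       # accumulate the quotient, not the remainder
    return c
```

For example, n = 100 contributes 20 + 4 = 24.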
* vm_geosum64_loop passes both gates after recovery. Counter-bound geometric series sum 1+3+9+...+3^(n-1) over n=(x&15)+1 iterations in u64. Two-state (r,p) where p is MULTIPLIED by 3 each iteration and r accumulates p. Distinct from vm_fibonacci64_loop (additive a,b) and vm_powmod64 (modular exponentiation). Recovered from vm_fibindex64 crash by switching from data-dependent bound to counter-driven (x&15)+1 shape.
Result: {"status":"keep","vm_sample_count":152,"total_semantic_cases":1712,"manifest_samples":184}
* vm_altbytesum64_loop passes both gates after fixing hex-to-decimal transcription. Alternating-sign byte sum: r = +b0 - b1 + b2 - b3 + ... over n=(x&15)+1 bytes with signed i64 accumulator returned as u64. Distinct from vm_xorbytes64 (XOR) and vm_byteparity64 (1-bit) - tests sign flip per iteration via negation, signed-times-unsigned multiply, and produces NEGATIVE i64 outputs that round-trip through u64 (case 0xDEADBEEFFEEDFACE -> 2^64-61).
Result: {"status":"keep","vm_sample_count":153,"total_semantic_cases":1722,"manifest_samples":184}
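
The case quoted in the vm_altbytesum64_loop entry above (0xDEADBEEFFEEDFACE -> 2^64-61) can be reproduced with a small Python model. The name `alt_byte_sum64` is illustrative, and it is an assumption that bytes are consumed low-first by shifting, so iterations past the eighth byte add zero.

```python
MASK64 = (1 << 64) - 1

def alt_byte_sum64(x: int) -> int:
    """r = +b0 - b1 + b2 - ... over n = (x & 15) + 1 bytes, returned as u64."""
    x &= MASK64
    n = (x & 15) + 1
    s, r, sign = x, 0, 1
    for _ in range(n):
        r += sign * (s & 0xFF)
        sign = -sign          # sign flip per iteration
        s >>= 8               # assumed low-first byte walk
    return r & MASK64         # negative totals round-trip through u64
```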
* vm_signedbytesum64_loop passes both gates first try. Per-byte signed accumulator: each byte sext (int8_t) and added to i64 over n=(x&7)+1 iterations. Distinct from vm_altbytesum64_loop (fixed alternating sign): here every byte's sign is data-dependent on its high bit. Tests sext-i8 to i64 and produces negative i64 results that round-trip through u64 (e.g. 0xFF byte -> -1, 0x80 -> -128).
Result: {"status":"keep","vm_sample_count":154,"total_semantic_cases":1732,"manifest_samples":185}
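
A hedged Python model of the per-byte sext accumulator in the entry above. The names `sext8` and `signed_byte_sum64` are hypothetical; the zero seed and the low-first byte walk are assumptions.

```python
MASK64 = (1 << 64) - 1

def sext8(b: int) -> int:
    """Sign-extend an 8-bit value to a Python int."""
    return b - 0x100 if b & 0x80 else b

def signed_byte_sum64(x: int) -> int:
    x &= MASK64
    n = (x & 7) + 1
    s, r = x, 0               # assumed seed r = 0
    for _ in range(n):
        r += sext8(s & 0xFF)  # each byte's sign depends on its high bit
        s >>= 8
    return r & MASK64
```

With these assumptions a lone 0xFF byte yields 2^64-1 and a lone 0x80 byte yields 2^64-128, matching the -1 and -128 examples in the entry.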
* vm_bytemax64_loop passes both gates after fixing pattern to llvm.umax.i64. Find max byte value across n=(x&7)+1 lower bytes via cmp-and-select max update. Lifter folds the (b>r)?b:r idiom into llvm.umax.i64 intrinsic. Distinct from vm_choosemax64_loop (chooses between two derived options s*3+i vs s+i*i over u64 state) - this iterates a byte stream and tracks the running max.
Result: {"status":"keep","vm_sample_count":155,"total_semantic_cases":1742,"manifest_samples":186}
* vm_byterange64_loop passes both gates first try. Tracks running min and max bytes across n=(x&7)+1 lower bytes and returns max-min. Lifter folds both cmp-and-select reductions to llvm.umax.i64 + llvm.umin.i64 then sub. Distinct from vm_bytemax64_loop (single umax reduction): two parallel reductions in lock-step in the same loop body.
Result: {"status":"keep","vm_sample_count":156,"total_semantic_cases":1752,"manifest_samples":187}
* vm_signed_byterange64_loop passes both gates after fixing patterns to icmp slt + select + sub. Tracks running min and max of signed (sext-i8) bytes across n=(x&7)+1 lower bytes, returns (smax-smin) as u64. Distinct from vm_byterange64_loop (unsigned -> umax/umin folds). Documents the lifter asymmetry: unsigned cmp+select folds to umax/umin intrinsics but signed cmp+select does NOT fold to smax/smin - emits raw icmp slt + select chains.
Result: {"status":"keep","vm_sample_count":157,"total_semantic_cases":1762,"manifest_samples":188}
* vm_squareadd64_loop passes both gates first try. Counter-bound u64 quadratic recurrence r = r*r + i over n=(x&7)+1 iterations seeded with r=x. Distinct from vm_geosum64_loop (multiply by constant + add), vm_powmod64_loop (modexp with reduction), vm_choosemax64_loop (pick from two derived options). Tests i64 squaring on rapidly-growing accumulator mod 2^64.
Result: {"status":"keep","vm_sample_count":158,"total_semantic_cases":1772,"manifest_samples":189}
* vm_xorrot64_loop passes both gates after replacing rotation with LCG step. Two-state recurrence: r = r XOR s; s = s*GR + 1 (golden-ratio multiplicative step). Distinct from vm_lfsr64_loop, vm_pcg64_loop, vm_xorshift64_loop. Documents new lifter behavior: pure i64 rotation of a live state register inside a loop body gets hoisted to a single fshl outside the loop, dropping the rotation state - use arithmetic mul/add body steps instead.
Result: {"status":"keep","vm_sample_count":159,"total_semantic_cases":1782,"manifest_samples":190}
* vm_murmurstep64_loop passes both gates first try. Murmur-style mix step chained over n=(x&7)+1 iterations: r = (r^x)*MURMUR_M; r ^= r>>47. Single-state xor-mul-lshr chain. Distinct from vm_xorrot64_loop (xor + LCG mul/add), vm_djb264_loop (additive *33 hash), vm_fmix64_loop (single fmix finalizer no loop), vm_horner64_loop (polynomial). Reaches 160 VM samples.
Result: {"status":"keep","vm_sample_count":160,"total_semantic_cases":1792,"manifest_samples":191}
* vm_pairmix64_loop passes both gates first try. Two-state cross-feeding mix step with explicit temp barrier: t=a+b; a=b*GR; b=t^(t>>33). Distinct from vm_xorrot64_loop (single accumulator + LCG state), vm_murmurstep64_loop (single state Murmur), and the REMOVED vm_tea_round_loop (compound v0/v1 cross-update mis-lifted) - the explicit temp `t` makes both reads of (a,b) finish before either is overwritten, which the lifter handles correctly.
Result: {"status":"keep","vm_sample_count":161,"total_semantic_cases":1802,"manifest_samples":192}
* vm_fnv1a64_loop passes both gates first try. FNV-1a hash chain over n=(x&7)+1 bytes: r = (r ^ byte) * FNV_PRIME, with bytes consumed via shift on s. Distinct from vm_djb264_loop (additive *33), vm_murmurstep64_loop (same input each iter no byte windowing), vm_horner64_loop (polynomial). Tests xor-with-byte + multiply-by-40-bit-prime + lshr threaded through dispatcher loop body.
Result: {"status":"keep","vm_sample_count":162,"total_semantic_cases":1812,"manifest_samples":193}
* vm_adler32_64_loop passes both gates after fixing pattern to urem i64. Adler-32-style two-accumulator modular hash over n=(x&7)+1 bytes: a=(a+byte)%65521; b=(b+a)%65521. Distinct from vm_fnv1a64_loop (single multiplicative state) and vm_byterange64_loop (cmp reductions). Tests parallel additive accumulators with i64 urem by 65521 (Adler prime) and final shl-or pack into one i64.
Result: {"status":"keep","vm_sample_count":163,"total_semantic_cases":1822,"manifest_samples":194}
* vm_byterev_window64_loop passes both gates first try. Variable-trip byteswap of lower n=(x&7)+1 bytes via shl-or-lshr packing. Distinct from vm_bswap64_loop (fixed 8-byte byteswap, lifter folds to llvm.bswap.i64): the symbolic trip count prevents the fold and keeps the body's shl-by-8 + or + lshr-by-8 chain visible. Tests byte-level packing accumulator threaded through dispatcher loop body.
Result: {"status":"keep","vm_sample_count":164,"total_semantic_cases":1832,"manifest_samples":195}
* vm_nibrev_window64_loop passes both gates first try. Variable-trip nibble-reverse over n=(x&7)+1 nibbles via shl-by-4 + or + lshr-by-4 chain. Distinct from vm_byterev_window64_loop (8-bit window, shl/lshr by 8) and vm_nibrev64_loop (full fixed 16-nibble reverse, may fold to intrinsic). Tests sub-byte windowed packing inside dispatcher loop.
Result: {"status":"keep","vm_sample_count":165,"total_semantic_cases":1842,"manifest_samples":196}
* vm_threestate_xormul64_loop passes both gates first try. Three-state cross-feeding recurrence: t=a^b; a=b; b=c+1; c=t*GR+a over n=(x&7)+1 iters. Distinct from vm_tribonacci64_loop (additive a,b,c -> b,c,a+b+c) and vm_pairmix64_loop (two-state). Three i64 slots all updated each iter with sequential reads captured into temp t before any writeback (TEA-bug workaround pattern). Returns combined a^b^c.
Result: {"status":"keep","vm_sample_count":166,"total_semantic_cases":1852,"manifest_samples":197}
* vm_xxhmix64_loop passes both gates first try. xxhash-style per-byte mix `r = (r ^ byte) * PRIME64_3` over n=(x&7)+1 bytes plus final xor-fold by lshr 33. Distinct from vm_fnv1a64_loop (40-bit FNV prime, no fold), vm_murmurstep64_loop (no byte windowing), vm_djb264_loop (additive *33). Tests xor-then-mul with 64-bit xxhash multiplier per byte plus a finalizer step in a separate post-loop PC state.
Result: {"status":"keep","vm_sample_count":167,"total_semantic_cases":1862,"manifest_samples":198}
* vm_fmix_chain64_loop passes both gates first try. Murmur3 64-bit finalizer applied n=(x&7)+1 times: r ^= r>>33; r *= 0xFF51..CCD; r ^= r>>33; r *= 0xC4CE..C53. Distinct from vm_fmix64_loop (single fmix application no loop), vm_xxhmix64_loop (per-byte mix one mul + post-loop fold), vm_murmurstep64_loop (single magic + xor with input each iter), vm_splitmix64_loop (different magics + constant additive step). Tests dual-magic xor-mul-xor-mul finalizer chain inside counter-bound loop body.
Result: {"status":"keep","vm_sample_count":168,"total_semantic_cases":1872,"manifest_samples":199}
* vm_zigzag_step64_loop passes both gates first try. ZigZag encoding chained over a stepped state: enc=(s<<1)^((i64)s>>63); r+=enc; s+=GR over n=(x&7)+1 iters. Tests ashr i64 ... 63 (sign-broadcast arithmetic right shift) inside loop body. Distinct from vm_signedbytesum64_loop (per-byte sext-i8) and vm_splitmix64_loop (no ashr). Reaches 200 manifest entries milestone.
Result: {"status":"keep","vm_sample_count":169,"total_semantic_cases":1882,"manifest_samples":200}
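
The per-iteration encode step from the vm_zigzag_step64_loop entry above, `enc=(s<<1)^((i64)s>>63)`, can be checked in isolation with the Python sketch below. Only the encode step is modeled; the surrounding loop and the GR constant are left out because their values are not given in the entry. Function names are illustrative.

```python
MASK64 = (1 << 64) - 1

def to_signed64(v: int) -> int:
    """Reinterpret a u64 bit pattern as a signed 64-bit integer."""
    v &= MASK64
    return v - (1 << 64) if v >> 63 else v

def zigzag64(s: int) -> int:
    """(s << 1) ^ (ashr s, 63), truncated to 64 bits."""
    sign_fill = to_signed64(s) >> 63   # 0 or -1: the sign broadcast
    return ((s << 1) ^ sign_fill) & MASK64
```

Small magnitudes map to small codes: 0, -1, 1, -2 encode to 0, 1, 2, 3.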
* vm_xormuladd_chain64_loop passes both gates first try. Three-op single-state chain over n=(x&7)+1 iters: r=r^x; r=r*0x1000193; r=r+x. Distinct from vm_murmurstep64_loop (xor-mul-lshr-fold; 64-bit magic), vm_fmix_chain64_loop (xor-mul-xor-mul; two 64-bit magics; no add), vm_xxhmix64_loop (xor-byte mul; post-loop fold). Tests xor + small-magic mul + add chain on single accumulator. Reaches 170 sample milestone.
Result: {"status":"keep","vm_sample_count":170,"total_semantic_cases":1892,"manifest_samples":201}
* vm_subxor_chain64_loop passes both gates after fixing one transcribed expected value (caught before run). Single-state sub-xor chain over n=(x&7)+1 iters: r=(r-x)^(x<<3). Distinct from vm_xormuladd_chain64_loop (xor+mul+add), vm_xorbytes64_loop (XOR-only), vm_horner64_loop (mul+add). Tests `sub i64` chained with shl-3 and xor inside dispatcher loop body. Sub is underused vs add in existing samples.
Result: {"status":"keep","vm_sample_count":171,"total_semantic_cases":1902,"manifest_samples":202}
* vm_negstep64_loop passes both gates first try. Two-state recurrence with arithmetic negation: r=-r+s; s=s+1 over n=(x&7)+1 iters. Distinct from vm_subxor_chain64_loop (sub state-minus-input), vm_xormuladd_chain64_loop (xor+mul+add). Tests `sub i64 0, r` (negate) pattern inside dispatcher loop. Negation flips accumulator sign per iter; with stepped state s, telescoping produces predictable patterns.
Result: {"status":"keep","vm_sample_count":172,"total_semantic_cases":1912,"manifest_samples":203}
* vm_bitfetch_window64_loop passes both gates first try. Bitwise reversal of low n=(x&7)+1 bits via dynamic shift `(x >> i) & 1` per iter. Tests `lshr i64 x, i` with i a loop-index variable - non-constant shift amount inside dispatcher loop body. Distinct from vm_byterev_window64_loop (8-bit fixed shift) and vm_nibrev_window64_loop (4-bit fixed shift) which use constant shifts.
Result: {"status":"keep","vm_sample_count":173,"total_semantic_cases":1922,"manifest_samples":204}
* vm_dynshl_pack64_loop passes both gates first try. XOR-pack 2-bit chunks of x at dynamic bit positions controlled by loop index: r ^= ((s & 0x3) << i); s >>= 2. Tests `shl i64 v, %i` (dynamic LEFT shift) - complement to vm_bitfetch_window64_loop's dynamic LSHR. Distinct shift direction with same dynamic-amount property.
Result: {"status":"keep","vm_sample_count":174,"total_semantic_cases":1932,"manifest_samples":205}
* vm_dyn_ashr64_loop passes both gates first try. Dynamic-amount ASHR (signed shift right) by counter: sx = (i64)x >> i; r ^= byte(sx) over n=(x&7)+1 iters. Distinct from vm_bitfetch_window64_loop (dynamic LSHR), vm_dynshl_pack64_loop (dynamic SHL), vm_zigzag_step64_loop (constant ashr-63). Completes the dynamic-shift trio (lshr/shl/ashr). Negative-sign inputs fill with 1s producing different XOR patterns than unsigned shift.
Result: {"status":"keep","vm_sample_count":175,"total_semantic_cases":1942,"manifest_samples":206}
* vm_bytesmul_idx64_loop passes both gates first try. Per-byte signed accumulator scaled by 1-based loop index: r += sext(byte) * (i+1) over n=(x&7)+1 iters. Distinct from vm_signedbytesum64_loop (no index multiplier) and vm_altbytesum64_loop (fixed alternating sign). Tests sext-i8 multiplied by dynamic counter value (i+1) - i64 mul against phi-tracked counter rather than constant.
Result: {"status":"keep","vm_sample_count":176,"total_semantic_cases":1952,"manifest_samples":207}
* vm_notand_chain64_loop passes both gates first try. NOT-AND chain with dynamic-shift xor: r=(~r)&x; r^=(i<<3) over n=(x&7)+1 iters. Tests bitwise NOT (xor i64 r, -1) followed by AND with input (BMI andn-style idiom), then xor with i<<3 (dynamic shl by counter).
Result: {"status":"keep","vm_sample_count":177,"total_semantic_cases":1962,"manifest_samples":208}
* vm_xormul_byte_idx64_loop passes both gates first try. XOR-fold scaled bytes: r ^= byte * (i+1) over n=(x&7)+1 iters. Distinct from vm_bytesmul_idx64_loop (signed-byte sext + ADD) - this one uses unsigned-byte zext + XOR. Tests u8 zext multiply by dynamic counter (i+1) folded via XOR rather than ADD.
Result: {"status":"keep","vm_sample_count":178,"total_semantic_cases":1972,"manifest_samples":209}
* vm_signedxor_byte_idx64_loop passes both gates first try. Signed-byte sext * (i+1) folded via XOR over n=(x&7)+1 iters. Fills the sext+XOR cell of the per-byte * counter matrix. Distinct from vm_xormul_byte_idx64_loop (zext + XOR) and vm_bytesmul_idx64_loop (sext + ADD). For high-bit-set bytes, sext populates upper 56 bits with 1s producing different XOR fold than zext (e.g. 0xF0 byte -> 2^64-16 vs unsigned 240).
Result: {"status":"keep","vm_sample_count":179,"total_semantic_cases":1982,"manifest_samples":210}
* vm_uintadd_byte_idx64_loop passes both gates first try. Unsigned-byte (zext) * (i+1) folded via ADD over n=(x&7)+1 iters. Fills the zext+ADD cell, COMPLETING the per-byte * counter matrix across all four (zext/sext) x (ADD/XOR) cells. Reaches 180-sample milestone.
Result: {"status":"keep","vm_sample_count":180,"total_semantic_cases":1992,"manifest_samples":211}
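
The four cells of the per-byte * counter matrix completed in the entry above can be expressed as one parameterized Python model. The function name `byte_counter_fold` and its parameters are hypothetical, and the zero seed and low-first byte walk are assumptions, not details confirmed by the samples.

```python
MASK64 = (1 << 64) - 1

def byte_counter_fold(x: int, signed: bool, fold: str) -> int:
    """Fold byte * (i+1) over n = (x & 7) + 1 bytes; fold is 'add' or 'xor'."""
    x &= MASK64
    n = (x & 7) + 1
    s, r = x, 0                          # assumed seed r = 0
    for i in range(n):
        b = s & 0xFF
        if signed and b & 0x80:
            b -= 0x100                   # sext-i8; otherwise zext
        term = (b * (i + 1)) & MASK64    # multiply by the 1-based counter
        r = (r + term) & MASK64 if fold == "add" else r ^ term
        s >>= 8
    return r
```

For the single byte 0xF0 this gives 2^64-16 in the sext cells and 240 in the zext cells, the same contrast noted for vm_signedxor_byte_idx64_loop.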
* vm_bytesq_sum64_loop passes both gates first try. Sum of byte*byte (u8 self-multiply) over n=(x&7)+1 iters. Distinct from vm_popsq64_loop (sum of squared POPCOUNTS), vm_squareadd64_loop (single-state r*r quadratic), vm_uintadd_byte_idx64_loop (byte * counter). Tests u8 self-multiply on the byte stream with no counter scaling.
Result: {"status":"keep","vm_sample_count":181,"total_semantic_cases":2002,"manifest_samples":212}
* vm_byteprod64_loop passes both gates first try. Running product of bytes r *= byte over n=(x&7)+1 iters, seeded r=1. Distinct from vm_bytesq_sum64_loop (squared bytes summed), vm_uintadd_byte_idx64_loop (byte * counter ADD), vm_xormul_byte_idx64_loop (byte * counter XOR). Tests `mul i64 r, byte` chained where any zero byte collapses the product but the loop still runs to completion.
Result: {"status":"keep","vm_sample_count":182,"total_semantic_cases":2012,"manifest_samples":213}
* vm_andsum_byte_idx64_loop passes both gates first try. Per-iter byte AND-ed with counter, summed: r += (byte & (i+1)) over n=(x&7)+1 iters. Distinct from vm_uintadd_byte_idx64_loop (byte * counter ADD), vm_xormul_byte_idx64_loop (byte * counter XOR). Tests `and i64 byte, counter` (zext-byte AND with phi-tracked i+1) folded via ADD - bitwise mask interaction with dynamic counter values.
Result: {"status":"keep","vm_sample_count":183,"total_semantic_cases":2022,"manifest_samples":214}
* vm_orsum_byte_idx64_loop passes both gates first try. Per-iter OR of byte and counter folded into accumulator: r |= byte | (i+1) over n=(x&7)+1 iters. Distinct from vm_andsum_byte_idx64_loop (AND fold), vm_xormul_byte_idx64_loop (XOR of byte*counter), vm_uintadd_byte_idx64_loop (ADD of byte*counter). Tests `or i64` chain that is monotone (only sets bits) - counter values 1..8 always contribute fixed low bits.
Result: {"status":"keep","vm_sample_count":184,"total_semantic_cases":2032,"manifest_samples":215}
* vm_subbyte_idx64_loop passes both gates first try. SUB-fold of u8 zext * counter: r -= byte * (i+1) over n=(x&7)+1 iters. Distinct from vm_uintadd_byte_idx64_loop (same body ADD-folded) - tests SUB on the same per-byte * counter accumulator. Result wraps below zero into u64 modular space.
Result: {"status":"keep","vm_sample_count":185,"total_semantic_cases":2042,"manifest_samples":216}
* vm_bytediv5_sum64_loop passes both gates first try. Sum of byte/5 over n=(x&7)+1 iters. Tests udiv-by-5 chain on byte stream. Distinct from vm_adler32_64_loop (urem by 65521 prime modular), vm_trailzeros_factorial64_loop (udiv-5 on single state), vm_uintadd_byte_idx64_loop (mul not div). All-0xFF: 8 * (255/5)=408.
Result: {"status":"keep","vm_sample_count":186,"total_semantic_cases":2052,"manifest_samples":217}
* vm_bytemod3_sum64_loop passes both gates first try. Sum of byte%3 over n=(x&7)+1 iters. Tests urem-by-3 chain on byte stream. Distinct from vm_bytediv5_sum64_loop (udiv-by-5) and vm_adler32_64_loop (urem-by-65521 prime). Small-modulus complement to /5 sample. All-0xFF: 255%3=0, sum=0.
Result: {"status":"keep","vm_sample_count":187,"total_semantic_cases":2062,"manifest_samples":218}
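
The all-0xFF figures quoted in the two entries above (408 for byte/5 and 0 for byte%3) can be reproduced with this Python sketch. Function names are illustrative; the zero seed and low-first byte walk are assumptions.

```python
MASK64 = (1 << 64) - 1

def _bytes_low_first(x: int):
    """Yield the n = (x & 7) + 1 low bytes of x, lowest first."""
    x &= MASK64
    for i in range((x & 7) + 1):
        yield (x >> (8 * i)) & 0xFF

def bytediv5_sum64(x: int) -> int:
    return sum(b // 5 for b in _bytes_low_first(x))   # udiv by 5 per byte

def bytemod3_sum64(x: int) -> int:
    return sum(b % 3 for b in _bytes_low_first(x))    # urem by 3 per byte
```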
* vm_byteshl3_xor64_loop passes both gates first try. XOR-pack bytes at dynamic positions controlled by `i*3` over n=(x&7)+1 iters. Tests `shl i64 byte, %i*3` (dynamic shl by NON-trivial counter expression - mul-then-shl). Distinct from vm_dynshl_pack64_loop (shl by i directly, 2-bit chunks).
Result: {"status":"keep","vm_sample_count":188,"total_semantic_cases":2072,"manifest_samples":219}
* vm_byteshl_data64_loop passes both gates first try. Data-dependent shl: r=(r << (b&7)) | (b>>4) over n=(x&7)+1 iters. Tests `shl i64 r, %byte_amount` where shift amount is derived from the BYTE STREAM rather than loop counter. Distinct from vm_dynshl_pack64_loop (shl by i) and vm_byteshl3_xor64_loop (shl by i*3 - counter expression).
Result: {"status":"keep","vm_sample_count":189,"total_semantic_cases":2082,"manifest_samples":220}
* vm_data_lshr64_loop passes both gates first try. Data-dependent right shift counterpart to vm_byteshl_data64_loop: r=(r >> (b&7)) ^ b over n=(x&7)+1 iters. Tests `lshr i64 r, %byte_amount` (right-shift by byte-derived amount). Initial r=~0 with all-1s shifts down by data-driven amounts. Reaches 190 sample milestone.
Result: {"status":"keep","vm_sample_count":190,"total_semantic_cases":2092,"manifest_samples":221}
* vm_data_ashr64_loop passes both gates first try. Data-dependent ashr counterpart: r=(i64 r >> (b&7)) + b over n=(x&7)+1 iters. Tests `ashr i64 r, %byte_amount` (signed right-shift by byte-derived amount). Completes the data-dependent shift trio (shl/lshr/ashr) - distinct from vm_dyn_ashr64_loop (ashr by counter not byte data).
Result: {"status":"keep","vm_sample_count":191,"total_semantic_cases":2102,"manifest_samples":222}
* vm_mul3byte_chain64_loop passes both gates first try. Horner-style hash with multiplier 3: r = r*3 + byte over n=(x&7)+1 iters. Distinct from vm_djb264_loop (*33), vm_fnv1a64_loop (FNV prime), vm_horner64_loop (general polynomial). Tests `mul i64 r, 3` (small-constant multiplier - non-power-of-2 coefficient that lifter typically keeps as raw mul rather than lea-by-3 fold).
Result: {"status":"keep","vm_sample_count":192,"total_semantic_cases":2112,"manifest_samples":223}
* vm_shiftin_top64_loop passes both gates first try. Shift register filled from the top: r=(r>>8)|(byte<<56) over n=(x&7)+1 iters. Tests `lshr i64 r, 8 | shl i64 byte, 56` shift-register update pattern. Distinct from vm_byterev_window64_loop (shl-or pack from low end). After n=8 iters, all-FF input is preserved (palindrome invariant).
Result: {"status":"keep","vm_sample_count":193,"total_semantic_cases":2122,"manifest_samples":224}
* vm_orxor_pair64_loop passes both gates first try. Two-state cross-feed with explicit temp barrier: t=a; a=a|b; b=t^(b*7) over n=(x&7)+1 iters. Combines monotone OR fold on a with non-monotone XOR-mul evolution on b. Distinct from vm_pairmix64_loop (add+mul-by-GR cross-feed), vm_threestate_xormul64_loop (three states), vm_orsum_byte_idx64_loop (single-state OR fold).
Result: {"status":"keep","vm_sample_count":194,"total_semantic_cases":2132,"manifest_samples":225}
* vm_lcg_ansi_chain64_loop passes both gates first try. Classic ANSI C rand() LCG chained over n=(x&7)+1 iters: r = r*1103515245 + 12345. Distinct from vm_xorrot64_loop (LCG with golden-ratio + xor accum), vm_pcg64_loop, vm_xorshift64_loop. Single-state LCG with canonical multiplier+increment pair.
Result: {"status":"keep","vm_sample_count":195,"total_semantic_cases":2142,"manifest_samples":226}
* vm_bytesq_idx_sum64_loop passes both gates first try. Sum of byte * (i+1) * (i+1) - SQUARED counter expression as multiplier. Two sequential muls per iter (counter*counter then byte*counter^2). Distinct from vm_uintadd_byte_idx64_loop (linear counter) and vm_bytesq_sum64_loop (byte self-multiply, no counter). All-0xFF: 0xFF*204=52020.
Result: {"status":"keep","vm_sample_count":196,"total_semantic_cases":2152,"manifest_samples":227}
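
The all-0xFF value in the vm_bytesq_idx_sum64_loop entry above follows from the sum of squares 1+4+...+64 = 204, so 0xFF*204 = 52020. A minimal Python model, with a hypothetical function name and an assumed zero seed and low-first byte walk:

```python
MASK64 = (1 << 64) - 1

def bytesq_idx_sum64(x: int) -> int:
    """Sum of byte * (i+1) * (i+1) over n = (x & 7) + 1 bytes."""
    x &= MASK64
    n = (x & 7) + 1
    s, r = x, 0
    for i in range(n):
        c2 = (i + 1) * (i + 1)                 # first mul: counter squared
        r = (r + (s & 0xFF) * c2) & MASK64     # second mul: byte * counter^2
        s >>= 8
    return r
```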
* vm_dynshl_accum_byte64_loop passes both gates first try. Shift accumulator left by (i+1) then add byte: r=(r<<(i+1))+byte over n=(x&7)+1 iters. Tests `shl i64 %r, %(i+1)` (shift ACCUMULATOR by phi-tracked counter rather than the byte). Distinct from vm_dynshl_pack64_loop (shl byte by counter) and vm_byteshl_data64_loop (data-dependent shl on accumulator).
Result: {"status":"keep","vm_sample_count":197,"total_semantic_cases":2162,"manifest_samples":228}
* vm_dynlshr_accum_byte64_loop passes both gates after recovering from aborted previous turn (file was on disk, manifest entry missing). Shifts r right by (i+1) bits then XORs the byte: r=(r>>(i+1))^byte over n=(x&7)+1 iters with r seeded ~0. Tests `lshr i64 %r, %(i+1)` (lshr accumulator by phi-tracked counter expression). Distinct from vm_dynshl_accum_byte64_loop (shl direction) and vm_data_lshr64_loop (lshr by byte data not counter).
Result: {"status":"keep","vm_sample_count":198,"total_semantic_cases":2172,"manifest_samples":229}
* vm_dynashr_accum_byte64_loop passes both gates first try. ASHR accumulator by counter then add byte: r=(i64 r >> (i+1)) + byte over n=(x&7)+1 iters. Tests `ashr i64 %r, %(i+1)` (signed right-shift accumulator by phi-tracked counter). Completes the counter-driven accumulator-shift trio (shl/lshr/ashr).
Result: {"status":"keep","vm_sample_count":199,"total_semantic_cases":2182,"manifest_samples":230}
* vm_xormulself_byte64_loop passes both gates first try. Self-referential multiply: r ^= byte * (r+1) over n=(x&7)+1 iters. Tests `mul i64 byte, (r+1)` where multiplier operand is the accumulator+1 - r appears on both sides of the body. Distinct from vm_xormul_byte_idx64_loop (byte * counter) and vm_squareadd64_loop (r*r self-multiply on full state). Reaches 200-sample milestone.
Result: {"status":"keep","vm_sample_count":200,"total_semantic_cases":2192,"manifest_samples":231}
* vm_xor_shifted_self_byte64_loop passes both gates first try. Self-shift used as XOR mask combined with byte at MSB: r ^= (r>>8) | (byte<<56) over n=(x&7)+1 iters. Distinct from vm_shiftin_top64_loop (assigns same expression, no XOR), vm_xormulself_byte64_loop (mul-self with byte), vm_byterev_window64_loop (no XOR).
Result: {"status":"keep","vm_sample_count":201,"total_semantic_cases":2202,"manifest_samples":232}
* vm_pair_xormul_byte64_loop passes both gates first try. Per-iter pair (b0,b1) combined as (b0^b1) * (b0+b1) over n=(x&3)+1 iters. Tests TWO byte reads per iteration with XOR + ADD + MUL combination. Trip uses `& 3` so loop consumes 2 bytes per iter (1..4 pair iters). Distinct from all single-byte-per-iter samples.
Result: {"status":"keep","vm_sample_count":202,"total_semantic_cases":2212,"manifest_samples":233}
* vm_quad_byte_xor64_loop passes both gates first try. FOUR byte reads per iteration combined via 3 chained XORs then ADD-folded over n=(x&1)+1 iters (32-bit stride). Distinct from vm_pair_xormul_byte64_loop (2 bytes per iter) and all single-byte samples. Tests wider stride consumption and multi-byte body shape.
Result: {"status":"keep","vm_sample_count":203,"total_semantic_cases":2222,"manifest_samples":234}
* vm_word_xormul64_loop passes both gates first try. u16 word per iter (16-bit stride): r ^= w*w over n=(x&3)+1 iters. Tests u16 zext-i16 self-multiply XOR-folded. Distinct from vm_bytesq_sum64_loop (8-bit stride, ADD) and vm_pair_xormul_byte64_loop (16-bit stride but byte ops).
Result: {"status":"keep","vm_sample_count":204,"total_semantic_cases":2232,"manifest_samples":235}
* vm_word_horner13_64_loop passes both gates first try. Horner-style hash on u16 words with multiplier 13: r = r*13 + w over n=(x&3)+1 iters. Distinct from vm_mul3byte_chain64_loop (Horner on bytes mul 3), vm_djb264_loop (bytes mul 33), vm_word_xormul64_loop (word self-multiply XOR). Wider stride + different multiplier than existing byte-Horner samples.
Result: {"status":"keep","vm_sample_count":205,"total_semantic_cases":2242,"manifest_samples":236}
* vm_dword_xormul64_loop passes both gates first try. u32 dword per iter (32-bit stride) with golden-ratio constant mul XOR-folded: r ^= dword * 0x9E3779B9 over n=(x&1)+1 iters. Distinct from vm_word_xormul64_loop (16-bit stride) and vm_quad_byte_xor64_loop (4 bytes per iter, no mul). Tests u32 zext-i32 mask + 32-bit-magic multiply.
Result: {"status":"keep","vm_sample_count":206,"total_semantic_cases":2252,"manifest_samples":237}
* vm_signed_dword_sum64_loop passes both gates first try. Sum of sext-i32 dwords per iter over n=(x&1)+1 iters. Tests `sext i32 to i64` chain on 32-bit dword stream. Distinct from vm_signedbytesum64_loop (sext-i8 byte, 8-bit stride) and vm_dword_xormul64_loop (zext dword XOR, no sign extension).
Result: {"status":"keep","vm_sample_count":207,"total_semantic_cases":2262,"manifest_samples":238}
* vm_signed_word_sum64_loop passes both gates first try. Sum of sext-i16 words per iter over n=(x&3)+1 iters. Tests `sext i16 to i64` chain on 16-bit word stream. Fills the i16 middle width and completes the sext-width trio (i8/i16/i32 -> i64).
Result: {"status":"keep","vm_sample_count":208,"total_semantic_cases":2272,"manifest_samples":239}
* vm_word_range64_loop passes both gates after restructuring to n-decrement (4 slots: n,s,mn,mx). Tests u16 cmp-driven reductions at 16-bit stride: mx=umax(w,mx); mn=umin(w,mn); return mx-mn over n=(x&3)+1 iters. Lifter folds both reductions to llvm.umax.i64 + llvm.umin.i64. Documents new lifter limitation: 5-slot variant (with separate i counter) trips pseudo-stack init failure; 4-slot form works.
Result: {"status":"keep","vm_sample_count":209,"total_semantic_cases":2282,"manifest_samples":240}
* vm_signed_word_range64_loop passes both gates first try. Signed-i16 min/max range at word stride: tracks mx,mn over n=(x&3)+1 iters then returns mx-mn. Distinct from vm_word_range64_loop (unsigned -> umax/umin folds) and vm_signed_byterange64_loop (i8 stride). Per documented asymmetry, signed cmp+select stays raw icmp slt + select. Reaches 210-sample milestone.
Result: {"status":"keep","vm_sample_count":210,"total_semantic_cases":2292,"manifest_samples":241}
* Add equivalence reporting tool for rewrite_smoke samples
* vm_dword_range64_loop passes both gates first try. u32 dword min/max range over n=(x&1)+1 iters. Tests umax/umin folds at 32-bit dword stride. Distinct from vm_byterange64_loop (8-bit) and vm_word_range64_loop (16-bit). Extends range coverage to all four widths (u8/u16/u32 + signed counterparts).
Result: {"status":"keep","vm_sample_count":211,"total_semantic_cases":2302,"manifest_samples":242}
* Generate per-sample original-vs-lifted equivalence reports for rewrite_smoke
* vm_signed_dword_range64_loop passes both gates first try. Signed-i32 dword min/max range over n=(x&1)+1 iters. Tests sext-i32 + signed cmp+select reductions at 32-bit stride. Completes the range coverage matrix (3 widths x 2 signs). Per documented signed-cmp asymmetry, signed cmp+select stays raw icmp slt + select.
Result: {"status":"keep","vm_sample_count":212,"total_semantic_cases":2312,"manifest_samples":243}
* vm_word_orfold64_loop passes both gates first try. u16 OR-fold over n=(x&3)+1 iters. Tests `or i64` chain at 16-bit word stride. Distinct from vm_orsum_byte_idx64_loop (byte | counter, 8-bit stride). Monotone OR fold (only sets bits).
Result: {"status":"keep","vm_sample_count":213,"total_semantic_cases":2322,"manifest_samples":244}
* Refresh equivalence reports for current 246-sample manifest
* vm_byte_andfold64_loop passes both gates. u8 AND-fold over n=(x&7)+1 bytes seeded with r=0xFF. Tests `and i64` chain at byte stride - monotone DECREASING accumulator counterpart to OR-fold. Distinct from vm_andsum_byte_idx64_loop (byte AND counter, ADD-folded).
Result: {"status":"keep","vm_sample_count":214,"total_semantic_cases":2332,"manifest_samples":245}
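A rough reference model for one of the samples above, vm_word_horner13_64_loop, may help when reading the list. Only the recurrence r = r*13 + w and the trip count n=(x&3)+1 come from the entry; the seed r=0, the 64-bit wraparound, and the choice of reading word i from bits 16*i..16*i+15 of x are assumptions made for this sketch, and the function name is hypothetical.

```python
MASK64 = (1 << 64) - 1


def word_horner13(x: int) -> int:
    """Hypothetical model of a Horner hash over u16 words with multiplier 13."""
    n = (x & 3) + 1          # trip count, as stated in the sample description
    r = 0                    # assumed seed; the log does not state it
    for i in range(n):
        # Assumed word source: zero-extended 16-bit lane i of the input.
        w = (x >> (16 * i)) & 0xFFFF
        r = (r * 13 + w) & MASK64   # assumed 64-bit wraparound
    return r
```

With these assumptions, an input whose low two bits are 01 consumes two words, so the first word is multiplied by 13 once before the second is added.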
---------
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
Co-authored-by: Yusuf <yusuf@local>
|
||
|
|
c1ca564305 |
lifter: strip trailing inttoptr-constant stores before terminators (#202)
The lifter's pseudo-memory model emits writes to obfuscator-controlled
scratch (Themida's .vlizer slots, register-spill scratch in .data, etc.)
as 'store ... ptr inttoptr (i64 K to ptr)'. LLVM's DSE treats those
conservatively because every inttoptr constant can theoretically alias
an externally observable pointer, so they survive -O2 even when the
lifted function does not read them.
Add a post-O2 pass StripTrailingScratchStoresPass that walks each block
backwards from a 'ret' or 'unreachable' terminator and drops the
trailing run of inttoptr-constant stores. Stops at the first non-store
or non-inttoptr-constant-store instruction (call, load, store-through-
%memory-GEP, fence, ...) - those are real side effects.
This is conservative: only the stores between the last side-effecting
instruction and the terminator are removed. Stores that precede a
later call or load survive untouched, so program behaviour is
preserved.
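The backward walk described above can be sketched in a few lines. This is a toy model, not the pass itself: instructions are hypothetical (opcode, pointer-kind) tuples, and "inttoptr_const" stands in for a store whose pointer operand is an inttoptr constant.

```python
from typing import List, Tuple

Instr = Tuple[str, str]  # hypothetical encoding: (opcode, pointer kind)
SCRATCH_STORE: Instr = ("store", "inttoptr_const")


def strip_trailing_scratch_stores(block: List[Instr]) -> List[Instr]:
    """Drop the run of scratch stores that sits directly before ret/unreachable."""
    if not block or block[-1][0] not in ("ret", "unreachable"):
        return list(block)          # other terminators are left alone
    term = len(block) - 1
    first = term
    # Walk backwards; stop at the first instruction that is not a scratch store.
    while first > 0 and block[first - 1] == SCRATCH_STORE:
        first -= 1
    return block[:first] + block[term:]
```

A scratch store that precedes a call or load is never reached by the walk, which is what keeps the transformation conservative.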
Impact:
example2-virt.bin @ 0x140001000:
before: 257 lines, 222 stores
after: 240 lines, 205 stores (17 trailing stores stripped)
imports: 4/4 (unchanged)
example2.bin (non-virt) @ 0x140001000:
before: 247 lines, 212 stores
after: 37 lines, 3 stores (the function shrinks to its actual
program flow: 7 typed imports, a
llvm.memset, and 'ret i64 0')
warn/err: 0/0 (unchanged)
The non-virt main now reads as the original program:
GetStdHandle x2 -> WriteConsoleA(prompt) -> ReadConsoleA(buf)
-> CharUpperA(buf) -> WriteConsoleA(echo) -> WriteConsoleA(buf)
-> ret 0
python test.py baseline + quick + themida all green.
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
|
||
|
|
0bac256522 |
lifter: add Win32 console-I/O signatures and preserve call-arg GEPs (#201)
Register proper prototypes for GetStdHandle, WriteConsoleA, ReadConsoleA,
and CharUpperA in funcsignatures. The infrastructure for typed-pointer
imports was already in place (parseArgs uses getPointer() when
arg.argtype.isPtr), but these four imports lacked signatures and fell
through to the unknown-call path that passes 16 GPRs + a raw %memory
pointer with every arg as i64.
Also extend PromotePseudoStackPass with a narrow escape filter: stack-
range %memory GEPs whose users include a CallBase are no longer
migrated onto the %stackmemory alloca. Migrating them would let a
call argument use the alloca pointer, which blocks mem2reg/SROA on the
alloca and leaves hundreds of dead stack stores in the post-opt IR.
Leaving those specific GEPs through %memory costs nothing (memory is
already a function argument). Other non-load/store uses (ptrtoint,
GEP-of-GEP, stored-as-value) still migrate, so rewrite_smoke samples
that depend on full alloca promotion keep working.
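The escape filter amounts to a partition of stack-range GEPs by their users. A minimal sketch, assuming a hypothetical map from GEP name to the opcodes of its users, where "call" stands for any CallBase user:

```python
from typing import Dict, List, Tuple


def split_stack_geps(gep_users: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Return (migrate_to_alloca, keep_on_memory) for stack-range GEPs."""
    migrate: List[str] = []
    keep: List[str] = []
    for gep, users in gep_users.items():
        # A GEP passed to a call would let the alloca escape, so it stays on %memory.
        if "call" in users:
            keep.append(gep)
        else:
            migrate.append(gep)
    return migrate, keep
```

Users such as ptrtoint or a nested GEP do not trigger the filter in this model, matching the note that those uses still migrate.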
Before (example2-virt.bin @ 0x140001000):
%2 = tail call i64 @WriteConsoleA(i64 %1, i64 5368717648, i64 16,
i64 1375568, ptr %memory)
After:
declare i64 @WriteConsoleA(i64, ptr, i32, ptr) local_unnamed_addr
%2 = tail call i64 @WriteConsoleA(
i64 %1, ptr nonnull inttoptr (i64 5368717648 to ptr),
i32 16, ptr nonnull inttoptr (i64 1375568 to ptr))
All 4 imports now carry their real Win32 signature:
@GetStdHandle(i32)
@WriteConsoleA(i64, ptr, i32, ptr)
@ReadConsoleA (i64, ptr, i32, ptr)
@CharUpperA (ptr)
Speed on example2-virt.bin:
before: 1.25s median
after: 1.07s median (-14%)
Non-virt example2.bin output gets dramatically cleaner; the full
program flow is visible in ~20 lines with correctly typed calls
(GetStdHandle stdin/stdout, WriteConsoleA/ReadConsoleA with ptr
buffer args, CharUpperA on buffer, final WriteConsoleA x2). Still
0 warn, 0 err.
python test.py baseline + quick + themida remain green.
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
|
||
|
|
99fdcb7531 |
lifter: lower IndirectJump shape-aware threshold from 128 to 80 (#200)
PR #195 set the IndirectJump shape-aware revisit threshold to 128 to
unlock all 4 imports on example2-virt.bin. Sweeping the threshold
post-chain shows 80 is sufficient: 4/4 imports surface at T=80 already,
matching T=128's metric while doing significantly less work.
T=80: 1615 blocks lifted, 4/4 imports
T=128: 3077 blocks lifted, 4/4 imports
Concrete impact on example2-virt.bin @ 0x140001000:
metric             T=128       T=80
-----------------+-----------+----------
wall (median)      2.21s       1.25s    -43%
pre-opt IR lines   50,856      38,260   -25%
post-opt lines     305         247      -19%
post-opt stores    268         212      -21%
imports pre/post   4/4         4/4      same
DirectJump and ConditionalBranch still use threshold 0, so rewrite_smoke
VM-loop samples still generalise on their first backedge - no regression.
python test.py baseline + quick + themida all green. Non-virt
example2.bin unchanged.
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
|
||
|
|
7bcb705b5a |
lifter: degrade PATH_unsolved ret to REAL_return to fix unterminated blocks (#199)
When lift_ret classifies a ret as ROP_return (because rspvalue is a
ConstantInt != STACKP_VALUE) and solvePath subsequently returns
PATH_unsolved, the previous code emitted a warning and left the block
unterminated. The outer per-instruction lift loop in
liftBasicBlockFromAddress would then advance current_address past the
ret and lift the next byte, producing a second terminator and malformed
IR (same shape as #196's earlier bug, exposed on a different code path).
The most accurate semantic for an unresolvable ret is 'returns to a
caller we do not have context for' - degrade to REAL_return: emit
'ret rax' and stop the block. The warning still fires (suppressed for
chained PCs per #198) so the unresolvable signal is preserved as a
diagnostic, but the IR stays well-formed.
Visible at 0x14000110d (the entry function's own final ret) on
example2-virt.bin @ 0x140001000: previously 1 warn + an unterminated
block whose downstream lifted bytes produced spurious 'ret undef' in
some block. After this change, the warning still surfaces but the block
is terminated with 'ret rax' immediately.
Knock-on improvement: with the ret site terminating cleanly, O2's DCE
collapses the noise and the optimized IR now contains all 7 import calls
in their original program order:
GetStdHandle(STD_INPUT_HANDLE) ; stdin
GetStdHandle(STD_OUTPUT_HANDLE) ; stdout
WriteConsoleA(stdout, prompt)
ReadConsoleA(stdin, buffer)
CharUpperA(buffer)
WriteConsoleA(stdout, echo)
WriteConsoleA(stdout, buffer)
Matches the emulator trace from PR #190 exactly. The post-opt IR went
from 'imports declared but most call-sites DCE'd' to 'full original
program flow visible'.
Verified: python test.py baseline + quick + themida green. Non-virt
example2.bin unchanged (2 blocks, 6 declares, 0 warn, 0 err).
themida-virt: 4/4 imports pre-opt AND post-opt, 1 warn (legit top-level
ret), 0 err - same headline numbers as pre-fix, but the post-opt IR is
dramatically cleaner.
Also drops the noisy stdout '[diag] lift_ret: unresolved ROP chain'
print that ran in lockstep with the structured warning - the warning
already conveys the same info via diagnostics.warning.
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
|
||
|
|
463b6aca68 |
lifter: track chained import ret-sites to suppress redundant warnings (#198)
The ret-to-IAT chain in lift_ret recognises concrete VM-staged import
calls on first visit. Later symbolic re-entries of the same PC fall
through to solvePath, which returns PATH_unsolved when the popped
realval is now a phi of multiple concrete values from different
dispatch paths, and emits an UnresolvedRetChain warning.
The warning is factually correct - that specific revisit couldn't
resolve symbolically - but semantically redundant: the concrete import
semantics are already captured at the chain site, so the re-entry
carries no new information.
Track chained import-ret-site PCs in a new std::set member, insert on
successful chain fire, and skip the warning emission when the
PATH_unsolved site is in the set.
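The bookkeeping is small enough to model directly. In this sketch the class and method names are hypothetical; only the idea (insert the PC when the chain fires, skip the warning when a later PATH_unsolved hits the same PC) comes from the message above.

```python
class RetSiteDiagnostics:
    """Toy model of the chained-ret-site set and the warning gate."""

    def __init__(self) -> None:
        self.chained_ret_sites = set()   # PCs where the ret-to-IAT chain fired
        self.warnings = []               # PCs that produced UnresolvedRetChain

    def on_chain_fired(self, pc: int) -> None:
        self.chained_ret_sites.add(pc)

    def on_path_unsolved(self, pc: int) -> None:
        # A symbolic re-entry of an already-chained site carries no new information.
        if pc in self.chained_ret_sites:
            return
        self.warnings.append(pc)
```

Replaying the two sites named in the impact section shows the effect: the GetStdHandle ret-site is silenced after its first chain fire, while the top-level ret still warns.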
Impact on example2-virt.bin @ 0x140001000:
before: warn=1 err=0, site=0x14017fa77 (symbolic re-entry of the
GetStdHandle ret-site)
after: warn=1 err=0, site=0x14000110d (top-level entry ret - no
caller context, legitimately
unresolvable)
Non-virt, baseline, quick, and themida tests remain green. The warning
count is coincidentally unchanged; what moved is WHICH PC triggered
the single diagnostic emission. The new site is a genuine top-level
return that the lifter cannot resolve (entry function has no caller).
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
|
||
|
|
39b7fcb71f |
lifter: handle INT1 and INT3 like UD2 (call @exception; ret) (#197)
Both 0xF1 (INT1, ICEBP debug trap) and 0xCC (INT3, debugger break)
previously fell through to the 'Instruction not implemented' default,
emitting a DiagCode::InstructionNotImplemented error. They raise #DB/#BP
exceptions at runtime, functionally equivalent to UD2 which already
lowers to 'call @exception; ret'.
Group them with UD2 by adding two fall-through case labels. Same
lowering: emit call @exception(), ret, and stop the block.
On example2-virt.bin @ 0x140001000:
before: 1 warn, 1 err (INT1 at 0x1401928ef)
after: 1 warn, 0 err (INT1 now lifts cleanly as @exception call)
Baseline + quick + themida remain green. Non-virt example2.bin unchanged.
The themida test's 'extra imports' list gains '@exception' alongside the
existing '@fastfail' for the same kind of lowering.
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
|
||
|
|
8f852966f1 |
lifter: stop the lift loop after the ret-to-IAT chain emits its terminator (#196)
The chain in lift_ret emits a br to the continuation block and returns.
But lift_ret returning does not stop the outer per-instruction lift
loop in liftBasicBlockFromAddress - that loop only stops when run=0 or
finished=1. On return the loop advances current_address by the ret's
length and lifts the NEXT byte, which emits a second terminator in the
same BB.
Two terminators in one BB is accepted by IRBuilder but makes the block
malformed: LLVM treats only the first br as the live terminator. All
blocks the first br reaches still exist, but predecessor-tracking
silently treats the second br's target as the live successor, so the
chain's continuation block (and everything downstream of it) ends up
orphaned with zero predecessors.
Visible consequence: CharUpperA's call was in bb_solved_const8212 whose
source block had two consecutive br's. The chain's br to the
bb_after_import_CharUpperA BB was the second terminator, so the first
br went to some other block and CharUpperA's continuation became
unreachable. After O2 it got DCE'd entirely along with its declare.
Fix: set run=0 and finished=1 after CreateBr in the chain path. The
outer loop exits cleanly, the block keeps its single br to contBB, and
the continuation path stays connected to the CFG.
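The double-terminator failure and its fix can be reproduced with a toy version of the outer loop. Everything here is a simplified model: ops is a hypothetical instruction stream in which "chain_ret" is a ret that fires the ret-to-IAT chain and "jmp" is whatever terminator-producing instruction happens to follow it in memory.

```python
from typing import List


def lift_block(ops: List[str], stop_after_chain: bool) -> List[str]:
    """Return the IR lines a toy per-instruction lift loop would emit."""
    emitted: List[str] = []
    run, finished = 1, 0
    index = 0
    while run and not finished and index < len(ops):
        op = ops[index]
        index += 1                      # the loop always advances past the instruction
        if op == "chain_ret":
            emitted.append("call @import")
            emitted.append("br contBB")
            if stop_after_chain:
                run, finished = 0, 1    # the fix: stop the outer loop here
        elif op == "jmp":
            emitted.append("br other")
            finished = 1
        else:
            emitted.append(op)
    return emitted


def count_terminators(ir: List[str]) -> int:
    return sum(1 for line in ir if line.startswith("br "))
```

Without the flag the block ends up with two branches; with it, the branch to contBB is the only terminator.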
Impact on example2-virt.bin @ 0x140001000:
pre-opt IR: 4/4 imports, 2823 blocks, warn=1 err=1
(was 2365 blocks, warn=7 err=2)
post-opt IR: 4/4 imports survive DCE
(was 1/4 - only GetStdHandle survived)
Baseline + quick + themida remain green. The malformed-block pattern
was the source of most of the junk-switch warnings too: those paths
were only reached via the second br's stray successor; with the fix
the lifter no longer explores them.
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
|
||
|
|
cf7e2f34a8 |
lifter: recover all 4 themida-virt imports via ret-to-IAT chain (#195)
On example2-virt.bin @ 0x140001000, the lifter previously surfaced only
GetStdHandle (1/4 of the required imports). This change unlocks 4/4:
before: 359 blocks, 1/4 imports, warn=0, err=0
after: 2365 blocks, 4/4 imports, warn=7, err=2
python test.py themida now passes on this sample:
PASS: example2 - 5 distinct imports, 7 calls (required 4)
Three coupled changes:
1. Ret-to-IAT chain in lift_ret (Semantics_ControlFlow.ipp)
When the popped target is a concrete import VA AND the top of the new
stack is a concrete continuation, emit `call @import` and branch to the
continuation block instead of letting the ret go to solvePath. This
keeps exploration alive past the import so the VM's subsequent handlers
(which carry the other imports) get reached.
2. Preserve caller-saved GPRs across VM-staged imports
The chain's `CreateCall` goes through buildUnknownCallFx with an EMPTY
volatileRegs set. Rationale: VM-staged imports are invoked from a
dispatcher that preserves its own caller-saved state across the
external call in the real binary (otherwise the VM would be broken).
Clobbering those regs in the lifter made the dispatcher's next step
non-concrete, trapping further exploration in one handler.
Only applied to this specific call path. All other external calls still
use the strict x64 MSVC ABI (caller-saved clobbered) through the
unchanged applyPostCallEffects default.
3. Raise shape-aware IndirectJump threshold from 16 to 128
The VM dispatcher re-enters its header many times per bytecode step; 16
iterations are not enough to cover all four import handlers. 128 does.
DirectJump and ConditionalBranch stay at threshold 0, so rewrite_smoke
VM-loop samples still generalize immediately on their first backedge.
Verified:
- python test.py baseline green (rewrite regression + determinism)
- python test.py quick green (33/33 semantic + all instruction microtests)
- python test.py themida green (PASS on example2)
- non-virt example2.bin unchanged: 2 blocks, 6 declares, 0 warn, 0 err
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
|
||
|
|
194f43b56d |
lifter: guard pop rsp against out-of-range concrete values (#194)
A concrete RSP produced by `pop rsp` that falls outside the tracked
pseudo-stack range crashes the lifter on the next [rsp] memory op, via
an unmapped dereference inside GetMemoryValue/solveLoad. This shape
appears in VM stack-switch gadgets (example: the Themida-virt handler
at 0x14017facc..fae3) and the crash kills deep-exploration sweeps.
When lift_pop detects the destination operand is RSP and Rvalue is a
ConstantInt outside `isTrackedStackAddress`, emit a structured warning
and an unreachable terminator for the block instead of writing the bad
RSP and letting the next memory op take us down.
Symbolic Rvalue is unchanged - the existing symbolic-load path already
handles it. ConstantInt within the tracked stack range is unchanged -
legitimate `pop rsp` on a real stack pointer stays supported.
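The three cases above reduce to a small decision function. This is a sketch only: the half-open range check stands in for `isTrackedStackAddress`, whose real bounds are not given here, and the returned strings are hypothetical labels for the three outcomes.

```python
from typing import Optional


def classify_pop_rsp(value: Optional[int], stack_lo: int, stack_hi: int) -> str:
    """Decide how a `pop rsp` is handled; value=None models a symbolic Rvalue."""
    if value is None:
        return "symbolic"            # existing symbolic-load path, unchanged
    if stack_lo <= value < stack_hi:
        return "write_rsp"           # concrete and inside the tracked stack range
    # Concrete but outside the tracked range: warn and end the block.
    return "warn_unreachable"
```

Only the third outcome is new behaviour; the first two describe paths the commit leaves untouched.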
Verified on example2-virt.bin @ 0x140001000:
default (no chain, T=16 IndirectJump): 359 blocks, 0 warn, 0 err
chain + T=32: before: SEGV at block 755 / PC 0x14017fae1
after: 755 blocks, 1/4 imports, 2 warn, 0 err, exit 0
chain + NO_LOOP_GEN: 2972 blocks, 1/4, exit 0 (no more crash)
Baseline rewrite regression + determinism checks remain green. Default
themida lift unchanged (359 blocks, 1/4 imports). The guard fires only
when exploration actually reaches the gadget, which at current defaults
it does not.
Co-authored-by: Claude <claude@anthropic.com>
|
||
|
|
08709710da |
lifter: add MERGEN_INSTR_TRACE_FILE per-instruction breadcrumb (#193)
Companion to MERGEN_BLOCK_TRACE_FILE. Writes one line per instruction
lifted to the target file, with per-instruction flush. Pins exactly
which instruction VA precedes a crash when debuggers are unavailable.
Format: 0x<instruction-VA>
Used to localise the chain+T=32 crash from 'block 755' down to the exact
PC 0x14017fae1 - the 'push [rsp]' immediately after 'pop rsp' in the
VM's stack-switch gadget at 0x14017facc..0x14017fae3. That pair is the
concrete crash trigger: pop rsp loads a new RSP value, then push [rsp]
attempts to read from that new RSP and the lifter's memory path crashes
when the address is out-of-range.
No behaviour change when the env var is unset.
Verified:
- python test.py baseline green
- default themida lift unchanged (359 blocks, 1/4 imports)
Co-authored-by: Claude <claude@anthropic.com>
|
||
|
|
e7b0babec8 |
lifter: add MERGEN_BLOCK_TRACE_FILE breadcrumb env var (#192)
Emit one line per block pulled off the lift worklist to a file, with
per-block flush. Useful for pinning deep-exploration crashes when a
debugger is unavailable or cannot attach to the release-built lifter on
this host.
Format per line (space-separated):
<fnc->size()-at-pop> <0x-prefixed-block-VA>
Emitted only when MERGEN_BLOCK_TRACE_FILE=<path> is set in the
environment. File is opened in append mode and flushed + closed on every
block, so the last line always survives a crash.
Used during iteration on the Themida-virt import-recovery work to pin a
chain+T=32 SEGV to block 755, VA 0x14017facc. Bytes there include a
'pop rsp' (5c), a classic symbolic-memory hazard.
No behaviour change when the env var is unset.
Verified:
- baseline rewrite regression + determinism green
- default themida lift unchanged (359 blocks, 1/4 imports)
Co-authored-by: Claude <claude@anthropic.com>
|
||
|
|
8d33c102a6 |
lifter: make maxBasicBlockBudget tunable via MERGEN_MAX_BLOCK_BUDGET (#191)
The per-function basic-block budget is currently a compile-time 4096.
Deep-exploration crashes (e.g. the chain + T>=32 SEGV at ~1891 blocks
tracked in docs/README.md under open blockers) need a reliable way to
cap the lift at a specific block count to bisect the crash site.
Expose an integer env var that overrides the default:
MERGEN_MAX_BLOCK_BUDGET=0 disables the cap entirely
MERGEN_MAX_BLOCK_BUDGET=<N> caps the function at N basic blocks,
which cleanly terminates with the
existing LiftBlockBudgetExceeded error
once fnc->size() reaches N
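The override semantics can be sketched as follows. The 4096 default and the meaning of 0 and N come from the text above; the fallback to the default on an unparsable value is an assumption of this sketch, and both function names are hypothetical.

```python
import os
from typing import Mapping, Optional

DEFAULT_BLOCK_BUDGET = 4096


def block_budget(env: Optional[Mapping[str, str]] = None) -> Optional[int]:
    """Return the block cap, or None when the cap is disabled."""
    env = os.environ if env is None else env
    raw = env.get("MERGEN_MAX_BLOCK_BUDGET")
    if raw is None:
        return DEFAULT_BLOCK_BUDGET
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_BLOCK_BUDGET   # assumed fallback, not confirmed by the source
    return None if value == 0 else value


def budget_exceeded(block_count: int, budget: Optional[int]) -> bool:
    # Mirrors "once fnc->size() reaches N"; a disabled cap never trips.
    return budget is not None and block_count >= budget
```

Passing a plain dict instead of the process environment keeps the helper easy to exercise in isolation.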
Verified:
- budget=100 stops at 99 blocks and emits the expected error
- budget=0 leaves behaviour unchanged (359 blocks on example2-virt)
- default lift (no env) leaves behaviour unchanged
- python test.py baseline still green (all rewrite regression checks)
No behaviour change for existing runs. Pure diagnostic knob.
Co-authored-by: Claude <claude@anthropic.com>
|
||
|
|
aff35bc01c |
diag: sentinel pages as ret stubs so tracer observes every import (#190)
Before: each IAT slot pointed at an unmapped sentinel address, so the
FIRST import call raised UC_ERR_FETCH_UNMAPPED and the emulator stopped.
We only ever observed one import per run.
After: sentinel addresses live in a mapped page filled with 0xC3 (near
ret) instructions. Each import call fetches the ret byte, immediately
returns to the VM's pre-staged continuation, and emulation keeps going.
All subsequent imports now surface as [HIT] events.
On example2-virt.bin @ 0x140001000 this finds every required import:
insn     ret-site      import          target
----     --------      ------          ------
34223    0x14017fa77   GetStdHandle    stdin
44847    0x14017fa77   GetStdHandle    stdout
60695    0x14017ef9f   WriteConsoleA   prompt
74394    0x140192798   ReadConsoleA
85326    0x140157ef9   CharUpperA
97859    0x14013bf11   WriteConsoleA   echo
110166   0x14017fa77   WriteConsoleA   final
This gives the full map of import ret-site addresses for the virt
sample - useful for future work that needs to reach those sites (whether
by deeper lifter exploration or by seeding additional entries). The
lifter currently reaches 0x14017fa77 only.
Co-authored-by: Claude <claude@anthropic.com>
|
||
|
|
f340c8186a |
docs: ret-to-IAT chain retry findings under shape-aware defaults (#189)
Updates the tombstone comment in lift_ret with what this iteration
discovered when re-attempting the chained-continuation variant under the
post-#188 shape-aware defaults (T=16 on IndirectJump, 0 elsewhere):
- At effective T=16: chain safely fires once at 0x14017fa77
(GetStdHandle, continuation 0x1401c888e) and explores 40 more blocks
(359 -> 399), but does not surface any additional imports. Still 1/4.
- At T>=32: still crashes at ~1891 blocks deep, same as #187.
Two reasons chaining is not wired in:
1. T>=32 crash blocks broader use.
2. Safe T=16 chain does not reach other import ret sites within the
generalization-bounded exploration budget.
The chain block is left as a guarded diagnostic (requires
MERGEN_RET_CHAIN=1) so researchers can reproduce the T=16 exploration
envelope and the T>=32 crash, but the default path remains just the
PathSolver hook with 'call @import(); unreachable' leaves.
Comment-only change. No code-behaviour change.
Co-authored-by: Claude <claude@anthropic.com>
|
||
|
|
90d5ca23b6 |
lifter: shape-aware gen threshold: 16 on IndirectJump, 0 elsewhere (#188)
Default env now surfaces GetStdHandle on example2-virt.bin (0/4 -> 1/4
on python test.py themida). Shape-aware discriminator uses the path-
solve context:
- IndirectJump (jmp reg / jmp [mem]): likely a VM dispatcher, hold
generalisation off for the first 16 revisits so concrete exploration
has a chance to reach the IAT-gadget ret sites.
- DirectJump / ConditionalBranch: simple guest loops - generalise on
the first backedge, preserving the rewrite_smoke VM-loop patterns.
Replaces the earlier 'targetResolvedConcretely' signal which fired on
every solvePath-resolved target - too broad, and would re-regress the
dummy_vm_loop / bytecode_vm_loop / stack_vm_loop samples.
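The discriminator can be summarised as a lookup from the path-solve shape to a minimum revisit count. The thresholds (16 and 0) are the ones this commit states; the enum, the function names, and the exact comparison used at the boundary are assumptions of this sketch.

```python
from enum import Enum


class PathShape(Enum):
    INDIRECT_JUMP = "IndirectJump"          # jmp reg / jmp [mem]
    DIRECT_JUMP = "DirectJump"
    CONDITIONAL_BRANCH = "ConditionalBranch"


def min_revisits(shape: PathShape) -> int:
    """Revisits to hold generalisation off for, per shape."""
    return 16 if shape is PathShape.INDIRECT_JUMP else 0


def should_generalize(shape: PathShape, revisits: int) -> bool:
    # Assumed boundary: generalise once the header has been revisited
    # at least min_revisits(shape) times.
    return revisits >= min_revisits(shape)
```

Under this model a simple guest loop generalises on its first backedge, while a dispatcher-like header keeps exploring concretely for its first 16 revisits.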
Measurement on example2-virt.bin @ 0x140001000:
before after this commit
imports 0/4 1/4 (GetStdHandle)
insns 2544 11441
blocks 56 359
errors 0 0
warnings 0 0
MERGEN_GEN_MIN_REVISITS env still overrides per-header for researchers.
Values {6, 8, 12} still crash on the virt sample - same unrelated
dispatcher-state bug as before.
Regression checks:
- non-virt example2.bin: 61 insns, 6 imports, 0 warn/err (unchanged).
- python test.py baseline: passes; determinism check passes.
- All three rewrite_smoke VM-loop samples (dummy/bytecode/stack): passing
their required IR patterns.
Co-authored-by: Claude <claude@anthropic.com>
|
||
|
|
c23fb23918 |
docs: explain why ret-to-IAT chaining is not wired up (#187)
Elaborate the comment next to the PathSolver-centralised import hook to
explain what was tried and why chaining is not live yet, so the next
session does not re-litigate the attempt from scratch.
Attempted variant: after recognising a ret-to-IAT, emit callFunctionIR
and feed the VM's pre-staged continuation ([rsp+8] before the original
ret) to solvePath, so exploration follows into the next VM handler.
Includes a mapped-address safety guard on the first chain step.
Result: crashes the lifter with access violation on T>=32 runs of
example2-virt.bin; T=16 still works but gives the same 1/4 result as the
non-chained path. The crash is downstream of the chain (in one of the
extra blocks chaining unlocks), not at the chain step itself - the guard
does not catch it. Needs a debugger session to localise before landing.
Co-authored-by: Claude <claude@anthropic.com>
|
||
|
|
4c16fa972e |
lifter: cleanup follow-ups to the PathSolver import hook (#186)
Two small cleanups on top of #185.
1. PathSolver.resolveTargetBlock: return reusedBackedge=true for the
synthetic import block. solvePath uses that flag to skip queuing the
target for further lifting - which is what we want, because the import
block's 'address' is an IAT slot VA or a hint/name VA and trying to
decode the bytes at those addresses would repeat the OUTSD 'not
implemented' error the hook exists to avoid.
2. lift_ret: delete the ret-to-IAT pre-check added in #184. The
PathSolver hook already catches every target lift_ret's pre-check would
have caught (and more, via solvePath's getConstraintVal route). Removing
it deletes ~50 lines of duplicated logic and leaves the hook as the
single source of truth.
Functionally neutral on every measured configuration:
- virt default env: 0/4 (unchanged)
- virt MERGEN_GEN_MIN_REVISITS=16: 1/4 GetStdHandle (unchanged)
- virt MERGEN_NO_LOOP_GEN=1: 1/4 GetStdHandle (unchanged)
- non-virt example2.bin: 6 imports, 0 warn/err (unchanged)
python test.py baseline passes. Determinism check passes.
Co-authored-by: Claude <claude@anthropic.com>
|
||
|
|
4f7fa49447 |
lifter: centralise import recognition in PathSolver.resolveTargetBlock (#185)
When solvePath resolves its target to an entry in importMap (IAT slot VA
or hint/name-alias VA, added in #184), materialise a leaf block
containing 'call @<importName>(); unreachable' and return it as the
branch target. One hook covers every solvePath caller (lift_jmp,
lift_ret) instead of duplicating the check in each caller.
Replaces the previous behaviour of following the resolved target into
Kernel32 / .rdata - which triggered OUTSD 'not implemented' errors on
example2-virt.bin under MERGEN_NO_LOOP_GEN=1 because the lifter tried to
decode hint/name table bytes as code.
Fires only when the lifter's solvePath actually reaches an IAT-backed
target. At default env that doesn't happen on example2-virt.bin because
loop-generalisation abstracts the VM dispatcher before any ret site is
reached. With MERGEN_GEN_MIN_REVISITS=16 (env override) the first IAT
gadget is reached and GetStdHandle surfaces as a named call in the IR;
the test still fails 3/4 because the other imports' ret sites are gated
by the same reachability ceiling.
Raising the knob's DEFAULT to 16 was considered and reverted: it
regresses the rewrite_smoke VM-loop samples (dummy_vm_loop,
bytecode_vm_loop, stack_vm_loop) whose required patterns expect
generalisation to fire on the first backedge. A shape-aware
discriminator between a VM dispatcher and a simple loop is needed before
the knob can safely be non-zero by default.
Non-virt example2.bin: unchanged (61 insns, 6 imports, 0 warn/err).
python test.py baseline: passes. Determinism check: passes.
python test.py themida: still red 0/4 at default env; 1/4 under
MERGEN_GEN_MIN_REVISITS=16.
Co-authored-by: Claude <claude@anthropic.com>
|
||
|
|
424b26a38d |
lifter: alias hint/name VAs in importMap; stop lift_ret at import call (#184)
Two small, additive fixes that together let the virt Themida sample's
ret-to-IAT gadgets surface as named external calls.
1. LifterStages: alias hint/name entries in importMap.
For each name import, also register the hint/name entry's runtime
VA pointing at the same import name. The Win32 loader overwrites
each IAT slot with the resolved function pointer at startup, but
the lifter's static memory model returns the on-disk QWORD, which
for name imports is the RVA of the hint/name entry. Without this
alias, when an obfuscated dispatcher loads an IAT slot and uses
the value as a call/ret target, lift_call/lift_ret see the hint/
name address (e.g. 0x140002628 for GetStdHandle on example2-virt)
and fail to recognise it as an import. Including the alias lets
the existing import-recognition paths fire on the lifter's pre-
load value just as they do on the runtime value.
2. lift_ret: stop lifting after a ret-to-IAT match.
The previous code popped one more qword (simulating the external's
ret) and fed that to solvePath so exploration continued at the
continuation address. On Themida, the 'continuation' is whatever
the VM had staged below the IAT pointer - which is not generally
resolvable from static memory and causes the lifter to try lifting
at bogus addresses (access-violation crash observed on the virt
sample). The named import call is in the IR regardless; if
transitive control-flow matters, a dedicated 'imported-call-
continuation' analysis with more state can recover it.
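The aliasing in item 1 can be illustrated with a plain dictionary. This is a sketch under assumptions: the image base 0x140000000 and the IAT slot RVA 0x2000 are hypothetical values chosen so that the alias lands on the 0x140002628 address quoted above, and the function name is made up for the example.

```python
from typing import Dict, List, Tuple


def build_import_map(image_base: int,
                     name_imports: List[Tuple[int, int, str]]) -> Dict[int, str]:
    """Map both the IAT slot VA and the hint/name VA to the import name.

    Each entry is (iat_slot_rva, on_disk_qword, name); for a name import the
    on-disk QWORD in the slot is the RVA of its hint/name entry.
    """
    import_map: Dict[int, str] = {}
    for slot_rva, hint_name_rva, name in name_imports:
        import_map[image_base + slot_rva] = name        # existing IAT-slot entry
        import_map[image_base + hint_name_rva] = name   # the added alias
    return import_map


# Hypothetical layout for one import of example2-virt.
IMPORTS = build_import_map(0x140000000, [(0x2000, 0x2628, "GetStdHandle")])
```

A dispatcher that loads the slot statically yields the hint/name VA, and the alias lets that value resolve to the same import name as the slot itself.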
Measurement on example2-virt.bin @ 0x140001000 with
MERGEN_NO_LOOP_GEN=1:
- before this commit: 0 imports surfaced, 1 error (OUTSD, from the
lifter wandering into .rdata after the wrong-target resolution)
- after this commit: 1 import (GetStdHandle) surfaced at the
0x14017fa77 ret, 0 errors, 0 warnings
Non-virt example2.bin lift unchanged (61 insns, 6 imports, 0 warn/err).
python test.py baseline passes; determinism check passes.
Co-authored-by: Claude <claude@anthropic.com>
|
||
|
|
7a564e3946 |
lifter: register-indirect import resolution + silent-failure diagnostics (#183)
Reconstructed WIP: adds register-to-import provenance tracking so
'mov reg, [rip+iat]; call reg' resolves to a named external call, plus
three new diagnostic paths that surface previously-silent failure modes.
Changes
-------
- lifter/core/LifterClass.hpp: registerImportSource map; warn on
liftAddress reaching an unmapped target; warn on sealIncompleteBlocks
synthesising ret undef.
- lifter/core/LifterStages.hpp: synthesize '<dll>#<ordinal>' names for
ordinal imports in the IAT walk (so e.g. VirtualizerSDK64.dll#103
surfaces as a named external instead of falling through to operand
dispatch).
- lifter/core/LiftDiagnostics.hpp: IncompleteBlockSealed = 504.
- lifter/semantics/OperandUtils.ipp: SetRegisterValue erases the
provenance binding (any write invalidates a stale import tag).
- lifter/semantics/Semantics_ControlFlow.ipp: lift_mov tags the
destination with its import-provenance when the source is an IAT slot;
lift_call checks the provenance map for register-indirect calls and
emits the named external; lift_call falls back to an opaque
UnknownIndirect external call (with CallIndirectUnresolved warning) when
the target is RIP-relative but has no importMap entry.
Verified on example2.bin @ 0x140001000: 61 insns / 2 blocks / 0 err /
0 warn, 6 imports declared (CharUpperA, GetStdHandle, ReadConsoleA,
VirtualizerSDK64.dll#103, VirtualizerSDK64.dll#503, WriteConsoleA),
5 register-indirect resolution diagnostics, 0 bad calls.
NOTE: this content was reconstructed from patch files saved during an
earlier session (lost on a 'git reset --hard'). Verified to produce the
expected lift output byte-for-byte on example2.bin, but subtle
differences from the pre-reset original are possible in code paths the
sanity run does not exercise.
|
||
|
|
c8102a69cf |
themida: correctness gate, diagnostic tracer, ret-to-IAT recognition, gen revisit knob (#182)
* tests: add Themida devirtualization import-equivalence check
Adds python test.py themida that lifts every sample in
scripts/rewrite/themida_samples.json and asserts the resulting IR calls
every import declared in required_imports. Names are pinned against
a lift of the non-virtualized reference binary via --update.
This is a correctness gate that complements the existing
coverage gate ('2544 instructions, 0 errors'). Currently red on
example2-virt.bin: the lifter unrolls the VM without surfacing
GetStdHandle / WriteConsoleA / ReadConsoleA / CharUpperA from the
guest program. That gap is the active devirtualization frontier;
this test makes it visible instead of silently green.
Samples whose binaries are absent (`../testthemida/*.bin` lives
outside the repo) are skipped rather than failed, so the check
runs cleanly in CI without the binaries present.
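A minimal sketch of the import-equivalence check described above, not the actual test.py code. It assumes the lifted IR is available as text and that call sites look like `call ... @Name(`. The names `called_imports`, `check_sample`, the `lift` callback, and the `path` key are hypothetical; only `required_imports` comes from the commit message.

```python
import os
import re

def called_imports(ir_text):
    """Collect callee names from LLVM-style 'call ... @Name(' sites.

    Declarations ('declare ... @Name(') are deliberately not counted:
    the check is about imports that are actually called.
    """
    return set(re.findall(r'\bcall\b[^\n@]*@"?([\w.#$]+)"?\(', ir_text))

def check_sample(sample, lift):
    """Return 'skipped', 'ok', or a sorted list of missing import names.

    sample: dict with a hypothetical 'path' key and 'required_imports'.
    lift:   hypothetical callable mapping a binary path to IR text.
    """
    if not os.path.exists(sample["path"]):
        return "skipped"  # binary lives outside the repo; do not fail CI
    missing = set(sample["required_imports"]) - called_imports(lift(sample["path"]))
    return "ok" if not missing else sorted(missing)
```

Returning the missing names rather than a bare boolean keeps a red result actionable: the failure message can list exactly which guest imports the lift did not surface.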
* diag: add Unicorn-based external-call tracer; document Themida transform
Adds scripts/dev/trace_external_calls.py: loads a PE into Unicorn,
patches every IAT slot with a unique unmapped-address sentinel, then
emulates from the chosen entry. When any call/jmp/ret resolves its
target to a sentinel, logs the call-site address, the mnemonic, and
the addressing form. One-shot diagnostic for answering 'what x86
instruction issues this external call at runtime.'
Using it on example2-virt.bin shows the Themida transform precisely:
- guest imports (GetStdHandle etc) remain in the IAT
- every guest call site is rewritten from 'call [rip+IAT]' to a
VM-staged 'push target; ret' where target was loaded from the
IAT upstream
- for example2, the first external call happens at VA 0x14017fa77
via 'ret 0', popping the GetStdHandle IAT value off the stack
- Themida strips its own SDK markers (VirtualizerSDK64.dll#103/#503)
from the IAT; our ignore_imports filter already accounts for this
The lifter's current recognition handles direct call-through-IAT
and register-indirect IAT calls (the non-virt binary resolves 5
imports cleanly). It does not recognize the ret-pops-IAT-loaded-
pointer pattern, which is why the virt lift surfaces zero imports.
Also annotates themida_samples.json with these properties inline
so the transform semantics live next to the test that exercises
them.
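The sentinel idea behind the tracer can be sketched without Unicorn. This is an illustration only, not the contents of trace_external_calls.py: the base address, stride, and function names are assumptions, and the real script additionally has to write the sentinels into emulated memory and hook unmapped fetches.

```python
def assign_sentinels(iat_slots, base=0x7FFF_0000_0000, stride=0x1000):
    """Give every IAT slot a unique sentinel address.

    iat_slots: dict mapping IAT slot address -> import name.
    base/stride: hypothetical values, assumed to fall in unmapped memory.
    Returns (slot -> sentinel, sentinel -> import name).
    """
    slot_to_sentinel = {}
    sentinel_to_name = {}
    for i, (slot, name) in enumerate(sorted(iat_slots.items())):
        sentinel = base + i * stride
        slot_to_sentinel[slot] = sentinel
        sentinel_to_name[sentinel] = name
    return slot_to_sentinel, sentinel_to_name

def classify_transfer(target, sentinel_to_name):
    """Name the import a call/jmp/ret target refers to, or None if it is not a sentinel."""
    return sentinel_to_name.get(target)
```

Because each sentinel is unique, a control transfer landing on one identifies the import regardless of how the pointer reached the instruction (direct call, register, or a value popped by ret).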
* diag: trace_external_calls can dump visited PCs and record sentinel push chain
Two additions, both motivated by the example2-virt.bin diagnosis session:
- --dump-visited <path>: writes every unique instruction PC the emulator
executes, in first-visit order. Diff against the lifter's 'reached
addresses' trace (MERGEN_DIAG_LIFT_PROGRESS=1) to localise where the
lifter's static exploration diverges from the dynamic path.
- UC_HOOK_MEM_WRITE for stack-addressed 8-byte writes whose payload is a
sentinel. Records every such write, not just the first, because Themida
uses push-pop swap gadgets that stage a sentinel on the stack
transiently before the 'real' push lands it at the ret-target slot.
The last-5-pushes summary exposes this.
Findings for example2-virt.bin @ 0x140001000:
- lifter covers emu_pos=0..1298 out of 4210 unique PCs (~30%)
- external call site is at emu_pos=4209; gap of 2911 unvisited PCs
- lifter visits 5 addresses the runtime never takes (wrong concolic branch)
- the 'final push to ret slot' is not a 'push [iat]' but rather
'sub qword ptr [r14], <const>' — the VM decrypts a pre-staged
stack slot in place to reconstruct the IAT pointer. Pattern-match
recognition alone cannot handle this; concrete VM-dispatch unrolling
is required.
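The visited-PC comparison behind these findings can be sketched as follows. This is a hedged illustration: the function name and result keys are hypothetical, and it assumes both traces have already been parsed into integer addresses.

```python
def compare_traces(emu_visited, lifter_reached):
    """Compare the emulator's first-visit-order PCs with the lifter's reached set.

    emu_visited:    list of unique PCs in first-visit order (as --dump-visited writes them).
    lifter_reached: iterable of addresses the lifter reported reaching.
    """
    reached = set(lifter_reached)
    covered = sum(1 for pc in emu_visited if pc in reached)
    # Position of the first dynamic PC the lifter never reached, if any.
    first_gap = next((i for i, pc in enumerate(emu_visited) if pc not in reached), None)
    # Addresses the lifter explored that the runtime never executed.
    lifter_only = sorted(reached - set(emu_visited))
    return {
        "covered": covered,
        "total": len(emu_visited),
        "first_unreached_pos": first_gap,
        "lifter_only": lifter_only,
    }
```

The `lifter_only` list corresponds to the "addresses the runtime never takes" line above, and `first_unreached_pos` localises where static exploration leaves the dynamic path.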
* diag: add MERGEN_NO_LOOP_GEN env gate for loop-generalization
Adds an env-var toggle at the top of canGeneralizeStructuredLoopHeader.
When MERGEN_NO_LOOP_GEN=1, the gate rejects every header, forcing
pure concrete exploration with no phi-widening abstraction.
Diagnostic knob, not a user-facing feature. Used to localise how much
of a lift's coverage depends on generalization vs. the concolic engine.
Measurement on example2-virt.bin @ 0x140001000:
gen ON gen OFF (NO_LOOP_GEN=1)
blocks_attempted 56 2642 (47x)
instructions_lifted 2544 34229 (13.5x)
output_no_opts.ll lines 6022 30481 (5x)
unique addrs visited 34 338 (10x)
addrs in 0x14017xxxx 0 103 (call-handler cluster)
external call site reached: no yes (via BB 0x14017fa72)
themida equivalence test: red red (recognition gap remains)
Loop-generalization is the dominant reachability blocker on Themida
VM dispatchers at current tuning. Pure concrete exploration reaches
the external-call handler block but does not emit named import calls
because lift_ret has no path to match a resolved ret target against
importMap. Recognition is the next fix surface; reachability is
mostly a matter of generalization tuning.
Side-effects of gen OFF that are NOT acceptable in production:
- Lifter decodes .rdata IAT bytes as instructions (OUTSD error at
0x140002688 on this sample)
- Top-revisited addresses hit ~1142x each: the lifter spins in
tight loops without generalization cutting them off; block budget
(4096) would fire eventually on a larger sample
So the knob is purely diagnostic. The real production fix is selective
generalization (distinguish 'VM dispatcher' from 'guest loop') plus
lift_ret import recognition.
* lifter: recognize ret-to-IAT as named external call in lift_ret
Adds a recognition path in lift_ret: if the value being popped resolves
to a concrete address that's in importMap, emit callFunctionIR for the
named import, then simulate the external's own ret by popping one more
qword (the continuation address pre-staged by the caller). solvePath
then continues at the continuation instead of trying to lift the IAT
pointer as code.
Two resolution routes:
1. realval is a ConstantInt (direct push+ret of an IAT load)
2. realval is symbolic but computePossibleValues folds to a single
concrete value (obfuscated chains that constant-fold at this path)
Scope limits:
- Non-virt example2.bin lift is unchanged (still resolves 5 imports
via register-indirect path; the new ret path does not fire because
the binary uses 'call [iat]', not 'push+ret').
- Virt example2-virt.bin lift: the recognition code runs but does not
surface imports because the lifter's static resolution of the
arithmetic-decrypt chain produces wrong concrete targets. E.g. the
ret at 0x14017fa77 resolves to 0x140002628 (somewhere in .rdata) via
computePossibleValues; at runtime the emulator sees it pop the
GetStdHandle IAT pointer (0x140002490). The recognition logic is
correct; the upstream data flow is lying. Fixing that requires
selective-generalization tuning or concrete VM unrolling, tracked
separately.
So β lands as groundwork for simpler push+ret thunks and for future
work where state-propagation fidelity improves. It is not a Themida
fix on its own.
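A small model of the ret-to-IAT recognition path, covering only resolution route 1 (a concrete popped value). It is a sketch under assumptions, not the lifter's lift_ret: the stack is a plain list, `import_map` is keyed by the resolved address, and `emit_call` stands in for callFunctionIR.

```python
def lift_ret_model(stack, import_map, emit_call):
    """Model a ret whose popped target may be an import.

    stack:      list of concrete values, top of stack at the end.
    import_map: dict mapping resolved address -> import name.
    emit_call:  callable invoked with the import name (stand-in for callFunctionIR).
    Returns the address where lifting should continue.
    """
    target = stack.pop()
    name = import_map.get(target)
    if name is None:
        return target  # ordinary ret: continue at the popped address
    emit_call(name)
    # Simulate the external's own ret: pop the continuation address
    # that the caller staged underneath the import pointer.
    return stack.pop()
```

The second pop is what keeps the model from trying to lift the IAT pointer as code: control resumes at the continuation instead.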
* lifter: gate canGeneralize on per-header revisit count
Adds a revisit-count threshold to canGeneralizeStructuredLoopHeader:
below threshold N the gate rejects (concrete exploration continues);
at or above N it falls through to the existing loop-shape checks.
Tunable via MERGEN_GEN_MIN_REVISITS; default is 0 (inert, matches
pre-existing behaviour).
Also promotes ++liftAttemptCounts[addr] out from under the
liftProgressDiagEnabled gate so the counter is always maintained.
Rationale: on Themida example2-virt.bin @ 0x140001000, the existing
gate (always-generalize on first qualifying revisit) abstracts the
VM's dispatch loop too early, cutting reachability to ~30% of the
dynamic execution path. A higher threshold lets the dispatcher run
concretely for more iterations before abstracting. Measurement (all
other settings at defaults):
T=0 (current) blocks= 56 insns= 2544 err=0 warn=0
T=4 blocks= 88 insns= 3842 err=0 warn=4
T=16 blocks= 393 insns= 11747 err=1 warn=0
T=32 blocks= 425 insns= 12067 err=1 warn=0
T=128 blocks= 617 insns= 13987 err=1 warn=0
MERGEN_NO_LOOP_GEN=1 (kill)
blocks= 2642 insns= 34229 err=1 warn=0
Caveat: at T=6, T=8, T=12 the lifter crashes with an access violation
partway through lifting. The crash fires in the Themida dispatcher
state machinery around 0x1400237F9 when generalization fires mid-
iteration with state that the existing machinery is not prepared to
handle. Other nearby T values (T=5, 7, 9, 10, 11, 13-19) are stable.
So the knob is landing as experimental infrastructure with default=0
(no-op). Future work can pair a safe non-zero default with a fix for
the dispatcher-state crash.
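The two environment gates in this PR (MERGEN_NO_LOOP_GEN and MERGEN_GEN_MIN_REVISITS) can be modelled together. This is a sketch, not the body of canGeneralizeStructuredLoopHeader: the ordering of the two checks, the parameter names, and the `loop_shape_ok` callback are assumptions.

```python
import os

def can_generalize(addr, lift_attempt_counts, loop_shape_ok, env=os.environ):
    """Decide whether a loop header at addr may be generalized.

    lift_attempt_counts: dict addr -> number of lift attempts so far.
    loop_shape_ok:       callable standing in for the existing loop-shape checks.
    env:                 mapping of environment variables (injectable for testing).
    """
    if env.get("MERGEN_NO_LOOP_GEN") == "1":
        return False  # kill switch: pure concrete exploration
    threshold = int(env.get("MERGEN_GEN_MIN_REVISITS", "0"))
    if lift_attempt_counts.get(addr, 0) < threshold:
        return False  # below threshold: keep exploring concretely
    return loop_shape_ok(addr)  # at or above threshold: existing checks decide
```

With the default threshold of 0 the revisit check never rejects, which matches the "inert" default described above.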
---------
Co-authored-by: Claude <claude@anthropic.com>
|
||
|
|
f449ec3cb7 |
lifter: expand loop microtest coverage (+1 test, batch 57) (#181)
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com> |
||
|
|
5c92f828ac |
lifter: expand loop microtest coverage (+1 test, batch 56) (#180)
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com> |
||
|
|
ac773a1f58 |
lifter: expand loop microtest coverage (+1 test, batch 55) (#179)
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com> |
||
|
|
29c444cd25 |
lifter: expand loop microtest coverage (+1 test, batch 54) (#178)
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com> |
||
|
|
3f41a02179 |
lifter: expand loop microtest coverage (+1 test, batch 53) (#177)
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com> |
||
|
|
bdd53cc0d8 |
lifter: expand loop microtest coverage (+1 test, batch 52) (#176)
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com> |
||
|
|
f359a7d238 |
lifter: expand loop microtest coverage (+1 test, batch 51) (#175)
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com> |
||
|
|
dec38e1cc3 |
lifter: expand loop microtest coverage (+2 tests, batch 50) (#174)
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com> |
||
|
|
db5c56ca14 |
lifter: expand loop microtest coverage (+1 test, batch 49) (#173)
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com> |
||
|
|
fd02d3bc6f |
lifter: expand loop microtest coverage (+1 test, batch 48) (#172)
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com> |
||
|
|
7275d9dc5a |
lifter: expand loop microtest coverage (+2 tests, batch 47) (#171)
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com> |
||
|
|
0d8ca37af2 |
lifter: expand loop microtest coverage (+1 test, batch 46) (#170)
* lifter: expand loop microtest coverage (+4 net tests, batch 11)
Additive coverage only. Final batch for this session.
Adds four net tests:
- pending_generalized_loop_indirect_jump_allowed_when_unresolved
pins the current pending-path reuse behavior for unresolved IndirectJump
- generalized_loop_backup_canonical_only_path_leaves_flag_phis_empty
canonical-only fallback leaves generalizedLoopFlagPhis empty
- make_generalized_loop_backup_preserves_concrete_rdi_on_first_backedge
completes preserved-register coverage for RDI (index 7)
- generalized_loop_control_slot_byte_count_one_returns_masked_phi
narrow-width control_slot path (byteCount 1)
- generalized_loop_target_slot_byte_count_one_returns_masked_phi
narrow-width target_slot path (byteCount 1)
One attempted trampoline-relaxation accept test was removed before commit:
the acceptance condition is real in code, but constructing a stable
public-API scenario that trips it without entangling blockCanReach and
unfinished CFG artifacts proved brittle. Not worth landing a flaky test.
Verified:
- python test.py micro: all 153 microtests pass (was 149)
- python test.py baseline: all rewrite regression checks passed,
determinism check passed (42 golden files match)
- Themida reference sample unchanged (2544/0/0)
Loop-related microtest count: 100 -> 104 per the
/loop|backedge|generalized|rolled|themida|phi_address/i regex.
Session cumulative total: 36 baseline -> 104 current (+68).
* lifter: extend batch 39 with nested local-value limitation
* lifter: extend batch 42 with local-value helper scoping limitation
* lifter: expand loop microtest coverage (+1 test, batch 46)
Additive coverage only.
Test added:
- generalized_loop_control_field_uses_active_state_from_unrelated_block
This is a helper-scoping known-limitation test. Even though
retrieve_generalized_loop_control_field_value_impl() checks the current
insertion block against the active header, the end-to-end GetMemoryValue
path still produces a generalized control-field phi from an unrelated
block.
Verified:
- bash autoresearch.sh: loop_test_count=164, microtest_pass_count=212
- bash autoresearch.checks.sh with CLANG_CL_EXE='C:/Program Files/LLVM/bin/clang-cl.exe': baseline + determinism OK
Loop-related microtest count: 163 -> 164.
---------
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
|
||
|
|
389e23d54c |
lifter: expand loop microtest coverage (+2 tests, batch 45) (#169)
* lifter: expand loop microtest coverage (+4 net tests, batch 11)
Additive coverage only. Final batch for this session.
Adds four net tests:
- pending_generalized_loop_indirect_jump_allowed_when_unresolved
pins the current pending-path reuse behavior for unresolved IndirectJump
- generalized_loop_backup_canonical_only_path_leaves_flag_phis_empty
canonical-only fallback leaves generalizedLoopFlagPhis empty
- make_generalized_loop_backup_preserves_concrete_rdi_on_first_backedge
completes preserved-register coverage for RDI (index 7)
- generalized_loop_control_slot_byte_count_one_returns_masked_phi
narrow-width control_slot path (byteCount 1)
- generalized_loop_target_slot_byte_count_one_returns_masked_phi
narrow-width target_slot path (byteCount 1)
One attempted trampoline-relaxation accept test was removed before commit:
the acceptance condition is real in code, but constructing a stable
public-API scenario that trips it without entangling blockCanReach and
unfinished CFG artifacts proved brittle. Not worth landing a flaky test.
Verified:
- python test.py micro: all 153 microtests pass (was 149)
- python test.py baseline: all rewrite regression checks passed,
determinism check passed (42 golden files match)
- Themida reference sample unchanged (2544/0/0)
Loop-related microtest count: 100 -> 104 per the
/loop|backedge|generalized|rolled|themida|phi_address/i regex.
Session cumulative total: 36 baseline -> 104 current (+68).
* lifter: extend batch 39 with nested local-value limitation
* lifter: extend batch 42 with local-value helper scoping limitation
* lifter: expand loop microtest coverage (+2 tests, batch 45)
Additive coverage only.
Tests added:
- generalized_phi_address_uses_state_from_unrelated_block
- generalized_local_phi_address_uses_state_from_unrelated_block
These are helper-scoping known-limitation tests. The current phi_address
and local_phi_address helpers key off the PHI's parent header via
getGeneralizedLoopStateForHeader(phi->getParent()) and do not validate
the current insertion block. A header-owned PHI therefore still resolves
through generalized-loop state even when queried from an unrelated block.
Verified:
- bash autoresearch.sh: loop_test_count=163, microtest_pass_count=211
- bash autoresearch.checks.sh with CLANG_CL_EXE='C:/Program Files/LLVM/bin/clang-cl.exe': baseline + determinism OK
Loop-related microtest count: 161 -> 163.
---------
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
|
||
|
|
93f216d6d3 |
lifter: expand loop microtest coverage (+2 tests, batch 44) (#168)
* lifter: expand loop microtest coverage (+4 net tests, batch 11)
Additive coverage only. Final batch for this session.
Adds four net tests:
- pending_generalized_loop_indirect_jump_allowed_when_unresolved
pins the current pending-path reuse behavior for unresolved IndirectJump
- generalized_loop_backup_canonical_only_path_leaves_flag_phis_empty
canonical-only fallback leaves generalizedLoopFlagPhis empty
- make_generalized_loop_backup_preserves_concrete_rdi_on_first_backedge
completes preserved-register coverage for RDI (index 7)
- generalized_loop_control_slot_byte_count_one_returns_masked_phi
narrow-width control_slot path (byteCount 1)
- generalized_loop_target_slot_byte_count_one_returns_masked_phi
narrow-width target_slot path (byteCount 1)
One attempted trampoline-relaxation accept test was removed before commit:
the acceptance condition is real in code, but constructing a stable
public-API scenario that trips it without entangling blockCanReach and
unfinished CFG artifacts proved brittle. Not worth landing a flaky test.
Verified:
- python test.py micro: all 153 microtests pass (was 149)
- python test.py baseline: all rewrite regression checks passed,
determinism check passed (42 golden files match)
- Themida reference sample unchanged (2544/0/0)
Loop-related microtest count: 100 -> 104 per the
/loop|backedge|generalized|rolled|themida|phi_address/i regex.
Session cumulative total: 36 baseline -> 104 current (+68).
* lifter: extend batch 39 with nested local-value limitation
* lifter: extend batch 42 with local-value helper scoping limitation
---------
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
|
||
|
|
eaf8fff447 |
lifter: expand loop microtest coverage (+1 test, batch 43) (#167)
* lifter: expand loop microtest coverage (+4 net tests, batch 11)
Additive coverage only. Final batch for this session.
Adds four net tests:
- pending_generalized_loop_indirect_jump_allowed_when_unresolved
pins the current pending-path reuse behavior for unresolved IndirectJump
- generalized_loop_backup_canonical_only_path_leaves_flag_phis_empty
canonical-only fallback leaves generalizedLoopFlagPhis empty
- make_generalized_loop_backup_preserves_concrete_rdi_on_first_backedge
completes preserved-register coverage for RDI (index 7)
- generalized_loop_control_slot_byte_count_one_returns_masked_phi
narrow-width control_slot path (byteCount 1)
- generalized_loop_target_slot_byte_count_one_returns_masked_phi
narrow-width target_slot path (byteCount 1)
One attempted trampoline-relaxation accept test was removed before commit:
the acceptance condition is real in code, but constructing a stable
public-API scenario that trips it without entangling blockCanReach and
unfinished CFG artifacts proved brittle. Not worth landing a flaky test.
Verified:
- python test.py micro: all 153 microtests pass (was 149)
- python test.py baseline: all rewrite regression checks passed,
determinism check passed (42 golden files match)
- Themida reference sample unchanged (2544/0/0)
Loop-related microtest count: 100 -> 104 per the
/loop|backedge|generalized|rolled|themida|phi_address/i regex.
Session cumulative total: 36 baseline -> 104 current (+68).
* lifter: extend batch 39 with nested local-value limitation
* lifter: extend batch 42 with local-value helper scoping limitation
---------
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
|
||
|
|
bdc96a3735 |
lifter: expand loop microtest coverage (+2 tests, batch 42) (#166)
* lifter: expand loop microtest coverage (+4 net tests, batch 11)
Additive coverage only. Final batch for this session.
Adds four net tests:
- pending_generalized_loop_indirect_jump_allowed_when_unresolved
pins the current pending-path reuse behavior for unresolved IndirectJump
- generalized_loop_backup_canonical_only_path_leaves_flag_phis_empty
canonical-only fallback leaves generalizedLoopFlagPhis empty
- make_generalized_loop_backup_preserves_concrete_rdi_on_first_backedge
completes preserved-register coverage for RDI (index 7)
- generalized_loop_control_slot_byte_count_one_returns_masked_phi
narrow-width control_slot path (byteCount 1)
- generalized_loop_target_slot_byte_count_one_returns_masked_phi
narrow-width target_slot path (byteCount 1)
One attempted trampoline-relaxation accept test was removed before commit:
the acceptance condition is real in code, but constructing a stable
public-API scenario that trips it without entangling blockCanReach and
unfinished CFG artifacts proved brittle. Not worth landing a flaky test.
Verified:
- python test.py micro: all 153 microtests pass (was 149)
- python test.py baseline: all rewrite regression checks passed,
determinism check passed (42 golden files match)
- Themida reference sample unchanged (2544/0/0)
Loop-related microtest count: 100 -> 104 per the
/loop|backedge|generalized|rolled|themida|phi_address/i regex.
Session cumulative total: 36 baseline -> 104 current (+68).
* lifter: extend batch 39 with nested local-value limitation
* lifter: expand loop microtest coverage (+2 tests, batch 42)
Additive coverage only.
Adds helper-scoping known-limitation tests:
- generalized_loop_control_slot_uses_active_state_from_unrelated_block
- generalized_loop_target_slot_uses_active_state_from_unrelated_block
Current control_slot / target_slot helpers consult only the scalar active
state and do not validate that the current insertion block is the active
header. Reads from unrelated blocks therefore still return generalized
loop phis instead of falling through.
Verified:
- bash autoresearch.sh: loop_test_count=160, microtest_pass_count=209
- bash autoresearch.checks.sh with CLANG_CL_EXE='C:/Program Files/LLVM/bin/clang-cl.exe': baseline + determinism OK
Loop-related microtest count: 158 -> 160.
---------
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
|
||
|
|
73ac774341 |
lifter: expand loop microtest coverage (+1 test, batch 41) (#165)
* lifter: expand loop microtest coverage (+4 net tests, batch 11)
Additive coverage only. Final batch for this session.
Adds four net tests:
- pending_generalized_loop_indirect_jump_allowed_when_unresolved
pins the current pending-path reuse behavior for unresolved IndirectJump
- generalized_loop_backup_canonical_only_path_leaves_flag_phis_empty
canonical-only fallback leaves generalizedLoopFlagPhis empty
- make_generalized_loop_backup_preserves_concrete_rdi_on_first_backedge
completes preserved-register coverage for RDI (index 7)
- generalized_loop_control_slot_byte_count_one_returns_masked_phi
narrow-width control_slot path (byteCount 1)
- generalized_loop_target_slot_byte_count_one_returns_masked_phi
narrow-width target_slot path (byteCount 1)
One attempted trampoline-relaxation accept test was removed before commit:
the acceptance condition is real in code, but constructing a stable
public-API scenario that trips it without entangling blockCanReach and
unfinished CFG artifacts proved brittle. Not worth landing a flaky test.
Verified:
- python test.py micro: all 153 microtests pass (was 149)
- python test.py baseline: all rewrite regression checks passed,
determinism check passed (42 golden files match)
- Themida reference sample unchanged (2544/0/0)
Loop-related microtest count: 100 -> 104 per the
/loop|backedge|generalized|rolled|themida|phi_address/i regex.
Session cumulative total: 36 baseline -> 104 current (+68).
* lifter: extend batch 39 with nested local-value limitation
* lifter: expand loop microtest coverage (+1 test, batch 41)
Additive coverage only.
Adds a fresh known-limitation test for control_field matching:
- generalized_loop_control_field_ignores_base_candidate
Current control_field matching validates only that the address is
; it does not verify that is the
actual loop control cursor. As a result, a fake non-control base still
routes through the active generalized-loop control-field state and
produces a phi.
Verified:
- bash autoresearch.sh: loop_test_count=158, microtest_pass_count=207
- bash autoresearch.checks.sh with CLANG_CL_EXE='C:/Program Files/LLVM/bin/clang-cl.exe': baseline + determinism OK
Loop-related microtest count: 157 -> 158.
---------
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
|
||
|
|
a38fc06270 |
lifter: expand loop microtest coverage (+1 test, batch 40) (#164)
* lifter: expand loop microtest coverage (+4 net tests, batch 11)
Additive coverage only. Final batch for this session.
Adds four net tests:
- pending_generalized_loop_indirect_jump_allowed_when_unresolved
pins the current pending-path reuse behavior for unresolved IndirectJump
- generalized_loop_backup_canonical_only_path_leaves_flag_phis_empty
canonical-only fallback leaves generalizedLoopFlagPhis empty
- make_generalized_loop_backup_preserves_concrete_rdi_on_first_backedge
completes preserved-register coverage for RDI (index 7)
- generalized_loop_control_slot_byte_count_one_returns_masked_phi
narrow-width control_slot path (byteCount 1)
- generalized_loop_target_slot_byte_count_one_returns_masked_phi
narrow-width target_slot path (byteCount 1)
One attempted trampoline-relaxation accept test was removed before commit:
the acceptance condition is real in code, but constructing a stable
public-API scenario that trips it without entangling blockCanReach and
unfinished CFG artifacts proved brittle. Not worth landing a flaky test.
Verified:
- python test.py micro: all 153 microtests pass (was 149)
- python test.py baseline: all rewrite regression checks passed,
determinism check passed (42 golden files match)
- Themida reference sample unchanged (2544/0/0)
Loop-related microtest count: 100 -> 104 per the
/loop|backedge|generalized|rolled|themida|phi_address/i regex.
Session cumulative total: 36 baseline -> 104 current (+68).
* lifter: extend batch 39 with nested local-value limitation
---------
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
|
||
|
|
17a205b66a |
lifter: expand loop microtest coverage (+1 test, batch 39) (#163)
* lifter: expand loop microtest coverage (+4 net tests, batch 11)
Additive coverage only. Final batch for this session.
Adds four net tests:
- pending_generalized_loop_indirect_jump_allowed_when_unresolved
pins the current pending-path reuse behavior for unresolved IndirectJump
- generalized_loop_backup_canonical_only_path_leaves_flag_phis_empty
canonical-only fallback leaves generalizedLoopFlagPhis empty
- make_generalized_loop_backup_preserves_concrete_rdi_on_first_backedge
completes preserved-register coverage for RDI (index 7)
- generalized_loop_control_slot_byte_count_one_returns_masked_phi
narrow-width control_slot path (byteCount 1)
- generalized_loop_target_slot_byte_count_one_returns_masked_phi
narrow-width target_slot path (byteCount 1)
One attempted trampoline-relaxation accept test was removed before commit:
the acceptance condition is real in code, but constructing a stable
public-API scenario that trips it without entangling blockCanReach and
unfinished CFG artifacts proved brittle. Not worth landing a flaky test.
Verified:
- python test.py micro: all 153 microtests pass (was 149)
- python test.py baseline: all rewrite regression checks passed,
determinism check passed (42 golden files match)
- Themida reference sample unchanged (2544/0/0)
Loop-related microtest count: 100 -> 104 per the
/loop|backedge|generalized|rolled|themida|phi_address/i regex.
Session cumulative total: 36 baseline -> 104 current (+68).
* lifter: expand loop microtest coverage (+1 test, batch 39)
Additive coverage only.
Adds a helper-specific nested-loop known-limitation test:
- generalized_loop_nested_inner_target_slot_uses_inner_state
This documents that retrieve_generalized_loop_target_slot_value_impl()
reads only the scalar activeGeneralizedLoopControlFieldState. After an
inner load_generalized_backup overwrites that scalar, a target_slot read
at the outer header resolves using inner carried-slot values.
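The limitation can be illustrated with a toy model. This is not the lifter's code: the class and method names are hypothetical stand-ins, and the only property modelled is that a single scalar holds the active state.

```python
class LoopStateModel:
    """Toy model of a single scalar 'active' generalized-loop state."""

    def __init__(self):
        self.active = None  # one slot shared by all loops, outer and inner

    def load_generalized_backup(self, header, target_values):
        # Activating any loop overwrites the scalar, including for nested loops.
        self.active = {"header": header, "target": target_values}

    def retrieve_target_slot(self, current_header):
        # Mirrors the documented limitation: current_header is not checked
        # against self.active["header"] before the values are returned.
        return self.active["target"]
```

Loading an outer header and then an inner one leaves only the inner state active, so a later read at the outer header returns the inner loop's carried values.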
Verified:
- bash autoresearch.sh: loop_test_count=155, microtest_pass_count=204
- bash autoresearch.checks.sh with CLANG_CL_EXE='C:/Program Files/LLVM/bin/clang-cl.exe': baseline + determinism OK
Loop-related microtest count: 154 -> 155.
* lifter: extend batch 39 with nested control-slot limitation
---------
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
|
||
|
|
3ad880c3e7 |
lifter: expand loop microtest coverage (+1 test, batch 38) (#162)
* lifter: expand loop microtest coverage (+4 net tests, batch 11)
Additive coverage only. Final batch for this session.
Adds four net tests:
- pending_generalized_loop_indirect_jump_allowed_when_unresolved
pins the current pending-path reuse behavior for unresolved IndirectJump
- generalized_loop_backup_canonical_only_path_leaves_flag_phis_empty
canonical-only fallback leaves generalizedLoopFlagPhis empty
- make_generalized_loop_backup_preserves_concrete_rdi_on_first_backedge
completes preserved-register coverage for RDI (index 7)
- generalized_loop_control_slot_byte_count_one_returns_masked_phi
narrow-width control_slot path (byteCount 1)
- generalized_loop_target_slot_byte_count_one_returns_masked_phi
narrow-width target_slot path (byteCount 1)
One attempted trampoline-relaxation accept test was removed before commit:
the acceptance condition is real in code, but constructing a stable
public-API scenario that trips it without entangling blockCanReach and
unfinished CFG artifacts proved brittle. Not worth landing a flaky test.
Verified:
- python test.py micro: all 153 microtests pass (was 149)
- python test.py baseline: all rewrite regression checks passed,
determinism check passed (42 golden files match)
- Themida reference sample unchanged (2544/0/0)
Loop-related microtest count: 100 -> 104 per the
/loop|backedge|generalized|rolled|themida|phi_address/i regex.
Session cumulative total: 36 baseline -> 104 current (+68).
* lifter: expand loop microtest coverage (+1 test, batch 38)
Additive coverage only.
Adds a fresh known-limitation test for generalized-loop state lookup:
- generalized_loop_state_getter_returns_invalid_archived_entry
Current implementation of getMostRecentGeneralizedLoopState() returns the
first archived entry whenever the archive map is non-empty, without
checking . This test documents that behavior explicitly so the bug
has a direct loop-focused repro until the getter is fixed.
Verified:
- bash autoresearch.sh: loop_test_count=154, microtest_pass_count=203
- bash autoresearch.checks.sh with CLANG_CL_EXE='C:/Program Files/LLVM/bin/clang-cl.exe': baseline + determinism OK
Loop-related microtest count: 153 -> 154.
---------
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
|
||
|
|
efda71853d |
lifter: expand loop microtest coverage (+1 test, batch 37) (#161)
* lifter: expand loop microtest coverage (+4 net tests, batch 11)
Additive coverage only. Final batch for this session.
Adds four net tests:
- pending_generalized_loop_indirect_jump_allowed_when_unresolved
pins the current pending-path reuse behavior for unresolved IndirectJump
- generalized_loop_backup_canonical_only_path_leaves_flag_phis_empty
canonical-only fallback leaves generalizedLoopFlagPhis empty
- make_generalized_loop_backup_preserves_concrete_rdi_on_first_backedge
completes preserved-register coverage for RDI (index 7)
- generalized_loop_control_slot_byte_count_one_returns_masked_phi
narrow-width control_slot path (byteCount 1)
- generalized_loop_target_slot_byte_count_one_returns_masked_phi
narrow-width target_slot path (byteCount 1)
One attempted trampoline-relaxation accept test was removed before commit:
the acceptance condition is real in code, but constructing a stable
public-API scenario that trips it without entangling blockCanReach and
unfinished CFG artifacts proved brittle. Not worth landing a flaky test.
Verified:
- python test.py micro: all 153 microtests pass (was 149)
- python test.py baseline: all rewrite regression checks passed,
determinism check passed (42 golden files match)
- Themida reference sample unchanged (2544/0/0)
Loop-related microtest count: 100 -> 104 per the
/loop|backedge|generalized|rolled|themida|phi_address/i regex.
Session cumulative total: 36 baseline -> 104 current (+68).
* lifter: expand loop microtest coverage (+1 test, batch 37)
Additive coverage only.
Adds the 4-way value-enumeration counterpart to the existing generalized
phi-load tests:
- generalized_phi_address_compute_possible_values_four_way
This extends computePossibleValues coverage from 2-way and 3-way to
canonical + 3 backedges on generalized phi-address loads.
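What a 4-way enumeration means can be shown with a small model. This is an assumption-laden sketch rather than computePossibleValues itself: addresses and memory are plain Python values, and the function name is hypothetical.

```python
def possible_loaded_values(incoming_addresses, memory):
    """Enumerate the values a load through a phi address can produce.

    incoming_addresses: one concrete address per phi incoming
                        (canonical entry plus each backedge).
    memory:             dict mapping address -> concrete value.
    Incomings with no known memory value are skipped in this toy model.
    """
    return sorted({memory[a] for a in incoming_addresses if a in memory})
```

For a header with a canonical entry and three backedges there are four incomings, so the result has at most four distinct values.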
Verified:
- bash autoresearch.sh: loop_test_count=153, microtest_pass_count=202
- bash autoresearch.checks.sh with CLANG_CL_EXE='C:/Program Files/LLVM/bin/clang-cl.exe': baseline + determinism OK
Loop-related microtest count: 152 -> 153.
---------
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
|
||
|
|
88a190b0ea |
lifter: expand loop microtest coverage (+2 tests, batch 36) (#160)
* lifter: expand loop microtest coverage (+2 tests, batch 36)
Additive coverage only.
Adds 4-way (canonical + 3 backedges) multi-way coverage for the two
phi-address helpers:
- generalized_phi_address_four_way_resolves_all_incomings
- generalized_local_phi_address_four_way_resolves_all_incomings
This mirrors the 4-way coverage now present on control_slot,
target_slot, and control_field_load; all five retrieve helpers now have
explicit 4-way multi-backedge tests.
Verified:
- bash autoresearch.sh: loop_test_count=151, microtest_pass_count=200
- bash autoresearch.checks.sh with CLANG_CL_EXE='C:/Program Files/LLVM/bin/clang-cl.exe': baseline + determinism OK
Loop-related microtest count: 149 -> 151.
* ci: trigger batch 36 workflows
* lifter: extend batch 36 with 3-way generalized phi load values
---------
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
|
||
|
|
9385a0f1f8 |
lifter: expand loop microtest coverage (+2 tests, batch 35) (#159)
* lifter: expand loop microtest coverage (+2 tests, batch 35)
Additive coverage only.
Adds 4-way (canonical + 3 backedges) phi coverage for target_slot and
control_field_load helpers:
- generalized_loop_target_slot_four_way_produces_phi
- generalized_loop_control_field_load_four_way_produces_phi
Parallels the existing 4-way control_slot test and closes the N-way
matrix: all three slot/field helpers now have 3-way and 4-way coverage
alongside 2-way.
Verified:
- python test.py micro: all pass
- python test.py baseline: rewrite regression + determinism pass
Loop-related microtest count: 147 -> 149.
---------
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
|
||
|
|
43f008ffdf |
lifter: expand loop microtest coverage (+2 tests, batch 34) (#158)
* lifter: expand loop microtest coverage (+2 tests, batch 34)
Additive coverage only.
Adds the first N-way (>2-way) coverage for target_slot and
control_field_load helpers:
- generalized_loop_target_slot_three_way_produces_phi
- generalized_loop_control_field_load_three_way_produces_phi
Previously only control_slot had multi-way coverage (4-way) and
phi_address/local_phi_address had 3-way; these close the gap for the
remaining two helpers.
Verified:
- python test.py micro: all pass
- python test.py baseline: rewrite regression + determinism pass
Loop-related microtest count: 145 -> 147.
---------
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
|
||
|
|
49238a3290 |
lifter: expand loop microtest coverage (+4 tests, batch 33) (#157)
* lifter: expand loop microtest coverage (+4 tests, batch 33)
Additive coverage only.
Closes the byte-width collapse matrix for the remaining two helpers:
- generalized_phi_address_byte_count_one_collapses_when_values_match
- generalized_phi_address_byte_count_two_collapses_when_values_match
- generalized_local_phi_address_byte_count_one_collapses_when_values_match
- generalized_local_phi_address_byte_count_two_collapses_when_values_match
All five retrieve helpers (control_slot, target_slot, control_field_load,
phi_address, local_phi_address) now have parallel byte_count_one/two
shared-value collapse coverage alongside the default/qword-width tests.
Verified:
- python test.py micro: all pass
- python test.py baseline: rewrite regression + determinism pass
Loop-related microtest count: 141 -> 145.
---------
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
|