Additive coverage only. Final batch for this session.
Adds four net tests:
- pending_generalized_loop_indirect_jump_allowed_when_unresolved
pins the current pending-path reuse behavior for unresolved IndirectJump
- generalized_loop_backup_canonical_only_path_leaves_flag_phis_empty
canonical-only fallback leaves generalizedLoopFlagPhis empty
- make_generalized_loop_backup_preserves_concrete_rdi_on_first_backedge
completes preserved-register coverage for RDI (index 7)
- generalized_loop_control_slot_byte_count_one_returns_masked_phi
narrow-width control_slot path (byteCount 1)
- generalized_loop_target_slot_byte_count_one_returns_masked_phi
narrow-width target_slot path (byteCount 1)
One attempted trampoline-relaxation accept test was removed before commit:
the acceptance condition is real in code, but constructing a stable
public-API scenario that trips it without entangling blockCanReach and
unfinished CFG artifacts proved brittle. Not worth landing a flaky test.
Verified:
- python test.py micro: all 153 microtests pass (was 149)
- python test.py baseline: all rewrite regression checks passed,
determinism check passed (42 golden files match)
- Themida reference sample unchanged (2544/0/0)
Loop-related microtest count: 100 -> 104 per the
/loop|backedge|generalized|rolled|themida|phi_address/i regex.
Session cumulative total: 36 baseline -> 104 current (+68).
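The loop-related count can be reproduced outside test.py with the same case-insensitive match. This is a minimal C++ sketch. It assumes the microtest names have already been collected into a list, and the helper name countLoopRelatedTests is hypothetical:

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Counts names matching the loop-related pattern quoted in the commit text.
// countLoopRelatedTests is a hypothetical helper, not part of test.py.
std::size_t countLoopRelatedTests(const std::vector<std::string>& names) {
    // std::regex::icase plays the role of the trailing /i flag.
    static const std::regex kLoopPattern(
        "loop|backedge|generalized|rolled|themida|phi_address",
        std::regex::ECMAScript | std::regex::icase);
    std::size_t count = 0;
    for (const std::string& name : names) {
        // regex_search accepts a match anywhere in the name, like /.../ does.
        if (std::regex_search(name, kLoopPattern)) {
            ++count;
        }
    }
    return count;
}
```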
Additive coverage only. This batch closes three remaining small gaps:
- record_generalized_loop_backedge_single_source_rotates_canonical_and_backedge
Positive 1-backedge rotation case (the two no-op guards were already
covered; this pins the actual state transition when source and control
both change).
- migrate_generalized_loop_block_preserves_existing_register_and_flag_phi_maps
Do-not-overwrite branch for existing register/flag phi maps on newBlock.
- pending_generalized_loop_indirect_jump_allowed_when_unresolved
Pins the current pending-path behavior: unresolved indirect-jump still
reuses the pending generalized-loop header once the target solved to it.
This intentionally documents the asymmetry with the stricter fresh-
promotion gate, rather than asserting a false symmetry.
Verified:
- python test.py micro: all 149 microtests pass (was 147)
- python test.py baseline: all rewrite regression checks passed,
determinism check passed (42 golden files match)
- Themida reference sample unchanged (2544/0/0)
Loop-related microtest count: 98 -> 100 per the
/loop|backedge|generalized|rolled|themida|phi_address/i regex.
Session cumulative total: 36 baseline -> 100 current (+64).
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
Additive test coverage only. Final small batch for this session.
- pending_generalized_loop_indirect_jump_allowed_when_unresolved
Pins the current pending-path behavior: once the target solved to the
pending generalized-loop header, the pending path reuses it even under
IndirectJump context. This differs from the stricter fresh-promotion
gate used by canGeneralizeStructuredLoopHeader; the test documents the
asymmetry instead of asserting a false symmetry.
- generalized_loop_backup_canonical_only_path_leaves_flag_phis_empty
Canonical-only load path leaves generalizedLoopFlagPhis empty, symmetric
to the existing canonical-only register-phi test.
- make_generalized_loop_backup_preserves_concrete_rdi_on_first_backedge
Completes preserved-register coverage for the remaining hot lane in
shouldPreserveGeneralizedBackedgeRegisterIndex (index 7 / RDI).
Verified:
- python test.py micro: all 147 microtests pass (was 144)
- python test.py baseline: all rewrite regression checks passed,
determinism check passed (42 golden files match)
- Themida reference sample unchanged (2544/0/0)
Loop-related microtest count: 95 -> 98 per the
/loop|backedge|generalized|rolled|themida|phi_address/i regex.
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
Follow-up to #123 (multi-way backedge N-way phi construction).
record_generalized_loop_backedge_impl previously guarded on
backedgeSources.size() == 1 and was a no-op for multi-way loops.
The 1-backedge rotation semantics (promote the existing backedge
into canonical, install the new body source as the backedge) does
not generalize to multi-way - no single backedge is 'the' one to
promote. This commit replaces the guard with two code paths:
- size == 1: unchanged rotation (preserves Themida semantics
where body exploration rolls the cursor forward through a
sequence of canonical -> backedge states).
- size >= 2: append-or-update by sourceBlock. The new body
source contributes a fresh backedge alongside the original
N. A repeat call from the same body source with the same
control value is a no-op (no progress); with a new control,
it updates that source's entry in place. Per-sourceBlock
dedup prevents unbounded growth as the body iterates.
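The two record paths can be sketched in simplified form. The sketch uses integer block ids and omits the per-backedge buffers. LoopControlState and recordBackedge are hypothetical stand-ins for GeneralizedLoopControlFieldState and record_generalized_loop_backedge_impl. The exact 1-backedge no-op guards are an assumption: here, the state rotates only when both source and control change.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, trimmed stand-in for GeneralizedLoopControlFieldState:
// blocks are integer ids and the per-backedge buffers are omitted.
struct LoopControlState {
    int canonicalSource = 0;
    std::uint64_t canonicalControl = 0;
    std::vector<int> backedgeSources;             // parallel to backedgeControls
    std::vector<std::uint64_t> backedgeControls;
};

void recordBackedge(LoopControlState& s, int bodySource, std::uint64_t control) {
    if (s.backedgeSources.size() == 1) {
        // Assumed no-op guards: rotate only when both source and control change.
        if (bodySource == s.backedgeSources[0] || control == s.backedgeControls[0])
            return;
        // Promote the existing backedge into canonical...
        s.canonicalSource = s.backedgeSources[0];
        s.canonicalControl = s.backedgeControls[0];
        // ...and install the new body source as the backedge.
        s.backedgeSources[0] = bodySource;
        s.backedgeControls[0] = control;
        return;
    }
    if (s.backedgeSources.size() >= 2) {
        for (std::size_t i = 0; i < s.backedgeSources.size(); ++i) {
            if (s.backedgeSources[i] == bodySource) {
                // Same source: update in place. Writing an unchanged control
                // leaves the state as it was (the no-progress case).
                s.backedgeControls[i] = control;
                return;
            }
        }
        // New body source: append alongside the existing N backedges.
        s.backedgeSources.push_back(bodySource);
        s.backedgeControls.push_back(control);
    }
    // With no backedges recorded, this sketch does nothing.
}
```

Sources and controls are kept in parallel vectors to mirror the per-header SmallVectors. The linear scan is acceptable because per-sourceBlock dedup bounds N by the number of distinct body sources.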
Added microtest:
record_generalized_loop_backedge_multiway_appends_new_body_source
Sets up a 2-backedge loop, activates state, simulates body lift
rolling the control cursor, calls record and asserts the state
grew to 3 backedges with the body source reflected. Also covers
the no-progress repeat-call case and the update-in-place case.
Also considered and dropped in this session:
- Non-Themida control slot generalization. An initial attempt
to drop the hardcoded kThemidaControlCursorSlot gate in the
retrieve helpers in favor of buffer lookups was too aggressive:
every memory address where canonical/backedge buffers happened
to hold distinct constants produced a phi, which over-fired on
the Themida sample (2544 -> 108444 instructions, 1 error).
Proper fix needs per-function control-slot detection to pick
a single 'real' cursor slot. Left as follow-up; the #122 test
continues to pin the current behavior.
Verified:
- python test.py micro: all pass, including the new multi-way
rolled record test
- python test.py baseline: all rewrite regression checks passed,
determinism check passed (42 golden files match)
- Themida reference sample: 2544 instructions, 0 warn, 0 err
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
branch_backup(bb, /*generalized=*/true) previously overwrote a single
backup_point per header in generalizedLoopBackedgeBackup[bb]. A loop
header reached from three or more backedges silently lost every
snapshot except the most recent, and the load_generalized_backup phi
was always 2-incoming (canonical + last-seen backedge). PR #121
pinned this as a KNOWN-LIMITATION microtest.
This commit widens the machinery end-to-end to 1 canonical + N
backedges.
Storage and state:
- generalizedLoopBackedgeBackup is now DenseMap<BB*,
SmallVector<backup_point, 2>>. branch_backup_impl appends,
deduplicated by sourceBlock (repeat call from the same source
replaces its entry in place).
- GeneralizedLoopControlFieldState.backedgeSource/Control/Buffer
become parallel SmallVectors sized N per header.
Phi construction:
- make_generalized_loop_backup takes ArrayRef<backup_point> sources.
Its mergeValue lambda constructs (1 + N)-incoming phis, one
incoming per distinct backedge sourceBlock, with canonicalSource
first. Sources duplicating canonicalSource are filtered. The N=1
path produces the same 2-incoming phi as before (determinism
gate: 42/42 golden hashes match).
- retrieve_generalized_loop_control_slot_value_impl,
retrieve_generalized_loop_target_slot_value_impl, and
retrieve_generalized_loop_control_field_value_impl each emit
(1 + N)-incoming phis from state.backedgeSources/Controls/Buffers.
- retrieve_generalized_loop_phi_address_value_impl and
retrieve_generalized_loop_local_phi_address_value_impl relax
their 'phi->getNumIncomingValues() != 2' sanity check to accept
any phi with >= 2 incomings, and match each incoming against
canonicalSource or any of state->backedgeSources[i].
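The (1 + N)-incoming shape can be sketched as plain data. The sketch uses integer block ids and 64-bit values in place of BasicBlock* and llvm::Value*. BackupPoint and buildPhiIncomings are hypothetical, and real code would pass each resulting pair to PHINode::addIncoming:

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical simplified backup_point: block id plus the value it carries.
struct BackupPoint {
    int sourceBlock;
    std::uint64_t value;
};

// Incoming list for a (1 + N)-way header phi: canonical first, then one entry
// per distinct backedge sourceBlock. Sources equal to canonical are dropped.
std::vector<std::pair<int, std::uint64_t>> buildPhiIncomings(
    const BackupPoint& canonical, const std::vector<BackupPoint>& backedges) {
    std::vector<std::pair<int, std::uint64_t>> incomings;
    incomings.emplace_back(canonical.sourceBlock, canonical.value);
    for (const BackupPoint& b : backedges) {
        bool seen = false;
        for (const auto& in : incomings) {
            if (in.first == b.sourceBlock) { seen = true; break; }
        }
        if (!seen) incomings.emplace_back(b.sourceBlock, b.value);
    }
    return incomings;
}
```

With a single backedge the list has two entries. That is the shape the determinism gate relies on.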
load_generalized_backup_impl:
- Collects backedges whose sourceBlock differs from canonical AND
whose controlCursor value differs from canonical; activates state
only if at least one such backedge exists.
- seedInvariantLocalQwords requires the qword to read identically
from canonicalBuffer AND every backedgeBuffer to qualify.
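Both load-time checks can be sketched as follows. The sketch assumes a buffer is a flat byte image of the local stack area. Snapshot, progressingBackedges, readQword, and qwordIsInvariant are hypothetical names:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical simplified types: a buffer is a flat byte image of the local
// stack area, and a snapshot carries only its source block and control value.
using Buffer = std::vector<std::uint8_t>;
struct Snapshot {
    int sourceBlock;
    std::uint64_t control;
};

// Backedges that count toward activation: different source block AND a
// control value that differs from the canonical one.
std::vector<Snapshot> progressingBackedges(const Snapshot& canonical,
                                           const std::vector<Snapshot>& backedges) {
    std::vector<Snapshot> out;
    for (const Snapshot& b : backedges) {
        if (b.sourceBlock != canonical.sourceBlock && b.control != canonical.control)
            out.push_back(b);
    }
    return out;  // state would be activated only if this is non-empty
}

// Copies 8 bytes at `offset` into `out`; false if the qword is out of range.
bool readQword(const Buffer& buf, std::size_t offset, std::uint64_t& out) {
    if (offset + 8 > buf.size()) return false;
    std::memcpy(&out, buf.data() + offset, 8);  // same host byte order on both sides
    return true;
}

// A local qword qualifies for reseeding only if it reads identically from the
// canonical buffer and from every backedge buffer.
bool qwordIsInvariant(const Buffer& canonical, const std::vector<Buffer>& backedges,
                      std::size_t offset) {
    std::uint64_t expected = 0;
    if (!readQword(canonical, offset, expected)) return false;
    for (const Buffer& b : backedges) {
        std::uint64_t v = 0;
        if (!readQword(b, offset, v) || v != expected) return false;
    }
    return true;
}
```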
record_generalized_loop_backedge_impl:
- The rolled-control promotion (move current backedge into
canonical, install new source as backedge) is only well-defined
for the 1-backedge case, so it now guards on
backedgeSources.size() == 1 and becomes a no-op for multi-way.
Extending the rolled-control semantics to multi-way loops is
left as follow-up when a real sample exercises it.
Tests (Tester.hpp):
- runGeneralizedLoopThirdBackedgeOverwritesPriorBackedgeSilently
flipped and renamed to runGeneralizedLoopThirdBackedgePreservesAllThreeSnapshots:
asserts three-backedge vector holds one entry per sourceBlock.
- runGeneralizedLoopLoadBackupWithThreeBackedgesProducesTwoWayPhiOnly
flipped and renamed to runGeneralizedLoopLoadBackupWithThreeBackedgesProducesFourWayPhi:
asserts GetMemoryValue(controlSlot) at the header yields a
4-incoming phi carrying canonical + all three backedge control
values.
Docs (docs/LOOP_HANDLING.md):
- Struct and mergeValue snippets updated to N-way shapes.
- branch_backup state-transition row describes append+dedup.
- Multi-way backedge row removed from Known limitations.
Verification:
- python test.py micro: all pass, including the two flipped tests.
- python test.py baseline: all rewrite regression checks passed,
determinism check passed (42 golden files match - 2-way loop
IR shape unchanged).
- Themida reference sample (../testthemida/example2-virt.bin @
0x140001000): 2544 instructions lifted, 0 warnings, 0 errors.
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
Continue the known-limitation microtest suite started by the multi-way
backedge PR. Cover two more concrete loop-handling failure modes, both
pinning CURRENT observable behavior so a future fix produces a natural
test-break signal.
Test 1: generalized_loop_non_themida_control_slot_produces_no_phi
retrieve_generalized_loop_control_slot_value_impl gates on
startAddress == kThemidaControlCursorSlot (0x14004DD19). A loop
whose control cursor lives at any other address does not get its
load routed through the canonical/backedge phi; the caller falls
back to the normal memory pipeline. Test seeds the Themida slot
(to activate state) plus a distinct non-Themida slot, loads from
the non-Themida slot, and asserts the result is NOT a PHINode.
When per-function control-cursor detection or tagging lands, this
test fails and must be rewritten.
Test 2: generalized_loop_nested_inner_overwrites_outer_active_state
activeGeneralizedLoopControlFieldState is a scalar struct, not
a stack. Loading an inner loop header while an outer loop is
active overwrites the outer's active state; only one header's
state is queryable through the retrieve_generalized_loop_*
helpers at a time. Test loads outer, verifies activation, loads
inner, asserts activeGeneralizedLoopControlFieldState.headerBlock
now equals innerHeader (outer's active context is gone). When
nested-loop support lands (state stack / lazy per-header lookup),
this test fails and must be rewritten.
Two other candidates considered and dropped:
- kSupportedGeneralizedControlFieldOffsets limitation: tried via
GetMemoryValue(phi_controlSlot + 0x8) but phi-of-concrete-addresses
is handled by retrieve_generalized_loop_phi_address_value
regardless of offset, so this public-API shape does not trigger
the offset allowlist gate. Observable only through a lower-level
test that constructs the exact internal address shape, which is
too brittle for a KNOWN-LIMITATION test.
- mergeValue structural mismatch (canonical-nullptr vs concrete
backedge returning backedge directly): arguably correct behavior
when canonical is genuinely untracked, so not a clear bug worth
pinning.
Verified:
- python test.py micro: all pass (including both new tests)
- python test.py baseline: all rewrite regression checks passed,
determinism check passed (42 golden files match)
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
branch_backup(bb, /*generalized=*/true) unconditionally overwrites
generalizedLoopBackedgeBackup[bb] (LifterClass_Concolic.hpp ~line 599).
When a loop header has three or more incoming backedges, each further
generalized snapshot silently replaces the previous one, so every snapshot
except the most recent loses its sourceBlock, buffer, register, and flag
state before load_generalized_backup builds its canonical/backedge phi.
The handoff from 2026-04-22 explicitly flagged this as untested.
Add two inverted-assertion microtests that pin the current silent-drop
behavior:
- generalized_loop_third_backedge_overwrites_prior_backedge_silently
Asserts the raw map after three generalized branch_backup calls:
sourceBlock resolves to the third backedge, and the map still holds
exactly one entry per header. A multi-way representation change
(vector<backup_point>, or eager N-way merge) would break both.
- generalized_loop_load_backup_with_three_backedges_produces_two_way_phi_only
Asserts the downstream effect: GetMemoryValue(controlSlot, 64) at
the header yields a two-incoming phi carrying canonicalControl and
the third backedge's control value only; first and second backedge
control values are absent. A correct multi-way model would emit a
four-incoming phi.
Both tests document a known limitation via header comments and carry
failure messages that point the future implementer at what to rewrite.
Convention: inverted-assertion tests pass while the bug exists and fail
naturally when it is fixed, signaling the implementer to update the
test against the new contract. No new infrastructure required.
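The sketch below contrasts the single-slot overwrite being pinned with the append-with-dedup storage later described for generalizedLoopBackedgeBackup. It uses integer header and source ids, and both function names are hypothetical:

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical snapshot: only the fields needed to show the overwrite.
struct Snap {
    int sourceBlock;
    std::uint64_t control;
};

// Pinned behaviour: one slot per header, so each assignment replaces the
// previous snapshot and earlier backedges are silently lost.
void branchBackupSingleSlot(std::map<int, Snap>& backups, int header, Snap s) {
    backups[header] = s;
}

// Widened behaviour: append, deduplicated by sourceBlock; a repeat call from
// the same source replaces that source's entry in place.
void branchBackupMulti(std::map<int, std::vector<Snap>>& backups, int header, Snap s) {
    std::vector<Snap>& list = backups[header];
    for (Snap& existing : list) {
        if (existing.sourceBlock == s.sourceBlock) {
            existing = s;
            return;
        }
    }
    list.push_back(s);
}
```

An inverted-assertion test against the first function would assert that only the third source survives. Once the storage is widened, that assertion fails, which is the intended signal.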
Verified:
- python test.py micro: all pass (including both new tests)
- python test.py baseline: all rewrite regression checks passed,
determinism check passed (42 golden files match)
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
resolveTargetedThemidaR9 was added to recover the controlCursor identity
of R9 at three hardcoded Themida instruction addresses where the symbolic
pipeline had lost provenance. PR #112 (generalized-loop control-field /
slot phi infrastructure) since landed retrieve_generalized_loop_control_*
helpers that produce the correct phi shape through the normal
GetMemoryValue path. The R9 override is now dead code: it overwrites a
correct value with another correct value at three sites that the
upstream pipeline already handles.
Empirical bisect on the reference Themida sample
(../testthemida/example2-virt.bin @ 0x140001000) confirmed:
- site 0x140023671 disabled alone: 2544 lifted, 0 warn, 0 err
- site 0x14002368D disabled alone: 2544 lifted, 0 warn, 0 err
- site 0x140023741 disabled alone: 2544 lifted, 0 warn, 0 err
- all three disabled simultaneously: 2544 lifted, 0 warn, 0 err
- baseline (override active): 2544 lifted, 0 warn, 0 err
The MERGEN_DIAG_LIFT_PROGRESS=1 trace at site 0x14002368D shows R9 is
already `add i64 %generalized_phi_load, 10` before the override fires -
the generalized-loop machinery produced the correct phi independently.
Removed:
- resolveTargetedThemidaR9() in lifter/core/LifterClass_Concolic.hpp
- R9 special-case branch + session-scaffolding diag block in
GetRegisterValue_impl (now just `return get_impl(key)`)
- Three microtests in lifter/test/Tester.hpp:
runTargetedThemidaR9OverrideProducesPhi
runTargetedThemidaR9OverrideDoesNotFireAtAdjacentAddress
runTargetedThemidaR9OverrideFallsThroughWithoutLoopState
- Their three runCustom() registrations
- Override row in helper table, hardcoded-address subsection, and
limitations row in docs/LOOP_HANDLING.md
Retained: kThemidaControlCursorSlot, kThemidaLoopCarriedSlot, and
kSupportedGeneralizedControlFieldOffsets - still consumed by the
generalized-loop control-field/slot retrieve_* helpers.
Verified:
- python test.py micro: all instruction microtests passed
- python test.py baseline: all rewrite regression checks passed,
determinism check passed (42 golden files match)
- Themida sample: 2544 instructions lifted, 0 warnings, 0 errors
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
The identifier 'resolveTargetedThemid\u0430R9' (declared in LifterClass_Concolic.hpp)
contained U+0430 (Cyrillic small letter a) instead of U+0061 (Latin a)
between 'Themid' and 'R9'. Every in-tree reference mirrored the
Cyrillic form, but prose mentions and merge titles (e.g. PR #115 title)
used ASCII, so an ASCII grep for 'resolveTargetedThemidaR9' returned
zero hits. This was a silent discoverability hazard for future sessions
and grep-based tooling.
Rename to pure ASCII across the single declaration, the single
caller in getLatestValueForKey, the six test entry points in
lifter/test/Tester.hpp, and the four references in
docs/LOOP_HANDLING.md. No behavior change.
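A byte-level ASCII check could catch this class of hazard before it reaches grep-based tooling. Scanning raw bytes works because every byte of a UTF-8 encoded non-ASCII character is at or above 0x80; U+0430, for example, encodes as 0xD0 0xB0. findFirstNonAscii is a hypothetical helper:

```cpp
#include <cstddef>
#include <string>

// Returns the byte offset of the first non-ASCII byte in `text`, or npos.
std::size_t findFirstNonAscii(const std::string& text) {
    for (std::size_t i = 0; i < text.size(); ++i) {
        // Cast first: plain char may be signed, so compare as unsigned.
        if (static_cast<unsigned char>(text[i]) >= 0x80) return i;
    }
    return std::string::npos;
}
```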
Verified:
- python test.py micro: all instruction microtests passed
(including the three targeted_themida_r9_override_* cases)
- Themida reference sample (../testthemida/example2-virt.bin @
0x140001000): 2544 instructions lifted, 0 warnings, 0 errors
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
Four tests designed to fail if a regression silently breaks the guards we rely on. All four pass on main today; they exist to catch future drift.
compute_possible_values_circular_phi_bails_via_depth_guard: construct a self-referential phi (%self = phi [0, entry], [%self, header]). The existing Depth > 16 guard in computePossibleValues must catch this without infinite recursion or result-set explosion. Accept either an empty set (guard bail) or a single-element {0} set (ideal dedupe).
compute_possible_values_trunc_to_i1_preserves_width: the cast-width preservation in PR #111 widens/narrows through trunc/zext/sext. Trunc to i1 is the extreme narrowing case. Every entry in the result set must have getBitWidth() == 1, and the set must contain both 0 and 1 when the source set holds both even and odd values.
targeted_themida_r9_override_does_not_fire_at_adjacent_address: resolveTargetedThemidaR9 is exact-match on three instruction addresses. A regression broadening it to a range would silently phi-ify every R9 read in that window. Pick adjacent-byte addresses (0x140023672, 0x14002368E, 0x140023742) and verify no PHINode is produced.
targeted_themida_r9_override_falls_through_without_loop_state: at a hot address but before any generalized-loop backup exists, getMostRecentGeneralizedLoopState() returns null. The override's null-state early exit must return the unchanged value instead of building a phi over uninitialized state.
No behavior change; test-only.
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
Line 28 of SCOPE.md read 'Temporarily disabled while the team keeps required VMP 3.8.x targets on the safe high-budget path'. That is stale relative to the current code: canGeneralizeStructuredLoopHeader (lifter/core/LifterClass.hpp) gates generalization on path-solve context plus nine operational guards, and the corresponding loop_generalization_* microtests pass on main. Describe the actual gating and point readers at docs/LOOP_HANDLING.md.
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
Captures the three-phase architecture (detect/generalize/consume), the path-solve context gating table, the GeneralizedLoopControlFieldState layout, mergeValue's widenFirstBackedge contract, the full set of retrieve_generalized_loop_* helpers, and the hardcoded reference-sample addresses (kThemidaControlCursorSlot, the three resolveTargetedThemidaR9 instruction addresses with fire-counts on the reference binary).
Documents known limitations at the bottom: REP SCAS, VMP 3.6 INT 2 dispatcher, the reference-sample hardcodes, unrolling/LICM, multi-way backedges.
Flags that SCOPE.md's 'loop-header generalization temporarily disabled' entry appears to be stale: the code gates generalization on path-solve context (ConditionalBranch / DirectJump / resolved IndirectJump) rather than disabling it wholesale. Not changed in this PR; maintainer decision.
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
The existing microtest only pinned the 0x14002368D branch (offset 0xA). A regression that silently dropped or re-offset the 0x140023671 (offset 0x0) or 0x140023741 (offset 0xC) branch would still pass.
Parameterize the test body over the three {address, offset} tuples with a fresh LifterUnderTest per iteration, so every switch case is actually exercised. Confirmed firing via MERGEN_DIAG_LIFT_PROGRESS on the reference Themida sample: 3 hits at 0x140023671, 6 at 0x14002368D, 12 at 0x140023741.
No behavior change; test-only.
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
PR #112 landed a PHINode case in computePossibleValues that unions every incoming value's set. The existing tests exercise it indirectly through join-block phis, but nothing pinned the new capability itself.
Adds compute_possible_values_enumerates_phi_incomings covering:
- A 4-way phi (all incomings are distinct i64 constants). Guards against an accidental 2-way cap and against dedupe bugs when the set grows past two entries.
- A phi-of-phi: the outer phi's first incoming is itself a phi over two further constants. The union must recurse into the inner phi rather than treating it as a single opaque operand, so the result should contain 3 values (the inner phi's two constants plus the outer phi's other constant).
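A toy model of the phi case follows. It assumes nodes are either constants or phis over other nodes, and it uses std::set in place of an APInt set. It also includes a depth cap in the spirit of the Depth > 16 guard pinned by the circular-phi test, returning an empty set when the cap trips. Node, possibleValues, and computeSet are hypothetical:

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Toy IR node: either a constant or a phi over other nodes (by index).
struct Node {
    bool isPhi = false;
    std::uint64_t constant = 0;
    std::vector<int> incomings;  // indices into the node table when isPhi
};

// Unions every incoming value set, recursing through nested phis.
// Returns false when the depth cap trips (e.g. a self-referential phi).
bool possibleValues(const std::vector<Node>& nodes, int id, unsigned depth,
                    std::set<std::uint64_t>& out) {
    if (depth > 16) return false;  // bail instead of recursing forever
    const Node& n = nodes[id];
    if (!n.isPhi) {
        out.insert(n.constant);
        return true;
    }
    for (int in : n.incomings) {
        if (!possibleValues(nodes, in, depth + 1, out)) return false;
    }
    return true;
}

// Caller-facing wrapper: a guard bail yields an empty (unknown) set.
std::set<std::uint64_t> computeSet(const std::vector<Node>& nodes, int id) {
    std::set<std::uint64_t> out;
    if (!possibleValues(nodes, id, 0, out)) out.clear();
    return out;
}
```

A visited set would give the "ideal dedupe" result ({0}) for the circular case instead of a bail. The depth cap is the simpler guard the circular-phi test accepts.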
No code changes; test-only.
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
During the Themida-frontier session, two failure modes cost real time:
1) Ad-hoc lifter runs produced scratch files (internal_0x*.ll, *handoff.md, linked_target.txt, vlizer_stub.txt) that got committed on a research branch and then had to be scrubbed before merge. Extend .gitignore with patterns matching the observed pollution class.
2) 'python test.py baseline' was run against origin/main with a build_iced/ directory that still held object files from a feature branch. The resulting lifter binary linked a stale mix of old and new code, producing a failure set that matched neither branch. This led to a false 'branch matches main' claim that was only caught after CI. Document the required wipe-and-rebuild in the operator defaults.
No code changes.
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
Introduces the infrastructure needed to keep Themida's control-slot-driven indirect dispatch symbolic through the late cursor-manipulation chain at 0x140023741..0x1400237dc.
Core pieces:
- activeGeneralizedLoopControlFieldState: per-loop snapshot of {canonical,backedge}*{control,buffer,source}, populated on load_generalized_backup and cleared on load_backup, consumed by the retrieve_* helpers below.
- retrieve_generalized_loop_control_field_value / retrieve_generalized_loop_control_slot_value / retrieve_generalized_loop_target_slot_value / retrieve_generalized_loop_phi_address_value / retrieve_generalized_loop_local_phi_address_value: CRTP dispatch into concrete implementations that either (a) emit a two-incoming phi of the canonical and backedge values at the loop header, or (b) return nullptr so the caller falls back to the existing load path. Symbolic mode stubs them to nullptr so symbolic analysis behavior is unchanged.
- PHINode handling in computePossibleValues: enumerate incoming operand value sets and union them, so downstream callers get the full set instead of an empty result on phis that previously fell through the default path.
- solvePath: prefer mapped targets over null for indirect jumps, plus supporting control-field hookups.
- mergeValue in make_generalized_loop_backup gains a widenFirstBackedge parameter and a shouldPreserveGeneralizedBackedgeRegisterIndex predicate. RSP is now preserved through the first backedge; other GPRs and flags continue to widen to Undef, matching main's prior behavior.
Explicitly NOT landed from the original research branch:
- The local-buffer snapshot merge in save_backup (5 lines that copied activeGeneralizedLoopLocalBuffer entries into every snapshot). Bisection against main showed this alone regresses dummy_vm_loop / bytecode_vm_loop / stack_vm_loop / calc_sum_to_n by collapsing their generalized-loop bodies to 'bb3: br label %bb3'. The control-field/slot machinery does not depend on it; the research-stack benchmark gains are preserved without it.
- runGeneralizedPhiAddressCreatesPhiOfLoadedValues: aspirational microtest from the research stack that never passed there either.
- Session-scratch files (internal_0x140001000*, linked_target.txt, themidahandoff.md, vlizer_stub.txt) and the autoresearch shell harness.
Adds microtests covering: control-field / control-slot / target-slot / phi-address / local-phi-address retrieve paths, solvePath null-vs-mapped preference, rolled arithmetic chain enumeration, byte-test join preservation, and supporting structured-loop invariants. All pass on a fresh build_iced/.
Verification: python test.py baseline -> 0 failures; python test.py micro -> 0 failures.
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
computePossibleValues recurses through single-operand instructions by passing the operand's value set through unchanged. For integer casts this silently kept the source width: a trunc from i64 to i32 would return i64-wide APInts in its result set, which then mismatches callers that compare or index the set by the instruction's declared type (e.g. switch dispatch on a freshly-truncated i32).
Detect trunc/zext/sext casts between integer types and rewrite each APInt to the destination type's bit width. Other single-operand opcodes fall through to the default case, so non-cast unary instructions stay on the existing pass-through path.
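The per-value width rewrite can be sketched under one assumption: widths never exceed 64 bits, so a uint64_t can stand in for an APInt. In LLVM itself the corresponding operations are APInt::trunc, APInt::zext, and APInt::sext. WideVal and castValueSet are hypothetical:

```cpp
#include <cstdint>
#include <vector>

// Minimal stand-in for an APInt: a bit width (1..64) plus the value bits.
struct WideVal {
    unsigned width;
    std::uint64_t bits;
};

enum class CastOp { Trunc, ZExt, SExt };

// Mask keeping the low `width` bits (width in 1..64).
std::uint64_t lowMask(unsigned width) {
    return width >= 64 ? ~std::uint64_t{0} : ((std::uint64_t{1} << width) - 1);
}

// Rewrites every value in the set to the destination width.
std::vector<WideVal> castValueSet(const std::vector<WideVal>& in, CastOp op,
                                  unsigned dstWidth) {
    std::vector<WideVal> out;
    out.reserve(in.size());
    for (const WideVal& v : in) {
        std::uint64_t bits = v.bits & lowMask(v.width);
        // sext: replicate the source sign bit into all higher bits.
        if (op == CastOp::SExt && v.width < 64 && ((bits >> (v.width - 1)) & 1)) {
            bits |= ~lowMask(v.width);
        }
        // trunc drops high bits; zext/sext keep them. Either way the result
        // carries the destination width, not the source width.
        out.push_back({dstWidth, bits & lowMask(dstWidth)});
    }
    return out;
}
```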
Adds compute_possible_values_preserves_cast_widths covering all three integer cast opcodes on a select-derived value set. The test pins the post-cast width explicitly and checks a sign-extended negative round-trip so a regression in any of the three branches surfaces as a width mismatch.
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
computePossibleValues cross-products a select whose unreachable branch often defaults to 0. When that 0 reaches solvePath's multi-target switch emission, normalizeTargetAddress rewrites it as 'file.imageBase' and the lifter emits a bogus switch case pointing into the PE header.
Drop the raw zero before normalization. This preserves legitimate mapped targets (including imageBase when it is actually a real target produced by a non-zero raw value) while removing the spurious zero-derived case.
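The filter order can be sketched as follows. The sketch assumes candidates arrive as raw 64-bit values and the normalizer is passed in. switchTargets is hypothetical, and the illustrative normalizer (add imageBase to values below 4 GiB) is an assumption, not normalizeTargetAddress itself:

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// `normalize` stands in for normalizeTargetAddress.
std::vector<std::uint64_t> switchTargets(
    const std::vector<std::uint64_t>& raw,
    const std::function<std::uint64_t(std::uint64_t)>& normalize) {
    std::vector<std::uint64_t> out;
    for (std::uint64_t r : raw) {
        // A raw zero usually comes from an unreachable select arm; drop it
        // before normalization can turn it into imageBase.
        if (r == 0) continue;
        out.push_back(normalize(r));
    }
    return out;
}
```

Filtering before normalization is what keeps a real imageBase target: a raw 0x140000000 survives, while a raw 0 never reaches the normalizer.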
Adds solve_path_skips_raw_zero_in_multi_target_switch as a regression guard. The test intentionally marks imageBase as a mapped page, so any future regression of the filter would surface as a bogus 0x140000000 switch case.
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
Some real entrypoint startup helpers in .text are tiny leaf/wrapper functions, but .pdata outlining currently routes them through opaque outlined-call lowering. Re-entering those helpers repeatedly requires a stable callee-entry snapshot, so this patch both bypasses outline policy for tiny outlined targets and snapshots the callee entry block before queueing it.
The tiny-helper bypass is gated to paged, outline-marked targets whose next outlined start is within 0x40 bytes, which keeps it conservative. Observed effect on example2-virt.bin entrypoint 0x1400013b8: 28 attempted / 5 completed / 121 instructions -> 48 attempted / 2 completed / 156 instructions, with 0 errors and 0 warnings.
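The gate can be sketched as a predicate. OutlineInfo and shouldBypassOutlinePolicy are hypothetical. Whether "within 0x40 bytes" is inclusive is an assumption; this sketch uses <=:

```cpp
#include <cstdint>
#include <optional>

// Hypothetical inputs to the tiny-helper bypass decision.
struct OutlineInfo {
    bool isPaged;          // target lies in a mapped page
    bool isOutlineMarked;  // .pdata outlining would route it to an outlined call
    std::optional<std::uint64_t> nextOutlinedStart;  // next outlined start after target
};

// Bypass outline policy only for paged, outline-marked targets whose next
// outlined start is close enough that the helper body must be tiny.
bool shouldBypassOutlinePolicy(std::uint64_t target, const OutlineInfo& info) {
    constexpr std::uint64_t kTinyHelperSpan = 0x40;
    if (!info.isPaged || !info.isOutlineMarked || !info.nextOutlinedStart) return false;
    const std::uint64_t next = *info.nextOutlinedStart;
    return next > target && next - target <= kTinyHelperSpan;
}
```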
Verification:
- build_iced lifter rewrite_microtests
- rewrite_microtests.exe tiny_outlined_call_bypasses_outline_policy xgetbv_returns_deterministic_xcr0 int29_fastfail_lowered_to_noreturn_call solve_path_widens_mapped_rva_target normalize_runtime_target_widens_mapped_rva_target
- python test.py quick
- python test.py vmp
- build_iced\lifter.exe ..\testthemida\example2-virt.bin 0x1400013b8
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
* lifter: model XGETBV deterministically
Add XGETBV opcode support and model selector 0 as a deterministic XCR0 value (0x7: x87+SSE+AVX enabled), with zero returned for other selectors. This follows the existing CPUID deterministic-model approach for static lifting/deobfuscation.
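The model can be sketched in a few lines. The sketch assumes the selector arrives in ECX and, like the real instruction, splits the 64-bit XCR value into EDX:EAX. modelXgetbv is a hypothetical name:

```cpp
#include <cstdint>

struct XgetbvResult {
    std::uint32_t eax;  // low 32 bits of the selected XCR
    std::uint32_t edx;  // high 32 bits of the selected XCR
};

// Deterministic XGETBV: selector 0 reads XCR0 = 0x7 (bit 0 x87, bit 1 SSE,
// bit 2 AVX); any other selector returns zero.
XgetbvResult modelXgetbv(std::uint32_t ecx) {
    constexpr std::uint64_t kXcr0 = 0x7;
    const std::uint64_t value = (ecx == 0) ? kXcr0 : 0;
    return {static_cast<std::uint32_t>(value), static_cast<std::uint32_t>(value >> 32)};
}
```

Real hardware raises #GP on an unsupported selector. This model returns zero instead, as the commit text describes.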
Verification:
- build_iced lifter rewrite_microtests
- rewrite_microtests.exe xgetbv_returns_deterministic_xcr0 int29_fastfail_lowered_to_noreturn_call solve_path_widens_mapped_rva_target normalize_runtime_target_widens_mapped_rva_target
- python test.py quick
- python test.py vmp
* rewrite: seed deterministic XGETBV handler
The XGETBV semantics patch is deterministic by design, so the full-handler oracle pipeline must not use Unicorn's host-specific result. Add a manual handler seed entry for xgetbv bytes and computed expected outputs, then regenerate the enriched seed and oracle vectors to match the lifter model (selector 0 -> EAX=0x7, EDX=0).
Verification:
- scripts\rewrite\run_all_handlers.cmd
- python test.py quick
- python test.py vmp
---------
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
The real PE entrypoint for example2-virt.bin is 0x1400013b8, and lifting it failed on at 0x14000179e. On x64 Windows this is the fast-fail mechanism () and does not return.
Model INT 29h as a direct call to using RCX as the fail code, then terminate the block with . Other INT immediates explicitly route to the existing not_implemented sentinel path instead of silently becoming no-ops.
Verification:
- build_iced lifter rewrite_microtests
- rewrite_microtests.exe int29_fastfail_lowered_to_noreturn_call solve_path_widens_mapped_rva_target normalize_runtime_target_widens_mapped_rva_target solve_load_infers_concrete_base_from_tracked_load generalized_loop_restore_merges_backedge_register_state
- python test.py quick
- python test.py vmp
- build_iced\lifter.exe ..\testthemida\example2-virt.bin 0x1400013b8
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
The next Themida follow-up after PR #104 exposed another low-target alias problem: queued/control-flow destinations like 0x52532 were treated as already paged because the synthetic stack range covered them, so they never widened to their file-backed image RVA forms.
This patch splits target normalization into two policies: PathSolver keeps the broad paged normalization it needs for resolved loop/control-flow work, while getOrCreateBB/getUnvisitedAddr use a stricter file-backed normalization that prefers image-backed addresses over low stack aliases. It also queues the fake indirect-call return targets in Semantics.ipp so those destinations actually enter the worklist.
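The policy split can be sketched with two fixed address ranges standing in for the lifter's page map. AddressSpace, normalizeBroad, and normalizeFileBacked are hypothetical. The imageBase + target widening is an assumption about how the image-backed form is derived:

```cpp
#include <cstdint>

// Hypothetical address-space description with two fixed ranges.
struct AddressSpace {
    std::uint64_t imageBase, imageSize;  // file-backed image range
    std::uint64_t stackLo, stackHi;      // synthetic stack range [lo, hi)

    bool isImageBacked(std::uint64_t a) const {
        return a >= imageBase && a - imageBase < imageSize;
    }
    bool isPaged(std::uint64_t a) const {
        return isImageBacked(a) || (a >= stackLo && a < stackHi);
    }
};

// Broad policy (PathSolver-style): any paged address, including a low stack
// alias, is accepted unchanged.
std::uint64_t normalizeBroad(const AddressSpace& as, std::uint64_t t) {
    if (as.isPaged(t)) return t;
    const std::uint64_t widened = as.imageBase + t;
    return as.isPaged(widened) ? widened : t;
}

// Strict policy (getOrCreateBB/getUnvisitedAddr-style): prefer the
// image-backed form over a low stack alias.
std::uint64_t normalizeFileBacked(const AddressSpace& as, std::uint64_t t) {
    if (as.isImageBacked(t)) return t;
    const std::uint64_t widened = as.imageBase + t;
    return as.isImageBacked(widened) ? widened : t;
}
```

With a synthetic stack covering 0x52532, the broad policy keeps the stack alias while the strict policy returns 0x140052532.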
Observed effect on example2-virt.bin @ 0x140001000: 24 attempted / 1086 instructions -> 35 attempted / 1565 instructions, with new reached addresses in the 0x14001xxxx and 0x14002xxxx ranges.
Verification:
- build_iced lifter rewrite_microtests
- rewrite_microtests.exe solve_path_widens_mapped_rva_target normalize_runtime_target_widens_mapped_rva_target solve_load_infers_concrete_base_from_tracked_load generalized_loop_restore_merges_backedge_register_state
- python test.py quick
- python test.py vmp
- build_iced\lifter.exe ..\testthemida\example2-virt.bin 0x140001000
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
Themida jump-table entries in .vlizer can contain low RVAs such as 0x118c8 rather than full runtime VAs. The previous normalization only ORed in the image's high 32 bits, which is insufficient for image bases like 0x140000000: 0x118c8 widens to 0x1000118c8 rather than 0x1400118c8, which is unmapped, so the target is left as an unmapped low address.
The path solver now first tries the existing high-32-bit widening and then falls back to imageBase + target when that RVA-style candidate maps. Added a targeted regression covering the mapped-RVA case.
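The two-step order can be sketched as follows. The sketch assumes an isMapped predicate is available and that an already-mapped target is returned unchanged. widenTarget is hypothetical:

```cpp
#include <cstdint>
#include <functional>
#include <optional>

std::optional<std::uint64_t> widenTarget(
    std::uint64_t target, std::uint64_t imageBase,
    const std::function<bool(std::uint64_t)>& isMapped) {
    if (isMapped(target)) return target;
    // Step 1: existing high-32-bit widening (OR in the image's upper half).
    const std::uint64_t high32 =
        (imageBase & 0xFFFFFFFF00000000ull) | (target & 0xFFFFFFFFull);
    if (isMapped(high32)) return high32;
    // Step 2: RVA-style fallback, used only if the candidate maps.
    const std::uint64_t rva = imageBase + target;
    if (isMapped(rva)) return rva;
    return std::nullopt;
}
```

With imageBase 0x140000000, the first step turns 0x118c8 into 0x1000118c8, which is unmapped, so the fallback returns 0x1400118c8.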
Verification:
- build_iced lifter rewrite_microtests
- rewrite_microtests.exe solve_path_widens_mapped_rva_target solve_load_infers_concrete_base_from_tracked_load
- python test.py quick
- python test.py vmp
- build_iced\lifter.exe ..\testthemida\example2-virt.bin 0x140001000
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
Fix the remaining Themida blocker at 0x1401BAF5D without reintroducing the broad generalized-loop local-stack carry that regressed rewrite VM-loop samples.
The new path keeps generalized-loop local stack bytes in a side map, reseeds only deep local qwords whose canonical and backedge snapshots agree on the same concrete 8-byte value, and lets solveLoad materialize missing local loads from that carried state when the current buffer lacks real byte coverage.
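The carry-and-reseed rule can be sketched as follows, assuming each snapshot is a map from stack offset to a known concrete qword. `QwordSnapshot`, `agreeingDeepLocals`, `loadLocal`, and the `deepLimit` cutoff are illustrative only. The real side map tracks bytes, and the lifter's exact definition of a "deep" local is not given here.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical snapshot: concrete 8-byte values keyed by stack offset relative
// to the loop header's stack pointer (locals sit at negative offsets).
using QwordSnapshot = std::map<int64_t, uint64_t>;

// Reseed only deep locals on which the canonical and backedge snapshots agree
// on the same concrete value. `deepLimit` is an illustrative cutoff.
QwordSnapshot agreeingDeepLocals(const QwordSnapshot& canonical,
                                 const QwordSnapshot& backedge, int64_t deepLimit) {
  QwordSnapshot carried;
  for (const auto& [offset, value] : canonical) {
    if (offset > deepLimit) continue;  // shallow slot: never reseeded
    auto it = backedge.find(offset);
    if (it != backedge.end() && it->second == value) carried.emplace(offset, value);
  }
  return carried;
}

// A load prefers real coverage in the current buffer and only then falls back
// to the carried side map.
std::optional<uint64_t> loadLocal(const QwordSnapshot& buffer,
                                  const QwordSnapshot& carried, int64_t offset) {
  if (auto it = buffer.find(offset); it != buffer.end()) return it->second;
  if (auto it = carried.find(offset); it != carried.end()) return it->second;
  return std::nullopt;  // nothing known for this slot
}
```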
This restores the Themida sample to 20 attempted / 1 completed / 0 unreachable / 933 lifted instructions while preserving rewrite quick expectations and VMP gate behavior.
Verification:
- build_iced lifter rewrite_microtests
- rewrite_microtests.exe solve_load_infers_concrete_base_from_tracked_load generalized_loop_restore_merges_backedge_register_state generalized_loop_with_bypass_tag_uses_generalized_restore generalized_loop_bypass_tag_clears_after_promotion promoted_generalized_loop_restores_canonical_backup
- python test.py quick (log: all rewrite checks passed, determinism 42/42, semantic 33/33, microtests passed)
- python test.py vmp (required targets pass)
- build_iced\lifter.exe ..\testthemida\example2-virt.bin 0x140001000
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
Keep canonical loop-header backups separate from generalized backedge state, create header phis when the generalized restore first merges those states, and record live recurrent backedge values on subsequent generalized-loop re-entry.
This avoids over-constraining the first generalized backedge with a concrete value, which had been folding Themida loop exits away before the real self-edge was recorded.
Result on example2-virt.bin @ 0x140001000: lifting now progresses past the generalized dispatcher loop and reaches the next blocker at unresolved indirect jump 0x1401BAF5D.
Verification:
- build_iced lifter rewrite_microtests
- rewrite_microtests.exe generalized_loop_restore_merges_backedge_register_state
- python test.py quick
- python test.py vmp
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
Obfuscated binaries frequently dispatch back to a loop header through a single-instruction unconditional-br trampoline block whose successor (the real per-instruction lift block) is still mid-lift when canGeneralizeStructuredLoopHeader is queried. In that state getTerminator() either returns null or (in this LLVM build) a non-terminator instruction, so isStructuredLoopHeaderShape walks to depth 1 and rejects with 'not-branch'.
This patch detects the trampoline-header shape up front (bb->size() == 1, terminator is unconditional BranchInst) and, when it holds, accepts a depth>0 chain whose next block is still partially lifted. The outer canGeneralizeStructuredLoopHeader gates (backwardVisitedTarget, blockCanReach) still filter out false positives, and the trampoline gate prevents ordinary linear lifts (e.g. VMP sequential handlers) from being mis-classified.
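A simplified model of the trampoline gate. It uses a toy `Block` type in place of `llvm::BasicBlock`, and a partially lifted block is simply `fullyLifted == false`. The model keeps only the empty, not-branch and depth checks and omits the cycle, pred-count and multi-succ checks. Treat it as an illustration of when the depth>0 relaxation applies, not as the real isStructuredLoopHeaderShape.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a basic block: opcode names in order ("br" is an
// unconditional branch, "br.cond" a conditional one), plus lift state.
struct Block {
  std::vector<std::string> insts;
  const Block* next = nullptr;  // successor followed by the shape walk
  bool fullyLifted = true;
};

// Trampoline header: exactly one instruction, an unconditional branch.
bool isTrampolineHeader(const Block& bb) {
  return bb.insts.size() == 1 && bb.insts.back() == "br";
}

// Follow unconditional branches until a conditional branch is found. A
// partially lifted block at depth > 0 is accepted only for trampoline headers.
bool isStructuredLoopHeaderShape(const Block& header, int maxDepth) {
  const bool trampoline = isTrampolineHeader(header);
  const Block* bb = &header;
  for (int depth = 0; bb && depth <= maxDepth; ++depth) {
    if (depth > 0 && !bb->fullyLifted) return trampoline;
    if (bb->insts.empty()) return false;  // empty
    const std::string& term = bb->insts.back();
    if (term == "br.cond") return true;   // ACCEPT cond-br
    if (term != "br") return false;       // not-branch
    bb = bb->next;
  }
  return false;  // depth-exceeded or missing successor
}
```

A header with more than one instruction, such as an ordinary sequential handler block, still fails when it reaches a partially lifted successor. That is the false-positive case the trampoline gate guards against.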
Measured on the example2-virt.bin Themida sample:
- blocks_attempted 2639 -> 17 (155x reduction)
- the two dispatcher heads 0x1401BAE0F / 0x1401BAE18 drop from 1142 re-lifts each (86.6% of effort) to 2-3 each (normal loop participants)
Regression coverage:
- python test.py quick: all rewrite + determinism (42/42) + semantic (33/33) + microtests pass
- python test.py vmp: simple_vmp381_one_vm (1629/1) and simple_vmp381_full (1621/1) both gate-pass unchanged
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
Adds opt-in trace output to canGeneralizeStructuredLoopHeader and isStructuredLoopHeaderShape, gated by the same MERGEN_DIAG_LIFT_PROGRESS env flag introduced in #99, plus a hot-address filter (currently 0x1401BAE0F and 0x1401BAE18, the dispatcher loop header in the Themida example2-virt.bin sample).
When enabled, prints per-invocation:
- which canGeneralize gate rejected (not-unflatten, context-not-allowed, forward-target, not-visited, already-pending, already-generalized, empty-or-missing-bb, bad-shape, no-current-block, no-reach)
- the block's size, terminator name, successor/predecessor counts, and truncated IR
- which depth in isStructuredLoopHeaderShape rejected and why (empty, cycle, pred-count, not-branch, multi-succ, depth-exceeded) or ACCEPT cond-br
Refactors the sequential rejection if-chain in canGeneralizeStructuredLoopHeader into a named reject(reason) lambda so each path tags itself without duplicating print code. Behavior when the env flag is unset is unchanged.
Used to diagnose that the Themida sample's 86.6%-of-effort dispatcher spinning is caused by the shape check rejecting at depth=1 with 'not-branch term=none': the successor block of the trampoline is partially lifted at canGeneralize time, so loop generalization never fires.
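The reject(reason) refactor follows a common pattern. A local lambda records the reason, optionally prints it, and returns `false`, so every gate becomes a one-line `return reject("...")`. The `HeaderQuery` fields, the `[loopgen]` print format and `canGeneralizeSketch` are hypothetical. The reason strings are taken from the gate list above.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical, trimmed-down inputs; the real function reads lifter state.
struct HeaderQuery {
  bool unflattenEnabled;
  bool targetIsBackward;
  bool targetVisited;
  bool diagEnabled;  // e.g. env flag set and the address is on the hot list
};

bool canGeneralizeSketch(const HeaderQuery& q, std::string* lastReason = nullptr) {
  // One lambda tags every rejection path, so the trace print lives in one place.
  auto reject = [&](const char* reason) {
    if (lastReason) *lastReason = reason;
    if (q.diagEnabled) std::printf("[loopgen] reject %s\n", reason);
    return false;
  };
  if (!q.unflattenEnabled) return reject("not-unflatten");
  if (!q.targetIsBackward) return reject("forward-target");
  if (!q.targetVisited) return reject("not-visited");
  return true;
}
```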
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
When MERGEN_DIAG_LIFT_PROGRESS=1 is set, the lifter tracks how many times each address is attempted by liftBasicBlockFromAddress and emits a compact summary at the end of the lift worklist: unique addresses, total attempts, max revisits, a 7-bucket revisit histogram, and the top-16 most-revisited addresses.
Default behavior (env unset) is unchanged: no per-block work, no extra stdout output. The new DenseMap and bool field on lifterClassBase stay empty / false.
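A sketch of the per-address revisit counter. It assumes std::unordered_map in place of the lifter's DenseMap and an `enabled` flag set from `MERGEN_DIAG_LIFT_PROGRESS=1`. `LiftProgress` is a hypothetical name. The seven bucket edges (1, 2, 3-4, 5-8, 9-16, 17-64, 65+) are illustrative, since the commit does not state them.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <unordered_map>
#include <utility>
#include <vector>

struct LiftProgress {
  bool enabled = false;
  std::unordered_map<uint64_t, uint32_t> attempts;  // address -> attempt count

  static bool envEnabled() {
    const char* v = std::getenv("MERGEN_DIAG_LIFT_PROGRESS");
    return v && std::strcmp(v, "1") == 0;
  }

  void recordAttempt(uint64_t addr) {
    if (enabled) ++attempts[addr];  // no per-block work when disabled
  }

  // Illustrative bucket edges: 1, 2, 3-4, 5-8, 9-16, 17-64, 65+.
  std::array<uint32_t, 7> histogram() const {
    static constexpr uint32_t kUpper[6] = {1, 2, 4, 8, 16, 64};
    std::array<uint32_t, 7> buckets{};
    for (const auto& entry : attempts) {
      std::size_t b = 0;
      while (b < 6 && entry.second > kUpper[b]) ++b;
      ++buckets[b];
    }
    return buckets;
  }

  // Most-revisited addresses first; ties broken by lower address.
  std::vector<std::pair<uint64_t, uint32_t>> top(std::size_t k) const {
    std::vector<std::pair<uint64_t, uint32_t>> v(attempts.begin(), attempts.end());
    std::sort(v.begin(), v.end(), [](const auto& a, const auto& b) {
      return a.second != b.second ? a.second > b.second : a.first < b.first;
    });
    if (v.size() > k) v.resize(k);
    return v;
  }
};
```

In a driver, `enabled` would be set once from `LiftProgress::envEnabled()` before the worklist runs. With the flag unset, `recordAttempt` does nothing and the map stays empty.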
Useful for diagnosing whether lift effort is genuinely advancing through distinct VM handlers or churning on a small set of dispatcher headers (the latter being a loop-generalization gap).
On example2-virt.bin @ 0x140001000 it shows that 0x1401BAE0F and 0x1401BAE18 (a test rbx,rbx; je dispatcher loop header) account for 2284 of the 2639 attempts (86.6% of total lift effort), a measurable target for follow-up loop-generalization fixes.
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
* docs: sync rewrite workflow guidance
* docs: drop machine-local pointers and fix stale README branch link
* lifter: allow resolved indirect jumps to participate in structured loop generalization
When a register-indirect jmp has already been resolved to a concrete target via solvePath (ConstantInt or solver), it's no longer speculative. If the target also points backward at a visited block, treat it as a loop back-edge for generalization purposes, the same way a direct or conditional jump would be treated.
Introduces currentPathSolveAllowsStructuredLoopGeneralizationForResolvedTarget() alongside the existing narrow predicate. canGeneralizeStructuredLoopHeader gains an opt-in targetResolvedConcretely parameter that routes through the widened check. getLiftedBackedgeBB uses the widened variant so back-edge reuse fires for resolved indirect jumps. resolveTargetBlock passes targetResolvedConcretely=true (its entry condition requires a concrete destination) and extends stackBypassGeneralizedLoopAddresses to include IndirectJump-context inserts.
Ret-path contexts remain excluded. Tests updated: the old runLoopGeneralizationIndirectJumpBlocked splits into runLoopGeneralizationIndirectJumpBlockedWhenUnresolved (unchanged semantics) and runLoopGeneralizationIndirectJumpAllowedWhenResolved (new). runPendingGeneralizedLoopBlockedByContext becomes runPendingGeneralizedLoopByContext with an expectReuse parameter; Ret still expects no reuse, IndirectJump with a resolved target now expects reuse.
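A sketch of how the opt-in parameter routes between the narrow and widened predicates. The `PathSolveContext` enumerators other than IndirectJump and Ret are hypothetical. The function names are also hypothetical stand-ins for the existing narrow predicate, currentPathSolveAllowsStructuredLoopGeneralizationForResolvedTarget() and canGeneralizeStructuredLoopHeader. Only the routing logic is meant to match the description.

```cpp
#include <cassert>

// Hypothetical enum; only IndirectJump and Ret are named in the commit.
enum class PathSolveContext { DirectJump, ConditionalBranch, IndirectJump, Ret };

// Narrow predicate: only direct/conditional control flow may generalize.
bool allowsGeneralization(PathSolveContext ctx) {
  return ctx == PathSolveContext::DirectJump ||
         ctx == PathSolveContext::ConditionalBranch;
}

// Widened predicate: a concretely resolved IndirectJump also counts; Ret never does.
bool allowsGeneralizationForResolvedTarget(PathSolveContext ctx) {
  return allowsGeneralization(ctx) || ctx == PathSolveContext::IndirectJump;
}

// The opt-in flag selects the predicate; the backward/visited gate still applies.
bool canGeneralize(PathSolveContext ctx, bool targetIsBackwardVisited,
                   bool targetResolvedConcretely = false) {
  const bool contextOk = targetResolvedConcretely
                             ? allowsGeneralizationForResolvedTarget(ctx)
                             : allowsGeneralization(ctx);
  return contextOk && targetIsBackwardVisited;
}
```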
---------
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
The windows-latest preinstalled clang-cl (currently 20.1.8 at
`C:\Program Files\LLVM\bin\clang-cl.exe`) produces a lifter binary
that segfaults on calc_fib before emitting any IR, causing the rewrite
gate to fail. Clang 21.1.8 has been verified locally to compile the
lifter into a binary that lifts both calc_fib and calc_sum_array to
their expected constant returns (`ret i64 13` and `ret i64 150`).
Rolling back to clang 18.x is not an option: the runner image's MSVC STL
(14.44+) hard-requires clang 19.0.0 or newer via a static_assert in
yvals_core.h. Clang 21 satisfies that bound and dodges the clang 20.1.8
miscompile.
Upgrading via `choco upgrade llvm --version=21.1.8` keeps the existing
`C:\Program Files\LLVM\bin\clang-cl.exe` path valid, so the rest of
the pipeline (Resolve LLVM_DIR, Resolve clang-cl, Configure, Build) is
unchanged.
## Changes
- `.github/workflows/rewrite-strict-gate.yml`: add an "Upgrade clang-cl
to 21.1.8" step before `Resolve LLVM_DIR` that runs `choco upgrade
llvm` and pins `CMAKE_{C,CXX}_COMPILER` to the upgraded binary.
- `scripts/rewrite/instruction_microtests.json`: drop the `ci_skip`
entries on `calc_fib` and `calc_sum_array`.
- `docs/SCOPE.md`: bump the corpus counts to 33 samples / 177 runtime
semantic cases.
## Follow-up
Investigating the underlying clang 20.1.8 miscompile in the lifter is
still worth doing; it's almost certainly UB somewhere in the
structured-loop recovery path that clang 21 happens to tolerate. Tracked
separately.
Co-authored-by: NaC-L <nac-l@users.noreply.github.com>
The lifter emits Hex-Rays-style straight-line jump tables as a chain of
`icmp eq %idx, K_i; select V_i, prev` instructions, with the chain head
flowing into a join phi. The chain is structurally a switch but neither
SimplifyCFG nor downstream readers recognize it as one, so dispatches
like calc_jumptable_large still emitted 15 icmp/select pairs after O2.
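It helps to model both forms as lookups to see why the rewrite is sound. The select chain returns the value of the first link, counted from the head, whose key matches; otherwise it returns the terminating false value. The switch returns the matching case value, otherwise the default. The types and functions below are hypothetical. Keeping the head-most value on duplicate keys is an assumption, made so that the switch has unique cases.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// One link of `%v = select (icmp eq %idx, K), V, %prev`, listed head first.
struct SelectLink { int64_t key; int64_t value; };

// Evaluate the chain as the IR does: the first matching link from the head
// wins; if none match, the terminating false value is used.
int64_t evalChain(const std::vector<SelectLink>& headFirst, int64_t tail, int64_t idx) {
  for (const auto& link : headFirst)
    if (link.key == idx) return link.value;
  return tail;
}

struct SwitchModel {
  std::map<int64_t, int64_t> cases;  // case constant -> value fed to the join phi
  int64_t defaultValue;
  int64_t eval(int64_t idx) const {
    auto it = cases.find(idx);
    return it == cases.end() ? defaultValue : it->second;
  }
};

// Rebuild the chain as a switch. emplace ignores later duplicates, so the
// head-most value is kept and the switch stays equivalent to the chain.
SwitchModel chainToSwitch(const std::vector<SelectLink>& headFirst, int64_t tail) {
  SwitchModel sw{{}, tail};
  for (const auto& link : headFirst) sw.cases.emplace(link.key, link.value);
  return sw;
}
```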
This change adds two pieces:
1. SelectChainToSwitchPass (new, runs before SwitchNormalizationPass)
detects a chain whose head feeds a single phi in the unique successor,
verifies all comparisons share one %idx and all values are constants
(including the terminating false branch), and rewrites the chain into
a switch on %idx whose case-i blocks are trampolines that supply the
case-specific value to the join phi. The chain instructions are erased
in head-first order so each link is dead by the time we reach it.
2. SwitchNormalizationPass is restructured to support two normalization
modes against the same switch:
Mode A (index-arithmetic) walks the switch operand back through
trunc/select-chain to recover (originalInput, addrBase, addrStride)
and converts each case constant via (case - addrBase) / addrStride.
This produces true logical indices and now also handles the
"folded default" pattern where the chain's default branch is the
case for logical 0: when rangeSize == numCases + 1 and minLogical
== 1, the old default block is promoted to an explicit case 0 and
the new default becomes an unreachable trampoline.
Mode B (sorted-position fallback) preserves the previous behavior
for switches whose case constants are jump-table TARGET addresses
rather than table-entry indices (e.g. jumptable_basic). When the
cases form an arithmetic progression and rangeSize == numCases,
sorted-position i becomes logical i.
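The two normalization modes can be sketched on a plain case map. This assumes the index-arithmetic walk has already produced `addrBase` and `addrStride`, and that `rangeSize` (assumed here to be the table size recovered from the bounds check) is known. Block id -1 stands for an unreachable default. `SwitchCases`, `normalizeModeA` and `normalizeModeB` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical switch model: case constant -> destination block id.
// A defaultBlock of -1 stands for an unreachable default.
struct SwitchCases {
  std::map<int64_t, int> cases;
  int defaultBlock;
};

// Mode A (index-arithmetic): map each case to (case - addrBase) / addrStride,
// then promote a folded default to an explicit case 0.
std::optional<SwitchCases> normalizeModeA(const SwitchCases& sw, int64_t addrBase,
                                          int64_t addrStride, int64_t rangeSize) {
  if (addrStride <= 0 || sw.cases.empty()) return std::nullopt;
  SwitchCases out{{}, sw.defaultBlock};
  int64_t minLogical = INT64_MAX;
  for (const auto& [key, dest] : sw.cases) {
    if ((key - addrBase) % addrStride != 0) return std::nullopt;  // off-stride
    const int64_t logical = (key - addrBase) / addrStride;
    minLogical = std::min(minLogical, logical);
    out.cases[logical] = dest;
  }
  const auto numCases = static_cast<int64_t>(sw.cases.size());
  if (rangeSize == numCases + 1 && minLogical == 1) {
    out.cases[0] = sw.defaultBlock;  // old default becomes explicit case 0
    out.defaultBlock = -1;           // new default is unreachable
  }
  return out;
}

// Mode B (sorted-position fallback): cases are jump targets. If they form an
// arithmetic progression and rangeSize == numCases, position i becomes i.
std::optional<SwitchCases> normalizeModeB(const SwitchCases& sw, int64_t rangeSize) {
  const auto numCases = static_cast<int64_t>(sw.cases.size());
  if (numCases < 2 || rangeSize != numCases) return std::nullopt;  // illustrative guard
  // std::map iterates in ascending key order, so this vector is already sorted.
  std::vector<std::pair<int64_t, int>> sorted(sw.cases.begin(), sw.cases.end());
  const int64_t step = sorted[1].first - sorted[0].first;
  for (std::size_t i = 2; i < sorted.size(); ++i)
    if (sorted[i].first - sorted[i - 1].first != step) return std::nullopt;
  SwitchCases out{{}, sw.defaultBlock};
  for (std::size_t i = 0; i < sorted.size(); ++i)
    out.cases[static_cast<int64_t>(i)] = sorted[i].second;
  return out;
}
```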
Verification: `python test.py all` green; semantic 33/33; calc_jumptable
and calc_jumptable_large now lift to clean `switch i64 %RCX` with logical
0..N-1 cases and an unreachable default for the folded-default shape.
All other jumptable samples (basic/dense/rel32/shifted/shared_targets/
computation) still pass via the Mode B fallback. Patterns updated for
calc_jumptable and calc_jumptable_large.
Co-authored-by: NaC-L <nac-l@users.noreply.github.com>
PR #93 un-skipped both samples after a clean local Release build proved
they lift correctly, but the windows-latest CI lane still fails on them
with `Lifter failed for calc_fib` (run 24077021868). The HANDOFF note's
hypothesis, that windows-latest clang-cl produces a different codegen
shape than the locally pinned clang-cl, turned out to be the actual root
cause; the "stale build cache" theory only explained the local symptom.
Restoring the `ci_skip` entries unbreaks the rewrite-strict-gate and
rewrite-quick-gate workflows. Real fix tracked as a follow-up: either
teach the lifter the CI codegen shape, or pin the rewrite CI lane to a
toolchain that matches the local one byte-for-byte.
Also reverts the `docs/SCOPE.md` corpus counts to 31 samples / 175 cases.
Co-authored-by: NaC-L <nac-l@users.noreply.github.com>
Both samples were originally CI-skipped because windows-latest clang-cl
produced loop/array codegen shapes that tripped the lifter on CI even
though local runs passed. Since then the rewrite CI lane has been pinned
to the same LLVM 18.1.8 clang-cl used locally (eb49a35, 949acaa, a28a368)
and several structured loop recovery fixes have landed (2989e5a, 2eaa22e),
so the codegen mismatch that motivated the skips is gone.
Verified locally with a clean Release build (`cmd /c scripts\dev\configure_iced.cmd`
followed by `build_iced.cmd`):
- `calc_fib` lifts to `ret i64 13` and passes its semantic case
- `calc_sum_array` lifts to `ret i64 150` and passes its semantic case
- `python test.py all` is fully green: semantic 33/33 (was 31/31),
baseline, micro --check-flags, full handler suite 115/119, determinism
Drops the two `ci_skip` entries from `instruction_microtests.json` and
updates `docs/SCOPE.md` corpus counts to 33 samples / 177 cases.
Co-authored-by: NaC-L <nac-l@users.noreply.github.com>
- Add lift_punpcklqdq handler in Semantics_Misc.ipp (XMM dest, low-quadword
interleave from dest+src into a 128-bit result; rejects MMX/non-XMM forms
via the standard not_implemented bailout)
- Wire OPCODE(punpcklqdq, PUNPCKLQDQ) in x86_64_opcodes.x and add a missing
trailing newline
- Add manual punpcklqdq case to TestInstructions.cpp (rdrand-style XMM seed)
and matching seeds in build_full_handler_seed.py
- Regenerate oracle_seed_full_handlers{,_enriched}.json, oracle_seed_vectors.json,
and oracle_vectors_full_handlers.json with two punpcklqdq vectors
(basic interleave, low-source-zero edge case)
- Drop ci_skip on calc_cout in instruction_microtests.json now that the STL
PUNPCKLQDQ path lifts cleanly (4/4 semantic cases pass locally)
- Keep calc_fib and calc_sum_array ci_skipped: they still trip a separate
lifter dyn_cast assertion that is not related to PUNPCKLQDQ; tracked as
follow-up
- Update docs/SCOPE.md handler counts (115/119 covered, 4 intentional skips)
and corpus counts (31 active samples / 175 cases)
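For reference, PUNPCKLQDQ keeps the destination's low quadword in bits 63:0 and places the source's low quadword in bits 127:64; both high quadwords are dropped. The sketch models an XMM register as two 64-bit lanes, and `Xmm` is a hypothetical type. In LLVM IR the same result can be expressed as a `shufflevector` of two `<2 x i64>` values with mask `<0, 2>`. That is not necessarily how the handler in Semantics_Misc.ipp builds it.

```cpp
#include <cassert>
#include <cstdint>

// 128-bit XMM value modeled as two 64-bit lanes.
struct Xmm { uint64_t lo; uint64_t hi; };

// PUNPCKLQDQ xmm1, xmm2/m128: result[63:0] = dest[63:0],
// result[127:64] = src[63:0]. Both high quadwords are discarded.
Xmm punpcklqdq(Xmm dest, Xmm src) {
  return Xmm{dest.lo, src.lo};
}
```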
Co-authored-by: NaC-L <nac-l@users.noreply.github.com>