Files
naci 3384786a70 lifter: support multi-way backedges with N-way generalized-loop phi construction (#123)
branch_backup(bb, /*generalized=*/true) previously overwrote a single
backup_point per header in generalizedLoopBackedgeBackup[bb]. A loop
header reached from three or more backedges silently lost every
snapshot except the most recent, and the load_generalized_backup phi
was always 2-incoming (canonical + last-seen backedge). PR #121
pinned this as a KNOWN-LIMITATION microtest.

This commit widens the machinery end-to-end to 1 canonical + N
backedges.

Storage and state:

  - generalizedLoopBackedgeBackup is now DenseMap<BB*,
    SmallVector<backup_point, 2>>. branch_backup_impl appends,
    deduplicated by sourceBlock (repeat call from the same source
    replaces its entry in place).
  - GeneralizedLoopControlFieldState.backedgeSource/Control/Buffer
    become parallel SmallVectors sized N per header.

Phi construction:

  - make_generalized_loop_backup takes ArrayRef<backup_point> sources.
    Its mergeValue lambda constructs (1 + N)-incoming phis, one
    incoming per distinct backedge sourceBlock, with canonicalSource
    first. Sources duplicating canonicalSource are filtered. The N=1
    path produces the same 2-incoming phi as before (determinism
    gate: 42/42 golden hashes match).
  - retrieve_generalized_loop_control_slot_value_impl,
    retrieve_generalized_loop_target_slot_value_impl, and
    retrieve_generalized_loop_control_field_value_impl each emit
    (1 + N)-incoming phis from state.backedgeSources/Controls/Buffers.
  - retrieve_generalized_loop_phi_address_value_impl and
    retrieve_generalized_loop_local_phi_address_value_impl relax
    their 'phi->getNumIncomingValues() != 2' sanity check to accept
    any phi with >= 2 incomings, and match each incoming against
    canonicalSource or any of state->backedgeSources[i].

load_generalized_backup_impl:

  - Collects backedges whose sourceBlock differs from canonical AND
    whose controlCursor value differs from canonical; activates state
    only if at least one such backedge exists.
  - seedInvariantLocalQwords requires the qword to read identically
    from canonicalBuffer AND every backedgeBuffer to qualify.

record_generalized_loop_backedge_impl:

  - The rolled-control promotion (move current backedge into
    canonical, install new source as backedge) is only well-defined
    for the 1-backedge case, so it now guards on
    backedgeSources.size() == 1 and becomes a no-op for multi-way.
    Extending the rolled-control semantics to multi-way loops is
    left as follow-up when a real sample exercises it.

Tests (Tester.hpp):

  - runGeneralizedLoopThirdBackedgeOverwritesPriorBackedgeSilently
    flipped and renamed to runGeneralizedLoopThirdBackedgePreservesAllThreeSnapshots:
    asserts three-backedge vector holds one entry per sourceBlock.
  - runGeneralizedLoopLoadBackupWithThreeBackedgesProducesTwoWayPhiOnly
    flipped and renamed to runGeneralizedLoopLoadBackupWithThreeBackedgesProducesFourWayPhi:
    asserts GetMemoryValue(controlSlot) at the header yields a
    4-incoming phi carrying canonical + all three backedge control
    values.

Docs (docs/LOOP_HANDLING.md):

  - Struct and mergeValue snippets updated to N-way shapes.
  - branch_backup state-transition row describes append+dedup.
  - Multi-way backedge row removed from Known limitations.

Verification:

  - python test.py micro: all pass, including the two flipped tests.
  - python test.py baseline: all rewrite regression checks passed,
    determinism check passed (42 golden files match - 2-way loop
    IR shape unchanged).
  - Themida reference sample (../testthemida/example2-virt.bin @
    0x140001000): 2544 instructions lifted, 0 warnings, 0 errors.

Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
2026-04-23 01:53:32 +03:00
..
2024-06-11 15:26:55 +03:00