lifter: allow resolved indirect jumps to participate in structured loop generalization (#98)

* docs: sync rewrite workflow guidance

* docs: drop machine-local pointers and fix stale README branch link

* lifter: allow resolved indirect jumps to participate in structured loop generalization

When a register-indirect jmp has already been resolved to a concrete target via solvePath (ConstantInt or solver), it is no longer speculative. If the target also points backward at a visited block, treat it as a loop back-edge for generalization purposes, the same way a direct or conditional jump would be treated.

Introduces currentPathSolveAllowsStructuredLoopGeneralizationForResolvedTarget() alongside the existing narrow predicate. canGeneralizeStructuredLoopHeader gains an opt-in targetResolvedConcretely parameter (default false) that routes through the widened check. getLiftedBackedgeBB uses the widened variant so back-edge reuse fires for resolved indirect jumps. resolveTargetBlock passes targetResolvedConcretely=true (its entry condition requires a concrete destination) and extends stackBypassGeneralizedLoopAddresses to include IndirectJump-context inserts. Ret-path contexts remain excluded.

Tests updated: the old runLoopGeneralizationIndirectJumpBlocked splits into runLoopGeneralizationIndirectJumpBlockedWhenUnresolved (unchanged semantics) and runLoopGeneralizationIndirectJumpAllowedWhenResolved (new). runPendingGeneralizedLoopBlockedByContext becomes runPendingGeneralizedLoopByContext with an expectReuse parameter; Ret still expects no reuse, while IndirectJump with a resolved target now expects reuse (runPendingGeneralizedLoopIndirectJumpBlocked is renamed to runPendingGeneralizedLoopIndirectJumpAllowedWhenResolved).

---------

Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
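A minimal constexpr sketch of the predicate split described above, assuming a reduced `PathSolveContext` that holds only the four contexts this commit names (the lifter's real enum may have more). The first two functions mirror the narrow and widened predicates in simplified form; `contextAllows` is a hypothetical helper that models the context check at the top of `canGeneralizeStructuredLoopHeader`, including its opt-in default.

```cpp
// Reduced model: only the contexts named in this commit.
enum class PathSolveContext { ConditionalBranch, DirectJump, IndirectJump, Ret };

// Narrow predicate: direct and conditional jumps only.
constexpr bool allowsStructuredLoopGeneralization(PathSolveContext ctx) {
  return ctx == PathSolveContext::ConditionalBranch ||
         ctx == PathSolveContext::DirectJump;
}

// Widened predicate: also admits an indirect jump whose target has already
// resolved to a concrete address. Ret stays excluded.
constexpr bool allowsForResolvedTarget(PathSolveContext ctx) {
  return allowsStructuredLoopGeneralization(ctx) ||
         ctx == PathSolveContext::IndirectJump;
}

// Hypothetical helper modeling the context check in
// canGeneralizeStructuredLoopHeader: the opt-in flag defaults to false.
constexpr bool contextAllows(PathSolveContext ctx,
                             bool targetResolvedConcretely = false) {
  return targetResolvedConcretely ? allowsForResolvedTarget(ctx)
                                  : allowsStructuredLoopGeneralization(ctx);
}
```

Because the flag defaults to `false`, existing callers keep the narrow behavior in this model: `contextAllows(PathSolveContext::IndirectJump)` is `false`, a resolved call site that passes `true` gets `true`, and `Ret` is `false` either way.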
2026-05-12 09:40:34 +00:00 · 2026-04-19 05:36:45 +03:00
parent 0fbc2e9a52
commit 5708deef54
7 changed files with 131 additions and 36 deletions
@@ -47,6 +47,8 @@ Important invariants:
 - `.editorconfig` and `.clang-format` — formatting contract (2 spaces, LF, UTF-8, 100-column LLVM-based style).

 ## Development Commands
+Before running any command in this section, confirm the exact repo root and cwd. Prefer these repo-provided scripts over ad hoc shell commands.
+
 Preferred Windows build flow:
 ```bat
 cmd /c scripts\dev\configure_iced.cmd
@@ -109,6 +111,24 @@ scripts\rewrite\run_microtests.cmd --check-flags xor
  - Coverage/vector plumbing: `python test.py coverage --full` and `python test.py report --json`
  - Build script/CMake changes: rerun the affected `scripts\dev\configure_*.cmd` + `build_*.cmd` lane

+## Operator workflow defaults
+
+> Apply these alongside the repo-specific architecture and test rules above.
+
+- Confirm the real repo root, source-of-truth file, and owning subsystem before searching or editing.
+- Narrow search scope before using broad repo scans.
+- Prefer `read`, `find`, `grep`, `ast_grep`, `edit`, `ast_edit`, and `lsp` before bash for discovery or structural edits.
+- Before build/test/git/bash commands, confirm the exact cwd and lane you intend to run.
+- Re-read a file before editing it a second time.
+- Default to one main line of work; split into subtasks only when file boundaries are real and outputs are independent.
+- Do not finish non-trivial work without focused verification that matches the changed subsystem.
+
+## What not to do
+- Do not start with repo-root scans when a narrower directory or entry document can answer the question.
+- Do not run configure/build/test commands from an assumed cwd.
+- Do not use bash-first discovery when a specialized tool can answer it.
+- Do not spawn reviewer/subtask branches just to spread a single code path across multiple agents.
+
 ## Process Notes For AI Assistants
 - Prefer `docs/REWRITE_BASELINE.md` and CI workflows over older generic build docs when commands disagree.
 - Do not edit generated files or artifact outputs unless the task is explicitly about generation.
@@ -119,7 +119,7 @@ jump next_handler;
 ```

 We always try to analyze and track values; this lets us understand control flow.
-[For jumptable-like branches](https://github.com/NaC-L/Mergen/blob/experimental-pattern-matching/testcases/test_branches.asm)
+[For jumptable-like branches](https://github.com/NaC-L/Mergen/blob/main/testcases/test_branches.asm)
 Optimized output would be a simple
 ```llvm
 define i64 @main(i64 %rax, i64 %rcx, i64 %rdx, i64 %rbx, i64 %rsp, i64 %rbp, i64 %rsi, i64 %rdi, i64 %r8, i64 %r9, i64 %r10, i64 %r11, i64 %r12, i64 %r13, i64 %r14, i64 %r15, ptr nocapture readnone %TEB, ptr nocapture readnone %memory) local_unnamed_addr #0 {
@@ -33,6 +33,8 @@ cmd /c scripts\dev\build_zydis.cmd
 ## Verify After Building
 Primary checks:

+The rewrite gate's sample-build lane is stricter than the core CMake build. CI requires a pinned `clang-cl`, supplied through `CLANG_CL_EXE`, `CMAKE_C_COMPILER`, or `LLVM_DIR`. For local `python test.py quick` / `all` runs, set `CLANG_CL_EXE=C:\Program Files\LLVM\bin\clang-cl.exe` if you want the same sample-build compiler resolution as CI instead of relying on local fallback discovery.
+
 ```bat
 python test.py quick
 python test.py all
@@ -53,12 +55,15 @@ Use `python test.py vmp` for larger control-flow/semantics/inlining changes when
 - `LLVM_DIR` — points CMake at `LLVMConfig.cmake`
 - `MERGEN_BUILD_JOBS` — overrides build parallelism (default `4`)
 - `CMAKE_C_COMPILER` / `CMAKE_CXX_COMPILER` — optional compiler override for the configure scripts
+- `CLANG_CL_EXE` — optional local override for the rewrite gate's sample-build path; set it to the pinned `clang-cl` when you want local `python test.py quick` / `all` runs to match CI compiler resolution

 Example:

 ```bat
+set CLANG_CL_EXE=C:\Program Files\LLVM\bin\clang-cl.exe
 set MERGEN_BUILD_JOBS=8
 cmd /c scripts\dev\build_iced.cmd
+python test.py quick
 ```

 ## Secondary Flows
@@ -22,17 +22,17 @@ Sample sources live in:
 - `scripts/rewrite/manifest_validation.ps1` — shared strict manifest validator used by both `run.ps1` and `verify.ps1`
 - `scripts/rewrite/run.cmd` — one-command Windows entrypoint
 - `scripts/rewrite/run_microtests.cmd` — runs `rewrite_microtests.exe` (in-process instruction-byte tests from `lifter/test/TestInstructions.cpp`); builds lazily only when the executable is missing, supports `--build` to force rebuild and `--no-build` to require prebuilt binaries
-- `scripts/rewrite/collect_instruction_tests.cmd` — reports handler coverage against `lifter/x86_64_opcodes.x` using oracle vector metadata (`handler` field) to track missing instruction tests
-- `scripts/rewrite/generate_oracle_vectors.cmd` — regenerates `lifter/test_vectors/oracle_vectors.json` from seed vectors using oracle providers (currently Unicorn)
+- `scripts/rewrite/collect_instruction_tests.cmd` — reports handler coverage against `lifter/semantics/x86_64_opcodes.x` using oracle vector metadata (`handler` field) to track missing instruction tests
+- `scripts/rewrite/generate_oracle_vectors.cmd` — regenerates `lifter/test/test_vectors/oracle_vectors.json` from seed vectors using oracle providers (currently Unicorn)
 - `scripts/rewrite/oracle_seed_vectors.json` — seed cases with instruction bytes, initial state, and tracked outputs for oracle generation
 - `scripts/rewrite/build_full_handler_seed.cmd` — builds `oracle_seed_full_handlers.json` (base semantic vectors + auto-discovered smoke vectors for missing handlers)
 - `scripts/rewrite/build_full_handler_seed.py` — Capstone-based opcode discovery that fills missing handlers and marks known-crashing handlers as `skip`
 - `scripts/rewrite/run_all_handlers.cmd` — generates full-handler seed/vectors and executes `rewrite_microtests.exe` across the full suite
-- `scripts/rewrite/generate_flag_stress_vectors.cmd` — builds `lifter/test_vectors/oracle_vectors_flagstress.json` with multiple strict flag-oracle cases per flag-writing handler
-- `scripts/rewrite/generate_flag_stress_vectors.py` — derives flag-writing handlers from `lifter/Semantics.ipp`, generates deterministic initial states, and computes expected flags via Unicorn
+- `scripts/rewrite/generate_flag_stress_vectors.cmd` — builds `lifter/test/test_vectors/oracle_vectors_flagstress.json` with multiple strict flag-oracle cases per flag-writing handler
+- `scripts/rewrite/generate_flag_stress_vectors.py` — derives flag-writing handlers from `lifter/semantics/Semantics.ipp`, generates deterministic initial states, and computes expected flags via Unicorn
 - `scripts/rewrite/run_flagstress.cmd` — one-command strict flag suite runner (auto-generates flag-stress vectors and executes microtests with strict flag assertions)
 - `run.ps1` validates that `instruction_microtests.json` covers every `testcases/rewrite_smoke/*` source file
-- `scripts/rewrite/check_semantic.py` — runtime semantic regression for all lifted samples; reads `semantic` cases from the manifest, generates lli-executable wrappers, and verifies return values across all declared inputs (23 samples, 107 test cases)
+- `scripts/rewrite/check_semantic.py` — runtime semantic regression for all lifted samples; reads `semantic` cases from the manifest, generates lli-executable wrappers, and verifies return values across all declared inputs (33 samples, 177 test cases)

 Helper build scripts for local development are in:

@@ -56,7 +56,7 @@ set MERGEN_BUILD_JOBS=8    &rem fast builds on large machines
 Use `run_microtests.cmd --check-flags <filter>` to enforce oracle flag comparisons (strict mode, expected to fail until flag semantics are fixed).
 Use `run_microtests.cmd --build <filter>` to force rebuilding `rewrite_microtests.exe`, or `run_microtests.cmd --no-build <filter>` to skip any build step.
 Set `SKIP_ORACLE_GENERATION=1` to reuse a pre-generated oracle file. Set `MERGEN_TEST_VECTORS=<path>` to point tests at a custom oracle JSON file.
-Use `run_all_handlers.cmd` to exercise full handler coverage smoke tests. It writes `lifter/test_vectors/oracle_vectors_full_handlers.json` and then runs microtests against it through `run_microtests.cmd` (which now builds lazily).
+Use `run_all_handlers.cmd` to exercise full handler coverage smoke tests. It writes `lifter/test/test_vectors/oracle_vectors_full_handlers.json` and then runs microtests against it through `run_microtests.cmd` (which now builds lazily).
 Oracle vector JSON fixtures are deterministic by design; regenerating them should only change tracked files when the underlying cases change, not because of wall-clock metadata.
 Full-handler vectors are expected to execute end-to-end (no default `skip: true` crash exclusions).
 Use `run_flagstress.cmd` (or `python test.py flags`) for broad strict-flag validation across all handlers that explicitly write flags.
@@ -69,13 +69,13 @@ By default, regression artifacts are written to a sibling folder outside the rep
 - `../rewrite-regression-work/`

 Artifacts include:
-- `lifter/test_vectors/oracle_vectors_flagstress.json` (generated strict-flag stress suite)
+- `lifter/test/test_vectors/oracle_vectors_flagstress.json` (generated strict-flag stress suite)

 - compiled sample binaries/maps/objects for every manifest entry
 - `ir_outputs/*.ll` and `ir_outputs/*_no_opts.ll` (replaced on each run after stale `.ll` cleanup)
 - `ir_outputs/*_semantic.ll` (generated by `check_semantic.py` for lli execution)

-- `lifter/test_vectors/oracle_vectors_full_handlers.json` (generated by `run_all_handlers.cmd`)
+- `lifter/test/test_vectors/oracle_vectors_full_handlers.json` (generated by `run_all_handlers.cmd`)
 ## Running the baseline gate

 From repository root:
@@ -84,6 +84,8 @@ From repository root:
 scripts\rewrite\run.cmd
 ```

+CI requires a pinned sample-build compiler via `CLANG_CL_EXE`, `CMAKE_C_COMPILER`, or `LLVM_DIR`. For local runs, set `CLANG_CL_EXE=C:\Program Files\LLVM\bin\clang-cl.exe` when you want `scripts\rewrite\run.cmd` or `python test.py quick` to use the same sample-build compiler resolution as CI instead of relying on fallback discovery.
+
 Optional custom output directory:

 ```bat
@@ -131,13 +133,12 @@ Samples without a `semantic` field are not tested. The `semantic` field is optio

 ### Coverage summary

-Current active quick-gate semantic coverage is **30 samples / 171 cases** on CI.
+Current active quick-gate semantic coverage is **33 samples / 177 cases** on CI and local pinned-toolchain runs.

 Notable current state:
 - `dummy_vm_loop`, `bytecode_vm_loop`, and `stack_vm_loop` are active VM-shaped control-flow samples.
-- `calc_sum_to_n` is active again under the safe structured-loop recovery path.
-- `calc_fib` and `calc_sum_array` are `ci_skip` on `windows-latest` because the current hosted toolchain still emits loop/array codegen shapes that fail lifting there even though local developer runs pass.
-- `calc_cout` remains `ci_skip` because its C++ codegen is toolchain-dependent on CI.
+- `calc_sum_to_n`, `calc_fib`, and `calc_sum_array` are active again under the current safe structured-loop recovery path.
+- `calc_cout` is active again after SSE2 `PUNPCKLQDQ` support landed; the manifest currently has zero `ci_skip` entries.

 ## Call-boundary ABI framework

@@ -61,15 +61,25 @@ MERGEN_LIFTER_DEFINITION_TEMPLATES(PATH_info)::solvePath(
    auto it = addrToBB.find(target);
    const bool hasPendingGeneralization =
        pendingLoopGeneralizationAddresses.contains(target);
+    // `resolveTargetBlock` is only reached with a concrete destination, so an
+    // indirect jump whose target has just resolved participates in the same
+    // structured-loop generalization path that direct and conditional jumps
+    // already take.
    const bool canUseStructuredLoopGeneralization =
-        currentPathSolveAllowsStructuredLoopGeneralization();
+        currentPathSolveAllowsStructuredLoopGeneralizationForResolvedTarget();
    const bool canReusePendingGeneralization =
        hasPendingGeneralization && canUseStructuredLoopGeneralization;
    const bool wantsGeneralization =
        canReusePendingGeneralization ||
-        (backwardVisitedTarget && canGeneralizeStructuredLoopHeader(target));
+        (backwardVisitedTarget &&
+         canGeneralizeStructuredLoopHeader(target,
+                                           /*targetResolvedConcretely=*/true));
    if (wantsGeneralization) {
-      if (currentPathSolveContext == PathSolveContext::DirectJump) {
+      // A resolved backward target participates in the same stack-concolic
+      // bypass regime regardless of whether the source jump is direct or
+      // indirect — both represent a confirmed loop back-edge.
+      if (currentPathSolveContext == PathSolveContext::DirectJump ||
+          currentPathSolveContext == PathSolveContext::IndirectJump) {
        stackBypassGeneralizedLoopAddresses.insert(target);
      }
      const bool generalizedBackup =
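The resolved-target hunk above folds pending-header reuse and fresh back-edge detection into `wantsGeneralization`, then gates the stack-bypass insert on the jump kind. A minimal constexpr sketch of that decision, assuming a reduced `PathSolveContext` with only the four contexts this commit names. `GeneralizationDecision`, `decideResolvedTarget`, and the `headerShapeOk` flag (standing in for the remaining, partly truncated checks of `canGeneralizeStructuredLoopHeader`) are hypothetical names for illustration only.

```cpp
// Reduced model of the contexts this commit distinguishes.
enum class PathSolveContext { ConditionalBranch, DirectJump, IndirectJump, Ret };

// Hypothetical result type for the sketch.
struct GeneralizationDecision {
  bool wantsGeneralization; // take the generalized-loop path
  bool recordStackBypass;   // also insert into stackBypassGeneralizedLoopAddresses
};

// Hypothetical model of the resolved-target decision. `headerShapeOk` folds
// the non-context checks of canGeneralizeStructuredLoopHeader into one flag.
constexpr GeneralizationDecision
decideResolvedTarget(PathSolveContext ctx, bool hasPendingGeneralization,
                     bool backwardVisitedTarget, bool headerShapeOk) {
  // Resolved-target predicate: every context except the ret path.
  const bool contextAllowed = ctx == PathSolveContext::ConditionalBranch ||
                              ctx == PathSolveContext::DirectJump ||
                              ctx == PathSolveContext::IndirectJump;
  // Reuse a pending header, or generalize a backward, visited target.
  const bool wants =
      contextAllowed &&
      (hasPendingGeneralization || (backwardVisitedTarget && headerShapeOk));
  // Only jump contexts enter the stack-bypass set; conditional branches
  // generalize without it.
  const bool jumpContext = ctx == PathSolveContext::DirectJump ||
                           ctx == PathSolveContext::IndirectJump;
  return {wants, wants && jumpContext};
}
```

In this model an indirect jump to a backward, shape-compatible target yields `{true, true}`, a conditional branch to the same target yields `{true, false}`, and `Ret` yields `{false, false}` even when a pending header exists.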
@@ -507,6 +507,16 @@ public:
    return currentPathSolveContext == PathSolveContext::ConditionalBranch ||
           currentPathSolveContext == PathSolveContext::DirectJump;
  }
+  // Widened variant: when the path solver has already resolved the branch
+  // target to a concrete address, an indirect jump is no longer speculative.
+  // If its target also points backward at a visited block it is legitimately
+  // a loop back-edge and should enter structured loop generalization alongside
+  // direct and conditional jumps. Ret-path contexts have their own lifecycle
+  // and stay excluded here.
+  bool currentPathSolveAllowsStructuredLoopGeneralizationForResolvedTarget() const {
+    return currentPathSolveAllowsStructuredLoopGeneralization() ||
+           currentPathSolveContext == PathSolveContext::IndirectJump;
+  }
  bool isStructuredLoopHeaderShape(BasicBlock* block) const {
    std::set<BasicBlock*> seenBlocks;
    auto* current = block;
@@ -558,9 +568,13 @@ public:
    return false;
  }

-  bool canGeneralizeStructuredLoopHeader(uint64_t addr) {
-    if (getControlFlow() != ControlFlow::Unflatten ||
-        !currentPathSolveAllowsStructuredLoopGeneralization() ||
+  bool canGeneralizeStructuredLoopHeader(uint64_t addr,
+                                         bool targetResolvedConcretely = false) {
+    const bool contextAllows =
+        targetResolvedConcretely
+            ? currentPathSolveAllowsStructuredLoopGeneralizationForResolvedTarget()
+            : currentPathSolveAllowsStructuredLoopGeneralization();
+    if (getControlFlow() != ControlFlow::Unflatten || !contextAllows ||
        addr > blockInfo.block_address || !visitedAddresses.contains(addr) ||
        pendingLoopGeneralizationAddresses.contains(addr) ||
        generalizedLoopAddresses.contains(addr)) {
@@ -821,8 +835,13 @@ public:


  BasicBlock* getLiftedBackedgeBB(uint64_t addr) {
+    // A resolved backward target is eligible for reuse regardless of whether
+    // the branching source was direct, conditional, or indirect. Once we have
+    // a non-empty generalized block for the address, re-entering it on a
+    // subsequent iteration should branch into that block rather than cutting a
+    // fresh empty one through `getOrCreateBB` (which would orphan the body).
    if (getControlFlow() != ControlFlow::Unflatten ||
-        !currentPathSolveAllowsStructuredLoopGeneralization()) {
+        !currentPathSolveAllowsStructuredLoopGeneralizationForResolvedTarget()) {
      return nullptr;
    }
    if (addr > blockInfo.block_address ||
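The comment in `getLiftedBackedgeBB` argues for reusing a non-empty generalized block instead of creating a fresh, empty one. The sketch below is a heavily reduced, hypothetical model of that lookup: `Block` stands in for an LLVM `BasicBlock` and only records a body size, `BackedgeModel` replaces the lifter state, and the conditions truncated from this hunk are omitted. It returns `nullptr` whenever the caller should fall back to creating a block.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Reduced model of the contexts this commit distinguishes.
enum class PathSolveContext { ConditionalBranch, DirectJump, IndirectJump, Ret };

// Hypothetical stand-in for an LLVM BasicBlock: only tracks body size.
struct Block {
  uint64_t address;
  int instructionCount;
};

// Hypothetical, heavily reduced model of the back-edge lookup.
class BackedgeModel {
public:
  PathSolveContext context = PathSolveContext::DirectJump;
  uint64_t currentBlockAddress = 0;

  // Records (or replaces) the generalized block for `addr`.
  Block* addGeneralized(uint64_t addr, int instructionCount) {
    assert(instructionCount >= 0 && "body size cannot be negative");
    Block& slot = generalized_[addr];
    slot = Block{addr, instructionCount};
    return &slot; // std::map nodes are stable, so this pointer stays valid
  }

  // Returns the existing non-empty generalized block for a backward target,
  // or nullptr when the caller should create a fresh block instead.
  const Block* getLiftedBackedge(uint64_t addr) const {
    // Resolved-target rule: every context except the ret path.
    const bool contextAllowed = context == PathSolveContext::ConditionalBranch ||
                                context == PathSolveContext::DirectJump ||
                                context == PathSolveContext::IndirectJump;
    if (!contextAllowed || addr > currentBlockAddress) {
      return nullptr;
    }
    auto it = generalized_.find(addr);
    if (it == generalized_.end() || it->second.instructionCount == 0) {
      return nullptr; // nothing worth reusing yet
    }
    return &it->second;
  }

private:
  std::map<uint64_t, Block> generalized_;
};
```

In this model an `IndirectJump` context now gets the existing body block back, while `Ret`, forward targets, and empty placeholders all fall through to fresh-block creation.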
@@ -435,13 +435,37 @@ private:
    return true;
  }

-  bool runLoopGeneralizationIndirectJumpBlocked(std::string& details) {
+  bool runLoopGeneralizationIndirectJumpBlockedWhenUnresolved(std::string& details) {
+    // The unresolved-indirect-jump predicate must still exclude indirect
+    // dispatchers from speculative loop generalization. Without a concrete
+    // target, we have no proof the jump forms a backward loop edge.
    LifterUnderTest lifter;
    lifter.currentPathSolveContext =
        LifterUnderTest::PathSolveContext::IndirectJump;
    if (lifter.currentPathSolveAllowsStructuredLoopGeneralization()) {
      details =
-          "  indirect-jump dispatcher context must not generalize loop state\n";
+          "  unresolved indirect-jump context must not generalize loop state\n";
+      return false;
+    }
+    return true;
+  }
+
+  bool runLoopGeneralizationIndirectJumpAllowedWhenResolved(std::string& details) {
+    // Once `solvePath` has pinned an indirect jump to a concrete destination,
+    // the resolved-target predicate widens to admit it. Ret-path contexts
+    // still have their own lifecycle and stay excluded.
+    LifterUnderTest lifter;
+    lifter.currentPathSolveContext =
+        LifterUnderTest::PathSolveContext::IndirectJump;
+    if (!lifter.currentPathSolveAllowsStructuredLoopGeneralizationForResolvedTarget()) {
+      details =
+          "  resolved indirect-jump context must allow structured loop generalization\n";
+      return false;
+    }
+    lifter.currentPathSolveContext = LifterUnderTest::PathSolveContext::Ret;
+    if (lifter.currentPathSolveAllowsStructuredLoopGeneralizationForResolvedTarget()) {
+      details =
+          "  ret context must never participate in structured loop generalization\n";
      return false;
    }
    return true;
@@ -457,9 +481,9 @@ private:
    return true;
  }

-  bool runPendingGeneralizedLoopBlockedByContext(
+  bool runPendingGeneralizedLoopByContext(
      LifterUnderTest::PathSolveContext context, const char* contextName,
-      std::string& details) {
+      bool expectReuse, std::string& details) {
    LifterUnderTest lifter;
    lifter.currentPathSolveContext = context;

@@ -486,13 +510,19 @@ private:
                " context did not emit the expected direct branch\n";
      return false;
    }
-    if (branch->getSuccessor(0) == pending) {
+    const bool reused = branch->getSuccessor(0) == pending;
+    if (expectReuse && !reused) {
+      details = std::string("  ") + contextName +
+                " context must reuse the pending generalized loop header when the "
+                "target resolved concretely\n";
+      return false;
+    }
+    if (!expectReuse && reused) {
      details = std::string("  ") + contextName +
                " context must not reuse a pending generalized loop header\n";
      return false;
    }
-    if (lifter.unvisitedBlocks.empty() ||
-        lifter.unvisitedBlocks.back().block == pending) {
+    if (!expectReuse && (lifter.unvisitedBlocks.empty() ||
+                          lifter.unvisitedBlocks.back().block == pending)) {
      details = std::string("  ") + contextName +
                " context queued the pending generalized loop header instead of a fresh block\n";
      return false;
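The two wrappers now share one body whose expectation is selected by `expectReuse`. A hypothetical table-driven sketch of that pattern: `lifterReusesPendingHeader` stands in for the real lifter behavior under test (after this commit, a resolved indirect jump reuses the pending header and the return path does not), and `ReuseCase`, `runCase`, and `runAll` are illustrative names, not part of the test harness. The queue check that the real helper performs only when `expectReuse` is false is omitted.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Reduced model: only the two contexts exercised by the wrappers.
enum class PathSolveContext { IndirectJump, Ret };

struct ReuseCase {
  PathSolveContext context;
  const char* contextName;
  bool expectReuse;
};

// Hypothetical stand-in for the lifter behavior under test.
bool lifterReusesPendingHeader(PathSolveContext ctx) {
  return ctx == PathSolveContext::IndirectJump;
}

// One parameterized body checks both directions of the expectation.
bool runCase(const ReuseCase& c, std::string& details) {
  assert(c.contextName != nullptr);
  const bool reused = lifterReusesPendingHeader(c.context);
  if (c.expectReuse != reused) {
    details = std::string("  ") + c.contextName +
              (c.expectReuse ? " context must reuse" : " context must not reuse") +
              " the pending generalized loop header\n";
    return false;
  }
  return true;
}

// Returns the number of failing cases; `details` keeps the last failure.
int runAll(const std::vector<ReuseCase>& cases, std::string& details) {
  int failures = 0;
  for (const ReuseCase& c : cases) {
    if (!runCase(c, details)) {
      ++failures;
    }
  }
  return failures;
}
```

In this sketch, the reuse and no-reuse expectations live in one table, so a policy change such as this commit's becomes a one-flag edit per context rather than a new helper.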
@@ -505,14 +535,22 @@ private:
    return true;
  }

-  bool runPendingGeneralizedLoopIndirectJumpBlocked(std::string& details) {
-    return runPendingGeneralizedLoopBlockedByContext(
-        LifterUnderTest::PathSolveContext::IndirectJump, "indirect-jump", details);
+  bool runPendingGeneralizedLoopIndirectJumpAllowedWhenResolved(std::string& details) {
+    // After the resolved-target relaxation, a constant-folded indirect-jump
+    // target that matches a pending generalized loop header is reused just
+    // like a direct-jump target would be.
+    return runPendingGeneralizedLoopByContext(
+        LifterUnderTest::PathSolveContext::IndirectJump, "indirect-jump",
+        /*expectReuse=*/true, details);
  }

  bool runPendingGeneralizedLoopRetBlocked(std::string& details) {
-    return runPendingGeneralizedLoopBlockedByContext(
-        LifterUnderTest::PathSolveContext::Ret, "return-path", details);
+    // Return-path contexts keep their own lifecycle — they must not reuse
+    // pending generalized loop headers, even now that the resolved-target
+    // relaxation admits indirect jumps.
+    return runPendingGeneralizedLoopByContext(
+        LifterUnderTest::PathSolveContext::Ret, "return-path",
+        /*expectReuse=*/false, details);
  }


@@ -936,12 +974,14 @@ private:
             &InstructionTester::runLoopGeneralizationConditionalBranchAllowed);
    runCustom("loop_generalization_direct_jump_allowed",
             &InstructionTester::runLoopGeneralizationDirectJumpAllowed);
-    runCustom("loop_generalization_indirect_jump_blocked",
-             &InstructionTester::runLoopGeneralizationIndirectJumpBlocked);
+    runCustom("loop_generalization_indirect_jump_blocked_when_unresolved",
+             &InstructionTester::runLoopGeneralizationIndirectJumpBlockedWhenUnresolved);
+    runCustom("loop_generalization_indirect_jump_allowed_when_resolved",
+             &InstructionTester::runLoopGeneralizationIndirectJumpAllowedWhenResolved);
    runCustom("loop_generalization_ret_blocked",
             &InstructionTester::runLoopGeneralizationRetBlocked);
-    runCustom("pending_generalized_loop_indirect_jump_blocked",
-             &InstructionTester::runPendingGeneralizedLoopIndirectJumpBlocked);
+    runCustom("pending_generalized_loop_indirect_jump_allowed_when_resolved",
+             &InstructionTester::runPendingGeneralizedLoopIndirectJumpAllowedWhenResolved);
    runCustom("pending_generalized_loop_ret_blocked",
             &InstructionTester::runPendingGeneralizedLoopRetBlocked);
    runCustom("structured_loop_header_allows_conditional_backedge",