12 Commits

Author SHA1 Message Date
naci aff35bc01c diag: sentinel pages as ret stubs so tracer observes every import (#190)
Before: each IAT slot pointed at an unmapped sentinel address, so the
FIRST import call raised UC_ERR_FETCH_UNMAPPED and the emulator stopped.
We only ever observed one import per run.

After: sentinel addresses live in a mapped page filled with 0xC3 (near
ret) instructions. Each import call fetches the ret byte, immediately
returns to the VMs pre-staged continuation, and emulation keeps going.
All subsequent imports now surface as [HIT] events.

On example2-virt.bin @ 0x140001000 this finds every required import:

  insn   ret-site      import         target
  ----   --------      ------         ------
  34223  0x14017fa77   GetStdHandle   stdin
  44847  0x14017fa77   GetStdHandle   stdout
  60695  0x14017ef9f   WriteConsoleA  prompt
  74394  0x140192798   ReadConsoleA
  85326  0x140157ef9   CharUpperA
  97859  0x14013bf11   WriteConsoleA  echo
  110166 0x14017fa77   WriteConsoleA  final

This gives the full map of import ret-site addresses for the virt
sample - useful for future work that needs to reach those sites
(whether by deeper lifter exploration or by seeding additional
entries). The lifter currently reaches 0x14017fa77 only.

Co-authored-by: Claude <claude@anthropic.com>
2026-04-24 18:47:21 +03:00
naci c8102a69cf themida: correctness gate, diagnostic tracer, ret-to-IAT recognition, gen revisit knob (#182)
* tests: add Themida devirtualization import-equivalence check

Adds python test.py themida that lifts every sample in
scripts/rewrite/themida_samples.json and asserts the resulting IR calls
every import declared in required_imports. Names are pinned against
a lift of the non-virtualized reference binary via --update.

This is a correctness gate that complements the existing
coverage gate ('2544 instructions, 0 errors'). Currently red on
example2-virt.bin: the lifter unrolls the VM without surfacing
GetStdHandle / WriteConsoleA / ReadConsoleA / CharUpperA from the
guest program. That gap is the active devirtualization frontier;
this test makes it visible instead of silently green.

Samples whose binaries are absent (`../testthemida/*.bin` lives
outside the repo) are skipped rather than failed, so the check
runs cleanly in CI without the binaries present.

* diag: add Unicorn-based external-call tracer; document Themida transform

Adds scripts/dev/trace_external_calls.py: loads a PE into Unicorn,
patches every IAT slot with a unique unmapped-address sentinel, then
emulates from the chosen entry. When any call/jmp/ret resolves its
target to a sentinel, logs the call-site address, the mnemonic, and
the addressing form. One-shot diagnostic for answering 'what x86
instruction issues this external call at runtime.'

Using it on example2-virt.bin shows the Themida transform precisely:
- guest imports (GetStdHandle etc) remain in the IAT
- every guest call site is rewritten from 'call [rip+IAT]' to a
  VM-staged 'push target; ret' where target was loaded from the
  IAT upstream
- for example2, the first external call happens at VA 0x14017fa77
  via 'ret 0', popping the GetStdHandle IAT value off the stack
- Themida strips its own SDK markers (VirtualizerSDK64.dll#103/#503)
  from the IAT; our ignore_imports filter already accounts for this

The lifter's current recognition handles direct call-through-IAT
and register-indirect IAT calls (the non-virt binary resolves 5
imports cleanly). It does not recognize the ret-pops-IAT-loaded-
pointer pattern, which is why the virt lift surfaces zero imports.

Also annotates themida_samples.json with these properties inline
so the transform semantics live next to the test that exercises
them.

* diag: trace_external_calls can dump visited PCs and record sentinel push chain

Two additions, both motivated by the example2-virt.bin diagnosis session:

- --dump-visited <path>: writes every unique instruction PC the emulator
  executes, in first-visit order. Diff against the lifter's 'reached
  addresses' trace (MERGEN_DIAG_LIFT_PROGRESS=1) to localise where the
  lifter's static exploration diverges from the dynamic path.

- UC_HOOK_MEM_WRITE for stack-addressed 8-byte writes whose payload is a
  sentinel. Records every such write, not just the first, because Themida
  uses push-pop swap gadgets that stage a sentinel on the stack
  transiently before the 'real' push lands it at the ret-target slot.
  The last-5-pushes summary exposes this.

Findings for example2-virt.bin @ 0x140001000:
- lifter covers emu_pos=0..1298 out of 4210 unique PCs (~30%)
- external call site is at emu_pos=4209; gap of 2911 unvisited PCs
- lifter visits 5 addresses the runtime never takes (wrong concolic branch)
- the 'final push to ret slot' is not a 'push [iat]' but rather
  'sub qword ptr [r14], <const>' — the VM decrypts a pre-staged
  stack slot in place to reconstruct the IAT pointer. Pattern-match
  recognition alone cannot handle this; concrete VM-dispatch unrolling
  is required.

* diag: add MERGEN_NO_LOOP_GEN env gate for loop-generalization

Adds an env-var toggle at the top of canGeneralizeStructuredLoopHeader.
When MERGEN_NO_LOOP_GEN=1, the gate rejects every header, forcing
pure concrete exploration with no phi-widening abstraction.

Diagnostic knob, not a user-facing feature. Used to localise how much
of a lift's coverage depends on generalization vs. the concolic engine.

Measurement on example2-virt.bin @ 0x140001000:

                            gen ON        gen OFF (NO_LOOP_GEN=1)
  blocks_attempted              56          2642   (47x)
  instructions_lifted         2544         34229   (13.5x)
  output_no_opts.ll lines     6022         30481   (5x)
  unique addrs visited          34           338   (10x)
  addrs in 0x14017xxxx           0           103   (call-handler cluster)
  external call site reached:   no           yes (via BB 0x14017fa72)
  themida equivalence test:    red           red (recognition still gap)

Loop-generalization is the dominant reachability blocker on Themida
VM dispatchers at current tuning. Pure concrete exploration reaches
the external-call handler block but does not emit named import calls
because lift_ret has no path to match a resolved ret target against
importMap. Recognition is the next fix surface; reachability is large
mostly because of generalization tuning.

Side-effects of gen OFF that are NOT acceptable in production:
 - Lifter decodes .rdata IAT bytes as instructions (OUTSD error at
   0x140002688 on this sample)
 - Top-revisited addresses hit ~1142x each: the lifter spins in
   tight loops without generalization cutting them off; block budget
   (4096) would fire eventually on a larger sample

So the knob is purely diagnostic. The real production fix is selective
generalization (distinguish 'VM dispatcher' from 'guest loop') plus
lift_ret import recognition.

* lifter: recognize ret-to-IAT as named external call in lift_ret

Adds a recognition path in lift_ret: if the value being popped resolves
to a concrete address that's in importMap, emit callFunctionIR for the
named import, then simulate the external's own ret by popping one more
qword (the continuation address pre-staged by the caller). solvePath
then continues at the continuation instead of trying to lift the IAT
pointer as code.

Two resolution routes:
  1. realval is a ConstantInt (direct push+ret of an IAT load)
  2. realval is symbolic but computePossibleValues folds to a single
     concrete value (obfuscated chains that constant-fold at this path)

Scope limits:
- Non-virt example2.bin lift is unchanged (still resolves 5 imports
  via register-indirect path; the new ret path does not fire because
  the binary uses 'call [iat]', not 'push+ret').
- Virt example2-virt.bin lift: the recognition code runs but does not
  surface imports because the lifter's static resolution of the
  arithmetic-decrypt chain produces wrong concrete targets. E.g. the
  ret at 0x14017fa77 resolves to 0x140002628 (somewhere in .rdata) via
  computePossibleValues; at runtime the emulator sees it pop the
  GetStdHandle IAT pointer (0x140002490). The recognition logic is
  correct; the upstream data flow is lying. Fixing that requires
  selective-generalization tuning or concrete VM unrolling, tracked
  separately.

So β lands as ground work for simpler push+ret thunks and for future
work where state-propagation fidelity improves. It is not a Themida
fix on its own.

* lifter: gate canGeneralize on per-header revisit count

Adds a revisit-count threshold to canGeneralizeStructuredLoopHeader:
below threshold N the gate rejects (concrete exploration continues);
at or above N it falls through to the existing loop-shape checks.
Tunable via MERGEN_GEN_MIN_REVISITS; default is 0 (inert, matches
pre-existing behaviour).

Also promotes ++liftAttemptCounts[addr] out from under the
liftProgressDiagEnabled gate so the counter is always maintained.

Rationale: on Themida example2-virt.bin @ 0x140001000, the existing
gate (always-generalize on first qualifying revisit) abstracts the
VM's dispatch loop too early, cutting reachability to ~30% of the
dynamic execution path. A higher threshold lets the dispatcher run
concretely for more iterations before abstracting. Measurement (all
other settings at defaults):

  T=0   (current)  blocks=   56   insns=  2544   err=0  warn=0
  T=4              blocks=   88   insns=  3842   err=0  warn=4
  T=16             blocks=  393   insns= 11747   err=1  warn=0
  T=32             blocks=  425   insns= 12067   err=1  warn=0
  T=128            blocks=  617   insns= 13987   err=1  warn=0
  MERGEN_NO_LOOP_GEN=1 (kill)
                   blocks= 2642   insns= 34229   err=1  warn=0

Caveat: at T=6, T=8, T=12 the lifter crashes with an access violation
partway through lifting. The crash fires in the Themida dispatcher
state machinery around 0x1400237F9 when generalization fires mid-
iteration with state that the existing machinery is not prepared to
handle. Other nearby T values (T=5, 7, 9, 10, 11, 13-19) are stable.

So the knob is landing as experimental infrastructure with default=0
(no-op). Future work can pair a safe non-zero default with a fix for
the dispatcher-state crash.

---------

Co-authored-by: Claude <claude@anthropic.com>
2026-04-24 14:54:22 +03:00
yusufcanislek eb49a35cc7 Pin rewrite CI clang-cl toolchain 2026-04-04 15:53:28 +03:00
yusufcanislek ce77dc514d Fix loop recovery and clang-cl CI gaps 2026-04-04 15:12:49 +03:00
yusufcanislek 555a23def5 Force pinned LLVM clang in rewrite CI 2026-04-04 09:56:03 +03:00
yusufcanislek 2fd523b4e4 Pin CI clang toolchain selection 2026-04-04 09:47:18 +03:00
yusufcanislek 67c26c9bab Fix iced CI cargo detection 2026-04-03 20:28:26 +03:00
yusufcanislek 8fba033cc6 Fix VMP gate and loop safety 2026-04-03 15:00:42 +03:00
yusufcanislek cdd52c271a feat: profile VMP samples and speed up lifting
- add nested lift diagnostics and helper-level profiling for protected381 targets
- refactor function signature specs and optimize folderBinOps fast paths
- implement the full SCAS family and document VMP 3.6 INT dispatcher findings
- add reproducible profiling scripts and root VMP testing notes
2026-03-31 22:02:57 +03:00
yusufcanislek eb10474eb8 feat: commit working-tree changes required by rewrite gates
Lifter improvements:
- PathSolver.ipp: enhanced path memoization, switch-target diagnostics
- GEPTracker.ipp: expanded value tracking, graceful bail-out paths
- Semantics_Misc.ipp: clean up CPUID handler (remove dead comments,
  simplify constant emission)

Rewrite infrastructure:
- instruction_microtests.json: add jumptable manifest entries
  (calc_jumptable, jumptable_basic, jumptable_dense) with semantic cases
- golden_ir_hashes.json: add hashes for new jumptable samples
- build_samples.cmd: support C jumptable /O2 compilation pass
- oracle vectors: regenerated (oracle_vectors.json trimmed to current
  seed set, full-handler vectors updated with new handlers)
- run_microtests.cmd / run_all_handlers.cmd: script improvements
- test.py: add jumptable semantic cases to coverage

Dev scripts:
- configure_iced/zydis.cmd, build_iced/zydis.cmd: improved toolchain
  detection and MERGEN_BUILD_JOBS support

Review automation:
- format_comment.py, invariant_guard.py, risk_map.py, shard_pr.py:
  minor fixes aligned with verify_plan public API rename

Docs:
- REWRITE_BASELINE.md: updated coverage summary and script docs
- REVIEWER_RULES.md: minor formatting
2026-03-26 07:53:43 +03:00
yusufcanislek 96f1caa06a build: make iced/zydis selection deterministic in scripts and cmake 2026-03-19 01:59:44 +03:00
yusufcanislek 567e0d7daf Add rewrite regression automation, vectors, and documentation 2026-03-03 23:04:21 +03:00