* tests: add Themida devirtualization import-equivalence check
Adds python test.py themida that lifts every sample in
scripts/rewrite/themida_samples.json and asserts the resulting IR calls
every import declared in required_imports. Names are pinned against
a lift of the non-virtualized reference binary via --update.
This is a correctness gate that complements the existing
coverage gate ('2544 instructions, 0 errors'). Currently red on
example2-virt.bin: the lifter unrolls the VM without surfacing
GetStdHandle / WriteConsoleA / ReadConsoleA / CharUpperA from the
guest program. That gap is the active devirtualization frontier;
this test makes it visible instead of silently green.
Samples whose binaries are absent (`../testthemida/*.bin` lives
outside the repo) are skipped rather than failed, so the check
runs cleanly in CI without the binaries present.
* diag: add Unicorn-based external-call tracer; document Themida transform
Adds scripts/dev/trace_external_calls.py: loads a PE into Unicorn,
patches every IAT slot with a unique unmapped-address sentinel, then
emulates from the chosen entry. When any call/jmp/ret resolves its
target to a sentinel, logs the call-site address, the mnemonic, and
the addressing form. One-shot diagnostic for answering 'what x86
instruction issues this external call at runtime.'
Using it on example2-virt.bin shows the Themida transform precisely:
- guest imports (GetStdHandle etc) remain in the IAT
- every guest call site is rewritten from 'call [rip+IAT]' to a
VM-staged 'push target; ret' where target was loaded from the
IAT upstream
- for example2, the first external call happens at VA 0x14017fa77
via 'ret 0', popping the GetStdHandle IAT value off the stack
- Themida strips its own SDK markers (VirtualizerSDK64.dll#103/#503)
from the IAT; our ignore_imports filter already accounts for this
The lifter's current recognition handles direct call-through-IAT
and register-indirect IAT calls (the non-virt binary resolves 5
imports cleanly). It does not recognize the ret-pops-IAT-loaded-
pointer pattern, which is why the virt lift surfaces zero imports.
Also annotates themida_samples.json with these properties inline
so the transform semantics live next to the test that exercises
them.
* diag: trace_external_calls can dump visited PCs and record sentinel push chain
Two additions, both motivated by the example2-virt.bin diagnosis session:
- --dump-visited <path>: writes every unique instruction PC the emulator
executes, in first-visit order. Diff against the lifter's 'reached
addresses' trace (MERGEN_DIAG_LIFT_PROGRESS=1) to localise where the
lifter's static exploration diverges from the dynamic path.
- UC_HOOK_MEM_WRITE for stack-addressed 8-byte writes whose payload is a
sentinel. Records every such write, not just the first, because Themida
uses push-pop swap gadgets that stage a sentinel on the stack
transiently before the 'real' push lands it at the ret-target slot.
The last-5-pushes summary exposes this.
Findings for example2-virt.bin @ 0x140001000:
- lifter covers emu_pos=0..1298 out of 4210 unique PCs (~30%)
- external call site is at emu_pos=4209; gap of 2911 unvisited PCs
- lifter visits 5 addresses the runtime never takes (wrong concolic branch)
- the 'final push to ret slot' is not a 'push [iat]' but rather
'sub qword ptr [r14], <const>' — the VM decrypts a pre-staged
stack slot in place to reconstruct the IAT pointer. Pattern-match
recognition alone cannot handle this; concrete VM-dispatch unrolling
is required.
* diag: add MERGEN_NO_LOOP_GEN env gate for loop-generalization
Adds an env-var toggle at the top of canGeneralizeStructuredLoopHeader.
When MERGEN_NO_LOOP_GEN=1, the gate rejects every header, forcing
pure concrete exploration with no phi-widening abstraction.
Diagnostic knob, not a user-facing feature. Used to localise how much
of a lift's coverage depends on generalization vs. the concolic engine.
Measurement on example2-virt.bin @ 0x140001000:
gen ON gen OFF (NO_LOOP_GEN=1)
blocks_attempted 56 2642 (47x)
instructions_lifted 2544 34229 (13.5x)
output_no_opts.ll lines 6022 30481 (5x)
unique addrs visited 34 338 (10x)
addrs in 0x14017xxxx 0 103 (call-handler cluster)
external call site reached: no yes (via BB 0x14017fa72)
themida equivalence test: red red (recognition still gap)
Loop-generalization is the dominant reachability blocker on Themida
VM dispatchers at current tuning. Pure concrete exploration reaches
the external-call handler block but does not emit named import calls
because lift_ret has no path to match a resolved ret target against
importMap. Recognition is the next fix surface; reachability is large
mostly because of generalization tuning.
Side-effects of gen OFF that are NOT acceptable in production:
- Lifter decodes .rdata IAT bytes as instructions (OUTSD error at
0x140002688 on this sample)
- Top-revisited addresses hit ~1142x each: the lifter spins in
tight loops without generalization cutting them off; block budget
(4096) would fire eventually on a larger sample
So the knob is purely diagnostic. The real production fix is selective
generalization (distinguish 'VM dispatcher' from 'guest loop') plus
lift_ret import recognition.
* lifter: recognize ret-to-IAT as named external call in lift_ret
Adds a recognition path in lift_ret: if the value being popped resolves
to a concrete address that's in importMap, emit callFunctionIR for the
named import, then simulate the external's own ret by popping one more
qword (the continuation address pre-staged by the caller). solvePath
then continues at the continuation instead of trying to lift the IAT
pointer as code.
Two resolution routes:
1. realval is a ConstantInt (direct push+ret of an IAT load)
2. realval is symbolic but computePossibleValues folds to a single
concrete value (obfuscated chains that constant-fold at this path)
Scope limits:
- Non-virt example2.bin lift is unchanged (still resolves 5 imports
via register-indirect path; the new ret path does not fire because
the binary uses 'call [iat]', not 'push+ret').
- Virt example2-virt.bin lift: the recognition code runs but does not
surface imports because the lifter's static resolution of the
arithmetic-decrypt chain produces wrong concrete targets. E.g. the
ret at 0x14017fa77 resolves to 0x140002628 (somewhere in .rdata) via
computePossibleValues; at runtime the emulator sees it pop the
GetStdHandle IAT pointer (0x140002490). The recognition logic is
correct; the upstream data flow is lying. Fixing that requires
selective-generalization tuning or concrete VM unrolling, tracked
separately.
So β lands as ground work for simpler push+ret thunks and for future
work where state-propagation fidelity improves. It is not a Themida
fix on its own.
* lifter: gate canGeneralize on per-header revisit count
Adds a revisit-count threshold to canGeneralizeStructuredLoopHeader:
below threshold N the gate rejects (concrete exploration continues);
at or above N it falls through to the existing loop-shape checks.
Tunable via MERGEN_GEN_MIN_REVISITS; default is 0 (inert, matches
pre-existing behaviour).
Also promotes ++liftAttemptCounts[addr] out from under the
liftProgressDiagEnabled gate so the counter is always maintained.
Rationale: on Themida example2-virt.bin @ 0x140001000, the existing
gate (always-generalize on first qualifying revisit) abstracts the
VM's dispatch loop too early, cutting reachability to ~30% of the
dynamic execution path. A higher threshold lets the dispatcher run
concretely for more iterations before abstracting. Measurement (all
other settings at defaults):
T=0 (current) blocks= 56 insns= 2544 err=0 warn=0
T=4 blocks= 88 insns= 3842 err=0 warn=4
T=16 blocks= 393 insns= 11747 err=1 warn=0
T=32 blocks= 425 insns= 12067 err=1 warn=0
T=128 blocks= 617 insns= 13987 err=1 warn=0
MERGEN_NO_LOOP_GEN=1 (kill)
blocks= 2642 insns= 34229 err=1 warn=0
Caveat: at T=6, T=8, T=12 the lifter crashes with an access violation
partway through lifting. The crash fires in the Themida dispatcher
state machinery around 0x1400237F9 when generalization fires mid-
iteration with state that the existing machinery is not prepared to
handle. Other nearby T values (T=5, 7, 9, 10, 11, 13-19) are stable.
So the knob is landing as experimental infrastructure with default=0
(no-op). Future work can pair a safe non-zero default with a fix for
the dispatcher-state crash.
---------
Co-authored-by: Claude <claude@anthropic.com>
The calc_cout test was skipped since Phase 1 with the assumption that
statically-linked STL calls required a new inline policy. In reality,
the .pdata auto-outline (PR #78) already registers all STL functions
as outline targets, and the lifter correctly outlines the operator<<
call and lifts the pure computation (x*3+7).
The test was failing because it used the wrong symbol name ('calc_cout'
instead of the MSVC-mangled '?calc_cout@@YAHH@Z').
Changes:
- Un-skip calc_cout, fix symbol to mangled name, add 4 semantic cases
- Add _INTTOPTR_CALL_RE to strip outlined calls to concrete addresses
(they segfault in lli but are provably dead for return value)
- Golden hashes: 42 -> 44 files
Zero skipped tests remaining. 29/29 samples, 150 semantic cases.
Two new post-optimization passes that run after the final O2 pipeline:
PrototypeMinimizationPass:
- Removes unused function arguments based on Argument::use_empty()
- Typical reduction: 34 params -> 0-2 (e.g. @main(i64 %RCX) instead of all 16 GPRs + 16 XMMs + 2 ptrs)
- Splices basic blocks into new function, remaps argument uses, erases old function
- Updated check_semantic.py to parse actual IR signatures instead of hardcoded 34-param list
CanonicalNamingPass:
- Strips address-derived suffixes from block/value names for deterministic output
- Blocks: entry, bb1, bb2, ... (sequential)
- Values: semantic prefix preserved, address suffix removed (realadd-5368713230- -> realadd)
- Same input now produces byte-identical IR across rebuilds
Also fixed writeFunctionToFile to use stored module pointer M instead of
fnc->getParent() (dangling after prototype minimization erases the old function).
Review fixes:
- CanonicalNamingPass: use StringMap<unsigned> instead of DenseMap<StringRef> (dangling key)
- PrototypeMinimizationPass: restrict call rewriting to CallInst (not InvokeInst/CallBrInst)
- PrototypeMinimizationPass: guard F->eraseFromParent() with use_empty() check
- check_semantic.py: widen define regex to handle dso_local and other prefixes
All 28 samples pass, 146 semantic cases, 56 golden hashes updated.
C-compiled samples (calc_fib, calc_grade, calc_jumptable, calc_mixed,
calc_sum_array, calc_switch) produce toolchain-dependent binaries with
different virtual addresses. The lifted IR contains these addresses, so
IR hashes vary across build environments.
- Remove calc_* entries from golden_ir_hashes.json (34 ASM-stable entries remain)
- Update check_determinism() to only validate files in the golden set;
extra IR files from C-compiled samples are reported as skipped
- Correctness for C-compiled samples is enforced by semantic tests
(107 cases verifying actual computed return values via lli)
The InstructionKey::InstructionKeyInfo had getEmptyKey() and getTombstoneKey()
both returning InstructionKey(nullptr, nullptr). LLVM DenseMap requires these
to be distinct sentinel values. This violated the DenseMap contract, causing
bucket corruption during copy/iteration (the old FIXME about 'last item
corrupted').
Fix: use reinterpret_cast sentinel pointers -1 and -2, matching LLVM convention.
Also cleaned up the non-const copy constructor (removed dead local copy and
stale FIXME comment).
Also adds:
- switch_sparse.asm test (non-consecutive case values: 10, 50, 200, 1000)
- calc_cout.cpp test (skipped - documents inline policy limitation with STL)
- C++ compilation support in build_samples.cmd
- Skip mechanism for manifest entries (skip: true + skip_reason)
- Fix test.py update-golden to not run determinism check before updating
68 pattern checks, 40 golden hashes, 108 handler microtests — all green.