Before: each IAT slot pointed at an unmapped sentinel address, so the
FIRST import call raised UC_ERR_FETCH_UNMAPPED and the emulator stopped.
We only ever observed one import per run.
After: sentinel addresses live in a mapped page filled with 0xC3 (near
ret) instructions. Each import call fetches the ret byte, immediately
returns to the VMs pre-staged continuation, and emulation keeps going.
All subsequent imports now surface as [HIT] events.
On example2-virt.bin @ 0x140001000 this finds every required import:
insn ret-site import target
---- -------- ------ ------
34223 0x14017fa77 GetStdHandle stdin
44847 0x14017fa77 GetStdHandle stdout
60695 0x14017ef9f WriteConsoleA prompt
74394 0x140192798 ReadConsoleA
85326 0x140157ef9 CharUpperA
97859 0x14013bf11 WriteConsoleA echo
110166 0x14017fa77 WriteConsoleA final
This gives the full map of import ret-site addresses for the virt
sample - useful for future work that needs to reach those sites
(whether by deeper lifter exploration or by seeding additional
entries). The lifter currently reaches 0x14017fa77 only.
Co-authored-by: Claude <claude@anthropic.com>
* tests: add Themida devirtualization import-equivalence check
Adds python test.py themida that lifts every sample in
scripts/rewrite/themida_samples.json and asserts the resulting IR calls
every import declared in required_imports. Names are pinned against
a lift of the non-virtualized reference binary via --update.
This is a correctness gate that complements the existing
coverage gate ('2544 instructions, 0 errors'). Currently red on
example2-virt.bin: the lifter unrolls the VM without surfacing
GetStdHandle / WriteConsoleA / ReadConsoleA / CharUpperA from the
guest program. That gap is the active devirtualization frontier;
this test makes it visible instead of silently green.
Samples whose binaries are absent (`../testthemida/*.bin` lives
outside the repo) are skipped rather than failed, so the check
runs cleanly in CI without the binaries present.
* diag: add Unicorn-based external-call tracer; document Themida transform
Adds scripts/dev/trace_external_calls.py: loads a PE into Unicorn,
patches every IAT slot with a unique unmapped-address sentinel, then
emulates from the chosen entry. When any call/jmp/ret resolves its
target to a sentinel, logs the call-site address, the mnemonic, and
the addressing form. One-shot diagnostic for answering 'what x86
instruction issues this external call at runtime.'
Using it on example2-virt.bin shows the Themida transform precisely:
- guest imports (GetStdHandle etc) remain in the IAT
- every guest call site is rewritten from 'call [rip+IAT]' to a
VM-staged 'push target; ret' where target was loaded from the
IAT upstream
- for example2, the first external call happens at VA 0x14017fa77
via 'ret 0', popping the GetStdHandle IAT value off the stack
- Themida strips its own SDK markers (VirtualizerSDK64.dll#103/#503)
from the IAT; our ignore_imports filter already accounts for this
The lifter's current recognition handles direct call-through-IAT
and register-indirect IAT calls (the non-virt binary resolves 5
imports cleanly). It does not recognize the ret-pops-IAT-loaded-
pointer pattern, which is why the virt lift surfaces zero imports.
Also annotates themida_samples.json with these properties inline
so the transform semantics live next to the test that exercises
them.
* diag: trace_external_calls can dump visited PCs and record sentinel push chain
Two additions, both motivated by the example2-virt.bin diagnosis session:
- --dump-visited <path>: writes every unique instruction PC the emulator
executes, in first-visit order. Diff against the lifter's 'reached
addresses' trace (MERGEN_DIAG_LIFT_PROGRESS=1) to localise where the
lifter's static exploration diverges from the dynamic path.
- UC_HOOK_MEM_WRITE for stack-addressed 8-byte writes whose payload is a
sentinel. Records every such write, not just the first, because Themida
uses push-pop swap gadgets that stage a sentinel on the stack
transiently before the 'real' push lands it at the ret-target slot.
The last-5-pushes summary exposes this.
Findings for example2-virt.bin @ 0x140001000:
- lifter covers emu_pos=0..1298 out of 4210 unique PCs (~30%)
- external call site is at emu_pos=4209; gap of 2911 unvisited PCs
- lifter visits 5 addresses the runtime never takes (wrong concolic branch)
- the 'final push to ret slot' is not a 'push [iat]' but rather
'sub qword ptr [r14], <const>' — the VM decrypts a pre-staged
stack slot in place to reconstruct the IAT pointer. Pattern-match
recognition alone cannot handle this; concrete VM-dispatch unrolling
is required.
* diag: add MERGEN_NO_LOOP_GEN env gate for loop-generalization
Adds an env-var toggle at the top of canGeneralizeStructuredLoopHeader.
When MERGEN_NO_LOOP_GEN=1, the gate rejects every header, forcing
pure concrete exploration with no phi-widening abstraction.
Diagnostic knob, not a user-facing feature. Used to localise how much
of a lift's coverage depends on generalization vs. the concolic engine.
Measurement on example2-virt.bin @ 0x140001000:
gen ON gen OFF (NO_LOOP_GEN=1)
blocks_attempted 56 2642 (47x)
instructions_lifted 2544 34229 (13.5x)
output_no_opts.ll lines 6022 30481 (5x)
unique addrs visited 34 338 (10x)
addrs in 0x14017xxxx 0 103 (call-handler cluster)
external call site reached: no yes (via BB 0x14017fa72)
themida equivalence test: red red (recognition still gap)
Loop-generalization is the dominant reachability blocker on Themida
VM dispatchers at current tuning. Pure concrete exploration reaches
the external-call handler block but does not emit named import calls
because lift_ret has no path to match a resolved ret target against
importMap. Recognition is the next fix surface; reachability is large
mostly because of generalization tuning.
Side-effects of gen OFF that are NOT acceptable in production:
- Lifter decodes .rdata IAT bytes as instructions (OUTSD error at
0x140002688 on this sample)
- Top-revisited addresses hit ~1142x each: the lifter spins in
tight loops without generalization cutting them off; block budget
(4096) would fire eventually on a larger sample
So the knob is purely diagnostic. The real production fix is selective
generalization (distinguish 'VM dispatcher' from 'guest loop') plus
lift_ret import recognition.
* lifter: recognize ret-to-IAT as named external call in lift_ret
Adds a recognition path in lift_ret: if the value being popped resolves
to a concrete address that's in importMap, emit callFunctionIR for the
named import, then simulate the external's own ret by popping one more
qword (the continuation address pre-staged by the caller). solvePath
then continues at the continuation instead of trying to lift the IAT
pointer as code.
Two resolution routes:
1. realval is a ConstantInt (direct push+ret of an IAT load)
2. realval is symbolic but computePossibleValues folds to a single
concrete value (obfuscated chains that constant-fold at this path)
Scope limits:
- Non-virt example2.bin lift is unchanged (still resolves 5 imports
via register-indirect path; the new ret path does not fire because
the binary uses 'call [iat]', not 'push+ret').
- Virt example2-virt.bin lift: the recognition code runs but does not
surface imports because the lifter's static resolution of the
arithmetic-decrypt chain produces wrong concrete targets. E.g. the
ret at 0x14017fa77 resolves to 0x140002628 (somewhere in .rdata) via
computePossibleValues; at runtime the emulator sees it pop the
GetStdHandle IAT pointer (0x140002490). The recognition logic is
correct; the upstream data flow is lying. Fixing that requires
selective-generalization tuning or concrete VM unrolling, tracked
separately.
So β lands as ground work for simpler push+ret thunks and for future
work where state-propagation fidelity improves. It is not a Themida
fix on its own.
* lifter: gate canGeneralize on per-header revisit count
Adds a revisit-count threshold to canGeneralizeStructuredLoopHeader:
below threshold N the gate rejects (concrete exploration continues);
at or above N it falls through to the existing loop-shape checks.
Tunable via MERGEN_GEN_MIN_REVISITS; default is 0 (inert, matches
pre-existing behaviour).
Also promotes ++liftAttemptCounts[addr] out from under the
liftProgressDiagEnabled gate so the counter is always maintained.
Rationale: on Themida example2-virt.bin @ 0x140001000, the existing
gate (always-generalize on first qualifying revisit) abstracts the
VM's dispatch loop too early, cutting reachability to ~30% of the
dynamic execution path. A higher threshold lets the dispatcher run
concretely for more iterations before abstracting. Measurement (all
other settings at defaults):
T=0 (current) blocks= 56 insns= 2544 err=0 warn=0
T=4 blocks= 88 insns= 3842 err=0 warn=4
T=16 blocks= 393 insns= 11747 err=1 warn=0
T=32 blocks= 425 insns= 12067 err=1 warn=0
T=128 blocks= 617 insns= 13987 err=1 warn=0
MERGEN_NO_LOOP_GEN=1 (kill)
blocks= 2642 insns= 34229 err=1 warn=0
Caveat: at T=6, T=8, T=12 the lifter crashes with an access violation
partway through lifting. The crash fires in the Themida dispatcher
state machinery around 0x1400237F9 when generalization fires mid-
iteration with state that the existing machinery is not prepared to
handle. Other nearby T values (T=5, 7, 9, 10, 11, 13-19) are stable.
So the knob is landing as experimental infrastructure with default=0
(no-op). Future work can pair a safe non-zero default with a fix for
the dispatcher-state crash.
---------
Co-authored-by: Claude <claude@anthropic.com>
- add nested lift diagnostics and helper-level profiling for protected381 targets
- refactor function signature specs and optimize folderBinOps fast paths
- implement the full SCAS family and document VMP 3.6 INT dispatcher findings
- add reproducible profiling scripts and root VMP testing notes