mirror of
https://github.com/NaC-L/Mergen.git
synced 2026-05-12 09:40:34 +00:00
main
6 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
605a36e8ed |
lifter: correctness fixes, refactors, and regression tests (#205)
* lifter: restore indirect-jump threshold to 128
* gitignore: glob output_*.ll instead of enumerating dumps
Replace output_finalnoopt.ll / output_no_opts.ll entries with
output_*.ll so ad-hoc lifter dumps (output_rets.ll, output_newpath.ll,
etc.) stop showing up in git status.
* lifter: factor REAL_return path through emitResolvedFunctionReturn
Pull the rax-zext + CreateRet + run/finished bookkeeping out of the
REAL_return branch in lift_ret() into a local lambda so future ret
exit points can reuse it without duplicating four lines of
boilerplate.
Drop the dead returnStruct/myStruct scaffolding and the
originalFunc_finalnopt local: every InsertValue call site has been
commented out for a long time and the locals had no remaining uses.
The active code emits a plain rax return.
No behavior change.
* lifter: advance RSP past continuation slot in ret-to-IAT chain
In the chained import-return pattern (`ret` to IAT slot, IAT slot
holds an external function address, the function returns and control
resumes at the next stack slot's continuation address), the lifter
collapses the two pops into a single `call @import; br contBB`. RSP
was only advanced past the IAT slot itself, so post-call register
state still claimed RSP pointed at the continuation address. Any
downstream stack read from RSP saw stale data and any solver that
constant-folded RSP picked up a value that no longer matched the
post-chain physical layout.
Bump RSP by another `ptrSize` immediately before lowering the
import call so the continuation block inherits the same RSP it would
have under a faithful two-pop lowering.
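The off-by-one-slot effect described above can be modeled with plain integer arithmetic. This is a minimal sketch, not the lifter's code: the function name, the boolean flag, and the entry RSP value are illustrative assumptions, and only the 8-byte pointer size and the two-pop reasoning come from the commit message.

```python
PTR_SIZE = 8  # x86-64 stack slot width in bytes


def rsp_after_ret_to_iat_chain(entry_rsp, advance_past_continuation):
    """Model RSP once `ret -> import -> continuation` has been collapsed.

    Hypothetical helper for illustration only; it is not part of the lifter.
    """
    rsp = entry_rsp + PTR_SIZE  # the `ret` pops the IAT-slot value
    if advance_past_continuation:
        # The import's own `ret` pops the continuation address.
        rsp += PTR_SIZE
    return rsp


entry = 0x14FDA0  # illustrative entry RSP
print(hex(rsp_after_ret_to_iat_chain(entry, False)))  # one pop modeled
print(hex(rsp_after_ret_to_iat_chain(entry, True)))   # both pops modeled
```

With both pops modeled, the continuation block starts from entry RSP + 16, which is the value a faithful two-pop lowering would produce.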
* lifter/test: regression test for ret-to-IAT chain RSP advancement
Locks in dd95fe7. The microtest stands up a LifterUnderTest, plants
[importVA, contVA] on the stack at an RSP that is intentionally NOT
equal to STACKP_VALUE (so the lift_ret REAL_return short-circuit does
not fire), registers the import in the lifter's importMap, and lifts
a single `ret` (0xC3).
It then asserts that:
- the chain handler emitted a direct call to the registered import
- RSP after the chain equals entry RSP + 16, not + 8
Without the fix the test fails with RSP = entry + 8 (only the IAT
slot pop is modeled), exactly the off-by-8 the fix closes.
Verified the test catches the regression by reverting dd95fe7
locally before re-applying — the failing message reads
"RSP after chain = 0x14FDA8; expected 0x14fdb0".
* scripts/themida: filter lifter-synthesized helpers from import diff
Calls to lifter-emitted helpers (`@exception`, `@fastfail`,
`@not_implemented`, etc.) surfaced as 'extra import (not required)'
lines on every Themida equivalence run. They are not user imports;
they are lowered from INT1/INT3/UD2/INT29/SYSCALL/segment-load
sites in the lifter's own semantics files.
Skip them in `_extract_call_names` so the equivalence diff shows
only real imports. The list of helpers lives next to the call regex
so it stays adjacent to the code that emits them; if a new helper
shows up in the IR (e.g. another illegal-instruction lowering) the
script will surface it as an 'extra import' until the entry is added
here, which is the right tripwire.
Before: example2 -- 6 distinct imports, 10 calls (3 noise calls)
After: example2 -- 4 distinct imports, 7 calls (clean)
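A rough sketch of the filtering idea, assuming the script scans textual LLVM IR with a regular expression. The function name `extract_call_names_sketch`, the regex, and the helper set are assumptions made for illustration; the real `_extract_call_names` is not shown in this log, and the commit lists only three helper names followed by "etc.".

```python
import re

# Assumed helper list: only these three names appear in the commit message.
LIFTER_HELPERS = frozenset({"exception", "fastfail", "not_implemented"})

# Matches the callee of a direct call such as `call void @foo(...)`.
CALL_RE = re.compile(r"\bcall\b[^@\n]*@([A-Za-z_.$][\w.$]*)\s*\(")


def extract_call_names_sketch(ir_text):
    """Return direct-call callee names, skipping lifter-synthesized helpers."""
    return [name for name in CALL_RE.findall(ir_text)
            if name not in LIFTER_HELPERS]


sample_ir = """
  %h = call i64 @GetStdHandle(i64 -11)
  call void @not_implemented()
  call void @exception()
  %r = call i64 @WriteConsoleA(i64 %h)
"""
print(extract_call_names_sketch(sample_ir))
```

Keeping the helper set next to the regex mirrors the tripwire described above: an unlisted helper still shows up in the result until it is added to the set.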
* lifter/analysis: replace 'TODO: fix?' marker with positive explanation
The 2-value path-solving fork's swap branch had a 'TODO: fix?'
comment from the original draft. Traced both branches and confirmed
the swap is correct:
- When the select's trueValue equals firstcase, condition is the
select's condition as-is and firstcase→bb_true wires correctly.
- When trueValue equals secondcase, condition still expresses 'true
picks trueValue' but downstream code uses firstcase→bb_true.
Swapping firstcase↔secondcase makes firstcase refer to the trueVal
constant so the existing CreateCondBr wiring stays correct without
a parallel reversed-branch path.
Replaced the TODO with a comment that explains why the swap is
necessary, so future readers do not waste time investigating a
branch that is intentional.
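The swap can be pictured as a normalization step that runs before the conditional branch is wired. The sketch below is a simplified model under that assumption; `normalize_two_way_fork` and its plain-integer arguments are hypothetical stand-ins for the lifter's LLVM values.

```python
def normalize_two_way_fork(true_value, firstcase, secondcase):
    """Order the two targets so that firstcase is the select's true value.

    After this step a single `condition -> firstcase` wiring is correct
    for both input orderings, so no reversed-branch path is needed.
    """
    if true_value == secondcase:
        firstcase, secondcase = secondcase, firstcase
    return firstcase, secondcase


# trueValue already equals firstcase: nothing changes.
print(normalize_two_way_fork(0x1000, 0x1000, 0x2000))
# trueValue equals secondcase: the pair is swapped.
print(normalize_two_way_fork(0x2000, 0x1000, 0x2000))
```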
* lifter: accept Register64/Memory64 source for punpcklqdq
Iced classifies operand types by the bytes the instruction actually
accesses, not by physical register width. PUNPCKLQDQ only reads the
low 64 bits of its second operand, so Iced reports Register64 (or
Memory64 for the m128 form) for a source whose physical encoding is
`xmm/m128`. The lift handler's accept check rejected anything other
than Register128/Memory128 and fell through to the not_implemented
exit, so every `punpcklqdq xmm, xmm/m128` site lowered to a bogus
`call @not_implemented; ret` instead of the unpack semantic.
Widen the accept set to Register64 and Memory64 too. The body
already truncates the source to i64 before OR'ing it into the high
half of the result, so a 64-bit-typed source is semantically
identical to a 128-bit one for this handler.
Fixes the two pre-existing oracle test failures
`punpcklqdq_xmm0_xmm1_basic` and
`punpcklqdq_xmm0_xmm1_zero_upper_from_zero_source`. `python test.py
all` stays at 244/244, confirming no semantic regressions.
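To illustrate why a 64-bit-typed source is equivalent here, the following sketch models PUNPCKLQDQ on Python integers: the result keeps the low quadword of the destination and places the low quadword of the source in the high half. The function name and the sample values are illustrative and are not taken from the lifter.

```python
MASK64 = (1 << 64) - 1


def punpcklqdq(dst, src):
    """Model PUNPCKLQDQ on 128-bit values held in Python ints."""
    low = dst & MASK64    # result[63:0]   = dst[63:0]
    high = src & MASK64   # result[127:64] = src[63:0] (the i64 truncation)
    return (high << 64) | low


dst = 0xAAAABBBBCCCCDDDD_1111222233334444
src128 = 0xFFFFFFFFFFFFFFFF_5555666677778888  # upper half is never read
src64 = src128 & MASK64                        # what a 64-bit operand carries

print(hex(punpcklqdq(dst, src128)))
print(punpcklqdq(dst, src128) == punpcklqdq(dst, src64))
```

Because the upper source quadword never reaches the result, classifying the operand as 64 bits wide loses nothing for this instruction.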
* lifter: replace lift_jmp's fallthrough switch with an isDirectJump if
The RIP-relative add for direct jumps lived inside a 4-case switch
whose body intentionally fell through into `default: break;`. It
worked, but:
- Implicit fallthrough is a -Wimplicit-fallthrough hazard. Today the
default does nothing; tomorrow someone adds a body and every direct
jump silently runs it.
- The switch's discriminator is exactly `isDirectJump`, which is
already computed two lines above for the path-solver context. The
switch was a parallel restatement of the same predicate.
Collapse the switch into `if (isDirectJump) { trunc = add(trunc,
ripval); }` so the predicate has one definition and there is no
fallthrough to misuse. Behavior unchanged: the same immediate cases
still get the RIP-relative bump, indirect jumps still skip it, and
`python test.py all` stays at 244/244.
* lifter/test: regression test for SSE memory-form handler dispatch
Lock in that pand/por/pxor accept the `xmm, [mem]` encoding form. The
test lifts `66 0F DB 00`, `66 0F EB 00`, and `66 0F EF 00` (one
`xmm0, [rax]` site each) and asserts that the lifted function does
not contain a direct call to @not_implemented.
Pure structural acceptance: not validating bitwise-AND/OR/XOR
semantics, only that the handler dispatched at all. Iced today
reports Memory128 for these encodings so the test passes against the
existing `Register128 || Memory128` accept sets. If a future Iced
update reclassifies the source operand by bytes-actually-accessed
(the way it already does for punpcklqdq, where it reports
Register64/Memory64 even for an `xmm/m128` encoding) the handler
would silently fall through to `call @not_implemented; ret` and
miscompile every memory-form site -- this test trips first.
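For readers unfamiliar with the byte sequences, the sketch below decodes the three test encodings by hand. It is a deliberately narrow illustration (only the `66 0F` prefix pair, these three opcodes, and the simplest ModRM forms) and is unrelated to how Iced or the lifter decode instructions; the function and table names are assumptions.

```python
SSE_LOGIC_OPCODES = {0xDB: "pand", 0xEB: "por", 0xEF: "pxor"}
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_sse_logic(encoding):
    """Decode `66 0F <op> <modrm>` for pand/por/pxor (sketch only)."""
    prefix, escape, opcode, modrm = encoding
    if prefix != 0x66 or escape != 0x0F or opcode not in SSE_LOGIC_OPCODES:
        raise ValueError("not a 66 0F pand/por/pxor encoding")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod == 0b11:
        source = f"xmm{rm}"                # register form
    elif mod == 0b00 and rm not in (4, 5):
        source = f"[{REG64[rm]}]"          # plain [reg] memory form
    else:
        raise ValueError("SIB/displacement forms are outside this sketch")
    return f"{SSE_LOGIC_OPCODES[opcode]} xmm{reg}, {source}"


for text in ("66 0F DB 00", "66 0F EB 00", "66 0F EF 00"):
    print(text, "->", decode_sse_logic(bytes.fromhex(text)))
```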
* lifter: drop duplicate stdout print on unresolved indirect jmp
`lift_jmp` printed every UnresolvedIndirectJump twice: once as a raw
`std::cout << "[diag] lift_jmp: ..."` and once through
`diagnostics.warning(...)` on the very next line. The diagnostics
framework already persists the warning to `output_diagnostics.json`
at lift completion, and no script or test grep'd the stdout form.
Drop the std::cout. The diagnostic remains in the recorded diagnostics
list, surfaceable via the JSON dump or the in-memory entries vector.
This removes the only unguarded raw `[diag]` print in the lift path
-- the rest are gated on `liftProgressDiagEnabled` or specific hot
addresses for active debugging.
* scripts/themida: fix docstring escape leak in import-filter doc
Audit of #205 caught a literal `\u2014` and unnecessary
`\"` escapes in the `_extract_call_names` docstring -- leftovers
from how the surrounding commit (#205, scripts/themida: filter
lifter-synthesized helpers) was authored. Replace the literal
escape with a plain `--` and drop the redundant backslash-quotes;
the docstring now renders cleanly at `help(_extract_call_names)`
and looks normal in the source.
Behavior unchanged: `python test.py themida` still passes with
the same import-diff filter (4 imports, 7 calls for example2).
---------
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
|
||
|
|
c8102a69cf |
themida: correctness gate, diagnostic tracer, ret-to-IAT recognition, gen revisit knob (#182)
* tests: add Themida devirtualization import-equivalence check
Adds python test.py themida that lifts every sample in
scripts/rewrite/themida_samples.json and asserts the resulting IR calls
every import declared in required_imports. Names are pinned against
a lift of the non-virtualized reference binary via --update.
This is a correctness gate that complements the existing
coverage gate ('2544 instructions, 0 errors'). Currently red on
example2-virt.bin: the lifter unrolls the VM without surfacing
GetStdHandle / WriteConsoleA / ReadConsoleA / CharUpperA from the
guest program. That gap is the active devirtualization frontier;
this test makes it visible instead of silently green.
Samples whose binaries are absent (`../testthemida/*.bin` lives
outside the repo) are skipped rather than failed, so the check
runs cleanly in CI without the binaries present.
* diag: add Unicorn-based external-call tracer; document Themida transform
Adds scripts/dev/trace_external_calls.py: loads a PE into Unicorn,
patches every IAT slot with a unique unmapped-address sentinel, then
emulates from the chosen entry. When any call/jmp/ret resolves its
target to a sentinel, logs the call-site address, the mnemonic, and
the addressing form. One-shot diagnostic for answering 'what x86
instruction issues this external call at runtime.'
Using it on example2-virt.bin shows the Themida transform precisely:
- guest imports (GetStdHandle etc) remain in the IAT
- every guest call site is rewritten from 'call [rip+IAT]' to a
VM-staged 'push target; ret' where target was loaded from the
IAT upstream
- for example2, the first external call happens at VA 0x14017fa77
via 'ret 0', popping the GetStdHandle IAT value off the stack
- Themida strips its own SDK markers (VirtualizerSDK64.dll#103/#503)
from the IAT; our ignore_imports filter already accounts for this
The lifter's current recognition handles direct call-through-IAT
and register-indirect IAT calls (the non-virt binary resolves 5
imports cleanly). It does not recognize the ret-pops-IAT-loaded-
pointer pattern, which is why the virt lift surfaces zero imports.
Also annotates themida_samples.json with these properties inline
so the transform semantics live next to the test that exercises
them.
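The sentinel idea behind the tracer can be sketched without an emulator. In the model below each IAT slot receives a unique address that is assumed to be unmapped, so any control transfer landing on one of those addresses identifies the import. The function names, the sentinel base and stride, and the slot addresses are all hypothetical; the actual script performs this patching inside Unicorn.

```python
def assign_sentinels(iat_slots, base=0x7FFF00000000, stride=0x1000):
    """Map each IAT slot to a unique sentinel address.

    iat_slots: dict of slot VA -> import name.
    Returns (slot VA -> sentinel, sentinel -> import name).
    """
    patched = {}
    by_sentinel = {}
    for index, slot_va in enumerate(sorted(iat_slots)):
        sentinel = base + index * stride
        patched[slot_va] = sentinel
        by_sentinel[sentinel] = iat_slots[slot_va]
    return patched, by_sentinel


def resolve_external(target, by_sentinel):
    """Return the import name if a call/jmp/ret target is a sentinel."""
    return by_sentinel.get(target)


# Hypothetical slot addresses, for illustration only.
iat = {0x140002000: "GetStdHandle", 0x140002008: "WriteConsoleA"}
patched, by_sentinel = assign_sentinels(iat)
print(resolve_external(patched[0x140002000], by_sentinel))
```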
* diag: trace_external_calls can dump visited PCs and record sentinel push chain
Two additions, both motivated by the example2-virt.bin diagnosis session:
- --dump-visited <path>: writes every unique instruction PC the emulator
executes, in first-visit order. Diff against the lifter's 'reached
addresses' trace (MERGEN_DIAG_LIFT_PROGRESS=1) to localise where the
lifter's static exploration diverges from the dynamic path.
- UC_HOOK_MEM_WRITE for stack-addressed 8-byte writes whose payload is a
sentinel. Records every such write, not just the first, because Themida
uses push-pop swap gadgets that stage a sentinel on the stack
transiently before the 'real' push lands it at the ret-target slot.
The last-5-pushes summary exposes this.
Findings for example2-virt.bin @ 0x140001000:
- lifter covers emu_pos=0..1298 out of 4210 unique PCs (~30%)
- external call site is at emu_pos=4209; gap of 2911 unvisited PCs
- lifter visits 5 addresses the runtime never takes (wrong concolic branch)
- the 'final push to ret slot' is not a 'push [iat]' but rather
'sub qword ptr [r14], <const>' — the VM decrypts a pre-staged
stack slot in place to reconstruct the IAT pointer. Pattern-match
recognition alone cannot handle this; concrete VM-dispatch unrolling
is required.
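The comparison between the `--dump-visited` output and the lifter's reached addresses might look like the sketch below. It assumes both traces have already been parsed into integer lists; the function name, the return shape, and the sample addresses are illustrative assumptions.

```python
def compare_traces(dynamic_pcs, lifter_addrs):
    """Compare the emulator's first-visit PC order with lifter coverage.

    Returns (covered_prefix_len, never_taken), where covered_prefix_len is
    the number of leading dynamic PCs the lifter also reached, and
    never_taken lists lifter addresses the runtime never executed.
    """
    reached = set(lifter_addrs)
    covered = 0
    for pc in dynamic_pcs:
        if pc not in reached:
            break
        covered += 1
    never_taken = sorted(reached - set(dynamic_pcs))
    return covered, never_taken


dynamic = [0x1000, 0x1004, 0x1008, 0x2000, 0x2004]  # made-up trace
lifted = [0x1000, 0x1004, 0x1008, 0x3000]           # made-up coverage
covered, extra = compare_traces(dynamic, lifted)
print(covered, [hex(a) for a in extra])
```

The prefix length corresponds to the `emu_pos` boundary quoted above, and the second value corresponds to the addresses the lifter visits but the runtime never takes.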
* diag: add MERGEN_NO_LOOP_GEN env gate for loop-generalization
Adds an env-var toggle at the top of canGeneralizeStructuredLoopHeader.
When MERGEN_NO_LOOP_GEN=1, the gate rejects every header, forcing
pure concrete exploration with no phi-widening abstraction.
Diagnostic knob, not a user-facing feature. Used to localise how much
of a lift's coverage depends on generalization vs. the concolic engine.
Measurement on example2-virt.bin @ 0x140001000:
gen ON gen OFF (NO_LOOP_GEN=1)
blocks_attempted 56 2642 (47x)
instructions_lifted 2544 34229 (13.5x)
output_no_opts.ll lines 6022 30481 (5x)
unique addrs visited 34 338 (10x)
addrs in 0x14017xxxx 0 103 (call-handler cluster)
external call site reached: no yes (via BB 0x14017fa72)
themida equivalence test: red red (recognition still gap)
Loop-generalization is the dominant reachability blocker on Themida
VM dispatchers at current tuning. Pure concrete exploration reaches
the external-call handler block but does not emit named import calls
because lift_ret has no path to match a resolved ret target against
importMap. Recognition is the next fix surface; the reachability gap is
mostly a matter of generalization tuning.
Side-effects of gen OFF that are NOT acceptable in production:
- Lifter decodes .rdata IAT bytes as instructions (OUTSD error at
0x140002688 on this sample)
- Top-revisited addresses hit ~1142x each: the lifter spins in
tight loops without generalization cutting them off; block budget
(4096) would fire eventually on a larger sample
So the knob is purely diagnostic. The real production fix is selective
generalization (distinguish 'VM dispatcher' from 'guest loop') plus
lift_ret import recognition.
* lifter: recognize ret-to-IAT as named external call in lift_ret
Adds a recognition path in lift_ret: if the value being popped resolves
to a concrete address that's in importMap, emit callFunctionIR for the
named import, then simulate the external's own ret by popping one more
qword (the continuation address pre-staged by the caller). solvePath
then continues at the continuation instead of trying to lift the IAT
pointer as code.
Two resolution routes:
1. realval is a ConstantInt (direct push+ret of an IAT load)
2. realval is symbolic but computePossibleValues folds to a single
concrete value (obfuscated chains that constant-fold at this path)
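The two resolution routes can be summarized in a few lines. This is a schematic model only: a Python `int` stands in for a ConstantInt, `None` for a symbolic value, and a set of integers for the result of computePossibleValues. The function names and the sample import map are assumptions.

```python
def resolve_ret_target(realval, possible_values):
    """Return a concrete ret target, or None if it stays ambiguous.

    realval: int when the popped value is constant, None when symbolic.
    possible_values: set of ints the symbolic value may take.
    """
    if isinstance(realval, int):       # route 1: already a constant
        return realval
    if len(possible_values) == 1:      # route 2: folds to a single value
        return next(iter(possible_values))
    return None


def recognize_import(realval, possible_values, import_map):
    """Name the import a `ret` transfers to, if the target resolves to one."""
    target = resolve_ret_target(realval, possible_values)
    return import_map.get(target) if target is not None else None


import_map = {0x7FFF00001000: "GetStdHandle"}  # hypothetical address
print(recognize_import(0x7FFF00001000, set(), import_map))
print(recognize_import(None, {0x7FFF00001000}, import_map))
print(recognize_import(None, {0x10, 0x20}, import_map))
```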
Scope limits:
- Non-virt example2.bin lift is unchanged (still resolves 5 imports
via register-indirect path; the new ret path does not fire because
the binary uses 'call [iat]', not 'push+ret').
- Virt example2-virt.bin lift: the recognition code runs but does not
surface imports because the lifter's static resolution of the
arithmetic-decrypt chain produces wrong concrete targets. E.g. the
ret at 0x14017fa77 resolves to 0x140002628 (somewhere in .rdata) via
computePossibleValues; at runtime the emulator sees it pop the
GetStdHandle IAT pointer (0x140002490). The recognition logic is
correct; the upstream data flow is lying. Fixing that requires
selective-generalization tuning or concrete VM unrolling, tracked
separately.
So β lands as groundwork for simpler push+ret thunks and for future
work where state-propagation fidelity improves. It is not a Themida
fix on its own.
* lifter: gate canGeneralize on per-header revisit count
Adds a revisit-count threshold to canGeneralizeStructuredLoopHeader:
below threshold N the gate rejects (concrete exploration continues);
at or above N it falls through to the existing loop-shape checks.
Tunable via MERGEN_GEN_MIN_REVISITS; default is 0 (inert, matches
pre-existing behaviour).
Also promotes ++liftAttemptCounts[addr] out from under the
liftProgressDiagEnabled gate so the counter is always maintained.
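The threshold behaviour can be pictured with a small counter-based gate. The class below is a sketch under the assumption that the per-header revisit count is the always-maintained lift-attempt counter; the class and method names are hypothetical, and only the environment variable name and its default of 0 come from the commit message.

```python
import os
from collections import Counter


class RevisitGate:
    """Pre-check placed in front of the existing loop-shape checks."""

    def __init__(self, min_revisits=None):
        if min_revisits is None:
            min_revisits = int(os.environ.get("MERGEN_GEN_MIN_REVISITS", "0"))
        self.min_revisits = min_revisits
        self.lift_attempts = Counter()

    def note_lift_attempt(self, addr):
        # Counted unconditionally, not only when diagnostics are enabled.
        self.lift_attempts[addr] += 1

    def may_generalize(self, header_addr):
        # Below the threshold: reject, so concrete exploration continues.
        # At or above it: fall through to the loop-shape checks.
        return self.lift_attempts[header_addr] >= self.min_revisits


gate = RevisitGate(min_revisits=4)
header = 0x140001000
for attempt in range(1, 6):
    gate.note_lift_attempt(header)
    print(attempt, gate.may_generalize(header))
```

With the default threshold of 0 the pre-check always passes, which is why the knob is inert unless set.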
Rationale: on Themida example2-virt.bin @ 0x140001000, the existing
gate (always-generalize on first qualifying revisit) abstracts the
VM's dispatch loop too early, cutting reachability to ~30% of the
dynamic execution path. A higher threshold lets the dispatcher run
concretely for more iterations before abstracting. Measurement (all
other settings at defaults):
T=0 (current) blocks= 56 insns= 2544 err=0 warn=0
T=4 blocks= 88 insns= 3842 err=0 warn=4
T=16 blocks= 393 insns= 11747 err=1 warn=0
T=32 blocks= 425 insns= 12067 err=1 warn=0
T=128 blocks= 617 insns= 13987 err=1 warn=0
MERGEN_NO_LOOP_GEN=1 (kill)
blocks= 2642 insns= 34229 err=1 warn=0
Caveat: at T=6, T=8, T=12 the lifter crashes with an access violation
partway through lifting. The crash fires in the Themida dispatcher
state machinery around 0x1400237F9 when generalization fires mid-
iteration with state that the existing machinery is not prepared to
handle. Other nearby T values (T=5, 7, 9, 10, 11, 13-19) are stable.
So the knob is landing as experimental infrastructure with default=0
(no-op). Future work can pair a safe non-zero default with a fix for
the dispatcher-state crash.
---------
Co-authored-by: Claude <claude@anthropic.com>
|
||
|
|
3384786a70 |
lifter: support multi-way backedges with N-way generalized-loop phi construction (#123)
branch_backup(bb, /*generalized=*/true) previously overwrote a single backup_point per header in generalizedLoopBackedgeBackup[bb]. A loop header reached from three or more backedges silently lost every snapshot except the most recent, and the load_generalized_backup phi was always 2-incoming (canonical + last-seen backedge). PR #121 pinned this as a KNOWN-LIMITATION microtest. This commit widens the machinery end-to-end to 1 canonical + N backedges.
Storage and state:
- generalizedLoopBackedgeBackup is now DenseMap<BB*, SmallVector<backup_point, 2>>. branch_backup_impl appends, deduplicated by sourceBlock (repeat call from the same source replaces its entry in place).
- GeneralizedLoopControlFieldState.backedgeSource/Control/Buffer become parallel SmallVectors sized N per header.
Phi construction:
- make_generalized_loop_backup takes ArrayRef<backup_point> sources. Its mergeValue lambda constructs (1 + N)-incoming phis, one incoming per distinct backedge sourceBlock, with canonicalSource first. Sources duplicating canonicalSource are filtered. The N=1 path produces the same 2-incoming phi as before (determinism gate: 42/42 golden hashes match).
- retrieve_generalized_loop_control_slot_value_impl, retrieve_generalized_loop_target_slot_value_impl, and retrieve_generalized_loop_control_field_value_impl each emit (1 + N)-incoming phis from state.backedgeSources/Controls/Buffers.
- retrieve_generalized_loop_phi_address_value_impl and retrieve_generalized_loop_local_phi_address_value_impl relax their 'phi->getNumIncomingValues() != 2' sanity check to accept any phi with >= 2 incomings, and match each incoming against canonicalSource or any of state->backedgeSources[i].
load_generalized_backup_impl:
- Collects backedges whose sourceBlock differs from canonical AND whose controlCursor value differs from canonical; activates state only if at least one such backedge exists.
- seedInvariantLocalQwords requires the qword to read identically from canonicalBuffer AND every backedgeBuffer to qualify.
record_generalized_loop_backedge_impl:
- The rolled-control promotion (move current backedge into canonical, install new source as backedge) is only well-defined for the 1-backedge case, so it now guards on backedgeSources.size() == 1 and becomes a no-op for multi-way. Extending the rolled-control semantics to multi-way loops is left as follow-up when a real sample exercises it.
Tests (Tester.hpp):
- runGeneralizedLoopThirdBackedgeOverwritesPriorBackedgeSilently flipped and renamed to runGeneralizedLoopThirdBackedgePreservesAllThreeSnapshots: asserts three-backedge vector holds one entry per sourceBlock.
- runGeneralizedLoopLoadBackupWithThreeBackedgesProducesTwoWayPhiOnly flipped and renamed to runGeneralizedLoopLoadBackupWithThreeBackedgesProducesFourWayPhi: asserts GetMemoryValue(controlSlot) at the header yields a 4-incoming phi carrying canonical + all three backedge control values.
Docs (docs/LOOP_HANDLING.md):
- Struct and mergeValue snippets updated to N-way shapes.
- branch_backup state-transition row describes append+dedup.
- Multi-way backedge row removed from Known limitations.
Verification:
- python test.py micro: all pass, including the two flipped tests.
- python test.py baseline: all rewrite regression checks passed, determinism check passed (42 golden files match - 2-way loop IR shape unchanged).
- Themida reference sample (../testthemida/example2-virt.bin @ 0x140001000): 2544 instructions lifted, 0 warnings, 0 errors.
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com> |
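The append-with-dedup storage and the (1 + N)-incoming phi shape from this commit can be modeled in a few lines. This is a sketch only: `BackupPoint`, `append_dedup`, and `phi_incomings` are hypothetical names, block labels are plain strings, and a phi is represented as a list of (source block, value) pairs instead of an LLVM PHINode.

```python
from dataclasses import dataclass


@dataclass
class BackupPoint:
    source_block: str
    control_value: int


def append_dedup(backups, new):
    """Append a snapshot; a repeat from the same source replaces in place."""
    for index, existing in enumerate(backups):
        if existing.source_block == new.source_block:
            backups[index] = new
            return
    backups.append(new)


def phi_incomings(canonical, backups):
    """Build (1 + N) incomings: canonical first, one per distinct backedge."""
    incomings = [(canonical.source_block, canonical.control_value)]
    seen = {canonical.source_block}
    for backup in backups:
        if backup.source_block in seen:  # drops duplicates of the canonical source
            continue
        seen.add(backup.source_block)
        incomings.append((backup.source_block, backup.control_value))
    return incomings


backups = []
for point in (BackupPoint("bb_a", 10), BackupPoint("bb_b", 20),
              BackupPoint("bb_a", 11), BackupPoint("bb_c", 30)):
    append_dedup(backups, point)
print(phi_incomings(BackupPoint("entry", 0), backups))
```

Three distinct backedge sources plus the canonical source give four incomings, matching the four-way phi the flipped microtest asserts.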
||
|
|
d0a9d7fc9d |
lifter: remove resolveTargetedThemidaR9 - obsoleted by generalized-loop phi infrastructure (#120)
resolveTargetedThemidaR9 was added to recover the controlCursor identity of R9 at three hardcoded Themida instruction addresses where the symbolic pipeline had lost provenance. PR #112 (generalized-loop control-field / slot phi infrastructure) since landed retrieve_generalized_loop_control_* helpers that produce the correct phi shape through the normal GetMemoryValue path. The R9 override is now dead code: it overwrites a correct value with another correct value at three sites that the upstream pipeline already handles.
Empirical bisect on the reference Themida sample (../testthemida/example2-virt.bin @ 0x140001000) confirmed:
- site 0x140023671 disabled alone: 2544 lifted, 0 warn, 0 err
- site 0x14002368D disabled alone: 2544 lifted, 0 warn, 0 err
- site 0x140023741 disabled alone: 2544 lifted, 0 warn, 0 err
- all three disabled simultaneously: 2544 lifted, 0 warn, 0 err
- baseline (override active): 2544 lifted, 0 warn, 0 err
The MERGEN_DIAG_LIFT_PROGRESS=1 trace at site 0x14002368D shows R9 is already `add i64 %generalized_phi_load, 10` before the override fires - the generalized-loop machinery produced the correct phi independently.
Removed:
- resolveTargetedThemidaR9() in lifter/core/LifterClass_Concolic.hpp
- R9 special-case branch + session-scaffolding diag block in GetRegisterValue_impl (now just `return get_impl(key)`)
- Three microtests in lifter/test/Tester.hpp: runTargetedThemidaR9OverrideProducesPhi runTargetedThemidaR9OverrideDoesNotFireAtAdjacentAddress runTargetedThemidaR9OverrideFallsThroughWithoutLoopState
- Their three runCustom() registrations
- Override row in helper table, hardcoded-address subsection, and limitations row in docs/LOOP_HANDLING.md
Retained: kThemidaControlCursorSlot, kThemidaLoopCarriedSlot, and kSupportedGeneralizedControlFieldOffsets - still consumed by the generalized-loop control-field/slot retrieve_* helpers.
Verified:
- python test.py micro: all instruction microtests passed
- python test.py baseline: all rewrite regression checks passed, determinism check passed (42 golden files match)
- Themida sample: 2544 instructions lifted, 0 warnings, 0 errors
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com> |
||
|
|
8d101dcc5a |
lifter: fix Cyrillic homoglyph in resolveTargetedThemidaR9 identifier (#119)
The identifier 'resolveTargetedThemid\u0430R9' (declared in LifterClass_Concolic.hpp) contained U+0430 (Cyrillic small letter a) instead of U+0061 (Latin a) between 'Themid' and 'R9'. Every in-tree reference mirrored the Cyrillic form, but prose mentions and merge titles (e.g. PR #115 title) used ASCII, so an ASCII grep for 'resolveTargetedThemidaR9' returned zero hits. This was a silent discoverability hazard for future sessions and grep-based tooling.
Rename to pure ASCII across the single declaration, the single caller in getLatestValueForKey, the six test entry points in lifter/test/Tester.hpp, and the four references in docs/LOOP_HANDLING.md. No behavior change.
Verified:
- python test.py micro: all instruction microtests passed (including the three targeted_themida_r9_override_* cases)
- Themida reference sample (../testthemida/example2-virt.bin @ 0x140001000): 2544 instructions lifted, 0 warnings, 0 errors
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com> |
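A homoglyph like this one can be surfaced mechanically. The sketch below reports every non-ASCII character in an identifier together with its code point and Unicode name; `non_ascii_chars` is a hypothetical helper, not a script from the repository.

```python
import unicodedata


def non_ascii_chars(identifier):
    """List (index, code point, Unicode name) for each non-ASCII character."""
    return [
        (index, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for index, ch in enumerate(identifier)
        if ord(ch) > 0x7F
    ]


mixed = "resolveTargetedThemid\u0430R9"  # Cyrillic a, written as an escape
ascii_only = "resolveTargetedThemidaR9"

print(non_ascii_chars(mixed))
print(non_ascii_chars(ascii_only))
print(mixed == ascii_only)
```

The two strings render identically in most fonts but compare unequal, which is why the ASCII grep found nothing.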
||
|
|
c6e4c33627 |
docs: add LOOP_HANDLING.md reference for loop detection, generalization, and phi consumption (#116)
Captures the three-phase architecture (detect/generalize/consume), the path-solve context gating table, the GeneralizedLoopControlFieldState layout, mergeValue's widenFirstBackedge contract, the full set of retrieve_generalized_loop_* helpers, and the hardcoded reference-sample addresses (kThemidaControlCursorSlot, the three resolveTargetedThemidаR9 instruction addresses with fire-counts on the reference binary).
Documents known limitations at the bottom: REP SCAS, VMP 3.6 INT 2 dispatcher, the reference-sample hardcodes, unrolling/LICM, multi-way backedges.
Flags that SCOPE.md's 'loop-header generalization temporarily disabled' entry appears to be stale: the code gates generalization on path-solve context (ConditionalBranch / DirectJump / resolved IndirectJump) rather than disabling it wholesale. Not changed in this PR; maintainer decision.
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com> |