Mergen

mirror of https://github.com/NaC-L/Mergen.git synced 2026-05-12 09:40:34 +00:00

Author	SHA1	Message	Date
naci	605a36e8ed	lifter: correctness fixes, refactors, and regression tests (#205)
* lifter: restore indirect-jump threshold to 128
* gitignore: glob output_*.ll instead of enumerating dumps
Replace output_finalnoopt.ll / output_no_opts.ll entries with output_*.ll so ad-hoc lifter dumps (output_rets.ll, output_newpath.ll, etc.) stop showing up in git status.
* lifter: factor REAL_return path through emitResolvedFunctionReturn
Pull the rax-zext + CreateRet + run/finished bookkeeping out of the REAL_return branch in lift_ret() into a local lambda so future ret exit points can reuse it without duplicating four lines of boilerplate.
Drop the dead returnStruct/myStruct scaffolding and the originalFunc_finalnopt local: every InsertValue call site has been commented out for a long time and the locals had no remaining uses. The active code emits a plain rax return. No behavior change.
* lifter: advance RSP past continuation slot in ret-to-IAT chain
In the chained import-return pattern (`ret` to IAT slot, IAT slot holds an external function address, the function returns and control resumes at the next stack slot's continuation address), the lifter collapses the two pops into a single `call @import; br contBB`. RSP was only advanced past the IAT slot itself, so post-call register state still claimed RSP pointed at the continuation address. Any downstream stack read from RSP saw stale data and any solver that constant-folded RSP picked up a value that no longer matched the post-chain physical layout.
Bump RSP by another `ptrSize` immediately before lowering the import call so the continuation block inherits the same RSP it would have under a faithful two-pop lowering.
* lifter/test: regression test for ret-to-IAT chain RSP advancement
Locks in dd95fe7.
The microtest stands up a LifterUnderTest, plants [importVA, contVA] on the stack at an RSP that is intentionally NOT equal to STACKP_VALUE (so the lift_ret REAL_return short-circuit does not fire), registers the import in the lifter's importMap, and lifts a single `ret` (0xC3). It then asserts that:
- the chain handler emitted a direct call to the registered import
- RSP after the chain equals entry RSP + 16, not + 8
Without the fix the test fails with RSP = entry + 8 (only the IAT slot pop is modeled), exactly the off-by-8 the fix closes.
Verified the test catches the regression by reverting dd95fe7 locally before re-applying -- the failing message reads "RSP after chain = 0x14FDA8; expected 0x14fdb0".
* scripts/themida: filter lifter-synthesized helpers from import diff
Calls to lifter-emitted helpers (`@exception`, `@fastfail`, `@not_implemented`, etc.) surfaced as 'extra import (not required)' lines on every Themida equivalence run. They are not user imports; they are lowered from INT1/INT3/UD2/INT29/SYSCALL/segment-load sites in the lifter's own semantics files.
Skip them in `_extract_call_names` so the equivalence diff shows only real imports. The list of helpers lives next to the call regex so it stays adjacent to the code that emits them; if a new helper shows up in the IR (e.g. another illegal-instruction lowering) the script will surface it as an 'extra import' until the entry is added here, which is the right tripwire.
Before: example2 -- 6 distinct imports, 10 calls (3 noise calls)
After: example2 -- 4 distinct imports, 7 calls (clean)
* lifter/analysis: replace 'TODO: fix?' marker with positive explanation
The 2-value path-solving fork's swap branch had a 'TODO: fix?' comment from the original draft. Traced both branches and confirmed the swap is correct:
- When the select's trueValue equals firstcase, condition is the select's condition as-is and firstcase→bb_true wires correctly.
- When trueValue equals secondcase, condition still expresses 'true picks trueValue' but downstream code uses firstcase→bb_true. Swapping firstcase↔secondcase makes firstcase refer to the trueVal constant so the existing CreateCondBr wiring stays correct without a parallel reversed-branch path.
Replaced the TODO with a comment that explains why the swap is necessary, so future readers do not waste time investigating a branch that is intentional.
* lifter: accept Register64/Memory64 source for punpcklqdq
Iced classifies operand types by the bytes the instruction actually accesses, not by physical register width. PUNPCKLQDQ only reads the low 64 bits of its second operand, so Iced reports Register64 (or Memory64 for the m128 form) for a source whose physical encoding is `xmm/m128`. The lift handler's accept check rejected anything other than Register128/Memory128 and fell through to the not_implemented exit, so every `punpcklqdq xmm, xmm/m128` site lowered to a bogus `call @not_implemented; ret` instead of the unpack semantic.
Widen the accept set to Register64 and Memory64 too. The body already truncates the source to i64 before OR'ing it into the high half of the result, so a 64-bit-typed source is semantically identical to a 128-bit one for this handler.
Fixes the two pre-existing oracle test failures `punpcklqdq_xmm0_xmm1_basic` and `punpcklqdq_xmm0_xmm1_zero_upper_from_zero_source`. `python test.py all` stays at 244/244, confirming no semantic regressions.
* lifter: replace lift_jmp's fallthrough switch with an isDirectJump if
The RIP-relative add for direct jumps lived inside a 4-case switch whose body intentionally fell through into `default: break;`. It worked, but:
- Implicit fallthrough is a -Wimplicit-fallthrough hazard. Today the default does nothing; tomorrow someone adds a body and every direct jump silently runs it.
- The switch's discriminator is exactly `isDirectJump`, which is already computed two lines above for the path-solver context. The switch was a parallel restatement of the same predicate.
Collapse the switch into `if (isDirectJump) { trunc = add(trunc, ripval); }` so the predicate has one definition and there is no fallthrough to misuse.
Behavior unchanged: the same immediate cases still get the RIP-relative bump, indirect jumps still skip it, and `python test.py all` stays at 244/244.
* lifter/test: regression test for SSE memory-form handler dispatch
Lock in that pand/por/pxor accept the `xmm, [mem]` encoding form. The test lifts `66 0F DB 00`, `66 0F EB 00`, and `66 0F EF 00` (one `xmm0, [rax]` site each) and asserts that the lifted function does not contain a direct call to @not_implemented. Pure structural acceptance: not validating bitwise-AND/OR/XOR semantics, only that the handler dispatched at all.
Iced today reports Memory128 for these encodings so the test passes against the existing `Register128 || Memory128` accept sets. If a future Iced update reclassifies the source operand by bytes-actually-accessed (the way it already does for punpcklqdq, where it reports Register64/Memory64 even for an `xmm/m128` encoding) the handler would silently fall through to `call @not_implemented; ret` and miscompile every memory-form site -- this test trips first.
* lifter: drop duplicate stdout print on unresolved indirect jmp
`lift_jmp` printed every UnresolvedIndirectJump twice: once as a raw `std::cout << "[diag] lift_jmp: ..."` and once through `diagnostics.warning(...)` on the very next line. The diagnostics framework already persists the warning to `output_diagnostics.json` at lift completion, and no script or test grep'd the stdout form.
Drop the std::cout. The diagnostic remains in the recorded diagnostics list, surfaceable via the JSON dump or the in-memory entries vector.
This removes the only unguarded raw `[diag]` print in the lift path -- the rest are gated on `liftProgressDiagEnabled` or specific hot addresses for active debugging.
* scripts/themida: fix docstring escape leak in import-filter doc
Audit of #205 caught a literal `\u2014` and unnecessary `\"` escapes in the `_extract_call_names` docstring -- leftovers from how the surrounding commit (#205, scripts/themida: filter lifter-synthesized helpers) was authored.
Replace the literal escape with a plain `--` and drop the redundant backslash-quotes; the docstring now renders cleanly at `help(_extract_call_names)` and looks normal in the source.
Behavior unchanged: `python test.py themida` still passes with the same import-diff filter (4 imports, 7 calls for example2).
---------
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-05-02 11:58:47 +03:00
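The import-diff filter described in commit 605a36e8ed can be illustrated with a short sketch. This is not the script's actual code: the function name `extract_call_names`, the call regex, and the `LIFTER_HELPERS` set are assumptions made for illustration. Only the three helper names come from the commit message, which says the real list is longer ("etc.").

```python
import re

# Helper names quoted in the commit message; the real list may contain more.
LIFTER_HELPERS = {"exception", "fastfail", "not_implemented"}

# Assumed shape of a direct call in textual LLVM IR: `call <type> @name(`.
CALL_RE = re.compile(r"\bcall\b[^@\n]*@([\w.$]+)\s*\(")


def extract_call_names(ir_text: str) -> list[str]:
    """Return directly called function names, minus lifter-emitted helpers."""
    names = CALL_RE.findall(ir_text)
    return [name for name in names if name not in LIFTER_HELPERS]


# Hand-written IR fragment, not output from a real lift.
sample_ir = """
  %1 = call i64 @GetStdHandle(i64 -11)
  call void @not_implemented()
  %2 = call i64 @WriteConsoleA(i64 %1, i64 0, i64 0, i64 0, i64 0)
  call void @exception()
"""
calls = extract_call_names(sample_ir)
print(sorted(set(calls)), len(calls))
```

Keeping the helper set next to the regex mirrors the tripwire the commit describes: a helper missing from the set shows up in the diff as an extra import until someone adds it.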
naci	c8102a69cf	themida: correctness gate, diagnostic tracer, ret-to-IAT recognition, gen revisit knob (#182)
* tests: add Themida devirtualization import-equivalence check
Adds python test.py themida that lifts every sample in scripts/rewrite/themida_samples.json and asserts the resulting IR calls every import declared in required_imports. Names are pinned against a lift of the non-virtualized reference binary via --update.
This is a correctness gate that complements the existing coverage gate ('2544 instructions, 0 errors'). Currently red on example2-virt.bin: the lifter unrolls the VM without surfacing GetStdHandle / WriteConsoleA / ReadConsoleA / CharUpperA from the guest program. That gap is the active devirtualization frontier; this test makes it visible instead of silently green.
Samples whose binaries are absent (`../testthemida/*.bin` lives outside the repo) are skipped rather than failed, so the check runs cleanly in CI without the binaries present.
* diag: add Unicorn-based external-call tracer; document Themida transform
Adds scripts/dev/trace_external_calls.py: loads a PE into Unicorn, patches every IAT slot with a unique unmapped-address sentinel, then emulates from the chosen entry. When any call/jmp/ret resolves its target to a sentinel, logs the call-site address, the mnemonic, and the addressing form. One-shot diagnostic for answering 'what x86 instruction issues this external call at runtime.'
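The sentinel bookkeeping behind the tracer can be sketched without an emulator. The snippet below is a simplified model, not the contents of trace_external_calls.py: the sentinel base address, the stride, the function names, and the slot addresses are all hypothetical, and the step that writes each sentinel into the emulated IAT is deliberately left out.

```python
SENTINEL_BASE = 0xDEAD_0000_0000  # hypothetical unmapped region
SENTINEL_STRIDE = 0x1000          # hypothetical spacing between sentinels


def assign_sentinels(iat_slots: dict[int, str]) -> dict[int, str]:
    """Give every import a unique sentinel address.

    iat_slots maps IAT slot VA -> import name. A real tracer would also
    write each sentinel into its slot; only the lookup table is built here.
    """
    sentinels = {}
    for index, slot_va in enumerate(sorted(iat_slots)):
        sentinels[SENTINEL_BASE + index * SENTINEL_STRIDE] = iat_slots[slot_va]
    return sentinels


def classify_target(target: int, sentinels: dict[int, str]):
    """Return the import name if a branch target is a sentinel, else None."""
    return sentinels.get(target)


# Illustrative slot addresses, not taken from a real binary.
iat = {0x2000: "GetStdHandle", 0x2008: "WriteConsoleA"}
sentinels = assign_sentinels(iat)
hit = classify_target(SENTINEL_BASE + SENTINEL_STRIDE, sentinels)
print(hit)
```

Because each sentinel is unique, a branch that lands on one identifies which import was reached, regardless of whether the instruction was a call, a jmp, or a ret.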
Using it on example2-virt.bin shows the Themida transform precisely:
- guest imports (GetStdHandle etc) remain in the IAT
- every guest call site is rewritten from 'call [rip+IAT]' to a VM-staged 'push target; ret' where target was loaded from the IAT upstream
- for example2, the first external call happens at VA 0x14017fa77 via 'ret 0', popping the GetStdHandle IAT value off the stack
- Themida strips its own SDK markers (VirtualizerSDK64.dll#103/#503) from the IAT; our ignore_imports filter already accounts for this
The lifter's current recognition handles direct call-through-IAT and register-indirect IAT calls (the non-virt binary resolves 5 imports cleanly). It does not recognize the ret-pops-IAT-loaded-pointer pattern, which is why the virt lift surfaces zero imports.
Also annotates themida_samples.json with these properties inline so the transform semantics live next to the test that exercises them.
* diag: trace_external_calls can dump visited PCs and record sentinel push chain
Two additions, both motivated by the example2-virt.bin diagnosis session:
- --dump-visited <path>: writes every unique instruction PC the emulator executes, in first-visit order. Diff against the lifter's 'reached addresses' trace (MERGEN_DIAG_LIFT_PROGRESS=1) to localise where the lifter's static exploration diverges from the dynamic path.
- UC_HOOK_MEM_WRITE for stack-addressed 8-byte writes whose payload is a sentinel. Records every such write, not just the first, because Themida uses push-pop swap gadgets that stage a sentinel on the stack transiently before the 'real' push lands it at the ret-target slot. The last-5-pushes summary exposes this.
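The comparison that --dump-visited enables can be sketched as below. This is a minimal illustration under assumptions: the function name `compare_traces` and the report keys are invented here, the on-disk formats of both traces are not modeled, and the sample addresses are toy data rather than values from the example2-virt.bin run.

```python
def compare_traces(emu_pcs: list[int], lifter_addrs: set[int]) -> dict:
    """Compare a first-visit-order emulator trace against lifter coverage."""
    # Deduplicate while preserving first-visit order.
    seen = set()
    ordered = []
    for pc in emu_pcs:
        if pc not in seen:
            seen.add(pc)
            ordered.append(pc)

    # Position of the first emulator PC that the lifter never reached.
    first_gap = next(
        (i for i, pc in enumerate(ordered) if pc not in lifter_addrs),
        len(ordered),
    )
    return {
        "unique_emu_pcs": len(ordered),
        "covered_prefix": first_gap,
        "missed": [pc for pc in ordered if pc not in lifter_addrs],
        "lifter_only": sorted(lifter_addrs - seen),
    }


# Toy data for illustration only.
emu = [0x1000, 0x1004, 0x1008, 0x1004, 0x1010, 0x1020]
lifted = {0x1000, 0x1004, 0x1008, 0x1100}
report = compare_traces(emu, lifted)
print(report)
```

The `covered_prefix` and `lifter_only` fields correspond to the two kinds of divergence the log reports: where static exploration stops following the dynamic path, and which addresses the lifter visits that the runtime never takes.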
Findings for example2-virt.bin @ 0x140001000:
- lifter covers emu_pos=0..1298 out of 4210 unique PCs (~30%)
- external call site is at emu_pos=4209; gap of 2911 unvisited PCs
- lifter visits 5 addresses the runtime never takes (wrong concolic branch)
- the 'final push to ret slot' is not a 'push [iat]' but rather 'sub qword ptr [r14], <const>' -- the VM decrypts a pre-staged stack slot in place to reconstruct the IAT pointer. Pattern-match recognition alone cannot handle this; concrete VM-dispatch unrolling is required.
* diag: add MERGEN_NO_LOOP_GEN env gate for loop-generalization
Adds an env-var toggle at the top of canGeneralizeStructuredLoopHeader. When MERGEN_NO_LOOP_GEN=1, the gate rejects every header, forcing pure concrete exploration with no phi-widening abstraction.
Diagnostic knob, not a user-facing feature. Used to localise how much of a lift's coverage depends on generalization vs. the concolic engine.
Measurement on example2-virt.bin @ 0x140001000:
                              gen ON    gen OFF (NO_LOOP_GEN=1)
blocks_attempted              56        2642   (47x)
instructions_lifted           2544      34229  (13.5x)
output_no_opts.ll lines       6022      30481  (5x)
unique addrs visited          34        338    (10x)
addrs in 0x14017xxxx          0         103    (call-handler cluster)
external call site reached:   no        yes    (via BB 0x14017fa72)
themida equivalence test:     red       red    (recognition still gap)
Loop-generalization is the dominant reachability blocker on Themida VM dispatchers at current tuning. Pure concrete exploration reaches the external-call handler block but does not emit named import calls because lift_ret has no path to match a resolved ret target against importMap. Recognition is the next fix surface; the reachability gap is mostly a matter of generalization tuning.
Side-effects of gen OFF that are NOT acceptable in production:
- Lifter decodes .rdata IAT bytes as instructions (OUTSD error at 0x140002688 on this sample)
- Top-revisited addresses hit ~1142x each: the lifter spins in tight loops without generalization cutting them off; block budget (4096) would fire eventually on a larger sample
So the knob is purely diagnostic. The real production fix is selective generalization (distinguish 'VM dispatcher' from 'guest loop') plus lift_ret import recognition.
* lifter: recognize ret-to-IAT as named external call in lift_ret
Adds a recognition path in lift_ret: if the value being popped resolves to a concrete address that's in importMap, emit callFunctionIR for the named import, then simulate the external's own ret by popping one more qword (the continuation address pre-staged by the caller). solvePath then continues at the continuation instead of trying to lift the IAT pointer as code.
Two resolution routes:
1. realval is a ConstantInt (direct push+ret of an IAT load)
2. realval is symbolic but computePossibleValues folds to a single concrete value (obfuscated chains that constant-fold at this path)
Scope limits:
- Non-virt example2.bin lift is unchanged (still resolves 5 imports via register-indirect path; the new ret path does not fire because the binary uses 'call [iat]', not 'push+ret').
- Virt example2-virt.bin lift: the recognition code runs but does not surface imports because the lifter's static resolution of the arithmetic-decrypt chain produces wrong concrete targets. E.g. the ret at 0x14017fa77 resolves to 0x140002628 (somewhere in .rdata) via computePossibleValues; at runtime the emulator sees it pop the GetStdHandle IAT pointer (0x140002490). The recognition logic is correct; the upstream data flow is lying. Fixing that requires selective-generalization tuning or concrete VM unrolling, tracked separately.
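The two-pop behaviour of the ret-to-IAT path can be modeled in a few lines. This is a toy sketch, not the lifter's code: `lift_ret` here is a plain Python function over a dictionary stack, the emitted "IR" is a string, and the import and continuation addresses are hypothetical. The entry RSP of 0x14FDA0 is inferred from the regression-test message quoted in commit 605a36e8ed (expected 0x14fdb0, which is entry + 16).

```python
PTR_SIZE = 8


def lift_ret(rsp: int, stack: dict[int, int], import_map: dict[int, str]):
    """Toy model of a `ret` whose popped target may be an import address.

    Returns (emitted_ops, next_pc, new_rsp). `stack` maps address -> qword.
    """
    target = stack[rsp]
    rsp += PTR_SIZE  # the ret itself pops the target
    if target not in import_map:
        return [], target, rsp  # ordinary ret: continue at the target

    # The external function's own ret pops the continuation address,
    # so the model pops one more qword before continuing.
    continuation = stack[rsp]
    rsp += PTR_SIZE
    return [f"call @{import_map[target]}"], continuation, rsp


entry_rsp = 0x14FDA0
import_va, cont_va = 0x7FF800001000, 0x140001234  # hypothetical addresses
stack = {entry_rsp: import_va, entry_rsp + PTR_SIZE: cont_va}
ops, next_pc, rsp_after = lift_ret(entry_rsp, stack, {import_va: "GetStdHandle"})
print(ops, hex(next_pc), hex(rsp_after))
```

Advancing RSP only once in the import branch would reproduce the off-by-8 state that the later regression test guards against.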
So β lands as ground work for simpler push+ret thunks and for future work where state-propagation fidelity improves. It is not a Themida fix on its own.
* lifter: gate canGeneralize on per-header revisit count
Adds a revisit-count threshold to canGeneralizeStructuredLoopHeader: below threshold N the gate rejects (concrete exploration continues); at or above N it falls through to the existing loop-shape checks. Tunable via MERGEN_GEN_MIN_REVISITS; default is 0 (inert, matches pre-existing behaviour).
Also promotes ++liftAttemptCounts[addr] out from under the liftProgressDiagEnabled gate so the counter is always maintained.
Rationale: on Themida example2-virt.bin @ 0x140001000, the existing gate (always-generalize on first qualifying revisit) abstracts the VM's dispatch loop too early, cutting reachability to ~30% of the dynamic execution path. A higher threshold lets the dispatcher run concretely for more iterations before abstracting.
Measurement (all other settings at defaults):
T=0 (current)                blocps=   56  insns=  2544  err=0  warn=0
T=4                          blocks=   88  insns=  3842  err=0  warn=4
T=16                         blocks=  393  insns= 11747  err=1  warn=0
T=32                         blocks=  425  insns= 12067  err=1  warn=0
T=128                        blocks=  617  insns= 13987  err=1  warn=0
MERGEN_NO_LOOP_GEN=1 (kill)  blocks= 2642  insns= 34229  err=1  warn=0
Caveat: at T=6, T=8, T=12 the lifter crashes with an access violation partway through lifting. The crash fires in the Themida dispatcher state machinery around 0x1400237F9 when generalization fires mid-iteration with state that the existing machinery is not prepared to handle. Other nearby T values (T=5, 7, 9, 10, 11, 13-19) are stable.
So the knob is landing as experimental infrastructure with default=0 (no-op). Future work can pair a safe non-zero default with a fix for the dispatcher-state crash.
---------
Co-authored-by: Claude <claude@anthropic.com>	2026-04-24 14:54:22 +03:00
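The revisit-count gate can be pictured with a small model. The sketch below is an illustration under assumptions, not the C++ implementation: the class and method names are invented, the existing loop-shape checks are reduced to a boolean argument, and "revisits" is assumed to mean the per-address lift-attempt count. Only the environment variable name and its default of 0 come from the commit message.

```python
import os
from collections import Counter


class GeneralizationGate:
    """Toy model of a revisit-count threshold in front of loop-shape checks."""

    def __init__(self, min_revisits: int = 0):
        self.min_revisits = min_revisits
        self.lift_attempts = Counter()

    def record_attempt(self, header_addr: int) -> None:
        # Counted unconditionally, mirroring the always-maintained counter.
        self.lift_attempts[header_addr] += 1

    def can_generalize(self, header_addr: int, loop_shape_ok: bool) -> bool:
        if self.lift_attempts[header_addr] < self.min_revisits:
            return False  # below threshold: keep exploring concretely
        return loop_shape_ok  # at or above: defer to the existing checks


def gate_from_env() -> GeneralizationGate:
    # Variable name from the commit message; default 0 keeps the gate inert.
    return GeneralizationGate(int(os.environ.get("MERGEN_GEN_MIN_REVISITS", "0")))


gate = GeneralizationGate(min_revisits=4)
header = 0x140001000
decisions = []
for _ in range(5):
    gate.record_attempt(header)
    decisions.append(gate.can_generalize(header, loop_shape_ok=True))
print(decisions)
```

With a threshold of 0 the first branch never rejects, which matches the commit's description of the default as a no-op.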
naci	3384786a70	lifter: support multi-way backedges with N-way generalized-loop phi construction (#123)
branch_backup(bb, /*generalized=*/true) previously overwrote a single backup_point per header in generalizedLoopBackedgeBackup[bb]. A loop header reached from three or more backedges silently lost every snapshot except the most recent, and the load_generalized_backup phi was always 2-incoming (canonical + last-seen backedge). PR #121 pinned this as a KNOWN-LIMITATION microtest.
This commit widens the machinery end-to-end to 1 canonical + N backedges.
Storage and state:
- generalizedLoopBackedgeBackup is now DenseMap<BB*, SmallVector<backup_point, 2>>. branch_backup_impl appends, deduplicated by sourceBlock (repeat call from the same source replaces its entry in place).
- GeneralizedLoopControlFieldState.backedgeSource/Control/Buffer become parallel SmallVectors sized N per header.
Phi construction:
- make_generalized_loop_backup takes ArrayRef<backup_point> sources. Its mergeValue lambda constructs (1 + N)-incoming phis, one incoming per distinct backedge sourceBlock, with canonicalSource first. Sources duplicating canonicalSource are filtered. The N=1 path produces the same 2-incoming phi as before (determinism gate: 42/42 golden hashes match).
- retrieve_generalized_loop_control_slot_value_impl, retrieve_generalized_loop_target_slot_value_impl, and retrieve_generalized_loop_control_field_value_impl each emit (1 + N)-incoming phis from state.backedgeSources/Controls/Buffers.
- retrieve_generalized_loop_phi_address_value_impl and retrieve_generalized_loop_local_phi_address_value_impl relax their 'phi->getNumIncomingValues() != 2' sanity check to accept any phi with >= 2 incomings, and match each incoming against canonicalSource or any of state->backedgeSources[i].
load_generalized_backup_impl:
- Collects backedges whose sourceBlock differs from canonical AND whose controlCursor value differs from canonical; activates state only if at least one such backedge exists.
- seedInvariantLocalQwords requires the qword to read identically from canonicalBuffer AND every backedgeBuffer to qualify.
record_generalized_loop_backedge_impl:
- The rolled-control promotion (move current backedge into canonical, install new source as backedge) is only well-defined for the 1-backedge case, so it now guards on backedgeSources.size() == 1 and becomes a no-op for multi-way. Extending the rolled-control semantics to multi-way loops is left as follow-up when a real sample exercises it.
Tests (Tester.hpp):
- runGeneralizedLoopThirdBackedgeOverwritesPriorBackedgeSilently flipped and renamed to runGeneralizedLoopThirdBackedgePreservesAllThreeSnapshots: asserts three-backedge vector holds one entry per sourceBlock.
- runGeneralizedLoopLoadBackupWithThreeBackedgesProducesTwoWayPhiOnly flipped and renamed to runGeneralizedLoopLoadBackupWithThreeBackedgesProducesFourWayPhi: asserts GetMemoryValue(controlSlot) at the header yields a 4-incoming phi carrying canonical + all three backedge control values.
Docs (docs/LOOP_HANDLING.md):
- Struct and mergeValue snippets updated to N-way shapes.
- branch_backup state-transition row describes append+dedup.
- Multi-way backedge row removed from Known limitations.
Verification:
- python test.py micro: all pass, including the two flipped tests.
- python test.py baseline: all rewrite regression checks passed, determinism check passed (42 golden files match - 2-way loop IR shape unchanged).
- Themida reference sample (../testthemida/example2-virt.bin @ 0x140001000): 2544 instructions lifted, 0 warnings, 0 errors.
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-23 01:53:32 +03:00
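The append-with-dedup storage and the (1 + N)-incoming phi shape described in commit 3384786a70 can be modeled compactly. This is a toy sketch in Python rather than the LLVM-based C++: the class and method names are invented for illustration, block names and values are plain strings, and a phi is represented as a list of (source block, value) pairs.

```python
from collections import defaultdict


class BackedgeBackups:
    """Toy model: per-header backedge snapshots, deduplicated by source block."""

    def __init__(self):
        self.by_header = defaultdict(list)  # header -> [(source_block, value)]

    def branch_backup(self, header: str, source_block: str, value: str) -> None:
        entries = self.by_header[header]
        for i, (src, _) in enumerate(entries):
            if src == source_block:
                entries[i] = (source_block, value)  # repeat source: replace in place
                return
        entries.append((source_block, value))

    def phi_incomings(self, header: str, canonical: tuple) -> list:
        """Build (1 + N) incomings: canonical first, then distinct backedges."""
        canonical_src, _ = canonical
        incomings = [canonical]
        # Sources that duplicate the canonical source are filtered out.
        incomings += [e for e in self.by_header[header] if e[0] != canonical_src]
        return incomings


backups = BackedgeBackups()
backups.branch_backup("header", "bb1", "v1")
backups.branch_backup("header", "bb2", "v2")
backups.branch_backup("header", "bb3", "v3")
backups.branch_backup("header", "bb2", "v2b")  # same source again
phi = backups.phi_incomings("header", ("entry", "v0"))
print(phi)
```

With a single backedge the same construction yields a 2-incoming list, which is the shape the commit says must stay unchanged for existing golden outputs.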
naci	d0a9d7fc9d	lifter: remove resolveTargetedThemidaR9 - obsoleted by generalized-loop phi infrastructure (#120)
resolveTargetedThemidaR9 was added to recover the controlCursor identity of R9 at three hardcoded Themida instruction addresses where the symbolic pipeline had lost provenance. PR #112 (generalized-loop control-field / slot phi infrastructure) since landed retrieve_generalized_loop_control_* helpers that produce the correct phi shape through the normal GetMemoryValue path. The R9 override is now dead code: it overwrites a correct value with another correct value at three sites that the upstream pipeline already handles.
Empirical bisect on the reference Themida sample (../testthemida/example2-virt.bin @ 0x140001000) confirmed:
- site 0x140023671 disabled alone: 2544 lifted, 0 warn, 0 err
- site 0x14002368D disabled alone: 2544 lifted, 0 warn, 0 err
- site 0x140023741 disabled alone: 2544 lifted, 0 warn, 0 err
- all three disabled simultaneously: 2544 lifted, 0 warn, 0 err
- baseline (override active): 2544 lifted, 0 warn, 0 err
The MERGEN_DIAG_LIFT_PROGRESS=1 trace at site 0x14002368D shows R9 is already `add i64 %generalized_phi_load, 10` before the override fires - the generalized-loop machinery produced the correct phi independently.
Removed:
- resolveTargetedThemidaR9() in lifter/core/LifterClass_Concolic.hpp
- R9 special-case branch + session-scaffolding diag block in GetRegisterValue_impl (now just `return get_impl(key)`)
- Three microtests in lifter/test/Tester.hpp: runTargetedThemidaR9OverrideProducesPhi runTargetedThemidaR9OverrideDoesNotFireAtAdjacentAddress runTargetedThemidaR9OverrideFallsThroughWithoutLoopState
- Their three runCustom() registrations
- Override row in helper table, hardcoded-address subsection, and limitations row in docs/LOOP_HANDLING.md
Retained: kThemidaControlCursorSlot, kThemidaLoopCarriedSlot, and kSupportedGeneralizedControlFieldOffsets - still consumed by the generalized-loop control-field/slot retrieve_* helpers.
Verified:
- python test.py micro: all instruction microtests passed
- python test.py baseline: all rewrite regression checks passed, determinism check passed (42 golden files match)
- Themida sample: 2544 instructions lifted, 0 warnings, 0 errors
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-23 00:37:04 +03:00
naci	8d101dcc5a	lifter: fix Cyrillic homoglyph in resolveTargetedThemidaR9 identifier (#119)
The identifier 'resolveTargetedThemid\u0430R9' (declared in LifterClass_Concolic.hpp) contained U+0430 (Cyrillic small letter a) instead of U+0061 (Latin a) between 'Themid' and 'R9'. Every in-tree reference mirrored the Cyrillic form, but prose mentions and merge titles (e.g. PR #115 title) used ASCII, so an ASCII grep for 'resolveTargetedThemidaR9' returned zero hits. This was a silent discoverability hazard for future sessions and grep-based tooling.
Rename to pure ASCII across the single declaration, the single caller in getLatestValueForKey, the six test entry points in lifter/test/Tester.hpp, and the four references in docs/LOOP_HANDLING.md. No behavior change.
Verified:
- python test.py micro: all instruction microtests passed (including the three targeted_themida_r9_override_* cases)
- Themida reference sample (../testthemida/example2-virt.bin @ 0x140001000): 2544 instructions lifted, 0 warnings, 0 errors
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-23 00:06:55 +03:00
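A homoglyph like the one fixed in commit 8d101dcc5a can be caught with a simple non-ASCII scan. The sketch below is one possible check, not tooling from the repository: the function name `find_non_ascii` is hypothetical, and flagging every non-ASCII character is an assumption that only suits sources expected to be pure ASCII.

```python
import unicodedata


def find_non_ascii(text: str):
    """Yield (line_no, column, char, unicode_name) for each non-ASCII char."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0x7F:
                yield line_no, col, ch, unicodedata.name(ch, "UNKNOWN")


# First declaration contains U+0430 (Cyrillic), second is pure ASCII.
source = "void resolveTargetedThemid\u0430R9();\nvoid resolveTargetedThemidaR9();\n"
hits = list(find_non_ascii(source))
for line_no, col, ch, name in hits:
    print(f"line {line_no}, col {col}: U+{ord(ch):04X} {name}")
```

Reporting the Unicode name alongside the position makes the problem visible even when the two glyphs render identically in an editor or terminal.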
naci	c6e4c33627	docs: add LOOP_HANDLING.md reference for loop detection, generalization, and phi consumption (#116)
Captures the three-phase architecture (detect/generalize/consume), the path-solve context gating table, the GeneralizedLoopControlFieldState layout, mergeValue's widenFirstBackedge contract, the full set of retrieve_generalized_loop_* helpers, and the hardcoded reference-sample addresses (kThemidaControlCursorSlot, the three resolveTargetedThemidаR9 instruction addresses with fire-counts on the reference binary).
Documents known limitations at the bottom: REP SCAS, VMP 3.6 INT 2 dispatcher, the reference-sample hardcodes, unrolling/LICM, multi-way backedges.
Flags that SCOPE.md's 'loop-header generalization temporarily disabled' entry appears to be stale: the code gates generalization on path-solve context (ConditionalBranch / DirectJump / resolved IndirectJump) rather than disabling it wholesale. Not changed in this PR; maintainer decision.
Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>	2026-04-22 23:05:24 +03:00

6 Commits