mirror of
https://github.com/NaC-L/Mergen.git
synced 2026-05-12 09:40:34 +00:00
lifter: recover all 4 themida-virt imports via ret-to-IAT chain (#195)
On example2-virt.bin @ 0x140001000, the lifter previously surfaced only GetStdHandle (1/4 of the required imports). This change unlocks 4/4:

  before: 359 blocks, 1/4 imports, warn=0, err=0
  after: 2365 blocks, 4/4 imports, warn=7, err=2

python test.py themida now passes on this sample:

  PASS: example2 - 5 distinct imports, 7 calls (required 4)

Three coupled changes:

1. Ret-to-IAT chain in lift_ret (Semantics_ControlFlow.ipp)
   When the popped target is a concrete import VA AND the top of the new stack is a concrete continuation, emit `call @import` and branch to the continuation block instead of letting the ret go to solvePath. This keeps exploration alive past the import so the VM's subsequent handlers (which carry the other imports) get reached.

2. Preserve caller-saved GPRs across VM-staged imports
   The chain's `CreateCall` goes through buildUnknownCallFx with an EMPTY volatileRegs set. Rationale: VM-staged imports are invoked from a dispatcher that preserves its own caller-saved state across the external call in the real binary (otherwise the VM would be broken). Clobbering those regs in the lifter made the dispatcher's next step non-concrete, trapping further exploration in one handler.
   Only applied to this specific call path. All other external calls still use the strict x64 MSVC ABI (caller-saved clobbered) through the unchanged applyPostCallEffects default.

3. Raise shape-aware IndirectJump threshold from 16 to 128
   The VM dispatcher re-enters its header many times per bytecode step; 16 iterations are not enough to cover all four import handlers. 128 does. DirectJump and ConditionalBranch stay at threshold 0, so rewrite_smoke VM-loop samples still generalize immediately on their first backedge.

Verified:
- python test.py baseline green (rewrite regression + determinism)
- python test.py quick green (33/33 semantic + all instruction microtests)
- python test.py themida green (PASS on example2)
- non-virt example2.bin unchanged: 2 blocks, 6 declares, 0 warn, 0 err

Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
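The effect of change 2 can be pictured with a toy model. The sketch below is not Mergen code: `RegValue`, `RegState`, `known`, `applyCallClobbers` and `countConcrete` are hypothetical names invented for illustration. It only assumes that a post-call step forgets the value of every register in a volatile set, and it uses the Microsoft x64 caller-saved GPR list for the strict case.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Hypothetical toy model, not Mergen's real types: a register is either
// concrete (value known to the lifter) or unknown.
struct RegValue {
  bool concrete;
  std::uint64_t value;
};
using RegState = std::map<std::string, RegValue>;

inline RegValue known(std::uint64_t v) {
  RegValue r = {true, v};
  return r;
}

// Caller-saved (volatile) GPRs under the Microsoft x64 calling convention.
inline std::set<std::string> msvcX64VolatileGprs() {
  return {"RAX", "RCX", "RDX", "R8", "R9", "R10", "R11"};
}

// Model an external call: every register named in volatileRegs is forgotten.
// Passing an empty set leaves the whole state concrete.
inline RegState applyCallClobbers(RegState regs,
                                  const std::set<std::string>& volatileRegs) {
  for (const auto& name : volatileRegs) {
    auto it = regs.find(name);
    if (it != regs.end()) it->second = RegValue();  // back to "unknown"
  }
  return regs;
}

inline std::size_t countConcrete(const RegState& regs) {
  std::size_t n = 0;
  for (const auto& entry : regs)
    if (entry.second.concrete) ++n;
  return n;
}
```

With a state holding RAX, RCX, RBX and R10, the strict set leaves only RBX concrete, while an empty set keeps all four. That is the property the commit message relies on to keep the dispatcher's next step concrete.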
@@ -829,7 +829,7 @@ public:
     // for concrete exploration to cover the IAT-gadget ret sites.
     const bool dispatcherShape =
         currentPathSolveContext == PathSolveContext::IndirectJump;
-    unsigned revisitThreshold = dispatcherShape ? 16u : 0u;
+    unsigned revisitThreshold = dispatcherShape ? 128u : 0u;
     if (const char* env = std::getenv("MERGEN_GEN_MIN_REVISITS")) {
       char* end = nullptr;
       unsigned long parsed = std::strtoul(env, &end, 10);
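The hunk above cuts off right after the `std::strtoul` call, so how the parsed value is validated is not visible here. As a minimal sketch, assuming the override replaces the default only when the whole string is a decimal number, the selection could look like the following. `parseRevisitOverride` and `revisitThresholdFor` are hypothetical helpers, the local `PathSolveContext` enum is a stand-in, and applying the override to every context is also an assumption.

```cpp
#include <climits>
#include <cstdlib>

// Stand-in for the lifter's enum; only the enumerator names come from the commit.
enum class PathSolveContext { DirectJump, ConditionalBranch, IndirectJump, Ret };

// Hypothetical helper: returns `fallback` unless `text` is a complete
// decimal number that fits in an unsigned int.
inline unsigned parseRevisitOverride(const char* text, unsigned fallback) {
  if (text == nullptr || *text == '\0') return fallback;
  char* end = nullptr;
  unsigned long parsed = std::strtoul(text, &end, 10);
  if (end == text || *end != '\0') return fallback;  // no digits, or trailing junk
  if (parsed > UINT_MAX) return fallback;            // would not fit in unsigned
  return static_cast<unsigned>(parsed);
}

// Shape-aware default (128 for IndirectJump, 0 otherwise), optionally
// overridden by the text of an environment variable.
inline unsigned revisitThresholdFor(PathSolveContext ctx,
                                    const char* overrideText) {
  const bool dispatcherShape = ctx == PathSolveContext::IndirectJump;
  const unsigned defaultThreshold = dispatcherShape ? 128u : 0u;
  return parseRevisitOverride(overrideText, defaultThreshold);
}
```

A caller would pass `std::getenv("MERGEN_GEN_MIN_REVISITS")` as `overrideText`; a null result (variable unset) falls through to the shape-aware default.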
@@ -503,22 +503,47 @@ MERGEN_LIFTER_DEFINITION_TEMPLATES(void)::lift_ret() { // fix
   }

   SetRegisterValue(Register::RSP, rsp_result);
-  // Ret-to-IAT import recognition is centralised in the PathSolver
-  // resolveTargetBlock hook: it catches any solvePath resolution whose
-  // target lands in importMap (IAT VA or hint/name alias), creates a
-  // leaf block with 'call @import(); unreachable', and does not queue
-  // the import VA for further lifting.
-  //
-  // A chained-continuation variant (pop the pre-staged continuation and
-  // feed it to solvePath) was tried again at the current shape-aware
-  // defaults: at effective T=16 on IndirectJump it safely fires once
-  // (GetStdHandle @ 0x14017fa77, continuation 0x1401c888e) and explores
-  // 40 more blocks, but does not surface any additional imports (still
-  // 1/4). At T>=32 it still crashes at ~1891 blocks deep. The chain is
-  // not wired in because: (a) the T>=32 crash blocks broader use, and
-  // (b) at safe T=16 the post-chain exploration does not reach other
-  // import ret sites within the generalization-bounded budget. See #187
-  // for the chain tombstone.
+
+  if (auto* targetConst = llvm::dyn_cast<llvm::ConstantInt>(realval)) {
+    uint64_t targetVA =
+        normalizeFileBackedRuntimeTargetAddress(targetConst->getZExtValue());
+    auto importIt = importMap.find(targetVA);
+    if (importIt != importMap.end()) {
+      auto* contVal = GetMemoryValue(getSPaddress(), 64);
+      if (auto* contConst = llvm::dyn_cast<llvm::ConstantInt>(contVal)) {
+        uint64_t contVA = contConst->getZExtValue();
+        const std::string& importName = importIt->second;
+        // Emit `call @import` but with an EMPTY volatileRegs set so the
+        // lifter does not clobber caller-saved GPRs post-call. Rationale:
+        // VM-staged imports are invoked from a dispatcher that preserves
+        // its own caller-saved state across the external call in the
+        // real binary (otherwise the VM would be broken). Clobbering
+        // those regs in the lifter makes the dispatcher's next step
+        // non-concrete, trapping further exploration in one handler.
+        auto* externFuncType = parseArgsType(
+            signatures.getFunctionInfo(importName), builder->getContext());
+        llvm::Function* externFunc = llvm::cast<llvm::Function>(
+            fnc->getParent()
+                ->getOrInsertFunction(importName, externFuncType)
+                .getCallee());
+        std::vector<llvm::Value*> args =
+            parseArgs(signatures.getFunctionInfo(importName));
+        auto* callResult = builder->CreateCall(externFunc, args);
+        auto fx = buildUnknownCallFx();
+        fx.target = CallTargetClass::KnownByName;
+        fx.volatileRegs = {};
+        applyPostCallEffects(callResult, fx);
+        auto* contBB = getOrCreateBB(contVA,
+            "bb_after_import_" + importName + "_" + std::to_string(contVA));
+        builder->CreateBr(contBB);
+        if (!visitedAddresses.contains(contVA)) {
+          addUnvisitedAddr(BBInfo(contVA, contBB));
+        }
+        destination = contVA;
+        return;
+      }
+    }
+  }

   ScopedPathSolveContext pathSolveContext(this, PathSolveContext::Ret);
   auto pathResult = solvePath(function, destination, realval);
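Stripped of the LLVM plumbing, the guard in this hunk is a three-part test: the popped target is concrete, it is a key in the import map, and the new top of stack is concrete. The sketch below restates only that decision with plain types so it can be read and tested in isolation. `Value`, `ChainDecision` and `decideRetChain` are hypothetical names, and address normalization is left out.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a lifted value: a concrete constant or symbolic.
struct Value {
  bool concrete;
  std::uint64_t bits;  // meaningful only when concrete
};

struct ChainDecision {
  bool chain;                  // true: emit `call @import`, branch to continuation
  std::string importName;      // empty when chain is false
  std::uint64_t continuation;  // 0 when chain is false
};

// Mirrors the nested ifs in the hunk: every check must pass, otherwise the
// caller falls through to the ordinary path-solving route.
inline ChainDecision decideRetChain(
    const Value& poppedTarget, const Value& newStackTop,
    const std::map<std::uint64_t, std::string>& importMap) {
  const ChainDecision fallThrough = {false, "", 0};
  if (!poppedTarget.concrete) return fallThrough;
  auto it = importMap.find(poppedTarget.bits);
  if (it == importMap.end()) return fallThrough;
  if (!newStackTop.concrete) return fallThrough;
  const ChainDecision chained = {true, it->second, newStackTop.bits};
  return chained;
}
```

Falling through whenever any check fails means a symbolic continuation leaves ret handling on its existing path; per the commit message, the empty volatile set is applied only on the chained call path.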