lifter: degrade PATH_unsolved ret to REAL_return to fix unterminated blocks (#199)

When lift_ret classifies a ret as ROP_return (because rspvalue is a
ConstantInt != STACKP_VALUE) and solvePath subsequently returns
PATH_unsolved, the previous code emitted a warning and left the block
unterminated. The outer per-instruction lift loop in
liftBasicBlockFromAddress would then advance current_address past the
ret and lift the next byte, producing a second terminator and
malformed IR (same shape as #196's earlier bug, exposed on a different
code path).

The most accurate semantic for an unresolvable ret is 'returns to a
caller we do not have context for' - degrade to REAL_return: emit
'ret rax' and stop the block. The warning still fires (suppressed for
chained PCs per #198) so the unresolvable signal is preserved as a
diagnostic, but the IR stays well-formed.
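
A toy model of the control flow described above, for illustration only.
It does not use LLVM or any Mergen code: ToyBlock, ToyLifter, Op and
the 0x1000 base address are all hypothetical stand-ins, and solvePath
is assumed to always come back unsolved. The sketch only shows why
stopping the block at the ret keeps exactly one terminator, and that
the warning is skipped for PCs already in the chained set.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins; none of these are the lifter's real types.
enum class Op { Nop, Ret };

struct ToyBlock {
  std::vector<std::string> insts;
  int terminators = 0;
  void emit(const std::string& s) { insts.push_back(s); }
  void emitRet(const std::string& value) {
    insts.push_back("ret " + value);
    ++terminators;
  }
};

struct ToyLifter {
  explicit ToyLifter(bool degrade) : degradeUnsolvedRet(degrade) {}

  bool degradeUnsolvedRet;             // true models the behaviour after this change
  std::set<uint64_t> chainedRetSites;  // PCs already recognised by an earlier chain
  std::vector<uint64_t> warnings;      // addresses that produced a warning
  ToyBlock block;
  bool run = true;

  // Assumption: every ret in this toy is "unsolved".
  void liftRet(uint64_t addr) {
    if (chainedRetSites.count(addr) == 0) warnings.push_back(addr);
    if (degradeUnsolvedRet) {
      if (block.terminators == 0) block.emitRet("rax");
      run = false;  // stop the per-instruction loop at the ret
    }
    // Without the degrade step the block stays unterminated and the
    // caller's loop keeps lifting the bytes that follow.
  }

  void liftBlock(const std::vector<Op>& code, uint64_t base) {
    for (std::size_t i = 0; i < code.size() && run; ++i) {
      const uint64_t addr = base + i;
      if (code[i] == Op::Ret) liftRet(addr);
      else block.emit("nop");
    }
    // Models the stray terminator that shows up once lifting runs off
    // the end of a block that was never closed.
    if (block.terminators == 0) block.emitRet("undef");
  }
};
```

Lifting {Nop, Ret, Nop, Nop} at base 0x1000 with degradeUnsolvedRet set
to false leaves three nops followed by a stray 'ret undef'; with it set
to true the block is 'nop', 'ret rax' and lifting stops. In both cases
one warning is recorded at 0x1001 unless that address was inserted into
chainedRetSites beforehand.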

Visible at 0x14000110d (the entry function's own final ret) on
example2-virt.bin @ 0x140001000. Before this change: 1 warning plus an
unterminated block, and the bytes lifted past the ret produced a
spurious 'ret undef' in some block. After this change: the warning
still surfaces, but the block is terminated immediately with 'ret rax'.

Knock-on improvement: with the ret site terminating cleanly, O2's DCE
collapses the noise and the optimized IR now contains all 7 import
calls in their original program order:

  GetStdHandle(STD_INPUT_HANDLE)   ; stdin
  GetStdHandle(STD_OUTPUT_HANDLE)  ; stdout
  WriteConsoleA(stdout, prompt)
  ReadConsoleA(stdin, buffer)
  CharUpperA(buffer)
  WriteConsoleA(stdout, echo)
  WriteConsoleA(stdout, buffer)

Matches the emulator trace from PR #190 exactly. The post-opt IR went
from 'imports declared but most call-sites DCE'd' to 'full original
program flow visible'.

Verified: python test.py baseline + quick + themida green. Non-virt
example2.bin unchanged (2 blocks, 6 declares, 0 warn, 0 err).
themida-virt: 4/4 imports pre-opt AND post-opt, 1 warn (legit
top-level ret), 0 err. Same headline numbers as pre-fix, but the
post-opt IR is dramatically cleaner.

Also drops the noisy stdout '[diag] lift_ret: unresolved ROP chain'
print that ran in lockstep with the structured warning - the warning
already conveys the same info via diagnostics.warning.

Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
naci
2026-04-25 00:45:18 +03:00
committed by GitHub
parent 463b6aca68
commit 7bcb705b5a
+22 -6
@@ -558,16 +558,32 @@ MERGEN_LIFTER_DEFINITION_TEMPLATES(void)::lift_ret() { // fix
   auto pathResult = solvePath(function, destination, realval);
   if (pathResult == PATH_unsolved) {
     uint64_t diagAddr = current_address - instruction.length;
-    // Suppress the warning when this PC has already been recognised as a
-    // concrete VM-staged import ret-site by an earlier chain fire; the
-    // symbolic re-entry here carries no new information.
-    if (chainedImportRetSites.find(diagAddr) == chainedImportRetSites.end()) {
+    const bool chained =
+        chainedImportRetSites.find(diagAddr) != chainedImportRetSites.end();
+    if (!chained) {
       ++liftStats.blocks_unreachable;
-      std::cout << "[diag] lift_ret: unresolved ROP chain at 0x"
-                << std::hex << diagAddr << std::dec << "\n" << std::flush;
       diagnostics.warning(DiagCode::UnresolvedRetChain, diagAddr,
                           "Unresolved ROP chain (ret to symbolic address)");
     }
+    // Block is currently unterminated: solvePath did not resolve the popped
+    // RIP, no chain fired, and we are at a `ret`. The most accurate
+    // semantic is "return to a caller we do not have context for" -
+    // degrade to REAL_return behaviour: emit `ret rax` and stop the
+    // block. This keeps the IR well-formed, lets DCE collapse the block
+    // if it ends up unreachable, and prevents the outer per-instruction
+    // lift loop from advancing past the ret and emitting a second
+    // terminator into the same block.
+    if (!builder->GetInsertBlock()->getTerminator()) {
+      auto rax = GetRegisterValue(Register::RAX);
+      rax = createZExtOrTruncFolder(
+          rax,
+          llvm::Type::getIntNTy(
+              context, file.getMode() == arch_mode::X64 ? 64 : 32));
+      builder->CreateRet(rax);
+    }
+    run = 0;
+    finished = 1;
+    return;
   }
   // If the callee returned to our speculative call's return address,