feat: commit working-tree changes required by rewrite gates

Lifter improvements:
- PathSolver.ipp: enhanced path memoization, switch-target diagnostics
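- GEPTracker.ipp: raise the unknown-bit budget from 8 to 10, recurse into
  operands whenever more than one result bit is unknown, infer jump-table
  offsets from index-bound assumptions, keep graceful bail-out paths
- Semantics_Misc.ipp: replace the CPUID inline-asm call with fixed leaf-1
  constants so path solving can reason through it (remove dead comments)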

Rewrite infrastructure:
- instruction_microtests.json: add jumptable manifest entries
  (calc_jumptable, jumptable_basic, jumptable_dense) with semantic cases
- golden_ir_hashes.json: add hashes for new jumptable samples
- build_samples.cmd: support C jumptable /O2 compilation pass
- oracle vectors: regenerated (oracle_vectors.json trimmed to current
  seed set, full-handler vectors updated with new handlers)
- run_microtests.cmd / run_all_handlers.cmd: script improvements
- test.py: add jumptable semantic cases to coverage

Dev scripts:
- configure_iced/zydis.cmd, build_iced/zydis.cmd: improved toolchain
  detection and MERGEN_BUILD_JOBS support

Review automation:
- format_comment.py, invariant_guard.py, risk_map.py, shard_pr.py:
  minor fixes aligned with verify_plan public API rename

Docs:
- REWRITE_BASELINE.md: updated coverage summary and script docs
- REVIEWER_RULES.md: minor formatting

Author: yusufcanislek
Date: 2026-03-26 07:53:43 +03:00
Parent: 981dbb8eda
Commit: eb10474eb8
23 changed files with 1197 additions and 5605 deletions
+2 -2
@@ -16,7 +16,7 @@ These rules are for PR review of this repository. They are intentionally strict
- `git diff --name-status <base>...<head>`
2. **Review by subsystem bucket** (core, disasm/semantics, rewrite scripts, vectors, build scripts, docs)
3. **Inspect each bucket deeply**
- `git diff -- <files>`
- `git diff <base>...<head> -- <files>`
- read surrounding context for changed hunks
4. **Run targeted verification** (see matrix below)
5. **Report findings with evidence**
@@ -124,7 +124,7 @@ Verification run:
```bat
git diff --name-status main...<branch>
git diff -- main...<branch> -- <path>
git diff main...<branch> -- <path>
python test.py negative
python test.py baseline
python test.py micro --check-flags
+47 -8
@@ -15,13 +15,13 @@ Sample sources live in:
## Script layout
- `scripts/rewrite/build_samples.cmd` — assembles/links every `testcases/rewrite_smoke/*.asm` sample
- `scripts/rewrite/instruction_microtests.json` — source of truth for sample symbols and expected IR patterns
- `scripts/rewrite/build_samples.cmd` — assembles/links rewrite smoke samples with incremental timestamp checks (rebuilds only when source is newer than obj/exe/map) using `clang-cl`; jump-table C samples compile in the dedicated `/O2` pass only
- `scripts/rewrite/instruction_microtests.json` — source of truth for sample symbols, expected IR patterns, and runtime semantic test cases
- `scripts/rewrite/run.ps1` — builds samples, clears stale `ir_outputs/*.ll` artifacts, runs lifter, stores fresh IR artifacts, invokes verifier using manifest entries
- `scripts/rewrite/verify.ps1` — checks lifted output patterns/results from manifest entries and rejects non-skipped samples with empty `patterns` arrays
- `scripts/rewrite/manifest_validation.ps1` — shared strict manifest validator used by both `run.ps1` and `verify.ps1`
- `scripts/rewrite/run.cmd` — one-command Windows entrypoint
- `scripts/rewrite/run_microtests.cmd` builds and runs `rewrite_microtests.exe`, which executes in-process instruction-byte tests from `lifter/test/TestInstructions.cpp` (register/flag assertions)
- `scripts/rewrite/run_microtests.cmd` — runs `rewrite_microtests.exe` (in-process instruction-byte tests from `lifter/test/TestInstructions.cpp`); builds lazily only when the executable is missing, supports `--build` to force rebuild and `--no-build` to require prebuilt binaries
- `scripts/rewrite/collect_instruction_tests.cmd` — reports handler coverage against `lifter/x86_64_opcodes.x` using oracle vector metadata (`handler` field) to track missing instruction tests
- `scripts/rewrite/generate_oracle_vectors.cmd` — regenerates `lifter/test_vectors/oracle_vectors.json` from seed vectors using oracle providers (currently Unicorn)
- `scripts/rewrite/oracle_seed_vectors.json` — seed cases with instruction bytes, initial state, and tracked outputs for oracle generation
@@ -32,20 +32,34 @@ Sample sources live in:
- `scripts/rewrite/generate_flag_stress_vectors.py` — derives flag-writing handlers from `lifter/Semantics.ipp`, generates deterministic initial states, and computes expected flags via Unicorn
- `scripts/rewrite/run_flagstress.cmd` — one-command strict flag suite runner (auto-generates flag-stress vectors and executes microtests with strict flag assertions)
- `run.ps1` validates that `instruction_microtests.json` covers every `testcases/rewrite_smoke/*` source file
- `scripts/rewrite/check_semantic.py` — runtime semantic regression for all lifted samples; reads `semantic` cases from the manifest, generates lli-executable wrappers, and verifies return values across all declared inputs (23 samples, 107 test cases)
Helper build scripts for local development are in:
- `scripts/dev/configure_iced.cmd` — CMake configure (Ninja + clang-cl, auto-detects MSVC headers/libs)
- `scripts/dev/build_iced.cmd` — incremental `cmake --build` for iced backend
- `scripts/dev/configure_zydis.cmd` — CMake configure for Zydis-only lane
- `scripts/dev/build_zydis.cmd` — incremental `cmake --build` for Zydis backend
These scripts do **not** invoke `VsDevCmd.bat`. `clang-cl` discovers MSVC include/lib paths on its own, and CMake/Ninja bakes all resolved paths into `build.ninja` at configure time. This avoids loading the full VS Developer Environment (CLR, MSBuild, Roslyn) and saves ~200-400 MB of RAM per invocation.
### Build parallelism
All build scripts default to 4 parallel jobs. Override with `MERGEN_BUILD_JOBS`:
```bat
set MERGEN_BUILD_JOBS=2 &rem low-memory machines
set MERGEN_BUILD_JOBS=8 &rem fast builds on large machines
```
`run_microtests.cmd` regenerates oracle vectors by default, then runs `rewrite_microtests.exe`. It forwards optional args as name filters (example: `run_microtests.cmd xor`).
Use `run_microtests.cmd --check-flags <filter>` to enforce oracle flag comparisons (strict mode, expected to fail until flag semantics are fixed).
Use `run_microtests.cmd --build <filter>` to force rebuilding `rewrite_microtests.exe`, or `run_microtests.cmd --no-build <filter>` to skip any build step.
Set `SKIP_ORACLE_GENERATION=1` to reuse a pre-generated oracle file. Set `MERGEN_TEST_VECTORS=<path>` to point tests at a custom oracle JSON file.
Use `run_all_handlers.cmd` to exercise full handler coverage smoke tests. It writes `lifter/test_vectors/oracle_vectors_full_handlers.json` and then runs microtests against it.
Use `run_all_handlers.cmd` to exercise full handler coverage smoke tests. It writes `lifter/test_vectors/oracle_vectors_full_handlers.json` and then runs microtests against it through `run_microtests.cmd` (which now builds lazily).
Full-handler vectors are expected to execute end-to-end (no default `skip: true` crash exclusions).
Use `run_flagstress.cmd` (or `python test.py flags`) for broad strict-flag validation across all handlers that explicitly write flags.
- `scripts/dev/configure_iced.cmd`
- `scripts/dev/build_iced.cmd`
- `scripts/dev/configure_zydis.cmd`
- `scripts/dev/build_zydis.cmd`
Use `python test.py semantic` to run runtime semantic regression for all samples (accepts `--filter` to narrow scope and `--input-ir` to override the IR file for a single sample).
## Output location
@@ -58,6 +72,7 @@ Artifacts include:
- compiled sample binaries/maps/objects for every manifest entry
- `ir_outputs/*.ll` and `ir_outputs/*_no_opts.ll` (replaced on each run after stale `.ll` cleanup)
- `ir_outputs/*_semantic.ll` (generated by `check_semantic.py` for lli execution)
- `lifter/test_vectors/oracle_vectors_full_handlers.json` (generated by `run_all_handlers.cmd`)
## Running the baseline gate
@@ -97,3 +112,27 @@ This gate asserts explicit failure behavior for malformed manifests/vectors, vec
- lifted IR file exists at `ir_outputs/<sample>.ll`
- every expected pattern declared in `instruction_microtests.json` is present in that IR output
A rewrite change is not acceptable if this baseline fails.
`python test.py quick` and `python test.py all` additionally run runtime semantic validation for **all** samples after baseline lifting, executing each lifted IR module via LLVM `lli` and asserting correct return values across all declared input vectors. This prevents regressions where lifted IR looks structurally correct (passes pattern checks) but computes wrong results.
## Runtime semantic regression
Every non-skipped sample in the manifest may declare a `semantic` field: an array of `{inputs, expected, label}` objects. The `check_semantic.py` runner:
1. Reads the optimized lifted IR from `ir_outputs/<sample>.ll`
2. Strips dead stores to unmapped binary addresses (`inttoptr`)
3. Renames `@main` to `@lifted_<sample>` and generates an `@semantic_main` wrapper
4. Runs the wrapper via `lli --entry-function=semantic_main`
5. Reports per-case pass/fail with input/expected detail on failure
Samples without a `semantic` field are not tested. The `semantic` field is optional but recommended for every sample with a deterministic expected return value.
### Coverage summary
| Category | Samples | Total cases |
|---|---|---|
| Constant-return (no inputs) | 8 | 8 |
| Single-input branching | 12 | 87 |
| Multi-input | 1 | 5 |
| Jump-table dispatch | 2 | 7 |
| **Total** | **23** | **107** |
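The per-case reporting step of `check_semantic.py` can be sketched in C++ as follows. This is an illustration only: the real runner is Python and executes each generated wrapper under `lli`. Here a callable stands in for the lifted function, and `SemanticCase`, `LiftedFn`, and `runSemanticCases` are hypothetical names modeled on the manifest's `{inputs, expected, label}` objects.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical mirror of one manifest `semantic` entry: {inputs, expected, label}.
struct SemanticCase {
  std::vector<int64_t> inputs;
  int64_t expected;
  std::string label;
};

// Stand-in for "run @semantic_main under lli": any callable that maps the
// declared inputs to the lifted function's return value.
using LiftedFn = std::function<int64_t(const std::vector<int64_t>&)>;

// Runs every declared case, prints per-case pass/fail (with inputs, expected,
// and actual values on failure), and returns the number of failing cases.
int runSemanticCases(const std::string& sample, const LiftedFn& lifted,
                     const std::vector<SemanticCase>& cases) {
  assert(lifted);
  int failures = 0;
  for (const auto& c : cases) {
    const int64_t actual = lifted(c.inputs);
    if (actual == c.expected) {
      std::cout << "PASS " << sample << " [" << c.label << "]\n";
      continue;
    }
    ++failures;
    std::cout << "FAIL " << sample << " [" << c.label << "] inputs=(";
    for (size_t i = 0; i < c.inputs.size(); ++i) {
      std::cout << (i ? ", " : "") << c.inputs[i];
    }
    std::cout << ") expected=" << c.expected << " actual=" << actual << "\n";
  }
  return failures;
}
```

For example, a lambda returning `in[0] * 2` checked against the single case `{{21}, 42, "double"}` prints one `PASS` line and returns 0.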
+67 -16
@@ -15,6 +15,7 @@
#include <llvm/IR/Module.h>
#include <llvm/IR/Value.h>
#include <llvm/Support/Casting.h>
#include <limits>
MERGEN_LIFTER_DEFINITION_TEMPLATES(PATH_info)::solvePath(
llvm::Function* function, uint64_t& dest, Value* simplifyValue) {
@@ -24,12 +25,29 @@ MERGEN_LIFTER_DEFINITION_TEMPLATES(PATH_info)::solvePath(
// (from different branch paths), so cached results don't carry over.
pv_cache.clear();
auto normalizeTargetAddress = [&](uint64_t target) -> uint64_t {
if (isMemPaged(target)) {
return target;
}
if (target <= std::numeric_limits<uint32_t>::max() &&
file.imageBase > std::numeric_limits<uint32_t>::max()) {
const uint64_t highBits = file.imageBase & 0xFFFFFFFF00000000ULL;
const uint64_t widened = highBits | target;
if (isMemPaged(widened)) {
return widened;
}
}
return target;
};
// do static polymorphism here
PATH_info result = PATH_unsolved;
if (llvm::ConstantInt* constInt =
dyn_cast<llvm::ConstantInt>(simplifyValue)) {
dest = constInt->getZExtValue();
dest = normalizeTargetAddress(constInt->getZExtValue());
result = PATH_solved;
run = 0;
@@ -44,6 +62,7 @@ MERGEN_LIFTER_DEFINITION_TEMPLATES(PATH_info)::solvePath(
}
if (PATH_info solved = getConstraintVal(function, simplifyValue, dest)) {
dest = normalizeTargetAddress(dest);
if (solved == PATH_solved) {
run = 0;
std::cout << "Solved the constraint and moving to next path\n"
@@ -67,7 +86,7 @@ MERGEN_LIFTER_DEFINITION_TEMPLATES(PATH_info)::solvePath(
std::vector<APInt> pv(pvset.begin(), pvset.end());
if (pv.size() == 1) {
printvalue2(pv[0]);
dest = pv[0].getZExtValue();
dest = normalizeTargetAddress(pv[0].getZExtValue());
result = PATH_solved;
auto bb_solved = getOrCreateBB(dest, "bb_single");
@@ -131,8 +150,12 @@ MERGEN_LIFTER_DEFINITION_TEMPLATES(PATH_info)::solvePath(
printvalue2(firstcase);
printvalue2(secondcase);
auto bb_true = getOrCreateBB(firstcase.getZExtValue(), "bb_true");
auto bb_false = getOrCreateBB(secondcase.getZExtValue(), "bb_false");
const uint64_t firstTarget =
normalizeTargetAddress(firstcase.getZExtValue());
const uint64_t secondTarget =
normalizeTargetAddress(secondcase.getZExtValue());
auto bb_true = getOrCreateBB(firstTarget, "bb_true");
auto bb_false = getOrCreateBB(secondTarget, "bb_false");
printvalue(condition);
auto BR = builder->CreateCondBr(condition, bb_true, bb_false);
@@ -140,13 +163,13 @@ MERGEN_LIFTER_DEFINITION_TEMPLATES(PATH_info)::solvePath(
printvalue2(firstcase);
printvalue2(secondcase);
blockInfo = BBInfo(secondcase.getZExtValue(), bb_false);
blockInfo = BBInfo(secondTarget, bb_false);
// for [this], we can assume condition is true
// we can simplify any value tied to is dependent on condition,
// and try to simplify any value calculates condition
// for [newlifter], we can assume condition is false
auto newblock = BBInfo(firstcase.getZExtValue(), bb_true);
auto newblock = BBInfo(firstTarget, bb_true);
// this->blockInfo = newblock;
printvalue(condition);
@@ -195,24 +218,52 @@ MERGEN_LIFTER_DEFINITION_TEMPLATES(PATH_info)::solvePath(
simplifyValue, bb_default_unresolved, static_cast<unsigned>(pv.size()));
// Add every discovered target as an explicit case.
for (size_t i = 0; i < pv.size(); ++i) {
auto caseVal = pv[i];
auto bb_case = getOrCreateBB(caseVal.getZExtValue(),
"bb_switch_" + std::to_string(i));
std::set<uint64_t> emittedTargets;
size_t switchCaseIndex = 0;
for (const auto& caseVal : pv) {
const uint64_t normalizedTarget =
normalizeTargetAddress(caseVal.getZExtValue());
if (!emittedTargets.insert(normalizedTarget).second) {
continue;
}
// computePossibleValues cross-products uncorrelated select branches,
// which can produce spurious targets outside mapped memory. Skip them
// rather than crashing when the lifter tries to decode bytes there.
if (!isMemPaged(normalizedTarget)) {
std::cout << "[diag] skipping unmapped switch target 0x"
<< std::hex << normalizedTarget << std::dec << "\n"
<< std::flush;
continue;
}
auto bb_case = getOrCreateBB(
normalizedTarget, "bb_switch_" + std::to_string(switchCaseIndex++));
SI->addCase(
cast<ConstantInt>(builder->getIntN(bitWidth, caseVal.getZExtValue())),
cast<ConstantInt>(builder->getIntN(bitWidth, normalizedTarget)),
bb_case);
auto caseBlock = BBInfo(caseVal.getZExtValue(), bb_case);
auto caseBlock = BBInfo(normalizedTarget, bb_case);
addUnvisitedAddr(caseBlock);
branch_backup(caseBlock.block);
}
// Conservative fallback for values not enumerated in pv:
// keep default unresolved instead of assuming impossible behavior.
// keep default path data-dependent instead of returning undef, which can
// let later optimizations fold valid cases into arbitrary constants.
llvm::IRBuilder<> defaultBuilder(bb_default_unresolved);
defaultBuilder.CreateRet(UndefValue::get(function->getReturnType()));
Value* unresolvedRet = simplifyValue;
if (unresolvedRet->getType() != function->getReturnType()) {
if (unresolvedRet->getType()->isIntegerTy() &&
function->getReturnType()->isIntegerTy()) {
unresolvedRet = defaultBuilder.CreateZExtOrTrunc(
unresolvedRet, function->getReturnType(),
"switch_default_unresolved");
} else {
unresolvedRet = UndefValue::get(function->getReturnType());
}
}
defaultBuilder.CreateRet(unresolvedRet);
// Destination remains unknown for multi-target switches.
dest = 0;
result = PATH_multi_solved;
@@ -223,7 +274,7 @@ MERGEN_LIFTER_DEFINITION_TEMPLATES(PATH_info)::solvePath(
llvm::raw_fd_ostream OS(Filename, EC);
function->getParent()->print(OS, nullptr);
});
std::cout << "created multi-target switch with " << pv.size()
std::cout << "created multi-target switch with " << emittedTargets.size()
<< " targets\n"
<< std::flush;
}
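The target handling added to `solvePath` (widening 32-bit targets, deduplicating switch cases, skipping unmapped targets) can be sketched standalone. The sketch assumes a hypothetical `IsMapped` predicate in place of the lifter's `isMemPaged` and a plain `imageBase` argument in place of `file.imageBase`; `normalizeTarget` and `collectSwitchTargets` are illustrative names, not the lifter's API.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <limits>
#include <set>
#include <vector>

// Hypothetical stand-in for the lifter's isMemPaged().
using IsMapped = std::function<bool(uint64_t)>;

// A mapped target is kept. A 32-bit target is widened with the image base's
// high 32 bits, but only when the image base lies above 4 GiB and the widened
// address is mapped; otherwise the target is returned unchanged.
uint64_t normalizeTarget(uint64_t target, uint64_t imageBase,
                         const IsMapped& isMapped) {
  if (isMapped(target)) {
    return target;
  }
  constexpr uint64_t kU32Max = std::numeric_limits<uint32_t>::max();
  if (target <= kU32Max && imageBase > kU32Max) {
    const uint64_t widened = (imageBase & 0xFFFFFFFF00000000ULL) | target;
    if (isMapped(widened)) {
      return widened;
    }
  }
  return target;
}

// Mirrors the switch-emission loop: normalize each candidate, drop duplicates,
// and skip targets outside mapped memory. First-seen order is preserved.
std::vector<uint64_t> collectSwitchTargets(const std::vector<uint64_t>& candidates,
                                           uint64_t imageBase,
                                           const IsMapped& isMapped) {
  assert(isMapped);
  std::set<uint64_t> emitted;
  std::vector<uint64_t> targets;
  for (uint64_t candidate : candidates) {
    const uint64_t normalized = normalizeTarget(candidate, imageBase, isMapped);
    if (!emitted.insert(normalized).second) {
      continue;  // same target already emitted as a case
    }
    if (!isMapped(normalized)) {
      continue;  // spurious value, e.g. from uncorrelated select branches
    }
    targets.push_back(normalized);
  }
  return targets;
}
```

With `imageBase = 0x140000000` and only `[0x140001000, 0x140002000)` mapped, the candidates `0x40001010`, `0x140001010`, `0x40001020`, `0x12345678` collapse to two cases, `0x140001010` and `0x140001020`: the first two name the same target after widening, and `0x12345678` stays unmapped.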
+297 -24
@@ -450,7 +450,7 @@ using pvalueset = std::set<APInt, APIntComparator>;
MERGEN_LIFTER_DEFINITION_TEMPLATES(pvalueset)::getPossibleValues(
const llvm::KnownBits& known, unsigned max_unknown) {
if ((max_unknown == 0) || (max_unknown >= 8)) {
if ((max_unknown == 0) || (max_unknown >= 10)) {
debugging::doIfDebug([&]() {
std::string Filename = "output_too_many_unk.ll";
std::error_code EC;
@@ -460,7 +460,7 @@ MERGEN_LIFTER_DEFINITION_TEMPLATES(pvalueset)::getPossibleValues(
printvalueforce2(max_unknown);
// Graceful bail: return empty set so caller treats this as PATH_unsolved.
// max_unknown==0 means contradictory analysis (no solutions exist).
// max_unknown>=8 means too many unknowns (2^N blowup, >128 values).
// max_unknown>=10 means too many unknowns (2^N blowup, >512 values).
return {};
}
llvm::APInt base = known.One;
@@ -717,8 +717,12 @@ MERGEN_LIFTER_DEFINITION_TEMPLATES(pvalueset)::computePossibleValues(
printvalue2(op2_unknownbits_count);
printvalue2(total_unknownbits_count);
if ((res_unknownbits_count >= total_unknownbits_count) &&
res_unknownbits_count != 1) {
// Recurse into operands when the result has more than 1 unknown bit.
// The old heuristic (res >= total) incorrectly skipped recursion for
// instructions like SHL that reduce unknowns slightly (e.g. 31 vs 32),
// causing the fallthrough to getPossibleValues which bails on >budget
// unknowns. Depth limit (16) and memoization bound the recursion.
if (res_unknownbits_count > 1) {
auto v1 = computePossibleValues(op1, Depth + 1);
auto v2 = computePossibleValues(op2, Depth + 1);
@@ -773,30 +777,293 @@ MERGEN_LIFTER_DEFINITION_TEMPLATES(Value*)::solveLoad(LazyValue load,
return valueExtractedFromVirtualStack;
}
} else {
// Get possible values from loadOffset
auto stripIntegerCasts = [](Value* candidate) -> Value* {
while (auto* castInst = dyn_cast<CastInst>(candidate)) {
auto* srcTy = castInst->getOperand(0)->getType();
auto* dstTy = castInst->getType();
if (!srcTy->isIntegerTy() || !dstTy->isIntegerTy()) {
break;
}
candidate = castInst->getOperand(0);
}
return candidate;
};
auto matchIndexEqualsConst = [&](Value* condValue, Value* expectedIndex,
uint64_t& equalValueOut) -> bool {
auto* icmp = dyn_cast<ICmpInst>(condValue);
if (!icmp || icmp->getPredicate() != CmpInst::ICMP_EQ) {
return false;
}
auto* lhs = stripIntegerCasts(icmp->getOperand(0));
auto* rhs = stripIntegerCasts(icmp->getOperand(1));
if (lhs == expectedIndex) {
if (auto* rhsCI = dyn_cast<ConstantInt>(rhs)) {
equalValueOut = rhsCI->getZExtValue();
return true;
}
}
if (rhs == expectedIndex) {
if (auto* lhsCI = dyn_cast<ConstantInt>(lhs)) {
equalValueOut = lhsCI->getZExtValue();
return true;
}
}
auto matchSubEqZero = [&](Value* subCandidate, Value* zeroCandidate) -> bool {
auto* subInst = dyn_cast<BinaryOperator>(subCandidate);
auto* zeroCI = dyn_cast<ConstantInt>(zeroCandidate);
if (!subInst || subInst->getOpcode() != Instruction::Sub || !zeroCI ||
!zeroCI->isZero()) {
return false;
}
auto* subLHS = stripIntegerCasts(subInst->getOperand(0));
auto* subRHS = stripIntegerCasts(subInst->getOperand(1));
if (subLHS == expectedIndex) {
if (auto* rhsCI = dyn_cast<ConstantInt>(subRHS)) {
equalValueOut = rhsCI->getZExtValue();
return true;
}
}
if (subRHS == expectedIndex) {
if (auto* lhsCI = dyn_cast<ConstantInt>(subLHS)) {
equalValueOut = lhsCI->getZExtValue();
return true;
}
}
return false;
};
return matchSubEqZero(lhs, rhs) || matchSubEqZero(rhs, lhs);
};
auto matchIndexUpperBound =
[&](auto&& self, Value* condValue, Value* expectedIndex,
uint64_t& upperInclusiveOut) -> bool {
auto* icmp = dyn_cast<ICmpInst>(condValue);
if (icmp) {
auto pred = icmp->getPredicate();
auto* lhs = stripIntegerCasts(icmp->getOperand(0));
auto* rhs = stripIntegerCasts(icmp->getOperand(1));
if (rhs == expectedIndex && lhs != expectedIndex) {
pred = CmpInst::getSwappedPredicate(pred);
std::swap(lhs, rhs);
}
if (lhs != expectedIndex) {
return false;
}
auto* rhsCI = dyn_cast<ConstantInt>(rhs);
if (!rhsCI) {
return false;
}
switch (pred) {
case CmpInst::ICMP_ULT:
if (rhsCI->isZero()) {
return false;
}
upperInclusiveOut = rhsCI->getZExtValue() - 1;
return true;
case CmpInst::ICMP_ULE:
upperInclusiveOut = rhsCI->getZExtValue();
return true;
case CmpInst::ICMP_SLT: {
int64_t signedBound = rhsCI->getSExtValue();
if (signedBound <= 0) {
return false;
}
upperInclusiveOut = static_cast<uint64_t>(signedBound - 1);
return true;
}
case CmpInst::ICMP_SLE: {
int64_t signedBound = rhsCI->getSExtValue();
if (signedBound < 0) {
return false;
}
upperInclusiveOut = static_cast<uint64_t>(signedBound);
return true;
}
default:
return false;
}
}
auto* binOp = dyn_cast<BinaryOperator>(condValue);
if (!binOp || binOp->getOpcode() != Instruction::Or) {
return false;
}
uint64_t leftUpper = 0;
uint64_t rightUpper = 0;
uint64_t leftEqual = 0;
uint64_t rightEqual = 0;
const bool hasLeftUpper =
self(self, binOp->getOperand(0), expectedIndex, leftUpper);
const bool hasRightUpper =
self(self, binOp->getOperand(1), expectedIndex, rightUpper);
const bool hasLeftEqual =
matchIndexEqualsConst(binOp->getOperand(0), expectedIndex, leftEqual);
const bool hasRightEqual =
matchIndexEqualsConst(binOp->getOperand(1), expectedIndex, rightEqual);
auto combineUpperAndEqual = [&](uint64_t upper, uint64_t equalValue) -> bool {
if (equalValue == upper || equalValue == upper + 1) {
upperInclusiveOut = std::max(upper, equalValue);
return true;
}
return false;
};
if (hasLeftUpper && hasRightEqual &&
combineUpperAndEqual(leftUpper, rightEqual)) {
return true;
}
if (hasRightUpper && hasLeftEqual &&
combineUpperAndEqual(rightUpper, leftEqual)) {
return true;
}
return false;
};
auto inferIndexedOffsetsFromAssumptions =
[&](Value* offsetExpr) -> std::set<APInt, APIntComparator> {
std::set<APInt, APIntComparator> inferredOffsets;
SmallVector<Value*, 8> addTerms;
auto collectAddTerms = [&](auto&& self, Value* expr,
SmallVectorImpl<Value*>& terms) -> bool {
if (auto* addInst = dyn_cast<BinaryOperator>(expr);
addInst && addInst->getOpcode() == Instruction::Add) {
return self(self, addInst->getOperand(0), terms) &&
self(self, addInst->getOperand(1), terms);
}
terms.push_back(expr);
return true;
};
if (!collectAddTerms(collectAddTerms, offsetExpr, addTerms)) {
return inferredOffsets;
}
uint64_t baseOffset = 0;
Value* indexValue = nullptr;
uint64_t indexScale = 0;
auto matchScaledIndexTerm = [&](Value* term, Value*& outIndex,
uint64_t& outScale) -> bool {
auto* stripped = stripIntegerCasts(term);
if (auto* mulInst = dyn_cast<BinaryOperator>(stripped);
mulInst && mulInst->getOpcode() == Instruction::Mul) {
auto* lhs = stripIntegerCasts(mulInst->getOperand(0));
auto* rhs = stripIntegerCasts(mulInst->getOperand(1));
if (auto* lhsCI = dyn_cast<ConstantInt>(lhs)) {
outIndex = rhs;
outScale = lhsCI->getZExtValue();
return true;
}
if (auto* rhsCI = dyn_cast<ConstantInt>(rhs)) {
outIndex = lhs;
outScale = rhsCI->getZExtValue();
return true;
}
}
if (auto* shlInst = dyn_cast<BinaryOperator>(stripped);
shlInst && shlInst->getOpcode() == Instruction::Shl) {
auto* lhs = stripIntegerCasts(shlInst->getOperand(0));
auto* rhs = stripIntegerCasts(shlInst->getOperand(1));
if (auto* shiftCI = dyn_cast<ConstantInt>(rhs)) {
uint64_t shift = shiftCI->getZExtValue();
if (shift < 63) {
outIndex = lhs;
outScale = 1ULL << shift;
return true;
}
}
}
outIndex = stripped;
outScale = 1;
return true;
};
for (Value* term : addTerms) {
if (auto* ci = dyn_cast<ConstantInt>(term)) {
baseOffset += ci->getZExtValue();
continue;
}
Value* candidateIndex = nullptr;
uint64_t candidateScale = 0;
if (!matchScaledIndexTerm(term, candidateIndex, candidateScale)) {
return {};
}
candidateIndex = stripIntegerCasts(candidateIndex);
if (!indexValue) {
indexValue = candidateIndex;
indexScale = candidateScale;
continue;
}
if (indexValue != candidateIndex || indexScale != candidateScale) {
return {};
}
}
if (!indexValue || indexScale == 0) {
return inferredOffsets;
}
uint64_t upperInclusive = 0;
bool foundUpper = false;
for (const auto& assumption : assumptions) {
if (!assumption.first || !assumption.second.isOne()) {
continue;
}
uint64_t candidateUpper = 0;
if (!matchIndexUpperBound(matchIndexUpperBound, assumption.first,
indexValue, candidateUpper)) {
continue;
}
if (!foundUpper || candidateUpper < upperInclusive) {
upperInclusive = candidateUpper;
foundUpper = true;
}
}
constexpr uint64_t kMaxJumpTableTargets = 64;
if (!foundUpper || upperInclusive >= kMaxJumpTableTargets) {
return inferredOffsets;
}
for (uint64_t idx = 0; idx <= upperInclusive; ++idx) {
uint64_t possibleOffset = baseOffset + idx * indexScale;
if (!isMemPaged(possibleOffset)) {
continue;
}
inferredOffsets.insert(APInt(64, possibleOffset));
}
return inferredOffsets;
};
if (isa<SelectInst>(loadOffset)) { // dyn_cast
auto select_inst = cast<SelectInst>(loadOffset);
if (isa<ConstantInt>(select_inst->getTrueValue()) &&
isa<ConstantInt>(select_inst->getFalseValue()))
// we should be able to do this whether
// this is a constant or not
return createSelectFolder(
select_inst->getCondition(),
retrieveCombinedValue(
cast<ConstantInt>(select_inst->getTrueValue())->getZExtValue(),
cloadsize, load),
retrieveCombinedValue(
cast<ConstantInt>(select_inst->getFalseValue())->getZExtValue(),
cloadsize, load));
}
if (getControlFlow() == ControlFlow::Unflatten) {
auto possibleValues = computePossibleValues(loadOffset, 0);
if (possibleValues.empty()) {
possibleValues = inferIndexedOffsetsFromAssumptions(loadOffset);
}
llvm::Value* selectedValue = nullptr;
for (auto possibleValue : possibleValues) { // rename
Value* selectedValue = nullptr;
for (auto possibleValue : possibleValues) {
auto isPaged = isMemPaged(possibleValue.getZExtValue());
if (!isPaged)
continue;
@@ -805,14 +1072,20 @@ MERGEN_LIFTER_DEFINITION_TEMPLATES(Value*)::solveLoad(LazyValue load,
possibleValue.getZExtValue(), cloadsize, load);
printvalue2((uint64_t)cloadsize);
printvalue(possible_values_from_mem);
if (!possible_values_from_mem) {
continue;
}
if (selectedValue == nullptr) {
selectedValue = possible_values_from_mem;
} else {
auto normalizedPossibleValue = possibleValue.zextOrTrunc(
loadOffset->getType()->getIntegerBitWidth());
llvm::Value* comparison = createICMPFolder(
CmpInst::ICMP_EQ, loadOffset,
llvm::ConstantInt::get(loadOffset->getType(), possibleValue));
llvm::ConstantInt::get(loadOffset->getType(),
normalizedPossibleValue));
printvalue(comparison);
selectedValue =
createSelectFolder(comparison, possible_values_from_mem,
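The assumption-driven fallback in `inferIndexedOffsetsFromAssumptions` reduces to two steps: turn a known-true bound on the index into an inclusive upper limit, then enumerate `base + idx * scale` and keep mapped addresses. The standalone sketch below assumes 64-bit bound constants, takes the (already tightest) bound directly instead of scanning `assumptions`, and uses a hypothetical `isMapped` callback in place of `isMemPaged`. `Pred`, `upperInclusive`, and `jumpTableEntryOffsets` are illustrative names, and the `idx <= upper || idx == upper + 1` combination handled in the diff is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

// Simplified predicate kinds for an assumed-true "index <pred> bound".
enum class Pred { ULT, ULE, SLT, SLE };

// Converts "index <pred> bound" into an inclusive upper bound on the index,
// following the rules in matchIndexUpperBound (bound taken as 64-bit here).
std::optional<uint64_t> upperInclusive(Pred pred, uint64_t boundBits) {
  const int64_t signedBound = static_cast<int64_t>(boundBits);
  switch (pred) {
    case Pred::ULT:
      if (boundBits == 0) return std::nullopt;  // "index < 0" is unsatisfiable
      return boundBits - 1;
    case Pred::ULE:
      return boundBits;
    case Pred::SLT:
      if (signedBound <= 0) return std::nullopt;
      return static_cast<uint64_t>(signedBound - 1);
    case Pred::SLE:
      if (signedBound < 0) return std::nullopt;
      return static_cast<uint64_t>(signedBound);
  }
  return std::nullopt;
}

// Enumerates base + idx * scale for idx in [0, upper] and keeps mapped
// addresses. Bounds of 64 or more entries are rejected, as in the diff.
std::vector<uint64_t> jumpTableEntryOffsets(
    uint64_t base, uint64_t scale, uint64_t upper,
    const std::function<bool(uint64_t)>& isMapped) {
  assert(scale != 0);
  constexpr uint64_t kMaxJumpTableTargets = 64;
  std::vector<uint64_t> offsets;
  if (upper >= kMaxJumpTableTargets) {
    return offsets;
  }
  for (uint64_t idx = 0; idx <= upper; ++idx) {
    const uint64_t offset = base + idx * scale;
    if (isMapped(offset)) {
      offsets.push_back(offset);
    }
  }
  return offsets;
}
```

For an assumption `idx ult 5`, the index range is 0 through 4; with a 4-byte-entry table at `0x3000` and only `[0x3000, 0x3010)` mapped, the fifth entry at `0x3010` is dropped and four offsets remain.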
+18 -80
@@ -431,86 +431,24 @@ MERGEN_LIFTER_DEFINITION_TEMPLATES(void)::lift_rdtsc() {
MERGEN_LIFTER_DEFINITION_TEMPLATES(void)::lift_cpuid() {
LLVMContext& context = builder->getContext();
// operands[0] = eax
// operands[1] = ebx
// operands[2] = ecx
// operands[3] = edx
/*
c++
#include <intrin.h>
int getcpuid() {
int cpuInfo[4];
__cpuid(cpuInfo, 1);
return cpuInfo[0] + cpuInfo[1];
}
ir
define dso_local noundef i32 @getcpuid() #0 {
%1 = alloca [4 x i32], align 16
%2 = getelementptr inbounds [4 x i32], ptr %1, i64 0, i64 0
%3 = call { i32, i32, i32, i32 } asm "xchgq %rbx,
${1:q}\0Acpuid\0Axchgq %rbx, ${1:q}", "={ax},=r,={cx},={dx},0,2"(i32 1,
i32 0) %4 = getelementptr inbounds [4 x i32], ptr %1, i64 0, i64 0 %5 =
extractvalue { i32, i32, i32, i32 } %3, 0 %6 = getelementptr inbounds
i32, ptr %4, i32 0 store i32 %5, ptr %6, align 4 %7 = extractvalue {
i32, i32, i32, i32 } %3, 1 %8 = getelementptr inbounds i32, ptr %4, i32
1 store i32 %7, ptr %8, align 4 %9 = extractvalue { i32, i32, i32, i32 }
%3, 2 %10 = getelementptr inbounds i32, ptr %4, i32 2 store i32 %9, ptr
%10, align 4 %11 = extractvalue { i32, i32, i32, i32 } %3, 3 %12 =
getelementptr inbounds i32, ptr %4, i32 3 store i32 %11, ptr %12, align
4
%13 = getelementptr inbounds [4 x i32], ptr %1, i64 0, i64 0
%14 = load i32, ptr %13, align 16
%15 = getelementptr inbounds [4 x i32], ptr %1, i64 0, i64 1
%16 = load i32, ptr %15, align 4
%17 = add nsw i32 %14, %16
ret i32 %17
}
opt
define dso_local noundef i32 @getcpuid() local_unnamed_addr {
%1 = tail call { i32, i32, i32, i32 } asm "xchgq %rbx,
${1:q}\0Acpuid\0Axchgq %rbx, ${1:q}", "={ax},=r,={cx},={dx},0,2"(i32 1,
i32 0) #0 %2 = extractvalue { i32, i32, i32, i32 } %1, 1 ret i32 %2
}
*/
// int cpuInfo[4];
// ArrayType* CpuInfoTy = ArrayType::get(Type::getInt32Ty(context), 4);
Value* eax = GetRegisterValue(Register::EAX);
// one is eax, other is always 0?
std::vector<Type*> AsmOutputs = {
Type::getInt32Ty(context), Type::getInt32Ty(context),
Type::getInt32Ty(context), Type::getInt32Ty(context)};
StructType* AsmStructType = StructType::get(context, AsmOutputs);
std::vector<Type*> ArgTypes = {Type::getInt32Ty(context),
Type::getInt32Ty(context)};
// this is probably incorrect
InlineAsm* IA =
InlineAsm::get(FunctionType::get(AsmStructType, ArgTypes, false),
"xchgq %rbx, ${1:q}\ncpuid\nxchgq %rbx, ${1:q}",
"={ax},=r,={cx},={dx},0,2", true);
std::vector<Value*> Args{eax, ConstantInt::get(eax->getType(), 0)};
Value* cpuidCall = builder->CreateCall(IA, Args);
Value* eaxv = builder->CreateExtractValue(cpuidCall, 0, "eax");
Value* ebx = builder->CreateExtractValue(cpuidCall, 1, "ebx");
Value* ecx = builder->CreateExtractValue(cpuidCall, 2, "ecx");
Value* edx = builder->CreateExtractValue(cpuidCall, 3, "edx");
SetRegisterValue(Register::EAX, eaxv);
SetRegisterValue(Register::EBX, ebx);
SetRegisterValue(Register::ECX, ecx);
SetRegisterValue(Register::EDX, edx);
// For static lifting / deobfuscation, CPUID is an opaque value barrier.
// Emitting inline asm makes all four output registers invisible to
// KnownBits analysis, which poisons downstream value chains and causes
// path solver bail-outs (e.g., VMP 3.6 ROP chain resolution fails
// because the dispatch address becomes fully unknown).
//
// Fix: model CPUID as returning fixed constants. The exact values
// don't matter for deobfuscation — what matters is that they are
// deterministic so the path solver can reason through them.
// These represent a generic modern x86-64 processor (CPUID leaf 1).
SetRegisterValue(Register::EAX,
ConstantInt::get(Type::getInt32Ty(context), 0x000806C1));
SetRegisterValue(Register::EBX,
ConstantInt::get(Type::getInt32Ty(context), 0x00800800));
SetRegisterValue(Register::ECX,
ConstantInt::get(Type::getInt32Ty(context), 0x7FFAFBBF));
SetRegisterValue(Register::EDX,
ConstantInt::get(Type::getInt32Ty(context), 0xBFEBFBFF));
}
uint64_t alternative_pext(uint64_t source, uint64_t mask) {
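To see what the fixed leaf-1 constants claim about the modeled processor, the EAX signature can be decoded with the display family/model/stepping rules from Intel's CPUID documentation. A standalone sketch; `CpuSignature`, `decodeLeaf1Eax`, and `hasSse2` are illustrative names, not part of the lifter.

```cpp
#include <cassert>
#include <cstdint>

struct CpuSignature {
  uint32_t family;
  uint32_t model;
  uint32_t stepping;
};

// Decodes CPUID.01H:EAX into display family, display model, and stepping.
CpuSignature decodeLeaf1Eax(uint32_t eax) {
  const uint32_t stepping = eax & 0xF;            // bits 3:0
  const uint32_t model = (eax >> 4) & 0xF;        // bits 7:4
  const uint32_t family = (eax >> 8) & 0xF;       // bits 11:8
  const uint32_t extModel = (eax >> 16) & 0xF;    // bits 19:16
  const uint32_t extFamily = (eax >> 20) & 0xFF;  // bits 27:20
  // Extended family is only added for base family 0xF.
  const uint32_t displayFamily = (family == 0xF) ? family + extFamily : family;
  // Extended model is only used for base families 0x6 and 0xF.
  const uint32_t displayModel =
      (family == 0x6 || family == 0xF) ? (extModel << 4) + model : model;
  return {displayFamily, displayModel, stepping};
}

// CPUID.01H:EDX bit 26 reports SSE2 support.
bool hasSse2(uint32_t edx) { return ((edx >> 26) & 1u) != 0; }
```

For the modeled `EAX = 0x000806C1` this gives family 6, model `0x8C`, stepping 1, and `hasSse2(0xBFEBFBFF)` is true.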
@@ -7,6 +7,8 @@
"calc_fib_no_opts.ll": "95eac2822d20843ec03f47938b07668015647faa145bd41c1c817631f5cf8efa",
"calc_grade.ll": "47185fc806b76a0110bfd17ddeaecca44877e3a20599c321df6e0249356e521f",
"calc_grade_no_opts.ll": "7e7f65cf09dce9191da2c60b594e18dc0e2f877c508e68a3f494cc5fdd74903c",
"calc_jumptable.ll": "2397e7224a59ffdd8fda86cc86baa4823ba266f296642717418bbf7709625b0e",
"calc_jumptable_no_opts.ll": "91b60f936b74db09c19cafd7ce9bed4308869e679e5689a71cf46d1545c5de52",
"calc_mixed.ll": "8f870788b97440903dd65ce6695386ec3d8e27ea24342fc54e6030ec9549fd96",
"calc_mixed_no_opts.ll": "e8fa960f15c84da1925ac2dd6487829b2fec46f98d5d6dc3f75bf7bcddfd4f36",
"calc_sum_array.ll": "08917712cf7089b66729376e5a37c09167928647baebf50c3c33def375556490",
@@ -27,6 +29,10 @@
"instr_sub_no_opts.ll": "4062c1f098be5b7123ce1add00c199ccd347d88605904b97f3c77ea4a546d6a5",
"instr_xor.ll": "19eb7c1c1d1fd33f109253cee4ed014aa067cb726c6c2b1e26888c9cbb397b3e",
"instr_xor_no_opts.ll": "2d664b454bd033c926efa2e392dadf3ad8b64232781ddbece4f7aa655eaa21bb",
"jumptable_basic.ll": "132a2713011521d75de43e3e3beed5c69a9bc91346941b73061ca45ba853a9e7",
"jumptable_basic_no_opts.ll": "8aedf3f7017c7adeb55532b2660f06c9e75deeac965cf588c23d39ee64393bac",
"jumptable_dense.ll": "1b086de44e6789640c51a100332ba19e5bbb3e5af77f990ee2d22f0ec63dffde",
"jumptable_dense_no_opts.ll": "e926d3d9f08b32fa8ea7fdfdaa6e53bdc4cc099f2ef37c1fec35d4769514b667",
"loop_simple.ll": "1a9fbbfe59fbfa540cdb79b36bbd09990abf22d98ce8c1e21ce1d3dd20a13f22",
"loop_simple_no_opts.ll": "40dfac372a06bdd1eae6f1eab2d60fe6eb8bd4bd336be3f96df281e777cd0be1",
"multi_arg.ll": "c5e6f9c37be0a60e2cd88e0503dfb33a7899fdb9fcf5d395637482cfe0ae2d4d",
File diff suppressed because it is too large.
@@ -1,6 +1,6 @@
{
"schema": "mergen-oracle-v1",
"generated_at_utc": "2026-03-06T15:56:02.761347+00:00",
"generated_at_utc": "2026-03-23T02:39:45.593139+00:00",
"source_seed_schema": "mergen-oracle-seed-v1",
"providers": [
"unicorn"
@@ -252,6 +252,134 @@
}
}
},
{
"name": "movdqa_xmm0_xmm1_basic",
"handler": "movdqa",
"oracle_mode": "unicorn",
"instruction_bytes": [
102,
15,
111,
193
],
"initial": {
"registers": {
"XMM0": "0x00112233445566778899aabbccddeeff",
"XMM1": "0xffeeddccbbaa99887766554433221100"
},
"flags": {}
},
"expected": {
"registers": {
"XMM0": "0xffeeddccbbaa99887766554433221100"
},
"flags": {}
},
"oracle_observations": {
"unicorn": {
"registers": {
"XMM0": "0xffeeddccbbaa99887766554433221100"
},
"flags": {}
}
}
},
{
"name": "pxor_xmm0_xmm1_basic",
"handler": "pxor",
"oracle_mode": "unicorn",
"instruction_bytes": [
102,
15,
239,
193
],
"initial": {
"registers": {
"XMM0": "0x00112233445566778899aabbccddeeff",
"XMM1": "0xffeeddccbbaa99887766554433221100"
},
"flags": {}
},
"expected": {
"registers": {
"XMM0": "0xffffffffffffffffffffffffffffffff"
},
"flags": {}
},
"oracle_observations": {
"unicorn": {
"registers": {
"XMM0": "0xffffffffffffffffffffffffffffffff"
},
"flags": {}
}
}
},
{
"name": "pand_xmm0_xmm1_basic",
"handler": "pand",
"oracle_mode": "unicorn",
"instruction_bytes": [
102,
15,
219,
193
],
"initial": {
"registers": {
"XMM0": "0xf0f0f0f0f0f0f0f00f0f0f0f0f0f0f0f",
"XMM1": "0x00ff00ff00ff00ffff00ff00ff00ff00"
},
"flags": {}
},
"expected": {
"registers": {
"XMM0": "0x00f000f000f000f00f000f000f000f00"
},
"flags": {}
},
"oracle_observations": {
"unicorn": {
"registers": {
"XMM0": "0x00f000f000f000f00f000f000f000f00"
},
"flags": {}
}
}
},
{
"name": "por_xmm0_xmm1_basic",
"handler": "por",
"oracle_mode": "unicorn",
"instruction_bytes": [
102,
15,
235,
193
],
"initial": {
"registers": {
"XMM0": "0xf0f0f0f0f0f0f0f00f0f0f0f0f0f0f0f",
"XMM1": "0x00ff00ff00ff00ffff00ff00ff00ff00"
},
"flags": {}
},
"expected": {
"registers": {
"XMM0": "0xf0fff0fff0fff0ffff0fff0fff0fff0f"
},
"flags": {}
},
"oracle_observations": {
"unicorn": {
"registers": {
"XMM0": "0xf0fff0fff0fff0ffff0fff0fff0fff0f"
},
"flags": {}
}
}
},
{
"name": "smoke_adc_adc",
"handler": "adc",
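Each new SSE vector encodes `<op> xmm0, xmm1` as the bytes `66 0F <opcode> C1`. The ModRM byte `0xC1` selects XMM0 as the destination and XMM1 as the source, and the expected XMM0 is a plain 128-bit copy or bitwise result. The sketch below recomputes those expectations from the `initial` registers as an independent cross-check on the Unicorn observations. `expected_xmm0` is a hypothetical helper that only handles this register-register form.

```python
# Opcode byte (after the 66 prefix and 0F escape) -> (mnemonic, op(dst, src)).
SSE_OPS = {
    0x6F: ("movdqa", lambda dst, src: src),
    0xEF: ("pxor", lambda dst, src: dst ^ src),
    0xDB: ("pand", lambda dst, src: dst & src),
    0xEB: ("por", lambda dst, src: dst | src),
}


def expected_xmm0(instruction_bytes, xmm0_hex, xmm1_hex):
    """Return (mnemonic, expected XMM0 as a 0x-prefixed 32-digit hex string)."""
    prefix, escape, opcode, modrm = instruction_bytes
    # Only the `op xmm0, xmm1` encoding used by these vectors is supported.
    if (prefix, escape, modrm) != (0x66, 0x0F, 0xC1):
        raise ValueError("expected 66 0F <opcode> C1 (xmm0, xmm1)")
    mnemonic, op = SSE_OPS[opcode]
    result = op(int(xmm0_hex, 16), int(xmm1_hex, 16)) & ((1 << 128) - 1)
    return mnemonic, f"0x{result:032x}"
```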
+9 -18
@@ -1,22 +1,6 @@
@echo off
setlocal
set "VSWHERE=%ProgramFiles(x86)%\Microsoft Visual Studio\Installer\vswhere.exe"
if not exist "%VSWHERE%" (
echo ERROR: vswhere.exe not found at "%VSWHERE%"
exit /b 1
)
set "VSROOT="
for /f "usebackq delims=" %%I in (`"%VSWHERE%" -latest -products * -requires Microsoft.VisualStudio.Component.VC.Tools.x86.x64 -property installationPath`) do set "VSROOT=%%I"
if not defined VSROOT (
echo ERROR: Visual Studio installation with VC tools not found
exit /b 1
)
call "%VSROOT%\Common7\Tools\VsDevCmd.bat" -arch=x64 -host_arch=x64
if errorlevel 1 exit /b 1
set "CMAKE_BIN="
for /f "usebackq delims=" %%I in (`where cmake 2^>nul`) do (
set "CMAKE_BIN=%%I"
@@ -30,7 +14,14 @@ if not defined CMAKE_BIN (
exit /b 1
)
for %%I in ("%~dp0..\..") do set "REPO_ROOT=%%~fI"
for %%I in ("%~dp0..\.." ) do set "REPO_ROOT=%%~fI"
"%CMAKE_BIN%" --build "%REPO_ROOT%\build_iced" --config Release --parallel 12
if not exist "%REPO_ROOT%\build_iced\CMakeCache.txt" (
echo ERROR: build_iced not configured. Run scripts\dev\configure_iced.cmd first.
exit /b 1
)
set "BUILD_JOBS=%MERGEN_BUILD_JOBS%"
if not defined BUILD_JOBS set "BUILD_JOBS=4"
"%CMAKE_BIN%" --build "%REPO_ROOT%\build_iced" --config Release --parallel %BUILD_JOBS%
exit /b %errorlevel%
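The Iced and Zydis build scripts now read the parallel job count from `MERGEN_BUILD_JOBS` and default to 4 instead of a hardcoded 12. In batch, `set "BUILD_JOBS=%MERGEN_BUILD_JOBS%"` leaves `BUILD_JOBS` undefined when the variable is unset or empty, so both cases fall back to 4. Any other value is passed to `--parallel` unvalidated. A Python rendering of the same rule, for illustration only (`resolve_build_jobs` is hypothetical):

```python
import os


def resolve_build_jobs(environ=None, default="4"):
    """Mirror the batch fallback: unset or empty MERGEN_BUILD_JOBS -> default.

    Like the scripts, the value is returned as-is (no numeric validation).
    """
    env = os.environ if environ is None else environ
    return env.get("MERGEN_BUILD_JOBS") or default
```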
+4 -18
@@ -1,22 +1,6 @@
@echo off
setlocal
set "VSWHERE=%ProgramFiles(x86)%\Microsoft Visual Studio\Installer\vswhere.exe"
if not exist "%VSWHERE%" (
echo ERROR: vswhere.exe not found at "%VSWHERE%"
exit /b 1
)
set "VSROOT="
for /f "usebackq delims=" %%I in (`"%VSWHERE%" -latest -products * -requires Microsoft.VisualStudio.Component.VC.Tools.x86.x64 -property installationPath`) do set "VSROOT=%%I"
if not defined VSROOT (
echo ERROR: Visual Studio installation with VC tools not found
exit /b 1
)
call "%VSROOT%\Common7\Tools\VsDevCmd.bat" -arch=x64 -host_arch=x64
if errorlevel 1 exit /b 1
set "CMAKE_BIN="
for /f "usebackq delims=" %%I in (`where cmake 2^>nul`) do (
set "CMAKE_BIN=%%I"
@@ -30,7 +14,7 @@ if not defined CMAKE_BIN (
exit /b 1
)
for %%I in ("%~dp0..\..") do set "REPO_ROOT=%%~fI"
for %%I in ("%~dp0..\.." ) do set "REPO_ROOT=%%~fI"
set "BUILD_DIR=%REPO_ROOT%\build_zydis"
if not exist "%BUILD_DIR%\CMakeCache.txt" (
@@ -45,5 +29,7 @@ if errorlevel 1 (
exit /b 1
)
"%CMAKE_BIN%" --build "%BUILD_DIR%" --config Release --parallel 12
set "BUILD_JOBS=%MERGEN_BUILD_JOBS%"
if not defined BUILD_JOBS set "BUILD_JOBS=4"
"%CMAKE_BIN%" --build "%BUILD_DIR%" --config Release --parallel %BUILD_JOBS%
exit /b %errorlevel%
+9 -16
View File
@@ -1,24 +1,13 @@
@echo off
setlocal
set "VSWHERE=%ProgramFiles(x86)%\Microsoft Visual Studio\Installer\vswhere.exe"
if not exist "%VSWHERE%" (
echo ERROR: vswhere.exe not found at "%VSWHERE%"
exit /b 1
)
set "VSROOT="
for /f "usebackq delims=" %%I in (`"%VSWHERE%" -latest -products * -requires Microsoft.VisualStudio.Component.VC.Tools.x86.x64 -property installationPath`) do set "VSROOT=%%I"
if not defined VSROOT (
echo ERROR: Visual Studio installation with VC tools not found
exit /b 1
)
call "%VSROOT%\Common7\Tools\VsDevCmd.bat" -arch=x64 -host_arch=x64
if errorlevel 1 exit /b 1
rem --- clang-cl auto-detects MSVC headers/libs; no VsDevCmd needed ---
:resolve_cargo
set "CARGO_BIN=%USERPROFILE%\.cargo\bin"
if exist "%CARGO_BIN%\cargo.exe" set "PATH=%CARGO_BIN%;%PATH%"
:resolve_cmake
set "CMAKE_BIN="
for /f "usebackq delims=" %%I in (`where cmake 2^>nul`) do (
set "CMAKE_BIN=%%I"
@@ -32,6 +21,7 @@ if not defined CMAKE_BIN (
exit /b 1
)
:resolve_llvm
set "LLVM_CMAKE_DIR=%LLVM_DIR%"
if not defined LLVM_CMAKE_DIR (
if exist "%~dp0..\..\..\llvm18-install\lib\cmake\llvm\LLVMConfig.cmake" set "LLVM_CMAKE_DIR=%~dp0..\..\..\llvm18-install\lib\cmake\llvm"
@@ -41,11 +31,14 @@ if not defined LLVM_CMAKE_DIR (
exit /b 1
)
for %%I in ("%~dp0..\..") do set "REPO_ROOT=%%~fI"
:resolve_compiler
for %%I in ("%~dp0..\.." ) do set "REPO_ROOT=%%~fI"
set "MERGEN_C_COMPILER=%CMAKE_C_COMPILER%"
if not defined MERGEN_C_COMPILER set "MERGEN_C_COMPILER=clang-cl"
set "MERGEN_CXX_COMPILER=%CMAKE_CXX_COMPILER%"
if not defined MERGEN_CXX_COMPILER set "MERGEN_CXX_COMPILER=%MERGEN_C_COMPILER%"
:configure
"%CMAKE_BIN%" -G Ninja -S "%REPO_ROOT%" -B "%REPO_ROOT%\build_iced" -DCMAKE_BUILD_TYPE=Release -DLLVM_DIR="%LLVM_CMAKE_DIR%" -DCMAKE_C_COMPILER="%MERGEN_C_COMPILER%" -DCMAKE_CXX_COMPILER="%MERGEN_CXX_COMPILER%" -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
exit /b %errorlevel%
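The reworked configure scripts drop the `VsDevCmd.bat` bootstrap (relying on clang-cl to locate the MSVC headers and libraries) and take compilers from the environment. They use `CMAKE_C_COMPILER` if set, otherwise `clang-cl`. For C++ they use `CMAKE_CXX_COMPILER` if set, otherwise whatever C compiler was resolved. A sketch of that precedence (`resolve_compilers` is a hypothetical name):

```python
def resolve_compilers(environ):
    """Return (c_compiler, cxx_compiler) using the configure scripts' fallbacks."""
    c_compiler = environ.get("CMAKE_C_COMPILER") or "clang-cl"
    # The C++ compiler falls back to the *resolved* C compiler, not to clang-cl directly.
    cxx_compiler = environ.get("CMAKE_CXX_COMPILER") or c_compiler
    return c_compiler, cxx_compiler
```

One consequence is that setting only `CMAKE_C_COMPILER=cl` also switches the C++ compiler to `cl`.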
+6 -16
View File
@@ -1,22 +1,9 @@
@echo off
setlocal
set "VSWHERE=%ProgramFiles(x86)%\Microsoft Visual Studio\Installer\vswhere.exe"
if not exist "%VSWHERE%" (
echo ERROR: vswhere.exe not found at "%VSWHERE%"
exit /b 1
)
set "VSROOT="
for /f "usebackq delims=" %%I in (`"%VSWHERE%" -latest -products * -requires Microsoft.VisualStudio.Component.VC.Tools.x86.x64 -property installationPath`) do set "VSROOT=%%I"
if not defined VSROOT (
echo ERROR: Visual Studio installation with VC tools not found
exit /b 1
)
call "%VSROOT%\Common7\Tools\VsDevCmd.bat" -arch=x64 -host_arch=x64
if errorlevel 1 exit /b 1
rem --- clang-cl auto-detects MSVC headers/libs; no VsDevCmd needed ---
:resolve_cmake
set "CMAKE_BIN="
for /f "usebackq delims=" %%I in (`where cmake 2^>nul`) do (
set "CMAKE_BIN=%%I"
@@ -30,6 +17,7 @@ if not defined CMAKE_BIN (
exit /b 1
)
:resolve_llvm
set "LLVM_CMAKE_DIR=%LLVM_DIR%"
if not defined LLVM_CMAKE_DIR (
if exist "%~dp0..\..\..\llvm18-install\lib\cmake\llvm\LLVMConfig.cmake" set "LLVM_CMAKE_DIR=%~dp0..\..\..\llvm18-install\lib\cmake\llvm"
@@ -39,7 +27,8 @@ if not defined LLVM_CMAKE_DIR (
exit /b 1
)
for %%I in ("%~dp0..\..") do set "REPO_ROOT=%%~fI"
:resolve_compiler
for %%I in ("%~dp0..\.." ) do set "REPO_ROOT=%%~fI"
set "BUILD_DIR=%REPO_ROOT%\build_zydis"
set "MERGEN_C_COMPILER=%CMAKE_C_COMPILER%"
@@ -47,6 +36,7 @@ if not defined MERGEN_C_COMPILER set "MERGEN_C_COMPILER=clang-cl"
set "MERGEN_CXX_COMPILER=%CMAKE_CXX_COMPILER%"
if not defined MERGEN_CXX_COMPILER set "MERGEN_CXX_COMPILER=%MERGEN_C_COMPILER%"
:configure
if exist "%BUILD_DIR%\CMakeCache.txt" (
echo INFO: Reconfiguring existing build_zydis cache for Zydis-only lane
echo INFO: Clearing backend-selection cache keys to prevent stale backend state
+3 -3
@@ -9,19 +9,19 @@ from typing import Any
_SEVERITY_ORDER = {"P0": 0, "P1": 1, "P2": 2, "P3": 3}
_STATUS_ICON = {"PASS": "\u2705", "FAIL": "\u274c", "SKIP": "\u23ed\ufe0f"}
_STATUS_ICON = {"PASS": "\u2705", "FAIL": "\u274c", "SKIP": "\u23ed\ufe0f", "BLOCKED": "\u26d4"}
def _verdict(payload: dict[str, Any]) -> str:
"""Determine review verdict from invariant results and verification runs.
FAIL in either → request_changes. Otherwise → approve.
FAIL or BLOCKED in either → request_changes. Otherwise → approve.
"""
for result in payload.get("invariant_results", []):
if str(result.get("status", "")).upper() == "FAIL":
return "request_changes"
for run in payload.get("verification_runs", []):
if str(run.get("status", "")).upper() == "FAIL":
if str(run.get("status", "")).upper() in {"FAIL", "BLOCKED"}:
return "request_changes"
return "approve"
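`_verdict` now treats a `BLOCKED` verification run like a failure. As shown in this hunk, only the `verification_runs` loop was widened. The `invariant_results` loop still compares against `FAIL` alone, even though the updated docstring says "FAIL or BLOCKED in either". The sketch below reproduces the post-change logic so that asymmetry is easy to see. Whether invariant results can ever carry a `BLOCKED` status is not stated here.

```python
from typing import Any


def verdict(payload: dict[str, Any]) -> str:
    """Reproduction of the post-change _verdict logic shown above."""
    for result in payload.get("invariant_results", []):
        # Unchanged by this commit: only FAIL is checked here.
        if str(result.get("status", "")).upper() == "FAIL":
            return "request_changes"
    for run in payload.get("verification_runs", []):
        # Widened by this commit: BLOCKED runs also request changes.
        if str(run.get("status", "")).upper() in {"FAIL", "BLOCKED"}:
            return "request_changes"
    return "approve"
```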
+1 -1
@@ -482,7 +482,7 @@ def main() -> None:
args = _parse_args()
repo_root = args.repo_root.resolve()
if args.paths:
if args.paths is not None:
changed_paths = [normalize_path(path) for path in args.paths]
else:
changed_paths = load_changed_paths(repo_root, args.base, args.head)
+1 -1
@@ -138,7 +138,7 @@ def main() -> None:
args = _parse_args()
repo_root = args.repo_root.resolve()
if args.paths:
if args.paths is not None:
changed_paths = [normalize_path(path) for path in args.paths]
else:
changed_paths = load_changed_paths(repo_root, args.base, args.head)
+1 -1
@@ -105,7 +105,7 @@ def _parse_args() -> argparse.Namespace:
def main() -> None:
args = _parse_args()
repo_root = args.repo_root.resolve()
if args.paths:
if args.paths is not None:
changed_paths = [normalize_path(path) for path in args.paths]
else:
changed_paths = load_changed_paths(repo_root, args.base, args.head)
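The three review scripts switch from `if args.paths:` to `if args.paths is not None:`. The difference only shows when `--paths` is given with no values. Assuming the option is declared with `nargs="*"` (its declaration is not shown in these hunks), argparse yields `[]` in that case. The old truthiness test treated `[]` like "not given" and fell back to `load_changed_paths`. The new test honors it as an explicitly empty path list. A self-contained illustration:

```python
import argparse


def path_source(argv):
    """Return (old_behavior, new_behavior) for a given command line.

    Assumes --paths is declared with nargs="*" (default None).
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--paths", nargs="*")
    args = parser.parse_args(argv)
    old = "explicit" if args.paths else "git-diff"
    new = "explicit" if args.paths is not None else "git-diff"
    return old, new
```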
+95 -37
@@ -1,75 +1,133 @@
@echo off
setlocal
:resolve_workdir
if "%~1"=="" (
set "WORKDIR=%~dp0..\..\..\rewrite-regression-work"
) else (
) else (
set "WORKDIR=%~1"
)
)
for %%I in ("%WORKDIR%") do set "WORKDIR=%%~fI"
:ensure_directories
if not exist "%WORKDIR%" mkdir "%WORKDIR%"
if not exist "%WORKDIR%\ir_outputs" mkdir "%WORKDIR%\ir_outputs"
set "VSWHERE=%ProgramFiles(x86)%\Microsoft Visual Studio\Installer\vswhere.exe"
if not exist "%VSWHERE%" (
echo ERROR: vswhere.exe not found at "%VSWHERE%"
exit /b 1
)
set "VSROOT="
for /f "usebackq delims=" %%I in (`"%VSWHERE%" -latest -products * -requires Microsoft.VisualStudio.Component.VC.Tools.x86.x64 -property installationPath`) do set "VSROOT=%%I"
if not defined VSROOT (
echo ERROR: Visual Studio installation with VC tools not found
exit /b 1
)
call "%VSROOT%\Common7\Tools\VsDevCmd.bat" -arch=x64 -host_arch=x64
if errorlevel 1 exit /b 1
:resolve_nasm
set "NASM_BIN="
if defined NASM_EXE (
set "NASM_BIN=%NASM_EXE%"
) else (
) else (
for /f "usebackq delims=" %%I in (`where nasm 2^>nul`) do (
set "NASM_BIN=%%I"
goto found_nasm
)
)
)
if exist "%~dp0..\..\..\nasm-portable\nasm-3.01\nasm.exe" set "NASM_BIN=%~dp0..\..\..\nasm-portable\nasm-3.01\nasm.exe"
:found_nasm
if not defined NASM_BIN (
echo ERROR: NASM not found. Install NASM or set NASM_EXE environment variable.
exit /b 1
)
)
:resolve_clang
set "CLANG_CL_BIN="
if defined CLANG_CL_EXE (
set "CLANG_CL_BIN=%CLANG_CL_EXE%"
) else (
for /f "usebackq delims=" %%I in (`where clang-cl 2^>nul`) do (
set "CLANG_CL_BIN=%%I"
goto found_clang
)
)
if exist "%~dp0..\..\..\llvm18-install\bin\clang-cl.exe" set "CLANG_CL_BIN=%~dp0..\..\..\llvm18-install\bin\clang-cl.exe"
if not defined CLANG_CL_BIN if exist "C:\Program Files\LLVM\bin\clang-cl.exe" set "CLANG_CL_BIN=C:\Program Files\LLVM\bin\clang-cl.exe"
:found_clang
if not defined CLANG_CL_BIN (
echo ERROR: clang-cl not found. Install LLVM or set CLANG_CL_EXE.
exit /b 1
)
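As written, `:resolve_nasm` and `:resolve_clang` only jump to their `found_*` label when the `where` lookup succeeds. When `NASM_EXE` or `CLANG_CL_EXE` is set, execution falls through to the bundled-path check. An existing `nasm-portable` or `llvm18-install` copy therefore appears to override the explicit environment variable. The sketch below models the clang-cl branch's effective precedence so the behavior can be checked. `resolve_clang_cl` is hypothetical, and `where_hit` stands in for the first `where clang-cl` result. The NASM block has the same shape, minus the Program Files fallback. Guarding the bundled check with `if not defined`, as the Program Files line already does, would likely let the explicit variable win.

```python
BUNDLED = "llvm18-install/bin/clang-cl.exe"
PROGRAM_FILES = "C:/Program Files/LLVM/bin/clang-cl.exe"


def resolve_clang_cl(env_value, where_hit, bundled_exists, program_files_exists):
    """Return the clang-cl path build_samples.cmd would end up with, or None.

    env_value: CLANG_CL_EXE (or None); where_hit: first `where clang-cl` hit (or None).
    """
    resolved = None
    if env_value:
        resolved = env_value
    elif where_hit:
        return where_hit  # `goto found_clang` skips every fallback below
    # Reached when the env var is set OR `where` found nothing.
    if bundled_exists:
        resolved = BUNDLED  # unconditional, so it also overwrites CLANG_CL_EXE
    if resolved is None and program_files_exists:
        resolved = PROGRAM_FILES
    return resolved
```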
:build_asm_samples
for %%F in ("%~dp0..\..\testcases\rewrite_smoke\*.asm") do (
"%NASM_BIN%" -f win64 -gcv8 -o "%WORKDIR%\%%~nF.obj" "%%~fF"
if errorlevel 1 exit /b 1
call :should_skip_build "%%~fF" "%WORKDIR%\%%~nF.obj" "%WORKDIR%\%%~nF.exe" "%WORKDIR%\%%~nF.map"
if not errorlevel 1 (
echo SKIP ASM up-to-date: %%~nxF
) else (
"%NASM_BIN%" -f win64 -gcv8 -o "%WORKDIR%\%%~nF.obj" "%%~fF"
if errorlevel 1 exit /b 1
link.exe /nologo /entry:start /subsystem:console /out:"%WORKDIR%\%%~nF.exe" /map:"%WORKDIR%\%%~nF.map" "%WORKDIR%\%%~nF.obj" kernel32.lib
if errorlevel 1 exit /b 1
)
"%CLANG_CL_BIN%" /nologo "%WORKDIR%\%%~nF.obj" kernel32.lib /link /entry:start /subsystem:console /out:"%WORKDIR%\%%~nF.exe" /map:"%WORKDIR%\%%~nF.map"
if errorlevel 1 exit /b 1
)
)
:build_c_samples_od
rem --- Compile C test programs (real binaries with CRT) ---
for %%F in ("%~dp0..\..\testcases\rewrite_smoke\*.c") do (
cl.exe /nologo /Od /GS- /c /Fo"%WORKDIR%\%%~nF.obj" "%%~fF"
if errorlevel 1 exit /b 1
echo %%~nF | findstr /I "_jumptable" >nul
if not errorlevel 1 (
echo SKIP C /Od pass for jumptable sample: %%~nxF
) else (
call :should_skip_build "%%~fF" "%WORKDIR%\%%~nF.obj" "%WORKDIR%\%%~nF.exe" "%WORKDIR%\%%~nF.map"
if not errorlevel 1 (
echo SKIP C up-to-date: %%~nxF
) else (
"%CLANG_CL_BIN%" /nologo /Od /GS- /c /Fo"%WORKDIR%\%%~nF.obj" "%%~fF"
if errorlevel 1 exit /b 1
link.exe /nologo /subsystem:console /out:"%WORKDIR%\%%~nF.exe" /map:"%WORKDIR%\%%~nF.map" "%WORKDIR%\%%~nF.obj"
if errorlevel 1 exit /b 1
)
"%CLANG_CL_BIN%" /nologo "%WORKDIR%\%%~nF.obj" /link /subsystem:console /out:"%WORKDIR%\%%~nF.exe" /map:"%WORKDIR%\%%~nF.map"
if errorlevel 1 exit /b 1
)
)
)
:build_c_samples_o2
rem --- Compile jump-table C tests with /O2 (need optimizer for real jmp tables) ---
for %%F in ("%~dp0..\..\testcases\rewrite_smoke\*_jumptable*.c") do (
call :should_skip_build "%%~fF" "%WORKDIR%\%%~nF.obj" "%WORKDIR%\%%~nF.exe" "%WORKDIR%\%%~nF.map"
if not errorlevel 1 (
echo SKIP C /O2 up-to-date: %%~nxF
) else (
"%CLANG_CL_BIN%" /nologo /O2 /GS- /c /Fo"%WORKDIR%\%%~nF.obj" "%%~fF"
if errorlevel 1 exit /b 1
"%CLANG_CL_BIN%" /nologo "%WORKDIR%\%%~nF.obj" /link /subsystem:console /out:"%WORKDIR%\%%~nF.exe" /map:"%WORKDIR%\%%~nF.map"
if errorlevel 1 exit /b 1
)
)
:build_cpp_samples
rem --- Compile C++ test programs (real binaries with CRT + STL) ---
for %%F in ("%~dp0..\..\testcases\rewrite_smoke\*.cpp") do (
cl.exe /nologo /Od /GS- /EHsc /c /Fo"%WORKDIR%\%%~nF.obj" "%%~fF"
if errorlevel 1 exit /b 1
call :should_skip_build "%%~fF" "%WORKDIR%\%%~nF.obj" "%WORKDIR%\%%~nF.exe" "%WORKDIR%\%%~nF.map"
if not errorlevel 1 (
echo SKIP C++ up-to-date: %%~nxF
) else (
"%CLANG_CL_BIN%" /nologo /Od /GS- /EHsc /c /Fo"%WORKDIR%\%%~nF.obj" "%%~fF"
if errorlevel 1 exit /b 1
link.exe /nologo /subsystem:console /out:"%WORKDIR%\%%~nF.exe" /map:"%WORKDIR%\%%~nF.map" "%WORKDIR%\%%~nF.obj"
if errorlevel 1 exit /b 1
)
"%CLANG_CL_BIN%" /nologo "%WORKDIR%\%%~nF.obj" /link /subsystem:console /out:"%WORKDIR%\%%~nF.exe" /map:"%WORKDIR%\%%~nF.map"
if errorlevel 1 exit /b 1
)
)
:done
echo Built rewrite regression samples in "%WORKDIR%"
exit /b 0
:should_skip_build
set "SRC=%~1"
set "OBJ=%~2"
set "EXE=%~3"
set "MAP=%~4"
powershell -NoProfile -ExecutionPolicy Bypass -Command ^
"$ErrorActionPreference='Stop';" ^
"$src=Get-Item -LiteralPath '%SRC%';" ^
"$outs=@('%OBJ%','%EXE%','%MAP%');" ^
"if(($outs | Where-Object { -not (Test-Path -LiteralPath $_) }).Count -gt 0){ exit 1 };" ^
"$latest=($outs | ForEach-Object { (Get-Item -LiteralPath $_).LastWriteTimeUtc } | Sort-Object -Descending | Select-Object -First 1);" ^
"if($latest -ge $src.LastWriteTimeUtc){ exit 0 } else { exit 1 }"
exit /b %errorlevel%
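`:should_skip_build` sets errorlevel 0 (skip) only when the `.obj`, `.exe`, and `.map` all exist and the newest of them is at least as recent as the source. Any missing output forces a rebuild. A Python equivalent of that PowerShell check is sketched below. `should_skip_build` mirrors the label name but is not an existing Python API. Like the script, it compares against the *newest* output. Comparing against the oldest would be the stricter choice if the outputs could ever be refreshed independently.

```python
from pathlib import Path


def should_skip_build(src: Path, outputs: list[Path]) -> bool:
    """True when every output exists and the newest one is not older than src."""
    if not all(p.exists() for p in outputs):
        return False
    newest = max(p.stat().st_mtime for p in outputs)
    return newest >= src.stat().st_mtime
```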
+228 -16
@@ -8,37 +8,62 @@
{ "line_all": ["mul i32", ", 3"] },
{ "line_all": ["add i32", ", 100"] },
"xor i32"
],
"semantic": [
{ "inputs": { "RCX": 0 }, "expected": 87, "label": "le path: (0+100)^0x33=87" },
{ "inputs": { "RCX": 3 }, "expected": 84, "label": "le path: (3+100)^0x33=84" },
{ "inputs": { "RCX": 5 }, "expected": 90, "label": "le boundary: (5+100)^0x33=90" },
{ "inputs": { "RCX": 6 }, "expected": 33, "label": "gt path: (6*3)^0x33=33" },
{ "inputs": { "RCX": 10 }, "expected": 45, "label": "gt path: (10*3)^0x33=45" }
]
},
{
"name": "stack",
"symbol": "stack_target",
"patterns": ["ret i64 1717986918"]
"patterns": ["ret i64 1717986918"],
"semantic": [
{ "expected": 1717986918, "label": "constant: 0x66666666" }
]
},
{
"name": "indirect",
"symbol": "jump_target",
"patterns": ["ret i64 53"]
"patterns": ["ret i64 53"],
"semantic": [
{ "expected": 53, "label": "constant: hardcoded case2 0x30+5" }
]
},
{
"name": "instr_add",
"symbol": "instr_add_target",
"patterns": ["ret i64 12"]
"patterns": ["ret i64 12"],
"semantic": [
{ "expected": 12, "label": "constant: 7+5" }
]
},
{
"name": "instr_sub",
"symbol": "instr_sub_target",
"patterns": ["ret i64 42"]
"patterns": ["ret i64 42"],
"semantic": [
{ "expected": 42, "label": "constant: 100-58" }
]
},
{
"name": "instr_xor",
"symbol": "instr_xor_target",
"patterns": ["ret i64 90"]
"patterns": ["ret i64 90"],
"semantic": [
{ "expected": 90, "label": "constant: 0x55^0x0F=0x5A=90" }
]
},
{
"name": "instr_rol",
"symbol": "instr_rol_target",
"patterns": ["ret i64 34"]
"patterns": ["ret i64 34"],
"semantic": [
{ "expected": 34, "label": "constant: rol(0x11,1)=0x22=34" }
]
},
{
"name": "nested_branch",
@@ -49,22 +74,45 @@
"select i1",
"i64 200, i64 300",
"i64 100"
],
"semantic": [
{ "inputs": { "RCX": 0 }, "expected": 100, "label": "<=10" },
{ "inputs": { "RCX": 5 }, "expected": 100, "label": "<=10 interior" },
{ "inputs": { "RCX": 10 }, "expected": 100, "label": "<=10 boundary" },
{ "inputs": { "RCX": 11 }, "expected": 200, "label": "11..20" },
{ "inputs": { "RCX": 15 }, "expected": 200, "label": "11..20 interior" },
{ "inputs": { "RCX": 20 }, "expected": 200, "label": "<=20 boundary" },
{ "inputs": { "RCX": 21 }, "expected": 300, "label": ">20" },
{ "inputs": { "RCX": 100 }, "expected": 300, "label": ">20 far" }
]
},
{
"name": "loop_simple",
"symbol": "loop_simple_target",
"patterns": ["ret i64 6"]
"patterns": ["ret i64 6"],
"semantic": [
{ "expected": 6, "label": "constant: 3+2+1" }
]
},
{
"name": "bitchain",
"symbol": "bitchain_target",
"patterns": ["ret i64 4090"]
"patterns": ["ret i64 4090"],
"semantic": [
{ "expected": 4090, "label": "constant: 0x0FFA" }
]
},
{
"name": "multi_arg",
"symbol": "multi_arg_target",
"patterns": ["trunc i64 %RCX to i32", "trunc i64 %RDX to i32", "add i32", "mul i32", "zext i32", "i128 %XMM0", "i128 %XMM15"]
"patterns": ["trunc i64 %RCX to i32", "trunc i64 %RDX to i32", "add i32", "mul i32", "zext i32", "i128 %XMM0", "i128 %XMM15"],
"semantic": [
{ "inputs": { "RCX": 5, "RDX": 3 }, "expected": 56, "label": "(5+3)*7" },
{ "inputs": { "RCX": 0, "RDX": 0 }, "expected": 0, "label": "(0+0)*7" },
{ "inputs": { "RCX": 10, "RDX": 4 }, "expected": 98, "label": "(10+4)*7" },
{ "inputs": { "RCX": 1, "RDX": 1 }, "expected": 14, "label": "(1+1)*7" },
{ "inputs": { "RCX": 100, "RDX": 0 }, "expected": 700, "label": "(100+0)*7" }
]
},
{
"name": "diamond",
@@ -74,6 +122,16 @@
"icmp eq i32",
"select i1",
{ "line_all": ["mul i32", ", 3"] }
],
"semantic": [
{ "inputs": { "RCX": 7 }, "expected": 51, "label": "odd: (7+10)*3" },
{ "inputs": { "RCX": 1 }, "expected": 33, "label": "odd: (1+10)*3" },
{ "inputs": { "RCX": 3 }, "expected": 39, "label": "odd: (3+10)*3" },
{ "inputs": { "RCX": 11 }, "expected": 63, "label": "odd: (11+10)*3" },
{ "inputs": { "RCX": 6 }, "expected": 3, "label": "even: (6-5)*3" },
{ "inputs": { "RCX": 8 }, "expected": 9, "label": "even: (8-5)*3" },
{ "inputs": { "RCX": 10 }, "expected": 15, "label": "even: (10-5)*3" },
{ "inputs": { "RCX": 100 }, "expected": 285, "label": "even: (100-5)*3" }
]
},
{
@@ -83,42 +141,104 @@
{ "line_all": ["icmp sgt i32", ", 10"] },
"select i1",
"i64 250, i64 150"
],
"semantic": [
{ "inputs": { "RCX": 0 }, "expected": 150, "label": "<=10: 100+50" },
{ "inputs": { "RCX": 10 }, "expected": 150, "label": "==10: not >10" },
{ "inputs": { "RCX": 11 }, "expected": 250, "label": ">10: 200+50" },
{ "inputs": { "RCX": 15 }, "expected": 250, "label": ">10 interior" },
{ "inputs": { "RCX": 100 }, "expected": 250, "label": ">10 far" }
]
},
{
"name": "calc_grade",
"symbol": "calc_grade",
"patterns": ["icmp slt i32 %0, 90", "icmp slt i32 %0, 80", "icmp slt i32 %0, 70", "icmp sgt i32 %0, 59", "phi i64", "ret i64 %common.ret.op"]
"patterns": ["icmp slt i32 %0, 90", "icmp slt i32 %0, 80", "icmp slt i32 %0, 70", "icmp sgt i32 %0, 59", "phi i64", "ret i64 %common.ret.op"],
"semantic": [
{ "inputs": { "RCX": 95 }, "expected": 4, "label": ">=90" },
{ "inputs": { "RCX": 90 }, "expected": 4, "label": "==90 boundary" },
{ "inputs": { "RCX": 89 }, "expected": 3, "label": "80..89" },
{ "inputs": { "RCX": 80 }, "expected": 3, "label": "==80 boundary" },
{ "inputs": { "RCX": 79 }, "expected": 2, "label": "70..79" },
{ "inputs": { "RCX": 70 }, "expected": 2, "label": "==70 boundary" },
{ "inputs": { "RCX": 69 }, "expected": 1, "label": "60..69" },
{ "inputs": { "RCX": 60 }, "expected": 1, "label": "==60 boundary" },
{ "inputs": { "RCX": 59 }, "expected": 0, "label": "<60" },
{ "inputs": { "RCX": 0 }, "expected": 0, "label": "<60 zero" },
{ "inputs": { "RCX": 100 }, "expected": 4, "label": ">=90 well above" }
]
},
{
"name": "calc_mixed",
"symbol": "calc_mixed",
"patterns": ["icmp slt i32 %0, 101", "select i1", "mul i32", "ret i64"]
"patterns": ["icmp slt i32 %0, 101", "select i1", "mul i32", "ret i64"],
"semantic": [
{ "inputs": { "RCX": 150 }, "expected": 576, "label": "x>100: (42+150)*3=576" },
{ "inputs": { "RCX": 101 }, "expected": 429, "label": "x>100: (42+101)*3=429" },
{ "inputs": { "RCX": 0 }, "expected": 126, "label": "x<=100: (42-0)*3=126" },
{ "inputs": { "RCX": 1 }, "expected": 123, "label": "x<=100: (42-1)*3=123" },
{ "inputs": { "RCX": 42 }, "expected": 0, "label": "x<=100: (42-42)*3=0" },
{ "inputs": { "RCX": 50 }, "expected": 4294967272, "label": "x<=100: uint32 wrap, zext" },
{ "inputs": { "RCX": 100 }, "expected": 4294967122, "label": "x<=100: uint32 wrap, zext" }
]
},
{
"name": "calc_fib",
"symbol": "calc_fib",
"patterns": ["ret i64 13"]
"patterns": ["ret i64 13"],
"semantic": [
{ "expected": 13, "label": "constant: fib(7)" }
]
},
{
"name": "calc_sum_array",
"symbol": "calc_sum_array",
"patterns": ["ret i64 150"]
"patterns": ["ret i64 150"],
"semantic": [
{ "expected": 150, "label": "constant: 10+20+30+40+50" }
]
},
{
"name": "switch_3way",
"symbol": "switch_3way_target",
"patterns": ["switch i32 %", "i32 1, label", "i32 2, label", "i32 3, label", "phi i64", "[ 100,", "[ 200,", "[ 300,", "[ 999,"]
"patterns": ["switch i32 %", "i32 1, label", "i32 2, label", "i32 3, label", "phi i64", "[ 100,", "[ 200,", "[ 300,", "[ 999,"],
"semantic": [
{ "inputs": { "RCX": 1 }, "expected": 100, "label": "case 1" },
{ "inputs": { "RCX": 2 }, "expected": 200, "label": "case 2" },
{ "inputs": { "RCX": 3 }, "expected": 300, "label": "case 3" },
{ "inputs": { "RCX": 0 }, "expected": 999, "label": "default (0)" },
{ "inputs": { "RCX": 4 }, "expected": 999, "label": "default (4)" },
{ "inputs": { "RCX": 100 }, "expected": 999, "label": "default (100)" }
]
},
{
"name": "calc_switch",
"symbol": "calc_switch",
"patterns": ["switch i32 %0", "i32 1, label", "i32 2, label", "i32 3, label", "i32 4, label", "i32 5, label", "phi i64"]
"patterns": ["switch i32 %0", "i32 1, label", "i32 2, label", "i32 3, label", "i32 4, label", "i32 5, label", "phi i64"],
"semantic": [
{ "inputs": { "RCX": 1 }, "expected": 6, "label": "Monday" },
{ "inputs": { "RCX": 2 }, "expected": 7, "label": "Tuesday" },
{ "inputs": { "RCX": 3 }, "expected": 9, "label": "Wednesday" },
{ "inputs": { "RCX": 4 }, "expected": 8, "label": "Thursday" },
{ "inputs": { "RCX": 5 }, "expected": 6, "label": "Friday" },
{ "inputs": { "RCX": 0 }, "expected": 0, "label": "default (0)" },
{ "inputs": { "RCX": 6 }, "expected": 0, "label": "default (6)" },
{ "inputs": { "RCX": 100 }, "expected": 0, "label": "default (100)" }
]
},
{
"name": "switch_sparse",
"symbol": "switch_sparse_target",
"patterns": ["switch i32 %", "i32 10, label", "i32 50, label", "i32 200, label", "i32 1000, label", "phi i64", "[ 11,", "[ 55,", "[ 222,", "[ 1337,", "[ 4294967295,"]
"patterns": ["switch i32 %", "i32 10, label", "i32 50, label", "i32 200, label", "i32 1000, label", "phi i64", "[ 11,", "[ 55,", "[ 222,", "[ 1337,", "[ 4294967295,"],
"semantic": [
{ "inputs": { "RCX": 10 }, "expected": 11, "label": "case 10" },
{ "inputs": { "RCX": 50 }, "expected": 55, "label": "case 50" },
{ "inputs": { "RCX": 200 }, "expected": 222, "label": "case 200" },
{ "inputs": { "RCX": 1000 }, "expected": 1337, "label": "case 1000" },
{ "inputs": { "RCX": 0 }, "expected": 4294967295, "label": "default: 0xFFFFFFFF" },
{ "inputs": { "RCX": 100 }, "expected": 4294967295, "label": "default" },
{ "inputs": { "RCX": 500 }, "expected": 4294967295, "label": "default" }
]
},
{
"name": "calc_cout",
@@ -126,6 +246,98 @@
"skip": true,
"skip_reason": "Statically-linked STL (cout) inlined by lifter; GEPTracker UNREACHABLE on complex library code. Blocked on inline policy improvements (Phase 2).",
"patterns": []
},
{
"name": "jumptable_basic",
"symbol": "jumptable_basic_target",
"patterns": [
"switch i32 %trunc7",
"i32 1073745941, label",
"i32 1073745948, label",
"i32 1073745955, label",
"phi i64",
"[ 10,",
"[ 20,",
"[ 30,",
"[ 40,",
"[ 999,"
],
"semantic": [
{ "inputs": { "RCX": 0 }, "expected": 10, "label": "case 0" },
{ "inputs": { "RCX": 1 }, "expected": 20, "label": "case 1" },
{ "inputs": { "RCX": 2 }, "expected": 30, "label": "case 2" },
{ "inputs": { "RCX": 3 }, "expected": 40, "label": "case 3" },
{ "inputs": { "RCX": 4 }, "expected": 999, "label": "default (>3)" },
{ "inputs": { "RCX": 100 }, "expected": 999, "label": "default far" }
]
},
{
"name": "jumptable_dense",
"symbol": "jumptable_dense_target",
"patterns": [
"switch i64 %lol-",
"i64 5368713237, label",
"i64 5368713244, label",
"i64 5368713251, label",
"i64 5368713258, label",
"i64 5368713265, label",
"i64 5368713272, label",
"i64 5368713279, label",
"phi i64",
"[ 100,",
"[ 200,",
"[ 300,",
"[ 400,",
"[ 500,",
"[ 600,",
"[ 700,",
"[ 800,",
"[ 0,"
],
"semantic": [
{ "inputs": { "RCX": 0 }, "expected": 100, "label": "case 0" },
{ "inputs": { "RCX": 1 }, "expected": 200, "label": "case 1" },
{ "inputs": { "RCX": 2 }, "expected": 300, "label": "case 2" },
{ "inputs": { "RCX": 3 }, "expected": 400, "label": "case 3" },
{ "inputs": { "RCX": 4 }, "expected": 500, "label": "case 4" },
{ "inputs": { "RCX": 5 }, "expected": 600, "label": "case 5" },
{ "inputs": { "RCX": 6 }, "expected": 700, "label": "case 6" },
{ "inputs": { "RCX": 7 }, "expected": 800, "label": "case 7" },
{ "inputs": { "RCX": 8 }, "expected": 0, "label": "default (>7)" },
{ "inputs": { "RCX": 100 }, "expected": 0, "label": "default far" }
]
},
{
"name": "calc_jumptable",
"symbol": "calc_jumptable",
"patterns": [
{ "line_all": ["icmp ult i32", ", 10"] },
{ "line_all": ["select i1", "i64 5368713307"] },
"[ 512,",
"i64 5368713265",
"i64 5368713271",
"i64 5368713277",
"i64 5368713283",
"i64 5368713289",
"i64 5368713295",
"i64 5368713301",
"i64 5368713307",
"phi i64"
],
"semantic": [
{ "inputs": { "RCX": -1 }, "expected": 4294967295, "label": "default (negative)" },
{ "inputs": { "RCX": 0 }, "expected": 1, "label": "2^0" },
{ "inputs": { "RCX": 1 }, "expected": 2, "label": "2^1" },
{ "inputs": { "RCX": 2 }, "expected": 4, "label": "2^2" },
{ "inputs": { "RCX": 3 }, "expected": 8, "label": "2^3" },
{ "inputs": { "RCX": 4 }, "expected": 16, "label": "2^4" },
{ "inputs": { "RCX": 5 }, "expected": 32, "label": "2^5" },
{ "inputs": { "RCX": 6 }, "expected": 64, "label": "2^6" },
{ "inputs": { "RCX": 7 }, "expected": 128, "label": "2^7" },
{ "inputs": { "RCX": 8 }, "expected": 256, "label": "2^8" },
{ "inputs": { "RCX": 9 }, "expected": 512, "label": "2^9" },
{ "inputs": { "RCX": 10 }, "expected": 4294967295, "label": "default (above range)" }
]
}
]
}
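The new `semantic` cases pin concrete input/output pairs, and their labels state the arithmetic. Two families are easy to misread. The `calc_mixed` results above 2^31 come from 32-bit wraparound followed by zero-extension to the `i64` return. `calc_jumptable` uses an unsigned range check (`icmp ult i32 ..., 10`), so `RCX = -1` takes the default path and returns `0xFFFFFFFF`. The reference models below are reconstructed from the labels and patterns only, not from the original C sources. They reproduce the listed expectations.

```python
MASK32 = 0xFFFFFFFF


def to_int32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed int32."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def calc_mixed(rcx: int) -> int:
    # Labels: x > 100 -> (42 + x) * 3, else (42 - x) * 3; signed compare,
    # 32-bit result zero-extended to i64.
    x = to_int32(rcx)
    result = (42 + x) * 3 if x > 100 else (42 - x) * 3
    return result & MASK32


def calc_jumptable(rcx: int) -> int:
    # Labels: 2**n for 0 <= n <= 9, default -1 returned as uint32 (0xFFFFFFFF).
    n = rcx & MASK32  # unsigned compare: -1 becomes 0xFFFFFFFF, which is >= 10
    return (1 << n) if n < 10 else MASK32
```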
@@ -165,6 +165,98 @@
}
}
},
{
"name": "movdqa_xmm0_xmm1_basic",
"handler": "movdqa",
"instruction_bytes": [
102,
15,
111,
193
],
"initial": {
"registers": {
"XMM0": "0x00112233445566778899aabbccddeeff",
"XMM1": "0xffeeddccbbaa99887766554433221100"
},
"flags": {}
},
"expected": {
"registers": {
"XMM0": null
},
"flags": {}
}
},
{
"name": "pxor_xmm0_xmm1_basic",
"handler": "pxor",
"instruction_bytes": [
102,
15,
239,
193
],
"initial": {
"registers": {
"XMM0": "0x00112233445566778899aabbccddeeff",
"XMM1": "0xffeeddccbbaa99887766554433221100"
},
"flags": {}
},
"expected": {
"registers": {
"XMM0": null
},
"flags": {}
}
},
{
"name": "pand_xmm0_xmm1_basic",
"handler": "pand",
"instruction_bytes": [
102,
15,
219,
193
],
"initial": {
"registers": {
"XMM0": "0xf0f0f0f0f0f0f0f00f0f0f0f0f0f0f0f",
"XMM1": "0x00ff00ff00ff00ffff00ff00ff00ff00"
},
"flags": {}
},
"expected": {
"registers": {
"XMM0": null
},
"flags": {}
}
},
{
"name": "por_xmm0_xmm1_basic",
"handler": "por",
"instruction_bytes": [
102,
15,
235,
193
],
"initial": {
"registers": {
"XMM0": "0xf0f0f0f0f0f0f0f00f0f0f0f0f0f0f0f",
"XMM1": "0x00ff00ff00ff00ffff00ff00ff00ff00"
},
"flags": {}
},
"expected": {
"registers": {
"XMM0": null
},
"flags": {}
}
},
{
"name": "smoke_adc_adc",
"handler": "adc",
@@ -165,6 +165,98 @@
}
}
},
{
"name": "movdqa_xmm0_xmm1_basic",
"handler": "movdqa",
"instruction_bytes": [
102,
15,
111,
193
],
"initial": {
"registers": {
"XMM0": "0x00112233445566778899aabbccddeeff",
"XMM1": "0xffeeddccbbaa99887766554433221100"
},
"flags": {}
},
"expected": {
"registers": {
"XMM0": null
},
"flags": {}
}
},
{
"name": "pxor_xmm0_xmm1_basic",
"handler": "pxor",
"instruction_bytes": [
102,
15,
239,
193
],
"initial": {
"registers": {
"XMM0": "0x00112233445566778899aabbccddeeff",
"XMM1": "0xffeeddccbbaa99887766554433221100"
},
"flags": {}
},
"expected": {
"registers": {
"XMM0": null
},
"flags": {}
}
},
{
"name": "pand_xmm0_xmm1_basic",
"handler": "pand",
"instruction_bytes": [
102,
15,
219,
193
],
"initial": {
"registers": {
"XMM0": "0xf0f0f0f0f0f0f0f00f0f0f0f0f0f0f0f",
"XMM1": "0x00ff00ff00ff00ffff00ff00ff00ff00"
},
"flags": {}
},
"expected": {
"registers": {
"XMM0": null
},
"flags": {}
}
},
{
"name": "por_xmm0_xmm1_basic",
"handler": "por",
"instruction_bytes": [
102,
15,
235,
193
],
"initial": {
"registers": {
"XMM0": "0xf0f0f0f0f0f0f0f00f0f0f0f0f0f0f0f",
"XMM1": "0x00ff00ff00ff00ffff00ff00ff00ff00"
},
"flags": {}
},
"expected": {
"registers": {
"XMM0": null
},
"flags": {}
}
},
{
"name": "smoke_adc_adc",
"handler": "adc",
-11
@@ -1,17 +1,6 @@
@echo off
setlocal
call "%~dp0..\dev\build_iced.cmd"
if errorlevel 1 exit /b 1
set "CMAKE_EXE=%ProgramFiles%\CMake\bin\cmake.exe"
if not exist "%CMAKE_EXE%" (
echo ERROR: CMake executable not found at "%CMAKE_EXE%"
exit /b 1
)
"%CMAKE_EXE%" --build "%~dp0..\..\build_iced" --target rewrite_microtests
if errorlevel 1 exit /b 1
set "FULL_SEED=%~dp0oracle_seed_full_handlers.json"
set "ENRICHED_SEED=%~dp0oracle_seed_full_handlers_enriched.json"
+63 -16
@@ -1,38 +1,85 @@
@echo off
setlocal
:setup
set "SCRIPT_DIR=%~dp0"
set "CHECK_FLAGS="
set "NO_BUILD="
set "FORCE_BUILD="
set "FORWARD_ARGS="
set "MICROTEST_EXE=%SCRIPT_DIR%..\..\build_iced\rewrite_microtests.exe"
:parse_args
if "%~1"=="" goto args_done
if /I "%~1"=="--check-flags" (
set "CHECK_FLAGS=1"
shift
goto parse_args
)
if /I "%~1"=="--no-build" (
set "NO_BUILD=1"
shift
goto parse_args
)
if /I "%~1"=="--build" (
set "FORCE_BUILD=1"
shift
goto parse_args
)
set "FORWARD_ARGS=%FORWARD_ARGS% %~1"
shift
goto parse_args
:args_done
if /I not "%NO_BUILD%"=="1" (
if /I "%FORCE_BUILD%"=="1" (
call :build_microtests
if errorlevel 1 exit /b 1
) else if not exist "%MICROTEST_EXE%" (
call :build_microtests
if errorlevel 1 exit /b 1
) else (
echo SKIP microtests build: existing executable "%MICROTEST_EXE%"
)
)
call "%~dp0..\dev\build_iced.cmd"
if errorlevel 1 exit /b 1
set "CMAKE_EXE=%ProgramFiles%\CMake\bin\cmake.exe"
if not exist "%CMAKE_EXE%" (
echo ERROR: CMake executable not found at "%CMAKE_EXE%"
exit /b 1
)
"%CMAKE_EXE%" --build "%~dp0..\..\build_iced" --target rewrite_microtests
if errorlevel 1 exit /b 1
:ensure_oracle
if /I not "%SKIP_ORACLE_GENERATION%"=="1" (
call "%~dp0generate_oracle_vectors.cmd"
call "%SCRIPT_DIR%generate_oracle_vectors.cmd"
if errorlevel 1 exit /b 1
)
set "MICROTEST_EXE=%~dp0..\..\build_iced\rewrite_microtests.exe"
:ensure_executable
if not exist "%MICROTEST_EXE%" (
echo ERROR: rewrite_microtests executable not found at "%MICROTEST_EXE%"
echo Run "%SCRIPT_DIR%run_microtests.cmd --build" or configure/build build_iced first.
exit /b 1
)
:run_tests
if /I "%CHECK_FLAGS%"=="1" (
set "MERGEN_TEST_CHECK_FLAGS=1"
echo Enabling strict oracle flag checks
)
)
"%MICROTEST_EXE%" %*
"%MICROTEST_EXE%"%FORWARD_ARGS%
exit /b %errorlevel%
:build_microtests
if not exist "%SCRIPT_DIR%..\..\build_iced\CMakeCache.txt" (
call "%SCRIPT_DIR%..\dev\configure_iced.cmd"
if errorlevel 1 exit /b 1
)
set "CMAKE_EXE="
for /f "usebackq delims=" %%I in (`where cmake 2^>nul`) do if not defined CMAKE_EXE set "CMAKE_EXE=%%I"
if not defined CMAKE_EXE if exist "C:\Program Files\CMake\bin\cmake.exe" set "CMAKE_EXE=C:\Program Files\CMake\bin\cmake.exe"
if not defined CMAKE_EXE (
echo ERROR: CMake executable not found in PATH
exit /b 1
)
set "BUILD_JOBS=%MERGEN_BUILD_JOBS%"
if not defined BUILD_JOBS set "BUILD_JOBS=4"
"%CMAKE_EXE%" --build "%SCRIPT_DIR%..\..\build_iced" --config Release --target rewrite_microtests --parallel %BUILD_JOBS%
exit /b %errorlevel%
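The reworked `run_microtests.cmd` spreads its control flow across several labels (`:parse_args`, `:args_done`, `:build_microtests`), which makes it easy to misread in batch syntax. The Python sketch below restates only that decision logic. Its function and class names (`parse_microtest_args`, `should_build`, `resolve_cmake`, `build_command`, `MicrotestOptions`) are hypothetical and do not exist in the repository. It does not invoke CMake or `rewrite_microtests.exe`. Two points it encodes are taken from the script: `--no-build` takes precedence over `--build`, and an unset `MERGEN_BUILD_JOBS` falls back to 4 parallel jobs. `shutil.which` is used here as an approximation of `where cmake`.

```python
import os
import shutil
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List, Optional


@dataclass
class MicrotestOptions:
    check_flags: bool = False
    no_build: bool = False
    force_build: bool = False
    forward_args: List[str] = field(default_factory=list)


def parse_microtest_args(argv: List[str]) -> MicrotestOptions:
    """Models :parse_args. Known switches are matched case-insensitively
    (like `if /I`), and every other argument is forwarded to the executable."""
    opts = MicrotestOptions()
    for arg in argv:
        key = arg.lower()
        if key == "--check-flags":
            opts.check_flags = True
        elif key == "--no-build":
            opts.no_build = True
        elif key == "--build":
            opts.force_build = True
        else:
            opts.forward_args.append(arg)
    return opts


def should_build(opts: MicrotestOptions, exe_exists: bool) -> bool:
    """Models :args_done. --no-build always wins. Otherwise build when
    --build was given or when the executable is missing."""
    if opts.no_build:
        return False
    return opts.force_build or not exe_exists


def resolve_cmake(fallback: str = r"C:\Program Files\CMake\bin\cmake.exe") -> Optional[str]:
    """Models the CMake lookup: first match on PATH, then the default install path."""
    found = shutil.which("cmake")
    if found:
        return found
    return fallback if os.path.exists(fallback) else None


def build_command(cmake: str, build_dir: Path, env: Dict[str, str]) -> List[str]:
    """Models the final `cmake --build` call. An unset (or empty)
    MERGEN_BUILD_JOBS falls back to 4 jobs."""
    jobs = env.get("MERGEN_BUILD_JOBS") or "4"
    return [cmake, "--build", str(build_dir), "--config", "Release",
            "--target", "rewrite_microtests", "--parallel", jobs]
```

For example, `parse_microtest_args(["--no-build", "--build"])` followed by `should_build(...)` returns `False` even when the executable is missing. The script then stops at `:ensure_executable` with an error instead of building.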
+20
@@ -17,6 +17,7 @@ FULL_VECTORS = ROOT / "lifter" / "test" / "test_vectors" / "oracle_vectors_full_
DEFAULT_VECTORS = ROOT / "lifter" / "test" / "test_vectors" / "oracle_vectors.json"
IR_OUTPUT_DIR = ROOT.parent / "rewrite-regression-work" / "ir_outputs"
GOLDEN_HASHES_FILE = ROOT / "lifter" / "test" / "test_vectors" / "golden_ir_hashes.json"
SEMANTIC_SCRIPT = REWRITE_DIR / "check_semantic.py"
def _run(argv: List[str], extra_env: Dict[str, str] | None = None) -> None:
@@ -186,6 +187,15 @@ def run_report(vectors_file: Path, as_json: bool) -> None:
_run([sys.executable, str(REWRITE_DIR / "report_coverage.py")] + args)
def run_semantic(filters: List[str] | None = None, input_ir: Path | None = None) -> None:
args = [sys.executable, str(SEMANTIC_SCRIPT), "--ir-dir", str(IR_OUTPUT_DIR)]
if filters:
args.extend(["--filter"] + filters)
if input_ir is not None:
args.extend(["--input-ir", str(input_ir)])
_run(args)
def run_negative_checks() -> None:
lifter_path = ROOT / "build_iced" / "lifter.exe"
if not lifter_path.exists():
@@ -347,6 +357,9 @@ def parse_args() -> argparse.Namespace:
report_cmd = sub.add_parser("report", help="print handler test coverage report")
report_cmd.add_argument("--json", action="store_true", help="output as JSON")
report_cmd.add_argument("--vectors", type=Path, default=None, help="explicit vectors file")
semantic = sub.add_parser("semantic", help="run runtime semantic regression for all samples")
semantic.add_argument("--input-ir", type=Path, default=None, help="override IR file (single sample)")
semantic.add_argument("filter", nargs="*", help="optional sample name filter tokens")
return parser.parse_args()
@@ -399,12 +412,18 @@ def main() -> None:
run_report(vectors_file, args.json)
return
if command == "semantic":
run_semantic(args.filter, args.input_ir)
return
if command == "flags":
run_flagstress(args.filter)
return
if command == "all":
run_baseline()
run_semantic()
run_full(check_flags=True)
if not args.no_coverage:
run_coverage(FULL_VECTORS)
@@ -412,6 +431,7 @@ def main() -> None:
if command == "quick":
run_baseline()
run_semantic()
run_micro([], check_flags=True, regenerate_oracle=False)
return
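The new `semantic` subcommand in `test.py` passes a positional `filter` list (declared with `nargs="*"`) and an optional `--input-ir` path into `run_semantic`. `run_semantic` then turns them into arguments for `check_semantic.py`. The sketch below reproduces that wiring in a standalone form. It assumes only the flags visible in the diff (`--ir-dir`, `--filter`, `--input-ir`). The helper names `build_parser` and `semantic_argv` are hypothetical, and the helper returns the argument list rather than executing it through `_run`.

```python
import argparse
import sys
from pathlib import Path
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="test.py")
    sub = parser.add_subparsers(dest="command", required=True)
    semantic = sub.add_parser("semantic", help="run runtime semantic regression for all samples")
    semantic.add_argument("--input-ir", type=Path, default=None, help="override IR file (single sample)")
    semantic.add_argument("filter", nargs="*", help="optional sample name filter tokens")
    return parser


def semantic_argv(script: Path, ir_dir: Path,
                  filters: Optional[List[str]] = None,
                  input_ir: Optional[Path] = None) -> List[str]:
    """Builds the same argv that run_semantic hands to _run."""
    args = [sys.executable, str(script), "--ir-dir", str(ir_dir)]
    if filters:  # nargs="*" yields [], so no --filter flag is emitted without tokens
        args.extend(["--filter"] + filters)
    if input_ir is not None:
        args.extend(["--input-ir", str(input_ir)])
    return args


ns = build_parser().parse_args(["semantic", "--input-ir", "x.ll", "jumptable_basic"])
print(semantic_argv(Path("check_semantic.py"), Path("ir_outputs"), ns.filter, ns.input_ir)[2:])
# prints ['--ir-dir', 'ir_outputs', '--filter', 'jumptable_basic', '--input-ir', 'x.ll']
```

The `all` and `quick` commands call `run_semantic()` with no arguments. That corresponds to `semantic_argv(script, ir_dir)`, which runs the semantic check over every sample in the IR output directory.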