mirror of
https://github.com/NaC-L/Mergen.git
synced 2026-05-12 09:40:34 +00:00
lifter: fix Cyrillic homoglyph in resolveTargetedThemidaR9 identifier (#119)
The identifier 'resolveTargetedThemid\u0430R9' (declared in LifterClass_Concolic.hpp) contained U+0430 (Cyrillic small letter a) instead of U+0061 (Latin a) between 'Themid' and 'R9'. Every in-tree reference mirrored the Cyrillic form, but prose mentions and merge titles (e.g. PR #115 title) used ASCII, so an ASCII grep for 'resolveTargetedThemidaR9' returned zero hits. This was a silent discoverability hazard for future sessions and grep-based tooling.

Rename to pure ASCII across the single declaration, the single caller in getLatestValueForKey, the six test entry points in lifter/test/Tester.hpp, and the four references in docs/LOOP_HANDLING.md.

No behavior change.

Verified:
- python test.py micro: all instruction microtests passed (including the three targeted_themida_r9_override_* cases)
- Themida reference sample (../testthemida/example2-virt.bin @ 0x140001000): 2544 instructions lifted, 0 warnings, 0 errors

Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
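For illustration only, a minimal sketch of how a homoglyph like this could be caught mechanically. The helper names `nonAsciiByteOffsets` and `isPureAscii` are hypothetical and not part of the Mergen tree; the sketch assumes identifiers are stored as UTF-8, where U+0430 encodes as the two bytes 0xD0 0xB0.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not in the repository): returns the byte offset of
// every non-ASCII byte (high bit set) in `text`. Assuming UTF-8 input, a
// single Cyrillic letter such as U+0430 contributes two offsets.
std::vector<std::size_t> nonAsciiByteOffsets(const std::string& text) {
  std::vector<std::size_t> offsets;
  for (std::size_t i = 0; i < text.size(); ++i) {
    if (static_cast<unsigned char>(text[i]) >= 0x80) {
      offsets.push_back(i);
    }
  }
  return offsets;
}

// Hypothetical helper: true when `identifier` consists of ASCII bytes only.
bool isPureAscii(const std::string& identifier) {
  return nonAsciiByteOffsets(identifier).empty();
}
```

Under these assumptions, passing the old spelling (written as `"resolveTargetedThemid\xD0\xB0R9"`) to `isPureAscii` returns false, while the renamed identifier passes. A check of this kind could run over source lines in CI, though whether that is worthwhile for this repository is a separate decision.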
@@ -133,7 +133,7 @@ When the lifter is in loop mode (`currentBlockUsesGeneralizedLoopState() == true
 | `retrieve_generalized_loop_target_slot_value(addr, bytes)` | Phi of canonical/backedge values for a recognized target slot. |
 | `retrieve_generalized_loop_phi_address_value(load, bytes, orgLoad)` | Phi of loaded values when the load's address is a phi of two concrete addresses derived from canonical/backedge. |
 | `retrieve_generalized_loop_local_phi_address_value(load, bytes, orgLoad)` | Same as above for loop-local stack-buffer addresses. |
-| `resolveTargetedThemidаR9(value)` | At three hardcoded Themida instruction addresses, replaces R9 with `(canonicalControl + offset, backedgeControl + offset)` phi. See [Hardcoded reference-sample addresses](#hardcoded-reference-sample-addresses). |
+| `resolveTargetedThemidaR9(value)` | At three hardcoded Themida instruction addresses, replaces R9 with `(canonicalControl + offset, backedgeControl + offset)` phi. See [Hardcoded reference-sample addresses](#hardcoded-reference-sample-addresses). |
 
 `computePossibleValues` (in `lifter/memory/GEPTracker.ipp`) also has a `PHINode` case that unions every incoming's value set, so callers downstream of these phis get the full possible-value enumeration instead of an empty fallback.
 
@@ -148,7 +148,7 @@ static constexpr std::array<uint64_t, 3> kSupportedGeneralizedControlFieldOffset
 0x6ULL, 0xAULL, 0xCULL};
 ```
 
-`resolveTargetedThemidаR9` adds three hardcoded `(instruction-address, control-offset)` pairs:
+`resolveTargetedThemidaR9` adds three hardcoded `(instruction-address, control-offset)` pairs:
 
 | Instruction address | Control offset | Verified hit count on reference sample |
 |---|---|---|
@@ -173,7 +173,7 @@ Loop handling has roughly thirty microtests in `lifter/test/Tester.hpp`. The mos
 | `pending_generalized_loop_*` | Same guards in the `pendingLoopGeneralizationAddresses` lifecycle. |
 | `generalized_loop_restore_*` | Backedge flag-state and register-state merging across `load_generalized_backup`. |
 | `generalized_loop_*_creates_phi` | Each `retrieve_generalized_loop_*` helper produces the expected phi shape (control slot, control slot displacement, target slot, control field load, local phi address). |
-| `targeted_themida_r9_override_produces_phi` | All three hardcoded `(address, offset)` pairs in `resolveTargetedThemidаR9`. |
+| `targeted_themida_r9_override_produces_phi` | All three hardcoded `(address, offset)` pairs in `resolveTargetedThemidaR9`. |
 | `compute_possible_values_*` | The PHI handler unions incomings (also covers cast-width preservation and rolled-arithmetic-chain enumeration). |
 
 When changing loop handling, run at minimum:
@@ -197,6 +197,6 @@ and inspect `output_diagnostics.json` for `lift_stats.instructions_lifted == 254
 |---|---|
 | `REP`/`REPE`/`REPNE`-prefixed `SCAS` | Rejected as `not_implemented`; needs a model for repeated-scan termination. |
 | `INT 2` continuation under VMP 3.6 | Naive architectural fallthrough is wrong; recovery requires modeling the dispatcher / exception-mediated control flow. See `VMP_TESTING_NOTES.md`. |
-| Hardcoded `(address, offset)` pairs in `resolveTargetedThemidаR9` | Only fire on the reference Themida sample. See [Hardcoded reference-sample addresses](#hardcoded-reference-sample-addresses). |
+| Hardcoded `(address, offset)` pairs in `resolveTargetedThemidaR9` | Only fire on the reference Themida sample. See [Hardcoded reference-sample addresses](#hardcoded-reference-sample-addresses). |
 | Loop unrolling / loop-invariant code motion | Not implemented. The lifter relies on LLVM's downstream optimization passes for this once the IR is in shape. |
 | Multi-way backedges (≥3 paths to the same header) | Not exercised by the current generalized-loop machinery; the canonical/backedge model assumes exactly two incoming paths. |
@@ -139,7 +139,7 @@ public:
 }
 }
 
-llvm::Value* resolveTargetedThemidаR9(llvm::Value* value) {
+llvm::Value* resolveTargetedThemidaR9(llvm::Value* value) {
 auto* state = getMostRecentGeneralizedLoopState();
 if (this->liftProgressDiagEnabled && this->current_address >= 0x140023500ULL &&
 this->current_address <= 0x140023800ULL) {
@@ -201,7 +201,7 @@ public:
 std::cout << "\n";
 }
 if (key == Register::R9) {
-return resolveTargetedThemidаR9(value);
+return resolveTargetedThemidaR9(value);
 }
 return value;
 }
@@ -2040,8 +2040,8 @@ bool runSolvePathResolvesGeneralizedPhiLoadTarget(std::string& details) {
 }
 
 
-bool runTargetedThemidаR9OverrideProducesPhi(std::string& details) {
-// resolveTargetedThemidаR9 hardcodes three instruction addresses where a
+bool runTargetedThemidaR9OverrideProducesPhi(std::string& details) {
+// resolveTargetedThemidaR9 hardcodes three instruction addresses where a
 // Themida cursor-derived R9 value must be rematerialized as a phi over
 // canonical/backedge control bases with a per-address offset. Verify
 // all three cases, not just one, so a regression that silently drops or
@@ -2124,8 +2124,8 @@ bool runSolvePathResolvesGeneralizedPhiLoadTarget(std::string& details) {
 }
 
 
-bool runTargetedThemidаR9OverrideDoesNotFireAtAdjacentAddress(std::string& details) {
-// The switch in resolveTargetedThemidаR9 is exact-address. A regression
+bool runTargetedThemidaR9OverrideDoesNotFireAtAdjacentAddress(std::string& details) {
+// The switch in resolveTargetedThemidaR9 is exact-address. A regression
 // that accidentally broadened it to a range (e.g. `addr >= 0x140023500 &&
 // addr <= 0x140023800`) would silently produce a phi at every R9 read in
 // that window, corrupting samples that rely on exact-match behavior.
@@ -2171,7 +2171,7 @@ bool runSolvePathResolvesGeneralizedPhiLoadTarget(std::string& details) {
 return true;
 }
 
-bool runTargetedThemidаR9OverrideFallsThroughWithoutLoopState(std::string& details) {
+bool runTargetedThemidaR9OverrideFallsThroughWithoutLoopState(std::string& details) {
 // Before any generalized-loop backup has been created,
 // getMostRecentGeneralizedLoopState() returns null and the override must
 // fall through to the ordinary R9 value instead of attempting to build a
@@ -3077,7 +3077,7 @@ bool runComputePossibleValuesOnRolledArithmeticChain(std::string& details) {
 runCustom("generalized_loop_restore_merges_backedge_flag_state",
 &InstructionTester::runGeneralizedLoopRestoreMergesBackedgeFlagState);
 runCustom("targeted_themida_r9_override_produces_phi",
-&InstructionTester::runTargetedThemidаR9OverrideProducesPhi);
+&InstructionTester::runTargetedThemidaR9OverrideProducesPhi);
 runCustom("generalized_loop_restore_merges_backedge_register_state",
 &InstructionTester::runGeneralizedLoopRestoreMergesBackedgeRegisterState);
 runCustom("set_register_value_zero_extends_32bit_writes",
@@ -3105,9 +3105,9 @@ bool runComputePossibleValuesOnRolledArithmeticChain(std::string& details) {
 runCustom("compute_possible_values_trunc_to_i1_preserves_width",
 &InstructionTester::runComputePossibleValuesTruncToI1PreservesWidth);
 runCustom("targeted_themida_r9_override_does_not_fire_at_adjacent_address",
-&InstructionTester::runTargetedThemidаR9OverrideDoesNotFireAtAdjacentAddress);
+&InstructionTester::runTargetedThemidaR9OverrideDoesNotFireAtAdjacentAddress);
 runCustom("targeted_themida_r9_override_falls_through_without_loop_state",
-&InstructionTester::runTargetedThemidаR9OverrideFallsThroughWithoutLoopState);
+&InstructionTester::runTargetedThemidaR9OverrideFallsThroughWithoutLoopState);
 runCustom("generalized_loop_control_field_load_creates_phi",
 &InstructionTester::runGeneralizedLoopControlFieldLoadCreatesPhi);
 runCustom("solve_path_prefers_mapped_target_over_null_for_indirect_jump",