lifter: fix Cyrillic homoglyph in resolveTargetedThemidaR9 identifier (#119)

The identifier 'resolveTargetedThemid\u0430R9' (declared in LifterClass_Concolic.hpp)
contained U+0430 (Cyrillic small letter a) instead of U+0061 (Latin a)
between 'Themid' and 'R9'. Every in-tree reference used the Cyrillic
form, but prose mentions and merge titles (e.g. the title of PR #115)
used ASCII, so grepping for the ASCII spelling
'resolveTargetedThemidaR9' returned zero hits. This was a silent
discoverability hazard for future sessions and grep-based tooling.
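For illustration, here is a minimal Python 3.9+ sketch of why the search
missed and how such identifiers can be flagged. It is not part of this
change, and `mixed_script_tokens` is a hypothetical helper. It treats any
`\w+` run that mixes ASCII and non-ASCII characters as a homoglyph
candidate, so legitimate symbols such as the `≥` in docs/LOOP_HANDLING.md
are not reported.

```python
import re
import unicodedata

# Identifier-like runs. For str patterns, \w is Unicode-aware by default,
# so it matches Cyrillic letters as well as ASCII ones.
TOKEN = re.compile(r"\w+")


def mixed_script_tokens(line: str) -> list[tuple[str, int, str]]:
    """Return (token, column, description) for each non-ASCII character
    inside a token that also contains ASCII characters (hypothetical helper)."""
    hits = []
    for match in TOKEN.finditer(line):
        token = match.group()
        if token.isascii() or not any(ch.isascii() for ch in token):
            continue  # pure ASCII or pure non-ASCII: not a mixed-script token
        for offset, ch in enumerate(token):
            if not ch.isascii():
                name = unicodedata.name(ch, "UNKNOWN")
                hits.append((token, match.start() + offset, f"U+{ord(ch):04X} {name}"))
    return hits


before = "return resolveTargetedThemid\u0430R9(value);"  # pre-fix spelling
after = "return resolveTargetedThemidaR9(value);"  # post-fix spelling

# The two lines render identically, but an ASCII substring search misses the old one.
print("resolveTargetedThemidaR9" in before)  # False
# U+0430 encodes to two UTF-8 bytes, so it can never match the ASCII byte 0x61 ('a').
print("\u0430".encode("utf-8"))  # b'\xd0\xb0'

for _token, column, description in mixed_script_tokens(before):
    print(column, description)  # 28 U+0430 CYRILLIC SMALL LETTER A
print(mixed_script_tokens(after))  # []
print(mixed_script_tokens("Multi-way backedges (≥3 paths)"))  # [] because ≥ is not a word character
```

Flagging only mixed tokens keeps the check quiet on non-ASCII prose. It
would miss an identifier spelled entirely in look-alike letters, which a
stricter rule (rejecting any non-ASCII character in source files) would
catch.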

Rename the identifier to pure ASCII across the single declaration, its
single caller in getLatestValueForKey, the eight references in
lifter/test/Tester.hpp (three test-function definitions, their three
runCustom registrations, and two comments), and the four references in
docs/LOOP_HANDLING.md. No behavior change.

Verified:
  - python test.py micro: all instruction microtests passed
    (including the three targeted_themida_r9_override_* cases)
  - Themida reference sample (../testthemida/example2-virt.bin @
    0x140001000): 2544 instructions lifted, 0 warnings, 0 errors

Co-authored-by: yusufcanislek <yusuf.canislek@meetdandy.com>
naci
2026-04-23 00:06:55 +03:00
committed by GitHub
parent 59a68af35f
commit 8d101dcc5a
3 changed files with 14 additions and 14 deletions
docs/LOOP_HANDLING.md +4 -4
@@ -133,7 +133,7 @@ When the lifter is in loop mode (`currentBlockUsesGeneralizedLoopState() == true
| `retrieve_generalized_loop_target_slot_value(addr, bytes)` | Phi of canonical/backedge values for a recognized target slot. |
| `retrieve_generalized_loop_phi_address_value(load, bytes, orgLoad)` | Phi of loaded values when the load's address is a phi of two concrete addresses derived from canonical/backedge. |
| `retrieve_generalized_loop_local_phi_address_value(load, bytes, orgLoad)` | Same as above for loop-local stack-buffer addresses. |
-| `resolveTargetedThemidаR9(value)` | At three hardcoded Themida instruction addresses, replaces R9 with `(canonicalControl + offset, backedgeControl + offset)` phi. See [Hardcoded reference-sample addresses](#hardcoded-reference-sample-addresses). |
+| `resolveTargetedThemidaR9(value)` | At three hardcoded Themida instruction addresses, replaces R9 with `(canonicalControl + offset, backedgeControl + offset)` phi. See [Hardcoded reference-sample addresses](#hardcoded-reference-sample-addresses). |
`computePossibleValues` (in `lifter/memory/GEPTracker.ipp`) also has a `PHINode` case that unions every incoming's value set, so callers downstream of these phis get the full possible-value enumeration instead of an empty fallback.
@@ -148,7 +148,7 @@ static constexpr std::array<uint64_t, 3> kSupportedGeneralizedControlFieldOffset
0x6ULL, 0xAULL, 0xCULL};
```
-`resolveTargetedThemidаR9` adds three hardcoded `(instruction-address, control-offset)` pairs:
+`resolveTargetedThemidaR9` adds three hardcoded `(instruction-address, control-offset)` pairs:
| Instruction address | Control offset | Verified hit count on reference sample |
|---|---|---|
@@ -173,7 +173,7 @@ Loop handling has roughly thirty microtests in `lifter/test/Tester.hpp`. The mos
| `pending_generalized_loop_*` | Same guards in the `pendingLoopGeneralizationAddresses` lifecycle. |
| `generalized_loop_restore_*` | Backedge flag-state and register-state merging across `load_generalized_backup`. |
| `generalized_loop_*_creates_phi` | Each `retrieve_generalized_loop_*` helper produces the expected phi shape (control slot, control slot displacement, target slot, control field load, local phi address). |
-| `targeted_themida_r9_override_produces_phi` | All three hardcoded `(address, offset)` pairs in `resolveTargetedThemidаR9`. |
+| `targeted_themida_r9_override_produces_phi` | All three hardcoded `(address, offset)` pairs in `resolveTargetedThemidaR9`. |
| `compute_possible_values_*` | The PHI handler unions incomings (also covers cast-width preservation and rolled-arithmetic-chain enumeration). |
When changing loop handling, run at minimum:
@@ -197,6 +197,6 @@ and inspect `output_diagnostics.json` for `lift_stats.instructions_lifted == 254
|---|---|
| `REP`/`REPE`/`REPNE`-prefixed `SCAS` | Rejected as `not_implemented`; needs a model for repeated-scan termination. |
| `INT 2` continuation under VMP 3.6 | Naive architectural fallthrough is wrong; recovery requires modeling the dispatcher / exception-mediated control flow. See `VMP_TESTING_NOTES.md`. |
-| Hardcoded `(address, offset)` pairs in `resolveTargetedThemidаR9` | Only fire on the reference Themida sample. See [Hardcoded reference-sample addresses](#hardcoded-reference-sample-addresses). |
+| Hardcoded `(address, offset)` pairs in `resolveTargetedThemidaR9` | Only fire on the reference Themida sample. See [Hardcoded reference-sample addresses](#hardcoded-reference-sample-addresses). |
| Loop unrolling / loop-invariant code motion | Not implemented. The lifter relies on LLVM's downstream optimization passes for this once the IR is in shape. |
| Multi-way backedges (≥3 paths to the same header) | Not exercised by the current generalized-loop machinery; the canonical/backedge model assumes exactly two incoming paths. |
LifterClass_Concolic.hpp +2 -2
@@ -139,7 +139,7 @@ public:
}
}
-llvm::Value* resolveTargetedThemidаR9(llvm::Value* value) {
+llvm::Value* resolveTargetedThemidaR9(llvm::Value* value) {
auto* state = getMostRecentGeneralizedLoopState();
if (this->liftProgressDiagEnabled && this->current_address >= 0x140023500ULL &&
this->current_address <= 0x140023800ULL) {
@@ -201,7 +201,7 @@ public:
std::cout << "\n";
}
if (key == Register::R9) {
-return resolveTargetedThemidаR9(value);
+return resolveTargetedThemidaR9(value);
}
return value;
}
lifter/test/Tester.hpp +8 -8
@@ -2040,8 +2040,8 @@ bool runSolvePathResolvesGeneralizedPhiLoadTarget(std::string& details) {
}
-bool runTargetedThemidаR9OverrideProducesPhi(std::string& details) {
-// resolveTargetedThemidаR9 hardcodes three instruction addresses where a
+bool runTargetedThemidaR9OverrideProducesPhi(std::string& details) {
+// resolveTargetedThemidaR9 hardcodes three instruction addresses where a
// Themida cursor-derived R9 value must be rematerialized as a phi over
// canonical/backedge control bases with a per-address offset. Verify
// all three cases, not just one, so a regression that silently drops or
@@ -2124,8 +2124,8 @@ bool runSolvePathResolvesGeneralizedPhiLoadTarget(std::string& details) {
}
-bool runTargetedThemidаR9OverrideDoesNotFireAtAdjacentAddress(std::string& details) {
-// The switch in resolveTargetedThemidаR9 is exact-address. A regression
+bool runTargetedThemidaR9OverrideDoesNotFireAtAdjacentAddress(std::string& details) {
+// The switch in resolveTargetedThemidaR9 is exact-address. A regression
// that accidentally broadened it to a range (e.g. `addr >= 0x140023500 &&
// addr <= 0x140023800`) would silently produce a phi at every R9 read in
// that window, corrupting samples that rely on exact-match behavior.
@@ -2171,7 +2171,7 @@ bool runSolvePathResolvesGeneralizedPhiLoadTarget(std::string& details) {
return true;
}
-bool runTargetedThemidаR9OverrideFallsThroughWithoutLoopState(std::string& details) {
+bool runTargetedThemidaR9OverrideFallsThroughWithoutLoopState(std::string& details) {
// Before any generalized-loop backup has been created,
// getMostRecentGeneralizedLoopState() returns null and the override must
// fall through to the ordinary R9 value instead of attempting to build a
@@ -3077,7 +3077,7 @@ bool runComputePossibleValuesOnRolledArithmeticChain(std::string& details) {
runCustom("generalized_loop_restore_merges_backedge_flag_state",
&InstructionTester::runGeneralizedLoopRestoreMergesBackedgeFlagState);
runCustom("targeted_themida_r9_override_produces_phi",
-&InstructionTester::runTargetedThemidаR9OverrideProducesPhi);
+&InstructionTester::runTargetedThemidaR9OverrideProducesPhi);
runCustom("generalized_loop_restore_merges_backedge_register_state",
&InstructionTester::runGeneralizedLoopRestoreMergesBackedgeRegisterState);
runCustom("set_register_value_zero_extends_32bit_writes",
@@ -3105,9 +3105,9 @@ bool runComputePossibleValuesOnRolledArithmeticChain(std::string& details) {
runCustom("compute_possible_values_trunc_to_i1_preserves_width",
&InstructionTester::runComputePossibleValuesTruncToI1PreservesWidth);
runCustom("targeted_themida_r9_override_does_not_fire_at_adjacent_address",
-&InstructionTester::runTargetedThemidаR9OverrideDoesNotFireAtAdjacentAddress);
+&InstructionTester::runTargetedThemidaR9OverrideDoesNotFireAtAdjacentAddress);
runCustom("targeted_themida_r9_override_falls_through_without_loop_state",
-&InstructionTester::runTargetedThemidаR9OverrideFallsThroughWithoutLoopState);
+&InstructionTester::runTargetedThemidaR9OverrideFallsThroughWithoutLoopState);
runCustom("generalized_loop_control_field_load_creates_phi",
&InstructionTester::runGeneralizedLoopControlFieldLoadCreatesPhi);
runCustom("solve_path_prefers_mapped_target_over_null_for_indirect_jump",