Recently, I’ve been working on a malware sample that arrived with almost no static strings and a completely empty Import Table, a clear sign that it has been obfuscated.


String Deobfuscation
Looking around the sample, I found multiple functions with the same pattern:
func -> call wrapper(some data address) -> complex decryption/reconstruction logic -> return a string
For example:
int sub_402F40() // Wrapper function for string deobfuscation
{
  return sub_401F30((int)&unk_44C000); // Deobfuscate data at unk_44C000
}

int __cdecl sub_401F30(int a1) // Deobfuscation logic that takes the encrypted data address
                               // as input and returns a pointer to the decrypted string
{
  __int64 v2; // [esp+0h] [ebp-24h]
  __int64 v3; // [esp+0h] [ebp-24h]
  int v4; // [esp+Ch] [ebp-18h]
  unsigned int v5; // [esp+14h] [ebp-10h]
  unsigned int j; // [esp+18h] [ebp-Ch]
  int v7; // [esp+1Ch] [ebp-8h]
  unsigned int i; // [esp+20h] [ebp-4h]

  if ( dword_44F9D0 )
    return dword_44F9D0;
  v7 = sub_438780(15);
  if ( !v7 )
    return 0;
  v2 = sub_402BD0(0);
  for ( i = 0; i + 4 <= 0xE; i += 4 )
  {
    v2 = sub_4038D0(v2, HIDWORD(v2));
    *(_DWORD *)(i + v7) = *(_DWORD *)(i + a1) ^ sub_403750(v2, HIDWORD(v2));
  }
  if ( i < 0xE )
  {
    v3 = sub_4038D0(v2, HIDWORD(v2));
    v5 = sub_403750(v3, HIDWORD(v3));
    while ( i < 0xE )
    {
      *(_BYTE *)(i + v7) = v5 ^ *(_BYTE *)(i + a1);
      ++i;
      v5 >>= 8;
    }
  }
  *(_BYTE *)(v7 + 14) = 0;
  v4 = sub_4039E0(&dword_44F9D0, v7, 0);
  if ( !v4 )
    return v7;
  for ( j = 0; j < 0xF; ++j )
    *(_BYTE *)(j + v7) = 0;
  sub_4386F0(v7);
  return v4;
}
This pattern strongly suggests string obfuscation where a generic helper function is used with different encrypted data blocks.
Running a debugger to examine one of these functions confirms this:


We can see that the encrypted data at unk_6BC010 has been decrypted to microsoft.com, which is 14 bytes long including the null terminator.
5A F5 8F F2 98 07 B5 F9 46 B6 6A 29 0E 00
m  i  c  r  o  s  o  f  t  .  c  o  m  \0
Good. Now I need to recover the strings for all the other functions. But how can that be done automatically?
1. Automatic Recovery
The strings are deobfuscated and returned by a function call, which is wrapped by another function. I could try to reproduce the decryption logic in Python, but there are multiple algorithms used, and they are complex.
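To give a sense of what a manual port involves, here is a minimal sketch of the loop structure in the helper shown earlier: full 4-byte chunks are XORed with one key dword each, and the tail bytes consume one more key dword from the low byte upward. The functions next_state and keystream_dword are hypothetical stand-ins for sub_4038D0 and sub_403750, whose real logic is not shown here, so this will not decrypt the sample's data as written.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def next_state(state: int) -> int:
    # Hypothetical stand-in for sub_4038D0: advance a 64-bit state (xorshift64).
    state ^= (state << 13) & MASK64
    state ^= state >> 7
    state ^= (state << 17) & MASK64
    return state


def keystream_dword(state: int) -> int:
    # Hypothetical stand-in for sub_403750: fold the state into 32 bits.
    return (state ^ (state >> 32)) & 0xFFFFFFFF


def xor_decode(blob: bytes, seed: int) -> bytes:
    out = bytearray(len(blob))
    state = seed
    i = 0
    # Full 4-byte chunks: one state step and one key dword per chunk.
    while i + 4 <= len(blob):
        state = next_state(state)
        key = keystream_dword(state)
        chunk = int.from_bytes(blob[i:i + 4], "little") ^ key
        out[i:i + 4] = chunk.to_bytes(4, "little")
        i += 4
    # Tail: one more key dword, consumed one byte at a time from the low end.
    if i < len(blob):
        state = next_state(state)
        key = keystream_dword(state)
        while i < len(blob):
            out[i] = blob[i] ^ (key & 0xFF)
            key >>= 8
            i += 1
    return bytes(out)
```

Because XOR is symmetric, the same routine encodes and decodes. Every helper with a different state or output function would need its own port, which is what makes this approach tedious across hundreds of strings.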
A more robust way is to use an emulator to run the wrappers and read back the result, without executing the sample itself. Let’s do a simple test against sub_672D60 with flare-emu:
import flare_emu

def emulate(target_ea):
    eh = flare_emu.EmuHelper()
    try:
        eh.emulateRange(target_ea, strict=True, hookApis=True)
        ret_ptr = eh.getRegVal("eax")
        if ret_ptr:
            return bytes(eh.getEmuString(ret_ptr)).decode("utf-8")
    except Exception as e:
        return f"Failed: {e}"
    return "No return value"
Result:
[*] Emulating 0x672d60...
[+] EAX returned: 0x0
[+] Result: 'No return value'
2. The Chokepoint
It failed. Most of the arithmetic was handled fine, but a closer look at the allocator, sub_6A8780, revealed the issue. It relies on this initialization routine, sub_6A8640:
void sub_6A8640()
{
  if ( !dword_6C2864 )
  {
    v0 = sub_6741F0(48); // Reads TEB/PEB
    dword_6C2864 = *(_DWORD *)(v0 + 24); // Gets ProcessHeap
    // Walks the PEB Ldr list to find kernel32.dll base address
    v1 = *(_DWORD *)(**(_DWORD **)(*(_DWORD *)(v0 + 12) + 20) + 16);
    // Resolves memory allocation APIs using hashes
    off_6C2868 = sub_6A8550(v1, -336842543);
    off_6C286C = sub_6A8550(v1, -1315160754);
    // ...
  }
}
flare-emu (powered by Unicorn) simulates raw CPU instructions. It doesn’t populate complex Windows structures like the PEB/TEB by default. When the code tried to walk the heap or the Ldr list to find kernel32.dll, it crashed into unmapped memory.
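The pointer chase in that routine can be modeled with a toy memory map to show why it fails in a bare emulator. This is only an illustration: it assumes sub_6741F0(48) returns the PEB pointer (fs:[0x30] on 32-bit Windows), uses the documented 32-bit offsets for PEB.Ldr (0x0C), PEB.ProcessHeap (0x18) and InMemoryOrderModuleList (0x14), and all addresses and names below are made up.

```python
class UnmappedRead(Exception):
    """Raised when the toy memory has nothing at the requested address."""


def read_dword(mem: dict, addr: int) -> int:
    # Mimic an emulator fault: any address that was never written is unmapped.
    if addr not in mem:
        raise UnmappedRead(hex(addr))
    return mem[addr]


def walk(mem: dict, peb: int):
    heap = read_dword(mem, peb + 0x18)    # PEB.ProcessHeap (v0 + 24)
    ldr = read_dword(mem, peb + 0x0C)     # PEB.Ldr (v0 + 12)
    first = read_dword(mem, ldr + 0x14)   # InMemoryOrderModuleList.Flink
    second = read_dword(mem, first)       # Flink of the first list entry
    base = read_dword(mem, second + 0x10) # DllBase, relative to the links field
    return heap, base


# Hypothetical addresses for a populated process environment.
PEB, LDR, ENTRY1, ENTRY2 = 0x7000, 0x8000, 0x9008, 0xA008
HEAP, DLL_BASE = 0x500000, 0x76000000


def build_fake_process() -> dict:
    return {
        PEB + 0x18: HEAP,
        PEB + 0x0C: LDR,
        LDR + 0x14: ENTRY1,
        ENTRY1: ENTRY2,
        ENTRY2 + 0x10: DLL_BASE,
    }
```

With the structures in place, walk(build_fake_process(), PEB) returns the heap handle and module base. With an empty map, the very first read raises UnmappedRead, which is roughly the situation the emulated code ends up in.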
3. Mocking the Heap
Since we can’t easily emulate the entire Windows environment, we can “mock” the heap allocator. We’ll replace the call to the internal heap allocation function with a hook that returns a valid buffer in the emulator’s memory.
import ida_funcs
import idaapi
import flare_emu

def emulate(target_ea):
    eh = flare_emu.EmuHelper()
    heap_alloc_ea = 0x6A8780
    print(f"[*] Emulating {hex(target_ea)} (heap_alloc @ {hex(heap_alloc_ea)})")

    def call_hook(address, argv, func_name, user_data):
        # Resolve target of the call
        call_target = eh.analysisHelper.getOpndValue(address, 0)
        target_func = ida_funcs.get_func(call_target)
        target_ea_res = target_func.start_ea if target_func else call_target
        if target_ea_res == heap_alloc_ea:
            size = argv[0] if argv and argv[0] else 0x100
            buf = eh.allocEmuMem(size)
            eh.uc.reg_write(eh.regs["eax"], buf)
            print(f" [hook] heap_alloc({hex(size)}) -> {hex(buf)}")
            eh.skipInstruction(user_data)

    # Emulate the function
    eh.emulateRange(
        target_ea,
        callHook=call_hook,
        skipCalls=False,
        hookApis=False,
        strict=True
    )

    # Get the return value (usually in EAX for x86)
    ret_ptr = eh.getRegVal("eax")
    if ret_ptr:
        try:
            raw_bytes = bytes(eh.getEmuString(ret_ptr))
            decoded_str = raw_bytes.decode("utf-8", errors="replace")
            return decoded_str
        except Exception as e:
            return f"Error reading string: {e}"
    return "No return value"

if __name__ == "__main__":
    target = 0x672D60
    result = emulate(target)
    print(f"\n[+] Result from {hex(target)}: {repr(result)}")
Running it again with the hook:
[*] Emulating 0x672d60 (heap_alloc @ 0x6a8780)
[hook] heap_alloc(0xe) -> 0x18000
[+] Result: 'microsoft.com'
Excellent. It worked.
Now I can write a script to search for all wrappers and recover the strings. The script detected and recovered about 350 strings.
wrapper_ea,wrapper_name,helper_ea,helper_name,data_ea,string,raw_hex
0x67c040,sub_67C040,0x678ae0,sub_678AE0,0x6bc234,\Login Data,5c4c6f67696e2044617461
0x67c070,sub_67C070,0x679a00,sub_679A00,0x6bc1e0,profiles_order,70726f66696c65735f6f72646572
0x67c0a0,sub_67C0A0,0x67a350,sub_67A350,0x6bc194,\AppData,5c41707044617461
0x67c0d0,sub_67C0D0,0x679b60,sub_679B60,0x6bc270,\logins.json,5c6c6f67696e732e6a736f6e
0x67c100,sub_67C100,0x679360,sub_679360,0x6bc294,\Web Data,5c5765622044617461
0x67c130,sub_67C130,0x67a020,sub_67A020,0x6bc240,\Login Data For Account,5c4c6f67696e204461746120466f72204163636f756e74
0x67c160,sub_67C160,0x67acd0,sub_67ACD0,0x6bc1d8,profile,70726f66696c65
0x67c1c0,sub_67C1C0,0x67ab00,sub_67AB00,0x6bc2b4,\formhistory.sqlite,5c666f726d686973746f72792e73716c697465
0x67c220,sub_67C220,0x679090,sub_679090,0x6bc218,\cookies.sqlite,5c636f6f6b6965732e73716c697465
0x67c2e0,sub_67C2E0,0x6791f0,sub_6791F0,0x6bc258,\Login Data For Account,5c4c6f67696e204461746120466f72204163636f756e74
...
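A results file in this shape is easy to load back for later steps. The sketch below is a hypothetical loader, not the script used above: it maps each wrapper address to its string and uses the raw_hex column as a sanity check against the decoded text. The two sample rows are copied from the output above.

```python
import csv
import io


def load_strings(csv_text: str) -> dict:
    """Map each wrapper address to its recovered string, checking raw_hex."""
    table = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = bytes.fromhex(row["raw_hex"])
        text = raw.decode("utf-8", errors="replace")
        if text != row["string"]:
            raise ValueError(f"{row['wrapper_ea']}: string/raw_hex mismatch")
        table[int(row["wrapper_ea"], 16)] = text
    return table


# Raw string so the leading backslashes in the paths stay literal.
SAMPLE = r"""wrapper_ea,wrapper_name,helper_ea,helper_name,data_ea,string,raw_hex
0x67c040,sub_67C040,0x678ae0,sub_678AE0,0x6bc234,\Login Data,5c4c6f67696e2044617461
0x67c160,sub_67C160,0x67acd0,sub_67ACD0,0x6bc1d8,profile,70726f66696c65
"""

strings_by_wrapper = load_strings(SAMPLE)
```

Keeping the raw bytes next to the decoded text is useful when a string contains commas or non-printable characters that a plain CSV column would mangle.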
4. Patching for Readability
After recovering about 350 strings, I wanted to make the IDB actually readable. Since these wrappers take no arguments and return a pointer in EAX, we can replace the call instruction (5 bytes) with a mov eax, offset plaintext (also 5 bytes).
I used a script to create a new .deobf segment for the strings and patch every call site:
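import ida_bytes
import idc

def patch_callsite(call_ea, string_ea, text):
    # Replace the 5-byte 'call wrapper' (E8 rel32) with 'mov eax, imm32' (B8 imm32)
    patch = b"\xB8" + string_ea.to_bytes(4, "little")
    ida_bytes.patch_bytes(call_ea, patch)
    # Leave a non-repeatable comment with the plaintext at the call site
    idc.set_cmt(call_ea, f"deobf: {text}", 0)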
The transformation is dramatic:
Before: Mysterious calls to wrapper functions.
After: Clear, readable string references.
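The address arithmetic behind the patch can be checked outside IDA. The sketch below lays out null-terminated strings back to back in a new segment and builds the 5-byte mov eax, imm32 encoding for each. The segment base 0x6D0000 and the helper names are assumptions for illustration.

```python
def layout_strings(strings, seg_base):
    """Assign each unique string an address inside a new segment."""
    addrs, blob = {}, bytearray()
    for s in strings:
        if s in addrs:
            continue  # identical strings share one copy
        addrs[s] = seg_base + len(blob)
        blob += s.encode("utf-8") + b"\x00"
    return addrs, bytes(blob)


def mov_eax_imm32(addr):
    """Encode `mov eax, imm32`: opcode B8 plus a little-endian dword."""
    return b"\xB8" + addr.to_bytes(4, "little")


DEOBF_BASE = 0x6D0000  # hypothetical base of the .deobf segment
addrs, blob = layout_strings(["profile", "\\AppData", "profile"], DEOBF_BASE)
patch = mov_eax_imm32(addrs["\\AppData"])
```

The patch is exactly as long as the call rel32 it replaces, so no surrounding instructions move and no relocation of the function body is needed.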
IAT Reconstruction
The empty Import Table was the next hurdle. The malware resolves everything at runtime using get_proc_address_by_hash(module_hash, api_hash).

By identifying the hashing algorithm (a simple ROR-based additive hash), I could brute-force the names against a database of common Windows APIs. A total of approximately 80 API hashes were resolved.
| Address | Hash | Resolved Name |
|---|---|---|
| 0x00404099 | 0x529293ba | ntdll.dll:LdrLoadDll |
| 0x0040414f | 0xb302a467 | user32.dll:GetKeyboardLayoutList |
| 0x0040d9a9 | 0xbf885a30 | kernel32.dll:CreateProcessA |
| 0x00414ca9 | 0xe74bb84c | crypt32.dll:CryptUnprotectData |
| 0x00414d14 | 0xeea87529 | kernel32.dll:LocalFree |
| 0x00414e29 | 0x529293ba | ntdll.dll:LdrLoadDll |
| 0x00416e2e | 0x529293ba | ntdll.dll:LdrLoadDll |
| 0x00416ecd | 0xb0b37f1a | ws2_32.dll:WSAStartup |
| 0x00416ee5 | 0xcea5640f | ws2_32.dll:WSACleanup |
| 0x00416efe | 0xbd669943 | ws2_32.dll:getaddrinfo |
| … | … | … |
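A lookup like the one behind this table can be sketched as a precomputed dictionary from hash to name. The exact hash used by the sample is not reproduced here: the version below assumes the common rotate-right-by-13-then-add variant over the ASCII export name, ignores the module hash, and uses a tiny hypothetical export list, so its values are not expected to match the table.

```python
def ror32(value, bits):
    """Rotate a 32-bit value right by `bits`."""
    return ((value >> bits) | (value << (32 - bits))) & 0xFFFFFFFF


def ror_add_hash(name, rotation=13):
    # Assumed variant: rotate the accumulator, then add the next character.
    h = 0
    for ch in name.encode("ascii"):
        h = (ror32(h, rotation) + ch) & 0xFFFFFFFF
    return h


def build_lookup(exports):
    """exports: {dll_name: [api names]} -> {hash: 'dll:api'}"""
    lookup = {}
    for dll, names in exports.items():
        for name in names:
            # Keep the first name seen if two names collide.
            lookup.setdefault(ror_add_hash(name), f"{dll}:{name}")
    return lookup


def resolve(hashes, lookup):
    return {h: lookup.get(h, "<unresolved>") for h in hashes}


# Hypothetical mini database; a real one would come from DLL export tables.
EXPORTS = {
    "kernel32.dll": ["LocalFree", "CreateProcessA"],
    "ws2_32.dll": ["WSAStartup"],
}
LOOKUP = build_lookup(EXPORTS)
```

Precomputing the table once turns each resolution into a dictionary lookup, which matters when the export database covers many DLLs.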
Finally, I reconstructed a full Import Address Table (IAT) in a new segment and patched the resolution logic to point directly to these entries. This allows IDA’s auto-analysis to resolve cross-references and library calls correctly.

Before: A call to the custom hash resolver with two opaque constants.
After: The call is patched to point directly to the reconstructed IAT, allowing IDA to resolve the API name.
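The slot bookkeeping for a rebuilt IAT can be sketched as follows. This is an assumed layout rather than the script used here: each unique API gets one 4-byte slot, and every call site that resolved to the same API maps to the same slot. The base address 0x6E0000 is hypothetical, and the call sites are taken from the table above.

```python
def build_iat(resolved, iat_base, slot_size=4):
    """resolved: list of (call_site_ea, 'dll:api') pairs.

    Returns (slots, call_map): slot address per API, and slot address
    per call site.
    """
    slots = {}
    call_map = {}
    for call_ea, name in resolved:
        if name not in slots:
            # Next free slot: slots are handed out in first-seen order.
            slots[name] = iat_base + slot_size * len(slots)
        call_map[call_ea] = slots[name]
    return slots, call_map


IAT_BASE = 0x6E0000  # hypothetical base of the .iat segment
RESOLVED = [
    (0x00404099, "ntdll.dll:LdrLoadDll"),
    (0x0040414F, "user32.dll:GetKeyboardLayoutList"),
    (0x00414E29, "ntdll.dll:LdrLoadDll"),
]
slots, call_map = build_iat(RESOLVED, IAT_BASE)
```

Sharing one slot per API keeps cross-references grouped: all three LdrLoadDll call sites in the table would show up as xrefs to a single named entry.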
Final Result
By adding custom segments, we’ve essentially “normalized” the malware.
Before patching: A standard packed-looking layout.
After patching: The addition of .deobf and .iat provides a dedicated space for our recovered data.
The result is a binary that reads almost as if it had never been obfuscated in the first place, which makes deep-dive analysis much faster.