At the end of the last part, I drew your attention to the fact that Mingw32 doesn’t produce movable binaries: it cannot create a relocation table. You can force it to set the “DLL can move” flag, but without a relocation table, the binary would not work. We are going to change our packer to handle such non-movable binaries.
A problem, and its solution
To handle a non-movable binary, we are going to place ourselves (the unpacker) at its image base, and pre-allocate its memory in one of our own sections. Right now, our packed binary looks like this in memory:
VA | RVA | Content |
---|---|---|
0x00400000 | 0 | unpacker PE header |
0x00401000 | 0x1000 | unpacker .text section |
0x00402000 | 0x2000 | unpacker .rdata section |
0x00403000 | 0x3000 | unpacker .eh_fram section |
0x00404000 | 0x4000 | unpacker .idata section |
0x00405000 | 0x5000 | unpacker .packed section |
But if we want to pack a binary that expects to be placed at the VA 0x00400000, like we are, we cannot load it: we are already at this place, and we would be writing over our own code. We could try to place ourselves somewhere else and hope to be able to allocate the packed binary ImageBase with VirtualAlloc, but there is no guarantee it would work: the OS could already have placed something there, like Kernel32.dll.
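The conflict described above is just two address ranges overlapping. Here is a minimal sketch of that check; the unpacker size is read off the table above (assuming the .packed section fits in one page), and the packed binary size is a made-up value for illustration.

```python
def ranges_overlap(start_a, size_a, start_b, size_b):
    """True if [start_a, start_a + size_a) and [start_b, start_b + size_b) intersect."""
    return start_a < start_b + size_b and start_b < start_a + size_a

# Unpacker image, from the table above (assumes .packed fits in one page)
UNPACKER_BASE, UNPACKER_SIZE = 0x00400000, 0x6000
# Hypothetical non-movable binary that also wants to live at 0x00400000
PACKED_BASE, PACKED_SIZE = 0x00400000, 0x8000

conflict = ranges_overlap(UNPACKER_BASE, UNPACKER_SIZE, PACKED_BASE, PACKED_SIZE)
print(conflict)
```

As long as both images claim the same base, the ranges always overlap, whatever their sizes.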
So, to make sure everything runs smoothly, we are going to place ourselves at the packed binary image base, on purpose, but we’ll leave room in memory for loading it. We’ll get something like this:
VA | RVA | Content |
---|---|---|
image base of the packed binary | 0 | unpacker PE header |
 | 0x1000 | .alloc section, for the packed binary loading |
 | 0x1000 + size of the packed binary in memory | sections of the unpacker |
 | … | sections of the unpacker |
 | 0x5000 + size of the packed binary in memory | .packed section, with the packed PE file |
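To make the table concrete, here is a small sketch that derives the new layout from the old one. The section names and RVAs come from the first table; `shifted_layout` is a hypothetical helper, and the size of the packed binary in memory (0x8000) is an arbitrary example, assumed to already be a multiple of the section alignment.

```python
def shifted_layout(old_layout, alloc_size, first_rva=0x1000):
    """Put .alloc at first_rva and push every unpacker section up by alloc_size."""
    layout = [(".alloc", first_rva)]
    layout += [(name, rva + alloc_size) for name, rva in old_layout]
    return layout

old_layout = [
    (".text", 0x1000),
    (".rdata", 0x2000),
    (".eh_fram", 0x3000),
    (".idata", 0x4000),
    (".packed", 0x5000),
]

# Hypothetical size of the packed binary sections in memory
layout = shifted_layout(old_layout, alloc_size=0x8000)
for name, rva in layout:
    print(f"{hex(rva):>8}  {name}")
```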
We would be placing the packed binary at its expected Image Base, the same as ours. We can load its sections in memory, because we already allocated space for them in the .alloc section. We can replace our own PE header by changing the memory page permissions, that works fine (some malware actually does this). Basically, we have everything we need, so let’s program it.
Modifying the unpacking stub
There is little to do here. The first thing is to check the packed PE header for ASLR. If the binary cannot move, we don’t allocate memory: the packer will already have reserved it in the .alloc section. We use the current module address as the image base:
char* ImageBase = NULL;
if(p_NT_HDR->OptionalHeader.DllCharacteristics & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE) {
    ImageBase = (char*) VirtualAlloc(NULL, p_NT_HDR->OptionalHeader.SizeOfImage, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if(ImageBase == NULL) {
        // Allocation failed
        return NULL;
    }
} else {
    // if no ASLR: the packer has already placed us at the expected image base
    ImageBase = (char*) GetModuleHandleA(NULL);
}
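The flag test above is a plain bit mask. As a quick sketch on raw integers, in Python: `IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE` is 0x0040 in the PE format, while `can_move` is a hypothetical helper name used only for this illustration.

```python
IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE = 0x0040  # "DLL can move" bit

def can_move(dll_characteristics):
    """True if the DYNAMIC_BASE bit is set in the DllCharacteristics field."""
    return (dll_characteristics & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE) != 0

print(can_move(0x8140))  # DYNAMIC_BASE | NX_COMPAT | TERMINAL_SERVER_AWARE
print(can_move(0x0000))  # no flag at all: the binary must stay at its image base
```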
When loading the sections, we should call VirtualProtect a few times: once to make the PE header writable (necessary), and then for each section (optional), to make sure we can write the packed binary sections data in the .alloc section:
DWORD oldProtect;
// The PE header is read-only, we have to make it writable to be able to change it
VirtualProtect(ImageBase, p_NT_HDR->OptionalHeader.SizeOfHeaders, PAGE_READWRITE, &oldProtect);
mymemcpy(ImageBase, PE_data, p_NT_HDR->OptionalHeader.SizeOfHeaders);
// Section headers start right after the IMAGE_NT_HEADERS struct, so we do some pointer arithmetic-fu here.
IMAGE_SECTION_HEADER* sections = (IMAGE_SECTION_HEADER*) (p_NT_HDR + 1);
// For each section
for(int i=0; i<p_NT_HDR->FileHeader.NumberOfSections; ++i) {
    // calculate the VA we need to copy the content to, from the RVA
    // sections[i].VirtualAddress is an RVA, mind it
    char* dest = ImageBase + sections[i].VirtualAddress;
    // check if there is raw data to copy
    if(sections[i].SizeOfRawData > 0) {
        // A VirtualProtect to be sure we can write in the allocated section
        VirtualProtect(dest, sections[i].SizeOfRawData, PAGE_READWRITE, &oldProtect);
        // We copy SizeOfRawData bytes, from the offset PointerToRawData in the file
        mymemcpy(dest, PE_data + sections[i].PointerToRawData, sections[i].SizeOfRawData);
    } else {
        // if there is no raw data to copy, we just put zeroes, based on the VirtualSize
        VirtualProtect(dest, sections[i].Misc.VirtualSize, PAGE_READWRITE, &oldProtect);
        mymemset(dest, 0, sections[i].Misc.VirtualSize);
    }
}
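The copy logic of this loop can be tried out without any Windows API by simulating the image with a bytearray. This is only a sketch of the same steps: `load_sections` is a hypothetical name, sections are plain tuples instead of IMAGE_SECTION_HEADER structs, and the sample data is made up.

```python
def load_sections(pe_data, sections, size_of_headers, size_of_image):
    """Mimic the stub: copy headers, then each section to its RVA.

    sections: list of (rva, virtual_size, raw_offset, raw_size) tuples.
    """
    image = bytearray(size_of_image)  # zero-filled, like freshly reserved memory
    image[:size_of_headers] = pe_data[:size_of_headers]
    for rva, virtual_size, raw_offset, raw_size in sections:
        if raw_size > 0:
            # copy raw_size bytes from the file offset to the section RVA
            image[rva:rva + raw_size] = pe_data[raw_offset:raw_offset + raw_size]
        else:
            # no raw data: zero the section, based on its virtual size
            image[rva:rva + virtual_size] = bytes(virtual_size)
    return image

# Fake file: 0x200 bytes of headers, then 0x200 bytes of section data
pe_data = b"H" * 0x200 + b"T" * 0x200
sections = [
    (0x1000, 0x150, 0x200, 0x200),  # section with raw data
    (0x2000, 0x100, 0, 0),          # uninitialized section, no raw data
]
image = load_sections(pe_data, sections, size_of_headers=0x200, size_of_image=0x3000)
print(len(image), image[0x1000:0x1004])
```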
And that’s actually everything that we need to change in the unpacking stub. The biggest changes will be in the python packer.
Modifying the python packer
We’re going to need to compile the unpacking stub with options that depend on the packed binary. So, let’s start by writing a simple function that does this automatically for us:
import subprocess

def compile_stub(input_cfile, output_exe_file, more_parameters=None):
    cmd = (["mingw32-gcc.exe", input_cfile, "-o", output_exe_file]
        + (more_parameters or []) +      # extra options, e.g. to force the ImageBase of the destination PE
        ["-Wl,--entry=__start",          # define the entry point
        "-nostartfiles", "-nostdlib",    # no standard lib
        "-lkernel32"                     # add necessary imports
        ])
    print("[+] Compiling stub : " + " ".join(cmd))
    subprocess.run(cmd)
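To see what command line this produces without actually running the compiler, the list construction can be isolated. This is a sketch only: `build_stub_command` is a hypothetical helper, not part of the packer, and the image base value is an example.

```python
def build_stub_command(input_cfile, output_exe_file, more_parameters=None):
    """Build the same gcc command as compile_stub, but only return it."""
    return (["mingw32-gcc.exe", input_cfile, "-o", output_exe_file]
            + list(more_parameters or [])
            + ["-Wl,--entry=__start", "-nostartfiles", "-nostdlib", "-lkernel32"])

# Example: force the image base of the stub (0x400000 is an example value)
cmd = build_stub_command("unpack.c", "shifted_unpack.exe",
                         ["-Wl,--image-base=0x400000"])
print(" ".join(cmd))
```

The extra parameters land between the file arguments and the fixed linker options, which is where the image base and section placement flags used later in this part end up.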
The beginning is the same: we just compile the unpacking stub automatically, and open the input PE with lief:
import argparse
import lief

parser = argparse.ArgumentParser(description='Pack PE binary')
parser.add_argument('input', metavar="FILE", help='input file')
parser.add_argument('-o', metavar="FILE", help='output', default="packed.exe")
args = parser.parse_args()
# Opens the input PE
input_PE = lief.PE.parse(args.input)
# Compiles the unpacker stub a first time, with no particular options
compile_stub("unpack.c", "unpack.exe", more_parameters=[])
# open the unpack.exe binary
unpack_PE = lief.PE.parse("unpack.exe")
# we're going to keep the same alignments as the ones in unpack_PE,
# because this is the PE we are modifying
file_alignment = unpack_PE.optional_header.file_alignment
section_alignment = unpack_PE.optional_header.section_alignment
Then we need to check for ASLR in the input file:
ASLR = (input_PE.optional_header.dll_characteristics & lief.PE.DLL_CHARACTERISTICS.DYNAMIC_BASE) != 0
if ASLR:
    output_PE = unpack_PE # we can use the current state of unpack_PE as our output
else:
Now, in the else case, where ASLR is disabled, we need to add the .alloc section. We start by measuring the memory space used by the sections of input_PE:
# The RVA of the lowest section of the input PE
min_RVA = min([x.virtual_address for x in input_PE.sections])
# The RVA of the end of the highest section
max_RVA = max([x.virtual_address + x.size for x in input_PE.sections])
We could simply have used the SizeOfImage as we did before (in the VirtualAlloc), but it includes the memory used by the PE header, which is already allocated (and occupied by the unpacker PE header). We just need memory for the sections in .alloc, and that’s what we computed here.
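Here is the same computation on made-up numbers, as a sketch. The `Section` tuple is a stand-in for lief section objects, exposing only the two attributes used above; the section names and sizes are hypothetical.

```python
from collections import namedtuple

# Stand-in for lief sections: only the attributes used in the computation
Section = namedtuple("Section", ["name", "virtual_address", "size"])

sections = [
    Section(".text", 0x1000, 0x1A00),
    Section(".data", 0x3000, 0x0200),
    Section(".rdata", 0x4000, 0x0600),
]

min_RVA = min(s.virtual_address for s in sections)
max_RVA = max(s.virtual_address + s.size for s in sections)
print(hex(min_RVA), hex(max_RVA), hex(max_RVA - min_RVA))  # 0x1000 0x4600 0x3600
```

The span starts at the first section rather than at RVA 0, which is exactly the part of the image that SizeOfImage would count but that the unpacker header already occupies.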
We can now create the .alloc section that will cover all this space:
alloc_section = lief.PE.Section(".alloc")
alloc_section.virtual_address = min_RVA
alloc_section.virtual_size = align(max_RVA - min_RVA, section_alignment)
alloc_section.characteristics = (lief.PE.SECTION_CHARACTERISTICS.MEM_READ
| lief.PE.SECTION_CHARACTERISTICS.MEM_WRITE
| lief.PE.SECTION_CHARACTERISTICS.CNT_UNINITIALIZED_DATA)
The .alloc section has no data: it’s just memory. Its raw size will be zero.
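The `align` helper used for the virtual size is not shown in this part. A minimal sketch of what it presumably does, assuming it rounds a value up to the next multiple of the alignment:

```python
def align(value, alignment):
    """Round value up to the next multiple of alignment (assumed behavior)."""
    return ((value + alignment - 1) // alignment) * alignment

print(hex(align(0x3600, 0x1000)))  # 0x4000
print(hex(align(0x4000, 0x1000)))  # 0x4000, already aligned
```

Rounding the .alloc virtual size this way makes the section end on a section alignment boundary, so the next section can start right after it.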
We now need to make room in the unpacker for this section. We cannot just change the RVA of the sections: many things depend on them (the import tables, for example, contain A LOT of RVAs, as we saw). Shifting all the sections in a PE is no trivial thing, but we can simply ask the compiler nicely. We also need it to place the unpacker at the same image base as the packed binary.
First, some math:
# to put the section just after ours, find the lowest section RVA in the stub
min_unpack_RVA = min([x.virtual_address for x in unpack_PE.sections])
# and compute how much we need to move to be exactly after the .alloc section
shift_RVA = (min_RVA + alloc_section.virtual_size) - min_unpack_RVA
We compute the lowest section RVA used by the unpacker (it should be its section alignment, usually 0x1000). Then we compute by how much we need to move the sections to make the lowest one start at the end of the .alloc section. Now, we’ll ask the compiler to put the unpacker at the packed binary image base, and to shift the RVA of every section by shift_RVA:
# We need to recompile the stub to make room for the `.alloc` section, by shifting all its sections
compile_parameters = [f"-Wl,--image-base={hex(input_PE.optional_header.imagebase)}"]
for s in unpack_PE.sections:
    compile_parameters += [f"-Wl,--section-start={s.name}={hex(input_PE.optional_header.imagebase + s.virtual_address + shift_RVA )}"]
# recompile the stub with the shifted sections
compile_stub("unpack.c", "shifted_unpack.exe", compile_parameters)
Note that the --section-start option expects a VA in hex, not an RVA.
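A worked example of this arithmetic, on hypothetical values (image base 0x400000, a .alloc section of 0x4000 bytes starting at RVA 0x1000, and three stub sections given as plain tuples instead of lief objects):

```python
# Hypothetical values, for illustration only
image_base = 0x400000
min_RVA = 0x1000               # lowest section RVA of the packed binary
alloc_virtual_size = 0x4000    # aligned size of the .alloc section
stub_sections = [(".text", 0x1000), (".rdata", 0x2000), (".idata", 0x4000)]

# Same formulas as in the packer
min_unpack_RVA = min(rva for _, rva in stub_sections)
shift_RVA = (min_RVA + alloc_virtual_size) - min_unpack_RVA

flags = [f"-Wl,--image-base={hex(image_base)}"]
for name, rva in stub_sections:
    # --section-start takes a VA: image base + shifted RVA
    flags.append(f"-Wl,--section-start={name}={hex(image_base + rva + shift_RVA)}")

print(hex(shift_RVA))
print("\n".join(flags))
```

With these numbers the shift is 0x4000, so .text moves from RVA 0x1000 to 0x5000, that is VA 0x405000, right after the end of .alloc.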
Now, if all went well, shifted_unpack.exe should contain the unpacker with the same image base as the packed binary, and a gap in the sections memory to fit our .alloc section:
[Figure: section table of shifted_unpack.exe]
As you can see, there is a big gap before the .text section, and that’s where we’re going to put the .alloc section. However, lief doesn’t let us easily add a section at the beginning, and it appears the Windows loader expects the sections to be ordered by increasing RVA. So we’re just going to make a new PE from scratch, and copy everything we need into it:
unpack_shifted_PE = lief.PE.parse("shifted_unpack.exe")
# This would insert the .alloc section at the end of the table, so the RVA would not be in order,
# but Windows doesn't seem to like it: the binary doesn't load.
# output_PE = unpack_shifted_PE
# output_PE.add_section(alloc_section)
# Here is how we make a completely new PE, copying the important properties
# and adding the sections in order
output_PE = lief.PE.Binary("pe_from_scratch", lief.PE.PE_TYPE.PE32)
# Copy the important fields of the optional header
output_PE.optional_header.imagebase = unpack_shifted_PE.optional_header.imagebase
output_PE.optional_header.addressof_entrypoint = unpack_shifted_PE.optional_header.addressof_entrypoint
output_PE.optional_header.section_alignment = unpack_shifted_PE.optional_header.section_alignment
output_PE.optional_header.file_alignment = unpack_shifted_PE.optional_header.file_alignment
output_PE.optional_header.sizeof_image = unpack_shifted_PE.optional_header.sizeof_image
# make sure output_PE cannot move
output_PE.optional_header.dll_characteristics = 0
# copy the data directories (imports most notably)
for i in range(0, 15):
    output_PE.data_directories[i].rva = unpack_shifted_PE.data_directories[i].rva
    output_PE.data_directories[i].size = unpack_shifted_PE.data_directories[i].size
# add the sections in order
output_PE.add_section(alloc_section)
for s in unpack_shifted_PE.sections:
    output_PE.add_section(s)
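Since the section order seems to matter to the loader, a small sanity check before writing the file can save some debugging. This is a sketch on plain tuples with made-up values; `sections_in_order` is a hypothetical helper, and the same test could be applied to the lief sections of output_PE.

```python
def sections_in_order(sections):
    """True if sections are sorted by increasing RVA and never overlap.

    sections: list of (name, rva, virtual_size) tuples.
    """
    previous_end = 0
    for name, rva, virtual_size in sections:
        if rva < previous_end:
            return False
        previous_end = rva + virtual_size
    return True

good = [(".alloc", 0x1000, 0x4000), (".text", 0x5000, 0x1000), (".rdata", 0x6000, 0x1000)]
bad = good[1:] + good[:1]  # .alloc appended at the end of the table

print(sections_in_order(good), sections_in_order(bad))
```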
We need to make sure our packed PE doesn’t move: that’s the whole point of placing ourselves at the same image base as the packed binary (which cannot be moved). That’s what dll_characteristics = 0 is for.
You just need to modify the rest of the file to add the .packed section to output_PE, and we’re done! Here are the sections of a packed “hello world”:
[Figure: section table of the packed “hello world” binary]
Final words
The final code can be found as usual here: https://github.com/jeremybeaume/packer-tutorial/tree/master/part4
Our packer is now able to handle any 32-bit PE .exe file. You could, for example, pack a binary that is already packed: it works.
It does not yet fully work on DLLs (we’re missing a few things in the loader), and it won’t work at all on .NET executable files (they are also .exe files, but they don’t contain x86 instructions).
This packer is still pretty useless, but we’re going to remedy that in the next part of the tutorial, Part 5: simple obfuscation.
Hi Jeremy,
I ran the code of part4 from your github, but the output didn’t start. I checked it with IDA Pro and I saw that the address of external functions (like GetModuleHandleA) were failed. I don’t know why, have you encountered this problem? And how can I fix it?
Sorry for my bad english.