Types
In this section we will look at the different types in C, and how they translate to assembly.
Simple Integers
In this section we'll look at the simple integer types and how they appear in assembly. We'll create a function that takes four unsigned integer parameters: a char, a short, an int and a long int. On the machine this code was compiled on, these are 8, 16, 32 and 64 bits wide respectively. Each parameter is then incremented by its own size in bytes.
The values we'll pass to the function are each just too large to fit in the next smaller type, i.e. the 16-bit value is 2^8 + 1, the 32-bit value is 2^16 + 1, and the 64-bit value is 2^32 + 1.
long int type_increment(unsigned char c, unsigned short s, unsigned int i, unsigned long int l) {
    c += sizeof(c);
    s += sizeof(s);
    i += sizeof(i);
    l += sizeof(l);
    return c + s + i + l;
}
int main(void) {
    return type_increment(0x1, 0x101, 0x10001, 0x100000001);
}
This generates the following assembly:
type_increment:
push rbp
mov rbp, rsp
mov eax, esi
mov DWORD PTR [rbp-12], edx
mov QWORD PTR [rbp-24], rcx
mov BYTE PTR [rbp-4], dil
mov WORD PTR [rbp-8], ax
add BYTE PTR [rbp-4], 1
add WORD PTR [rbp-8], 2
add DWORD PTR [rbp-12], 4
add QWORD PTR [rbp-24], 8
nop
pop rbp
ret
main:
push rbp
mov rbp, rsp
movabs rcx, 4294967297
mov edx, 65537
mov esi, 257
mov edi, 1
call type_increment
mov eax, 0
pop rbp
ret
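We start by looking at the main function. The movabs instruction loads the 64-bit register rcx with the immediate value 4294967297 (0x100000001). movabs is the mnemonic the GNU toolchain uses for the form of mov that takes a full 64-bit immediate, and it is needed here because the value does not fit in 32 bits. The other arguments do fit in 32 bits, so they are loaded into the 32-bit registers edx, esi and edi with ordinary mov instructions.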
In type_increment, following the prologue, esi is copied to eax so that the short can later be stored from ax. The parameters are placed on the stack at the following locations:
- 4 byte int (DWORD) is moved to rbp-12 .. rbp-9
- 8 byte long int (QWORD) is moved to rbp-24 .. rbp-17
- 1 byte char (BYTE) is moved to rbp-4
- 2 byte short (WORD) is moved to rbp-8 .. rbp-7
Also notice the registers that are used. The int is moved from the 32-bit edx register, and the long int is moved from the 64-bit rcx register. The char is moved from dil, which is the low byte of the four-byte edi register. The short is moved from ax, which is the low two bytes of the four-byte eax register (into which esi was copied earlier).
Each of the memory locations on the stack then has its size in bytes added to it. A nop is placed at the end (I am unsure of the purpose of the nop; perhaps for alignment?). The previous base pointer is popped off the stack with pop rbp, and ret causes the function to return. Back in main, eax is loaded with 0 as the return value. Note that neither function in this listing computes or returns the sum, so it appears to have been generated from a version of the code in which type_increment returned void and main did not return its result.
We now increase the optimisation level to -O1 and see how the assembly changes:
type_increment:
add edi, 1
movzx edi, dil
add esi, 2
movzx esi, si
add esi, edi
lea eax, [rdx+4+rsi]
lea rax, [rcx+8+rax]
ret
main:
mov eax, 65811
ret
Arrays
In this section we'll take a look at how arrays are allocated on the stack.
#define ARRAY_SIZE 512
int use_up_stack_space(int y) {
    char char_array[ARRAY_SIZE];
    int int_array[ARRAY_SIZE];
    int sum; /* note: never initialised, so the result is indeterminate */
    for (int x = 0; x < ARRAY_SIZE; x++) {
        char_array[x] = x * x;
        int_array[x] = y * y;
        sum += char_array[x] + int_array[x];
    }
    return sum;
}
This generates the following assembly:
use_up_stack_space:
push rbp
mov rbp, rsp
sub rsp, 2464
mov DWORD PTR [rbp-2580], edi
mov DWORD PTR [rbp-8], 0
jmp .L2
.L3:
mov eax, DWORD PTR [rbp-8]
mov ecx, eax
mov eax, DWORD PTR [rbp-8]
mov edx, eax
mov eax, ecx
imul eax, edx
mov edx, eax
mov eax, DWORD PTR [rbp-8]
cdqe
mov BYTE PTR [rbp-528+rax], dl
mov eax, DWORD PTR [rbp-2580]
imul eax, DWORD PTR [rbp-2580]
mov edx, eax
mov eax, DWORD PTR [rbp-8]
cdqe
mov DWORD PTR [rbp-2576+rax*4], edx
mov eax, DWORD PTR [rbp-8]
cdqe
movzx eax, BYTE PTR [rbp-528+rax]
movsx edx, al
mov eax, DWORD PTR [rbp-8]
cdqe
mov eax, DWORD PTR [rbp-2576+rax*4]
add eax, edx
add DWORD PTR [rbp-4], eax
add DWORD PTR [rbp-8], 1
.L2:
cmp DWORD PTR [rbp-8], 511
jle .L3
mov eax, DWORD PTR [rbp-4]
leave
ret