Integer Overflow Intro | Relish the Moment

Here we experiment with integer overflow on Linux.

Integer Overflow Introduction

In C, the basic integer types are short, int, and long. Each comes in a signed and an unsigned variant, and each has its own range. Because the size of each type is determined by the compiler and target platform, the values below assume gcc-5.4 on 64-bit Linux:

Type	Size	Range
short int	2 bytes (word)	0 ~ 32767 (0x0000 ~ 0x7fff), -32768 ~ -1 (0x8000 ~ 0xffff)
unsigned short int	2 bytes (word)	0 ~ 65535 (0x0000 ~ 0xffff)
int	4 bytes (dword)	0 ~ 2147483647 (0x00000000 ~ 0x7fffffff), -2147483648 ~ -1 (0x80000000 ~ 0xffffffff)
unsigned int	4 bytes (dword)	0 ~ 4294967295 (0x00000000 ~ 0xffffffff)
long int	8 bytes (qword)	Positive: 0x0 ~ 0x7fffffffffffffff, Negative: 0x8000000000000000 ~ 0xffffffffffffffff
unsigned long int	8 bytes (qword)	0x0 ~ 0xffffffffffffffff
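
Since these sizes depend on the compiler and platform, it can help to print them for the machine at hand. The following is a minimal sketch using the standard <limits.h> macros; the function name print_integer_ranges is only an illustrative choice.

```c
#include <limits.h>
#include <stdio.h>

/* Print the size and range of each type in the table for the current platform. */
void print_integer_ranges(void)
{
    printf("short:          %zu bytes, %d ~ %d\n", sizeof(short), SHRT_MIN, SHRT_MAX);
    printf("unsigned short: %zu bytes, 0 ~ %u\n", sizeof(unsigned short), (unsigned)USHRT_MAX);
    printf("int:            %zu bytes, %d ~ %d\n", sizeof(int), INT_MIN, INT_MAX);
    printf("unsigned int:   %zu bytes, 0 ~ %u\n", sizeof(unsigned int), UINT_MAX);
    printf("long:           %zu bytes, %ld ~ %ld\n", sizeof(long), LONG_MIN, LONG_MAX);
    printf("unsigned long:  %zu bytes, 0 ~ %lu\n", sizeof(unsigned long), ULONG_MAX);
}
```

Calling print_integer_ranges() from main with gcc on 64-bit Linux should match the table; other platforms may differ (for example, long is 4 bytes on 64-bit Windows).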

When a value in a program exceeds the range of its data type, an overflow occurs; for integer types this is called integer overflow.

Principle

This section briefly explains the principle of integer overflow.

Upper Bound Overflow

For a 16-bit short there are two cases of upper bound overflow: 0x7fff + 1 and 0xffff + 1.

There are two cases because machine-level instructions do not distinguish between signed and unsigned values; all data is stored as binary. (The compiler does distinguish them, and may emit different assembly instructions for each.)

In the first case, add 0x7fff, 1 yields 0x8000. For an unsigned short this is not an overflow at all (32767 + 1 == 32768). For a signed short, however, 0x7fff means 32767 while 0x8000 means -32768, so in signed short arithmetic 32767 + 1 == -32768.

The second case is add 0xffff, 1. Here the size of the first operand matters.

For example, the assembly for the signed addition shown below is add eax, 1. Because eax = 0xffff, the result in eax is 0x10000. The unsigned version instead executes add word ptr [rbp - 0x1a], 1, which operates on 16 bits and yields 0x0000.

In the signed addition, although eax holds 0x10000, only ax = 0x0000 is written back to memory, so the stored result is the same as in the unsigned case.

For a signed short, 0xffff == -1, and -1 + 1 == 0 is correct. For an unsigned short, 0xffff == 65535, so 65535 + 1 == 0, which is an overflow.
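
To make the "same bits, two meanings" point concrete, here is a small sketch that reads one 16-bit pattern as both unsigned and signed. The helper names as_signed16 and show_pattern are illustrative, and the fixed-width types from <stdint.h> are used so that the width is exactly 16 bits.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Reinterpret a 16-bit pattern as a signed value without changing any bits. */
int16_t as_signed16(uint16_t bits)
{
    int16_t value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Print one bit pattern under both interpretations. */
void show_pattern(uint16_t bits)
{
    printf("0x%04x  unsigned: %5u  signed: %6d\n",
           (unsigned)bits, (unsigned)bits, (int)as_signed16(bits));
}
```

Passing 0x7fff, 0x8000, 0xffff and 0x0000 to show_pattern covers the four values discussed above.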

Assembly instructions for signed and unsigned addition:

# pseudocode
short int a;
a = a + 1;

# corresponding assembly
movzx eax, word ptr [rbp - 0x1c]
add    eax, 1
mov word ptr [rbp - 0x1c], ax

# and 
unsigned short int b;
b = b + 1;

# assembly code
add    word ptr [rbp - 0x1a], 1
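
The same behavior can be observed from C. The sketch below assumes a 16-bit short and gcc's wrapping behavior when an out-of-range int is converted to short (that conversion is implementation-defined in C). The function names inc_short and inc_ushort are illustrative only.

```c
/* a + 1 is computed in int (compare add eax, 1); the cast then keeps only
   the low 16 bits (compare mov word ptr [...], ax). Converting an
   out-of-range int to short is implementation-defined; gcc wraps. */
short inc_short(short a)
{
    return (short)(a + 1);
}

/* Conversion to an unsigned type is defined to wrap modulo 2^16 here. */
unsigned short inc_ushort(unsigned short b)
{
    return (unsigned short)(b + 1);
}
```

With these, inc_short(0x7fff) gives -32768 while inc_ushort(0x7fff) gives 32768, and both inc_short(-1) and inc_ushort(0xffff) give 0.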

Lower Bound Overflow

Lower bound overflow mirrors upper bound overflow. In the assembly, add is simply replaced with sub.

There are two cases as well:

The first case is sub 0x0000, 1 == 0xffff. For a signed short this is correct (0 - 1 == -1), but for an unsigned short it becomes 0 - 1 == 65535.

The second case is sub 0x8000, 1 == 0x7fff. For an unsigned short this is correct (32768 - 1 == 32767), but for a signed short it becomes -32768 - 1 == 32767.
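
A matching sketch for the two lower-bound cases, under the same assumptions as before (16-bit short, gcc's wrapping conversion to short); dec_short and dec_ushort are illustrative names.

```c
/* a - 1 is computed in int, then narrowed back to 16 bits
   (implementation-defined for out-of-range values; gcc wraps). */
short dec_short(short a)
{
    return (short)(a - 1);
}

/* Unsigned narrowing wraps modulo 2^16, so 0 - 1 becomes 65535. */
unsigned short dec_ushort(unsigned short b)
{
    return (unsigned short)(b - 1);
}
```

Here dec_ushort(0) gives 65535 and dec_short(-32768) gives 32767, while dec_short(0) and dec_ushort(0x8000) give the mathematically expected results.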

Example

Integer overflow bugs can be summarized in two cases.

Unrestricted Range

This case is easy to understand: a value of fixed size that is not properly constrained can lead to unpredictable consequences.

Here we write a sample intof.c:

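#include <stdio.h>
#include <stdlib.h>
/* read() is declared in <unistd.h>; it is left out here,
   which is why gcc warns about an implicit declaration below. */
int main(void)
{
    int len;
    int data_len;
    int header_len;
    char *buf;

    header_len = 0x10;
    scanf("%d", &data_len);         /* user-controlled, may be negative */

    len = data_len + header_len;    /* -1 + 0x10 == 0xf */
    buf = malloc(len);              /* small allocation */
    read(0, buf, data_len);         /* read treats its count as unsigned */
    return 0;
}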

Compile and run it:

$ gcc -o intof intof.c
intof.c: In function ‘main’:
intof.c:17:2: warning: implicit declaration of function ‘read’; did you mean ‘fread’? [-Wimplicit-function-declaration]
   17 |  read(0, buf, data_len);
      |  ^~~~
      |  fread
$ ./intof
3                           # input len
abcdefg                     # input string
$ defg
-bash: defg: command not found
# it only reads 3 chars
$
$ ./intof
-1                          # input len
qwertyuiopasdfghjklzxcvbnm  # input string
# it reads all chars
$

What happened? Let's look at it in gdb.

$ gdb intof -q
GEF for linux ready, type `gef` to start, `gef config` to configure
88 commands loaded for GDB 9.2 using Python engine 3.8
Reading symbols from intof...
(No debugging symbols found in intof)
gef➤  b malloc
Breakpoint 1 at 0x1040
gef➤  r
...
gef➤  c
Continuing.
-1                          # input data length
qwertyuiopasdfghjklzxcvbnm  # string to read
...
─────────────────── trace ────
[#0] 0x7ffff7e89a40 → __GI___libc_malloc(bytes=0xf)     
[#1] 0x555555555194 → main()
──────────────────────────────
gef➤

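With an input of -1, malloc is asked for only 0xf bytes (which gives a 0x20-byte heap chunk), while read treats its count as unsigned, so a string of up to 0xffffffff bytes can be written into it. An integer overflow thus turns into a heap overflow.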
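
The arithmetic behind this can be checked in isolation. The sketch below assumes a 32-bit int and a 64-bit size_t; the helper names are illustrative and not part of intof.c. It shows what malloc receives and how the same negative int looks as an unsigned value. Note that when a prototype for read is in scope (via <unistd.h>), the int is converted to size_t, which is the third helper.

```c
#include <stddef.h>

/* Value malloc(len) receives in intof.c for a given data_len (header_len is 0x10). */
size_t malloc_request(int data_len)
{
    int header_len = 0x10;
    int len = data_len + header_len;
    return (size_t)len;
}

/* The same int viewed as a 32-bit unsigned value: -1 becomes 0xffffffff. */
unsigned int as_unsigned32(int data_len)
{
    return (unsigned int)data_len;
}

/* The same int converted to size_t, as happens with a read() prototype in scope. */
size_t as_size_t(int data_len)
{
    return (size_t)data_len;
}
```

For data_len == -1, malloc_request returns 0xf, as_unsigned32 returns 0xffffffff, and as_size_t returns the largest size_t value. In every case the count is far larger than the allocation.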

Wrong Type Conversion

Even when variables are correctly constrained, integer overflow vulnerabilities are still possible. I think these can be summarized as wrong type conversions, which can be subdivided further into:

A large range variable is assigned to a small range variable.

Example intof2.c:

#include<stdio.h>
#include<stdlib.h>
void check(int n)
{
    if (!n)
        printf("vuln");
    else
        printf("OK");
}

int main(void)
{
    long int a;

    scanf("%ld", &a);
    if (a == 0)
        printf("Bad");
    else
        check(a);
    return 0;
}

Compile it and run:

$ gcc -o intof2 intof2.c
$ ./intof2
4294967296
vuln

The example above copies a wide variable (the long integer a) into a variable with a narrower range (the int parameter n), which causes an integer overflow.

A long occupies 8 bytes of memory while an int occupies only 4, so converting long to int truncates the value: only the low 4 bytes of the long are passed to the int variable.

In this example, the long 0x100000000 is converted to the int 0x00000000.

Conversely, passing a value of a narrower type to a wider variable causes no data loss.
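
Both directions can be sketched as two tiny functions. This assumes an 8-byte long and a 4-byte int as in the table at the top, plus gcc's truncating behavior for out-of-range conversions (implementation-defined in C). The names narrow_to_int and widen_to_long are illustrative.

```c
/* Narrowing: only the low 4 bytes of the long survive
   (implementation-defined for out-of-range values; gcc truncates). */
int narrow_to_int(long a)
{
    return (int)a;
}

/* Widening: every int value is representable as a long, so nothing is lost. */
long widen_to_long(int n)
{
    return (long)n;
}
```

narrow_to_int(4294967296L) returns 0, which is exactly how check(a) in intof2.c ends up seeing n == 0 even though a != 0.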

Only Unilateral Restrictions.

This case applies only to signed types.

Example intof3.c:

#include<stdio.h>
#include<stdlib.h>
int main(void)
{
    int len, l;
    char buf[11];

    scanf("%d", &len);
    if (len < 10) {
        l = read(0, buf, len);
        *(buf+l) = 0;
        puts(buf);
    } else
        printf("Please len < 10");
}

Compile and run:

$ gcc intof3.c  -o intof3
intof3.c: In function ‘main’:
intof3.c:10:11: warning: implicit declaration of function ‘read’; did you mean ‘fread’? [-Wimplicit-function-declaration]
   10 |       l = read(0, buf, len);
      |           ^~~~
      |           fread
$ ./intof3
20                      # first input 20
Please len < 10
$ ./intof3
5                       # then input 5
1234567890
12345
$ 67890
-bash: 67890: command not found
$ ./intof3
-1                      # last input -1
12345678901234567890
123456789012345
$

It looks as if len is restricted to values below 10, but only the upper bound is checked. When len is negative, the read function treats it as an unsigned long int, so -1 becomes a very large count.

The two cases above have something in common: the type of the function's formal parameter differs from the type of the argument passed to it. That is why I summarize both as wrong type conversion.
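
As a contrast to the one-sided check in intof3.c, a two-sided check might look like the sketch below. The helper len_is_valid is hypothetical and not part of the original examples; it assumes the buffer also needs room for a trailing NUL byte, as buf does in intof3.c.

```c
#include <stddef.h>

/* Hypothetical helper (not part of intof3.c): returns 1 if len is a safe
   byte count for a buffer of buf_size bytes that also needs a trailing NUL. */
int len_is_valid(int len, size_t buf_size)
{
    if (len < 0)                      /* reject negatives before any unsigned conversion */
        return 0;
    return (size_t)len < buf_size;    /* leave room for the terminator */
}
```

With buf_size == 11, a len of -1 is rejected, whereas the original len < 10 test lets it through.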