Here we experiment with integer overflow on Linux.
Integer Overflow Introduction
In C, the basic integer types are short (`short`), integer (`int`), and long (`long`). Each of them comes in a signed and an unsigned variant, and each has its own size and range. Because these sizes are determined by the compiler and platform, the values below assume gcc-5.4 on a 64-bit system:
Type | Size | Range |
---|---|---|
short int | 2 bytes (word) | Non-negative: 0 to 32767 (0x0000 to 0x7fff); negative: -32768 to -1 (0x8000 to 0xffff) |
unsigned short int | 2 bytes (word) | 0 to 65535 (0x0000 to 0xffff) |
int | 4 bytes (dword) | Non-negative: 0 to 2147483647 (0x00000000 to 0x7fffffff); negative: -2147483648 to -1 (0x80000000 to 0xffffffff) |
unsigned int | 4 bytes (dword) | 0 to 4294967295 (0x00000000 to 0xffffffff) |
long int | 8 bytes (qword) | Non-negative: 0 to 0x7fffffffffffffff; negative: 0x8000000000000000 to 0xffffffffffffffff |
unsigned long int | 8 bytes (qword) | 0 to 0xffffffffffffffff |
When a value in a program exceeds the range of its data type, an overflow occurs. When the type is an integer type, this is called integer overflow.
Principle
This section briefly explains the principle of integer overflow.
Upper Bound Overflow
There are two cases of upper bound overflow: `0x7fff + 1` and `0xffff + 1`.
There are two cases because the computer's low-level instructions do not distinguish between signed and unsigned values; all data is stored in binary. The distinction exists only at the compiler level, where signed and unsigned code can produce different assembly instructions.
In the first case, `add 0x7fff, 1 == 0x8000`. This has no effect on unsigned integers, but for a signed short, `0x7fff` means `32767` while `0x8000` is `-32768`, so in signed short arithmetic `32767 + 1 == -32768`.
The second case is `add 0xffff, 1`. Here the first operand must be considered.
For example, the signed addition above compiles to `add eax, 1`; because `eax=0xffff`, the result is `add eax, 1 == 0x10000`. The unsigned addition compiles to `add word ptr [rbp - 0x1a], 1`, whose result is `0x0000`.
In the signed addition, although `eax` holds `0x10000`, only `ax=0x0000` is stored to memory, so the result is the same as in the unsigned case.
For a signed short, `0xffff == -1` and `-1 + 1 == 0`, which is correct as signed arithmetic. For an unsigned short, `0xffff == 65535` and `65535 + 1 == 0`, so the value wraps around.
[Listing not preserved: assembly instructions for the signed and unsigned addition]
Lower Bound Overflow
Lower bound overflow is similar to upper bound overflow; in the assembly, `add` is simply replaced with `sub`.
There are two cases as well:
The first case is `sub 0x0000, 1 == 0xffff`. This is fine for signed values, where `0 - 1 == -1`, but for unsigned values it becomes `0 - 1 == 65535`.
The second case is `sub 0x8000, 1 == 0x7fff`. For unsigned values `32768 - 1 == 32767` is correct, but for signed values it becomes `-32768 - 1 == 32767`.
Example
Integer overflow vulnerabilities can be summarized in two cases.
Unrestricted Range
This situation is easy to understand: if the value that fills a fixed-size object is not properly constrained, the consequences are unpredictable.
Here we write a sample, `intof.c`:
[listing of intof.c not preserved]
Compile and run it:
$ gcc -o intof intof.c
What happened? Let's use `gdb` to find out.
$ gdb intof -q
We input `-1` as the size to allocate, so only a `0x20`-byte heap chunk is allocated, yet we can then input a string of up to `0xffffffff` bytes. The integer overflow has turned into a heap overflow.
Wrong Type Conversion
Even when variables are correctly constrained, integer overflow vulnerabilities are still possible. I think these can be summarized as wrong type conversions, which can be further subdivided into:
- A variable with a large range is assigned to a variable with a small range.
Example `intof2.c`:
[listing of intof2.c not preserved]
Compile and run it:
$ gcc -o intof2 intof2.c
The example above assigns a variable with a large range (the long integer `a`) to a variable with a small range (the integer `n`), which causes an integer overflow.
A long integer occupies 8 bytes of memory, while an integer occupies only 4 bytes, so converting a `long` to an `int` truncates the value: only the low 4 bytes of the long integer are passed to the integer variable.
In this example, `long: 0x100000000` is converted to `int: 0x00000000`.
Assigning a smaller variable to a larger one, on the other hand, causes no data loss.
- Only one bound is checked.
This case applies only to signed types.
Example `intof3.c`:
[listing of intof3.c not preserved]
Compile and run it:
$ gcc intof3.c -o intof3
It looks as though `len` is restricted to be smaller than `10`, but when `len` is negative, the `read` function treats it as an `unsigned long int`, so it becomes a very large positive count.
The two cases in the examples above have one thing in common: the type of the function's formal parameter differs from the type of the argument passed to it. That is why I think they can be summarized as wrong type conversions.