AFL Fuzzing Intro
TyeYeah

It’s not too late to learn fuzzing, which is booming these days.

Fuzzing Intro

When performing software vulnerability discovery, there are several technical approaches: static analysis, dynamic analysis, symbolic execution, and fuzzing.

  • Static Analysis:
    Analyze the program’s syntax, semantics, data flow, etc. without running it, in order to discover vulnerabilities.
  • Dynamic Analysis:
    Analyze the program step by step while it executes; the usual way to track a program’s behavior is to debug it.
  • Symbolic Execution:
    Try to work out which inputs correspond to which running states, so as to cover all execution paths.
  • Fuzzing:
    Automatically run the program with a large amount of input data to find out which inputs make the program misbehave, then analyze the anomalies for possible vulnerabilities (a toy example follows this list).
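
To make this concrete, here is a tiny, hypothetical C target of the kind a fuzzer hopes to break: it behaves normally for almost every input, but crashes when the input file starts with a specific byte sequence. The program is made up for this illustration, not part of any real project.

/* toy_target.c - a deliberately buggy toy program (hypothetical example).
 * It reads the file given on the command line and crashes only when the
 * file starts with the bytes "FUZZ" -- exactly the kind of hidden,
 * input-dependent fault a fuzzer tries to find. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    FILE *fp = fopen(argv[1], "rb");
    if (!fp) { perror("fopen"); return 1; }

    char buf[16] = {0};
    fread(buf, 1, sizeof(buf) - 1, fp);
    fclose(fp);

    if (memcmp(buf, "FUZZ", 4) == 0) {
        /* simulated bug: NULL pointer write -> SIGSEGV */
        volatile char *p = NULL;
        *p = 'X';
    }

    puts("input processed normally");
    return 0;
}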

As for fuzzing itself, there are several types:

  1. Generation based: generate test cases from scratch.
  2. Mutation based: mutate existing data samples according to some rules.
  3. Evolutionary: combines both of the above, and mutates based on feedback from code coverage.

Brief History

Not so brief actually; check it out in A Brief History of Fuzzing.

Fuzzing Process

Take AFL as an example.
(Remember that this model describes afl-fuzz only, not every fuzzer.)
(Figure: AFL fuzzing process)
The source code is instrumented at compile time so that code coverage can be recorded.

The tester provides some initial inputs for the program (the leftmost testcases in the figure), which AFL loads into a queue.

In each iteration, AFL first takes a testcase from the queue and trims it, removing unnecessary data to improve efficiency.

Then it performs mutation operations on the input. There are many mutation strategies, so many new testcases can be generated.

These newly generated inputs are fed to the target program. If an input triggers a new execution path or causes the program to crash, it is added back to the queue.

Throughout the process, every program crash is recorded; each one may represent a hidden vulnerability.
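
The loop above can be sketched in a few dozen lines of C. This is only a simulation of the idea, assuming a made-up target() function, a hand-rolled coverage bitmap, and a fixed-size in-memory queue; real AFL gets its coverage from compile-time instrumentation and shared memory, and its mutation and scheduling logic is far more elaborate.

/* coverage_loop.c - a self-contained simulation of a coverage-guided
 * fuzzing loop (illustrative only). */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define MAP_SIZE 64
static uint8_t cur_map[MAP_SIZE];   /* coverage of the current run   */
static uint8_t seen_map[MAP_SIZE];  /* union of coverage seen so far */

static void hit(int id) { cur_map[id % MAP_SIZE] = 1; }

/* toy target: reaches deeper branches only for a specific prefix,
 * and "crashes" (returns 1) when the full magic prefix is present */
static int target(const uint8_t *buf, size_t len) {
    hit(0);
    if (len > 0 && buf[0] == 'F') { hit(1);
        if (len > 1 && buf[1] == 'U') { hit(2);
            if (len > 2 && buf[2] == 'Z') { hit(3); return 1; }
        }
    }
    return 0;
}

/* returns 1 and updates seen_map if the current run covered anything new */
static int has_new_coverage(void) {
    int found = 0;
    for (int i = 0; i < MAP_SIZE; i++)
        if (cur_map[i] && !seen_map[i]) { seen_map[i] = 1; found = 1; }
    return found;
}

int main(void) {
    srand((unsigned)time(NULL));

    uint8_t queue[64][8];            /* tiny testcase queue */
    size_t  qlen[64];
    int     qcount = 0;

    memcpy(queue[0], "AAAA", 4);     /* seed corpus: one testcase */
    qlen[0] = 4;
    qcount  = 1;

    for (long execs = 1; execs <= 1000000; execs++) {
        /* take a testcase from the queue and mutate one random byte */
        int idx = rand() % qcount;
        uint8_t buf[8];
        size_t len = qlen[idx];
        memcpy(buf, queue[idx], len);

        size_t pos = (size_t)rand() % len;
        if (rand() % 2)
            buf[pos] ^= (uint8_t)(1u << (rand() % 8));  /* bit flip */
        else
            buf[pos] = (uint8_t)rand();                 /* random byte */

        memset(cur_map, 0, sizeof(cur_map));
        if (target(buf, len)) {                 /* run the target */
            printf("crash found after %ld execs\n", execs);
            return 0;
        }
        if (has_new_coverage() && qcount < 64) { /* keep interesting inputs */
            memcpy(queue[qcount], buf, len);
            qlen[qcount++] = len;
        }
    }
    puts("no crash found");
    return 0;
}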

Fuzzing Technical Points

Input data

Because the input data drives the automated execution of the program, the way that data is generated greatly affects fuzzing efficiency.

  1. If the target program expects a PDF file, inputs that do not match the format will hardly get past the parser into the interesting code.
  2. Even if the data meets the input requirements, data A and data B may trigger the same execution path, so running both A and B wastes resources.
  3. For an individual input A, perhaps only a small part of it really controls the execution path, so the remaining parts waste resources and should be trimmed away.

Mutation operation

The data supplied by the user is limited, but a fuzz test needs a lot of data, so the fuzzer generates new data from the user-supplied samples. This process is called mutation (a sketch of a few typical operators follows the list below).

  1. What mutation operators should be defined (i.e., which operations are used to change the original input data)?
  2. When multiple mutation operators are available, which one should be chosen for a given mutation?
  3. Which input data should be selected for mutation?
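
As a rough illustration, here is what a few typical mutation operators might look like on a raw byte buffer. The operator names and parameter choices below are mine for this sketch; AFL’s actual operators (bit flips, byte flips, arithmetics, known ints, dictionary, havoc, etc.) are the ones listed in the “fuzzing strategy yields” panel of its UI shown later.

/* mutators.c - a sketch of a few common mutation operators applied to a
 * byte buffer (names and parameter choices are illustrative only). */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* flip a single bit at a random position */
static void bit_flip(uint8_t *buf, size_t len) {
    size_t pos = (size_t)rand() % len;
    buf[pos] ^= (uint8_t)(1u << (rand() % 8));
}

/* add or subtract a small value from one byte (8-bit arithmetic) */
static void arith8(uint8_t *buf, size_t len) {
    size_t pos = (size_t)rand() % len;
    buf[pos] = (uint8_t)(buf[pos] + (rand() % 35) - 17);
}

/* overwrite one byte with a boundary ("interesting") value */
static void interesting8(uint8_t *buf, size_t len) {
    static const uint8_t vals[] = { 0x00, 0x01, 0x7f, 0x80, 0xff };
    buf[(size_t)rand() % len] = vals[rand() % 5];
}

/* havoc-style driver: apply one randomly chosen operator */
static void mutate(uint8_t *buf, size_t len) {
    if (len == 0) return;
    switch (rand() % 3) {
        case 0:  bit_flip(buf, len);     break;
        case 1:  arith8(buf, len);       break;
        default: interesting8(buf, len); break;
    }
}

int main(void) {
    srand((unsigned)time(NULL));
    uint8_t sample[8] = "ABCDEFG";          /* original input data  */
    mutate(sample, sizeof(sample) - 1);     /* one mutated testcase */
    for (int i = 0; i < 7; i++) printf("%02x ", sample[i]);
    putchar('\n');
    return 0;
}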

Increase coverage

The essence of fuzzing is to use inputs to probe whether the execution path triggered by the current input contains a possible vulnerability. Covering more paths therefore means more potential vulnerabilities can be detected. Increasing coverage is the fundamental issue here: solving the first two issues is really about improving coverage.

AFL

“American fuzzy lop is a security-oriented fuzzer that employs a novel type of compile-time
instrumentation and genetic algorithms to automatically discover clean, interesting test cases
that trigger new internal states in the targeted binary.
This substantially improves the functional coverage for the fuzzed code. The compact synthesized
corpora produced by the tool are also useful for seeding other, more labor- or resource-intensive
testing regimes down the road.”

Ok nice … but what does this actually mean? (even native speakers are confused)

Binary Instrumentation

The first component of AFL is a wrapper for the GCC/Clang compilers that injects its own assembly code into the target software during compilation.
The fuzzer uses this code to trace execution paths while feeding the target with new inputs, and to determine whether a mutated input triggers a known or an unknown (new) execution path.

During compilation, the binary program is produced from the source code: the source code is first turned into assembly code, and the assembly code is then turned into machine code.
gcc (the compiler proper) turns source code into assembly code, while as (the GNU assembler) turns assembly code into machine code.

afl-gcc is a wrapper around gcc: it calls gcc for compilation, but tells it to use afl-as instead of as for assembly. afl-as is in turn a wrapper around as: it analyzes the assembly code and inserts the instrumentation, then calls as to assemble the instrumented code into machine code.
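
Conceptually, the code injected at each branch point is equivalent to the C below. This is the scheme described in AFL’s technical whitepaper (docs/technical_details.txt); the function and variable names here are just for illustration, since the real injected code is a short assembly stub writing into a shared-memory region.

/* What AFL's injected instrumentation does at every branch point,
 * written out as C for clarity (the real thing is assembly). */
#include <stdint.h>

#define MAP_SIZE 65536
uint8_t shared_mem[MAP_SIZE];   /* in reality an SHM region shared with afl-fuzz */
static uint16_t prev_location;  /* where execution came from                     */

void afl_maybe_log(uint16_t cur_location) {      /* cur_location: compile-time random ID */
    shared_mem[cur_location ^ prev_location]++;  /* count this (prev -> cur) edge         */
    prev_location = cur_location >> 1;           /* shift so A->B and B->A differ         */
}

int main(void) {
    /* simulate visiting two branch points with IDs 0x1234 and 0xabcd */
    afl_maybe_log(0x1234);
    afl_maybe_log(0xabcd);
    return 0;
}

afl-fuzz inspects this map after every run; a byte position that has never been non-zero before means a new edge (tuple) was reached, which is what makes a testcase “interesting”.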

If you can’t or don’t want to recompile the source code of the target program, AFL also supports a QEMU mode: an extension that leverages QEMU’s “user emulation” mode and lets callers obtain instrumentation output for black-box, closed-source binaries. afl-fuzz can then use this mechanism to stress-test targets that could not be built with afl-gcc.

Installation

Download the latest version of the source code (or one of the older releases), extract it, and do the usual:

$ make 
$ sudo make install

You have to set the CPU frequency scaling governor to performance mode, otherwise afl-fuzz will complain and refuse to run.
Install a CPU frequency management tool and set the governor:

$ apt install cpufrequtils
$ cpufreq-info
$ cpufreq-set -g performance

Make this the default so the setting is not lost after a reboot (optional):

$ apt install sysfsutils
$ vim /etc/sysfs.conf
# add this line
devices/system/cpu/cpu0/cpufreq/scaling_governor = performance

Then the governor will be set to performance mode on the next boot.

Run an Example (Reference)

A relatively simple target to fuzz is binutils (which includes readelf, objdump, etc.).
First finish the binary instrumentation:

$ ./configure CC="afl-gcc" CXX="afl-g++"
# CC="afl-clang" to use clang
$ make

The following command instructs the system to output core dumps as files instead of sending them to a specific crash handler application:

$ echo core > /proc/sys/kernel/core_pattern

Create an input folder to hold an ELF file that AFL will use as a base template,
and an output folder where the fuzzer stores its state and all the generated samples that make the application crash or hang.
This time we give /bin/sh a chance as the seed.

$ mkdir afl_in afl_out
$ cp /bin/sh afl_in/

Run the fuzzer

$ afl-fuzz -i afl_in -o afl_out ./binutils/objdump -d @@
# '-i' selects the directory of input files (seeds)
# '-o' selects the directory for output files (findings)
# after that comes the command line of the target ELF file
# '@@' is replaced with the path of the current input file at runtime

And we see

...
[+] All set and ready to roll!

american fuzzy lop 2.52b (objdump)

┌─ process timing ─────────────────────────────────────┬─ overall results ─────┐
│ run time : 0 days, 0 hrs, 0 min, 54 sec │ cycles done : 0 │
│ last new path : 0 days, 0 hrs, 0 min, 24 sec │ total paths : 40 │
│ last uniq crash : none seen yet │ uniq crashes : 0 │
│ last uniq hang : none seen yet │ uniq hangs : 0 │
├─ cycle progress ────────────────────┬─ map coverage ─┴───────────────────────┤
│ now processing : 0 (0.00%) │ map density : 3.75% / 4.37% │
│ paths timed out : 0 (0.00%) │ count coverage : 1.28 bits/tuple │
├─ stage progress ────────────────────┼─ findings in depth ────────────────────┤
│ now trying : bitflip 1/1 │ favored paths : 1 (2.50%) │
│ stage execs : 848/971k (0.09%) │ new edges on : 31 (77.50%) │
│ total execs : 3048 │ total crashes : 0 (0 unique) │
│ exec speed : 12.54/sec (zzzz...) │ total tmouts : 0 (0 unique) │
├─ fuzzing strategy yields ───────────┴───────────────┬─ path geometry ────────┤
│ bit flips : 0/0, 0/0, 0/0 │ levels : 2 │
│ byte flips : 0/0, 0/0, 0/0 │ pending : 40 │
│ arithmetics : 0/0, 0/0, 0/0 │ pend fav : 1 │
│ known ints : 0/0, 0/0, 0/0 │ own finds : 39 │
│ dictionary : 0/0, 0/0, 0/0 │ imported : n/a │
│ havoc : 0/0, 0/0 │ stability : 100.00% │
│ trim : 0.00%/1879, n/a ├────────────────────────┘
^C────────────────────────────────────────────────────┘ [cpu000: 53%]

In the afl_out/crashes folder you will find the files that made objdump crash, and you can run crash samples to see what happens.

Corpus Distillation

After collecting input samples, we should trim them so the corpus is smaller and more efficient.
AFL provides two utilities for this: afl-cmin and afl-tmin.

(1) Remove files that execute the same code – afl-cmin

afl-cmin aims to find the smallest subset of files with the same coverage as the full corpus.
For example, if multiple files all cover the same code, the redundant ones are discarded.

It is used as follows:

$ afl-cmin -i input_dir -o output_dir -- /path/to/tested/program [params]
# if the program reads from stdin, the contents of each testcase are fed to it that way
# (the '-f' option can redirect the testcase to a specific file location instead)

# or

$ afl-cmin -i input_dir -o output_dir -- /path/to/tested/program [params] @@
# if the program takes the input file on the command line, use '@@', which is replaced with the real file name at runtime

(2) Reduce the size of a single input file – afl-tmin

After shrinking the overall corpus, use afl-tmin to reduce the size of each individual file.

Its usage is similar to afl-cmin, but it additionally has two working modes: instrumented mode (the default) and crash mode (which minimizes a file while preserving the abnormal exit it triggers).

It is used as follows:

# instrumentation mode
$ afl-tmin -i input_file -o output_file -- /path/to/tested/program [params] @@

# crash mode
$ afl-tmin -x -i input_file -o output_file -- /path/to/tested/program [params] @@
# '-x' to work under crash mode

Batch processing with a simple shell script:

$ for i in *; do afl-tmin -i "$i" -o "tmin-$i" -- /path/to/tested/program [params] @@; done

Perform Instrumentation

Normal Mode

Replace gcc and g++ with afl-gcc and afl-g++ in the Makefile,
or
if the project is built with autoconf, use

$ ./configure CC="afl-gcc" CXX="afl-g++"

To make sure the AFL-instrumented library code is the code that actually gets loaded, compile statically:

$ ./configure --disable-shared CC="afl-gcc" CXX="afl-g++"

LLVM Mode

In LLVM mode you get a higher fuzzing speed.
Note that an older version of clang and llvm (such as 3.x) may be needed rather than the newest one:

$ cd llvm_mode
$ apt-get install clang
$ export LLVM_CONFIG=`which llvm-config` && make && cd ..

Then use

$ ./configure --disable-shared CC="afl-clang-fast" CXX="afl-clang-fast++" 

Start Fuzzing

Fuzzing Mode

Generally afl-fuzz provides 4 fuzzing modes that differ in how the program is executed between different fuzzing inputs:

  • Dumb mode, which just executes the program by doing fork() and execv(). This is the slowest mode; it does not rely on any fancy tricks to speed up program execution and also does not provide any insight into how the program behaves with different inputs.
  • Basic fork server mode, where the fuzzed binary does all the initialization steps that happen before calling the main() function, and the program is then repeatedly forked from that point on. This also includes instrumentation compiled into the program, so there is already some insight into what happens inside the program when a specific input is processed. There is a QEMU mode for afl-fuzz that technically enables fork server mode for uninstrumented binaries, but with some performance penalty.
  • Deferred instrumentation, which works in a similar fashion to the basic fork server mode. Instead of forking just before the main() function is called, the fork point can be moved further down the line, so heavy program initialization steps can be skipped if they can be executed independently of the input.
  • Persistent mode, where the fuzzable part of the program is executed repeatedly without resetting the program memory every time it is called. In practice this only works if the program does not have modifiable global state that cannot be reset to its previous value. (A harness sketch for the last two modes follows below.)

afl-fuzz generates new inputs and analyzes the program execution results at roughly the same speed regardless of the mode, so these modes are listed in order of how much per-input execution overhead there is. They are also in order of how complex they are to integrate into an existing program that was not written with fuzzing in mind, especially since the fastest modes require clang to be available as the compiler and the fuzzed program has to be able to be compiled and linked with it.
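
To give an idea of what the last two modes look like in practice, here is a minimal harness sketch for afl-clang-fast (LLVM mode). __AFL_INIT(), __AFL_LOOP() and __AFL_HAVE_MANUAL_CONTROL are the macros documented in AFL’s llvm_mode README; process_input() is a hypothetical stand-in for whatever code you actually want to fuzz.

/* harness.c - minimal deferred-init + persistent-mode harness sketch.
 * Build with afl-clang-fast; with a plain compiler it degrades to a
 * single-run program. process_input() is a hypothetical example. */
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static void process_input(const char *buf, ssize_t len) {
    /* hypothetical parsing code under test */
    if (len >= 3 && memcmp(buf, "BAD", 3) == 0) abort();
}

int main(void) {
#ifdef __AFL_HAVE_MANUAL_CONTROL
    __AFL_INIT();                 /* deferred init: fork server starts here    */
#endif
    static char buf[4096];

#ifdef __AFL_HAVE_MANUAL_CONTROL
    while (__AFL_LOOP(1000)) {    /* persistent mode: ~1000 inputs per process */
#else
    {                             /* fallback: process a single input          */
#endif
        ssize_t len = read(0, buf, sizeof(buf));
        if (len > 0) process_input(buf, len);
    }
    return 0;
}

Since this harness reads from stdin, you would run afl-fuzz against it without ‘@@’, so each testcase is piped to its standard input.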

White Box Testing

(1) Test instrumented program

After compilation, use afl-showmap to run a single test (tracing the execution path) and check whether the instrumentation works.

It prints the program output and the captured tuples, which represent branch (edge) coverage information:

$ afl-showmap -o output_dir /path/to/tested/program [params] 

# example

$ afl-showmap -m none -o /dev/null -- ./build/bin/imagew 23.bmp out.png
[*] Executing './build/bin/imagew'...
-- Program output begins --
23.bmp -> out.png
Processing: 13x32
-- Program output ends --
[+] Captured 1012 tuples in '/dev/null'.
# '-m' sets the child process memory limit ('none' removes the limit)
# '-o' sets the output file for the captured tuples ('/dev/null' discards them)

$ afl-showmap -m none -o /dev/null -- ./build/bin/imagew 111.pgm out.png
[*] Executing './build/bin/imagew'...
-- Program output begins --
111.pgm -> out.png
Processing: 7x7
-- Program output ends --
[+] Captured 970 tuples in '/dev/null'.
# different input gets different tuples

(2) Run fuzzer

As mentioned before:

$ afl-fuzz -i testcase_dir -o findings_dir /path/to/program [params]
# or
$ afl-fuzz -i testcase_dir -o findings_dir /path/to/program [params] @@

(3) Use screen

$ screen -S fuzzer1
# start a new screen
$ afl-fuzz -i testcase_dir -o findings_dir /path/to/program [params] @@
# run the fuzzer and press Ctrl+A then D to detach
$ screen -ls
# list screens
$ screen -r fuzzer1
# reattach to the screen
# if you cannot reattach to an [attached] screen,
# add the '-D' parameter to kick the former user off first

Black Box Testing

Without source code, compile-time instrumentation cannot be done.

It’s time for QEMU mode, which you need to build first, just like LLVM mode.

Note that the QEMU version bundled with AFL is quite old, so follow these steps:

$ apt-get install libini-config-dev libtool-bin automake bison libglib2.0-dev -y
$ cd qemu_mode
$ ./build_qemu_support.sh
$ cd .. && make install

Start QEMU mode fuzzing

$ afl-fuzz -Q -i testcase_dir -o findings_dir /path/to/program [params] @@
# '-Q' means QEMU mode

Parallel Testing

(1) Sole-system parallel testing

In theory, you can run as many fuzzing instances as your machine has cores, since each instance is bound to one core.
Check the core count like this:

$ cat /proc/cpuinfo | grep "cpu cores" | uniq

To do parallel fuzzing with afl-fuzz, use the -M parameter to designate a Master fuzzer and -S to designate multiple Slave fuzzers:

$ screen afl-fuzz -i testcases/ -o sync_dir/ -M fuzzer1 -- ./program
$ screen afl-fuzz -i testcases/ -o sync_dir/ -S fuzzer2 -- ./program
$ screen afl-fuzz -i testcases/ -o sync_dir/ -S fuzzer3 -- ./program
...

The two kinds use different fuzzing strategies.
The Master fuzzer performs deterministic testing: specific rather than random mutations are applied to the input files.
The Slave fuzzers perform completely random mutations.

afl-gotcpu helps check the utilization of each core.

(Figure: afl-gotcpu example output)

(2) Multi-system parallel testing

I Bet I’ll Hardly Use This. See You Next Time.
