A supplement for AFL Fuzzing Intro.
History
From the blog of riusksk.
It has been around 30 years since fuzzing was born.
Origin
In 1988, in a computer science class taught by Barton Miller at UW-Madison, the concept of a Fuzz Generator was first proposed to test the robustness of Unix programs: it feeds a program random data until it crashes.
Barton Miller is therefore regarded as “the Father of Fuzzing”. At that time, however, fuzzing aimed only at checking code quality and program stability, not at finding vulnerabilities.
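The idea of feeding random data until something crashes can be sketched in a few lines. This is only an illustration, not Miller's original tool: `parse_record` is a hypothetical target with a planted bug, and `random_fuzz` is a name made up for this example.

```python
import random


def parse_record(data: bytes) -> int:
    """Hypothetical target: a toy parser with a planted bug."""
    if len(data) >= 2 and data[0] == 0xFF:
        table = [0] * 16
        # Planted bug: the second byte is trusted as an index.
        return table[data[1]]  # IndexError when data[1] >= 16
    return 0


def random_fuzz(target, trials=10_000, max_len=8, seed=0):
    """Feed random byte strings to target; return the first crashing input, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randrange(max_len + 1)
        data = bytes(rng.randrange(256) for _ in range(length))
        try:
            target(data)
        except Exception:
            # Any uncaught exception stands in for a crash here.
            return data
    return None
```

Calling `random_fuzz(parse_record)` returns an input that starts with `0xFF` followed by an out-of-range index. No knowledge of the input format is used, which is exactly why this approach struggles with deeper bugs.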
From Academia to Industry
In 2001, the University of Oulu announced the results of the PROTOS project, which was the first to apply fuzzing to network protocol security testing; you can still find its test suites for different network protocols.
In 2002, PROTOS gradually improved and Microsoft started to provide financial support for it. In 2003, project members set up a company called Codenomicon, which started to apply fuzzing to commercial products and did find many security issues, such as Heartbleed.
At BlackHat USA 2002, Dave Aitel from Immunity (the makers of Immunity Debugger) presented “An Introduction to SPIKE, the Fuzzer Creation Kit”. SPIKE is a network protocol fuzzer based on block templates. Its strength is support for custom blocks of variable length; besides random data, it supplies boundary values to increase the chance of a crash. The arrival of SPIKE let users build their own network protocol fuzzers, and it played a huge role in popularizing fuzzing.
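A rough sketch of the block-based idea follows. It does not use SPIKE's own API: the protocol (a 2-byte length prefix followed by a `USER` line and a body), the function names, and the short boundary-value list are all assumptions made for illustration. The point is that the length field is recomputed for every case, so a variable-length field can be swapped for a boundary value without breaking the framing.

```python
import struct

# Assumed boundary values; a real fuzzer ships a much larger list.
BOUNDARY_STRINGS = [b"", b"A" * 256, b"A" * 65536, b"%n%n%n%n", b"\x00", b"-1"]


def build_message(user: bytes, body: bytes) -> bytes:
    """Serialize a made-up protocol: 2-byte big-endian block length, then the block."""
    block = b"USER " + user + b"\r\n" + body
    # The length is derived from the block, never hard-coded.
    # It wraps around for blocks larger than 65535 bytes.
    return struct.pack(">H", len(block) & 0xFFFF) + block


def generate_cases(default_user=b"guest", default_body=b"hello"):
    """Fuzz one field at a time, keeping the other field at its default."""
    for value in BOUNDARY_STRINGS:
        yield build_message(value, default_body)
    for value in BOUNDARY_STRINGS:
        yield build_message(default_user, value)
```

Each generated case would then be sent to the server under test; the sending code is left out because it depends entirely on the target.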
From PROTOS to SPIKE, academia and industry proved that fuzzing can be applied in the commercial and security fields.
File Format Fuzzing
In 2004, the release of the Peach framework marked the arrival of file format fuzzing. It was originally developed in Python; after being acquired in 2007 it was rewritten in C# and split into a community edition and a commercial version.
It supports file formats, network protocols and ActiveX controls. The data format is defined by writing a pit file (XML).
Peach is still in use, and aflsmart was even released to combine Peach and AFL.
File fuzzing can also be used for network protocol fuzzing: instrument the source code and print logs, then call the API to write a harness and test it with AFL/libfuzzer, which turns the problem into file fuzzing.
Grammar Template Fuzzing
In 2008, the Mozilla security team published jsfunfuzz and DOMfuzz, which produce test cases from JS grammar templates to discover browser vulnerabilities. They were later combined into the open source funfuzz, but its grammar templates are hard to extend (dharma and Domato are better).
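To make template-based generation concrete, here is a minimal sketch. The grammar is a made-up toy covering a tiny JS-like subset, and `GRAMMAR` and `expand` are hypothetical names; this is not the template format of jsfunfuzz, dharma or Domato.

```python
import random

# Toy grammar: each non-terminal maps to a list of alternative rules.
GRAMMAR = {
    "<stmt>": [["var ", "<id>", " = ", "<expr>", ";"],
               ["<id>", "(", "<expr>", ");"]],
    "<expr>": [["<num>"], ["<id>"],
               ["<expr>", " + ", "<expr>"], ["[", "<expr>", "]"]],
    "<id>": [["a"], ["b"], ["f"]],
    "<num>": [["0"], ["1"], ["-1"], ["4294967296"]],
}


def expand(symbol, rng, depth=0, max_depth=6):
    """Recursively expand a symbol into a string of terminals."""
    if symbol not in GRAMMAR:
        return symbol  # terminal text is emitted as-is
    rules = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth limit, take the shortest rule so recursion ends.
        rule = min(rules, key=len)
    else:
        rule = rng.choice(rules)
    return "".join(expand(part, rng, depth + 1, max_depth) for part in rule)
```

For example, `expand("<stmt>", random.Random(1))` yields one syntactically valid statement. Extending coverage of the language means adding rules to the grammar, which is why the ease of extending the templates matters so much for this family of tools.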
After funfuzz, many other JS grammar fuzzing tools appeared, such as grinder, nduja, crossfuzz and so on.
Browsers are always a focus of cyber attacks, and the rendering engine and JS engine have always been the main attack surface. Grammar fuzzing aimed at html, js, vbs and WebSQL (like sqlite) came out naturally.
Besides browsers, JS in PDF and Action Script in Flash have always been the entry points into Adobe Reader and Adobe Flash.
Symbolic Execution: Academia VS Industry
In 2008, the symbolic execution engine KLEE, based on LLVM, was released, leading a new wave of program analysis. Later, symbolic execution was applied in fuzzing and used to solve CTF challenges, for example to find keys or to deobfuscate code.
Driller is a combination of AFL and angr and was used in CGC (Cyber Grand Challenge). There is also a combination of AFL and KLEE: kleefl.
Symbolic execution is used more in academia than in industry. Applying se to fuzzing increases code coverage by solving the conditions of new paths as constraints, which can partly make up for brute-force mutation. The main challenges for se are path explosion, limited constraint-solving capability, and performance. It works well for small programs, like using angr in CTF, but not so well for real-world targets. The new wave of fuzzing technology was started not by se but by code coverage guided technology.
Code Coverage Guided
At the end of 2013, afl-fuzz was released. It was the first to use instrumentation, in source code and in QEMU, to achieve code coverage guided fuzzing (a milestone in the development of fuzzing). AFL wasn’t that famous until many researchers tweeted their 0day findings on Twitter in 2014-2015.
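The core loop of coverage guidance can be sketched as follows. This is a heavy simplification rather than AFL's implementation: the target records branch IDs into a Python set by hand instead of using compiler or QEMU instrumentation with a shared bitmap, and `target`, `mutate` and `fuzz` are hypothetical names.

```python
import random


def target(data: bytes, coverage: set) -> None:
    """Hypothetical instrumented target: each branch adds an ID to coverage."""
    coverage.add("entry")
    if len(data) > 0 and data[0] == ord("F"):
        coverage.add("F")
        if len(data) > 1 and data[1] == ord("U"):
            coverage.add("FU")
            if len(data) > 2 and data[2] == ord("Z"):
                coverage.add("FUZ")
                if len(data) > 3 and data[3] == ord("Z"):
                    raise RuntimeError("planted crash")


def mutate(data: bytes, rng) -> bytes:
    """Apply one random byte-level mutation: insert, overwrite or delete."""
    buf = bytearray(data)
    choice = rng.randrange(3)
    if choice == 0 or not buf:
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    elif choice == 1:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    else:
        del buf[rng.randrange(len(buf))]
    return bytes(buf)


def fuzz(seed_input=b"AAAA", iterations=300_000, seed=0):
    """Keep every input that reaches new coverage and mutate from that corpus."""
    rng = random.Random(seed)
    corpus = [seed_input]
    seen = set()
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        cov = set()
        try:
            target(candidate, cov)
        except RuntimeError:
            return candidate, corpus
        if not cov <= seen:  # at least one branch ID not seen before
            seen |= cov
            corpus.append(candidate)
    return None, corpus
```

Because inputs that reach a new branch are kept as starting points, the four-byte prefix is discovered one byte at a time. A purely random generator would need to guess all four bytes in a single attempt.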
Then we got winafl, libfuzzer, AFLFast, Vuzzer… and editions for different programming languages: go, python, js, ruby… Some well-known fuzzers followed the lead, like the kernel fuzzer syzkaller, which added code coverage guidance.
At the same time, the whole industry has been trying to move it to new platforms (windows, android, IoT) and to support closed source software. That has been the hot research topic of recent years, with approaches like dynamic and static instrumentation, virtual machine simulated execution, and the use of hardware features.
Syscall Template
In 2015, Google open sourced syzkaller, a kernel fuzzer that is still in use. It works by defining templates for system calls, declaring the parameter types in the templates, and handling order and value dependencies between calls. Project Zero has written “Exploiting the Linux kernel via packet sockets(syzkaller usage)”, which covers how to write templates and how to use syzkaller to discover Linux kernel vulns.
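The template idea can be illustrated with a small Python stand-in. This is not syzkaller's description syntax: `TEMPLATES`, `VALUES` and `generate_program` are hypothetical, and the generator only prints call sequences instead of executing them. It shows the two kinds of dependency mentioned above: typed parameter values, and calls that may only use a resource (here a file descriptor) produced by an earlier call.

```python
import random

# Hypothetical templates: name -> (argument types, resource type produced or None).
TEMPLATES = {
    "open": (["path", "flags"], "fd"),
    "read": (["fd", "len"], None),
    "write": (["fd", "len"], None),
    "close": (["fd"], None),
}

# Candidate values for plain (non-resource) argument types.
VALUES = {
    "path": ['"/dev/null"', '"/tmp/x"'],
    "flags": ["0", "0x2", "0xffffffff"],
    "len": ["0", "1", "4096", "0xffffffffffffffff"],
}


def generate_program(rng, length=6):
    """Build a call sequence in which every resource argument has a producer."""
    program = []
    resources = {}  # resource type -> names of variables holding it
    while len(program) < length:
        name = rng.choice(list(TEMPLATES))
        arg_types, produces = TEMPLATES[name]
        # Order dependency: skip calls whose resource inputs do not exist yet.
        if any(t not in VALUES and not resources.get(t) for t in arg_types):
            continue
        args = [rng.choice(VALUES[t]) if t in VALUES else rng.choice(resources[t])
                for t in arg_types]
        call = f"{name}({', '.join(args)})"
        if produces:
            var = f"r{len(program)}"
            resources.setdefault(produces, []).append(var)
            call = f"{var} = {call}"
        program.append(call)
    return program
```

Closed descriptors are deliberately left in the resource pool, so sequences that use a descriptor after `close` can still be generated.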
In windows kernel fuzzing, templates of GUI API calls are commonly used; in macOS kernel fuzzing, it is IOKit functions. They are all based on the syscall template method.
In 2016, Google proposed “Structure-Aware Fuzzing” and implemented libprotobuf-mutator based on libfuzzer and protobuf. It shares the idea of syzkaller, adds the coverage guidance that Peach doesn’t have, and addresses the inefficient mutation of AFL and libfuzzer on complicated input types. Project Zero now uses libprotobuf-mutator to fuzz the iOS kernel (“SockPuppet: A Walkthrough of a Kernel Exploit for iOS 12.4”).
Structure-Aware Fuzzing is nothing new: like Peach, it defines a template for the input data type to improve the accuracy of mutation.
Open Source
In industry, the most famous fuzzing platform is probably clusterfuzz, which runs on 25000+ machines and has found 16000+ chrome bugs and 11000+ bugs in open source projects. It works together with OSS-Fuzz, supports code coverage guidance through libfuzzer and AFL, and supports Black-Box Fuzzing. OSS-Fuzz and clusterfuzz were open sourced in 2016 and 2019 respectively, and Google invites every researcher to get involved and sets prizes for contributors. Google also developed ASan, MSan, TSan, UBSan and LSan (all compile-time instrumentation tools) to find bugs. Some vulns only trigger with a Sanitizer enabled, similar to enabling page heap in windows, so the sanitizers help to find and analyse crashes.
In a word, Google did a lot for the fuzzing community and ecosystem.
Syntax Tree mutation in Parse Engine Vulnerability Discovery
In 2012, “Fuzzing with Code Fragments” was published at USENIX, and the researchers developed LangFuzz. They collect JS test samples from open source browser projects (firefox, webkit, chromium) and the Internet; then use ANTLR (ANother Tool for Language Recognition) to parse them into an AST (abstract syntax tree); next split the samples into non-terminal symbols (code fragments) and put them into a fragment pool; finally mutate an input sample using the pool (choosing fragments of the same type to replace or insert).
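The pipeline above can be sketched with Python's built-in `ast` module standing in for ANTLR, and Python source standing in for JS samples. That substitution is an assumption made to keep the example self-contained, and `collect_fragments`, `FragmentSwapper` and `mutate` are hypothetical names. It needs Python 3.9+ for `ast.unparse`, and it only implements replacement, not insertion.

```python
import ast
import copy
import random


def collect_fragments(sources):
    """Parse each sample and pool its expression nodes by node type."""
    pool = {}
    for src in sources:
        for node in ast.walk(ast.parse(src)):
            if isinstance(node, ast.expr):
                pool.setdefault(type(node), []).append(node)
    return pool


class FragmentSwapper(ast.NodeTransformer):
    """Replace some expression nodes with pooled fragments of the same type."""

    def __init__(self, pool, rng, rate=0.3):
        self.pool = pool
        self.rng = rng
        self.rate = rate

    def visit(self, node):
        if isinstance(node, ast.expr) and self.rng.random() < self.rate:
            candidates = self.pool.get(type(node))
            if candidates:
                # Copy so the pooled fragment itself is never modified.
                return copy.deepcopy(self.rng.choice(candidates))
        return self.generic_visit(node)


def mutate(source, pool, rng):
    """Return a mutated variant of source, rendered back to code."""
    tree = FragmentSwapper(pool, rng).visit(ast.parse(source))
    return ast.unparse(tree)
```

Swapping only fragments of the same node type keeps the output syntactically valid for simple samples like these, although it can still refer to names that are never defined. That is the gap later tools such as CodeAlchemist set out to close.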
Building on LangFuzz, someone added a genetic algorithm to evaluate inputs and select better individuals for creating new samples; that is the open source IFuzzer. However, it is not very complete, and not many vulns were discovered with it.
In 2018, Samuel Groß from Project Zero released a JS grammar fuzzer: fuzzilli. It integrates multiple technologies such as grammar mutation, template generation, and coverage guidance. It uses a custom intermediate language for grammatical mutation, then converts it to JS code. fuzzilli has achieved a lot in JS vuln discovery and has been used by industry peers for secondary development.
In 2019, “CodeAlchemist: Semantics-Aware Code Generation to Find Vulnerabilities in JavaScript Engines” and “Superion: Grammar-Aware Greybox Fuzzing” were published, and both cited “Fuzzing with Code Fragments”. CodeAlchemist runs syntax tree analysis and data flow analysis on the input, then sets pre and post constraints for the split fragments: pre constraints cover variable definitions and post constraints cover the fragment’s result, and both serve to avoid references to undefined variables. Superion adds syntax tree mutation rules to AFL and uses AFL to select inputs. It supports multiple programming languages, also uses ANTLR for syntax tree analysis, and is easy to extend with new grammars.
You can find CodeAlchemist and Superion on GitHub.
AI in Fuzzing
2018 is the Year of Artificial Intelligence.
AI has penetrated many fields, and fuzzing will be no exception. People now use AI to produce test inputs. The problem is that, to obtain the desired effect, we have to feed the model a great many samples, which is definitely inefficient nowadays.
But that doesn’t mean it is useless: even though brute-force mutation is fast, it needs a good mutation direction to be efficient, and providing that is part of AI’s job.
Fuzzer List
From the blog of riusksk.
- AFL: Code coverage guided fuzzer, a milestone; its mutation strategy needs improvement.
- WinAFL: The AFL for windows; uses DynamoRIO to instrument closed source software, and supports hardware PT (processor tracing).
- AFLFast: A faster AFL.
- Vuzzer: Coverage guided fuzzer that supports closed source software, tracks data flow using the pin-based LibDFT, and gets more paths by combining static and dynamic analysis.
- PTfuzzer: Coverage guided fuzzer using Intel PT (processor tracing) on Linux, so it supports closed source software.
- afl-unicorn: The AFL using unicorn to emulate instructions; supports linux closed source software.
- pe-afl: The AFL for windows closed source software using static instrumentation; supports user level apps and kernel drivers.
- kAFL: The AFL for kernel fuzzing in a QEMU VM; supports linux, macOS and windows.
- TriforceAFL: The AFL based on QEMU whole-OS emulation; tracks branch information through the system emulator, and supports linux kernel fuzzing.
- ClusterFuzz: An open source, scalable fuzzing infrastructure.
- LibFuzzer: A part of llvm; an open source, in-process, coverage guided fuzz engine library.
- OSS-Fuzz: A collection of fuzzers for open source software; can automatically download, compile and run targets in docker.
- honggfuzz: Coverage driven fuzzer from Google, using software- and hardware-based feedback; supports multiple platforms (Linux/macOS/Windows/Android).
- KernelFuzzer: Cross-platform kernel fuzzing framework; the strategy is not open source (you need to implement it yourself). Supports Windows/OSX/QNX (only a Windows build script is provided).
- OSXFuzzer: MacOS kernel fuzzer based on KernelFuzzer.
- PassiveFuzzFrameworkOSX: A passive OSX kernel fuzzer implemented with hooks.
- Bochspwn: Uses the Bochs instrumentation API to detect Double Fetches kernel vulns.
- Bochspwn-reloaded: Uses the Bochs instrumentation API to detect info leaks.
- syzkaller: A linux kernel fuzzer based on coverage guidance; you write API call templates in its template syntax, then give them to syzkaller to mutate data.
- dharma: A generation-based fuzzer driven by grammar templates, open sourced by Mozilla to fuzz the Firefox JS engine.
- Domato: DOM fuzzer open sourced by Project Zero; a template-based generation fuzzer written in Python.
- Fuzzilli: JS engine fuzzer based on grammar mutation; generates test cases from grammar templates, then converts them to an intermediate language to mutate, and triggers more paths by combining coverage guidance.
- Razzer: Fuzzer for kernel race condition vulns.
- ViridianFuzzer: A kernel driver for fuzzing Hyper-V hypercalls, from MWRLabs.
- ChromeFuzzer: Chrome fuzzer modified from the grinder grammar generator.
- funfuzz: JS fuzzer collection open sourced by Mozilla, mainly to fuzz SpiderMonkey.