A Brief History of Fuzzing | Relish the Moment

History

From the blog of riusksk.
It has been around 30 years since fuzzing was born.

Origin

In 1988, in a computer science class taught by Barton Miller at UW-Madison, the concept of a fuzz generator was first proposed: a tool that tests the robustness of Unix programs by feeding them random data until they crash.

Barton Miller is therefore regarded as “the Father of Fuzzing”. At that time, however, fuzzing was aimed at checking code quality and program stability rather than at finding vulnerabilities.
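
The idea fits in a few lines. What follows is a minimal sketch, not Miller's original tool: `parse_record` is a hypothetical target with a planted bug, and the fuzzer simply throws random bytes at it until something other than a clean rejection happens.

```python
import random

def parse_record(data: bytes) -> int:
    """Hypothetical target: a length-prefixed record parser with a planted bug."""
    if len(data) < 2:
        raise ValueError("too short")        # expected, graceful rejection
    length = data[0]
    payload = data[1:]
    # Bug: trusts the length byte and indexes without a bounds check.
    return payload[length - 1] if length else 0

def random_fuzz(target, rounds=10_000, max_len=8, seed=0):
    """Feed random byte strings to `target`; return the first crashing input."""
    rng = random.Random(seed)
    for _ in range(rounds):
        size = rng.randrange(max_len + 1)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except ValueError:
            continue                         # a clean rejection is not a crash
        except Exception:
            return data                      # anything else counts as a crash
    return None

crash = random_fuzz(parse_record)
print("crashing input:", crash)
```

Purely random input works here because the bug sits right behind the first byte. The later sections are mostly about what to do when it does not.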

From Academia to Industry

In 2001, the University of Oulu announced the results of the PROTOS project, the first to apply fuzzing to network protocol security testing. Its test suites for various network protocols can still be found today.
In 2002, as PROTOS matured, Microsoft began to fund it. In 2003, project members founded a company called Codenomicon, which applied fuzzing to commercial products and found many security issues, including Heartbleed.

At Black Hat USA 2002, Dave Aitel of Immunity (the company behind Immunity Debugger) presented “An Introduction to SPIKE, the Fuzzer Creation Kit”. SPIKE is a network protocol fuzzer built on block-based templates. Its strength is support for custom blocks of variable length; besides random data, it also supplies boundary values to raise the chance of a crash. SPIKE let users build their own network protocol fuzzers, and it played a major role in popularizing fuzzing.
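
The block idea is easier to see in code. This is a rough sketch, not SPIKE's actual API: the message layout, `build_message`, and the list of boundary values are all made up for illustration. The point is that length fields are recomputed from the real block content, so a mutated message stays well-formed and gets past the parser's first checks.

```python
import struct

# Boundary-style values substituted into fuzzable fields (illustrative choices).
BOUNDARY_STRINGS = [b"", b"\x00", b"A" * 255, b"A" * 256, b"A" * 65536, b"%n%n%n%n"]

def build_message(user: bytes, password: bytes) -> bytes:
    """Hypothetical protocol: a 4-byte big-endian body length followed by two
    length-prefixed strings. Every length is derived from the actual content."""
    body = b""
    for field in (user, password):
        body += struct.pack(">I", len(field)) + field
    return struct.pack(">I", len(body)) + body

def generate_cases(defaults):
    """Yield one message per (field, boundary value) pair, fuzzing one field at a time."""
    for i in range(len(defaults)):
        for value in BOUNDARY_STRINGS:
            fields = list(defaults)
            fields[i] = value
            yield build_message(*fields)

cases = list(generate_cases([b"guest", b"secret"]))
print(len(cases), "test cases generated")
```

In a real setup each case would be sent over a socket to the target service while a monitor watches for crashes.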

From PROTOS to SPIKE, academia and industry together showed that fuzzing is practical in both the commercial and the security field.

File Format Fuzzing

In 2004, the release of the Peach framework marked the arrival of file format fuzzing. It was originally written in Python; after being acquired in 2007 it was rewritten in C# and split into a community edition and a commercial edition.
It supports file formats, network protocols and ActiveX controls. Data formats are defined by writing pit files (XML).

Peach is still in use, and some people have even released aflsmart, which combines Peach and AFL.

File fuzzing techniques can also be applied to network protocols: instrument the source code to log the protocol data, write a harness that calls the relevant API directly, and run it under AFL/libFuzzer. Protocol fuzzing then becomes file fuzzing.

Grammar Template Fuzzing

In 2008, the Mozilla security team published jsfunfuzz and DOMfuzz, which generate test cases from JS grammar templates to discover browser vulnerabilities. They were later merged into the open source funfuzz, but its grammar templates are hard to extend (dharma and Domato are better in this respect).

After funfuzz, many other JS grammar fuzzing tools appeared, such as grinder, nduja and crossfuzz.

Browsers have always been a prime target of cyber attacks, with the rendering engine and the JS engine as the main attack surface. Grammar fuzzing aimed at HTML, JS, VBS and WebSQL (for example sqlite) came about naturally.

Beyond browsers, JS in PDF and ActionScript in Flash have long been the entry points for attacking Adobe Reader and Adobe Flash.
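
Grammar template generation can be sketched as below. This is a toy illustration, not how jsfunfuzz, dharma or Domato are actually implemented: the `GRAMMAR` table and the `generate` helper are assumptions, and the grammar covers only a sliver of a JS-like language.

```python
import random

# Toy grammar for a JavaScript-like language (illustrative only).
GRAMMAR = {
    "<stmt>": [["var x = ", "<expr>", ";"], ["f(", "<expr>", ");"]],
    "<expr>": [["<num>"], ["<expr>", " + ", "<expr>"], ["[", "<expr>", "]"], ["x"]],
    "<num>": [["0"], ["-1"], ["2147483647"], ["1e308"]],
}

def generate(symbol, rng, depth=0, max_depth=6):
    """Expand `symbol` by picking random productions. Past max_depth the
    shortest production is chosen so that the expansion terminates."""
    if symbol not in GRAMMAR:
        return symbol                        # terminal: emit as-is
    options = GRAMMAR[symbol]
    if depth >= max_depth:
        production = min(options, key=len)
    else:
        production = rng.choice(options)
    return "".join(generate(s, rng, depth + 1, max_depth) for s in production)

rng = random.Random(1)
for _ in range(3):
    print(generate("<stmt>", rng))
```

Every output is syntactically valid by construction, which is what lets this kind of fuzzer get past the parser and into the engine. How easy the grammar table is to extend is the property on which the tools above differ.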

Symbolic Execution: Academia VS Industry

In 2008, KLEE, a symbolic execution engine based on LLVM, was released, setting off a new wave of program analysis. Symbolic execution was later combined with fuzzing and used to solve CTF challenges, recover keys, deobfuscate code, and so on.
Driller combines AFL and angr and was used in the CGC (Cyber Grand Challenge). There is also a combination of AFL and KLEE: kleefl.

Symbolic execution is used more in academia than in industry. Applied to fuzzing, it raises code coverage by solving the constraints that guard new paths, which can partly make up for the blindness of brute-force mutation. Its main challenges are path explosion, the limits of constraint solvers, and performance. It works well on small programs, such as using angr in CTFs, but less well on real-world software. The next wave of fuzzing technology was started not by symbolic execution but by coverage guidance.

Code Coverage Guided

At the end of 2013, afl-fuzz was released. It was the first to use source-code instrumentation and QEMU to achieve coverage-guided fuzzing (surely a milestone in the field’s development). AFL did not become widely known until 2014-2015, when many researchers tweeted the 0-days they had found with it.
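
The core loop of coverage guidance is small. The sketch below is a simplification under stated assumptions, not AFL's real algorithm: the `target` function is hypothetical and records branch IDs by hand where AFL would rely on compiler or QEMU instrumentation, and the mutator knows only two tricks.

```python
import random

def target(data: bytes, cov: set) -> None:
    """Hypothetical instrumented target: each branch records an ID in `cov`."""
    cov.add(0)
    if len(data) > 0 and data[0] == ord("F"):
        cov.add(1)
        if len(data) > 1 and data[1] == ord("U"):
            cov.add(2)
            if len(data) > 2 and data[2] == ord("Z"):
                raise RuntimeError("crash")

def mutate(data: bytes, rng) -> bytes:
    """Replace one random byte, or append a random byte."""
    buf = bytearray(data)
    if buf and rng.random() < 0.5:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    else:
        buf.append(rng.randrange(256))
    return bytes(buf)

def coverage_guided_fuzz(seed_input=b"A", rounds=200_000, seed=0):
    """Keep every input that reaches a new branch and mutate from that corpus."""
    rng = random.Random(seed)
    corpus, seen = [seed_input], set()
    for _ in range(rounds):
        candidate = mutate(rng.choice(corpus), rng)
        cov = set()
        try:
            target(candidate, cov)
        except RuntimeError:
            return candidate                 # crash found
        if not cov <= seen:                  # new branch reached: keep the input
            seen |= cov
            corpus.append(candidate)
    return None

print("crashing input:", coverage_guided_fuzz())
```

A blind random fuzzer would need to guess all three magic bytes at once. With coverage feedback each correct byte is kept as a stepping stone, so the search only has to guess one byte at a time.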

Then came winafl, libfuzzer, AFLFast, Vuzzer… along with editions for different programming languages: Go, Python, JS, Ruby… Some well-known fuzzers followed the lead; the kernel fuzzer syzkaller, for example, added coverage guidance.

At the same time, the whole industry has been working to bring the technique to new platforms (Windows, Android, IoT) and to closed-source software. This has been a hot research topic in recent years, covering dynamic and static instrumentation, virtual machine emulation, and hardware features.

Syscall Template

In 2015, Google open-sourced syzkaller, a kernel fuzzer that is still in use. It works by defining templates for system calls, declaring the type of each parameter in the template, and handling the order and value dependencies between calls. Project Zero published “Exploiting the Linux kernel via packet sockets (syzkaller usage)”, which shows how to write templates and how to use syzkaller to discover Linux kernel vulns.

In Windows kernel fuzzing, templates of GUI API calls are commonly used; in macOS kernel fuzzing, templates of IOKit functions. Both follow the syscall template method.
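
A toy version of the template idea is shown below. It is a sketch only and does not use syzkaller's real description language: the `TEMPLATES` table, the argument kinds and the helpers are hypothetical. It shows the two ingredients named above: typed parameters, and a dependency between calls (a file descriptor must be produced by `open` before another call can consume it).

```python
import random

# Hypothetical call templates: argument kinds plus the resource a call produces.
TEMPLATES = {
    "open":  {"args": ["path", "flags"],        "returns": "fd"},
    "read":  {"args": ["fd", "len"],            "returns": None},
    "ioctl": {"args": ["fd", "int32", "int32"], "returns": None},
    "close": {"args": ["fd"],                   "returns": None},
}

def gen_arg(kind, resources, rng):
    """Pick a value appropriate for the declared argument kind."""
    if kind == "fd":
        return rng.choice(resources)         # reuse a handle from an earlier call
    if kind == "path":
        return rng.choice(["/dev/null", "/tmp/a", ""])
    if kind == "flags":
        return rng.choice([0, 1, 2, 0o100, 0xFFFFFFFF])
    if kind == "len":
        return rng.choice([0, 1, 4096, 2**31 - 1])
    return rng.choice([0, -1, 2**31 - 1, -2**31])   # int32 boundary values

def gen_program(rng, length=6):
    """Build a call sequence in which every fd argument refers to an earlier open.
    Closed handles are not tracked, so use-after-close sequences can appear."""
    program, resources = [], []
    while len(program) < length:
        name = rng.choice(list(TEMPLATES))
        if "fd" in TEMPLATES[name]["args"] and not resources:
            name = "open"                    # satisfy the dependency first
        tmpl = TEMPLATES[name]
        args = [gen_arg(kind, resources, rng) for kind in tmpl["args"]]
        result = None
        if tmpl["returns"] == "fd":
            result = f"r{len(resources)}"
            resources.append(result)
        program.append((result, name, args))
    return program

for result, name, args in gen_program(random.Random(0)):
    print(f"{result or '  '} = {name}{tuple(args)}")
```

A real kernel fuzzer would execute such a program inside a VM and mutate it based on coverage; here the program is only printed.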

In 2016, Google proposed “Structure-Aware Fuzzing” and implemented it as libprotobuf-mutator, built on libfuzzer and protobuf. The idea is similar to syzkaller’s; it adds the coverage guidance that Peach lacks, and it addresses the inefficient mutation of AFL and libfuzzer on complicated input types. Project Zero now uses libprotobuf-mutator to fuzz the iOS kernel (“SockPuppet: A Walkthrough of a Kernel Exploit for iOS 12.4”).

Structure-Aware Fuzzing is not something new; like Peach, it defines a template for the input data type in order to improve the accuracy of mutation.

Open Source

In industry, the best-known fuzzing platform is probably clusterfuzz, which runs on 25,000+ machines and has found 16,000+ Chrome bugs and 11,000+ bugs in open source projects. It works together with OSS-Fuzz, supports coverage-guided engines such as libfuzzer and AFL, and also supports black-box fuzzing. OSS-Fuzz and clusterfuzz were open-sourced in 2016 and 2019 respectively, and Google invites every researcher to take part, with rewards for contributors. Google also developed ASan, MSan, TSan, UBSan and LSan (all compile-time instrumentation tools) to find bugs. Some vulns only trigger with a sanitizer enabled, much like enabling page heap on Windows, so the sanitizers help both to surface crashes and to analyse them.

In short, Google has done a lot for the fuzzing community and ecosystem.

Syntax Tree mutation in Parse Engine Vulnerability Discovery

In 2012, “Fuzzing with Code Fragments” was published at USENIX, and its authors developed LangFuzz. They collect JS test samples from open source browser projects (Firefox, WebKit, Chromium) and from the Internet; then use ANTLR (ANother Tool for Language Recognition) to parse them into syntax trees; next split the samples into code fragments, one per non-terminal symbol, and put them into a fragment pool; finally mutate an input sample by replacing or inserting fragments of the same type taken from the pool.
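
The fragment-pool mutation can be sketched as follows. This is only an analogy and not LangFuzz itself: it uses Python's own `ast` module on Python samples as a stand-in for ANTLR and JS, the sample snippets are invented, and load/store contexts are ignored for brevity. It assumes Python 3.9+ for `ast.unparse`.

```python
import ast
import copy
import random

def collect_fragments(sources):
    """Parse each sample and pool its expression subtrees by node type."""
    pool = {}
    for src in sources:
        for node in ast.walk(ast.parse(src)):
            if isinstance(node, ast.expr):
                pool.setdefault(type(node), []).append(node)
    return pool

def mutate(src, pool, rng):
    """Replace one random expression with a pooled fragment of the same type."""
    tree = ast.parse(src)
    candidates = [n for n in ast.walk(tree)
                  if isinstance(n, ast.expr) and type(n) in pool]
    victim = rng.choice(candidates)
    replacement = copy.deepcopy(rng.choice(pool[type(victim)]))

    class Swap(ast.NodeTransformer):
        def visit(self, node):
            if node is victim:
                return replacement           # swap in the pooled fragment
            return super().visit(node)

    new_tree = ast.fix_missing_locations(Swap().visit(tree))
    return ast.unparse(new_tree)

# Invented samples standing in for a collected test corpus.
SAMPLES = ["x = f(1, 2) + y", "print(len('abc') * 3)", "z = [a, b][0]"]
pool = collect_fragments(SAMPLES)
rng = random.Random(0)
for _ in range(3):
    print(mutate(rng.choice(SAMPLES), pool, rng))
```

Because a fragment is only ever swapped for another fragment of the same node type, the mutated sample stays parseable for these samples while the semantics get shuffled.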

Building on LangFuzz, someone added a genetic algorithm that evaluates inputs and selects the fitter individuals to create new samples; this is the open source IFuzzer. However, it is not very complete, and not many vulns have been discovered with it.

In 2018, Samuel Groß from Project Zero released a JS grammar fuzzer: fuzzilli. It integrates multiple techniques such as grammar mutation, template generation and coverage guidance. It performs grammatical mutation on a custom intermediate language, then converts the result to JS code. fuzzilli has found many JS vulns and has been used by industry peers as a base for further development.

In 2019, “CodeAlchemist: Semantics-Aware Code Generation to Find Vulnerabilities in JavaScript Engines” and “Superion: Grammar-Aware Greybox Fuzzing” were published, and both cite “Fuzzing with Code Fragments”.
CodeAlchemist runs syntax tree analysis and data flow analysis on the input, then attaches pre and post constraints to the split fragments: pre constraints describe the variables a fragment needs defined, post constraints describe what the fragment defines. Both exist to avoid references to undefined variables.
Superion adds syntax tree mutation rules to AFL and uses AFL to select inputs. It supports multiple programming languages, also uses ANTLR for syntax tree analysis, and is easy to extend with new grammars.

You can find CodeAlchemist and Superion on GitHub.

AI in Fuzzing

2018 is the Year of Artificial Intelligence.

AI has penetrated many fields, and fuzzing will be no exception. People now use AI to generate test inputs. The problem is that getting the desired effect requires feeding the model a very large number of samples, which is still inefficient today.

But that does not make it useless: brute-force mutation is fast, but it needs a good direction to be efficient, and providing that direction is part of AI’s job.

Fuzzer List

From the blog of riusksk.

AFL
Coverage-guided fuzzer and a milestone in the field; its mutation strategy needs improvement.
WinAFL
AFL for Windows; uses DynamoRIO to instrument closed-source software and supports hardware PT (Processor Trace).
AFLFast
Faster AFL
Vuzzer
Coverage-guided fuzzer that supports closed-source software; tracks data flow with the Pin-based libdft and reaches more paths by combining static and dynamic analysis.
PTfuzzer
Coverage-guided fuzzer that uses Intel PT (Processor Trace) on Linux, so it supports closed-source software.
afl-unicorn
AFL with Unicorn for instruction emulation; supports closed-source Linux software.
pe-afl
AFL for closed-source Windows software using static instrumentation; supports user-mode applications and kernel drivers.
kAFL
AFL for kernel fuzzing inside QEMU VMs; supports Linux, macOS and Windows.
TriforceAFL
AFL based on QEMU full-system emulation; tracks branch information through the system emulator and supports Linux kernel fuzzing.
ClusterFuzz
An open source, scalable fuzzing infrastructure.
LibFuzzer
Part of LLVM; an open source, in-process, coverage-guided fuzzing engine library.
OSS-Fuzz
A collection of fuzzers for open source software; targets are downloaded, built and run automatically in Docker.
honggfuzz
Coverage-driven fuzzer from Google with software- and hardware-based feedback; supports multiple platforms (Linux/macOS/Windows/Android).
KernelFuzzer
Cross-platform kernel fuzzing framework; the fuzzing strategy is not open source (you need to implement it yourself). Supports Windows/OSX/QNX (only a Windows build script is provided).
OSXFuzzer
macOS kernel fuzzer based on KernelFuzzer.
PassiveFuzzFrameworkOSX
Passive OSX kernel fuzzer implemented through hooking.
Bochspwn
Uses the Bochs instrumentation API to detect double-fetch kernel vulns.
Bochspwn-reloaded
Uses the Bochs instrumentation API to detect information leaks.
syzkaller
A coverage-guided Linux kernel fuzzer; you write API call templates in its template syntax and hand them to syzkaller, which then mutates the data.
dharma
Grammar-template-based generation fuzzer, open-sourced by Mozilla to fuzz the Firefox JS engine.
Domato
DOM fuzzer open-sourced by Project Zero; a template-based generation fuzzer written in Python.
Fuzzilli
JS engine fuzzer based on grammar mutation; generates test cases from grammar templates, converts them to an intermediate language for mutation, and combines this with coverage guidance to trigger more paths.
Razzer
Fuzzer for kernel race condition vulns.
ViridianFuzzer
Kernel driver for fuzzing Hyper-V hypercalls, from MWRLabs.
ChromeFuzzer
Chrome fuzzer adapted from the grinder grammar generator.
funfuzz
Collection of JS fuzzers open-sourced by Mozilla, mainly targeting SpiderMonkey.