IoT research faces analysis problems because the real and virtual environments differ. Full-system emulation has proved to be a practical trade-off between the cost of obtaining physical devices and the shortcomings of static or traditional dynamic analysis.
Here we discuss two full-system emulators, Firmadyne and its successor FirmAE, which improves on Firmadyne, based on their papers and usage.
Firmadyne
You can find the source code and the paper:
Towards Automated Dynamic Analysis for Linux-based Embedded Firmware on GitHub
Paper
The paper was presented at the 2016 Network and Distributed System Security Symposium (NDSS) under the title Towards Automated Dynamic Analysis for Linux-based Embedded Firmware.
The authors present FIRMADYNE, their implementation of an automated and scalable dynamic analysis technique specifically designed to accurately identify vulnerabilities in Linux-based embedded firmware.
It addresses characteristic challenges of embedded systems, such as the presence of hardware-specific peripherals, usage of non-volatile memory (NVRAM), and creation of dynamically generated files.
As depicted above, FIRMADYNE consists of four major components.
Acquisition
This part is about Crawling Firmware.
The web crawler is the first and largely independent component. Developed with the Scrapy framework, it downloads firmware images from vendor websites.
The authors manually wrote parsing templates (smart parsers) for each of these websites so that the crawler could distinguish firmware images from other, undesired binary content.
For dynamic websites that were difficult to crawl automatically, they instead crawled the vendor’s FTP site or collected images manually, at the cost of losing the metadata.
Extraction
This part is about Extracting Firmware Filesystem.
FIRMADYNE uses a custom-written extraction utility built around the binwalk API to extract the kernel (optional) and the root filesystem contained within a firmware image.
These are normalized by storing them as compressed TAR archives within the firmware repository.
The built-in recursive extraction mechanism (“Matryoshka”) within binwalk was vulnerable to path explosion, so FIRMADYNE implemented its own detection of non-firmware files, which includes blacklisting input files that are any type of structured binary. It then extracts sequentially, following signatures ranked by priority, where the priority reflects the confidence of the signature for its file type.
Emulation
This part is about Initial Emulation.
Once a filesystem is extracted, FIRMADYNE identifies the hardware architecture of the firmware image. The system then uses a pre-built Linux kernel in an instance of the QEMU full system emulator that matches the architecture, endianness, and word-width of the target firmware image. Currently three combinations are supported: little-endian ARM, little-endian MIPS, and big-endian MIPS.
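As a rough illustration of how architecture, endianness, and word-width can be read from a binary in the extracted filesystem, here is a minimal sketch that parses the ELF identification bytes. It is not FIRMADYNE's own detection code; the function names `describe_elf` and `is_supported` are hypothetical, and only the header offsets and `e_machine` values come from the ELF specification.

```python
import struct

# e_machine values defined by the ELF specification
EM_MIPS = 8
EM_ARM = 40


def describe_elf(header: bytes) -> dict:
    """Read architecture, endianness and word width from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    try:
        bits = {1: 32, 2: 64}[header[4]]             # EI_CLASS
        endian = {1: "little", 2: "big"}[header[5]]  # EI_DATA
    except KeyError:
        raise ValueError("malformed ELF identification bytes")
    fmt = "<H" if endian == "little" else ">H"
    (machine,) = struct.unpack_from(fmt, header, 18)  # e_machine follows e_ident and e_type
    arch = {EM_MIPS: "mips", EM_ARM: "arm"}.get(machine, "unknown")
    return {"arch": arch, "endian": endian, "bits": bits}


def is_supported(info: dict) -> bool:
    """Check against the three combinations listed above (word width is not checked here)."""
    supported = {("arm", "little"), ("mips", "little"), ("mips", "big")}
    return (info["arch"], info["endian"]) in supported
```

In practice one would call `describe_elf` on the first bytes of a binary such as the image's shell or init program and pick the matching pre-built kernel from the result.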
An initial emulation is performed to infer the system and network configuration, achieved by intercepting system calls to the filesystem, networking, and other relevant kernel subsystems.
After collecting this information, FIRMADYNE enters the actual emulation phase, in which a matching network environment is configured to communicate with the emulated firmware. A series of network connectivity checks then verifies that the network was configured successfully.
NVRAM
At least 52.6% of the extracted images access hardware non-volatile memory (NVRAM) through a shared library named libnvram.so to persist device-specific configuration parameters.
Since this peripheral is typically abstracted as a key-value store, FIRMADYNE implements a custom NVRAM library that emulates it without physical access. The library is preloaded with LD_PRELOAD and intercepts NVRAM-related functions such as const char *nvram_get(const char *key) and int nvram_set(const char *key, char *val). FIRMADYNE initializes the key-value pairs from default files in the given firmware, which typically exist to support the factory reset functionality of a device. It keeps a short hardcoded list of sources: default files such as /etc/nvram.default, /etc/nvram.conf, or /var/etc/nvram.default, and symbols such as router_defaults or Nvrams of type char *[] within built-in libraries such as libnvram.so or libshared.so.
However, this does not work if an image calls un-emulated functions or implements NVRAM as a custom data structure on an MTD partition, so further emulation improvements still require manual work.
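To make the key-value abstraction concrete, the following sketch models the emulated NVRAM in Python. The real library is a C shared object loaded with LD_PRELOAD; this is only an illustration of the data flow. The class name `FakeNvram`, the `key=value` line format of the default file, the return value of `nvram_set`, and the key names in the usage note are all assumptions.

```python
class FakeNvram:
    """In-memory stand-in for the key-value store behind nvram_get/nvram_set."""

    def __init__(self):
        self._store = {}

    def load_defaults(self, text: str) -> None:
        # Assumed format: one key=value pair per line; anything else is skipped.
        for line in text.splitlines():
            line = line.strip()
            if not line or "=" not in line:
                continue
            key, value = line.split("=", 1)
            self._store[key] = value

    def nvram_get(self, key):
        # Unset keys yield None, standing in for a NULL pointer in C.
        return self._store.get(key)

    def nvram_set(self, key, value) -> int:
        self._store[key] = value
        return 0  # assumption: 0 signals success
```

A caller would seed the store from the contents of a default file, for example `nv.load_defaults("lan_ipaddr=192.168.1.1\n")`, and then serve every intercepted lookup from memory.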
- Kernel
FIRMADYNE uses custom pre-built kernels instead of the kernel extracted from the image.
By hooking 20 system calls with the kernel dynamic probes (kprobes) framework, FIRMADYNE intercepts calls that alter the execution environment. These include operations such as assigning MAC addresses, creating a network bridge, rebooting the system, and executing a program, all of which the framework monitors to configure the emulated networking environment properly. This functionality can also provide automatic confirmation of vulnerabilities, especially in conjunction with predefined poison values (e.g., 0xDEADBEEF, 0x41414141).
It also uses rdinit to run custom scripts that mount /dev, /proc, etc. at boot, and it loads the nandsim kernel module at startup to emulate memory technology device (MTD) partitions accessed via /dev/mtdX.
- System Configuration
This part is mainly about network configuration. The system first emulates the firmware in a “learning” phase to gather the expected network configuration.
It then instantiates a TAP device on the host, sometimes with a VLAN, and associates it with the emulated network interfaces.
- QEMU
Some hardware-specific peripherals (watchdog timers, additional flash storage devices) are handled in userspace rather than in kernel space, where they could be cleanly emulated with a custom kernel module.
The authors therefore modified the appropriate sixteen bytes in QEMU’s source code so that the emulated platform flash device responds with known good values.
Automated Analysis
This part is about Dynamic Analysis.
FIRMADYNE implements three dynamic analysis passes. Each is registered as a callback, so that when a firmware image enters the network-inferred state, the registered callbacks are triggered sequentially.
- Accessible Webpages
To help detect information disclosure, buffer overflow, and command injection vulnerabilities, the authors wrote a Python test harness that checks the static resources in the image and tries to access each page through the web interface. Pages are then marked according to the HTTP response and prioritized for further analysis.
- SNMP Information
A basic analysis, done out of the authors’ curiosity.
- Vulnerabilities
The authors used 60 known exploits from the Metasploit Framework (MSF) to check for known security vulnerabilities. For new vulnerabilities, they manually developed proofs of concept (POC) that leverage predefined poison arguments such as 0xDEADBEEF, and set a verification condition for each check.
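The callback registration described at the start of this part could be sketched as follows. This is an illustrative structure, not FIRMADYNE's code: the class `AnalysisPipeline`, its method names, and the placeholder pass bodies are hypothetical, and only the idea of sequentially triggered callbacks and the names of the three passes come from the text.

```python
class AnalysisPipeline:
    """Runs registered analysis passes once an image reaches the network-inferred state."""

    def __init__(self):
        self._passes = []

    def register(self, analysis):
        self._passes.append(analysis)
        return analysis  # lets register() be used as a decorator

    def on_network_inferred(self, image_id):
        # Trigger the passes sequentially, in registration order.
        return [(p.__name__, p(image_id)) for p in self._passes]


pipeline = AnalysisPipeline()


@pipeline.register
def accessible_webpages(image_id):
    return f"probe web pages of image {image_id}"  # placeholder body


@pipeline.register
def snmp_information(image_id):
    return f"dump SNMP data of image {image_id}"  # placeholder body


@pipeline.register
def vulnerabilities(image_id):
    return f"run exploits against image {image_id}"  # placeholder body
```

Adding a fourth pass then only requires writing one more decorated function, which matches the manual effort the limitations below mention.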
Limitations
It requires additional manual effort to:
- fix extraction failures
- add support for additional hardware architectures
- correct emulation failures
- even implement a new analysis pass
FIRMADYNE uses custom pre-built kernels, which do not load out-of-tree kernel modules from the filesystem. As a result, it cannot confirm vulnerabilities in kernels or kernel modules shipped by vendors.
It also has trouble identifying the uplink (WAN) and downlink (LAN) ports, which prevents determining whether detected vulnerabilities are exploitable from the Internet.
Usage
Firmadyne is powerful but cumbersome to use.
A newer repo called Firmware Analysis Toolkit (aka FAT) bundles tools such as binwalk and Firmadyne and offers an easier way to analyse IoT devices.
First, set it up:
$ git clone https://github.com/attify/firmware-analysis-toolkit
After installation, remember to edit fat.config and provide the sudo password for Firmadyne:
[DEFAULT]
Get the firmware image and emulate it:
$ ./fat.py <imagename>
Remove analyzed firmware images:
$ ./reset.py
Here is an example (needs to open in a new tab):
As the example above shows, some network errors have to be fixed when using it.
FirmAE
The project releases its source code and cross-compiled utilities, along with the paper:
FirmAE: Towards Large-Scale Emulation of IoT Firmware for Dynamic Analysis
Paper
The paper is titled FirmAE: Towards Large-Scale Emulation of IoT Firmware for Dynamic Analysis.
The authors observe that a slight change in configuration or device settings, which is easy to apply, may let firmware emulation run without hitting an emulation discrepancy, which is difficult to handle.
In this regard, they believe that FIRMADYNE misses many chances to emulate and analyze IoT firmware images not because of fundamental problems in emulation but because of device setup failures, even though these can be easily handled.
To address this, they systematize such heuristics by analyzing many emulation failure cases, and conclude that the failure cases in each category can be resolved by simple heuristics even though they originate from different root causes.
Most failure cases fall into the following five categories of problems:
- boot-related problems, such as an incorrect boot sequence or absence of files,
- network-related problems, such as mismatches of network interface or improper configuration,
- non-volatile RAM (NVRAM)-related problems, such as missing library functions or customized formats,
- kernel-related problems, such as unsupported hardware or functions, and
- minor problems, such as unsupported commands or timing issues.
Its architecture:
For emulation, they specifically focus on emulating the web services of wireless routers and IP cameras.
Arbitrated emulation
They systematize the heuristics as arbitrated emulation to bypass the failure cases.
Instead of strictly following the execution behavior of the firmware as is, arbitrated emulation arbitrates between following the original behavior or injecting proper interventions, i.e., intentional operations.
Thereby, it may slightly alter the original behavior of the firmware. However, the goal is not to build an environment identical to the physical device, but to create an environment conducive to the dynamic analysis.
The key idea behind arbitrated emulation is that ensuring high-level behavior, which is relatively easy to do, is sufficient to perform dynamic analysis on internal programs, so there is no need to find and fix the exact root causes of emulation failures.
To resolve the corresponding failures in Firmadyne, they propose several arbitrations.
Boot Arbitrations
- improper booting sequence
Firmware images use custom paths for their initializing programs.
Firmadyne uses a script that searches for and executes a hard-coded list of files frequently used as initializing programs.
FirmAE instead extracts useful information from the kernel of the image. Specifically, it uses the kernel’s command line string, which provides the kernel’s default configuration during boot.
- missing filesystem structure
Failures occur because files or directories are absent.
Firmadyne attempted to address this by creating and mounting hard-coded paths such as proc, dev, sys, or root at the beginning of its custom boot script.
FirmAE takes an approach similar to the previous case, but extracts all strings from the executable binaries in the filesystem rather than from the kernel, before emulating a given image. It then filters them to obtain strings that are highly likely to be paths and prepares the file structure based on those paths.
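Both boot arbitrations rest on pulling printable strings out of binary data and filtering them. The sketch below shows one way this could look; it is not FirmAE's implementation. The helper names, the minimum string length of four, the `init=` lookup, and the path filter regular expression are assumptions chosen for illustration.

```python
import re

# Runs of at least four printable ASCII characters, similar to the strings utility.
PRINTABLE = re.compile(rb"[\x20-\x7e]{4,}")
# Assumed filter: absolute paths made of plain name components.
PATH_LIKE = re.compile(r"^/(?:[A-Za-z0-9._-]+/)*[A-Za-z0-9._-]+$")


def printable_strings(blob: bytes):
    return [m.group().decode("ascii") for m in PRINTABLE.finditer(blob)]


def find_init_from_cmdline(kernel_blob: bytes):
    """Return the init= value of the first command-line-like string, or None."""
    for text in printable_strings(kernel_blob):
        match = re.search(r"\binit=(\S+)", text)
        if match:
            return match.group(1)
    return None


def candidate_paths(binary_blob: bytes):
    """Return the sorted set of strings that look like absolute paths."""
    return sorted({s for s in printable_strings(binary_blob) if PATH_LIKE.match(s)})
```

The result of `find_init_from_cmdline` would name the program to start at boot, and the parent directories of each entry from `candidate_paths` could be created in the image before emulation.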
Network Arbitrations
- invalid IP alias handling
IP aliasing assigns multiple IP addresses to a single network interface.
Firmadyne handles this by adding static routing rules for the TAP device on the host, which causes network collisions.
FirmAE lets the host use its default routing rule, because packets can be routed to any device connected to the TAP interface without intervention.
- no network information
This happens because some firmware images rely on a DHCP server, while others rely on peripherals.
FirmAE arbitrates these cases with an intervention that forcibly configures the network with a default setting. Specifically, it sets up an Ethernet interface, eth0, with the IP address 192.168.0.1.
- multiple network interfaces in ARM
On ARM, handling multiple network interfaces triggers errors that do not occur on other platforms.
FirmAE addresses the failure with a high-level intervention that forcibly sets up only one Ethernet interface, eth0, and avoids setting up the others.
- insufficient VLAN setup
Firmadyne attempts to address this by running a command when setting up the host TAP interface, but it forgets to set the host’s VLAN ID (which sometimes must match the guest’s), causing errors. FirmAE arbitrates this by configuring the VLAN properly.
- filtering rules in iptables
Since many devices are wrongly configured to be publicly accessible, FirmAE chooses to remove existing filtering rules by flushing all iptables policies and setting the default policy to accept all incoming packets.
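Two of the network arbitrations above, the default fallback and the single interface on ARM, amount to a small decision rule. The sketch below expresses that rule under stated assumptions: the `Interface` type and `arbitrate_network` function are hypothetical, and reusing the first inferred IP address for the single ARM interface is a guess, since the text does not say which address is kept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interface:
    name: str
    ip: str


# Default setting named in the text: eth0 with 192.168.0.1.
DEFAULT_INTERFACE = Interface("eth0", "192.168.0.1")


def arbitrate_network(inferred, arch):
    """Pick the interfaces to configure from what the pre-emulation inferred."""
    if not inferred:
        # No network information at all: force the default setting.
        return [DEFAULT_INTERFACE]
    if arch == "arm":
        # Set up only one Ethernet interface, eth0 (IP choice is an assumption).
        return [Interface("eth0", inferred[0].ip)]
    return list(inferred)
```

The point of the rule is that it never inspects why the inference failed; it only guarantees that some reachable interface exists for the later analysis.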
NVRAM Arbitrations
- supporting custom NVRAM default files
FirmAE records all the key-value pairs accessed through the nvram_get() and nvram_set() functions during pre-emulation. Then, for key names whose values are unknown, it scans the filesystem of the target firmware for files that contain them. FirmAE extracts the key-value pairs from those files (if they exist) and uses them in the final emulation.
- no NVRAM default file
Firmadyne handles this case by returning NULL for uninitialized keys.
FirmAE arbitrates the behavior of the nvram_get() function instead: rather than returning NULL when an uninitialized key is accessed, it returns a pointer to an empty string.
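The two NVRAM arbitrations above can be modelled together in a few lines. This is a Python illustration of the behavior, not FirmAE's C library: the class `ArbitratedNvram`, the `key=value` file format, and the key names used in the usage note are assumptions.

```python
class ArbitratedNvram:
    """Key-value store that records accesses and never returns None."""

    def __init__(self, known=None):
        self._store = dict(known or {})
        self.accessed = set()  # every key touched during pre-emulation

    def nvram_get(self, key):
        self.accessed.add(key)
        # Empty string instead of None, mirroring "" instead of NULL in C.
        return self._store.get(key, "")

    def nvram_set(self, key, value):
        self.accessed.add(key)
        self._store[key] = value
        return 0

    def unknown_keys(self):
        return {k for k in self.accessed if k not in self._store}

    def learn_from_files(self, file_texts):
        """Fill unknown keys from file contents holding key=value lines (assumed format)."""
        wanted = self.unknown_keys()
        for text in file_texts:
            for line in text.splitlines():
                key, sep, value = line.strip().partition("=")
                if sep and key in wanted:
                    self._store.setdefault(key, value)
```

After a pre-emulation run, `learn_from_files` would be fed the contents of files found in the firmware filesystem, so that a key such as a hypothetical `http_passwd` gets a real value in the final emulation while any key that is still unknown resolves to an empty string.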
Kernel Arbitrations
- insufficient support of kernel module
Since Firmadyne implemented dummy modules with hard-coded device names and ioctl commands, some programs fail when they access kernel modules with a different configuration.
As numerous kernel modules are accessed through shared libraries, FirmAE intercepts library function calls, similar to how it handles NVRAM issues. Every intercepted library function call returns a pre-defined value.
- improper kernel version
Firmadyne uses a customized Linux kernel v2.6.32 for firmware emulation. However, recent embedded devices use newer kernel versions.
FirmAE tested Linux kernel v4.1.17 and found that its ASLR is not compatible with old versions of libc, so the authors enabled a compatibility option when compiling the new kernel (CONFIG_COMPAT_BRK, which excludes the brk area of heap memory from randomization).
Other arbitrations
- unexecuted web servers
FirmAE searches for widely used web servers such as httpd, lighttpd, boa, and goahead, together with their configuration files, and then forcibly executes them.
- timeout issues
Firmadyne uses a 60 s timeout, after which the program is forcibly stopped.
FirmAE investigated such cases and empirically found a suitable timeout of 240 s.
- lack of tools for emulation
FirmAE adds the latest version of busybox to the filesystem of the target firmware, which enables essential commands and leads to successful emulation.
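The web-server arbitration at the top of this list starts with a search over the extracted filesystem. A minimal sketch of that search follows; the function names are hypothetical, matching by exact file name is an assumption, and locating the configuration files and launching the server are left out.

```python
import os

# Server names taken from the text above.
KNOWN_SERVERS = ("httpd", "lighttpd", "boa", "goahead")


def list_files(root):
    """Yield every file path below root."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            yield os.path.join(dirpath, name)


def find_web_servers(paths):
    """Map each known server name to the first path whose file name matches it."""
    found = {}
    for path in paths:
        name = os.path.basename(path)
        if name in KNOWN_SERVERS:
            found.setdefault(name, path)
    return found
```

Calling `find_web_servers(list_files(root))` on an extracted image would give the binaries that the emulator could then start explicitly if the firmware's own boot scripts never do.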
Automated Analysis
The analysis engine consists of two parts: it automatically initializes and logs into web pages if necessary, and identifies vulnerabilities including memory corruption bugs.
- initializing web services
A large portion of the web services in the authors’ dataset require network and security configuration before use. FirmAE leverages Selenium to automate this process.
- evaluating vulnerability discovery performance
To find 1-day vulnerabilities, it uses RouterSploit and customized PoC code.
To find 0-day vulnerabilities, it extracts information from the filesystem and constructs a valid request template for fuzzing. By searching the filesystem, it can also check web services that are not reachable by crawling.
Usage
Install it first:
$ git clone --recursive https://github.com/pr0v3rbs/FirmAE
Initialize before emulation:
$ ./init.sh
Then check emulation, which acts like the pre-emulation:
$ sudo ./run.sh -c <brand> <firmware>
Finally, run the analysis:
$ sudo ./run.sh -a <brand> <firmware>
There are four modes to emulate:
$ ./run.sh