Firmadyne and FirmAE
TyeYeah

IoT research is hampered by the discrepancy between real and virtual environments. Full-system emulation has proved to be a reasonable trade-off between the cost of obtaining physical hardware and the shortcomings of static or traditional dynamic analysis.
Here we look at two full-system emulators, Firmadyne and its successor FirmAE, which improves on Firmadyne, based on their papers and on hands-on usage.

Firmadyne

You can find the source code and the paper, Towards Automated Dynamic Analysis for Linux-based Embedded Firmware, on GitHub.

Paper

The paper, titled Towards Automated Dynamic Analysis for Linux-based Embedded Firmware, was published at the 2016 Network and Distributed System Security Symposium (NDSS).

The authors present FIRMADYNE, their implementation of an automated and scalable dynamic analysis technique specifically designed to accurately identify vulnerabilities in Linux-based embedded firmware.
It addresses characteristic challenges of embedded systems, such as the presence of hardware-specific peripherals, usage of non-volatile memory (NVRAM), and creation of dynamically generated files.
Figure: Architectural diagram of FIRMADYNE showing the emulation life-cycle for an example firmware image.

As depicted above, FIRMADYNE consists of four major components.

Acquisition

This part is about Crawling Firmware.

The web crawler is the first and largely independent component, developed using the Scrapy framework, which downloads firmware images from vendor websites.

The authors manually wrote parsing templates (smart parsers) for each of these websites, which let the crawler distinguish between firmware images and other, undesired binary content.

For dynamic websites that were difficult to crawl automatically, the authors instead crawled the vendor’s FTP site or collected images manually, at the expense of metadata.

Extraction

This part is about Extracting Firmware Filesystem.

FIRMADYNE uses a custom-written extraction utility built around the binwalk API to extract the kernel (optional) and the root filesystem contained within a firmware image.
These were normalized by storing them as compressed TAR archives within the firmware repository.

The recursive extraction mechanism built into binwalk (“Matryoshka”) was vulnerable to path explosion, so FIRMADYNE detects non-firmware files up front, blacklisting input files that are any type of structured binary, and then runs extraction sequentially, in order of signatures ranked by confidence for their corresponding file types.
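
The blacklisting step can be pictured as a check on a file's leading magic bytes. This is only a minimal sketch: the two magic values are the standard ones for ELF and PE files, but the table, the labels, and the function name `non_firmware_kind` are hypothetical and are not taken from FIRMADYNE's extractor.

```python
# Hypothetical blacklist: leading magic bytes of structured binaries
# that should not be treated as firmware images.
NON_FIRMWARE_MAGIC = {
    b"\x7fELF": "ELF executable",
    b"MZ": "PE executable",
}


def non_firmware_kind(data: bytes):
    """Return a label if the data starts with a blacklisted magic, else None."""
    for magic, kind in NON_FIRMWARE_MAGIC.items():
        if data.startswith(magic):
            return kind
    return None
```

A file for which this returns a label would be skipped, and anything else would be handed on to signature-based extraction.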

Emulation

This part is about Initial Emulation.
Figure: FIRMADYNE initial emulation.
Once a filesystem is extracted, FIRMADYNE identifies the hardware architecture of the firmware image. It then boots a pre-built Linux kernel in an instance of the QEMU full-system emulator that matches the architecture, endianness, and word-width of the target firmware image. Three combinations are currently supported: little-endian ARM, little-endian MIPS, and big-endian MIPS.
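
One plausible way to identify the architecture is to read the ELF header of a binary from the extracted filesystem. The sketch below assumes that approach; the field offsets and the `e_machine` values come from the ELF specification, while the function name `identify_arch` and the returned labels are illustrative choices, not FIRMADYNE's actual code.

```python
import struct

# e_machine values defined by the ELF specification.
EM_MIPS = 8
EM_ARM = 40


def identify_arch(header: bytes) -> str:
    """Return a label such as 'mipseb' from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[5] (EI_DATA): 1 = little-endian, 2 = big-endian.
    if header[5] == 1:
        byte_order, suffix = "<", "el"
    elif header[5] == 2:
        byte_order, suffix = ">", "eb"
    else:
        raise ValueError("unknown data encoding")
    # e_machine is a 16-bit field at offset 18, stored in the file's byte order.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    names = {EM_MIPS: "mips", EM_ARM: "arm"}
    if machine not in names:
        raise ValueError(f"unsupported machine type {machine}")
    return names[machine] + suffix
```

In practice the header would be read from a binary that almost every image ships, for example with `open(path, "rb").read(20)`.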

An initial emulation is performed to infer the system and network configuration, achieved by intercepting system calls to the filesystem, networking, and other relevant kernel subsystems.

After collecting information, FIRMADYNE enters the actual emulation phase, in which a matching network environment is configured to communicate with the emulated firmware. A series of network connectivity checks will verify successful network configuration.

  • NVRAM

At least 52.6% of all their extracted images access a hardware non-volatile memory (NVRAM) using a shared library named libnvram.so to persist device-specific configuration parameters.

Since this peripheral is typically abstracted as a key-value store, FIRMADYNE implements a custom NVRAM library to emulate NVRAM-related functions. The library is preloaded via LD_PRELOAD and intercepts NVRAM-related functions such as const char *nvram_get(const char *key) and int nvram_set(const char *key, char *val), emulating an NVRAM without physical access. FIRMADYNE initializes the key-value pairs from default files in the given firmware, which typically exist to support the device's factory reset functionality. It keeps a short list of hard-coded default-file paths to extract key-value pairs from, such as /etc/nvram.default, /etc/nvram.conf, or /var/etc/nvram.default, and also looks for the symbols router_defaults or Nvrams of type char *[] within built-in libraries such as libnvram.so or libshared.so.

However, this does not work if an image calls un-emulated functions or implements NVRAM as a custom data structure on an MTD partition, so further emulation improvements still require manual work.
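
The key-value idea behind the emulated NVRAM can be modelled in a few lines. The real library is a C shared object; the Python class below only illustrates the behavior. The class name `FakeNvram`, the `key=value` line format, and the `#` comment handling are assumptions made for this sketch.

```python
class FakeNvram:
    """In-memory stand-in for a device's NVRAM key-value store."""

    def __init__(self):
        self._store = {}

    def load_defaults(self, text: str) -> None:
        """Load 'key=value' lines, as might appear in a defaults file."""
        for line in text.splitlines():
            line = line.strip()
            # Skip blanks, comments, and lines without a separator.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            self._store[key] = value

    def nvram_get(self, key: str):
        """Return the stored value, or None when the key is unset."""
        return self._store.get(key)

    def nvram_set(self, key: str, value: str) -> int:
        """Store a value and return 0 to signal success."""
        self._store[key] = value
        return 0
```

For example, after `nv.load_defaults("lan_ipaddr=192.168.1.1")`, a call to `nv.nvram_get("lan_ipaddr")` returns the default address without touching any hardware.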

  • Kernel

It uses custom pre-built kernels instead of the kernel extracted from the image.

By hooking 20 system calls with the kernel dynamic probes (kprobes) framework, FIRMADYNE intercepts calls that alter the execution environment. These include operations such as assigning MAC addresses, creating a network bridge, rebooting the system, and executing a program, all of which are monitored so that the framework can properly configure the emulated networking environment. This functionality can also be used to provide automatic confirmation of vulnerabilities, especially in conjunction with predefined poison values (e.g., 0xDEADBEEF, 0x41414141).

It also uses rdinit to run a custom script that mounts /dev, /proc, etc. at boot, and it loads the nandsim kernel module at startup to emulate memory technology device (MTD) partitions accessed via /dev/mtdX.

  • System Configuration

This is mainly about network configuration. The system first emulates the firmware in a “learning” phase to gather the expected network configuration.
It then instantiates a TAP device on the host and associates it with the emulated network interfaces, sometimes through a VLAN.

  • QEMU

Some hardware-specific peripheral functionality (watchdog timers, additional flash storage devices) is handled in userspace rather than in kernel space, where it could be cleanly emulated with a custom kernel module.

For these cases, the authors modified the appropriate sixteen bytes in QEMU’s source code so that the emulated platform flash device responds with known good values.

Automated Analysis

This part is about Dynamic Analysis.

FIRMADYNE implements three dynamic analysis passes. Each is registered as a callback, so that when a firmware image enters the network-inferred state, the registered callbacks are triggered sequentially.
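
The callback arrangement can be sketched as a small registry. Every name here (`AnalysisPipeline`, `on_network_inferred`, and the two placeholder passes) is hypothetical and only illustrates the pattern, not FIRMADYNE's real interface.

```python
class AnalysisPipeline:
    """Runs registered analysis passes once an image's network is known."""

    def __init__(self):
        self._passes = []

    def register(self, func):
        """Add a pass; returning func lets this double as a decorator."""
        self._passes.append(func)
        return func

    def on_network_inferred(self, image_id, ip_address):
        """Call every pass in registration order and collect the results."""
        return [(func.__name__, func(image_id, ip_address)) for func in self._passes]


pipeline = AnalysisPipeline()


@pipeline.register
def accessible_webpages(image_id, ip_address):
    # Placeholder body: a real pass would probe the web interface.
    return f"would probe http://{ip_address}/ for image {image_id}"


@pipeline.register
def snmp_information(image_id, ip_address):
    # Placeholder body: a real pass would query the SNMP agent.
    return f"would query SNMP on {ip_address} for image {image_id}"
```

With this shape, adding a new analysis pass means writing one more function and registering it, and the emulation code itself stays unchanged.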

  • Accessible Webpages

To help detect information disclosure, buffer overflow, and command injection vulnerabilities, the authors wrote a Python test harness that checks the static resources in the image and tries to access these pages through the web interface. Pages are then marked according to the HTTP response and prioritized for further analysis.
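
The marking and prioritizing step might look roughly like this. The labels, the priority order, and the function names are assumptions made for illustration; the paper's harness may classify responses differently.

```python
# Hypothetical priority: lower numbers are examined first.
PRIORITY = {"accessible": 0, "redirect": 1, "auth-required": 2, "error": 3, "missing": 4}


def classify_page(status: int) -> str:
    """Map an HTTP status code to a coarse label."""
    if 200 <= status < 300:
        return "accessible"
    if 300 <= status < 400:
        return "redirect"
    if status in (401, 403):
        return "auth-required"
    if status == 404:
        return "missing"
    return "error"


def prioritize(statuses: dict) -> list:
    """Order page paths by label priority, then alphabetically."""
    return sorted(statuses, key=lambda path: (PRIORITY[classify_page(statuses[path])], path))
```

Here `statuses` maps each page path found in the filesystem to the status code observed when requesting it from the emulated device.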

  • SNMP Information

A basic analysis done out of the authors' curiosity.

  • Vulnerabilities

This pass uses 60 known exploits from the Metasploit Framework (MSF) to check for known security vulnerabilities. For new vulnerabilities, the authors manually developed PoCs that leverage predefined poison arguments such as 0xDEADBEEF, and set a verification condition for each check.

Limitations

It requires additional manual effort to:

  • fix extraction failures
  • add support for additional hardware architectures
  • correct emulation failures
  • even implement a new analysis pass

FIRMADYNE uses custom pre-built kernels, which do not load out-of-tree kernel modules from the filesystem. As a result, it cannot confirm vulnerabilities in kernels or kernel modules shipped by vendors.

It also has trouble telling the uplink (WAN) port from the downlink (LAN) port, which prevents determining whether detected vulnerabilities are exploitable from the Internet.

Usage

Firmadyne is powerful, but cumbersome to use.

A separate repo, Firmware Analysis Toolkit (aka FAT), bundles more tools (binwalk, Firmadyne) and offers an easier way to analyse IoT devices.

First, set it up:

$ git clone https://github.com/attify/firmware-analysis-toolkit
$ cd firmware-analysis-toolkit
$ ./setup.sh

After installation, remember to edit fat.config and provide your sudo password and the path to Firmadyne:

[DEFAULT]
sudo_password=attify123
firmadyne_path=/home/attify/firmadyne

Get a firmware image and emulate it:

$ ./fat.py <imagename>

Remove analyzed firmware images:

$ ./reset.py

Here is an example run (an asciicast recording, which needs to be opened in a new tab).
As the recording shows, some network errors may need to be fixed by hand when using it.

FirmAE

The authors released the source code, cross-compiled utilities, and the paper:
FirmAE: Towards Large-Scale Emulation of IoT Firmware for Dynamic Analysis

Paper

The paper is titled FirmAE: Towards Large-Scale Emulation of IoT Firmware for Dynamic Analysis.

The authors observe that a slight change in configuration or device settings, which is easy to apply, may let firmware emulation run without hitting the emulation discrepancy problem, which is difficult to handle.

In this regard, they believe that FIRMADYNE misses many chances to emulate and analyze IoT firmware images, not because of fundamental problems in emulation but because of device setup failures that can be easily handled.

To address this, they systematize such heuristics by analyzing many emulation failure cases, and conclude that the failures in each category can be resolved with simple heuristics even though they originate from different root causes.

Most failure cases fall into the following five categories of problems:

  1. boot-related problems, such as an incorrect boot sequence or absence of files,
  2. network-related problems, such as mismatches of network interface or improper configuration,
  3. non-volatile RAM (NVRAM)-related problems, such as missing library functions or customized formats,
  4. kernel-related problems, such as unsupported hardware or functions, and
  5. minor problems, such as unsupported commands or timing issues.

Its architecture:

Figure: FirmAE architecture overview.

For the emulation, they specifically focus on emulating web services of wireless routers and IP cameras.

Arbitrated emulation

They systematize the heuristics as arbitrated emulation to bypass the failure cases.

Instead of strictly following the execution behavior of the firmware as is, arbitrated emulation arbitrates between following the original behavior or injecting proper interventions, i.e., intentional operations.
Thereby, it may slightly alter the original behavior of the firmware. However, the goal is not to build an environment identical to the physical device, but to create an environment conducive to the dynamic analysis.

The key idea behind arbitrated emulation is that ensuring high-level behavior is sufficient to perform dynamic analysis on internal programs, which is relatively easy to do, rather than finding and fixing the exact root causes of emulation failures.

To resolve the corresponding Firmadyne failures, they propose several arbitrations.

Boot Arbitrations

  • improper booting sequence

Some firmware uses custom paths for its initializing program.

Firmadyne ships a script that searches for and executes a hard-coded list of files frequently used as initializing programs.

FirmAE instead extracts useful information from the kernel of the image. Specifically, it uses the kernel’s command line string, which provides the kernel's default configuration during boot.
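
As an illustration of what a kernel command line can reveal, the sketch below pulls the value of an `init=` (or `rdinit=`) argument out of strings extracted from a kernel image. Both are standard Linux boot parameters, but focusing on them, and the function name `find_init_path`, are assumptions of this sketch rather than a description of FirmAE's code.

```python
import re

# Matches "init=/path" or "rdinit=/path" as a whole boot argument.
_INIT_ARG = re.compile(r"(?:^|\s)(?:rd)?init=(\S+)")


def find_init_path(kernel_strings):
    """Return the program named by the first init argument found, or None."""
    for text in kernel_strings:
        match = _INIT_ARG.search(text)
        if match:
            return match.group(1)
    return None
```

The input would be the printable strings dumped from the extracted kernel, and the result would then be used as the entry point of the boot sequence instead of a hard-coded guess.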

  • missing filesystem structure

Failure cases occur due to the absence of files or directories.

Firmadyne attempted to address this by creating and mounting hard-coded paths such as proc, dev, sys, or root at the beginning of the custom booting script.

Similarly to the previous case, but working on the filesystem rather than the kernel, FirmAE extracts all strings from the executable binaries in the filesystem before emulating a given image. It then filters them to obtain strings that are highly likely to indicate paths and prepares the file structure based on those paths.
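
The filtering step could be approximated with a regular expression over a binary's raw bytes. The pattern below is a deliberately conservative guess at what "likely a path" means (absolute, at least two components, a restricted character set); it and the helper names are illustrative, not FirmAE's actual filter.

```python
import re

# An absolute path with at least two components, e.g. /var/run/httpd.pid.
_PATH = re.compile(rb"/(?:[A-Za-z0-9._-]+/)+[A-Za-z0-9._-]+")


def candidate_paths(blob: bytes) -> set:
    """Collect strings in a binary that look like absolute paths."""
    return {match.group().decode("ascii") for match in _PATH.finditer(blob)}


def directories_to_create(paths) -> list:
    """Return the parent directories that should exist before emulation."""
    return sorted({path.rsplit("/", 1)[0] for path in paths})
```

Creating the returned directories inside the extracted root filesystem ahead of time is the kind of preparation the paragraph above describes.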

Network Arbitrations

  • invalid IP alias handling

IP aliasing is assigning multiple IP addresses to a network interface.

Firmadyne handles this by adding static routing rules to the TAP interface on the host, which causes network collisions.

FirmAE lets the host use its default routing rule, because packets can be routed to any devices connected to TAP without interventions.

  • no network information

This happens because some firmware relies on a DHCP server, while other firmware relies on peripherals.

FirmAE arbitrates these cases with an intervention that forcibly configures the network with a default setting. Specifically, it sets up an Ethernet interface, eth0, with an IP address of 192.168.0.1.

  • multiple network interfaces in ARM

On ARM, handling multiple network interfaces triggers errors that do not occur on other platforms.

FirmAE addresses the failure with a high-level intervention that forcibly sets up only one Ethernet interface, eth0, and avoids setting up the others.

  • insufficient VLAN setup

Firmadyne attempts to address this by running a command when setting up the host TAP interface, but it forgets to set the host’s VLAN ID (which sometimes must match the guest's), and this causes errors. FirmAE arbitrates this by properly configuring the VLAN.

  • filtering rules in iptables

Since many devices are wrongly configured to be publicly accessible, FirmAE chooses to remove existing filtering rules by flushing all iptables policies and setting the default to accept all incoming packets.

NVRAM Arbitrations

  • supporting custom NVRAM default files

FirmAE records all the key-value pairs accessed with the nvram_get() and nvram_set() functions during the pre-emulation. Then for key names whose values are unknown, it scans the filesystem of the target firmware and searches files that contain them. FirmAE extracts the key-value pairs from the files (if they exist) and utilizes them in the final emulation.
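
A simplified version of that search is sketched below. It assumes the candidate files store settings as `key=value` lines and that the first match wins; both points, along with the function name `recover_defaults`, are assumptions of this sketch.

```python
def recover_defaults(unknown_keys, files):
    """Look up keys with unknown values in a {path: text} mapping of files.

    Returns a dict of the key-value pairs that could be recovered.
    """
    found = {}
    for text in files.values():
        for line in text.splitlines():
            key, separator, value = line.strip().partition("=")
            # Keep only the first value seen for each wanted key.
            if separator and key in unknown_keys and key not in found:
                found[key] = value
    return found
```

The recovered pairs would then be merged into the emulated NVRAM before the final emulation run.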

  • no NVRAM default file

Firmadyne addresses this issue by returning the NULL value for uninitialized keys.

FirmAE handles this by arbitrating the behavior of the nvram_get() function. Instead of returning the NULL value when accessing uninitialized keys, FirmAE returns a pointer to an empty string.
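
The difference matters because firmware code often uses the returned pointer without checking it. The Python analogy below uses None in place of NULL; the function names are hypothetical and the real logic lives in the C NVRAM library.

```python
def nvram_get_strict(store: dict, key: str):
    """Firmadyne-style lookup: unset keys yield None (standing in for NULL)."""
    return store.get(key)


def nvram_get_arbitrated(store: dict, key: str) -> str:
    """FirmAE-style lookup: unset keys yield an empty string."""
    return store.get(key, "")


def lan_address_length(lookup, store: dict) -> int:
    """Mimic firmware code that uses the result without a NULL check."""
    return len(lookup(store, "lan_ipaddr"))
```

With an empty store, `lan_address_length(nvram_get_arbitrated, {})` returns 0, while the strict variant raises a TypeError, the Python counterpart of a crash on a NULL dereference.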

Kernel Arbitrations

  • insufficient support of kernel module

Since Firmadyne implemented dummy modules with hard-coded device names and ioctl commands, some programs fail when accessing kernel modules with a different configuration.

As numerous kernel modules are accessed through shared libraries, FirmAE intercepts library function calls, much as it does for NVRAM. Each intercepted library function call returns a pre-defined value.

  • Improper kernel version

Firmadyne uses a customized Linux kernel v2.6.32 for firmware emulation. However, recent embedded devices use newer kernel versions.

FirmAE tested Linux kernel v4.1.17 and found that its ASLR is not compatible with old versions of libc, so the authors enabled a compatibility option (CONFIG_COMPAT_BRK, which excludes the brk area of heap memory from randomization) when compiling the new kernel.

Other arbitrations

  • unexecuted web servers

FirmAE searches for widely used web servers such as httpd, lighttpd, boa, and goahead, along with their configuration files, and then forcibly executes them.
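
The search part can be sketched as a filter over a listing of the extracted filesystem. The server names come from the paragraph above; the function name and the choice to match on the file's base name are assumptions.

```python
import posixpath

# Web server binaries named in the text above.
WEB_SERVER_NAMES = {"httpd", "lighttpd", "boa", "goahead"}


def find_web_servers(paths) -> list:
    """Return the paths whose file name is a known web server binary."""
    return sorted(p for p in paths if posixpath.basename(p) in WEB_SERVER_NAMES)
```

Matching on the base name avoids false hits on configuration files such as /etc/lighttpd/lighttpd.conf, which would need a separate lookup.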

  • timeout issues

Firmadyne uses a 60 s timeout, after which the program is forcibly stopped.

FirmAE investigated such cases and empirically found a suitable timeout of 240 s.

  • lack of tools for emulation

It adds the latest version of busybox to the filesystem of the target firmware, which provides essential commands and leads to successful emulation.

Automated Analysis

The analysis engine consists of two parts: it automatically initializes and logs into web pages if necessary, and identifies vulnerabilities including memory corruption bugs.

  • initializing web services

A large portion of the web services in the authors' dataset require network and security configuration before use. FirmAE leverages Selenium to automate the process.

  • evaluating vulnerability discovery performance

To find 1-day vulnerabilities, it uses RouterSploit and customized PoC code.

To find 0-day vulnerabilities, it extracts information from the filesystem and constructs a valid request template for fuzzing. By searching the filesystem, it can also check web services that are not reachable by crawling.

Usage

Install it first:

$ git clone --recursive https://github.com/pr0v3rbs/FirmAE
$ cd FirmAE
$ ./download.sh
$ ./install.sh

Initialize before emulation:

$ ./init.sh

Then check the emulation (this works like the pre-emulation step):

$ sudo ./run.sh -c <brand> <firmware>

Finally run analysis:

$ sudo ./run.sh -a <brand> <firmware>

There are four modes to emulate:

$ ./run.sh
Usage: ./run.sh [mode]... [brand] [firmware|firmware_directory]
mode: use one option at once
-r, --run : run mode - run emulation (no quit)
-c, --check : check mode - check network reachable and web access (quit)
-a, --analyze : analyze mode - analyze vulnerability (quit)
-d, --debug : debug mode - debugging emulation (no quit)