Binary SCA (Software Composition Analysis) Related Tools
TyeYeah Lv4

Software Composition Analysis (SCA) for binaries is a great help in vulnerability discovery and program analysis. Related fields include symbol recovery for stripped binaries and detection of open-source components.

Recover Symbols for Stripped Binaries

Stripping symbols is a good way to reduce the size of binaries and to hide program information.

Use strip, or sstrip from ELF Kickers, to remove symbols and sections.

For program analysts, however, recovering those symbols helps a lot.

IDA Pro FLIRT

The traditional technique is FLIRT (Fast Library Identification and Recognition Technology), which uses signatures to recognize functions in a binary when they are stripped library functions that were statically linked.
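To make the idea concrete, here is a minimal sketch of wildcard signature matching in Python. It is not IDA's implementation: the pattern syntax (hex pairs, with `..` for bytes that vary because of relocations) only mirrors the leading part of a pattern line, and the signature names and sample bytes are made up for illustration.

```python
def parse_pattern(pattern):
    """Turn 'E8........C3' into [0xE8, None, None, None, None, 0xC3]."""
    if len(pattern) % 2:
        raise ValueError("pattern must have an even number of characters")
    pairs = [pattern[i:i + 2] for i in range(0, len(pattern), 2)]
    return [None if pair == ".." else int(pair, 16) for pair in pairs]


def matches(pattern, code):
    """True if code starts with the pattern; None entries match any byte."""
    if len(code) < len(pattern):
        return False
    return all(p is None or p == b for p, b in zip(pattern, code))


def identify(code, signatures):
    """Return the names of all signatures whose pattern matches code."""
    return [name for name, pat in signatures.items()
            if matches(parse_pattern(pat), code)]


# Hypothetical signatures: a call with a relocated (variable) target, then ret.
SIGNATURES = {
    "call_then_ret": "E8........C3",
    "push_rbp_prologue": "554889E5",
}

stripped_function = bytes.fromhex("E812345678C3")
print(identify(stripped_function, SIGNATURES))
```

The wildcard bytes are what let one signature survive relinking: the call target changes from binary to binary, while the surrounding bytes stay the same.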

Prepare Sig Files

In /path/to/IDA/ida_plugin/ there is a flair70.zip (for IDA 7.0, backup here). Unzip it to get the binaries for your platform (e.g. Linux), and remember to give them executable permission:

$ pwd
/path/to/flair70/bin/linux
$ ls
dumpsig pcf pelf pelf.rtb plb pmacho pomf166 ppsx ptmobj sigmake zipsig
$ chmod +x pelf sigmake
...

To prepare a signature database, first choose the materials (compiled static library archives such as libc.a), then use pelf (for the Linux ELF format; pcf is for the Windows COFF format) and sigmake to generate the sig file:

# choose one small lib as an example, not *.so but *.a format 
$ file /lib/libsupp.a
/lib/libsupp.a: current ar archive
# copy to flair bins' dir
$ cp /lib/libsupp.a /path/to/flair70/bin/linux
$ cd /path/to/flair70/bin/linux

# generate pattern (intermediate) file
$ ./pelf libsupp.a libsupp.pat
/path/to/flair70/bin/linux/libsupp.a: skipped 0, total 1
# this one goes smoothly, but sometimes pelf fails:
$ ./pelf libcurl.a libcurl.ptn
Fatal [libcurl.a] (libcurl_la-file.o): Unknown relocation type 42 (offset in section=0x14).
# to deal with it, use -rN:O:L to support relocation type N (mark L bytes at offset O from the relocation address as variable); it can be specified multiple times
$ ./pelf -r42:0:0 libcurl.a libcurl.ptn

# generate signature file
$ ./sigmake libsupp.pat libsupp.sig
# still smooth, but not always with big libraries:
$ ./sigmake libboost_thread.ptn libboost_thread-1.63.sig
libboost_thread-1.63.sig: modules/leaves: 95/146, COLLISIONS: 4
# when collisions occur, edit XXX.exc and run sigmake again
$ vim libboost_thread.exc
...
$ ./sigmake libboost_thread.ptn libboost_thread-1.63.sig

# finally copy `sig` files to `/path/to/IDA/sig/pc`
# (`pc` or `arm` or `ppc` or `mips` at the tail indicates ISA)
$ cp *.sig /path/to/IDA/sig/pc
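Editing the .exc file by hand gets tedious with many collisions. The sketch below assumes the usual layout of that file: leading comment lines starting with `;` that must be deleted before sigmake accepts it, followed by collision groups separated by blank lines, where a `+` prefix selects an entry. Picking the first entry of every group is a hypothetical policy for illustration, and the sample entries are made up; review real collisions manually.

```python
def resolve_exc(text, pick_first=False):
    """Drop ';' comment lines; optionally mark the first entry of each group with '+'."""
    out = []
    at_group_start = True
    for line in text.splitlines():
        if line.startswith(";"):
            continue  # header comments that stop sigmake from reading the file
        if not line.strip():
            at_group_start = True  # a blank line ends a collision group
            out.append("")
            continue
        if pick_first and at_group_start and line[0] not in "+-":
            line = "+" + line  # select this entry for the colliding signature
        at_group_start = False
        out.append(line)
    return "\n".join(out).strip("\n") + "\n"


# Made-up sample; real files carry more columns per entry.
SAMPLE = """; comment header written by sigmake
; more instructions

func_a 00 0000 5589E5
func_b 00 0000 5589E5

func_c 00 0000 C3
func_d 00 0000 C3
"""

print(resolve_exc(SAMPLE, pick_first=True))
```

With `pick_first=False` the function only removes the comment header, which leaves every colliding function out of the final sig file.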

There are databases for matching already:

Usage

Press Shift + F5, then right-click and choose Apply new signature,
shift + f5
sig list

or use File -> Load file -> FLIRT signature file to add a sig file,
sig list

After a sig is applied, we can see matched results like:
sig result

For comparison, before and after:
before sig match
signature matched

With so many sig files available, how do we quickly find the right one to use?

  • Use file or strings to match version information first.
  • Use lscan
    $ pip install pyelftools pefile
    $ git clone https://github.com/maroueneboubakri/lscan.git
    $ cd lscan
    $ python lscan.py -S amd64/sig -f stripped
    No symbol table found bin binary
    amd64/sig/libm-2.13.sig 6/445 (1.35%)
    amd64/sig/libpthread-2.13.sig 18/319 (5.64%)
    amd64/sig/libc-2.23.sig 447/2869 (15.58%)
    amd64/sig/libc-2.22.sig 420/2859 (14.69%)
    amd64/sig/libssl-1.0.2h.sig 0/665 (0.00%)
    amd64/sig/libm-2.23.sig 5/600 (0.83%)
    amd64/sig/libc-2.13.sig 133/3369 (3.95%)
    amd64/sig/libm-2.22.sig 5/582 (0.86%)
    amd64/sig/libpthread-2.22.sig 18/262 (6.87%)
    amd64/sig/libcrypto-1.0.2h.sig 3375/5057 (66.74%)
    amd64/sig/libpcre-8.38.sig 1/150 (0.67%)
    amd64/sig/libpthread-2.23.sig 19/258 (7.36%)
    Check out this demo.
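With many sig files the lscan report gets long, so it helps to rank it. The sketch below parses lines in the format shown above (`path.sig hits/total (percent)`) and returns the best candidates. The function name and the shortened sample report are my own; lscan itself does not provide this helper.

```python
import re

# Matches lines such as: amd64/sig/libc-2.23.sig 447/2869 (15.58%)
LINE_RE = re.compile(r"^(?P<sig>\S+\.sig)\s+(?P<hit>\d+)/(?P<total>\d+)\s+\(")


def rank_signatures(report, top=3):
    """Return the `top` (sig_path, match_ratio) pairs, best match first."""
    rows = []
    for line in report.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip banners and other non-result lines
        hit, total = int(m["hit"]), int(m["total"])
        ratio = hit / total if total else 0.0
        rows.append((ratio, m["sig"]))
    rows.sort(reverse=True)
    return [(sig, ratio) for ratio, sig in rows[:top]]


REPORT = """\
amd64/sig/libm-2.13.sig 6/445 (1.35%)
amd64/sig/libc-2.23.sig 447/2869 (15.58%)
amd64/sig/libc-2.22.sig 420/2859 (14.69%)
amd64/sig/libssl-1.0.2h.sig 0/665 (0.00%)
amd64/sig/libcrypto-1.0.2h.sig 3375/5057 (66.74%)
"""

for sig, ratio in rank_signatures(REPORT):
    print(f"{sig}: {ratio:.2%}")
```

For the sample above the clear winner is libcrypto-1.0.2h, followed by the two libc candidates, which is the order you would try them in IDA.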

Finger

The best by far.
Finger is a tool for recognizing function symbols, developed by Alibaba Cloud · Cloud Security Technology Lab.
Let’s take a look at the before-and-after comparison:
before Finger match
Finger matched

Rizzo

Rizzo also generates signatures for matching, like FLIRT.

However, “formal” signatures, “fuzzy” signatures, string-based signatures and immediate-based signatures can be generated separately, to suit scenarios that require different levels of accuracy.
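The trade-off between strict and relaxed signatures can be sketched in a few lines of Python. This is only an illustration of the idea, not Rizzo's algorithm: here the "formal" signature hashes mnemonics together with operands, the "fuzzy" one hashes mnemonics only, and the instruction lists are hypothetical.

```python
import hashlib


def formal_signature(instructions):
    """Hash mnemonics and operands: any change breaks the match."""
    text = ";".join(f"{mnemonic} {operands}" for mnemonic, operands in instructions)
    return hashlib.sha256(text.encode()).hexdigest()


def fuzzy_signature(instructions):
    """Hash mnemonics only, so differing addresses or constants still match."""
    text = ";".join(mnemonic for mnemonic, _ in instructions)
    return hashlib.sha256(text.encode()).hexdigest()


# The same function in two builds; only the call target moved.
build_a = [("push", "rbp"), ("mov", "rbp, rsp"), ("call", "0x401000"), ("ret", "")]
build_b = [("push", "rbp"), ("mov", "rbp, rsp"), ("call", "0x402350"), ("ret", "")]

print("formal match:", formal_signature(build_a) == formal_signature(build_b))
print("fuzzy match: ", fuzzy_signature(build_a) == fuzzy_signature(build_b))
```

The strict hash gives fewer false positives but misses the relinked function, while the relaxed hash finds it at the cost of matching more unrelated code.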

Rizzo-IDA is for IDA 7.4+ and IDA7-Rizzo is for IDA 7.0; Rizzo is also easy to find in tacnetsol/ida.

To generate signatures for functions in your current IDB:
Generating Rizzo signatures

To apply generated signatures to your current IDB:
Applying Rizzo signatures

BinDiff and Diaphora

BinDiff and Diaphora are both binary code similarity detection tools that use hashing for signature matching.

In fact, all these tools really need is a complete hash/signature database.

BinaryAI

BinaryAI: The Neural Search Engine for Binaries.
It used to be an IDA Pro plugin for binary-source matching with fairly good performance, open-sourced at binaryai/sdk.

Researchers can install and register by following the BinaryAI Docs, which provide installation steps, usage instructions, and a video demo.

Now, however, BinaryAI focuses on SCA. Users upload binary files, and BinaryAI returns detailed, clear online reports that include basic file information, software composition analysis, string information, and so on, helping users find a starting point for security analysis and work more efficiently.

Try it at the BinaryAI Binary Analysis Platform. See more docs at BinaryAI Documents.

SCA (Software Composition Analysis) Tools

Software composition analysis (SCA) is a process of identifying the third party and open source components in the applications of an organization. This analysis leads to the discovery of security risk, quality of code and license compliance of the components.

Karta

“Karta” (Russian for “Map”) is an IDA Python plugin that identifies and matches open-sourced libraries in a given binary.

See its documentation on Read the Docs.

Installation:

$ pip install elementals sark
$ git clone https://github.com/CheckPointSW/Karta
$ cd Karta
$ python setup.py install

Usage:

  • File -> Script File... to choose scripts from https://github.com/CheckPointSW/Karta/tree/master/src
  • thumbs_up/thumbs_up_firmware.py and thumbs_up/thumbs_up_ELF.py analyze ARM code/data segments (dealing with ARM/THUMB code transitions) and need the scikit-learn library
  • karta_identifier.py identifies which supported open source projects are present and fingerprints the exact library version, like:
    Karta Identifier - printer_firmware.bin:
    ========================================

    Identified Open Sources:
    ------------------------
    libpng: 1.2.29
    zlib: 1.2.3
    OpenSSL: 1.0.1j
    gSOAP: 2.7
    mDNSResponder: unknown

    Identified Closed Sources:
    --------------------------
    Treck: unknown

    Missing Open Sources:
    ---------------------
    OpenSSH: Was not found
    net-snmp: Was not found
    libxml2: Was not found
    libtiff: Was not found
    MAC-Telnet: Was not found

    Final Note - Karta
    ------------------
    If you encountered any bug, or wanted to add a new extension / feature, don't hesitate to contact us on GitHub:
    https://github.com/CheckPointSW/Karta
  • karta_manual_anchor.py for defining “manual anchors”:
    $ python karta_manual_anchor.py --help
    usage: karta_manual_anchor.py [-h] [-D] [-W] bin lib-name lib-version configs

    Enables the user to manually defined matches, acting as manual anchors, later
    to be used by Karta's Matcher.

    positional arguments:
    bin path to the disassembler's database for the wanted binary
    lib-name name (case sensitive) of the relevant open source library
    lib-version version string (case sensitive) as used by the identifier
    configs path to the *.json "configs" directory

    optional arguments:
    -h, --help show this help message and exit
    -D, --debug set logging level to logging.DEBUG
    -W, --windows signals that the binary was compiled for Windows
  • other scripts: karta_manual_identifier.py, karta_matcher.py, …
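When scanning several firmware images, the identifier report shown above is easier to compare once it is parsed into a dictionary. The sketch below relies only on the layout visible in that report (a section title ending in `Sources:` followed by `name: version` lines); the function name is my own and is not part of Karta.

```python
def parse_karta_report(report):
    """Map each '... Sources' section title to a {library: version} dict."""
    sections = {}
    current = None
    for raw in report.splitlines():
        line = raw.strip()
        if line.endswith("Sources:"):
            current = sections.setdefault(line[:-1], {})
        elif line.startswith("Final Note"):
            current = None  # the trailing note is not a library list
        elif current is not None and ": " in line:
            name, version = line.split(": ", 1)
            current[name] = version
    return sections


REPORT = """\
Identified Open Sources:
------------------------
libpng: 1.2.29
zlib: 1.2.3
mDNSResponder: unknown

Identified Closed Sources:
--------------------------
Treck: unknown

Missing Open Sources:
---------------------
OpenSSH: Was not found
"""

for title, libraries in parse_karta_report(REPORT).items():
    print(title, libraries)
```

The resulting `{library: version}` pairs are exactly what a vulnerability lookup needs as input.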

OpenSCA-Cli

OpenSCA scans third-party component dependencies and their vulnerabilities.

Download it from the GitHub releases or Gitee releases page.
To detect component information only:

$ opensca-cli -path ${project_path}

To connect to the cloud platform (get a token from opensca.xmirror.cn first):

$ opensca-cli -url ${url} -token ${token} -path ${project_path}
$ opensca-cli -url https://opensca.xmirror.cn -token ${token} -path ${project_path} -out output.json

Or, to use a local vulnerability database:

$ opensca-cli -db db.json -path ${project_path}

Pigaios

Pigaios (‘πηγαίος’, Greek for ‘source’ as in ‘source code’) is a tool for diffing/matching source codes directly against binaries.

It can match binaries against source code whether or not the source is compilable, using the Python Clang bindings for matching, and it can import the matched symbols into IDA. However, if the source can be compiled easily, Diaphora might be preferred.

Install the requirements first (my env: kali-rolling, Python 3.10, LLVM-14, IDA 7.7):

$ sudo apt-get install clang python3-clang libclang-dev python3-colorama python3-sklearn
# or
$ pip install clang
$ pip install colorama
$ pip install scikit-learn

Take zlib 1.2.13 as an example. First, generate the code signature database.

Step 1: generate project file sbd.project

$ wget https://zlib.net/zlib-1.2.13.tar.gz
$ tar -xzf zlib-1.2.13.tar.gz
$ cd zlib-1.2.13

$ python /path/to/srcbindiff.py -create
Project file 'sbd.project' created.

$ cat sbd.project
####################################################
# Default Source-Binary-Differ project configuration
####################################################
[GENERAL]
includes = /usr/lib/llvm-14/lib/clang/14.0.6/include
inlines = 0

[PROJECT]
cflags = -I. -I./include
cxxflags = -I. -I./include
export-file = zlib-1.2.13.sqlite
export-header = zlib-1.2.13-exported.h
export-indent = clang-format -i

[FILES]
contrib/blast/blast.c = 1
contrib/infback9/infback9.c = 1
contrib/infback9/inftree9.c = 1
contrib/iostream/test.cpp = 1
contrib/iostream/zfstream.cpp = 1
contrib/iostream2/zstream_test.cpp = 1
...
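Before exporting, it can be handy to check which sources the project file will feed to Clang. Since sbd.project is INI-style, Python's configparser can read it. This is a sketch under one assumption that I have not verified against the Pigaios source: a value of `1` in `[FILES]` means the file is included, and `0` would skip it. The sample below is a shortened copy of the generated file, with one entry switched to `0` for illustration.

```python
import configparser


def enabled_files(project_text):
    """List the [FILES] entries whose flag is '1' (assumed to mean 'export this file')."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep file paths case-sensitive
    parser.read_string(project_text)
    return [path for path, flag in parser["FILES"].items() if flag.strip() == "1"]


SAMPLE_PROJECT = """\
####################################################
# Default Source-Binary-Differ project configuration
####################################################
[GENERAL]
includes = /usr/lib/llvm-14/lib/clang/14.0.6/include
inlines = 0

[PROJECT]
cflags = -I. -I./include
export-file = zlib-1.2.13.sqlite

[FILES]
contrib/blast/blast.c = 1
contrib/infback9/infback9.c = 0
contrib/iostream/test.cpp = 1
"""

print(enabled_files(SAMPLE_PROJECT))
```

If the assumption holds, setting the flag to `0` for files that need missing headers (such as `windows.h`) would be one way to quiet the fatal errors in the export step.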

Step 2: generate sqlite database zlib-1.2.13.sqlite

$ python /path/to/srcbindiff.py --no-parallel -export    # parallel mode also works
[i] Removing existing file zlib-1.2.13.sqlite
[+] CC contrib/blast/blast.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I. -..
[+] CC contrib/infback9/infback9.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include..
[+] CC contrib/infback9/inftree9.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include..
[+] CXX contrib/iostream/test.cpp -I/usr/lib/llvm-14/lib/clang/14.0.6/include ..
contrib/iostream/zfstream.h:5,10: fatal: 'fstream.h' file not found
[+] CXX contrib/iostream/zfstream.cpp -I/usr/lib/llvm-14/lib/clang/14.0.6/incl..
contrib/iostream/zfstream.h:5,10: fatal: 'fstream.h' file not found
[+] CXX contrib/iostream2/zstream_test.cpp -I/usr/lib/llvm-14/lib/clang/14.0.6..
contrib/iostream2/zstream.h:27,10: fatal: 'strstream.h' file not found
[+] CXX contrib/iostream3/test.cc -I/usr/lib/llvm-14/lib/clang/14.0.6/include ..
[+] CXX contrib/iostream3/zfstream.cc -I/usr/lib/llvm-14/lib/clang/14.0.6/incl..
[+] CC contrib/minizip/ioapi.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I...
[+] CC contrib/minizip/iowin32.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -..
contrib/minizip/iowin32.h:14,10: fatal: 'windows.h' file not found
[+] CC contrib/minizip/miniunz.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -..
[+] CC contrib/minizip/minizip.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -..
[+] CC contrib/minizip/mztools.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -..
[+] CC contrib/minizip/unzip.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I...
[+] CC contrib/minizip/zip.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I. -..
[+] CC contrib/puff/puff.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I. -I...
[+] CC contrib/puff/pufftest.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I...
[+] CC contrib/testzlib/testzlib.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include..
contrib/testzlib/testzlib.c:3,10: fatal: 'windows.h' file not found
[+] CC contrib/untgz/untgz.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I. -..
contrib/untgz/untgz.c:277,7: warning: implicit declaration of function 'chmod'..
contrib/untgz/untgz.c:341,7: warning: implicit declaration of function 'mkdir'..
contrib/untgz/untgz.c:659,11: warning: incompatible pointer types assigning to..
contrib/untgz/untgz.c:665,18: warning: incompatible pointer types passing 'gzF..
[+] CC examples/enough.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I. -I./i..
[+] CC examples/fitblk.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I. -I./i..
[+] CC examples/gun.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I. -I./incl..
[+] CC examples/gzappend.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I. -I...
[+] CC examples/gzjoin.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I. -I./i..
[+] CC examples/gzlog.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I. -I./in..
[+] CC examples/gznorm.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I. -I./i..
[+] CC examples/zpipe.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I. -I./in..
[+] CC examples/zran.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I. -I./inc..
[+] CC test/example.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I. -I./incl..
[+] CC test/infcover.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I. -I./inc..
[+] CC test/minigzip.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I. -I./inc..
[+] CC adler32.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I. -I./include
[+] CC compress.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I. -I./include
[+] CC crc32.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I. -I./include
[+] CC deflate.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I. -I./include
[+] CC gzclose.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I. -I./include
[+] CC gzlib.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I. -I./include
[+] CC gzread.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I. -I./include
[+] CC gzwrite.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I. -I./include
[+] CC infback.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I. -I./include
[+] CC inffast.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I. -I./include
[+] CC inflate.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I. -I./include
[+] CC inftrees.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I. -I./include
[+] CC trees.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I. -I./include
[+] CC uncompr.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I. -I./include
[+] CC ztest19.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I. -I./include
[+] CC zutil.c -I/usr/lib/llvm-14/lib/clang/14.0.6/include -I. -I./include
[+] Building definitions...
[+] Building the callgraphs...
[+] Building the constants table...
[+] Creating indexes...

4 warning(s), 0 error(s), 5 fatal error(s)

We can open this SQLite database file:

$ sqlite3 zlib-1.2.13.sqlite
SQLite version 3.40.0 2022-11-16 12:10:08
Enter ".help" for usage hints.
sqlite> select name from functions limit 5;
BeginCountPerfCounter
BeginCountRdtsc
Display64BitsSize
ExprMatch
ExprMatch
sqlite>
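The same query can be scripted with Python's built-in sqlite3 module, which is convenient for checking whether a function of interest made it into the export. Only the `functions` table and its `name` column are taken from the session above; the in-memory table below is a stand-in so the sketch runs without the real file, and the real table has more columns.

```python
import sqlite3


def function_names(con, limit=5):
    """Return up to `limit` distinct function names, sorted alphabetically."""
    rows = con.execute(
        "SELECT DISTINCT name FROM functions ORDER BY name LIMIT ?", (limit,)
    ).fetchall()
    return [name for (name,) in rows]


# Stand-in for: con = sqlite3.connect("zlib-1.2.13.sqlite")
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE functions (name TEXT)")
con.executemany(
    "INSERT INTO functions VALUES (?)",
    [("ExprMatch",), ("ExprMatch",), ("BeginCountRdtsc",), ("gzopen",)],
)

print(function_names(con))
```

`DISTINCT` hides duplicates such as the two `ExprMatch` rows seen in the sqlite3 session.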

Finally, use IDA to match a binary against the source. Take the busybox installed from the kali-rolling apt repo as an example, since it uses zlib functions.
Use File -> Script File... to choose the script sourceimp_ida.py
pigaios sourceimp_ida.py panel

Choose the right sqlite file, and we can see the result
pigaios match result

We can even visually diff the pseudo-code of functions, such as gzopen
pigaios pseudo-code diffing

We can also see the match reasons
pigaios match reasons

CycloneDX + Dependency-Track: Not for Binaries but for Open Source

Nowadays most SCA tools are commercial rather than free and open source, but we can get support from the community and OWASP.

Since companies develop in multiple languages, SCA tools should cover languages such as Java, Golang, Python and NodeJS.
Software composition analysis can be achieved by parsing files like pom.xml, go.mod, requirements.txt and yarn.lock and extracting the names and versions of third-party components from their key fields.
SCA tools should also be easy to embed in an existing DevOps process, and they need a continuously updated vulnerability database that crawls sources such as OSSIndex, NVD, NPM and CPE.
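As a small example of the manifest-parsing step, the sketch below pulls exactly pinned `name==version` pairs out of a requirements.txt. It is deliberately simplified: extras, environment markers, range specifiers such as `>=`, and `-r` includes are ignored, and the sample file content is made up.

```python
import re

# name==version, with optional spaces and a trailing comment or marker
PIN_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;#]+)")


def pinned_components(requirements_text):
    """Return (name, version) for every exactly pinned requirement."""
    components = []
    for line in requirements_text.splitlines():
        m = PIN_RE.match(line)
        if m:
            components.append((m.group(1).lower(), m.group(2)))
    return components


SAMPLE = """\
# web stack
requests==2.31.0
Flask == 2.3.2  # pinned
numpy>=1.24
"""

print(pinned_components(SAMPLE))
```

Unpinned entries like `numpy>=1.24` are skipped on purpose, because without an exact version there is nothing reliable to look up in a vulnerability database.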

CycloneDX and Dependency-Track are both from OWASP.
OWASP CycloneDX is a full-stack Bill of Materials (BOM) standard that provides advanced supply chain capabilities for cyber risk reduction. In short, it creates a BOM for a project.
OWASP Dependency-Track is a continuous SBOM analysis platform that allows organizations to identify and reduce risk in the software supply chain. It accepts a BOM and performs vulnerability analysis.

CycloneDX

CycloneDX has its own GitHub repos and various plugins for projects in different languages, such as the CycloneDX Maven Plugin for Java and cyclonedx-gomod for Golang. Usage is described in each README, and the result in every case is an SBOM for the project.
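To give a feeling for what these plugins produce, here is a hand-built minimal CycloneDX JSON document in Python. It only uses the top-level fields I am sure about (`bomFormat`, `specVersion`, `version`, `components`); the spec version `1.4`, the `pypi` package URL type and the sample component are assumptions for illustration, and real plugins emit far more metadata.

```python
import json


def make_bom(components):
    """Build a minimal CycloneDX-style BOM dict from (name, version) pairs."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.4",  # assumed; use the version your tooling expects
        "version": 1,
        "components": [
            {
                "type": "library",
                "name": name,
                "version": version,
                "purl": f"pkg:pypi/{name}@{version}",  # assumes a PyPI package
            }
            for name, version in components
        ],
    }


bom = make_bom([("requests", "2.31.0")])
print(json.dumps(bom, indent=2))
```

In practice the official plugins should be preferred, since they also resolve transitive dependencies, hashes and licenses.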

Dependency-Track

Dependency-Track also has its own GitHub repos, and we can set it up according to the docs. Both Docker container and executable WAR deployments are available; here I use the bundled jar.

First download it from the releases page. dependency-track-apiserver.jar must be used together with a frontend, so I use dependency-track-bundled.jar, which can run on its own.

$ java -jar  dependency-track-bundled.jar
# visit on http://localhost:8080
# admin/admin to login, change passwd first

dependency-track login

After logging in, set up a project for testing
dependency-track project

Upload a BOM in the Components tab to analyze; here we upload the bom.json from the releases page
dependency-track upload BOM

Finally we get the SCA result
dependency-track SCA result

It includes all recognized components, and indexes their potential vulnerabilities.
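The upload can also be scripted instead of using the web UI. The sketch below only builds the HTTP request and does not send it. The `PUT /api/v1/bom` endpoint, the `X-Api-Key` header and the base64-encoded `bom` field reflect my understanding of the Dependency-Track REST API, so check them against the API docs of your version; the API key and project UUID are placeholders.

```python
import base64
import json
import urllib.request


def build_bom_upload(base_url, api_key, project_uuid, bom_bytes):
    """Prepare (but do not send) a BOM upload request for Dependency-Track."""
    payload = {
        "project": project_uuid,
        "bom": base64.b64encode(bom_bytes).decode("ascii"),
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/bom",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},
    )


req = build_bom_upload(
    "http://localhost:8080",
    "hypothetical-api-key",
    "00000000-0000-0000-0000-000000000000",  # placeholder project UUID
    b'{"bomFormat": "CycloneDX"}',
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually send it to a running instance.
```

A request like this is what a CI job would send after each build, so the project in Dependency-Track always reflects the latest SBOM.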

More

For DevOps, Jenkins for example has a Dependency-Track plugin, and adding CycloneDX commands to the build routine generates an SBOM every time you build a release.

Dependency-Check is also a Software Composition Analysis (SCA) tool that attempts to detect publicly disclosed vulnerabilities contained within a project’s dependencies. It does this by determining if there is a Common Platform Enumeration (CPE) identifier for a given dependency. If found, it will generate a report linking to the associated CVE entries.
It also has a Jenkins plugin.
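The CPE lookup can be pictured as a field-by-field comparison. The sketch below splits a CPE 2.3 formatted string into its named fields and checks vendor, product and version. It is a simplification: escaped colons inside fields are not handled, Dependency-Check's real matching is evidence-based and much more involved, and the zlib CPE is only an illustrative value.

```python
CPE_FIELDS = ("part", "vendor", "product", "version", "update", "edition",
              "language", "sw_edition", "target_sw", "target_hw", "other")


def parse_cpe(cpe):
    """Split a CPE 2.3 formatted string into a dict of its 11 named fields."""
    parts = cpe.split(":")
    if parts[:2] != ["cpe", "2.3"] or len(parts) != 13:
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return dict(zip(CPE_FIELDS, parts[2:]))


def cpe_matches(cpe, vendor, product, version):
    """True if the CPE names this vendor/product and this (or any) version."""
    fields = parse_cpe(cpe)
    same_product = (fields["vendor"], fields["product"]) == (vendor, product)
    return same_product and fields["version"] in ("*", version)


ZLIB_CPE = "cpe:2.3:a:zlib:zlib:1.2.3:*:*:*:*:*:*:*"  # illustrative value
print(cpe_matches(ZLIB_CPE, "zlib", "zlib", "1.2.3"))
print(cpe_matches(ZLIB_CPE, "zlib", "zlib", "1.2.13"))
```

Once a dependency maps to a CPE like this, the CVE entries that list the same CPE can be attached to the report.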

Snyk is another SCA scanner, which provides more accurate results than Dependency-Check, but it is not open source and needs token authentication.
