Info
Abstract
- complexity of compilers → manual testing (does not suffice), formal verification (does not scale)
- end-to-end fuzzing → low coverage of some components
- IRFuzzer: specialized fuzzing of LLVM compiler backend
- input validity: constrained mutations, generate LLVM IR inputs (structured control flow, vector types, function definitions)
- feedback quality: instruments coding patterns (monitor execution status of instruction selection) → matcher table coverage feedback + architecture specific guidance (to mutator)
Introduction
- complexity of LLVM: 7M+ lines of C/C++ code, 2500+ devs
- critical: effective and scalable verification method
- extensive regression testing: not sufficient; LLVM is not bug-free, and its bugs are impactful
- testing techniques: model checking, fuzzing, differential testing
- formal verification does not scale to LLVM (many architectures, languages, compilation models (just-in-time compilation & link-time optimization))
- challenge: compiler optimization ↔ machine code generation (widely used but not specified)
- backends differ greatly in their relative code maturity
- sota fuzzers failed to find new bugs of a compiler backend:
- AFL++ (general-purpose fuzzing techniques):
- not consider input validity
- most binary strings are invalid inputs → struggle to explore control paths in the backend
- FuzzMutate (valid IR mutation):
- cannot construct complex control flow
- only generate a few instructions with scalar types
- CSmith, GrayC (end-to-end fuzzing tools):
- cannot efficiently explore control paths in backend
- CSmith no feedback → ineffectiveness
- GrayC (branch coverage feedback from libFuzzer) → missed many backend bugs
- front-end parser & middle-end optimizations → limit the set of features seen by backend
- C (high level language) → not exercise all backend features
- test effectively: generate LLVM IR that complies with the language reference
- llvm-opt-fuzzer & llvm-isel-fuzzer → generate valid IR for middle-end and backend fuzzing
- challenges in generating valid IR:
- use-before-definition: must maintain all data dependencies while generating a complex CFG (dominance is easily invalidated by a single jump (Fig 2); this issue only exists in goto-style languages)
- model instructions missing in FuzzMutate → match the types of operands in each IR instruction → enumerate the large number of natively supported vector types
- model intrinsic functions for all architectures → intrinsics are often poorly documented and vary across architectures
- feedback limitation:
- AFL++’s feedback mechanism performs poorly in backend
- branch coverage → severe branch collision problems (when fuzzing large code bases, e.g. LLVM)
- code generation logic (in LLVM backend) → table-driven state machines
- contribution:
- IRFuzzer → the first specialized fuzzer for the LLVM compiler backend
- green: contributions
- orange: AFL++
- blue: LLVM
- mutator → generate valid IR (CFG correctness) → efficiency
- descriptive language → list instruction requirements
- FuzzMutate + { multiple basic blocks, function calls, intrinsic functions, vector types } (special handling in backend)
- instrument table-driven state machine → new coverage metric → reports which instructions and intrinsics haven’t been fuzzed → coverage report (states that haven’t been executed) → guide mutations
- evaluation:
- 29 mature backend architectures
- better edge coverage & matcher table coverage
- 74 confirmed, new bugs in LLVM
Background
LLVM
- compiler, supports different architectures, target-independent IR abstraction
- frontend (clang): programming language → LLVM IR, with lexer, parser, AST transformations
- middle-end (opt): common target-independent optimizations
- backend (llc): LLVM IR → target-specific machine code, multiple target architectures (plug-in abstraction, API functions implementation)
- LLVM IR:
- static single-assignment (SSA), with a fixed set of instructions
- instructions with types (match between definitions and uses)
- definition of functions (control flow ↔ call instruction)
- architecture-specific intrinsics: no corresponding IR instructions, represented as function calls at IR level
- control flow within a function → basic blocks & branch instructions
- PHI instructions: refer to values from different blocks → must respect the dominance constraint (only reference defined values) → high-level language generation techniques cannot be easily adapted to LLVM IR (see the IR sketch below)
- instruction selection: target-independent LLVM IR instructions → target-specific machine code instructions
- SelectionDAG (instruction selection framework): basic block → DAG
- GlobalISel
- rewrite patterns: LLVM IR instruction → machine instruction
- TableGen (LLVM-specific language): patterns → matcher table (state-machine representation)
- matcher table: IR instruction → determine pattern to apply
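A minimal hand-written LLVM IR sketch (my illustration, not taken from the paper) of the properties above: SSA form, typed instructions, a PHI that obeys the dominance constraint, and an architecture-specific intrinsic appearing as an ordinary call:

```llvm
; Hand-written illustration: SSA, types, PHI + dominance, intrinsic-as-call.
declare <4 x float> @llvm.sqrt.v4f32(<4 x float>)   ; intrinsic, lowered by the backend

define <4 x float> @demo(i1 %cond, <4 x float> %a, <4 x float> %b) {
entry:
  br i1 %cond, label %then, label %merge
then:
  %sum = fadd <4 x float> %a, %b        ; each value is assigned exactly once (SSA)
  br label %merge
merge:
  ; A PHI incoming value only has to dominate its predecessor edge, so %sum is
  ; legal here; a plain (non-PHI) use of %sum in %merge would be rejected by the
  ; IR verifier because %then does not dominate %merge.
  %x = phi <4 x float> [ %sum, %then ], [ %a, %entry ]
  %r = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %x)
  ret <4 x float> %r
}
```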
Coverage-Guided Fuzzing
- AFL:
- track control-flow coverage
- store test inputs (with new coverage) in seed cache
- mutate, not randomly generate
- explore different control-flow paths
- libFuzzer, FuzzMutate (LLVM): exercise only a limited type of code
Challenges in Compiler Fuzzing
- size and complexity:
- semantically meaningful test input (var define-use, LLVM IR doesn’t have scope / lifetime)
- syntax-correct input:
- LLVM IR is a strongly typed language (with numerous types)
- intrinsic functions: customized LLVM IR instructions implemented by each architecture (poorly documented)
- static single-assignment (SSA) representation: only allows each variable to be assigned once
- machine instructions do not correlate with instruction selection’s control flow:
- traditional code coverage: ineffective
- LLVM IR → table-driven method (SelectionDAG / GlobalIsel) → binary executable
- TableGen (code generation patterns) → matcher table (data & control instructions)
- different data → same control flow → different instructions
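A hand-written example (mine, not from the paper) of why edge coverage is a weak signal here: both adds below drive the same matcher-loop control flow but match different matcher table entries and lower to different machine instructions:

```llvm
; Same interpreter control flow in the table-driven matcher,
; different matcher table entries / machine instructions.
define i32 @scalar_add(i32 %a, i32 %b) {
  %r = add i32 %a, %b            ; e.g. a scalar ADD on the target
  ret i32 %r
}

define <4 x i32> @vector_add(<4 x i32> %a, <4 x i32> %b) {
  %r = add <4 x i32> %a, %b      ; e.g. a SIMD ADD, matched via different entries
  ret <4 x i32> %r
}
```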
Design
- mutation: generate function
- create more control flows (change CFG)
- generate new IR instructions and mutate them
- measure coverage
LLVM IR Mutation
- mutation-based strategy (variety & valid): small valid seed inputs → modify (also valid)
Function generation
- function calls: target-specific code in LLVM backend → function definitions & calls w/ different arguments & return types
- generate new function definitions (mutation strategy):
- constraint: function signature return type = type of each return instruction in the function definition
- generate new call instructions (mutation strategy): select any declared function → generate compatible arguments & return values
- intrinsic functions: target specific operations (correspond to complicated machine instructions)
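A hand-written sketch (assumed shapes, not taken from the paper) of the IR these mutations aim to produce: a new definition whose return instructions match the signature, a call with compatible argument and return types, and an intrinsic call:

```llvm
declare i64 @llvm.ctpop.i64(i64)              ; intrinsic = function call at the IR level

; new definition: every ret must produce the signature's return type (i64)
define i64 @callee(i64 %x, i1 %c) {
entry:
  br i1 %c, label %a, label %b
a:
  ret i64 %x                                  ; type matches the signature
b:
  %p = call i64 @llvm.ctpop.i64(i64 %x)       ; generated intrinsic call
  ret i64 %p                                  ; type matches the signature
}

; new call instruction: arguments and return value use compatible types
define i64 @caller(i64 %y) {
  %r = call i64 @callee(i64 %y, i1 true)
  ret i64 %r
}
```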
Control flow graph mutation
- control flow: target-specific backend code differs
- machine code optimization (jump threading) → restructure control flow
- generate control flow:
- sub-control flow graph (sCFG): not change edges, but split block and insert sCFG inside
- randomly select a block and split into two (source entry block & sink exit block) → generate random sCFG starting from source and ending with sink
- schemas: branch, switch, return
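A before/after sketch of the block-splitting idea (hand-written, assumed shapes): the selected block becomes a source and a sink, and a randomly generated sCFG (a switch schema here) is inserted between them without touching the block's original incoming/outgoing edges:

```llvm
; Before: one straight-line block.
define i32 @before(i32 %x) {
bb:
  %a = add i32 %x, 1
  %b = mul i32 %a, 3
  ret i32 %b
}

; After: bb split into source/sink with a switch-based sCFG in between;
; edges into the source and out of the sink are unchanged.
define i32 @after(i32 %x) {
source:
  %a = add i32 %x, 1
  switch i32 %a, label %case_default [ i32 0, label %case0
                                       i32 1, label %case1 ]
case0:
  br label %sink
case1:
  br label %sink
case_default:
  br label %sink
sink:
  %b = mul i32 %a, 3
  ret i32 %b
}
```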
Instruction modeling and generation
- backend: wide set of LLVM IR types → smaller set of types implemented by each target architecture
- to exercise all code generation features → generate IR instructions with many data types
- IR instructions categories (modeling):
- generation: randomly select an opcode → randomly select values (compatible types)
- other operations: store, load (easy), call (complex)
- ensure values are defined before use: search for already-defined values (global variables, function arguments, values in dominating blocks, values defined by previous instructions)
- special cases:
- values in dead code → create a use for such values (store to stack memory or a global variable; see the IR sketch after this list)
- avoid undefined behavior: values → placeholders
- intrinsic functions: not modeled; rely on matcher table coverage feedback
- call the intrinsic functions that haven’t been generated yet
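A hand-written illustration (not from the paper) of keeping an otherwise-dead generated value alive: the freshly generated vector instruction gets a use by storing its result to stack memory or to a global, so the backend still has to select code for it:

```llvm
@g = global <4 x i32> zeroinitializer

define void @keep_alive(<4 x i32> %v) {
entry:
  %slot = alloca <4 x i32>
  %w = mul <4 x i32> %v, %v        ; freshly generated instruction (vector type)
  store <4 x i32> %w, ptr %slot    ; use via stack memory ...
  store <4 x i32> %w, ptr @g       ; ... or via a global variable, so %w is not dead
  ret void
}
```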
Matcher Table Feedback
Matcher table instrumentation
- matcher table → machine instructions generation (not control flow)
- to generate more patterns → track matcher table usage (not edge coverage)
- matcher table size (num of entries):
- an entry being accessed multiple times ↔ the pattern being triggered multiple times → only track accessed or not → boolean for each entry
- save space: pack 8 booleans into 1 byte
- fuzzer (calculate boolean offset → access entry) → compiler (matcher table coverage) → w/ edge coverage, if either new, the input is new
IR mutation feedback
- need: knowledge of whether each instruction / intrinsic has been generated (LLVM hides this)
- modify TableGen → look-up table (matcher table entries ↔ machine instruction patterns) → generation condition
- look-up table for each architecture: decode matcher table coverage (instructions whether generated)
Implementation
- FuzzMutate + more mutation strategies (function calls, control flow graph mutation, arbitrary data types)
- matcher table coverage instrumentation
- modify TableGen → dump look-up table for each matcher table
Evaluation
- RQs:
- compare with sota backend fuzzers
- compare with end-to-end fuzzers (CSmith / GrayC)
- ablation: mutator & feedback
- new bugs
- insights from bugs
- 21 architecture backends with matcher table size > 25,000 → 29 target CPUs in 12 architectures
- baseline fuzzers: AFL++, AFL++ (w/ FuzzMutate)
- metrics: branch coverage (AFL++ report), matcher table coverage (#entries accessed / matcher table size)
- variants: IRF (full), (w/o feedback)
- 24h, 5 times average, 92 seeds initialized (randomly selected from LLVM’s unit testing)
Baseline Comparison
- AFL++ (w/o LLVM IR-aware mutator), FuzzMutate (w/ limited LLVM IR-aware mutator)
- SelectionDAG
- AFL++: lack of support for structured input
- matcher table coverage ↔ coverage of instruction selection patterns
Comparison with End-to-end Fuzzers
- baselines: CSmith, GrayC
- generate C code (IRFuzzer mutates LLVM IR)
- exercise the entire compilation pipeline (IRFuzzer focuses on the backend)
- CSmith lacks feedback, GrayC relies on branch coverage (not for backend)
- need cross compilation toolchain to compile C to different architectures (3 architectures)
- 715,147 C programs generated by GrayC, 506,971 by CSmith in 24h
- metrics: control-flow coverage, matcher table coverage (IRFuzzer instrumentation)
- why CSmith and GrayC poor at matcher table coverage:
- lack of coverage of vector data types
- Vector instructions can only be generated when the frontend and middle-end decide a vector instruction will speed up a particular piece of code, which turns out to be rather unlikely for random C programs.
- why CSmith and GrayC are good at branch coverage:
- size of generated files: IRFuzzer (less than 10 KB), CSmith (> 40 KB)
- many backend edges are executed more times → increase branch coverage
- specialized fuzzing > end-to-end fuzzing in backend testing
Individual Contributions
- Table 3
- IRFuzzer variants > FuzzMutate → mutator generates more diverse input
- IRFuzzer (full) vs. IRFuzzer (w/o feedback):
- matcher table coverage feedback lowers the throughput → affects overall branch coverage
- more diverse inputs → higher matcher table coverage
- tradeoff is acceptable
Bug Categories and Analysis
- 74 confirmed bugs
- bugs categorized by locations:
- CodeGen and popular architectures have more bugs
- bugs categorized by causes:
- hang, memory errors, assertion failures, logic errors, missing patterns, and other bugs
- missing patterns: instructions exist, patterns missing
- assertion failures: counterexamples for properties which were considered as always true by developers
- 49 fixed bugs
Bugs Case Study
Compiler hang
- infinite undo-redo loop between two optimizations
- TurnSwitchRangeIntoICmp: left → right (bit operation)
- FoldValueComparisonIntoPredecessors: right → left (reduce number of comparison operations)
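A rough hand-written sketch of the two transform shapes involved (not the actual bug-triggering input): one direction rewrites a switch over a contiguous case range into a compare-and-branch, the other folds a value comparison back into its predecessor to reduce comparisons, and on some inputs they keep undoing each other:

```llvm
; Shape roughly corresponding to the switch form (rewritten by
; TurnSwitchRangeIntoICmp into the compare-and-branch form below):
define void @switch_form(i32 %x) {
entry:
  switch i32 %x, label %other [ i32 0, label %same
                                i32 1, label %same
                                i32 2, label %same ]
same:
  ret void
other:
  ret void
}

; Roughly equivalent compare-and-branch form:
define void @icmp_form(i32 %x) {
entry:
  %in_range = icmp ult i32 %x, 3
  br i1 %in_range, label %same, label %other
same:
  ret void
other:
  ret void
}
```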
Memory error
- null pointer dereferences, double frees, out of bounds (OOB) accesses
- out of bounds (undef)
Logic error
- unclear documentation or undocumented assumptions
- signed integer extension
- introduced 9 years ago, never noticed:
- fewer vector operations emitted by the frontend
- documentation was ambiguous
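Not the actual buggy pattern (the notes do not record it), but a reminder of the sign- vs. zero-extension distinction that such documentation ambiguity typically concerns; the two widenings agree only for non-negative values:

```llvm
define i32 @widen_signed(i8 %x) {
  %r = sext i8 %x to i32     ; i8 -1 (0xFF) -> i32 -1 (0xFFFFFFFF)
  ret i32 %r
}

define i32 @widen_unsigned(i8 %x) {
  %r = zext i8 %x to i32     ; i8 0xFF -> i32 255 (0x000000FF)
  ret i32 %r
}
```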