Info
Abstract
- complexity of compilers → manual testing (does not suffice), formal verification (does not scale)
- end-to-end fuzzing → low coverage of some components
- IRFuzzer: specialized fuzzing of LLVM compiler backend
- input validity: constrained mutations, generate LLVM IR inputs (structured control flow, vector types, function definitions)
- feedback quality: instruments coding patterns (monitor execution status of instruction selection) → matcher table coverage feedback + architecture specific guidance (to mutator)
Introduction
- complexity of LLVM: 7M+ lines of C/C++ code, 2500+ devs
- critical: effective and scalable verification method
- extensive regression testing: not sufficient; LLVM is not bug-free, and its bugs are impactful
- testing techniques: model checking, fuzzing, differential testing
- formal verification does not scale to LLVM (many architectures, languages, compilation models (just-in-time compilation & link-time optimization))
- challenge: compiler optimization ↔ machine code generation (widely used but not specified)
- backends differ greatly in their relative code maturity
- sota fuzzers failed to find new bugs of a compiler backend:
- AFL++ (general-purpose fuzzing techniques):
- not consider input validity
- most binary strings are invalid inputs → struggle to explore control paths in the backend
- FuzzMutate (valid IR mutation):
- cannot construct complex control flow
- only generate a few instructions with scalar types
- CSmith, GrayC (end-to-end fuzzing tools):
- cannot efficiently explore control paths in backend
- CSmith no feedback → ineffectiveness
- GrayC (branch coverage feedback from libFuzzer) → missed many backend bugs
- front-end parser & middle-end optimizations → limit the set of features seen by backend
- C (high level language) → not exercise all backend features
- test effectively: generate LLVM IR that complies with the language reference
- llvm-opt-fuzzer & llvm-isel-fuzzer → generate valid IR for middle-end and backend fuzzing
- challenges in generating valid IR:
- use-before-definition: must maintain all data dependencies while generating a complex CFG (dominance is easily invalidated by a single jump (Fig 2); this issue only exists in goto-style languages)
- model instructions missing in FuzzMutate → match the types of operands in each IR instruction → enumerate the large number of natively supported vector types
- model intrinsic functions for all architectures → intrinsics are often poorly documented and vary across architectures
- feedback limitation:
- AFL++’s feedback mechanism performs poorly in backend
- branch coverage → severe branch collision problems (when fuzzing large code bases, e.g. LLVM)
- code generation logic (in LLVM backend) → table-driven state machines
- contribution:
- IRFuzzer → the first specialized fuzzer for the LLVM compiler backend
- green: contributions
- orange: AFL++
- blue: LLVM
- mutator → generate valid IR (CFG correctness) → efficiency
- descriptive language → list instruction requirements
- FuzzMutate + { multiple basic blocks, function calls, intrinsic functions, vector types } (special handling in backend)
- instrument table-driven state machine → new coverage metric → reports which instructions and intrinsics haven’t been fuzzed → coverage report (states that haven’t been executed) → guide mutations
- evaluation:
- 29 mature backend architectures
- better edge coverage & matcher table coverage
- 74 confirmed, new bugs in LLVM
Background
LLVM
- compiler, supports different architectures, target-independent IR abstraction
- frontend (clang): programming language → LLVM IR, with lexer, parser, AST transformations
- middle-end (opt): common target-independent optimizations
- backend (llc): LLVM IR → target-specific machine code, multiple target architectures (plug-in abstraction, API functions implementation)
- LLVM IR:
- static single-assignment (SSA), with a fixed set of instructions
- instructions with types (match between definitions and uses)
- definition of functions (control flow ↔ call instruction)
- architecture-specific intrinsics: no corresponding IR instructions, represented as function calls at IR level
- control flow within a function → basic blocks & branch instructions
- PHI instructions: refer to values from different blocks → must respect the dominance constraint (only reference defined values) → high-level language generation techniques cannot be easily adapted to LLVM IR (see the IR sketch below)
- instruction selection: target-independent LLVM IR instructions → target-specific machine code instructions
- SelectionDAG (instruction selection framework): basic block → DAG
- GlobalISel
- rewrite patterns: LLVM IR instruction → machine instruction
- TableGen (LLVM-specific language): patterns → matcher table (state-machine representation)
- matcher table: IR instruction → determine pattern to apply
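A minimal hand-written LLVM IR sketch (my illustration, not taken from the paper) of the properties above: SSA form, typed instructions, a PHI that obeys the dominance constraint, and an architecture-specific intrinsic appearing as an ordinary call:

```llvm
; Hand-written illustration: SSA, types, PHI + dominance, intrinsic-as-call.
declare <4 x float> @llvm.sqrt.v4f32(<4 x float>)   ; intrinsic, lowered by the backend

define <4 x float> @demo(i1 %cond, <4 x float> %a, <4 x float> %b) {
entry:
  br i1 %cond, label %then, label %merge
then:
  %sum = fadd <4 x float> %a, %b        ; each value is assigned exactly once (SSA)
  br label %merge
merge:
  ; A PHI incoming value only has to dominate its predecessor edge, so %sum is
  ; legal here; a plain (non-PHI) use of %sum in %merge would be rejected by the
  ; IR verifier because %then does not dominate %merge.
  %x = phi <4 x float> [ %sum, %then ], [ %a, %entry ]
  %r = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %x)
  ret <4 x float> %r
}
```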
Coverage-Guided Fuzzing
- AFL:
- track control-flow coverage
- store test inputs (with new coverage) in seed cache
- mutate, not randomly generate
- explore different control-flow paths
- libFuzzer, FuzzMutate (LLVM): exercise only a limited type of code
Challenges in Compiler Fuzzing
- size and complexity:
- semantically meaningful test input (var define-use, LLVM IR doesn’t have scope / lifetime)
- syntax-correct input:
- LLVM IR is a strongly typed language (with numerous types)
- intrinsic functions: customized LLVM IR instructions implemented by each architecture (poorly documented)
- static single-assignment (SSA) representation: only allows each variable to be assigned once
- machine instructions do not correlate with instruction selection’s control flow:
- traditional code coverage: ineffective
- LLVM IR → table-driven method (SelectionDAG / GlobalIsel) → binary executable
- TableGen (code generation patterns) → matcher table (data & control instructions)
- different data → same control flow → different instructions
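A hand-written example (mine, not from the paper) of why edge coverage is a weak signal here: both adds below drive the same matcher-loop control flow but match different matcher table entries and lower to different machine instructions:

```llvm
; Same interpreter control flow in the table-driven matcher,
; different matcher table entries / machine instructions.
define i32 @scalar_add(i32 %a, i32 %b) {
  %r = add i32 %a, %b            ; e.g. a scalar ADD on the target
  ret i32 %r
}

define <4 x i32> @vector_add(<4 x i32> %a, <4 x i32> %b) {
  %r = add <4 x i32> %a, %b      ; e.g. a SIMD ADD, matched via different entries
  ret <4 x i32> %r
}
```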
Design
- mutation: generate function
- create more control flows (change CFG)
- generate new IR instructions and mutate them
- measure coverage
LLVM IR Mutation
- mutation-based strategy (variety & valid): small valid seed inputs → modify (also valid)
Function generation
- function calls: target-specific code in LLVM backend → function definitions & calls w/ different arguments & return types
- generate new function definitions (mutation strategy):
- constraint: function signature return type = type of each return instruction in the function definition
- generate new call instructions (mutation strategy): select any declared function → generate compatible arguments & return values
- intrinsic functions: target specific operations (correspond to complicated machine instructions)
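A hand-written sketch (assumed shapes, not taken from the paper) of the IR these mutations aim to produce: a new definition whose return instructions match the signature, a call with compatible argument and return types, and an intrinsic call:

```llvm
declare i64 @llvm.ctpop.i64(i64)              ; intrinsic = function call at the IR level

; new definition: every ret must produce the signature's return type (i64)
define i64 @callee(i64 %x, i1 %c) {
entry:
  br i1 %c, label %a, label %b
a:
  ret i64 %x                                  ; type matches the signature
b:
  %p = call i64 @llvm.ctpop.i64(i64 %x)       ; generated intrinsic call
  ret i64 %p                                  ; type matches the signature
}

; new call instruction: arguments and return value use compatible types
define i64 @caller(i64 %y) {
  %r = call i64 @callee(i64 %y, i1 true)
  ret i64 %r
}
```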
Control flow graph mutation
- control flow: target-specific backend code differs
- machine code optimization (jump threading) → restructure control flow
- generate control flow:
- sub-control flow graph (sCFG): not change edges, but split block and insert sCFG inside
- randomly select a block and split into two (source entry block & sink exit block) → generate random sCFG starting from source and ending with sink
- schemas: branch, switch, return
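A before/after sketch of the block-splitting idea (hand-written, assumed shapes): the selected block becomes a source and a sink, and a randomly generated sCFG (a switch schema here) is inserted between them without touching the block's original incoming/outgoing edges:

```llvm
; Before: one straight-line block.
define i32 @before(i32 %x) {
bb:
  %a = add i32 %x, 1
  %b = mul i32 %a, 3
  ret i32 %b
}

; After: bb split into source/sink with a switch-based sCFG in between;
; edges into the source and out of the sink are unchanged.
define i32 @after(i32 %x) {
source:
  %a = add i32 %x, 1
  switch i32 %a, label %case_default [ i32 0, label %case0
                                       i32 1, label %case1 ]
case0:
  br label %sink
case1:
  br label %sink
case_default:
  br label %sink
sink:
  %b = mul i32 %a, 3
  ret i32 %b
}
```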
Instruction modeling and generation
- backend: wide set of LLVM IR types → smaller set of types implemented by each target architecture
- to exercise all code generation features → generate IR instructions with many data types
- IR instructions categories (modeling):
- generation: randomly select an opcode → randomly select values (compatible types)
- other operations: store, load (easy), call (complex)
- ensure values are defined before use: search for already-defined values (global variables, function arguments, values in dominating blocks, values defined by previous instructions)
- special cases:
- values in dead code → create a use for such values (store to stack memory or a global variable; see the IR sketch after this list)
- avoid undefined behavior: values → placeholders
- intrinsic functions: not modeled; rely on matcher table coverage feedback
- call the intrinsic functions that haven’t been generated yet
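A hand-written illustration (not from the paper) of keeping an otherwise-dead generated value alive: the freshly generated vector instruction gets a use by storing its result to stack memory or to a global, so the backend still has to select code for it:

```llvm
@g = global <4 x i32> zeroinitializer

define void @keep_alive(<4 x i32> %v) {
entry:
  %slot = alloca <4 x i32>
  %w = mul <4 x i32> %v, %v        ; freshly generated instruction (vector type)
  store <4 x i32> %w, ptr %slot    ; use via stack memory ...
  store <4 x i32> %w, ptr @g       ; ... or via a global variable, so %w is not dead
  ret void
}
```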
Matcher Table Feedback
Matcher table instrumentation
- matcher table → machine instructions generation (not control flow)
- to generate more patterns → track matcher table usage (not edge coverage)
- matcher table size (num of entries):
- an entry being accessed multiple times ↔ the pattern being triggered multiple times → only track accessed or not → boolean for each entry
- save space: pack 8 booleans into 1 byte
- fuzzer (calculate boolean offset → access entry) → compiler (matcher table coverage) → w/ edge coverage, if either new, the input is new
IR mutation feedback
- need: knowledge of whether each instruction / intrinsic has been generated (LLVM hides this)
- modify TableGen → look-up table (matcher table entries ↔ machine instruction patterns) → generation condition
- look-up table for each architecture: decode matcher table coverage (instructions whether generated)
Implementation
- FuzzMutate + more mutation strategies (function calls, control flow graph mutation, arbitrary data types)
- matcher table coverage instrumentation
- modify TableGen → dump look-up table for each matcher table
Evaluation
- RQs:
- compare with sota backend fuzzers
- compare with end-to-end fuzzers (CSmith / GrayC)
- ablation: mutator & feedback
- new bugs
- insights from bugs
- 21 architecture backends with matcher table size > 25,000 → 29 target CPUs in 12 architectures
- baseline fuzzers: AFL++, AFL++ (w/ FuzzMutate)
- metrics: branch coverage (AFL++ report), matcher table coverage (#entries accessed / matcher table size)
- variants: IRF (full), (w/o feedback)
- 24h, 5 times average, 92 seeds initialized (randomly selected from LLVM’s unit testing)
Baseline Comparison
- AFL++ (w/o LLVM IR-aware mutator), FuzzMutate (w/ limited LLVM IR-aware mutator)
- SelectionDAG
- AFL++: lack of support for structured input
- matcher table coverage ↔ coverage of instruction selection patterns
Comparison with End-to-end Fuzzers
- baselines: CSmith, GrayC
- generate C code (IRFuzzer mutates LLVM IR)
- exercise the entire compilation pipeline (IRFuzzer focuses on the backend)
- CSmith lacks feedback, GrayC relies on branch coverage (not for backend)
- need cross compilation toolchain to compile C to different architectures (3 architectures)
- 715,147 C programs generated by GrayC, 506,971 by CSmith in 24h
- metrics: control-flow coverage, matcher table coverage (IRFuzzer instrumentation)
- why CSmith and GrayC poor at matcher table coverage:
- lack of coverage of vector data types
- Vector instructions can only be generated when the frontend and middle-end decide a vector instruction will speed up a particular piece of code, which turns out to be rather unlikely for random C programs.
- why CSmith and GrayC are good at branch coverage:
- size of generated files: IRFuzzer (less than 10 KB), CSmith (> 40 KB)
- many backend edges are executed more times → increase branch coverage
- specialized fuzzing > end-to-end fuzzing in backend testing
Individual Contributions
- Table 3
- IRFuzzer variants > FuzzMutate → mutator generates more diverse input
- IRFuzzer (full) vs. IRFuzzer (w/o feedback):
- matcher table coverage feedback lowers the throughput → affects overall branch coverage
- more diverse inputs → higher matcher table coverage
- tradeoff is acceptable
Bug Categories and Analysis
- 74 confirmed bugs
- bugs categorized by locations:
- CodeGen and popular architectures have more bugs
- bugs categorized by causes:
- hang, memory errors, assertion failures, logic errors, missing patterns, and other bugs
- missing patterns: instructions exist, patterns missing
- assertion failures: counterexamples for properties which were considered as always true by developers
- 49 fixed bugs
Bugs Case Study
Compiler hang
- infinite undo-redo loop between two optimizations
- TurnSwitchRangeIntoICmp: left → right (bit operation)
- FoldValueComparisonIntoPredecessors: right → left (reduce number of comparison operations)
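A rough hand-written sketch of the two transform shapes involved (not the actual bug-triggering input): one direction rewrites a switch over a contiguous case range into a compare-and-branch, the other folds a value comparison back into its predecessor to reduce comparisons, and on some inputs they keep undoing each other:

```llvm
; Shape roughly corresponding to the switch form (rewritten by
; TurnSwitchRangeIntoICmp into the compare-and-branch form below):
define void @switch_form(i32 %x) {
entry:
  switch i32 %x, label %other [ i32 0, label %same
                                i32 1, label %same
                                i32 2, label %same ]
same:
  ret void
other:
  ret void
}

; Roughly equivalent compare-and-branch form:
define void @icmp_form(i32 %x) {
entry:
  %in_range = icmp ult i32 %x, 3
  br i1 %in_range, label %same, label %other
same:
  ret void
other:
  ret void
}
```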
Memory error
- null pointer dereferences, double frees, out of bounds (OOB) accesses
- out of bounds (undef)
Logic error
- unclear documentation or undocumented assumptions
- signed integer extension
- introduced 9 years ago, never noticed:
- fewer vector operations emitted by the frontend
- documentation was ambiguous
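Not the actual buggy pattern (the notes do not record it), but a reminder of the sign- vs. zero-extension distinction that such documentation ambiguity typically concerns; the two widenings agree only for non-negative values:

```llvm
define i32 @widen_signed(i8 %x) {
  %r = sext i8 %x to i32     ; i8 -1 (0xFF) -> i32 -1 (0xFFFFFFFF)
  ret i32 %r
}

define i32 @widen_unsigned(i8 %x) {
  %r = zext i8 %x to i32     ; i8 0xFF -> i32 255 (0x000000FF)
  ret i32 %r
}
```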