Info

IRFuzzer: Specialized Fuzzing for LLVM Backend Code Generation

ICSE’25 arXiv pdf

AMD & UC Davis

Abstract

  • complexity of compilers → manual testing (doesn't suffice), formal verification (doesn't scale)
  • end-to-end fuzzing → low coverage of some components
  • IRFuzzer: specialized fuzzing of LLVM compiler backend
    • input validity: constrained mutations, generate LLVM IR inputs (structured control flow, vector types, function definitions)
    • feedback quality: instruments code generation patterns (monitors execution status of instruction selection) → matcher table coverage feedback + architecture-specific guidance (to mutator)

Introduction

  • complexity of LLVM: 7M+ C/C++ code, 2500+ devs
  • critical: effective and scalable verification method
  • extensive regression testing: not sufficient — LLVM is not bug-free, and its bugs are influential
  • testing techniques: model checking, fuzzing, differential testing
  • formal verification -x→ LLVM (architectures, languages, models (just-in-time compilation & link-time optimization))
  • challenge: compiler optimization ↔ machine code generation (widely used but not specified)
  • backends differ greatly in their relative code maturity
  • sota fuzzers failed to find new bugs of a compiler backend:
    • AFL++ (general-purpose fuzzing techniques):
      • not consider input validity
      • most binary strings are invalid inputs → struggle to explore control paths in the backend
    • FuzzMutate (valid IR mutation):
      • cannot construct complex control flow
      • only generate a few instructions with scalar types
    • CSmith, GrayC (end-to-end fuzzing tools):
      • cannot efficiently explore control paths in backend
      • CSmith no feedback → ineffectiveness
      • GrayC (branch coverage feedback from libFuzzer) → missed many backend bugs
        • front-end parser & middle-end optimizations → limit the set of features seen by backend
        • C (high level language) → not exercise all backend features
  • test effectively: generate LLVM IR that compiles with the language reference
  • llvm-opt-fuzzer & llvm-isel-fuzzer → generate valid IR for middle-end and backend fuzzing
  • challenges in generating valid IR:
    • use-before-definition hazards → must maintain all data dependencies when generating complex CFGs (dominance is easily invalidated by a jump (Fig 2); such jumps only exist in goto-style languages)
    • model instructions missing in FuzzMutate → match types of operands in each IR instruction → enumerate the large number of natively supported vector types
    • model intrinsic functions for all architectures → intrinsics are often poorly documented and vary across architectures
  • feedback limitation:
    • AFL++’s feedback mechanism performs poorly in backend
    • branch coverage → severe branch collision problems (fuzzing large codebases, e.g. LLVM)
    • code generation logic (in LLVM backend) → table-driven state machines
  • contribution:
    • IRFuzzer → fuzzing LLVM compiler backend (the first)

      overview

      • green: contributions
      • orange: AFL++
      • blue: LLVM
    • mutator → generate valid IR (CFG correctness) → efficiency

    • descriptive language → lists instruction requirements

    • FuzzMutate + { multiple basic blocks, function calls, intrinsic functions, vector types } (special handling in backend)

    • instrument the table-driven state machine → new coverage metric → reports which instructions and intrinsics haven't been fuzzed (states that haven't been executed) → guides mutations

  • evaluation:
    • 29 mature backend architectures
    • better edge coverage & matcher table coverage
    • 74 confirmed, new bugs in LLVM

Background

LLVM

  • compiler infrastructure, supports different architectures via a target-independent IR abstraction

    LLVM

    • frontend (clang): programming language → LLVM IR, with lexer, parser, AST transformations
    • middle-end (opt): common target-independent optimizations
    • backend (llc): LLVM IR → target-specific machine code, multiple target architectures (plug-in abstraction, API functions implementation)
  • LLVM IR:

    • static single-assignment (SSA), with a fixed set of instructions
    • instructions with types (match between definitions and uses)
    • definition of functions (control flow ↔ call instruction)
    • architecture specific intrinsic: no corresponding IR instructions, represented as function calls at IR level
    • control flow within a function → basic blocks & branch instructions
    • PHI instructions: reference values from different blocks → must respect the domination constraint (only reference values whose definitions dominate the use) → high-level language generation techniques cannot be easily adapted to LLVM IR
    • instruction selection: target-independent LLVM IR instructions → target-specific machine code instructions
      • SelectionDAG (instruction selection framework): basic block → DAG
      • GlobalISel
    • rewrite patterns: LLVM IR instruction → machine instruction
    • TableGen (LLVM-specific language): patterns → matcher table (state-machine representation)
    • matcher table: IR instruction → determine pattern to apply
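The matcher-table idea above can be sketched in Python (a toy model; the flat tuple layout and all names are hypothetical, not LLVM's actual TableGen output):

```python
# Hypothetical flat matcher table: each entry checks the opcode and operand
# type of an IR instruction and, on a match, names a target-specific machine
# instruction. Real matcher tables are state machines, but the lookup role
# (IR instruction -> rewrite pattern) is the same.
MATCHER_TABLE = [
    ("add", "i32", "ADDrr32"),
    ("add", "i64", "ADDrr64"),
    ("mul", "i32", "MULrr32"),
]

def select_instruction(opcode, ty):
    """Walk the table until an entry matches; None models a selection failure
    (the backend would report "cannot select")."""
    for entry_opcode, entry_ty, machine_instr in MATCHER_TABLE:
        if entry_opcode == opcode and entry_ty == ty:
            return machine_instr
    return None
```

In the real backend the table is generated by TableGen from the rewrite patterns, and entries encode matcher state-machine opcodes rather than flat tuples.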

Coverage-Guided Fuzzing

  • AFL:

    AFL

    • track control-flow coverage
    • store test inputs (with new coverage) in seed cache
    • mutate, not randomly generate
    • explore different control-flow paths
  • libFuzzer, FuzzMutate (LLVM): handle only limited types of code
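The AFL-style loop described above can be sketched as follows (a toy harness; `execute` is an assumed callback returning the set of covered edges, not a real AFL++ API):

```python
import random

def mutate(data, rng):
    """Toy byte-flip mutation (stand-in for AFL++'s real mutators)."""
    if not data:
        return bytes([rng.randrange(256)])
    i = rng.randrange(len(data))
    return data[:i] + bytes([data[i] ^ (1 << rng.randrange(8))]) + data[i + 1:]

def coverage_guided_fuzz(execute, seeds, rounds=100, rng=None):
    """Minimal coverage-guided loop: mutate cached seeds (never generate from
    scratch) and keep an input only if it reaches edges not seen before."""
    rng = rng or random.Random(0)
    seen_edges = set()
    corpus = list(seeds)
    for seed in seeds:
        seen_edges |= execute(seed)
    for _ in range(rounds):
        parent = rng.choice(corpus)
        child = mutate(parent, rng)
        edges = execute(child)
        if edges - seen_edges:          # new coverage -> store in seed cache
            seen_edges |= edges
            corpus.append(child)
    return corpus
```

With an `execute` that treats each byte value as an "edge", the corpus grows as mutations discover new values — mirroring how AFL explores new control-flow paths.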

Challenges in Compiler Fuzzing

  • size and complexity:
    • semantically meaningful test input (var define-use, LLVM IR doesn’t have scope / lifetime)
  • syntax-correct input:
    • LLVM IR is a strongly typed language (with numerous types)
    • intrinsic functions: customized LLVM IR instructions implemented by each architecture (poorly documented)
    • static single-assignment (SSA) representation: only allows each variable to be assigned once
  • machine instructions do not correlate with instruction selection’s control flow:
    • traditional code coverage: ineffective
    • LLVM IR → table-driven method (SelectionDAG / GlobalISel) → binary executable
    • TableGen (code generation patterns) → matcher table (data & control instructions)
    • different data → same control flow → different instructions

Design

  • mutation: generate function
  • create more control flows (change CFG)
  • generate new IR instructions and mutate them
  • measure coverage

LLVM IR Mutation

  • mutation-based strategy (variety & valid): small valid seed inputs → modify (also valid)

Mutation

Function generation

  • function calls: target-specific code in LLVM backend → function definitions & calls w/ different arguments & return types
  • generate new function definitions (mutation strategy):
    • constraint: the return type in the function signature must match the type of every return instruction in the function definition
  • generate new call instructions (mutation strategy):
    • select any declared function → generate compatible arguments & return values
    • intrinsic functions: target specific operations (correspond to complicated machine instructions)

Control flow graph mutation

  • control flow: target-specific backend code differs

    • machine code optimization (jump threading) → restructure control flow
  • generate control flow:

    sCFG

    • sub-control flow graph (sCFG): not change edges, but split block and insert sCFG inside
    • randomly select a block and split into two (source entry block & sink exit block) → generate random sCFG starting from source and ending with sink
    • schemas: branch, switch, return
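The split-and-insert idea can be sketched with a CFG modeled as a dict of successor lists (a simplified model; block naming and the branch-only schema are illustrative):

```python
def insert_scfg(cfg, block, new_blocks):
    """Split `block` into source/sink halves and wire a fresh sub-CFG between
    them. Predecessor edges still target `block` (now the source half), and
    the sink half inherits the original successors, so no existing edge of
    the surrounding CFG is changed."""
    sink = block + ".sink"           # hypothetical naming scheme
    cfg[sink] = cfg[block]           # sink keeps the original successors
    cfg[block] = list(new_blocks)    # source now branches into the sub-CFG
    for b in new_blocks:
        cfg[b] = [sink]              # every inserted block exits via the sink
    return sink
```

The real mutator additionally fills the inserted blocks using the branch/switch/return schemas; this sketch only shows why the transformation preserves the validity of the enclosing CFG.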

Instruction modeling and generation

  • backend: LLVM IR types (wide set) → types implemented by the target architecture (small subset)
  • to exercise all code generation features → generate IR instructions with many data types
  • IR instructions categories (modeling):

Instructions

  • generation: randomly select an opcode → randomly select values (compatible types)
  • other operations: store, load (easy), call (complex)
  • ensure define-before-use: search already-defined values (global variables, function arguments, values in dominator blocks, values defined by previous instructions in the same block)
  • cases
    • values in dead code → create a use for such values (store in stack memory or global variable)
    • avoid undefined behavior: values → placeholders
    • intrinsic functions: not model, rely on matcher table coverage feedback
      • call the function definitions (intrinsics that haven’t been generated)
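The define-before-use search above can be sketched as (hypothetical data model: values as `(name, type)` tuples):

```python
import random

def pick_operand(rng, wanted_ty, globals_, args, dom_values, prior_values):
    """Search every position that dominates the insertion point — globals,
    function arguments, values defined in dominator blocks, and values
    defined earlier in the current block — for a value of the required type.
    Fall back to a placeholder constant so the generated instruction never
    uses an undefined value (avoiding undefined behavior)."""
    pool = [v for v in (globals_ + args + dom_values + prior_values)
            if v[1] == wanted_ty]
    return rng.choice(pool) if pool else ("placeholder", wanted_ty)
```

A full mutator would also create uses for values stranded in dead code (e.g. by storing them to stack memory or a global), as the notes describe.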

Matcher Table Feedback

Matcher table instrumentation

  • matcher table → machine instructions generation (not control flow)
  • to generate more patterns → track matcher table usage (not edge coverage)
  • matcher table size (num of entries):

Matcher table size

  • an entry being accessed multiple times ↔ the pattern being triggered multiple times → only track accessed or not → boolean for each entry
  • save space: 8 booleans packed in 1 byte
  • instrumented compiler: calculate the boolean's offset on each entry access → set the bit; fuzzer reads matcher table coverage alongside edge coverage → if either is new, the input is new
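The one-bit-per-entry bookkeeping can be sketched as (class and method names are illustrative, not IRFuzzer's actual shared-memory layout):

```python
class MatcherTableCoverage:
    """Track which matcher table entries have ever been accessed.
    One bit per entry — 8 booleans packed into each byte — since repeated
    accesses of an entry add no information beyond "accessed or not"."""

    def __init__(self, table_size):
        self.bits = bytearray((table_size + 7) // 8)

    def mark(self, index):
        """Called on each table access; returns True iff the entry is new."""
        byte, bit = index // 8, 1 << (index % 8)
        new = not (self.bits[byte] & bit)
        self.bits[byte] |= bit
        return new

    def count(self):
        """Number of distinct entries accessed (numerator of the
        matcher-table-coverage metric)."""
        return sum(bin(b).count("1") for b in self.bits)
```

For a table of 25,000+ entries this costs only a few kilobytes, which is why it can sit next to AFL++'s edge map.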

IR mutation feedback

  • need: knowledge of whether an instruction / intrinsic has been generated (LLVM hides this)
  • modify TableGen → look-up table (matcher table entries ↔ machine instruction patterns) → generation condition
  • look-up table for each architecture: decode matcher table coverage (instructions whether generated)
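Decoding coverage into "instructions not yet generated" can be sketched as (the look-up table structure is an assumption about what the modified TableGen dumps):

```python
def ungenerated_instructions(coverage_bits, lookup):
    """`lookup` maps each machine-instruction pattern to the matcher table
    entries emitted for it (hypothetical structure). A pattern counts as
    exercised once any of its entries has been accessed; everything else is
    reported back to the mutator as a target to generate next."""
    return sorted(
        pattern for pattern, entries in lookup.items()
        if not any(coverage_bits[i] for i in entries)
    )
```

One such table per architecture lets the fuzzer turn raw matcher table coverage into architecture-specific mutation guidance.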

Implementation

  • FuzzMutate + more mutation strategies (function calls, control flow graph mutation, arbitrary data types)
  • matcher table coverage instrumentation
  • modify TableGen → dump look-up table for each matcher table

Evaluation

  • RQs:
    1. compare with sota backend fuzzers
    2. compare with end-to-end fuzzers (CSmith / GrayC)
    3. ablation: mutator & feedback
    4. new bugs
    5. insights from bugs
  • 21 architecture backends, matcher table size > 25,000 → 29 target CPUs in 12 architectures
  • baseline fuzzers: AFL++, AFL++ (w/ FuzzMutate)
  • metrics: branch coverage (AFL++ report), matcher table coverage (#entries accessed / matcher table size)
  • variants: IRFuzzer (full), IRFuzzer (w/o matcher table feedback)
  • 24h, 5 times average, 92 seeds initialized (randomly selected from LLVM’s unit tests)

Baseline Comparison

  • AFL++ (w/o LLVM IR-aware mutator), FuzzMutate (w/ limited LLVM IR-aware mutator)

baseline

  • SelectionDAG
  • AFL++: lack of support for structured input
  • matcher table coverage ↔ coverage of instruction selection patterns

Comparison with End-to-end Fuzzers

  • baselines: CSmith, GrayC
    • generate C code (IRFuzzer generates LLVM IR mutation)
    • exercise the entire compilation pipeline (IRFuzzer focuses backend)
    • CSmith lacks feedback, GrayC relies on branch coverage (not for backend)
    • need cross compilation toolchain to compile C to different architectures (3 architectures)
  • 715,147 C programs generated by GrayC, 506,971 by CSmith in 24h
  • metrics: control-flow coverage, matcher table coverage (IRFuzzer instrumentation)

Optimization

  • why CSmith and GrayC poor at matcher table coverage:
    • lack of coverage of vector data types
    • Vector instructions can only be generated when the frontend and middle-end decide a vector instruction will speed up a particular piece of code, which turns out to be rather unlikely for random C programs.
  • why CSmith and GrayC achieve good branch coverage:
    • size of generated files: IRFuzzer (less than 10 KB), CSmith (> 40 KB)
    • many backend edges are executed more times → increases branch coverage
  • still, specialized fuzzing > end-to-end fuzzing for backend testing

Individual Contributions

  • Table 3
  • IRFuzzer (w/o feedback) > FuzzMutate → the new mutator alone generates more diverse inputs
  • IRFuzzer (full) vs. IRFuzzer (w/o feedback):
    • matcher table coverage feedback lowers the throughput → slightly lower overall branch coverage
    • but more diverse inputs → higher matcher table coverage
    • tradeoff is acceptable

Bug Categories and Analysis

  • 74 confirmed bugs

  • bugs categorized by locations:

    bug locations

    • CodeGen and popular architectures have more bugs
  • bugs categorized by causes:

    bug causes

    • hang, memory errors, assertion failures, logic errors, missing patterns, and other bugs
    • missing patterns: instructions exist, patterns missing
    • assertion failures: counterexamples for properties which were considered as always true by developers
  • 49 fixed bugs

Bugs Case Study

Compiler hang

  • two optimizations undo and redo each other → infinite loop

compiler hang

  • TurnSwitchRangeIntoICmp: left → right (bit operation)
  • FoldValueComparisonIntoPredecessors: right → left (reduce number of comparison operations)

Memory error

  • null pointer dereferences, double frees, out-of-bounds (OOB) accesses
  • out of bounds (undef):

memory error

Logic error

  • unclear documentation or undocumented assumptions
  • signed integer extension

logic error

  • introduced 9 years ago, never noticed:
    • fewer vector operations emitted by the frontend
    • documentation was ambiguous