AFL Internals - Stages

Introduction

In this article, I’ll go over the various fuzzing stages included in AFL. These are bitflip, arithmetic, interesting, extras, havoc and splice. Do you think all these stages are deterministic? Or do you want to know which ones are probabilistic? Read on.

static u8 fuzz_one(char **argv)

All we know up to this point is that there is a huge function called fuzz_one() that does the fuzzing. It starts by picking an input from the queue, memory-maps it (orig_in = in_buf) and then copies it to out_buf, a buffer that every stage mutates. Another interesting function we have studied is common_fuzz_stuff(). Basically, it takes the arguments and an output buffer along with its length, executes the target binary and tells us about its exit condition. These are fundamental to all the stages I’m going to describe.

Let’s have an overview and see what each stage focuses on.

Stage: bitflip

Let’s look into the first fuzzing stage. It flips a single bit and is called bitflip 1/1.

/* Single walking bit. */
stage_short = "flip1";
stage_max   = len << 3;
stage_name  = "bitflip 1/1";

stage_val_type = STAGE_VAL_NONE;
orig_hit_cnt = queued_paths + unique_crashes;
prev_cksum = queue_cur->exec_cksum;

for (stage_cur = 0; stage_cur < stage_max; stage_cur++) {
  stage_cur_byte = stage_cur >> 3;
  FLIP_BIT(out_buf, stage_cur);
  if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
  FLIP_BIT(out_buf, stage_cur);
}
new_hit_cnt = queued_paths + unique_crashes;

stage_finds[STAGE_FLIP1]  += new_hit_cnt - orig_hit_cnt;
stage_cycles[STAGE_FLIP1] += stage_max;

It starts by setting the stage name and short name. stage_max is set to the number of bits in the input since we are going to flip every bit and run the target. The original hit count is saved in orig_hit_cnt. After the stage is executed, new_hit_cnt is calculated. Also, stats are saved in stage_finds and stage_cycles.

FLIP_BIT is a macro defined as follows:

#define FLIP_BIT(_ar, _b) do { \
    u8* _arf = (u8*)(_ar); \
    u32 _bf = (_b); \
    _arf[(_bf) >> 3] ^= (128 >> ((_bf) & 7)); \
  } while (0)
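
As a quick sanity check of the indexing (a tiny stand-alone example of mine, not AFL code): bits are numbered from the most significant bit of each byte, so flipping bit 10 touches byte 10 >> 3 = 1 with mask 128 >> (10 & 7) = 0x20.

#include <assert.h>

typedef unsigned char u8;
typedef unsigned int  u32;

#define FLIP_BIT(_ar, _b) do { \
    u8* _arf = (u8*)(_ar); \
    u32 _bf = (_b); \
    _arf[(_bf) >> 3] ^= (128 >> ((_bf) & 7)); \
  } while (0)

int main(void) {
  u8 buf[2] = { 0x00, 0x00 };
  FLIP_BIT(buf, 10);        /* bit 10 lives in byte 1                */
  assert(buf[1] == 0x20);   /* 128 >> 2 = 0x20                       */
  FLIP_BIT(buf, 10);        /* flipping again restores the original  */
  assert(buf[1] == 0x00);
  return 0;
}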

The bitflip stage has variations where the number of consecutive bits flipped increases in powers of two. For brevity, I’m only going to show the 4-bit variation here. It is almost the same, apart from the number of bits flipped.

/* Four walking bits. */
stage_name  = "bitflip 4/1";
stage_short = "flip4";
stage_max   = (len << 3) - 3;
orig_hit_cnt = new_hit_cnt;
for (stage_cur = 0; stage_cur < stage_max; stage_cur++) {
  stage_cur_byte = stage_cur >> 3;
  FLIP_BIT(out_buf, stage_cur);
  FLIP_BIT(out_buf, stage_cur + 1);
  FLIP_BIT(out_buf, stage_cur + 2);
  FLIP_BIT(out_buf, stage_cur + 3);
  if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
  FLIP_BIT(out_buf, stage_cur);
  FLIP_BIT(out_buf, stage_cur + 1);
  FLIP_BIT(out_buf, stage_cur + 2);
  FLIP_BIT(out_buf, stage_cur + 3);
}
new_hit_cnt = queued_paths + unique_crashes;
stage_finds[STAGE_FLIP4]  += new_hit_cnt - orig_hit_cnt;
stage_cycles[STAGE_FLIP4] += stage_max;

When we arrive at the byte flip (8 bits), the stage is overloaded with something called the effector map. The effector map basically tries to identify which bytes have any effect on trace_bits by comparing checksums.

/* Effector map setup. These macros calculate:
   EFF_APOS      - position of a particular file offset in the map.
   EFF_ALEN      - length of a map with a particular number of bytes.
   EFF_SPAN_ALEN - map span for a sequence of bytes.
 */
#define EFF_APOS(_p)          ((_p) >> EFF_MAP_SCALE2)
#define EFF_REM(_x)           ((_x) & ((1 << EFF_MAP_SCALE2) - 1))
#define EFF_ALEN(_l)          (EFF_APOS(_l) + !!EFF_REM(_l))
#define EFF_SPAN_ALEN(_p, _l) (EFF_APOS((_p) + (_l) - 1) - EFF_APOS(_p) + 1)

The constants are defined in config.h as follows:

/* Scaling factor for the effector map used to skip some of the more
   expensive deterministic steps. The actual divisor is set to
   2^EFF_MAP_SCALE2 bytes: */
#define EFF_MAP_SCALE2      3

/* Minimum input file length at which the effector logic kicks in: */
#define EFF_MIN_LEN         128

/* Maximum effector density past which everything is just fuzzed
   unconditionally (%): */
#define EFF_MAX_PERC        90

So, EFF_MAP_SCALE2 maps byte offsets to blocks of the effector map; each block covers 2^3 = 8 bytes. EFF_MIN_LEN is the minimum input length for which the effector logic bothers computing checksums. EFF_MAX_PERC is the density threshold above which everything is fuzzed unconditionally. We will see how these are used in a second.
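
To see the macros in action, here is a tiny stand-alone check of mine (the 20-byte length is arbitrary): a 20-byte input needs 3 effector blocks, and a 4-byte token starting at offset 6 spans 2 blocks.

#include <assert.h>

#define EFF_MAP_SCALE2        3
#define EFF_APOS(_p)          ((_p) >> EFF_MAP_SCALE2)
#define EFF_REM(_x)           ((_x) & ((1 << EFF_MAP_SCALE2) - 1))
#define EFF_ALEN(_l)          (EFF_APOS(_l) + !!EFF_REM(_l))
#define EFF_SPAN_ALEN(_p, _l) (EFF_APOS((_p) + (_l) - 1) - EFF_APOS(_p) + 1)

int main(void) {
  assert(EFF_APOS(19) == 2);        /* last byte of a 20-byte file is in block 2 */
  assert(EFF_ALEN(20) == 3);        /* 20 bytes need 3 blocks of 8 bytes         */
  assert(EFF_SPAN_ALEN(6, 4) == 2); /* bytes 6..9 straddle blocks 0 and 1        */
  return 0;
}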

/* Initialize effector map for the next step (see comments below). Always
   flag first and last byte as doing something. */
eff_map    = ck_alloc(EFF_ALEN(len));
eff_map[0] = 1;

if (EFF_APOS(len - 1) != 0) {
  eff_map[EFF_APOS(len - 1)] = 1;
  eff_cnt++;
}

Nothing interesting so far: allocate eff_map and mark the first and last bytes.

/* Walking byte. */
stage_name  = "bitflip 8/8";
stage_short = "flip8";
stage_max   = len;

orig_hit_cnt = new_hit_cnt;
for (stage_cur = 0; stage_cur < stage_max; stage_cur++) {
  stage_cur_byte = stage_cur;
  out_buf[stage_cur] ^= 0xFF;
  if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
  /* We also use this stage to pull off a simple trick: we identify
     bytes that seem to have no effect on the current execution path
     even when fully flipped - and we skip them during more expensive
     deterministic stages, such as arithmetics or known ints. */
  if (!eff_map[EFF_APOS(stage_cur)]) {

Let’s see how it handles unmarked bytes.

    u32 cksum;
    /* If in dumb mode or if the file is very short, just flag everything
       without wasting time on checksums. */
    if (!dumb_mode && len >= EFF_MIN_LEN)
      cksum = hash32(trace_bits, MAP_SIZE, HASH_CONST);
    else
      cksum = ~queue_cur->exec_cksum;

Compare the checksum to see if there are any changes in trace_bits. Then, mark those bytes.

    if (cksum != queue_cur->exec_cksum) {
      eff_map[EFF_APOS(stage_cur)] = 1;
      eff_cnt++;
    }
  }
  out_buf[stage_cur] ^= 0xFF;
}
/* If the effector map is more than EFF_MAX_PERC dense, just flag the
   whole thing as worth fuzzing, since we wouldn't be saving much time
   anyway. */
if (eff_cnt != EFF_ALEN(len) && eff_cnt * 100 / EFF_ALEN(len) > EFF_MAX_PERC) {
  memset(eff_map, 1, EFF_ALEN(len));
  blocks_eff_select += EFF_ALEN(len);
} else {
  blocks_eff_select += eff_cnt;
}

The selected block count is accumulated in blocks_eff_select. If the density is higher than 90 percent, mark the whole region as important; for example, with EFF_ALEN(len) = 20 blocks, eff_cnt = 19 gives 95% > 90%, so the entire map gets flagged.

blocks_eff_total += EFF_ALEN(len);
new_hit_cnt = queued_paths + unique_crashes;

stage_finds[STAGE_FLIP8]  += new_hit_cnt - orig_hit_cnt;
stage_cycles[STAGE_FLIP8] += stage_max;

The stage ends by storing the total block count in blocks_eff_total.

Stage: arithmetic

The arithmetic stage basically adds and subtracts values in [1..ARITH_MAX]. ARITH_MAX is defined to be 35 in config.h. Let’s see the first variation.

/* 8-bit arithmetics. */
stage_name  = "arith 8/8";
stage_short = "arith8";
stage_cur   = 0;
stage_max   = 2 * len * ARITH_MAX; // = 35

stage_val_type = STAGE_VAL_LE;
orig_hit_cnt = new_hit_cnt;
for (i = 0; i < len; i++) {
  u8 orig = out_buf[i];
  /* Let's consult the effector map... */
  if (!eff_map[EFF_APOS(i)]) {
    stage_max -= 2 * ARITH_MAX;
    continue;
  }

Here we see the Effector map in play. We skip when the byte doesn’t yield anything interesting in trace_bits.

  stage_cur_byte = i;
  for (j = 1; j <= ARITH_MAX; j++) {
    u8 r = orig ^ (orig + j);

Here, the operation is defined as:

\[ R = ORIG \oplus (ORIG + j) \]

Note that addition and XOR do not commute in this equation, so the result is not simply \(j\). If the result could not be the product of a bitflip, we run the target and see if it yields anything interesting.
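
For a quick hand-worked example of this skip (values picked just for illustration): take \( ORIG = 0x7f \) and \( j = 1 \).

\[ R = 0x7f \oplus (0x7f + 1) = 0x7f \oplus 0x80 = 0xff \]

could_be_bitflip(0xff) returns 1, since a whole-byte flip is already covered by the bitflip 8/8 stage, so this particular addition is skipped and stage_max is decremented instead.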

    /* Do arithmetic operations only if the result couldn't be a product
       of a bitflip. */
    if (!could_be_bitflip(r)) {
      stage_cur_val = j;
      out_buf[i] = orig + j;

      if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
      stage_cur++;
    } else stage_max--;

This time let’s try the following:

\[ R = ORIG \oplus (ORIG - j) \]

    r =  orig ^ (orig - j);
    if (!could_be_bitflip(r)) {
      stage_cur_val = -j;
      out_buf[i] = orig - j;

      if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
      stage_cur++;
    } else stage_max--;
    out_buf[i] = orig;
  }
}
new_hit_cnt = queued_paths + unique_crashes;
stage_finds[STAGE_ARITH8]  += new_hit_cnt - orig_hit_cnt;
stage_cycles[STAGE_ARITH8] += stage_max;

The epilogue, as usual, saves the totals for stats. Let’s see the 16-bit variation.

stage_name  = "arith 16/8";
stage_short = "arith16";
stage_cur   = 0;
stage_max   = 4 * (len - 1) * ARITH_MAX;
orig_hit_cnt = new_hit_cnt;

for (i = 0; i < len - 1; i++) {
  u16 orig = *(u16*)(out_buf + i);
  /* Let's consult the effector map... */
  if (!eff_map[EFF_APOS(i)] && !eff_map[EFF_APOS(i + 1)]) {
    stage_max -= 4 * ARITH_MAX;
    continue;
  }

We skip if both bytes are uninteresting in the effector map.

  stage_cur_byte = i;
  for (j = 1; j <= ARITH_MAX; j++) {
    u16 r1 = orig ^ (orig + j),
        r2 = orig ^ (orig - j),
        r3 = orig ^ SWAP16(SWAP16(orig) + j),
        r4 = orig ^ SWAP16(SWAP16(orig) - j);

The first two, r1 and r2, are for the little-endian case; the latter two are for big-endian.

    /* Try little endian addition and subtraction first. Do it only
       if the operation would affect more than one byte (hence the
       & 0xff overflow checks) and if it couldn't be a product of
       a bitflip. */
    stage_val_type = STAGE_VAL_LE;
    if ((orig & 0xff) + j > 0xff && !could_be_bitflip(r1)) {
      stage_cur_val = j;
      *(u16*)(out_buf + i) = orig + j;
      if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
      stage_cur++;
    } else stage_max--;

So, we check that the least significant byte overflows (i.e. the addition affects more than one byte) and that r1 could not be the product of a bitflip (i.e. not already covered by the bitflip stage).

    if ((orig & 0xff) < j && !could_be_bitflip(r2)) {
      stage_cur_val = -j;
      *(u16*)(out_buf + i) = orig - j;
      if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
      stage_cur++;
    } else stage_max--;

Similarly, we check that the least significant byte underflows and that r2 could not be the product of a bitflip.

    /* Big endian comes next. Same deal. */
    stage_val_type = STAGE_VAL_BE;
    if ((orig >> 8) + j > 0xff && !could_be_bitflip(r3)) {
      stage_cur_val = j;
      *(u16*)(out_buf + i) = SWAP16(SWAP16(orig) + j);
      if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
      stage_cur++;
    } else stage_max--;

    if ((orig >> 8) < j && !could_be_bitflip(r4)) {
      stage_cur_val = -j;
      *(u16*)(out_buf + i) = SWAP16(SWAP16(orig) - j);
      if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
      stage_cur++;
    } else stage_max--;
    *(u16*)(out_buf + i) = orig;
  }
}

new_hit_cnt = queued_paths + unique_crashes;
stage_finds[STAGE_ARITH16]  += new_hit_cnt - orig_hit_cnt;
stage_cycles[STAGE_ARITH16] += stage_max;

The big-endian evaluations and the epilogue are almost the same. There is one interesting point though: the big-endian variants mutate by

\[ R = SWAP16(SWAP16(ORIG) + j) \]

i.e. the addition is applied to the byte-swapped value, which differs from the little-endian mutation \( R = ORIG + j \). The 32-bit arithmetic stage is similar to the 16-bit version and skipped for brevity.

Before we proceed into the next stage, what about that could_be_bitflip()? Recall, the function call is similar to could_be_bitflip(orig ^ (orig + j)).

static u8 could_be_bitflip(u32 xor_val) {
  u32 sh = 0;
  if (!xor_val) return 1;

Return 1 when the XOR is zero, since that means nothing changed:

\[ 0 = orig \oplus (orig + j) \implies orig = orig + j \implies j = 0 \]

  /* Shift left until first bit set. */
  while (!(xor_val & 1)) { sh++; xor_val >>= 1; }

Shift right until the lowest set bit lands at position zero; this strips the trailing zeros of xor_val.

  /* 1-, 2-, and 4-bit patterns are OK anywhere. */
  if (xor_val == 1 || xor_val == 3 || xor_val == 15) return 1;

So, let’s try to understand the above statement. Here are a few calculations that lead to result=3:

\[ orig \oplus (orig + j) = result \] \[ 0b00 \oplus (0b00 + 0b11) = 0b00 \oplus 0b11 = 0b11 \] \[ 0b01 \oplus (0b01 + 0b01) = 0b01 \oplus 0b10 = 0b11 \] \[ 0b10 \oplus (0b10 + 0b10) = 0b10 \oplus 0b100 = 0b110 \]

The last one becomes 3 after the right shift. The first two can be produced by flipping two adjacent bits, i.e. the bitflip 2/1 stage; the last one is the same double flip, just one position to the left. On the other hand, the following is not covered, since it needs three bitflips:

\[ 0b11 \oplus (0b11 + 0b01) = 0b11 \oplus 0b100 = 0b111 \]

This makes sense if you recall that the bitflip variations come in powers of 2. Still not convinced? Let’s try another approach.

\[ orig \oplus (orig + j) = 1 \] \[ orig \oplus 1 = orig + j \]

Now, the problem becomes, how do we cancel orig?

\[ 00 \oplus 1 = 01 = 00 + j \implies j = 1 \] \[ 01 \oplus 1 = 00 = 01 + j \implies j = -1 \] \[ 10 \oplus 1 = 11 = 10 + j \implies j = 1 \] \[ 11 \oplus 1 = 10 = 11 + j \implies j = -1 \]

So, j alters sign depending on the least significant bit of orig. Let’s try when the result is 2.

\[ 00 \oplus 10 = 10 = 00 + j \implies j = 2 \] \[ 01 \oplus 10 = 11 = 01 + j \implies j = 2 \] \[ 10 \oplus 10 = 00 = 10 + j \implies j = -2 \] \[ 11 \oplus 10 = 01 = 11 + j \implies j = -2 \]

Don’t get confused, it’s the same thing, just the ordering is different. If (orig >> 1) & 1 is set, we get -2, otherwise 2. So let’s try 3.

\[ 00 \oplus 11 = 11 = 00 + j \implies j = 3 \] \[ 01 \oplus 11 = 10 = 01 + j \implies j = 1 \] \[ 10 \oplus 11 = 01 = 10 + j \implies j = -1 \] \[ 11 \oplus 11 = 00 = 11 + j \implies j = -3 \]

It is apparent that the final j is the sum of the corresponding j’s in the previous two equations. Aha! Now I see the logic behind it. If the result is a run of ones of any length, then j can be any of the following:

\[ j = \sum_{i=0}^{n-1} a_i 2^i \]

where \( n \) is the number of one bits in the pattern and \( a_i \in \{ 1, -1 \} \). Let’s move on.

  /* 8-, 16-, and 32-bit patterns are OK only if shift factor is
     divisible by 8, since that's the stepover for these ops. */
  if (sh & 7) return 0;

This checks whether the shift factor is a multiple of 8, i.e. whether the flipped run starts on a byte boundary, since the 8-, 16- and 32-bit flips only step byte by byte.

  if (xor_val == 0xff || xor_val == 0xffff || xor_val == 0xffffffff)
    return 1;

  return 0;
}

This final check matches whole-byte, whole-word and whole-dword flips, which are covered by the bitflip 8/8, 16/8 and 32/8 stages. This concludes the arithmetic stages.
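
To make the behavior concrete, here is a small, self-contained test harness of mine (not part of AFL) that copies the function above and checks a few illustrative inputs:

#include <assert.h>

typedef unsigned char u8;
typedef unsigned int  u32;

/* Copied from the excerpt above. */
static u8 could_be_bitflip(u32 xor_val) {
  u32 sh = 0;
  if (!xor_val) return 1;
  while (!(xor_val & 1)) { sh++; xor_val >>= 1; }
  if (xor_val == 1 || xor_val == 3 || xor_val == 15) return 1;
  if (sh & 7) return 0;
  if (xor_val == 0xff || xor_val == 0xffff || xor_val == 0xffffffff)
    return 1;
  return 0;
}

int main(void) {
  assert(could_be_bitflip(0x01) == 1);   /* single walking bit              */
  assert(could_be_bitflip(0x18) == 1);   /* two adjacent bits, shifted      */
  assert(could_be_bitflip(0x07) == 0);   /* run of three bits: not covered  */
  assert(could_be_bitflip(0xff00) == 1); /* byte flip on a byte boundary    */
  assert(could_be_bitflip(0x0ff0) == 0); /* byte-sized run, misaligned      */
  return 0;
}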

Stage: interesting

These stages do employ constant values that seem to be interesting when fuzzing a target. Let’s look into the first variation.

  stage_name  = "interest 8/8";
  stage_short = "int8";
  stage_cur   = 0;
  stage_max   = len * sizeof(interesting_8);

  stage_val_type = STAGE_VAL_LE;
  orig_hit_cnt = new_hit_cnt;

  /* Setting 8-bit integers. */
  for (i = 0; i < len; i++) {
    u8 orig = out_buf[i];
    /* Let's consult the effector map... */

    if (!eff_map[EFF_APOS(i)]) {
      stage_max -= sizeof(interesting_8);
      continue;
    }

We skip if the Effector map tells us this byte has no effect.

    stage_cur_byte = i;
    for (j = 0; j < sizeof(interesting_8); j++) {
      /* Skip if the value could be a product of bitflips or arithmetics. */
      if (could_be_bitflip(orig ^ (u8)interesting_8[j]) ||
          could_be_arith(orig, (u8)interesting_8[j], 1)) {
        stage_max--;
        continue;
      }

We skip if the value could be the product of a bitflip or of the arithmetic stage; otherwise we proceed with fuzzing.

      stage_cur_val = interesting_8[j];
      out_buf[i] = interesting_8[j];
      if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
      out_buf[i] = orig;
      stage_cur++;
    }
  }

Set the value in the buffer, fuzz, and restore.

  new_hit_cnt = queued_paths + unique_crashes;
  stage_finds[STAGE_INTEREST8]  += new_hit_cnt - orig_hit_cnt;
  stage_cycles[STAGE_INTEREST8] += stage_max;

Finally, store stats about the stage. So, what are those constants? afl-fuzz.c has the following definitions.

static s8  interesting_8[]  = { INTERESTING_8 };
static s16 interesting_16[] = { INTERESTING_8, INTERESTING_16 };
static s32 interesting_32[] = { INTERESTING_8, INTERESTING_16, INTERESTING_32 };

The rest is defined in config.h:

/* List of interesting values to use in fuzzing. */
#define INTERESTING_8 \
  -128,          /* Overflow signed 8-bit when decremented  */ \
  -1,            /*                                         */ \
   0,            /*                                         */ \
   1,            /*                                         */ \
   16,           /* One-off with common buffer size         */ \
   32,           /* One-off with common buffer size         */ \
   64,           /* One-off with common buffer size         */ \
   100,          /* One-off with common buffer size         */ \
   127           /* Overflow signed 8-bit when incremented  */

Some of the values target overflows and underflows, others are small values around zero, and the rest are off-by-one values for common buffer sizes. Finally, there is 100 :) Let’s see the 16-bit variation.

  stage_name  = "interest 16/8";
  stage_short = "int16";
  stage_cur   = 0;
  stage_max   = 2 * (len - 1) * (sizeof(interesting_16) >> 1);
  orig_hit_cnt = new_hit_cnt;

  for (i = 0; i < len - 1; i++) {
    u16 orig = *(u16*)(out_buf + i);
    /* Let's consult the effector map... */
    if (!eff_map[EFF_APOS(i)] && !eff_map[EFF_APOS(i + 1)]) {
      stage_max -= sizeof(interesting_16);
      continue;
    }

Again, skip if the Effector tells us that these two bytes are uninteresting.

    stage_cur_byte = i;
    for (j = 0; j < sizeof(interesting_16) / 2; j++) {
      stage_cur_val = interesting_16[j];
      /* Skip if this could be a product of a bitflip, arithmetics,
         or single-byte interesting value insertion. */

      if (!could_be_bitflip(orig ^ (u16)interesting_16[j]) &&
          !could_be_arith(orig, (u16)interesting_16[j], 2) &&
          !could_be_interest(orig, (u16)interesting_16[j], 2, 0)) {

        stage_val_type = STAGE_VAL_LE;
        *(u16*)(out_buf + i) = interesting_16[j];
        if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
        stage_cur++;
      } else stage_max--;

Little-endian fuzzing, done only if the value could not be produced by a bitflip, by arithmetic, or by a single-byte interesting-value insertion.

      if ((u16)interesting_16[j] != SWAP16(interesting_16[j]) &&
          !could_be_bitflip(orig ^ SWAP16(interesting_16[j])) &&
          !could_be_arith(orig, SWAP16(interesting_16[j]), 2) &&
          !could_be_interest(orig, SWAP16(interesting_16[j]), 2, 1)) {

        stage_val_type = STAGE_VAL_BE;
        *(u16*)(out_buf + i) = SWAP16(interesting_16[j]);
        if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
        stage_cur++;
      } else stage_max--;

    }
    *(u16*)(out_buf + i) = orig;
  }

Big-endian fuzzing block.

  new_hit_cnt = queued_paths + unique_crashes;
  stage_finds[STAGE_INTEREST16]  += new_hit_cnt - orig_hit_cnt;
  stage_cycles[STAGE_INTEREST16] += stage_max;

Store stats as usual. The 4-byte variation is similar to the 2-byte version so I skip it for brevity. Let’s see the interesting 2-byte and 4-byte values.

#define INTERESTING_16 \
  -32768,        /* Overflow signed 16-bit when decremented */ \
  -129,          /* Overflow signed 8-bit                   */ \
   128,          /* Overflow signed 8-bit                   */ \
   255,          /* Overflow unsig 8-bit when incremented   */ \
   256,          /* Overflow unsig 8-bit                    */ \
   512,          /* One-off with common buffer size         */ \
   1000,         /* One-off with common buffer size         */ \
   1024,         /* One-off with common buffer size         */ \
   4096,         /* One-off with common buffer size         */ \
   32767         /* Overflow signed 16-bit when incremented */

#define INTERESTING_32 \
  -2147483648LL, /* Overflow signed 32-bit when decremented */ \
  -100663046,    /* Large negative number (endian-agnostic) */ \
  -32769,        /* Overflow signed 16-bit                  */ \
   32768,        /* Overflow signed 16-bit                  */ \
   65535,        /* Overflow unsig 16-bit when incremented  */ \
   65536,        /* Overflow unsig 16 bit                   */ \
   100663045,    /* Large positive number (endian-agnostic) */ \
   2147483647    /* Overflow signed 32-bit when incremented */

Before proceeding, let’s see could_be_arith(). I promise this is easier than could_be_bitflip(). The call looks like could_be_arith(orig, (u8)interesting_8[j], 1).

static u8 could_be_arith(u32 old_val, u32 new_val, u8 blen) {
  u32 i, ov = 0, nv = 0, diffs = 0;
  if (old_val == new_val) return 1;
  /* See if one-byte adjustments to any byte could produce this result. */
  for (i = 0; i < blen; i++) {
    u8 a = old_val >> (8 * i),
       b = new_val >> (8 * i);

    if (a != b) { diffs++; ov = a; nv = b; }
  }

Count the bytes that differ; the old and new values of the last differing byte are saved in ov and nv.

  /* If only one byte differs and the values are within range, return 1. */
  if (diffs == 1) {
    if ((u8)(ov - nv) <= ARITH_MAX ||
        (u8)(nv - ov) <= ARITH_MAX) return 1;

  }

Compare the difference with ARITH_MAX and return 1 if that’s the case (ie. already covered by arithmetic stage). Recall, ARITH_MAX is defined in config.h to be 35.

  if (blen == 1) return 0;
  /* See if two-byte adjustments to any byte would produce this result. */
  diffs = 0;
  for (i = 0; i < blen / 2; i++) {
    u16 a = old_val >> (16 * i),
        b = new_val >> (16 * i);

    if (a != b) { diffs++; ov = a; nv = b; }
  }
  /* If only one word differs and the values are within range, return 1. */
  if (diffs == 1) {
    if ((u16)(ov - nv) <= ARITH_MAX ||
        (u16)(nv - ov) <= ARITH_MAX) return 1;

Exactly one 16-bit word differs. Compare the difference with ARITH_MAX and decide.

    ov = SWAP16(ov); nv = SWAP16(nv);

    if ((u16)(ov - nv) <= ARITH_MAX ||
        (u16)(nv - ov) <= ARITH_MAX) return 1;
  }

This is basically the big-endian variation of the previous check.

  /* Finally, let's do the same thing for dwords. */
  if (blen == 4) {
    if ((u32)(old_val - new_val) <= ARITH_MAX ||
        (u32)(new_val - old_val) <= ARITH_MAX) return 1;

Finally, for dwords, check whether the full 32-bit difference is within ARITH_MAX.

    new_val = SWAP32(new_val);
    old_val = SWAP32(old_val);

    if ((u32)(old_val - new_val) <= ARITH_MAX ||
        (u32)(new_val - old_val) <= ARITH_MAX) return 1;

  }
  return 0;
}

Finally, the big-endian check is applied. The only function left is could_be_interest(). It is given below and looks complicated. However, if you look closely enough, you’ll see that parts of old_val are replaced by constants from the interesting_{8,16,32} arrays and the result is compared against new_val. Depending on the size of the constant, it masks out the target bytes of old_val and ORs the constant in.

static u8 could_be_interest(u32 old_val, u32 new_val, u8 blen, u8 check_le) {
  u32 i, j;
  if (old_val == new_val) return 1;

  /* See if one-byte insertions from interesting_8 over old_val could
     produce new_val. */
  for (i = 0; i < blen; i++) {
    for (j = 0; j < sizeof(interesting_8); j++) {
      u32 tval = (old_val & ~(0xff << (i * 8))) |
                 (((u8)interesting_8[j]) << (i * 8));
      if (new_val == tval) return 1;
    }
  }
  /* Bail out unless we're also asked to examine two-byte LE insertions
     as a preparation for BE attempts. */
  if (blen == 2 && !check_le) return 0;
  /* See if two-byte insertions over old_val could give us new_val. */
  for (i = 0; i < blen - 1; i++) {
    for (j = 0; j < sizeof(interesting_16) / 2; j++) {
      u32 tval = (old_val & ~(0xffff << (i * 8))) |
                 (((u16)interesting_16[j]) << (i * 8));
      if (new_val == tval) return 1;
      /* Continue here only if blen > 2. */
      if (blen > 2) {
        tval = (old_val & ~(0xffff << (i * 8))) |
               (SWAP16(interesting_16[j]) << (i * 8));

        if (new_val == tval) return 1;
      }
    }
  }
  if (blen == 4 && check_le) {
    /* See if four-byte insertions could produce the same result
       (LE only). */
    for (j = 0; j < sizeof(interesting_32) / 4; j++)
      if (new_val == (u32)interesting_32[j]) return 1;
  }

  return 0;
}
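
As a small hand-worked illustration of that mask-and-OR logic (the values below are mine, picked just for the example): with old_val = 0x11223344, overwriting byte 1 with the interesting constant 16 (0x10) yields 0x11221044, so could_be_interest() would return 1 for that pair.

#include <assert.h>
#include <stdint.h>

int main(void) {
  uint32_t old_val = 0x11223344;   /* hypothetical original dword            */
  uint32_t new_val = 0x11221044;   /* same dword with byte 1 set to 0x10     */
  uint32_t i = 1;                  /* byte position being replaced           */

  /* Same expression as the one-byte loop in could_be_interest(). */
  uint32_t tval = (old_val & ~(0xffu << (i * 8))) | (0x10u << (i * 8));

  assert(tval == new_val);         /* hence: return 1, the value is skipped  */
  return 0;
}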

This concludes the interesting stage.

Stage: extras

These extra fuzzing stages allow us to overwrite part of the input with special byte sequences (dictionary tokens), or to insert them into the input. The user extras (over) variant just overwrites the input with the provided extra.

 stage_name  = "user extras (over)";
 stage_short = "ext_UO";
 stage_cur   = 0;
 stage_max   = extras_cnt * len;
 stage_val_type = STAGE_VAL_NONE;
 orig_hit_cnt = new_hit_cnt;

 for (i = 0; i < len; i++) {
   u32 last_len = 0;
   stage_cur_byte = i;
   /* Extras are sorted by size, from smallest to largest. This means
      that we don't have to worry about restoring the buffer in
      between writes at a particular offset determined by the outer
      loop. */
   for (j = 0; j < extras_cnt; j++) {
     /* Skip extras probabilistically if extras_cnt > MAX_DET_EXTRAS. Also
        skip them if there's no room to insert the payload, if the token
        is redundant, or if its entire span has no bytes set in the effector
        map. */
     if ((extras_cnt > MAX_DET_EXTRAS && UR(extras_cnt) >= MAX_DET_EXTRAS) ||
         extras[j].len > len - i ||
         !memcmp(extras[j].data, out_buf + i, extras[j].len) ||
         !memchr(eff_map + EFF_APOS(i), 1, EFF_SPAN_ALEN(i, extras[j].len))) {

       stage_max--;
       continue;
     }

Check if the data in extras is already present at that offset, and check if the effector map tells us there is nothing interesting in the bytes the token would span. One might argue that this stage is probabilistic, as you may have noticed the condition UR(extras_cnt) >= MAX_DET_EXTRAS. That’s actually correct: the data is deterministic, but the decision to apply the mutation is probabilistic. MAX_DET_EXTRAS is defined in config.h as 200. So, say we have 400 extras; then a particular extra will be applied with probability \( \frac{1}{2} \). Interestingly enough, with up to 200 extras everything stays deterministic. Therefore, it sets an upper bound on the determinism and switches to probabilistic mode when we have more than 200 extras.
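
Put as a formula (assuming UR(n) draws uniformly from [0, n)), the probability that a given extra is actually tried at a given offset is

\[ P = \min\left(1, \frac{MAX\_DET\_EXTRAS}{extras\_cnt}\right) \]

which reproduces the 200/400 = 1/2 example above.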

     last_len = extras[j].len;
     memcpy(out_buf + i, extras[j].data, last_len);
     if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
     stage_cur++;
   }

The mutation is applied to the data at offset i and the target is executed.

   /* Restore all the clobbered memory. */
   memcpy(out_buf + i, in_buf + i, last_len);
 }

 new_hit_cnt = queued_paths + unique_crashes;
 stage_finds[STAGE_EXTRAS_UO]  += new_hit_cnt - orig_hit_cnt;
 stage_cycles[STAGE_EXTRAS_UO] += stage_max;

The user_extras (insert) stage is similar to the previous overwrite stage; however, it inserts the extras instead of overwriting.

  stage_name  = "user extras (insert)";
  stage_short = "ext_UI";
  stage_cur   = 0;
  stage_max   = extras_cnt * (len + 1);
  orig_hit_cnt = new_hit_cnt;

  ex_tmp = ck_alloc(len + MAX_DICT_FILE);
  for (i = 0; i <= len; i++) {
    stage_cur_byte = i;

    for (j = 0; j < extras_cnt; j++) {
      if (len + extras[j].len > MAX_FILE) {
        stage_max--;
        continue;
      }

Skip if the length of the extra plus the current buffer is larger than MAX_FILE. The upper bound is defined in config.h as 1MiB (1 * 1024 * 1024).

      /* Insert token */
      memcpy(ex_tmp + i, extras[j].data, extras[j].len);

Insert extra at offset i.

      /* Copy tail */
      memcpy(ex_tmp + i + extras[j].len, out_buf + i, len - i);

Copy the rest of the data to the temporary buffer ex_tmp.

      if (common_fuzz_stuff(argv, ex_tmp, len + extras[j].len)) {
        ck_free(ex_tmp);
        goto abandon_entry;
      }
      stage_cur++;
    }
    /* Copy head */
    ex_tmp[i] = out_buf[i];
  }

So, this is a little bit strange, don’t you think? What about the bytes before i? The trick is that the head of ex_tmp is built up incrementally: that last line copies out_buf[i] into ex_tmp[i] at the end of each iteration, so by the time we insert at offset i, ex_tmp[0..i-1] already holds the original head ;)
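
Here is a stand-alone toy version of that loop (my own example with a made-up 4-byte input and a single token) that prints the buffer after each insertion, showing how the head grows while ex_tmp is reused:

#include <stdio.h>
#include <string.h>

int main(void) {
  char out_buf[] = "ABCD";                           /* hypothetical input      */
  char token[]   = "XY";                             /* hypothetical dictionary */
  size_t len = 4, tlen = 2, i;
  char ex_tmp[4 + 2] = { 0 };

  for (i = 0; i <= len; i++) {
    memcpy(ex_tmp + i, token, tlen);                 /* insert token            */
    memcpy(ex_tmp + i + tlen, out_buf + i, len - i); /* copy tail               */
    printf("i=%zu: %.*s\n", i, (int)(len + tlen), ex_tmp);
    ex_tmp[i] = out_buf[i];                          /* extend the restored head */
  }
  /* Prints XYABCD, AXYBCD, ABXYCD, ABCXYD, ABCDXY. */
  return 0;
}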

  ck_free(ex_tmp);
  new_hit_cnt = queued_paths + unique_crashes;
  stage_finds[STAGE_EXTRAS_UI]  += new_hit_cnt - orig_hit_cnt;
  stage_cycles[STAGE_EXTRAS_UI] += stage_max;

The epilogue of the stage is similar to the others. The only question we are left with is: where do these extras[] come from? Let’s start with the global definitions.

struct extra_data {
  u8* data;                           /* Dictionary token data            */
  u32 len;                            /* Dictionary token length          */
  u32 hit_cnt;                        /* Use count in the corpus          */
};

static struct extra_data* extras;     /* Extra tokens to fuzz with        */
static u32 extras_cnt;                /* Total number of tokens read      */

static struct extra_data* a_extras;   /* Automatically selected extras    */
static u32 a_extras_cnt;              /* Total number of tokens available */

Next, let’s see where these extras are provided by the user.

 case 'x': /* dictionary */
   if (extras_dir) FATAL("Multiple -x options not supported");
   extras_dir = optarg;
   break;
 }

 .. skipped for brevity ..

 if (extras_dir) load_extras(extras_dir);

Aha! So, the -x switch on the command line is related to extras[], and it is load_extras()’s job to load the extras into the global array. I’m not going to cover the whole function, only a small part at the end.

check_and_sort:
  if (!extras_cnt) FATAL("No usable files in '%s'", dir);
  qsort(extras, extras_cnt, sizeof(struct extra_data), compare_extras_len);
  OKF("Loaded %u extra tokens, size range %s to %s.", extras_cnt,
      DMS(min_len), DMS(max_len));

  if (max_len > 32)
    WARNF("Some tokens are relatively large (%s) - consider trimming.",
          DMS(max_len));

  if (extras_cnt > MAX_DET_EXTRAS)
    WARNF("More than %u tokens - will use them probabilistically.",
          MAX_DET_EXTRAS);
}

The epilogue of the function tells us several things. First, extras[] are sorted according to their lengths. Next, it is better to keep token lengths at 32 bytes or below. Similarly, if the number of dictionary items is larger than MAX_DET_EXTRAS, the fuzzer switches to probabilistic mode. Recall that the value is 200.

The next variation, called auto extras (over), is another overwriting variant.

  stage_name  = "auto extras (over)";
  stage_short = "ext_AO";
  stage_cur   = 0;
  stage_max   = MIN(a_extras_cnt, USE_AUTO_EXTRAS) * len;

  stage_val_type = STAGE_VAL_NONE;

  orig_hit_cnt = new_hit_cnt;
  for (i = 0; i < len; i++) {
    u32 last_len = 0;
    stage_cur_byte = i;
    for (j = 0; j < MIN(a_extras_cnt, USE_AUTO_EXTRAS); j++) {
      /* See the comment in the earlier code; extras are sorted by size. */
      if (a_extras[j].len > len - i ||
          !memcmp(a_extras[j].data, out_buf + i, a_extras[j].len) ||
          !memchr(eff_map + EFF_APOS(i), 1, EFF_SPAN_ALEN(i, a_extras[j].len))) {

        stage_max--;
        continue;
      }

      last_len = a_extras[j].len;
      memcpy(out_buf + i, a_extras[j].data, last_len);

      if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;

      stage_cur++;
    }
    /* Restore all the clobbered memory. */
    memcpy(out_buf + i, in_buf + i, last_len);
  }

  new_hit_cnt = queued_paths + unique_crashes;
  stage_finds[STAGE_EXTRAS_AO]  += new_hit_cnt - orig_hit_cnt;
  stage_cycles[STAGE_EXTRAS_AO] += stage_max;

Very similar to the user extras (over) stage, with one question: where does a_extras[] come from?

auto collect

I must admit I’ve cheated and removed a code block in the first bitflip stage. Let’s go ahead and revisit that.

/* Single walking bit. */
stage_short = "flip1";
stage_max   = len << 3;
stage_name  = "bitflip 1/1";

stage_val_type = STAGE_VAL_NONE;
orig_hit_cnt = queued_paths + unique_crashes;
prev_cksum = queue_cur->exec_cksum;

for (stage_cur = 0; stage_cur < stage_max; stage_cur++) {
  stage_cur_byte = stage_cur >> 3;
  FLIP_BIT(out_buf, stage_cur);
  if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
  FLIP_BIT(out_buf, stage_cur);

Until this point, everything seems in order.

  /* While flipping the least significant bit in every byte, pull of an extra
     trick to detect possible syntax tokens. In essence, the idea is that if
     you have a binary blob like this:

     xxxxxxxxIHDRxxxxxxxx

     ...and changing the leading and trailing bytes causes variable or no
     changes in program flow, but touching any character in the "IHDR" string
     always produces the same, distinctive path, it's highly likely that
     "IHDR" is an atomically-checked magic value of special significance to
     the fuzzed format.

     We do this here, rather than as a separate stage, because it's a nice
     way to keep the operation approximately "free" (i.e., no extra execs).

     Empirically, performing the check when flipping the least significant bit
     is advantageous, compared to doing it at the time of more disruptive
     changes, where the program flow may be affected in more violent ways.

     The caveat is that we won't generate dictionaries in the -d mode or -S
     mode - but that's probably a fair trade-off.

     This won't work particularly well with paths that exhibit variable
     behavior, but fails gracefully, so we'll carry out the checks anyway.
    */
  if (!dumb_mode && (stage_cur & 7) == 7) {

Branch if we have just evaluated the last bit of a byte.

    u32 cksum = hash32(trace_bits, MAP_SIZE, HASH_CONST);

Calculate the checksum of the trace_bits.

    if (stage_cur == stage_max - 1 && cksum == prev_cksum) {
      /* If at end of file and we are still collecting a string, grab the
         final character and force output. */
      if (a_len < MAX_AUTO_EXTRA) a_collect[a_len] = out_buf[stage_cur >> 3];
      a_len++;

      if (a_len >= MIN_AUTO_EXTRA && a_len <= MAX_AUTO_EXTRA)
        maybe_add_auto(a_collect, a_len);

Now, collect the last byte into the a_collect array. If we have collected enough, let maybe_add_auto() decide.

    } else if (cksum != prev_cksum) {
      /* Otherwise, if the checksum has changed, see if we have something
         worthwhile queued up, and collect that if the answer is yes. */
      if (a_len >= MIN_AUTO_EXTRA && a_len <= MAX_AUTO_EXTRA)
        maybe_add_auto(a_collect, a_len);

      a_len = 0;
      prev_cksum = cksum;
    }

If the checksum is different (i.e. something interesting happened), let maybe_add_auto() decide. Notice that the length counter is reset in this branch and prev_cksum is updated.

    /* Continue collecting string, but only if the bit flip actually made
       any difference - we don't want no-op tokens. */
    if (cksum != queue_cur->exec_cksum) {
      if (a_len < MAX_AUTO_EXTRA) a_collect[a_len] = out_buf[stage_cur >> 3];
      a_len++;
    }
  }
}

So the above still collects bytes into the a_collect buffer deterministically. The rest is similar to any other bitflip stage.

new_hit_cnt = queued_paths + unique_crashes;
stage_finds[STAGE_FLIP1]  += new_hit_cnt - orig_hit_cnt;
stage_cycles[STAGE_FLIP1] += stage_max;

Constants are defined in config.h as follows.

#define MIN_AUTO_EXTRA      3
#define MAX_AUTO_EXTRA      32

Last piece of the puzzle is maybe_add_auto().

/* Maybe add automatic extra. */
static void maybe_add_auto(u8* mem, u32 len) {
  u32 i;
  /* Allow users to specify that they don't want auto dictionaries. */
  if (!MAX_AUTO_EXTRAS || !USE_AUTO_EXTRAS) return;

Check if MAX_AUTO_EXTRAS or USE_AUTO_EXTRAS is zero. They are defined in config.h as follows:

#define USE_AUTO_EXTRAS     50
#define MAX_AUTO_EXTRAS     (USE_AUTO_EXTRAS * 10)

Next, reject tokens that are just a run of a single repeated byte.

  /* Skip runs of identical bytes. */
  for (i = 1; i < len; i++)
    if (mem[0] ^ mem[i]) break;

  if (i == len) return;

Check whether the value is already among the builtin interesting constants.

  /* Reject builtin interesting values. */
  if (len == 2) {
    i = sizeof(interesting_16) >> 1;
    while (i--)
      if (*((u16*)mem) == interesting_16[i] ||
          *((u16*)mem) == SWAP16(interesting_16[i])) return;
  }

  if (len == 4) {
    i = sizeof(interesting_32) >> 2;
    while (i--)
      if (*((u32*)mem) == interesting_32[i] ||
          *((u32*)mem) == SWAP32(interesting_32[i])) return;
  }

Check to see if it is already in the dictionary.

  /* Reject anything that matches existing extras. Do a case-insensitive
     match. We optimize by exploiting the fact that extras[] are sorted
     by size. */
  for (i = 0; i < extras_cnt; i++)
    if (extras[i].len >= len) break;

  for (; i < extras_cnt && extras[i].len == len; i++)
    if (!memcmp_nocase(extras[i].data, mem, len)) return;

  /* Last but not least, check a_extras[] for matches. There are no
     guarantees of a particular sort order. */
  auto_changed = 1;

Check to see if it is previously auto collected.

  for (i = 0; i < a_extras_cnt; i++) {
    if (a_extras[i].len == len && !memcmp_nocase(a_extras[i].data, mem, len)) {
      a_extras[i].hit_cnt++;
      goto sort_a_extras;
    }
  }

Finally, if everything seems OK, save it in a_extras[].

  /* At this point, looks like we're dealing with a new entry. So, let's
     append it if we have room. Otherwise, let's randomly evict some other
     entry from the bottom half of the list. */
  if (a_extras_cnt < MAX_AUTO_EXTRAS) {
    a_extras = ck_realloc_block(a_extras, (a_extras_cnt + 1) *
                                sizeof(struct extra_data));

    a_extras[a_extras_cnt].data = ck_memdup(mem, len);
    a_extras[a_extras_cnt].len  = len;
    a_extras_cnt++;
  } else {
    i = MAX_AUTO_EXTRAS / 2 +
        UR((MAX_AUTO_EXTRAS + 1) / 2);

    ck_free(a_extras[i].data);

    a_extras[i].data    = ck_memdup(mem, len);
    a_extras[i].len     = len;
    a_extras[i].hit_cnt = 0;
  }
sort_a_extras:
  /* First, sort all auto extras by use count, descending order. */
  qsort(a_extras, a_extras_cnt, sizeof(struct extra_data),
        compare_extras_use_d);
  /* Then, sort the top USE_AUTO_EXTRAS entries by size. */
  qsort(a_extras, MIN(USE_AUTO_EXTRAS, a_extras_cnt),
        sizeof(struct extra_data), compare_extras_len);
}
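
With the constants shown earlier, MAX_AUTO_EXTRAS is 500, so once the list is full the evicted slot is picked as

\[ i = 250 + UR(250) \in [250, 499] \]

i.e. only entries in the bottom half of the list, which after the use-count sort holds the least-used tokens, are ever replaced.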

This concludes all deterministic stages. The next stage is called havoc and it is pretty exciting since it is probabilistic. Curious?

Stage: havoc

This is a probabilistic stage: mutators are selected and applied at random. The havoc code is also shared with the splicing stage; while we are running havoc proper, splice_cycle is zero. doing_det is set to 1 before the deterministic stages in fuzz_one() and keeps that value here.

havoc_stage:
stage_cur_byte = -1;
/* The havoc stage mutation code is also invoked when splicing files; if the
   splice_cycle variable is set, generate different descriptions and such. */
if (!splice_cycle) {
  stage_name  = "havoc";
  stage_short = "havoc";
  stage_max   = (doing_det ?
                  HAVOC_CYCLES_INIT /* 1024 */:
                  HAVOC_CYCLES /* 256 */) * perf_score / havoc_div / 100;

stage_max is derived from HAVOC_CYCLES_INIT (1024) when the deterministic stages were run, scaled by perf_score and havoc_div; with the default perf_score of 100 and havoc_div of 1, that is 1024 * 100 / 1 / 100 = 1024 iterations.

} else {
  static u8 tmp[32];
  perf_score = orig_perf;
  sprintf(tmp, "splice %u", splice_cycle);
  stage_name  = tmp;
  stage_short = "splice";
  stage_max   = SPLICE_HAVOC * perf_score / havoc_div / 100;
}

The above is the initial configuration for the splice stage. I will revisit it when we are dealing with that particular stage.

if (stage_max < HAVOC_MIN) stage_max = HAVOC_MIN;

HAVOC_MIN is set to 16 in config.h; this only enforces a lower bound, so stage_max normally stays unchanged.

temp_len = len;
orig_hit_cnt = queued_paths + unique_crashes;
havoc_queued = queued_paths;

/* We essentially just do several thousand runs (depending on perf_score)
   where we take the input file and make random stacked tweaks. */
for (stage_cur = 0; stage_cur < stage_max; stage_cur++) {
  u32 use_stacking = 1 << (1 + UR(HAVOC_STACK_POW2));
  stage_cur_val = use_stacking;

So the stacking number is set here. HAVOC_STACK_POW2 is set to 7 in config.h; since UR(7) returns a value in [0..6], the number of stacked mutations is in \( \{ 2, 4, 8, 16, 32, 64, 128 \}\). The idea is to select mutators randomly and stack them together to fuzz the target. Next, we’ll see a large switch statement with a case for each mutator. These mutators are similar to the ones we have already seen. One important note is that each mutation is applied at a random position, as in out_buf[UR(temp_len)] = ....

  for (i = 0; i < use_stacking; i++) {
    switch (UR(15 + ((extras_cnt + a_extras_cnt) ? 2 : 0))) {
      case 0:
        /* Flip a single bit somewhere. Spooky! */
        FLIP_BIT(out_buf, UR(temp_len << 3));
        break;

One bitflip mutator.

      case 1:
        /* Set byte to interesting value. */
        out_buf[UR(temp_len)] = interesting_8[UR(sizeof(interesting_8))];
        break;
      case 2:
        /* Set word to interesting value, randomly choosing endian. */
        if (temp_len < 2) break;
        if (UR(2)) {
          *(u16*)(out_buf + UR(temp_len - 1)) =
            interesting_16[UR(sizeof(interesting_16) >> 1)];
        } else {
          *(u16*)(out_buf + UR(temp_len - 1)) = SWAP16(
            interesting_16[UR(sizeof(interesting_16) >> 1)]);
        }
        break;
      case 3:
        /* Set dword to interesting value, randomly choosing endian. */
        if (temp_len < 4) break;
        if (UR(2)) {
          *(u32*)(out_buf + UR(temp_len - 3)) =
            interesting_32[UR(sizeof(interesting_32) >> 2)];
        } else {
          *(u32*)(out_buf + UR(temp_len - 3)) = SWAP32(
            interesting_32[UR(sizeof(interesting_32) >> 2)]);
        }
        break;

Interesting-value mutators for different sizes (8, 16, 32 bits). A constant is picked at random from the interesting_{8,16,32} arrays, and UR(2), which returns a random value in \( \{ 0, 1 \}\), picks the endianness.

      case 4:
        /* Randomly subtract from byte. */
        out_buf[UR(temp_len)] -= 1 + UR(ARITH_MAX);
        break;
      case 5:
        /* Randomly add to byte. */
        out_buf[UR(temp_len)] += 1 + UR(ARITH_MAX);
        break;
      case 6:
        /* Randomly subtract from word, random endian. */
        if (temp_len < 2) break;
        if (UR(2)) {
          u32 pos = UR(temp_len - 1);
          *(u16*)(out_buf + pos) -= 1 + UR(ARITH_MAX);
        } else {
          u32 pos = UR(temp_len - 1);
          u16 num = 1 + UR(ARITH_MAX);
          *(u16*)(out_buf + pos) =
            SWAP16(SWAP16(*(u16*)(out_buf + pos)) - num);
        }
        break;
      case 7:
        /* Randomly add to word, random endian. */
        if (temp_len < 2) break;
        if (UR(2)) {
          u32 pos = UR(temp_len - 1);
          *(u16*)(out_buf + pos) += 1 + UR(ARITH_MAX);
        } else {
          u32 pos = UR(temp_len - 1);
          u16 num = 1 + UR(ARITH_MAX);
          *(u16*)(out_buf + pos) =
            SWAP16(SWAP16(*(u16*)(out_buf + pos)) + num);
        }
        break;
      case 8:
        /* Randomly subtract from dword, random endian. */
        if (temp_len < 4) break;
        if (UR(2)) {
          u32 pos = UR(temp_len - 3);
          *(u32*)(out_buf + pos) -= 1 + UR(ARITH_MAX);
        } else {
          u32 pos = UR(temp_len - 3);
          u32 num = 1 + UR(ARITH_MAX);
          *(u32*)(out_buf + pos) =
            SWAP32(SWAP32(*(u32*)(out_buf + pos)) - num);
        }
        break;
      case 9:
        /* Randomly add to dword, random endian. */
        if (temp_len < 4) break;
        if (UR(2)) {
          u32 pos = UR(temp_len - 3);
          *(u32*)(out_buf + pos) += 1 + UR(ARITH_MAX);
        } else {
          u32 pos = UR(temp_len - 3);
          u32 num = 1 + UR(ARITH_MAX);
          *(u32*)(out_buf + pos) =
            SWAP32(SWAP32(*(u32*)(out_buf + pos)) + num);
        }
        break;

These mutators randomly add to or subtract from bytes, words and dwords at random positions, with random endianness.

      case 10:
        /* Just set a random byte to a random value. Because,
           why not. We use XOR with 1-255 to eliminate the
           possibility of a no-op. */
        out_buf[UR(temp_len)] ^= 1 + UR(255);
        break;

XORs the byte at a random position with a random value in 1-255, so the operation is never a no-op.

      case 11 ... 12: {
          /* Delete bytes. We're making this a bit more likely
             than insertion (the next option) in hopes of keeping
             files reasonably small. */
          u32 del_from, del_len;
          if (temp_len < 2) break;
          /* Don't delete too much. */
          del_len = choose_block_len(temp_len - 1);
          del_from = UR(temp_len - del_len + 1);
          memmove(out_buf + del_from, out_buf + del_from + del_len,
                  temp_len - del_from - del_len);
          temp_len -= del_len;
          break;
        }

This one deletes a block. The length and the position of the block are selected randomly. Let’s see how the block length is chosen in choose_block_len().

static u32 choose_block_len(u32 limit) {
  u32 min_value, max_value;
  u32 rlim = MIN(queue_cycle, 3);
  if (!run_over10m) rlim = 1;
  switch (UR(rlim)) {
    case 0:  min_value = 1;
             max_value = HAVOC_BLK_SMALL;
             break;
    case 1:  min_value = HAVOC_BLK_SMALL;
             max_value = HAVOC_BLK_MEDIUM;
             break;
    default:
             if (UR(10)) {
               min_value = HAVOC_BLK_MEDIUM;
               max_value = HAVOC_BLK_LARGE;
             } else {
               min_value = HAVOC_BLK_LARGE;
               max_value = HAVOC_BLK_XL;
             }
  }
  if (min_value >= limit) min_value = 1;
  return min_value + UR(MIN(max_value, limit) - min_value + 1);
}

run_over10m is a flag that tells whether the fuzzer has been running for more than 10 minutes; until then, rlim is forced to 1 and only small blocks (1 to HAVOC_BLK_SMALL bytes) are selected. queue_cycle is the number of queue cycles, i.e. full passes over the entire queue, that this fuzzer instance has completed so far, so from the third cycle onward rlim is capped at 3. Finally, the havoc block sizes are defined in config.h as follows:

#define HAVOC_BLK_SMALL     32
#define HAVOC_BLK_MEDIUM    128
#define HAVOC_BLK_LARGE     1500
#define HAVOC_BLK_XL        32768

The next mutator is somewhat unlucky ;) It has two modes: cloning bytes from the input, or inserting a block of constant bytes. The mode is selected randomly via the flag u8 actually_clone = UR(4). When the flag is zero (insert mode), clone_len is a random block length of up to HAVOC_BLK_XL bytes and clone_from is zero. clone_to, the insertion point, is a random offset within the input. new_buf is allocated, the first clone_to bytes of the input are copied into it, the next clone_len bytes are filled with a single constant byte, either a random value or a byte taken from a random position of the input, and finally the rest of the input (everything from clone_to onward) is appended. When actually_clone is non-zero (clone mode), clone_len is a random block length bounded by temp_len and clone_from is random in [0..temp_len-clone_len]. The head and tail are handled exactly as before, but the middle clone_len bytes are copied from out_buf + clone_from.

Insert mode is executed with \( \frac{1}{4} \) probability and clone mode is executed with \( \frac{3}{4} \) probability.
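
A stand-alone toy version of the clone path (positions and lengths hard-coded by me for illustration; AFL picks them with UR() and choose_block_len()):

#include <assert.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  unsigned char out_buf[] = "ABCDEFGH";              /* hypothetical input */
  size_t temp_len = 8, clone_from = 1, clone_to = 3, clone_len = 4;

  unsigned char *new_buf = malloc(temp_len + clone_len);
  memcpy(new_buf, out_buf, clone_to);                            /* head  */
  memcpy(new_buf + clone_to, out_buf + clone_from, clone_len);   /* clone */
  memcpy(new_buf + clone_to + clone_len, out_buf + clone_to,
         temp_len - clone_to);                                   /* tail  */

  assert(!memcmp(new_buf, "ABCBCDEDEFGH", temp_len + clone_len));
  free(new_buf);
  return 0;
}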

      case 13:
        if (temp_len + HAVOC_BLK_XL < MAX_FILE) {
          /* Clone bytes (75%) or insert a block of constant bytes (25%). */
          u8  actually_clone = UR(4);
          u32 clone_from, clone_to, clone_len;
          u8* new_buf;
          if (actually_clone) {
            clone_len  = choose_block_len(temp_len);
            clone_from = UR(temp_len - clone_len + 1);
          } else {
            clone_len = choose_block_len(HAVOC_BLK_XL);
            clone_from = 0;
          }
          clone_to   = UR(temp_len);
          new_buf = ck_alloc_nozero(temp_len + clone_len);
          /* Head */
          memcpy(new_buf, out_buf, clone_to);
          /* Inserted part */
          if (actually_clone)
            memcpy(new_buf + clone_to, out_buf + clone_from, clone_len);
          else
            memset(new_buf + clone_to,
                   UR(2) ? UR(256) : out_buf[UR(temp_len)], clone_len);

          /* Tail */
          memcpy(new_buf + clone_to + clone_len, out_buf + clone_to,
                 temp_len - clone_to);

          ck_free(out_buf);
          out_buf = new_buf;
          temp_len += clone_len;
        }
        break;

The next mutator is similar to the previous one, again with two modes. copy_len is first set to a random havoc block size. With probability \( \frac{3}{4} \), copy_len bytes are copied (memmove) from offset copy_from to offset copy_to. With probability \( \frac{1}{4} \), the destination is instead overwritten (memset) with a single constant byte, which is, with equal probability, either a random value or a byte taken from a random position of the input.

      case 14: {
          /* Overwrite bytes with a randomly selected chunk (75%) or fixed
             bytes (25%). */
          u32 copy_from, copy_to, copy_len;
          if (temp_len < 2) break;
          copy_len  = choose_block_len(temp_len - 1);
          copy_from = UR(temp_len - copy_len + 1);
          copy_to   = UR(temp_len - copy_len + 1);
          if (UR(4)) {
            if (copy_from != copy_to)
              memmove(out_buf + copy_to, out_buf + copy_from, copy_len);

          } else memset(out_buf + copy_to,
                        UR(2) ? UR(256) : out_buf[UR(temp_len)], copy_len);

          break;
        }

The next two mutators are enabled only when there are dictionary items provided by the user, or when the auto-collect sub-stage gathered tokens during the bitflip stage. The first one overwrites the input at a random position with an extra, while the second one inserts an extra.

      /* Values 15 and 16 can be selected only if there are any extras
         present in the dictionaries. */
      case 15: {
          /* Overwrite bytes with an extra. */
          if (!extras_cnt || (a_extras_cnt && UR(2))) {
            /* No user-specified extras or odds in our favor. Let's use an
               auto-detected one. */

            u32 use_extra = UR(a_extras_cnt);
            u32 extra_len = a_extras[use_extra].len;
            u32 insert_at;

            if (extra_len > temp_len) break;

            insert_at = UR(temp_len - extra_len + 1);
            memcpy(out_buf + insert_at, a_extras[use_extra].data, extra_len);
          } else {
            /* No auto extras or odds in our favor. Use the dictionary. */
            u32 use_extra = UR(extras_cnt);
            u32 extra_len = extras[use_extra].len;
            u32 insert_at;

            if (extra_len > temp_len) break;

            insert_at = UR(temp_len - extra_len + 1);
            memcpy(out_buf + insert_at, extras[use_extra].data, extra_len);
          }
          break;
        }
      case 16: {
          u32 use_extra, extra_len, insert_at = UR(temp_len + 1);
          u8* new_buf;

          /* Insert an extra. Do the same dice-rolling stuff as for the
             previous case. */
          if (!extras_cnt || (a_extras_cnt && UR(2))) {
            use_extra = UR(a_extras_cnt);
            extra_len = a_extras[use_extra].len;

            if (temp_len + extra_len >= MAX_FILE) break;

            new_buf = ck_alloc_nozero(temp_len + extra_len);

            /* Head */
            memcpy(new_buf, out_buf, insert_at);

            /* Inserted part */
            memcpy(new_buf + insert_at, a_extras[use_extra].data, extra_len);
          } else {
            use_extra = UR(extras_cnt);
            extra_len = extras[use_extra].len;

            if (temp_len + extra_len >= MAX_FILE) break;

            new_buf = ck_alloc_nozero(temp_len + extra_len);

            /* Head */
            memcpy(new_buf, out_buf, insert_at);

            /* Inserted part */
            memcpy(new_buf + insert_at, extras[use_extra].data, extra_len);
          }
          /* Tail */
          memcpy(new_buf + insert_at + extra_len, out_buf + insert_at,
                 temp_len - insert_at);

          ck_free(out_buf);
          out_buf   = new_buf;
          temp_len += extra_len;
          break;
        }
    }
  }

After use_stacking mutations have been applied, the target binary is run.

  if (common_fuzz_stuff(argv, out_buf, temp_len)) goto abandon_entry;

  /* out_buf might have been mangled a bit, so let's restore it to its
     original size and shape. */
  if (temp_len < len) out_buf = ck_realloc(out_buf, len);
  temp_len = len;
  memcpy(out_buf, in_buf, len);

Restore the buffer to a pristine state.

  /* If we're finding new stuff, let's run for a bit longer, limits
     permitting. */
  if (queued_paths != havoc_queued) {
    if (perf_score <= HAVOC_MAX_MULT * 100) {
      stage_max  *= 2;
      perf_score *= 2;
    }
    havoc_queued = queued_paths;
  }
}

If havoc keeps finding new paths, stage_max and perf_score are doubled (within the HAVOC_MAX_MULT limit) so the stage runs a bit longer. We’ll meet perf_score again when we configure the splicing stage in the next section.

new_hit_cnt = queued_paths + unique_crashes;
if (!splice_cycle) {
  stage_finds[STAGE_HAVOC]  += new_hit_cnt - orig_hit_cnt;
  stage_cycles[STAGE_HAVOC] += stage_max;
} else {
  stage_finds[STAGE_SPLICE]  += new_hit_cnt - orig_hit_cnt;
  stage_cycles[STAGE_SPLICE] += stage_max;
}

As usual, stats are saved for later use.

Stage: splice

By default, this stage is not enabled. It is enabled when we have a full queue cycle without any new findings. Let’s see the relevant lines in main():

 /* If we had a full queue cycle with no new finds, try
    recombination strategies next. */
 if (queued_paths == prev_queued) {
   if (use_splicing) cycles_wo_finds++; else use_splicing = 1;
 } else cycles_wo_finds = 0;

Also, the -d command-line switch enables splicing; it makes the fuzzer skip the deterministic stages and turns splicing on from the start.

#ifndef IGNORE_FINDS
/************
 * SPLICING *
 ************/
/* This is a last-resort strategy triggered by a full round with no findings.
   It takes the current input file, randomly selects another input, and
   splices them together at some offset, then relies on the havoc
   code to mutate that blob. */
retry_splicing:
if (use_splicing && splice_cycle++ < SPLICE_CYCLES &&
    queued_paths > 1 && queue_cur->len > 1) {

  struct queue_entry* target;
  u32 tid, split_at;
  u8* new_buf;
  s32 f_diff, l_diff;

  /* First of all, if we've modified in_buf for havoc, let's clean that up. */
  if (in_buf != orig_in) {
    ck_free(in_buf);
    in_buf = orig_in;
    len = queue_cur->len;
  }

First restore in_buf to a pristine state.

  /* Pick a random queue entry and seek to it. Don't splice with yourself. */
  do { tid = UR(queued_paths); } while (tid == current_entry);
  splicing_with = tid;
  target = queue;

  while (tid >= 100) { target = target->next_100; tid -= 100; }
  while (tid--) target = target->next;

  /* Make sure that the target has a reasonable length. */
  while (target && (target->len < 2 || target == queue_cur)) {
    target = target->next;
    splicing_with++;
  }

  if (!target) goto retry_splicing;

The above code basically tries to find a random item in the queue to splice with. If it fails, it retries until it gets one.

  /* Read the testcase into a new buffer. */
  fd = open(target->fname, O_RDONLY);
  if (fd < 0) PFATAL("Unable to open '%s'", target->fname);
  new_buf = ck_alloc_nozero(target->len);
  ck_read(fd, new_buf, target->len, target->fname);
  close(fd);

Let’s load the target into the new_buf buffer.

  /* Find a suitable splicing location, somewhere between the first and
     the last differing byte. Bail out if the difference is just a single
     byte or so. */
  locate_diffs(in_buf, new_buf, MIN(len, target->len), &f_diff, &l_diff);

This one compares the two buffers in_buf and new_buf, looking for splice locations, i.e. positions where the two buffers differ. f_diff is the position where they first differ and l_diff is the last position, within the shorter of the two lengths, where they differ. Both are set to -1 if no difference is found.
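
locate_diffs() itself is not shown in this article; it is a small helper in afl-fuzz.c whose logic is roughly the following sketch (byte-by-byte comparison recording the first and last differing offsets):

typedef unsigned char u8;
typedef unsigned int  u32;
typedef int           s32;

/* Sketch: record first and last differing byte offsets, or -1 for both
   if the buffers are identical over the compared length. */
static void locate_diffs(u8* ptr1, u8* ptr2, u32 len, s32* first, s32* last) {

  s32 f_loc = -1;
  s32 l_loc = -1;
  u32 pos;

  for (pos = 0; pos < len; pos++) {
    if (*(ptr1++) != *(ptr2++)) {
      if (f_loc == -1) f_loc = pos;
      l_loc = pos;
    }
  }

  *first = f_loc;
  *last  = l_loc;

}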

  if (f_diff < 0 || l_diff < 2 || f_diff == l_diff) {
    ck_free(new_buf);
    goto retry_splicing;
  }

Retry if the two buffers do not differ at all, if the last differing byte falls within the first two bytes, or if there is only a single differing byte (f_diff == l_diff).

  /* Split somewhere between the first and last differing byte. */
  split_at = f_diff + UR(l_diff - f_diff);

Select a random split location between f_diff (inclusive) and l_diff (exclusive).

  /* Do the thing. */
  len = target->len;
  memcpy(new_buf, in_buf, split_at);
  in_buf = new_buf;

Do the splicing.

  ck_free(out_buf);
  out_buf = ck_alloc_nozero(len);
  memcpy(out_buf, in_buf, len);
  goto havoc_stage;
}
#endif /* !IGNORE_FINDS */

Copy the spliced buffer into out_buf, jump back to the havoc stage, and we are done.

Is that all? Nope! Let’s revisit the splicing-related configuration in the havoc code above.

} else {
  static u8 tmp[32];
  perf_score = orig_perf;
  sprintf(tmp, "splice %u", splice_cycle);
  stage_name  = tmp;
  stage_short = "splice";
  stage_max   = SPLICE_HAVOC * perf_score / havoc_div / 100;
}

The constant SPLICE_HAVOC is defined to be 32 in config.h. havoc_div is always 1; I’m not sure why. Therefore, for every 100 points of perf_score we get 32 rounds of mutations. What’s with this perf_score? There is an assignment at the beginning of fuzz_one():

  orig_perf = perf_score = calculate_score(queue_cur);

Let’s visit the definition.

/* Calculate case desirability score to adjust the length of havoc fuzzing.
   A helper function for fuzz_one(). Maybe some of these constants should
   go into config.h. */
static u32 calculate_score(struct queue_entry* q) {
  u32 avg_exec_us = total_cal_us / total_cal_cycles;

Average execution time provided by the calibration stage.

  u32 avg_bitmap_size = total_bitmap_size / total_bitmap_entries;

Average bitmap size.

  u32 perf_score = 100;

We start with an initial perf_score of 100.

  /* Adjust score based on execution speed of this path, compared to the
     global average. Multiplier ranges from 0.1x to 3x. Fast inputs are
     less expensive to fuzz, so we're giving them more air time. */
  if (q->exec_us * 0.1 > avg_exec_us) perf_score = 10;
  else if (q->exec_us * 0.25 > avg_exec_us) perf_score = 25;
  else if (q->exec_us * 0.5 > avg_exec_us) perf_score = 50;
  else if (q->exec_us * 0.75 > avg_exec_us) perf_score = 75;
  else if (q->exec_us * 4 < avg_exec_us) perf_score = 300;
  else if (q->exec_us * 3 < avg_exec_us) perf_score = 200;
  else if (q->exec_us * 2 < avg_exec_us) perf_score = 150;

Adjust according to the execution time. For example, if the test case takes more than 10x the average, perf_score is set to 10. Note that these assignments overwrite the score rather than multiply it.

  /* Adjust score based on bitmap size. The working theory is that better
     coverage translates to better targets. Multiplier from 0.25x to 3x. */
  if (q->bitmap_size * 0.3 > avg_bitmap_size) perf_score *= 3;
  else if (q->bitmap_size * 0.5 > avg_bitmap_size) perf_score *= 2;
  else if (q->bitmap_size * 0.75 > avg_bitmap_size) perf_score *= 1.5;
  else if (q->bitmap_size * 3 < avg_bitmap_size) perf_score *= 0.25;
  else if (q->bitmap_size * 2 < avg_bitmap_size) perf_score *= 0.5;
  else if (q->bitmap_size * 1.5 < avg_bitmap_size) perf_score *= 0.75;

Relative, i.e. multiplicative, adjustment according to bitmap_size. The thresholds are not exact; for instance, the first case triggers when the bitmap is more than 1/0.3 ≈ 3.33x the average, slightly above the 3x mentioned in the comment, but that is sane enough.

  /* Adjust score based on handicap. Handicap is proportional to how late
     in the game we learned about this path. Latecomers are allowed to run
     for a bit longer until they catch up with the rest. */
  if (q->handicap >= 4) {
    perf_score *= 4;
    q->handicap -= 4;
  } else if (q->handicap) {
    perf_score *= 2;
    q->handicap--;
  }

These boost the latecomers, as stated in the comment.

  /* Final adjustment based on input depth, under the assumption that fuzzing
     deeper test cases is more likely to reveal stuff that can't be
     discovered with traditional fuzzers. */
  switch (q->depth) {
    case 0 ... 3:   break;
    case 4 ... 7:   perf_score *= 2; break;
    case 8 ... 13:  perf_score *= 3; break;
    case 14 ... 25: perf_score *= 4; break;
    default:        perf_score *= 5;
  }

Relative adjustments according to the queue depth.

  /* Make sure that we don't go over limit. */
  if (perf_score > HAVOC_MAX_MULT * 100) perf_score = HAVOC_MAX_MULT * 100;
  return perf_score;
}

HAVOC_MAX_MULT is defined in config.h to be 16. The last predicate checks if perf_score is larger than 1600. If that’s the case, the upper bound is assigned. This concludes the scoring algorithm.
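
To make the scoring concrete, take a hypothetical queue entry: it runs between 3x and 4x faster than the average execution time, so the score is set to 200; its bitmap is about 2.5x the average size, so the ×2 multiplier applies; it has no handicap; and it sits at depth 5 in the queue, which gives another ×2:

\[ 100 \to 200 \to 400 \to 800 \]

That is below the 1600 cap, so during splicing this entry would get stage_max = 32 * 800 / 100 = 256 havoc rounds.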

After all stages are complete, fuzz_one() ends with cleanup.

  ret_val = 0;
abandon_entry:
  splicing_with = -1;

  /* Update pending_not_fuzzed count if we made it through the calibration
     cycle and have not seen this entry before. */
  if (!stop_soon && !queue_cur->cal_failed && !queue_cur->was_fuzzed) {
    queue_cur->was_fuzzed = 1;
    pending_not_fuzzed--;
    if (queue_cur->favored) pending_favored--;
  }
  munmap(orig_in, queue_cur->len);
  if (in_buf != orig_in) ck_free(in_buf);
  ck_free(out_buf);
  ck_free(eff_map);

  return ret_val;
#undef FLIP_BIT
}

Conclusion

Wow! Never thought we’d come this far. What have we learned in this chapter? Let’s have a recap. The bitflip stage flips consecutive bits, with 1-, 2-, 4-, 8-, 16- and 32-bit variants. The arithmetic stage adds or subtracts values in [1..35] and has 8-, 16- and 32-bit variants. The interesting stage tries to trigger weird behaviour by replacing values with constants; it also has 8-, 16- and 32-bit variants. The extras stage takes data from a dictionary provided by the user and has two variants: one overwrites, the other inserts. Similarly, the auto extras stage follows the same path, except that its dictionary is a dynamic one, auto-collected during the first bitflip stage. These are all deterministic stages, apart from the user extras (over) stage, which applies its tokens probabilistically when the dictionary holds more than MAX_DET_EXTRAS items. The havoc and splice stages run entirely probabilistically by applying randomly selected, stacked mutations. The number of rounds in those stages depends on perf_score, which is calculated for each queue item independently.

Hope you enjoyed it.