Software engineer series: Fuzzing I

For those who can't afford to make mistakes

Fuzzing is a technique for finding unexpected errors, which contributes to software quality and security assurance. Unlike traditional software testing, where we specify a set of cases to check for correctness, fuzzing feeds randomly generated or mutated data directly to the application; the set of inputs the fuzzer works from is known as the corpus. In practice this is done in a clever way: the application is instrumented to gather code coverage information, which tells us which code paths each input triggers. Most fuzzing frameworks then run an evaluation loop in which they mutate the corpus, searching for an input that reaches a buggy code path.
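The mutate-and-evaluate loop described above can be sketched with a toy example. This is only an illustration, not how libFuzzer is implemented: toy_target, mutate and fuzz are hypothetical names, and the "depth" returned by toy_target stands in for the coverage information that real instrumentation would provide.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <random>
#include <utility>
#include <vector>

using Input = std::vector<std::uint8_t>;

// Hypothetical target: returns how many checks the input passed.
// Depth 3 stands for "the buggy code path was reached".
int toy_target(const Input &in) {
  if (in.size() < 1 || in[0] != 0x45) return 0;
  if (in.size() < 2 || in[1] != 0x45) return 1;
  if (in.size() < 3 || in[2] != 0x02) return 2;
  return 3;
}

// Mutation: either append a random byte or overwrite an existing one.
Input mutate(Input in, std::mt19937 &rng) {
  std::uniform_int_distribution<int> byte(0, 255);
  if (in.empty() || rng() % 4 == 0) {
    in.push_back(static_cast<std::uint8_t>(byte(rng)));
  } else {
    in[rng() % in.size()] = static_cast<std::uint8_t>(byte(rng));
  }
  return in;
}

// Evaluation loop: pick a corpus entry, mutate it, run the target, and keep
// the mutant in the corpus whenever it gets deeper than anything seen so far.
// Returns the input that reached depth 3, or nothing if the budget ran out.
std::optional<Input> fuzz(std::uint32_t seed, std::size_t max_iterations) {
  std::mt19937 rng(seed);
  std::vector<Input> corpus{Input{}};  // start from a single empty input
  int best_depth = 0;
  for (std::size_t i = 0; i < max_iterations; ++i) {
    Input candidate = mutate(corpus[rng() % corpus.size()], rng);
    const int depth = toy_target(candidate);
    if (depth == 3) return candidate;
    if (depth > best_depth) {
      best_depth = depth;
      corpus.push_back(std::move(candidate));
    }
  }
  return std::nullopt;
}
```

Keeping only the mutants that reach new depth is what makes the search feasible: guessing all three bytes at once is far less likely than discovering them one at a time.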

A simple TCP application

The example consists of a simple TCP application that processes requests.

A request consists of a header followed by the message itself. The header holds a magic number, which must have the value 0x4545, the request type, and the message length.

struct request_header {
  uint16_t magic;
  request_type type;
  uint16_t length;
};
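To make the wire format concrete, here is a minimal sketch of how a client could build a MESSAGE request. The article does not show the declaration of request_type, so the one-byte enum and its values below are assumptions, and make_message_request is a hypothetical helper. The header is copied as raw struct bytes in host byte order, because that is how parse_request reads it back.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Assumption: the real enum is not shown in the article; the underlying
// type and the numeric values here are illustrative only.
enum class request_type : std::uint8_t { HELLO = 0, GOODBYE = 1, MESSAGE = 2 };

struct request_header {
  std::uint16_t magic;
  request_type type;
  std::uint16_t length;
};

// Hypothetical helper: header bytes followed by the message bytes.
// Text longer than 65535 bytes would not fit in the 16-bit length field.
std::vector<std::uint8_t> make_message_request(const std::string &text) {
  request_header h{};
  h.magic = 0x4545;
  h.type = request_type::MESSAGE;
  h.length = static_cast<std::uint16_t>(text.size());

  std::vector<std::uint8_t> out(sizeof(request_header) + text.size());
  // Padding bytes inside the struct, if any, are copied as they are.
  std::memcpy(out.data(), &h, sizeof(request_header));
  std::memcpy(out.data() + sizeof(request_header), text.data(), text.size());
  return out;
}
```

Note that the length field is whatever the sender says it is; nothing in the format ties it to the number of bytes that actually follow the header.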

On the processing side, once we have checked the magic number, we inspect the request type. The critical path is the call to parse_message.

void parse_request(const uint8_t *raw) {
  auto h = reinterpret_cast<const request_header *>(raw);
  if (h->magic != 0x4545) {
    std::printf("Invalid magic number\n");
    return;
  }

  auto *data = raw + sizeof(request_header);

  switch (h->type) {
  case request_type::HELLO:
    std::printf("HELLO\n");
    break;
  case request_type::GOODBYE:
    std::printf("GOODBYE\n");
    break;
  case request_type::MESSAGE:
    std::printf("MESSAGE: ");
    parse_message(h, data);
    break;
  default:
    std::printf("Unknown request type\n");
    break;
  }
}

And here is its implementation:

void parse_message(const request_header *h, const uint8_t *data) {
  uint8_t msg[MSG_SZ];

  std::memcpy(msg, data, h->length);
  msg[MSG_SZ - 1] = '\0';
}

The application has a potential bug. If you have not spotted it yet, we will get there in a moment. First, let's step back and think about testing strategies for this code. Testing it by hand would require combinatorial tests that check every branch the code can reach, plus the edge cases, and that costs valuable development time. Instead, we can just use libFuzzer[1] and see what it finds.

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, std::size_t Size) {
  parse_request(Data);
  return 0;
}

To run the snippet, we need to build our code with instrumentation using -fsanitize=fuzzer and then run the resulting binary, parse_request_fuzz.

...
Invalid magic number
Invalid magic number
MESSAGE: 
UndefinedBehaviorSanitizer:DEADLYSIGNAL
==7794==ERROR: UndefinedBehaviorSanitizer: stack-overflow on address 0x7ffd6d724000 (pc 0x7f11a3f0dc07 bp 0x7ffd6d721670 sp 0x7ffd6d7215f8 T7794)
    #0 0x7f11a3f0dc07  (/lib64/libc.so.6+0x15ec07) (BuildId: 7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)

SUMMARY: UndefinedBehaviorSanitizer: stack-overflow (/lib64/libc.so.6+0x15ec07) (BuildId: 7ea8d85df0e89b90c63ac7ed2b3578b2e7728756) 
==7794==ABORTING
MS: 2 CrossOver-InsertByte-; base unit: 1c5a8986e3206987ea816de97b488280c319bc82
0x45,0x45,0x2,0x45,0x45,0xcd,
EE\002EE\315
artifact_prefix='./'; Test unit written to ./crash-d23ca2d4b8e04e5e320161d684ac16ba9b27fdef
Base64: RUUCRUXN

This tells us that something went wrong, and the fuzzer dumps the payload that caused it into a file. The payload also appears in the console output:

Hex: 0x45,0x45,0x2,0x45,0x45,0xcd,
Binary: EE\002EE\315
# Also we can get it from the dump file
cat <fuzz_dump> | xxd
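As a rough aid for reading such a payload, the sketch below decodes the first six bytes by hand. It assumes a little-endian layout with a one-byte request type followed by one padding byte, which is a guess: the real layout depends on how request_type is declared and on the compiler. decoded_header and decode_header are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

struct decoded_header {
  std::uint16_t magic;
  std::uint8_t type;
  std::uint16_t length;
};

// Assumed layout (not confirmed by the article):
//   bytes 0-1: magic, little-endian
//   byte  2  : request type
//   byte  3  : padding
//   bytes 4-5: length, little-endian
std::optional<decoded_header> decode_header(const std::uint8_t *bytes,
                                            std::size_t size) {
  if (size < 6) return std::nullopt;  // not enough bytes for a full header
  decoded_header h;
  h.magic = static_cast<std::uint16_t>(bytes[0] | (bytes[1] << 8));
  h.type = bytes[2];
  h.length = static_cast<std::uint16_t>(bytes[4] | (bytes[5] << 8));
  return h;
}
```

Under these assumptions, the bytes 0x45,0x45,0x2,0x45,0x45,0xcd would read as a valid magic number, a type of 2, and a declared length of 0xcd45, even though no message bytes follow the header at all.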

We can test the server against that input in the following way:

# Terminal 1
./server
# Terminal 2
cat <payload> | nc localhost 9000

It crashed the application, as expected. We can set up a debugging environment to inspect what failed; in my case I use lldb.

lldb ./server
(lldb) breakpoint set --file request.cpp --name parse_message
(lldb) run
# on terminal 2 send the payload
Process 49205 launched: '/workspace/out/build/x64-linux-libFuzzer/server/server' (x86_64)
Process 49205 stopped
* thread #1, name = 'server', stop reason = breakpoint 1.1
    frame #0: 0x0000000000401380 server`parse_message(h=0x00007fffffffd8f0, data="\n") at request.cpp:23:3
   20     // Remove conditional for unchecked bounds on memcpy
   21     // Discoverable via fuzzing
   22     // if (h->length < 64) {
-> 23     std::memcpy(msg, data, h->length);
   24     msg[MSG_SZ - 1] = '\0';
   25     std::printf("%s\n", msg);
   26     // }
(lldb) frame variable *h
(const request_header) *h = (magic = 17733, type = MESSAGE, length = 581)

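The problem is that we never check the message length before copying. In the session above the header declares a length of 581 bytes, while the commented-out guard in the listing suggests the limit should be 64. As a result, memcpy writes past the end of msg, and everything after it, including the printf, runs on a corrupted stack.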
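One possible fix, sketched under assumptions: MSG_SZ is taken to be 64 because the commented-out guard in the debugger listing compares against 64, and copy_message_checked is a hypothetical replacement for the copy inside parse_message, not the code from the repository. Unlike the original, it also receives the number of payload bytes that were actually received, so it can reject a header that claims more data than the input contains.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumption: the article never shows the value of MSG_SZ.
constexpr std::size_t MSG_SZ = 64;

// Copies the message only if the declared length fits both the destination
// buffer (leaving room for the terminator) and the bytes that are available.
// Returns false and leaves msg untouched otherwise.
bool copy_message_checked(std::uint16_t declared_length,
                          const std::uint8_t *data, std::size_t available,
                          std::array<char, MSG_SZ> &msg) {
  if (declared_length >= MSG_SZ) return false;    // would overflow msg
  if (declared_length > available) return false;  // would read past the input
  if (declared_length != 0) {
    std::memcpy(msg.data(), data, declared_length);
  }
  msg[declared_length] = '\0';
  return true;
}
```

The same idea applies one level up: the fuzz target receives a Size argument, so a size-aware parse_request could refuse any input shorter than the header before reading the magic number.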

In the next part of the series, we will cover fuzztest, another library that gives you two options: lightweight fuzz testing, where you define some bounds and data types to test for a time interval, or instrumented testing, which runs until it encounters an error.

Source code: A tour into Fuzzing repository

Remarks

[1] llvm.org/docs/LibFuzzer.html

[2] Fuzzing against the machine

[3] Low level programming - finding bugs in software is easier than you think