redpwnCTF 2021 - Pickled Onions
- Author: clubby789
- Date:
Reversing
Pickled Onions was a reversing challenge where we are given a small Python file that loads a pickle object, and checks a flag that we input. It turns out to be a nested, self-unpacking program written in the Pickle bytecode language.
Stage 1
The initial challenge file looks like this:
1__import__('pickle').loads(b'(I128\nI4\nI99\n...\x85R\x85R.')
When we run it:
What is the flag? test
Nope!
Pickle is a Python standard module allowing serialization of objects. Pickled objects can define a routine used while unpickling, and this code is included in the pickled string. The bytecode language seems to be a subset of Python. It is stack based, with some ‘memo’ slots for more persistent storage, and can build dicts, tuples and call functions.
While researching, I found the pickletools module, which allows us to disassemble a pickle string, and optionally annotate the meaning of each opcode. We will begin by disassembling the bytecode.
>>> pickletools.dis(challenge, annotate=30)
    0: (    MARK                      Push markobject onto the stack.
    1: I        INT        128        Push an integer or bool.
    6: I        INT        4          Push an integer or bool.
    9: I        INT        99         Push an integer or bool.
   13: I        INT        112        Push an integer or bool.
   18: I        INT        105        Push an integer or bool.
   23: I        INT        99         Push an integer or bool.
   27: I        INT        107        Push an integer or bool.
   32: I        INT        108        Push an integer or bool.
   37: I        INT        101        Push an integer or bool.
   42: I        INT        10         Push an integer or bool.
   46: I        INT        105        Push an integer or bool.
   51: I        INT        111        Push an integer or bool.
   56: I        INT        10         Push an integer or bool.
   60: I        INT        40         Push an integer or bool.
   64: I        INT        140        Push an integer or bool.
   69: I        INT        18         Push an integer or bool.
[ ... ]
544273: t        TUPLE      (MARK at 0) Build a tuple out of the topmost stack slice, after markobject.
544274: p    PUT        69420           Store the stack top into the memo.  The stack is not popped.
544281: c    GLOBAL     'pickle loads'  Push a global object (module.attr) on the stack.
544295: c    GLOBAL     'builtins bytes' Push a global object (module.attr) on the stack.
544311: g    GET        69420            Read an object from the memo and push it on the stack.
544318: \x85 TUPLE1                      Build a one-tuple out of the topmost item on the stack.
544319: R    REDUCE                      Push an object built from a callable and an argument tuple.
544320: \x85 TUPLE1                      Build a one-tuple out of the topmost item on the stack.
544321: R    REDUCE                      Push an object built from a callable and an argument tuple.
544322: .    STOP                        Stop the unpickling machine.
When unpickled, this string pushes thousands of ints to the stack, then builds a tuple out of them, storing it at memo position 69420. It then loads the bytes function, converting the tuple into a bytestring, then calls pickle.loads on it. We can deduce that the integers pushed are another pickled object, and extract it.
Stage 2
The stage2 pickle code is much more complex. it pushes a number of constants, which themselves appear to be more nested pickles, and a number of names such as pickledgreekoregano. It then builds these all into a dict, and calls input with the What is the flag? prompt. At this point I was stuck for a while, as I wasn’t able to follow the sequence of operations. Luckily, my teammate discovered Fickling, a module written by TrailOfBits (coincidentally, a sponsor of RedPwnCTF). This module can disassemble a pickle into a Python AST - and the built-in unparse function can transform the AST into something resembling valid Python source code. Fickling didn’t have support for the INT and DICT opcodes, so I patched these in:
 1diff --git a/fickling/pickle.py b/fickling/pickle.py
 2index ab62566..3eb46c4 100644
 3--- a/fickling/pickle.py
 4+++ b/fickling/pickle.py
 5@@ -162,6 +162,10 @@ class ConstantOpcode(Opcode):
 6         interpreter.stack.append(make_constant(self.arg))
 7 
 8 
 9+class Int(ConstantOpcode):
10+    name = "INT"
11+
12+
13 class StackSliceOpcode(Opcode):
14     def run(self, interpreter: "Interpreter", stack_slice: List[ast.expr]):
15         raise NotImplementedError(f"{self.__class__.__name__} must implement run()")
16@@ -743,6 +747,18 @@ class Tuple(StackSliceOpcode):
17         interpreter.stack.append(ast.Tuple(tuple(stack_slice), ast.Load()))
18 
19 
20+class Dict(StackSliceOpcode):
21+    name = "DICT"
22+
23+    def run(self, interpreter: Interpreter, stack_slice: List[ast.expr]):
24+        k = list()
25+        v = list()
26+        for i in range(0, len(stack_slice), 2):
27+            k.append(stack_slice[i])
28+            v.append(stack_slice[i+1])
29+        interpreter.stack.append(ast.Dict(k, v))
30+
31+
32 class Build(Opcode):
33     name = "BUILD"
Here was the result:
 1from pickle import io
 2from builtins import input
 3_var0 = input('What is the flag? ')
 4_var1 = io
 5_var1.__setstate__({'pickledhorseradish': b'\x80\x04cpickle\nio\ncio\npickledmacadamia.__add__\ncio\npickledbarberry\n\x85R.', 'pickledcoconut': b'\x80\x04cpickle\nio\ncio\npickledmacadamia.__sub__\ncio\npickledbarberry\n\x85R.', 'pickledlychee': b'\x80\x04cpickle\nio\ncio\npickledmacadamia.__xor__\ncio\npickledbarberry\n\x85R.', 'pickledcrabapple': b'\x80\x04cpickle\nio\ncio\npickledmacadamia.__eq__\ncio\npickledbarberry\n\x85R.', 'pickledportabella': b'\x80\x04cpickle\nio\ncio\npickledmacadamia.__ne__\ncio\npickledbarberry\n\x85R.', 'pickledquince': b'\x80\x04cpickle\nio\ncio\npickledmacadamia.__le__\ncio\npickledbarberry\n\x85R.', 'pickledeasternmayhawthorn': b'\x80\x04cpickle\nio\ncio\npickledmacadamia.__ge__\ncio\npickledbarberry\n\x85R.', 'pickledmonstera': b'I1\n.', 'pickledcorneliancherry': b'I0\n.', 'pickledalligatorapple': b'\x80\x04I', 'pickledboysenberry': <VERY VERY LONG PICKLED DATA> 'pickledgreekoregano': '\nI', 'pickledpupunha': ('Nope!', 'Correct!'), 'pickledximenia': _var0, 'pickledgarlic': ('pickledcorneliancherry', 'pickledboysenberry')})
 6from io import pickledximenia.__len__
 7_var2 = pickledximenia.__len__()
 8from io import pickledximenia.encode
 9_var3 = pickledximenia.encode()
10_var4 = _var1
11_var4.__setstate__({'pickledburmesegrape': _var2, 'pickledximenia': _var3})
12from builtins import print
13from io import pickledpupunha.__getitem__
14from pickle import loads
15from io import pickledgarlic.__getitem__
16from io import pickledburmesegrape.__eq__
17_var5 = pickledburmesegrape.__eq__(64)
18_var6 = pickledgarlic.__getitem__(_var5)
19from io import _var6
20_var7 = loads(_var6)
21_var8 = pickledpupunha.__getitem__(_var7)
22_var9 = print(_var8)
23result = _var9
At first it may not be immediately obvious what this is doing, but if we follow it through it begins to make sense. We see pickle.io is assigned to var1, which then has __setstate__ called upon it. setstate assigns each key/value pair as a property of pickle.io - this is so these constant strings are added to the io module, and can be accessed later on. We can rewrite this as regular Python:
 1flag = input("What is the flag? ")
 2io.pickledhorseradish = '\x80\x04cpickle\nio\ncio\npickledmacadamia.__add__\ncio\npickledbarberry\n\x85R.'
 3io.pickledcoconut = '\x80\x04cpickle\nio\ncio\npickledmacadamia.__sub__\ncio\npickledbarberry\n\x85R.'
 4io.pickledlychee = '\x80\x04cpickle\nio\ncio\npickledmacadamia.__xor__\ncio\npickledbarberry\n\x85R.'
 5io.pickledcrabapple = '\x80\x04cpickle\nio\ncio\npickledmacadamia.__eq__\ncio\npickledbarberry\n\x85R.'
 6io.pickledportabella = '\x80\x04cpickle\nio\ncio\npickledmacadamia.__ne__\ncio\npickledbarberry\n\x85R.'
 7io.pickledquince = '\x80\x04cpickle\nio\ncio\npickledmacadamia.__le__\ncio\npickledbarberry\n\x85R.'
 8io.pickledeasternmayhawthorn = '\x80\x04cpickle\nio\ncio\npickledmacadamia.__ge__\ncio\npickledbarberry\n\x85R.'
 9# Each of these pushes io.pickledmacadamia.__add__ to the stack followed by (pickledbarberry,)
10# Or another method
11io.pickledmonstera = 'I1\n.'
12io.pickledcorneliancherry = 'I0\n.'
13io.pickledalligatorapple = '\x80\x04I'
14io.pickledboysenberry = 'VERY LONG'
15io.pickledgreekoregano = '\nI'
16io.pickledpupunha =  ("Nope!", "Correct!")
17io.pickledximenia = flag
18io.pickledgarlic = ("pickledcorneliancherry", "pickledboysenberry")
19io.pickledburmesgrape = flag.len()
20io.pickledximenia = flag.encode()
21var5 = int(io.pickledburmesgrape == 64) # 1 if len(input) is 64, 0 otherwise
22var6 = pickledgarlic[var5] 				# pickledcorneliancherry if len(input) is not 64,  pickledboysenberry if it is
23var7 = loads(var6)						# loads(b'I0\n.') if len(input) is not 64, loads(b'VERY LONG') if it is
24var8 = pickledpupunha[var7]				# 'Nope!' if returned value is 0, 'Correct!' if it's 1
25print(var8)
The first few variables saved are constants which, when unpickled, will compare io.pickledmacadamia to io.pickledbarberry in some way. If our flag length is 64, it will run the pickle in pickledboysenberry - otherwise, it will skip it and return 0. Our flag is saved to pickledximenia.
Stage 3
Again running Fickling on our new pickle, we receive this output
 1from pickle import io
 2from io import pickledgreekoregano.join
 3from builtins import map
 4from builtins import str
 5_var0 = map(str, pickledximenia)
 6_var1 = b"\nI".join(_var0)
 7_var2 = io
 8_var2.__setstate__({'pickledximenia': _var1})
 9from io import pickledximenia.encode
10_var3 = pickledximenia.encode()
11_var4 = _var2
12_var4.__setstate__({'pickledximenia': _var3})
13from io import pickledalligatorapple.__add__
14from io import pickledximenia
15_var5 = pickledalligatorapple.__add__(pickledximenia)
16_var6 = _var4
17_var6.__setstate__({'pickledalligatorapple': _var5})
18from io import pickledalligatorapple.__add__
19_var7 = pickledalligatorapple.__add__(b'\n')
20_var8 = _var6
21_var8.__setstate__({'pickledalligatorapple': _var7})
22from io import pickledalligatorapple
23_var9 = _var8
24_var9.__setstate__({'pickledblackapple': pickledalligatorapple, 'pickledyellowpeartomato': 1})
If we clean it up:
1flag = "\nI".join([str(x) for x in pickledximenia]).encode()
2flag = b'\x80\x04I' + flag + b'\n'
3io.pickledblackapple = flag
4io.pickledyellowpeartomato = 1
Our flag is converted into 64 integers which are pushed onto the stack.
The next part:
1from io import pickledblackapple.__add__
2_var10 = pickledblackapple.__add__(b'p0\n0cpickle\nio\n(\x8c\x10pickledmacadamiag0\ndb(\x8c\x17pickledyellowpeartomatocio\npickledyellowpeartomato.__and__\ncio\npickledmacadamia.__gt__\nI32\n\x85R\x85Rdb(\x8c\x17pickledyellowpeartomatocio\npickledyellowpeartomato.__and__\ncio\npickledmacadamia.__lt__\nI127\n\x85R\x85Rdb0')
3_var11 = _var9
4_var11.__setstate__({'pickledblackapple': _var10})
This part is repeated 64 times, and looks like this when disassembled
>>> pickletools.dis(a, annotate=50)
    4: p    PUT        0                                  Store the stack top into the memo.  The stack is not popped.
    7: 0    POP                                           Discard the top stack item, shrinking the stack by one item.
    8: c    GLOBAL     'pickle io'                        Push a global object (module.attr) on the stack.
   19: (    MARK                                          Push markobject onto the stack.
   20: \x8c     SHORT_BINUNICODE 'pickledmacadamia'       Push a Python Unicode string object.
   38: g        GET        0                              Read an object from the memo and push it on the stack.
   41: d        DICT       (MARK at 19)                   Build a dict out of the topmost stack slice, after markobject.
   42: b    BUILD                                         Finish building an object, via __setstate__ or dict update.
   43: (    MARK                                          Push markobject onto the stack.
   44: \x8c     SHORT_BINUNICODE 'pickledyellowpeartomato' Push a Python Unicode string object.
   69: c        GLOBAL     'io pickledyellowpeartomato.__and__' Push a global object (module.attr) on the stack.
  105: c        GLOBAL     'io pickledmacadamia.__gt__'   Push a global object (module.attr) on the stack.
  133: I        INT        32                             Push an integer or bool.
  137: \x85     TUPLE1                                    Build a one-tuple out of the topmost item on the stack.
  138: R        REDUCE                                    Push an object built from a callable and an argument tuple.
  139: \x85     TUPLE1                                    Build a one-tuple out of the topmost item on the stack.
  140: R        REDUCE                                    Push an object built from a callable and an argument tuple.
  141: d        DICT       (MARK at 43)                   Build a dict out of the topmost stack slice, after markobject.
  142: b    BUILD                                         Finish building an object, via __setstate__ or dict update.
  143: (    MARK                                          Push markobject onto the stack.
  144: \x8c     SHORT_BINUNICODE 'pickledyellowpeartomato' Push a Python Unicode string object.
  169: c        GLOBAL     'io pickledyellowpeartomato.__and__' Push a global object (module.attr) on the stack.
  205: c        GLOBAL     'io pickledmacadamia.__lt__'   Push a global object (module.attr) on the stack.
  233: I        INT        127                            Push an integer or bool.
  238: \x85     TUPLE1                                    Build a one-tuple out of the topmost item on the stack.
  239: R        REDUCE                                    Push an object built from a callable and an argument tuple.
  240: \x85     TUPLE1                                    Build a one-tuple out of the topmost item on the stack.
  241: R        REDUCE                                    Push an object built from a callable and an argument tuple.
  242: d        DICT       (MARK at 143)                  Build a dict out of the topmost stack slice, after markobject.
  243: b    BUILD                                         Finish building an object, via __setstate__ or dict update.
  244: 0    POP                                           Discard the top stack item, shrinking the stack by one item.
This looks complex, but with a quick glance at the constants and functions, we can guess the purpose - checking that each character of the flag is between 32 and 127 - or 0x20 <= c <= 0x7f, i.e. the printable ASCII range.
We then reach the net part, which is where the flag checking logic actually begins:
1from io import pickledalligatorapple.__add__
2_var140 = pickledalligatorapple.__add__(b'000000000000000000000000000000000000000000000000000000000000000p0\n0cpickle\nio\n(\x8c\x10pickledmacadamiag0\n\x8c\x0fpickledbarberryI102\n\x8c\rpickledgarlic\x8c\x16pickledcorneliancherry\x8c\x11pickledbluetongue\x86dbcpickle\nloads\n\x8c\x02iocio\npickledgarlic.__getitem__\ncpickle\nloads\ncio\npickledcrabapple\n\x85R\x85R\x93\x85R.')
3from io import pickledalligatorapple.__add__
4_var141 = pickledalligatorapple.__add__(b'00000000000000000000000000000000000000000000000000000000000000p0\n00cpickle\nio\n(\x8c\x10pickledmacadamiag0\n\x8c\x0fpickledbarberryI108\n\x8c\rpickledgarlic\x8c\x16pickledcorneliancherry\x8c\x16pickledroseleafbramble\x86dbcpickle\nloads\n\x8c\x02iocio\npickledgarlic.__getitem__\ncpickle\nloads\ncio\npickledcrabapple\n\x85R\x85R\x93\x85R.')
5from io import pickledalligatorapple.__add__
6_var142 = pickledalligatorapple.__add__(b'0000000000000000000000000000000000000000000000000000000000000p0\n000cpickle\nio\n(\x8c\x10pickledmacadamiag0\n\x8c\x0fpickledbarberryI97\n\x8c\rpickledgarlic\x8c\x16pickledcorneliancherry\x8c\x0cpickledolive\x86dbcpickle\nloads\n\x8c\x02iocio\npickledgarlic.__getitem__\ncpickle\nloads\ncio\npickledcrabapple\n\x85R\x85R\x93\x85R.')
7from io import pickledalligatorapple.__add__
8_var143 = pickledalligatorapple.__add__(b'000000000000000000000000000000000000000000000000000000000000p0\n0000cpickle\nio\n(\x8c\x10pickledmacadamiag0\n\x8c\x0fpickledbarberryI103\n\x8c\rpickledgarlic\x8c\x16pickledcorneliancherry\x8c\x0fpickledvoavanga\x86dbcpickle\nloads\n\x8c\x02iocio\npickledgarlic.__getitem__\ncpickle\nloads\ncio\npickledcrabapple\n\x85R\x85R\x93\x85R.')
9_var144 = _var139
If we disassemble the first, we see:
    POP * 63
    4: p    PUT        0
    7: 0    POP
    8: c    GLOBAL     'pickle io'
   19: (    MARK
   20: \x8c     SHORT_BINUNICODE 'pickledmacadamia'
   38: g        GET        0
   41: \x8c     SHORT_BINUNICODE 'pickledbarberry'
   58: I        INT        102
   63: \x8c     SHORT_BINUNICODE 'pickledgarlic'
   78: \x8c     SHORT_BINUNICODE 'pickledcorneliancherry'
  102: \x8c     SHORT_BINUNICODE 'pickledbluetongue'
  121: \x86     TUPLE2
  122: d        DICT       (MARK at 19)
  123: b    BUILD
  124: c    GLOBAL     'pickle loads'
  138: \x8c SHORT_BINUNICODE 'io'
  142: c    GLOBAL     'io pickledgarlic.__getitem__'
  172: c    GLOBAL     'pickle loads'
  186: c    GLOBAL     'io pickledcrabapple'
  207: \x85 TUPLE1
  208: R    REDUCE
  209: \x85 TUPLE1
  210: R    REDUCE
  211: \x93 STACK_GLOBAL
  212: \x85 TUPLE1
  213: R    REDUCE
  214: .    STOP
This constructs a dict of {pickledmacadamia: <first char>, pickledbarberry: 102, pickledgarlic: (pickledbluetongue, pickledcorneliancherry)}
We then use pickledcrabapple (earlier defined as pickledmacadamia.__eq__(pickledbarberry)). This is repeated 4 times, with different numbers of pops and constants. If we convert the constants to ASCII, we get flag - this pops the first 4 characters, and checks that they are ‘flag’.
Checking Logic
The rest of the checking logic is a little more complex. They each load two characters from the flag, and compare them using sub, xor, eq, ne, le, ge. Some checks also add two characters together then use eq to check if it matches a constant. This is a perfect case for solving using Z3.
We create a stub like this:
1from z3 import *
2s = Solver()
3flag = [BitVec('flag_%s' % i, 8) for i in range(0, 64)]
4for c in flag:
5    s.add(c <= 0x7f)
6    s.add(c >= 0x20)
Then all we need to do is parse each check and convert it into a z3-style check.
 1data = open("checks.txt").read().split("\n")
 2import pickle
 3
 4for line in data:
 5    # Convert the data into a bytestring literal
 6    pickletext = eval("b'" + line + "'")
 7    example = pickletext
 8    # Create a fake stack so it will unpickle properly. The value at each index is the index, so we can track the position of the checks
 9    flag = [i for i in range(0, 64)]
10    flag = b"I" + b'\nI'.join([str(i).encode('latin1') for i in flag]) + b"\n"
11    end = example.index(b'dbc') + 1
12    # Slice it up to the parts calling functions - we only want to build the dict
13    example = example[:end]
14    example = flag + example + b"."
15    example = (pickle.loads(example))
16    # Load the 2
17    p0 = example["pickledmacadamia"]
18    p1 = example["pickledbarberry"]
19    # Check which operation is used, and output the appropriate check
20    if b"pickledhorseradish" in pickletext:
21      tmp = (int(pickletext.split(b"I")[1].split(b"\n")[0].decode()))
22      print(f"s.add(flag[{p0}] + flag[{p1}] == {tmp})")
23    if b"pickledlychee" in pickletext:
24      tmp = (int(pickletext.split(b"I")[1].split(b"\n")[0].decode()))
25      print(f"s.add(flag[{p0}] ^ flag[{p1}] == {tmp})")
26    if b"pickledcoconut" in pickletext:
27      tmp = (int(pickletext.split(b"I")[1].split(b"\n")[0].decode()))
28      print(f"s.add(flag[{p0}] - flag[{p1}] == {tmp})")
We are now able to print the z3-generated model, and read the flag:
flag{n0w_th4t5_4_b1g_p1ckl3_1n_4_p1ckl3_but_1t_n33d3d_s0m3_h3lp}