Swift: How did I do horrible things?
Warning: some assembly required
Previously, I did a horrible thing by hooking up to a Swift internal library function then probing to determine its arguments.
In today's post I want to cover how I went about doing that. Perhaps it will help you in your debugging adventures.
Taking the Dump
The first step is to dump the dylib's exported symbols so we can go sifting for executable gold. We do that using the built-in nm command. (I encourage you to check out the man page.)
For our purposes, we want to dump the Swift standard library located at <xcode.app>/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/swift/macosx. The file is libswift_stdlib_core.dylib, so we run:
nm -a libswift_stdlib_core.dylib > stdlibdump.txt
The output will contain a list of both public and private symbols:
000000000013e000 (__TEXT,__text) non-external +[SwiftObject allocWithZone:]
000000000013e030 (__TEXT,__text) non-external +[SwiftObject alloc]
000000000013e060 (__TEXT,__text) non-external +[SwiftObject class]
000000000013dff0 (__TEXT,__text) non-external +[SwiftObject initialize]
000000000013e0b0 (__TEXT,__text) non-external +[SwiftObject isMemberOfClass:]
000000000013e1f0 (__TEXT,__text) non-external +[SwiftObject isSubclassOfClass:]
000000000013dfe0 (__TEXT,__text) non-external +[SwiftObject load]
000000000013e230 (__TEXT,__text) non-external +[SwiftObject respondsToSelector:]
000000000013e080 (__TEXT,__text) non-external +[SwiftObject superclass]
Already we can see there is an Objective-C class called SwiftObject (and indeed that is the hidden base class for classes defined in Swift). In this case we're interested in demangling type names, so we just do the simple thing and search for demangle and bam: _swift_demangleSimpleClass. If you haven't already noticed, these symbols do not indicate their argument or return types. We can be reasonably sure they're executable since they're in the TEXT section, but that's about it.
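The sifting can also be scripted. Below is a minimal sketch in current Swift syntax (not the beta syntax used elsewhere in this post) that parses lines in the layout shown above and filters them by keyword. The Symbol type and both function names are made up for illustration, and the parser only understands the four-field layout from the dump above. The sample searches for alloc because those lines appear in the excerpt; for this post the keyword would be demangle.

```swift
import Foundation

// Hypothetical record for one line of the symbol dump shown above.
struct Symbol {
    let address: UInt64
    let section: String
    let isExternal: Bool
    let name: String
}

// Parse "<hex address> (<segment>,<section>) <linkage> <name>".
// The name may contain spaces, so split at most three times.
func parseSymbol(_ line: String) -> Symbol? {
    let fields = line.split(separator: " ", maxSplits: 3)
    guard fields.count == 4,
          let address = UInt64(fields[0], radix: 16) else { return nil }
    return Symbol(address: address,
                  section: String(fields[1]),
                  isExternal: fields[2] == "external",
                  name: String(fields[3]))
}

// Case-insensitive keyword search over the parsed symbols.
func findSymbols(in dump: String, matching keyword: String) -> [Symbol] {
    return dump
        .split(separator: "\n")
        .compactMap { parseSymbol(String($0)) }
        .filter { $0.name.lowercased().contains(keyword.lowercased()) }
}

// Sample lines copied from the dump above.
let sampleDump = """
000000000013e000 (__TEXT,__text) non-external +[SwiftObject allocWithZone:]
000000000013e030 (__TEXT,__text) non-external +[SwiftObject alloc]
000000000013e060 (__TEXT,__text) non-external +[SwiftObject class]
000000000013e080 (__TEXT,__text) non-external +[SwiftObject superclass]
"""

for symbol in findSymbols(in: sampleDump, matching: "alloc") {
    print(String(symbol.address, radix: 16), symbol.section, symbol.name)
}
```

Against the real stdlibdump.txt you would read the file into a string and pass it in place of sampleDump.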
What do I mean by TEXT section? That's a long and complicated topic, but in short, executable formats define different types of sections, and for historical reasons the one containing executable code is called the TEXT section.
Other sections include DATA, which has subsections such as data, which holds the initial values of static variables, bss, which holds zero-initialized statics, and const, which holds constants.
The sections are important because they change how the loader treats them. The TEXT pages are marked executable and not writable; data and bss sections are loaded read/write, no execute, and typically copy-on-write so modifications don't get paged back out to storage; const may be loaded read-only. So when you write static int x = 5, the value 5 is stored in the data section. When the code is copied from storage to memory, the code just assumes that some address Z contains that value. If the page weren't copy-on-write and your code modified x, then when your system came under memory pressure the memory manager would see the page holding the modified value as dirty and write it back to disk, changing x from 5 to the new value in the executable!
There are other behaviors too: values in bss are guaranteed to be zero immediately on load with no atomic operations or locking required, a fact that dispatch_once takes advantage of.
Needless to say, dynamic linking, loaders, and executable file formats are arcane but incredibly important functionality we rely on every day without giving it a second thought.
Guess Number 1
We know this is an internal function, probably not written in Swift, so let's guess that it expects a C string as input. Perhaps it returns an array of strings or a pointer to a struct?
@asmname("swift_demangleSimpleClass")
func demangleSimpleClass(mangledName: ConstUnsafePointer<Int8>) -> Int
let name = object_getClassName(instance)
let ret = demangleSimpleClass(name)
When I first ran this test, I didn't get a crash but I got no useful output either. Later, while replicating the results for this blog post, the code crashed with EXC_BAD_ACCESS repeatedly at the same address. Putting in an extra println() statement before the call changed the crash address from the upper part of memory to the lower part, a massive change for a 64-bit address space.
This sort of seemingly random or nonsense behavior is a clue that we probably have memory corruption and that probably means the function takes more parameters. We can infer they are pointers due to the memory access crashes; if they were integers we'd just expect a nonsense value instead, or if the crashing address were related to the first parameter's address we might infer the second parameter to be a length or offset.
At any rate, those parameters are passed in registers, so they are coming into the function as whatever random garbage happened to be left in the register. That might point to read-only memory and trigger a crash, or it might point to writable memory in some random location. Either way, not healthy for the execution of our program! The fact that putting in println() wildly changed the crashing address is another clue.
One thing that can help in this situation is enabling memory debugging like guard malloc; we're more likely to trigger an immediate crash with an invalid pointer (at the cost of greatly increasing memory usage and slowing our program down.)
Since we can control the parameters, let's see if we can at least control the crash address by adding more parameters. The question becomes how many should we add?
Disassemble This
Perhaps disassembling the function will give us some clues. I use Hopper because it has a nice pseudocode disassembly. It turns this:
[Image: raw disassembly of the function]
into this:
[Image: Hopper pseudocode for the same function]
It appears the function stores off the values from rdx, rsi, and rdi. Since we are dealing with compiled and optimized code, we can't be certain, but this smells like saving the function arguments. Why would the function bother doing that? It's a fancy thing called register scheduling and the scheduler's interaction with the Application Binary Interface, or ABI.
ABI Digression
What does ABI actually mean in practice? It's just a set of rules that every function agrees on. These rules specify which registers contain the function's arguments, where the return address is stored, and which register will contain the return value. The rules also indicate which registers the called function is responsible for preserving and which ones it can safely overwrite. By following the rules, we can be certain that we can call functions in other libraries compiled with other compilers and they can call us. That becomes extremely important when we want to interact with things like the operating system, which will quite rudely kill your program if it chooses to ignore the rules.
The ABI is intimately tied to the hardware because the rules involve the number and size of the registers; the four most common platforms are 32-bit x86, 64-bit x86, 32-bit ARM, and 64-bit ARM. Microsoft has their own rules for Windows but everyone else has mostly settled on the Unix System V rules. Since we are disassembling the 64-bit Mac OS version of the Swift standard library, we care about the 64-bit x86 rules.
Remember that the function is immediately saving rdx, rsi, and rdi. It just so happens that the 64-bit x86 ABI says up to six parameters can be passed in registers rdi, rsi, rdx, rcx, r8, r9. An educated guess is this function expects three arguments and it is saving a copy of the original arguments because it wants to compare against them or because the scheduler decided to use the parameter registers for something else (e.g. a nested function call).
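The educated guess can be written down as a tiny heuristic. This is only a sketch in current Swift syntax: guessArgumentCount is a name I made up, it only considers integer and pointer arguments, and optimized code can save registers for other reasons, so treat the answer as a hint rather than proof.

```swift
// Integer/pointer argument registers for 64-bit x86, in the order listed above.
let integerArgumentRegisters = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]

// Heuristic: if a function saves some of the argument registers on entry,
// the last one in ABI order hints at how many arguments it expects.
func guessArgumentCount(savedRegisters: Set<String>) -> Int {
    var count = 0
    for (index, register) in integerArgumentRegisters.enumerated()
        where savedRegisters.contains(register) {
        count = index + 1
    }
    return count
}

// The disassembly showed rdx, rsi, and rdi being stored off.
let guess = guessArgumentCount(savedRegisters: ["rdx", "rsi", "rdi"])
print("Probably \(guess) arguments")
```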
Guess Number 2
Armed with this information, let's modify our function signature and try again.
@asmname("swift_demangleSimpleClass")
func demangleSimpleClass(mangledName: ConstUnsafePointer<Int8>,
ptr1:Int, ptr2:Int) -> Int
let name = object_getClassName(instance)
let ret = demangleSimpleClass(name, 0, 0)
This might look like just another crash, but it's actually a very important success. We now control the crash address, which is 0x0. We can confirm by changing the ptr1 parameter to 3, and indeed the crash then gives EXC_BAD_ACCESS at 0x3 as expected.
Guess Number 3
We don't know what the pointers are for just yet. They could be pointers to a pointer of some type, or it could be a pointer to a buffer to receive some data. Let's test our theory by allocating some buffers and passing them.
@asmname("swift_demangleSimpleClass")
func demangleSimpleClass(mangledName: UnsafePointer<Int8>,
ptr1:UnsafeMutablePointer<Int>, ptr2:UnsafeMutablePointer<Int>) -> Int
let name = object_getClassName(obj)
let p1 = UnsafeMutablePointer<Int>.alloc(100)
let p2 = UnsafeMutablePointer<Int>.alloc(100)
let ret = demangleSimpleClass(name,p1,p2)
(lldb) po p1.memory
4300268944
(lldb) x 4300268944
0x10050e590: 48 6f 72 72 69 62 6c 65 54 68 69 6e 67 73 00 e0 HorribleThings..
0x10050e5a0: 46 61 6e 63 79 00 00 e0 00 00 00 00 00 00 00 e0 Fancy...........
(lldb) po p2.memory
4300268960
Now we are getting somewhere! It appears the pointer is not to a buffer, but is in fact a pointer to a pointer to a UTF-8 string containing the module or namespace of our class. Completely by accident, lldb read past that to the contents of memory at 0x10050E5A0, which is the hex equivalent of 4300268960, the value stored in p2. Knowing that, we now have a much better educated guess. The first parameter is the mangled class name, the second is a pointer to a pointer to a string containing the module or namespace name, and the third parameter is a pointer to a pointer to a string containing the class name.
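To double-check the arithmetic, here is a small sketch in current Swift syntax that replays the lldb output above. The bytes and the two decimal values are copied from the session; the helper name cString(at:) is my own invention and it only reads from the copied bytes, not from live memory.

```swift
// Bytes copied from the lldb dump above, starting at address 0x10050e590.
let baseAddress: UInt64 = 0x10050e590
let dump: [UInt8] = [
    0x48, 0x6f, 0x72, 0x72, 0x69, 0x62, 0x6c, 0x65,
    0x54, 0x68, 0x69, 0x6e, 0x67, 0x73, 0x00, 0xe0,
    0x46, 0x61, 0x6e, 0x63, 0x79, 0x00, 0x00, 0xe0,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe0,
]

// Read the NUL-terminated string that starts at `address` inside the dump.
func cString(at address: UInt64) -> String? {
    guard address >= baseAddress,
          address - baseAddress < UInt64(dump.count) else { return nil }
    let tail = dump[Int(address - baseAddress)...]
    guard let end = tail.firstIndex(of: 0) else { return nil }
    return String(decoding: tail[..<end], as: UTF8.self)
}

let p1: UInt64 = 4300268944   // value lldb printed for p1.memory
let p2: UInt64 = 4300268960   // value lldb printed for p2.memory

print("0x" + String(p1, radix: 16), cString(at: p1) ?? "?")
print("0x" + String(p2, radix: 16), cString(at: p2) ?? "?")
```

The two values are 16 bytes apart, which is why a single 32-byte memory read happened to show both strings.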
Return Value
The last step involves figuring out that the return value is actually a boolean. It's fairly simple: call the function with a valid mangled class name, then call it with gibberish. We see that the lowest byte of our Int is always 1 when successful and 0 otherwise, but the upper bytes are random values. That indicates only the low-order part of the register is being set, which is pretty much what we'd expect for a boolean.
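A quick sketch of that observation in current Swift syntax. The two raw values are invented for illustration (I did not record the actual register contents), and boolFromRegister is a hypothetical helper that assumes the boolean lives in the lowest byte.

```swift
// Treat a raw 64-bit return value as a C-style boolean:
// keep only the lowest byte and ignore whatever is left in the rest.
func boolFromRegister(_ raw: UInt64) -> Bool {
    return UInt8(truncatingIfNeeded: raw) != 0
}

// Hypothetical raw values: identical garbage in the upper bytes,
// differing only in the lowest byte.
let rawOnSuccess: UInt64 = 0x0000_7fff_5fbf_f801
let rawOnFailure: UInt64 = 0x0000_7fff_5fbf_f800

print(boolFromRegister(rawOnSuccess), boolFromRegister(rawOnFailure))
```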
Results
As of Beta 5 I was unable to make Swift handle an inout UnsafePointer parameter properly, so I'm just treating p1 and p2 as integers. Otherwise the call is fairly straightforward.
@asmname("swift_demangleSimpleClass")
func demangleSimpleClass(mangledName: UnsafePointer<Int8>,
inout moduleName: Int,
inout className: Int) -> Bool
let name = object_getClassName(obj)
var p1 = 0, p2 = 0
let ret = demangleSimpleClass(name, &p1, &p2)
if ret {
let moduleName = String.fromCString(UnsafePointer<CChar>(p1))!
let className = String.fromCString(UnsafePointer<CChar>(p2))!
println("Module: \(moduleName), Class: \(className)")
} else {
println("Failure")
}
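For comparison, here is how the same call shape looks with typed out-pointers instead of integers. This is a sketch in current Swift syntax, not the beta syntax above, and it does not call the real runtime function: standInDemangle is a hypothetical stand-in I wrote so the example runs on its own. It assumes the old _TtC<length><module><length><class> layout for mangled class names, and it assumes the caller frees the returned strings, which is a choice made for the stand-in and not something I verified about the real function.

```swift
import Foundation

// Hypothetical stand-in with the same three-argument shape:
// mangled name in, two pointers-to-pointers out, boolean result.
func standInDemangle(_ mangled: UnsafePointer<CChar>,
                     _ moduleOut: UnsafeMutablePointer<UnsafeMutablePointer<CChar>?>,
                     _ classOut: UnsafeMutablePointer<UnsafeMutablePointer<CChar>?>) -> Bool {
    let text = String(cString: mangled)
    guard text.hasPrefix("_TtC") else { return false }
    var rest = Substring(text.dropFirst(4))
    var parts: [String] = []
    // Read two length-prefixed names: the module, then the class.
    while parts.count < 2 {
        let digits = rest.prefix(while: { ("0"..."9").contains($0) })
        guard let length = Int(digits), length > 0 else { return false }
        rest = rest.dropFirst(digits.count)
        guard rest.count >= length else { return false }
        parts.append(String(rest.prefix(length)))
        rest = rest.dropFirst(length)
    }
    moduleOut.pointee = strdup(parts[0])   // stand-in hands ownership to the caller
    classOut.pointee = strdup(parts[1])
    return true
}

// Wrapper that hides the pointer juggling behind an optional tuple.
func demangle(_ mangled: String) -> (module: String, className: String)? {
    var modulePtr: UnsafeMutablePointer<CChar>? = nil
    var classPtr: UnsafeMutablePointer<CChar>? = nil
    guard standInDemangle(mangled, &modulePtr, &classPtr),
          let module = modulePtr, let cls = classPtr else { return nil }
    defer { free(module); free(cls) }
    return (String(cString: module), String(cString: cls))
}

if let result = demangle("_TtC14HorribleThings5Fancy") {
    print("Module: \(result.module), Class: \(result.className)")
} else {
    print("Failure")
}
```

Wrapping the out-parameters in a function that returns an optional tuple keeps the unsafe pointers in one place, so the rest of the code only ever sees Swift strings.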
Final Thoughts
This was a long post touching on a wide variety of topics but I hope it demonstrates how a little bit of knowledge about the underlying platform can go a long way. These kinds of skills might not be useful every day, but when you need them they are absolutely invaluable (plus it's always fun to peer behind the curtain and see the unseen.)
Personal Notes
The movers have packed up the house and I am flying to San Francisco Tuesday to look for a place to live, with the family to follow later in September. Wish me luck, the market is absolutely insane.
I may not have time to blog for the next few weeks, but then again I told myself I wouldn't have time to blog during the move yet here we are. I suspect I am unable to stop. I hope to tackle something a bit more practical next time!
This blog represents my own personal opinion and is not endorsed by my employer.