Swift: Let's Query DNS

Interop in the real world

Russ Bishop

July 19, 2014

So far we've been going over some of the interop capabilities in Swift, but today I want to switch gears a bit and actually put this stuff to use. Along the way we'll discover some annoyances and take opportunities not available in Objective-C to remedy them.

Let's say we want to determine if a host is reachable. The first step is resolving its name, and for that we can use the getaddrinfo() function. This function is IPv4 vs IPv6 agnostic and combines hostname lookup with service lookup, which usually saves us a lot of trouble.

func getaddrinfo(node: CString, 
      service: CString, 
      hints: ConstUnsafePointer<addrinfo>, 
      result: UnsafePointer<UnsafePointer<addrinfo>>) 
-> Int32

Here we pass the node (host) name and the service (name or port), along with an addrinfo struct that gives some hints about what sort of address family and protocol we want. The result is a pointer to a pointer to another addrinfo struct; as we'll see a little later, this result is actually a linked list of structs we can traverse to enumerate all the responses.

Throughout this post I'm going to gloss over some of the details on the structs and parameters - check out the OS X man pages for the full explanation.

Making the Call

var hints = addrinfo(
    ai_flags: 0,
    ai_family: AF_UNSPEC,
    ai_socktype: SOCK_STREAM,
    ai_protocol: IPPROTO_TCP,
    ai_addrlen: 0,
    ai_canonname: nil,
    ai_addr: nil,
    ai_next: nil)

let host:CString = "www.apple.com"
let port:CString = "http" //could use "80" here
var result = UnsafePointer<addrinfo>.null()

let error = getaddrinfo(host, port, &hints, &result)

First we declare the hints:addrinfo struct. In Swift, we're required to initialize the struct before using it so we must provide values for all the fields. The only ones we care about are the family as Unspecified (meaning IPv4 or IPv6), socket type (stream instead of datagram), and protocol TCP.

You might be tempted to declare hints with let, but if you try that you'll get an error trying to pass it to getaddrinfo. Why? The parameter is declared in C as const struct addrinfo * __restrict, which is a pointer to a constant struct addrinfo, so getaddrinfo itself promises not to modify it. The restriction comes from Swift: the & operator can only take the address of a mutable variable, so hints has to be a var even though nothing ever writes to it.

Aside: __restrict declares that the pointer has no aliases, which allows a lot of optimizations that are normally impossible. Aliases are bad because they force the compiler to assume in function void doIt(int * a, int * b) that a and b might happen to point to the same memory location.

Consider for(...) { *a = *a + *b; }. With aliasing, the compiler cannot keep *a and *b in registers because a write through a might also be modifying the value b points to.

__restrict promises the compiler that this can't happen, allowing it to move both values into registers, then writing the register value back out to memory only when the loop exits.

It almost goes without saying that Swift is largely immune to this problem since we don't work with pointers directly most of the time; the default assumption is no aliasing.
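To make the aliasing problem concrete, here is a minimal sketch of that loop written against raw pointers. It assumes a current Swift toolchain (the listings in this post use the 2014 beta syntax), and accumulate is a made-up name for illustration. The same function gives different answers depending on whether the two pointers overlap, which is exactly why the compiler can't cache the values without a no-alias promise.

```swift
// Adds the value behind b into the value behind a, `times` times.
// Both parameters are raw pointers, so they may refer to the same memory.
func accumulate(_ a: UnsafeMutablePointer<Int>, _ b: UnsafeMutablePointer<Int>, times: Int) {
    for _ in 0..<times {
        a.pointee = a.pointee + b.pointee
    }
}

// Distinct storage: y never changes, so x grows by 1 on every pass.
var x = 1
var y = 1
accumulate(&x, &y, times: 3)

// Aliased storage: every write through a also changes what b reads, so z doubles.
var z = 1
withUnsafeMutablePointer(to: &z) { p in accumulate(p, p, times: 3) }

print(x, z)
```

With distinct variables x ends up as 4; with the aliased pointer z ends up as 8.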

Next we declare the host and service we are interested in as CString; Swift has automatic conversion via the StringLiteralConvertible protocol, though a quirk causes the generated associated type signature on CString to show up as conversion from CString to CString which makes no sense. In reality it means you can convert from String to CString.

The most interesting declaration is result. Why aren't we declaring an UnsafePointer<UnsafePointer<addrinfo>>? It turns out that would be redundant, cluttering up our code with indirection that isn't required. Instead we just pass a reference to result with & and Swift will automatically turn that into an UnsafePointer<UnsafePointer<addrinfo>> for us.

Why use UnsafePointer<addrinfo>.null()? Because the function we are calling is going to allocate all the addrinfo structs for us. Calling UnsafePointer<addrinfo>.alloc(1) would allocate enough memory to hold one addrinfo struct and store that address, which getaddrinfo would promptly overwrite with a different address, leaking memory.
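For comparison, here is roughly how the same call looks on a current Swift toolchain. Treat it as a sketch under assumptions rather than a drop-in replacement: resolve(host:service:) is a hypothetical helper, the null pointer becomes an optional set to nil, and I've added AI_NUMERICHOST so the example only accepts literal addresses and runs without touching the network.

```swift
import Foundation

// Hypothetical helper: returns getaddrinfo's status and the head of the result list.
// The caller owns the list and must hand it to freeaddrinfo.
func resolve(host: String, service: String) -> (status: Int32, head: UnsafeMutablePointer<addrinfo>?) {
    var hints = addrinfo()              // zero-initialized; set only the fields we care about
    hints.ai_family = AF_UNSPEC         // IPv4 or IPv6
    #if os(Linux)
    hints.ai_socktype = Int32(SOCK_STREAM.rawValue)   // glibc imports this constant as an enum
    #else
    hints.ai_socktype = SOCK_STREAM
    #endif
    hints.ai_flags = AI_NUMERICHOST     // assumption for this sketch: no DNS lookup

    var head: UnsafeMutablePointer<addrinfo>? = nil   // getaddrinfo allocates the list for us
    let status = getaddrinfo(host, service, &hints, &head)
    return (status, head)
}

let lookup = resolve(host: "127.0.0.1", service: "80")
if lookup.status == 0 {
    print("resolved, first family:", lookup.head!.pointee.ai_family)
    freeaddrinfo(lookup.head)
} else {
    print("lookup failed:", String(cString: gai_strerror(lookup.status)))
}
```

Note that hints is still a var here, for the same reason as before: & needs something mutable to point at.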

Using the Results

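if(error == 0) {
    for var res = result; res; res = res.memory.ai_next {
        if res.memory.ai_family == AF_INET {
            //inet_ntop wants the in_addr inside the sockaddr_in, not the sockaddr itself
            var addr = UnsafePointer<sockaddr_in>(res.memory.ai_addr).memory.sin_addr
            var buffer = UnsafePointer<CChar>.alloc(Int(INET_ADDRSTRLEN))
            if inet_ntop(res.memory.ai_family,
                &addr, buffer, UInt32(INET_ADDRSTRLEN))
            {
                let ipAddress = String.fromCString(CString(buffer))
                println("IPv4 \(ipAddress) for host \(host):\(port)")
            }
            buffer.dealloc(Int(INET_ADDRSTRLEN)) //we allocated it, so we free it
        }
        else if res.memory.ai_family == AF_INET6 {
            //same story for IPv6: pass the in6_addr, not the sockaddr
            var addr = UnsafePointer<sockaddr_in6>(res.memory.ai_addr).memory.sin6_addr
            var buffer = UnsafePointer<Int8>.alloc(Int(INET6_ADDRSTRLEN))
            if inet_ntop(res.memory.ai_family,
                &addr, buffer, UInt32(INET6_ADDRSTRLEN))
            {
                let ipAddress = String.fromCString(CString(buffer))
                println("IPv6 \(ipAddress) for host \(host):\(port)")
            }
            buffer.dealloc(Int(INET6_ADDRSTRLEN))
        }
    }

    //free the chain
    freeaddrinfo(result)
}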

First, we dutifully check the error code. Then we set up a for loop that assigns result to a new variable res, continues as long as res isn't null, and follows the next pointer in the linked list on each iteration. We're using the fact that UnsafePointer implements the LogicValue protocol to make the for loop look almost exactly like it would in C. This loop will just follow the linked list chain of pointers until it reaches the end. freeaddrinfo takes care of freeing the entire linked list, for which I am very thankful since doing it correctly would be a bit of a pain.
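The traversal itself is worth isolating. Here's a minimal sketch, again assuming a current Swift toolchain, that walks an addrinfo chain with sequence(first:next:) instead of a C-style for loop. families(in:) is a hypothetical helper, and the two-node list is built by hand purely so the example runs without a network; a real list would come from getaddrinfo and go back to freeaddrinfo.

```swift
import Foundation

// Follows ai_next until it reaches nil and collects each node's address family.
func families(in head: UnsafeMutablePointer<addrinfo>?) -> [Int32] {
    guard let first = head else { return [] }
    return sequence(first: first, next: { $0.pointee.ai_next }).map { $0.pointee.ai_family }
}

// Hand-built list: firstNode (IPv4) -> second (IPv6) -> nil.
let second = UnsafeMutablePointer<addrinfo>.allocate(capacity: 1)
second.initialize(to: addrinfo())
second.pointee.ai_family = AF_INET6

let firstNode = UnsafeMutablePointer<addrinfo>.allocate(capacity: 1)
firstNode.initialize(to: addrinfo())
firstNode.pointee.ai_family = AF_INET
firstNode.pointee.ai_next = second

let walked = families(in: firstNode)
print(walked == [AF_INET, AF_INET6])

// We allocated these nodes ourselves, so we release them ourselves.
firstNode.deallocate()
second.deallocate()
```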

You'll notice we use UnsafePointer<T>.memory to dereference the pointer and access the underlying struct. We want to use the inet_ntop function to convert from network representation to presentation. In normal human words, we want to convert from network byte order binary to a readable string.

func inet_ntop(family: Int32, 
      addrInfoStruct: ConstUnsafePointer<()>, 
      destBuffer: UnsafePointer<Int8>, 
      bufferSize: socklen_t) 
-> CString

Now we have a problem; we need to give inet_ntop a destination buffer to write the string into. We have a few ways of accomplishing that in Swift. We could declare a [Int8] array or possibly use ContiguousArray<T>, but here I've chosen to use an UnsafePointer<CChar> to just directly allocate some memory. I don't think any of us really know what idiomatic Swift is or much about best practices just yet, so until the community settles on a preferred direction we'll just continue to wing it.
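For the curious, the array route can be sketched like this on a current Swift toolchain. presentation(of:) is a hypothetical helper name, and I'm building the address with inet_pton so there is a known input to format. The upside of the array is that nobody has to remember to free anything.

```swift
import Foundation

// Formats an IPv4 address using a plain [CChar] as the destination buffer.
func presentation(of address: in_addr) -> String? {
    var address = address   // inet_ntop takes a pointer, so we need a mutable copy
    var buffer = [CChar](repeating: 0, count: Int(INET_ADDRSTRLEN))
    return buffer.withUnsafeMutableBufferPointer { buf -> String? in
        // The pointer is only valid inside this closure, so build the String here.
        guard inet_ntop(AF_INET, &address, buf.baseAddress, socklen_t(buf.count)) != nil else {
            return nil
        }
        return String(cString: buf.baseAddress!)
    }
}

var loopback = in_addr()
inet_pton(AF_INET, "127.0.0.1", &loopback)   // a known address, no lookup required
print(presentation(of: loopback) ?? "conversion failed")
```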

We have some annoying casting in there and once we have a valid result we've gotta convert from the other direction to get a Swift String.

Casting Pointers

The sockaddr struct in ai_addr is actually treated somewhat like a union in that it is really a pointer to a different struct type that must be cast to the appropriate type depending on the protocol family, e.g. sockaddr_in or sockaddr_in6. How can we accomplish that in Swift?

Turns out it's relatively easy:

if res.memory.ai_family == AF_INET6 {
    let s = UnsafePointer<sockaddr_in6>(res.memory.ai_addr).memory
    let opaqueAddr = s.sin6_addr
    let portNum = s.sin6_port
}

UnsafePointer<T> has an initializer that takes an UnsafePointer<U> where U is any type you like, thereby converting the pointer from one type to another. Just as it would be in C, you should be very careful because this is an unsafe operation. Swift will not complain whatsoever if you convert an UnsafePointer<Int> to an UnsafePointer<sockaddr_in6>; you'll just start reading or writing random memory when you access the pointer.
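On a current Swift toolchain the same cast is usually spelled with withMemoryRebound, which scopes the reinterpretation to a closure. This is a sketch under that assumption; port(ofIPv4:) is a hypothetical helper, and the sockaddr_in is built by hand to stand in for whatever ai_addr would point at.

```swift
import Foundation

// Reads the port from a generic sockaddr pointer by temporarily viewing it as sockaddr_in.
// Only valid when the address family really is AF_INET; nothing here checks that for you.
func port(ofIPv4 address: UnsafePointer<sockaddr>) -> UInt16 {
    return address.withMemoryRebound(to: sockaddr_in.self, capacity: 1) { ipv4 in
        UInt16(bigEndian: ipv4.pointee.sin_port)   // stored in network byte order
    }
}

var v4 = sockaddr_in()
v4.sin_family = sa_family_t(AF_INET)
v4.sin_port = UInt16(8080).bigEndian

// View the sockaddr_in as a plain sockaddr, the way ai_addr hands it to us, then read it back.
let recovered = withUnsafePointer(to: &v4) { pointer in
    pointer.withMemoryRebound(to: sockaddr.self, capacity: 1) { port(ofIPv4: $0) }
}
print(recovered)
```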

Making Our Lives Easier

Back to the ugly casting and having to allocate a buffer; perhaps we can make things a bit easier on ourselves.

The first step is an extension on String that lets us skip the CString cast and pass an UnsafePointer directly:

extension String {
    ///Converts from a null-terminated vector of CChar (or Int8)
    static func fromCString(buf:UnsafePointer<CChar>) -> String? {
        return String.fromCString(CString(buf))
    }

    ///Converts from a null-terminated vector of UInt8
    static func fromCString(buf:UnsafePointer<UInt8>) -> String? {
        return String.fromCString(CString(buf))
    }
}

Not bad and definitely an improvement, but why not create our own StringBuffer class to encapsulate the buffer creation and conversion?

///Represents a vector of CChar, null-terminated, 
///with automatic conversion to String.
///Useful for APIs that want a pointer to write a 
///string result, allowing you to specify the 
///max size of the allocation.
///
///This class is initialized to be filled with null, 
///so absent any writes it is always safe to
///use description or convert to String.
///
///Warning: Writes are always responsible 
///for ensuring the terminating null is present!
class StringBuffer : Printable {
    var length:Int
    var buffer:UnsafePointer<CChar>
    init(_ capacity:Int) {
        assert(capacity > 0, "capacity must be > 0")
        self.length = capacity + 1
        self.buffer = UnsafePointer.alloc(self.length)
        self.buffer.initializeZero(self.length)
    }
    convenience init(_ capacity:Int32) {
        self.init(Int(capacity))
    }
    deinit {
        self.buffer.dealloc(self.length)
    }
    @conversion func __conversion() -> String? {
        return String.fromCString(self.buffer)
    }
    @conversion func __conversion<T>() -> UnsafePointer<T> {
        return UnsafePointer<T>(self.buffer)
    }
    var description: String {
    get {
        let s = self as String?
        if let ss = s {
            return ss
        } else {
            return "[empty StringBuffer]"
        }
    }
    }
    var ulength: UInt { get { return UInt(self.length) } }
    var ulength32: UInt32 { get { return UInt32(self.length) } }
}

Now we can accept lengths as different integer types, keep track of them to automatically deallocate the buffer when the class is deallocated, and support automatic conversion of the buffer back to String. We use one shortcut to try and make it slightly safer: we allocate one more than the requested size and initialize the entire chunk of memory to null. Since the automatic conversion is expecting a null terminated C string ([CChar]), we ensure that the automatic conversion can't run off the end of the buffer if the user fails to include the terminator.
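Here is a minimal sketch of the same idea on a current Swift toolchain, where @conversion no longer exists and Printable has become CustomStringConvertible. CStringBuffer is a hypothetical name chosen so it doesn't clash with the class above, and strncpy stands in for any C API that fills a char buffer.

```swift
import Foundation

// Owns a zero-filled C string buffer and frees it when the object goes away.
final class CStringBuffer: CustomStringConvertible {
    let length: Int
    let pointer: UnsafeMutablePointer<CChar>

    init(capacity: Int) {
        precondition(capacity > 0, "capacity must be > 0")
        length = capacity + 1                              // one extra byte for a guaranteed terminator
        pointer = UnsafeMutablePointer<CChar>.allocate(capacity: length)
        pointer.initialize(repeating: 0, count: length)   // all NUL, so reading is always safe
    }

    deinit {
        pointer.deallocate()
    }

    var description: String { String(cString: pointer) }
}

let scratch = CStringBuffer(capacity: 15)
// Write at most `capacity` bytes so the extra terminator byte is never touched.
_ = strncpy(scratch.pointer, "192.0.2.1", scratch.length - 1)
print(scratch)
```

The explicit pointer property replaces the implicit conversion, so call sites pass scratch.pointer rather than the object itself.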

My Soapbox

There is a bit of a philosophical debate among C programmers as to whether that kind of feature is desirable or not; some claim you shouldn't provide those kinds of safety mechanisms because people will come to rely on them and it can hide incorrect behavior. My own feeling is that no one is perfect and memory management bugs tend to lead to massive security vulnerabilities (Heartbleed, Sasser, Slammer, Code Red, ...) that have very real costs both in terms of money and even sometimes directly leading to people's deaths (we now know that intelligence agencies were exploiting Heartbleed, including in some countries to locate dissidents or political enemies and harm them). I greatly prefer safety and refuse to entertain any arguments to the contrary. The trail of failure is too long now to pretend all the if only fantasy scenarios will come true [1]. Many of the programs you depend on are written by dicks and idiots. They won't get memory management right. Deal with it.

Ahem.

Back to the Show

The ulength and ulength32 properties are just conveniences since Swift doesn't do implicit integer conversions, even when widening. Unfortunately the various constants involved here are just #defines, so they end up imported in Swift as Int32 even when the API involved is declared UInt32, so I made initializers that take Int or Int32 lengths and provide some computed length properties to smooth everything over. In the real version of this code I've got some assert()s in there to blow things up if you happen to allocate a 5 GB buffer then ask for the ulength32 of it.
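As a small illustration of the conversion dance, here is a sketch assuming a current Swift toolchain and a 64-bit Int. The plain initializer traps when the value doesn't fit, while the failable exactly: form lets you detect the 5 GB case instead of blowing up.

```swift
import Foundation

// INET6_ADDRSTRLEN arrives as Int32 because it is a #define; the socket APIs want socklen_t.
let declared: Int32 = INET6_ADDRSTRLEN
let asSockLen = socklen_t(declared)          // would trap at runtime if the value were negative

// For values that might not fit, the failable initializer reports it instead of trapping.
let hugeBuffer = 5_000_000_000               // assumes a 64-bit Int
let checked = UInt32(exactly: hugeBuffer)    // nil, because 5 GB does not fit in 32 bits

print(asSockLen, checked == nil)
```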

So what is this initializeZero call? Yet another extension:

extension UnsafePointer {
    ///Writes value to the pointer, optionally 
    ///repeating for repeatCount bytes.
    ///All use of this function is unsafe and 
    ///should be reviewed thoroughly
    func write(value:CChar, repeatCount:UInt = 1) {
        self.write(value, startOffset: 0, repeatCount: repeatCount)
    }

    ///Writes value to the pointer, optionally 
    ///starting at the given offset (counted in objects of T)
    ///and repeating for repeatCount bytes.
    ///All use of this function is unsafe and should be reviewed thoroughly
    func write(value:CChar, startOffset:Int = 0, repeatCount:UInt = 1) {
        memset(UnsafePointer<()>(self + startOffset), 
                Int32(UInt8(value)), repeatCount)
    }

    ///Initializes the memory pointed to by UnsafePointer to zero. 
    ///num should be the number of objects of T (same as alloc()). 
    ///Passing a larger size will stomp on memory 
    ///and represents a potentially significant security risk;
    ///All use of this function is unsafe and should be reviewed thoroughly
    func initializeZero(num:Int) {
        //memset counts bytes, so scale the object count by the size of T
        self.write(0, startOffset:0, repeatCount:UInt(num * sizeof(T)))
    }
}

This gives you a way to write to a buffer arbitrarily and could be easily extended to support writing more than one value, but in that case you already have built-in support for that with initializeFrom. The initializeZero call might be pointless if UnsafePointer<T> initializes its own memory to nulls, but in my limited testing it didn't appear to do that so better safe than sorry.
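On a current Swift toolchain the zeroing step has a built-in spelling, so a hand-rolled memset wrapper isn't needed there. A minimal sketch, assuming that toolchain: initialize(repeating:count:) counts elements rather than bytes, which removes one easy way to get the arithmetic wrong.

```swift
// Allocate room for eight UInt32 values and zero every one of them.
let count = 8
let words = UnsafeMutablePointer<UInt32>.allocate(capacity: count)
words.initialize(repeating: 0, count: count)   // count is in elements, not bytes

// Confirm that every element really is zero before using the buffer.
let allZero = (0..<count).allSatisfy { words[$0] == 0 }
print(allZero)

// Tear down in the reverse order of setup.
words.deinitialize(count: count)
words.deallocate()
```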

Malloc You Too Buddy

If UnsafePointer<T> knew the size of the object it points at, I'd put more assert()s in there to make sure the memory writes were valid, but in reality it is just a simple struct with only one field of pointer size so at runtime there is nowhere to keep that information. Needless to say, use with care. Personally, I ran this example with the malloc debugging options enabled to make sure I was really using memory correctly.

[Screenshot: Xcode scheme editor, Diagnostics tab]

To enable malloc debugging, edit your Scheme in Xcode and check the boxes on the Diagnostics tab as shown. The downside to these options is they consume much more memory since the memory is not freed and neither are Objective-C objects; their memory sticks around forever as a literal NSZombie object which causes an immediate failure if you attempt to send it any messages. The guard pages and scribble protection go even further, allocating extra unmapped pages around each valid allocation that will trigger signals if touched and overwriting all unused memory on valid pages with specific byte patterns so any write overruns can be detected. I highly encourage you to run your application with at least one good pass using these tools before release, even if you only use ARC.

All this nonsense also enhances my warm fuzzy feeling for Swift since we don't have to deal with pointers in most cases and our array accesses are bounds-checked. Other than debugging retain cycles, memory management is a mostly hands-off affair.

End Digression

Where were we? Oh yeah, let's look at our for loop now:

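    for var res = result; res; res = res.memory.ai_next {
        if res.memory.ai_family == AF_INET {
            //inet_ntop wants the in_addr inside the sockaddr_in, not the sockaddr itself
            var addr = UnsafePointer<sockaddr_in>(res.memory.ai_addr).memory.sin_addr
            var ipAddress = StringBuffer(INET_ADDRSTRLEN)
            if inet_ntop(res.memory.ai_family,
                &addr, ipAddress, ipAddress.ulength32)
            {
                println("IPv4 \(ipAddress) for host \(host):\(port)")
            }
        }
        else if res.memory.ai_family == AF_INET6 {
            //same story for IPv6: pass the in6_addr, not the sockaddr
            var addr = UnsafePointer<sockaddr_in6>(res.memory.ai_addr).memory.sin6_addr
            var ipAddress = StringBuffer(INET6_ADDRSTRLEN)
            if inet_ntop(res.memory.ai_family,
                &addr, ipAddress, ipAddress.ulength32)
            {
                println("IPv6 \(ipAddress) for host \(host):\(port)")
            }
        }
    }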

That's better and a bit clearer in intent. We know immediately what the point of ipAddress is (a buffer for a string) and can immediately infer that the constructor parameter is the size. Plus we don't have to bother converting the result, we can just use it.

Conclusion

I hope this gives you some good examples for calling C-based APIs in real Swift code.

Go Forth Swift-ly, Be JOVIAL, and C what BASIC Processing you Make... uh... Haskell.[2]

[1] If only programmers wrote better code, if only people didn't use undefined behavior, if only programmers properly sanitized their input... The list of if onlys is a long one and harkens back to the days when car makers said the same thing to justify refusing to install seat belts, air bags, and other safety features: if only people drove better, if only we had better roads... they can't and we don't.

[2] I'm sorry