Packing Bytes in Swift

It's Unsafe Pointers All The Way Down

Russ Bishop

May 12, 2016

Today I want to do a small exercise in packing some Float32s into a SQLite Binary Large Object (BLOB) column. Sure, I could use JSON, protobuf, or some other encoding. Yes, I could also use NSNumber, NSArray, and NSCoder or plists.

Instead I want to do this purely in Swift and mostly analogous to what you'd do in C because it's relatively quick, doesn't introduce any dependencies, and the decoder is really simple and could be implemented on any platform.

PointEncoder

Here is the basic interface we'll end up with:

struct PointEncoder {
  // When parsing if we get a wildly large value we can
  // assume denial of service or corrupt data.
  static let MaxPoints = 1_310_719

  // How big is our count
  private static let _sizeOfCount = sizeof(Int64.self)

  // How big a point (two Float32s) is
  private static let _sizeOfPair = 2 * sizeof(Float32.self)

  static func encodePoints(points: [CGPoint]) -> NSData?
  static func decodePoints(data: NSData) -> [CGPoint]
}

The MaxPoints number was chosen to be around 10MB of data at maximum. That's well over the limit I can even use in my case because these points have to be communicated to the server and it will drop the connection before we could even transmit that much data over cellular or slow WiFi. Each situation is different so pick a number appropriate to your specific use.

Then we need to get the sizes of some types. It is usually much easier to reason about the algorithm if you give these sizes explicit names and define them in one place, rather than spreading sizeof() all over the place.
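To sanity-check the "around 10MB" figure with those named sizes, here is a minimal sketch. It is written against current Swift (5.x), where MemoryLayout replaces the sizeof() used in this post, and it assumes 1 MB means 1024 * 1024 bytes; the lowercase constant names are mine, not part of PointEncoder.

```swift
// Names mirror the post's constants; MemoryLayout replaces Swift 2's sizeof().
let maxPoints = 1_310_719
let sizeOfCount = MemoryLayout<Int64>.size        // 8 bytes for the count header
let sizeOfPair = 2 * MemoryLayout<Float32>.size   // 8 bytes per (x, y) pair

// Size of a buffer holding the header plus maxPoints pairs.
let largestBuffer = sizeOfCount + maxPoints * sizeOfPair

// Assumption: 1 MB = 1024 * 1024 bytes.
let tenMegabytes = 10 * 1024 * 1024

print(largestBuffer == tenMegabytes)  // prints "true"
```

So MaxPoints pairs plus the header would be exactly 10MB under that assumption, and any count strictly below MaxPoints stays under it.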

Encoding

Now let's walk through the implementation of encodePoints:

guard !points.isEmpty && points.count < MaxPoints else { return nil }

// Total size of the buffer
let bufferLength = _sizeOfCount + (points.count * _sizeOfPair)
precondition(bufferLength >= (_sizeOfCount + _sizeOfPair), "Empty buffer?")
precondition(bufferLength < megabytes(10), "Buffer would exceed 10MB")

The first step is to make sure we have something to encode and that it doesn't exceed our size limits.

The next step is to figure out what the size of our buffer should be and check that it is big enough but not too big. The earlier check for isEmpty should in theory ensure that the first check never fails but if someone refactors the code later that could change. Then we check that no one tricked us into attempting to allocate a huge buffer.

This is one of those situations where I favor extra safety checks mostly due to human nature. Someone may refactor the code later and accidentally introduce a bug, but the odds that they'll remove the precondition checks are much lower... at least if they're good programmers. A precondition next to allocating memory screams "Possible danger here, be extra careful!".
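The megabytes() helper isn't shown in this post, so here is a hypothetical version of it together with the same size checks written as a standalone function in current Swift (5.x). This is only a sketch: it assumes binary megabytes, and checkedBufferLength is a name I made up. It returns nil instead of trapping, purely so the checks are easy to exercise in a test.

```swift
/// Hypothetical stand-in for the post's megabytes() helper.
/// Assumes binary megabytes: 1 MB = 1024 * 1024 bytes.
func megabytes(_ count: Int) -> Int {
    return count * 1024 * 1024
}

/// Returns the buffer size for `pointCount` points, or nil if the count is
/// empty, the arithmetic overflows, or the result reaches the 10MB cap.
func checkedBufferLength(pointCount: Int) -> Int? {
    let sizeOfCount = MemoryLayout<Int64>.size
    let sizeOfPair = 2 * MemoryLayout<Float32>.size
    guard pointCount > 0 else { return nil }

    // Multiply and add with explicit overflow reporting rather than trapping.
    let (payload, mulOverflow) = pointCount.multipliedReportingOverflow(by: sizeOfPair)
    guard !mulOverflow else { return nil }
    let (total, addOverflow) = payload.addingReportingOverflow(sizeOfCount)
    guard !addOverflow, total < megabytes(10) else { return nil }
    return total
}
```

A single point needs 16 bytes (8 for the count, 8 for the pair), which is the smallest length this function will return.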

let rawMemory = UnsafeMutablePointer<Void>.alloc(bufferLength)

// Failed to allocate memory
guard rawMemory != nil else { return nil }

The next step is to actually allocate the buffer, then bail out if we failed.

It is very difficult to make programs resilient in the face of true out-of-memory conditions. If creating a new class instance fails because we're out of memory, the process will just abort(): even simple logging or print statements often allocate memory, and the language has no way to communicate the failure anyway (doing so would effectively make all initializers failable).

However, allocating a large buffer is much more likely to fail, even when some memory is still available, because of heap fragmentation. So it makes perfect sense (especially in a constrained environment like iOS) to attempt to handle that failure gracefully.
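For readers on current Swift (5.x): UnsafeMutableRawPointer.allocate(byteCount:alignment:) returns a non-optional pointer, so the nil check above has no direct equivalent there. One way to keep the "bail out gracefully" behavior is to call malloc directly, which still reports failure by returning nil. The sketch below assumes that approach; withScratchBuffer is a hypothetical helper name, not something from this post.

```swift
import Foundation

/// Runs `body` with a malloc'd scratch buffer, returning nil if the
/// allocation fails (or the size is not positive) instead of trapping.
func withScratchBuffer<R>(
    byteCount: Int,
    _ body: (UnsafeMutableRawPointer) -> R
) -> R? {
    guard byteCount > 0, let raw = malloc(byteCount) else { return nil }
    // Memory from malloc must be released with free.
    defer { free(raw) }
    return body(raw)
}

// Usage: store a 64-bit value in the buffer and read it back.
let roundTripped = withScratchBuffer(byteCount: 8) { raw -> Int64 in
    raw.storeBytes(of: 42, as: Int64.self)
    return raw.load(as: Int64.self)
}
```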

UnsafeMutablePointer<Int64>(rawMemory).memory = Int64(points.count)

let buffer = UnsafeMutablePointer<Float32>(rawMemory + _sizeOfCount)

There's a bit to unpack here. The right-hand side is just converting points.count to a 64-bit integer so the size doesn't vary by platform (Swift's Int type matches the native integer type on the platform it is compiled for, so 32-bit on 32-bit systems, 64-bit on 64-bit systems). We don't want the user to get a crash or corrupt data if they happen to upgrade devices!

The left-hand side casts the rawMemory pointer to an Int64 pointer, then immediately assigns the 64-bit count to that memory address. A 64-bit integer is 8 bytes so the first 8 bytes of our allocated memory now contain the size.

Lastly, we advance the pointer by the size of the count (8 bytes as mentioned above) and make that our buffer pointer.
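The same header-then-payload layout can be sketched in current Swift (5.x), where raw pointers use storeBytes and load rather than a cast plus .memory. This is an illustration under that assumption, not the code from this post; writeHeaderAndOnePair is a made-up name and the values are arbitrary.

```swift
/// Writes a count header followed by one (x, y) pair, then reads everything back.
func writeHeaderAndOnePair() -> (count: Int64, x: Float32, y: Float32) {
    let sizeOfCount = MemoryLayout<Int64>.size
    let byteCount = sizeOfCount + 2 * MemoryLayout<Float32>.size
    let raw = UnsafeMutableRawPointer.allocate(
        byteCount: byteCount,
        alignment: MemoryLayout<Int64>.alignment)
    defer { raw.deallocate() }

    // First 8 bytes: the point count as a fixed-width Int64.
    raw.storeBytes(of: Int64(1), as: Int64.self)

    // Skip the header. Raw pointers advance in bytes, so adding
    // sizeOfCount lands on the first Float32.
    let floats = (raw + sizeOfCount).bindMemory(to: Float32.self, capacity: 2)
    floats[0] = 1.5
    floats[1] = -2.0

    return (raw.load(as: Int64.self), floats[0], floats[1])
}
```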

for (index, point) in points.enumerate() {
  let ptr = buffer + (index * 2)

  // Store the point values.
  ptr.memory = Float32(point.x)
  ptr.advancedBy(1).memory = Float32(point.y)
}

Next we enumerate the points. We can calculate the appropriate location in the buffer by performing arithmetic on the UnsafeMutablePointer itself. The key is that unsafe pointers in Swift know the size of the type they are working with and all adjustments are made in terms of the size of that type, not in bytes! (A Void pointer does work in terms of bytes since it knows nothing about the size of the type).

So by adding index * 2 to the base pointer we get the next location for the pair of floats. Then we assign the point's values to those memory locations in the buffer.

I used ptr.advancedBy() last because I didn't need to keep that pointer around and didn't want to make ptr mutable. That's just my personal preference. You are free to use + or advancedBy, they do the same thing.
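To make the "units of the type, not bytes" point concrete, this small sketch measures how far one step moves a Float32 pointer versus a raw pointer. It uses current Swift (5.x) names, where the Void pointer from this post corresponds to UnsafeMutableRawPointer; bytesPerStep is a hypothetical function name.

```swift
/// Returns how many bytes apart `base` and `base + 1` are for a
/// Float32 pointer and for a raw pointer.
func bytesPerStep() -> (float32: Int, raw: Int) {
    let raw = UnsafeMutableRawPointer.allocate(byteCount: 16, alignment: 8)
    defer { raw.deallocate() }
    let floats = raw.bindMemory(to: Float32.self, capacity: 4)

    // Typed pointer: + 1 moves by one Float32 (4 bytes).
    let typedStep = UnsafeRawPointer(floats).distance(to: UnsafeRawPointer(floats + 1))
    // Raw pointer: + 1 moves by exactly one byte.
    let rawStep = UnsafeRawPointer(raw).distance(to: UnsafeRawPointer(raw + 1))
    return (typedStep, rawStep)
}
```

That is why index * 2 on the Float32 pointer lands on the start of each pair: two elements of 4 bytes each is one 8-byte pair.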

return NSData(
  bytesNoCopy: rawMemory, 
  length: bufferLength, 
  deallocator: { (ptr, length) in
    ptr.destroy(length)
    ptr.dealloc(length)
})

Last but not least we return the data to our caller. We already allocated an appropriate buffer so we use the bytesNoCopy initializer and pass in the appropriate length (in bytes) and a deallocator function.

Why pass in a deallocator function? Technically you might be able to get away with NSData(bytesNoCopy:length:freeWhenDone:) but that would just be an accident of implementation. If the Swift runtime did (or does) use a different allocator than the default system malloc/free then You're Gonna Have a Bad Time.

If we happened to be holding a more complex Swift type in our buffer then proper deallocation is mandatory: you must call ptr.destroy(count) to perform any cleanup required by reference types, indirect enumeration cases, and so on or the code will leak. In this case we know Float32 and Int64 are inert bags of bits but for technical correctness I included the destroy call anyway.
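In current Swift (5.x) the same ownership hand-off is available on Data through init(bytesNoCopy:count:deallocator:) with a .custom deallocator. The sketch below assumes that API and a zero-filled buffer; makeOwnedData is a hypothetical name. The rule is the same as above: memory obtained from allocate is released with deallocate, not with whatever free happens to do.

```swift
import Foundation

/// Allocates a zeroed buffer and hands ownership of it to a Data value.
func makeOwnedData(byteCount: Int) -> Data {
    precondition(byteCount > 0, "Empty buffer?")
    let raw = UnsafeMutableRawPointer.allocate(
        byteCount: byteCount,
        alignment: MemoryLayout<Int64>.alignment)
    // Zero the bytes so the Data never exposes uninitialized memory.
    raw.initializeMemory(as: UInt8.self, repeating: 0, count: byteCount)

    // Data calls the closure when it no longer needs the buffer.
    return Data(
        bytesNoCopy: raw,
        count: byteCount,
        deallocator: .custom({ pointer, _ in pointer.deallocate() }))
}
```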

Decoding

guard
  data.bytes != nil &&
    data.length >= (_sizeOfCount + _sizeOfPair)
  else { return [] }

First we check that the bytes pointer in our NSData is not nil and that the data is at least big enough to hold the Int64 count and one pair of floats (a single point). This makes things easier on us down below so we don't need to do as much safety checking.

let rawMemory = data.bytes
let buffer = rawMemory + _sizeOfCount

// Extract the point count as an Int64
let pointCount64 = UnsafePointer<Int64>(rawMemory).memory

precondition(
  Int64(MaxPoints) < Int64(Int32.max),
  "MaxPoints would overflow on 32-bit platforms")
precondition(
  pointCount64 > 0 && pointCount64 < Int64(MaxPoints),
  "Invalid pointCount = \(pointCount64)")

let pointCount = Int(pointCount64)

We go ahead and set up our pointers. Again we cast the raw pointer to an Int64 pointer, only this time we use a non-mutable pointer because we're just reading from it.

I am leaving the count as a 64-bit value at first to ensure my checks against Int32.max won't overflow or underflow. This is a source of much undefined behavior in C and many terribly broken overflow checks like if(value + x > INT_MAX). If you stop and think about it for a minute, how can the computer represent value + x if it is already bigger than the maximum integer? The answer is of course that it can't: signed overflow is undefined behavior in C, and in practice the value usually wraps around to become negative. What happens when that huge negative value is used for something important (like malloc or is_admin()) is an exercise left to the reader.

The last line converts the point count to Int. On a 32-bit platform a value greater than Int32.max would trap and die (which is preferable to corruption). Swift is a lot safer here than C: we can't accidentally get overflow or underflow, because the program will trap at runtime if that happens. Still, it is always nice to provide more specific error messages before a process dies, so that's what the preconditions are doing here.
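As a contrast to the broken if(value + x > INT_MAX) pattern, here is a minimal sketch of overflow-aware checks in current Swift (5.x). It uses the standard library's addingReportingOverflow and Int32(exactly:); checkedSum and validatedCount are hypothetical names for illustration and are not part of PointEncoder.

```swift
/// Adds two Int32 values, returning nil instead of wrapping or trapping.
func checkedSum(_ value: Int32, _ x: Int32) -> Int32? {
    let (sum, overflowed) = value.addingReportingOverflow(x)
    return overflowed ? nil : sum
}

/// Narrows a decoded 64-bit count to Int32 only when it is positive,
/// below `limit`, and representable as an Int32.
func validatedCount(_ raw: Int64, limit: Int64) -> Int32? {
    // The range check happens in 64 bits, so it cannot overflow.
    guard raw > 0, raw < limit else { return nil }
    return Int32(exactly: raw)
}
```

The key design choice is the same one made above: do the comparison in the wider type first, then narrow.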

It is definitely possible to structure this code such that it supports >4GB arrays of points (or more than 4 billion points) on 64-bit platforms but that is entirely unnecessary for my purposes so I am just enforcing a hard limit. This also prevents me from accidentally creating values on a 64-bit system that can't be loaded on a 32-bit one (though that's entirely theoretical since all my uses for this will be far smaller).

var points: [CGPoint] = []
points.reserveCapacity(pointCount)

for ptr in (0..<pointCount).map({
  UnsafePointer<Float32>(buffer) + (2 * $0)
}) {
  points.append(
    CGPoint(
      x: CGFloat(ptr.memory),
      y: CGFloat(ptr.advancedBy(1).memory))
  )
}

return points

This is fairly straightforward. We reserve capacity in the array to avoid reallocation. It wouldn't be a huge performance hit but we already know the size so why not?

Again, the pointer is a Float32 pointer so Swift knows the size of the type. We just multiply our index by 2 to account for the points being in pairs and read the floats from memory.
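One check the walkthrough above does not show is that the count stored in the header actually fits in the bytes we were given; without it, a corrupt count below MaxPoints could read past the end of the buffer. The sketch below adds that check. It is a rough illustration in current Swift (5.x), not the code from this post: it uses [UInt8] in place of NSData, a hypothetical Point struct in place of CGPoint, a hypothetical readValue helper, and it returns an empty array on bad input instead of trapping.

```swift
/// Hypothetical stand-in for CGPoint so the sketch has no dependencies.
struct Point: Equatable {
    var x: Float32
    var y: Float32
}

/// Copies MemoryLayout<T>.size bytes starting at `offset` into a T.
/// Only meant for plain fixed-size types such as Int64 and Float32.
func readValue<T>(_ bytes: [UInt8], at offset: Int, initial: T) -> T {
    var value = initial
    withUnsafeMutableBytes(of: &value) { destination in
        bytes.withUnsafeBytes { source in
            destination.copyMemory(from: UnsafeRawBufferPointer(
                rebasing: source[offset ..< offset + MemoryLayout<T>.size]))
        }
    }
    return value
}

func decodePoints(_ bytes: [UInt8], maxPoints: Int = 1_310_719) -> [Point] {
    let sizeOfCount = MemoryLayout<Int64>.size
    let sizeOfPair = 2 * MemoryLayout<Float32>.size
    guard bytes.count >= sizeOfCount + sizeOfPair else { return [] }

    let count64 = readValue(bytes, at: 0, initial: Int64(0))
    guard count64 > 0, count64 < Int64(maxPoints) else { return [] }
    let count = Int(count64)

    // Extra check: the declared count must fit in the bytes we actually have.
    guard bytes.count - sizeOfCount >= count * sizeOfPair else { return [] }

    return (0..<count).map { index in
        let offset = sizeOfCount + index * sizeOfPair
        return Point(
            x: readValue(bytes, at: offset, initial: Float32(0)),
            y: readValue(bytes, at: offset + MemoryLayout<Float32>.size,
                         initial: Float32(0)))
    }
}

// Usage: one point, written in the machine's native byte order.
var sample: [UInt8] = []
withUnsafeBytes(of: Int64(1)) { sample.append(contentsOf: $0) }
withUnsafeBytes(of: Float32(1.5)) { sample.append(contentsOf: $0) }
withUnsafeBytes(of: Float32(-2.0)) { sample.append(contentsOf: $0) }
let decoded = decodePoints(sample)

// A header that claims 5 points when only 1 is present is rejected.
var truncated: [UInt8] = []
withUnsafeBytes(of: Int64(5)) { truncated.append(contentsOf: $0) }
truncated.append(contentsOf: sample[8...])
let rejected = decodePoints(truncated)
```

Copying bytes into a local value also sidesteps any alignment assumptions about where the buffer starts.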

A Word on Testing

It should go without saying but a type like this should definitely be run with Address Sanitizer to help catch any memory misuse issues and should definitely be heavily code reviewed before putting it into production (AFL fuzzing would also be handy to expose any problems).

I'm never 100% certain any of my threading or memory related code is free of bugs. I'm not even 100% sure this post is free of bugs. I've tested it with Address Sanitizer and Leaks without finding any problems but I firmly believe a good programmer should never be overconfident. Always be on the lookout for possible errors or mistakes (and if you see any in this post please leave a comment and let me know!)

No one is too good to write a buffer overflow and that includes you.

Conclusion

The Swift compiler is always concerned about safety, but it's also pretty chill. If you promise you're not doing anything naughty, it will totally trust you. If you have a need to do some byte manipulation or wrangle void pointers just pull up a .swift file and have at it.

Final Implementation

I've embedded a gist below with the final implementation and more comments. Feel free to use it if you find a use for it.