Cross-process Rendering
Surfaces save us
The proper way to render across processes on Apple platforms is to use an IOSurface.
Though the headers don't declare it, IOSurface and IOSurfaceRef are toll-free bridged. If calling an API that takes IOSurfaceRef from Swift, use unsafeBitCast(surface, to: IOSurfaceRef.self).
An IOSurface is a kernel-managed chunk of texture memory that can be paged on or off the GPU automatically and shared across processes. When it is shared across processes, no data is copied. In fact, if you render into a surface from a Metal shader, the pixels stay on the GPU (a lie, see below). Your application can then display the surface and the pixels still stay on the GPU. Nothing gets copied anywhere, not even when your UI application submits updates to the render server. This makes IOSurface very efficient.
The lie: On macOS the system may decide it needs more GPU memory for something else. If it does, the surface may get paged off the GPU. That is completely transparent to you, so you can effectively ignore it most of the time. This only applies to discrete GPUs on macOS. For integrated GPUs and all other platforms, memory is unified, so there is no paging.
To create a surface:
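import CoreMedia   // kCMPixelFormat_32BGRA
import IOSurface

let props: [IOSurfacePropertyKey : Any] = [
    .width: 1024,
    .height: 768,
    .pixelFormat: kCMPixelFormat_32BGRA,
    .bytesPerElement: 4  // BGRA: four 8-bit channels
]
// The initializer is failable, so unwrap before use
guard let surface = IOSurface(properties: props) else {
    fatalError("Could not create IOSurface")
}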
If you don't specify the allocation size or bytes per row, those values will be calculated automatically from the dimensions and pixel format. Be sure your bytesPerElement matches your pixel format.
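As a rough sketch of what the automatic calculation gives you, the following creates a surface and prints the derived values. The only assumption made here is that bytesPerRow is at least width × bytesPerElement; the system may pad rows for alignment, so don't rely on an exact figure.

```swift
import CoreMedia   // kCMPixelFormat_32BGRA
import IOSurface

let width = 1024, height = 768, bytesPerElement = 4

guard let surface = IOSurface(properties: [
    .width: width,
    .height: height,
    .pixelFormat: kCMPixelFormat_32BGRA,
    .bytesPerElement: bytesPerElement
]) else {
    fatalError("Could not create IOSurface")
}

// Both values were filled in by the system; rows may be padded.
print("bytesPerRow:", surface.bytesPerRow)
print("allocationSize:", surface.allocationSize)
```

If you later walk the pixels from the CPU, use the reported bytesPerRow rather than computing the stride yourself.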
To pass a surface to another process use IOSurfaceCreateXPCObject or IOSurfaceCreateMachPort. For the mach port, make sure you clean it up with mach_port_deallocate() on both sides, since a live XPC object or mach port keeps the surface alive too. Otherwise you'll end up leaking surfaces until your process dies. (Technically the sending side on mach could use a move-send, but using that correctly in the face of errors is non-trivial, so I don't recommend it.)
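Here is a minimal sketch of both routes. To stay self-contained it runs the "sending" and "receiving" halves in a single process; in real code the XPC dictionary or the port would travel over your connection. The dictionary key "surface" is made up for this example.

```swift
import CoreMedia   // kCMPixelFormat_32BGRA
import Darwin      // mach_port_deallocate
import IOSurface
import XPC

guard let surface = IOSurface(properties: [
    .width: 1024, .height: 768,
    .pixelFormat: kCMPixelFormat_32BGRA, .bytesPerElement: 4
]) else {
    fatalError("Could not create IOSurface")
}
let surfaceRef = unsafeBitCast(surface, to: IOSurfaceRef.self)

// XPC route: box the surface and put it in a message dictionary.
let message = xpc_dictionary_create(nil, nil, 0)
xpc_dictionary_set_value(message, "surface", IOSurfaceCreateXPCObject(surfaceRef))

// Receiving side: unbox it again. ARC releases the XPC objects for us.
let viaXPC = xpc_dictionary_get_value(message, "surface")
    .flatMap { IOSurfaceLookupFromXPCObject($0) }

// Mach route: the port is a send right that we own and must release.
let port = IOSurfaceCreateMachPort(surfaceRef)
let viaMach = IOSurfaceLookupFromMachPort(port)  // what the receiver would do
mach_port_deallocate(mach_task_self_, port)      // each process drops its own right

// Both lookups should refer to the surface we started with.
print(viaXPC.map(IOSurfaceGetID) == IOSurfaceGetID(surfaceRef))
print(viaMach.map(IOSurfaceGetID) == IOSurfaceGetID(surfaceRef))
```

Because this demo never leaves the process there is only one send right, so it is deallocated once. Across processes, the sender and the receiver each hold their own right and each must release it.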
Metal
For Metal textures, create the surface with the correct parameters, then use device.makeTexture(descriptor:iosurface:plane:) (newTextureWithDescriptor:iosurface:plane: in Objective-C) to create the texture from the surface. This can be used to have an XPC service that does the rendering while displaying the results in the main application (this is mostly relevant for macOS).
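A sketch of the texture creation, assuming a single-plane BGRA surface like the one created earlier. The helper name makeSurfaceTexture and the usage flags are my choices for illustration; pick the pixel format and usage that match your own surface and renderer.

```swift
import CoreMedia   // kCMPixelFormat_32BGRA
import IOSurface
import Metal

/// Hypothetical helper: wraps a 32BGRA surface in a Metal texture.
func makeSurfaceTexture(device: MTLDevice, surface: IOSurface) -> MTLTexture? {
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .bgra8Unorm,   // must agree with the surface's pixel format
        width: surface.width,
        height: surface.height,
        mipmapped: false)
    descriptor.usage = [.renderTarget, .shaderRead]

    // The texture shares the surface's memory; no pixels are copied.
    return device.makeTexture(
        descriptor: descriptor,
        iosurface: unsafeBitCast(surface, to: IOSurfaceRef.self),
        plane: 0)
}

// Usage: skipped quietly on machines without a Metal device.
if let device = MTLCreateSystemDefaultDevice(),
   let surface = IOSurface(properties: [
       .width: 1024, .height: 768,
       .pixelFormat: kCMPixelFormat_32BGRA, .bytesPerElement: 4
   ]),
   let texture = makeSurfaceTexture(device: device, surface: surface) {
    print("texture size:", texture.width, texture.height)
}
```

Anything a render pass draws into this texture lands in the surface, so the process holding the other end sees it without a copy.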
If you are using NSXPCConnection you can use MTLSharedTextureHandle instead.
Display
To display the content of an IOSurface, create a CALayer and set its contents property to the surface. If you are rendering live frames you may need to tickle the layer to let the render server know the contents of the surface have changed (e.g. by setting the property again or calling setNeedsDisplay()). The background process would send you a "frame updated" message each frame, but there is no need to send the surface again.
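The display side might look something like the sketch below. SurfaceLayerPresenter and frameDidUpdate() are hypothetical names; how the "frame updated" message reaches you depends on your IPC setup. It uses the "set the property again" form of the tickle described above, and I'd expect you to call it on the main thread like any other layer update.

```swift
import CoreMedia   // kCMPixelFormat_32BGRA
import IOSurface
import QuartzCore

/// Hypothetical wrapper that shows one surface in one layer.
final class SurfaceLayerPresenter {
    let layer = CALayer()
    private let surface: IOSurface

    init(surface: IOSurface) {
        self.surface = surface
        layer.contents = surface   // hand the surface to Core Animation once
    }

    /// Call this when the renderer reports that a new frame is ready.
    func frameDidUpdate() {
        // Same surface object, assigned again so the layer is marked dirty.
        layer.contents = surface
    }
}

// Usage
if let surface = IOSurface(properties: [
    .width: 1024, .height: 768,
    .pixelFormat: kCMPixelFormat_32BGRA, .bytesPerElement: 4
]) {
    let presenter = SurfaceLayerPresenter(surface: surface)
    presenter.frameDidUpdate()
}
```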
CPU Access
If you want to read or write the surface from CPU code you need to "lock" it so the system can copy data back to system RAM if required, then copy your mutations back to GPU memory if you locked for writing. For example:
surface.lock(options: .readOnly, seed: nil)
defer { surface.unlock(options: .readOnly, seed: nil) }
// Use surface.baseAddress to read the pixel data
// Make sure to step by bytesPerRow for each row
You must pass the same options to the lock and unlock calls or Bad Things™️ will happen. Note that there is no public API to issue a fence with the lock, so it is up to you to synchronize with any GPU rendering. For example, if you want to read back the results of a Metal command, add a completion handler to the command buffer. Once that handler is called, the contents of the IOSurface on the GPU are correct; issuing the lock then ensures they are synchronized to CPU memory (if needed), and you are clear to read.
Always use .readOnly if you are just reading, otherwise you pay the cost of copying the texture back to the GPU for no reason.
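To make the read path concrete, here is a sketch of a read-only copy that steps by bytesPerRow. copyPixelRows is a hypothetical helper, and it assumes a single-plane surface whose visible row length is width × bytesPerElement.

```swift
import CoreMedia   // kCMPixelFormat_32BGRA
import IOSurface

/// Hypothetical helper: copies each row's visible bytes out of the surface.
func copyPixelRows(from surface: IOSurface) -> [[UInt8]] {
    surface.lock(options: .readOnly, seed: nil)
    defer { surface.unlock(options: .readOnly, seed: nil) }  // same options as the lock

    // Visible bytes per row, without any trailing padding.
    let rowLength = surface.width * surface.bytesPerElement
    let base = UnsafeRawPointer(surface.baseAddress)

    return (0..<surface.height).map { row in
        // Rows start bytesPerRow apart, which may be more than rowLength.
        let start = base.advanced(by: row * surface.bytesPerRow)
        return Array(UnsafeRawBufferPointer(start: start, count: rowLength))
    }
}

// Usage
if let surface = IOSurface(properties: [
    .width: 1024, .height: 768,
    .pixelFormat: kCMPixelFormat_32BGRA, .bytesPerElement: 4
]) {
    let rows = copyPixelRows(from: surface)
    print("rows:", rows.count, "bytes per row:", rows[0].count)
}
```

When the pixels come from Metal, call a helper like this from the command buffer's completed handler (addCompletedHandler) so the GPU work has finished before you lock.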
Locking a surface is not a synchronization mechanism! There is no kernel lock held for the duration, so if you schedule another Metal job against the surface while it is "locked", or attempt to obtain the lock from another process, those operations will proceed in parallel. It is up to you to coordinate exclusive access via some other mechanism. Touching a surface from the CPU is not a recipe for high performance, so it should be avoided when possible. The one exception is to initialize the surface with the contents of an asset. Note that setting a CALayer's contents property to a surface doesn't trigger a read. The surface is sent to the render server, which then composites on the GPU, so the pixels aren't copied.
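For that one-time initialization, a write lock plus a row-by-row copy is enough. In this sketch the "asset" is just a tightly packed byte array standing in for decoded image data, and upload(_:to:) is a hypothetical helper for a single-plane surface.

```swift
import CoreMedia   // kCMPixelFormat_32BGRA
import IOSurface

/// Hypothetical helper: copies tightly packed pixel bytes into a surface.
func upload(_ pixels: [UInt8], to surface: IOSurface) {
    let packedRow = surface.width * surface.bytesPerElement
    precondition(pixels.count == packedRow * surface.height,
                 "pixel data does not match the surface dimensions")

    // Empty options means a read/write lock; unlock with the same options.
    surface.lock(options: [], seed: nil)
    defer { surface.unlock(options: [], seed: nil) }

    pixels.withUnsafeBytes { source in
        for row in 0..<surface.height {
            // Destination rows are bytesPerRow apart; source rows are packed.
            let destination = surface.baseAddress.advanced(by: row * surface.bytesPerRow)
            destination.copyMemory(from: source.baseAddress! + row * packedRow,
                                   byteCount: packedRow)
        }
    }
}

// Usage: fill a small surface with opaque mid-gray.
if let surface = IOSurface(properties: [
    .width: 4, .height: 2,
    .pixelFormat: kCMPixelFormat_32BGRA, .bytesPerElement: 4
]) {
    let gray: [UInt8] = [0x80, 0x80, 0x80, 0xFF]   // B, G, R, A
    upload(Array((0..<8).map { _ in gray }.joined()), to: surface)
}
```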
You probably don't need the seed parameter; it gets incremented each time the surface is updated. You can get it much cheaper with the same-named property. See the details in the header if you want more information.
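If you do want to use the seed, one possible pattern is to remember the last value you saw and compare it against the property. SurfaceChangeTracker is a hypothetical type; check the header for exactly when the seed is incremented before relying on this.

```swift
import CoreMedia   // kCMPixelFormat_32BGRA
import IOSurface

/// Hypothetical helper that remembers the last seed it observed.
struct SurfaceChangeTracker {
    private var lastSeed: UInt32

    init(surface: IOSurface) {
        lastSeed = surface.seed
    }

    /// Returns true if the seed moved since the previous check.
    mutating func hasChanged(_ surface: IOSurface) -> Bool {
        let current = surface.seed   // cheap property read, no lock needed
        defer { lastSeed = current }
        return current != lastSeed
    }
}

// Usage
if let surface = IOSurface(properties: [
    .width: 1024, .height: 768,
    .pixelFormat: kCMPixelFormat_32BGRA, .bytesPerElement: 4
]) {
    var tracker = SurfaceChangeTracker(surface: surface)
    print("changed:", tracker.hasChanged(surface))
}
```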
This blog represents my own personal opinion and is not endorsed by my employer.