After my recent experiment with a Metal framework GPU based particle system, I've taken the code a step further and managed to get a million particle system running at 20fps (or over 30fps if I disable the glow composite shader) on an iPad Air 2. Actually, because the new technique requires power-of-two length datasets, I've actually got a 1,048,576 particle system, but what's forty eight thousand between friends?
Here's what a million red, green and blue particles look like. This is a realtime, unadulterated screen recording from my iPad Air 2:
The technique I've used comes from this amazingly understated blog post from memkite.com. In it, Amund Tveit discusses a way to share data between the CPU and GPU. Using this technique, I no longer write back the particle data from Metal to Swift which gives a significant speed improvement.
In a nutshell, I define some constants and declare a handful of mutable pointers and a mutable buffer pointer:
let particleCount: Int = 1048576
var particlesMemory:UnsafeMutablePointer<Void> = nil
let alignment:UInt = 0x4000
let particlesMemoryByteSize:UInt = UInt(1048576) * UInt(sizeof(Particle))
var particlesVoidPtr: COpaquePointer!
var particlesParticlePtr: UnsafeMutablePointer<Particle>!
var particlesParticleBufferPtr: UnsafeMutableBufferPointer<Particle>!
When I set up the particles, I populate the pointers and use posix_memalign() to allocate the memory:
posix_memalign(&particlesMemory, alignment, particlesMemoryByteSize)
particlesVoidPtr = COpaquePointer(particlesMemory)
particlesParticlePtr = UnsafeMutablePointer<Particle>(particlesVoidPtr)
particlesParticleBufferPtr = UnsafeMutableBufferPointer(start: particlesParticlePtr, count: particleCount)
The loop to populate the particles is slightly different - I now loop over the buffer pointer:
for index in particlesParticleBufferPtr.startIndex ..< particlesParticleBufferPtr.endIndex
{
[...]
let particle = Particle(positionX: positionX, positionY: positionY, velocityX: velocityX, velocityY: velocityY)
particlesParticleBufferPtr[index] = particle
}
Inside the applyShader() function, I create a copy of the memory which is used as both the input and output buffer:
let particlesBufferNoCopy = device.newBufferWithBytesNoCopy(particlesMemory, length: Int(particlesMemoryByteSize),
options: nil, deallocator: nil)
commandEncoder.setBuffer(particlesBufferNoCopy, offset: 0, atIndex: 0)
commandEncoder.setBuffer(particlesBufferNoCopy, offset: 0, atIndex: 1)
...and after the shader has run, I put the shared memory (particlesMemory) back into the buffer pointer:
particlesVoidPtr = COpaquePointer(particlesMemory)
particlesParticlePtr = UnsafeMutablePointer(particlesVoidPtr)
particlesParticleBufferPtr = UnsafeMutableBufferPointer(start: particlesParticlePtr, count: particleCount)
For a better explanation, I'd suggest a look at the original memkite.com blog post.
I've made a new branch that uses this technique which you can access here. The original branch that uses a simple array is still available to compare and contrast.
Incredibly, this simulation runs at almost 17fps on my iPhone 6 and shows the potential of the Metal Framework combined with Swift not just for games but for some pretty serious simulation work.