1,000,000 Particles on an iPad!

After my recent experiment with a GPU-based particle system built on the Metal framework, I've taken the code a step further and managed to get a million-particle system running at 20fps (or over 30fps if I disable the glow composite shader) on an iPad Air 2. Because the new technique requires power-of-two length datasets, I've actually got a 1,048,576 particle system, but what's forty-eight thousand between friends?

Here's what a million red, green and blue particles look like. This is a realtime, unadulterated screen recording from my iPad Air 2:

The technique I've used comes from this amazingly understated blog post from memkite.com. In it, Amund Tveit discusses a way to share data between the CPU and GPU. Using this technique, I no longer write the particle data back from Metal to Swift, which gives a significant speed improvement.

In a nutshell, I define some constants and declare a handful of mutable pointers and a mutable buffer pointer:

    let particleCount: Int = 1048576

    var particlesMemory:UnsafeMutablePointer<Void> = nil

    let alignment:UInt = 0x4000

    let particlesMemoryByteSize:UInt = UInt(1048576) * UInt(sizeof(Particle))

    var particlesVoidPtr: COpaquePointer!

    var particlesParticlePtr: UnsafeMutablePointer<Particle>!

    var particlesParticleBufferPtr: UnsafeMutableBufferPointer<Particle>!
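To make the sizes concrete, here is a minimal sketch of the arithmetic in current Swift syntax (the listings in this post use the older Swift 1.x spelling, where sizeof() did the job of MemoryLayout). It assumes Particle is nothing more than four Float fields, which matches the initializer used later in the post but is a guess at the real layout.

```swift
// Assumed layout: four 32-bit floats, matching the initializer used in this post.
struct Particle {
    var positionX: Float
    var positionY: Float
    var velocityX: Float
    var velocityY: Float
}

let particleCount = 1 << 20                       // 1,048,576, a power of two
let alignment = 0x4000                            // 16,384 bytes
let particleStride = MemoryLayout<Particle>.stride
let particlesMemoryByteSize = particleCount * particleStride

print(particleStride)                             // 16 under the assumed layout
print(particlesMemoryByteSize)                    // 16777216 (16 MiB)
print(particlesMemoryByteSize % alignment == 0)   // true
```

Under that assumption the power-of-two count is what keeps the byte size a whole multiple of the 0x4000 alignment: exactly 1,000,000 particles would need 16,000,000 bytes, which is not.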

When I set up the particles, I populate the pointers and use posix_memalign() to allocate the memory:

        posix_memalign(&particlesMemory, alignment, particlesMemoryByteSize)

        particlesVoidPtr = COpaquePointer(particlesMemory)

        particlesParticlePtr = UnsafeMutablePointer<Particle>(particlesVoidPtr)

        particlesParticleBufferPtr = UnsafeMutableBufferPointer(start: particlesParticlePtr, count: particleCount)
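For anyone following along in a newer Swift, the same allocation can be sketched roughly as below. The COpaquePointer step is no longer needed; bindMemory(to:capacity:) is the current way to view raw memory as a typed pointer. The Particle layout (four Float fields) and the shortened variable names are assumptions for illustration, not the code from the repository.

```swift
import Foundation

// Assumed layout: four 32-bit floats, matching the initializer used in this post.
struct Particle {
    var positionX: Float
    var positionY: Float
    var velocityX: Float
    var velocityY: Float
}

let particleCount = 1 << 20
let alignment = 0x4000
let byteSize = particleCount * MemoryLayout<Particle>.stride

// posix_memalign returns 0 on success and writes the address into `memory`.
var memory: UnsafeMutableRawPointer? = nil
let status = posix_memalign(&memory, alignment, byteSize)
guard status == 0, let particlesMemory = memory else {
    fatalError("posix_memalign failed with status \(status)")
}

// View the raw allocation as `particleCount` Particle values.
let particlesPtr = particlesMemory.bindMemory(to: Particle.self, capacity: particleCount)
let particles = UnsafeMutableBufferPointer(start: particlesPtr, count: particleCount)

// The typed view can now be written like an array.
particles[0] = Particle(positionX: 1, positionY: 2, velocityX: 0, velocityY: 0)

let isAligned = Int(bitPattern: particlesMemory) % alignment == 0
print(isAligned)          // true
print(particles.count)    // 1048576

free(particlesMemory)     // release the allocation once nothing uses it
```

Checking the return value matters here: if posix_memalign fails, the pointer is not valid and everything built on top of it would be too.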

The loop to populate the particles is slightly different - I now loop over the buffer pointer:

        for index in particlesParticleBufferPtr.startIndex ..< particlesParticleBufferPtr.endIndex
        {

            [...]

            let particle = Particle(positionX: positionX, positionY: positionY, velocityX: velocityX, velocityY: velocityY)

            particlesParticleBufferPtr[index] = particle

        }

Inside the applyShader() function, I create a Metal buffer that wraps the existing memory without copying it, and use that one buffer as both the input and the output:

        let particlesBufferNoCopy = device.newBufferWithBytesNoCopy(particlesMemory, length: Int(particlesMemoryByteSize),

            options: nil, deallocator: nil)

        commandEncoder.setBuffer(particlesBufferNoCopy, offset: 0, atIndex: 0)

        commandEncoder.setBuffer(particlesBufferNoCopy, offset: 0, atIndex: 1)
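In current Metal, newBufferWithBytesNoCopy is spelled makeBuffer(bytesNoCopy:length:options:deallocator:) and returns an optional. The sketch below only creates the buffer and compares addresses; the encoder setup is left out, and the 16-byte particle size is an assumption. The Metal part is wrapped in canImport so the file still compiles on platforms without Metal.

```swift
import Foundation
#if canImport(Metal)
import Metal
#endif

let alignment = 0x4000
let byteSize = (1 << 20) * 16   // 1,048,576 particles x 16 bytes (assumed Particle size)

// Page-aligned allocation that the CPU and GPU will both see.
var memory: UnsafeMutableRawPointer? = nil
let status = posix_memalign(&memory, alignment, byteSize)

#if canImport(Metal)
if status == 0,
   let particlesMemory = memory,
   let device = MTLCreateSystemDefaultDevice(),
   let buffer = device.makeBuffer(bytesNoCopy: particlesMemory,
                                  length: byteSize,
                                  options: .storageModeShared,
                                  deallocator: nil) {
    // The buffer is meant to wrap the existing allocation rather than duplicate it,
    // so its contents pointer can be compared against the original address.
    print(buffer.contents() == particlesMemory)
    print(buffer.length)
}
#endif
```

Because the deallocator is nil, Metal never frees the memory, so the allocation has to outlive the buffer and be released by the app.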

After the shader has run, I rebuild the pointers from the shared memory (particlesMemory), so that the buffer pointer reflects the updated particle data:

        particlesVoidPtr = COpaquePointer(particlesMemory)

        particlesParticlePtr = UnsafeMutablePointer(particlesVoidPtr)

        particlesParticleBufferPtr = UnsafeMutableBufferPointer(start: particlesParticlePtr, count: particleCount)
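The reason no write-back is needed can be shown without a GPU at all. In this small sketch, a second pointer into the same allocation stands in for the compute shader (an assumption purely for illustration), and the typed view created before the write sees the new value. It uses current Swift syntax, a tiny particle count, and an assumed four-Float Particle.

```swift
import Foundation

// Assumed layout: four 32-bit floats, matching the initializer used in this post.
struct Particle {
    var positionX: Float
    var positionY: Float
    var velocityX: Float
    var velocityY: Float
}

let particleCount = 4   // tiny count, just for illustration
let byteSize = particleCount * MemoryLayout<Particle>.stride

var memory: UnsafeMutableRawPointer? = nil
guard posix_memalign(&memory, 0x4000, byteSize) == 0, let particlesMemory = memory else {
    fatalError("allocation failed")
}

// Typed view created up front, as in the setup code.
let particles = UnsafeMutableBufferPointer(
    start: particlesMemory.bindMemory(to: Particle.self, capacity: particleCount),
    count: particleCount)
particles.initialize(repeating: Particle(positionX: 0, positionY: 0, velocityX: 1, velocityY: 2))

// Stand-in for the shader: write through a separate pointer to the same memory.
let shaderView = particlesMemory.assumingMemoryBound(to: Particle.self)
shaderView[0].positionX = 42

// View rebuilt after the "shader" has run, as in the post.
let rebuilt = UnsafeMutableBufferPointer(start: shaderView, count: particleCount)

let originalX = particles[0].positionX
let rebuiltX = rebuilt[0].positionX
let sameAddress = rebuilt.baseAddress == particles.baseAddress
print(originalX)     // 42.0
print(rebuiltX)      // 42.0
print(sameAddress)   // true

free(particlesMemory)
```

Both views point at the same address, so in this sketch the rebuilt pointer and the original one are interchangeable.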

For a better explanation, I'd suggest a look at the original memkite.com blog post.

I've made a new branch that uses this technique which you can access here. The original branch that uses a simple array is still available to compare and contrast.

Incredibly, this simulation runs at almost 17fps on my iPhone 6 and shows the potential of the Metal framework combined with Swift, not just for games but for some pretty serious simulation work.
