I've looked at using Metal kernel functions in the past for image processing. This post looks at them for a different purpose: calculating and rendering large particle systems.
Ordinarily, you might write a particle system where the particle logic (e.g. changing position and velocity) is executed on the CPU and the result is then rendered on the GPU. Metal allows us to pass an array of objects (e.g. particle value objects) through to the GPU and act upon individual array items in parallel, much as a shader acts on individual pixels in an image. In fact, a Metal compute shader can do the particle maths and the rendering in one pass.
My MetalParticles project creates 250,000 particles which are all attracted towards a single gravity well. The user can touch the screen to move the gravity well and this change is illustrated with a transient grey circle.
Most of the hard work is done in my view controller and much of this code is borrowed from my reaction diffusion application. I have an array named particles that is populated with Particle structs:
struct Particle
{
    var positionX: Float = 0
    var positionY: Float = 0
    var velocityX: Float = 0
    var velocityY: Float = 0
}
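For illustration, the array might be populated with the 250,000 particles scattered randomly across the 640 × 640 texture. This is just a sketch of that setup; the random placement and zero starting velocity are my assumptions rather than the project's exact values:

let particleCount = 250_000

var particles = [Particle]()

for _ in 0 ..< particleCount
{
    // Scatter each particle randomly across the 640 x 640 texture and
    // start it off stationary.
    let particle = Particle(
        positionX: Float(arc4random_uniform(640)),
        positionY: Float(arc4random_uniform(640)),
        velocityX: 0,
        velocityY: 0)

    particles.append(particle)
}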
After setting up Metal (take a look at this post for an introduction to compute shaders), I execute applyShader() in the background. There, I pass in the particle array:
let particleVectorByteLength = particles.count * sizeofValue(particles[0])

var inVectorBuffer = device.newBufferWithBytes(&particles, length: particleVectorByteLength, options: nil)
commandEncoder.setBuffer(inVectorBuffer, offset: 0, atIndex: 0)
...and create an empty array to receive the updated version of particles:
var resultdata = [Particle](count:particles.count, repeatedValue: Particle(positionX: 0, positionY: 0, velocityX: 0, velocityY: 0))
var outVectorBuffer = device.newBufferWithBytes(&resultdata, length: particleVectorByteLength, options: nil)
commandEncoder.setBuffer(outVectorBuffer, offset: 0, atIndex: 1)
I also pass in a blank texture to draw the rendered particles to:
let blankBitmapRawData = [UInt8](count: Int(640 * 640 * 4), repeatedValue: 0)
textureA.replaceRegion(self.region, mipmapLevel: 0, withBytes: blankBitmapRawData, bytesPerRow: Int(bytesPerRow))
commandEncoder.setTexture(textureA, atIndex: 0)
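With the buffers and texture bound, the kernel still needs to be dispatched and the command buffer committed before any results can be read back. That dispatch code isn't shown above, so here's a minimal sketch; the threadgroup width of 500 is my own choice (it divides the 250,000 particles evenly), not necessarily what the project uses:

// A one-dimensional dispatch: each thread handles one particle. This
// sketch assumes particles.count is an exact multiple of the threadgroup width.
let threadgroupWidth = 500
let threadsPerThreadgroup = MTLSize(width: threadgroupWidth, height: 1, depth: 1)
let threadgroupsPerGrid = MTLSize(width: particles.count / threadgroupWidth, height: 1, depth: 1)

commandEncoder.dispatchThreadgroups(threadgroupsPerGrid,
    threadsPerThreadgroup: threadsPerThreadgroup)
commandEncoder.endEncoding()

commandBuffer.commit()
commandBuffer.waitUntilCompleted()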
After the shader has run, I need to extract both the updated version of particles and the rendered texture. Both of these are done with getBytes():
var data = NSData(bytesNoCopy: outVectorBuffer.contents(),
length: particles.count*sizeof(Particle), freeWhenDone: false)
data.getBytes(&particles, length:particles.count * sizeof(Particle))
textureA.getBytes(&imageBytes, bytesPerRow: Int(bytesPerRow), fromRegion: region, mipmapLevel: 0)
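For reference, imageBytes in that last line is just a plain byte array big enough to hold the 640 × 640 RGBA texture; its declaration isn't shown above, but it would look something like this:

// Backing store for the rendered texture: width * height * 4 bytes per pixel (RGBA).
var imageBytes = [UInt8](count: Int(640 * 640 * 4), repeatedValue: 0)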
The kernel shader itself is pretty simple. The thread_position_in_grid gives me access to the index of the particle in the array, so I can read the current particle's properties, update the position and velocity and write the new values to the out vector. Using the particle's position, I can also write to the output texture.
Here, I'm doing some simple arithmetic to update the particle's velocity to simulate gravitational attraction and add a little drag:
kernel void particleRendererShader(texture2d<float, access::write> outTexture [[texture(0)]],
                                   const device Particle *inParticle [[ buffer(0) ]],
                                   device Particle *outParticle [[ buffer(1) ]],
                                   constant Particle &inGravityWell [[ buffer(2) ]],
                                   uint id [[thread_position_in_grid]])
{
    const uint2 particlePosition(inParticle[id].positionX, inParticle[id].positionY);
    const Particle thisParticle = inParticle[id];

    const bool isEven = id % 2 == 0;
    const float4 outColor(1.0, isEven ? 1.0 : 0.0, isEven ? 0.0 : 1.0, 1.0);

    const float distanceSquared = ((thisParticle.positionX - inGravityWell.positionX) * (thisParticle.positionX - inGravityWell.positionX)) + ((thisParticle.positionY - inGravityWell.positionY) * (thisParticle.positionY - inGravityWell.positionY));

    const float distance = distanceSquared < 1 ? 1 : sqrt(distanceSquared);

    const float factor = (1 / distance) * (isEven ? 0.01 : 0.015);

    float newVelocityX = (thisParticle.velocityX * 0.999) + (inGravityWell.positionX - thisParticle.positionX) * factor;
    float newVelocityY = (thisParticle.velocityY * 0.999) + (inGravityWell.positionY - thisParticle.positionY) * factor;

    outParticle[id].positionX = thisParticle.positionX + thisParticle.velocityX;
    outParticle[id].positionY = thisParticle.positionY + thisParticle.velocityY;
    outParticle[id].velocityX = newVelocityX;
    outParticle[id].velocityY = newVelocityY;

    outTexture.write(outColor, particlePosition);
}
Particles are treated differently depending on whether their index is even or odd: they have different colours and slightly different effective masses.
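The shader also reads the gravity well's position from a third buffer at index 2, which the Swift side needs to supply. A minimal sketch of that binding, assuming a gravityWellParticle property that the touch handler updates (starting it at the centre of the 640 × 640 texture is my assumption):

// The gravity well is just another Particle whose position the touch
// handler moves around; its velocity fields are unused.
var gravityWellParticle = Particle(positionX: 320, positionY: 320, velocityX: 0, velocityY: 0)

// Bind it to index 2 to match [[ buffer(2) ]] in the kernel.
var gravityWellBuffer: MTLBuffer = device.newBufferWithBytes(&gravityWellParticle,
    length: sizeofValue(gravityWellParticle),
    options: nil)
commandEncoder.setBuffer(gravityWellBuffer, offset: 0, atIndex: 2)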
It's entirely possible to loop over the entire array inside this shader, so the particles could be mutually gravitationally attractive or a shader could be written for smoothed particle hydrodynamics. I think my next step will be to revisit swarm chemistry.
This particle system runs very smoothly on my iPad Air 2 and surprisingly well on my iPhone 6. Being able to pass huge arrays of any type of data into Metal opens up a whole world of opportunities for amazing applications.
Big thanks to memkite for this post discussing data parallel computing and, of course, to the great Metal By Example site for their many posts, especially their Introduction to Compute Programming in Metal.
Stop Press: Since I originally posted this, I've made a handful of changes to the code, including an addition filter that effectively adds a glow and composites the texture generated by particleRendererShader() over the previous frame.
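I haven't reproduced that compositing code here, but the general idea can be sketched with Core Image's CIAdditionCompositing filter, summing the new frame with the previous one so bright trails accumulate into a glow; the variable names below are illustrative rather than taken from the project:

// Additively composite the freshly rendered frame over the previous one.
let additionFilter = CIFilter(name: "CIAdditionCompositing")

additionFilter.setValue(CIImage(image: newFrameImage), forKey: kCIInputImageKey)
additionFilter.setValue(CIImage(image: previousFrameImage), forKey: kCIInputBackgroundImageKey)

let compositedImage = additionFilter.outputImage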
I've also changed the code to support three particle types (which are rendered as red, green and blue) and bumped up the velocities of the particles. The new output is, IMHO, much more impressive:
All the source code is available at my GitHub repository here.