I was quite pleased with my recent Swift and Metal experiment - managing to get my iPad to calculate and render over 4,000,000 particles at 25 frames per second. However, there's always room for improvement, so I looked at extending my Particle struct to contain the definition of four particles - that way, I'd only need to pass in one million items for Metal to calculate four million particles.
This worked well, and my iPad could now top 30fps with four million particles. However, Amund Tveit at Memkite suggested replacing my new Particle struct, which looked like this...
struct Particle
{
    var positionX: Float = 0
    var positionY: Float = 0
    var velocityX: Float = 0
    var velocityY: Float = 0
    var positionBX: Float = 0
    var positionBY: Float = 0
    var velocityBX: Float = 0
    var velocityBY: Float = 0
    var positionCX: Float = 0
    var positionCY: Float = 0
    var velocityCX: Float = 0
    var velocityCY: Float = 0
    var positionDX: Float = 0
    var positionDY: Float = 0
    var velocityDX: Float = 0
    var velocityDY: Float = 0
}
...with a float4x4 data type which is a matrix of 16 floats. I took a quick trip to Metal By Example to get a hint on how to implement the Swift side of a float4x4: my Particle type is now a struct of four Vector4 types each of which has four floats:
struct Particle // Matrix4x4
{
    var X: Vector4 = Vector4(x: 0, y: 0, z: 0, w: 0)
    var Y: Vector4 = Vector4(x: 0, y: 0, z: 0, w: 0)
    var Z: Vector4 = Vector4(x: 0, y: 0, z: 0, w: 0)
    var W: Vector4 = Vector4(x: 0, y: 0, z: 0, w: 0)
}
struct Vector4
{
    var x: Float32 = 0
    var y: Float32 = 0
    var z: Float32 = 0
    var w: Float32 = 0
}
Now, for example, the Z property of a particle instance refers to its third particle and Z's x, y, z and w refer to its x position, y position, x velocity and y velocity respectively.
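To make that layout concrete, here is a minimal sketch that repeats the two structs in a slightly shortened form and fills in the third particle. The sample values are made up for illustration. It also checks that a Particle occupies 64 bytes, which is what 16 packed 32-bit floats come to and what I'd expect a Metal float4x4 to need:

```swift
struct Vector4 {
    var x: Float32 = 0  // x position
    var y: Float32 = 0  // y position
    var z: Float32 = 0  // x velocity
    var w: Float32 = 0  // y velocity
}

struct Particle {
    var X = Vector4()  // first particle
    var Y = Vector4()  // second particle
    var Z = Vector4()  // third particle
    var W = Vector4()  // fourth particle
}

var particle = Particle()

// Illustrative values only: place the third particle at (100, 200)
// with a velocity of (1.5, -0.5).
particle.Z = Vector4(x: 100, y: 200, z: 1.5, w: -0.5)

let thirdPosition = (particle.Z.x, particle.Z.y)
let thirdVelocity = (particle.Z.z, particle.Z.w)

// Four Vector4 values of four Float32 each: 4 * 4 * 4 bytes.
let byteCount = MemoryLayout<Particle>.size

print(thirdPosition, thirdVelocity, byteCount)
```

Keeping the Swift struct at exactly 16 floats is what lets the same buffer be read as float4x4 values in the shader.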
Over in the Metal shader code, I can access each particle by a subscript and the particle properties use the same x, y, z and w. So, to get the particle position of the third particle, I use this code:
const float2 particlePositionCFloat(inParticle[2].x, inParticle[2].y);
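The Swift struct doesn't have that subscript out of the box. If you wanted the same indexed access on the Swift side, one option is a small subscript like the sketch below. This is a hypothetical helper, not something in my project, and it simply maps 0 to 3 onto X, Y, Z and W:

```swift
struct Vector4 {
    var x: Float32 = 0
    var y: Float32 = 0
    var z: Float32 = 0
    var w: Float32 = 0
}

struct Particle {
    var X = Vector4()
    var Y = Vector4()
    var Z = Vector4()
    var W = Vector4()

    // Hypothetical helper: index 0...3 selects X, Y, Z or W,
    // mirroring inParticle[n] in the shader.
    subscript(index: Int) -> Vector4 {
        get {
            switch index {
            case 0: return X
            case 1: return Y
            case 2: return Z
            case 3: return W
            default: preconditionFailure("Particle index must be 0...3")
            }
        }
        set {
            switch index {
            case 0: X = newValue
            case 1: Y = newValue
            case 2: Z = newValue
            case 3: W = newValue
            default: preconditionFailure("Particle index must be 0...3")
            }
        }
    }
}

var particle = Particle()

// Illustrative values: set the third particle through the subscript.
particle[2] = Vector4(x: 10, y: 20, z: 0, w: 0)

// Same idea as the float2 built from inParticle[2] in the shader.
let positionC = (particle[2].x, particle[2].y)
print(positionC)
```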
Inside Metal, I create a new particle by creating four float4 instances - one for each particle:
const float4 particleA = {
    inParticle[0].x + inParticle[0].z,
    inParticle[0].y + inParticle[0].w,
    (inParticle[0].z * 0.998) + ((inGravityWell[0].x - inParticle[0].x) * factor) + ((inGravityWell[1].x - inParticle[0].x) * factorTwo),
    (inParticle[0].w * 0.998) + ((inGravityWell[0].y - inParticle[0].y) * factor) + ((inGravityWell[1].y - inParticle[0].y) * factorTwo)
};
...and so on for the other three, and then use those four float4 instances to populate a float4x4:
outParticles[id] = float4x4 (particleA, particleB, particleC, particleD);
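For checking the shader's arithmetic on the CPU, the per-particle update can be mirrored in Swift. This is only a sketch: the step function is a hypothetical name, the gravity wells are assumed to carry their position in x and y, and the factor and factorTwo values below are placeholders rather than the ones my shader uses:

```swift
struct Vector4 {
    var x: Float32 = 0  // x position
    var y: Float32 = 0  // y position
    var z: Float32 = 0  // x velocity
    var w: Float32 = 0  // y velocity
}

// Hypothetical CPU-side mirror of the shader's update for one particle.
// The position advances by the velocity; the velocity is damped by 0.998
// and pulled towards each gravity well in proportion to the distance.
func step(_ p: Vector4,
          wellA: Vector4,
          wellB: Vector4,
          factor: Float32,
          factorTwo: Float32) -> Vector4 {
    let damping: Float32 = 0.998
    let pullX = ((wellA.x - p.x) * factor) + ((wellB.x - p.x) * factorTwo)
    let pullY = ((wellA.y - p.y) * factor) + ((wellB.y - p.y) * factorTwo)
    return Vector4(
        x: p.x + p.z,
        y: p.y + p.w,
        z: (p.z * damping) + pullX,
        w: (p.w * damping) + pullY
    )
}

// Placeholder inputs chosen so the result is easy to follow by hand.
let start = Vector4(x: 0, y: 0, z: 1, w: 0)
let wellA = Vector4(x: 10, y: 0)
let wellB = Vector4(x: 0, y: 10)

let next = step(start, wellA: wellA, wellB: wellB, factor: 0.5, factorTwo: 0.25)
print(next)
```

Running the same inputs through both versions is a quick way to spot a swapped component when all four particles are packed into one matrix.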
The net result is astounding performance - 4,000,000 particles orbiting two gravity wells at around 40 - 45 frames per second!
The latest source code for this is available at my GitHub repo here. I challenge you to improve on that performance - 8,000,000 particles at 60 frames per second would be amazing!
Thanks again to Amund for the great suggestions!