
Two Million Particles at 25 Frames Per Second on an iPad


Following on from my last post where I managed to calculate and render over 1,000,000 particles in realtime, I've done some pretty effective tweaking of the code to create an app that calculates and renders (with blur and trails) over 2,000,000 particles at around 25 frames per second on my iPad Air 2.

The main change is to reuse the compute shader not only to do the calculation and first render but also to do the post-processing.

In Swift, I set the thread groups and thread group count based on particleCount, which is 2^21, or 2,097,152:

    particle_threadGroupCount = MTLSize(width: 32, height: 1, depth: 1)                    // threads per threadgroup
    particle_threadGroups = MTLSize(width: (particleCount + 31) / 32, height: 1, depth: 1) // enough groups to cover every particle
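
Roughly, the dispatch then looks something like this each frame; the pipeline state and buffer names here are illustrative placeholders rather than the exact code from the project:

    let commandBuffer = commandQueue.commandBuffer()
    let commandEncoder = commandBuffer.computeCommandEncoder()

    commandEncoder.setComputePipelineState(pipelineState)
    commandEncoder.setBuffer(particlesBuffer, offset: 0, atIndex: 0)
    // ...texture bindings for the ping-pong pair are set here too (see below)

    // A single dispatch of 2,097,152 threads updates every particle and, for the
    // first 1,048,576 thread ids, also post-processes one pixel each.
    commandEncoder.dispatchThreadgroups(particle_threadGroups,
        threadsPerThreadgroup: particle_threadGroupCount)

    commandEncoder.endEncoding()
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted() // wait so the texture can be read back on the CPU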

Because my image is 1,024 x 1,024, which is 1,048,576 pixels, I can reuse the kernel function to execute code on each pixel by converting the one-dimensional thread_position_in_grid to a two-dimensional coordinate named textureCoordinate:

    const float imageWidth = 1024;
    uint2 textureCoordinate(fast::floor(id / imageWidth), id % int(imageWidth));

    // Only the first 1,048,576 thread ids map to a pixel; the rest fail this test
    // and skip the per-pixel work.
    if (textureCoordinate.x < imageWidth && textureCoordinate.y < imageWidth)
    {
        float4 outColor = inTexture.read(textureCoordinate);
        
        // do some work...
        
        outTexture.write(outColor, textureCoordinate);
    }

Moving to a single shader gave a significant speed improvement. Furthermore, because I'm now passing in a read-access texture, I can composite the particles over each other, which makes for a better-looking render:

    const Particle inParticle = inParticles[id];
    const uint2 particlePosition(inParticle.positionX, inParticle.positionY);
    
    const int type = id % 3;
    
    const float3 thisColor = inTexture.read(particlePosition).rgb;

    const float4 outColor(thisColor.r + (type == 0 ? 0.15 : 0.0),
                          thisColor.g + (type == 1 ? 0.15 : 0.0),
                          thisColor.b + (type == 2 ? 0.15 : 0.0),
                          1.0);

One downside was that I was getting some artefacts when reading and writing to the same texture. I've overcome this by using a ping-pong technique with two textures in the Swift code that toggle between being the input and output textures with each frame.

I use a Boolean flag to decide which texture to use:

        if flag
        {
            commandEncoder.setTexture(particlesTexture_1, atIndex: 0)
            commandEncoder.setTexture(particlesTexture_2, atIndex: 1)
        }
        else
        {
            commandEncoder.setTexture(particlesTexture_2, atIndex: 0)
            commandEncoder.setTexture(particlesTexture_1, atIndex: 1)
        }

        [...]

        if flag
        {
            particlesTexture_1.getBytes(&imageBytes, bytesPerRow: bytesPerRowInt, fromRegion: region, mipmapLevel: 0)
        }
        else
        {
            particlesTexture_2.getBytes(&imageBytes, bytesPerRow: bytesPerRowInt, fromRegion: region, mipmapLevel: 0)
        }

        flag = !flag
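
The two textures are interchangeable because they're built from the same descriptor; something along these lines, where the pixel format and the four-bytes-per-pixel row stride are illustrative choices rather than anything special:

    let textureDescriptor = MTLTextureDescriptor.texture2DDescriptorWithPixelFormat(
        MTLPixelFormat.RGBA8Unorm, width: 1024, height: 1024, mipmapped: false)

    particlesTexture_1 = device.newTextureWithDescriptor(textureDescriptor)
    particlesTexture_2 = device.newTextureWithDescriptor(textureDescriptor)

    // The readback region and row stride used by getBytes() match the
    // 1,024 x 1,024 image and a four-byte pixel format.
    let region = MTLRegionMake2D(0, 0, 1024, 1024)
    let bytesPerRowInt = 1024 * 4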

My last version of the code didn't write the image from Metal directly to the UIImageView component; rather, it used an intermediate UIImage instance. I found that removing this variable could squeeze out an extra few frames per second.
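
The direct route isn't shown above, but it amounts to wrapping imageBytes in a CGImage and handing that straight to the view. A rough sketch, where the bitmap constants are illustrative rather than lifted from the project:

    let providerData = NSData(bytes: &imageBytes, length: imageBytes.count)
    let provider = CGDataProviderCreateWithCFData(providerData)

    let cgImage = CGImageCreate(1024, 1024, 8, 32, bytesPerRowInt,
        CGColorSpaceCreateDeviceRGB(),
        CGBitmapInfo(rawValue: CGImageAlphaInfo.NoneSkipLast.rawValue),
        provider, nil, false, kCGRenderingIntentDefault)

    // Assign the CGImage to the layer rather than going via a UIImage instance.
    imageView.layer.contents = cgImage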

I've set the Metal optimisations to the maximum in the compiler settings and also prefixed my call to distance() with the fast namespace:

        const float dist = fast::distance(float2(inParticle.positionX, inParticle.positionY), float2(inGravityWell.positionX, inGravityWell.positionY));

For this demonstration, I've removed the touch handlers. There's one gravity well, which orbits the centre of the screen. It gives some nice effects while I plan how to productize my particle system.
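
The orbit itself is simple trigonometry; conceptually, the per-frame update is something like this sketch (the names, radius and step size are illustrative):

    // Illustrative sketch: step an angle each frame and place the single gravity
    // well on a circle around the centre of the 1,024 x 1,024 particle image.
    gravityWellAngle += 0.02

    gravityWell.positionX = Float(512 + 300 * cos(gravityWellAngle))
    gravityWell.positionY = Float(512 + 300 * sin(gravityWellAngle))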

All the source code for this project is available in my GitHub repository here.
