For some time now I have had a big performance issue in my game. If you don’t know, my game is a tile-based game. I wrote my engine to take a 2D array of integers and assign a 2D texture (image) to each integer.

So if the array contains the number 1, the engine draws a grass texture (image) in that position. If it contains the number 2, it draws a stone texture in that position, and so on.
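To make the mapping concrete, here is a minimal sketch in Python (not the engine's actual code, which the post does not show). The IDs 1 and 2 come from the text above; the names `TILE_TEXTURES` and `build_draw_list` and the string texture names are made up for illustration. Every entry in the resulting list stands for one draw call.

```python
TILE_SIZE = 32  # pixels per tile edge (the tile size this post works with)

# Tile IDs 1 and 2 come from the post; the texture names are placeholders.
TILE_TEXTURES = {1: "grass", 2: "stone"}

def build_draw_list(level):
    """Return one (texture, x, y) entry per tile, i.e. one draw call each."""
    draws = []
    for row, line in enumerate(level):
        for col, tile_id in enumerate(line):
            draws.append((TILE_TEXTURES[tile_id], col * TILE_SIZE, row * TILE_SIZE))
    return draws

# A tiny 3x2 level instead of the full map.
level = [
    [1, 2, 1],
    [1, 2, 2],
]
draw_list = build_draw_list(level)
```

The number of draw calls grows with the number of cells in the array, which is what the rest of this post is about.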

The array would look something like this:

 

1,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
1,2,1,1,1,1,1,1,1,2,2,2,2,1,1,1,1,1,1,1,1,1,1,1,1,
1,2,1,1,1,1,1,1,1,2,1,1,2,1,1,1,1,2,2,2,2,2,2,2,1,
1,2,1,1,1,1,1,1,1,2,1,1,2,1,1,1,1,2,1,1,1,1,1,2,1,
1,2,1,1,1,1,1,1,1,2,1,1,2,1,1,1,1,2,1,1,1,1,1,2,1,
1,2,2,2,2,2,2,2,2,2,1,1,2,1,1,1,1,2,1,1,1,1,1,2,1,
1,1,1,1,1,1,1,1,1,1,1,1,2,1,1,1,1,2,1,1,2,2,2,2,1,
1,1,1,1,1,1,1,1,1,1,1,1,2,1,1,1,1,2,1,1,2,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,2,1,1,1,1,2,1,1,2,1,1,1,1,
1,1,1,1,2,2,2,2,2,2,2,2,2,1,1,1,1,2,1,1,2,1,1,1,1,
1,1,1,1,2,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1,2,1,1,1,1,
1,1,1,1,2,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1,2,1,1,1,1,
1,1,1,1,2,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1,2,1,1,3,3,
1,1,1,1,2,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1,2,1,1,3,3,
1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,1,1,2,2,2,2,3,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,3,3,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,3,3,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1;

 
This is a 25X19 array, which means my engine has to draw 475 tiles (images) every frame. That doesn’t seem like much, but it’s still a problem. Now let’s say we have a 100X100 array. That means the CPU has to send 10,000 images (draw calls) to the GPU every frame. And this is a big problem, since the CPU can’t issue that many draw calls fast enough. In other words, my 3930K @ 4.5GHz on one thread can’t keep up with sending that many draw calls to the GPU.

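CPUs are just slow at this. Every draw call costs CPU time, so even the fastest CPU in the world becomes the bottleneck when it has to issue thousands of them every frame.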

So let’s say you have a screen resolution of 1280X720 and each tile is 32X32 pixels. And let’s also say you have a map (array) size of 100X100. 100 * 32 = 3200 pixels, so the map is 3200X3200 pixels. Now here is the question: why draw the whole map when you only see 1280X720 pixels of it?

Before, we were drawing the whole 3200X3200 pixel map and getting around 200 fps. Now, after only drawing the visible part of the map, which is 1280X720 pixels, we are getting ~1000 fps! Isn’t that great? 😀
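Drawing only the visible part comes down to working out which rows and columns of the array overlap the screen. The sketch below is one way to do that, assuming a camera position given in pixels from the top-left corner of the map; the function name `visible_tile_range` and its parameters are hypothetical, not taken from my engine.

```python
def visible_tile_range(cam_x, cam_y, screen_w, screen_h, tile_size, map_cols, map_rows):
    """Return (first_col, last_col, first_row, last_row); the 'last' values are exclusive."""
    first_col = max(cam_x // tile_size, 0)
    first_row = max(cam_y // tile_size, 0)
    # Ceiling division so partially visible tiles on the right/bottom edge are included.
    last_col = min(-(-(cam_x + screen_w) // tile_size), map_cols)
    last_row = min(-(-(cam_y + screen_h) // tile_size), map_rows)
    return first_col, last_col, first_row, last_row

# Camera at the map origin, 1280x720 screen, 32 px tiles, 100x100 map.
c0, c1, r0, r1 = visible_tile_range(0, 0, 1280, 720, 32, 100, 100)
visible_tiles = (c1 - c0) * (r1 - r0)
```

Only the tiles inside that range are drawn each frame. Note that this version includes the half-visible bottom row, so it covers 40 x 23 tiles when the camera sits at the origin.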

But if we look at our GPU usage, we see that it is only about 70%. Why not 100%? Well, we are still sending 880 draw calls from the CPU to the GPU every frame. So the CPU is bottlenecking the GPU. Once again, the CPU can’t issue that many draw calls fast enough to keep the GPU busy.

How did I calculate that the CPU is sending 880 draw calls?
Well, we take the screen resolution, which in our case is 1280X720 pixels, and divide it by the tile (image) size, which is 32 pixels.

1280 / 32 = 40
720 / 32 = 22.5, so let’s say 22

40 * 22 = 880 tiles (images), or draw calls, that need to be sent from the CPU to the GPU to render.
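The same arithmetic as a small Python sketch, in case you want to plug in your own resolution and tile size. The function name `tiles_on_screen` is made up for this example. It rounds down like the calculation above; rounding up instead would count the half-visible row too.

```python
def tiles_on_screen(screen_w, screen_h, tile_size):
    """Count whole tiles that fit on screen (partial tiles are rounded away)."""
    cols = screen_w // tile_size
    rows = screen_h // tile_size  # 720 // 32 rounds 22.5 down to 22
    return cols, rows, cols * rows

cols, rows, visible_calls = tiles_on_screen(1280, 720, 32)

# For comparison: drawing every tile of a 100x100 map each frame.
whole_map_calls = 100 * 100
saved_calls = whole_map_calls - visible_calls
```

So drawing only what is on screen cuts the per-frame draw calls for the 100X100 map from 10,000 to 880.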

So today I came up with a better solution, one that aims for 100% GPU usage. But first you have to understand that the more separate textures (tiles, images, 3D objects) we send from the CPU to the GPU, the lower our frame rate and GPU usage get and the higher our CPU usage gets. That is because, once again, every draw call costs CPU time.

So why render 880 small images every frame when we can render one big image? Well, for one, if we want to change how our game level looks, we have to ask our artist / designer to draw the whole level again with the same art arranged in a different way. And that costs time. It’s just easier to take tiles (small images) and build our game level out of them.


But why not get the best of both worlds? Let our artist create tiles (small images) and construct the game level with them, but also have the advantage of sending just one big image to the GPU. And that is exactly what I did.

In my engine I load the game level, which is basically a 2D array, and then the game draws the array (map) once. Then the engine takes all of these tiles, which are now in the GPU buffer, and stores them in one big image: a 2D texture variable. From then on we send that one big image to the GPU to render instead of rendering 880 small images every frame.
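Here is a rough sketch of the idea in plain Python. It is a CPU-side stand-in, not my engine code: in a real engine this step would typically be done by drawing the tiles once into an off-screen texture on the GPU. The names `TILES`, `bake_level` and `CountingRenderer` are made up, the "textures" are just tiny grids of letters, and the renderer only counts draw calls.

```python
TILE_SIZE = 2  # tiny tiles keep the example readable; the post uses 32

# Placeholder "textures": each is a TILE_SIZE x TILE_SIZE grid of pixel values.
TILES = {
    1: [["g", "g"], ["g", "g"]],  # grass
    2: [["s", "s"], ["s", "s"]],  # stone
}

def bake_level(level):
    """Compose every tile into one big image. Done once, when the level loads."""
    rows, cols = len(level), len(level[0])
    image = [[None] * (cols * TILE_SIZE) for _ in range(rows * TILE_SIZE)]
    for r, line in enumerate(level):
        for c, tile_id in enumerate(line):
            tile = TILES[tile_id]
            for y in range(TILE_SIZE):
                for x in range(TILE_SIZE):
                    image[r * TILE_SIZE + y][c * TILE_SIZE + x] = tile[y][x]
    return image

class CountingRenderer:
    """Stand-in for a GPU API; it only counts how many draw calls were made."""
    def __init__(self):
        self.draw_calls = 0

    def draw(self, image, x, y):
        self.draw_calls += 1

level = [[1, 2], [2, 1]]
baked = bake_level(level)      # one-time cost at level load

renderer = CountingRenderer()
renderer.draw(baked, 0, 0)     # per frame: a single draw call for the whole level
```

The level is still authored as an array of tiles, so changing it only means editing the array and baking again.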

And that is how I did it. Before, my frame rate was 1000 fps; now it’s 2100 fps 😀