Flash to iOS Performance Tests
Last week, I sat down to do some intense Flash to iOS performance tests to get an impression of what the best approaches for porting Save the Maidens to iOS would be. Besides testing out a couple of tweaks that were supposed to improve iOS performance, I wanted first-hand proof of whether pure blitting actually is the holy performance grail it is proclaimed to be when it comes to porting animations.
As described below, I came up with another approach this spring which works with bitmaps as well but, instead of copying pixels around, simply slices up the initial sprite sheet containing all animation states and assigns the sliced BitmapDatas to bitmaps on stage.
I needed proof of whether this was worth anything compared to blitting, since plenty of developers, including some on the Adobe front, promote blitting techniques as the one true solution for acceptable mobile performance.
But before we dive into the test results, let's take a closer look at the test setup and various techniques that I was planning to sound out and compare. For those of you aching to learn about the final results, I can already tell you at this point: get ready for some surprises! Here's a quick index of the content of this post:
- The Setup
- Animation Techniques
- Object Pooling
- Test #1 Blitting vs. Assigning
- Test #2 POT Dimensions
- Test #3 Collision Detection
- Test #4 Up Scaling
- Test #5 No Sprite Wrappers
- Test #6 Rotation
- Test #7 Ad Hoc Version
- Final Conclusions
The Setup
I used the following setup for my tests:
- Device: iPad 1st generation
- iOS version: 4.3.5
- Compiled with: Flex SDK 4.5.1
- Packaged with: AIR SDK 3.0
- Build type: ipa-test
- IDE: FDT 4.3
- Target FPS: 60
The app will be aiming to deliver 60 FPS, while I'm increasing the load by ...
- ... adding one animated object every 10 frames to a maximum of 100 objects. All objects run the same animation and are being moved 1 pixel to the right on every frame.
- ... eventually adding collision detection between all objects on stage.
Once all 100 objects are on stage, moving (and detecting collisions), I'm going to measure the "final" FPS the app still delivers.
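To make the load pattern concrete, here is a minimal sketch of such a test loop. It is not the original test code: the class name LoadTest, the placeholder rectangle objects and the once-per-second FPS sampling are assumptions made for illustration only.

```actionscript
package {
    import flash.display.Sprite;
    import flash.events.Event;
    import flash.utils.getTimer;

    // Hypothetical harness: adds one object every 10 frames up to 100,
    // moves all objects 1 pixel per frame and traces the measured FPS.
    public class LoadTest extends Sprite {
        private static const MAX_OBJECTS:int = 100;
        private static const SPAWN_INTERVAL:int = 10; // frames between spawns

        private var objects:Vector.<Sprite> = new Vector.<Sprite>();
        private var frameCount:int = 0;
        private var framesSinceSample:int = 0;
        private var lastSampleTime:int;

        public function LoadTest() {
            lastSampleTime = getTimer();
            addEventListener(Event.ENTER_FRAME, onEnterFrame);
        }

        private function onEnterFrame(e:Event):void {
            frameCount++;
            // Increase the load step by step.
            if (objects.length < MAX_OBJECTS && frameCount % SPAWN_INTERVAL == 0) {
                var obj:Sprite = createObject();
                addChild(obj);
                objects.push(obj);
            }
            // Move every object 1 pixel to the right.
            for each (var o:Sprite in objects) {
                o.x += 1;
            }
            sampleFps();
        }

        // Averages the frame rate over roughly one second.
        private function sampleFps():void {
            framesSinceSample++;
            var now:int = getTimer();
            var elapsed:int = now - lastSampleTime;
            if (elapsed >= 1000) {
                var fps:Number = framesSinceSample * 1000 / elapsed;
                trace(objects.length + " objects: " + fps.toFixed(1) + " FPS");
                framesSinceSample = 0;
                lastSampleTime = now;
            }
        }

        // Placeholder: a plain rectangle stands in for an animated object.
        private function createObject():Sprite {
            var s:Sprite = new Sprite();
            s.graphics.beginFill(0x3366CC);
            s.graphics.drawRect(0, 0, 80, 90);
            s.graphics.endFill();
            return s;
        }
    }
}
```

The target of 60 FPS would be set separately, for example via stage.frameRate or the SWF metadata; the value traced once all 100 objects are on stage corresponds to what I call the "final" FPS.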
Animation Techniques
Now that we have the setup, here are the various techniques that are up for testing and comparison:
Frame Blitting ...
I guess, everybody has already heard of "Blitting" as a technique for displaying animations: The basic concept behind it is having a large (probably screen filling) bitmap into which the pixels of the current animation frame of each animated object are being copied directly from the sprite sheet:
bitmap.bitmapData.copyPixels(frame.bitmapData, ...);
A sprite sheet (or tile sheet or animation sheet) is a bitmap containing all frames of all animations an object has. In "the old days" the sizes of those bitmaps as well as those of each animation frame used to be a power of 2 (e.g. 4, 8, 16, 32, 64 etc.) because it would be the least hindering way for computers to work with them. As a negative side effect, though, the bitmaps would naturally increase in size to match the next highest power of 2 (POT) dimension.
In this setup, I'm going to test both: bitmaps and frames with POT based dimensions and such with even but not necessarily POT based dimensions.
In my blitting setup, I wrote an object that basically uses as little as a rectangle to keep track of its own position and dimensions, along with the position and size of its current animation frame on the sprite sheet.
In a global enter frame loop I use this information to move all objects and copy the pixels of their current frames into one large bitmap on stage.
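The loop described above could look roughly like the following sketch. It is not the blitting class used in the tests: BlitCanvas and BlitObject are hypothetical names, clearing the canvas with fillRect and wrapping the loop in lock()/unlock() are assumptions, and the frame rectangles are expected to be supplied by the caller.

```actionscript
package {
    import flash.display.Bitmap;
    import flash.display.BitmapData;
    import flash.display.Sprite;
    import flash.events.Event;
    import flash.geom.Point;

    // Hypothetical blitting canvas: one large bitmap on stage into which
    // the current frame of every object is copied on each enter frame.
    public class BlitCanvas extends Sprite {
        private var sheet:BitmapData;   // sprite sheet holding all frames
        private var canvas:BitmapData;  // large target bitmap data
        private var objects:Vector.<BlitObject> = new Vector.<BlitObject>();
        private var dest:Point = new Point(); // reused to avoid allocations

        public function BlitCanvas(sheet:BitmapData, canvasWidth:int, canvasHeight:int) {
            this.sheet = sheet;
            canvas = new BitmapData(canvasWidth, canvasHeight, true, 0x00000000);
            addChild(new Bitmap(canvas));
            addEventListener(Event.ENTER_FRAME, onEnterFrame);
        }

        public function add(obj:BlitObject):void {
            objects.push(obj);
        }

        private function onEnterFrame(e:Event):void {
            canvas.lock();                            // defer screen updates
            canvas.fillRect(canvas.rect, 0x00000000); // clear previous frame
            for each (var obj:BlitObject in objects) {
                obj.bounds.x += 1; // move 1 pixel to the right
                obj.frameIndex = (obj.frameIndex + 1) % obj.frames.length;
                dest.x = obj.bounds.x;
                dest.y = obj.bounds.y;
                // Copy the current frame from the sheet, merging alpha.
                canvas.copyPixels(sheet, obj.frames[obj.frameIndex], dest, null, null, true);
            }
            canvas.unlock();
        }
    }
}

import flash.geom.Rectangle;

// Hypothetical data holder: a rectangle for position and size on screen
// plus the rectangles of the animation frames on the sprite sheet.
class BlitObject {
    public var bounds:Rectangle;
    public var frames:Vector.<Rectangle>;
    public var frameIndex:int = 0;

    public function BlitObject(bounds:Rectangle, frames:Vector.<Rectangle>) {
        this.bounds = bounds;
        this.frames = frames;
    }
}
```

Note that the objects never touch the Display List here; only the single Bitmap showing the canvas is added to the stage.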
... vs. Frame Assignment
As an alternative to blitting, I developed a technique that works with bitmaps wrapped into Flash Sprite objects (UPDATE: wrappers removed in Test #5 and #6 to improve speed). I called these objects BitmapClips, since they kind of work like Bitmap based MovieClips.
The original animation sprite sheet is being sliced up into frames (BitmapDatas) and piled up in arrays by a processor which afterwards provides the clip instances with ready-made animations.
In contrast to my blitting approach the clips are actually added to stage. While the Sprite instance is used for positioning and provides all functionality Flash has to offer, animation frames are being swapped by simply assigning the respective BitmapData to the Bitmap instance contained:
bitmap.bitmapData = frame.bitmapData;
So, instead of copying pixels around manually, all I do is set a pointer to another BitmapData instance in memory.
Considering the size of the DisplayList, this - of course - is not a very effective approach. However, I thought it might eventually consume less CPU and, thus, make a stand against blitting.
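As an illustration of the slicing and assigning idea, here is a stripped-down sketch. It is not the actual BitmapClip class: SimpleBitmapClip is a hypothetical name, it extends Bitmap directly (so no Sprite wrapper), and it assumes a uniform grid of equally sized frames, which is a simplification since the frames in my sheet vary in height.

```actionscript
package {
    import flash.display.Bitmap;
    import flash.display.BitmapData;
    import flash.geom.Point;
    import flash.geom.Rectangle;

    // Hypothetical clip: swaps frames by re-pointing bitmapData
    // instead of copying pixels.
    public class SimpleBitmapClip extends Bitmap {
        private var frames:Vector.<BitmapData>;
        private var frameIndex:int = 0;

        public function SimpleBitmapClip(frames:Vector.<BitmapData>) {
            super(frames[0]);
            this.frames = frames;
        }

        // Advances the animation by one frame, looping at the end.
        public function nextFrame():void {
            frameIndex = (frameIndex + 1) % frames.length;
            bitmapData = frames[frameIndex]; // pointer swap, no pixel copy
        }

        // Slices a sheet into frames once, up front. Assumes a uniform
        // grid of frameWidth x frameHeight cells.
        public static function slice(sheet:BitmapData, frameWidth:int, frameHeight:int):Vector.<BitmapData> {
            var result:Vector.<BitmapData> = new Vector.<BitmapData>();
            var origin:Point = new Point(0, 0);
            for (var sy:int = 0; sy + frameHeight <= sheet.height; sy += frameHeight) {
                for (var sx:int = 0; sx + frameWidth <= sheet.width; sx += frameWidth) {
                    var frame:BitmapData = new BitmapData(frameWidth, frameHeight, true, 0x00000000);
                    frame.copyPixels(sheet, new Rectangle(sx, sy, frameWidth, frameHeight), origin);
                    result.push(frame);
                }
            }
            return result;
        }
    }
}
```

The pixel copying happens only once during slicing. The resulting vector can be handed to any number of clips, so all instances of the same animation share the same BitmapData objects in memory.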
Advantages and Disadvantages
There are some advantages and disadvantages I already see for either technique without having actually tested them and, I guess, they are worth considering even before looking at the performance:
- Assigning frames is rather simple to implement and easily combined with game object implementations, but produces overhead with the bitmap and sprite instances each object consists of. Plus, the process is not transparent from the point when you're assigning the bitmap data.
- Copying pixels is fast, supposedly, since you're dodging the Display List. But a good blitting system is complex and can easily waste precious CPU performance when not well elaborated and tested in detail. Plus, you're facing issues like depth sorting you wouldn't have using objects in the Display List.
So, I expect implementation time to be a crucial factor here, as well.
Object Pooling
I've heard from various people experimenting with Flash to iOS portations, that Object Pooling has positive effects, especially on in-game performance.
Object Pooling allows pre-generation and recycling of objects and, thus, avoids costly memory allocation during the game - widely known as a potential performance killer.
Using an object pool for animated game objects, one pre-generates the maximum number of instances displayed simultaneously (estimate if unsure) and adds them to the stage at a point in time when dips in performance go unnoticed, e.g. during the display of a splash screen. During the game itself, objects are simply drawn from the pool and flushed back into it for recycling. They are never fully removed from the stage, though, to avoid the negative effects on performance bound to this action.
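A pool along these lines can be sketched in a few lines. This is an assumption-laden illustration, not the pool from my setup: ClipPool is a hypothetical name, the factory function is supplied by the caller, and hiding idle objects via visible = false is just one possible way of keeping them on stage.

```actionscript
package {
    import flash.display.DisplayObject;
    import flash.display.DisplayObjectContainer;

    // Hypothetical pool: creates all instances up front, keeps them
    // attached to the container and toggles visibility on reuse.
    public class ClipPool {
        private var free:Vector.<DisplayObject> = new Vector.<DisplayObject>();

        // factory is a function without arguments returning a DisplayObject.
        public function ClipPool(container:DisplayObjectContainer, size:int, factory:Function) {
            for (var i:int = 0; i < size; i++) {
                var clip:DisplayObject = factory() as DisplayObject;
                clip.visible = false;
                container.addChild(clip); // added once, never removed
                free.push(clip);
            }
        }

        // Returns an idle object, or null if the pool is exhausted.
        public function acquire():DisplayObject {
            if (free.length == 0) return null;
            var clip:DisplayObject = free.pop();
            clip.visible = true;
            return clip;
        }

        // Hands an object back for recycling without removing it.
        public function release(clip:DisplayObject):void {
            clip.visible = false;
            free.push(clip);
        }
    }
}
```

Returning null on exhaustion keeps the sketch free of allocations during the game; a real pool might grow instead, at the cost of exactly the allocation it was meant to avoid.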
Without object pooling, objects are created and attached to the stage on demand. In the following tests, I'll be using object pooling as a default. I ran tests with object pooling turned off, but couldn't find any remarkable performance drops. I guess that this test setup is probably not big enough to actually prove this concept. Also it completely lacks recycling objects.
I assume that it's wise to use object pooling, and I'll do so although I haven't really proven its positive effects yet. But that's ok for now. Let's head to the tests!
Test #1 Blitting vs. Assigning
In the following test I'm going to compare blitting vs. assigning animation frames.
In all of these tests, Object Pooling is turned on by default, generating all objects at app start and instantly adding them to the stage (not possible with blitting, since it works with one large bitmap instead of addable objects).
Both on-device render modes were tested: GPU and CPU based rendering.
At first, the sprite sheet and all animation frames are not power of 2 based but have even dimensions:
Technique: Blitting
Sheet & frame dims: no POT
Max FPS: 23 (GPU) 32 (CPU)
Final FPS: 19 (GPU) 23 (CPU)
Technique: Assigning
Sheet & frame dims: no POT
Max FPS: 60 (GPU) 60 (CPU)
Final FPS: 31 (GPU) 35 (CPU)
GPU: Stable at 60 FPS until ~75 objects. Rapid FPS loss afterwards.
CPU: Stable at 60 FPS until ~25 objects. Slow FPS loss afterwards.
Conclusions
A rather unexpected result: While blitting manages to deliver a mere 32 FPS in CPU mode even with as little as 1 object on stage, the alternative assigning technique manages to deliver full 60 FPS in both rendering modes.
The application performs worst when blitting in GPU mode and runs best when assigning frames in GPU mode, as long as we're below ~75 objects.
Obviously, blitting does not provide optimal results for running Flash ported sprite sheet based animations on the iOS platform. We achieve significantly better results assigning the animation frames to bitmap instances on stage.
Rendering in CPU mode seems to perform better than rendering in GPU mode - at least with up to 100 objects.
Test #2 POT dimensions
To rule out that sprite sheet dimensions have a negative effect on the results from Test #1, I'm going to use a sprite sheet holding animation frames with power-of-2 dimensions: every frame now has the smallest possible POT dimension of 128x128. Before, the frames were either 80x90 or 80x120.
As a side effect, I had to increase the sprite sheet's dimensions to 512x1024 pixels and thus inflated the number of pixels in use by 100%. Let's see how that affects performance:
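To get a feeling for what rounding up to POT dimensions costs per frame, a small helper can do the arithmetic. PotMath and its function names are made up for this sketch.

```actionscript
package {
    // Hypothetical helper for power-of-2 (POT) arithmetic.
    public class PotMath {
        // Smallest power of 2 that is greater than or equal to value.
        public static function nextPowerOfTwo(value:int):int {
            var pot:int = 1;
            while (pot < value) {
                pot <<= 1;
            }
            return pot;
        }

        // Factor by which the pixel count grows when both dimensions
        // are rounded up to the next power of 2.
        public static function inflation(width:int, height:int):Number {
            var potPixels:int = nextPowerOfTwo(width) * nextPowerOfTwo(height);
            return potPixels / (width * height);
        }
    }
}
```

For the frame sizes used here, PotMath.inflation(80, 90) yields 16384 / 7200, roughly 2.28, and PotMath.inflation(80, 120) yields 16384 / 9600, roughly 1.71. So each individual frame carries around twice as many pixels as before.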
Technique: Blitting
Sheet & frame dims: POT
Max FPS: 24 (GPU) 34 (CPU)
Final FPS: 16 (GPU) 19 (CPU)
Technique: Assigning
Sheet & frame dims: POT
Max FPS: 60 (GPU) 60 (CPU)
Final FPS: 18 (GPU) 27 (CPU)
GPU: Stable at 60 FPS until ~40 objects. Sudden drop from 40 FPS to 20 FPS at ~78 objects.
CPU: Stable at 60 FPS until ~20 objects. Slow FPS loss afterwards.
Conclusions
The performance is significantly worse than before - with either technique.
If POT based dimensions have a positive effect on processing animation frames, it is apparently overruled by the massive performance loss caused by processing the increased frame sizes.
So, from this point forth, we can safely ditch blitting in future tests and concentrate on the alternative technique: assigning animation frames.
Test #3 Collision detection
At this point, I'm going to bring in collision detection, continually testing all objects on stage against one another (no double testing), as their number grows.
With collision detection involved, I kinda expect the GPU to fail this one pretty bad.
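Testing all objects against one another without double testing boils down to a nested loop in which the inner index always starts one past the outer index. The following sketch shows the idea; PairwiseHitTest is a hypothetical name and not the code from my setup.

```actionscript
package {
    import flash.display.DisplayObject;

    // Hypothetical helper: tests every unordered pair exactly once.
    public class PairwiseHitTest {
        // Returns the number of overlapping pairs.
        public static function countCollisions(objects:Vector.<DisplayObject>):int {
            var hits:int = 0;
            var n:int = objects.length;
            for (var i:int = 0; i < n - 1; i++) {
                // Start at i + 1 so (a, b) is never tested again as (b, a).
                for (var j:int = i + 1; j < n; j++) {
                    if (objects[i].hitTestObject(objects[j])) {
                        hits++;
                    }
                }
            }
            return hits;
        }

        // Number of hit tests per frame for n objects: n * (n - 1) / 2.
        public static function pairCount(n:int):int {
            return int(n * (n - 1) / 2);
        }
    }
}
```

With 100 objects on stage that still amounts to 100 * 99 / 2 = 4950 calls to hitTestObject per frame, and the number grows quadratically as objects are added.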
Just to get an impression of what impact larger sprite sheets and animation frames have in each render mode, I used the POT dimensioned sheet again in a second test run:
Sheet & frame dims: no POT
Collision detection: hitTestObject
Max FPS: 60 (GPU) 60 (CPU)
Final FPS: 17 (GPU) 14 (CPU)
GPU: Stable at 60 FPS until ~55 objects. Drop below 30 FPS at ~75 objects.
CPU: Stable at 60 FPS until ~20 objects. Drop below 30 FPS at ~55 objects.
Sheet & frame dims: POT
Collision detection: hitTestObject
Max FPS: 60 (GPU) 59 (CPU)
Final FPS: 17 (GPU) 12 (CPU)
GPU: Stable at 60 FPS until ~40 objects. Drop below 30 FPS at ~75 objects.
CPU: Drop below 30 FPS at ~45 objects.
Conclusions
As expected, the FPS go down massively, but what really surprises me is that the GPU manages to handle up to 55 objects with as much as 60 FPS before starting to give in. It even manages to deliver more FPS with 100 objects on stage - an object amount the CPU used to dominate.
What's more: using larger images with twice the pixels on average reduces the number of objects the GPU can handle at 60 FPS by only about a quarter (from ~55 to ~40).
What's most interesting - or rather remarkable - is the fact that no matter the image size, the GPU delivers a solid 30 FPS until it reaches approx 75 objs. Then it drastically drops (see also Test #2).
Test #4 Up Scaling
Another tip I got from Marvin Blase aka @beautifycode is supposed to save precious app byte size and memory in use by identifying fast moving objects in your game and creating the sprite sheets for these objects half the size they're supposed to be displayed. Within the game these animations are then scaled up to 200% using the instances' scaleX and scaleY attributes.
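In code, the tip boils down to two property assignments. The factory below is a hypothetical sketch that assumes the frame passed in was already exported at half size in both dimensions.

```actionscript
package {
    import flash.display.Bitmap;
    import flash.display.BitmapData;

    // Hypothetical factory for clips built from half-size frames.
    public class UpscaledClipFactory {
        public static function create(halfSizeFrame:BitmapData):Bitmap {
            var clip:Bitmap = new Bitmap(halfSizeFrame);
            // Scale back up to the intended display size (200%).
            clip.scaleX = 2;
            clip.scaleY = 2;
            return clip;
        }
    }
}
```

Halving both dimensions leaves a quarter of the original pixels, which is where the savings in file size and memory come from.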
The declared aim of the following test was to identify the effects (positive and negative) of moving scaled up sprites and detecting collisions among them.
In the first run, I used a scaled down version of the compact sprite sheet with no POT dimensions. Afterwards I used a 50% reduced version of the POT sprite sheet as a comparison and to top things off, I even turned collision detection back on in the last test run:
Sheet & frame dims: no POT
Collision detection: off
Max FPS: 60 (GPU) 60 (CPU)
Final FPS: 31 (GPU) 22 (CPU)
Sheet & frame dims: POT
Collision detection: off
Max FPS: 60 (GPU) 60 (CPU)
Final FPS: 17 (GPU) 12 (CPU)
Sheet & frame dims: no POT
Collision detection: hitTestObject
Max FPS: 60 (GPU) 60 (CPU)
Final FPS: 19 (GPU) 12 (CPU)
GPU: Stable at 60 FPS until ~55 objects. Drop below 30 FPS at ~80 objects.
Observations
While the graphics looked rather blocky in CPU mode, they were slightly blurred on GPU, which made them look rather smooth. I believe very well, that when in fast motion, this doesn't make much of a difference to the unscaled visuals.
Conclusions
While the GPU seems unaffected by scaling up the images, the CPU seems to suffer tremendously and loses a full 13 frames compared to Test #1.
As expected, the results are even worse with the larger frame images.
What's interesting to see, though, is that with collision detection the GPU actually profits from this technique and gains an additional 2 FPS compared to Test #3, where we used the regularly sized images. Also, the app seemed to be capable of displaying 5 more objects before dropping below the commonly used frame rate of 30 FPS and reached a full 90 objects before falling below 25 FPS.
Test #5 No Sprite Wrappers
With the bitmaps wrapped into Sprite containers, the above setup sure had improvement potential as Damian correctly pointed out in the comments. So today I followed that exact same TODO that I found in my comments ;) and removed the Sprite wrapper around each animation and, thus, flattened the Display List by 100 objects.
I basically ran a mixture of Test #1 and Test #3 with this (only assigning bitmapDatas and using no POT sprite sheets), to again see what a difference 100 Sprites can make, and was surprised as I managed to squeeze out even more frames. But the CPU still didn't hold a candle to the performance on the GPU, so, to save some time, I started neglecting it afterwards.
I started with collision detection turned off, to directly compare the results to Test #1, then turned it on and, eventually, even applied the scaling technique from Test #4. With the last test I started neglecting CPU comparisons since, so far, they always turned out worse:
Collision detection: off
Scaling technique: off
Max FPS: 60 (GPU) 60 (CPU)
Final FPS: 31 (GPU) 35 (CPU)
GPU: Stable at 60 FPS until ~75 objects.
CPU: Framerate dropping right away.
Collision detection: hitTestObject
Scaling technique: off
Max FPS: 60 (GPU) 60 (CPU)
Final FPS: 20 (GPU) 15 (CPU)
GPU: Stable at 60 FPS until ~50 objects. Drop below 30/25 FPS at ~80/~90 objects.
Collision detection: hitTestObject
Scaling technique: on
Max FPS: 60 (GPU)
Final FPS: 19 (GPU)
Stable at 60 FPS until ~50 objects. Drop below 30/25 FPS at ~78/~87 objects.
Conclusions
This is odd: 100 missing Sprite wrappers seem to have no impact at all as long as there is no collision detection in play. The results are exactly the same as the ones we received in Test #1.
On the other hand, when collision detection comes into play, the missing wrappers squeeze out one more frame on the CPU and 3 frames on the GPU lifting the frame rate up to a magnificent 20. That's huge!
The details show that we're able to animate, move and hitTest around 80 objects on stage before dropping below the crucial frame rate of 30, which is 5 more than with the wrappers around the bitmaps.
90 objects at 25 FPS might actually be something we can work with in most games considering we'll be having large background graphics and other elements that might lower the frame rate some more. Great!
Sadly, the scaling method does not seem to profit from the missing Sprite wrappers, for whatever reason.
Test #6 Rotation
Rotation always comes into play at some point in a game, so I wanted to check this as well. So, in the following test, I rotate each object by one degree per frame. Note that this test also runs without any Sprite wrappers which makes the numbers a little hard to compare against Tests #1 to #4 but the important comparison is against Test #5 anyhow, so I think that's alright.
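The per-frame rotation itself is a one-liner. The sketch below is hypothetical (RotationTest is a made-up name) and takes the clips and a smoothing flag from the caller; smoothing maps to the Bitmap.smoothing property.

```actionscript
package {
    import flash.display.Bitmap;
    import flash.display.Sprite;
    import flash.events.Event;

    // Hypothetical test: rotates and moves unwrapped bitmaps every frame.
    public class RotationTest extends Sprite {
        private var clips:Vector.<Bitmap>;

        public function RotationTest(clips:Vector.<Bitmap>, smoothing:Boolean = false) {
            this.clips = clips;
            for each (var clip:Bitmap in clips) {
                clip.smoothing = smoothing;
                addChild(clip);
            }
            addEventListener(Event.ENTER_FRAME, onEnterFrame);
        }

        private function onEnterFrame(e:Event):void {
            for each (var clip:Bitmap in clips) {
                clip.rotation += 1; // one degree per frame
                clip.x += 1;        // keep moving to the right
            }
        }
    }
}
```

Without a wrapper, a Bitmap rotates around its own origin, which is its top left corner. Rotating around the center would need either a container or some additional offset math.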
First, I let all 100 objects rotate with collision detection turned off. Afterwards I turned smoothing on, which, by default, is set to false in my setup. In the last run, I turned collision detection on and smoothing off again to be able to compare the results to the first run:
Collision detection: off
Smoothing: false
Max FPS: 60 (GPU)
Final FPS: 31 (GPU)
Drop below 60 FPS at ~80 objects.
Collision detection: off
Smoothing: true
Max FPS: 60 (GPU)
Final FPS: 31 (GPU)
Drop below 60 FPS at ~80 objects.
Collision detection: hitTestObject
Smoothing: false
Max FPS: 60 (GPU)
Final FPS: 17 (GPU)
Stable at 60 FPS until ~50 objects. Drop below 30/25 FPS at ~78/~85 objects.
Observations
Setting smoothing to true appears to have no impact whatsoever on the result - not only in terms of FPS but also visually. The graphics look smoothed in either setup, something that apparently comes naturally when running in GPU render mode and is also responsible for the blurred graphics in Test #4.
Conclusions
While smoothing appears to have no effect on either the frame rate or the visual results, the GPU handles the rotation rather effortlessly. With collision detection on, it loses 3 frames in comparison to the improved BitmapData assignment results from Test #5. What seems a little odd, though, is that it seems to be capable of handling more rotating clips at full speed than clips that are not rotating. I guess this is due to the higher number of overlapping pixels in the runs with rotation, which results in fewer pixels changing per frame. It's the only explanation I can imagine at this point.
Test #7 Ad Hoc Version
As a final test, I decided to create an ad hoc version from the ones that delivered the best results, which were the ones from Test #5 using the assign technique without scaling. As before, I turned on collision detection in the second run:
Collision detection: off
Max FPS: 60 (GPU) 60 (CPU)
Final FPS: 31 (GPU) 36 (CPU)
GPU: Stable at 60 FPS until ~70 objects. Significant FPS drop (around 20) at ~85 objects.
CPU: Stable at 60 FPS until ~20 objects.
Collision detection: hitTestObject
Max FPS: 60 (GPU) 60 (CPU)
Final FPS: 24 (GPU) 17 (CPU)
GPU: Stable at 60 FPS until ~60 objects. Drop below 30/25 FPS at ~90/~98 objects.
Conclusions
Going from the "test" to the "ad hoc" version has blessed us with another 4 frames. But the performance gain seems to kick in mainly when there is more involved than the mere display of animations.
I tested blitting as well and got pretty much the same disappointing numbers as with the test version, which is why I didn't mention it explicitly here.
However, we've now received even better results than before, running 100 objects all hit testing one another at a marvellous 24 FPS. I guess these numbers are reassuring enough to finally get me started on porting Save the Maidens.
[/UPDATE]
Final Conclusions
So, after all this testing, I think I can safely sum up the results into the following rules and guidance tips when developing games for iOS:
- Forget about blitting - assign sliced BitmapDatas instead.
- Forget about power of 2 dimensions - pack your sprite sheets tightly and save pixels (keep even dimensions).
- GPU render mode works best for most setups.
- Exception: no collision detection /no scaling involved: CPU mode may work better.
- UPDATE: Keep the Display List flat: 100 Sprite wrappers made a difference of 3 FPS on the GPU.
- UPDATE: Forget about smoothing Bitmaps - it makes no difference on the GPU.
- UPDATE: Test with release (ad hoc) versions early. There's hidden performance in there.
- Try scaling down sprite sheets of fast objects by half and scale them up in code again (GPU only).
- Object Pooling may have positive effects on the large scale or in the long run.
Well, at least these results apply to 1st generation iPads. Today, I received shipping information about my brand new iPhone 4S and, I guess, once I find the time, I'm gonna rerun some of these tests on it as well as on my very old iPhone 3G (no S ;)) and then update this post.
I hope these results help you plan your first or next Flash to iOS portation a bit, or at least save you some time finding the right setup for your project.
If you have other or additional findings or found flaws in any of my setups, I'd be happy to read about them in the comments so we can all learn and improve.

October 24th, 2011 - 21:09
Hey Michel, nice post! Some questions:
Why do you add your Bitmaps to Sprites in order to move them around? It’s an extra depth that seems unnecessary.
If I understood how you’re working your Object pooling, you’re adding all of the objects to the stage? Surely that affects the test? Even if they’re not visible, they’re still taking up resources (though minimal).
Can you post how you’re doing your blitting for comparison? Are you using lock() and unlock()?
November 8th, 2011 - 21:11
There are a couple of reasons to wrap the bitmaps in sprites. For one thing, sprites support mouseEnabled, which bitmaps do not. Also, by wrapping in a sprite, you can do cool stuff, like upscaling the texture when making your draw call and then sizing the bitmap back down. This way, you end up with a Bitmap-Sprite that can be scaled up without looking pixelated.
I show this in my second example here:
http://esdot.ca/site/2011/fast-rendering-in-air-3-0-ios-android
October 24th, 2011 - 22:08
Hey Damien, thanks for asking:
You’re right questioning Bitmaps wrapped into Sprites. Primarily, I did it to easily implement registration points within the animation, which is a common need. This way, the Sprite holds the actual position of the object, while the Bitmap’s x and y attributes are used to correct the Bitmap’s position according to the reg point. Of course, some math could help get rid of the Sprite here, flatten the DisplayList and give us a raw animation element – probably a cleaner solution. But so far I’ve been taking advantage of the DisplayObjectContainer functionality it provided me with, including the various events a Bitmap doesn’t throw. But I’ll sure improve the system in the next days hunting for more performance ;)
As for the object pooling, you’re right again. Be this as it may, the effect of allocating new memory and adding/removing objects while the game is running causes higher peaks in CPU usage than having taken care of this in advance. The constant occupation of memory appears to be a better trade off than constantly allocating and freeing it.
I wrote a class that was supposed to take care of blitting as a central unit. Apart from the fact that I’m not very satisfied with its structure, it basically did what you can find in pretty much every blitting tutorial out there (e.g. the ones from Adobe for Flash Builder).
Interesting part: I first forgot to use lock and unlock and then added them today and ran the tests again – without significant improvement, if any. While I remember the heavy impact on my Android test app, it doesn’t seem to make a difference on iOS.
I’m a little irritated about this myself, I have to admit. Might double check my code again tomorrow.
October 25th, 2011 - 11:34
Dude… We need to discuss this over beers at gotoAndSki this year. I’ve done similar tests for my Tower Defense game and I’ve come to similar conclusions. Been playing with Starling in hopes that it’d rectify the situation. I assume you have too?
J
October 25th, 2011 - 12:23
I have rather mixed results too. I tested for both Android and iOS and they contradict each other in a way.
My tests also involved blitting and frame assignment as you described. In my scenario blitting behaves way better on both platforms, weirdly enough on Android CPU is faster and on iOS GPU is faster. (Nexus One and iPad 1st Gen). Something that really helped was changing the StageQuality and doing a release build, specially on iOS. Even standard compilation has about 8-10 frames less than release compilation.
I still have to continue with my tests since my scenario was not particularly good for it. I made a tile based game on AIR desktop/web and then tried to deploy it on Android/iOS, that’s why I have around 380 objects, around 10 of them animated and continuously moving. I didn’t try the POT test cause I refused to believe it would have that much impact (plus no time :P).
November 8th, 2011 - 21:08
If blitting is faster, you’re doing something wrong. GPU is far faster, even on Nexus Ones, or other Android devices with shitty GPUs.
Make sure you’re sharing a single bitmapData instance among all bitmaps of the same type.
October 25th, 2011 - 17:03
@Jensa
Heya :) I’m really surprised that blitting delivers these awful results. I haven’t tried Starling, yet, but had my eye on it. But I guess we’ll have to wait some time until we’ll see Stage3D on mobiles, ey?
Weyert pointed me to the ND2D engine today and it looks promising once Stage3D will be available. Check out the recent performance tests at http://www.nulldesign.de/ . Is Starling worth spending time with?
I’m actually thinking a talk on this topic at gotoAndSki would spark a rather lively discussion, don’t you think?
@Joe
I remember blitting to be the kick-ass way to do things during my Android tests in April – sadly I didn’t blog about them. But right now, I just can’t get to speed with it on iOS. Do you have any numbers that depict the differences in performance between your blitting and bitmap assignment?
Problem is that the reasons for the differences in our setups could be anywhere: compiler commands, code, of course, app descriptor?
Is there a way we can compare things? Robbert should have my email. Maybe we can get in touch and sort things out a bit.
[edit]I checked setting StageQuality to low and it didn’t seem to have an impact on my setup. Might make a difference with larger apps though. And thanks for the hint with the iOS release. Will take the time to investigate it. 8-10 additional frames sounds like a dream![/edit]
October 25th, 2011 - 18:39
Are U sure that your flex sdk has -swf-version=13 set?
October 25th, 2011 - 18:47
You mean set as a compile option when compiling the SWF? I tried to add it earlier today and it failed my build. What kind of impact do you expect from this?
[update]I’m really running into issues with this: due to my current setup, I’ve been compiling my SWF using the 4.5.0 SDK and then packaged it using Flex SDK 4.5.1 and AIR 3.0. Everything worked well as you can see from the tests.
Now that I tried to set -swf-version=13 I realised I was using an old Flex SDK. Trying to change that, and using 4.5.1 instead, I ran into serious issues with my app. Having compiled the same code as before, the whole thing now crashes right away. Feels like the whole thing is incompatible with compiling for FP 10.2 and above. I tried to investigate without success so far. Maybe I’ll find out more tomorrow…Hopefully.[/update]
October 26th, 2011 - 09:37
Small Update: I cleaned up my ANT scripts and managed to recompile everything with Flex SDK 4.5.1 and with the compile command -swf-version=13. I ran Test #1 again to see whether it had any effect on performance and it appears not to.
October 28th, 2011 - 09:37
Dude! Great article and outstanding job on the tests!
November 2nd, 2011 - 18:27
You left out the newest render mode under AIR 3.0: direct. It uses the CPU to render vectors and GPU to blit. With direct mode you can have things like filters and anti-aliased text, whereas GPU mode precludes those possibilities.
November 2nd, 2011 - 18:38
Hi Ben,
thanks for the heads up! Although, I’m not sure it’ll have a large effect on the results from the above test setup, since I don’t use any vectors and completely focus on bitmaps. I imagine this mode to run well on native AIR (on Android) but am unsure if it has the expected effects on iOS, cause in another setup I was running filters in CPU mode on the iPad and they didn’t show, so why would they in direct mode?
Either way, I guess it’s best to run some tests. I’ll do that tomorrow, quickly, and see if it affects the final results in any way.
[update]I just gave it a quick shot and it appears that the results are similar to those achieved with CPU render mode… rather unsatisfying. But it’s good to keep in mind for the future.[/update]
November 8th, 2011 - 21:07
Direct mode is slightly better than CPU mode with this technique, and far slower than GPU. See here:
http://esdot.ca/site/2011/comparing-advanced-transforms-in-gpu-mode-vs-cpu-mode
November 3rd, 2011 - 04:31
you should compare regular movieclips with a bitmap on each frame. I suspect swapping bitmapdatas still wins, but not by a huge amount in gpu mode
November 3rd, 2011 - 16:33
Hi Mike,
I believe that MCs will result in a performance loss, considering that ditching Sprite wrappers already made a difference of 3 FPS. I expect the performance with MCs to be even worse than with Sprites, not to mention the overhead of bitmap instances in every MC frame… In the end, every little extra performance matters :)
November 8th, 2011 - 21:06
Nice post! I did a similar test last week and came to the same conclusions:
http://esdot.ca/site/2011/fast-rendering-in-air-3-0-ios-android
I tested on a few different devices as well, and the performance gains hold throughout them all :)
November 8th, 2011 - 21:07
Hi,
Thanks for the great tests! I’ve run similar test also and I can agree with you in most of the cases.
My final prototype animation is built from separate bitmaps that are animated using bitmapdata assignment from the vector object. With this setup I achieved the best performance at least on my animation.
There is one weird thing that I have noticed. Behind the animation I’ve used just a static bitmap background. If I generate that bitmap on runtime or use embedded bitmap, there’s a significant FPS drop, but if I take the bitmap from the SWC file (bitmap inside a movieclip, used as a Sprite), there are no performance problems.
So the question is, why is that lighter for iOS to render? I’ve tried testing methods like cacheAsBitmap etc. with that bitmap which is created on runtime.
November 8th, 2011 - 23:25
Hi Michel, thanks for posting this, good to see that it reflects the results I was getting in my tests: http://www.codeandvisual.com/2011/flash-on-iphone-better-than-blitting-real-world-performance-results/ , however it seems that you’re getting even better performance which is encouraging.
Your list of conclusions at the end were exactly the same as mine, including the fact that power of 2 textures tend to provide no benefits and as a result use more precious RAM due to precaching. Currently with the game I am working on RAM usage has become the biggest bottleneck so it’s important to keep an eye on how much you’re using and how much your device has to offer. If you push past the RAM limit you’ll often get a black screen crash which is good to be aware of.
From what I can tell, for every precached piece of data stored in RAM you need the same again free in order to display it on stage. i.e. if your device has 50MB of RAM, you can only precache 25MB worth. Keep in mind that this is a very naive understanding from my point of view but the basic premise was that I could precache twice as much in RAM without a crash if I never displayed any of it on the display list. I’m sure there’s room for a much better understanding of this with some more comprehensive tests or understanding of the device’s memory and rendering process.
November 9th, 2011 - 09:16
Hey James,
thanks for that piece of useful information about how much you can actually precache. That is very valuable input and I wasn’t quite there, yet, with my testing. I do expect more complications as I move along with my game. I guess adding background images might already cause performance issues.
And you’re probably right, there need to be more tests on RAM issues and stuff so we can start naming the problems. But for that, we’d need well thought through test setups, delivering real world values.
December 9th, 2011 - 09:41
Hi Michel,
Have you tested with the latest FLASH BUILDER4.6 and AIR3.1?
I use your Assigning Technique and choose the GPU mode to run my test. But it’s much more slowly than CPU mode.
I don’t know what’s wrong with my test or that may be the FLASH BUILDER4.6 and AIR3.1
Can you please send your test project to me, and I can run it to test
Thank you
December 9th, 2011 - 14:42
Hi Tim,
no I haven’t done that (see testing specs at beginning of article). I do not own Flash Builder 4.5/4.6 and I don’t plan on changing that. I use FDT instead and have an ANT script to compile with. It’s possible that things and performance will change over time as Adobe keeps working on the AIR SDK. But I’d be surprised if there were an abrupt change in the results and all of a sudden one mode would win over the other.
Please note that there are setups in which the CPU wins over the GPU (Tests #1 and #2). However, that changes dramatically once the number of objects reaches a certain amount (somewhere just beyond 100) and/or collision detection gets involved. I pointed that out in the final conclusions as well. However, in most games you’ll have collision detection involved and so you’ll probably want to use the GPU.
Concerning the setup, I hate to disappoint, but it sure isn’t clean enough to be passed around, yet. I might open source the BitmapClip class sometime in the near future, which sure is the most interesting part about it. With that you could write your own test setups. But, currently, I don’t see a time slot before 2012, since I want to document it appropriately.