Image#dup doesn't need a :caching option because internally it involves a #splice over the whole image, which causes TexPlay to overwrite any junk data that may be in the cache. TexPlay.create_image may not need a refresh_cache either, because it's really just a large filled rect (blank, or colored via the :color option to create_image) over the entire image, so it too will overwrite any junk data. I think I can safely get rid of the refresh_cache in create_image(), which should make dynamic image creation/manipulation at runtime substantially faster.
Ok, there already is a #cached? method, but it's called Image#quad_cached? instead. This is because Gosu images (if they're smaller than MAX_TEXTURE_SIZE x MAX_TEXTURE_SIZE) share territory on a single OpenGL texture, so you're actually checking whether the entire OpenGL texture is cached. If you look at the prepare_image() method in texplay.rb, you'll see that #quad_cached? is already being used to determine whether refresh_cache is necessary (see http://github.com/banister/texplay/blob/master/lib/texplay.rb#L154).
I wrote an explanation somewhere for why the refresh_cache appears in Gosu::Image.new. Here it is:
"90% of the time you will be able to use TexPlay just fine without the automatic caching in the Gosu::Image.new method...however in certain rare situations (see below) TexPlay will not do what you want unless auto-caching is enabled.
The rare situation is when two or more Gosu images are stored internally on the same quad, yet you invoke a TexPlay drawing method on one of those images BEFORE you've finished loading the other images. In this situation all TexPlay manipulations will work fine for the images loaded prior to the first TexPlay drawing call (TexPlay lazily caches the quad on a drawing action if the quad is not already cached) but will produce weird results for all image manipulation on the images loaded AFTER (since they missed out on being cached)."
So, as long as you load all images at the start of the game, you do not need refresh_cache at all. The difficulty only arises when some images are loaded at runtime, after the other images that share their OpenGL texture have already been cached. So yeah, in most cases you won't need #refresh_cache at all.
The reason #refresh_cache is so expensive is that it caches the whole quad, so in most cases this requires downloading a 1024x1024 image from video memory, and this is a slow process. It is impossible (afaik) to download just a portion of an OpenGL texture, though it IS possible to upload a portion of one (see glTexSubImage2D). This is why syncing is relatively fast (and so manipulating images at runtime is viable) but caching is slow and prohibitive.
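To put rough numbers on the difference, here is a small back-of-the-envelope sketch. It assumes 4 bytes per pixel (RGBA), which is my assumption for illustration, and the 64x64 region is just a made-up example of a small area touched by a drawing call. Byte counts are only a rough proxy for how long a transfer takes, so treat the ratio as indicative rather than a benchmark.

```ruby
# Rough transfer sizes only. BYTES_PER_PIXEL = 4 assumes an RGBA format,
# which is an assumption made for this illustration.
BYTES_PER_PIXEL = 4

def transfer_bytes(width, height)
  width * height * BYTES_PER_PIXEL
end

# What a cache refresh has to download: the whole quad.
full_quad = transfer_bytes(1024, 1024)

# What a sync might upload: a hypothetical 64x64 region that was drawn on.
small_rect = transfer_bytes(64, 64)

puts "full quad download: #{full_quad} bytes"
puts "64x64 sync upload:  #{small_rect} bytes"
puts "ratio: #{full_quad / small_rect}x"
```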
One strange point you may note from looking at prepare_image() is that #refresh_cache is called IF #quad_cached? returns true, rather than if it returns false (as you'd expect). This is because if it returns false we don't have to worry about it, as TexPlay will lazily cache the quad on the first drawing action and everything will be fine. However, if it returns true then the cached data is out of date, as the quad was cached BEFORE the new image was loaded and so excludes the new image data; hence we have to refresh it.
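The logic is easier to see in a toy model. The sketch below is plain Ruby and does not touch Gosu or TexPlay at all: FakeQuad, FakeImage and draw_action are hypothetical names I made up, and "video memory" is just a hash. It only mimics the behaviour described above (lazy caching on the first drawing action, and a refresh on load only when the quad is already cached).

```ruby
# Toy model only: FakeQuad and FakeImage are hypothetical stand-ins, not
# TexPlay or Gosu classes. "Video memory" and the cache are plain hashes.
class FakeQuad
  attr_reader :video_memory, :cache

  def initialize
    @video_memory = {}  # image name => pixel data living on the "GPU"
    @cache = nil        # CPU-side copy of the quad, nil until first cached
  end

  def cached?
    !@cache.nil?
  end

  # The expensive step: copy the whole quad down from video memory.
  def refresh_cache
    @cache = @video_memory.dup
  end
end

class FakeImage
  def initialize(quad, name, pixels, caching: true)
    @quad = quad
    @name = name
    quad.video_memory[name] = pixels
    # Refresh only if the quad is ALREADY cached, because that cache
    # was taken before this image existed.
    quad.refresh_cache if caching && quad.cached?
  end

  # A drawing action: lazily cache the quad if needed, then read our pixels.
  def draw_action
    @quad.refresh_cache unless @quad.cached?
    @quad.cache[@name]
  end
end

quad = FakeQuad.new
a = FakeImage.new(quad, :a, "aaaa")  # quad not cached yet, so no refresh
a.draw_action                        # first drawing action caches the quad
b = FakeImage.new(quad, :b, "bbbb")  # quad is cached, so it gets refreshed
c = FakeImage.new(quad, :c, "cccc", caching: false)  # skips the refresh

p b.draw_action  # => "bbbb"
p c.draw_action  # => nil (the stale cache has no data for :c)
```

Image c stands in for an image loaded at runtime without the refresh: its data is in "video memory" but missing from the cache, which is the weird-results case from the quote above.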
Notwithstanding all of the above, I think it is still possible to do things better: maybe an image can recognize that the cache is out of date but postpone refreshing it until a drawing action is first performed. Still, this doesn't really help things, as the bottleneck (i.e. downloading a huge 1024x1024 image from video memory) must still be dealt with at some point.
One technique I use myself to get around this slow caching process is to reuse old images. So rather than doing a TexPlay.create_image() at runtime, I just clear out an old image that has already been cached by doing something like: old_image.rect 0, 0, width, height, :fill => true, :color => :alpha, and then drawing over it as usual.
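A minimal sketch of how that reuse could be organised is below. ImagePool is a hypothetical helper of my own, not part of TexPlay; it just assumes each pooled object responds to #width, #height and #rect the way a TexPlay-enabled Gosu image does. RecordingImage is a stand-in that only records calls, so the sketch runs without Gosu installed.

```ruby
# ImagePool is a hypothetical helper, not part of TexPlay. It assumes each
# pooled object responds to #width, #height and #rect.
class ImagePool
  def initialize
    @free = []
  end

  # Hand an image back once it is no longer needed.
  def release(image)
    @free << image
  end

  # Return a cleared, previously used image, or nil if none is available
  # (the caller would then fall back to TexPlay.create_image).
  def acquire
    image = @free.pop
    return nil unless image

    # Same clearing call as in the text above.
    image.rect 0, 0, image.width, image.height, :fill => true, :color => :alpha
    image
  end
end

# Stand-in so the sketch runs without Gosu; it only records calls to #rect.
class RecordingImage
  attr_reader :width, :height, :calls

  def initialize(width, height)
    @width = width
    @height = height
    @calls = []
  end

  def rect(*args)
    @calls << args
    self
  end
end

pool = ImagePool.new
pool.release(RecordingImage.new(64, 64))
image = pool.acquire  # comes back already cleared
p image.calls.first   # the arguments the clearing #rect call received
```

In a real game you would release actual Gosu images that have already been cached, so acquiring one avoids both the create_image call and any cache refresh.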
TL;DR
With the removal of the refresh_cache call in TexPlay.create_image(), creating/manipulating images at runtime using TexPlay.create_image() should be significantly faster. However, the bottleneck still exists when loading a normal image from a file at runtime (i.e. not using create_image). This bottleneck disappears when you do not intend to manipulate the image (load with :caching => false). In the case where we do need to manipulate the loaded image, though, I cannot think of any way to avoid the refresh_cache, and so downloading a 1024x1024 texture from video memory; I think this will always be slow. But now that I've restricted the situations where this (refresh_cache) happens, I hope it's not such a big deal anymore.