The problem is (I'm guessing) that Gosu::Image.from_text generates a new image each time it's called. For TexPlay to read the image data so it can perform a splice, it has to cache each image (pull its image data down from VRAM), and these cache operations are very, very expensive. I don't know a way around this, but spooner or jlnr may have some ideas.
Another thing that could be slowing it down is the syncing process after each splice. You can speed this up by passing :sync_mode => :no_sync to the splice calls and then calling force_sync once at the end to sync all the changes to VRAM in one go.
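Here is a rough sketch of that batching pattern. It uses a small stand-in class so it runs without Gosu or TexPlay installed. Only the :sync_mode => :no_sync option and the force_sync name come from the post above; FakeImage, its counters, and the two compose_* helpers are made up for illustration, and the real force_sync may expect arguments this sketch leaves out.

```ruby
# Stand-in for a TexPlay-enabled Gosu::Image (hypothetical, for illustration).
# It only counts how often a sync to VRAM would happen.
class FakeImage
  attr_reader :sync_count, :splice_count

  def initialize
    @sync_count = 0
    @splice_count = 0
    @dirty = false
  end

  # Mimics splice: syncs right away unless :sync_mode => :no_sync is passed.
  def splice(source, x, y, options = {})
    @splice_count += 1
    @dirty = true
    force_sync unless options[:sync_mode] == :no_sync
    self
  end

  # Mimics pushing all pending changes to VRAM in one go.
  def force_sync
    return self unless @dirty
    @sync_count += 1
    @dirty = false
    self
  end
end

# One sync per splice (the slow path described above).
def compose_eager(canvas, pieces)
  pieces.each_with_index { |piece, i| canvas.splice(piece, 0, i * 20) }
  canvas
end

# Defer every sync, then sync once at the end.
def compose_batched(canvas, pieces)
  pieces.each_with_index do |piece, i|
    canvas.splice(piece, 0, i * 20, :sync_mode => :no_sync)
  end
  canvas.force_sync
end

pieces = %w[line1 line2 line3 line4]
eager = compose_eager(FakeImage.new, pieces)
batched = compose_batched(FakeImage.new, pieces)
puts "eager syncs: #{eager.sync_count}, batched syncs: #{batched.sync_count}"
```

With real images the splice calls would look the same; only the number of syncs changes. Note that this does nothing about the caching cost of reading each from_text image back from VRAM.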
I would like to formally request text drawing as a feature of TexPlay. There should be two methods, let's call them "draw_text" and "text_size", which operate on a Gosu::Image. The "draw_text" method should take an x,y coordinate and a string, plus possibly an options hash for font options like font name, size, italics, bold, etc. The "text_size" method should take the same arguments, except x,y, and return the width and height that the string would have if rendered with those options. There could also be an option in "text_size" to choose whether it returns the actual pixel width or the kerned width, so that, say, you could draw a string one character at a time with the proper kerned width.
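To make the proposed interface concrete, here is a rough sketch of how the two methods might look from the caller's side. None of this exists in TexPlay: TextCanvas is a hypothetical stand-in for a Gosu::Image, the option keys (:font, :size, :bold, :italic) are guesses, and the width calculation is a placeholder fixed-width metric rather than a real font measurement. The pixel-versus-kerned width option is left out.

```ruby
# Hypothetical stand-in for an image that supports the requested methods.
class TextCanvas
  # Assumed option names and defaults; not part of any real API.
  DEFAULTS = { :font => "Arial", :size => 20, :bold => false, :italic => false }.freeze

  attr_reader :draws

  def initialize
    @draws = []
  end

  # Returns [width, height] the string would occupy with the given options.
  # Placeholder metric: every glyph is 0.6 * size wide. A real version
  # would ask the font for its actual (or kerned) advance widths.
  def text_size(string, options = {})
    opts = DEFAULTS.merge(options)
    glyph_width = (opts[:size] * 0.6).round
    [string.length * glyph_width, opts[:size]]
  end

  # Records the draw request; a real version would write pixels into the image.
  def draw_text(x, y, string, options = {})
    @draws << { :x => x, :y => y, :text => string, :options => DEFAULTS.merge(options) }
    self
  end
end

# Draw a string one character at a time, advancing by each character's width.
canvas = TextCanvas.new
x = 10
"Hi!".each_char do |ch|
  canvas.draw_text(x, 5, ch, :size => 20)
  x += canvas.text_size(ch, :size => 20).first
end
```

The point of the sketch is the shape of the calls: text_size takes the same string and options as draw_text, so the caller can lay out text before drawing it.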
I have considered using RMagick for this, and possibly for other drawing operations, and concluded that unless there is a way to update a Gosu texture every frame with the contents of the RMagick image, my performance would suffer as much as or more than it does with my current approach.
Here's a use case for drawing text directly to the image instead of returning an image containing that text:
Primary use case: draw lots of text on the screen without keeping track of all of it in Gosu::Window.update. For the sake of elegant programming, there could be a function that builds a chat room or some other text-heavy scene, and then Gosu::Window only has to display one image. Perhaps this could be done with render_to_image, but then you can't change the background. Drawing from a Font is faster, but requires keeping track of all the different texts and their locations. Drawing a bunch of Gosu::Image objects has the same disadvantage and is slower to start; splicing them all into one image is simpler but still slow. And then what happens if I want to clip the text in a viewport? I'd probably have to keep track of a call to Gosu::Window.clip_to if I'm using Font.
Ultimately I am trying to preserve an existing interface, and I can see this interface has an advantage when rendering text-heavy scenes. I shouldn't have to build extraneous Gosu textures to combine several text draws into one image.