I had not heard of stable_sort either until I went to look up the stability of std::sort.
Unfortunately, when I swapped in stable_sort, my stress test dropped back to about 670 FPS. I also realised that I had been unfairly favouring sorting over multiset by doing my drawing in z order, with most z values the same. When I moved to random z values, the std::sort approach fell back to about 700 FPS. My profiling suggested that these drops were due to time spent copying DrawOps, which are quite bulky, so I have altered DrawOpQueue to sort a vector of pointers to DrawOps with their associated z-values rather than the DrawOps themselves. This takes the FPS back up to about 900. I am still keeping the DrawOps themselves in a vector so that memory is allocated for them in large chunks rather than for each in turn.
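To illustrate the pointer-sorting idea, here is a minimal sketch rather than the actual patch. The `DrawOp` struct, the `ZRef` key type and the `sortedByZ` function are hypothetical stand-ins, and I have assumed std::stable_sort so that ops with equal z keep their submission order. Only the small (z, pointer) pairs get moved during the sort, never the bulky DrawOps.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for a bulky DrawOp (not the real Gosu type).
struct DrawOp {
    double z;
    int id;
    char payload[256];  // represents vertices, colours, texture info, ...
};

// Small sort key: the z-value plus a pointer to the real DrawOp.
struct ZRef {
    double z;
    const DrawOp* op;
};

// Returns pointers to the ops in ascending z order.
// Ops with equal z stay in the order they were submitted.
std::vector<const DrawOp*> sortedByZ(const std::vector<DrawOp>& ops) {
    std::vector<ZRef> refs;
    refs.reserve(ops.size());
    for (const DrawOp& op : ops)
        refs.push_back(ZRef{op.z, &op});

    // Only the small ZRef entries are moved here, not the DrawOps.
    std::stable_sort(refs.begin(), refs.end(),
                     [](const ZRef& a, const ZRef& b) { return a.z < b.z; });

    std::vector<const DrawOp*> result;
    result.reserve(refs.size());
    for (const ZRef& ref : refs)
        result.push_back(ref.op);
    return result;
}
```

The pointers are only valid while the source vector is not resized, so the sort has to happen after all ops for the frame have been added.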
I have uploaded the patch to http://www.mediafire.com/?x34uexq6vpj4wmc and have also included a full copy of DrawOpQueue.hpp just in case, as I have not made a patch with svn before. I will be interested to hear whether it makes any difference where performance is a limitation.
An improvement might be to avoid the copying of existing DrawOps when the vector is expanded: push pointers to DrawOps into a vector as they are added, but allocate the memory for the DrawOps themselves in large chunks. It would also be nice to reuse the DrawOp memory between frames - a DrawOpPool maybe? [EDIT] Actually I just checked and I am not sure copying of existing DrawOps is worth worrying about: reserving a large vector in advance does not seem to make any difference in my stress test. [/EDIT]
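For what it is worth, a DrawOpPool along those lines could look roughly like the sketch below. Everything in it is hypothetical (the class, its method names, the cut-down `DrawOp`), and given the result in my edit above it may well not be worth the extra code. It allocates DrawOps in fixed-size chunks so existing ones are never copied when the pool grows, and clear() keeps the chunks around for the next frame.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical cut-down DrawOp, just enough for the sketch.
struct DrawOp {
    double z;
    int id;
};

class DrawOpPool {
public:
    // A chunk size of 0 is bumped to 1 to keep the index arithmetic valid.
    explicit DrawOpPool(std::size_t chunkSize = 1024)
        : chunkSize_(chunkSize ? chunkSize : 1), used_(0) {}

    // Copies op into the pool. The returned pointer stays put when the
    // pool grows, because new chunks are added rather than old ones resized.
    DrawOp* add(const DrawOp& op) {
        std::size_t chunk = used_ / chunkSize_;
        std::size_t slot = used_ % chunkSize_;
        if (chunk == chunks_.size())
            chunks_.push_back(std::unique_ptr<DrawOp[]>(new DrawOp[chunkSize_]));
        DrawOp* p = &chunks_[chunk][slot];
        *p = op;
        ++used_;
        return p;
    }

    // Forgets all ops but keeps the chunks, so the next frame reuses them.
    void clear() { used_ = 0; }

    std::size_t size() const { return used_; }
    std::size_t chunkCount() const { return chunks_.size(); }

private:
    std::size_t chunkSize_;
    std::size_t used_;
    std::vector<std::unique_ptr<DrawOp[]>> chunks_;
};
```

After clear(), old pointers refer to slots that will be overwritten by the next frame's ops, so they must not be kept across frames.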
It would be very elegant to be able to use a DrawOps vector for the VertexArray directly; however, I expect the copying costs during sorting might outweigh the benefits of not copying into a separate VertexArray.
Using a single-pixel texture to approximate drawTriangle and drawQuad sounds like a very neat way of unifying the drawing operations.
An idea that just occurred to me is that you could easily offer the option of disabling the sort operation, which would remove most of the overhead of z-ordering for those who just want to rely on the order of drawing.
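As a rough sketch of what such an option might look like (the `ToyDrawQueue` class and its `sortByZ` flag are made up for illustration, not part of the real DrawOpQueue), the queue simply skips the sort when the flag is off and the ops come out in the order they were added:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical cut-down DrawOp, just enough for the sketch.
struct DrawOp {
    double z;
    int id;
};

class ToyDrawQueue {
public:
    // sortByZ = false skips the sort and relies purely on drawing order.
    explicit ToyDrawQueue(bool sortByZ = true) : sortByZ_(sortByZ) {}

    void add(const DrawOp& op) { ops_.push_back(op); }

    // Returns the ids in the order they would be drawn, then empties the queue.
    std::vector<int> flush() {
        if (sortByZ_) {
            std::stable_sort(ops_.begin(), ops_.end(),
                             [](const DrawOp& a, const DrawOp& b) { return a.z < b.z; });
        }
        std::vector<int> order;
        order.reserve(ops_.size());
        for (const DrawOp& op : ops_)
            order.push_back(op.id);
        ops_.clear();
        return order;
    }

private:
    bool sortByZ_;
    std::vector<DrawOp> ops_;
};
```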