Anyway, I've since gotten back into rendering and have finally made myself more familiar with the PVR API. One of the things that has always been on my mind is how 2ndMix needs a new rendition, because it's one of the main examples we test with while developing KOS to look for regressions... and tbh, it's not really using the hardware very well, as it was a very early tech demo. Actually, a change the other day made it dip from 60fps to 30fps when built with -O2, whenever a bunch of text appears onscreen... So basically that little thing was *barely* hitting 60fps all this time...
So anyway, I dove into the codebase, and started working on it. In the end, I went from <100k polys/sec to more than 1.2 million polys/sec and only had to stop because I was running out of VRAM for storing the vertices, without texture compressing the font... Here's what the end result wound up looking like.
Here's a still screenshot (looks way better in motion):
Here's a video on Twitter/X:
https://x.com/falco_girgis/status/17510 ... 58590?s=20
What did I do?
One thing to note: I have yet to do a damn thing to the cube rendering code, which is similarly TERRIBLE. All math is done in software without the SH4 vector/matrix instructions, FIPR and FTRV! Need to fix that!
Falco Girgis wrote:
1) Changed stars from being triangles to being PVR HW sprites
2) Changed text from being triangle-strips to being PVR HW sprites
3) Redid how stars were stored. Rather than putting X, Y, Z coords in 3 separate arrays, packed them into a struct, using only int16_t for each field. Much easier on the cache.
4) Redid all star update math, moving it from being done as integer math to FP math... which made quite a hilarious amount of difference!
5) Cached the HW sprite strip header for text and only resubmitted it when the color was changed for the fade effect.
6) Moved all rendering away from pvr_prim() and towards using the direct rendering API with pvr_dr_commit() for each vertex. Unfortunately I had to do terrible hacks around the KOS PVR API to do this, since the direct rendering API only supports the regular 32-byte polygon vertex types, not the 64-byte sprite vertex format that I needed here... Fortunately I wound up being able to do disgusting things to the API with aliased pointers to work around it... Will have to think about how to fix this in KOS soon, because there was a pretty significant performance increase using direct rendering.
7) Figured out a much more clever way to handle the coloring on the stars, which was a big one... After moving them to HW sprites, I would still have to resubmit a new sprite header each time the color had to change between sprites. At first I developed an intelligent batching system, which used C's qsort() to sort the stars by Z coord (which is what determined color), to reduce the number of mid-render state changes...
That gave an "okay" amount of performance gain, but what really did it was completely doing away with calculating color per-star in software... How? I realized that exact fade-in effect could be done using the hardware fog effect. Once I figured out how to recreate it (took forever to tweak the fog parameters), I wound up only ever needing to submit a single header for all stars within the scene, since color never changes. That's one header for 7k+ sprites! THAT made the biggest difference in polygon throughput.
8) Changed the static, 256-entry integer-based LUT for sin() and cos() to use the actual HW instructions on floats. Gave a little performance boost and made the interpolation and rotations look smoother.
9) Added Z-scaling to the stars so they get bigger realistically as they zoom towards the camera
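To make points 3 and 7 a bit more concrete, here's a rough, portable C sketch of the packed star struct and the qsort()-by-Z batching idea. This is not the actual 2ndMix code: star_t, star_cmp_z(), color_bucket(), sort_stars_by_depth() and count_header_changes() are all made-up names for illustration, and the "color bucket" is just a stand-in for however the real demo derived color from depth. It only counts how many sprite headers a draw loop would have to submit, it doesn't touch the PVR at all.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical packed star: one 6-byte struct instead of three parallel
 * coordinate arrays, so a star's X, Y and Z sit next to each other. */
typedef struct {
    int16_t x, y, z;
} star_t;

/* qsort() comparator: ascending Z, so stars at similar depth end up adjacent. */
static int star_cmp_z(const void *a, const void *b) {
    const star_t *sa = (const star_t *)a;
    const star_t *sb = (const star_t *)b;
    return (sa->z > sb->z) - (sa->z < sb->z);
}

/* Assumption: color depends only on depth, quantized into 64-unit bands. */
static int color_bucket(int16_t z) {
    return z / 64;
}

/* Sort the stars by depth so same-colored stars can share one header. */
static void sort_stars_by_depth(star_t *stars, size_t n) {
    qsort(stars, n, sizeof(star_t), star_cmp_z);
}

/* Count how many headers a draw loop would submit if it emitted a new one
 * every time the color bucket changes between consecutive stars. */
static size_t count_header_changes(const star_t *stars, size_t n) {
    size_t changes = 0;
    for (size_t i = 0; i < n; i++) {
        if (i == 0 ||
            color_bucket(stars[i].z) != color_bucket(stars[i - 1].z))
            changes++;
    }
    return changes;
}
```

Calling count_header_changes() before and after sort_stars_by_depth() on the same array shows the saving from batching. The fog trick from point 7 goes one step further and gets that count down to a single header no matter how the stars are ordered, which is why the sort ended up being unnecessary.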
Anyway, here's the source code if anybody wants to play with it or help optimize it further. Trying to think of other things we can do with it, like maybe add something with post-processing, render-to-texture, or even modifier volumes to make it a better demonstration of the PVR and a better regression test in terms of KOS performance and features...