I'm going to use this place as a way to share progress regarding my projects, including
Reloaded3 and other related work.
Given how busy I am, I'm not sure how often I'll be able to write here; there's just too
much work for a single person. Still, I'll try to post something from time to time.
Why Run a Blog?
It's simple; I've learned a bunch from reading other people's blogs, so I want to pass that
knowledge on to future readers. 🤞
As part of the Reloaded3 project, I am designing a new archive format to serve as a container for game mods, both for distribution and loading of assets.
Read-Only Virtual Filesystem: Make games read files from the archive as if they were on disk.
Efficient Distribution: Minimizing size and supporting streaming downloads.
Legacy Replacement: Capable of replacing native game archives with superior performance.
High-Speed Archival: Decompression speeds matching modern NVMe drives (GB/s).
To do this, I sat down and wrote a quick tool to analyze the existing mods that are out there.
I wanted to look at all Reloaded-II mods (or as close as possible). The easiest way to do that was
the Reloaded-II.Index, the same index the built-in mod browser pulls its data from.
After excluding duplicates and the like, this resulted in a dataset of 2,197 unique packages.
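The dedup step can be sketched as follows. Note that this is a hypothetical illustration: the actual Reloaded-II.Index schema and field names (`Id`, `Version`) are assumptions here, not the real format.

```python
# Sketch of the dedup step: keep only the newest version of each package ID.
# Field names ("Id", "Version") are assumptions for illustration; the real
# Reloaded-II.Index schema may differ.

def unique_packages(packages: list[dict]) -> list[dict]:
    """Collapse duplicates, keeping the highest version per package ID."""
    latest: dict[str, dict] = {}
    for pkg in packages:
        # Compare versions numerically, e.g. "1.2.0" -> (1, 2, 0).
        version = tuple(int(x) for x in pkg["Version"].split("."))
        current = latest.get(pkg["Id"])
        if current is None or version > tuple(int(x) for x in current["Version"].split(".")):
            latest[pkg["Id"]] = pkg
    return list(latest.values())
```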
I don't usually talk about AI or LLMs. Too much stigma, too many grifters, too much slop.
Let's be clear: LLMs cannot and should not replace you today. An LLM is a tool (like any other) that can only aid you. The quality of your code is only as good as you are; LLMs lack good 'taste' and will produce slop by default. That goes for writing code just as much as for other uses. I say this as someone who has to clean up that slop daily, both my own and in PRs.
At the end of the day, it's your responsibility to judge and produce good quality code. That includes ensuring a dumb silicon machine doesn't output crap.
That said, AI can be a powerful tool to speed up work - if you know what you're doing.
This post is about documenting an optimization experiment, that's all.
Aside from the graph (38 lines), everything was, of course, written manually by me, Sewer.
Two months ago I ran an experiment to optimize LLM-based coding workflows at extreme speeds.
The findings were originally shared as a Discord post, but I never got around to posting them here. Today, with some free time, I've reformatted it as a more proper blog post with additional context.
While the BC1-BC3 transforms were fairly simple and straightforward, formats like BC6H and BC7
massively increase complexity. Experimenting with different ways to transform them takes a lot of
time.
To help with those formats, and many others, I built a tool for defining and comparing transforms.
In my previous post in the series, I demonstrated a recipe for making BC1-BC3 texture data more
compressible: a ~10% saving at a blazing ~60 GB/s on a single thread.
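The core idea behind that kind of transform can be sketched like this: rather than leaving each 8-byte BC1 block interleaved (4 bytes of RGB565 colour endpoints followed by 4 bytes of 2-bit texel indices), split the data into separate colour and index streams so similar bytes sit next to each other for the compressor. This is only an illustrative sketch; the transform in the actual post may differ in its details.

```python
# Minimal sketch of a BC1 transform: split each 8-byte block (4 bytes of
# RGB565 colour endpoints + 4 bytes of indices) into two contiguous streams,
# so similar data lands adjacent and compresses better afterwards.

def split_bc1(data: bytes) -> bytes:
    assert len(data) % 8 == 0, "BC1 data must be a multiple of 8 bytes"
    colours = bytearray()
    indices = bytearray()
    for off in range(0, len(data), 8):
        colours += data[off:off + 4]   # two RGB565 endpoints
        indices += data[off + 4:off + 8]  # 2-bit per-texel indices
    return bytes(colours + indices)

def unsplit_bc1(data: bytes) -> bytes:
    """Reverse the transform: re-interleave the two streams into blocks."""
    half = len(data) // 2
    colours, indices = data[:half], data[half:]
    out = bytearray()
    for off in range(0, half, 4):
        out += colours[off:off + 4]
        out += indices[off:off + 4]
    return bytes(out)
```

Because the transform is a pure permutation of bytes, it is lossless and extremely fast; the reverse pass restores the original data exactly.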
That transform is usually beneficial, but there will be rare cases where it isn't.
With more complex files, such as files with multiple distinct sections, you may want to apply
transforms on a per-section basis, or even skip individual steps of a transform.
But how do we know if a transform is beneficial or not?
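One honest way to answer that question is empirical: compress both the original and the transformed data with the same codec and compare the resulting sizes. The sketch below uses stdlib `zlib` as a stand-in for whatever codec you actually ship (e.g. zstd); the function name is mine, not the tool's.

```python
# Empirical check: a transform is beneficial if the transformed data
# compresses to fewer bytes than the original under the same codec.
# zlib is a stdlib stand-in for the real codec (e.g. zstd).
import zlib

def transform_is_beneficial(original: bytes, transformed: bytes, level: int = 9) -> bool:
    """True if compressing the transformed data yields a smaller output."""
    return len(zlib.compress(transformed, level)) < len(zlib.compress(original, level))
```

Running this check per file (or per section) lets a tool decide automatically whether to apply a transform or leave the data untouched.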
Hooking NT / POSIX API calls to trick processes into reading files from another place.
Except we're reading from an archive.
If we load and decompress on all threads, we load data faster; and because the data is stored compressed, it uses less disk space.
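Decompressing on all threads requires the archive to be split into independently compressed chunks. Here's a minimal sketch of that idea, using stdlib `zlib` as a stand-in for the real codec (zlib releases the GIL while compressing and decompressing, so a thread pool gives genuine parallelism here):

```python
# Sketch of chunked, parallel decompression. Each chunk is compressed
# independently, so a thread pool can decompress them all at once.
# zlib is a stdlib stand-in for the archive's real codec.
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_chunked(data: bytes, chunk_size: int = 64 * 1024) -> list[bytes]:
    """Compress fixed-size chunks independently so they can be decompressed in parallel."""
    return [zlib.compress(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]

def decompress_parallel(chunks: list[bytes]) -> bytes:
    """Decompress all chunks across a thread pool and stitch the result together."""
    with ThreadPoolExecutor() as pool:
        return b"".join(pool.map(zlib.decompress, chunks))
```

The trade-off is a small size penalty (each chunk resets the compressor's state), which is why chunk size matters in a real format.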
File Downloads: Mods need fast file downloads.
Support streaming/partial download of archive contents.
Minimize file size.
Users download less, and mod sites need less traffic.
Everyone is happy.
As a game archive format: Replace old games' native archives with the new format.
Through hooking, we can replace a native archive format with our own.
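The hooking itself is platform-specific (e.g. `NtCreateFile` on Windows, `open()` on POSIX), but once a path is intercepted, the core decision is a simple redirect lookup. The sketch below shows only that lookup logic; all names here are illustrative, not Reloaded3's actual API.

```python
# Once a file-open call is hooked, the VFS decides: is this path ours?
# If it falls under a virtual mount, serve it from the archive; otherwise
# fall through to the real filesystem. Names are illustrative only.

def resolve_virtual_path(requested: str, mounts: dict[str, dict[str, bytes]]):
    """Map a requested path to archive contents if it falls under a mount.

    `mounts` maps a virtual directory prefix to {relative_path: file_bytes}.
    Returns the file bytes, or None to fall through to the real filesystem.
    """
    path = requested.replace("\\", "/")  # normalise Windows separators
    for prefix, entries in mounts.items():
        root = prefix.rstrip("/") + "/"
        if path.startswith(root):
            relative = path[len(root):]
            return entries.get(relative)
    return None  # not ours: let the original API handle it
```

A real implementation would return a handle backed by streaming decompression rather than raw bytes, but the redirect decision looks the same.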
For size, performance, and just because you can.
Medium-Term Archival: I want to save some mods on disk for future use.
Basically as a general purpose archive format.
For non-Reloaded3 software, to archive currently disabled mods to save disk space.
The archive format decompresses data so fast that extraction speed is limited only by your storage drive.
So around 8 GiB/s on a modern NVMe.
The last one is mainly to seek adoption.
Today, we're going to unravel one of the tricks used to achieve these goals:
faster texture loads and smaller sizes via texture transformation.
I'll speedrun you through the basics of the simplest (BC1 / DXT1) transform, and show how you can
write similar transforms to improve compression ratios of known data.
I'll keep it simple; no prior compression theory or experience required.