Format Specification

File Format Version: 1.0.0

Note

This is a semi-SOLID archive format for storing game mod content. It is also intended to double as a packaging format for uploading mods.

It has the following properties:

  • Files under Block Size are SOLID Compressed.
  • Files above Block Size are non-SOLID Compressed.
  • Variable Block Size.
  • Stores File Hashes Within.
  • Huge Files Split into Chunks for Faster (De)compression.
  • TOC in-front.

We use SOLID compression to bundle up small files together, while keeping the large files as separate compressed blobs. All files are entirely contained within a slice of a given block.

flowchart TD
    subgraph Block 2
        BigFile1.bin
    end

    subgraph Block 1
        BigFile0.bin
    end

    subgraph Block 0
        ModConfig.json -.-> Updates.json
        Updates.json -.-> more["... more .json files"]
    end

The offset of each block is stored in the header, so large files can be skipped entirely during extraction if only a small file is needed.
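A minimal sketch of why storing the offsets up front helps. It assumes a hypothetical in-memory list of `(offset, compressed_size)` pairs already parsed from the header; the names `read_block` and `blocks` are illustrative only and say nothing about the on-disk encoding.

```python
import io

def read_block(archive, blocks, index):
    """Return the compressed bytes of one block without reading any other.

    `blocks` is a hypothetical list of (offset, compressed_size) pairs
    parsed from the header; it is not the on-disk layout.
    """
    offset, size = blocks[index]
    archive.seek(offset)        # jump straight to the wanted block
    return archive.read(size)

# Fake archive: a 4096-byte header, one small block, then one large block.
small = b"json" * 4
large = b"\xAA" * 8192
data = bytes(4096) + small.ljust(4096, b"\0") + large
blocks = [(4096, len(small)), (8192, len(large))]

archive = io.BytesIO(data)
assert read_block(archive, blocks, 0) == small   # the large block is never read
```

Because only a seek and a bounded read are needed, the cost of extracting the small block does not depend on the size of the large ones.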

Note

This format is optimized for transferring and unpacking files; editing existing archives might lead to sub-optimal performance.

Overall Format Layout

The overall file is structured in this order:

| Header + TOC | Block 1 | Block 2 | ... | Block N |

All sections (indicated by |) are aligned to 4096 bytes to match the physical sector size of modern drives and the memory page granularity.

Field sizes below use Rust-like notation, with some custom types, e.g.:

  • u8: Unsigned 8 bits.
  • i8: Signed 8 bits.
  • u4: 4 bits.
  • u32/u64: 4 Bytes or 8 Bytes (depending on variant).
  • align8: Add 0-7 padding bytes to align the current address to multiple of 8.
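Both the 4096-byte section alignment and `align8` reduce to rounding an address up to a power-of-two boundary. The helper names below (`align_up`, `padding_for`) are illustrative and not part of the format:

```python
def align_up(value, alignment):
    """Round `value` up to the next multiple of `alignment` (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)

def padding_for(value, alignment):
    """Number of padding bytes needed to reach the next aligned address."""
    return align_up(value, alignment) - value

print(align_up(5000, 4096))   # 8192: a section ending at byte 5000 is followed by one at 8192
print(padding_for(13, 8))     # 3: align8 adds 3 padding bytes at address 13
print(padding_for(16, 8))     # 0: already aligned, no padding
```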

Assume any bit-packed values are sequential, i.e. if a u4 is followed by another u4, the first u4 occupies the upper 4 bits.

All packed fields are little-endian, and are written out once the total number of bits reaches a power of 2.

  • u6 + u10 is 2 bytes little-endian
  • u15 + u17 is 4 bytes little-endian
  • u26 + u22 + u16 is 8 bytes little-endian
  • u6 + u11 + u17 is 4 bytes little-endian, not 2+2
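The packing rule can be sketched as follows, using the u15 + u17 case from the list. This is one reading of the rule, not the reference implementation: fields are concatenated with the first field in the upper bits, and the combined integer is then written little-endian. The function names are hypothetical.

```python
def pack_fields(fields):
    """Pack (value, bit_width) pairs; the first field lands in the upper bits."""
    total_bits = sum(width for _, width in fields)
    # Assumption: a packed group always fills exactly 1, 2, 4 or 8 bytes.
    assert total_bits in (8, 16, 32, 64), "packed group must fill 1/2/4/8 bytes"
    packed = 0
    for value, width in fields:
        assert 0 <= value < (1 << width), "value does not fit its field"
        packed = (packed << width) | value
    return packed.to_bytes(total_bits // 8, "little")

def unpack_fields(data, widths):
    """Inverse of pack_fields: returns the values in declaration order."""
    packed = int.from_bytes(data, "little")
    values = []
    for width in reversed(widths):      # the last field sits in the lowest bits
        values.append(packed & ((1 << width) - 1))
        packed >>= width
    return values[::-1]

raw = pack_fields([(1, 15), (3, 17)])   # u15 = 1, u17 = 3
print(raw.hex())                        # 03000200
print(unpack_fields(raw, [15, 17]))     # [1, 3]
```

Note that the group is serialised as a single 4-byte integer, which is why a three-field 4-byte group cannot be treated as two independent 2-byte values.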

Terminology

  • Block: Represents a compressed section of data of any size smaller than chunk size.
  • Chunk: A block that corresponds to a slice of a file.
    • A file compressed in a single block is said to have 1 chunk.
    • A file compressed in multiple blocks is said to have multiple chunks.
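As a worked example of the terminology, the number of chunks for a non-empty file follows from a ceiling division by the chunk size. The 1 MiB chunk size below is an arbitrary value chosen for illustration, and empty files are outside the scope of this sketch.

```python
def chunk_count(file_size, chunk_size):
    """Number of chunks a non-empty file of `file_size` bytes occupies."""
    assert file_size > 0 and chunk_size > 0
    return -(-file_size // chunk_size)   # ceiling division

MIB = 1024 * 1024
print(chunk_count(300 * 1024, MIB))      # 1: fits in a single block
print(chunk_count(MIB, MIB))             # 1: exactly one chunk
print(chunk_count(5 * MIB // 2, MIB))    # 3: a 2.5 MiB file spans three blocks
```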

Use as Packaging Format

Tip

Inclusion of hash for each file has some nice benefits.

  • Can do partial download to upgrade from older version of mod.

    • We can download header (incl. Table of Contents) only, compare hashes.
    • Then only download the chunks we need to decompress our needed data.
    • Inspired by MSIX and certain Linux package formats.
  • Certain applications like Nexus Mods App can avoid re-hashing files.
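The partial-download idea can be sketched as below. The data shapes are hypothetical: `local_hashes` maps a path to the hash of the installed file, and `remote_toc` maps a path to a `(hash, block_indices)` pair as it might look after parsing a downloaded header. Neither reflects the actual Table of Contents layout.

```python
def plan_partial_download(local_hashes, remote_toc):
    """Return (changed_paths, blocks_to_fetch) for an upgrade.

    A file is 'changed' when it is missing locally or its hash differs.
    Only blocks that contain at least one changed file need downloading.
    """
    changed = sorted(
        path for path, (file_hash, _) in remote_toc.items()
        if local_hashes.get(path) != file_hash
    )
    blocks = sorted({b for path in changed for b in remote_toc[path][1]})
    return changed, blocks

local = {"ModConfig.json": 0x1111, "BigFile0.bin": 0x2222}
remote = {
    "ModConfig.json": (0x1111, [0]),
    "Updates.json":   (0x3333, [0]),   # new file in the SOLID block
    "BigFile0.bin":   (0x2222, [1]),   # unchanged, block 1 can be skipped
    "BigFile1.bin":   (0x4444, [2]),   # new large file
}
print(plan_partial_download(local, remote))
# (['BigFile1.bin', 'Updates.json'], [0, 2])
```

In this example block 1 is never fetched, since the only file it holds is already present with a matching hash.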

Previewing the Format

Info

For people wishing to study or debug the format, a 010 Editor template is available: 010 Template.

Hit Templates -> Open Template and then the big play button. Then you'll be able to browse the format in 'Variables' window.

Alternatively, contributions are welcome if anyone wants to make a Kaitai Struct variation 💜.

Section Alignment

Each section is aligned to the following values in bytes

For arrays, this lists alignment for each entry.

Table of Contents:

User Data:

File entries are aligned to 8 bytes when [Version] is V0,

Version History

This is the version history for the file format, not the reference implementation/library.

To view the file format specification for a given version, navigate to the linked commit for each version and read this specification.

2.0.0

Initial Release, in Rust

Version in header is updated to 1.

THIS IS A WIP. REST OF SPEC IS NOT YET UPDATED TO ACCOUNT FOR THIS

  • Hashing algorithm replaced with XXH3 (from XXH64).
  • [Unconfirmed] Support for per-extension dictionaries.
  • Implementation of User Data Segment in reference implementation.
  • Added Section Alignment section to docs.

Implementation of User Data Segment

The User Data Segment, proposed in the 1.X docs, is finalized and implemented.

Example use cases:

  • Storing a binary baked-in hashtable to quickly find files by name.
  • Storing update information for a mod package if Nx is used to power a package manager.
  • Storing file metadata (read/write timestamps, file permissions, etc.)

Implementation of Per-Extension Dictionaries

See: Per-Group Dictionary Experiment

Hashing Algorithm Change

The hashing algorithm has been changed to XXH3 from XXH64.

This is a hard change because XXH3 is superior in just about all use cases. The format originally intended to use XXH3, however the Nexus Mods App opted to go with XXH64 instead.

The original intent was that you'd take the hash of each file from the archive and get hashes 'for free' (no I/O bottleneck). However the design changed.

Since the Nexus Mods App does not make use of the hashes in the archives, the archive format is migrating to XXH3 as standard.

String Pool Experiment

An alternative implementation of the String Pool was experimented with.

It improved the measured time from around 89us to 67us; however, the format will not use it, as it increases binary size considerably. The experiment was a variation of the pool in which the buffer starts with an array of u8 values containing the path lengths, followed by the strings themselves.
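For illustration only, the experimental layout described above could be read like this. The sketch assumes an uncompressed buffer, UTF-8 paths, and an item count supplied by the caller; none of these details are confirmed here, and this layout is not the one the format uses.

```python
def build_length_prefixed_pool(paths):
    """Build the experimental layout: all u8 lengths first, then the strings."""
    encoded = [p.encode("utf-8") for p in paths]
    assert all(len(e) < 256 for e in encoded), "each length must fit in a u8"
    return bytes(len(e) for e in encoded) + b"".join(encoded)

def parse_length_prefixed_pool(buf, count):
    """Parse `count` paths from a buffer produced by the builder above."""
    lengths = buf[:count]        # one u8 length per path
    paths, pos = [], count       # string data starts right after the lengths
    for length in lengths:
        paths.append(buf[pos:pos + length].decode("utf-8"))
        pos += length
    return paths

pool = build_length_prefixed_pool(["ModConfig.json", "Updates.json"])
print(parse_length_prefixed_pool(pool, 2))   # ['ModConfig.json', 'Updates.json']
```

Knowing every length up front means the parser never has to scan for terminators, which is one plausible source of the speedup.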

VPrefix Pool Benchmark Numbers

1.1.0

Revisions of the Spec

This is a minor revision of the spec which tightens some assumptions about the format.

This does not increment the version in the header. There are no changes in the actual format itself, just that certain behaviours of the reference implementation are being standardised into the spec.

Version in header remains 0.

Tightened String Pool Assumptions

The String Pool is now assumed to contain exactly as many items as there are files stored in the Table of Contents. In other words, FileCount == NumOfItemsInPool.

This was already the case previously, but now this is part of the spec.

This allows for faster parsing of the pool.

1.0.0

Initial Release

Last commit with previous version: 196d116d09cd436818dfd596e069eaef2b7a616d

Dated 21st of July 2024, this marks the 'initial release' as 1.0.0.

File Header Changes

This release removes the Block Size (u4) field from the header, as this can vary per block and with the use of features such as deduplication and archive merging. It was also not used in the reference implementation anywhere.

Instead, the Chunk Size field is extended to 5 bits. This allows the chunk size to be in the range of 512 bytes to 1 TiB. (Previous range: 32 KiB to 1 GiB.)
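The stated ranges are consistent with the field acting as a power-of-two shift on a base size. The sketch below assumes exactly that (`512 << value` for the new 5-bit field and `32768 << value` for the old 4-bit field); the encoding is inferred from the ranges rather than quoted from the header definition.

```python
def chunk_size_bytes(field):
    """Assumed decoding of the 5-bit Chunk Size field: 512 << field."""
    assert 0 <= field < 32, "field must fit in 5 bits"
    return 512 << field

def old_chunk_size_bytes(field):
    """Assumed decoding of the previous 4-bit field: 32 KiB << field."""
    assert 0 <= field < 16, "field must fit in 4 bits"
    return 32768 << field

print(chunk_size_bytes(0))                    # 512
print(chunk_size_bytes(31) == 1 << 40)        # True: 1 TiB
print(old_chunk_size_bytes(15) == 1 << 30)    # True: 1 GiB
```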

The version field is repurposed. In the previous version, it indicated the version of the Table of Contents; that value has now moved into the Table of Contents itself. The version field now indicates incompatible changes in the format itself, and is a u7.

The Header Page Count field is extended to 16 bits, allowing for a max size of 256MiB. This allows for storage of arbitrary user data as part of the Nx header. A reserved, but not yet implemented section for User Data was also added to the header.

The Table of Contents has also received its own proper 'size' field, which led to some fields being slightly re-organised.