Table of Contents (TOC)

This document describes the Table of Contents (TOC) format used in the archive files.

Size: 8 bytes

  • u1: IsFlexibleFormat
  • u2: Preset
  • Remaining bits are allocated differently depending on the preset.

If the first bit is set, use Flexible Entry Format 64 (FEF64), otherwise use one of the presets.
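
A minimal sketch of this dispatch in Python. It assumes the 8 bytes are read as a little-endian integer and that fields are packed starting from the most significant bit; both are assumptions made for illustration, since the byte and bit order is not stated here. `read_toc_format` is a hypothetical helper name.

```python
def read_toc_format(header: bytes) -> str:
    """Return which TOC layout the 8-byte header selects."""
    if len(header) < 8:
        raise ValueError("TOC header is 8 bytes")

    # Assumption: little-endian u64, fields packed from the top bit down.
    value = int.from_bytes(header[:8], "little")

    is_flexible_format = (value >> 63) & 0b1  # u1: IsFlexibleFormat
    preset = (value >> 61) & 0b11             # u2: Preset

    if is_flexible_format:
        return "FEF64"
    return f"Preset {preset}"
```

The remaining 61 bits would then be decoded according to whichever layout was selected.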

Flexible Entry Format 64 (FEF64)

  • Summary: A series of sub-formats with 16-byte FileEntry, fits most small mods and update packages.
  • Purpose: Uploads/downloads of common mods to/from the internet.
    • Allows including/excluding the Hash field and supporting SOLID-less archives.
    • Most mods in practice will use this format.

Format:

Values stored in CompressedPoolSizeBits, BlockCountBits and FileCountBits are offset by 1. That means that if the stored value is 0, we actually mean 1.

If CompressedPoolSizeBits + BlockCountBits + FileCountBits fit in the 42 bits of padding, then they are placed in the lower 42 bits. Otherwise we allocate 8 bytes for the ItemCounts struct.
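
The two rules above can be sketched as follows. This reads "fit in the 42 bits" as "the three decoded bit widths sum to at most 42", which is an interpretation rather than something the text states outright. The function names are hypothetical.

```python
PADDING_BITS = 42  # bits available in the header for the item counts


def decode_bit_width(stored: int) -> int:
    """Stored widths are offset by 1, so a stored 0 means a width of 1 bit."""
    return stored + 1


def item_counts_fit_inline(stored_pool_bits: int,
                           stored_block_bits: int,
                           stored_file_bits: int) -> bool:
    """True if the counts fit in the padding bits.

    False means a separate 8-byte ItemCounts struct is needed.
    """
    total = sum(
        decode_bit_width(stored)
        for stored in (stored_pool_bits, stored_block_bits, stored_file_bits)
    )
    return total <= PADDING_BITS
```

For example, three stored values of 13 decode to 14 bits each, 42 bits in total, which still fits inline. Raising any one of them by 1 pushes the counts into the separate struct.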

Compressed pool data is ~4 bytes per entry usually.

This version works under the assumption of ~21 bytes per entry: 16 for the entry and 5 for the compressed file path. 4080 / 21 ≈ 194 files.
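
The estimate above can be reproduced with a small helper. This is only a back-of-the-envelope sketch: `estimated_file_capacity` is a hypothetical name, and the per-path cost is an average, so real archives will vary.

```python
def estimated_file_capacity(budget_bytes: int,
                            entry_bytes: int,
                            compressed_path_bytes: int) -> int:
    """Whole number of files that fit in the byte budget."""
    return budget_bytes // (entry_bytes + compressed_path_bytes)


# FEF64: 16-byte entry + ~5 bytes of compressed path, 4080-byte budget.
fef64_files = estimated_file_capacity(4080, 16, 5)
```

The same helper can be reused for other entry sizes, for instance a 20-byte FileEntry against a 128K budget.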

Preset 0

  • Summary: 20-byte FileEntry. Suitable for 99.9% of mods.
  • Purpose: General archival/unarchival of mods.
  • Limits:
    • Max File Count: 256K
    • Max Block Count: 4M
    • Max SOLID Block Size: 16MiB
    • Max File Size: 4GiB
    • Max Block Size: 1GiB
  • Derived Limits:
    • Max Guaranteed Content Size: 4PiB (4M blocks * 1GiB size)
    • Max Size @64K Block Size: 256GiB (4M blocks * 64KiB size)

Format:

Preset 1

  • Summary: 12-byte FileEntry. Variant of Preset 0, but with no hash.
  • Purpose: When hashes are not needed. e.g. Read-only virtual filesystems.
  • Limits:
    • Max File Count: 256K
    • Max Block Count: 4M
    • Max SOLID Block Size: 16MiB
    • Max File Size: 4GiB
    • Max Block Size: 1GiB
  • Derived Limits:
    • Max Guaranteed Content Size: 4PiB (4M blocks * 1GiB size)
    • Max Size @64K Block Size: 256GiB (4M blocks * 64KiB size)

Format:

Preset 2

  • Summary: 24-byte FileEntry. Variant of Preset 0 with 64-bit file sizes.
  • Purpose: Edge cases. Exceptionally huge archives.
  • Limits:
    • Max File Count: 256K
    • Max Block Count: 4M
    • Max SOLID Block Size: 16MiB
    • Max File Size: 4GiB
    • Max Block Size: 1GiB
  • Derived Limits:
    • Max Guaranteed Content Size: 4PiB (4M blocks * 1GiB size)
    • Max Size @64K Block Size: 256GiB (4M blocks * 64KiB size)
    • Max Size @1M Block Size: 4TiB (4M blocks * 1MiB size)

Format:

Preset 3

  • Summary: 8/16-byte FileEntry, for services hosting SOLID-less mods.
  • Purpose: Uploads/downloads of mods with small files to/from the internet.
  • Limits:
    • Max File Count: 64K
    • Max Block Count: 64K
    • Max SOLID Block Size: 0 MiB
    • Max File Size: 4GiB
    • Max Block Size: 1GiB
  • Derived Limits:
    • Max Guaranteed Content Size: 64TiB (64K blocks * 1GiB block size)

Format:

Field Explanations

FileCount

The FileCount in the TOC header determines the number of FileEntry structs following.

BlockCount

The BlockCount in the TOC header determines the number of Block structs following.

CompressedPoolSize

The CompressedPoolSize in the TOC header specifies the size of the compressed StringPool.

Based on observation, a CompressedPoolSize of 16 MB can accommodate approximately 4.4 million files with average path lengths.

DecompressedBlockOffset

Offset of the start of the file in the decompressed block.

FilePathIndex

The FilePathIndex specifies the index of the file path for a given FileEntry in the StringPool, i.e. which string in the pool (counting in stored order) holds that file's path.

FirstBlockIndex

The FirstBlockIndex specifies the index of the block containing the file, or of the first block if the file is split into multiple chunks.

File Entries

File entries have a known fixed size and are word aligned to improve parsing speed. The size is 8-24 bytes per item depending on the format variant (for example, 20 bytes in Preset 0 and 24 bytes in Preset 2).

Implicit Property: Chunk Count

Tip

Files exceeding Chunk Size span multiple blocks.

The number of blocks used to store the file is calculated as DecompressedSize / Chunk Size, plus 1 if there is any remainder, i.e.

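public int GetChunkCount(int chunkSizeBytes)
{
    // Number of chunks that are completely filled.
    var count = DecompressedSize / (ulong)chunkSizeBytes;

    // A partially filled final chunk still occupies its own block.
    if (DecompressedSize % (ulong)chunkSizeBytes != 0)
        count += 1;

    return (int)count;
}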

All chunk blocks are stored sequentially.
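
Because the chunks are sequential, the blocks belonging to a file follow directly from FirstBlockIndex and the chunk count. A Python sketch of that relationship (the function names are illustrative and not part of the format):

```python
def chunk_count(decompressed_size: int, chunk_size: int) -> int:
    """Ceiling division: one extra block for any remainder."""
    return (decompressed_size + chunk_size - 1) // chunk_size


def block_indices(first_block_index: int,
                  decompressed_size: int,
                  chunk_size: int) -> range:
    """Indices of the consecutive blocks that hold a chunked file."""
    count = chunk_count(decompressed_size, chunk_size)
    return range(first_block_index, first_block_index + count)
```

A 10-byte file with a 4-byte chunk size starting at block 5 would therefore occupy blocks 5, 6 and 7.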

Blocks

Each entry contains the raw size of the block and the compression used. This avoids us having to store an offset for each block, since a block's offset can be derived from the sizes of the blocks before it.
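
As an illustration of deriving offsets from sizes, the sketch below computes a running sum. It assumes blocks are stored back to back with no padding between them, which is a simplification for the example: the real layout may align blocks, in which case each size would need rounding up first. `block_offsets` and `data_start` are hypothetical names.

```python
from itertools import accumulate


def block_offsets(block_sizes: list[int], data_start: int = 0) -> list[int]:
    """Offset of each block, given only the stored block sizes.

    Assumption: blocks are contiguous, with no alignment padding.
    """
    # accumulate(..., initial=0) yields 0, s0, s0+s1, ...; the final value is
    # the end of the last block, so it is dropped.
    running = list(accumulate(block_sizes, initial=0))[:-1]
    return [data_start + offset for offset in running]
```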

Compression

Size: 2 bits (0-3)

  • 0: Copy
  • 1: ZStandard
  • 2: LZ4
  • 3: Reserved
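
The values above map naturally onto an enumeration. A small Python sketch, where the class and function names are illustrative only and where the 2-bit field is assumed to have already been shifted into the low bits:

```python
from enum import IntEnum


class Compression(IntEnum):
    COPY = 0
    ZSTANDARD = 1
    LZ4 = 2
    RESERVED = 3


def decode_compression(bits: int) -> Compression:
    """Interpret the low 2 bits of `bits` as a Compression value."""
    return Compression(bits & 0b11)
```

A reader would typically reject blocks marked RESERVED rather than guess at their contents.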

As we do not store the length of the decompressed data, this must be determined from the compressed block.

Nx uses non-standard ZStandard compressor settings.

For more details, see [Stripping ZStandard Frame Headers].

String Pool

The String Pool has the following format:

  • u32 DecompressedSize
  • u8[DecompressedSize] RawData

The DecompressedSize stores the size of the pool after decompression; the compressed size can be found above as CompressedPoolSize.

The RawData is a buffer of deduplicated UTF-8 file path strings. Each string is null terminated. The strings in this pool are first sorted lexicographically (to group similar paths together) and then compressed using ZStd. This improves compression ratios.

For example, a valid (decompressed) pool might look like this: data/textures/cat.png\0data/textures/dog.png\0

String length is determined by searching for null terminators. We determine the lengths of all strings ahead of time by scanning for 0x00 using SIMD. There are no edge cases: in UTF-8, every byte of a multi-byte sequence has its high bit set, so a 0x00 byte can only ever be the null terminator.

See UTF-8 encoding table:

Code point range   | Byte 1   | Byte 2   | Byte 3   | Byte 4   | Code points
U+0000 - U+007F    | 0xxxxxxx |          |          |          | 128
U+0080 - U+07FF    | 110xxxxx | 10xxxxxx |          |          | 1920
U+0800 - U+FFFF    | 1110xxxx | 10xxxxxx | 10xxxxxx |          | 61440
U+10000 - U+10FFFF | 11110xxx | 10xxxxxx | 10xxxxxx | 10xxxxxx | 1048576

When parsing the archive, we decode the StringPool into an array of strings.

Nx archives should only use '/' as the path delimiter.

The number of items in the pool is equivalent to the number of files in the Table of Contents.

If an archive has 1000 items, the pool has 1000 strings. The StringPool parser may only parse up to FileCount items. Files may choose not to use file names, in which case they use a 0-length name (only the \0 null terminator).
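
A minimal sketch of decoding an already decompressed pool into strings, in Python. It uses a plain byte search rather than SIMD, and `parse_string_pool` is a hypothetical name. Tolerating a missing terminator on the final string is a choice made for this example, not something the format is stated to allow.

```python
def parse_string_pool(raw: bytes, file_count: int) -> list[str]:
    """Split a decompressed pool into at most `file_count` paths."""
    paths = []
    start = 0
    for _ in range(file_count):
        end = raw.find(b"\0", start)
        if end == -1:
            if start >= len(raw):
                raise ValueError("pool holds fewer strings than file_count")
            # Example-only leniency: accept an unterminated final string.
            end = len(raw)
        paths.append(raw[start:end].decode("utf-8"))
        start = end + 1
    return paths
```

Parsing stops after `file_count` strings, so any trailing bytes are ignored, and an empty name shows up as an empty string in the result.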

Note

It is possible to make ZSTD dictionaries for individual game directories that would further improve StringPool compression ratios.

This might be added in the future, but it is not planned until additional testing is done and a backwards compatibility plan for decompressors missing the relevant dictionaries is decided.

Performance Considerations

The header + TOC design aims to fit under 4096 bytes when possible. Based on a small dataset of 132 mods across 7 games, it is expected that >=90% of mods out there will fit. This is to take advantage of read granularity; more specifically:

  • Page File Granularity

In our use case, we memory map the file. Memory maps are always aligned to the page size, which is 4KiB on Windows and Linux (by default). Therefore, a 5 KiB file will allocate 8 KiB, and thus 3 KiB are wasted.
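
The rounding in that example can be written out as a short Python sketch. `mapped_size` is a hypothetical helper, and the 4096-byte default is the page size assumed in the text, not a value queried from the operating system.

```python
def mapped_size(file_size: int, page_size: int = 4096) -> int:
    """Bytes occupied once the size is rounded up to whole pages."""
    pages = (file_size + page_size - 1) // page_size
    return pages * page_size


five_kib = 5 * 1024
wasted = mapped_size(five_kib) - five_kib  # bytes past the end of the file
```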

  • Unbuffered Disk Read

If you have storage manufactured in the last 10 years, you probably have a physical sector size of 4096 bytes, a.k.a. 'Advanced Format'. On Windows, you can check this with:

fsutil fsinfo ntfsinfo c:
# Bytes Per Physical Sector: 4096

This is very convenient (especially since it matches page granularity): when we open a mapped file (or even just read unbuffered), we can read exactly the number of bytes needed to get the header.

Version Optimization Note

The version formats are optimized around the read speeds of a 980 Pro NVMe SSD, used as a reference:

  • 4K: 52us
  • 8K: 80us
  • 16K: 95us
  • 32K: 85us
  • 64K: 89us
  • 128K: 100us
  • 256K: 140us
  • 512K: 218us

There are 2 size 'thresholds':

  • Up to 4K
  • Up to 128K

In practice, the 128K size can handle around 5000 files when using a 20-byte FileEntry.

This is considered to be the 'upper limit' in terms of file counts for mod packages; therefore, we do not have many variants beyond 20 bytes/entry, i.e. beyond this limit, we don't aggressively optimize.

Selecting CompressedPoolSize in Version/Variants

The CompressedPoolSize field size is proportional to the FileCount.

Consider the following:

  • Game file names have a length of 16 bytes.
  • 12 bytes if we exclude the extension.
  • File paths are usually 5 bytes after compression.

To compensate for games with highly varying file names, we'll assume that the average file path (after compression) is 8 bytes.

As a direct result, the CompressedPoolSize field should use 3 more bits than the FileCount field, since 8 bytes per file is a factor of 2^3.
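
The relationship between the two field widths can be sketched in Python. `pool_size_bits` is a hypothetical helper; it rounds the assumed average path size up to the next power of two, so the 8-byte assumption adds exactly 3 bits.

```python
def pool_size_bits(file_count_bits: int,
                   avg_compressed_path_bytes: int = 8) -> int:
    """Bits needed for CompressedPoolSize, given the FileCount width."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1.
    extra_bits = (avg_compressed_path_bytes - 1).bit_length()
    return file_count_bits + extra_bits
```

For instance, an 18-bit FileCount (256K files) would pair with a 21-bit CompressedPoolSize under this assumption.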