Dictionary Compression

For reference, all code was run on a 5900X with 2133MHz DDR4 RAM (no OC). This page just has results of some random tests.

A page of notes/random rambling related to dictionary compression.

How to run the tests

You can find the test Python scripts in the tools folder of the repo. These scripts require zstd to be available in your PATH. Most Linux distributions ship it out of the box; on Windows you'll need to install it.

Originally I intended these scripts to be throwaways, but kept them for future use.
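
To give a rough idea of what driving the zstd CLI from Python looks like, here is a minimal sketch. It is not the code from the tools folder; the helper names (`zstd_available`, `train_command`, `compress_command`) are made up for illustration. The flags themselves (`--train`, `--maxdict=#`, `-D`, `-o`) are regular zstd CLI options.

```python
import shutil


def zstd_available():
    """Return True if a `zstd` executable can be found on PATH."""
    return shutil.which("zstd") is not None


def train_command(samples, dict_path, max_dict_bytes):
    """Build the CLI call that trains a dictionary from sample files (hypothetical helper)."""
    return ["zstd", "--train", *map(str, samples),
            f"--maxdict={max_dict_bytes}", "-o", str(dict_path)]


def compress_command(src, dst, level=12, dict_path=None):
    """Build the CLI call that compresses one file, optionally with a dictionary."""
    cmd = ["zstd", f"-{level}", "-f", "-q", str(src), "-o", str(dst)]
    if dict_path is not None:
        cmd += ["-D", str(dict_path)]
    return cmd


# 110 KiB is the dictionary size limit used for most of the tests on this page.
print(train_command(["block_0000", "block_0001"], "file.dict", 110 * 1024))
print(compress_command("block_0000", "block_0000.zst", dict_path="file.dict"))
print("zstd found:", zstd_available())
```

The resulting lists can be passed to `subprocess.run(cmd, check=True)` once `zstd_available()` returns True.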

Per-file dictionary Size

Testing on a Random Texture

Source: Whiterun alternative street stone by Pfuscher 2.2-2347-2-2-1578423661.7z

File: wrstonefloor01.dds Original size: 64.0MiB

110K Dict Size, 128K Blocks

Input file size: 64.00 MiB
Target dictionary size: 110.00 KiB (110 KiB (fixed))
Splitting into 1024 KiB blocks...

Overall Size Comparison:
Original size:              132.39 MiB
Dictionary size:            110.00 KiB (0.1%)
Dictionary compressed:      53.90 KiB (0.0%)
Compressed (no dict):       33.99 MiB (25.7%)
Compressed (with dict):     34.34 MiB (25.9%)
Total with dict+blocks:     34.40 MiB (26.0%)

Overall Space Savings:
Without dictionary:         74.32%
With dictionary:            74.02%
Dictionary advantage:       -0.31%

Per-Block Statistics:
Number of blocks: 65
Blocks with ≥4KiB improvement: 0 (0.0%)
Average 4KiB units saved per block: -1.88

Dictionary Advantage (percentage points):
  Min: -2.36%
  Max: -0.14%
  Avg: -0.58%

Compression Ratio:
  Without Dictionary:
    Min: 34.65%
    Max: 50.40%
    Avg: 46.70%
  With Dictionary:
    Min: 32.28%
    Max: 49.64%
    Avg: 46.12%

Most Improved Block:
  block_0045:
    Original: 1.00 MiB
    Without dict: 577.79 KiB (43.58% saved)
    With dict: 579.17 KiB (43.44% saved)
    Advantage: -0.14%
    Bytes saved: -1416.00 B

Least Improved Block:
  block_0064:
    Original: 127.00 B
    Without dict: 83.00 B (34.65% saved)
    With dict: 86.00 B (32.28% saved)
    Advantage: -2.36%
    Bytes saved: -3.00 B

Decodes @ 831.4 MB/s with dict. Decodes @ 1015.8 MB/s without dict.
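
For anyone reading the numbers above, this is a small sketch of how the per-block figures can be derived. It assumes "saved" means `1 - compressed / original` and that the dictionary advantage is the difference between the two savings in percentage points; the function names are illustrative, not taken from the test script.

```python
def split_blocks(data, block_size):
    """Split a byte string into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def percent_saved(original, compressed):
    """Space saved by compression, as a percentage of the original size."""
    return (1 - compressed / original) * 100


def dict_advantage(original, without_dict, with_dict):
    """Dictionary advantage in percentage points (negative = dictionary hurts)."""
    return percent_saved(original, with_dict) - percent_saved(original, without_dict)


# A scaled-down stand-in for the texture: 64 full blocks plus a 127 byte tail.
blocks = split_blocks(bytes(64 * 1024 + 127), 1024)
print(len(blocks), len(blocks[-1]))            # 65 127

# block_0064 from the run above: 127 B original, 83 B without dict, 86 B with dict.
print(f"{percent_saved(127, 83):.2f}%")        # 34.65%
print(f"{percent_saved(127, 86):.2f}%")        # 32.28%
print(f"{dict_advantage(127, 83, 86):+.2f}%")  # -2.36%
```

The tiny trailing block explains why the "Least Improved Block" is always block_0064: with only 127 bytes of input, any per-frame dictionary overhead shows up as a large percentage.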

Size/100 Dict Size, 128K Blocks

Input file size: 64.00 MiB
Target dictionary size: 655.36 KiB (1/100 of input)
Splitting into 1024 KiB blocks...

Overall Size Comparison:
Original size:              130.63 MiB
Dictionary size:            655.36 KiB (0.5%)
Dictionary compressed:      331.79 KiB (0.2%)
Compressed (no dict):       33.99 MiB (26.0%)
Compressed (with dict):     32.31 MiB (24.7%)
Total with dict+blocks:     32.63 MiB (25.0%)

Overall Space Savings:
Without dictionary:         73.98%
With dictionary:            75.02%
Dictionary advantage:       1.04%

Per-Block Statistics:
Number of blocks: 65
Blocks with ≥4KiB improvement: 64 (98.5%)
Average 4KiB units saved per block: 6.15

Dictionary Advantage (percentage points):
  Min: -7.09%
  Max: 4.59%
  Avg: 2.48%

Compression Ratio:
  Without Dictionary:
    Min: 34.65%
    Max: 50.40%
    Avg: 46.70%
  With Dictionary:
    Min: 27.56%
    Max: 53.22%
    Avg: 49.18%

Most Improved Block:
  block_0045:
    Original: 1.00 MiB
    Without dict: 577.79 KiB (43.58% saved)
    With dict: 530.80 KiB (48.16% saved)
    Advantage: 4.59%
    Bytes saved: 46.98 KiB

Least Improved Block:
  block_0064:
    Original: 127.00 B
    Without dict: 83.00 B (34.65% saved)
    With dict: 92.00 B (27.56% saved)
    Advantage: -7.09%
    Bytes saved: -9.00 B

Decodes @ 744.4 MB/s with dict. Decodes @ 1015.8 MB/s without dict.

Decode speed tanks with the larger dictionary, presumably because branch prediction fails more often, or because the larger dictionary puts more pressure on the L2 cache. I haven't profiled instruction by instruction to find out.
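
As a sanity check on the sizes reported in this run, here is a small sketch of the arithmetic. It assumes the "1/100 of input" target is a plain integer division and that the "Total with dict+blocks" line is the compressed blocks plus the compressed dictionary; both are my reading of the output, and the names are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB


def target_dict_size(input_size, divisor=100):
    """Dictionary size target as a fraction of the input size, in bytes."""
    return input_size // divisor


def total_with_dictionary(compressed_blocks, compressed_dict):
    """Total bytes to store when the (compressed) dictionary ships with the blocks."""
    return compressed_blocks + compressed_dict


print(f"{target_dict_size(64 * MIB) / KIB:.2f} KiB")  # 655.36 KiB

total = total_with_dictionary(32.31 * MIB, 331.79 * KIB)
print(f"{total / MIB:.2f} MiB")                       # 32.63 MiB
```

The point of the second number is that the dictionary is not free: its compressed size has to be counted against whatever it saves on the blocks.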

Size/100 Dict Size, 1M Blocks

Input file size: 64.00 MiB
Target dictionary size: 655.36 KiB (1/100 of input)
Splitting into 1024 KiB blocks...

Overall Size Comparison:
Original size:              130.63 MiB
Dictionary size:            655.36 KiB (0.5%)
Dictionary compressed:      331.79 KiB (0.2%)
Compressed (no dict):       33.99 MiB (26.0%)
Compressed (with dict):     32.31 MiB (24.7%)
Total with dict+blocks:     32.63 MiB (25.0%)

Overall Space Savings:
Without dictionary:         73.98%
With dictionary:            75.02%
Dictionary advantage:       1.04%

Per-Block Statistics:
Number of blocks: 65
Blocks with ≥4KiB improvement: 64 (98.5%)
Average 4KiB units saved per block: 6.15

Dictionary Advantage (percentage points):
  Min: -7.09%
  Max: 4.59%
  Avg: 2.48%

Compression Ratio:
  Without Dictionary:
    Min: 34.65%
    Max: 50.40%
    Avg: 46.70%
  With Dictionary:
    Min: 27.56%
    Max: 53.22%
    Avg: 49.18%

Most Improved Block:
  block_0045:
    Original: 1.00 MiB
    Without dict: 577.79 KiB (43.58% saved)
    With dict: 530.80 KiB (48.16% saved)
    Advantage: 4.59%
    Bytes saved: 46.98 KiB

Least Improved Block:
  block_0064:
    Original: 127.00 B
    Without dict: 83.00 B (34.65% saved)
    With dict: 92.00 B (27.56% saved)
    Advantage: -7.09%
    Bytes saved: -9.00 B

Decodes @ 744.4 MB/s with dict. Decodes @ 1015.8 MB/s without dict.

Per-file dictionary Speed

This section contains only raw data.

1M Blocks, 256K Dict Size

/home/sewer/Project/sewer56-archives-nx/tools/benchmark-single-file-dict.py  /home/sewer/Downloads/202x/textures/architecture/solitude --block-size 1024 --dict-size 262144 -e dds
Per file stats
scastlecol01.dds:
Average speed without dict: 1078.49 MB/s
Average speed with dict: 1302.66 MB/s
Average difference: +224.17 MB/s (+20.8%)

sclovers01.dds:
Average speed without dict: 1047.38 MB/s
Average speed with dict: 783.59 MB/s
Average difference: -263.80 MB/s (-25.2%)

sdetails01.dds:
Average speed without dict: 789.62 MB/s
Average speed with dict: 629.51 MB/s
Average difference: -160.12 MB/s (-20.3%)

sdetint01.dds:
Average speed without dict: 826.24 MB/s
Average speed with dict: 710.49 MB/s
Average difference: -115.75 MB/s (-14.0%)

sdirt01.dds:
Average speed without dict: 888.40 MB/s
Average speed with dict: 953.77 MB/s
Average difference: +65.38 MB/s (+7.4%)

sdirt02.dds:
Average speed without dict: 1127.68 MB/s
Average speed with dict: 1148.15 MB/s
Average difference: +20.46 MB/s (+1.8%)

sdoor02.dds:
Average speed without dict: 925.98 MB/s
Average speed with dict: 800.76 MB/s
Average difference: -125.23 MB/s (-13.5%)

sdoor03.dds:
Average speed without dict: 892.41 MB/s
Average speed with dict: 907.79 MB/s
Average difference: +15.38 MB/s (+1.7%)

sdragonhead01.dds:
Average speed without dict: 866.21 MB/s
Average speed with dict: 1045.39 MB/s
Average difference: +179.18 MB/s (+20.7%)

sdragontile01.dds:
Average speed without dict: 839.15 MB/s
Average speed with dict: 685.96 MB/s
Average difference: -153.19 MB/s (-18.3%)

sfloorhouse01.dds:
Average speed without dict: 1048.65 MB/s
Average speed with dict: 886.49 MB/s
Average difference: -162.16 MB/s (-15.5%)

sfloorhouse02.dds:
Average speed without dict: 975.19 MB/s
Average speed with dict: 891.84 MB/s
Average difference: -83.35 MB/s (-8.5%)

sgrass01.dds:
Average speed without dict: 939.30 MB/s
Average speed with dict: 1128.20 MB/s
Average difference: +188.90 MB/s (+20.1%)

sintcoloumn02.dds:
Average speed without dict: 840.77 MB/s
Average speed with dict: 685.05 MB/s
Average difference: -155.72 MB/s (-18.5%)

sintfloor01.dds:
Average speed without dict: 786.31 MB/s
Average speed with dict: 573.08 MB/s
Average difference: -213.23 MB/s (-27.1%)

smill01.dds:
Average speed without dict: 858.45 MB/s
Average speed with dict: 971.74 MB/s
Average difference: +113.29 MB/s (+13.2%)

smoss01.dds:
Average speed without dict: 1623.20 MB/s
Average speed with dict: 1191.50 MB/s
Average difference: -431.70 MB/s (-26.6%)

smoss01_m.dds:
Average speed without dict: 2337.08 MB/s
Average speed with dict: 1937.47 MB/s
Average difference: -399.61 MB/s (-17.1%)

smoss02walls.dds:
Average speed without dict: 2698.35 MB/s
Average speed with dict: 2506.42 MB/s
Average difference: -191.92 MB/s (-7.1%)

sroofslate01.dds:
Average speed without dict: 827.25 MB/s
Average speed with dict: 857.58 MB/s
Average difference: +30.33 MB/s (+3.7%)

sslatebaseint01.dds:
Average speed without dict: 818.95 MB/s
Average speed with dict: 599.92 MB/s
Average difference: -219.03 MB/s (-26.7%)

ssteps01.dds:
Average speed without dict: 1219.74 MB/s
Average speed with dict: 1328.23 MB/s
Average difference: +108.50 MB/s (+8.9%)

ssteps02.dds:
Average speed without dict: 1176.62 MB/s
Average speed with dict: 1194.79 MB/s
Average difference: +18.17 MB/s (+1.5%)

sstonebase01.dds:
Average speed without dict: 836.94 MB/s
Average speed with dict: 765.79 MB/s
Average difference: -71.14 MB/s (-8.5%)

sstonefloor01.dds:
Average speed without dict: 828.64 MB/s
Average speed with dict: 666.23 MB/s
Average difference: -162.41 MB/s (-19.6%)

sstonefloortrim01.dds:
Average speed without dict: 879.63 MB/s
Average speed with dict: 884.46 MB/s
Average difference: +4.83 MB/s (+0.5%)

sstonestep01.dds:
Average speed without dict: 847.76 MB/s
Average speed with dict: 705.67 MB/s
Average difference: -142.09 MB/s (-16.8%)

sstonewall.dds:
Average speed without dict: 983.41 MB/s
Average speed with dict: 823.37 MB/s
Average difference: -160.03 MB/s (-16.3%)

sstonewall02.dds:
Average speed without dict: 789.47 MB/s
Average speed with dict: 572.64 MB/s
Average difference: -216.83 MB/s (-27.5%)

sstonewall03.dds:
Average speed without dict: 799.66 MB/s
Average speed with dict: 636.41 MB/s
Average difference: -163.26 MB/s (-20.4%)

sstuccowall.dds:
Average speed without dict: 918.42 MB/s
Average speed with dict: 646.83 MB/s
Average difference: -271.59 MB/s (-29.6%)

sstuccowall02.dds:
Average speed without dict: 1150.37 MB/s
Average speed with dict: 1003.71 MB/s
Average difference: -146.66 MB/s (-12.7%)

sstuccowallint01.dds:
Average speed without dict: 995.63 MB/s
Average speed with dict: 852.71 MB/s
Average difference: -142.92 MB/s (-14.4%)

strims01.dds:
Average speed without dict: 1110.67 MB/s
Average speed with dict: 1021.41 MB/s
Average difference: -89.26 MB/s (-8.0%)

swoodbeam01.dds:
Average speed without dict: 797.30 MB/s
Average speed with dict: 579.73 MB/s
Average difference: -217.57 MB/s (-27.3%)

swoodbeam02.dds:
Average speed without dict: 804.17 MB/s
Average speed with dict: 665.80 MB/s
Average difference: -138.37 MB/s (-17.2%)

swoodcolumn01.dds:
Average speed without dict: 1343.45 MB/s
Average speed with dict: 1118.17 MB/s
Average difference: -225.28 MB/s (-16.8%)

swooddet01.dds:
Average speed without dict: 1219.21 MB/s
Average speed with dict: 1178.85 MB/s
Average difference: -40.37 MB/s (-3.3%)

swoodfloor01.dds:
Average speed without dict: 780.45 MB/s
Average speed with dict: 624.26 MB/s
Average difference: -156.19 MB/s (-20.0%)

swoodplanks01.dds:
Average speed without dict: 837.44 MB/s
Average speed with dict: 659.61 MB/s
Average difference: -177.83 MB/s (-21.2%)

swoodplaster01.dds:
Average speed without dict: 872.33 MB/s
Average speed with dict: 962.22 MB/s
Average difference: +89.89 MB/s (+10.3%)

swoodstep01.dds:
Average speed without dict: 831.74 MB/s
Average speed with dict: 718.43 MB/s
Average difference: -113.31 MB/s (-13.6%)
Overall averages:
  Without dictionary: 991.89 MB/s
  With dictionary: 849.61 MB/s
  Difference: -142.27 MB/s (-14.3%)
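
The difference lines above follow a simple pattern, sketched below. This is an assumption about how the figures relate (absolute difference, then relative to the no-dictionary speed, with the overall line being a plain mean across files), not a copy of the benchmark script.

```python
from statistics import mean


def speed_delta(without_dict, with_dict):
    """Return (absolute difference in MB/s, relative difference in percent)."""
    diff = with_dict - without_dict
    return diff, diff / without_dict * 100


def format_delta(without_dict, with_dict):
    """Format the difference the same way as the output above."""
    diff, pct = speed_delta(without_dict, with_dict)
    return f"{diff:+.2f} MB/s ({pct:+.1f}%)"


def overall_average(per_file_speeds):
    """Unweighted mean of per-file average speeds (assumed aggregation)."""
    return mean(per_file_speeds)


# scastlecol01.dds from the run above.
print(format_delta(1078.49, 1302.66))  # +224.17 MB/s (+20.8%)
```

An unweighted mean treats a small texture the same as a large one, so a handful of very fast files (such as the smoss ones) can pull the overall figure up noticeably.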

1M Blocks, FileSize/100 Dict Size

Per file stats
scastlecol01.dds:
Average speed without dict: 1237.72 MB/s
Average speed with dict: 1024.67 MB/s
Average difference: -213.05 MB/s (-17.2%)

sclovers01.dds:
Average speed without dict: 1055.95 MB/s
Average speed with dict: 559.51 MB/s
Average difference: -496.43 MB/s (-47.0%)

sdetails01.dds:
Average speed without dict: 804.91 MB/s
Average speed with dict: 621.43 MB/s
Average difference: -183.48 MB/s (-22.8%)

sdetint01.dds:
Average speed without dict: 837.64 MB/s
Average speed with dict: 660.35 MB/s
Average difference: -177.30 MB/s (-21.2%)

sdirt01.dds:
Average speed without dict: 882.35 MB/s
Average speed with dict: 711.81 MB/s
Average difference: -170.54 MB/s (-19.3%)

sdirt02.dds:
Average speed without dict: 1135.32 MB/s
Average speed with dict: 884.66 MB/s
Average difference: -250.67 MB/s (-22.1%)

sdoor02.dds:
Average speed without dict: 929.72 MB/s
Average speed with dict: 780.60 MB/s
Average difference: -149.12 MB/s (-16.0%)

sdoor03.dds:
Average speed without dict: 907.18 MB/s
Average speed with dict: 902.80 MB/s
Average difference: -4.38 MB/s (-0.5%)

sdragonhead01.dds:
Average speed without dict: 879.56 MB/s
Average speed with dict: 811.30 MB/s
Average difference: -68.27 MB/s (-7.8%)

sdragontile01.dds:
Average speed without dict: 852.54 MB/s
Average speed with dict: 714.80 MB/s
Average difference: -137.74 MB/s (-16.2%)

sfloorhouse01.dds:
Average speed without dict: 1047.07 MB/s
Average speed with dict: 681.13 MB/s
Average difference: -365.95 MB/s (-34.9%)

sfloorhouse02.dds:
Average speed without dict: 964.13 MB/s
Average speed with dict: 1141.60 MB/s
Average difference: +177.47 MB/s (+18.4%)

sgrass01.dds:
Average speed without dict: 942.77 MB/s
Average speed with dict: 764.08 MB/s
Average difference: -178.68 MB/s (-19.0%)

sintcoloumn02.dds:
Average speed without dict: 832.30 MB/s
Average speed with dict: 694.34 MB/s
Average difference: -137.96 MB/s (-16.6%)

sintfloor01.dds:
Average speed without dict: 775.15 MB/s
Average speed with dict: 528.06 MB/s
Average difference: -247.09 MB/s (-31.9%)

smill01.dds:
Average speed without dict: 846.54 MB/s
Average speed with dict: 722.30 MB/s
Average difference: -124.24 MB/s (-14.7%)

smoss01.dds:
Average speed without dict: 1604.23 MB/s
Average speed with dict: 1662.51 MB/s
Average difference: +58.27 MB/s (+3.6%)

smoss01_m.dds:
Average speed without dict: 2354.87 MB/s
Average speed with dict: 2135.83 MB/s
Average difference: -219.05 MB/s (-9.3%)

smoss02walls.dds:
Average speed without dict: 2713.32 MB/s
Average speed with dict: 2573.90 MB/s
Average difference: -139.42 MB/s (-5.1%)

sroofslate01.dds:
Average speed without dict: 882.21 MB/s
Average speed with dict: 828.31 MB/s
Average difference: -53.90 MB/s (-6.1%)

sslatebaseint01.dds:
Average speed without dict: 827.65 MB/s
Average speed with dict: 560.33 MB/s
Average difference: -267.32 MB/s (-32.3%)

ssteps01.dds:
Average speed without dict: 1265.91 MB/s
Average speed with dict: 1052.24 MB/s
Average difference: -213.67 MB/s (-16.9%)

ssteps02.dds:
Average speed without dict: 1186.68 MB/s
Average speed with dict: 1008.74 MB/s
Average difference: -177.94 MB/s (-15.0%)

sstonebase01.dds:
Average speed without dict: 855.47 MB/s
Average speed with dict: 712.75 MB/s
Average difference: -142.73 MB/s (-16.7%)

sstonefloor01.dds:
Average speed without dict: 818.08 MB/s
Average speed with dict: 592.09 MB/s
Average difference: -226.00 MB/s (-27.6%)

sstonefloortrim01.dds:
Average speed without dict: 862.94 MB/s
Average speed with dict: 719.99 MB/s
Average difference: -142.95 MB/s (-16.6%)

sstonestep01.dds:
Average speed without dict: 828.01 MB/s
Average speed with dict: 639.10 MB/s
Average difference: -188.90 MB/s (-22.8%)

sstonewall.dds:
Average speed without dict: 992.14 MB/s
Average speed with dict: 792.04 MB/s
Average difference: -200.10 MB/s (-20.2%)

sstonewall02.dds:
Average speed without dict: 799.21 MB/s
Average speed with dict: 531.20 MB/s
Average difference: -268.01 MB/s (-33.5%)

sstonewall03.dds:
Average speed without dict: 808.39 MB/s
Average speed with dict: 569.15 MB/s
Average difference: -239.24 MB/s (-29.6%)

sstuccowall.dds:
Average speed without dict: 904.69 MB/s
Average speed with dict: 741.90 MB/s
Average difference: -162.79 MB/s (-18.0%)

sstuccowall02.dds:
Average speed without dict: 1150.85 MB/s
Average speed with dict: 831.19 MB/s
Average difference: -319.66 MB/s (-27.8%)

sstuccowallint01.dds:
Average speed without dict: 1016.91 MB/s
Average speed with dict: 692.57 MB/s
Average difference: -324.34 MB/s (-31.9%)

strims01.dds:
Average speed without dict: 1098.78 MB/s
Average speed with dict: 840.70 MB/s
Average difference: -258.08 MB/s (-23.5%)

swoodbeam01.dds:
Average speed without dict: 792.29 MB/s
Average speed with dict: 532.20 MB/s
Average difference: -260.09 MB/s (-32.8%)

swoodbeam02.dds:
Average speed without dict: 793.53 MB/s
Average speed with dict: 607.36 MB/s
Average difference: -186.17 MB/s (-23.5%)

swoodcolumn01.dds:
Average speed without dict: 1374.12 MB/s
Average speed with dict: 1098.21 MB/s
Average difference: -275.90 MB/s (-20.1%)

swooddet01.dds:
Average speed without dict: 1218.22 MB/s
Average speed with dict: 1159.94 MB/s
Average difference: -58.28 MB/s (-4.8%)

swoodfloor01.dds:
Average speed without dict: 807.24 MB/s
Average speed with dict: 692.47 MB/s
Average difference: -114.77 MB/s (-14.2%)

swoodplanks01.dds:
Average speed without dict: 835.01 MB/s
Average speed with dict: 674.77 MB/s
Average difference: -160.23 MB/s (-19.2%)

swoodplaster01.dds:
Average speed without dict: 883.10 MB/s
Average speed with dict: 684.30 MB/s
Average difference: -198.80 MB/s (-22.5%)

swoodstep01.dds:
Average speed without dict: 841.06 MB/s
Average speed with dict: 703.49 MB/s
Average difference: -137.57 MB/s (-16.4%)
Overall averages:
  Without dictionary: 996.19 MB/s
  With dictionary: 779.49 MB/s
  Difference: -216.70 MB/s (-21.8%)

110KB Dict Size Limit, 1M Blocks

Overall averages:
  Without dictionary: 993.59 MB/s
  With dictionary: 896.59 MB/s
  Difference: -97.00 MB/s (-9.8%)

110KB Dict Size Limit, 64K Blocks

Overall averages:
  Without dictionary: 995.76 MB/s
  With dictionary: 896.12 MB/s
  Difference: -99.64 MB/s (-10.0%)

Per Extension Stats

All tests were run at zstd level 12.

110KB dict size limit, unless stated otherwise.

C++ Code

Tested on cblib (Charles Bloom).

C++ Code, 64KB blocks, 64KB file size limit

Looking for .cpp files under 64.00 KiB...
Target block size: 64.00 KiB
Found 128 files
Arranged into 22 blocks

=== Compression Summary ===
Total original size: 1.11 MiB

Compressed sizes:
  Individual files (no dict):   303.48 KiB (3.74x)
  Individual files (with dict):  222.36 KiB (5.10x)
  Solid blocks (no dict):        277.92 KiB (4.08x)
  Solid blocks (with dict):      216.52 KiB (5.24x)

Space savings vs individual (no dict):
  Dictionary advantage:         81.12 KiB (26.7%)
  Solid block advantage:        25.56 KiB (8.4%)
  Solid block + dict advantage: 86.96 KiB (28.7%)

Average decompression speeds:
  Individual files (no dict):   1170.76 MB/s
  Individual files (with dict):  1413.42 MB/s
  Solid blocks (no dict):        1319.67 MB/s
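
The "Space savings vs individual (no dict)" lines in these summaries can be reproduced with the arithmetic below. A minimal sketch, assuming every advantage is measured against the "Individual files (no dict)" size; the function name is illustrative.

```python
def advantage(baseline, candidate):
    """Return (size saved, percent saved) of `candidate` relative to `baseline`."""
    saved = baseline - candidate
    return saved, saved / baseline * 100


# Sizes in KiB from the 64KB block summary above.
individual_no_dict = 303.48
for label, size in [
    ("Dictionary advantage", 222.36),
    ("Solid block advantage", 277.92),
    ("Solid block + dict advantage", 216.52),
]:
    saved, pct = advantage(individual_no_dict, size)
    print(f"{label}: {saved:.2f} KiB ({pct:.1f}%)")
# Dictionary advantage: 81.12 KiB (26.7%)
# Solid block advantage: 25.56 KiB (8.4%)
# Solid block + dict advantage: 86.96 KiB (28.7%)
```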

128KB blocks, 64KB file size limit

Looking for .cpp files under 64.00 KiB...
Target block size: 128.00 KiB
Found 128 files
Arranged into 10 blocks

=== Compression Summary ===
Total original size: 1.11 MiB

Compressed sizes:
  Individual files (no dict):   303.48 KiB (3.74x)
  Individual files (with dict):  222.36 KiB (5.10x)
  Solid blocks (no dict):        268.30 KiB (4.23x)
  Solid blocks (with dict):      215.92 KiB (5.25x)

Space savings vs individual (no dict):
  Dictionary advantage:         81.12 KiB (26.7%)
  Solid block advantage:        35.18 KiB (11.6%)
  Solid block + dict advantage: 87.56 KiB (28.9%)

Average decompression speeds:
  Individual files (no dict):   1180.18 MB/s
  Individual files (with dict):  1484.81 MB/s
  Solid blocks (no dict):        1325.63 MB/s
  Solid blocks (with dict):      1474.08 MB/s
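
The "Arranged into N blocks" step groups small files into solid blocks of roughly the target size. The sketch below shows one simple way to do that (sequential packing that starts a new block whenever the next file would overflow the target). The real script may well pack differently, so treat the strategy and the names as assumptions.

```python
def pack_into_blocks(files, target_block_size):
    """Group (name, size) pairs into blocks without exceeding the target size.

    Files are taken in order; a new block starts when the next file would not fit.
    A file larger than the target ends up alone in its own block.
    """
    blocks, current, current_size = [], [], 0
    for name, size in files:
        if current and current_size + size > target_block_size:
            blocks.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        blocks.append(current)
    return blocks


# Sizes in KiB, target block of 64 KiB.
files = [("a.cpp", 40), ("b.cpp", 40), ("c.cpp", 40), ("d.cpp", 10)]
print(pack_into_blocks(files, 64))
# [['a.cpp'], ['b.cpp'], ['c.cpp', 'd.cpp']]
```

Packing like this leaves blocks partly empty, which would be consistent with 1.11 MiB of sources ending up in 22 blocks of 64 KiB rather than the theoretical minimum of 18.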

DDS Textures

Source: Interesting NPCs 3DNPC SE - Loose-29194-4-3-6-1582211680.7z

128KB blocks, 64KB file size limit

=== Compression Summary ===
Total original size: 984.96 KiB

Compressed sizes:
  Individual files (no dict):   731.52 KiB (1.35x)
  Individual files (with dict):  658.45 KiB (1.50x)
  Solid blocks (no dict):        728.13 KiB (1.35x)
  Solid blocks (with dict):      658.19 KiB (1.50x)

Space savings vs individual (no dict):
  Dictionary advantage:         73.07 KiB (10.0%)
  Solid block advantage:        3.39 KiB (0.5%)
  Solid block + dict advantage: 73.33 KiB (10.0%)

Average decompression speeds:
  Individual files (no dict):   784.33 MB/s
  Individual files (with dict):  812.99 MB/s

128KB blocks, 128KB file size limit

=== Compression Summary ===
Total original size: 4.89 MiB

Compressed sizes:
  Individual files (no dict):   3.53 MiB (1.38x)
  Individual files (with dict):  3.47 MiB (1.41x)
  Solid blocks (no dict):        3.53 MiB (1.39x)
  Solid blocks (with dict):      3.47 MiB (1.41x)

Space savings vs individual (no dict):
  Dictionary advantage:         61.98 KiB (1.7%)
  Solid block advantage:        3.39 KiB (0.1%)
  Solid block + dict advantage: 62.08 KiB (1.7%)

Average decompression speeds:
  Individual files (no dict):   740.73 MB/s
  Individual files (with dict):  724.82 MB/s
  Solid blocks (no dict):        689.80 MB/s
  Solid blocks (with dict):      693.06 MB/s

Unlike big files, smaller files are not hurt by use of a dictionary.

Skyrim Binary Scripts (.pex)

Source: Interesting NPCs 3DNPC SE - Loose-29194-4-3-6-1582211680.7z

128KB blocks, 64KB file size limit

/home/sewer/Project/sewer56-archives-nx/tools/benchmark-dict-over-extension.py /home/sewer/Downloads/3dnpc -e pex --block-size 131072
Looking for .pex files under 64.00 KiB...
Target block size: 128.00 KiB
Found 6904 files
Arranged into 45 blocks

=== Compression Summary ===
Total original size: 5.53 MiB

Compressed sizes:
  Individual files (no dict):   3.44 MiB (1.61x)
  Individual files (with dict):  685.92 KiB (8.26x)
  Solid blocks (no dict):        444.64 KiB (12.74x)
  Solid blocks (with dict):      366.99 KiB (15.44x)

Space savings vs individual (no dict):
  Dictionary advantage:         2.77 MiB (80.5%)
  Solid block advantage:        3.00 MiB (87.4%)
  Solid block + dict advantage: 3.08 MiB (89.6%)

Average decompression speeds:
  Individual files (no dict):   367.11 MB/s
  Individual files (with dict):  2142.75 MB/s
  Solid blocks (no dict):        4184.56 MB/s
  Solid blocks (with dict):      4569.94 MB/s

This one was too interesting not to post.

Skyrim Voices (.fuz)

128KB blocks, 64KB file size limit

/home/sewer/Project/sewer56-archives-nx/tools/benchmark-dict-over-extension.py /home/sewer/Downloads/3dnpc/sound/voice/3dnpc.esp/zorafairchildvoice -e fuz --block-size 131072

=== Compression Summary ===
Total original size: 42.21 MiB

Compressed sizes:
  Individual files (no dict):   36.61 MiB (1.15x)
  Individual files (with dict):  36.20 MiB (1.17x)
  Solid blocks (no dict):        36.29 MiB (1.16x)
  Solid blocks (with dict):      36.03 MiB (1.17x)

Space savings vs individual (no dict):
  Dictionary advantage:         422.70 KiB (1.1%)
  Solid block advantage:        327.61 KiB (0.9%)
  Solid block + dict advantage: 598.52 KiB (1.6%)

Average decompression speeds:
  Individual files (no dict):   2110.25 MB/s
  Individual files (with dict):  1790.29 MB/s
  Solid blocks (no dict):        1843.08 MB/s
  Solid blocks (with dict):      1955.87 MB/s

Takeaways

Conclusions from the tests here, and many more:

  • Dictionary compression is effective for files <128KiB.
  • A dictionary trained on the file itself (per-file dictionary) is generally ineffective.
  • Training on only the start of files was attempted, in the hope that the rest of the file would yield consistent branch prediction hits since the dictionary wouldn't be used there. That didn't quite work out.
  • Using a dictionary on only the first blocks of large files yielded a negligible overall difference.
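
The first takeaway can be turned into a trivial packing rule. This is only a sketch of the idea; the 128 KiB cutoff comes from the results above, while the function name and the strict comparison are my own choices.

```python
DICT_SIZE_THRESHOLD = 128 * 1024  # 128 KiB, from the takeaway above


def should_use_dictionary(file_size, threshold=DICT_SIZE_THRESHOLD):
    """Return True if a file is small enough to be worth compressing with a dictionary."""
    return file_size < threshold


print(should_use_dictionary(12 * 1024))         # True  (small script-sized file)
print(should_use_dictionary(64 * 1024 * 1024))  # False (large texture)
```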

When archiving unknown data,