Sometimes I Need to Store a Lot of Strings Efficiently, So I Built lite-strtab
This post is about a very specific problem: storing many short, long-lived strings in an immutable (read-only) container, while keeping memory overhead low and lookup performance practical.
Every once in a while I run into the same problem: I have a lot of strings to keep, and I want to save a bunch of memory.
Think of stuff like:
- A catalog or database snapshot you download to see what is available
- Your game's localisation strings
- Static lists of paths, endpoints, or keys
Something you load once, keep around for a while, and query every once in a while.
Motivation
As you may know, I have solo dev'd 200+ projects (very few are forks).
My more recent ones, made in the last 2-3 years, share a common layout/template.
And even a nice Rust manual!
However, keeping all of these repositories up to date has been difficult; you make one template change, and now off you go patching 30+ repos. Oof!
Tech support is a similar story; I'm mostly solo (unfortunately), and questions from users needing help are a daily occurrence, all hours of the day. That means I can't always respond right away 😔; I may be asleep.
I usually aim to respond within 10 minutes.
If I don't reply immediately, I might forget. Even though it's all unpaid work.
So I've been investing in building some LLM-based tooling to help me with these aspects.
Help me migrate repos and catch questions while I'm sleeping 💤.
Why existing tools don't work for me
Resource usage. OpenCode idles at 300MB of private bytes, and 500MB in use.
And that's in server mode WITHOUT the TUI.
Other tools, almost all non-native and built on TS or Python, are similar.
Including those built by nearly-half-a-trillion-dollar companies (e.g. Claude).
I'm going for ~1MB instead.
Where the Strings Come In
I wanted to parse models.dev/api.json, because I need
knobs like env vars and api endpoints.
I also want to keep that in memory for broad model/provider support.
Unfortunately, that includes unsized string data, and with it come some caveats. It's best to illustrate by example.
Unless stated otherwise, assume a 64-bit machine everywhere.
My Data Structure
I start with a structure-of-arrays layout to save memory on struct padding etc.
It looks approximately like (simplified):
/// Runtime lookup catalog with split provider and model tables.
#[derive(Debug, Clone)]
pub struct ModelCatalog {
/// Provider key lookup table.
provider_table: HashTable<PackedProviderTableEntry>,
/// Model key lookup table.
model_table: HashTable<PackedModelTableEntry>,
/// Indexed by index from provider_table (provider_idx)
provider_api_urls: Box<[Box<str>]>,
provider_env_var_pool: Box<[TinyVec<[Box<str>; 1]>]>,
provider_entries: Box<[PackedProviderEntry]>,
/// Indexed by index from model_table (model_config_idx)
model_entries: Box<[PackedModelEntry]>,
model_config_entries: Option<Box<[PackedModelConfigEntry]>>,
}
/// Packed model-table entry.
#[bitfield(u64)] // 8 byte struct! Amended by macro.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
pub(crate) struct PackedModelTableEntry {
#[bits(48)]
hash48: u64, // 48 bit hash
model_config_idx: u16,
}
/// Packed provider-table entry.
#[bitfield(u64)] // 8 byte struct! Amended by macro.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
pub(crate) struct PackedProviderTableEntry {
#[bits(48)]
hash48: u64, // 48 bit hash
provider_idx: u16,
}
/// Packed provider metadata row.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
pub(crate) struct PackedProviderEntry {
provider_type: ProviderType, // 8 bits
}
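For reference, the two *TableEntry bitfields just pack a 48-bit hash and a 16-bit index into a single u64. Here's a rough, hand-rolled sketch of that packing (illustrative only; it assumes fields fill from the least-significant bits, and the bitfield macro generates the real accessors):

// Hand-rolled equivalent of the 48-bit hash + 16-bit index packing.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct PackedEntry(u64);

impl PackedEntry {
    fn new(hash48: u64, idx: u16) -> Self {
        // Low 48 bits hold the hash, high 16 bits hold the index.
        Self((hash48 & 0x0000_FFFF_FFFF_FFFF) | ((idx as u64) << 48))
    }
    fn hash48(self) -> u64 {
        self.0 & 0x0000_FFFF_FFFF_FFFF
    }
    fn idx(self) -> u16 {
        (self.0 >> 48) as u16
    }
}

fn main() {
    let e = PackedEntry::new(0xDEAD_BEEF_CAFE, 42);
    assert_eq!(std::mem::size_of::<PackedEntry>(), 8); // still just one u64
    assert_eq!(e.hash48(), 0xDEAD_BEEF_CAFE);
    assert_eq!(e.idx(), 42);
}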
The Problems
I'll go block-by-block.
1. Box<str> handle tax
provider_api_urls: Box<[Box<str>]>,
On 64-bit, each Box<str> handle is 16 bytes (ptr + len).
In this snapshot, the median env var name length is 16 bytes (mean 16.47), so for example:
- "DEEPSEEK_API_KEY" payload: 16 bytes
- Box<str> handle: 16 bytes
In other words, the handle itself can be as big as the heap allocation.
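A quick way to see the tax, assuming a 64-bit target:

// The handle alone is as large as the payload it points to (64-bit target).
use std::mem::size_of;

fn main() {
    let key: Box<str> = "DEEPSEEK_API_KEY".into();
    assert_eq!(size_of::<Box<str>>(), 16); // fat pointer: ptr + len
    assert_eq!(key.len(), 16);             // heap payload bytes
}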
2. Padding from heap allocations
Each regular 'String' is a separate heap allocation.
And with every allocation there is extra overhead:
- Padding/alignment: a 12-byte payload may occupy a 16-byte size-class bucket.
- Internal metadata: allocator bookkeeping adds bytes beyond payload size.
Exact behaviour varies with allocator & platform.
In short, if you allocate small objects, like a 12-byte string, don't be surprised if a little more memory is actually used due to allocator internals.
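If you want to see this on your own machine, here's a rough Linux/glibc-only sketch (it assumes the libc crate and the default malloc-backed global allocator):

// Compare a small payload's requested size with what the allocator actually reserved.
fn main() {
    let s: Box<str> = "twelve bytes".into(); // 12-byte payload
    // SAFETY: on Linux/glibc the default global allocator is malloc-backed,
    // and `as_ptr` points at the start of that allocation.
    let usable = unsafe { libc::malloc_usable_size(s.as_ptr() as *mut libc::c_void) };
    println!("requested: {} B, usable: {} B", s.len(), usable);
}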
3. Heap locality is not guaranteed
Even within a single array like Box<[Box<str>]>, each Box<str> points to
its own heap allocation, and those allocations are not guaranteed to be adjacent
or in the same region.
# Example 1: unrelated allocation can interleave
api_url("https://...") -> 48B bucket @ 0x...9000
other allocation -> 48B bucket @ 0x...9040 # unrelated value inserted in between
doc_url("https://docs...") -> 48B bucket @ 0x...9080
# Example 2: very similar env keys can split by size class
env[0]("DEEPSEEK_API_KEY") # 16 chars -> 16B-class bucket @ 0x...7000
env[1]("DEEPINFRA_API_KEY") # 17 chars -> next size-class bucket @ 0x...b000
This hurts cache locality and may require additional paging, etc.
This is also why a 'bump allocator' or 'arena' is not the play here.
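You can observe the scattering yourself; a small sketch that just prints where each string's bytes ended up:

// Each Box<str> in the slice owns a separate allocation; addresses need not be adjacent.
fn main() {
    let keys: Box<[Box<str>]> = vec![
        Box::from("DEEPSEEK_API_KEY"),
        Box::from("DEEPINFRA_API_KEY"),
        Box::from("OPENROUTER_API_KEY"),
    ]
    .into_boxed_slice();

    for (i, k) in keys.iter().enumerate() {
        println!("env[{i}] ({} bytes) @ {:p}", k.len(), k.as_ptr());
    }
}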
4. Small arrays are expensive
provider_env_var_pool: Box<[TinyVec<[Box<str>; 1]>]>,
api.json snapshot (96 providers):
- 87 providers have 1 env key
- 5 providers have 2
- 4 providers have 3
So, most providers have a single env entry; but because >1 is possible, we need
to support that case too. However, this isn't so trivial.
- TinyVec<[Box<str>; 1]> uses 32 bytes per array/slice entry, and sometimes spills to the heap (for 2+ keys).
- TinyVec<[Box<str>; 3]> uses 56 bytes per array/slice entry, but usually only 1 entry is used.
- Box<[Box<str>]> uses 16 bytes per array/slice entry, plus an extra 16 bytes on the heap per key for the slice's own Box<str> handles.
Now consider the fact that the median/mean env name length is... 16 bytes.
Way more memory is used for the handles than for the data.
A regular string reference is expensive enough; but with small arrays, doubly so!!
| layout | bytes per 1-key row (requested) | total usable bytes (glibc) |
|---|---|---|
| TinyVec<[Box<str>; 1]> | 32 | 3,568 |
| TinyVec<[Box<str>; 3]> | 56 | 5,384 |
| Box<[Box<str>]> | 32 (16 + 16) | 4,056 |
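You can check the per-row handle sizes yourself with a small sketch (this assumes the tinyvec crate with its alloc feature; exact sizes can vary by compiler and crate version):

// Per-row handle sizes for the layouts above, before any heap payloads.
use std::mem::size_of;
use tinyvec::TinyVec;

fn main() {
    println!("TinyVec<[Box<str>; 1]>: {} B", size_of::<TinyVec<[Box<str>; 1]>>());
    println!("TinyVec<[Box<str>; 3]>: {} B", size_of::<TinyVec<[Box<str>; 3]>>());
    println!("Box<[Box<str>]>:        {} B", size_of::<Box<[Box<str>]>>());
}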
The Core Idea
lite-strtab is kind of like a 'string interner', or 'arena allocator', but with slightly different tradeoffs.
Think of first interning all the strings, and then 'freezing' the table for read-only access. This 'freezing' performs some optimizations.
lite-strtab stores all UTF-8 bytes in one contiguous buffer and keeps one offset table (len + 1 entries, with a final sentinel).
Lookup is straightforward:
- Read offsets[i] and offsets[i + 1]
- Slice bytes[start..end]
- Return &str
No per-string heap allocation, and IDs can be compact (StringId<u16> etc.).
use lite_strtab::StringTableBuilder;
let mut builder = StringTableBuilder::new();
let hello = builder.try_push("hello")?;
let world = builder.try_push("world")?;
let table = builder.build();
assert_eq!(table.get(hello), Some("hello"));
assert_eq!(table.get(world), Some("world"));
# Ok::<(), lite_strtab::Error>(())
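Under the hood, the lookup can be as simple as this sketch: one byte buffer plus an offsets array (illustrative only, not the crate's exact internals):

// A deliberately simplified table: all payloads back to back, plus len + 1 offsets.
struct Table {
    bytes: Box<[u8]>,
    offsets: Box<[u32]>, // offsets.len() == string_count + 1 (final sentinel)
}

impl Table {
    fn get(&self, i: usize) -> Option<&str> {
        let start = *self.offsets.get(i)? as usize;
        let end = *self.offsets.get(i + 1)? as usize;
        // The builder only ever writes valid UTF-8 at these boundaries.
        std::str::from_utf8(&self.bytes[start..end]).ok()
    }
}

fn main() {
    let table = Table {
        bytes: b"helloworld".to_vec().into_boxed_slice(),
        offsets: vec![0, 5, 10].into_boxed_slice(),
    };
    assert_eq!(table.get(0), Some("hello"));
    assert_eq!(table.get(1), Some("world"));
}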
Memory Layout Diagrams
Before: one allocation per string
flowchart TB
subgraph Handles["Container handles"]
H0["entry 0: String or Box<str> handle"]
H1["entry 1: String or Box<str> handle"]
HN["entry N: String or Box<str> handle"]
end
H0 --> A0["heap alloc: bytes for string 0"]
H1 --> A1["heap alloc: bytes for string 1"]
HN --> AN["heap alloc: bytes for string N"]
After: one byte buffer + compact offsets
flowchart TB
IDS["IDs in your structs<br/>StringId<u16> (2B each)"] --> OFF["offsets (u32/u16)<br/>len + 1 entries"]
OFF --> BYTES["one contiguous UTF-8 byte buffer"]
offsets: [0, 5, 10]
bytes: [h e l l o w o r l d]
id 0 -> bytes[0..5] -> "hello"
id 1 -> bytes[5..10] -> "world"
Handle and offset size comparison
| Setting | Reference Size | Limits what? | Practical limit in lite-strtab |
|---|---|---|---|
| String | 24 bytes | N/A | Owned, growable string handle |
| Box<str> | 16 bytes | N/A | Owned, immutable string handle |
| I = u8 | 1 byte | String count | Up to 256 strings (0..=255) |
| I = u16 | 2 bytes | String count | Up to 65,536 strings (0..=65,535) |
| I = u32 | 4 bytes | String count | Up to 4,294,967,296 strings |
| O = u16 | 2 bytes | Total UTF-8 byte size | Up to 65,535 total bytes |
| O = u32 | 4 bytes | Total UTF-8 byte size | Up to about 4 GiB total bytes |
Most common setup: I = u16 + O = u32
This means up to 4 GiB of UTF-8 data and up to 64Ki entries. So every time you store a reference to a string inside a struct, it costs 2 bytes instead of 16/24. Combined with arrays of strings, this gives you massive memory savings.
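To make that concrete, here's a small size comparison, using a plain u16 as a stand-in for a 2-byte StringId<u16>:

use std::mem::size_of;

// One row that stores an owned string handle...
struct WithHandle {
    api_url: Box<str>, // 16-byte fat pointer
    kind: u8,
}

// ...versus one that stores a 2-byte id into a string table.
struct WithId {
    api_url: u16, // stand-in for StringId<u16>
    kind: u8,
}

fn main() {
    assert_eq!(size_of::<WithHandle>(), 24); // 16 + 1, padded to 8-byte alignment
    assert_eq!(size_of::<WithId>(), 4);      // 2 + 1, padded to 2-byte alignment
}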
Improvement with lite-strtab
This is not final code; it's the current plan.
Start with the new ModelCatalog shape:
/// Simplified layout, focusing on string-bearing fields.
pub struct ModelCatalog {
provider_table: HashTable<PackedProviderTableEntry>,
model_table: HashTable<PackedModelTableEntry>,
// provider_idx is reused as the key
provider_api_urls: StringTable, // defaults: O = u32 (offset), I = u16 (StringId)
// env keys are grouped per provider in this table
provider_env_keys: StringTable,
provider_env_ranges: Box<[PackedEnvRange]>,
provider_entries: Box<[PackedProviderEntry]>,
model_entries: Box<[PackedModelEntry]>,
model_config_entries: Option<Box<[PackedModelConfigEntry]>>,
}
#[bitfield(u16)] // each range entry is 16 bits (2 bytes)
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
pub struct PackedEnvRange {
#[bits(14)]
start: u16, // first env key index in provider_env_keys
#[bits(2)]
count: u8, // number of env keys (0..=3)
}
This means:
- provider_idx -> API URL is direct (provider_api_urls[provider_idx])
- provider_idx -> env keys is provider_env_ranges[provider_idx] -> (start, count) in provider_env_keys
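A small sketch of that lookup path, using plain fields and Vec<Box<str>> as stand-ins for the bitfield and the StringTables (illustrative only; the values are made up):

// Toy stand-ins just to show the provider_idx -> URL / env-keys path.
struct EnvRange {
    start: u16, // first env key index in provider_env_keys
    count: u8,  // number of env keys (0..=3)
}

struct Catalog {
    provider_api_urls: Vec<Box<str>>, // stand-in for the API-URL StringTable
    provider_env_keys: Vec<Box<str>>, // stand-in for the env-key StringTable
    provider_env_ranges: Vec<EnvRange>,
}

impl Catalog {
    fn api_url(&self, provider_idx: usize) -> &str {
        &self.provider_api_urls[provider_idx]
    }
    fn env_keys(&self, provider_idx: usize) -> &[Box<str>] {
        let range = &self.provider_env_ranges[provider_idx];
        let start = range.start as usize;
        &self.provider_env_keys[start..start + range.count as usize]
    }
}

fn main() {
    let catalog = Catalog {
        provider_api_urls: vec!["https://example.invalid/v1".into()],
        provider_env_keys: vec!["DEEPSEEK_API_KEY".into()],
        provider_env_ranges: vec![EnvRange { start: 0, count: 1 }],
    };
    assert_eq!(catalog.api_url(0), "https://example.invalid/v1");
    assert_eq!(catalog.env_keys(0).len(), 1);
}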
1. API URL indexing Memory Savings
Before:
provider_api_urls: Box<[Box<str>]>,
After:
provider_api_urls: StringTable,
The existing provider_idx (u16) from the hashtable is now the key into the
StringTable, so the effective per-entry metadata is 6 bytes (2-byte index + 4-byte offset).
Formula notes:
96= number of providers in this snapshot.16= oneBox<str>handle on 64-bit (ptr + len).2 + 4= compact metadata model (u16index +u32offset).- Ratio column is
after / before.
On this snapshot (96 providers):
| Step | Formula | Bytes | Ratio |
|---|---|---|---|
| Before (Box<str> handles) | 96 * 16 | 1,536 B | 1.00x |
| After (StringTable metadata) | 96 * (2 + 4) | 576 B | 0.38x |
| Saved | 1,536 - 576 | 960 B | - |
This table only counts the indexing/reference layer, not UTF-8 payload bytes.
2. Environment Variable Memory Savings
Before:
provider_env_var_pool: Box<[TinyVec<[Box<str>; 1]>]>,
After:
provider_env_ranges: Box<[PackedEnvRange]>,
provider_env_keys: StringTable,
The existing provider_idx (u16) from the hashtable is now the index into provider_env_ranges, which gives us a (start, count) range of keys in the provider_env_keys StringTable.
Formula notes:
- 32 = one TinyVec<[Box<str>; 1]> row.
- 16 = one Box<str> handle on 64-bit.
- 87 / 5 / 4 are the provider counts with 1 / 2 / 3 env keys.
- X is allocator overhead from spill allocations (size-class rounding + metadata).
- Ratio values in the comparison are after / before.
Given 96 providers, 109 total env keys:
Before (TinyVec) path breakdown:
| Component | Formula | Bytes |
|---|---|---|
| Per-provider rows | 96 * 32 | 3,072 B |
| Spilled handles (2-key providers) | 5 * (2 * 16) | 160 B |
| Spilled handles (3-key providers) | 4 * (3 * 16) | 192 B |
| Total requested | 3,072 + 160 + 192 | 3,424 B |
| Allocator overhead from spills | +X | +144 B |
| Total usable on glibc | 3,424 + 144 | 3,568 B |
After (PackedEnvRange + StringTable) breakdown:
| Component | Formula | Bytes |
|---|---|---|
| Provider ranges | 96 * 2 | 192 B |
| Per-env-key index+len component | 109 * 2 | 218 B |
| Per-env-key offset component | 109 * 4 | 436 B |
| Total requested (after) | 192 + 218 + 436 | 846 B |
Comparison:
| Layout | Bytes | Saved vs after | Ratio |
|---|---|---|---|
| Before (TinyVec, requested) | 3,424 B | 2,578 B | 0.25x |
| Before (Box<[Box<str>]>) | 3,280 B | 2,434 B | 0.26x |
| After (PackedEnvRange + StringTable) | 846 B | - | 1.00x |
3. Heap Savings
From benchmark results (malloc_usable_size, Linux/glibc):
| Dataset | lite-strtab | Box<[Box<str>]> | Vec<String> | Saving | Ratio |
|---|---|---|---|---|---|
| EnvKeys | 2,240 B | 2,728 B | 2,888 B | 488 B | 0.82x |
| ApiUrls | 4,352 B | 4,672 B | 4,736 B | 320 B | 0.93x |
| TOTAL | 6,592 B | 7,400 B | 7,624 B | 808 B | 0.89x |
4. Combined impact
| Layer (derived from sections) | Before | After | Saving | Ratio |
|---|---|---|---|---|
| 1. API URL indexing | 1,536 B | 576 B | 960 B | 0.38x |
| 2. Env indexing (best baseline) | 3,424 B | 846 B | 2,578 B | 0.25x |
| 3. Heap (TOTAL) | 7,400 B | 6,592 B | 808 B | 0.89x |
| TOTAL (1 + 2 + 3) | 12,360 B | 8,014 B | 4,346 B | 0.65x |
You might think: damn, this person did all this work just to save 4K...
Yes.
We'll save in other places in the future too 😉, but I might as well write the library now instead of going back to old code later and rewriting it to use the library.
Extra Benchmarks
Full benchmark tables and dataset details (full README)
This is a short summary of the crate info.
What about a larger dataset?
- YakuzaKiwami (4,650 relative game paths, 238,109 bytes).
String reference cost in structs
You can store references to strings very very cheaply!! 💜
| Representation | Bytes | Ratio |
|---|---|---|
| lite-strtab | 9,300 B | 1.00x |
| Box<[Box<str>]> | 74,400 B | 8.00x |
| Vec<String> | 111,600 B | 12.00x |
For 4,650 strings...
Box<[Box<str>]>:
- Struct field: 16 B × 4,650 = 74 KB
lite-strtab:
- Struct field: 2 B × 4,650 = 9 KB
String references in your structs cost 2 bytes instead of 16-24.
Heap allocation size
Because a single buffer needs far less per-allocation alignment and rounding, we also save some space on the heap:
| Representation | Bytes | Ratio |
|---|---|---|
| lite-strtab | 256,736 B | 1.00x |
| Box<[Box<str>]> | 272,528 B | 1.06x |
| Vec<String> | 272,640 B | 1.06x |
A neat side effect of more manual alignment control.
Heap memory numbers were measured with glibc malloc_usable_size on Linux.
This accounts for allocator-induced alignment overhead; we're keeping it real ❤️🔥
Total memory usage (struct refs + heap)
Based on the two tables above:
| Representation | Struct refs | Heap alloc | Total bytes | Ratio |
|---|---|---|---|---|
| lite-strtab | 9,300 B | 256,736 B | 266,036 B | 1.00x |
| Box<[Box<str>]> | 74,400 B | 272,528 B | 346,928 B | 1.30x |
| Vec<String> | 111,600 B | 272,640 B | 384,240 B | 1.44x |
Read performance (YakuzaKiwami, sequential pass + AHash)
| Access | Representation | avg time (us) | avg thrpt (GiB/s) |
|---|---|---|---|
| get | Vec<String> | 13.561 | 16.352 |
| get | Box<[Box<str>]> | 13.002 | 17.056 |
| get | lite-strtab | 13.368 | 16.589 |
| get | lite-strtab (null-padded) | 13.714 | 16.171 |
| get_unchecked | Vec<String> | 13.448 | 16.490 |
| get_unchecked | Box<[Box<str>]> | 12.812 | 17.308 |
| get_unchecked | lite-strtab | 13.207 | 16.790 |
| get_unchecked | lite-strtab (null-padded) | 13.828 | 16.037 |
In practice, this is near parity for a realistic read-and-process workload.
Assembly-level lookup cost (x86_64, release)
As an extra note: even though we don't store ptr+len handles, deriving them is not that expensive 😉.
Here are the x86_64 instruction counts for getting a &str.
| Method | Instructions | Access pattern |
|---|---|---|
| lite-strtab::get | ~12 | bounds check -> load 2 offsets -> compute range -> add base |
| lite-strtab::get_unchecked | ~7 | load 2 offsets -> compute range -> add base |
| Vec<String>::get | ~8 | bounds check -> load ptr from heap -> deref for (ptr, len) |
| Vec<String>::get_unchecked | ~5 | load ptr from heap -> deref for (ptr, len) |
| Box<[Box<str>]>::get | ~7 | bounds check -> load ptr -> deref for (ptr, len) |
| Box<[Box<str>]>::get_unchecked | ~4 | load ptr -> deref for (ptr, len) |
Extra benchmark notes
- Current published benchmarks are sequential access patterns.
- Random access patterns may behave differently and need dedicated tests.
Closing
If your workload has lots of long-lived immutable strings, this data structure shape is worth considering.
It is simple, predictable, and does one job well.
The sources and docs are here: