
Batching Tool Calls: Racing Against Cerebras Rate Limits & Too Fast Text Generation

This post talks about LLMs.

I don't usually talk about AI or LLMs. Too much stigma, too many grifters, too much slop.

Let's be clear: LLMs cannot and should not replace you today. They are tools (like any other) that can only aid you. The quality of your code is only as good as you are; LLMs lack good 'taste' and will produce slop by default. That goes for writing code as much as for any other use. I say this as someone who has to clean up that slop daily, both my own and what arrives in PRs.

At the end of the day, it's your responsibility to judge the output and produce good-quality code. That includes ensuring that a dumb silicon machine doesn't output crap.

That said, AI can be a powerful tool to speed up work, if you know what you're doing.
This post documents an optimization experiment; that's all.

Graph aside (38 lines), everything here was, of course, written by me, Sewer, by hand.

Two months ago I ran an experiment to optimize LLM-based coding workflows at extreme speeds.

The findings were originally shared as a Discord post, but I never got around to posting them here. Today, with some free time, I've reformatted them into a proper blog post with additional context.