Introducing Neuralwatt Flex

Not every watt of compute needs to be spent the same way. Some AI work is urgent, but a lot of it isn't. And when a workload can wait, there's an opportunity to run it more efficiently, for a lower price, and with a lighter footprint on the grid.

A chatbot answering a customer needs to respond immediately, but an overnight job summarizing yesterday's logs can run until the morning. Neuralwatt Flex is a solution for the latter — code review bots, evals, long-running agent tasks, batch jobs, or any request where there isn't a person waiting for a response in real time.

How Neuralwatt Flex Works

Run a Flex request and, if there's spare capacity, it starts immediately. If the fleet is busy, it's held briefly before starting. In exchange for that flexibility, you pay 35% less than standard rates. In testing over the 24 hours before launch, two out of three Flex requests ran right away, and 95% started within about 30 seconds. Short enough that for background work, you'll rarely notice a difference.

Neuralwatt Flex is one of the first offerings of its kind for open-weight models, bringing the same economics from major closed platforms to the open models developers are choosing today. At launch, Neuralwatt Flex is debuting with the GLM-5.2 and Kimi K2 families, with more options on the way.

Activating Neuralwatt Flex is also simple. Just add "-flex" to the model name, like "glm-5.2-flex", or pass "service_tier": "flex" on a standard request. Send it, and Flex takes care of the rest.

By shifting flexible work into quieter windows, Neuralwatt Flex spreads demand more evenly across our infrastructure. That means less strain during peak hours, faster service for everyone, and compute that lands in the moments when it's cheapest and cleanest to run. It’s proof that efficiency, cost, and performance don’t have to be competing goals.

Neuralwatt Flex is now live, and we can’t wait for you to try it. If your workload can trade a few seconds for real savings, it's just one line of code away.