More Reasonable Billing for Batched Inference
in progress
Qingyun Li
Currently, downloading the model used for inference can be expensive. For example, downloading Llama 3.3 70B takes over an hour (it's still running as I write this, so it may end up taking several hours).
It would be great if preparation steps like this weren't billed and only the actual compute time was.
Mike Henry
Appreciate the feedback. We're going to switch billing so you're only charged while the GPU is actively processing.
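Roughly, the idea is to meter only the compute phase and leave setup (downloads, weight loading) unmetered. Here's a minimal sketch of that separation; `UsageMeter`, `download_model`, and `run_batched_inference` are hypothetical placeholders, not our actual billing code:

```python
import time
from contextlib import contextmanager

class UsageMeter:
    """Accrues billable seconds only inside billable() contexts."""

    def __init__(self) -> None:
        self.billable_seconds = 0.0

    @contextmanager
    def billable(self):
        # Only time spent inside this context counts toward the bill;
        # everything outside it (e.g. model downloads) does not.
        start = time.monotonic()
        try:
            yield
        finally:
            self.billable_seconds += time.monotonic() - start

# Placeholder steps standing in for real download / inference calls.
def download_model(name: str) -> None:
    time.sleep(0.1)  # unbilled preparation

def run_batched_inference(prompts: list) -> None:
    time.sleep(0.1)  # billed GPU compute

meter = UsageMeter()
download_model("llama-3.3-70b")       # setup: not billed
with meter.billable():
    run_batched_inference(["hello"])  # compute: billed
print(f"billable GPU time: {meter.billable_seconds:.2f}s")
```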