Hello, right now the Orpheus model returns SNAC tokens from the completions endpoint. Converting those SNAC tokens to .wav or another audio format requires Python with torchaudio and access to a GPU.
My use case is a browser-like environment, so I don't have access to a GPU or to server resources with a GPU.
Other services, such as DeepInfra, offer an OpenAI-compatible API that streams a .wav file directly to the requester, so no GPU or server is needed: https://deepinfra.com/canopylabs/orpheus-3b-0.1-ft/api?example=openai-tts-python
An endpoint like that would work great for my environment, since I could perform TTS with Orpheus without needing a GPU.
Some projects already accomplish this, for example https://github.com/Lex-au/Orpheus-FastAPI
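
To make the request concrete, here is a rough sketch of what a client call could look like if an OpenAI-compatible speech route existed. The request shape follows the OpenAI `audio/speech` API (`model`, `input`, `voice`, `response_format`). The base URL, the voice name `tara`, and the helper names `build_speech_request` and `save_speech` are assumptions made for illustration, not an existing endpoint of this service.

```python
import json
import urllib.request

# Hypothetical values: the base URL and voice are placeholders for
# illustration, not confirmed settings of any provider.
BASE_URL = "https://api.example.com/v1"
MODEL = "canopylabs/orpheus-3b-0.1-ft"
VOICE = "tara"


def build_speech_request(text, api_key, base_url=BASE_URL, model=MODEL, voice=VOICE):
    """Build a POST request in the shape of the OpenAI audio/speech API."""
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": "wav",  # ask the server for ready-to-play audio
    }
    return urllib.request.Request(
        url=f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_speech(text, api_key, out_path="speech.wav", chunk_size=8192):
    """Stream the audio response to disk chunk by chunk.

    The client only copies bytes; no SNAC decoding, torch, or GPU is involved.
    """
    request = build_speech_request(text, api_key)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
    return out_path
```

With a route like this, the same request could be made from a browser with a plain `fetch` call, since the server would do the SNAC decoding and the client would only receive audio bytes.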