We value your privacy
We use cookies to enhance your experience, serve personalized ads or content, and analyze our traffic. By clicking "Accept", you consent to our use of cookies.
For more information, please see our privacy policy.
Requirements
32768MB8 vCPU300GB
Source code
Deploy time (approx)
~15 minutes
About vLLM
Deploy an OpenAI-compatible REST API endpoint using vLLM with GPU acceleration.
This template runs Qwen/Qwen3-Coder-Next across 2+ GPUs with tensor parallelism,
tool calling support, and HuggingFace model caching.
The endpoint is fully compatible with the OpenAI Chat Completions API, making it
a drop-in replacement for any OpenAI SDK client.
Use Verda servers with 2xA100, 2xH100, or similar multi-GPU configurations. Make sure to set
HF_TOKEN environment variable to your HuggingFace token to be able to properly download the model.
About DollarDeploy
DollarDeploy allows you to easily deploy and manage apps on your own VPS without the need for SSH access. Deploy vLLM with just a few clicks and start building your solution today!