    vLLM
    Run an OpenAI-compatible inference endpoint with Qwen3-Coder

    Requirements

    32768 MB RAM · 8 vCPU · 300 GB storage

    Deploy time (approx)

    ~15 minutes

    About vLLM

    Deploy an OpenAI-compatible REST API endpoint using vLLM with GPU acceleration. This template runs Qwen/Qwen3-Coder-Next across 2+ GPUs with tensor parallelism, tool calling support, and HuggingFace model caching.
    The endpoint is fully compatible with the OpenAI Chat Completions API, making it a drop-in replacement for any OpenAI SDK client.
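As a minimal sketch of that compatibility, the snippet below builds a request against vLLM's /v1/chat/completions route using only the Python standard library. The host and port are assumptions — substitute the address your deployment exposes (any OpenAI SDK client pointed at the same base URL works the same way).

```python
# Hedged sketch: calling the OpenAI-compatible Chat Completions endpoint
# that vLLM exposes. The host below is an assumption -- replace it with
# the address DollarDeploy assigns to your server.
import json
import urllib.request

def chat_request(base_url: str, model: str, messages: list) -> urllib.request.Request:
    """Build a POST request for vLLM's /v1/chat/completions route."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            # vLLM accepts any bearer token unless the server sets --api-key
            "Authorization": "Bearer EMPTY",
        },
    )

req = chat_request(
    "http://your-server:8000",  # hypothetical host
    "Qwen/Qwen3-Coder-Next",
    [{"role": "user", "content": "Write hello world in Python."}],
)
# To actually send the request once the endpoint is live:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```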
    Use Verda servers with 2xA100, 2xH100, or similar multi-GPU configurations. Set the HF_TOKEN environment variable to your HuggingFace token so the model can be downloaded.
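A launch command consistent with the settings above might look like the following sketch. The flag names match current vLLM releases, but the tool-call parser value and port are assumptions to verify against your installed vLLM version:

```shell
# Download auth for the model weights (assumption: token has read access)
export HF_TOKEN=hf_xxx   # your HuggingFace token

# Serve the model across 2 GPUs with tensor parallelism and tool calling.
# The --tool-call-parser value is an assumption -- check `vllm serve --help`
# for the parser matching your vLLM version and model family.
vllm serve Qwen/Qwen3-Coder-Next \
  --tensor-parallel-size 2 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --host 0.0.0.0 --port 8000
```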

    About DollarDeploy

    DollarDeploy allows you to easily deploy and manage apps on your own VPS without the need for SSH access. Deploy vLLM with just a few clicks and start building your solution today!