Doradus Research
Notes from running an on-prem AI cluster — consumer + workstation-class GPUs, multi-vendor inference stacks, multi-model serving. Operator perspective: what actually breaks, what the docs don't say, and the configurations that ended up working in production.
Code at github.com/DoradusResearch. Hardware: 3 GPU compute nodes carrying 10× RTX PRO 6000 Blackwell (95 GiB each) + 4× RTX 5090, 2× DGX Spark (GB10, 128 GiB UMA), 2× Mac Studio M3 Ultra (256 GiB UMA each). ~1.3 TB system RAM, ~75 TB tiered storage across four tiers — hot NVMe, erasure-coded warm cluster, shared NFS model cache, SMB cold archive. 100GbE fiberoptic backbone, modern scheduler + service mesh with mTLS, perimeter firewall appliance. All on-prem.
recent posts
- 003 EvoQuality: a model that grades AI-generated images so you don't ship the broken ones
We released the first GGUF version of EvoQuality — a small model that rates image quality on a 1-5 scale. It runs at 200+ tokens per second on a single consumer GPU and tells you when your AI-generated picture is broken. Six size variants from 4 GB to 14 GB. Use it to filter training data, gate diffusion outputs, or score image collections.
- 002 Sleep mode on Blackwell, part 2: catching CUDA error 700 live + a generic Triton shmem-budget helper
Brought a frontier model we had retired in April back into production. Upstream landed the one-line fix this week; we validated it on our cluster and the model is back in the live rotation. Also upstreaming a Triton helper that solves the "kernel too big for the GPU" class of problems.
- 001 Three frontier AI models on four GPUs: hot-swapping at 4.5 seconds
Hot-swap three frontier AI models on the same four consumer GPUs in under five seconds. Used to take fifty. Plain-language breakdown of what changed, why it matters, and a short engineering note for the curious.