https://www.d-matrix.ai/how-to-bridge-speed-and-scale-redefining-ai-inference-with-low-latency-batched-throughput/