Best-in-class solutions developed on the Qualcomm Cloud AI 100 Ultra deliver up to 10x more tokens per dollar, significantly reducing operational costs for AI deployments
Cerebras Systems, a pioneer in accelerating generative artificial intelligence (AI), announced plans to deliver breakthrough performance and value for production AI. The companies aim to improve price performance by up to 10x for production-grade deployments by pairing Cerebras’ industry-leading CS-3 AI accelerator for training with Qualcomm Technologies, Inc.’s Cloud AI 100 Ultra for inference.
“These joint efforts aim to usher in a new era of high-performance, low-cost inference, and the timing couldn’t be better,” said Andrew Feldman, CEO and co-founder of Cerebras. “We focus on training. Using Qualcomm Technologies’ AI 100 Ultra, we can significantly reduce inference costs without sacrificing model quality, leading to the most efficient deployments currently available.”
Cerebras brings cutting-edge ML techniques and world-class AI expertise to accelerating AI inference in conjunction with Qualcomm Technologies’ AI 100 Ultra. Among the advanced techniques used:
- Unstructured sparsity: Cerebras and Qualcomm Technologies’ solutions can perform training and inference using unstructured dynamic sparsity, a hardware-accelerated AI technique that dramatically improves performance and efficiency. For example, a Llama 13B model trained on Cerebras hardware with 85% sparsity trains 3-4x faster and generates tokens at 2-3x higher throughput on AI 100 Ultra (a toy pruning sketch follows this list).
- Speculative decoding: This advanced technique combines the high throughput of small LLMs with the accuracy of large LLMs. The Cerebras software platform can automatically train and generate both models, which are seamlessly ingested via the Qualcomm® AI Stack, a product of Qualcomm Technologies. The resulting model pair can output tokens at up to twice the throughput with uncompromised accuracy (see the decoding sketch after the list).
- Efficient MX6 inference: AI 100 Ultra supports MX6, the industry-standard shared micro-exponent format, for high-precision inference with half the memory footprint and twice the throughput of FP16 (a block-quantization sketch follows the list).
- Cerebras NAS service: Using Neural Architecture Search (NAS) for targeted use cases, the Cerebras platform can deliver models optimized for the Qualcomm AI architecture, providing up to 2x higher inference performance (a toy search sketch appears below).
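For intuition, here is a minimal sketch of unstructured magnitude pruning in NumPy. It only illustrates the concept of zeroing arbitrary individual weights (as opposed to whole rows or blocks); Cerebras’ hardware-accelerated sparse training is far more involved, and the function name and threshold choice below are illustrative assumptions, not either company’s API.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float = 0.85) -> np.ndarray:
    """Zero out the smallest-magnitude weights until `sparsity` fraction of
    entries are zero. Unstructured: any individual position may be pruned."""
    k = int(weights.size * sparsity)                    # number of weights to drop
    threshold = np.partition(np.abs(weights).ravel(), k)[k]
    mask = np.abs(weights) >= threshold                 # keep only the large weights
    return weights * mask

rng = np.random.default_rng(0)
w = rng.standard_normal((512, 512)).astype(np.float32)
w_sparse = magnitude_prune(w, sparsity=0.85)
print(f"sparsity: {(w_sparse == 0).mean():.2%}")        # ~85% zeros
```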
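Speculative decoding can likewise be sketched in a few lines. The greedy-verification variant below (with toy stand-in models and hypothetical function names) shows why the technique preserves accuracy: every emitted token is one the target model itself would have chosen greedily; the speedup in real systems comes from verifying all of the draft’s proposals in one batched target pass instead of generating one token per pass.

```python
import numpy as np

VOCAB = 50

def speculative_decode(target_logits_fn, draft_logits_fn, prompt, n_new, k=4):
    """A small draft model proposes up to k tokens; the large target model
    verifies them, keeping the longest agreeing prefix and correcting the
    first mismatch. Output matches greedy decoding with the target alone."""
    tokens = list(prompt)
    goal = len(prompt) + n_new
    while len(tokens) < goal:
        # 1) Cheap draft model proposes k candidate tokens autoregressively.
        draft = list(tokens)
        for _ in range(k):
            draft.append(int(np.argmax(draft_logits_fn(draft))))
        proposed = draft[len(tokens):]
        # 2) Target model checks each proposal; in a real system all k
        #    positions are scored in a single batched forward pass.
        for tok in proposed:
            expected = int(np.argmax(target_logits_fn(tokens)))
            if tok == expected:
                tokens.append(tok)        # proposal accepted "for free"
            else:
                tokens.append(expected)   # fix first mismatch, restart drafting
                break
            if len(tokens) >= goal:
                break
    return tokens[:goal]

# Toy stand-in models: deterministic logits derived from the context.
def toy_logits(seed):
    def f(ctx):
        rng = np.random.default_rng(seed + 31 * len(ctx) + sum(ctx))
        return rng.standard_normal(VOCAB)
    return f

target, draft = toy_logits(0), toy_logits(0)  # identical => all proposals accepted
print(speculative_decode(target, draft, prompt=[1, 2, 3], n_new=8))
```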
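The idea behind shared micro-exponent formats such as MX6 can be approximated with a toy block-floating-point quantizer: a block of values shares one exponent, and each value keeps only a small signed mantissa, trading a little precision for a much smaller footprint. The sketch below is a deliberate simplification (real MX6 adds a second level of fine-grained micro-exponents and differs in encoding details), not the actual format.

```python
import numpy as np

def quantize_block_fp(x: np.ndarray, block: int = 16, mant_bits: int = 5):
    """Toy shared-exponent (block floating point) quantizer: each block of
    `block` values shares one exponent; each value keeps a small signed
    mantissa. Returns the dequantized values so error can be inspected."""
    x = x.reshape(-1, block)
    # Shared exponent per block, chosen so the largest value fits.
    exp = np.ceil(np.log2(np.abs(x).max(axis=1, keepdims=True) + 1e-30))
    scale = 2.0 ** exp
    qmax = 2 ** (mant_bits - 1) - 1                     # e.g. 15 for 5 bits
    mant = np.clip(np.round(x / scale * qmax), -qmax, qmax)
    return mant * scale / qmax

w = np.random.default_rng(1).standard_normal(1024).astype(np.float32)
w_q = quantize_block_fp(w).ravel()
print("max abs error:", np.abs(w - w_q).max())
```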
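Finally, hardware-aware NAS can be pictured as a search over candidate architectures under a device latency budget. The random-search toy below is purely illustrative: Cerebras’ NAS service is proprietary, and `estimate_quality` and `estimate_ai100_latency` are hypothetical stand-ins for a real validation metric and a real on-device latency model.

```python
import random

# Hypothetical search space: depth, width, and attention heads.
SPACE = {"layers": [12, 24, 32], "hidden": [1024, 2048, 4096], "heads": [8, 16, 32]}

def estimate_quality(arch):         # stand-in for validation accuracy
    return arch["layers"] * arch["hidden"] ** 0.5

def estimate_ai100_latency(arch):   # stand-in for on-device latency (ms)
    return arch["layers"] * arch["hidden"] * arch["heads"] / 1e5

def random_search(trials=100, latency_budget_ms=30.0):
    """Sample architectures and keep the best-scoring one under budget."""
    best, best_score = None, float("-inf")
    for _ in range(trials):
        arch = {k: random.choice(v) for k, v in SPACE.items()}
        if estimate_ai100_latency(arch) > latency_budget_ms:
            continue                 # reject architectures over the budget
        score = estimate_quality(arch)
        if score > best_score:
            best, best_score = arch, score
    return best

print(random_search())
```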
By combining these and other advanced techniques, Cerebras and Qualcomm Technologies’ solutions are designed to deliver significant performance improvements at model release time, producing inference-ready models that can be deployed anywhere on Qualcomm cloud instances.
“By combining Cerebras’ AI training solutions with AI 100 Ultra, we are able to provide our customers with industry-leading AI inference performance per total cost of ownership (TCO) and optimized, ready-to-deploy AI models, shortening both time to deployment and time to ROI,” said Rashid Attar, Vice President of Cloud Computing, Qualcomm Technologies, Inc.
Training with Cerebras enables customers to unlock significant performance and cost benefits with inference-aware training. Models trained on Cerebras are optimized to run inference on AI 100 Ultra, leading to frictionless deployment.
“The cost of inference has become a central concern,” said Kim Branson, Senior Vice President and Global Head of AI/ML at GlaxoSmithKline. “Techniques like sparsity and speculative decoding that speed up inference while reducing operational costs are key, allowing everyone to integrate and experiment with AI.”