Sun. Dec 22nd, 2024
Introducing More Enterprise-Grade Features for API Customers

To help organizations scale their AI usage without straining their budgets, we’ve added two new ways to reduce costs on consistent and asynchronous workloads.

  • Usage discounts on committed throughput: Customers with a sustained level of tokens per minute (TPM) usage on GPT-4 or GPT-4 Turbo can request access to provisioned throughput and receive discounts ranging from 10% to 50% based on the size of their commitment.
  • Cost savings for asynchronous workloads: Customers can use the new Batch API to run non-urgent workloads asynchronously. Batch API requests are priced at 50% off standard prices, offer substantially higher rate limits, and return results within 24 hours. This is ideal for use cases such as model evaluation, offline classification, summarization, and synthetic data generation (see the sketch after this list for how a batch might be submitted).
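
For illustration, here is a minimal sketch of how an asynchronous batch might be submitted with the OpenAI Python SDK. The file name, model, and prompts are placeholders chosen for this example, and the API documentation remains the authoritative reference for the exact request format.

```python
import json

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical input: a small JSONL file of chat completion requests.
requests = [
    {
        "custom_id": f"request-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4-turbo",
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    for i, prompt in enumerate(
        ["Summarize this support ticket in one sentence.",
         "Classify this review as positive or negative."]
    )
]
with open("batch_requests.jsonl", "w") as f:
    f.write("\n".join(json.dumps(r) for r in requests))

# Upload the file and submit the batch; results are returned within the 24-hour window.
batch_file = client.files.create(file=open("batch_requests.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)  # poll client.batches.retrieve(batch.id) until completed
```

Once the batch completes, its output file can be downloaded and matched back to the original requests by custom_id.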

We plan to continue adding new features with a focus on enterprise-grade security, administrative controls, and cost management. To learn more about these releases, please see our API documentation or contact our team to discuss a custom solution for your enterprise.