This page describes the pricing, usage limits, and account restrictions that apply to Serverless Inference. Use this information to plan your usage and avoid unexpected charges or interruptions. Review it before you send production traffic, especially if you manage billing or operate at higher concurrency.
If you have questions about pricing, limits, or your account that this page doesn’t answer, contact Support to discuss your requirements.
Pricing
For detailed model pricing information, visit Serverless Inference pricing.
Purchase more credits
Serverless Inference credits come with Free, Pro, and Academic plans for a limited time. Enterprise availability may vary. When credits run out:
- Free accounts must activate pay-as-you-go inference on the Billing tab, or upgrade to a paid plan to continue using Serverless Inference. Activate pay-as-you-go or upgrade.
- W&B bills Pro plan users for overages monthly, based on model-specific pricing.
- Enterprise accounts should contact their account executive.
Account tiers and default usage caps
Each account tier has a default spending cap to help manage costs and prevent unexpected charges. W&B requires prepayment for paid Inference access. The following table shows the default cap for each tier and how to request a change. To change your cap, contact your account executive or Support.

| Account tier | Default cap | How to change limit |
|---|---|---|
| Free | $100/month | Upgrade to Pro or Enterprise |
| Pro | $6,000/month | Contact your account executive or Support for manual review |
| Enterprise | $700,000/year | Contact your account executive or Support for manual review |
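
For planning purposes, the default caps in the table can be turned into a quick headroom check. The following is a minimal sketch, not an official W&B tool: the `DEFAULT_CAPS` dictionary and the `cap_headroom` function are hypothetical names introduced here, the dollar values are copied from the table, and the sketch assumes you supply your own spend projection for the same period as the cap (monthly for Free and Pro, yearly for Enterprise). It does not model how W&B enforces a cap.

```python
# Default caps copied from the table above, in US dollars.
# ASSUMPTION: the names below are illustrative and not part of any W&B API.
DEFAULT_CAPS = {
    "free": (100, "month"),
    "pro": (6_000, "month"),
    "enterprise": (700_000, "year"),
}


def cap_headroom(tier, projected_spend):
    """Return (cap, period, remaining) for a spend projected over the cap's period.

    A negative `remaining` means the projection exceeds the default cap.
    """
    cap, period = DEFAULT_CAPS[tier.lower()]
    return cap, period, cap - projected_spend


if __name__ == "__main__":
    cap, period, remaining = cap_headroom("Pro", 4_500)
    print(f"Pro cap: ${cap}/{period}, remaining headroom: ${remaining}")
```

If the remaining headroom is negative, the projection exceeds the default cap for that tier, and the table's last column describes how to request a change.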
Concurrency limits
Concurrency limits protect service quality by capping how many requests a project or user can have in flight at once. If you exceed the concurrency limit, the API returns a 429 Concurrency limit reached for requests response. To fix this error, reduce the number of concurrent requests.
W&B applies concurrency limits per W&B project and per user. For example, if you have three projects in a team, each project has its own concurrency limit quota.
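
One way to reduce the number of concurrent requests is to cap them on the client side. The following is a minimal sketch using only the Python standard library. It makes several assumptions: `MAX_IN_FLIGHT` is a placeholder, because this page does not state the actual limit; `fake_request` is a hypothetical stand-in for a real Inference call; and `ConcurrencyLimiter`, `limiters`, and `send_all` are illustrative names, not part of any W&B SDK. The sketch keeps one limiter per project because limits apply per project.

```python
import threading
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# ASSUMPTION: placeholder value; this page does not state the actual limit.
MAX_IN_FLIGHT = 3


class ConcurrencyLimiter:
    """Caps how many calls run at the same time and records the observed peak."""

    def __init__(self, max_in_flight):
        self._sem = threading.BoundedSemaphore(max_in_flight)
        self._lock = threading.Lock()
        self._in_flight = 0
        self.peak = 0

    def run(self, fn, *args, **kwargs):
        # Block until a slot is free, then run the call while holding the slot.
        with self._sem:
            with self._lock:
                self._in_flight += 1
                self.peak = max(self.peak, self._in_flight)
            try:
                return fn(*args, **kwargs)
            finally:
                with self._lock:
                    self._in_flight -= 1


# One limiter per project, created on first use.
limiters = defaultdict(lambda: ConcurrencyLimiter(MAX_IN_FLIGHT))


def fake_request(prompt):
    """HYPOTHETICAL stand-in for a real Inference API call."""
    time.sleep(0.01)
    return f"echo: {prompt}"


def send_all(project, prompts, workers=8):
    """Send all prompts for one project without exceeding its in-flight cap."""
    limiter = limiters[project]  # looked up once, before any worker thread starts
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: limiter.run(fake_request, p), prompts))


if __name__ == "__main__":
    replies = send_all("project-a", [f"prompt {i}" for i in range(10)])
    print(len(replies), "replies, peak in flight:", limiters["project-a"].peak)
```

The thread pool may have more workers than the cap allows; the semaphore is what keeps the number of in-flight calls at or below `MAX_IN_FLIGHT`. Replace `fake_request` with your own client call and set the cap to a value that stays within your actual limit.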
If your use case requires increased limits, contact Support to discuss your requirements.
Geographic restrictions
The Inference service is only available from supported geographic locations. For more information, see the Terms of Service.
Next steps
Now that you understand pricing, caps, and concurrency limits, continue to set up your account:
- Review the prerequisites before you start.
- See available models and their specific pricing.