Many companies wonder how much using an LLM will cost them. Cost estimation and forecasting should be a key factor in deciding whether LLM usage is the right strategy for your use-case.
Closed-source LLMs like OpenAI’s GPT-4 and Google’s Gemini are priced by the tokens you use. A token is the basic unit of text an LLM processes, roughly a short word or a fragment of a longer one.
You are charged based on the size of both the input and the output. The longer your input prompt, the more you are charged; the longer the model’s output, the more you are charged. To get an accurate sense of how many tokens your use-case consumes, the OpenAI Tokenizer is a good starting point.
Each color in the bottom area represents a new token; for the sentence above, that comes to 6 input tokens. Input and output tokens are charged at different rates, as shown in the diagram below.
This means that for every 1 million input tokens consumed, you are billed $10, which works out to roughly $0.00001 per input token.
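To turn that rate into a per-request figure, you can count tokens locally with OpenAI’s open-source tiktoken library and multiply by the published prices. The sketch below is illustrative only: the sample prompt, the assumed output length, and the $10/$30 per 1M input/output token rates are placeholders you should replace with your own prompts and the current price sheet.

```python
# Per-request cost estimate using OpenAI's open-source tiktoken tokenizer
# (pip install tiktoken). Prices and the sample prompt are illustrative;
# check the current price sheet for your model before relying on them.
import tiktoken

INPUT_PRICE_PER_1M = 10.00    # USD per 1M input tokens (from the diagram above)
OUTPUT_PRICE_PER_1M = 30.00   # USD per 1M output tokens (assumed for illustration)

encoding = tiktoken.encoding_for_model("gpt-4")
prompt = "Summarize this quarterly report for the board of directors."
input_tokens = len(encoding.encode(prompt))
output_tokens = 800  # assumed length of the model's answer

cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_1M \
     + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_1M
print(f"{input_tokens} input tokens, ~${cost:.4f} per request")
```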
A word of warning: do not take usage-based pricing lightly, especially if you are deploying company-wide. The bill adds up quickly because most applications are conversational, and it is not uncommon to prompt OpenAI several times before getting the right answer.
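To see how fast that compounds, here is a rough monthly forecast for a company-wide rollout. Every figure in it (headcount, requests per day, average cost per request) is a hypothetical assumption; plug in your own usage data.

```python
# Rough monthly forecast for a company-wide rollout.
# All figures are hypothetical assumptions -- replace them with real usage data.
EMPLOYEES = 200
REQUESTS_PER_EMPLOYEE_PER_DAY = 20
WORKDAYS_PER_MONTH = 21
AVG_COST_PER_REQUEST = 0.03   # USD, roughly the per-request estimate above

monthly_cost = (EMPLOYEES * REQUESTS_PER_EMPLOYEE_PER_DAY
                * WORKDAYS_PER_MONTH * AVG_COST_PER_REQUEST)
print(f"Estimated monthly spend: ${monthly_cost:,.0f}")  # ~$2,520 under these assumptions
```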
Obviously, the main benefit of usage-based pricing is that you only pay for what you use. We think the bigger advantage is actually not having to manage the infrastructure: companies without dedicated AI teams can use OpenAI while the internal workings stay abstracted away.
Open-source models are different because you, the owner, are responsible for running them. You pay for server time regardless of how much the model is used, so whether you call the model once or a hundred times, the bill is the same. Unlike with OpenAI, you are also responsible for maintaining the infrastructure: if the number of users increases dramatically, you need a mechanism to scale your systems up.
To estimate the compute you will be using, here is ApplyGPT’s internal SOP for AWS:
Of course, this only covers the cost of running the model itself. There will be other cloud costs: additional servers, public-facing endpoints, databases, and so on.
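As a rough back-of-the-envelope sketch (separate from the SOP referenced above), the hosting math reduces to an hourly instance rate multiplied by always-on hours, plus the surrounding infrastructure. The hourly rate and the “other cloud costs” figure below are assumptions, not quotes; check current AWS pricing for the instance type your model actually needs.

```python
# Back-of-the-envelope self-hosting estimate for an always-on deployment.
# The hourly rate and infrastructure figure are assumptions, not quotes --
# check current AWS pricing for the GPU instance your model requires.
GPU_INSTANCE_HOURLY = 5.70   # USD/hr, assumed rate for a multi-GPU instance
HOURS_PER_MONTH = 24 * 30    # the model runs whether anyone uses it or not
OTHER_CLOUD_COSTS = 800.00   # USD/month, assumed: endpoints, databases, storage

gpu_cost = GPU_INSTANCE_HOURLY * HOURS_PER_MONTH
total_cost = gpu_cost + OTHER_CLOUD_COSTS
print(f"GPU instance: ${gpu_cost:,.0f}/mo, total: ${total_cost:,.0f}/mo")
# ~$4,100/mo for the instance, ~$4,900/mo all-in under these assumptions
```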
For some, spending $5K+ per month to run an LLM will feel expensive. For others, it will be pennies. It all depends on your use-case and business situation.
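One concrete way to frame “it depends” is a break-even calculation: at what monthly request volume does a fixed hosting bill beat paying per call? Both inputs below carry over the assumed figures from the earlier sketches, so treat the result as illustrative.

```python
# Break-even point between usage-based pricing and a fixed self-hosting bill.
# Both figures are the assumptions from the earlier sketches, not quoted prices.
FIXED_MONTHLY_HOSTING = 4900.00  # USD/month to run your own model
COST_PER_API_REQUEST = 0.03      # USD, average usage-based cost per request

break_even = FIXED_MONTHLY_HOSTING / COST_PER_API_REQUEST
print(f"Self-hosting is cheaper above ~{break_even:,.0f} requests per month")
# ~163,000 requests/month under these assumptions
```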
Pricing is a primary lever in developing an LLM strategy. Before any decision is made, there should be a clear ROI on using Gen AI over the existing approach. We go into more detail during our workshops, but hopefully this provides a high-level overview.