Current State of LLMs

To most people, LLMs and ChatGPT have become synonymous. In truth, there is a whole world of open-source models and competitors playing catch-up. This guide introduces both sides.

Closed-Source Models

OpenAI didn’t create LLMs, but they certainly brought the best out of them. Currently, GPT-4 stands in the lead, and Google’s Gemini and Anthropic’s Claude models have comparable performance. These models are “closed-source” because the companies that own them do not release the weights or training code; you can only access them through a hosted product or API.

Though no one outside these companies knows exactly how the models are built, OpenAI has clearly overcome vast engineering challenges. Rough estimates suggest that state-of-the-art hardware would need 110 years to train GPT-4, yet OpenAI was able to do it in a couple of years. The training cost of GPT-4 is estimated at ~$100M, not including previous training, R&D, labor, and internal costs.

Despite the company’s name, OpenAI’s models are not open-source. This leads to the number one drawback of these models: data security. Enterprises are hesitant to use them out of fear that their data will be stored or used for training. This has recently been addressed by the Azure OpenAI service, where you can sign a Business Associate Agreement (BAA) to be HIPAA compliant.

Another drawback is the lack of control. OpenAI continues to make backend changes to its existing models. In rare situations, OpenAI has decreased the capabilities of existing models without notifying users. As a result, companies experience a drop in performance without knowing why. For production and customer-facing applications, this is a deal-breaker.

Our high-level recommendation: if data privacy is not a mission-critical concern, OpenAI and other closed-source models should be your choice. These companies abstract away the complexities of infrastructure and model training while providing best-in-class performance.

Open-Source Models

Open-source models are a hot topic. Universities, startups, and companies are releasing new models in the hope of competing with OpenAI. The truth is harsh: these models are nowhere near the sophistication or performance of GPT-4, though significant progress has been made in the past year.

ApplyGPT builds on-premise LLMs for those concerned with data privacy. This route is viable, but the development process is longer and the results are often subpar. The prime benefits of open-source models include:

Data privacy: Your company is responsible for securing and storing its own data. You can invest as much or as little as you like in securing your infrastructure.

Greater customization and control: Certain companies can benefit more from training a model from scratch on targeted data. We generally do not recommend this approach, as it is expensive, time-consuming, and fragile.

Lower cost at scale: The OpenAI API becomes expensive when used at scale (roughly 100+ users). Open-source models are billed hourly for the infrastructure they run on, regardless of usage. This means 1 million users cost the same as 0 users, as long as the same deployment can serve them. More details in the pricing section.
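The cost argument above can be made concrete with a small break-even calculation. The sketch below is illustrative only: the function names are hypothetical, and the hourly rate, token price, and usage figures are made-up assumptions, not real quotes from any provider. It also assumes a single fixed deployment can handle the load.

```python
import math


def api_monthly_cost(users, requests_per_user, tokens_per_request, price_per_1k_tokens):
    """Usage-based pricing: cost grows linearly with the total tokens processed."""
    total_tokens = users * requests_per_user * tokens_per_request
    return total_tokens / 1000 * price_per_1k_tokens


def hosted_monthly_cost(hourly_rate, hours_per_month=730):
    """Hourly pricing: cost is fixed for a given server, whatever the usage."""
    return hourly_rate * hours_per_month


def break_even_users(hourly_rate, requests_per_user, tokens_per_request, price_per_1k_tokens):
    """Smallest user count at which the API bill reaches the fixed hosting bill."""
    cost_per_user = api_monthly_cost(1, requests_per_user, tokens_per_request, price_per_1k_tokens)
    return math.ceil(hosted_monthly_cost(hourly_rate) / cost_per_user)


if __name__ == "__main__":
    # All values below are assumptions chosen for illustration.
    users_needed = break_even_users(
        hourly_rate=4.0,            # assumed GPU server price per hour
        requests_per_user=500,      # assumed requests per user per month
        tokens_per_request=2000,    # assumed prompt + completion tokens
        price_per_1k_tokens=0.03,   # assumed API price per 1,000 tokens
    )
    print(f"Self-hosting breaks even at about {users_needed} users")
```

With these assumed numbers the break-even point is 98 users. Below that, the usage-based API is cheaper; above it, the fixed hourly bill wins. Plug in your own figures before drawing conclusions.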

Llama 3 (from Meta) and Mixtral (from Mistral AI) are regarded as the best open-source models, though Hugging Face hosts many other LLMs trained for specific use cases. There is no exact science to choosing the best model. It comes down to A/B testing and model architecture, much like model selection in machine learning.
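As a rough illustration of the A/B testing mentioned above, you can run two candidate models over their own sets of prompts, grade each response as pass or fail, and check whether the difference in pass rates is larger than noise. The sketch below uses a standard two-proportion z-test. The function name and the counts are hypothetical, and pass/fail grading is an assumption about how you score responses.

```python
import math


def compare_pass_rates(passes_a, total_a, passes_b, total_b):
    """Two-proportion z-test on the pass rates of model A and model B.

    Returns (z, p_value). A small p_value suggests the gap between the
    two models is unlikely to be random noise.
    """
    rate_a = passes_a / total_a
    rate_b = passes_b / total_b
    # Pooled pass rate under the assumption that both models are equally good.
    pooled = (passes_a + passes_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        # Both models passed everything or failed everything: no evidence of a gap.
        return 0.0, 1.0
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical results: model A passes 80 of 100 prompts, model B passes 60 of 100.
    z, p = compare_pass_rates(80, 100, 60, 100)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

A positive z favors model A. In practice the hard part is the grading, not the statistics, so build the prompt set from real tasks in your own use case.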

Warning on Metrics

We have found mainstream benchmarks like GLUE, MMLU, and TruthfulQA to be essentially useless for evaluating models. These tests measure an LLM’s ability to retrieve knowledge and perform high-level reasoning, which does not translate to real-world performance.

We use the LMSYS Chatbot Arena Leaderboard as a starting point. The website shows users the responses of different models side by side and has them vote for the better one. We prefer this method because the rankings come from humans judging answers to human questions, which reduces the bias of fixed benchmarks.
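To see how side-by-side votes can become a ranking, here is a minimal sketch using an Elo-style rating, one common way to score pairwise comparisons. This is an illustration under stated assumptions, not necessarily the leaderboard's exact method: the model names, the K-factor of 32, and the starting rating of 1000 are all hypothetical, and ties are ignored.

```python
from collections import defaultdict


def expected_score(rating_a, rating_b):
    """Probability that A beats B under the Elo model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def rate_models(votes, k=32, initial=1000):
    """Turn a sequence of (winner, loser) votes into Elo-style ratings."""
    ratings = defaultdict(lambda: float(initial))
    for winner, loser in votes:
        win_probability = expected_score(ratings[winner], ratings[loser])
        # An upset (low win probability) moves the ratings more than an expected win.
        delta = k * (1 - win_probability)
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


if __name__ == "__main__":
    # Hypothetical votes: each tuple is (preferred model, other model).
    votes = [
        ("model_a", "model_b"),
        ("model_a", "model_c"),
        ("model_b", "model_c"),
    ]
    ranking = sorted(rate_models(votes).items(), key=lambda item: item[1], reverse=True)
    for name, rating in ranking:
        print(f"{name}: {rating:.1f}")
```

Because every vote moves the winner up and the loser down by the same amount, the ratings only carry meaning relative to each other, which is why the leaderboard works best as a starting point for a shortlist.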

Don’t expect OpenAI-level performance without the work

To be frank, it is nearly impossible to compete with a firm that has spent hundreds of millions of dollars developing its product when you are spending a few million. Because LLMs and OpenAI’s models have become so synonymous, executives have come to expect GPT-level performance from open-source models. Though it is possible, it is very difficult and requires considerable resources. If we haven’t belabored the point enough: if you have the option to use OpenAI, always use it!

Conclusion

LLMs are changing faster than any previous innovation, and the model overview here may be out of date within a few months. As long as you keep the closed-source vs. open-source trade-offs in mind, the specific model you pick matters less than how it performs for your use case.