- Unified Chat API
- Low-latency, resilient routing & fallback across four strategies: least-latency, round-robin, weighted round-robin, and priority-based
- Model-specific Prompting
- and much more ✨
Glide has officially reached Private Preview maturity!
As part of this initial scope, we had to set up a lot of common groundwork to get things rolling. As for the core functionality, we have shipped:
- Routing functionality with four types of routing strategies (including trickier ones like least-latency routing)
- First-class adaptive resiliency & fallback across all routing strategies
- A unified Chat API that supports popular model providers like OpenAI, Azure OpenAI (on-prem models), Cohere, OctoML, and Anthropic
- The ability to have model-specific prompts
- Installation via Docker & Homebrew
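To give a feel for how the routing strategies and fallback models fit together, here is a rough configuration sketch. The field names below are illustrative assumptions, not necessarily Glide's actual schema; please refer to the docs linked below for the real format.

```yaml
# Hypothetical config sketch -- key names are illustrative only.
routers:
  chat:
    - id: default
      strategy: least_latency   # or: round_robin, weighted_round_robin, priority
      models:
        - id: primary
          openai:
            model: gpt-3.5-turbo
        - id: fallback           # used automatically when the primary fails
          cohere:
            model: command
```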
The most exciting things are still ahead of us, and we are looking forward to getting more cool stuff into the scope of the Public Preview!
Let's equip GenAI revolution with a resilient open infrastructure :raised_hands:
Release: https://github.com/EinStack/glid...
Docs: https://glide.einstack.ai/
Demo: https://github.com/EinStack/glid...
Roadmap: https://github.com/EinStack/glid...
@shruti_tripathi3 Thank you, Shruti! We are thrilled to keep bringing the remaining functionality to Glide and to make it multi-modal in terms of model support.
@avalonxt Thanks, Alex! Appreciate your comment. If you have a use case for Glide, want to get started with it, and need help, just let us know on Discord/GitHub.
Glide sounds like a game-changer for GenAI apps! I'm really intrigued by the low-latency resilient routing and fallback strategies. Can you share any examples of how these strategies improve performance in real-world scenarios? Also, have you considered integrating with popular chat platforms like Slack or Discord to make it even more convenient for developers? Great job on this innovative solution!
@andrew_leader Appreciate your feedback Andrew!
First off, Glide is written in Go, a compiled language used to build low-latency, high-throughput services like databases. That ensures Glide itself does not slow your applications down (which could otherwise be the case).
Then, Glide provides a sophisticated least-latency routing strategy: it collects information about the performance of each external provider's chat API that you have chosen to use (e.g., OpenAI and Cohere) and tries to pick the model with the lowest latency per generated token for your requests.
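The core of that selection step can be sketched in a few lines of Go. This is a minimal illustration of the idea, not Glide's actual internals; the `Model` type and field names are assumptions made for the example.

```go
package main

import "fmt"

// Model tracks an observed latency per generated token for one provider.
// Hypothetical type for illustration; not Glide's real data structure.
type Model struct {
	Name            string
	LatencyPerToken float64 // e.g. milliseconds per generated token
}

// pickLeastLatency returns the model with the lowest observed
// latency per generated token.
func pickLeastLatency(models []Model) Model {
	best := models[0]
	for _, m := range models[1:] {
		if m.LatencyPerToken < best.LatencyPerToken {
			best = m
		}
	}
	return best
}

func main() {
	pool := []Model{{"openai", 42.0}, {"cohere", 31.5}}
	fmt.Println(pickLeastLatency(pool).Name) // prints "cohere"
}
```

In practice the latency figures would be moving averages updated from live traffic rather than static numbers.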
Last but not least, external providers fail frequently, and Glide is built and designed with that fact in mind. Every strategy we provide falls back seamlessly and automatically (provided you configure more than one model in Glide). Glide doesn't simply retry on failure, though: it leverages fallback models right at the moment the main model fails, so no time is wasted retrying five times with backoff.
Glide also does adaptive model health checking: when we identify that a model is unavailable, we pull it out of the pool for some time, so upcoming requests don't need to rediscover this fact over and over (which further improves latency).
To sum up, Glide makes a lot of small intelligent decisions along the request workflow that together lead to a much better UX and quicker responses, even in the toughest situations like model provider outages.