Hey guys,
We built an open-source AI SDK (Python & JavaScript) that provides a drop-in replacement for OpenAI’s chat completion endpoint. We'd love to know what you think so we can make switching as easy as possible and get more folks on open-source.
You can swap in almost any open-source model on Hugging Face: HuggingFaceH4/zephyr-7b-beta, Gryphe/MythoMax-L2-13b, teknium/OpenHermes-2.5-Mistral-7B and more.
If you haven't seen us here before, we're PostgresML, an open-source MLOps platform built on Postgres. We bring ML to the database rather than the other way around. We're incredibly passionate about keeping AI truly open, so we needed a way for our customers to easily escape OpenAI's clutches. Give it a go and let us know if we're missing any models, or what else would help you switch.
You can check out the blog post for the details, but here's the git diff:
    - from openai import OpenAI
    + from pgml import OpenSourceAI

    - client = OpenAI(openai_api_key=openai_api_key)
    + client = OpenSourceAI(database_url=database_url)

      messages = [
          {"role": "system", "content": "You are a helpful assistant"},
          {"role": "user", "content": "What is 1+1?"},
      ]

    - response = client.chat.completions.create(...)
    + response = client.chat_completions_create(...)

    - return response.choices[0].message.content
    + return response["choices"][0]["message"]["content"]
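Put together, a minimal working call looks something like this (a sketch assembled from the diff above, assuming the signature is (model, messages) per our docs; the model name is one of the Hugging Face models listed earlier):

    import os
    from pgml import OpenSourceAI

    # Your PostgresML connection string
    client = OpenSourceAI(database_url=os.environ["DATABASE_URL"])

    messages = [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is 1+1?"},
    ]

    response = client.chat_completions_create("HuggingFaceH4/zephyr-7b-beta", messages)
    print(response["choices"][0]["message"]["content"])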
"We bring ML to the database rather than the other way around."
Can you explain to me what that means? Maybe I've been having this problem recently.
We've been having to build this in-house for louie.ai to be more engineer-grade, as all the libs we evaluated weren't flexible enough for common scenarios. So maybe a useful evaluation laundry list for makers & users here:
* 12-factor: support setting via env vars, config files, api, etc
* decoupling auth config from model config (see the config sketch after this list)
* supporting registration of multiple auths & models, not just one, including via 12-factor
* streamlining the ability of LLM apps to negotiate which LLM models to use
* inferring & validating matchup of what your model provider gives and your app configures & requests, ideally at config or load time, and in a testable way
* transparent native support for each provider, as each provider & api has annoying deviations & useful features that end up being relevant. Ex: even openai vs azure openai has differences like the notion of 'deployments' and around rate limits that should be handled but also exposed
* observability: introspection hooks, including configuration for opentelemetry metrics, telemetry, & logs, including tenant/user dictionaries
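To make the first few bullets concrete, here's a minimal sketch of what we mean by 12-factor config with auth decoupled from model config (all names here are hypothetical, not our actual code):

    import os
    from dataclasses import dataclass

    @dataclass
    class ProviderAuth:
        name: str
        api_key: str
        endpoint: str | None = None

    @dataclass
    class ModelConfig:
        model: str
        auth: str          # references a ProviderAuth by name, not by value
        max_tokens: int = 1024

    def auth_from_env(name: str) -> ProviderAuth:
        """12-factor: resolve credentials from env vars like MYAPP_OPENAI_API_KEY."""
        prefix = f"MYAPP_{name.upper()}"
        return ProviderAuth(
            name=name,
            api_key=os.environ[f"{prefix}_API_KEY"],
            endpoint=os.environ.get(f"{prefix}_ENDPOINT"),
        )

    # Multiple auths and multiple models registered side by side.
    AUTHS = {n: auth_from_env(n) for n in ("openai", "azure_openai")}
    MODELS = [
        ModelConfig(model="gpt-4", auth="openai"),
        ModelConfig(model="gpt-35-turbo", auth="azure_openai"),
    ]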
Without that kind of stuff, a third-party dependency is more annoying than useful for 'serious' implementations, b/c we ended up fighting the library vs using it.
(And we'd be happy to OSS etc if relevant.. such a bear!)
> weren't flexible enough for common scenarios
Does OpenAI support that stuff, or is that a more general MLOps (or whatever you'd call it) list of functionality you need? The API call aspect seems like a pretty easy part that's not a barrier to switching models. It's whatever customizations for a given application that I'd expect to be more work. But re your text I quoted, I'm always wary of "one line of code" solutions that are harder to customize than it would be to just write the thing.
openai and langchain natively support all of that or expose the hooks to do it
all that stuff is what we needed to do to support multiple providers vs hard-coding a single one. none of it is specific to our codebase. if we were going to take on a third-party dependency, we'd need it to be serious about this kind of thing, else we have an external dependency to work around on our critical path
Edit: As an example, if the switching layer doesn't implement model negotiation and doesn't expose key model details (which vary by model provider service, and often require REST calls to introspect), we can't add model negotiation / retries / etc on top, and would have to edit the innards to enable that, which opens up all sorts of questions
What is “model negotiation” in this context?
Ex: An agent pipeline is asked to generate some code, triggers a code agent, and azure openai is hooked up with both primary and fallback service regions + a few deployments of 3.5 turbo, 4, etc. Which model gets used first, second, ...? Why? And for the model abstraction layer, what does it need to expose vs hide?
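A rough sketch of the kind of negotiation/fallback loop we'd want to layer on top (the config shape and call_fn are hypothetical):

    # Hypothetical ordered candidates produced by a negotiation step
    # (primary region first, then fallback region, then a cheaper deployment).
    CANDIDATES = [
        {"provider": "azure-openai", "region": "eastus", "deployment": "gpt-4"},
        {"provider": "azure-openai", "region": "westus", "deployment": "gpt-4"},
        {"provider": "azure-openai", "region": "eastus", "deployment": "gpt-35-turbo"},
    ]

    def negotiate_and_call(prompt, call_fn, candidates=CANDIDATES):
        """Try each candidate in priority order, falling through on rate
        limits or regional outages; call_fn is any provider-specific caller."""
        last_err = None
        for cand in candidates:
            try:
                return call_fn(prompt, **cand)
            except Exception as err:  # real code catches provider-specific errors
                last_err = err
        raise RuntimeError("all model candidates exhausted") from last_err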
very relevant. OSS please!
Another abstraction layer library is: https://github.com/BerriAI/litellm
For me the killer feature of a library like this would be if it implemented function calling. Even if it was for a very restricted grammar - like the traditional ReAct prompt:
Solve a question answering task with interleaving Thought, Action, Observation steps. Thought can reason about the current situation, and Action can be three types:
(1) Search[entity], which searches the exact entity on Wikipedia and returns the first paragraph if it exists. If not, it will return some similar entities to search.
(2) Lookup[keyword], which returns the next sentence containing keyword in the current passage.
(3) Finish[answer], which returns the answer and finishes the task.
After each observation, provide the next Thought and next Action. Here are some examples:
One-parameter functions would be enough for a start (but maybe something that would work with multi-line strings - this line-oriented grammar is kind of restricted).

The problem is that most non-OpenAI models haven't actually been fine-tuned with function calling in mind, and getting a model to output function-calling-like syntax without having been trained on it is quite unreliable. There are a few alternatives that have been (OpenHermes 2.5 has some function calling in its dataset and does a decent job with it, and the latest Claude does as well), but for now it just doesn't work great.
That said, it's not that hard to fine-tune a model to understand function calling -- we do that as part of all of our OpenPipe fine-tunes, and you can see the serialization method we use here: https://github.com/OpenPipe/OpenPipe/blob/main/app/src/model...
It isn't particularly difficult, and I'd expect more general-purpose fine-tunes will start doing something similar as they get more mature!
Check out this recently released function calling model; apparently it beats GPT-4 by 7%: https://huggingface.co/Nexusflow/NexusRaven-V2-13B
ReAct was used before OpenAI introduced functions. It might be less reliable - but you can add error checking and retry.
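For example, the error checking and retry can be as simple as a regex over the Action line (a sketch; `generate` stands in for any text-completion callable):

    import re

    # Matches the three Action types from the ReAct prompt above.
    ACTION_RE = re.compile(r"^Action: (Search|Lookup|Finish)\[(.+)\]$", re.MULTILINE)

    def parse_action(completion: str):
        """Return the (name, argument) pair from a ReAct-style completion, or None."""
        m = ACTION_RE.search(completion)
        return (m.group(1), m.group(2)) if m else None

    def act_with_retry(generate, prompt, max_retries=3):
        for _ in range(max_retries):
            action = parse_action(generate(prompt))
            if action is not None:
                return action
            # Feed the failure back to the model and retry.
            prompt += "\nYour last Action was malformed. Reply with Action: Name[argument]."
        raise ValueError("model never produced a well-formed Action")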
LocalAI can do that: https://github.com/mudler/LocalAI
One challenge is that specific APIs are already being made that aren’t compatible between models. Right now, I’m referring to function calling and tools. I suspect that these are likely to create lock in as vendors move away from general completion APIs.
Sure, you need a local implementation of function calling, etc., to implement those APIs, but once you do, there is no reason that you can't have a model-agnostic API implementation.
Obviously, if there are things that depend on specific model capabilities (e.g., image interpretation or other multimodal use), you'll need a model that supports that, but there are multimodal open models. You'll need a way to recognize and error on model/capability combos that are invalid though.
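A sketch of what that local, model-agnostic shim might look like: render an OpenAI-style `tools` spec into plain prompt text for a model without native function calling (names hypothetical):

    import json

    def tools_to_prompt(tools: list[dict]) -> str:
        """Describe an OpenAI-style tools spec in plain text so any
        completion model can emit a parseable 'call'."""
        lines = [
            "You can call these functions by replying with JSON",
            'shaped like {"name": ..., "arguments": {...}}:',
        ]
        for t in tools:
            fn = t["function"]
            lines.append(f"- {fn['name']}: {fn.get('description', '')}")
            lines.append(f"  parameters: {json.dumps(fn['parameters'])}")
        return "\n".join(lines)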
It’s not that you can’t add it, but that it seems the reason openai added “tools” after function calling is to make it generic for more unique/proprietary features.
Unless everybody is going to follow OpenAI's lead for every feature they implement, general compatibility is going to be challenging. Basically, the only reason this is possible now is that there isn't yet a lot of diversity in the way models are being used.
Are any OSS models producing JSON with specific keys as reliably as OpenAI models do with function calling? That would be needed to implement function calling.
> Are any OSS models producing JSON with specific keys as reliably as OpenAI models do with function calling?
Yes. From my understanding, this is true in theory of any model over which you have fine-grained control of inference (which you have whenever the inference code isn't a black box): generation can be constrained to follow a grammar of arbitrary specificity. This has been generally implemented in the frameworks for running OSS models.
Cool, that makes sense. Are there any now doing this off-the-shelf where a schema can be passed in and the response will be valid JSON that matches the schema?
Tooling for this was announced on HN 3 months ago, it now does a lot more: https://news.ycombinator.com/item?id=37125118
Here is a plugin for one of the popular tools for running multiple local models (a tool that presents an OpenAI-style API for downstream consumption, as well as its own web UI for direct interaction with the models): https://github.com/hallucinate-games/oobabooga-jsonformer-pl...
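For a concrete feel of grammar-constrained sampling, llama-cpp-python (mentioned elsewhere in this thread) accepts GBNF grammars directly; a minimal sketch, assuming a local GGUF model file:

    from llama_cpp import Llama, LlamaGrammar

    llm = Llama(model_path="./zephyr-7b-beta.Q4_K_M.gguf")  # hypothetical local file

    # GBNF grammar forcing a JSON object with exactly the keys "name" and "args".
    grammar = LlamaGrammar.from_string(r'''
    root   ::= "{" ws "\"name\"" ws ":" ws string "," ws "\"args\"" ws ":" ws string ws "}"
    string ::= "\"" [^"]* "\""
    ws     ::= [ \t\n]*
    ''')

    out = llm("Emit a function call to look up the weather in Paris.\n",
              grammar=grammar, max_tokens=128)
    print(out["choices"][0]["text"])  # guaranteed to match the JSON shape above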
Tooling that makes it possible to switch away from OpenAI is good even if the switch never ends up happening, because it puts pressure on them not to make their service worse.
Yeah and it's also great for tools that want to be model-agnostic. I'm building something that is pretty tightly coupled to OpenAI (to function calling, more specifically), but I'd be happy to let users bring their own models if I could do that without too much extra work.
This is cool. As a web developer, I'm wondering if I should use it to build a chat interface that lets a user instantly swap models, maybe even mid-conversation, and use their own API keys. Not sure if something like that already exists.
oobabooga/text-generation-webui already supports this for local models (but doesn't support other hosted models.)
SillyTavern technically supports this, though it's not a central use case, so the UI isn't particularly convenient for mid-conversation changes of models, but it works.
But I don't know of anything where it is a central feature and convenient (and if it was, you could also implement things like parallel requests and choosing which response to keep.)
I'm not sure if something like that already exists, but you could absolutely build it with this! Feel free to jump into our Discord with any questions you have.
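For what it's worth, since each request is stateless and carries the full message history, swapping models mid-conversation is just a matter of passing a different model name (a sketch, assuming the (model, messages) signature from the diff above):

    import os
    from pgml import OpenSourceAI

    client = OpenSourceAI(database_url=os.environ["DATABASE_URL"])

    history = [{"role": "user", "content": "Explain transactions in one sentence."}]

    # Same history, different model per request -- the "swap" is just an argument.
    for model in ["HuggingFaceH4/zephyr-7b-beta", "teknium/OpenHermes-2.5-Mistral-7B"]:
        response = client.chat_completions_create(model, history)
        print(model, "->", response["choices"][0]["message"]["content"])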
When the interface is so simple for these things, it does feel like you should just be able to plug and play with respect to model choice. Excited to see these kinds of projects move forward so we can desilo ai from corporate interests.
are there any plans to support providing functions for local models that support it, e.g. functionary-7b? there is some support for this in the server provided by llama-cpp-python, which also follows the OpenAI schema.
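To illustrate what that looks like today: because the llama-cpp-python server speaks the OpenAI wire format, you can point the stock openai client at it (a sketch; the local URL and model name are whatever the server was launched with, assuming a functionary model that supports tool calls):

    from openai import OpenAI

    # llama-cpp-python's server exposes an OpenAI-compatible /v1 endpoint locally.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

    resp = client.chat.completions.create(
        model="functionary-7b",  # the model the server was started with
        messages=[{"role": "user", "content": "What's the weather in Paris?"}],
        tools=[{
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
    )
    print(resp.choices[0].message)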
I love the concept, but I think you need some more documentation. I tried using this in Colab and got:

    PanicException: Error getting DATABASE_POOLS for writing: PoisonError { .. }

I can't find anything on Google or in your docs about DATABASE_POOLS.
Thanks for checking it out! Sorry you had some issues; I'll look into it.
We are still actively building out our documentation. You can find the current docs here: https://postgresml.org/docs/introduction/machine-learning/sd...
Finally, a team that understands the difference between ML and AI.
Thank you.
Is this just for local inference? We have a server that the team hits via an OpenAI-compatible API but serving local models.
This is an SDK built to interact with PostgresML, which provides ML & AI _inside_ a Postgres database. Clients in this case don't perform inference; rather, the server does. You could run the open source server locally, or connect to one running in the cloud.