Hey guys,
We built an open-source AI SDK (Python & JavaScript) that provides a drop-in replacement for OpenAI’s chat completion endpoint. We'd love to know what you think so we can make switching as easy as possible and get more folks on open-source.
You can swap in almost any open-source model on Hugging Face: HuggingFaceH4/zephyr-7b-beta, Gryphe/MythoMax-L2-13b, teknium/OpenHermes-2.5-Mistral-7B and more.
If you haven't seen us here before, we're PostgresML, an open-source MLOps platform built on Postgres. We bring ML to the database rather than the other way around. We're incredibly passionate about keeping AI truly open, so we needed a way for our customers to easily escape OpenAI's clutches. Give it a go and let us know if we're missing any models, or what else would help you switch.
You can check out the blog post for the details, but here's the git diff:
    - from openai import OpenAI
    + from pgml import OpenSourceAI

    - client = OpenAI(openai_api_key=openai_api_key)
    + client = OpenSourceAI(database_url=database_url)

      messages = [
          {"role": "system", "content": "You are a helpful assistant"},
          {"role": "user", "content": "What is 1+1?"},
      ]

    - response = client.chat.completions.create(...)
    + response = client.chat_completions_create(...)

    - return response.choices[0].message.content
    + return response["choices"][0]["message"]["content"]
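Put together, a minimal working call looks something like this (a sketch assembled from the diff above, assuming the signature is (model, messages) per our docs; the model name is one of the Hugging Face models listed earlier):

    import os
    from pgml import OpenSourceAI

    # Your PostgresML connection string
    client = OpenSourceAI(database_url=os.environ["DATABASE_URL"])

    messages = [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is 1+1?"},
    ]

    response = client.chat_completions_create("HuggingFaceH4/zephyr-7b-beta", messages)
    print(response["choices"][0]["message"]["content"])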
"We bring ML to the database rather than the other way around."
Can you explain to me what that means? Maybe I've been having this problem recently.
We've been having to build this in-house for louie.ai to be more engineer-grade, as all the libs we evaluated weren't flexible enough for common scenarios. So maybe a useful evaluation laundry list for makers & users here:
* 12-factor: support setting via env vars, config files, api, etc
* decoupling auth config from model config (see the config sketch after this list)
* supporting registration of multiple auths & models, not just one, including via 12-factor
* streamlining the ability of LLM apps to negotiate which LLM models to use
* inferring & validating matchup of what your model provider gives and your app configures & requests, ideally at config or load time, and in a testable way
* transparent native support for each provider, as each provider & api has annoying deviations & useful features that end up being relevant. Ex: even openai vs azure openai has differences like the notion of 'deployments' and around rate limits that should be handled but also exposed
* observability: introspection hooks, including configuration for opentelemetry metrics, telemetry, & logs, including tenant/user dictionaries
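To make the first few bullets concrete, here's a minimal sketch of what we mean by 12-factor config with auth decoupled from model config (all names here are hypothetical, not our actual code):

    import os
    from dataclasses import dataclass

    @dataclass
    class ProviderAuth:
        name: str
        api_key: str
        endpoint: str | None = None

    @dataclass
    class ModelConfig:
        model: str
        auth: str          # references a ProviderAuth by name, not by value
        max_tokens: int = 1024

    def auth_from_env(name: str) -> ProviderAuth:
        """12-factor: resolve credentials from env vars like MYAPP_OPENAI_API_KEY."""
        prefix = f"MYAPP_{name.upper()}"
        return ProviderAuth(
            name=name,
            api_key=os.environ[f"{prefix}_API_KEY"],
            endpoint=os.environ.get(f"{prefix}_ENDPOINT"),
        )

    # Multiple auths and multiple models registered side by side.
    AUTHS = {n: auth_from_env(n) for n in ("openai", "azure_openai")}
    MODELS = [
        ModelConfig(model="gpt-4", auth="openai"),
        ModelConfig(model="gpt-35-turbo", auth="azure_openai"),
    ]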
Without that kind of stuff, a third-party dependency is more annoying than useful for 'serious' implementations, b/c we ended up fighting the library vs using it.
(And we'd be happy to OSS etc if relevant.. such a bear!)
> weren't flexible enough for common scenarios
Does OpenAI support that stuff, or is that a more general MLOps (or whatever you'd call it) list of functionality you need? The API call aspect seems like a pretty easy part that's not a barrier to switching models. It's whatever customizations for a given application that I'd expect to be more work. But re your text I quoted, I'm always wary of "one line of code" solutions that are harder to customize than it would be to just write the thing.
openai and langchain natively support all of that or expose the hooks to do it
all that stuff is what we needed to do to support multiple providers vs hard-coding a single one. none of it is specific to our codebase. if we were going to take on a third-party dependency, we'd need it to be serious about this kind of thing, else we have an external dependency to work around on our critical path
Edit: As an example, if the switching layer doesn't implement model negotiation and doesn't expose key model details (which vary by model provider service, and often require REST calls to introspect), we can't add model negotiation / retries / etc on top, and would have to edit the innards to enable that, which opens up all sorts of questions
What is “model negotiation” in this context?
Ex: An agent pipeline is asked to generate some code, triggers a code agent, and azure openai is hooked up with both primary and fallback service regions + a few deployments of 3.5 turbo, 4, etc. Which model gets used first, second, ...? Why? And for the model abstraction layer, what does it need to expose vs hide?
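A rough sketch of the kind of negotiation/fallback loop we'd want to layer on top (the config shape and call_fn are hypothetical):

    # Hypothetical ordered candidates produced by a negotiation step
    # (primary region first, then fallback region, then a cheaper deployment).
    CANDIDATES = [
        {"provider": "azure-openai", "region": "eastus", "deployment": "gpt-4"},
        {"provider": "azure-openai", "region": "westus", "deployment": "gpt-4"},
        {"provider": "azure-openai", "region": "eastus", "deployment": "gpt-35-turbo"},
    ]

    def negotiate_and_call(prompt, call_fn, candidates=CANDIDATES):
        """Try each candidate in priority order, falling through on rate
        limits or regional outages; call_fn is any provider-specific caller."""
        last_err = None
        for cand in candidates:
            try:
                return call_fn(prompt, **cand)
            except Exception as err:  # real code catches provider-specific errors
                last_err = err
        raise RuntimeError("all model candidates exhausted") from last_err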
very relevant. OSS please!
Another abstraction layer library is: https://github.com/BerriAI/litellm
For me the killer feature of a library like this would be if it implemented function calling. Even if it was for a very restricted grammar - like the traditional ReAct prompt:
Solve a question answering task with interleaving Thought, Action, Observation steps. Thought can reason about the current situation, and Action can be three types:
(1) Search[entity], which searches the exact entity on Wikipedia and returns the first paragraph if it exists. If not, it will return some similar entities to search.
(2) Lookup[keyword], which returns the next sentence containing keyword in the current passage.
(3) Finish[answer], which returns the answer and finishes the task.
After each observation, provide the next Thought and next Action. Here are some examples:
One-parameter functions would be enough for a start (but maybe something that would work with multi-line strings - this line-oriented grammar is kind of restricted).

The problem is that most non-OpenAI models haven't actually been fine-tuned with function calling in mind, and getting a model to output function-calling-like syntax without having been trained on it is quite unreliable. There are a few alternatives that have been (OpenHermes 2.5 has some function calling in its dataset and does a decent job with it, and the latest Claude does as well), but for now it just doesn't work great.
That said, it's not that hard to fine-tune a model to understand function calling -- we do that as part of all of our OpenPipe fine-tunes, and you can see the serialization method we use here: https://github.com/OpenPipe/OpenPipe/blob/main/app/src/model...
It isn't particularly difficult, and I'd expect more general-purpose fine-tunes will start doing something similar as they get more mature!
Check out this recently released function calling model; apparently it beats GPT-4 by 7%: https://huggingface.co/Nexusflow/NexusRaven-V2-13B
ReAct was used before OpenAI introduced functions. It might be less reliable - but you can add error checking and retry.
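For example, the error checking and retry can be as simple as a regex over the Action line (a sketch; `generate` stands in for any text-completion callable):

    import re

    # Matches the three Action types from the ReAct prompt above.
    ACTION_RE = re.compile(r"^Action: (Search|Lookup|Finish)\[(.+)\]$", re.MULTILINE)

    def parse_action(completion: str):
        """Return the (name, argument) pair from a ReAct-style completion, or None."""
        m = ACTION_RE.search(completion)
        return (m.group(1), m.group(2)) if m else None

    def act_with_retry(generate, prompt, max_retries=3):
        for _ in range(max_retries):
            action = parse_action(generate(prompt))
            if action is not None:
                return action
            # Feed the failure back to the model and retry.
            prompt += "\nYour last Action was malformed. Reply with Action: Name[argument]."
        raise ValueError("model never produced a well-formed Action")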
LocalAI can do that: https://github.com/mudler/LocalAI
One challenge is that specific APIs are already being made that aren’t compatible between models. Right now, I’m referring to function calling and tools. I suspect that these are likely to create lock in as vendors move away from general completion APIs.
Sure, you need a local implementation of function calling, etc., to implement those APIs, but once you do, there is no reason that you can't have a model-agnostic API implementation.
Obviously, if there are things that depend on specific model capabilities (e.g., image interpretation or other multimodal use), you'll need a model that supports that, but there are multimodal open models. You'll need a way to recognize and error on model/capability combos that are invalid though.
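A sketch of what that local, model-agnostic shim might look like: render an OpenAI-style `tools` spec into plain prompt text for a model without native function calling (names hypothetical):

    import json

    def tools_to_prompt(tools: list[dict]) -> str:
        """Describe an OpenAI-style tools spec in plain text so any
        completion model can emit a parseable 'call'."""
        lines = [
            "You can call these functions by replying with JSON",
            'shaped like {"name": ..., "arguments": {...}}:',
        ]
        for t in tools:
            fn = t["function"]
            lines.append(f"- {fn['name']}: {fn.get('description', '')}")
            lines.append(f"  parameters: {json.dumps(fn['parameters'])}")
        return "\n".join(lines)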
It’s not that you can’t add it, but that it seems the reason openai added “tools” after function calling is to make it generic for more unique/proprietary features.
Unless everybody is going to follow OpenAI's lead for every feature they implement, general compatibility is going to be challenging. Basically, the only reason this is possible now is that there isn't yet a lot of diversity in the way models are being used.
Are any OSS models producing JSON with specific keys as reliably as OpenAI models do with function calling? That would be needed to implement function calling.
> Are any OSS models producing JSON with specific keys as reliably as OpenAI models do with function calling?
Yes. From my understanding, this is true in theory of any model over which you have fine-grained control of inference (which you have whenever the inference code isn't a black box): generation can be constrained to follow a grammar of arbitrary specificity. This has been generally implemented in the frameworks for running OSS models.
Cool, that makes sense. Are there any now doing this off-the-shelf where a schema can be passed in and the response will be valid JSON that matches the schema?
Tooling for this was announced on HN 3 months ago, it now does a lot more: https://news.ycombinator.com/item?id=37125118
Here is a plugin for one of the popular tools for running multiple local models (a tool that presents an OpenAI-style API for downstream consumption, as well as its own web UI for direct interaction with the models): https://github.com/hallucinate-games/oobabooga-jsonformer-pl...
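For a concrete feel of grammar-constrained sampling, llama-cpp-python (mentioned elsewhere in this thread) accepts GBNF grammars directly; a minimal sketch, assuming a local GGUF model file:

    from llama_cpp import Llama, LlamaGrammar

    llm = Llama(model_path="./zephyr-7b-beta.Q4_K_M.gguf")  # hypothetical local file

    # GBNF grammar forcing a JSON object with exactly the keys "name" and "args".
    grammar = LlamaGrammar.from_string(r'''
    root   ::= "{" ws "\"name\"" ws ":" ws string "," ws "\"args\"" ws ":" ws string ws "}"
    string ::= "\"" [^"]* "\""
    ws     ::= [ \t\n]*
    ''')

    out = llm("Emit a function call to look up the weather in Paris.\n",
              grammar=grammar, max_tokens=128)
    print(out["choices"][0]["text"])  # guaranteed to match the JSON shape above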
Tooling that makes it possible to switch away from OpenAI is good even if the switch never ends up happening, because it puts pressure on them not to make their service worse.
Yeah and it's also great for tools that want to be model-agnostic. I'm building something that is pretty tightly coupled to OpenAI (to function calling, more specifically), but I'd be happy to let users bring their own models if I could do that without too much extra work.
This is cool. As a web developer, I'm wondering if I should use it to build a chat interface that lets a user instantly swap models, maybe even mid-conversation, and use their own API keys. Not sure if something like that already exists.
oobabooga/text-generation-webui already supports this for local models (but doesn't support other hosted models.)
SillyTavern technically supports this, though it's not a central use case, so the UI isn't particularly convenient for mid-conversation changes of models, but it works.
But I don't know of anything where it is a central feature and convenient (and if it was, you could also implement things like parallel requests and choosing which response to keep.)
I'm not sure if something like that already exists, but you could absolutely build it with this! Feel free to jump into our Discord with any questions you have.
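For what it's worth, since each request is stateless and carries the full message history, swapping models mid-conversation is just a matter of passing a different model name (a sketch, assuming the (model, messages) signature from the diff above):

    import os
    from pgml import OpenSourceAI

    client = OpenSourceAI(database_url=os.environ["DATABASE_URL"])

    history = [{"role": "user", "content": "Explain transactions in one sentence."}]

    # Same history, different model per request -- the "swap" is just an argument.
    for model in ["HuggingFaceH4/zephyr-7b-beta", "teknium/OpenHermes-2.5-Mistral-7B"]:
        response = client.chat_completions_create(model, history)
        print(model, "->", response["choices"][0]["message"]["content"])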
When the interface is so simple for these things, it does feel like you should just be able to plug and play with respect to model choice. Excited to see these kinds of projects move forward so we can desilo ai from corporate interests.
are there any plans to support providing functions for local models that support it, e.g. functionary-7b? there is some support for this in the server provided by llama-cpp-python, which also follows the OpenAI schema.
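To illustrate what that looks like today: because the llama-cpp-python server speaks the OpenAI wire format, you can point the stock openai client at it (a sketch; the local URL and model name are whatever the server was launched with, assuming a functionary model that supports tool calls):

    from openai import OpenAI

    # llama-cpp-python's server exposes an OpenAI-compatible /v1 endpoint locally.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

    resp = client.chat.completions.create(
        model="functionary-7b",  # the model the server was started with
        messages=[{"role": "user", "content": "What's the weather in Paris?"}],
        tools=[{
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
    )
    print(resp.choices[0].message)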
I love the concept, but I think you need some more documentation. I tried using this in Colab and got:

    PanicException: Error getting DATABASE_POOLS for writing: PoisonError { .. }

I can't find anything on Google or in your docs about DATABASE_POOLS.
Thanks for checking it out! Sorry you had some issues; I'll look into it.
We are still actively building out our documentation. You can find the current docs here: https://postgresml.org/docs/introduction/machine-learning/sd...
Finally, a team that understands the difference between ML and AI.
Thank you.
Is this just for local inference? We have a server that the team hits via an OpenAI-compatible API but serving local models.
This is an SDK built to interact with PostgresML, which provides ML & AI _inside_ a Postgres database. Clients in this case don't perform inference; rather, the server does. You could run the open source server locally, or connect to one running in the cloud.