It is great to see that they even trained it on open datasets. More AI models need to do this, especially if they market themselves as open.
It appears that they reused a lot of the data preparation provided by the AllenAI team:
https://github.com/allenai/OLMoE
Why? A great deal of the material important for knowledge and for serious intellectual exercise is not "open" (it may be "accessible", yet still under copyright).
It’s difficult to argue that a model is truly “open” if the creator won’t even tell you what they trained it on. Even as companies like Meta argue that training on copyrighted material is OK, they still don’t want to openly admit that that’s what they did - and providing their training data, which would likely be a giant list of books from LibGen, would give the game away.
Yeah, I'm not fundamentally opposed to them training on copyrighted material, though I expect at some point to be thoroughly frustrated while trying to access such a dataset.
I do take personal affront when they dump models without specifying their datasets. They're just polluting the information space at that point.
This is a good place to interject a point about training on copyrighted material that people overlook, conflate, or don't realize.
There are two types of "training on copyrighted material":
1.) Training on material that is copyrighted and behind a paywall, but you circumvent the paywall. This is unambiguously illegal, as the material is pay-to-view.
2.) Training on material that is copyrighted, but free for anyone to consume. This is ambiguous in legality, but right now seems to be leaning in favor of AI training - as long as the models don't share the material verbatim.
There is also another point about copyrighted material being ad-supported, and obviously the AI doesn't view/care about ads. There is a decent case to be made that this is illegal, but then is ad blocking actually theft?
> It’s difficult to argue that a model is truly “open” if the creator won’t even tell you what they trained it on
But the original poster said they were glad «they even trained it on open datasets», not that "they told you what they trained it on".
I read "open dataset" as "training data URLs are in a CSV somewhere," but I see how you could've read it differently.
They stated it in the original LLaMA paper.
Right, the original LLaMA paper. LLaMA 2 and 3 are significantly more capable models trained on orders of magnitude more data, and those papers notably do not say where the data comes from. The LLaMA 3 paper helpfully mentions that "Much of the data we utilize is obtained from the web," so I guess that's better than nothing!
LLaMA 2: https://arxiv.org/pdf/2307.09288
LLaMA 3: https://arxiv.org/pdf/2407.21783
The 2/27/23 paper? Where?
That's not AMD's fault. They are abiding by the law. If IP holders licensed or shared their data, AMD could train on it.
It's important to note that AMD is an IP company, Meta is a data company. AMD would be shooting itself in the foot if it normalized flagrantly violating the IP of others. Meta doesn't care about IP, they just want to sell ads and data.
You must have misunderstood the post: the OP wrote «great to see that they even trained it on open datasets», to which I replied "why should that be «great»: machines with the sought-after "intellectual" abilities must be trained on a corpus larger than just the open one".
The point is not about any «AMD's fault». It is about "why would it be great to have LLMs trained on limited (open) data".
This is important for AMD presumably because it demonstrates that ML can be practically done on their hardware.
Most likely part of a strategy to dislodge Nvidia as the leading AI chip supplier, and AMD is in a position to try.
How well will it work? I don't know enough details about these companies to tell.
One potential outcome, not one I politically support but one I expect, is that regulatory interests in the US are captured by AI tech companies and this becomes an import/export issue, similar to the "encryption munitions" restrictions we have seen before.
It could work about as well as Svelte or Angular worked at convincing React developers to switch away. It doesn't really work. It's kind of too late; a whole generation was trained on it.
Pre-training joke for ya.
Interesting to see AMD entering the small LLM space where practical compute constraints actually matter. These 3B models represent the pragmatic side of AI - not everything needs to be a 100B+ parameter behemoth burning through datacenter power.
The real test will be inference latency and throughput on consumer hardware, not just the cherry-picked benchmark graphs they've shared. Anyone run comparative evals against Llama 3.2 3B or Gemma-2 on identical hardware yet?
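For anyone who wants to run that comparison, here is a minimal timing harness of the kind I mean - a sketch, not a rigorous eval. The Hugging Face model ids are assumptions (and the Llama weights are gated), so substitute whatever you can actually download:

    # Rough single-machine latency/throughput check for two small models.
    # Model ids below are assumptions; adjust to what you have access to.
    import time
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    PROMPT = "Explain the difference between latency and throughput."

    def bench(model_id, max_new_tokens=128):
        tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
        model = AutoModelForCausalLM.from_pretrained(
            model_id, torch_dtype=torch.bfloat16, device_map="auto",
            trust_remote_code=True,
        )
        inputs = tok(PROMPT, return_tensors="pt").to(model.device)
        start = time.perf_counter()
        out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
        elapsed = time.perf_counter() - start
        new_tokens = out.shape[-1] - inputs["input_ids"].shape[-1]
        print(f"{model_id}: {new_tokens / elapsed:.1f} tok/s ({elapsed:.2f}s total)")

    for model_id in ["amd/Instella-3B-Instruct", "meta-llama/Llama-3.2-3B-Instruct"]:
        bench(model_id)

Greedy decoding and a fixed prompt keep the two runs comparable; a real eval would also sweep prompt lengths and batch sizes.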
The fully open approach (weights, hyperparams, training code) is refreshing compared to the "open weights only" trend we've been seeing. This is how you actually build a community around your tech stack.
Edge deployment is where this gets interesting - having truly open small models running locally on laptops/phones/embedded without phoning home feels like the computing paradigm we should have been pushing for all along instead of the current API-gated centralization.
"Fully Open"
> The Instella-3B models are licensed for academic and research purposes under a ResearchRAIL license.
Huge mistake.
This would have been an amazing PR win for AMD if they just gave it away.
Open models attract ecosystems. It'd be a fantastic sales channel for their desktop GPU hardware if they can also build increasing support for ML with their cards and drivers.
I suspect the release was for marketing reasons and not for winning developer goodwill.
Developer goodwill can be very powerful marketing.
Not to mention it’s a 3B model.
Look what happened to NVIDIA because they were able to win the goodwill of developers. Lots of companies use PyTorch with NVIDIA.
The license, for anybody else wondering what on earth a "ResearchRAIL" is: https://raw.githubusercontent.com/AMD-AIG-AIMA/Instella/refs...
It comes off a bit... dubious. The prohibited uses boil down to "don't use Instella to be a jerk or make porn" which I expect many people will do anyways and simply not disclose their use of Instella (which, of course, is prohibited).
This is the first RAIL license I've read, but if this is the standard, this reads less like a license and more like an unenforceable list of requests.
Edit: if I were to make an analogy, this license is a bit like if curl came with a clause that said "no web scraping or porn".
What can we do with 3B LLM models? I haven't tried one in a couple of months, but when I did the results weren't great.
I really want to combine them with RAG, plus do some agentic stuff on my home server, which runs a number of services such as Home Assistant. So I want it to read all the sensors every hour, and then message me if anything occurs out of the ordinary (with "ordinary" defined by the historical data provided by the RAG).
No idea if this will work or not, but it's how I learn to use new technologies.
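For the polling half, here is a minimal sketch of what I have in mind, using Home Assistant's REST API. The URL, token, and notify service name are assumptions, and is_anomalous() is a placeholder for the RAG lookup plus the 3B model's judgment:

    # Hourly sensor sweep against a local Home Assistant instance.
    # HA_URL, HA_TOKEN and the notify service are assumptions; is_anomalous()
    # stands in for the RAG retrieval + LLM comparison against history.
    import time
    import requests

    HA_URL = "http://homeassistant.local:8123"
    HA_TOKEN = "your-long-lived-access-token"
    HEADERS = {"Authorization": f"Bearer {HA_TOKEN}"}

    def is_anomalous(entity_id, state):
        # Placeholder: embed the reading, retrieve historical readings from
        # the vector store, and ask the local 3B model whether it's unusual.
        return False

    def sweep():
        states = requests.get(f"{HA_URL}/api/states", headers=HEADERS).json()
        for s in states:
            if s["entity_id"].startswith("sensor.") and is_anomalous(s["entity_id"], s["state"]):
                requests.post(
                    f"{HA_URL}/api/services/notify/mobile_app_phone",
                    headers=HEADERS,
                    json={"message": f"Unusual reading: {s['entity_id']} = {s['state']}"},
                )

    while True:
        sweep()
        time.sleep(3600)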
You could of course build this using traditional software development.
On a separate note, it's sad to see Phoronix so poorly presented. More than half my screen is taken up by fixed on-page ads and popovers. This is 1999 levels of dumb.
I've disabled UBO to see what you're talking about and... Wow. Browsing without ad blockers is just a deal breaker for me now, on a workstation or mobile.
(I block ads on Android using Firefox/UBO, and on the iPad using Brave).
I use Brave, but on mobile I enjoy the Harmony app and it has a simplistic ad blocker that probably can't work very well because Google.
Curious. I opened it in chromium to see what it looks like without the adblocker, I declined all the tracking it asked for (violating the GDPR by the way), and I see no ads of any sort.
Still way easier to have an adblock/consent-o-matic disable it for you, but it seems like it only shows you ads if you actually agreed to them.
I must have agreed to them in the past; I don't use Consent-O-Matic and manually accept/decline in a chaotic, ad-hoc manner. Just installed it, thanks.
Meh. Custom architecture with a non-commercial, research-only license. Qwen-2.5 3B also has a research-only license, but is way ahead of this model on almost all of the benchmarks.
Why is not being able to sell what others made so important?
In my eyes, having a completely novel and reproducible model from end to end, including its dataset, is great news.
Because they are heavily abusing the definition of “open source”. Though that ship already sailed when it was decided that a model is “open source” when you get the final weights but not the exact training scripts and data.
> Though that ship has also sailed when it was decided that a model is “open source” ...
So we can't boo AMD just because they did something better in some cases and worse in others?
They released a model from end to end and showed that they can compete now. Who cares about the business applications? Those can come later.
The other actors abuse the definition of Open Source too, and not only in AI. So we should denounce the others with the same force, but we don't, because of the broken windows theory.
AMD is already an underdog, so whatever they do has no merit and is worthy of booing. Do the masses boo NVIDIA for their abuse of the ecosystem? Did Intel get the same treatment when they were choking everyone else unethically?
Of course not, because they are/were the incumbents. They had no broken windows.
I’m not responsible for what others denounce other companies for doing. It has nothing to do with me.
I don’t care what the masses are doing.
When other companies abuse the phrase “open source” they should be called out on it.
And I’m complaining about AMD’s use of it here. Just because they’re an underdog doesn’t mean they get a pass.
I didn't say AMD gets a pass here. What I said is "if the ship has sailed, we have no right to single out AMD at this point, and if we're going to change this, we need to target everyone at once".
But they trained on open datasets…
They did, unlike most others.
But at the end of the day, their released model still has restrictions on the use cases you can apply it to. If this were a piece of source code rather than an AI model, it would not count as open source. And just because it's an AI model, I do not think it should be held to a different standard.
Not GP, but in my opinion the reason why the license restriction is so important is that otherwise very few people will be able to try it. Hugging Face or other commercial providers cannot put it up online (even if they didn't charge for its use, these entities might benefit from the publicity, so they might need to negotiate with AMD). I am not sure this model will make it to the lmsys leaderboard either, unless AMD helps provide an API endpoint and allows them to use it. If you install it in an HPC center for non-industry researchers, you have to trust the new AMD code (few people will do a serious infosec analysis on it), and you have to make sure you exclude people who might have commercial interests (say, a student with a startup). It is not only the license that is slowing things down, but if the license were more general the code would develop more smoothly, and things like vLLM might start to support it real soon.
The license doesn't restrict hosting it (and explicitly allows third-party access), but might require a license wall to click through. I'm not sure where you inferred that from.
I'm not an AMD employee, so I can't tell about their API access.
People here see student startups; I see tons of non-commercial research networks from where I sit. So the license is not absurd from where I stand.
Maybe strict compliance is not as big a threat for academic institutions. A wall of click-through text does not technically limit liability against good lawyers on the other side, so you might need better measures to ensure compliance in a setting where some users might have commercial interests.
To be clear, I am happy AMD is trying to get in the game. I just don't feel this model will gain as much use as it would have with a permissive license, and I hope that AMD will switch to better licenses (as Qwen and Llama models have also done in the past).
I'm trying to be flippant, but I'm genuinely curious. Have you read their license[1]? The terms are really broad and onerous even if one wants to use it for purely non-commercial, academic purposes.
[1]: https://huggingface.co/amd/Instella-3B-Instruct/raw/main/LIC...
> Have you read their license[1]?
Yes.
> The terms are really broad and onerous even if one wants to use it for purely non-commercial, academic purposes.
No. It's just legalese to prevent commercial use and abuse of the model, and to limit liability. As a person who sits in an academic research center, I see no problems at first blush.
It's just the RAIL Research-use licenses. Both the Research M and Research S license.
https://www.licenses.ai/blog/2023/3/3/ai-pubs-rail-licenses
https://www.licenses.ai/rail-license-generator
The intent of the license is to show the techniques used in the project and to provide the results in a form that can be used to further the development of other projects, but not be used directly themselves.
The TLDR of the license is "here's the model and the sources; use them to help make your own models, but don't use our model directly".
> It's just the RAIL Research-use licenses.
No it isn't. It's a ResearchRAIL-MS derived license with lots of additional restrictions piled on.
> The TLDR of the license is "here's the model and the sources. use it to help make your own models but don't use our model directly"
ICYMI, it's a ResearchRAIL-MS derived license and not a ResearchRAIL-M derived license. All the onerous restrictions apply to the model as well as the source code.
ResearchRAIL comes in a handful of flavors. ResearchRAIL-M (model) and ResearchRAIL-S (source) are two of the flavors. If you combine them you get ResearchRAIL-MS (model+source).
You can make an MS license in their license generator by selecting ResearchRAIL and checking both the model and source buttons.
I understand their point in doing this - they want to use it as a sales enabler: they want you to buy GPUs to train their model for yourself, maybe tweak it with some non-free content, and then be able to use it commercially.
The entitlement is big and burly with this one.
I'm curious how it stacks up against the phi4-mini.
It's very far behind phi4-mini in performance, although phi4-mini is slightly larger (3.8B parameters vs 3B).
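If you would rather measure than trust the model cards, here is a quick sketch using lm-evaluation-harness; the model ids and task picks are my guesses, so adjust to whatever you actually want to compare:

    # Hedged sketch: run both models through lm-evaluation-harness on two
    # cheap benchmarks. The HF model ids below are assumptions; verify the
    # actual repo names before running.
    import lm_eval

    for pretrained in ["amd/Instella-3B-Instruct",
                       "microsoft/Phi-4-mini-instruct"]:
        results = lm_eval.simple_evaluate(
            model="hf",
            model_args=f"pretrained={pretrained},trust_remote_code=True",
            tasks=["arc_easy", "hellaswag"],
            batch_size=8,
        )
        print(pretrained, results["results"])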
Sounds like hyperbole, but the massive amount of competition in this space is what saves humanity, by making AI available to everyone and not just oligarchs. It's hard to remember the last time there was such a flood of competition in a deeply productive space.