In search of a new self-hosted LLM

Tanka@lemmy.ml · 22 days ago

Jozzo@lemmy.world · 22 days ago

I find Qwen3.5 is the best at tool calling and agent use; otherwise Gemma4 is a very solid all-rounder and should be the first one you try. Tbh, gpt-oss is still good to this day. Are you running into any problems with it?

Tanka@lemmy.ml · 22 days ago

No problems per se. I just realized that I had not checked for updates in a long time.

jacksilver@lemmy.world · 21 days ago

You’re probably aware, but updating the model periodically is a good idea simply because things change over time.

A model from two years ago was trained on data that is at least two years old, meaning any changes in technology, code, or world events since then wouldn’t be reflected in the model.

ejs@piefed.social · 22 days ago

I suggest looking at the LLM arena leaderboards, filtered by open-weight models. They offer thorough, statistically detailed benchmarks and are usually quite up to date when new models come out. The new Gemma that just came out might be the best for a single GPU, and if you have a bunch of VRAM, check out the larger Chinese models.

Gumus@lemmy.dbzer0.com · 22 days ago

I’d say Qwen 3.5 and Gemma 4 beat GPT OSS in every aspect.

iceberg314@slrpnk.net · 22 days ago

I also recommend gemma4 or qwen3.5. Both are super solid in my experience for how lightweight they are.

NoFun4You@lemmy.world · 21 days ago

Still can’t get my gemma to give me complete unbuggy components

iceberg314@slrpnk.net · 19 days ago

I guess I have been using gemma4 more for role-playing games. Qwen3.5 seems to be the better coder, actually.

tal@lemmy.today · 22 days ago

I’m not on there, but you might have more luck in !localllama@sh.itjust.works

You might also want to list the hardware that you plan to use, since that’ll constrain what you can reasonably run.
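For a rough idea of why the hardware matters, here is a back-of-the-envelope sketch of how much memory the weights alone need at a given quantization. The parameter counts in the usage lines and the `overhead_gib` allowance (for context cache and runtime buffers) are made-up illustrative values, not specs of any model mentioned in this thread.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


def fits(num_params: float, bits_per_weight: float,
         available_gib: float, overhead_gib: float = 1.5) -> bool:
    """Rough check: weights plus an assumed fixed overhead must fit in memory.

    overhead_gib is an assumption; real overhead grows with context length.
    """
    needed = weight_memory_gib(num_params, bits_per_weight) + overhead_gib
    return needed <= available_gib


# Hypothetical models: 8 billion parameters at 8 bits, 30 billion at 4 bits.
print(round(weight_memory_gib(8e9, 8), 2), fits(8e9, 8, available_gib=12))
print(round(weight_memory_gib(30e9, 4), 2), fits(30e9, 4, available_gib=12))
```

Treat the result as a lower bound: actual usage depends on the runtime, the quantization format, and how much context you keep loaded.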

cron@feddit.org · 22 days ago

The latest open-weights model from Google might be a good fit for you. The 26B model works pretty well on my machine, though the performance isn’t great (6 tokens per second, CPU only).

zorflieg@lemmy.world · 21 days ago

Gemma4 e4b at 8-bit quantization will fit in 12 GB and is good.

jaschen306@sh.itjust.works · 22 days ago

I’m running gemma4 26b MoE for most of my agent calls. I use glm5:cloud for my development agent because 26b struggles when the context window gets too big.

Matt@lemmy.ml · 21 days ago

Qwen is pretty good. Also try LFM models.

carzian@lemmy.ml · 22 days ago

I’m in the same boat. You’ll get better responses if you post your machine specs.

sompreno@lemmy.zip · 22 days ago

What are your computer specs?

Tanka@lemmy.ml · 22 days ago

I did just update my post with the specs. Maybe it takes a while to federate?

sompreno@lemmy.zip · 22 days ago

I must not have refreshed; ignore my comment.

Evotech@lemmy.world · 21 days ago

I’d use some Chinese model. Qwen3.5 Claude 4.6 distilled abliterated is what I use.

James R Kirk@startrek.website · 22 days ago

Just curious, what does “some automation” entail? I thought LLMs could only work with text, like summarize documents and that sort of thing.

Jozzo@lemmy.world · 21 days ago

It’s done by software using an LLM, not just a raw LLM. LLMs do only work with text, but you can get one to output the text “get_weather(mylocation)”, and instead of showing that directly to the user, the software running on top of the LLM runs a “get_weather” function that calls some weather API. The result of that function is then output to the user.

Any time you see an “AI” taking “actions”, this is what happens in the background for every action.
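To make that concrete, here is a minimal sketch of the wrapper side in Python. Everything in it is hypothetical and for illustration only: the `get_weather` stub returns a canned string instead of calling a real weather API, and the plain `name(argument)` text format is an assumption; real frameworks typically use structured (often JSON) tool calls instead.

```python
import re


def get_weather(location: str) -> str:
    """Hypothetical tool; a real one would call a weather API here."""
    return f"Sunny in {location}"


# Registry of functions the model is allowed to trigger.
TOOLS = {"get_weather": get_weather}

# Matches model output that consists only of a call such as: get_weather(Berlin)
CALL_PATTERN = re.compile(r"^\s*(\w+)\((.*)\)\s*$")


def handle_model_output(text: str) -> str:
    """Run a tool if the model asked for one; otherwise pass the text through."""
    match = CALL_PATTERN.match(text)
    if match is None:
        return text  # ordinary reply, show it to the user as-is
    name, arg = match.groups()
    tool = TOOLS.get(name)
    if tool is None:
        return text  # unknown function name, treat it as plain text
    # Strip whitespace and optional quotes around the single argument.
    return tool(arg.strip().strip("\"'"))


print(handle_model_output("get_weather(Berlin)"))
print(handle_model_output("It looks like rain today."))
```

In practice the tool result is often fed back to the model so it can phrase the final answer, rather than being shown to the user directly.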

James R Kirk@startrek.website · 22 days ago

That’s cool, it just… does those things? How does it connect to those apps? I can’t even get Gemini to set a reminder and that’s on a Google device.

SuspciousCarrot78@lemmy.world · 2 hours ago

deleted by creator

James R Kirk@startrek.website · 21 days ago

That was actually super helpful, thank you.

a1studmuffin@aussie.zone · 22 days ago

These days they can also chain together tools, keep a working memory, etc. Look at Claude Code if you’re curious. It’s come very far very quickly in the last 12 months.

James R Kirk@startrek.website · 22 days ago

OP said coding AND “some automation”, what is being automated?

nutbutter@discuss.tchncs.de · 22 days ago

Have you tried the new gemma4 models? The e4b fits in 12 GB of memory and is pretty good. Or you can use the 31b too, if you’re okay with offloading to CPU.