This is the LLaVA 34B LLM (local large language model) running on my #MacBook Pro describing a #screenshot from #BBC News. To me, this is as good as the info #BeMyAI would provide using #GPT4, so it goes to show that we can do this on-device and get some really meaningful results. Screenshot attached; #AltText contains the description.
Lately, I've taken to using this to describe images instead of GPT, even if results take a little longer to come back. I consider this quite impressive.
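For anyone curious how to wire this up themselves: Ollama exposes a local REST API, and multimodal models like LLaVA accept base64-encoded images alongside the text prompt. A minimal sketch, assuming a local Ollama server on the default port 11434 and an image file on disk (the prompt text and helper names are just illustrative):

```python
import base64
import json
import urllib.request


def build_describe_request(image_bytes: bytes, model: str = "llava:34b") -> bytes:
    """Build the JSON body for Ollama's /api/generate endpoint.

    Multimodal models such as LLaVA take images as base64 strings
    in the "images" field, alongside a normal text prompt.
    """
    payload = {
        "model": model,
        "prompt": "Describe this image in detail.",
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }
    return json.dumps(payload).encode("utf-8")


def describe_image(path: str) -> str:
    """Send an image to the local Ollama server and return the description."""
    with open(path, "rb") as f:
        body = build_describe_request(f.read())
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

From the terminal, the equivalent is simply `ollama run llava:34b` and including the image's file path in the prompt.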
in reply to Andre Louis

Wow, I wish my hardware was anywhere near that good. I'm lucky enough to run the 7B and 13B models. And this isn't even the *best* LLaVA 34B can do right now! Ollama will be updated so LLaVA can take in an even higher-quality screenshot than it already does!
in reply to Devin Prater :blind:

It's great. I'd say this is better than GPT-3.5 was back when it was the big thing. It's only going to get better from here, as you say, so if this impresses us, where will we be in six months?
in reply to Andre Louis

Hopefully we'll either be running a screen reader powered by one of these, or wearing headsets that can answer questions about what is in front of us. Probably have to wait a year or two on that last one, but I believe it will happen, and I'm excited. The interactive image describers are just such an amazing advance. I still encounter lots of blind people who don't know they exist yet.
in reply to victor tsaran

@vick21 @bryansmart @pixelate So, update: LLaVA 1.6 support in llama.cpp is complete, but it only works in the command-line interface for now. The llama.cpp server still needs to be fixed, and then Ollama needs to upgrade. From my testing, it's much better, and it can process images at four times higher resolution (672×672, 336×1344, 1344×336). Right now Ollama uses the LLaVA 1.6 weights, but the model only processes images at the smaller, older 336×336 resolution.
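Side note on those resolutions: LLaVA 1.6 supports several grid shapes so that tall or wide images (like phone screenshots) aren't squashed into a square. A toy sketch of picking the best-fit grid by aspect ratio, using the candidate list from the post above (the actual selection logic in llama.cpp may differ):

```python
# Candidate grids mentioned above for LLaVA 1.6, as (width, height).
CANDIDATES = [(672, 672), (336, 1344), (1344, 336)]


def pick_grid(width: int, height: int) -> tuple[int, int]:
    """Pick the candidate resolution whose aspect ratio is closest
    to the input image's, minimizing cropping or padding."""
    target = width / height
    return min(CANDIDATES, key=lambda wh: abs(wh[0] / wh[1] - target))
```

So a tall phone screenshot would map to 336×1344, a wide desktop capture to 1344×336, and anything roughly square to 672×672.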
in reply to Chi Kim

@chikim @bryansmart @pixelate Wow, that was amazing. I gave it a picture of a building in New Orleans and asked it to guess where the surroundings came from, and it got very, very close! This is good!
in reply to victor tsaran

@vick21 @bryansmart @pixelate Yeah it is good. LLaVA-34B even outperforms Gemini Pro on some benchmarks. That's just crazy!
in reply to victor tsaran

@vick21 @bryansmart @pixelate Haha, yeah, but it's still crazy that an open-source model can beat a giant model from Google that's only a few months old!