Downloading as we speak… I saw the announcement about Llava 1.6 but didn’t realize Ollama had a 34b model ready for us. To be super-honest with you though, GPT-4V will still be faster and more responsive no matter how you slice it. :)
OpenAI is everyone's target. Llava 1.6 was even trained on a synthetic dataset generated with GPT-4V. Not sure about accuracy, but it talks very much like GPT-4V now, with a similar tone. lol
Sorry, I was responding to your post about Llama3. As for Llava:34b, I take back what I said: I was trying to load it from the console. It loads fairly fast if I do it from Python or the Ollama UI extension, though.
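For reference, here's roughly what the two paths look like: the console path and the API path that Python clients use under the hood. This is a sketch, not the exact commands from the thread; the live calls are commented out because they need the Ollama server running and the model downloaded, and the image path is a placeholder.

```shell
# Console path (needs the Ollama server running):
# ollama pull llava:34b
# ollama run llava:34b "Describe this image: ./photo.png"

# API path: scripts talk to Ollama's REST endpoint, and multimodal
# models like llava take images as base64 strings. This only builds
# the JSON payload locally; the actual POST is commented out.
IMG_B64=$(printf 'fake-image-bytes' | base64)
PAYLOAD=$(printf '{"model":"llava:34b","prompt":"What is in this picture?","images":["%s"],"stream":false}' "$IMG_B64")
echo "$PAYLOAD"
# curl http://localhost:11434/api/generate -d "$PAYLOAD"
```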
Talking about big models, have you tried miqu-1-70b? Mistral's CEO confirmed someone leaked their testing model on 4chan. It's one of the top models on the HF leaderboard now. People say the quality is as good as or better than GPT-3.5, close to GPT-4. However, it's pretty painful to wait for responses. haha The model is still up on HF, and you can use it with Llama.cpp or create a Modelfile to use it with Ollama! https://huggingface.co/miqudev/miqu-1-70b
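The Modelfile route mentioned above looks roughly like this. It's a minimal sketch: the GGUF filename is an assumption (check the repo's file list for the actual quantization names), and the download/create/run commands are commented out since they need the ~25+ GB file and a running Ollama server.

```shell
# Grab a quantized GGUF from the repo (filename assumed; verify on HF):
# curl -LO https://huggingface.co/miqudev/miqu-1-70b/resolve/main/miqu-1-70b.q4_k_m.gguf

# Point a minimal Modelfile at the local GGUF:
cat > Modelfile <<'EOF'
FROM ./miqu-1-70b.q4_k_m.gguf
EOF

# Register it with Ollama and chat:
# ollama create miqu -f Modelfile
# ollama run miqu
```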
Yeah, OpenAI is running with a target on their back, so everyone wants to knock them down. lol Honestly though, they're getting close, but not quite yet. On a related note, it's crazy how a couple of university folks beat Gemini Pro with Llava v1.6. Training their 34B model only took 30 hours on 32 A100s.
Chi, I keep forgetting to ask you. I remember that in past versions, when scanning the VO cursor, I was able to get the OCR of the whole screen, even the elements outside of the current window. With VOCR 2 I mostly only get the stuff inside the current window, regardless of whether I OCR the VO cursor or the window. Is this intended or a bug?
You should only see the ones inside the VO cursor. However, sometimes the VO cursor can focus on elements that aren't visible and return coordinates outside the window, and then VOCR will grab stuff outside the window. Basically, it takes a screenshot of whatever VO says the VO cursor coordinates are.
Can you try going to System Settings, choosing General from the sidebar, moving your VO cursor to the General scroll area (don't interact with it), and scanning the VO cursor? You should only see what's inside the General scroll area, whereas if you scan the window, you'll see stuff in the sidebar as well. Let me know if that's not how it works on your end.
Yeah, in the previous version I used the VO AppleScript command and asked VO to capture a screenshot under the VO cursor and save it to a file. That sometimes worked and sometimes didn't. VOCR v2 now asks VO to report the VO cursor bounds, and VOCR captures the screenshot itself instead. I couldn't do it in JXA, so I had to figure it out in regular AppleScript. Fortunately, ChatGPT wrote it for me. haha
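To illustrate the capture half of that flow: once the AppleScript side has reported the VO cursor bounds, a rectangle capture is straightforward with macOS's built-in `screencapture` tool. This is a sketch of the idea, not VOCR's actual code; the bounds below are placeholder values, not real VoiceOver output, and the capture command itself is commented out since it only works on macOS.

```shell
# Placeholder bounds standing in for what VO reports for the VO cursor:
X=100; Y=200; W=640; H=480
RECT="$X,$Y,$W,$H"
echo "capturing rect $RECT"

# -x suppresses the shutter sound, -R captures just the given rectangle:
# screencapture -x -R "$RECT" /tmp/vocursor.png
```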