Downloading as we speak… I saw the announcement about Llava 1.6 but didn’t realize Ollama had a 34b model ready for us. To be super-honest with you though, GPT-4V will still be faster and more responsive no matter how you slice it. :)
OpenAI is everyone's target. Llava 1.6 was even trained on a synthetic dataset generated with GPT-4V. Not sure about accuracy, but it talks very much like GPT-4V now, with a similar tone. lol
Sorry, I was responding to your post about Llama3. As for Llava:34b, I take back what I said: I was trying to load it from the console. It loads fairly fast if I do it from Python or the Ollama UI extension, though.
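For reference, here's roughly what the two paths look like: the console path and the API path that Python clients use under the hood. This is a sketch, not the exact commands from the thread; the live calls are commented out because they need the Ollama server running and the model downloaded, and the image path is a placeholder.

```shell
# Console path (needs the Ollama server running):
# ollama pull llava:34b
# ollama run llava:34b "Describe this image: ./photo.png"

# API path: scripts talk to Ollama's REST endpoint, and multimodal
# models like llava take images as base64 strings. This only builds
# the JSON payload locally; the actual POST is commented out.
IMG_B64=$(printf 'fake-image-bytes' | base64)
PAYLOAD=$(printf '{"model":"llava:34b","prompt":"What is in this picture?","images":["%s"],"stream":false}' "$IMG_B64")
echo "$PAYLOAD"
# curl http://localhost:11434/api/generate -d "$PAYLOAD"
```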
Talking about big models, have you tried miqu-1-70b? Mistral's CEO confirmed someone leaked their testing model on 4chan. It's one of the top models on the HF leaderboard now. People say the quality is as good as or better than GPT-3.5, close to GPT-4. However, it's pretty painful to wait for responses. haha The model is still up on HF, and you can use it with Llama.cpp or create a Modelfile to use it with Ollama! https://huggingface.co/miqudev/miqu-1-70b
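The Modelfile route mentioned above looks roughly like this. It's a minimal sketch: the GGUF filename is an assumption (check the repo's file list for the actual quantization names), and the download/create/run commands are commented out since they need the ~25+ GB file and a running Ollama server.

```shell
# Grab a quantized GGUF from the repo (filename assumed; verify on HF):
# curl -LO https://huggingface.co/miqudev/miqu-1-70b/resolve/main/miqu-1-70b.q4_k_m.gguf

# Point a minimal Modelfile at the local GGUF:
cat > Modelfile <<'EOF'
FROM ./miqu-1-70b.q4_k_m.gguf
EOF

# Register it with Ollama and chat:
# ollama create miqu -f Modelfile
# ollama run miqu
```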
Yeah, OpenAI is running with a target on their back, so everyone wants to knock them down. lol Honestly though, they're getting close, but not quite yet. On a related note, it's crazy how a couple of university folks beat Gemini Pro with Llava v1.6. Training their 34B model only took 30 hours on 32 A100s.
Chi, I keep forgetting to ask you. I remember that in past versions, when scanning the VO cursor, I was able to get the OCR of the whole screen, even the elements outside of the current window. With VOCR 2 I mostly only get the stuff inside the current window, regardless of whether I OCR the VO cursor or the window. Is this intended or a bug?
You should only see the ones inside the VO cursor. However, sometimes the VO cursor can focus on elements that aren't visible and return coordinates outside the window, and then VOCR will grab stuff outside the window. Basically, it takes a screenshot of whatever VO says the VO cursor coordinates are.
Can you try going to System Settings, choosing General from the sidebar, moving your VO cursor to the General scroll area (don't interact with it), and scanning the VO cursor? You should only see what's inside the General scroll area, whereas if you scan the window, you'll see stuff in the sidebar as well. Let me know if that's not how it works on your end.
Yeah, in the previous version I used the VO AppleScript command and asked VO to capture a screenshot under the VO cursor and save it to a file. That sometimes worked and sometimes didn't. VOCR v2 now asks VO to report the VO cursor bounds, and VOCR captures the screenshot itself instead. I couldn't do it in JXA, so I had to figure it out in regular AppleScript. Fortunately, ChatGPT wrote it for me. haha
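To illustrate the capture half of that flow: once the AppleScript side has reported the VO cursor bounds, a rectangle capture is straightforward with macOS's built-in `screencapture` tool. This is a sketch of the idea, not VOCR's actual code; the bounds below are placeholder values, not real VoiceOver output, and the capture command itself is commented out since it only works on macOS.

```shell
# Placeholder bounds standing in for what VO reports for the VO cursor:
X=100; Y=200; W=640; H=480
RECT="$X,$Y,$W,$H"
echo "capturing rect $RECT"

# -x suppresses the shutter sound, -R captures just the given rectangle:
# screencapture -x -R "$RECT" /tmp/vocursor.png
```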