LLaVA-1.5 is an open-ish AI model which can provide image descriptions and allow follow-up interaction, akin to Be My AI. The best part is that you can run it locally on your computer if you have an appropriate GPU... or very, very slowly if you want to use your CPU. I thought it'd be cool to hook it up to #NVDASR so you can get image descriptions for the current navigator object and then ask follow-up questions. So, I wrote an NVDA add-on to do just that using llama.cpp. github.com/jcsteh/nvda-llamaCp…
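For the curious, the add-on basically just talks HTTP to the local llama.cpp server. Something along these lines (a simplified sketch rather than the add-on's actual code; the image_data/[img-10] scheme follows the llama.cpp server API from around this time, and the port and screenshot filename are just placeholders for the example):

# Sketch: send a screenshot to a local llama.cpp/llamafile server and ask
# LLaVA to describe it. Assumes the server is listening on port 8080 and was
# started with a LLaVA model plus its mmproj.
import base64
import json
import urllib.request

def describe_image(png_bytes, question="Describe this image in detail."):
    payload = {
        # [img-10] marks where image id 10 goes in the prompt.
        "prompt": f"USER: [img-10]\n{question}\nASSISTANT:",
        "n_predict": 256,
        "image_data": [
            {"id": 10, "data": base64.b64encode(png_bytes).decode("ascii")}
        ],
    }
    req = urllib.request.Request(
        "http://127.0.0.1:8080/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]

# Example: describe a screenshot saved to disk.
with open("navobject.png", "rb") as f:
    print(describe_image(f.read()))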


in reply to techsinger

@techsinger It honestly isn't worth it unfortunately; it'll take over a minute to answer each query. But if you do want to try, you could just use llamafile-server, which is a single binary. Download this and rename it to server.exe, then follow the rest of the instructions in my readme, skipping anything related to the zip files. github.com/Mozilla-Ocho/llamaf…
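Roughly speaking, launching it comes down to pointing the server binary at the LLaVA model and its mmproj (the vision projector), using the usual llama.cpp server flags. The filenames and port below are placeholders rather than the exact ones from my readme, and it's wrapped in Python purely as an illustration:

# Illustration only: model filenames and port are placeholders; the real
# values come from the readme.
import subprocess

server = subprocess.Popen([
    "server.exe",                         # the renamed llamafile-server binary
    "-m", "llava-v1.5-7b-q4.gguf",        # main language model (placeholder name)
    "--mmproj", "mmproj-model-f16.gguf",  # CLIP vision projector (placeholder name)
    "--port", "8080",                     # where the add-on will send requests
])
server.wait()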
Unknown parent

Jamie Teh

@chikim Intriguing. I haven't tried the q8 or 13b version of LLaVA yet either. I notice here that you are using the q8 model with the f16 mmproj. I assumed the quantisation had to match for the main model and the mmproj, but clearly not? Is there a reason you mismatch them here?
in reply to Toni Barth

@ToniBarth They're huge, 4 GB+. I don't want to host that and it seems kinda pointless given that there's still some technical messing around required: figuring out whether you have the right GPU, etc. If you try to do this on CPU, it'll take over a minute to answer each query. I can point you to CPU binaries if you can't run it on GPU and still want to try it though.
in reply to Jamie Teh

Size is weird, I just picked something. But the same thing happens if it says 1920x1080:
slot 0 - image loaded [id: 10] resolution (38 x 22)
slot 0 is processing [task id: 4]
slot 0 : kv cache rm - [0, end)
slot 0 - encoding image [id: 10]
{"timestamp":1701472031,"level":"INFO","function":"log_server_request","line":2601,"message":"request","remote_addr":"127.0.0.1","remote_port":33884,"status":200,"method":"POST","path":"/completion","params":{}}
slot 0 released (3 tokens in cache)