What is Flash Attention huggingface.co/docs/text-gener… #llm #ai #ollama
in reply to victor tsaran

I wish their Python library supported Apple silicon the way llama.cpp does. More and more models use it now, so you can't run them on Apple silicon with Python. :(
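
For context, a minimal sketch of the usual workaround, assuming the Hugging Face transformers API (attn_implementation is a real from_pretrained argument; the model id below is a placeholder):

    # Pick an attention implementation based on what the platform supports.
    # flash-attn ships CUDA kernels only, so it is unavailable on Apple
    # silicon (MPS) or any machine without an NVIDIA GPU.
    import importlib.util

    import torch
    from transformers import AutoModelForCausalLM

    def pick_attn_implementation() -> str:
        has_flash = importlib.util.find_spec("flash_attn") is not None
        if has_flash and torch.cuda.is_available():
            return "flash_attention_2"
        # PyTorch's built-in scaled_dot_product_attention runs on MPS/CPU.
        return "sdpa"

    model = AutoModelForCausalLM.from_pretrained(
        "some-org/some-model",  # placeholder model id
        attn_implementation=pick_attn_implementation(),
        torch_dtype=torch.float16,
    )

The fallback only helps when the model's code supports sdpa, though; some remote-code models import flash_attn unconditionally, which is exactly the situation being complained about here.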
in reply to victor tsaran

Hopefully soon! Flash attention has been out for a while, though. Even vision language models have started using it. :( Makes me want to go get an RTX card for my PC. lol
in reply to Chi Kim

@chikim Aren't they now talking about double-flash-attention, or something like that? I came across it yesterday but haven't dug too deep into it just yet.