hahaha Out of curiosity, I got llama-3.1-405B-Q2_K running on my Mac with an M3 Max and 64GB. The model is about 151GB, and I could offload 47 of its 127 layers to the GPU. The problem: it takes more than two minutes (139 seconds) to generate a single token. lol #LLM #ML #AI @vick21 @ZBennoui
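For scale, here's a quick back-of-the-envelope on that speed (a sketch, assuming the 139 s/token figure holds steady beyond the first token):

```python
# Rough throughput math for the run described above.
# Assumption: 139 s/token is the steady-state rate, not just time-to-first-token.
seconds_per_token = 139

tokens_per_second = 1 / seconds_per_token
print(f"{tokens_per_second:.4f} tokens/s")  # ~0.0072 tokens/s

# Hours needed to generate a modest 256-token reply at that rate:
hours_for_256 = 256 * seconds_per_token / 3600
print(f"{hours_for_256:.1f} h")  # ~9.9 h
```

So roughly ten hours for one short answer, which is why partial GPU offload of a 151GB model into 64GB of unified memory is more of a curiosity than a usable setup.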