hahaha Out of curiosity, I got llama-3.1-405B-Q2_K running on my Mac with an M3 Max and 64GB. The model is about 151GB, and I could offload 47 of its 127 layers to the GPU. The problem: it takes more than two minutes (139 seconds) to generate a single token. lol #LLM #ML #AI @vick21 @ZBennoui
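For scale, here's a quick back-of-the-envelope on that speed (a sketch, assuming the 139 s/token figure holds steady beyond the first token):

```python
# Rough throughput math for the run described above.
# Assumption: 139 s/token is the steady-state rate, not just time-to-first-token.
seconds_per_token = 139

tokens_per_second = 1 / seconds_per_token
print(f"{tokens_per_second:.4f} tokens/s")  # ~0.0072 tokens/s

# Hours needed to generate a modest 256-token reply at that rate:
hours_for_256 = 256 * seconds_per_token / 3600
print(f"{hours_for_256:.1f} h")  # ~9.9 h
```

So roughly ten hours for one short answer, which is why partial GPU offload of a 151GB model into 64GB of unified memory is more of a curiosity than a usable setup.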