The main goal of llama.cpp is to enable LLM inference with minimal setup and state-of-the-art performance on a wide variety of hardware - locally and in the cloud.

Typical run using LLaMA v2 13B on M2 ...
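The truncated line above refers to an example run; as a rough illustration only (the model path, prompt, and parameters below are assumptions, not the original benchmark setup), a typical llama.cpp CLI invocation might look like:

```sh
# Hypothetical example: run a local GGUF model with llama-cli.
# -m: path to the model file, -p: prompt, -n: number of tokens to generate.
./llama-cli -m ./models/llama-2-13b.Q4_K_M.gguf \
  -p "Building a website can be done in 10 simple steps:" \
  -n 256
```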
Get up and running with large language models.

| Model       | Parameters | Size  | Download                |
| ----------- | ---------- | ----- | ----------------------- |
| Gemma 3     | 4B         | 3.3GB | `ollama run gemma3`     |
| Gemma 3     | 12B        | 8.1GB | `ollama run gemma3:12b` |
| Gemma 3     | 27B        | 17GB  | `ollama run gemma3:27b` |
| QwQ         | 32B        | 20GB  | `ollama run qwq`        |
| DeepSeek-R1 | ...        | ...   | ...                     |
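As a brief usage sketch (assuming Ollama is already installed locally), any of the listed models can be pulled and prompted directly from the command line; the prompt text here is just an example:

```sh
# Pull the model (if not already present) and start an interactive chat.
ollama run gemma3

# Or pass a one-off prompt instead of opening an interactive session.
ollama run gemma3 "Why is the sky blue?"
```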