9.7 How to Set Up Local LLMs Using Ollama

Ollama is a lightweight, extensible framework for running language models on the local machine. It provides a simple API and CLI for creating, running, and managing models, as well as a library of pre-built models that can easily be used in a variety of applications. To set up Ollama on your local machine, follow these steps:

1. Download and install Ollama from its official website.
2. Start the Ollama server on localhost:11434 using:
   ollama serve
3. Pull a model from the Ollama library:
   ollama pull <model name>
Now the local LLM is ready to be used by the AI Assistant. If Ollama runs on the default host and port, no additional configuration of the AI Assistant is needed. All models pulled locally and served by Ollama will show up in the AI Assistant, for example in the default model selection list.
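To confirm that the server is reachable and to see which models are available locally, you can query Ollama's HTTP API directly. A minimal check, assuming the default localhost:11434 address:

   curl http://localhost:11434/api/tags

The response is a JSON listing of the models pulled so far; these are the names that become available for selection in the AI Assistant.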
Recommendations:
Use models with at least 7B parameters, preferably more. Smaller models typically don't produce useful results. Some models to start with are listed below; the commands after the list show how to pull and try one of them:
- llama3.1:8b
- llama3.1:70b
- deepseek-coder-v2:16b
- codestral:22b
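For example, to download one of these models and try it interactively from the command line (assuming Ollama is installed and the server is running):

   ollama pull llama3.1:8b
   ollama run llama3.1:8b "Write a short docstring for a function that reverses a list."

Once the pull completes, the same model name will appear among the locally served models offered to the AI Assistant.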
Running LLMs locally is resource intensive. For example, the Llama 3.1 model with 8B parameters requires about 4.7 GB of disk space to download and 8 GB of RAM to run. The model speed (tokens/sec) depends on the available processing power: a GPU delivers the best results, while running on the CPU alone results in slow replies. Ollama supports GPU acceleration; refer to the Ollama documentation to find out if your GPU is supported.
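To see how a loaded model is actually being executed, Ollama's CLI can report the running models. A quick check, assuming a model has already been loaded (for example via ollama run):

   ollama ps

In recent Ollama versions, the PROCESSOR column of this output indicates whether the model is running on the GPU, on the CPU, or split between the two, which helps diagnose slow replies.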