DVT SystemVerilog IDE User Guide
Rev. 24.2.25, 31 October 2024

9.7 How to Set Up Local LLMs Using Ollama

Ollama is a lightweight, extensible framework for running language models on the local machine. It provides a simple API and CLI for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications.

To set up Ollama on your local machine, follow these steps:

  • Download and install Ollama from its official website.

  • Start the Ollama server on localhost:11434 using: ollama serve

  • Pull a model from the Ollama library: ollama pull <model name>

The local LLM is now ready to be used by the AI Assistant. If Ollama runs on the default host and port, no additional configuration is needed. All models pulled locally and served by Ollama will show up in the AI Assistant, for example in the default model selection list.
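
To verify the setup outside the IDE, you can query the Ollama REST API directly. The Python sketch below is for illustration only (it is not part of DVT) and assumes the default endpoint http://localhost:11434 and Ollama's standard /api/tags route for listing local models:

    # Minimal sketch: list the models served by a local Ollama instance.
    # Assumes the default host/port; adjust OLLAMA_URL if you changed them.
    import json
    import urllib.request

    OLLAMA_URL = "http://localhost:11434"

    def list_local_models():
        """Return the names of all models pulled locally and served by Ollama."""
        with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags") as response:
            data = json.load(response)
        return [model["name"] for model in data.get("models", [])]

    if __name__ == "__main__":
        # These are the models the AI Assistant can offer in its selection list.
        for name in list_local_models():
            print(name)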

Recommendations:

Use models with at least 7B parameters, preferably more. Smaller models typically don’t produce useful results.

Some models to start with:

  • llama3.1:8b

  • llama3.1:70b

  • deepseek-coder-v2:16b

  • codestral:22b

Running LLMs locally is resource intensive. For example, the Llama 3.1 model with 8B parameters requires 4.7 GB of disk space to download and 8 GB of RAM to run.

The model speed (tokens/sec) depends on the available processing power. A GPU will deliver the best results; running on the CPU alone will result in slow replies.

Ollama supports GPU acceleration; refer to the Ollama documentation to find out whether your GPU is supported.
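
To get a rough idea of the speed your hardware can sustain, you can time a single generation request. The sketch below is an illustration only and assumes the default endpoint, Ollama's /api/generate route, and the eval_count/eval_duration statistics it reports in the final response; the model name and prompt are just examples:

    # Minimal sketch: estimate generation speed (tokens/sec) for one model.
    import json
    import urllib.request

    OLLAMA_URL = "http://localhost:11434"

    def measure_tokens_per_second(model="llama3.1:8b",
                                  prompt="Explain what a SystemVerilog interface is."):
        request = urllib.request.Request(
            f"{OLLAMA_URL}/api/generate",
            data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            result = json.load(response)
        # eval_duration is reported in nanoseconds.
        return result["eval_count"] / (result["eval_duration"] / 1e9)

    if __name__ == "__main__":
        print(f"{measure_tokens_per_second():.1f} tokens/sec")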