Ollama Cheat Sheet: Get Started with Running Local LLMs

Get started with running local LLMs using Ollama! This cheat sheet covers installation, usage, customization, and troubleshooting for optimal performance!


Running large language models (LLMs) locally on your computer has become increasingly accessible, thanks to tools like Ollama. Ollama allows you to manage and use various open-source LLMs on your machine, providing a high degree of control and privacy. This cheat sheet will guide you through everything you need to get started with running local LLMs using Ollama, from installation to advanced usage.

Introduction to Ollama

Ollama is an open-source platform that simplifies the process of running LLMs locally. It supports a wide range of models, including LLaMA 2, Mistral, and Gemma, and allows you to switch between them easily. By running LLMs locally, you can avoid the costs and privacy concerns associated with cloud-based services.

💡
Interested in the latest trends in AI?

Then you can't afford to miss Anakin AI!

Anakin AI is an all-in-one platform for workflow automation. Create powerful AI apps with an easy-to-use no-code app builder, using Llama 3, Claude, GPT-4, uncensored LLMs, Stable Diffusion, and more.

Build your dream AI app in minutes, not weeks, with Anakin AI!

Why Use Ollama?

  1. Privacy: Running LLMs locally ensures that your data never leaves your machine, addressing privacy and compliance concerns.
  2. Cost Efficiency: Avoid the costs associated with cloud-based LLM services.
  3. Control: Gain more control over the AI models and their configurations.
  4. Flexibility: Easily switch between different models and customize them to suit your needs.

Installation

Prerequisites

  • A computer with sufficient memory and storage (at least 8 GB of RAM is recommended).
  • Basic knowledge of using the terminal or command prompt.

Step-by-Step Installation

Download Ollama: Visit the Ollama GitHub repository or the Ollama website to download the appropriate version for your operating system (Mac, Windows, or Linux).

Install Ollama:

  • Mac: Download the .dmg file and follow the installation instructions.
  • Windows: Download the .exe file and run the installer.
  • Linux: Run the official install script from the Ollama website (curl -fsSL https://ollama.com/install.sh | sh), or follow the instructions on the GitHub page.

Verify Installation: Open your terminal or command prompt and run the following command to verify the installation:

ollama --version

Running Your First LLM

Downloading a Model

To run an LLM, you first need to download a model. For example, to download and start the LLaMA 2 model, use the following command:

ollama run llama2

This command downloads the model if it is not already present and then starts it. Note that the download may take some time, as models can be several gigabytes in size. To download a model without starting a session, use ollama pull llama2 instead.
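
If you prefer to script downloads, the official Python client can pull models and list what is installed locally; the CLI equivalents are ollama pull llama2 and ollama list. A minimal sketch, assuming pip install ollama and a running Ollama server:

import ollama

# Download (or update) the LLaMA 2 weights without starting a chat session
ollama.pull("llama2")

# Show which models are now available locally
print(ollama.list())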

Interacting with the Model

Once the model is downloaded, you can start interacting with it. For example, to ask the LLaMA 2 model a question, use the following command:

ollama run llama2

You will be prompted to enter your query. For example:

>>> What can you do for me?

The model will then generate a response based on your query.
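
The same interaction also works programmatically. Here is a short sketch using the official Python client's chat call, assuming the model has already been downloaded and the Ollama server is running:

import ollama

# Send a single chat message to the local LLaMA 2 model
reply = ollama.chat(
    model="llama2",
    messages=[{"role": "user", "content": "What can you do for me?"}],
)
print(reply["message"]["content"])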

Advanced Usage

Running Ollama as a Local Server

To integrate Ollama with your applications, you can run it as a local server and interact with it via a REST API. Start the server with the following command:

ollama serve

You can then make API calls to interact with the model. For example, using curl:

curl --location --request POST 'http://localhost:11434/api/generate' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model": "llama2",
    "prompt": "Describe a medieval knight in hyper-realistic detail.",
    "stream": false
}'
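
By default this endpoint streams its reply as a sequence of JSON chunks; setting stream to false returns one complete JSON object. The same call can be made from Python with the requests library; a minimal sketch, assuming the server is running on the default port 11434:

import requests

# Ask the local Ollama server for a single, non-streamed completion
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama2",
        "prompt": "Describe a medieval knight in hyper-realistic detail.",
        "stream": False,  # return one JSON object instead of a stream of chunks
    },
)
resp.raise_for_status()
print(resp.json()["response"])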

Using Client Libraries

Ollama provides official client libraries for Python and JavaScript that wrap this REST API and simplify the process of making calls. For example, using the Python client (install it with pip install ollama):

import ollama

# The local server needs no API key; point the client at its default address
client = ollama.Client(host='http://localhost:11434')
response = client.generate(model='llama2', prompt='Describe a medieval knight in hyper-realistic detail.')
print(response['response'])

Customizing Models

Ollama lets you customize models through a Modelfile, where you can adjust parameters and set a system prompt. For example, to change the temperature of the LLaMA 2 model, create a file named Modelfile with the following contents:

FROM llama2
PARAMETER temperature 0.7
SYSTEM """You are a concise, helpful assistant."""

Then build and run your customized model with ollama create my-llama2 -f Modelfile followed by ollama run my-llama2.
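
If you would rather not bake parameters into a Modelfile, the Python client also accepts per-request options. A minimal sketch, assuming the official ollama package; temperature and num_predict are standard Ollama generation parameters:

import ollama

# Override sampling parameters for this request only
response = ollama.generate(
    model="llama2",
    prompt="Give me three creative names for a hiking blog.",
    options={"temperature": 0.7, "num_predict": 128},
)
print(response["response"])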

Switching Between Models

Switching between different models in Ollama is straightforward. Simply specify the model name in your command. For example, to switch to the Mistral model:

ollama run mistral
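
To compare models from code, you can loop over model names with the Python client. A minimal sketch, assuming both llama2 and mistral have already been pulled:

import ollama

# Ask the same question to two different local models
for model in ("llama2", "mistral"):
    reply = ollama.generate(model=model, prompt="Summarize what you are in one sentence.")
    print(f"--- {model} ---")
    print(reply["response"])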

Performance Considerations

Running LLMs locally can be resource-intensive. Here are some tips to optimize performance:

  1. Use a Powerful Machine: Machines with more RAM and better GPUs will perform better.
  2. Optimize Model Size: Use smaller models if performance is an issue.
  3. Docker Integration: For better GPU utilization, consider running Ollama inside a Docker container with the Nvidia Container Toolkit.

Troubleshooting

Common Issues

  1. Model Download Failures: Ensure you have a stable internet connection and sufficient disk space.
  2. Performance Issues: Check your system resources and consider using a smaller model or upgrading your hardware.
  3. API Errors: Verify that the server is running and that you are using the correct API endpoint and parameters; a quick health check is sketched below.
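
A minimal health check, assuming the default port 11434 (the server's root endpoint replies with a short plain-text status message):

import requests

# The Ollama server answers GET / with "Ollama is running" when it is up
try:
    health = requests.get("http://localhost:11434/", timeout=5)
    print(health.status_code, health.text)
except requests.exceptions.ConnectionError:
    print("Ollama server is not reachable; start it with 'ollama serve'.")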

Getting Help

For additional support, refer to the Ollama documentation or join the community on GitHub and other social platforms.

Using Ollama to Call AI API with Anakin AI

Anakin AI offers a comprehensive API service that allows developers to integrate AI capabilities into their applications seamlessly. By leveraging Anakin AI's API, you can enhance your projects with robust AI features without managing complex backend architecture. Here’s how you can use Ollama to call Anakin AI's API:

Step 1: Upgrade Your Plan and Check Your Account Credits

Ensure your Anakin AI account has sufficient credits. Navigate to the Anakin AI Web App, click on the avatar in the lower left corner, and access the Upgrade page to check your subscription status or upgrade your plan if necessary.

Step 2: Generate Your API Access Token

Generate an API access token by visiting the app Integration section in the Anakin AI Web App. Click Manage Token, select New Token, complete the token configuration, and save the API access token securely.

Step 3: Make API Calls Using Ollama

With your API access token ready, you can now make API calls to Anakin AI using Ollama. Here’s an example of how to use the Anakin AI API to generate text content:

curl --location --request POST 'https://api.anakin.ai/v1/quickapps/{{appId}}/runs' \
--header 'Authorization: Bearer ANAKINAI_API_ACCESS_TOKEN' \
--header 'X-Anakin-Api-Version: 2024-05-06' \
--header 'Content-Type: application/json' \
--data-raw '{
    "inputs": {
        "Product/Service": "Cloud Service",
        "Features": "Reliability and performance.",
        "Advantages": "Efficiency",
        "Framework": "Attention-Interest-Desire-Action"
    },
    "stream": true
}'

Replace {{appId}} with your app ID and ANAKINAI_API_ACCESS_TOKEN with the token you generated.

Step 4: Integrate with Ollama

To combine Anakin AI's API with a locally running Ollama model in a single script, you can use the requests library for the Anakin call and the official ollama Python client for local generation. Here's an example:

import ollama
import requests

# Local Ollama client (the local server needs no API key)
client = ollama.Client(host='http://localhost:11434')

# Define the Anakin AI API endpoint and headers
anakin_api_url = 'https://api.anakin.ai/v1/quickapps/{{appId}}/runs'
headers = {
    'Authorization': 'Bearer ANAKINAI_API_ACCESS_TOKEN',
    'X-Anakin-Api-Version': '2024-05-06',
    'Content-Type': 'application/json'
}

# Define the payload for the Anakin AI API call
payload = {
    "inputs": {
        "Product/Service": "Cloud Service",
        "Features": "Reliability and performance.",
        "Advantages": "Efficiency",
        "Framework": "Attention-Interest-Desire-Action"
    },
    "stream": True
}

# Make the API call to Anakin AI (the response is streamed, so print the raw text)
response = requests.post(anakin_api_url, headers=headers, json=payload)
print(response.text)

# Optionally, post-process the Anakin AI output with the local LLaMA 2 model
summary = client.generate(model='llama2', prompt='Summarize this marketing copy in one sentence:\n' + response.text)
print(summary['response'])

This example demonstrates how to call Anakin AI's API from a Python script and then hand the result to a locally running Ollama model.

FAQs

What is Ollama?

Ollama is an open-source platform that allows you to run large language models (LLMs) locally on your computer.

How do I install Ollama?

Download the appropriate version for your operating system from the Ollama GitHub repository or website, and follow the installation instructions.

Can I run multiple models with Ollama?

Yes, Ollama supports multiple models, and you can switch between them easily by specifying the model name in your commands.

How do I interact with a model using Ollama?

You can interact with a model by running it in the terminal and entering your queries, or by using the REST API for programmatic access.

What are the system requirements for running Ollama?

A computer with at least 8 GB of RAM is recommended. More powerful hardware will provide better performance.

How do I customize a model in Ollama?

You can customize a model by writing a Modelfile that adjusts parameters such as temperature and the system prompt, then building it with ollama create.

Can I run Ollama on a GPU?

Yes, Ollama can utilize GPUs for better performance. Consider using Docker with the Nvidia Container Toolkit for optimal GPU utilization.

How do I integrate Anakin AI's API with Ollama?

Generate an API access token from Anakin AI, then call its API with an HTTP client such as requests from the same script in which you use the Ollama Python client, as shown in the integration example above.

Conclusion

Ollama makes it easy to run large language models locally, providing a high degree of control, privacy, and flexibility. Whether you're a developer looking to integrate LLMs into your applications or an enthusiast exploring the capabilities of AI, Ollama offers a powerful and user-friendly solution. Follow this cheat sheet to get started with Ollama and unlock the potential of local LLMs.