Uncensored Llama 3: Dolphin-2.9-llama3-70b-gguf

Sam Altwoman

Built on Llama 3 70B, Dolphin-2.9 remains a powerful and versatile model, notable for its ability to handle complex tasks without censorship.

Overview

Dolphin-2.9-Llama3-70B-GGUF: An In-Depth Look

Introduction

Dolphin-2.9-Llama3-70B-GGUF is a state-of-the-art language model that has garnered significant attention in the AI community. Developed by Eric Hartford, Lucas Atkins, Fernando Fernandes, and Cognitive Computations, this model is a fine-tuned version of the Meta Llama 3 70B model. It offers a variety of capabilities, including instruction following, conversational skill, coding ability, and initial agentic functionality. This article delves into the model's design, training process, uncensored nature, and benchmarks, and provides a guide to running it locally.

Model Design

Dolphin-2.9-Llama3-70B-GGUF is based on the Meta Llama 3 70B model and has been quantized into the GGUF format used by llama.cpp. Quantization reduces the model's size while largely preserving its performance, making it practical to run on more modest hardware. The model supports a context window of up to 8,000 tokens, and the full-weight fine-tuning was done with a 4,000-token sequence length.
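To get a rough sense of the savings: at FP16 (16 bits per weight), a 70B-parameter model occupies about 140 GB, while the popular Q4_K_M quantization averages roughly 4.8 bits per weight, bringing the file down to around 42 GB. These figures are approximate and vary slightly between GGUF releases.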

The model's architecture is designed to handle a wide range of tasks, from simple instructions to complex coding problems. It also includes initial agentic abilities, allowing it to perform function calls and interact with other systems. The model's versatility makes it suitable for a variety of applications, including customer support, content generation, and software development.
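As a rough illustration of what an agentic exchange can look like, here is a sketch in the model's ChatML prompt format. The tool description and the JSON reply layout are assumptions chosen for illustration, not a schema documented by the model card:

    <|im_start|>system
    You can call the function get_weather(city: string) by replying with a
    JSON object such as {"function": "get_weather", "arguments": {"city": "..."}}.<|im_end|>
    <|im_start|>user
    What's the weather in Osaka?<|im_end|>
    <|im_start|>assistant
    {"function": "get_weather", "arguments": {"city": "Osaka"}}<|im_end|>

The calling application parses the JSON, executes the function, and feeds the result back to the model in a follow-up turn.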

Training Process

The training of Dolphin-2.9-Llama3-70B-GGUF was a collaborative effort involving multiple datasets and advanced techniques. The model was trained on a combination of synthetic data generated by GPT-4 and other models. Training took 2.5 days on an 8x NVIDIA L40S node provided by Crusoe Cloud.

The datasets used for training include:

  • cognitivecomputations/Dolphin-2.9
  • teknium/OpenHermes-2.5
  • m-a-p/CodeFeedback-Filtered-Instruction
  • cognitivecomputations/dolphin-coder
  • cognitivecomputations/samantha-data
  • HuggingFaceH4/ultrachat_200k
  • microsoft/orca-math-word-problems-200k
  • abacusai/SystemChat-1.1
  • Locutusque/function-calling-chatml
  • internlm/Agent-FLAN

These datasets cover a wide range of topics and tasks, ensuring that the model is well-rounded and capable of handling diverse queries.

Uncensored Nature

One of the most notable features of Dolphin-2.9-Llama3-70B-GGUF is its uncensored nature. The model's dataset has been filtered to remove alignment and bias, making it more compliant with user requests. However, this also means that the model can generate content that may be considered unethical or inappropriate. Users are advised to implement their own alignment layers before deploying the model in production environments.

The uncensored nature of the model has sparked discussions within the AI community. While it offers greater flexibility and compliance, it also raises concerns about the potential misuse of the model. Developers and users must exercise caution and responsibility when using Dolphin-2.9-Llama3-70B-GGUF.
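If you serve the model through Ollama (covered below), one lightweight way to add such a layer is a custom Modelfile that pins a safety-oriented system prompt. The prompt wording and model name here are only examples:

    # Hypothetical Modelfile adding a basic safety system prompt
    FROM dolphin-llama3:8b-256k
    SYSTEM "You are a helpful assistant. Refuse requests for illegal or harmful content."

Build and run it with:

    ollama create dolphin-guarded -f Modelfile
    ollama run dolphin-guarded

A system prompt is only a soft constraint, so production deployments typically pair it with input and output filtering as well.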

Benchmarks

Dolphin-2.9-Llama3-70B-GGUF has been benchmarked against several other models, including the original Llama 3 70B and GPT-4. The results indicate that Dolphin-2.9 performs strongly, often surpassing its base model on instruction following, conversational ability, and coding tasks.

However, some users have reported that the model's attention to detail and ability to follow instructions have diminished compared to the original Llama 3 70B. This is likely due to the synthetic nature of the training data, which may not capture the nuances of human-generated content. Despite these limitations, Dolphin-2.9 remains a powerful tool for a wide range of applications.

Running Dolphin-2.9-Llama3-70B-GGUF Locally

Running Dolphin-2.9-Llama3-70B-GGUF locally requires a few steps. Here is a guide to help you get started:

Prerequisites

  1. Hardware Requirements: Ensure that your system meets the hardware requirements. The 70B Q4_K_M quantization is roughly a 42 GB file, so you need at least that much free RAM or VRAM; the 256k-context variant used in the examples below requires at least 64GB of memory.
  2. Software Requirements: Install the necessary software, including Python and the Hugging Face CLI.

Installation

  1. Install Hugging Face CLI:

    pip install -U "huggingface_hub[cli]"
    
  2. Download the Model:

    huggingface-cli download bartowski/dolphin-2.9-llama3-70b-GGUF --include "dolphin-2.9-llama3-70b-Q4_K_M.gguf" --local-dir ./ --local-dir-use-symlinks False
    

    If the model is larger than 50GB, it will be split into multiple files. To download all parts, use:

    huggingface-cli download bartowski/dolphin-2.9-llama3-70b-GGUF --include "dolphin-2.9-llama3-70b-Q8_0.gguf/*" --local-dir dolphin-2.9-llama3-70b-Q8_0 --local-dir-use-symlinks False
    
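With the GGUF file on disk, one common way to run it is llama.cpp. This is a minimal sketch, assuming you have already built llama.cpp and are running from its directory (newer builds name the binary llama-cli instead of main); any GGUF-compatible runtime works:

    # Load the Q4_K_M file with an 8k context and generate up to 256 tokens
    ./main -m ./dolphin-2.9-llama3-70b-Q4_K_M.gguf \
        -c 8192 \
        -n 256 \
        -p "Why is the sky blue?"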

Running the Model

  1. Using the Ollama API: if you serve the model through Ollama, you can query its local REST endpoint. This example uses Ollama's dolphin-llama3:8b-256k tag, an 8B Dolphin variant with an extended 256k context window:

    curl http://localhost:11434/api/generate -d '{
        "model": "dolphin-llama3:8b-256k",
        "prompt": "Why is the sky blue?",
        "options": {
            "num_ctx": 256000
        }
    }'
    
  2. Using the Ollama CLI:

    ollama run dolphin-llama3:8b-256k
    >>> /set parameter num_ctx 256000
    
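Note that the Ollama API streams its response as a series of JSON objects by default; add "stream": false to the request body if you prefer a single JSON reply. Also keep in mind that num_ctx 256000 reserves a very large context, which is what drives the 64GB memory requirement mentioned above.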

Prompt Format

When interacting with the model directly, use its ChatML prompt format:

<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
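For example, a filled-in prompt might look like this (the system prompt is just an illustration):

<|im_start|>system
You are Dolphin, a helpful AI assistant.<|im_end|>
<|im_start|>user
Write a Python one-liner that reverses a string.<|im_end|>
<|im_start|>assistant

The model then generates the assistant turn and terminates it with <|im_end|>.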

Conclusion

Dolphin-2.9-Llama3-70B-GGUF represents a significant advancement in the field of large language models. Its uncensored nature, combined with its impressive performance across various tasks, makes it a valuable tool for researchers, developers, and AI enthusiasts alike. However, it is crucial to exercise caution and implement appropriate safeguards when deploying the model, as its uncensored nature may lead to the generation of unethical or inappropriate content.