Nous Research has unveiled its latest achievement in the realm of artificial intelligence, the Hermes-2-Mixtral-8x7B. This large language model (LLM) represents a significant step forward in AI capabilities, pushing the boundaries of what's possible in natural language processing. In this deep dive, we will explore the technical details and benchmark data that set the Hermes-2-Mixtral-8x7B apart.
Article Summary
- The Hermes-2-Mixtral-8x7B, developed by Nous Research, is a groundbreaking large language model that excels across a range of benchmarks and ships in two variants: SFT, trained with supervised fine-tuning only, and DPO, further aligned with Direct Preference Optimization.
- Featuring state-of-the-art performance on the GPT4All, AGIEval, and BigBench suites, the model achieves an impressive average GPT4All accuracy of 75.70%, including a 64.25% normalized accuracy on the ARC Challenge.
- The model introduces the advanced ChatML prompt format for enhanced multi-turn dialogues, along with quantized versions for varied computational environments, demonstrating Nous Research's commitment to innovation in AI technology.
Overview of Hermes-2-Mixtral-8x7B
The Hermes-2-Mixtral-8x7B is built on an expansive training dataset, primarily comprising over 1,000,000 entries generated by GPT-4, supplemented with high-quality data from various open datasets. It is available in two distinct variants:
- SFT (Supervised Fine-Tuning only): trained purely with supervised fine-tuning, with no preference alignment applied.
- DPO (Direct Preference Optimization): the SFT model further aligned with Direct Preference Optimization for improved response quality.
How to Try Hermes-2-Mixtral-8x7B Online
Experience the future of conversation with Anakin AI's premier chatbots: the Hermes-2-Mixtral-8x7B SFT and Hermes-2-Mixtral-8x7B DPO. These advanced AI chatbots are designed to cater to your specific interaction needs with expertise and efficiency.
Whether you prefer the purely supervised-fine-tuned SFT model or the preference-aligned DPO model, Anakin AI provides a seamless and sophisticated user experience.
- Try the Hermes-2-Mixtral-8x7B SFT here, well suited to in-depth conversations on complex subjects.
- Try the Hermes-2-Mixtral-8x7B DPO here, the variant further aligned with Direct Preference Optimization.
Dive into the future of digital communication and let these chatbots demonstrate the power of AI-driven interaction. Visit Anakin AI online now to engage with Hermes-2-Mixtral-8x7B SFT and DPO — your gateway to intelligent conversation.
Benchmarking Hermes-2-Mixtral-8x7B: How Good Is It?
GPT4All Benchmark
The GPT4All benchmark is a comprehensive test of language models' performance across various tasks. The Hermes-2-Mixtral-8x7B has demonstrated exceptional results in this benchmark, surpassing many of its predecessors and competitors.
Benchmark Results
The following table presents the detailed performance metrics of Hermes-2-Mixtral-8x7B in the GPT4All benchmark:
| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| ARC Challenge | 0 | ACC | 0.5990 | ±0.0143 |
| | | ACC Norm | 0.6425 | ±0.0140 |
| ARC Easy | 0 | ACC | 0.8657 | ±0.0070 |
| | | ACC Norm | 0.8636 | ±0.0070 |
| BoolQ | 1 | ACC | 0.8783 | ±0.0057 |
| Hellaswag | 0 | ACC | 0.6661 | ±0.0047 |
| | | ACC Norm | 0.8489 | ±0.0036 |
| OpenBookQA | 0 | ACC | 0.3440 | ±0.0213 |
| | | ACC Norm | 0.4660 | ±0.0223 |
| PIQA | 0 | ACC | 0.8324 | ±0.0087 |
| | | ACC Norm | 0.8379 | ±0.0086 |
| Winogrande | 0 | ACC | 0.7616 | ±0.0120 |
Average Accuracy: 75.70%
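The 75.70% average can be reproduced from the table above: take one score per task — the normalized accuracy (ACC Norm) where it is reported, plain ACC otherwise — and average the seven values. A quick sanity check in Python:

```python
# One score per GPT4All task: ACC Norm where reported, otherwise plain ACC
scores = {
    "ARC Challenge": 0.6425,  # ACC Norm
    "ARC Easy":      0.8636,  # ACC Norm
    "BoolQ":         0.8783,  # ACC (no ACC Norm reported)
    "Hellaswag":     0.8489,  # ACC Norm
    "OpenBookQA":    0.4660,  # ACC Norm
    "PIQA":          0.8379,  # ACC Norm
    "Winogrande":    0.7616,  # ACC (no ACC Norm reported)
}
average = sum(scores.values()) / len(scores)
print(f"{average:.2%}")  # 75.70%
```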
AGIEval Benchmark for Hermes-2-Mixtral-8x7B
The AGIEval benchmark evaluates the model's performance in tasks requiring advanced general intelligence capabilities.
For the AGIEval benchmark, here's how the Hermes-2-Mixtral-8x7B scored:
| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| AGIEval Aqua Rat | 0 | ACC | 0.2402 | ±0.0269 |
| | | ACC Norm | 0.2520 | ±0.0273 |
| AGIEval LogiQA EN | 0 | ACC | 0.4117 | ±0.0193 |
| | | ACC Norm | 0.4055 | ±0.0193 |
| AGIEval LSAT AR | 0 | ACC | 0.2348 | ±0.0280 |
| | | ACC Norm | 0.2087 | ±0.0269 |
| AGIEval LSAT LR | 0 | ACC | 0.5549 | ±0.0220 |
| | | ACC Norm | 0.5294 | ±0.0221 |
| AGIEval LSAT RC | 0 | ACC | 0.6617 | ±0.0289 |
| | | ACC Norm | 0.6357 | ±0.0294 |
| AGIEval SAT EN | 0 | ACC | 0.8010 | ±0.0279 |
| | | ACC Norm | 0.7913 | ±0.0284 |
| AGIEval SAT EN Without Passage | 0 | ACC | 0.4806 | ±0.0349 |
| | | ACC Norm | 0.4612 | ±0.0348 |
| AGIEval SAT Math | 0 | ACC | 0.4909 | ±0.0338 |
Average Accuracy: 46.05%
BigBench Benchmark for Hermes-2-Mixtral-8x7B
The BigBench benchmark tests the model's abilities in a wide array of tasks, emphasizing reasoning, understanding, and problem-solving skills.
Benchmark Results
Below is a detailed breakdown of Hermes-2-Mixtral-8x7B's performance in the BigBench benchmark:
| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| Causal Judgement | 0 | Multiple Choice Grade | 0.6105 | ±0.0355 |
| Date Understanding | 0 | Multiple Choice Grade | 0.7182 | ±0.0235 |
| Disambiguation QA | 0 | Multiple Choice Grade | 0.5736 | ±0.0308 |
| Geometric Shapes | 0 | Multiple Choice Grade | 0.4596 | ±0.0263 |
| | | Exact Str Match | 0.0000 | ±0.0000 |
| Logical Deduction Five Objects | 0 | Multiple Choice Grade | 0.3500 | ±0.0214 |
| Logical Deduction Seven Objects | 0 | Multiple Choice Grade | 0.2500 | ±0.0164 |
| Logical Deduction Three Objects | 0 | Multiple Choice Grade | 0.5200 | ±0.0289 |
| Movie Recommendation | 0 | Multiple Choice Grade | 0.3540 | ±0.0214 |
| Navigate | 0 | Multiple Choice Grade | 0.5000 | ±0.0158 |
| Reasoning About Colored Objects | 0 | Multiple Choice Grade | 0.6900 | ±0.0103 |
| Ruin Names | 0 | Multiple Choice Grade | 0.6317 | ±0.0228 |
| Salient Translation Error Detection | 0 | Multiple Choice Grade | 0.2535 | ±0.0138 |
| Snarks | 0 | Multiple Choice Grade | 0.7293 | ±0.0331 |
| Sports Understanding | 0 | Multiple Choice Grade | 0.6744 | ±0.0149 |
| Temporal Sequences | 0 | Multiple Choice Grade | 0.7400 | ±0.0139 |
| Tracking Shuffled Objects Five Objects | 0 | Multiple Choice Grade | 0.2176 | ±0.0117 |
| Tracking Shuffled Objects Seven Objects | 0 | Multiple Choice Grade | 0.1543 | ±0.0086 |
| Tracking Shuffled Objects Three Objects | 0 | Multiple Choice Grade | 0.5200 | ±0.0289 |
Average Score: 49.70%
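The 49.70% figure is simply the mean of the eighteen Multiple Choice Grade values above (the Exact Str Match row for Geometric Shapes is not counted), which is easy to verify:

```python
from statistics import mean

# Multiple Choice Grade for each of the 18 BigBench tasks, in table order
grades = [
    0.6105, 0.7182, 0.5736, 0.4596, 0.3500, 0.2500,
    0.5200, 0.3540, 0.5000, 0.6900, 0.6317, 0.2535,
    0.7293, 0.6744, 0.7400, 0.2176, 0.1543, 0.5200,
]
print(f"{mean(grades):.2%}")  # 49.70%
```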
Advanced Features and Compatibility
ChatML Prompt Format
Hermes-2-Mixtral-8x7B utilizes the innovative ChatML prompt format, enhancing the structure and versatility of multi-turn chat dialogues. This format:
- Enables precise control over the conversation flow.
- Supports system prompts for guided interactions.
- Is compatible with OpenAI's API, offering familiarity for those who have used ChatGPT.
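To make the layout concrete, here is a minimal sketch (a hypothetical helper, not part of any official library) that renders a message list into a ChatML prompt: each turn is wrapped in `<|im_start|>role` and `<|im_end|>` markers, and the prompt ends with an open assistant turn for the model to complete.

```python
def to_chatml(messages):
    """Render a list of {'role', 'content'} dicts as a ChatML prompt string."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    parts.append("<|im_start|>assistant")  # open turn: the model writes the reply here
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are Hermes 2, a superintelligent AI."},
    {"role": "user", "content": "Tell me about quantum mechanics."},
])
print(prompt)
```

Recent versions of Hugging Face Transformers can typically produce this layout automatically from the model's bundled chat template via `tokenizer.apply_chat_template`.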
Quantization and Accessibility
To accommodate various computational environments, Hermes-2-Mixtral-8x7B is also distributed in quantized form, allowing it to run efficiently on more modest hardware. Various quantized builds by TheBloke are likewise available, catering to diverse needs.
How to Run and Use Hermes-2-Mixtral-8x7B Locally
Method 1. Run Hermes-2-Mixtral-8x7B with Hugging Face Transformers
Hermes-2-Mixtral-8x7B is not just a theoretical marvel but also a practical tool. Here's an example of how to use the model with Hugging Face Transformers (4-bit loading requires the bitsandbytes package, and Flash Attention 2 requires flash-attn):
```python
import torch
from transformers import LlamaTokenizer, MixtralForCausalLM

tokenizer = LlamaTokenizer.from_pretrained(
    'NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO',
    trust_remote_code=True
)
model = MixtralForCausalLM.from_pretrained(
    "NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO",
    torch_dtype=torch.float16,
    device_map="auto",          # spread layers across available GPUs
    load_in_8bit=False,
    load_in_4bit=True,          # 4-bit quantization to fit consumer GPUs
    use_flash_attention_2=True
)

# ChatML prompt: system and user turns, then an open assistant turn to complete
prompt = """<|im_start|>system
You are Hermes 2, a superintelligent AI.<|im_end|>
<|im_start|>user
Tell me about quantum mechanics.<|im_end|>
<|im_start|>assistant"""

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
generated_ids = model.generate(input_ids, max_new_tokens=100)
response = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(response)
```
This example demonstrates the ease of integrating Hermes-2-Mixtral-8x7B into various AI and data processing tasks, making it a versatile tool for professionals across sectors.
Method 2. Run Hermes-2-Mixtral-8x7B with WasmEdge
Alternatively, you can also run Hermes-2-Mixtral-8x7B with WasmEdge. To run the model on your device, follow these steps:
Install WasmEdge: Use the command below to install WasmEdge along with the required plugin:
```bash
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugin wasi_nn-ggml
```
Download the Nous-Hermes-2-Mixtral-8x7B Model: Retrieve the GGUF file of the model, which is several GBs in size, using this command:
```bash
curl -LO https://huggingface.co/second-state/Nous-Hermes-2-Mixtral-8x7B-SFT-GGUF/resolve/main/Nous-Hermes-2-Mixtral-8x7B-SFT-Q5_K_M.gguf
```
Download the Chat Application Wasm File: This cross-platform portable Wasm file allows you to interact with the model via command line. The Rust source code for the app is available here. Download the file using:
```bash
curl -LO https://github.com/second-state/LlamaEdge/releases/latest/download/llama-chat.wasm
```
To start chatting with the model in your terminal, simply input the following command:
```bash
wasmedge --dir .:. --nn-preload default:GGML:AUTO:Nous-Hermes-2-Mixtral-8x7B-SFT-Q5_K_M.gguf llama-chat.wasm -p chatml
```
Conclusion
Nous Research's Hermes-2-Mixtral-8x7B stands as a landmark achievement in the field of AI and natural language processing. With its state-of-the-art performance, innovative features, and user-friendly design, it represents the next leap forward in the capabilities of language models. Whether for academic research, business analytics, or creative endeavors, Hermes-2-Mixtral-8x7B is poised to revolutionize how we interact with and leverage AI technology.