Nous Research has unveiled its latest achievement in the realm of artificial intelligence, the Hermes-2-Mixtral-8x7B. This large language model (LLM) represents a significant step forward in AI capabilities, pushing the boundaries of what's possible in natural language processing. In this deep dive, we will explore the technical details and benchmark data that set the Hermes-2-Mixtral-8x7B apart.
Article Summary
- The Hermes-2-Mixtral-8x7B, developed by Nous Research, is a groundbreaking large language model that excels across a range of benchmarks and ships in two variants: SFT, trained with supervised fine-tuning only, and DPO, further aligned with Direct Preference Optimization.
- Featuring state-of-the-art performance on the GPT4All, AGIEval, and BigBench suites, the model achieves an impressive average GPT4All accuracy of 75.70%, including a 64.25% normalized accuracy on the ARC Challenge.
- The model introduces the advanced ChatML prompt format for enhanced multi-turn dialogues, along with quantized versions for varied computational environments, demonstrating Nous Research's commitment to innovation in AI technology.
Overview of Hermes-2-Mixtral-8x7B
The Hermes-2-Mixtral-8x7B is built on an expansive training dataset, primarily comprising over 1,000,000 entries generated by GPT-4, supplemented with high-quality data from various open datasets. It is available in two distinct variants:
- SFT (Supervised Fine-Tuning only): trained purely with supervised fine-tuning, with no preference alignment applied.
- DPO (Direct Preference Optimization): the SFT model further aligned with Direct Preference Optimization for improved response quality.
How to Try Hermes-2-Mixtral-8x7B Online
Experience the future of conversation with Anakin AI's premier chatbots: the Hermes-2-Mixtral-8x7B SFT and Hermes-2-Mixtral-8x7B DPO. These advanced AI chatbots are designed to cater to your specific interaction needs with expertise and efficiency.
Whether you prefer the purely supervised-fine-tuned SFT model or the preference-aligned DPO model, Anakin AI provides a seamless and sophisticated user experience.
- Try the Hermes-2-Mixtral-8x7B SFT here, well suited to in-depth conversations on complex subjects.
- Try the Hermes-2-Mixtral-8x7B DPO here, the variant further aligned with Direct Preference Optimization.
Dive into the future of digital communication and let these chatbots demonstrate the power of AI-driven interaction. Visit Anakin AI online now to engage with Hermes-2-Mixtral-8x7B SFT and DPO — your gateway to intelligent conversation.
Benchmarking Hermes-2-Mixtral-8x7B: How Good Is It?
GPT4All Benchmark
The GPT4All benchmark is a comprehensive test of language models' performance across various tasks. The Hermes-2-Mixtral-8x7B has demonstrated exceptional results in this benchmark, surpassing many of its predecessors and competitors.
Benchmark Results
The following table presents the detailed performance metrics of Hermes-2-Mixtral-8x7B in the GPT4All benchmark:
| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| ARC Challenge | 0 | ACC | 0.5990 | ±0.0143 |
| | | ACC Norm | 0.6425 | ±0.0140 |
| ARC Easy | 0 | ACC | 0.8657 | ±0.0070 |
| | | ACC Norm | 0.8636 | ±0.0070 |
| BoolQ | 1 | ACC | 0.8783 | ±0.0057 |
| Hellaswag | 0 | ACC | 0.6661 | ±0.0047 |
| | | ACC Norm | 0.8489 | ±0.0036 |
| OpenBookQA | 0 | ACC | 0.3440 | ±0.0213 |
| | | ACC Norm | 0.4660 | ±0.0223 |
| PIQA | 0 | ACC | 0.8324 | ±0.0087 |
| | | ACC Norm | 0.8379 | ±0.0086 |
| Winogrande | 0 | ACC | 0.7616 | ±0.0120 |
Average Accuracy: 75.70%
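The 75.70% average can be reproduced from the table above: take one score per task — the normalized accuracy (ACC Norm) where it is reported, plain ACC otherwise — and average the seven values. A quick sanity check in Python:

```python
# One score per GPT4All task: ACC Norm where reported, otherwise plain ACC
scores = {
    "ARC Challenge": 0.6425,  # ACC Norm
    "ARC Easy":      0.8636,  # ACC Norm
    "BoolQ":         0.8783,  # ACC (no ACC Norm reported)
    "Hellaswag":     0.8489,  # ACC Norm
    "OpenBookQA":    0.4660,  # ACC Norm
    "PIQA":          0.8379,  # ACC Norm
    "Winogrande":    0.7616,  # ACC (no ACC Norm reported)
}
average = sum(scores.values()) / len(scores)
print(f"{average:.2%}")  # 75.70%
```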
AGIEval Benchmark for Hermes-2-Mixtral-8x7B
The AGIEval benchmark evaluates the model's performance in tasks requiring advanced general intelligence capabilities.
For the AGIEval benchmark, here's how the Hermes-2-Mixtral-8x7B scored:
| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| AGIEval Aqua Rat | 0 | ACC | 0.2402 | ±0.0269 |
| | | ACC Norm | 0.2520 | ±0.0273 |
| AGIEval LogiQA EN | 0 | ACC | 0.4117 | ±0.0193 |
| | | ACC Norm | 0.4055 | ±0.0193 |
| AGIEval LSAT AR | 0 | ACC | 0.2348 | ±0.0280 |
| | | ACC Norm | 0.2087 | ±0.0269 |
| AGIEval LSAT LR | 0 | ACC | 0.5549 | ±0.0220 |
| | | ACC Norm | 0.5294 | ±0.0221 |
| AGIEval LSAT RC | 0 | ACC | 0.6617 | ±0.0289 |
| | | ACC Norm | 0.6357 | ±0.0294 |
| AGIEval SAT EN | 0 | ACC | 0.8010 | ±0.0279 |
| | | ACC Norm | 0.7913 | ±0.0284 |
| AGIEval SAT EN Without Passage | 0 | ACC | 0.4806 | ±0.0349 |
| | | ACC Norm | 0.4612 | ±0.0348 |
| AGIEval SAT Math | 0 | ACC | 0.4909 | ±0.0338 |
Average Accuracy: 46.05%
BigBench Benchmark for Hermes-2-Mixtral-8x7B
The BigBench benchmark tests the model's abilities in a wide array of tasks, emphasizing reasoning, understanding, and problem-solving skills.
Benchmark Results
Below is a detailed breakdown of Hermes-2-Mixtral-8x7B's performance in the BigBench benchmark:
| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| Causal Judgement | 0 | Multiple Choice Grade | 0.6105 | ±0.0355 |
| Date Understanding | 0 | Multiple Choice Grade | 0.7182 | ±0.0235 |
| Disambiguation QA | 0 | Multiple Choice Grade | 0.5736 | ±0.0308 |
| Geometric Shapes | 0 | Multiple Choice Grade | 0.4596 | ±0.0263 |
| | | Exact Str Match | 0.0000 | ±0.0000 |
| Logical Deduction Five Objects | 0 | Multiple Choice Grade | 0.3500 | ±0.0214 |
| Logical Deduction Seven Objects | 0 | Multiple Choice Grade | 0.2500 | ±0.0164 |
| Logical Deduction Three Objects | 0 | Multiple Choice Grade | 0.5200 | ±0.0289 |
| Movie Recommendation | 0 | Multiple Choice Grade | 0.3540 | ±0.0214 |
| Navigate | 0 | Multiple Choice Grade | 0.5000 | ±0.0158 |
| Reasoning About Colored Objects | 0 | Multiple Choice Grade | 0.6900 | ±0.0103 |
| Ruin Names | 0 | Multiple Choice Grade | 0.6317 | ±0.0228 |
| Salient Translation Error Detection | 0 | Multiple Choice Grade | 0.2535 | ±0.0138 |
| Snarks | 0 | Multiple Choice Grade | 0.7293 | ±0.0331 |
| Sports Understanding | 0 | Multiple Choice Grade | 0.6744 | ±0.0149 |
| Temporal Sequences | 0 | Multiple Choice Grade | 0.7400 | ±0.0139 |
| Tracking Shuffled Objects Five Objects | 0 | Multiple Choice Grade | 0.2176 | ±0.0117 |
| Tracking Shuffled Objects Seven Objects | 0 | Multiple Choice Grade | 0.1543 | ±0.0086 |
| Tracking Shuffled Objects Three Objects | 0 | Multiple Choice Grade | 0.5200 | ±0.0289 |
Average Score: 49.70%
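The 49.70% figure is simply the mean of the eighteen Multiple Choice Grade values above (the Exact Str Match row for Geometric Shapes is not counted), which is easy to verify:

```python
from statistics import mean

# Multiple Choice Grade for each of the 18 BigBench tasks, in table order
grades = [
    0.6105, 0.7182, 0.5736, 0.4596, 0.3500, 0.2500,
    0.5200, 0.3540, 0.5000, 0.6900, 0.6317, 0.2535,
    0.7293, 0.6744, 0.7400, 0.2176, 0.1543, 0.5200,
]
print(f"{mean(grades):.2%}")  # 49.70%
```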
Advanced Features and Compatibility
ChatML Prompt Format
Hermes-2-Mixtral-8x7B utilizes the innovative ChatML prompt format, enhancing the structure and versatility of multi-turn chat dialogues. This format:
- Enables precise control over the conversation flow.
- Supports system prompts for guided interactions.
- Is compatible with OpenAI's API, offering familiarity for those who have used ChatGPT.
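To make the layout concrete, here is a minimal sketch (a hypothetical helper, not part of any official library) that renders a message list into a ChatML prompt: each turn is wrapped in `<|im_start|>role` and `<|im_end|>` markers, and the prompt ends with an open assistant turn for the model to complete.

```python
def to_chatml(messages):
    """Render a list of {'role', 'content'} dicts as a ChatML prompt string."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    parts.append("<|im_start|>assistant")  # open turn: the model writes the reply here
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are Hermes 2, a superintelligent AI."},
    {"role": "user", "content": "Tell me about quantum mechanics."},
])
print(prompt)
```

Recent versions of Hugging Face Transformers can typically produce this layout automatically from the model's bundled chat template via `tokenizer.apply_chat_template`.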
Quantization and Accessibility
To accommodate various computational environments, Hermes-2-Mixtral-8x7B is also distributed in quantized form, allowing it to run efficiently on more modest hardware. Various quantized builds by TheBloke are likewise available, catering to diverse needs.
How to Run and Use Hermes-2-Mixtral-8x7B Locally
Method 1. Run Hermes-2-Mixtral-8x7B with Hugging Face Transformers
Hermes-2-Mixtral-8x7B is not just a theoretical marvel but also a practical tool. Here's an example of how to use the model with Hugging Face Transformers (4-bit loading requires the bitsandbytes package, and Flash Attention 2 requires flash-attn):
```python
import torch
from transformers import LlamaTokenizer, MixtralForCausalLM

tokenizer = LlamaTokenizer.from_pretrained(
    'NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO',
    trust_remote_code=True
)
model = MixtralForCausalLM.from_pretrained(
    "NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO",
    torch_dtype=torch.float16,
    device_map="auto",          # spread layers across available GPUs
    load_in_8bit=False,
    load_in_4bit=True,          # 4-bit quantization to fit consumer GPUs
    use_flash_attention_2=True
)

# ChatML prompt: system and user turns, then an open assistant turn to complete
prompt = """<|im_start|>system
You are Hermes 2, a superintelligent AI.<|im_end|>
<|im_start|>user
Tell me about quantum mechanics.<|im_end|>
<|im_start|>assistant"""

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
generated_ids = model.generate(input_ids, max_new_tokens=100)
response = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(response)
```
This example demonstrates the ease of integrating Hermes-2-Mixtral-8x7B into various AI and data processing tasks, making it a versatile tool for professionals across sectors.
Method 2. Run Hermes-2-Mixtral-8x7B with WasmEdge
Alternatively, you can also run Hermes-2-Mixtral-8x7B with WasmEdge. To run the model on your device, follow these steps:
Install WasmEdge: Use the command below to install WasmEdge along with the required plugin:
```bash
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugin wasi_nn-ggml
```
Download the Nous-Hermes-2-Mixtral-8x7B Model: Retrieve the GGUF file of the model, which is several GBs in size, using this command:
```bash
curl -LO https://huggingface.co/second-state/Nous-Hermes-2-Mixtral-8x7B-SFT-GGUF/resolve/main/Nous-Hermes-2-Mixtral-8x7B-SFT-Q5_K_M.gguf
```
Download the Chat Application Wasm File: This cross-platform portable Wasm file allows you to interact with the model via command line. The Rust source code for the app is available here. Download the file using:
```bash
curl -LO https://github.com/second-state/LlamaEdge/releases/latest/download/llama-chat.wasm
```
To start chatting with the model in your terminal, simply input the following command:
```bash
wasmedge --dir .:. --nn-preload default:GGML:AUTO:Nous-Hermes-2-Mixtral-8x7B-SFT-Q5_K_M.gguf llama-chat.wasm -p chatml
```
Conclusion
Nous Research's Hermes-2-Mixtral-8x7B stands as a landmark achievement in the field of AI and natural language processing. With its state-of-the-art performance, innovative features, and user-friendly design, it represents the next leap forward in the capabilities of language models. Whether for academic research, business analytics, or creative endeavors, Hermes-2-Mixtral-8x7B is poised to revolutionize how we interact with and leverage AI technology.