WizardLM 2: Open Source LLM from Microsoft that Quietly Rivals GPT-4

WizardLM 2: A Comprehensive Overview of Microsoft's Next-Generation Large Language Models


Microsoft has recently unveiled WizardLM 2, a groundbreaking family of large language models that push the boundaries of artificial intelligence. These models showcase significant improvements in complex chat, multilingual understanding, reasoning, and agent capabilities, surpassing their predecessor, WizardLM, and other leading open-source models.

💡
Interested in the latest AI News? Want to use the latest AI Models in One Place?

Visit Anakin AI, where you can build AI Apps with ANY AI Model, using a No Code App Builder!

WizardLM-2 LLM Family: A Trio of Cutting-Edge Models

WizardLM 2 introduces three remarkable models, each tailored to specific needs and performance requirements:

WizardLM-2 8x22B: As Microsoft's most advanced model, WizardLM-2 8x22B demonstrates highly competitive performance compared to leading proprietary models like GPT-4. It consistently outperforms all existing state-of-the-art open-source models, making it the best choice for tackling complex tasks.

WizardLM-2 70B: This model reaches top-tier reasoning capabilities and is the first choice in the 70B parameter size category. It offers an excellent balance between performance and resource requirements.

WizardLM-2 7B: Despite its smaller size, WizardLM-2 7B is incredibly fast and achieves comparable performance to open-source models 10 times its size. It is an ideal choice for applications that require efficiency without compromising on quality.

WizardLM 2 Benchmarks Compared to GPT-4

WizardLM 2 benchmarks compared to GPT-4-1106-preview, Command R Plus, Mistral Large, Qwen 1.5, and Starling LM 7B.

To assess the performance of WizardLM 2, Microsoft conducted extensive automatic and human evaluations across various benchmarks and real-world scenarios. The results speak for themselves:

| Benchmark | WizardLM-2 8x22B | WizardLM-2 70B | WizardLM-2 7B |
|---|---|---|---|
| MT-Bench | Highly competitive with GPT-4 and Claude 3 | Top-performing open model in its size category | Top-performing open model in its size category |
| Human evaluation on complex instructions | Slightly underperforms GPT-4; significantly outperforms Command R Plus | Surpasses GPT-4-0613, Mistral-Large, and Qwen1.5-72B-Chat | - |
| AlpacaEval | - | - | WizardLM-13B-V1.2 achieves 89.17%, exceeding ChatGPT's 86.09% |
| WizardLM Eval | - | - | WizardLM-13B-V1.2 scores 101.4% against ChatGPT's 100% |

These impressive results validate the effectiveness of the Evol-Instruct training approach. Both the automatic and human evaluations consistently show WizardLM 2 outperforming open-source alternatives like Alpaca and Vicuna, which rely on simpler human-created instruction data.

How WizardLM 2 was Trained

The secret behind WizardLM 2's exceptional performance lies in Evol-Instruct, a revolutionary training methodology developed by Microsoft.

  • Evol-Instruct leverages large language models to iteratively rewrite an initial set of instructions into increasingly complex variations. This evolved instruction data is then used to fine-tune the base models, resulting in a significant boost in their ability to handle intricate tasks.
  • Evol-Instruct has become a fundamental technology for the GenAI community, enabling the creation of large amounts of high-complexity instruction data that would be incredibly difficult for humans to generate. By automating the process of generating diverse and challenging training data, Microsoft has paved the way for the rapid advancement of large language models.

Evol-Instruct and Instruction & Process Supervised Reinforcement Learning (RLEIF)

Evol-Instruct and Instruction&Process Supervised Reinforcement Learning (RLEIF) have become fundamental technologies for the GenAI community since their introduction by Microsoft. These innovative training methodologies have played a crucial role in the development of the Wizard series of large language models, including the latest iteration, WizardLM 2.

Evol-Instruct is an evolutionary approach to generating high-quality instruction data for training language models. By leveraging LLMs to iteratively rewrite an initial set of instructions into more complex variations, Evol-Instruct enables the creation of diverse and challenging training data that would be difficult for humans to generate manually. This evolved instruction data is then used to fine-tune the base models, resulting in significant performance improvements.
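
To make the idea concrete, here is a minimal sketch of an Evol-Instruct-style loop in Python. It assumes a generic `complete()` helper that calls any instruction-following LLM, and the rewriting prompts are paraphrased illustrations rather than Microsoft's exact templates:

```python
import random

# Paraphrased illustrations of "in-depth" and "in-breadth" evolution prompts;
# these are not Microsoft's exact templates.
DEPTH_PROMPT = (
    "Rewrite the following instruction so it is more complex: add a "
    "constraint, require multi-step reasoning, or deepen the topic, while "
    "keeping it answerable.\n\nInstruction: {instruction}\n\nRewritten:"
)
BREADTH_PROMPT = (
    "Create a brand-new instruction in the same domain as the one below, "
    "covering a different and rarer topic of similar difficulty.\n\n"
    "Instruction: {instruction}\n\nNew instruction:"
)

def complete(prompt: str) -> str:
    """Placeholder for a call to any instruction-following LLM."""
    raise NotImplementedError

def evolve(seed_instructions: list[str], generations: int = 3) -> list[str]:
    """Grow an instruction pool by iteratively rewriting the newest variants."""
    pool = list(seed_instructions)
    frontier = list(seed_instructions)
    for _ in range(generations):
        next_frontier = []
        for instruction in frontier:
            template = random.choice([DEPTH_PROMPT, BREADTH_PROMPT])
            evolved = complete(template.format(instruction=instruction)).strip()
            if evolved and evolved != instruction:
                pool.append(evolved)
                next_frontier.append(evolved)
        frontier = next_frontier
    return pool
```

A production pipeline would also add an elimination step, using an LLM judge to discard evolutions that failed, degenerated, or merely copied the original instruction.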

RLEIF, on the other hand, is a reinforcement learning framework that combines instruction quality reward models (IRM) with process supervision reward models (PRM) to achieve more precise correctness during online training. This approach allows the language models to learn from their own generated responses and iteratively improve their performance based on the feedback provided by the reward models.
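
The exact reward formulation is not public, but a schematic sketch of how the two signals could be blended might look like the following; `irm_score`, `prm_step_scores`, and the weighting `beta` are hypothetical stand-ins:

```python
from statistics import mean

def irm_score(instruction: str, response: str) -> float:
    """Instruction-quality reward model (IRM): scores the whole response."""
    raise NotImplementedError  # stand-in for a trained reward model

def prm_step_scores(instruction: str, steps: list[str]) -> list[float]:
    """Process-supervision reward model (PRM): scores each reasoning step."""
    raise NotImplementedError  # stand-in for a trained reward model

def rleif_reward(instruction: str, response: str, beta: float = 0.5) -> float:
    """Blend outcome-level and step-level feedback into one scalar reward
    that an online RL algorithm such as PPO could optimize."""
    steps = [line for line in response.split("\n") if line.strip()]
    step_quality = mean(prm_step_scores(instruction, steps)) if steps else 0.0
    return beta * irm_score(instruction, response) + (1.0 - beta) * step_quality
```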

The combination of Evol-Instruct and RLEIF has been instrumental in the development of WizardLM 2, enabling the models to achieve state-of-the-art performance on a wide range of tasks, including complex chat, multilingual understanding, reasoning, and agent capabilities. As these technologies continue to evolve and mature, they are expected to play an increasingly important role in the advancement of large language models and the GenAI community as a whole.

AI Align AI (AAA)

AI Align AI (AAA) is a novel framework introduced by Microsoft that enables multiple state-of-the-art LLMs to teach and improve each other. This innovative approach to model training leverages the collective knowledge and capabilities of diverse language models to enhance their individual performance and align their outputs.

The AAA framework consists of two main components:

Co-Teaching: In this phase, WizardLMs and various licensed open-source and proprietary state-of-the-art models engage in simulated chat, quality judging, improvement suggestions, and closing skill gaps. By interacting with each other and providing feedback, the models learn from their peers and refine their own capabilities.

Self-Teaching: WizardLM can generate new evolution training data for supervised learning and preference data for reinforcement learning via active learning from itself. This self-teaching mechanism allows the model to continuously improve its performance by learning from its own generated data and feedback.
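
As a rough illustration (not Microsoft's actual implementation), one co-teaching round could be sketched like this, assuming two generic chat-model wrappers that expose a `.chat(prompt)` method:

```python
def co_teach_round(student, teacher, instruction: str) -> dict:
    """One simulated co-teaching exchange between two chat models.

    `student` and `teacher` are any objects with a .chat(prompt) -> str method.
    """
    draft = student.chat(instruction)
    critique = teacher.chat(
        f"Instruction: {instruction}\nAnswer: {draft}\n"
        "Judge this answer, point out its weaknesses, and suggest improvements."
    )
    revised = student.chat(
        f"Instruction: {instruction}\nYour previous answer: {draft}\n"
        f"Reviewer feedback: {critique}\nWrite an improved answer."
    )
    # The (instruction, revised) pair can feed supervised learning, while the
    # (revised, draft) contrast can feed preference-based reinforcement learning.
    return {"instruction": instruction, "chosen": revised, "rejected": draft}
```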

The AAA framework has been a key contributor to the exceptional performance of WizardLM 2. By enabling the models to learn from each other and themselves, AAA has helped to bridge the gap between open-source and proprietary language models, resulting in a family of models that consistently outperform their peers across a wide range of tasks and benchmarks.

As the field of artificial intelligence continues to evolve, frameworks like AAA are expected to play an increasingly important role in the development of advanced language models. By fostering collaboration and knowledge sharing among diverse models, AAA has the potential to accelerate the progress of the GenAI community and push the boundaries of what is possible with large language models.

Progressive Learning and Data Pre-Processing

Progressive learning and data pre-processing are two essential components of Microsoft's fully AI-powered synthetic training system for WizardLM 2. These techniques have been instrumental in optimizing the training process and achieving superior performance with less data compared to traditional one-time training approaches.

In the progressive learning paradigm, different data partitions are used to train the models in a stage-by-stage manner. Each stage involves three key steps:

Evol Lab: The data slice is fed into the Evol Lab, where Evol-Instruct and Evol-Answer are applied to generate more diverse and complex [instruction, response] pairs. This process helps to enrich the training data and expose the models to a wider range of scenarios.

AI Align AI (AAA): The generated data is then passed through the AAA framework, where multiple state-of-the-art LLMs engage in co-teaching and self-teaching to improve each other's performance and align their outputs.

Learning: Finally, the models undergo supervised learning, Stage-DPO (a progressive offline reinforcement learning technique), and RLEIF to optimize their performance at each stage.
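
Put together, the stage-by-stage loop described above might be sketched as follows; every helper here is a hypothetical stand-in for the corresponding subsystem:

```python
def evol_lab(data_slice):
    """Evol-Instruct + Evol-Answer: expand a slice into richer pairs."""
    raise NotImplementedError

def ai_align_ai(pairs):
    """Co-teaching / self-teaching pass that refines and aligns the data."""
    raise NotImplementedError

def supervised_finetune(model, pairs):
    raise NotImplementedError

def stage_dpo(model, pairs):  # progressive offline reinforcement learning
    raise NotImplementedError

def rleif(model, pairs):  # online RL with IRM and PRM reward models
    raise NotImplementedError

def train_progressively(model, data_partitions):
    """Train stage by stage, one data partition at a time."""
    for data_slice in data_partitions:
        pairs = evol_lab(data_slice)   # step 1: Evol Lab
        pairs = ai_align_ai(pairs)     # step 2: AI Align AI (AAA)
        model = supervised_finetune(model, pairs)  # step 3: learning
        model = stage_dpo(model, pairs)
        model = rleif(model, pairs)
    return model
```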

Data pre-processing is another crucial aspect of the training system. It involves three main steps:

Data Analysis: This pipeline is used to obtain the distribution of different attributes for new source data, providing a preliminary understanding of the data and guiding the subsequent steps.

Weighted Sampling: Based on experimental experience, the weights of various attributes in the training data are adjusted to better align with the optimal distribution for training, which may differ from the natural distribution of human chat corpora.

Progressive Learning: As described above, the pre-processed data is then used in the progressive learning pipeline to train the models in a stage-by-stage manner.
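
As a toy illustration of the weighted sampling step, the sketch below re-balances examples by an assumed attribute tag; the attribute names and target weights are invented for the example:

```python
import random
from collections import defaultdict

# Invented attribute names and target weights, purely for illustration.
TARGET_WEIGHTS = {"code": 0.30, "math": 0.30, "reasoning": 0.25, "chitchat": 0.15}

def weighted_sample(examples: list[dict], n: int) -> list[dict]:
    """Draw roughly n examples whose attribute mix follows TARGET_WEIGHTS."""
    by_attr = defaultdict(list)
    for example in examples:
        by_attr[example["attribute"]].append(example)
    sampled = []
    for attr, weight in TARGET_WEIGHTS.items():
        bucket = by_attr.get(attr, [])
        k = min(len(bucket), round(n * weight))
        sampled.extend(random.sample(bucket, k))
    random.shuffle(sampled)
    return sampled
```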

The combination of progressive learning and data pre-processing has enabled Microsoft to achieve significant performance improvements in WizardLM 2 while using less data compared to traditional training approaches. By carefully curating and optimizing the training data and leveraging the power of AI to guide the learning process, these techniques have set a new standard for the development of large language models in the GenAI community.

WizardLM 2: Microsoft's Open-Source LLM Play?

Microsoft's commitment to advancing the field of artificial intelligence extends beyond the development of cutting-edge models. By open-sourcing WizardLM 2 and sharing the research behind it, Microsoft aims to empower the AI community to build upon their work and drive further innovation.

The WizardLM 2 model card is available on Hugging Face.

The WizardLM 2 8x22B and 7B model weights are readily available on Hugging Face under the Apache 2.0 license, with the larger WizardLM-2 70B model set to be released in the coming days. To ensure optimal output quality, users should strictly follow the Vicuna-style multi-turn conversation format provided by Microsoft when interacting with the models.
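
The Vicuna-style format is a single flat string with a fixed system prompt and USER/ASSISTANT turns separated by `</s>`. The sketch below follows the template published on the WizardLM-2 model card; verify against the card before relying on it:

```python
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's "
    "questions."
)

def build_prompt(history: list[tuple[str, str]], next_user_msg: str) -> str:
    """history: (user, assistant) pairs from earlier turns in the chat."""
    prompt = SYSTEM + " "
    for user, assistant in history:
        prompt += f"USER: {user} ASSISTANT: {assistant}</s>"
    prompt += f"USER: {next_user_msg} ASSISTANT:"
    return prompt

print(build_prompt([("Hi", "Hello.")], "Who are you?"))
```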

In addition to the model weights, Microsoft has made several live demos of WizardLM 2 available, with more on the way. These demos provide an accessible way for researchers, developers, and enthusiasts to interact with and evaluate the models, fostering collaboration and experimentation within the AI community.

Conclusion

WizardLM 2 is a testament to Microsoft's unwavering commitment to advancing the field of artificial intelligence. By combining cutting-edge research, innovative training methodologies, and a dedication to open-source collaboration, Microsoft has created a family of large language models that are poised to revolutionize the way we approach complex tasks and interactions.

As researchers, developers, and enthusiasts explore the capabilities of WizardLM 2 and build upon its foundations, we can look forward to a future where AI-powered systems seamlessly integrate into our lives, enhancing our abilities and opening up new possibilities for growth and discovery. The journey ahead is filled with excitement and potential, and WizardLM 2 is just the beginning.

💡
Interested in the latest AI News? Want to use the latest AI Models in One Place?

Visit Anakin AI, where you can build AI Apps with ANY AI Model, using a No Code App Builder!