PPLX 70b | Chat Online | Free AI Tools

App Overview

PPLX 70b | Chat Online

Introduction to the PPLX 70B Models

We're thrilled to introduce our groundbreaking PPLX Online LLMs: pplx-7b-online and pplx-70b-online. These online models represent a leap forward in the world of language models, offering you access to the most up-to-date, factual, and helpful information. Both models are accessible through the pplx-api, setting a new standard for API capabilities in the field of AI, and you can also experience them in action via Perplexity Labs, our LLM playground.
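To make the API access concrete, here is a minimal sketch of querying pplx-70b-online through the pplx-api. It assumes an OpenAI-style chat-completions endpoint at api.perplexity.ai and an API key exported as PPLX_API_KEY; consult the official documentation for the exact request format.

```python
# Minimal sketch of querying pplx-70b-online through the pplx-api.
# Assumes an OpenAI-style chat-completions endpoint and an API key in
# the PPLX_API_KEY environment variable; adjust to the official docs.
import os
import requests

API_URL = "https://api.perplexity.ai/chat/completions"  # assumed endpoint

payload = {
    "model": "pplx-70b-online",
    "messages": [
        {"role": "system", "content": "Be precise and concise."},
        {"role": "user", "content": "What happened in the news today?"},
    ],
}
headers = {
    "Authorization": f"Bearer {os.environ['PPLX_API_KEY']}",
    "Content-Type": "application/json",
}

response = requests.post(API_URL, json=payload, headers=headers, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```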

PPLX Online LLMs

PPLX Online LLMs, including pplx-7b-online and pplx-70b-online, are designed to address two common limitations found in most language models today:

  1. Freshness: Many language models struggle to provide the latest information available on the internet.

  2. Hallucinations: Some language models produce inaccurate statements, causing confusion.

Our pplx-7b-online and pplx-70b-online models tackle these limitations head-on by ensuring that their responses are not only accurate and helpful but also up-to-date, thanks to their ability to utilize knowledge from the web.

How PPLX Online LLMs Work

Here's an overview of how our online LLMs operate:

  • Leveraging open-source models: Our PPLX models build upon the mistral-7b and llama2-70b base models.

  • In-house search technology: We have built in-house search, indexing, and crawling infrastructure that lets us augment our LLMs with the most relevant, current, and valuable information from the web. Our search index is large, frequently updated, and prioritizes high-quality websites that are not optimized purely for SEO. Snippets from these websites are provided to our pplx-online models so that responses are grounded in the latest information (a minimal sketch of this snippet-augmentation step follows this list).

  • Fine-tuning: Our PPLX models have undergone meticulous fine-tuning processes to effectively use snippets for their responses. We collaborate with in-house data contractors to curate high-quality, diverse, and large training datasets to enhance performance in various aspects, including helpfulness, factuality, and freshness. Continuous fine-tuning ensures ongoing improvement.
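The exact prompting format used internally is not published, but the snippet-augmentation step described above can be illustrated with a simple sketch: retrieved snippets are formatted into the prompt so the model grounds its answer in current web content. The `fetch_snippets` stub and the prompt template below are hypothetical placeholders, not Perplexity's internal interfaces.

```python
# Illustrative sketch of retrieval augmentation with web snippets.
# `fetch_snippets` stands in for the in-house search/indexing service,
# which is not publicly exposed; the prompt template is hypothetical.
from typing import List


def fetch_snippets(query: str, k: int = 5) -> List[str]:
    """Placeholder for the in-house search index returning top-k snippets."""
    raise NotImplementedError("internal search infrastructure")


def build_grounded_prompt(query: str, snippets: List[str]) -> str:
    """Concatenate web snippets ahead of the user query so the model
    can base its answer on fresh source material."""
    context = "\n\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    return (
        "Use the web snippets below to answer the question. "
        "Prefer the most recent information.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )
```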

Evaluating Perplexity's Online LLMs

Perplexity's mission is to create the world's most reliable answer engine, one that people trust for expanding their knowledge. To assess the performance of our LLMs in delivering helpful, factual, and up-to-date information, we've curated evaluation datasets that present challenging yet realistic use cases for answer engines.

Criteria for Evaluation

Our LLMs were evaluated based on the following criteria:

  • Helpfulness: Determining which response effectively answers the query and follows specified instructions.

  • Factuality: Assessing which response provides more accurate answers without hallucinations, even for questions requiring precise or niche knowledge.

  • Freshness (inspired by FreshLLMs): Evaluating responses for the presence of up-to-date information, indicating a model's ability to provide "fresh" answers.

Responses were also assessed holistically, considering which response a user would prefer to receive from a human assistant helping with the query.

Curating the Evaluation Set

We meticulously curated a diverse set of 150 prompts for evaluation, each carefully selected to effectively assess helpfulness, factuality, and freshness. This dataset covers a wide range of answer engine prompts, ensuring comprehensive evaluations aligned with Perplexity's goals.

Generating Model Responses

Four models were evaluated:

  • pplx-7b-online: Perplexity's model with internet access, fine-tuned using mistral-7b.

  • pplx-70b-online: Perplexity's model with internet access, fine-tuned using llama2-70b.

  • gpt-3.5-turbo-1106: OpenAI's model accessed via the API without additional augmentations.

  • llama2-70b-chat: Meta AI's model accessed via our pplx-api with no additional augmentations.

All models received the same search results and snippets for each prompt, and responses were generated using identical hyperparameters and system prompts.
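A sketch of how such a controlled comparison might be run is shown below. The `generate` function is a hypothetical wrapper around whichever API serves each model (the pplx-api for the Perplexity and Llama models, OpenAI's API for gpt-3.5-turbo-1106), and the hyperparameter values are illustrative, not the ones used in the study.

```python
# Illustrative harness: every model answers every prompt with the same
# system prompt, web snippets, and sampling hyperparameters.
from typing import Dict, List

MODELS = [
    "pplx-7b-online",
    "pplx-70b-online",
    "gpt-3.5-turbo-1106",
    "llama2-70b-chat",
]
SYSTEM_PROMPT = "Answer helpfully, factually, and with up-to-date information."
HYPERPARAMS = {"temperature": 0.7, "max_tokens": 512}  # illustrative values


def generate(model: str, system: str, user: str, **params) -> str:
    """Hypothetical wrapper routing the request to whichever API serves `model`."""
    raise NotImplementedError


def collect_responses(prompts: List[str],
                      snippets: Dict[str, List[str]]) -> Dict[str, Dict[str, str]]:
    """Return {prompt: {model: response}} with identical inputs for every model."""
    results: Dict[str, Dict[str, str]] = {}
    for prompt in prompts:
        context = "\n\n".join(snippets[prompt])  # same snippets for all models
        user_message = f"{context}\n\nQuestion: {prompt}"
        results[prompt] = {
            m: generate(m, SYSTEM_PROMPT, user_message, **HYPERPARAMS)
            for m in MODELS
        }
    return results
```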

Ranking Model Responses with Human Evaluation

In-house contractors evaluated model responses through pairwise comparisons, considering criteria such as helpfulness, factuality, and freshness. Evaluators were instructed to select the response they holistically preferred and the one that performed better on the specific evaluation criterion. The order of responses was randomized, and the source models were concealed from evaluators. Evaluators were also permitted to use internet searches to verify response accuracy.
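The sketch below shows one way such blinded pairwise tasks could be constructed: the presentation order of the two responses is shuffled and model identities are stripped before the pair reaches an evaluator. The task schema is an assumption for illustration, not Perplexity's actual annotation format.

```python
# Illustrative construction of blinded pairwise comparison tasks:
# response order is randomized and model names are hidden from evaluators.
import random
from dataclasses import dataclass, field


@dataclass
class PairwiseTask:
    prompt: str
    response_a: str
    response_b: str
    # Hidden mapping kept server-side so judgments can be attributed later.
    hidden_models: dict = field(default_factory=dict)


def make_task(prompt: str, responses: dict, model_x: str, model_y: str,
              rng: random.Random) -> PairwiseTask:
    """Build one blinded task from two models' responses to the same prompt."""
    pair = [(model_x, responses[model_x]), (model_y, responses[model_y])]
    rng.shuffle(pair)  # randomize presentation order
    (model_a, resp_a), (model_b, resp_b) = pair
    return PairwiseTask(prompt, resp_a, resp_b,
                        hidden_models={"A": model_a, "B": model_b})
```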

Evaluation Results

Pairwise preference rankings collected from human evaluation were used to calculate per-task Elo scores for each model. These Elo scores offer insights into the relative performance of models. In this evaluation, PPLX models consistently matched or outperformed gpt-3.5 and llama2-70b on Perplexity-related use cases, particularly in terms of accuracy and freshness.

Figure 1: Estimated Elo scores and 95% confidence intervals
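As a rough illustration of how pairwise preferences can be turned into Elo scores, the sketch below applies the standard sequential Elo update to a list of (winner, loser) judgments. The K-factor and the use of a simple online update (rather than, say, a Bradley–Terry fit with bootstrapped confidence intervals) are assumptions for illustration, not the estimation procedure behind Figure 1.

```python
# Illustrative Elo computation from pairwise preference judgments.
# Each judgment is a (winner, loser) pair of model names for one task.
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def elo_scores(judgments: Iterable[Tuple[str, str]],
               k: float = 32.0, base: float = 1000.0) -> Dict[str, float]:
    """Sequentially update ratings: the preferred model gains rating
    proportional to how unexpected its win was."""
    ratings: Dict[str, float] = defaultdict(lambda: base)
    for winner, loser in judgments:
        expected_win = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400))
        ratings[winner] += k * (1.0 - expected_win)
        ratings[loser] -= k * (1.0 - expected_win)
    return dict(ratings)


# Example: pplx-70b-online preferred over gpt-3.5-turbo-1106 on two tasks.
print(elo_scores([("pplx-70b-online", "gpt-3.5-turbo-1106"),
                  ("pplx-70b-online", "gpt-3.5-turbo-1106")]))
```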

Accessing Perplexity's Online Models

We are excited to announce the official release of the pplx-api, which now provides public access to our pplx-7b-online and pplx-70b-online models. Additionally, our pplx-7b-chat and pplx-70b-chat models have transitioned from alpha to general release. We have also introduced a new usage-based pricing structure using Anakin AI's API capacity.

Don't wait—experience the power of pplx-70b-online and pplx-7b-online today and unlock a new world of information at your fingertips!
