Llama-3-8B-Web: How to Connect Llama to the Web

Llama-3-8B-Web is a Llama 3 variant that connects the Llama 3 8B model to the Internet!


Llama-3-8B-Web: A Breakthrough in Web Navigation Agents

The McGill-NLP team has recently released Llama-3-8B-Web, a groundbreaking web navigation agent that has set a new standard in the field. Built upon Meta's state-of-the-art Llama 3 language model and fine-tuned on the extensive WebLINX dataset, Llama-3-8B-Web has demonstrated remarkable capabilities in navigating the web, following instructions, and engaging in meaningful dialogue with users.

💡
Want to test out the Llama 3 Models Online right now?

Try out the latest Llama-3-8B and Llama-3-70B Online at Anakin AI!

Anakin AI is the all-in-one platform where you can connect to virtually any model available. Pay one subscription for all AI Models!
Meta Llama-3-8B | Free AI tool | Anakin.ai
Meta Llama 3 is a powerful open-source AI assistant that can help with a wide range of tasks like learning, coding, creative writing, and answering questions.
Meta Llama-3-70B | Free AI tool | Anakin.ai
Experience the cutting-edge Llama-3-70B model released by Meta. Try out this state-of-the-art language model with just a click!

Llama-3-8B-Web Surpasses GPT-4V on the WebLINX Benchmark

One of the most impressive achievements of Llama-3-8B-Web is its performance on the WebLINX benchmark, a comprehensive evaluation of web navigation agents that includes human-centric browsing through dialogue. Llama-3-8B-Web surpasses GPT-4V, a leading competitor, by more than 18 percentage points on this benchmark, achieving an overall score of 28.8% on the out-of-domain test splits, compared to GPT-4V's 10.5%.

The WebLINX benchmark consists of four out-of-domain test splits, covering over 1,000 real-world demos across 150 websites from 15 geographic locations. Llama-3-8B-Web's superior performance on this benchmark demonstrates its ability to generalize to new websites, domains, and scenarios where the user relies on dialogue to navigate the web.

| Model | Overall Score | Link Selection (seg-F1) | Element Clicking (IoU) | Response Alignment (chr-F1) |
|---|---|---|---|---|
| Llama-3-8B-Web | 28.8% | 34.1% | 27.1% | 37.5% |
| GPT-4V | 10.5% | 18.9% | 13.6% | 3.1% |
| GPT-3.5 | - | - | - | - |
| MindAct-3B | - | - | - | - |

The table above provides a detailed comparison of Llama-3-8B-Web against GPT-4V and other models on the WebLINX benchmark. Llama-3-8B-Web outperforms GPT-4V on all three component metrics:

Link Selection (seg-F1): Llama-3-8B-Web achieves a score of 34.1%, compared to GPT-4V's 18.9%, demonstrating its superior ability to choose useful links on web pages.

Element Clicking (IoU): Llama-3-8B-Web scores 27.1% on the Intersection over Union (IoU) metric, which measures the model's accuracy in clicking on relevant elements. In comparison, GPT-4V achieves only 13.6%.
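As a rough intuition for this metric, the predicted click target and the ground-truth element can each be treated as a bounding box, and IoU is the area of their intersection divided by the area of their union. The sketch below only illustrates the idea; the box representation and helper function are assumptions, not the WebLINX evaluation code:

```python
def box_iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # overlap area (0 if disjoint)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Two partially overlapping 2x2 boxes share 1 unit of area out of 7 total
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))  # → 0.14285714285714285
```

A score of 1.0 means the agent clicked exactly the right element; 0.0 means no overlap at all.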

Response Alignment (chr-F1): Llama-3-8B-Web's responses are significantly more aligned with the user's intent, achieving a character-level F1 score of 37.5%, compared to GPT-4V's 3.1%.
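For intuition, chr-F1 compares a response to the reference at the character level rather than the word level, so near-matches still earn partial credit. The snippet below is a simplified character-unigram version of the idea; the official evaluation's exact formulation may differ:

```python
from collections import Counter

def char_f1(pred: str, ref: str) -> float:
    """Simplified character-level F1: overlap of character multisets."""
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(char_f1("open the settings page", "open settings page"))  # near-match scores high
print(char_f1("open the settings page", "buy now"))             # unrelated text scores much lower
```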

These results highlight Llama-3-8B-Web's exceptional performance on web navigation tasks, surpassing not only GPT-4V but also other fine-tuned models such as GPT-3.5 and MindAct-3B. Its ability to generalize to new websites, domains, and scenarios, coupled with its strong link selection, element clicking, and response alignment, makes it a highly capable and versatile web navigation agent.

Fine-Tuning on the WebLINX Dataset

To create Llama-3-8B-Web, the McGill-NLP team fine-tuned Meta's Llama-3-8B-Instruct model on 24,000 web interactions from the WebLINX training set. These interactions include clicking, typing, submitting forms, and replying to user queries. By leveraging this diverse and extensive dataset, Llama-3-8B-Web has acquired the knowledge and skills necessary to navigate complex websites and provide relevant information to users.
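The interaction types listed above (clicking, typing, submitting forms, replying) can be pictured as a small action space the model learns to emit. The following typed sketch is purely illustrative; WebLINX's actual action schema is richer and named differently:

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Click:
    uid: str        # id of the element to click

@dataclass
class TextInput:
    uid: str
    text: str       # text typed into the element

@dataclass
class Submit:
    uid: str        # form element to submit

@dataclass
class Say:
    utterance: str  # natural-language reply to the user

Action = Union[Click, TextInput, Submit, Say]

# A short trajectory: type a query, submit the form, then reply to the user
trajectory: list[Action] = [
    TextInput(uid="search-box", text="weather in Montreal"),
    Submit(uid="search-form"),
    Say(utterance="Here is the current forecast for Montreal."),
]
print(trajectory[0])
```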

Integration with the Hugging Face Ecosystem

Llama-3-8B-Web is seamlessly integrated with the Hugging Face ecosystem, making it easy for developers to load the WebLINX dataset with 🤗 Datasets and the agent from the 🤗 Hub via the Transformers pipeline. With just a few lines of code, developers can predict actions and integrate Llama-3-8B-Web into their own projects.

from datasets import load_dataset
from huggingface_hub import snapshot_download
from transformers import pipeline

# Load the WebLINX validation split and download the prompt templates
valid = load_dataset("McGill-NLP/WebLINX", split="validation")
snapshot_download("McGill-NLP/WebLINX", repo_type="dataset", local_dir="./", allow_patterns="templates/*")
template = open('templates/llama.txt').read()

# Format the first validation turn into the agent's input, then predict an action
state = template.format(**valid[0])
agent = pipeline(model="McGill-NLP/Llama-3-8B-Web", device=0, torch_dtype='auto')
out = agent(state, return_full_text=False)[0]
print("Action:", out['generated_text'])
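The generated text is a textual action such as `click(uid="...")` or a `say(...)` reply. If your application needs structured output, a small parser along these lines can help; the exact action grammar shown here is an assumption, so adapt the pattern to the strings your agent actually produces:

```python
import re

def parse_action(text):
    """Parse 'name(key="value", ...)' into (name, kwargs); None if malformed."""
    m = re.match(r'(\w+)\((.*)\)\s*$', text.strip(), re.DOTALL)
    if not m:
        return None
    name, args = m.group(1), m.group(2)
    # Collect key="value" pairs, allowing escaped characters inside the quotes
    kwargs = dict(re.findall(r'(\w+)="((?:[^"\\]|\\.)*)"', args))
    return name, kwargs

print(parse_action('click(uid="a1b2")'))  # → ('click', {'uid': 'a1b2'})
```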

The WebLlama Project

Alongside the release of Llama-3-8B-Web, the McGill-NLP team has launched the 🖥️WebLlama project (https://webllama.github.io). The goal of this project is to make it easy for developers to train, evaluate, and deploy Llama-3 agents. WebLlama aims to build agents that don't replace users but instead equip them with powerful assistants that enhance their web browsing experience.

The training and evaluation code for Llama-3-8B-Web is available on the WebLlama GitHub repository (https://github.com/McGill-NLP/webllama), along with the exact YAML configurations used in the training pipeline. This transparency and accessibility encourage the community to build upon and improve the current state-of-the-art in web navigation agents.

Future Directions

The McGill-NLP team has ambitious plans for the future of WebLlama and Llama-3-8B-Web. They are actively working on incorporating additional datasets, such as Mind2Web, to further enhance the agents' capabilities. Mind2Web, developed by the OSU-NLP group, is a dataset for autonomous navigation covering 137 websites, which will provide valuable training data for the next generation of WebLlama agents.

In addition to expanding the training data, the WebLlama project aims to provide reliable evaluation for many benchmarks, including Mind2Web, WebArena, VisualWebArena, and WorkArena. These dynamic benchmarks will help assess the agents' performance in various real-world scenarios and ensure their effectiveness and reliability.

Finally, the McGill-NLP team is working on integrating WebLlama agents with existing deployment platforms, such as ServiceNow Research's BrowserGym, LaVague, and Playwright. This integration will make it easier for developers to deploy WebLlama agents and bring their powerful web navigation capabilities to end-users.

Conclusion

Llama-3-8B-Web represents a significant milestone in the development of web navigation agents. By leveraging the power of Meta's Llama 3 language model and fine-tuning it on the extensive WebLINX dataset, the McGill-NLP team has created an agent that surpasses the state-of-the-art in web navigation and dialogue. With the launch of the WebLlama project and the ongoing efforts to expand training data, improve evaluation, and streamline deployment, the future of web navigation agents looks brighter than ever.
