Firecrawl: Crawling Websites into LLM-Ready Markdown

Learn how to use Firecrawl to crawl websites into LLM-ready markdown, with step-by-step setup, sample code, and integration with Langchain and Llama Index.


Firecrawl is an API service developed by Mendable.ai that crawls websites and converts them into clean, LLM-ready markdown. Given a starting URL, it turns the pages of a site into structured markdown that is straightforward to feed into language models and applications.

Key Features of Firecrawl

Comprehensive Crawling: Firecrawl takes a URL as input and crawls all accessible subpages, so you do not have to list each page yourself.

Markdown Conversion: The crawled content is automatically converted into clean and well-structured markdown format, ready to be consumed by language models.

No Sitemap Required: Firecrawl eliminates the need for a sitemap, as it dynamically discovers and crawls all accessible pages within a website.

Easy Integration: Firecrawl provides an HTTP API, along with SDKs for Python and Node.js, so it can be called from most projects with a few lines of code.

Langchain and Llama Index Support: Firecrawl integrates with Langchain and Llama Index through document loaders, so crawled pages can be loaded directly into those libraries.
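To make the "no sitemap" idea concrete, here is a minimal sketch of how a crawler can discover pages by following links breadth-first from a single starting URL. This is not Firecrawl's implementation; `discover_pages`, `get_links`, and the in-memory `SITE` dictionary are hypothetical names used only for illustration, and a real crawler would fetch and parse HTML instead of reading a dictionary.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def discover_pages(start_url, get_links, max_pages=100):
    """Breadth-first discovery of same-host pages reachable from start_url.

    get_links is a hypothetical callable returning the href values found
    on a page; a real crawler would fetch and parse the HTML here.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for href in get_links(url):
            # Resolve relative links and drop any #fragment.
            absolute = urljoin(url, href).split("#", 1)[0]
            # Stay on the starting host and skip pages already queued.
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return order


# Toy in-memory "site" standing in for real HTTP fetches.
SITE = {
    "https://example.com/": ["/docs", "/pricing", "https://other.example/x"],
    "https://example.com/docs": ["/docs/api", "/"],
    "https://example.com/pricing": [],
    "https://example.com/docs/api": ["/docs#top"],
}

pages = discover_pages("https://example.com/", lambda url: SITE.get(url, []))
print(pages)
```

The `seen` set is what keeps the crawl from looping when pages link back to each other, and the host check keeps it from wandering off to external sites.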

Getting Started with Firecrawl

To get started with Firecrawl, follow these simple steps:

Sign up on the Firecrawl website to obtain your API key.

The source code is available on GitHub in the mendableai/firecrawl repository.

Choose your preferred integration method:

  • API: Use the Firecrawl API directly by making HTTP requests to the provided endpoints.
  • Python SDK: Install the Firecrawl Python SDK using pip install firecrawl-py.
  • Node.js SDK: Install the Firecrawl Node.js SDK using npm install @mendable/firecrawl-js.

Start crawling websites and retrieving LLM-ready markdown.
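For the direct API route, a request might be assembled roughly as follows. This is a sketch under stated assumptions: the endpoint URL, the Bearer-token header, and the `{"url": ...}` body reflect how the hosted API is commonly described, but they are not taken from this article, so check them against the current Firecrawl API reference. `build_scrape_request` is a hypothetical helper, and the code only builds the request without sending it.

```python
import json
import urllib.request

# Assumed endpoint; verify against the current Firecrawl API reference.
SCRAPE_ENDPOINT = "https://api.firecrawl.dev/v1/scrape"


def build_scrape_request(api_key, url, endpoint=SCRAPE_ENDPOINT):
    """Build (but do not send) a POST request asking Firecrawl to scrape one page."""
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_scrape_request("YOUR_API_KEY", "https://mendable.ai")
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it; keeping construction separate from sending makes the headers and payload easy to inspect before spending API credits.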

Using the Firecrawl Python SDK

Here's an example of how to use the Firecrawl Python SDK to crawl a website and retrieve the markdown content:

from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="YOUR_API_KEY")

# Crawl a website
crawl_result = app.crawl_url('mendable.ai', {'crawlerOptions': {'excludes': ['blog/*']}})

# Get the markdown for each crawled page
for result in crawl_result:
    print(result['markdown'])

In this example, we create an instance of the FirecrawlApp class by providing our API key. We then call the crawl_url method to crawl the "mendable.ai" website, passing crawler options that exclude every path matching blog/*.

The crawl_result variable contains the crawled data, and we can iterate over each result to access the markdown content of each page.
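Printing is fine for a quick check, but in practice you will usually want to keep the markdown. The sketch below writes each page to its own .md file. It assumes each result is a dictionary with a 'markdown' key, as in the loop above; the 'metadata' and 'sourceURL' keys are an assumption about the result shape, so the code falls back to a counter when they are missing. `slugify` and `save_markdown` are hypothetical helper names.

```python
import re
import tempfile
from pathlib import Path


def slugify(url):
    """Turn a URL into a filesystem-safe file stem."""
    stem = re.sub(r"^https?://", "", url)
    stem = re.sub(r"[^A-Za-z0-9]+", "-", stem).strip("-")
    return stem or "index"


def save_markdown(results, out_dir):
    """Write each result's markdown into out_dir and return the written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for i, result in enumerate(results):
        # 'metadata' / 'sourceURL' are assumed keys; fall back to a counter.
        source = result.get("metadata", {}).get("sourceURL", f"page-{i}")
        path = out / f"{slugify(source)}.md"
        path.write_text(result.get("markdown", ""), encoding="utf-8")
        written.append(path)
    return written


# Stand-in data shaped like the crawl_result iterated over above.
sample = [
    {"markdown": "# Home\n", "metadata": {"sourceURL": "https://mendable.ai/"}},
    {"markdown": "# Pricing\n"},
]
paths = save_markdown(sample, tempfile.mkdtemp())
print([p.name for p in paths])
```

With a real crawl you would call save_markdown(crawl_result, "output") instead of passing the sample list.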

Using the Firecrawl Node.js SDK

Similarly, here's an example of using the Firecrawl Node.js SDK:

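// FirecrawlApp is the package's default export
const FirecrawlApp = require('@mendable/firecrawl-js').default;

// The constructor takes an options object containing the API key
const app = new FirecrawlApp({ apiKey: 'YOUR_API_KEY' });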

// Crawl a website
app.crawlUrl('mendable.ai', { crawlerOptions: { excludes: ['blog/*'] } })
  .then((crawlResult) => {
    // Get the markdown for each crawled page
    crawlResult.forEach((result) => {
      console.log(result.markdown);
    });
  })
  .catch((error) => {
    console.error('Error:', error);
  });

The usage is similar to the Python SDK, where we create an instance of the FirecrawlApp class, provide the API key, and use the crawlUrl method to initiate the crawl. The crawled data is then accessible in the crawlResult variable.

How to Use Firecrawl with Langchain and Llama Index

Firecrawl seamlessly integrates with Langchain and Llama Index, allowing you to easily load crawled documents into these libraries for further processing and analysis.

Langchain Integration with Firecrawl

To use Firecrawl with Langchain, you can utilize the Firecrawl document loader provided by Langchain. Here's an example:

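from langchain_community.document_loaders import FireCrawlLoader

# mode="crawl" follows links from the URL; mode="scrape" loads only that page
loader = FireCrawlLoader(api_key="YOUR_API_KEY", url="https://mendable.ai", mode="crawl")
documents = loader.load()

In this example, we create an instance of the FireCrawlLoader class (note the capital "C"), providing our API key and the URL of the website to crawl. The loader lives in the langchain-community package and relies on firecrawl-py, so both need to be installed. The load method retrieves the crawled pages as Langchain documents, which can then be used for tasks such as question answering, summarization, or text generation.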
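For question answering, whole pages are often too coarse a unit, and a common preparation step is to split each page's markdown into heading-delimited sections first. The following is a minimal standalone sketch of that idea. It works on plain strings (for example, the text of each loaded document) and does not use any Langchain API; `split_by_heading` is a hypothetical helper, and it does not special-case lines starting with "#" inside fenced code blocks.

```python
import re

# ATX headings: one to six '#' characters followed by whitespace and a title.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_heading(markdown):
    """Split markdown into (heading, body) sections at ATX headings.

    Text before the first heading is returned with an empty heading.
    """
    sections = []
    title, lines = None, []

    def flush():
        # Emit the pending section unless it is an empty preamble.
        if title is not None or any(line.strip() for line in lines):
            sections.append((title or "", "\n".join(lines).strip()))

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            title, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return sections


page = "Intro text\n\n# Setup\nInstall it.\n\n## Usage\nRun it.\n"
sections = split_by_heading(page)
for heading, body in sections:
    print(repr(heading), "->", repr(body))
```

Keeping the heading alongside each body gives a retrieval system a short label for every chunk, which tends to help when matching questions to sections.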

Llama Index Integration with Firecrawl

Firecrawl also integrates with Llama Index, allowing you to load crawled documents into an index for efficient retrieval and querying. Here's an example:

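from llama_index.core import VectorStoreIndex
from llama_index.readers.web import FireCrawlWebReader

# mode="crawl" follows links from the URL; mode="scrape" loads only that page
reader = FireCrawlWebReader(api_key="YOUR_API_KEY", mode="crawl")
documents = reader.load_data(url="https://mendable.ai")
index = VectorStoreIndex.from_documents(documents)

In this example, we create an instance of the FireCrawlWebReader class, providing our API key. We then use the load_data method to load the crawled documents from the specified URL. Finally, we pass the loaded documents to VectorStoreIndex.from_documents to build an index for querying and retrieval. The import paths assume a recent llama-index release with the llama-index-readers-web package installed; older releases exposed different module and class names.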

Conclusion

Firecrawl simplifies the process of crawling websites and converting them into LLM-ready markdown. With its API, its Python and Node.js SDKs, and its integrations with Langchain and Llama Index, it lets developers extract website content and use it in natural language processing tasks with little setup.

By leveraging Firecrawl, you can focus on building innovative applications and models without worrying about the complexities of web crawling and data preprocessing. Whether you're working on content analysis, question answering systems, or any other NLP-related projects, Firecrawl provides a reliable and efficient solution for acquiring high-quality markdown data from websites.

So, go ahead and explore the possibilities with Firecrawl! Sign up, obtain your API key, and start transforming websites into valuable LLM-ready markdown today.
