# 🔥 Firecrawl
Empower your AI apps with clean data from any website. Featuring advanced scraping, crawling, and data extraction capabilities.
_This repository is in development, and we're still integrating custom modules into the mono repo. It's not fully ready for self-hosted deployment yet, but you can run it locally._
## What is Firecrawl?
[Firecrawl](https://firecrawl.dev?ref=github) is an API service that takes a URL, crawls it, and converts it into clean markdown or structured data. We crawl all accessible subpages and give you clean data for each. No sitemap required. Check out our [documentation](https://docs.firecrawl.dev).
_Pst. hey, you, join our stargazers :)_
## How to use it?
We provide an easy-to-use API with our hosted version. You can find the playground and documentation [here](https://firecrawl.dev/playground). You can also self-host the backend if you'd like.
Check out the following resources to get started:
- [x] **API**: [Documentation](https://docs.firecrawl.dev/api-reference/introduction)
- [x] **SDKs**: [Python](https://docs.firecrawl.dev/sdks/python), [Node](https://docs.firecrawl.dev/sdks/node), [Go](https://docs.firecrawl.dev/sdks/go), [Rust](https://docs.firecrawl.dev/sdks/rust)
- [x] **LLM Frameworks**: [Langchain (python)](https://python.langchain.com/docs/integrations/document_loaders/firecrawl/), [Langchain (js)](https://js.langchain.com/docs/integrations/document_loaders/web_loaders/firecrawl), [Llama Index](https://docs.llamaindex.ai/en/latest/examples/data_connectors/WebPageDemo/#using-firecrawl-reader), [Crew.ai](https://docs.crewai.com/), [Composio](https://composio.dev/tools/firecrawl/all), [PraisonAI](https://docs.praison.ai/firecrawl/)
- [x] **Low-code Frameworks**: [Dify](https://dify.ai/blog/dify-ai-blog-integrated-with-firecrawl), [Langflow](https://docs.langflow.org/), [Flowise AI](https://docs.flowiseai.com/integrations/langchain/document-loaders/firecrawl), [Cargo](https://docs.getcargo.io/integration/firecrawl), [Pipedream](https://pipedream.com/apps/firecrawl/)
- [x] **Others**: [Zapier](https://zapier.com/apps/firecrawl/integrations), [Pabbly Connect](https://www.pabbly.com/connect/integrations/firecrawl/)
- [ ] Want an SDK or Integration? Let us know by opening an issue.
To run locally, refer to the guide [here](https://github.com/mendableai/firecrawl/blob/main/CONTRIBUTING.md).
### API Key
To use the API, you need to sign up on [Firecrawl](https://firecrawl.dev) and get an API key.
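The key is passed as a Bearer token in the `Authorization` header of every request, as in the examples below. A common pattern is to keep it in an environment variable; the variable name here is just a convention, not something Firecrawl requires:

```bash
# Store the key in an environment variable (the variable name is illustrative)
export FIRECRAWL_API_KEY="fc-YOUR_API_KEY"

# Pass it as a Bearer token on every request
curl -X POST https://api.firecrawl.dev/v1/scrape \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
    -d '{ "url": "https://docs.firecrawl.dev", "formats": ["markdown"] }'
```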
### Features
- [**Scrape**](#scraping): scrapes a URL and gets its content in an LLM-ready format (markdown, structured data via [LLM Extract](#llm-extraction-beta), screenshot, HTML)
- [**Crawl**](#crawling): scrapes all the URLs of a web page and returns their content in an LLM-ready format
- [**Map**](#map-alpha): input a website and get all of its URLs - extremely fast
### Powerful Capabilities
- **LLM-ready formats**: markdown, structured data, screenshot, HTML, links, metadata
- **The hard stuff**: proxies, anti-bot mechanisms, dynamic content (js-rendered), output parsing, orchestration
- **Customizability**: exclude tags, crawl behind auth walls with custom headers, set max crawl depth, and more (see the sketch below)
- **Media parsing**: PDFs, DOCX, images
- **Reliability first**: designed to get the data you need, no matter how hard it is
- **Actions**: click, scroll, input, wait, and more before extracting data
You can find all of Firecrawl's capabilities and how to use them in our [documentation](https://docs.firecrawl.dev).
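For example, several of the customization options above can be set directly on a crawl request. The snippet below is a minimal sketch; the option names (`maxDepth`, `excludePaths`, `excludeTags`, `headers`) follow the public docs but should be treated as assumptions, so check the [documentation](https://docs.firecrawl.dev) for the authoritative schema.

```bash
# Sketch: a crawl request with a few customization options
# (option names follow the public docs but may change; verify before relying on them)
curl -X POST https://api.firecrawl.dev/v1/crawl \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer fc-YOUR_API_KEY' \
    -d '{
      "url": "https://docs.firecrawl.dev",
      "maxDepth": 2,
      "excludePaths": ["blog/.*"],
      "scrapeOptions": {
        "formats": ["markdown"],
        "excludeTags": ["nav", "footer"],
        "headers": { "Cookie": "session=YOUR_SESSION_COOKIE" }
      }
    }'
```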
### Crawling
Used to crawl a URL and all accessible subpages. This submits a crawl job and returns a job ID to check the status of the crawl.
```bash
curl -X POST https://api.firecrawl.dev/v1/crawl \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer fc-YOUR_API_KEY' \
    -d '{
      "url": "https://docs.firecrawl.dev",
      "limit": 100,
      "scrapeOptions": {
        "formats": ["markdown", "html"]
      }
    }'
```
Returns a crawl job ID and the URL to check the status of the crawl.
```json
{
  "success": true,
  "id": "123-456-789",
  "url": "https://api.firecrawl.dev/v1/crawl/123-456-789"
}
```
### Check Crawl Job
Used to check the status of a crawl job and get its result.
```bash
curl -X GET https://api.firecrawl.dev/v1/crawl/123-456-789 \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer YOUR_API_KEY'
```
```json
{
  "status": "completed",
  "total": 36,
  "creditsUsed": 36,
  "expiresAt": "2024-00-00T00:00:00.000Z",
  "data": [
    {
      "markdown": "[Firecrawl Docs home page![light logo](https://mintlify.s3-us-west-1.amazonaws.com/firecrawl/logo/light.svg)!...",
      "html": "...",
      "metadata": {
        "title": "Build a 'Chat with website' using Groq Llama 3 | Firecrawl",
        "language": "en",
        "sourceURL": "https://docs.firecrawl.dev/learn/rag-llama3",
        "description": "Learn how to use Firecrawl, Groq Llama 3, and Langchain to build a 'Chat with your website' bot.",
        "ogLocaleAlternate": [],
        "statusCode": 200
      }
    }
  ]
}
```
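Because crawl jobs run asynchronously, you typically poll the status URL until `status` is `completed`. A minimal sketch of that loop (assumes `jq` is installed; the job ID and 5-second interval are illustrative):

```bash
# Poll the crawl job until it finishes (sketch)
JOB_ID="123-456-789"
while true; do
  STATUS=$(curl -s "https://api.firecrawl.dev/v1/crawl/$JOB_ID" \
    -H 'Authorization: Bearer fc-YOUR_API_KEY' | jq -r '.status')
  echo "Crawl status: $STATUS"
  [ "$STATUS" = "completed" ] && break
  sleep 5
done
```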
### Scraping
Used to scrape a URL and get its content in the specified formats.
```bash
curl -X POST https://api.firecrawl.dev/v1/scrape \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer YOUR_API_KEY' \
    -d '{
      "url": "https://docs.firecrawl.dev",
      "formats": ["markdown", "html"]
    }'
```
Response:
```json
{
  "success": true,
  "data": {
    "markdown": "Launch Week I is here! [See our Day 2 Release 🚀](https://www.firecrawl.dev/blog/launch-week-i-day-2-doubled-rate-limits)[🔥 Get 2 months free...",
    "html": "