firecrawl/apps/rust-sdk/README.md

# Firecrawl Rust SDK
The Firecrawl Rust SDK is a library that allows you to easily scrape and crawl websites, and output the data in a format ready for use with language models (LLMs). It provides a simple and intuitive interface for interacting with the Firecrawl API.

## Installation

To install the Firecrawl Rust SDK, add the following to your `Cargo.toml`:

```toml
[dependencies]
firecrawl = "^0.1"
tokio = { version = "^1", features = ["full"] }
```

To add it in your codebase.

## Usage

First, you need to obtain an API key from [firecrawl.dev](https://firecrawl.dev). Then, you need to initialize the `FirecrawlApp` like so:

```rust
use firecrawl::FirecrawlApp;

#[tokio::main]
async fn main() {
    // Initialize the FirecrawlApp with the API key
    let app = FirecrawlApp::new("fc-YOUR-API-KEY").expect("Failed to initialize FirecrawlApp");

    // ...
}
```

### Scraping a URL

To scrape a single URL, use the `scrape_url` method. It takes the URL as a parameter and returns the scraped data as a `Document`.

```rust
let scrape_result = app.scrape_url("https://firecrawl.dev", None).await;
match scrape_result {
    Ok(data) => println!("Scrape result:\n{}", data.markdown),
    Err(e) => eprintln!("Scrape failed: {}", e),
}
```

### Scraping with Extract

With Extract, you can easily extract structured data from any URL. You need to specify your schema in the JSON Schema format, using the `serde_json::json!` macro.

```rust
let json_schema = json!({
    "type": "object",
    "properties": {
        "top": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "points": {"type": "number"},
                    "by": {"type": "string"},
                    "commentsURL": {"type": "string"}
                },
                "required": ["title", "points", "by", "commentsURL"]
            },
            "minItems": 5,
            "maxItems": 5,
            "description": "Top 5 stories on Hacker News"
        }
    },
    "required": ["top"]
});

let llm_extraction_options = ScrapeOptions {
    formats: vec![ ScrapeFormats::Extract ].into(),
    extract: ExtractOptions {
        schema: json_schema.into(),
        ..Default::default()
    }.into(),
    ..Default::default()
};

let llm_extraction_result = app
    .scrape_url("https://news.ycombinator.com", llm_extraction_options)
    .await;

match llm_extraction_result {
    Ok(data) => println!("LLM Extraction Result:\n{:#?}", data.extract.unwrap()),
    Err(e) => eprintln!("LLM Extraction failed: {}", e),
}
```

### Crawling a Website

To crawl a website, use the `crawl_url` method. This will wait for the crawl to complete, which may take a long time based on your starting URL and your options.

```rust
let crawl_options = CrawlOptions {
    exclude_paths: vec![ "blog/*".into() ].into(),
    ..Default::default()
};

let crawl_result = app
    .crawl_url("https://mendable.ai", crawl_options)
    .await;

match crawl_result {
    Ok(data) => println!("Crawl Result (used {} credits):\n{:#?}", data.credits_used, data.data),
    Err(e) => eprintln!("Crawl failed: {}", e),
}
```

#### Crawling asynchronously

To crawl without waiting for the result, use the `crawl_url_async` method. It takes the same parameters, but it returns a `CrawlAsyncRespone` struct, containing the crawl's ID. You can use that ID with the `check_crawl_status` method to check the status at any time. Do note that completed crawls are deleted after 24 hours.

```rust
let crawl_id = app.crawl_url_async("https://mendable.ai", None).await?.id;

// ... later ...

let status = app.check_crawl_status(crawl_id).await?;

if status.status == CrawlStatusTypes::Completed {
    println!("Crawl is done: {:#?}", status.data);
} else {
    // ... wait some more ...
}
```

### Map a URL (Alpha)

Map all associated links from a starting URL.

```rust
let map_result = app
    .map_url("https://firecrawl.dev", None)
    .await;

match map_result {
    Ok(data) => println!("Mapped URLs: {:#?}", data),
    Err(e) => eprintln!("Map failed: {}", e),
}
```

## Error Handling

The SDK handles errors returned by the Firecrawl API and by our dependencies, and combines them into the `FirecrawlError` enum, implementing `Error`, `Debug` and `Display`. All of our methods return a `Result<T, FirecrawlError>`.

## Running the Tests with Cargo

To ensure the functionality of the Firecrawl Rust SDK, we have included end-to-end tests using `cargo`. These tests cover various aspects of the SDK, including URL scraping, web searching, and website crawling.

### Running the Tests

To run the tests, execute the following commands:
```bash
$ export $(xargs < ./tests/.env)
$ cargo test --test e2e_with_auth
```

## Contributing

Contributions to the Firecrawl Rust SDK are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request on the GitHub repository.

## License

The Firecrawl Rust SDK is open-source and released under the [AGPL License](https://www.gnu.org/licenses/agpl-3.0.en.html).
feat: rust sdk initial commit 2024-07-07 20:40:52 +08:00			`# Firecrawl Rust SDK`
chore: update deps by allowing bumps + add uuid 2024-07-11 22:47:44 +08:00			`The Firecrawl Rust SDK is a library that allows you to easily scrape and crawl websites, and output the data in a format ready for use with language models (LLMs). It provides a simple and intuitive interface for interacting with the Firecrawl API.`

chore(doc): adding sdk rust lib 2024-07-12 19:59:41 +08:00			`## Installation`

			To install the Firecrawl Rust SDK, add the following to your `Cargo.toml`:

			```toml
			`[dependencies]`
updating cargo pckg name n version 2024-08-15 21:11:27 +08:00			`firecrawl = "^0.1"`
chore: update doc and libs 2024-07-12 21:22:55 +08:00			`tokio = { version = "^1", features = ["full"] }`
chore(doc): adding sdk rust lib 2024-07-12 19:59:41 +08:00			```

chore: update doc and libs 2024-07-12 21:22:55 +08:00			`To add it in your codebase.`

chore(doc): adding sdk rust lib 2024-07-12 19:59:41 +08:00			`## Usage`

Update docs and example 2024-09-21 02:08:33 +08:00			First, you need to obtain an API key from [firecrawl.dev](https://firecrawl.dev). Then, you need to initialize the `FirecrawlApp` like so:
chore(doc): adding sdk rust lib 2024-07-12 19:59:41 +08:00
			```rust
updating cargo pckg name n version 2024-08-15 21:11:27 +08:00			`use firecrawl::FirecrawlApp;`
chore(doc): adding sdk rust lib 2024-07-12 19:59:41 +08:00
			`#[tokio::main]`
			`async fn main() {`
			`// Initialize the FirecrawlApp with the API key`
Update docs and example 2024-09-21 02:08:33 +08:00			`let app = FirecrawlApp::new("fc-YOUR-API-KEY").expect("Failed to initialize FirecrawlApp");`
chore(doc): adding sdk rust lib 2024-07-12 19:59:41 +08:00
Update docs and example 2024-09-21 02:08:33 +08:00			`// ...`
chore(doc): adding sdk rust lib 2024-07-12 19:59:41 +08:00			`}`
			```

			`### Scraping a URL`

Update docs and example 2024-09-21 02:08:33 +08:00			To scrape a single URL, use the `scrape_url` method. It takes the URL as a parameter and returns the scraped data as a `Document`.
chore(doc): adding sdk rust lib 2024-07-12 19:59:41 +08:00
			```rust
Update docs and example 2024-09-21 02:08:33 +08:00			`let scrape_result = app.scrape_url("https://firecrawl.dev", None).await;`
chore(doc): adding sdk rust lib 2024-07-12 19:59:41 +08:00			`match scrape_result {`
Update docs and example 2024-09-21 02:08:33 +08:00			`Ok(data) => println!("Scrape result:\n{}", data.markdown),`
chore(doc): adding sdk rust lib 2024-07-12 19:59:41 +08:00			`Err(e) => eprintln!("Scrape failed: {}", e),`
			`}`
			```

Update docs and example 2024-09-21 02:08:33 +08:00			`### Scraping with Extract`
chore(doc): adding sdk rust lib 2024-07-12 19:59:41 +08:00
Update docs and example 2024-09-21 02:08:33 +08:00			With Extract, you can easily extract structured data from any URL. You need to specify your schema in the JSON Schema format, using the `serde_json::json!` macro.
chore(doc): adding sdk rust lib 2024-07-12 19:59:41 +08:00
			```rust
			`let json_schema = json!({`
			`"type": "object",`
			`"properties": {`
			`"top": {`
			`"type": "array",`
			`"items": {`
			`"type": "object",`
			`"properties": {`
			`"title": {"type": "string"},`
			`"points": {"type": "number"},`
			`"by": {"type": "string"},`
			`"commentsURL": {"type": "string"}`
			`},`
			`"required": ["title", "points", "by", "commentsURL"]`
			`},`
			`"minItems": 5,`
			`"maxItems": 5,`
			`"description": "Top 5 stories on Hacker News"`
			`}`
			`},`
			`"required": ["top"]`
			`});`

Update docs and example 2024-09-21 02:08:33 +08:00			`let llm_extraction_options = ScrapeOptions {`
			`formats: vec![ ScrapeFormats::Extract ].into(),`
			`extract: ExtractOptions {`
			`schema: json_schema.into(),`
			`..Default::default()`
			`}.into(),`
			`..Default::default()`
			`};`
chore(doc): adding sdk rust lib 2024-07-12 19:59:41 +08:00
			`let llm_extraction_result = app`
Update docs and example 2024-09-21 02:08:33 +08:00			`.scrape_url("https://news.ycombinator.com", llm_extraction_options)`
chore(doc): adding sdk rust lib 2024-07-12 19:59:41 +08:00			`.await;`
Update docs and example 2024-09-21 02:08:33 +08:00
chore(doc): adding sdk rust lib 2024-07-12 19:59:41 +08:00			`match llm_extraction_result {`
Update docs and example 2024-09-21 02:08:33 +08:00			`Ok(data) => println!("LLM Extraction Result:\n{:#?}", data.extract.unwrap()),`
chore(doc): adding sdk rust lib 2024-07-12 19:59:41 +08:00			`Err(e) => eprintln!("LLM Extraction failed: {}", e),`
			`}`
			```

			`### Crawling a Website`

Update docs and example 2024-09-21 02:08:33 +08:00			To crawl a website, use the `crawl_url` method. This will wait for the crawl to complete, which may take a long time based on your starting URL and your options.
chore: update deps by allowing bumps + add uuid 2024-07-11 22:47:44 +08:00
			```rust
Update docs and example 2024-09-21 02:08:33 +08:00			`let crawl_options = CrawlOptions {`
			`exclude_paths: vec![ "blog/*".into() ].into(),`
			`..Default::default()`
			`};`
chore: update doc and libs 2024-07-12 21:22:55 +08:00
chore(doc): adding sdk rust lib 2024-07-12 19:59:41 +08:00			`let crawl_result = app`
Update docs and example 2024-09-21 02:08:33 +08:00			`.crawl_url("https://mendable.ai", crawl_options)`
chore(doc): adding sdk rust lib 2024-07-12 19:59:41 +08:00			`.await;`
Update docs and example 2024-09-21 02:08:33 +08:00
chore(doc): adding sdk rust lib 2024-07-12 19:59:41 +08:00			`match crawl_result {`
Update docs and example 2024-09-21 02:08:33 +08:00			`Ok(data) => println!("Crawl Result (used {} credits):\n{:#?}", data.credits_used, data.data),`
chore(doc): adding sdk rust lib 2024-07-12 19:59:41 +08:00			`Err(e) => eprintln!("Crawl failed: {}", e),`
			`}`
chore: update deps by allowing bumps + add uuid 2024-07-11 22:47:44 +08:00			```
chore(doc): adding sdk rust lib 2024-07-12 19:59:41 +08:00
Update docs and example 2024-09-21 02:08:33 +08:00			`#### Crawling asynchronously`

			To crawl without waiting for the result, use the `crawl_url_async` method. It takes the same parameters, but it returns a `CrawlAsyncRespone` struct, containing the crawl's ID. You can use that ID with the `check_crawl_status` method to check the status at any time. Do note that completed crawls are deleted after 24 hours.

			```rust
			`let crawl_id = app.crawl_url_async("https://mendable.ai", None).await?.id;`

			`// ... later ...`
chore(doc): adding sdk rust lib 2024-07-12 19:59:41 +08:00
Update docs and example 2024-09-21 02:08:33 +08:00			`let status = app.check_crawl_status(crawl_id).await?;`
chore(doc): adding sdk rust lib 2024-07-12 19:59:41 +08:00
Update docs and example 2024-09-21 02:08:33 +08:00			`if status.status == CrawlStatusTypes::Completed {`
			`println!("Crawl is done: {:#?}", status.data);`
			`} else {`
			`// ... wait some more ...`
			`}`
			```

			`### Map a URL (Alpha)`

			`Map all associated links from a starting URL.`
chore(doc): adding sdk rust lib 2024-07-12 19:59:41 +08:00
			```rust
Update docs and example 2024-09-21 02:08:33 +08:00			`let map_result = app`
			`.map_url("https://firecrawl.dev", None)`
			`.await;`

			`match map_result {`
			`Ok(data) => println!("Mapped URLs: {:#?}", data),`
			`Err(e) => eprintln!("Map failed: {}", e),`
chore(doc): adding sdk rust lib 2024-07-12 19:59:41 +08:00			`}`
			```

			`## Error Handling`

Update docs 2024-09-21 02:30:46 +08:00			The SDK handles errors returned by the Firecrawl API and by our dependencies, and combines them into the `FirecrawlError` enum, implementing `Error`, `Debug` and `Display`. All of our methods return a `Result<T, FirecrawlError>`.
chore(doc): adding sdk rust lib 2024-07-12 19:59:41 +08:00
			`## Running the Tests with Cargo`

			To ensure the functionality of the Firecrawl Rust SDK, we have included end-to-end tests using `cargo`. These tests cover various aspects of the SDK, including URL scraping, web searching, and website crawling.

			`### Running the Tests`

			`To run the tests, execute the following commands:`
			```bash
chore: update doc and libs 2024-07-12 21:22:55 +08:00			`$ export $(xargs < ./tests/.env)`
			`$ cargo test --test e2e_with_auth`
chore(doc): adding sdk rust lib 2024-07-12 19:59:41 +08:00			```

			`## Contributing`

			`Contributions to the Firecrawl Rust SDK are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request on the GitHub repository.`

			`## License`

			`The Firecrawl Rust SDK is open-source and released under the [AGPL License](https://www.gnu.org/licenses/agpl-3.0.en.html).`