Commit Graph

728 Commits

Author SHA1 Message Date
Nicolas
deae7dcd61 Update email_notification.ts 2024-06-06 10:41:54 -07:00
Nicolas
f725fa5a97 Update email_notification.ts 2024-06-06 10:41:23 -07:00
rafaelsideguide
fb758fa05e go 2024-06-06 14:01:16 -03:00
Nicolas
0310da6729 Update rate-limiter.ts 2024-06-06 09:31:44 -07:00
Nicolas
01503c1fbf Nick: 2024-06-06 09:29:25 -07:00
rafaelsideguide
b3cae4c858 adding js and testing twine 2024-06-06 13:27:31 -03:00
rafaelsideguide
bc1c1e5053 updating version to check if it runs 2024-06-06 11:41:01 -03:00
Rafael Miller
7686ad5702
Merge pull request #196 from mattjoyce/main
Python-SDK transitional build setup for pyproject.toml
2024-06-06 10:26:16 -03:00
Nicolas
525b4f2a83 Update rate-limiter.ts 2024-06-05 14:38:10 -07:00
Nicolas
d7f8208cdb Update email_notification.ts 2024-06-05 13:53:31 -07:00
Nicolas
ec10eb09f3 Update credit_billing.ts 2024-06-05 13:22:03 -07:00
Nicolas
5991000d2b Update credit_billing.ts 2024-06-05 13:21:15 -07:00
Nicolas
5683bb2cc8 Nick: 2024-06-05 13:20:26 -07:00
rafaelsideguide
164676c70a bugfix screenshot for readme pages 2024-06-05 15:34:42 -03:00
rafaelsideguide
935406b96a Merge branch 'main' into pr/196 2024-06-05 15:19:25 -03:00
Nicolas
b4c6819a54 Nick: 2024-06-05 11:11:09 -07:00
rafaelsideguide
0d51b11dcd missing breaks 2024-06-05 15:02:28 -03:00
Rafael Miller
64423441b2
Merge branch 'main' into main 2024-06-05 14:44:29 -03:00
Nicolas
beb7526d1d Update webhook.ts 2024-06-05 10:38:05 -07:00
Nicolas
1a16378fe8
Merge pull request #234 from JakobStadlhuber/feat/webhook-self-hosted
Add support for Self-Hosted Webhook URL Usage and added project_id into the webhook payload
2024-06-05 10:25:05 -07:00
Nicolas
7cb14edec8 Nick: 2024-06-05 10:13:52 -07:00
Rafael Miller
9e000ded03
Merge branch 'main' into feat/better-gdrive-pdf-fetch 2024-06-05 14:07:56 -03:00
rafaelsideguide
ccc55127d6 Added scroll xpaths on fire-engine for handling readme docs 2024-06-05 11:48:41 -03:00
rafaelsideguide
b5045d1661 [feat] improved the scrape for gdrive pdfs 2024-06-04 17:47:28 -03:00
Nicolas
96257b7b17 Update handleCustomScraping.ts 2024-06-04 12:22:46 -07:00
Nicolas
674500affa Nick: 2024-06-04 12:15:39 -07:00
rafaelsideguide
5ae4d1caf5 Update single_url.ts 2024-06-04 15:28:09 -03:00
Jakob Stadlhuber
9e5ddec207 Remove default webhook URL from .env.example
The default value for SELF_HOSTED_WEBHOOK_URL in the .env.example file was removed to prevent unintentional exposure or usage. Users are now required to explicitly specify
2024-06-04 19:56:35 +02:00
Jakob Stadlhuber
6208f4207d Add support for Self-Hosted Webhook URL Usage and added project_id into the webhook payload
This commit introduces the capability of using a Self-Hosted Webhook URL. The application now checks for a self-hosted URL before querying the database for the webhook settings. If a Self-Hosted Webhook URL is set in the environment variables, it is used directly, avoiding unnecessary database queries.
2024-06-04 19:55:07 +02:00
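The lookup order described in commit 6208f4207d (environment variable first, database second) can be sketched as below. This is a minimal illustration, not the repository's code: `resolveWebhookUrl`, `WebhookLookup`, and the `teamId` key are hypothetical names; only `SELF_HOSTED_WEBHOOK_URL` comes from the log.

```typescript
// Hypothetical signature for the database lookup; the real query is not shown in this log.
type WebhookLookup = (teamId: string) => Promise<string | null>;

// Sketch: return SELF_HOSTED_WEBHOOK_URL when it is set, otherwise fall back to the database.
async function resolveWebhookUrl(
  teamId: string,
  lookupInDb: WebhookLookup,
  env: Record<string, string | undefined>,
): Promise<string | null> {
  const selfHosted = env["SELF_HOSTED_WEBHOOK_URL"];
  if (selfHosted) {
    // The environment override wins, so no database query is made at all.
    return selfHosted;
  }
  // No override configured: ask the database for the team's webhook settings.
  return lookupInDb(teamId);
}
```

In a Node service the `env` argument would typically be `process.env`; passing it explicitly keeps the sketch testable without touching global state.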
rafaelsideguide
93f3098672 build files 2024-06-04 14:54:54 -03:00
rafaelsideguide
64a4338ff0 Update single_url.ts 2024-06-04 14:40:05 -03:00
Rafael Miller
02fe470e20
Merge pull request #148 from mendableai/nsc/improvemnts-fixes-misc
Better fallbacks for initial crawl start
2024-06-04 14:31:10 -03:00
Rafael Miller
665a40d9f4
Merge pull request #212 from mendableai/bugfix/partial-data-js-sdk
[Bug] Improved js response and test for getting partial_data
2024-06-04 14:05:23 -03:00
rafaelsideguide
1f4c6b7a87 Update package.json 2024-06-04 13:59:48 -03:00
Rafael Miller
19c67916d4
Merge pull request #211 from mendableai/fix/rename-variables
[Fix] Changed timeout parameter name on js sdk
2024-06-04 13:57:58 -03:00
Rafael Miller
f4f87b5374
Merge branch 'main' into bugfix/partial-data-js-sdk 2024-06-04 13:40:42 -03:00
rafaelsideguide
4e3a0495d7 updated version 0.0.12 -> 0.0.13
- [ ] publish
2024-06-04 12:03:55 -03:00
Rafael Miller
b80fb374e5
Merge branch 'main' into playwright-service-bug-222 2024-06-04 11:57:17 -03:00
rafaelsideguide
6920ec8a61 bugfixing. already on main 2024-06-04 11:05:50 -03:00
Nicolas
d91b725c6f Update fly.toml 2024-06-04 00:41:15 -07:00
Nicolas
cbf8d79cce Update pdfProcessor.ts 2024-06-04 00:13:37 -07:00
Nicolas
3fc9004ba8 Update fly.toml 2024-06-03 23:49:46 -07:00
Nicolas
2ea01f1456 Update single_url.ts 2024-06-03 23:42:39 -07:00
Nicolas
854d5b3cb3 Update single_url.ts 2024-06-03 23:32:55 -07:00
Nicolas
99059814a8 Nick: 2024-06-03 21:32:48 -07:00
Nicolas
918059ee9e Merge branch 'main' into nsc/improvemnts-fixes-misc 2024-06-03 16:46:02 -07:00
Nicolas
38e583f66c Update socialBlockList.test.ts 2024-06-03 16:44:23 -07:00
Nicolas
c69c89f838 Nick: 2024-06-03 16:42:42 -07:00
Nicolas
48d1ec05b2 Merge branch 'main' into nsc/improved-blocklist 2024-06-03 16:38:03 -07:00
Nicolas
d30ced4394
Merge pull request #221 from mendableai/nsc/fwd-header-auth
feat: Ability to forward headers to reliable providers for auth etc...
2024-06-03 16:33:40 -07:00
Romain Bruyère
4987f901d1
Merge branch 'mendableai:main' into main 2024-06-03 21:29:33 +02:00
rafaelsideguide
4100cc9223 Update index.test.ts 2024-06-03 16:29:16 -03:00
rombru
3ff91ddd1f fix: use @ instead of # for default BULL_AUTH_KEY. The hash mark is reserved for URI fragments. 2024-06-03 21:28:25 +02:00
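Why `#` breaks a key embedded in a URL path (commit 3ff91ddd1f): the WHATWG URL parser ends the path at the first `#` and treats the rest as the fragment, which browsers never send to the server. The sketch below assumes, hypothetically, that the key appears as a path segment such as `/admin/<key>/queues` on `localhost:3002`; `keyFromAdminUrl` is an illustrative helper, not code from the repository.

```typescript
// Hypothetical route layout: the key is the third path segment, e.g. /admin/<key>/queues.
function keyFromAdminUrl(url: string): string {
  const parsed = new URL(url);
  // The path stops at the first '#'; everything after it lands in parsed.hash instead.
  const segments = parsed.pathname.split("/");
  return segments[2] ?? "";
}
```

With a key of `abc#def`, the parsed path segment is only `abc` and `#def/queues` becomes the fragment; with `abc@def`, the whole key survives because `@` after the host is an ordinary path character.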
rafaelsideguide
c1aed1360e Update index.test.ts 2024-06-03 15:51:07 -03:00
rafaelsideguide
1fc3a15149 Update single_url.ts 2024-06-03 15:24:40 -03:00
Nicolas
fde522c3e1 Update single_url.ts 2024-06-02 20:23:45 -07:00
Matt Joyce
deefe65cbe Change the way the playwright response is parsed
It was failing with a TypeError, but the response actually looked OK.
This fixes the type error and stops the scraper fallback.
2024-06-01 19:16:56 +10:00
Matt Joyce
14896a9fdd Fix PLAYWRIGHT_MICROSERVICE_URL
It needs to end in html; otherwise, the scrape will 404.
2024-06-01 19:03:16 +10:00
Matt Joyce
1eacad4ef3 Clarifying wait type and name 2024-06-01 18:53:03 +10:00
Matt Joyce
c516140bfb Various Linting
Pylint
C0114: Missing module docstring
C0115: Missing class docstring
C0116: Missing function or method docstring
C0303: Trailing whitespace
Import ordering
2024-06-01 18:53:03 +10:00
Matt Joyce
2a39b5382b Add timeout to class and provide default. 2024-06-01 18:52:42 +10:00
Nicolas
8cb62dde92 Update website_params.ts 2024-05-31 16:09:39 -07:00
Nicolas
3b8059edb6 Update single_url.ts 2024-05-31 15:43:06 -07:00
Nicolas
6bea803120 Nick: 2024-05-31 15:39:54 -07:00
Nicolas
2139129296 Nick: v12 2024-05-31 11:39:55 -07:00
Nicolas
260e31c68b Merge branch 'nsc/new-pricing' 2024-05-30 16:08:31 -07:00
Nicolas
aa8133ca7f Update load-testing-example.ts 2024-05-30 16:07:14 -07:00
Nicolas
0c115c6181
Merge pull request #216 from mendableai/nsc/new-pricing
feat: New pricing/limits changes
2024-05-30 15:36:59 -07:00
Nicolas
6860ace4af Nick: 2024-05-30 15:07:49 -07:00
Nicolas
6ceb7ff50a Nick: 2024-05-30 14:46:55 -07:00
Nicolas
33f10a7f91 Nick: fixes 2024-05-30 14:42:32 -07:00
Nicolas
ace46f340b Nick: new limits, new pricing 2024-05-30 14:31:36 -07:00
Matt Joyce
5c4b3e8f8a Initial pyproject.toml
This enables building with 'python -m build' without affecting the utility of setup.py, and also provides a base for other build tools and automation.
2024-05-30 21:48:40 +10:00
Matt Joyce
dec225d368 Move version to __init__.py
Setup.py does not need to be edited when building the package.
2024-05-30 21:48:40 +10:00
rafaelsideguide
2b763d848b improved js response and test for getting partial_data 2024-05-30 08:44:38 -03:00
rafaelsideguide
5b8b6902e7 Update index.ts 2024-05-30 08:25:13 -03:00
Nicolas
6c939d534d Nick: small refactor 2024-05-29 19:43:51 -07:00
Eric Ciarla
37915e11e8 Final push 2024-05-29 21:18:24 -04:00
Eric Ciarla
a0e404f94e init commit 2024-05-29 18:56:57 -04:00
rafaelsideguide
ee9a2184e2 Added custom scraping conditions for readme docs 2024-05-29 13:39:43 -03:00
Nicolas
c20c38721d Update index.test.ts 2024-05-28 17:17:20 -07:00
Nicolas
0f43a12906 Update index.test.ts 2024-05-28 17:17:12 -07:00
Nicolas
f53d25efac Merge branch 'main' into nsc/wait-for-param 2024-05-28 12:56:28 -07:00
Nicolas
1b3547dcf2 Nick: 2024-05-28 12:56:24 -07:00
rafaelsideguide
71187b03a2 added timeout 2024-05-27 16:48:08 -03:00
rafaelsideguide
d5c83803cd fixing idempotency test 2024-05-27 16:35:01 -03:00
rafaelsideguide
41c4ef6a82 dotenv was missing 2024-05-27 16:23:57 -03:00
rafaelsideguide
127d2db1dd added js/ts sdk tests 2024-05-27 15:54:09 -03:00
rafaelsideguide
a9b68d95d8 Update test.py 2024-05-27 14:28:44 -03:00
rafaelsideguide
667d3e4c4f Merge branch 'test-sdks' of https://github.com/mendableai/firecrawl into test-sdks 2024-05-27 14:23:39 -03:00
rafaelsideguide
19decd1062 fixing workflow 2024-05-27 14:21:33 -03:00
Rafael Miller
3c8edf683c
Merge branch 'main' into test-sdks 2024-05-27 14:15:18 -03:00
rafaelsideguide
63772ea711 added github action workflow 2024-05-27 14:14:00 -03:00
Nicolas
1ef307cb6f Nick: checks 2024-05-27 10:01:12 -07:00
Nicolas
01cc91c53d Update fly.staging.toml 2024-05-27 10:00:52 -07:00
Nicolas
1de53cc4d0 Nick: fixes 2024-05-26 18:15:05 -07:00
Nicolas
efb821d63b
Merge branch 'main' into main 2024-05-26 18:12:23 -07:00
Nicolas
ed4226fd1f
Update setup.py 2024-05-26 18:11:54 -07:00
Nicolas
1bbfb98d7e
Merge pull request #186 from Keredu/main
Limit on /search is not deterministic
2024-05-26 18:08:16 -07:00
Nicolas
67a53a9ae0
Merge pull request #190 from simonha9/simonha9/improve-rate-limit-error-msg
Feat: Provide more details for 429 error msg
2024-05-26 18:07:42 -07:00
Nicolas
7e2df7bd5e Update auth.ts 2024-05-26 18:07:21 -07:00
Nicolas
7948c6cee2 Nick: fixed pip issues 2024-05-26 18:03:37 -07:00
Matt Joyce
b061e12030 added python versions requirement
This is in line with the requests module, a critical dependency.
2024-05-26 11:37:47 +10:00
Matt Joyce
f00dffbbb1 added misc PyPi keys
Helps potential users find and understand the purpose and status of the project.
2024-05-26 11:36:29 +10:00
Matt Joyce
cd7f260288 Added PyPi classifiers
These classifiers will help potential users find and understand the purpose and status of the project. Use Python 3.8 as the base, because that's what the 'requests' module needs.
2024-05-26 11:33:28 +10:00
Matt Joyce
e5c6ac23fe Added long description to PyPi
https://packaging.python.org/en/latest/guides/making-a-pypi-friendly-readme/
2024-05-26 10:01:35 +10:00
Simon H
115204e6b6 Feat: Provide more details for 429 error msg
- Added a better error message for when the rate limit is exceeded, including
consumed/remaining points, reset date, and retry-after seconds
2024-05-25 12:03:20 -04:00
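The extra 429 details listed in commit 115204e6b6 (consumed/remaining points, reset date, retry-after seconds) can be derived from the limiter state as sketched below. The `RateLimitInfo` shape is an assumption: its field names mirror the `RateLimiterRes` object of the rate-limiter-flexible package, which the API may or may not use; `buildRateLimitError` and the message wording are hypothetical.

```typescript
// Assumed shape of the limiter state when a request is rejected.
interface RateLimitInfo {
  consumedPoints: number;
  remainingPoints: number;
  msBeforeNext: number; // milliseconds until the current window resets
}

// Sketch: build a 429 response with a Retry-After header (whole seconds) and a detailed message.
function buildRateLimitError(info: RateLimitInfo, now: Date = new Date()) {
  // Round up so clients never retry before the window has actually reset.
  const retryAfterSeconds = Math.ceil(info.msBeforeNext / 1000);
  const resetDate = new Date(now.getTime() + info.msBeforeNext);
  return {
    status: 429,
    headers: { "Retry-After": String(retryAfterSeconds) },
    body: {
      error:
        `Rate limit exceeded. Consumed points: ${info.consumedPoints}, ` +
        `remaining points: ${info.remainingPoints}. ` +
        `Retry after ${retryAfterSeconds}s (resets at ${resetDate.toISOString()}).`,
    },
  };
}
```

Rounding `Retry-After` up rather than down is a design choice: the header only carries whole seconds, and rounding down would invite a retry that is rejected again.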
Keredu
2192978f91 Limit on /search is not deterministic 2024-05-25 00:12:26 +02:00
Nicolas
e98434606d Update blocklist.ts 2024-05-24 15:04:15 -07:00
Nicolas
e5c8719554 Update blocklist.ts 2024-05-24 14:53:04 -07:00
rafaelsideguide
397769c7e3 added python sdk e2e tests with pytest
Some of them are still missing, though.
2024-05-24 17:56:27 -03:00
rafaelsideguide
d39860c08b Merge branch 'main' into feat/idempotency-key 2024-05-24 14:15:37 -03:00
Nicolas
8c380d70a5
Update firecrawl.py 2024-05-24 09:48:48 -07:00
Nicolas
65fe9c4f80
Merge branch 'main' into main 2024-05-24 09:47:12 -07:00
Rafael Miller
53a7ec0f6e
Removed hard coded timeout 2024-05-24 13:46:16 -03:00
Nicolas
e0d979edad
Merge pull request #176 from mendableai/bug/data-check-in-python-sdk
[Bug] Added data check for python SDK
2024-05-24 09:45:39 -07:00
Nicolas
53a214cefb
Merge pull request #168 from mendableai/nsc/allowed-keywords-in-blocklist
feat: Allow privacy/legal/ other pages in social media websites
2024-05-24 09:43:15 -07:00
Nicolas
e166c07690
Merge pull request #170 from qyou/fix-hardcode-timeout
update: wait until body attached in playwright-service
2024-05-24 09:41:27 -07:00
Jakob Stadlhuber
9fc5a0ff98 Update comment in .env.example for proxy settings
This commit modifies the comment in .env.example to specify that the proxy settings are for Playwright. This clarification aims to give users clearer context about when and why these proxy settings are used.
2024-05-24 17:45:59 +02:00
Jakob Stadlhuber
b001aded46 Add proxy and media blocking configurations
Updated environment variables and application settings to include proxy configuration and a media-blocking option. The proxy settings allow users to use a proxy service, while media blocking is an optional feature that can help save bandwidth. Changes were made in the .env.example, docker-compose.yaml, and main.py files.
2024-05-24 17:41:34 +02:00
rafaelsideguide
7ca431b202 crawl load tests 7 and 8 2024-05-23 16:36:05 -03:00
rafaelsideguide
c201ea1986 added idempotency key to python sdk 2024-05-23 12:52:59 -03:00
rafaelsideguide
35927a65a5 Merge branch 'main' into feat/idempotency-key 2024-05-23 12:20:06 -03:00
rafaelsideguide
184e4678f1 bugfix on idempotency key check 2024-05-23 11:47:04 -03:00
Matt Joyce
96630154d3
Merge pull request #1 from mendableai/main
Fix FIRECRAWL_API_URL bug, also various PyLint fixes
2024-05-23 09:16:03 +10:00
Matt Joyce
106c18d11f Use truthiness check for 'success' key in API response
PyLint C0121
2024-05-23 08:57:53 +10:00
Matt Joyce
5c21aed9c7 adding pylintrc to allow longer lines 2024-05-23 08:45:56 +10:00
Matt Joyce
48e91c89e7 Removed unnecessary If block
PyLint R1731
2024-05-23 08:42:07 +10:00
Matt Joyce
7d2efe5acb Added request timeouts
Set the connection timeout to 5 seconds and the response timeout to 10 seconds.
PyLint W3101
2024-05-23 08:39:19 +10:00
Matt Joyce
96b19172a1 Removed trailing whitespace
PyLint C0303: Trailing whitespace (trailing-whitespace)
2024-05-23 08:30:23 +10:00
Matt Joyce
6216c85322 Time module already imported
Pylint
W0404: Reimport 'time' (imported line 16) (reimported)
C0415: Import outside toplevel (time) (import-outside-toplevel)
2024-05-23 08:21:32 +10:00
Matt Joyce
8adf2b7132 Added Docstrings for functions
PyLint C0116: Missing function or method docstring (missing-function-docstring)
2024-05-23 08:20:32 +10:00
Matt Joyce
971e1f85c4 Added module docstring
PyLint C0114 - missing-module-docstring
2024-05-23 08:03:58 +10:00
Matt Joyce
8d041c05b4 rearranged logic for FIRECRAWL_API_URL
It would not use the environment variable unless the param was set to None, which was counter-intuitive.
2024-05-23 08:00:56 +10:00
rafaelsideguide
aa6df4305e crawl load tests 6 and 7 2024-05-22 18:20:24 -03:00
Nicolas
4e39701644 Update main.py 2024-05-22 12:59:56 -07:00
rafaelsideguide
73f1d09d39 Update website_params.ts 2024-05-22 15:07:12 -03:00
Nicolas
3aa5f26627 Update main.py 2024-05-22 10:45:43 -07:00
Nicolas
3e63985e53 Update main.py 2024-05-22 10:40:47 -07:00
rafaelsideguide
4dfc371241 Update index.test.ts 2024-05-22 14:38:41 -03:00
rafaelsideguide
f4a3469b9e Merge branch 'main' into bug/crawl-limit 2024-05-22 14:27:28 -03:00
rafaelsideguide
ff147f1f51 load testing for crawl 2024-05-22 14:26:29 -03:00
Nicolas
0d187f0425
Merge pull request #77 from tractorjuice/patch-1
Add additional file extensions to crawler.ts
2024-05-22 10:16:49 -07:00
rafaelsideguide
04a0bef0fb Merge branch 'main' into test/load-testing 2024-05-22 11:26:19 -03:00
rafaelsideguide
e4573c08ca Update website_params.ts 2024-05-22 11:24:48 -03:00
rafaelsideguide
f9ae1729b6 Update firecrawl.py 2024-05-22 09:40:38 -03:00
rafaelsideguide
068a240ab4 load tests for scrape route 2024-05-22 09:30:32 -03:00
Nicolas
cb2bd0e71f Update index.test.ts 2024-05-21 19:03:32 -07:00
Nicolas
253abb849f Update rate-limiter.ts 2024-05-21 18:53:58 -07:00
Nicolas
229b9908d2 Nick: only enable hyper dx in prod 2024-05-21 18:52:46 -07:00
Nicolas
a8ff295977 Update single_url.ts 2024-05-21 18:50:42 -07:00
Nicolas
a5e718b084 Nick: improvements 2024-05-21 18:34:23 -07:00
Nicolas
6285f12cd1
Merge pull request #167 from mendableai/nsc/hyper-dx-integration
feat: HyperDX Integration
2024-05-21 13:19:38 -07:00
rafaelsideguide
75f4e34d8e Merge branch 'main' into test/load-testing 2024-05-21 10:28:02 -03:00
rafaelsideguide
ec46065066 Update rate-limiter.ts 2024-05-21 10:07:27 -03:00
rafaelsideguide
6a3ac13fe1 Update load-test.yml 2024-05-21 10:06:02 -03:00
youqiang
c47dae13a9 update: wait until body attached in playwright-service 2024-05-21 14:53:57 +08:00
Nicolas
7f64fe884a Update blocklist.ts 2024-05-20 17:26:01 -07:00
Nicolas
756f54466d Nick: allowed keywords for now 2024-05-20 17:24:21 -07:00
Nicolas
01783dc336 Update openapi.json 2024-05-20 17:10:55 -07:00
Nicolas
77a79b5a79 Nick: max num tokens for llm extract (for now) + slice the max 2024-05-20 17:07:38 -07:00
Nicolas
2644e1c029 Update .env.example 2024-05-20 13:36:51 -07:00
Nicolas
9e61d431f0 Nick: hyper dx integration init 2024-05-20 13:36:34 -07:00
Nicolas
d5d0d48848 Merge branch 'main' of https://github.com/mendableai/firecrawl 2024-05-20 10:06:52 -07:00
Nicolas
60002e79b8 Nick: python sdk bump 2024-05-20 10:06:48 -07:00
Matt Joyce
7e5ef4dec4 Allow override of API URL
Allows the Python SDK to be used with local installs.
2024-05-20 18:46:32 +10:00
Nicolas
c74f757b53 Update rate-limiter.ts 2024-05-19 13:05:36 -07:00
Nicolas
98a39b39ab Nick: increased rate limits 2024-05-19 12:59:29 -07:00
Nicolas
18fa15df25 Update index.test.ts 2024-05-19 12:50:06 -07:00
Nicolas
614c073af0 Nick: improvements 2024-05-19 12:45:46 -07:00
Nicolas
f473793ba3 Merge branch 'main' into feat/rate-limits 2024-05-19 12:23:34 -07:00
Nicolas
4efebf7a4b Merge branch 'test/load-testing' of https://github.com/mendableai/firecrawl into test/load-testing 2024-05-19 12:22:51 -07:00
Nicolas
5792cd022c Update fly.staging.toml 2024-05-19 12:22:49 -07:00
rafaelsideguide
d667e1417b added fly staging load test
- being rate limited. Need to add the token to the rate-limit functions
2024-05-17 19:09:19 -03:00
Nicolas
7630565c26 Create fly.staging.toml 2024-05-17 14:33:59 -07:00
rafaelsideguide
7297b21dcd Added load testing using artillery 2024-05-17 18:32:44 -03:00
rafaelsideguide
a480595aa7 Update index.test.ts 2024-05-17 15:41:27 -03:00
rafaelsideguide
54049be539 Added e2e tests 2024-05-17 15:37:47 -03:00
Nicolas
6feb21cc35 Update website_params.ts 2024-05-17 11:21:26 -07:00
Nicolas
5be208f595 Nick: fixed 2024-05-17 10:40:44 -07:00
Nicolas
eb88447e8b Update index.test.ts 2024-05-17 10:00:05 -07:00
Nicolas
df6c3d1e7d Merge branch 'main' into detect-pdfs 2024-05-17 09:55:51 -07:00
Nicolas
9d635cb2a3 Nick: docx support 2024-05-16 11:48:02 -07:00
Nicolas
bcce0544e7 Update openapi.json 2024-05-16 11:03:32 -07:00
Nicolas
80250fb54f Update index.test.ts 2024-05-15 17:40:46 -07:00
Nicolas
098db17913 Update index.ts 2024-05-15 17:37:09 -07:00
Nicolas
93b1f0334e Update index.test.ts 2024-05-15 17:35:06 -07:00
Nicolas
123fb784ca Update index.test.ts 2024-05-15 17:29:22 -07:00
Nicolas
4a6cfb6097 Update index.test.ts 2024-05-15 17:22:29 -07:00
Nicolas
6ca368327f Merge branch 'main' into test/crawl-options 2024-05-15 17:18:25 -07:00
Nicolas
24be4866c5 Nick: 2024-05-15 17:16:20 -07:00
Nicolas
ade4e05cff Nick: working 2024-05-15 17:13:04 -07:00
Nicolas
bfccaf670d Nick: fixes most of it 2024-05-15 15:30:37 -07:00
rafaelsideguide
d91043376c not working yet 2024-05-15 18:54:40 -03:00
rafaelsideguide
fa014defc7 Fixing child links only bug 2024-05-15 18:35:09 -03:00
Nicolas
2ba743fb1a
Merge pull request #27 from eltociear/patch-1
refactor: fix typo in WebScraper/index.ts
2024-05-15 13:28:38 -07:00
Nicolas
0663d78324
Merge pull request #119 from chand1012/main
Add Docker Compose for easy self hosting
2024-05-15 13:27:40 -07:00
rafaelsideguide
da8d94105d fixed for testing the crawl algorithm only 2024-05-15 17:16:03 -03:00
Nicolas
95ffaa2236 Update crawl.test.ts 2024-05-15 12:58:02 -07:00
Nicolas
f15b8f855e Update crawl.json 2024-05-15 12:57:24 -07:00
Nicolas
98dd672d0a Update crawl.json 2024-05-15 12:55:04 -07:00
Nicolas
499671c87f Update crawl.test.ts 2024-05-15 12:50:13 -07:00
Nicolas
58053eb423 Update rate-limiter.ts 2024-05-15 12:47:35 -07:00
Nicolas
4745d114be Update crawl.test.ts 2024-05-15 12:42:14 -07:00
Nicolas
1601e93d69 Merge branch 'main' into test/crawl-options 2024-05-15 12:34:47 -07:00
Nicolas
3678d3c986 Merge branch 'main' of https://github.com/mendableai/firecrawl 2024-05-15 12:11:18 -07:00
Nicolas
fd82982a31 Nick: 2024-05-15 12:11:16 -07:00
rafaelsideguide
4925ee59f6 added crawl test suite 2024-05-15 15:50:50 -03:00
Nicolas
1b0d6341d3 Update index.ts 2024-05-15 11:48:12 -07:00
Nicolas
d10f81e7fe Nick: fixes 2024-05-15 11:28:20 -07:00
Nicolas
87570bdfa1 Update index.ts 2024-05-15 11:06:03 -07:00
rafaelsideguide
d4574851be Added rpc definition 2024-05-15 08:40:21 -03:00
rafaelsideguide
47c20c80ab Update auth.ts 2024-05-15 08:34:49 -03:00
Ikko Eltociear Ashimine
e91c122c69
Merge branch 'main' into patch-1 2024-05-15 12:14:52 +09:00
Nicolas
7d8ceab6de Merge branch 'feat/rate-limits' of https://github.com/mendableai/firecrawl into feat/rate-limits 2024-05-14 14:48:01 -07:00
Nicolas
0e0faa28b3 Update auth.ts 2024-05-14 14:47:36 -07:00
rafaelsideguide
672eddb999 updated rpc 2024-05-14 18:47:21 -03:00
Nicolas
4761ea510b Update rate-limiter.ts 2024-05-14 14:26:42 -07:00
rafaelsideguide
40ad97dee8 added rate limits 2024-05-14 18:08:31 -03:00
Nicolas
27e1e22a0a Update index.test.ts 2024-05-14 12:28:25 -07:00
Nicolas
a0fdc6f7c6 Nick: 2024-05-14 12:12:40 -07:00
Nicolas
7f31959be7 Nick: 2024-05-14 12:04:36 -07:00
Nicolas
8a72cf556b Nick: 2024-05-13 21:10:58 -07:00
Nicolas
26a092f780 Update index.ts 2024-05-13 21:04:49 -07:00
Nicolas
8101cbee37 Update index.ts 2024-05-13 21:02:47 -07:00
Nicolas
86b8439844 Nick: 2024-05-13 20:51:42 -07:00
Nicolas
a96fc5b96d Nick: 4x speed 2024-05-13 20:45:11 -07:00
Nicolas
e26008a833 Merge branch 'main' of https://github.com/mendableai/firecrawl 2024-05-13 19:54:13 -07:00
Nicolas
512449e1aa Nick: v21 2024-05-13 19:54:12 -07:00
Nicolas
bd27b0e17e
Merge pull request #142 from mendableai/doc/crawl-limit-default
[Doc] Added default value for crawlOptions.limit
2024-05-13 18:38:09 -07:00
Nicolas
aa0c8188c9 Nick: 408 handling 2024-05-13 18:34:00 -07:00
Nicolas
999176d576 Merge branch 'main' of https://github.com/mendableai/firecrawl 2024-05-13 13:57:34 -07:00
Nicolas
f3ec21d9c4 Update runWebScraper.ts 2024-05-13 13:57:22 -07:00
Nicolas
65d89afba9 Nick: 2024-05-13 13:01:43 -07:00
Eric Ciarla
4cc46d4af8 Update models.ts 2024-05-13 15:23:31 -04:00
rafaelsideguide
8eb2e95f19 Cleaned up 2024-05-13 16:13:10 -03:00
Nicolas
2ce045912f Nick: disable vision right now 2024-05-13 10:56:08 -07:00
rafaelsideguide
f4348024c6 Added check during scraping to deal with pdfs
Checks if the URL is a PDF during the scraping process (single_url.ts).

TODO: Run integration tests. Does this strategy affect the running time?

P.S. Some comments need to be removed if we decide to proceed with this strategy.
2024-05-13 09:13:42 -03:00
Rafael Miller
5a2712fa5a
Merge branch 'main' into detect-pdfs 2024-05-10 15:53:13 -03:00
rafaelsideguide
bc6b929b43 [Bug] Fixing /crawl limit 2024-05-10 12:15:54 -03:00
rafaelsideguide
df16890f84 Added default value for crawlOptions.limit 2024-05-10 11:59:33 -03:00
rafaelsideguide
18480b2005 Removed .env.example, improved docs and docker compose envs 2024-05-10 11:38:17 -03:00
Nicolas
66bd1e4020 Update website_params.ts 2024-05-09 18:41:15 -07:00
Nicolas
c02a82c282 Update main.py 2024-05-09 18:02:34 -07:00
Nicolas
efc6fcb474 Merge branch 'main' of https://github.com/mendableai/firecrawl 2024-05-09 18:01:04 -07:00
Nicolas
73687822ad Update main.py 2024-05-09 18:00:58 -07:00
Nicolas
d21091bb06 Update single_url.ts 2024-05-09 17:52:46 -07:00
Nicolas
be85008622 Nick: better 2024-05-09 17:48:11 -07:00
Nicolas
be5661a768 Nick: a lot better 2024-05-09 17:45:16 -07:00
Nicolas
fce17e6beb Update credit_billing.ts 2024-05-09 15:29:58 -07:00
rafaelsideguide
f4d8b2c89a Updated docs 2024-05-09 10:36:56 -03:00
Nicolas
aa6b84c5fa Nick: readme 2024-05-08 17:41:15 -07:00
Nicolas
d9da4b53f8 Update example.py 2024-05-08 17:36:40 -07:00
Nicolas
4c88d5da66 Nick: v8 python 2024-05-08 17:35:16 -07:00
Nicolas
e6dbbf1bab Nick: fixes js and pydantic implementation 2024-05-08 17:16:59 -07:00
Nicolas
c89964b230 Nick: 2024-05-08 16:38:49 -07:00
Nicolas
9541ff6b30 Nick: 429 addressed 2024-05-08 15:14:39 -07:00
Nicolas
3bfef646e0 Update index.test.ts 2024-05-08 13:23:53 -07:00
Nicolas
6ced8e73a7 Update index.test.ts 2024-05-08 13:13:38 -07:00
Nicolas
c50076c377 Update websites.json 2024-05-08 13:04:17 -07:00
Nicolas
1296928879 Update index.test.ts 2024-05-08 13:00:20 -07:00
Nicolas
4a5f87623c
Merge pull request #118 from mendableai/feat/test-suite
[Test] Added integration tests suite
2024-05-08 12:47:17 -07:00
Nicolas
fb7a8fd73f Delete test_screenshot.png 2024-05-08 12:39:32 -07:00
Nicolas
c635688ddb Nick: test suite 2024-05-08 12:36:54 -07:00
Nicolas
d34b4de6ac Update websites.json 2024-05-08 12:27:45 -07:00
Nicolas
a0a67f124a Update index.test.ts 2024-05-08 12:26:04 -07:00
Nicolas
b7e3104c7b Ni 2024-05-08 12:18:53 -07:00
Nicolas
ad58bc2820 Nick: test suite init 2024-05-08 11:38:46 -07:00
rafaelsideguide
3f460af6c5 Added idempotency key to crawl route 2024-05-07 15:29:27 -03:00
Eric Ciarla
d280bcadf3 Add keyAuth 2024-05-07 13:52:42 -04:00
Nicolas
056b0ec24d Merge branch 'main' into feat/test-suite 2024-05-07 10:41:09 -07:00
Nicolas
dcedb8d798 Merge branch 'main' into feat/max-depth 2024-05-07 10:20:49 -07:00
Nicolas
6505bf6bf2 Merge branch 'main' into feat/max-depth 2024-05-07 10:20:44 -07:00
Nicolas
bdbee963f7 Merge branch 'main' into nsc/cancel-job 2024-05-07 10:13:43 -07:00
rafaelsideguide
61d615c04b Added tests 2024-05-07 14:03:00 -03:00
rafaelsideguide
e1f52c538f nested includeHtml inside pageOptions 2024-05-07 13:40:24 -03:00
Nicolas
f46bf19fa5 Nick: 2024-05-07 09:26:52 -07:00
rafaelsideguide
83f3408634 Added max depth option 2024-05-07 11:06:26 -03:00
Nicolas
2e3ff85509 Update crawl-cancel.ts 2024-05-06 17:22:16 -07:00
Nicolas
6d5da358cc Nick: cancel job 2024-05-06 17:16:43 -07:00
rafaelsideguide
509250c4ef changed to includeHtml 2024-05-06 19:45:56 -03:00
rafaelsideguide
538355f1af Added toMarkdown option 2024-05-06 11:36:44 -03:00
Nicolas
d1b6f6dcde Update fly.toml 2024-05-04 13:49:09 -07:00
Nicolas
cd9a0840b5 Update search.ts 2024-05-04 13:13:15 -07:00
Nicolas
5229a4902b Update search.ts 2024-05-04 13:09:11 -07:00
Nicolas
ce7bab7b35 Update status.ts 2024-05-04 13:00:38 -07:00
Nicolas
15b774e974 Update index.ts 2024-05-04 12:44:30 -07:00
Nicolas
67f135a5b6 Update crawl-status.ts 2024-05-04 12:31:28 -07:00
Nicolas
2aa09a3000 Nick: partial docs working, cleaner 2024-05-04 12:30:12 -07:00
Nicolas
00373228fa Update index.ts 2024-05-04 11:53:16 -07:00
rafaelsideguide
fbb4c63a1a [Test] Added integration tests suite
solves #15
2024-05-03 17:23:25 -03:00
Nicolas
21cdaf5996
Update log_job.ts 2024-05-02 12:40:49 -07:00
Eric Ciarla
caf3f9eede Add Posthog Logging 2024-05-02 15:30:22 -04:00
Nicolas
8a95cb42f0 Update models.ts 2024-04-30 18:36:21 -07:00
Nicolas
4967536501 Update index.ts 2024-04-30 18:19:55 -07:00
Nicolas
768166b066 Update single_url.ts 2024-04-30 16:57:44 -07:00
Nicolas
a386259511 Update scrape.ts 2024-04-30 16:35:44 -07:00
Nicolas
dfcf39f4c0 Update scrape.ts 2024-04-30 16:19:59 -07:00
Nicolas
3c7030dbb1 Nick: improvements 2024-04-30 16:19:32 -07:00
Nicolas
cbd9e88b77 Merge branch 'main' into llm-extraction 2024-04-30 14:49:20 -07:00