Commit Graph

659 Commits

Author SHA1 Message Date
Nicolas
749b0c05dc Merge branch 'main' of https://github.com/mendableai/firecrawl 2024-06-25 15:21:15 -03:00
Nicolas
e7be17db92 Nick: metadata fixes and lock duration for bull decreased to 2 hrs 2024-06-25 15:21:14 -03:00
Nicolas
f84fb4b331 Merge pull request #313 from snippet/google-search-term-fix
fix multi-word search term issue: /search (w/o Serp)
2024-06-24 19:24:58 -03:00
Jeff Pereira
6ddf3a58a1 fix multi-word search term issue: /search (w/o Serp) 2024-06-24 14:21:52 -07:00
Nicolas
90b7fff366 Update crawler.ts 2024-06-24 16:52:01 -03:00
Nicolas
08c1fa799b Update queue-worker.ts 2024-06-24 16:51:32 -03:00
rafaelsideguide
3ebdf93342 removed console.logs 2024-06-24 16:43:12 -03:00
Nicolas
56d42d9c9b Nick: 2024-06-24 16:33:07 -03:00
rafaelsideguide
21d29de819 testing crawl with new.abb.com case
many unnecessary console.logs for tracing the code execution
2024-06-24 16:25:07 -03:00
Nicolas
3c7b7e7242 NIck: fixes fallback 2024-06-23 18:59:08 -03:00
Caleb Peffer
e59ba758f5 Caleb: changed posthog logging so that It associates jobs with a group. No 2024-06-18 17:42:21 -07:00
Caleb Peffer
5a91d8425f Caleb: solve for typechecking on idempotencyKey on my machine 2024-06-18 17:07:38 -07:00
rafaelsideguide
9c539e9113 Fixed includeHTML to use cleanedHtml as response 2024-06-18 16:26:54 -03:00
Rafael Miller
f5a9acc4c6 Merge branch 'main' into feat/removeTags-regex 2024-06-18 14:39:59 -03:00
rafaelsideguide
9f7afd1e88 fix for some complex cases 2024-06-18 14:36:51 -03:00
Nicolas
d0c05accf6 Nick: 2024-06-18 13:21:50 -04:00
Nicolas
818751a256 Merge pull request #294 from mendableai/tests/e2e-to-unit
[Test] Transcribed from e2e to unit tests for many cases
2024-06-18 13:09:22 -04:00
Nicolas
754c9fa08d Update package.json 2024-06-18 12:58:57 -04:00
Nicolas
90a807c547 Update index.ts 2024-06-18 12:56:13 -04:00
Nicolas
26e8bfc23a Merge branch 'main' into pr/296 2024-06-18 12:55:45 -04:00
Nicolas
b53ba58bc0 Merge pull request #282 from mendableai/nsc/rate-limiter-tests
test: Rate Limit Unit Tests
2024-06-18 11:01:28 -04:00
rafaelsideguide
727e5de8c5 Update index.test.ts 2024-06-18 11:54:10 -03:00
rafaelsideguide
c54e797eb1 (╯°□°)╯︵ ┻━┻ 2024-06-18 11:51:28 -03:00
rafaelsideguide
6e32522fa2 Improvements on response document types 2024-06-18 11:43:06 -03:00
rafaelsideguide
20f14bcf7f Added some types 2024-06-18 10:55:07 -03:00
rafaelsideguide
c2fc69af1c removed some e2e tests that are making the ci get stuck 2024-06-18 09:57:05 -03:00
rafaelsideguide
6c726a02eb Moved to utils/removeUnwantedElements, added unit tests 2024-06-18 09:46:42 -03:00
AndyMik90
8b3c3aae91 Added support for RegEx in removeTags 2024-06-18 07:31:46 +02:00
neev jewalkar
e5ffda1eec Added local host support for the javascript SDK 2024-06-18 05:42:25 +05:30
rafaelsideguide
b2bd562bb2 transcribed from e2e to unit tests for many cases 2024-06-17 17:09:44 -03:00
Nicolas
ab038051e9 Merge branch 'main' into nsc/rate-limiter-tests 2024-06-17 15:06:12 -04:00
rafaelsideguide
a20d002a6b Delete test-run-report.json 2024-06-17 09:25:29 -03:00
Eric Ciarla
519ab1aecb Update unit tests 2024-06-15 17:14:09 -04:00
Eric Ciarla
f0d4146b42 Merge branch 'feat/maxDepthRelative' of https://github.com/mendableai/firecrawl into feat/maxDepthRelative 2024-06-15 16:52:00 -04:00
Eric Ciarla
ff7b52cab1 Delete one more e2e test 2024-06-15 16:51:50 -04:00
Eric Ciarla
b1eb608295 Merge branch 'main' into feat/maxDepthRelative 2024-06-15 16:50:27 -04:00
Eric Ciarla
34e37c5671 Add unit tests to replace e2e 2024-06-15 16:43:37 -04:00
Eric Ciarla
2b40729cc2 Update index.test.ts 2024-06-15 08:56:32 -04:00
Eric Ciarla
f22759b2e7 Update index.test.ts 2024-06-14 19:42:11 -04:00
Eric Ciarla
a6b7197737 Fix for maxDepth 2024-06-14 19:40:37 -04:00
Nicolas
4ec863718b Merge pull request #283 from mendableai/nsc/crawler-fixes
Fixes crawler getting confused with base paths that contain www.
2024-06-14 13:50:32 -07:00
Nicolas
43767360d8 Merge branch 'main' into nsc/rate-limiter-tests 2024-06-14 13:50:21 -07:00
Nicolas
e88cb314c8 Update crawler.ts 2024-06-14 13:44:54 -07:00
Rafael Miller
361cba4119 Merge pull request #175 from mendableai/test/load-testing
Test/load testing
2024-06-14 17:39:01 -03:00
Nicolas
7b11ace87d Create rate-limiter.test.ts 2024-06-14 12:31:42 -07:00
rafaelsideguide
e369d1dd0e Update index.test.ts 2024-06-14 16:17:54 -03:00
Nicolas
e37aa3db57 Nick: fixed rate limit on status 2024-06-14 12:13:02 -07:00
rafaelsideguide
a6ed2e693f Update index.test.ts 2024-06-14 15:22:52 -03:00
rafaelsideguide
ad7795f973 Merge remote-tracking branch 'origin/main' into test/load-testing 2024-06-14 15:14:01 -03:00
rafaelsideguide
354712a8a3 just changed the name for the test? 2024-06-14 13:02:04 -03:00
Eric Ciarla
2c5f5c0ea2 Merge branch 'main' into feat/maxDepthRelative 2024-06-14 11:49:12 -04:00
Eric Ciarla
80c10393b4 Update index.test.ts 2024-06-14 11:32:30 -04:00
Eric Ciarla
42ed1f4479 Update index.test.ts 2024-06-14 11:20:24 -04:00
Eric Ciarla
8830acce07 Update index.test.ts 2024-06-14 11:11:58 -04:00
Eric Ciarla
278bb311cb Update index.test.ts 2024-06-14 11:02:39 -04:00
Eric Ciarla
36a62727b8 Update index.test.ts 2024-06-14 10:52:43 -04:00
Rafael Miller
f9c7ca9388 Merge branch 'main' into feat/issue-266 2024-06-14 11:47:58 -03:00
Rafael Miller
3e2e76311c Merge branch 'main' into feat/issue-205 2024-06-14 11:25:20 -03:00
Eric Ciarla
59451754f5 Add tests 2024-06-14 10:14:07 -04:00
rafaelsideguide
afee5684a3 Fixed tests' message and updated version 2024-06-14 11:05:19 -03:00
Eric Ciarla
9b254c1cd0 Update index.test.ts 2024-06-14 09:48:14 -04:00
Rafael Miller
5a5c532bea Merge branch 'main' into py-sdk-improve-response-handling 2024-06-14 10:42:51 -03:00
Eric Ciarla
9aba451b18 Update index.test.ts 2024-06-14 09:33:43 -04:00
Rafael Miller
cc2e3f05b0 Merge pull request #256 from mattjoyce/feat-254-sdk-py-logging
Added logging to python sdk FIRECRAWL_LOGGING_LEVEL
2024-06-14 10:22:40 -03:00
rafaelsideguide
6963a490f1 Updated version 2024-06-14 10:21:44 -03:00
rafaelsideguide
5dd18ca79b fixed edge cases 2024-06-14 09:46:55 -03:00
Eric Ciarla
ab9de0f5ab Update maxDepth tests 2024-06-13 18:46:30 -04:00
Eric Ciarla
393bd45237 Update index.test.ts 2024-06-13 18:13:15 -04:00
Eric Ciarla
71c98d8b80 Update logic 2024-06-13 18:00:52 -04:00
Eric Ciarla
095951aa4d Update test 2024-06-13 17:40:00 -04:00
Eric Ciarla
5e8aa92788 Update index.ts 2024-06-13 17:33:13 -04:00
Eric Ciarla
bf10e9d392 Update index.test.ts 2024-06-13 17:28:59 -04:00
Eric Ciarla
65d63bae45 Update index.ts 2024-06-13 17:17:44 -04:00
Eric Ciarla
32e814bedc Update index.ts 2024-06-13 17:02:30 -04:00
Nicolas
6fc1ee32fd Merge pull request #275 from mendableai/feat/issue-273
Added pageOptions.removeTags
2024-06-13 13:27:01 -07:00
rafaelsideguide
bb859ae9a7 Added metadata.pageStatusCode and metadata.pageError properties to the responses 2024-06-13 17:08:40 -03:00
rafaelsideguide
676d6e8ab5 Added pageOptions.removeTags 2024-06-13 10:51:05 -03:00
Nicolas
182f8d4d6c Update index.ts 2024-06-12 18:07:05 -07:00
Nicolas
11b6d5afa5 Update fly.toml 2024-06-12 18:00:22 -07:00
Nicolas
67dc46b454 Nick: clusters 2024-06-12 17:53:04 -07:00
rafaelsideguide
d20af257ba Added jobId to webhook data 2024-06-12 15:38:41 -03:00
rafaelsideguide
e37d151404 added parsePDF option to pageOptions
user can decide if they are going to let us take care of the parse or they are going to parse the pdf by themselves
2024-06-12 15:06:47 -03:00
rafaelsideguide
01c9f071fa fixed 2024-06-12 11:27:06 -03:00
rafaelsideguide
dc6acbf1f0 Merge remote-tracking branch 'origin/main' into feat/allowbackwardcrawling-option 2024-06-12 11:01:05 -03:00
Nicolas
f93231499f Merge pull request #265 from mendableai/feat/issue-264
[Feat] Added route to clean completed jobs and a github action cron that triggers every 24h
2024-06-11 21:33:52 -07:00
Nicolas
45dee63943 Merge pull request #262 from mendableai/nsc/webhook-self-host-fix
Only fetch webhook from db if self host webhook not set and using db auth
2024-06-11 15:46:57 -07:00
rafaelsideguide
157fbe4a1e added bull auth key 2024-06-11 17:52:01 -03:00
rafaelsideguide
df3a678cf4 getting back the cancel test, this should work 2024-06-11 17:46:56 -03:00
rafaelsideguide
def2ba9987 added tests 2024-06-11 17:46:25 -03:00
Nicolas
1e3e06a1d5 Update replacePaths.test.ts 2024-06-11 13:02:39 -07:00
Nicolas
2239e03269 Update replacePaths.test.ts 2024-06-11 12:54:02 -07:00
Nicolas
520739c9f4 Nick: fixed bugs associated with absolute path replacements 2024-06-11 12:43:16 -07:00
Nicolas
b87725c683 Update openapi.json 2024-06-11 12:08:49 -07:00
rafaelsideguide
ee282c3d55 Added allowBackwardCrawling option 2024-06-11 15:24:39 -03:00
rafaelsideguide
a9f93c2f1e Added route to clean completed jobs and a github action cron that triggers every 24h 2024-06-11 14:18:05 -03:00
Nicolas
da38dad9a7 Merge branch 'main' of https://github.com/mendableai/firecrawl 2024-06-10 18:26:31 -07:00
Nicolas
9390816c1b Update openapi.json 2024-06-10 18:26:25 -07:00
Nicolas
f6b06ac27a Nick: ignoreSitemap, better crawling algo 2024-06-10 18:12:41 -07:00
Nicolas
1bd0327e1a Merge branch 'main' into nsc/pageoptions-crawler 2024-06-10 17:15:10 -07:00
Nicolas
99f2ffd6d5 Update webhook.ts 2024-06-10 17:03:10 -07:00