Commit Graph

1532 Commits

Author SHA1 Message Date
Nicolas
49bd95327e Update types.ts 2024-10-03 17:00:33 -03:00
Nicolas
1a1ac9fd60 Nick: 2024-10-03 16:37:58 -03:00
Nicolas
a150aa820c Nick: shouldnt fallback on a 400 + error code should be correct on page status code
2024-10-03 15:21:42 -03:00
Nicolas
489a643391 Update index.ts 2024-10-02 20:25:52 -03:00
Gergő Móricz
26771e2e71 debug(zod): log unsupported protocol errors
2024-10-01 22:13:28 +02:00
Nicolas
d1b838322d
Merge pull request #721 from mendableai/feat/concurrency-limit
Concurrency limits
2024-10-01 16:15:05 -03:00
Nicolas
ac5e1fc194 Update sitemap.ts 2024-10-01 16:14:43 -03:00
Nicolas
c6717fecaa Nick: got rid of job interval sleep and math.min 2024-10-01 16:11:12 -03:00
Nicolas
18f9cd09e1 Nick: fixed more stuff 2024-10-01 16:04:39 -03:00
Gergő Móricz
fe721fffbe fix(crawl-redis): normalize URL before locking 2024-10-01 20:59:50 +02:00
Nicolas
c0541cc990 Update queue-worker.ts 2024-10-01 15:38:24 -03:00
Nicolas
37299fc035 Update types.ts 2024-10-01 15:18:11 -03:00
Nicolas
8aa07afb6d Nick: fixes 2024-10-01 15:15:49 -03:00
Nicolas
92dbd33e57 Update queue-worker.ts 2024-10-01 14:53:26 -03:00
Nicolas
4d5477f357 Nick: resolved conflicts 2024-10-01 14:39:57 -03:00
Nicolas
96245e387d Update crawl.ts 2024-10-01 14:29:53 -03:00
Nicolas
258c67ce67 Revert "feat(queue-worker): always crawl links from content even if sitemapped"
This reverts commit 3c045c43a4.
2024-10-01 14:20:23 -03:00
Nicolas
445fc432e9 Reapply "fix(v1/crawl): always use sitemap"
This reverts commit 339b19ce9d.
2024-10-01 14:03:07 -03:00
Nicolas
339b19ce9d Revert "fix(v1/crawl): always use sitemap"
This reverts commit 5dc0fcf644.
2024-10-01 13:59:49 -03:00
Gergő Móricz
5dc0fcf644 fix(v1/crawl): always use sitemap 2024-10-01 18:49:44 +02:00
Gergő Móricz
3c045c43a4 feat(queue-worker): always crawl links from content even if sitemapped 2024-10-01 18:32:53 +02:00
Nicolas
1af26fe1b4 Nick: sitemap fix 2024-10-01 12:38:48 -03:00
Nicolas
ff4b7a835b
Merge pull request #685 from devflowinc/main
bugfix: using onlyIncludeTags and removeTags together
2024-09-30 17:18:30 -03:00
Nicolas
986262e1d4 Update search.ts 2024-09-30 15:23:43 -03:00
Gergő Móricz
0dd06d33ef fix(v0/search): pass job priority 2024-09-30 19:20:24 +02:00
Gergő Móricz
20ffdbd15c hotfix 2024-09-30 19:17:52 +02:00
Gergő Móricz
a8df85fd9b fix(acuc): remove sentry capture 2024-09-30 19:10:24 +02:00
Gergő Móricz
3621e191bd feat(concurrency-limit): set limit based on plan 2024-09-28 00:19:54 +02:00
Gergő Móricz
c6a83ab92c fix(api): entrypoint
2024-09-27 22:16:27 +02:00
Gergő Móricz
e44bdf7a54 bad dockerfile 2024-09-27 21:07:11 +02:00
Gergő Móricz
f0a1a2e45b fix: increase ulimit -n in docker 2024-09-27 20:44:52 +02:00
Gergő Móricz
d5e2a80e4a fix(crawl-status): keep 10 megabyte pages if they're the only thing in the output
2024-09-27 20:41:41 +02:00
Nicolas
975f0575b4 Nick: max retries with axios-retry 2024-09-27 12:58:57 -04:00
Nicolas
92961cf74f Merge branch 'main' of https://github.com/mendableai/firecrawl 2024-09-27 12:23:45 -04:00
Nicolas
1fdff87b3e Update single_url.ts 2024-09-27 12:23:44 -04:00
Gergő Móricz
6283e8fc47 fix(logger): set default level to trace 2024-09-27 17:46:43 +02:00
Gergő Móricz
5e8ef4954e feat(auth): log cache key in acuc update error 2024-09-27 17:13:10 +02:00
Gergő Móricz
e98f858eb6 fix(api): playground scrape errors
2024-09-26 22:28:14 +02:00
Nicolas
8d44cb33bb Nick: fixed error message 2024-09-26 22:15:15 +02:00
Gergő Móricz
2cb493321a fix(ACUC): do not refresh cache every set 2024-09-26 22:15:15 +02:00
Gergő Móricz
9bdd344b36 fix(redlock): use redlock.using for stability 2024-09-26 22:15:15 +02:00
Gergő Móricz
250c3bb5c6 fix(auth): move redlock settings 2024-09-26 22:15:15 +02:00
Gergő Móricz
81245e68fa fix(auth/redlock): retry cached ACUC lock for 20 seconds 2024-09-26 22:15:15 +02:00
Gergő Móricz
0f89f5e7cb fix(billTeam): cache update race condition 2024-09-26 22:15:15 +02:00
Gergő Móricz
d13a97f979 fix(credit_billing): allow spending of exact credits 2024-09-26 22:15:15 +02:00
Gergő Móricz
84bff8add8 fix(billTeam): update cached ACUC after billing 2024-09-26 22:15:15 +02:00
Gergő Móricz
f22ab5ffaf feat(db): implement bill_team RPC 2024-09-26 22:15:15 +02:00
Gergő Móricz
c1f68c3e0a fix(credit_billing): return chunk.remaining_credits 2024-09-26 22:15:15 +02:00
Gergő Móricz
2073063fb7 fix(db): fix caching and rpc error 2024-09-26 22:15:15 +02:00
Gergő Móricz
f8c70fe5dd feat(db): implement auth_credit_usage_chunk RPC 2024-09-26 22:15:15 +02:00
Gergő Móricz
29815e084b feat(v1/Document): add warning field 2024-09-26 21:19:05 +02:00
Gergő Móricz
095babe70b fix(queue-jobs): jobs with concurrency fails may vanish 2024-09-26 21:18:56 +02:00
Gergő Móricz
b696bfc854 fix(crawl-status): avoid race conditions where crawl may be deemed failed 2024-09-26 21:00:27 +02:00
Gergő Móricz
dec4171937 fix(queue-worker, queue-jobs): logic fixes 2024-09-26 20:39:19 +02:00
Gergő Móricz
d2881927c1 fix(queue-worker): remove concurrency entries when done in sentry-less branch 2024-09-26 20:29:17 +02:00
Gergő Móricz
53fce67ca1 feat(queue-worker): PoC of concurrency limits 2024-09-26 20:24:34 +02:00
Nicolas
30058b1da0 Nick: increased timeout for chrome-cdp due to smart wait 2024-09-26 20:24:34 +02:00
Nicolas
a9773a24a3 Nick: increased timeout for chrome-cdp due to smart wait
2024-09-25 19:27:02 -04:00
Gergő Móricz
953d4fb197 fix(redlock): use redlock.using for stability 2024-09-25 22:47:42 +02:00
Gergő Móricz
eef116bef8 fix(auth): move redlock settings 2024-09-25 22:27:51 +02:00
Gergő Móricz
2c96d2eef6 fix(auth/redlock): retry cached ACUC lock for 20 seconds 2024-09-25 22:25:13 +02:00
Gergő Móricz
1cca9b8ae6 fix(billTeam): cache update race condition 2024-09-25 22:15:02 +02:00
Gergő Móricz
eb7317c08a fix(credit_billing): allow spending of exact credits 2024-09-25 21:44:05 +02:00
Gergő Móricz
e67cbc2ca1 fix(billTeam): update cached ACUC after billing 2024-09-25 21:37:01 +02:00
Gergő Móricz
5a8eb17a82 feat(db): implement bill_team RPC 2024-09-25 20:57:45 +02:00
Gergő Móricz
415fd9f333 fix(credit_billing): return chunk.remaining_credits 2024-09-25 20:37:35 +02:00
Gergő Móricz
417adf8e96 fix(db): fix caching and rpc error 2024-09-25 19:42:45 +02:00
Gergő Móricz
331e826bca feat(db): implement auth_credit_usage_chunk RPC 2024-09-25 19:25:18 +02:00
Nicolas
1da026b26e Update single_url.ts
2024-09-24 23:29:48 -04:00
Nicolas
b8266cc329 Update website_params.ts 2024-09-24 23:28:58 -04:00
Gergő Móricz
f00c0b82f9 fix(v1/scrape): add total wait specified in request to timeout
2024-09-24 21:56:22 +02:00
Nicolas
3f138e559e Update website_params.ts 2024-09-24 15:14:26 -04:00
Gergő Móricz
43730b5db6 feat(WebScraper): always report error of last scraper in order 2024-09-24 20:03:49 +02:00
Gergő Móricz
3e661a2087 fix(v1/crawl-cancel): avoid double authing 2024-09-24 20:01:34 +02:00
Gergő Móricz
4194525640 fix(blocklist): unblock TikTok Business page
This is just a regular business site, not social media.
2024-09-24 16:55:19 +02:00
Gergő Móricz
4a623c084a fix(fly): don't use Depot builders (doesn't work) 2024-09-24 10:50:30 +02:00
Gergő Móricz
a59b5836d5 Revert error tallying 2024-09-24 10:27:49 +02:00
Gergő Móricz
a4b128e8b7 fix(rust): blocklisted error test
2024-09-23 23:03:00 +02:00
Gergő Móricz
483f97d21b fix(v0/search): don't sent scrape fail errors to Sentry 2024-09-23 18:49:27 +02:00
Gergő Móricz
d927cafeea fix(queue-worker): don't send scraping errors to sentry 2024-09-23 18:48:01 +02:00
Gergő Móricz
677faa27f3 fix(WebScraper): explicitly ignore 404s 2024-09-23 18:47:07 +02:00
Gergő Móricz
83d8287c14 fix(v0, sentry): don't send all scraping methods failed errors to Sentry 2024-09-23 18:40:21 +02:00
Gergő Móricz
d2f7031069 fix(WebScraper): fatal error handler triggering for 404s 2024-09-23 18:33:10 +02:00
Nicolas
848a2b364a Update package.json 2024-09-21 21:11:23 -04:00
Nicolas
dfdbae74c6 Update fireEngine.ts 2024-09-21 21:10:05 -04:00
Nicolas
fbb5f23016 Update index.ts 2024-09-21 20:53:33 -04:00
Nicolas
607e46267c Update package.json
2024-09-20 19:46:17 -04:00
Nicolas
db161ac55a Nick: press + write 2024-09-20 19:45:23 -04:00
Nicolas
3fc5ce17d2 Nick: fixed error handling for v0 scrape 2024-09-20 18:35:30 -04:00
Nicolas
0690cfeaad Merge branch 'main' into feat/actions 2024-09-20 18:24:13 -04:00
Gergő Móricz
95e4c8920b fix(sdk/rust): license
2024-09-20 21:55:05 +02:00
Gergő Móricz
e1a34b0a99 Revert "feat(scrape): scroll down/up with actions if fullpagescreenshot"
This reverts commit 815bfc8f07.
2024-09-20 21:43:22 +02:00
Gergő Móricz
815bfc8f07 feat(scrape): scroll down/up with actions if fullpagescreenshot
revert this if unneeded
2024-09-20 21:42:09 +02:00
Gergő Móricz
d663bbf0ca feat(actions): add scroll 2024-09-20 21:41:53 +02:00
Gergő Móricz
3dd912ec91 feat(actions): add typeText, pressKey, fix playwright screenshot/waitFor 2024-09-20 21:02:53 +02:00
Gergő Móricz
719dfbccbb Update docs 2024-09-20 20:30:46 +02:00
Gergő Móricz
939040bf44 Update docs and example 2024-09-20 20:10:11 +02:00
Gergő Móricz
3ec0bbe28d feat(sdk/rust/crawl): paginate through results 2024-09-20 20:10:11 +02:00
Gergő Móricz
a078cdbd9d Rust SDK 1.0.0 2024-09-20 20:10:11 +02:00
Gergő Móricz
93a20442e3 feat(sdk/rust): first batch of changes for 1.0.0 2024-09-20 20:10:11 +02:00