Commit Graph

586 Commits

Author SHA1 Message Date
Nicolas
575ddc9e6e Update scrape.ts 2024-07-22 19:12:51 -04:00
Nicolas
e31a5007d5 Nick: speed improvements 2024-07-22 18:30:58 -04:00
Nicolas
b229fbebd8 Update scrape_log.ts 2024-07-19 12:53:26 -04:00
rafaelsideguide
5c02dbe20c fix(isFile): added .tiff extension 2024-07-18 17:07:21 -03:00
Gergo Moricz
f0e95ce399 fix(WebCrawler): filter out file URLs when taking URLs from sitemap 2024-07-18 21:49:37 +02:00
Gergo Moricz
95c6c63b85 fix(fly): raise heap limit to 4G per process 2024-07-18 20:56:54 +02:00
Nicolas
5f14f4f788 Update blocklist.ts 2024-07-18 14:20:19 -04:00
Nicolas
6161b83890 Update scrape_log.ts 2024-07-18 14:17:08 -04:00
Nicolas
2dd7398aad Update scrape_log.ts 2024-07-18 14:16:46 -04:00
Nicolas
f10f3f886b Merge pull request #410 from mendableai/feat/fire-engine-chrome-cdp: Support chrome-cdp and restructure sitemap fire-engine support. 2024-07-18 13:52:08 -04:00
Nicolas
9a1a227797 Update crawl-cancel.ts 2024-07-18 13:49:51 -04:00
Nicolas
11768571ed Update crawl-cancel.ts 2024-07-18 13:43:03 -04:00
Nicolas
ce804d3c20 Update crawl-cancel.ts 2024-07-18 13:40:24 -04:00
Nicolas
d2de01d342 Nick: fixes 2024-07-18 13:19:44 -04:00
Gergo Moricz
0b8047c7a0 fix(WebScraper): infinite regex leading to fly.io instance hangs 2024-07-18 19:13:43 +02:00
Nicolas
f11137352c Merge branch 'main' into feat/fire-engine-chrome-cdp 2024-07-18 12:48:42 -04:00
Nicolas
01b5e8fc73 Merge pull request #429 from mendableai/mog/fix-job-stuck-2: Fix queue stuck bug via lock settings changes 2024-07-18 12:39:21 -04:00
Nicolas
b134ba92bc Merge pull request #427 from mendableai/docs/update-docs: [Docs] Updating docs 2024-07-18 11:49:08 -04:00
rafaelsideguide
f13ef02a08 Update openapi.json 2024-07-18 10:34:03 -03:00
Nicolas
2fab2d8d29 Update scrape.ts 2024-07-17 20:44:34 -04:00
Nicolas
6609c1b6e5 Update .env.local 2024-07-17 16:22:27 -04:00
Nicolas
17a1f9b55f Update .env.example 2024-07-17 16:22:04 -04:00
rafaelsideguide
eda616d728 Merge remote-tracking branch 'origin/main' into docs/update-docs 2024-07-17 16:44:51 -03:00
rafaelsideguide
2b4ce12097 Update openapi.json 2024-07-17 16:43:22 -03:00
Gergo Moricz
8160c311c0 fix queue stuck bug via lock setting changes 2024-07-17 21:31:25 +02:00
Caleb Peffer
8d5ebc9b9f Merge pull request #423 from mendableai/cjp/linksOnPage: Caleb: Return a list of links on a page by default 2024-07-17 12:36:07 -06:00
Caleb Peffer
5b24d26c84 Caleb; fixed test 2024-07-17 11:33:12 -07:00
Caleb Peffer
c5d1e7260d Caleb: made changes per Rafaels requests 2024-07-17 11:29:05 -07:00
rafaelsideguide
205cd63c2f Update openapi.json 2024-07-17 15:07:06 -03:00
Rafael Miller
f020048a46 Merge pull request #420 from mendableai/bugfix/empty-tags: Small fix for empty pageOptions 2024-07-17 10:10:24 -03:00
Caleb Peffer
da3c6bca37 Caleb: added a simple test 2024-07-16 21:23:22 -07:00
Caleb Peffer
0b3c0ede49 Added tests per @nicks request 2024-07-16 21:15:59 -07:00
Caleb Peffer
98c788ca7a Caleb: added a test to ensure links on page exists and isn't zero on mendable 2024-07-16 21:13:52 -07:00
Nicolas
3c3412e893 Update rate-limiter.test.ts 2024-07-16 22:45:12 -04:00
Nicolas
ffc3b7c5fb Update index.ts 2024-07-16 22:42:40 -04:00
Nicolas
c9073a747c Nick: 2024-07-16 22:41:13 -04:00
Caleb Peffer
d39d3be649 Caleb: now extracting and returning a list of all links on the page for a customer 2024-07-16 18:38:03 -07:00
rafaelsideguide
dba1fb2dc8 Update removeUnwantedElements.ts 2024-07-16 18:22:56 -03:00
Nicolas
92202de12b Update rate-limiter.ts 2024-07-16 10:09:49 -04:00
Thomas Kosmas
5c65ec58e5 Support chrome-cdp and restructure sitemap fire-engine support. 2024-07-15 18:40:43 +03:00
Nicolas
949791049f Nick: 2024-07-12 23:20:26 -04:00
Nicolas
d0c8d3ecde Merge branch 'main' into nsc/sitemap-fix-fire-engine 2024-07-12 22:15:06 -04:00
Nicolas
a3b1703b68 Update fireEngine.ts 2024-07-12 22:15:00 -04:00
Nicolas
09bc2c7a9c Merge pull request #394 from mendableai/nsc/small-fe-print: Log Fire-engine page errors 2024-07-12 22:14:04 -04:00
Nicolas
e098e88ea7 Nick: 2024-07-12 22:02:08 -04:00
Nicolas
bfc7f5882e Update index.ts 2024-07-12 19:57:12 -04:00
Nicolas
436e8922a7 Nick: doing on the ci instead 2024-07-12 19:49:38 -04:00
Nicolas
fc3328f3d1 Update index.ts 2024-07-12 19:12:56 -04:00
Nicolas
fd18f2269b Nick: slack alerts 2024-07-12 19:07:59 -04:00
rafaelsideguide
f453bcf17c bugfix docker self hosting 2024-07-12 16:51:20 -03:00