Commit Graph

1074 Commits

Author SHA1 Message Date
Nicolas
3242872503 Update single_url.ts 2024-07-25 17:43:55 -04:00
Nicolas
11e6b2680e
Merge pull request #455 from mendableai/feat/scrape-monitoring
Add scrape monitoring
2024-07-25 16:27:07 -04:00
Nicolas
e5b797549e Merge branch 'main' into feat/scrape-monitoring 2024-07-25 16:21:02 -04:00
Nicolas
50d2426fc4 Update scrape-events.ts 2024-07-25 16:20:29 -04:00
Nicolas
a75d6889c7
Merge pull request #450 from mendableai/feat/logger
[wip] Added logger
2024-07-25 14:40:19 -04:00
rafaelsideguide
1f1c068eea changing from error to debug 2024-07-25 10:00:50 -03:00
rafaelsideguide
e720e1bacf Merge remote-tracking branch 'origin/main' into feat/logger 2024-07-25 09:49:27 -03:00
rafaelsideguide
309728a482 updated logs 2024-07-25 09:48:06 -03:00
Nicolas
2c1221750b
Merge pull request #449 from mendableai/bugfix/malformed-url-sitemap
Added regex for links in sitemap
2024-07-24 20:37:35 -04:00
Nicolas
6ad7e24403 Update ingestion.tsx 2024-07-24 18:15:51 -04:00
Nicolas
92843a356d Merge branch 'main' of https://github.com/mendableai/firecrawl 2024-07-24 18:13:36 -04:00
Nicolas
1e13ddbe8e Nick: changes to the ui component 2024-07-24 18:13:34 -04:00
Gergő Móricz
623b547292 fix(fly.toml): scale up memory limit 2024-07-24 23:39:00 +02:00
Nicolas
15890772be Scale bump 2024-07-24 16:56:19 -04:00
Eric Ciarla
a4bccbe3bb
Firecrawl UI Template
Firecrawl UI template
2024-07-24 15:05:55 -04:00
Eric Ciarla
a62c0730c1 Delete package-lock.json 2024-07-24 15:00:19 -04:00
Eric Ciarla
4cb091ad05 Update .gitignore 2024-07-24 14:59:34 -04:00
Eric Ciarla
4596d0b2e6 Add ReadMe and LICENSE 2024-07-24 14:56:53 -04:00
Eric Ciarla
9654721bf2 Vite commit 2024-07-24 14:27:50 -04:00
rafaelsideguide
cc98f83fda added failed and completed log events 2024-07-24 15:25:36 -03:00
Gergo Moricz
60c74357df feat(ScrapeEvents): log queue events 2024-07-24 18:44:14 +02:00
rafaelsideguide
4eca6bd301 fix/check-for-auth-on-scrape-log 2024-07-24 12:54:14 -03:00
Nicolas
4ead89f983
Merge pull request #453 from mendableai/nsc/notion-fix
Notion Website Fixes
2024-07-24 11:40:19 -04:00
Nicolas
3a1b8a9797 Update website_params.ts 2024-07-24 11:04:47 -04:00
Nicolas
8b48ec8d30 Update website_params.ts 2024-07-24 11:02:20 -04:00
Gergo Moricz
4d35ad073c feat(monitoring/scrape): include url, worker, response_size 2024-07-24 16:43:39 +02:00
Gergo Moricz
64bcedeefc fix(monitoring): bad success check on scrape 2024-07-24 16:21:59 +02:00
Gergo Moricz
d57dbbd0c6 fix: add jobId for scrape 2024-07-24 15:18:12 +02:00
Gergo Moricz
71072fef3b fix(scrape-events): bad logic 2024-07-24 14:46:41 +02:00
Gergo Moricz
7cd9bf92e3 feat: scrape event logging to DB 2024-07-24 14:31:25 +02:00
Rafael Miller
5e728c1a4d
Update apps/api/src/scraper/WebScraper/crawler.ts
no need for regex

Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>
2024-07-24 08:33:00 -03:00
Eric Ciarla
1b7a00624d Delete old comp 2024-07-23 21:51:08 -04:00
Eric Ciarla
565bc09439 Basic react app 2024-07-23 21:48:11 -04:00
rafaelsideguide
6208ecdbc0 added logger 2024-07-23 17:30:46 -03:00
Eric Ciarla
a0d89169ed init 2024-07-23 15:48:12 -04:00
Nicolas
f0b07b509b Update index.ts 2024-07-23 15:15:56 -04:00
rafaelsideguide
a684bd3c5d added regex for links in sitemap 2024-07-23 09:07:23 -03:00
Nicolas
252bc09ee2
Merge pull request #447 from mendableai/nsc/speed-improvements
/scrape should now be 600ms-900ms faster
2024-07-22 19:18:24 -04:00
Nicolas
ac692ef09c Update CONTRIBUTING.md 2024-07-22 19:17:53 -04:00
Nicolas
30e706b43f Update scrape.ts 2024-07-22 19:15:24 -04:00
Nicolas
8916fec66c Update index.ts 2024-07-22 19:14:53 -04:00
Nicolas
575ddc9e6e Update scrape.ts 2024-07-22 19:12:51 -04:00
Nicolas
e31a5007d5 Nick: speed improvements 2024-07-22 18:30:58 -04:00
Nicolas
1bc36e1a56 Update fly-direct.yml 2024-07-22 14:12:55 -04:00
Nicolas
b229fbebd8 Update scrape_log.ts 2024-07-19 12:53:26 -04:00
rafaelsideguide
5c02dbe20c fix(isFile): added .tiff extension 2024-07-18 17:07:21 -03:00
Gergo Moricz
f0e95ce399 fix(WebCrawler): filter out file URLs when taking URLs from sitemap 2024-07-18 21:49:37 +02:00
Gergo Moricz
95c6c63b85 fix(fly): raise heap limit to 4G per process 2024-07-18 20:56:54 +02:00
Nicolas
5f14f4f788 Update blocklist.ts 2024-07-18 14:20:19 -04:00
Nicolas
6161b83890 Update scrape_log.ts 2024-07-18 14:17:08 -04:00