Han Xiao
|
ae788c39c5
|
docs: document header usage
|
2024-04-24 17:28:55 +02:00 |
|
yanlong.wang
|
94a72052f4
|
fix: reduce frequency of screenshot if possible
|
2024-04-24 19:43:24 +08:00 |
|
yanlong.wang
|
ae99af50aa
|
Merge branch 'main' of github.com:jina-ai/url2text
|
2024-04-24 19:21:50 +08:00 |
|
yanlong.wang
|
230388529e
|
bump: deps
|
2024-04-24 19:21:44 +08:00 |
|
Yanlong Wang
|
7ee2c327a3
|
refactor: reorganize features (#37)
* wip
* fix
* wip
* cleanup
* fix
* fix
* cache: may rescue using stale cache
* fix: target 384mb ram per page
* fix: log about pool size
* fix
* clean
* fix: cache and snapshot reporting
|
2024-04-24 19:21:12 +08:00 |
|
dependabot[bot]
|
e36d3b0f24
|
chore(deps): bump protobufjs and firebase-admin in /backend/functions (#35)
Bumps [protobufjs](https://github.com/protobufjs/protobuf.js) to 7.2.6 and updates ancestor dependency [firebase-admin](https://github.com/firebase/firebase-admin-node). These dependencies need to be updated together.
Updates `protobufjs` from 7.2.4 to 7.2.6
- [Release notes](https://github.com/protobufjs/protobuf.js/releases)
- [Changelog](https://github.com/protobufjs/protobuf.js/blob/master/CHANGELOG.md)
- [Commits](https://github.com/protobufjs/protobuf.js/compare/protobufjs-v7.2.4...protobufjs-v7.2.6)
Updates `firebase-admin` from 11.11.1 to 12.1.0
- [Release notes](https://github.com/firebase/firebase-admin-node/releases)
- [Commits](https://github.com/firebase/firebase-admin-node/compare/v11.11.1...v12.1.0)
---
updated-dependencies:
- dependency-name: protobufjs
  dependency-type: indirect
- dependency-name: firebase-admin
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
|
2024-04-24 16:37:38 +08:00 |
|
yanlong.wang
|
c5bc474964
|
cleanup: remove top level package lock
|
2024-04-24 16:34:14 +08:00 |
|
Yanlong Wang
|
4b208f44b5
|
fix: process not quitting on errors
|
2024-04-21 10:17:05 +08:00 |
|
Han Xiao
|
17415ed1f1
|
docs: fix readme image
|
2024-04-20 23:27:42 +02:00 |
|
Yanlong Wang
|
5d255dda3b
|
chore: update deps
|
2024-04-19 09:30:19 +08:00 |
|
Charuka Samarakoon
|
d47310a6f7
|
fix: allocating incorrect max value due to missing parentheses (#26)
|
2024-04-19 09:01:23 +08:00 |
|
yanlong.wang
|
d4ca381c38
|
fix: explicitly reject non http protocols
|
2024-04-18 15:35:06 +08:00 |
|
yanlong.wang
|
abc817e960
|
feat: block media resources to improve speed
|
2024-04-18 15:06:28 +08:00 |
|
yanlong.wang
|
cbc13ecbbd
|
fix: catch turndown errors
|
2024-04-18 13:51:54 +08:00 |
|
Han Xiao
|
6ee0f2de75
|
docs: update streaming mode
|
2024-04-17 21:53:20 -07:00 |
|
Han Xiao
|
3557cba48d
|
docs: update explanation of streaming mode
|
2024-04-17 21:48:42 -07:00 |
|
yanlong.wang
|
0975b35ca2
|
chore: turn up concurrency a bit based on analysis
|
2024-04-18 11:53:55 +08:00 |
|
yanlong.wang
|
a211366501
|
fix: expose publishedTime if possible
|
2024-04-17 12:36:36 +08:00 |
|
Yanlong Wang
|
6e36f0a447
|
fix: url wrong normalization
|
2024-04-17 09:55:41 +08:00 |
|
Yanlong Wang
|
781b835466
|
fix: keep url details
|
2024-04-17 09:48:26 +08:00 |
|
Yanlong Wang
|
11a5a90611
|
fix: favor nominal url over real url
|
2024-04-17 09:30:49 +08:00 |
|
Yanlong Wang
|
bda7e76e50
|
chore: increase max instances to target 10k concurrent requests
|
2024-04-17 09:22:26 +08:00 |
|
Yanlong Wang
|
50ed9cc248
|
feat: fallback to google archive (#16)
* feat: fallback to google archive
* fix
|
2024-04-16 09:17:45 -07:00 |
|
yanlong.wang
|
8a2b095bd7
|
fix: give expireAt for image cache
|
2024-04-16 15:46:05 +08:00 |
|
Han Xiao
|
4f284f51b6
|
docs: update readme
|
2024-04-15 21:50:34 -07:00 |
|
Han Xiao
|
b3fb4c5c57
|
feat: add image captioning (#6)
* Fix contentText assignment in CrawlerHost class
* fix: recover vscode configurations
* feat: add image captioning
* feat: add image captioning
* clean: vscode config
* chore: fix some ts warnings
* feat: auto alt text
* fix
* chore: improve prompt
* clean: unused config
* fix: failure condition
* fix: remove redundant code
* fix: catch parse error
* fix: catch parse error
---------
Co-authored-by: Yanlong Wang <yanlong.wang@naiver.org>
|
2024-04-15 20:51:31 -07:00 |
|
Han Xiao
|
18373626b2
|
fix: catch parse error
|
2024-04-15 19:27:40 -07:00 |
|
Han Xiao
|
3134a59d8f
|
chore: update readme
|
2024-04-15 17:23:16 -07:00 |
|
Han Xiao
|
9b190127aa
|
fix: clean broken markdown
|
2024-04-13 21:40:51 -07:00 |
|
Han Xiao
|
7fc30dd003
|
docs: explain stream mode
|
2024-04-13 19:27:10 -07:00 |
|
Han Xiao
|
af2775d1aa
|
docs: explain stream mode
|
2024-04-13 19:25:51 -07:00 |
|
Han Xiao
|
ef23d810f8
|
feat: clean broken markdown
|
2024-04-13 19:21:35 -07:00 |
|
Han Xiao
|
c7c039aeb1
|
docs: fix readme
|
2024-04-13 13:13:24 -07:00 |
|
Han Xiao
|
da8934cb9a
|
chore: rename url2text to reader
|
2024-04-13 12:55:07 -07:00 |
|
Han Xiao
|
b6b9d39734
|
chore: rename url2text to reader
|
2024-04-13 12:51:36 -07:00 |
|
Han Xiao
|
ad7d95f1fe
|
chore: rename url2text to reader
|
2024-04-13 12:50:45 -07:00 |
|
Han Xiao
|
d1d9c1e4b4
|
chore: rename url2text to reader
|
2024-04-13 12:47:55 -07:00 |
|
Han Xiao
|
747b9cd1a4
|
chore: rename url2text to reader
|
2024-04-13 12:42:40 -07:00 |
|
Han Xiao
|
4269486836
|
chore: rename url2text to reader
|
2024-04-13 12:41:38 -07:00 |
|
Han Xiao
|
86ba571e48
|
chore: rename url2text to reader
|
2024-04-13 12:39:00 -07:00 |
|
Han Xiao
|
d7fbc41ba2
|
chore: rename url2text to reader
|
2024-04-13 12:33:51 -07:00 |
|
Han Xiao
|
8378cb06ee
|
chore: rename url2text to reader
|
2024-04-13 12:25:42 -07:00 |
|
Han Xiao
|
eaaaf773df
|
chore: rename url2text to reader
|
2024-04-13 12:22:36 -07:00 |
|
Han Xiao
|
e050a5bffa
|
Merge remote-tracking branch 'origin/main'
|
2024-04-13 11:42:21 -07:00 |
|
Han Xiao
|
8e241c7f5a
|
chore: rename url2text to reader
|
2024-04-13 11:42:15 -07:00 |
|
Yanlong Wang
|
dbeb69582a
|
puppeteer stealth
|
2024-04-13 22:27:50 +08:00 |
|
Yanlong Wang
|
33d7cfc41c
|
fix
|
2024-04-13 08:25:52 +08:00 |
|
Yanlong Wang
|
95799988da
|
fix: use gpt bot UA
|
2024-04-13 08:13:50 +08:00 |
|
Yanlong Wang
|
950338261a
|
fix
|
2024-04-13 08:07:55 +08:00 |
|
Yanlong Wang
|
5199b00eeb
|
fix
|
2024-04-13 08:04:07 +08:00 |
|