Commit Graph

159 Commits

Author SHA1 Message Date
yanlong.wang
abc817e960
feat: block media resources to improve speed 2024-04-18 15:06:28 +08:00
yanlong.wang
cbc13ecbbd
fix: catch turndown errors 2024-04-18 13:51:54 +08:00
Han Xiao
6ee0f2de75
docs: update streaming mode 2024-04-17 21:53:20 -07:00
Han Xiao
3557cba48d
docs: update explain of streaming mode 2024-04-17 21:48:42 -07:00
yanlong.wang
0975b35ca2
chore: turn up concurrency a bit base on analysis 2024-04-18 11:53:55 +08:00
yanlong.wang
a211366501
fix: expose publishedTime if possible 2024-04-17 12:36:36 +08:00
Yanlong Wang
6e36f0a447
fix: url wrong normalization 2024-04-17 09:55:41 +08:00
Yanlong Wang
781b835466
fix: keep url details 2024-04-17 09:48:26 +08:00
Yanlong Wang
11a5a90611
fix: favor nominal url over real url 2024-04-17 09:30:49 +08:00
Yanlong Wang
bda7e76e50
chore: increase max instances to target 10k concurrent requests 2024-04-17 09:22:26 +08:00
Yanlong Wang
50ed9cc248
feat: fallback to google archive (#16) 2024-04-16 09:17:45 -07:00
* feat: fallback to google archive
* fix
yanlong.wang
8a2b095bd7
fix: give expireAt for image cache 2024-04-16 15:46:05 +08:00
Han Xiao
4f284f51b6 docs: update readme 2024-04-15 21:50:34 -07:00
Han Xiao
b3fb4c5c57
feat: add image captioning (#6) 2024-04-15 20:51:31 -07:00
* Fix contentText assignment in CrawlerHost class
* fix: recover vscode configurations
* feat: add image captioning
* feat: add image captioning
* clean: vscode config
* chore: fix some ts warnings
* feat: auto alt text
* fix
* chore: improve prompt
* clean: unused config
* fix: failure condition
* fix: remove redundant code
* fix: catch parse error
* fix: catch parse error

---------

Co-authored-by: Yanlong Wang <yanlong.wang@naiver.org>
Han Xiao
18373626b2 fix: catch parse error 2024-04-15 19:27:40 -07:00
Han Xiao
3134a59d8f chore: update readme 2024-04-15 17:23:16 -07:00
Han Xiao
9b190127aa fix: clean broken markdown 2024-04-13 21:40:51 -07:00
Han Xiao
7fc30dd003 docs: explain stream mode 2024-04-13 19:27:10 -07:00
Han Xiao
af2775d1aa docs: explain stream mode 2024-04-13 19:25:51 -07:00
Han Xiao
ef23d810f8 feat: clean broken markdown 2024-04-13 19:21:35 -07:00
Han Xiao
c7c039aeb1 docs: fix readme 2024-04-13 13:13:24 -07:00
Han Xiao
da8934cb9a chore: rename url2text to reader 2024-04-13 12:55:07 -07:00
Han Xiao
b6b9d39734 chore: rename url2text to reader 2024-04-13 12:51:36 -07:00
Han Xiao
ad7d95f1fe chore: rename url2text to reader 2024-04-13 12:50:45 -07:00
Han Xiao
d1d9c1e4b4 chore: rename url2text to reader 2024-04-13 12:47:55 -07:00
Han Xiao
747b9cd1a4 chore: rename url2text to reader 2024-04-13 12:42:40 -07:00
Han Xiao
4269486836 chore: rename url2text to reader 2024-04-13 12:41:38 -07:00
Han Xiao
86ba571e48 chore: rename url2text to reader 2024-04-13 12:39:00 -07:00
Han Xiao
d7fbc41ba2 chore: rename url2text to reader 2024-04-13 12:33:51 -07:00
Han Xiao
8378cb06ee chore: rename url2text to reader 2024-04-13 12:25:42 -07:00
Han Xiao
eaaaf773df chore: rename url2text to reader 2024-04-13 12:22:36 -07:00
Han Xiao
e050a5bffa Merge remote-tracking branch 'origin/main' 2024-04-13 11:42:21 -07:00
Han Xiao
8e241c7f5a chore: rename url2text to reader 2024-04-13 11:42:15 -07:00
Yanlong Wang
dbeb69582a
puppeteer stealth 2024-04-13 22:27:50 +08:00
Yanlong Wang
33d7cfc41c
fix 2024-04-13 08:25:52 +08:00
Yanlong Wang
95799988da
fix: use gpt bot UA 2024-04-13 08:13:50 +08:00
Yanlong Wang
950338261a
fix 2024-04-13 08:07:55 +08:00
Yanlong Wang
5199b00eeb
fix 2024-04-13 08:04:07 +08:00
Yanlong Wang
5ed3f90b9c
fix 2024-04-13 07:53:58 +08:00
Yanlong Wang
be7eeec11b
fix 2024-04-12 14:17:30 +08:00
Yanlong Wang
2da1b7f3a5
fix 2024-04-12 14:17:04 +08:00
Yanlong Wang
fdd8a8aa8d
fix 2024-04-12 12:27:42 +08:00
Yanlong Wang
78c8444096
fix 2024-04-12 10:59:37 +08:00
Yanlong Wang
629ab270be
fix 2024-04-12 10:24:56 +08:00
Yanlong Wang
664d4b1c9f
fix 2024-04-12 09:25:19 +08:00
Han Xiao
2dc0850c8c chore: rename url2text to reader 2024-04-11 15:44:12 -07:00
Han Xiao
c1743db305 chore: clean code 2024-04-11 15:29:57 -07:00
yanlong.wang
b29a569d39
fix 2024-04-11 19:20:17 +08:00
yanlong.wang
7e366aca68
fix 2024-04-11 19:12:07 +08:00
yanlong.wang
a9426341f6
fix 2024-04-11 19:06:45 +08:00