Yanlong Wang | 5d255dda3b | chore: update deps | 2024-04-19 09:30:19 +08:00
Charuka Samarakoon | d47310a6f7 | fix: allocating incorrect max value due to missing parentheses (#26) | 2024-04-19 09:01:23 +08:00
yanlong.wang | d4ca381c38 | fix: explicitly reject non http protocols | 2024-04-18 15:35:06 +08:00
yanlong.wang | abc817e960 | feat: block media resources to improve speed | 2024-04-18 15:06:28 +08:00
yanlong.wang | cbc13ecbbd | fix: catch turndown errors | 2024-04-18 13:51:54 +08:00
yanlong.wang | 0975b35ca2 | chore: turn up concurrency a bit base on analysis | 2024-04-18 11:53:55 +08:00
yanlong.wang | a211366501 | fix: expose publishedTime if possible | 2024-04-17 12:36:36 +08:00
Yanlong Wang | 6e36f0a447 | fix: url wrong normalization | 2024-04-17 09:55:41 +08:00
Yanlong Wang | 781b835466 | fix: keep url details | 2024-04-17 09:48:26 +08:00
Yanlong Wang | 11a5a90611 | fix: favor nominal url over real url | 2024-04-17 09:30:49 +08:00
Yanlong Wang | bda7e76e50 | chore: increase max instances to target 10k concurrent requests | 2024-04-17 09:22:26 +08:00
Yanlong Wang | 50ed9cc248 | feat: fallback to google archive (#16) | 2024-04-16 09:17:45 -07:00
    * feat: fallback to google archive
    * fix
yanlong.wang | 8a2b095bd7 | fix: give expireAt for image cache | 2024-04-16 15:46:05 +08:00
Han Xiao | b3fb4c5c57 | feat: add image captioning (#6) | 2024-04-15 20:51:31 -07:00
    * Fix contentText assignment in CrawlerHost class
    * fix: recover vscode configurations
    * feat: add image captioning
    * feat: add image captioning
    * clean: vscode config
    * chore: fix some ts warnings
    * feat: auto alt text
    * fix
    * chore: improve prompt
    * clean: unused config
    * fix: failure condition
    * fix: remove redundant code
    * fix: catch parse error
    * fix: catch parse error
    ---------
    Co-authored-by: Yanlong Wang <yanlong.wang@naiver.org>
Han Xiao | 18373626b2 | fix: catch parse error | 2024-04-15 19:27:40 -07:00
Han Xiao | 9b190127aa | fix: clean broken markdown | 2024-04-13 21:40:51 -07:00
Han Xiao | ef23d810f8 | feat: clean broken markdown | 2024-04-13 19:21:35 -07:00
Han Xiao | 8378cb06ee | chore: rename url2text to reader | 2024-04-13 12:25:42 -07:00
Han Xiao | e050a5bffa | Merge remote-tracking branch 'origin/main' | 2024-04-13 11:42:21 -07:00
Han Xiao | 8e241c7f5a | chore: rename url2text to reader | 2024-04-13 11:42:15 -07:00
Yanlong Wang | dbeb69582a | puppeteer stealth | 2024-04-13 22:27:50 +08:00
Yanlong Wang | 33d7cfc41c | fix | 2024-04-13 08:25:52 +08:00
Yanlong Wang | 95799988da | fix: use gpt bot UA | 2024-04-13 08:13:50 +08:00
Yanlong Wang | 950338261a | fix | 2024-04-13 08:07:55 +08:00
Yanlong Wang | 5199b00eeb | fix | 2024-04-13 08:04:07 +08:00
Yanlong Wang | 5ed3f90b9c | fix | 2024-04-13 07:53:58 +08:00
Yanlong Wang | be7eeec11b | fix | 2024-04-12 14:17:30 +08:00
Yanlong Wang | 2da1b7f3a5 | fix | 2024-04-12 14:17:04 +08:00
Yanlong Wang | fdd8a8aa8d | fix | 2024-04-12 12:27:42 +08:00
Yanlong Wang | 78c8444096 | fix | 2024-04-12 10:59:37 +08:00
Yanlong Wang | 629ab270be | fix | 2024-04-12 10:24:56 +08:00
Yanlong Wang | 664d4b1c9f | fix | 2024-04-12 09:25:19 +08:00
Han Xiao | 2dc0850c8c | chore: rename url2text to reader | 2024-04-11 15:44:12 -07:00
Han Xiao | c1743db305 | chore: clean code | 2024-04-11 15:29:57 -07:00
yanlong.wang | b29a569d39 | fix | 2024-04-11 19:20:17 +08:00
yanlong.wang | 7e366aca68 | fix | 2024-04-11 19:12:07 +08:00
yanlong.wang | a9426341f6 | fix | 2024-04-11 19:06:45 +08:00
yanlong.wang | 5cfb78b275 | gfm | 2024-04-11 19:06:06 +08:00
yanlong.wang | 9d0d54e511 | fix | 2024-04-11 19:00:27 +08:00
yanlong.wang | e17ef6dba0 | fix | 2024-04-11 18:28:51 +08:00
yanlong.wang | 77174f1511 | fix | 2024-04-11 17:24:42 +08:00
yanlong.wang | 94e65381bd | fix | 2024-04-11 17:14:41 +08:00
yanlong.wang | b2f8b11cdc | wip | 2024-04-10 19:57:00 +08:00
yanlong.wang | b46e859a30 | wip | 2024-04-10 19:43:53 +08:00
yanlong.wang | 89d6d49f06 | wip | 2024-04-10 19:32:07 +08:00