Elian Doran
1f0a6b4a79
feat(ocr): add OCR (#5834)
2026-04-03 11:35:36 +03:00
Elian Doran
e539b11718
chore(ocr): upgrade to officeprocessor v6 to avoid pdfjs issues
2026-04-03 11:11:53 +03:00
Elian Doran
2fca8c3850
fix(build): missing pdfjs-dist
2026-04-03 10:33:19 +03:00
Elian Doran
0d3f70a231
chore(server): try to bypass officeparser PDFjs issue
2026-04-03 10:02:54 +03:00
Elian Doran
a3a52aaafe
chore(ocr): switch to unpdf due to issues with pdfjs-dist
2026-04-03 09:22:56 +03:00
Elian Doran
a6c4401973
chore(server): remove pdf-parse dependency
2026-04-03 09:04:56 +03:00
Elian Doran
2e34ec2a17
chore(server): remove sharp from externals
2026-04-03 09:04:04 +03:00
Elian Doran
927afec83c
chore(ocr): remove multi-page TIFF support for now to remove dependency to sharp
2026-04-03 08:50:02 +03:00
renovate[bot]
4f571fc3d7
fix(deps): update dependency i18next to v26.0.2
2026-04-03 00:49:43 +00:00
Elian Doran
9878f76f65
fix(ocr): sharp failing on Alpine
2026-04-02 22:56:22 +03:00
Elian Doran
23799562ae
refactor(ocr): reuse office processor for PDFs
2026-04-02 22:53:57 +03:00
Elian Doran
f441a145b5
fix(server): prod not starting due to bundling issues
2026-04-02 22:42:53 +03:00
Elian Doran
7189764916
chore(ocr): support overriding cache dir
2026-04-02 22:00:37 +03:00
Elian Doran
70bc707e3a
chore(ocr): address requested changes
2026-04-02 21:58:54 +03:00
Elian Doran
90215bde8b
chore(ocr): remove unnecessary index
2026-04-02 21:55:07 +03:00
Elian Doran
2b3ae5285b
test(server): update integration DB to latest migration
2026-04-02 21:49:19 +03:00
Elian Doran
9b6d0db5b6
test(server): fix outdated tests in search result
2026-04-02 21:48:42 +03:00
Elian Doran
723da88ff8
chore(ocr): disable auto-processing by default
2026-04-02 21:46:05 +03:00
Elian Doran
5bcf2f4356
chore(deps): remove deprecated types for tesseract
2026-04-02 21:34:32 +03:00
Elian Doran
82e723c915
test(ocr): fix broken tests
2026-04-02 21:27:46 +03:00
Elian Doran
3da416908d
feat(ocr): display content snippet in quick search
2026-04-02 21:04:18 +03:00
Elian Doran
d79d2e9ad2
fix(ocr): too many blob queries in search
2026-04-02 20:58:11 +03:00
Elian Doran
30ba36894d
chore(ocr): optimize search algorithm
OCRContentExpression now takes all tokens as an array (like NoteContentFulltextExp), iterates over the input note set from becca, and checks text representations in memory, issuing zero SQL queries.
parse.ts creates a single OCRContentExpression(tokens) instead of N separate instances.
The LIMIT 50 and the N+1 blob→note/attachment queries are gone entirely.
2026-04-02 20:54:22 +03:00
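The in-memory search described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the actual Trilium code. `OCRContentExpression` is named after the commit, but `NoteLike`, `NoteSet`, the `textRepresentation` field, and the "every token must match" rule are all hypothetical stand-ins for the real becca entities and search semantics.

```typescript
// Hypothetical minimal shapes; the real becca entities and NoteSet differ.
interface NoteLike {
  noteId: string;
  // Assumption: the OCR text representation is already loaded in memory.
  textRepresentation?: string;
}

class NoteSet {
  constructor(public notes: NoteLike[] = []) {}

  add(note: NoteLike): void {
    this.notes.push(note);
  }
}

class OCRContentExpression {
  private readonly tokens: string[];

  constructor(tokens: string[]) {
    // Lower-case once up front so every comparison is case-insensitive.
    this.tokens = tokens.map((token) => token.toLowerCase());
  }

  // Filters the input set in memory: no SQL, no per-note blob lookups.
  execute(inputNoteSet: NoteSet): NoteSet {
    const result = new NoteSet();
    for (const note of inputNoteSet.notes) {
      const text = note.textRepresentation?.toLowerCase();
      if (!text) continue; // nothing OCR'd for this note
      // Assumption: a note matches only if it contains every token.
      if (this.tokens.every((token) => text.includes(token))) {
        result.add(note);
      }
    }
    return result;
  }
}

// One expression for all tokens, rather than one expression per token.
const notes = new NoteSet([
  { noteId: "a", textRepresentation: "Invoice total 42 EUR" },
  { noteId: "b", textRepresentation: "Meeting notes" },
  { noteId: "c" },
]);
const expression = new OCRContentExpression(["invoice", "eur"]);
console.log(expression.execute(notes).notes.map((note) => note.noteId).join(",")); // a
```

Building a single expression over the whole token array means the note set is walked once per query instead of once per token, and because the check runs against data already in memory, there is no `LIMIT` to apply and no blob-to-owner query per hit.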
Elian Doran
b747402352
chore(ocr): get rid of costly ranking for OCR
2026-04-02 20:48:41 +03:00
Elian Doran
0398a9bda3
refactor(ocr): potential race condition with image upload
2026-04-02 20:40:17 +03:00
Elian Doran
72dff88384
refactor(ocr): get rid of unused routes and services
2026-04-02 20:34:37 +03:00
Elian Doran
0314a9755f
refactor(ocr): minor changes
2026-04-02 20:32:58 +03:00
Elian Doran
bc967b15b2
chore(server): fix accidental changes
2026-04-02 20:28:17 +03:00
Elian Doran
8ac686a19f
fix(ocr): TIFF overlapping with image processor
2026-04-02 20:26:31 +03:00
Elian Doran
aafecaa3a4
refactor(ocr): get rid of fake metadata
2026-04-02 20:24:31 +03:00
Elian Doran
bb23b08b15
refactor(ocr): get rid of unused clean up
2026-04-02 20:23:03 +03:00
Elian Doran
476396da53
refactor(ocr): deduplicate batch processing
2026-04-02 20:19:32 +03:00
Elian Doran
5112971848
refactor(ocr): reduce duplication
2026-04-02 20:17:24 +03:00
Elian Doran
2d852c38ec
feat(ocr): automatic processing of attachments
2026-04-02 20:00:55 +03:00
Elian Doran
f163cacddc
feat(ocr): integrate viewing attachment OCR
2026-04-02 19:51:11 +03:00
Elian Doran
6ecb1cb2b0
feat(settings): cross-reference OCR and language & region settings
2026-04-02 17:09:27 +03:00
Elian Doran
24fefe0711
refactor(ocr): remove unnecessary methods
2026-04-02 13:17:38 +03:00
Elian Doran
e5eba69d0d
fix(ocr): cannot handle image/tiff
2026-04-02 12:51:58 +03:00
Elian Doran
bdd2b7e317
fix(ocr): properly handle office MIME types
2026-04-02 12:41:45 +03:00
Elian Doran
ad29375975
chore(ocr): remove unimplemented logic
2026-04-02 12:36:10 +03:00
Elian Doran
cf73a4ef43
feat(llm): integrate with OCR
2026-04-02 12:16:17 +03:00
Elian Doran
60a2621928
chore(ocr): remove last extraction date
Wasn't useful because blobs are hash-identified
2026-04-02 12:04:27 +03:00
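The reasoning behind dropping the extraction date can be illustrated with a small sketch. If a blob's ID is derived from its content, a stored OCR result can never go stale: edited content produces a new ID, so "has this been OCR'd?" becomes a membership check rather than a timestamp comparison. Everything here is hypothetical: `blobIdFor`, `storeOcr`, and `needsOcr` are illustrative names, and the 32-bit FNV-1a hash over UTF-16 code units stands in for whatever hash Trilium actually uses, keeping the sketch dependency-free.

```typescript
// Hypothetical stand-in for a content hash: FNV-1a (32-bit) over UTF-16 code units.
// A real content-addressed store would use a cryptographic hash instead.
function blobIdFor(content: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < content.length; i++) {
    hash ^= content.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Hypothetical OCR store keyed by blob ID; note there is no timestamp field.
const ocrTextByBlobId = new Map<string, string>();

function storeOcr(content: string, ocrText: string): string {
  const blobId = blobIdFor(content);
  ocrTextByBlobId.set(blobId, ocrText);
  return blobId;
}

// Same content => same blob ID => the stored text is still valid.
// Changed content => new blob ID => nothing stored yet, so OCR runs again.
function needsOcr(content: string): boolean {
  return !ocrTextByBlobId.has(blobIdFor(content));
}

storeOcr("scanned page v1", "Hello world");
console.log(needsOcr("scanned page v1")); // false
console.log(needsOcr("scanned page v2")); // true
```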
Elian Doran
b4e5d9dbc2
feat(ocr): not well integrate with sync
2026-04-02 11:43:19 +03:00
Elian Doran
722efd74c2
fix(ocr): default confidence level is too low
2026-04-02 11:06:58 +03:00
Elian Doran
5dc9b6defe
chore(ocr): deduplicate & fix percentage for confidence in log
2026-04-02 11:04:26 +03:00
Elian Doran
605fbaaa4a
fix(ocr): automatic OCR not respecting language
2026-04-02 11:01:20 +03:00
Elian Doran
23b46865c5
refactor(ocr): simplify initialization of image processor
2026-04-02 10:59:58 +03:00
Elian Doran
ac310eaaf5
feat(ocr): handle cache dir properly
2026-04-02 10:54:15 +03:00
Elian Doran
44a5dccd61
chore(ocr): remove master switch
2026-04-02 10:22:34 +03:00
Elian Doran
64318c92e7
fix(ocr): route default interfering with content language
2026-04-02 10:00:12 +03:00