Text content and OCR #18

Closed
opened 2025-03-04 10:33:41 +00:00 by aonrud · 1 comment

V2 should be able to search the contents of documents.

Possible solution: on save, use pymupdf to extract the text content of each PDF page (falling back to OCR if a page has no text) and store it in an additional ItemPageText model.
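A minimal sketch of the proposed extraction step, written in Python because pymupdf is a Python library. `PageText` is a hypothetical stand-in for one row of the proposed ItemPageText model (its fields are assumptions). The OCR fallback uses PyMuPDF's `Page.get_textpage_ocr()`, which needs Tesseract installed. Wiring this into the model's save step is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Protocol


class PageLike(Protocol):
    """Anything with a PyMuPDF-style get_text() method."""

    def get_text(self) -> str: ...


@dataclass
class PageText:
    """Hypothetical stand-in for one ItemPageText row (fields are assumptions)."""

    page_number: int  # 1-based page index
    text: str
    from_ocr: bool


def extract_page_texts(
    pages: Iterable[PageLike],
    ocr: Optional[Callable[[PageLike], str]] = None,
) -> List[PageText]:
    """Return the embedded text of each page, using OCR only for empty pages."""
    results = []
    for number, page in enumerate(pages, start=1):
        text = page.get_text().strip()
        used_ocr = False
        if not text and ocr is not None:
            # No embedded text layer (e.g. a scanned page): fall back to OCR.
            text = ocr(page).strip()
            used_ocr = True
        results.append(PageText(number, text, used_ocr))
    return results


def pymupdf_ocr(page) -> str:
    """OCR a whole PyMuPDF page at 300 dpi (requires Tesseract)."""
    textpage = page.get_textpage_ocr(dpi=300, full=True)
    return page.get_text(textpage=textpage)


def texts_from_pdf(path: str) -> List[PageText]:
    """Open a PDF with PyMuPDF and extract per-page text."""
    import pymupdf  # imported lazily; older PyMuPDF releases use "import fitz"

    with pymupdf.open(path) as doc:
        # Iterating a Document yields its pages in order.
        return extract_page_texts(doc, ocr=pymupdf_ocr)
```

Keeping the page number with each row would let a later search feature link to the matching page; whether ItemPageText actually stores it is an assumption.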

aonrud (author) commented:

Closing in favour of #21. The search part should be a separate issue.

Reference: Irish-Left-Archive/ILAv2#18