This PR adds the ability to split a document into several parts before synOCR processing. The original document is searched for a given pattern. If the pattern is found, the document is split at every page on which it occurs.
The pattern search uses "pdfToText". The original document is split with qpdf (which is already contained in OCRMyPDF).
Example log output:
Searching for document split pattern in document SCN_0005.pdf
0 split pages detected in file SCN_0005.pdf
Searching for document split pattern in document test.pdf
3 split pages detected in file test.pdf
splitting pdf: pages 9-z into test-4.pdf
splitting pdf: pages 6-7 into test-3.pdf
splitting pdf: pages 4-4 into test-2.pdf
splitting pdf: pages 1-2 into test-1.pdf
document split processing finished
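
The page ranges in the log above can be reproduced with a small sketch. It is not the code from this PR, only an illustration under these assumptions: the extracted text separates pages with form feed characters (as pdftotext output does), pages containing the pattern act as separators and are left out of the resulting parts (inferred from the ranges in the log), and `z` is qpdf's notation for the last page. The function names `find_split_pages` and `page_ranges`, the pattern string, and the page count of 10 are hypothetical.

```python
def find_split_pages(text: str, pattern: str) -> list[int]:
    """Return the 1-based numbers of all pages whose text contains the pattern.

    Assumes pages in `text` are separated by form feed characters.
    """
    pages = text.split("\f")
    return [number for number, page in enumerate(pages, start=1) if pattern in page]


def page_ranges(split_pages: list[int], num_pages: int) -> list[str]:
    """Build qpdf-style page ranges for the parts between separator pages.

    Separator pages themselves are dropped (an assumption based on the log).
    The final range ends in "z", qpdf's marker for the last page.
    """
    ranges = []
    start = 1
    for separator in sorted(set(split_pages)):
        if separator > start:
            # Pages from `start` up to the page before the separator form one part.
            ranges.append(f"{start}-{separator - 1}")
        start = separator + 1
    if start <= num_pages:
        ranges.append(f"{start}-z")
    return ranges


if __name__ == "__main__":
    # Hypothetical 10-page document with separator pages 3, 5 and 8.
    for index, page_range in enumerate(page_ranges([3, 5, 8], 10), start=1):
        print(f"pages {page_range} into test-{index}.pdf")
```

Each resulting range could then be passed to qpdf's `--pages` option to write one output file per part. Consecutive separator pages produce no empty part, and a separator on the last page produces no trailing part.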