[TIKA-2705] Allow configuration of TesseractOCRParser as we do for other parsers - Tika - [issue]
...It would be handy to be able to configure tesseract via our regular tika-config set up....    Author: Tim Allison, 2019-10-21, 13:30
[TIKA-2851] Upgrade to POI 4.1.1 when available - Tika - [issue]
...There were some regressions in POI's recent 4.1.0 release in EMF/WMF handling.  Unless there are other higher priority reasons to upgrade to 4.1.0, I propose we wait for 4.1.1. My apolo...    Author: Tim Allison, 2019-10-21, 21:19
[TIKA-2779] Integrate/parameterize new rotated text handling in PDFBox - Tika - [issue]
...PDFBOX-4371 ... thank you Tilman Hausherr!...    Author: Tim Allison, 2019-10-08, 17:35
[TIKA-2967] Handle digital signature data uniformly across at least PDF and ooxml - Tika - [issue]
...There are some inconsistencies in how we handle digital signature data between PDF and ooxml.  My sense is that this info belongs in the metadata, not in the xhtml (even if clearly mar...    Author: Tim Allison, 2019-10-17, 12:36
[TIKA-2965] Add a metadata flag for XFA and XMP in PDFs - Tika - [issue]
...It would be useful to be able to determine which PDFs in a given collection contain XFA and/or XMP.  Let's add a metadata flag for those embedded files....    Author: Tim Allison, 2019-10-17, 19:36
[TIKA-3049] Improve file detection...varia - Tika - [issue]
...I recently crawled a few bugzilla issue trackers to add files to our regression corpus.  I noticed that bugzilla is able to identify the mime types of a few file types that we're not, a...    Author: Tim Allison, 2020-02-20, 21:32
[TIKA-3026] Consider extracting structure/tags where possible in PDFs with the PDFMarkedContentExtractor - Tika - [issue]
...Some PDFs contain tags that may be useful in understanding the structure of the elements within a PDF, e.g. table markup, paragraph breaks, headers, etc.  The quality of the tags depends e...    Author: Tim Allison, 2020-02-24, 19:02
[TIKA-3050] Add xmp extraction to psd files - Tika - [issue]    Author: Tim Allison, 2020-02-25, 08:22
[TIKA-3047] Upgrade to POI 4.1.2 - Tika - [issue]
...Now available at a maven repo near you!  Thank you Andreas Beeker for running the release!...    Author: Tim Allison, 2020-02-25, 08:22
[TIKA-3033] Upgrade to PDFBox 2.0.19 when available - Tika - [issue]    Author: Tim Allison, 2020-02-25, 08:22