[TIKA-2521] SAX-based docx/pptx should start a new line before second paragraph within a cell - Tika - [issue]    Author: Tim Allison , 2017-12-08, 18:55
[TIKA-2522] Trivial regression in MSWord parser -- not extracting Encite Add in text any more - Tika - [issue]
...In Tika 1.16, we used to extract runs starting with ADDIN EN.CITE <EndNote><Cite><Author>Swerdloff. Because of TIKA-2459, we no longer extract this text....    Author: Tim Allison , 2017-12-08, 18:07
[TIKA-2523] Regression in ppt parsing -- "typeface can't be null or empty" - Tika - [issue]
...We noticed a regression in ppt parsing in POI 3.17 while running the large scale regression tests in prep for the release of Tika 1.17.  There are about 200 new exceptions, but it looks...    Author: Tim Allison , 2017-12-11, 13:34
[TIKA-2525] Upgrade to POI 3.17.1 when available - Tika - [issue]
...There's a minor regression in ppt parsing in POI 3.17, and some areas for improvement for use with Java 9.  Once POI 3.17.1 is out, let's upgrade to that....    Author: Tim Allison , 2017-12-11, 13:35
[TIKA-2528] Fix key location, keys file and download link - Tika - [issue]
...Thanks to Sebb for pointing out areas for improvement in our release process:    Author: Tim Allison , 2017-12-23, 14:31
[TIKA-2533] Improve embedded image extraction in PDFs - Tika - [issue]
...PDFBOX-4043, Tilman Hausherr pinged us to fix a parallel bug in our extraction of images.  Given that we're copying/pasting from PDFBox's ExtractImages, we should fix that bug and consi...    Author: Tim Allison , 2017-12-21, 13:01
[TIKA-2553] Upgrade compiler definition to Java 8 - Tika - [issue]
...As discussed on the mailing list, and announced during the release of 1.17. Question: Do we want to do this for both 2.0.0 and 1.18 or 2.0.0 only?...    Author: Tim Allison , 2018-01-23, 21:47
[TIKA-2563] Extract embedded objects in HTML and javascript - Tika - [issue]
...Files (esp images) and other objects can be embedded in html/css/javascript with the data: uri scheme.  We should extract those like any other embedded file....    Author: Tim Allison , 2018-02-23, 14:11
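As a rough illustration of the data: URI scheme mentioned in TIKA-2563, the sketch below splits such a URI into its media type and decoded bytes. It is written in Python for brevity and is not Tika's implementation; the function name `decode_data_uri` is hypothetical, and the defaulting of an empty media type to text/plain follows the general data: URI convention rather than anything stated in the issue.

```python
import base64
import re
from urllib.parse import unquote_to_bytes

# Hypothetical helper for illustration only; not part of Tika.
# General shape of the scheme: data:[<media type>][;base64],<payload>
_DATA_URI = re.compile(r"data:(?P<meta>[^,]*),(?P<payload>.*)",
                       re.IGNORECASE | re.DOTALL)


def decode_data_uri(uri):
    """Return (media_type, raw_bytes) for a data: URI string."""
    match = _DATA_URI.fullmatch(uri.strip())
    if match is None:
        raise ValueError("not a data: URI")

    parts = match.group("meta").split(";")
    # A trailing ";base64" token marks a base64-encoded payload.
    is_base64 = parts[-1].strip().lower() == "base64"
    if is_base64:
        parts = parts[:-1]

    # Assumption: an omitted media type is treated as text/plain.
    media_type = parts[0].strip() or "text/plain"

    payload = match.group("payload")
    if is_base64:
        data = base64.b64decode(payload)
    else:
        # Without base64, the payload is percent-encoded.
        data = unquote_to_bytes(payload)
    return media_type, data
```

An extractor would typically locate such URIs in attributes like `src` or in CSS `url(...)` values, then hand the decoded bytes to the same handling used for any other embedded file.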
[TIKA-2600] Don't use md5 checksum due to changes to the release distribution policy - Tika - [issue]
...To plagiarize from PDFBOX-4142: The release distribution policy was changed with regard to the checksums to be used: Old policy: MUST provide a MD5-file; SHOULD provide a SHA-file [SHA-512 r...    Author: Tim Allison , 2018-03-07, 21:20
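For context on the SHA-512 checksum that TIKA-2600 refers to, here is a minimal sketch of computing one for a release artifact. It uses Python's standard `hashlib`; the function name `sha512_hex` and the chunk size are assumptions made for illustration, and this is not the project's actual release tooling.

```python
import hashlib


def sha512_hex(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of the file at `path`.

    Hypothetical helper for illustration; reads the file in chunks
    so large artifacts do not need to fit in memory.
    """
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting hex string is what would typically be published alongside the artifact for downloaders to compare against.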
[TIKA-1874] Fix rare NPE in XWPFListManager - Tika - [issue]
...Many thanks to Dominik Stadler's CommonCrawlDocumentDownload, I recently grabbed .docx files from the initial index that comes with that code.  I'll be adding these docs to our regular ...    Author: Tim Allison , 2016-02-25, 18:40