[TIKA-735] OpenOffice parser: embedded OLE docs are extracted at the end, as extra <html>...</html> - Tika - [issue]
...When I have an OpenOffice presentation (ODP) that embeds (OLE) objects, in this case OpenOffice text, text from the embedded objects is at the end of the presentation. It's great that we are ex...    Author: Michael McCandless, 2011-10-01, 18:12
[TIKA-736] OpenOffice parser: master footer text isn't extracted - Tika - [issue]
...If I edit the footer text on the master slide of an OpenOffice presentation, I see that text rendered on the slide, but it's not extracted by Tika. Digging into the document, curiously the fo...    Author: Michael McCandless, 2013-12-17, 16:35
[TIKA-692] TikaCLI -x or -h on a Word doc sometimes adds newline after </b> tag - Tika - [issue]
...[Note: spinoff from the tika-dev thread "Issue in text extraction in Solr / Tika" on Aug 19 2011, by nirnaydewan] When parsing a Word doc where some contiguous text is bolded, due to difference...    Author: Michael McCandless, 2011-10-20, 12:34
[TIKA-909] ForkParser doesn't return Metadata - Tika - [issue]
...It's not such a big limitation, because you can eg see the metadata in the XHTML header... Spinoff from thread "ForkParser and Metadata" on tika-user    Author: Michael McCandless, 2012-11-04, 13:33
[TIKA-931] Tika's PDFParser fails to parse documents embedded in a PDF Package - Tika - [issue]
...In working on PDFBOX-1297, I realized Tika's PDFParser also doesn't visit documents embedded with a PDF document (ie a PDF package). Tika can actually handle this better than ExtractText since...    Author: Michael McCandless, 2012-05-22, 13:18
[TIKA-207] MS word doc containing tracked changes produces incorrect text - Tika - [issue]
...Spinoff from this discussion: extracting text from an MS Word doc (2003 format) that has u...    Author: Michael McCandless, 2018-02-28, 18:06
[TIKA-1628] ExternalParser.check should return false if it hits SecurityException - Tika - [issue]
...If you run Tika with a Java security manager that blocks execution of external processes, ExternalParser.check throws SecurityException, but I think it should just return false?...    Author: Michael McCandless, 2015-05-12, 21:52
[TIKA-1048] XMLParser should add whitespace between elements - Tika - [issue]
...If the incoming XML is compact (ie doesn't have whitespace between elements), I think we should somehow add whitespace between elements when extracting text?...    Author: Michael McCandless, 2013-01-06, 23:49
[TIKA-1067] Tika extracts non-existent asterisks (*) from .ppt files - Tika - [issue]
...I created a new blank presentation, put in title + subtitle, saved it as .ppt, and then ran TikaCLI -t: <body><div class="slideShow"><div class="slide"><p class="slide-ma...    Author: Michael McCandless, 2015-03-13, 23:35
[TIKA-1072] AIOOBE when handling embedded document in .doc file - Tika - [issue]
...I have a Word (.doc) document that hits an exception when I run: java -jar tika-app/target/tika-app-1.4-SNAPSHOT.jar /x/tmp/20-Force-on-a-current-S00.doc Here's the exception: Caused by: java....    Author: Michael McCandless, 2015-03-14, 00:44