[TIKA-711] Word parser doesn't extract optional hyphen correctly - Tika - [issue]
...We seem not to extract the optional hyphen character correctly in the Word parser. You can create this char in Word by typing Ctrl and -. It's hidden, normally; you have to turn on displa...    Author: Michael McCandless , 2011-10-03, 18:26
[TIKA-712] Master slide text isn't extracted - Tika - [issue]
...It looks like we are not getting text from the master slide for PPT and PPTX....    Author: Michael McCandless , 2015-03-02, 05:07
[TIKA-714] Word art isn't extracted for various doc types - Tika - [issue]    Author: Michael McCandless , 2011-11-06, 11:15
[TIKA-715] Some parsers produce non-well-formed XHTML SAX events - Tika - [issue]
...With TIKA-683 I committed simple, commented-out code to SafeContentHandler, to verify that the SAX events produced by the parser have valid (matched) tags. I.e., each startElement("foo") i...    Author: Michael McCandless , 2017-12-08, 18:49
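The matched-tags check described in TIKA-715 is a classic stack discipline: every start tag is pushed, and every end tag must close the most recently opened element. Below is a minimal sketch in Python rather than Tika's Java, assuming the SAX events are available as a list of ("start" | "end", name) pairs; the function `check_matched_tags` and that event representation are hypothetical and are not SafeContentHandler's actual API.

```python
from typing import Iterable, List, Tuple

# Hypothetical event representation: ("start", name) or ("end", name),
# in the order startElement/endElement would be called on a ContentHandler.
Event = Tuple[str, str]


def check_matched_tags(events: Iterable[Event]) -> None:
    """Raise ValueError unless every start tag is closed by a matching
    end tag in last-opened, first-closed order."""
    open_elements: List[str] = []  # stack of currently open element names
    for kind, name in events:
        if kind == "start":
            open_elements.append(name)
        elif kind == "end":
            if not open_elements:
                raise ValueError(f"endElement({name!r}) with no open element")
            expected = open_elements.pop()
            if name != expected:
                raise ValueError(
                    f"endElement({name!r}) does not match startElement({expected!r})"
                )
        else:
            raise ValueError(f"unknown event kind {kind!r}")
    if open_elements:
        raise ValueError(f"elements never closed: {open_elements}")


if __name__ == "__main__":
    # Properly nested: passes silently.
    check_matched_tags([("start", "html"), ("start", "p"), ("end", "p"), ("end", "html")])
    # Mis-nested: <b> is still open when </p> arrives.
    try:
        check_matched_tags([("start", "p"), ("start", "b"), ("end", "p")])
    except ValueError as err:
        print(err)  # endElement('p') does not match startElement('b')
```

A stack is enough here because well-formed XHTML only requires proper nesting; the check does not validate element names against the XHTML schema.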
[TIKA-717] Comment/annotation is sometimes not extracted - Tika - [issue]
...When I add an annotation or comment, I see that comment extracted for PPTX, DOC, DOCX and PDF (in one case), but not for RTF, PPT and PDF (in a different case). I think I'll just commit the test ...    Author: Michael McCandless , 2011-10-03, 10:54
[TIKA-718] PDF bookmark text isn't extracted - Tika - [issue]    Author: Michael McCandless , 2011-10-18, 19:12
[TIKA-720] EBCDIC encoding not detected - Tika - [issue]
...I have a test file encoded in EBCDIC, but Tika fails to detect it. Not sure we can realistically fix this; I have no idea how (and, realistically, one really ought to convert out of EBCDIC on ...    Author: Michael McCandless , 2011-10-20, 12:34
[TIKA-721] UTF16-LE not detected - Tika - [issue]
...I have a test file encoded in UTF16-LE, but Tika fails to detect it. Note that it is missing the BOM, which is not allowed (for UTF16-BE the BOM is optional). Not sure we can realistically fix ...    Author: Michael McCandless , 2017-10-27, 12:07
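Without a BOM, one common heuristic for mostly Latin-script text is to look at where the zero bytes fall: characters below U+0100 encode as one data byte plus one 0x00 byte, and the 0x00 comes second in little-endian order and first in big-endian order. The sketch below assumes exactly that kind of input; the function `guess_bomless_utf16` and its `threshold` parameter are hypothetical illustrations, not Tika's encoding detector, and the heuristic will not work for scripts such as CJK whose code units rarely contain a zero byte.

```python
from typing import Optional


def guess_bomless_utf16(data: bytes, threshold: float = 0.9) -> Optional[str]:
    """Guess UTF-16LE or UTF-16BE for BOM-less input by counting zero bytes.

    Assumes mostly Latin-script text (code units below U+0100), so one byte
    of each 2-byte unit is 0x00.
    """
    pairs = len(data) // 2
    if pairs == 0:
        return None
    # Zero bytes at even offsets suggest big-endian; at odd offsets, little-endian.
    even_zero = sum(1 for i in range(0, 2 * pairs, 2) if data[i] == 0)
    odd_zero = sum(1 for i in range(1, 2 * pairs, 2) if data[i] == 0)
    if odd_zero / pairs >= threshold and even_zero / pairs <= 1 - threshold:
        return "UTF-16LE"
    if even_zero / pairs >= threshold and odd_zero / pairs <= 1 - threshold:
        return "UTF-16BE"
    return None  # no clear pattern: probably not UTF-16, or not Latin-script text


if __name__ == "__main__":
    sample = "Here is some text"
    print(guess_bomless_utf16(sample.encode("utf-16-le")))  # UTF-16LE
    print(guess_bomless_utf16(sample.encode("utf-16-be")))  # UTF-16BE
    print(guess_bomless_utf16(sample.encode("ascii")))      # None
```

Python's `utf-16-le` and `utf-16-be` codecs do not write a BOM, so the usage lines reproduce the BOM-less situation described in the issue.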
[TIKA-722] Arabic PDF doesn't extract correctly - Tika - [issue]
...I have a PDF w/ Arabic font that Tika fails to extract (gets all gibberish). Looks like the PDF does not include the separate Unicode text metadata (hmm: would Tika extract that if it were pres...    Author: Michael McCandless , 2014-06-15, 16:07
[TIKA-724] PDF text sometimes has extra space between letters - Tika - [issue]
...I have a PDF with simple text "Here is some formatted text", but when I extract with Tika I get extra spaces inserted: "H e re  i s  so me  fo rma tte d  te x t". When I create...    Author: Michael McCandless , 2012-02-10, 10:52