Search criteria: author:"Michael McCandless". Results 1 to 10 of 54 (0.0s).
[TIKA-738] Tika fails to extract text from PDF annotations - Tika - [issue]
...Spinoff from TIKA-717....
http://issues.apache.org/jira/browse/TIKA-738    Author: Michael McCandless , 2011-11-26, 19:54
[TIKA-742] PDF2XHTML fails to insert <p> nor space around page marker - Tika - [issue]
...I have a test document (unfortunately not committable) whose page numbers are rendered with no separator (<p> nor space) before the next word. So I have words like: 1Massachusetts ...
http://issues.apache.org/jira/browse/TIKA-742    Author: Michael McCandless , 2011-10-05, 10:43
[TIKA-751] Small improvements to how embedded docs are parsed in AbstractPOIFSExtractor.handleEmbeddedOfficeDoc - Tika - [issue]
...I noticed some minor things in this method: It does too much work (writes the tmpFile out) if the EmbeddedDocumentExtractor didn't want to actually parse the file....
http://issues.apache.org/jira/browse/TIKA-751    Author: Michael McCandless , 2011-10-12, 19:19
[TIKA-753] Improve performance when parsing embedded Office docs - Tika - [issue]
http://issues.apache.org/jira/browse/TIKA-753    Author: Michael McCandless , 2011-10-20, 12:37
[TIKA-757] Address TODOs when we upgrade to next POI release (3.8 beta 5) - Tika - [issue]
...I'm opening a blanket issue to remind us all to address the TODOs in the sources for when we upgrade to the next POI. I think this (a single blanket issue) is better than keeping separate iss...
http://issues.apache.org/jira/browse/TIKA-757    Author: Michael McCandless , 2012-07-01, 21:23
[TIKA-758] Address TODOs when we upgrade to next PDFBox release - Tika - [issue]
...Like TIKA-757 for POI, I'm opening this blanket issue to address any TODOs in the code when we next upgrade PDFBox....
http://issues.apache.org/jira/browse/TIKA-758    Author: Michael McCandless , 2015-03-02, 20:51
[TIKA-767] Enable controlling of PDFBOX's setSuppressDuplicateOverlappingText from PDFParser - Tika - [issue]
...Given that there are some problems with how overlapping text is removed (slow performance: PDFBOX-956; some chars incorrectly skipped: PDFBOX-1155), I think we should make this controllable fr...
http://issues.apache.org/jira/browse/TIKA-767    Author: Michael McCandless , 2011-11-04, 16:28
[TIKA-1010] Embedded documents in RTF are not extracted - Tika - [issue]
...When an RTF doc embeds a doc it looks like this: {\object\objemb\objw628\objh765{\*\objclass Package}{\*\objdata 0105000002000000080000005061636b61676500000000000000000066000000020048772e7478...
http://issues.apache.org/jira/browse/TIKA-1010    Author: Michael McCandless , 2016-02-01, 16:32
[TIKA-1011] Exception (Null charset name) processing .mhtml file - Tika - [issue]
...This small test.mhtml file: From: <Saved by Windows Internet Explorer 8> Subject: Index Pages Date: Tue, 28 Aug 2012 09:53:28 +0300 MIME-Version: 1.0 Content-Type: multipart/related; type="...
http://issues.apache.org/jira/browse/TIKA-1011    Author: Michael McCandless , 2012-10-26, 21:07
[TIKA-1015] Word (.doc) embedded files don't set relationship ID in the Metadata - Tika - [issue]
http://issues.apache.org/jira/browse/TIKA-1015    Author: Michael McCandless , 2012-10-31, 15:07