Results 1 to 10 of 723.
100000 is the maximum for this record type - Tika - [mail # user]
...I confirmed that this will require the next version of POI due to a bug that is my fault: https://bz.apache.org/bugzilla/show_bug.cgi?id=63569 Many thanks to Dominik Stadler for fixing this. If...
   Author: Tim Allison, 2020-01-23, 15:21
[TIKA-3026] Consider extracting structure/tags where possible in PDFs with the PDFMarkedContentExtractor - Tika - [issue]
...Some PDFs contain tags that may be useful in understanding the structure of the elements within a PDF, e.g. table markup, paragraph breaks, headers, etc. The quality of the tags depends e...
http://issues.apache.org/jira/browse/TIKA-3026    Author: Tim Allison, 2020-01-22, 14:29
[SOLR-14086] Tika ClassNotFound error due to commons-compress in solr-core dependency - Solr - [issue]
...Opening on behalf of Tim Allison since he found this in SOLR-14054. From lucene-solr repo directory:
ant clean clean-maven-build clean-jars jar-checksums
cd solr
ant package
cp package/solr-9.0.0...
http://issues.apache.org/jira/browse/SOLR-14086    Author: Tim Allison, 2020-01-21, 16:57
[SOLR-14113] Add more file types to DIH's unit tests to ensure dependency coverage - Solr - [issue]
...As part of SOLR-14054, Dawid Weiss noted that the unit tests pass without the commons-csv dependency, which is, in fact, required if a csv file is sent to DIH. Let's add several more file t...
http://issues.apache.org/jira/browse/SOLR-14113    Author: Tim Allison, 2020-01-21, 16:57
[SOLR-14054] Upgrade Tika to 1.23 - Solr - [issue]
...We just released 1.23. Let's upgrade Tika....
http://issues.apache.org/jira/browse/SOLR-14054    Author: Tim Allison, 2020-01-21, 16:57
[TIKA-2310] Try to order chapters in epub correctly - Tika - [issue]
...Johan van der Knijff recently pointed out on Twitter that our Epub parser doesn't handle chapters in the right order. We should try to fix our parser so that the output is in the corre...
http://issues.apache.org/jira/browse/TIKA-2310    Author: Tim Allison, 2020-01-17, 22:22
[TIKA-3027] Consider using html parser instead of xml parser for epub contents - Tika - [issue]
...We have a good number of files in our regression set whose content "xhtml" files cause problems for the XML parser. Should we switch to the HTMLParser? To name a few: commoncrawl3/6H/6HAGP5D...
http://issues.apache.org/jira/browse/TIKA-3027    Author: Tim Allison, 2020-01-17, 20:55
[TIKA-2913] Extract preview image as thumbnail in HWP 5.0 - Tika - [issue]
...HWP 5.0 files may include a PrvImage PNG file that is a thumbnail of the contents of the file. We should extract these as we do other thumbnails....
http://issues.apache.org/jira/browse/TIKA-2913    Author: Tim Allison, 2019-12-22, 11:05
[TIKA-3014] XLIFF12Parser fails with ToXMLHandler - Tika - [issue]
...XLIFF12Parser fails with ToXMLHandler because the xml namespace isn't set, but it is needed for "xml:lang". One option would be to remove the namespace on the lang attribute? David Meikle, any recomm...
http://issues.apache.org/jira/browse/TIKA-3014    Author: Tim Allison, 2019-12-19, 01:46
Parsing order issue - Tika - [mail # dev]
...Tilman, That isn’t correct. I’ll find the link that might help...
On Tue, Dec 17, 2019 at 1:02 PM Tilman Hausherr wrote:
> I already answered... we need the PDF.
>
> But... ...
   Author: Tim Allison, 2019-12-18, 00:12