Results 1 to 10 of 723 (0.0s).
100000 is the maximum for this record type - Tika - [mail # user]
...I confirmed that this will require the next version of POI due to a bug that is my fault: thanks to Dominik Stadler for fixing this. If...
   Author: Tim Allison, 2020-01-23, 15:21
[TIKA-3026] Consider extracting structure/tags where possible in PDFs with the PDFMarkedContentExtractor - Tika - [issue]
...Some PDFs contain tags that may be useful in understanding the structure of the elements within a PDF, e.g. table markup, paragraph breaks, headers, etc. The quality of the tags depends e...
   Author: Tim Allison, 2020-01-22, 14:29
[SOLR-14086] Tika ClassNotFound error due to commons-compress in solr-core dependency - Solr - [issue]
...Opening on behalf of Tim Allison since he found this in SOLR-14054. From lucene-solr repo directory: ant clean clean-maven-build clean-jars jar-checksums; cd solr; ant package; cp package/solr-9.0.0...
   Author: Tim Allison, 2020-01-21, 16:57
[SOLR-14113] Add more file types to DIH's unit tests to ensure dependency coverage - Solr - [issue]
...As part of SOLR-14054, Dawid Weiss noted that the unit tests pass without the commons-csv dependency, which is, in fact, required if a CSV file is sent to DIH. Let's add several more file t...
   Author: Tim Allison, 2020-01-21, 16:57
[SOLR-14054] Upgrade Tika to 1.23 - Solr - [issue]
...We just released 1.23. Let's upgrade Tika....
   Author: Tim Allison, 2020-01-21, 16:57
[TIKA-2310] Try to order chapters in epub correctly - Tika - [issue]
...Johan van der Knijff recently pointed out on Twitter that our EPUB parser doesn't handle chapters in the right order. We should try to fix our parser so that the output is in the corre...
   Author: Tim Allison, 2020-01-17, 22:22
[TIKA-3027] Consider using html parser instead of xml parser for epub contents - Tika - [issue]
...We have a good number of files in our regression set whose content "xhtml" files cause problems for the XML parser. Should we switch to the HTMLParser? To name a few: commoncrawl3/6H/6HAGP5D...
   Author: Tim Allison, 2020-01-17, 20:55
[TIKA-2913] Extract preview image as thumbnail in HWP 5.0 - Tika - [issue]
...HWP 5.0 files may include a PrvImage PNG file that is a thumbnail of the contents of the file. We should extract these as we do other thumbnails....
   Author: Tim Allison, 2019-12-22, 11:05
[TIKA-3014] XLIFF12Parser fails with ToXMLHandler - Tika - [issue]
...XLIFF12Parser fails with ToXMLHandler because the XML namespace isn't set, but is needed for "xml:lang". One option would be to remove the namespace on the lang attribute? David Meikle, any recomm...
   Author: Tim Allison, 2019-12-19, 01:46
Parsing order issue - Tika - [mail # dev]
...Tilman, That isn’t correct. I’ll find the link that might help... On Tue, Dec 17, 2019 at 1:02 PM Tilman Hausherr wrote: > I already answered... we need the PDF. >> But... ...
   Author: Tim Allison, 2019-12-18, 00:12