Results 1 to 10 of 838.
[TIKA-3026] Consider extracting structure/tags where possible in PDFs with the PDFMarkedContentExtractor - Tika - [issue]
...Some PDFs contain tags that may be useful in understanding the structure of the elements within a PDF, e.g. table markup, paragraph breaks, headers, etc.    The quality of the tags depends e...
http://issues.apache.org/jira/browse/TIKA-3026    Author: Tim Allison , 2020-01-22, 14:29
[SOLR-14086] Tika ClassNotFound error due to commons-compress in solr-core dependency - Solr - [issue]
...Opening on behalf of Tim Allison since he found this in SOLR-14054. From lucene-solr repo directory: ant clean clean-maven-build clean-jars jar-checksums; cd solr; ant package; cp package/solr-9.0.0...
http://issues.apache.org/jira/browse/SOLR-14086    Author: Tim Allison , 2020-01-21, 16:57
[SOLR-14113] Add more file types to DIH's unit tests to ensure dependency coverage - Solr - [issue]
...As part of SOLR-14054, Dawid Weiss noted that the unit tests pass without the commons-csv dependency, which is, in fact, required if a csv file is sent to DIH.  Let's add several more file t...
http://issues.apache.org/jira/browse/SOLR-14113    Author: Tim Allison , 2020-01-21, 16:57
[SOLR-14054] Upgrade Tika to 1.23 - Solr - [issue]
...We just released 1.23.  Let's upgrade Tika....
http://issues.apache.org/jira/browse/SOLR-14054    Author: Tim Allison , 2020-01-21, 16:57
[TIKA-2310] Try to order chapters in epub correctly - Tika - [issue]
...Johan van der Knijff recently pointed out on Twitter that our Epub parser doesn't handle chapters in the right order.  We should try to fix our parser so that the output is in the corre...
http://issues.apache.org/jira/browse/TIKA-2310    Author: Tim Allison , 2020-01-17, 22:22
[TIKA-3027] Consider using html parser instead of xml parser for epub contents - Tika - [issue]
...We have a good number of files in our regression set whose content "xhtml" files cause problems for the XML parser.  Should we switch to the HTMLParser? To name a few: commoncrawl3/6H/6HAGP5D...
http://issues.apache.org/jira/browse/TIKA-3027    Author: Tim Allison , 2020-01-17, 20:55
[PDFBOX-4715] Need to add release version for maven-compiler-plugin - PDFBox - [issue]
...If I build PDFBox with > Java 8, but then try to run it via Tika with Java 8, I get: java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: Could not initialize class org...
http://issues.apache.org/jira/browse/PDFBOX-4715    Author: Tim Allison , 2019-12-23, 22:16
[VOTE] Release Apache PDFBox 2.0.18 - PDFBox - [mail # dev]
...+1 Thank you, Andreas! On Sun, Dec 22, 2019 at 2:09 PM Maruan Sahyoun wrote: > +1 > BR > Maruan > > Hi, > > a candidate for the PDFBox 2.0.18 release is availa...
   Author: Tim Allison , 2019-12-23, 13:45
[TIKA-2913] Extract preview image as thumbnail in HWP 5.0 - Tika - [issue]
...HWP 5.0 files may include a PrvImage png file that is a thumbnail of the contents of the file.  We should extract these as we do other thumbnails....
http://issues.apache.org/jira/browse/TIKA-2913    Author: Tim Allison , 2019-12-22, 11:05
Build failed in Jenkins: PDFBox-2.0.x #1902 - PDFBox - [mail # dev]
...I don't _think_ this is related to my recent commit.  Do let me know, though, if I've broken yet another build this week.... :( On Thu, Dec 19, 2019 at 8:53 AM Apache Jenkins Server <[...
   Author: Tim Allison , 2019-12-19, 15:27