Results 1 to 10 of 838.
[TIKA-3026] Consider extracting structure/tags where possible in PDFs with the PDFMarkedContentExtractor - Tika - [issue]
...Some PDFs contain tags that may be useful in understanding the structure of the elements within a PDF, e.g. table markup, paragraph breaks, headers, etc. The quality of the tags depends e...
   Author: Tim Allison, 2020-01-22, 14:29
[SOLR-14086] Tika ClassNotFound error due to commons-compress in solr-core dependency - Solr - [issue]
...Opening on behalf of Tim Allison since he found this in SOLR-14054. From the lucene-solr repo directory:
    ant clean clean-maven-build clean-jars jar-checksums
    cd solr
    ant package
    cp package/solr-9.0.0...
   Author: Tim Allison, 2020-01-21, 16:57
[SOLR-14113] Add more file types to DIH's unit tests to ensure dependency coverage - Solr - [issue]
...As part of SOLR-14054, Dawid Weiss noted that the unit tests pass without the commons-csv dependency, which is, in fact, required if a CSV file is sent to DIH. Let's add several more file t...
   Author: Tim Allison, 2020-01-21, 16:57
[SOLR-14054] Upgrade Tika to 1.23 - Solr - [issue]
...We just released 1.23. Let's upgrade Tika....
   Author: Tim Allison, 2020-01-21, 16:57
[TIKA-2310] Try to order chapters in epub correctly - Tika - [issue]
...Johan van der Knijff recently pointed out on Twitter that our EPUB parser doesn't handle chapters in the right order. We should try to fix our parser so that the output is in the corre...
   Author: Tim Allison, 2020-01-17, 22:22
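In an EPUB, reading order is given by the `<spine>` of the OPF package document, not by the order of files in the ZIP or in the `<manifest>`: each `<itemref idref="…">` in the spine points to a manifest `<item>` whose `href` names the content file. The sketch below shows that lookup using Python's standard library. It is a minimal illustration under stated assumptions, not Tika's implementation: it assumes the OPF XML has already been read out of the container (locating it via `META-INF/container.xml` is skipped), and the helper name `chapters_in_spine_order` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the EPUB OPF package document.
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def chapters_in_spine_order(opf_xml: str) -> list[str]:
    """Return content-file hrefs in spine (reading) order.

    Hypothetical helper for illustration; hrefs are relative to the OPF file.
    """
    root = ET.fromstring(opf_xml)
    # Map each manifest item id to its href.
    manifest = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    # The spine lists itemrefs in reading order; manifest order is irrelevant.
    # Itemrefs that point at unknown ids are skipped.
    return [
        manifest[ref.get("idref")]
        for ref in root.findall("opf:spine/opf:itemref", OPF_NS)
        if ref.get("idref") in manifest
    ]


SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="c2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>
    <item id="c1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="c3" href="chapter3.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="c1"/>
    <itemref idref="c2"/>
    <itemref idref="c3"/>
  </spine>
</package>"""

print(chapters_in_spine_order(SAMPLE_OPF))
# ['chapter1.xhtml', 'chapter2.xhtml', 'chapter3.xhtml']
```

Even though the manifest lists `chapter2.xhtml` first, the spine puts `chapter1.xhtml` first, so a parser that walks the manifest (or the ZIP entries) would emit chapters out of order.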
[TIKA-3027] Consider using html parser instead of xml parser for epub contents - Tika - [issue]
...We have a good number of files in our regression set whose content "xhtml" files cause problems for the XML parser. Should we switch to the HTMLParser? To name a few: commoncrawl3/6H/6HAGP5D...
   Author: Tim Allison, 2020-01-17, 20:55
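One way to handle "xhtml" content files that are not actually well-formed XML is to try a strict XML parse first and fall back to a lenient HTML parser only when that fails. The sketch below illustrates the idea with Python's standard-library parsers standing in for Tika's Java XML and HTML parsers. The helper `extract_text` and its `(parser_used, text)` return value are hypothetical, and this is not how Tika resolved the issue. The malformed sample uses `&eacute;`, an HTML entity that a strict XML parser rejects because no DTD declares it.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Lenient fallback: collects character data and tolerates HTML-isms."""

    def __init__(self) -> None:
        # convert_charrefs defaults to True, so entities such as &eacute;
        # arrive in handle_data already decoded.
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def extract_text(markup: str) -> tuple[str, str]:
    """Return (parser_used, text). Hypothetical helper, not Tika's API."""
    try:
        # Strict path: works only for well-formed XML.
        root = ET.fromstring(markup)
        return "xml", "".join(root.itertext())
    except ET.ParseError:
        # Fallback: an HTML parser does not require well-formedness.
        collector = _TextCollector()
        collector.feed(markup)
        collector.close()
        return "html", "".join(collector.parts)


WELL_FORMED = '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>Hello</p></body></html>'
BROKEN = "<html><body><p>caf&eacute; menu<br></p></body></html>"

print(extract_text(WELL_FORMED))  # ('xml', 'Hello')
print(extract_text(BROKEN))       # ('html', 'café menu')
```

The try-XML-first ordering keeps strict parsing for files that really are XHTML and pays the cost of the fallback only for the problematic ones. Switching to the HTML parser unconditionally, as the issue asks, is the simpler alternative.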
[PDFBOX-4715] Need to add release version for maven-compiler-plugin - PDFBox - [issue]
...If I build PDFBox with > Java 8, but then try to run it via Tika with Java 8, I get:
    java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: Could not initialize class org...
   Author: Tim Allison, 2019-12-23, 22:16
[VOTE] Release Apache PDFBox 2.0.18 - PDFBox - [mail # dev]
...+1
Thank you, Andreas!
On Sun, Dec 22, 2019 at 2:09 PM Maruan Sahyoun wrote:
> +1
>
> BR
> Maruan
>
> > Hi,
> >
> > a candidate for the PDFBox 2.0.18 release is availa...
   Author: Tim Allison, 2019-12-23, 13:45
[TIKA-2913] Extract preview image as thumbnail in HWP 5.0 - Tika - [issue]
...HWP 5.0 files may include a PrvImage PNG file that is a thumbnail of the contents of the file. We should extract these as we do other thumbnails....
   Author: Tim Allison, 2019-12-22, 11:05
Build failed in Jenkins: PDFBox-2.0.x #1902 - PDFBox - [mail # dev]
...I don't _think_ this is related to my recent commit. Do let me know, though, if I've broken yet another build this week.... :(
On Thu, Dec 19, 2019 at 8:53 AM Apache Jenkins Server <[...
   Author: Tim Allison, 2019-12-19, 15:27