PDFBox regression tests? - PDFBox - [mail # dev]
...Looks like there may be some issues with Japanese...don't know if this is related to your observation? It feels like when I sort by ascending order of NUM_COMMON_TOKENS_DIFF_IN_B, there are qui...
   Author: Tim Allison , 2020-07-30, 10:44
Error when parsing of Excel files - Tika - [mail # user]
...> as for files, in my case they are from customer and I don't want to share them.
   Author: Tim Allison , 2020-07-29, 16:35
[TIKA-3147] Strip punctuation in lang id component within tika-eval - Tika - [issue]
...I noticed that "the quick brown fox jumped over the lazy dog" was identified as English in tika-eval.  However, if I added semi-colons, it was identified as Chinese. This is in alignment...
   Author: Tim Allison , 2020-07-27, 20:58
[TIKA-3145] Add a content digester to tika-eval text stats - Tika - [issue]
...When comparing files, it can be useful to digest the text contents so that users can identify files that may have duplicate content but different overall digests.  Let's add a content d...
   Author: Tim Allison , 2020-07-24, 22:43
[TIKA-3146] Add Nutch's TextProfileSignature digest to tika-eval's text stats - Tika - [issue]
... require trivial modifications to work within the tika-eval context.  As with TIKA-31...
   Author: Tim Allison , 2020-07-24, 22:43
Tika extract images - Tika - [mail # dev]
...Which endpoint are you using? On Wed, Jul 22, 2020 at 1:36 PM Tilman Hausherr wrote: > What happens when you try to do the same with tika-app from the command line? > Tilman...
   Author: Tim Allison , 2020-07-22, 23:56
[TIKA-3143] Enable custom resources and writers in tika-server - Tika - [issue]
...We've put in a fair amount of work into the configuration and robustness of tika-server.  I think it would be useful to enable users and even other modules within Tika to add custom han...
   Author: Tim Allison , 2020-07-20, 17:09
[TIKA-3140] Add a metadata filter for tika-eval - Tika - [issue]
...If we go forward with TIKA-3137, it would be useful to add a metadata filter for tika-eval's text stats, including the junk detector....
   Author: Tim Allison , 2020-07-17, 20:24
[TIKA-3142] Update Jenkins for main branch, maybe turn on more modern jdks - Tika - [issue]
...I think I did this for Tika-trunk.  I modified our jdk7 to jdk11 on tika-master.  Once Tika-trunk completes, I'll rename it to tika-master-jdk8 unless there are objections. What els...
   Author: Tim Allison , 2020-07-17, 19:35
[TIKA-3137] Enable a metadata filter for the RecursiveParserWrapper - Tika - [issue]
...The RecursiveParserWrapper is designed to extract all metadata from every embedded file.  Some users may need more targeted ways of filtering the metadata to save on resources, e.g. mem...
   Author: Tim Allison , 2020-07-17, 19:26