Search criteria: *:*. Results 1 to 10 of 3043 (0.0s).
[TIKA-3042] Date format extraction problem in XLS/XLSX - Tika - [issue]
...Currently TIKA/ManifoldCF 2.10 extracts dates from the attached file this way: 2018.05.10 -> 10/05/18, 2002.02.02 -> 2/2/2. We need this format: 2018.05.10 -> 2018-05-10, 2002.02.02 -> 20...
http://issues.apache.org/jira/browse/TIKA-3042    Author: Zoltan Farago , 2020-02-21, 07:21
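Excel stores a date cell as a numeric serial, and the displayed form (such as 10/05/18) comes from the cell's number format. The following is a minimal sketch, assuming the 1900 date system and serials from 1900-03-01 onward, of formatting the underlying serial directly in the ISO form the reporter asks for. `ExcelDates` and `serialToIso` are hypothetical names, not Tika or POI API (in practice POI's `DateUtil` performs serial conversion), and the fix adopted for TIKA-3042 may differ.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Tika or POI.
final class ExcelDates {
    // Day zero commonly used for Excel's 1900 date system. Because Excel counts a
    // fictitious 1900-02-29 (serial 60), this epoch is only correct from serial 61 on.
    private static final LocalDate EPOCH_1900 = LocalDate.of(1899, 12, 30);

    private ExcelDates() {}

    /** Formats a 1900-system date serial as yyyy-MM-dd; any time-of-day fraction is dropped. */
    static String serialToIso(double serial) {
        if (serial < 61) {
            throw new IllegalArgumentException("serials before 1900-03-01 are not handled by this sketch");
        }
        long days = (long) Math.floor(serial);
        return EPOCH_1900.plusDays(days).format(DateTimeFormatter.ISO_LOCAL_DATE);
    }
}
```

Formatting the raw value sidesteps the locale-dependent cell format entirely; the 1904 date system used by some Mac workbooks would need a different epoch.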
[TIKA-3050] Add xmp extraction to psd files - Tika - [issue]
http://issues.apache.org/jira/browse/TIKA-3050    Author: Tim Allison , 2020-02-20, 22:03
[TIKA-3049] Improve file detection...varia - Tika - [issue]
...I recently crawled a few Bugzilla issue trackers to add files to our regression corpus.  I noticed that Bugzilla is able to identify the MIME types of a few file types that we're not, a...
http://issues.apache.org/jira/browse/TIKA-3049    Author: Tim Allison , 2020-02-20, 21:32
[TIKA-3048] Tika unable to parse html files with non UTF-8 charset - Tika - [issue]
...Tika returns junk characters when parsing Chinese characters inside an HTML file. The HTML file explicitly declares its charset as GB2312: <head><meta http-equiv=Content-Type...
http://issues.apache.org/jira/browse/TIKA-3048    Author: Akash , 2020-02-20, 19:35
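The report hinges on honoring a charset declared in a `<meta>` tag. A minimal sketch of that idea, assuming the declaration appears within the first 1024 bytes: the head is read as ISO-8859-1 (which maps every byte to one char, so ASCII markup survives whatever the real encoding is), the declared name is extracted, and the whole document is decoded with it. `MetaCharsetSniffer` is a hypothetical helper with a deliberately simplified regex; it is not Tika's HTML encoding detector.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not Tika's encoding detection.
final class MetaCharsetSniffer {
    // Simplified: finds "charset=NAME" anywhere inside a <meta ...> tag, case-insensitively.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9][A-Za-z0-9_\\-]*)",
            Pattern.CASE_INSENSITIVE);

    private MetaCharsetSniffer() {}

    /** Returns the charset declared in a <meta> tag, or the fallback if none is found or supported. */
    static Charset sniff(byte[] raw, Charset fallback) {
        int len = Math.min(raw.length, 1024);
        String head = new String(raw, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        if (m.find() && Charset.isSupported(m.group(1))) {
            return Charset.forName(m.group(1));
        }
        return fallback;
    }

    /** Decodes the document with its declared charset, falling back to UTF-8. */
    static String decode(byte[] raw) {
        return new String(raw, sniff(raw, StandardCharsets.UTF_8));
    }
}
```

Decoding the same GB2312 bytes as UTF-8 instead produces replacement characters, which matches the "junk characters" symptom described in the issue.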
[TIKA-2650] Soft-hyphen is not extracted properly - Tika - [issue]
...We are trying to extract text from PDFs. If a PDF has a long word at the end of a line, a soft hyphen follows the first half of the word and the rest of the word continues on the next line. But when extracting thes...
http://issues.apache.org/jira/browse/TIKA-2650    Author: Saurabh Patil , 2020-02-19, 18:46
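A soft hyphen (U+00AD) only marks an optional break point, so extracted text should normally rejoin the word. A minimal post-processing sketch, assuming the extractor emits U+00AD followed by a line break at the split; `SoftHyphenJoiner` is a hypothetical helper and not how Tika or PDFBox handle this internally.

```java
// Hypothetical post-processing helper, not part of Tika or PDFBox.
final class SoftHyphenJoiner {
    private static final char SOFT_HYPHEN = '\u00AD';

    private SoftHyphenJoiner() {}

    /** Rejoins words split by a trailing soft hyphen and drops any other soft hyphens. */
    static String join(String text) {
        // A soft hyphen, the line break after it, and any indentation on the next line are removed.
        String joined = text.replaceAll("\\u00AD\\r?\\n[ \\t]*", "");
        // Remaining soft hyphens are invisible break hints, so they are removed as well.
        return joined.replace(String.valueOf(SOFT_HYPHEN), "");
    }
}
```

Ordinary hyphens (U+002D) are left alone, since removing those at line ends would also merge genuinely hyphenated compounds.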
[TIKA-3047] Upgrade to POI 4.1.2 - Tika - [issue]
...Now available at a maven repo near you!  Thank you Andreas Beeker for running the release!...
http://issues.apache.org/jira/browse/TIKA-3047    Author: Tim Allison , 2020-02-14, 21:59
[TIKA-3046] Add detection of some open office related formats - Tika - [issue]
...Add format detection for .cdr, .bau, .sob, .oxt, .odp, .odb. While unpacking attachments from LibreOffice's bug tracker, I found that our zip package detector didn't recognize these formats....
http://issues.apache.org/jira/browse/TIKA-3046    Author: Tim Allison , 2020-02-14, 16:55
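For ODF packages such as .odp and .odb, the first zip entry is conventionally a file named `mimetype` whose content is the document's MIME type, which makes them cheap to identify. A minimal sketch of that check, assuming an ODF-style package; other formats in the list (for example .oxt extensions) may not carry this entry, and `OdfMimetypeSniffer` is a hypothetical helper rather than Tika's zip container detector.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not Tika's ZipContainerDetector.
final class OdfMimetypeSniffer {
    private OdfMimetypeSniffer() {}

    /** Returns the MIME type from a leading "mimetype" entry, or empty if there is none. */
    static Optional<String> sniff(InputStream in) throws IOException {
        // Note: closing the ZipInputStream also closes the underlying stream.
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry first = zip.getNextEntry();
            if (first == null || !"mimetype".equals(first.getName())) {
                return Optional.empty();
            }
            // Read the body of the current entry only (read() returns -1 at its end).
            ByteArrayOutputStream body = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            int n;
            while ((n = zip.read(buf)) != -1) {
                body.write(buf, 0, n);
            }
            return Optional.of(new String(body.toByteArray(), StandardCharsets.US_ASCII).trim());
        }
    }
}
```

Checking only the first entry keeps detection fast; packages without that entry need a fallback such as inspecting `META-INF/manifest.xml`.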
[TIKA-3045] Allow users to run custom parsing of xfa and xmp - Tika - [issue]
...We currently do some processing of xfa and xmp, but some users may want more control over parsing these embedded file types....
http://issues.apache.org/jira/browse/TIKA-3045    Author: Tim Allison , 2020-02-14, 12:38
[TIKA-3043] vorbis-java-tika overwrites tika's Parser and Detector in MANIFEST - Tika - [issue]
...We are using tika-parsers version 1.21 as a Gradle dependency. When Gradle builds the jar, the org.apache.tika.parser.Parser and org.apache.tika.detect.Detector are overwritten by vorbis-java...
http://issues.apache.org/jira/browse/TIKA-3043    Author: CHARUSHEELA BOPARDIKAR , 2020-02-14, 11:11
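Tika discovers parsers and detectors through `java.util.ServiceLoader`-style provider files (one class name per line, `#` starting a comment). Assuming the overwritten files are those provider files, the usual remedy when building a single jar is to merge them instead of letting the last one win. A minimal sketch of such a merge; `ServiceFileMerger` is hypothetical, and build tools typically offer this as a built-in option.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper illustrating a merge of provider-configuration files.
final class ServiceFileMerger {
    private ServiceFileMerger() {}

    /** Merges provider files, keeping first-seen order and dropping comments and duplicates. */
    static List<String> merge(List<String> files) {
        Set<String> providers = new LinkedHashSet<>();
        for (String file : files) {
            for (String line : file.split("\\R")) {
                // Everything after '#' is a comment; surrounding whitespace is ignored.
                int hash = line.indexOf('#');
                String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
                if (!name.isEmpty()) {
                    providers.add(name);
                }
            }
        }
        return new ArrayList<>(providers);
    }
}
```

For example, merging a file listing `com.example.FooParser` with one listing `com.example.BarParser` and `com.example.FooParser` (hypothetical class names) yields both providers once each, so neither dependency's parsers are lost.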
[TIKA-3044] add -C/--content cli option using WriteOutContentHandler - Tika - [issue]
...For text extraction, the CLI currently provides both --text and --text-main options. For HTML files, --text will return the body, while --text-main will only return the title. There is curre...
http://issues.apache.org/jira/browse/TIKA-3044    Author: Alexander Klimetschek , 2020-02-14, 06:45