[TIKA-3042] Date format extraction problem in XLS/XLSX - Tika - [issue]
...Currently TIKA/ManifoldCF 2.10 extracts dates from the attached file this way: 2018.05.10 -> 10/05/18, 2002.02.02 -> 2/2/2. We need this format: 2018.05.10 -> 2018-05-10, 2002.02.02 -> 20...    Author: Zoltan Farago , 2020-02-21, 07:21
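The format requested in TIKA-3042 is the ISO 8601 calendar date. As a minimal sketch of that target format only (not Tika's or POI's actual handling of cell formats), the following uses plain java.time and assumes the cell text arrives as a dotted string such as "2018.05.10"; the class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class IsoDateSketch {
    // Assumed input pattern, taken from the examples in the report ("2018.05.10").
    private static final DateTimeFormatter DOTTED =
            DateTimeFormatter.ofPattern("yyyy.MM.dd");

    // Hypothetical helper: parse the dotted date and re-emit it as yyyy-MM-dd.
    static String toIso(String dotted) {
        return LocalDate.parse(dotted, DOTTED)
                .format(DateTimeFormatter.ISO_LOCAL_DATE);
    }

    public static void main(String[] args) {
        System.out.println(toIso("2018.05.10")); // 2018-05-10
        System.out.println(toIso("2002.02.02")); // 2002-02-02
    }
}
```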
[TIKA-3050] Add xmp extraction to psd files - Tika - [issue]    Author: Tim Allison , 2020-02-20, 22:03
[TIKA-3049] Improve file detection...varia - Tika - [issue]
...I recently crawled a few bugzilla issue trackers to add files to our regression corpus.  I noticed that bugzilla is able to identify the mime types of a few file types that we're not, a...    Author: Tim Allison , 2020-02-20, 21:32
[TIKA-3048] Tika unable to parse html files with non UTF-8 charset - Tika - [issue]
...Tika is returning junk characters when parsing Chinese characters present inside an HTML file. The HTML file has its charset explicitly declared as GB2312. <head><meta http-equiv=Content-Type...    Author: Akash , 2020-02-20, 19:35
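The symptom described in TIKA-3048 is what generally happens when GB2312 bytes are decoded with a different charset. The sketch below illustrates that effect with the JDK alone; it does not use any Tika API and makes no claim about the root cause inside Tika. The class and method names are hypothetical, and it assumes the running JDK ships the GB2312 charset.

```java
import java.nio.charset.Charset;

public class Gb2312Sketch {
    // Hypothetical helper: decode raw bytes with the named charset.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // The two characters U+4E2D U+6587 encoded as GB2312.
        byte[] raw = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};

        String right = decode(raw, "GB2312"); // the intended text
        String wrong = decode(raw, "UTF-8");  // replacement characters ("junk")

        System.out.println(right.equals("\u4e2d\u6587"));
        System.out.println(wrong.equals("\u4e2d\u6587"));
    }
}
```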
[TIKA-2650] Soft-hyphen is not extracted properly - Tika - [issue]
...We are trying to extract text from a PDF. If the PDF has a long word at the end of a line, a soft hyphen follows the first half of the word and the rest of the word goes to the next line. But which extracting thes...    Author: Saurabh Patil , 2020-02-19, 18:46
[TIKA-3047] Upgrade to POI 4.1.2 - Tika - [issue]
...Now available at a maven repo near you!  Thank you Andreas Beeker for running the release!...    Author: Tim Allison , 2020-02-14, 21:59
[TIKA-3046] Add detection of some open office related formats - Tika - [issue]
...Add format detection for .cdr, .bau, .sob, .oxt, .odp, .odb. In unpacking attachments to LibreOffice's bug tracker, I found that our zip package detector didn't recognize these formats....    Author: Tim Allison , 2020-02-14, 16:55
[TIKA-3045] Allow users to run custom parsing of xfa and xmp - Tika - [issue]
...We currently do some processing of xfa and xmp, but some users may want more control over parsing these embedded file types....    Author: Tim Allison , 2020-02-14, 12:38
[TIKA-3043] vorbis-java-tika overwrites tika's Parser and Detector in MANIFEST - Tika - [issue]
...We are using tika-parsers version 1.21 as a Gradle dependency. When Gradle builds the jar, the org.apache.tika.parser.Parser and org.apache.tika.detect.Detector are overwritten by vorbis-java...    Author: CHARUSHEELA BOPARDIKAR , 2020-02-14, 11:11
[TIKA-3044] add -C/--content cli option using WriteOutContentHandler - Tika - [issue]
...For text extraction, the cli currently provides both --text and --text-main options. For html files, --text will return the body, while --text-main will only return the title. There is curre...    Author: Alexander Klimetschek , 2020-02-14, 06:45