[TIKA-3094] Apache Tika fails to extract text for pptx extension. - Tika - [issue]
...This is a regression from Apache Tika 1.23. Text extraction for .pptx extensions, which worked with Apache Tika 1.23, no longer works in version 1.24. For .ppt extenti...    Author: Abhishek Chauhan , 2020-09-30, 17:45
[TIKA-3206] commons-io : 2.6, which is a transitive dependency of tika is vulnerable to "sonatype-2018-0705". - Tika - [issue]
...Tika has embedded commons-io.2.6.jar, which is vulnerable to "sonatype-2018-0705". ISSUE: sonatype-2018-0705. SEVERITY: Sonatype CVSS 3: 7.8; CVE CVSS 2.0: 0.0. EXPLANATION: The commons-io package is vuln...    Author: Ankush Rana , 2020-09-30, 17:41
[TIKA-3044] add -C/--content cli option using WriteOutContentHandler - Tika - [issue]
...For text extraction, the cli currently provides both --text and --text-main options. For html files, --text will return the body, while --text-main will only return the title. There is curre...    Author: Alexander Klimetschek , 2020-09-30, 17:38
[TIKA-3205] Mime magic for more certificate related formats - Tika - [issue]
...As spotted by a Tika user on Stack Overflow, we only have mime magic for a handful of the certificate/key related formats, and are missing ...    Author: Nick Burch , 2020-09-30, 17:31
[TIKA-3196] PackageParser should attempt to parse entries from zip files with STORED entries with data descriptor - Tika - [issue]
...We are currently using Tika for text extraction. Some sites are returning zips that have STORED entries with data descriptors, which fail to extract due to the ZipArchiveInputStream...    Author: Trevor Bentley , 2020-09-29, 17:45
[TIKA-2518] tika app outputs warnings by default - Tika - [issue]
...Upon downloading the latest Tika and trying basic commands, it spews unwanted warnings, which makes parsing the output necessary. Example 1: java -jar tika-app-1.16.jar --list-detectors Dec 05, 2017...    Author: Ryan Brueske , 2020-09-28, 04:32
[TIKA-3204] License incompliance with xmp-core 6.1.10 - Tika - [issue]
...Apache Tika 1.24.1 (and probably also older versions) has a dependency on xmp-core 6.1.10. Usage of this dependency does not comply with its license, because distribution of xmp-core is strict...    Author: Christian Seipel , 2020-09-24, 18:26
[TIKA-3202] Tika duplicates the ocr text - Tika - [issue]
...I'm using Tika 1.24.1 together with Tesseract from the Docker image apache/tika:1.24-full. The header X-Tika-PDFocrStrategy: OCR_AND_TEXT causes the issue: the output from PDF processing is duplicat...    Author: marek kapowicki , 2020-09-23, 05:54
[TIKA-3203] MP4Parser temporary files are not deleted from Tomcat temp folder - Tika - [issue]
...In our application, Tika is used as part of a Tomcat webapp.  Tomcat sets its temp folder ($CATALINA_HOME/temp) as "".  The MP4Parser creates files in    Author: Isabelle Giguere , 2020-09-23, 01:24
[TIKA-3200] wrong language ("tr" instead of "ru") is assigned for recognized charset "windows-1251" - Tika - [issue]
...CharsetDetector detects windows_1251 using the detector org.apache.tika.parser.txt.CharsetRecog_windows_1251. This detector creates a CharsetMatch with language "tr", but it should be "ru". ...    Author: Alexey Lukashov , 2020-09-22, 03:36