[TIKA-94] Speech recognition - Tika - [issue]
...Like OCR for image files (TIKA-93), we could try using speech recognition to extract text content (where available) from audio (and video!) files. The CMU Sphinx engine (http://cmusphinx.sour...    Author: Jukka Zitting, 2018-06-13, 07:54
Thread-safety and locking of methods Tika.detect(...) and MimeType.detect(...) - Tika - [mail # user]
...Hi, Based on the Xerces discussion it sounds like using a pool of parsers would be the best approach. Best, Jukka On Thu, May 17, 2018 at 11:51 AM, Sebastian Nagel wrote: > Hi, > > two...
   Author: Jukka Zitting, 2018-05-17, 15:58
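The "pool of parsers" approach suggested above can be sketched as a small fixed-size pool. Each instance is borrowed by only one thread at a time, so the pooled objects themselves do not need to be thread-safe. This is a minimal sketch under stated assumptions. `ParserPool` and `withParser` are hypothetical names, not Tika API. The pooled type is left generic, so a `Supplier` for whatever parser or `Tika` object the application shares is assumed.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

/**
 * Hypothetical fixed-size pool (not a Tika class). A borrowed instance is
 * used by exactly one thread until it is handed back, so the instances
 * themselves need not be thread-safe.
 */
final class ParserPool<P> {
    private final BlockingQueue<P> idle;

    ParserPool(int size, Supplier<P> factory) {
        idle = new ArrayBlockingQueue<>(size);
        // Create all instances up front; the pool never grows.
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    /** Borrows an instance, runs the task with it, and always returns it. */
    <R> R withParser(Function<P, R> task) {
        P parser;
        try {
            parser = idle.take(); // blocks until some instance is free
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("Interrupted while waiting for a parser", e);
        }
        try {
            return task.apply(parser);
        } finally {
            idle.add(parser); // cannot overflow: only borrowed instances come back
        }
    }
}
```

The blocking `take()` bounds concurrency to the pool size. Callers wait instead of creating extra instances or sharing one instance under a global lock.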
[TIKA-380] Upgrade to PDFBox 1.0.0 - Tika - [issue]
...PDFBox 1.0.0 is being released today. The release has some nice performance improvements for text extraction, so we'll want to upgrade ASAP....    Author: Jukka Zitting, 2016-12-16, 16:07
[TIKA-393] Upgrade to PDFBOX 1.1.0 - Tika - [issue]
...PDFBox 1.1.0 is due out shortly. We should upgrade before releasing Tika 0.7....    Author: Jukka Zitting, 2016-12-16, 16:06
[TIKA-446] Upgrade to PDFBox 1.3.1 - Tika - [issue]
...Apache PDFBox 1.2.0 is just about to be released. We should upgrade when it becomes available on Maven Central....    Author: Jukka Zitting, 2016-12-16, 15:58
[TIKA-771] "Hello, World!" in UTF-8/ASCII gets detected as IBM500 - Tika - [issue]
...Looks like the encoding detection heuristics need some adjustment....    Author: Jukka Zitting, 2016-08-05, 12:32
xmpcore in Maven Central? - Tika - [mail # dev]
...Hi, On Fri, Jul 15, 2016 at 1:26 PM Allison, Timothy B. wrote: > I heard back from Adobe that they don't manage xmpcore in Maven Central. > Any idea who does? I do, assuming my memory from...
   Author: Jukka Zitting, 2016-07-15, 20:44
disable extraction of images - Tika - [mail # user]
...Hi, Some scanning software includes OCR features and includes hidden text behind the scanned images to make the resulting PDF searchable. I suspect this may be happening in your case. It would be ...
   Author: Jukka Zitting, 2016-04-13, 13:58
[TIKA-503] Add a ContentHandler for collecting links from parser output - Tika - [issue]
...It would be nice to have a LinkContentHandler class that would automatically collect any <a href="...">...</a> elements from the parser output....    Author: Jukka Zitting, 2016-04-05, 20:25
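Tika parsers report their output as XHTML SAX events. A link collector like the one proposed in TIKA-503 can therefore be a SAX handler that watches for `<a>` start tags. The following is a minimal, stdlib-only sketch of that idea. `SimpleLinkCollector` is a hypothetical name, not Tika's own `LinkContentHandler`. The sketch only records `href` values and ignores anchor text.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical SAX handler (not Tika's LinkContentHandler) that records
 * the href attribute of every <a> element in the event stream.
 */
class SimpleLinkCollector extends DefaultHandler {
    private final List<String> links = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Namespace-aware producers fill localName; otherwise fall back to qName.
        String name = (localName == null || localName.isEmpty()) ? qName : localName;
        if ("a".equalsIgnoreCase(name)) {
            // Unprefixed attributes have an empty namespace URI; fall back to
            // lookup by qualified name for non-namespace-aware producers.
            String href = atts.getValue("", "href");
            if (href == null) {
                href = atts.getValue("href");
            }
            if (href != null) {
                links.add(href);
            }
        }
    }

    /** Returns the collected hrefs in document order. */
    List<String> getLinks() {
        return Collections.unmodifiableList(links);
    }
}
```

Because it is a plain `ContentHandler`, an instance could be passed as the handler argument of a Tika `Parser.parse(...)` call. It could also be combined with other handlers that extract text from the same event stream.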
New moderators needed - Tika - [mail # dev]
...Hi, I haven't been very active at ASF or Tika lately, so I'm stepping down as a moderator of many mailing lists (INFRA-11076). As a result, infra tells me that this mailing list is now down to a s...
   Author: Jukka Zitting, 2016-01-15, 15:43