Search results: 1 to 7 of 7
[TIKA-2709] Invalid handling of <base> tags - Tika - [issue]
...Currently, when the HTML parser encounters the following: <base href='http://example.com/'> it emits SAX events corresponding to the following: <base /><meta name='Content-Locati...
http://issues.apache.org/jira/browse/TIKA-2709    Author: Gerard Bouchar, 2018-08-17, 12:20
[TIKA-2673] HtmlEncodingDetector doesn't follow the specification - Tika - [issue]
...This bug is linked to TIKA-2671, but it concerns the bytes-based detection itself rather than metadata. While reading the specification, I collected a list of sample cases where HtmlEnco...
http://issues.apache.org/jira/browse/TIKA-2673    Author: Gerard Bouchar, 2018-08-08, 17:38
[TIKA-2648] mime detection based on resource name detects resources as "text/x-php" instead of "text/html" - Tika - [issue]
...When using Tika to detect a MIME type given only a URL containing ".php" and a content-type hint of "text/html", it guesses "text/x-php", whereas one could expect "text/html". TikaConfig tik...
http://issues.apache.org/jira/browse/TIKA-2648    Author: Gerard Bouchar, 2018-08-03, 23:07
[TIKA-2700] The HTML parser should parse the contents of the title tag as raw text, not HTML - Tika - [issue]
...The current HTML parser in Tika fails to extract the correct document title when it contains at least one unescaped '<' character. For instance, in the following HTML document: <html>...
http://issues.apache.org/jira/browse/TIKA-2700    Author: Gerard Bouchar, 2018-08-01, 09:41
[TIKA-2671] HtmlEncodingDetector doesnt take provided metadata into account - Tika - [issue]
...org.apache.tika.parser.html.HtmlEncodingDetector ignores the document's metadata. So when using it to detect the charset of an HTML document that came with a conflicting charset specified at...
http://issues.apache.org/jira/browse/TIKA-2671    Author: Gerard Bouchar, 2018-06-27, 13:04
[TIKA-2100] Html Parser does not keep the html tag attributes - Tika - [issue]
...Parsing a very simple HTML document like <!DOCTYPE html><html lang="en"><head><title>Page Title</title></head><body><h1 align="left">My First Hea...
http://issues.apache.org/jira/browse/TIKA-2100    Author: Gerard Bouchar, 2018-05-30, 22:04
[TIKA-2652] HtmlParser generates incorrect meta tags - Tika - [issue]
...Whatever the input HTML meta tags are, Tika's HTML meta tags can only have a "name" and a "content" attribute. This gives invalid HTML meta tags in the output. For instance, the following val...
http://issues.apache.org/jira/browse/TIKA-2652    Author: Gerard Bouchar, 2018-05-25, 10:15