[TIKA-2709] Invalid handling of <base> tags - Tika - [issue]
...Currently, when the HTML parser encounters the following: <base href=''> it emits SAX events corresponding to the following: <base /><meta name='Content-Locati...    Author: Gerard Bouchar, 2018-08-17, 12:20
[TIKA-2673] HtmlEncodingDetector doesn't follow the specification - Tika - [issue]
...This bug is linked to TIKA-2671, but it concerns the byte-based detection itself rather than metadata. While reading the specification, I collected a list of sample cases where HtmlEnco...    Author: Gerard Bouchar, 2018-08-08, 17:38
[TIKA-2648] mime detection based on resource name detects resources as "text/x-php" instead of "text/html" - Tika - [issue]
...When using Tika to detect a MIME type given only a URL containing ".php" and a content-type hint of "text/html", it guesses "text/x-php", whereas one would expect "text/html". TikaConfig tik...    Author: Gerard Bouchar, 2018-08-03, 23:07
[TIKA-2700] The HTML parser should parse the contents of the title tag as raw text, not HTML - Tika - [issue]
...The current HTML parser in Tika fails to extract the correct document title when the title contains at least one unescaped '<' character. For instance, in the following HTML document: <html>...    Author: Gerard Bouchar, 2018-08-01, 09:41
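In the HTML specification, `title` is an escapable raw text element: everything up to the matching `</title` end tag is text, so an unescaped `<` (or even something that looks like `<b>`) belongs to the title. The following is a minimal sketch of that behaviour on a plain `String`, not Tika's parser; `TitleSketch`, `findTag` and `extractTitle` are hypothetical names, and the sketch does not decode character references such as `&amp;`, which a spec-compliant tokenizer would.

```java
// Hypothetical sketch, not Tika's HtmlParser: extracts <title> content as raw text.
public class TitleSketch {

    // Finds a case-insensitive occurrence of `tag` (e.g. "</title") at or after `from`
    // whose name ends there: it must be followed by whitespace, '/', '>' or end of input.
    static int findTag(String s, String tag, int from) {
        for (int i = from; i + tag.length() <= s.length(); i++) {
            if (!s.regionMatches(true, i, tag, 0, tag.length())) {
                continue;
            }
            int after = i + tag.length();
            if (after == s.length()) {
                return i;
            }
            char c = s.charAt(after);
            if (c == '>' || c == '/' || Character.isWhitespace(c)) {
                return i;
            }
        }
        return -1;
    }

    // Returns everything between the first <title ...> start tag and the next </title
    // end tag verbatim, so '<' and markup-like text stay part of the title.
    // Character references such as &amp; are NOT decoded in this sketch.
    static String extractTitle(String html) {
        int open = findTag(html, "<title", 0);
        if (open < 0) {
            return null;
        }
        int startTagEnd = html.indexOf('>', open);
        if (startTagEnd < 0) {
            return null;
        }
        int contentStart = startTagEnd + 1;
        int close = findTag(html, "</title", contentStart);
        // An unterminated title runs to the end of the input.
        return close < 0 ? html.substring(contentStart) : html.substring(contentStart, close);
    }

    public static void main(String[] args) {
        String doc = "<html><head><title>a < b and <b>bold</b></title></head><body></body></html>";
        System.out.println(extractTitle(doc)); // prints: a < b and <b>bold</b>
    }
}
```

The end-tag check on the character after `</title` mirrors the tokenizer rule that only an appropriate end tag closes the element, so text such as `</titles` inside the title would not end it.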
[TIKA-2671] HtmlEncodingDetector doesn't take provided metadata into account - Tika - [issue]
...ignores the document's metadata. So when using it to detect the charset of an HTML document that came with a conflicting charset specified at...    Author: Gerard Bouchar, 2018-06-27, 13:04
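A minimal sketch of the precedence this issue asks for, assuming the order of the WHATWG HTML encoding sniffing algorithm: a byte order mark first, then a supported charset supplied by the transport layer (here a `Content-Type` value, as it might be carried in the document's metadata), and only then the result of the `<meta>` prescan. `CharsetPrecedenceSketch`, `charsetParam` and `chooseCharset` are hypothetical names, not Tika's API; the prescan result is passed in rather than computed, and Java charset names are used instead of the WHATWG label table (which, for example, maps `iso-8859-1` to windows-1252).

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Tika's HtmlEncodingDetector.
public class CharsetPrecedenceSketch {

    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=ISO-8859-1"; returns null when there is none.
    static String charsetParam(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, "charset=".length())) {
                String v = p.substring("charset=".length()).trim();
                if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
                    v = v.substring(1, v.length() - 1); // strip surrounding quotes
                }
                return v.isEmpty() ? null : v;
            }
        }
        return null;
    }

    // Charset.isSupported throws on syntactically illegal names; treat those as unsupported.
    static boolean supported(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            return false;
        }
    }

    static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Assumed precedence: BOM, then a supported transport-layer charset,
    // then the <meta> prescan result supplied by the caller (may be null).
    static String chooseCharset(byte[] head, String contentType, String prescanResult) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        String declared = charsetParam(contentType);
        if (declared != null && supported(declared)) {
            return Charset.forName(declared).name();
        }
        return prescanResult;
    }

    public static void main(String[] args) {
        // The page declares UTF-8 in a <meta> tag, but the HTTP header said ISO-8859-1.
        byte[] body = "<meta charset=\"UTF-8\"><p>caf\u00e9</p>".getBytes(StandardCharsets.ISO_8859_1);
        String prescan = "UTF-8"; // stand-in for a real <meta> prescan
        System.out.println(chooseCharset(body, "text/html; charset=ISO-8859-1", prescan)); // ISO-8859-1
    }
}
```

In this sketch the header wins over the conflicting `<meta>` declaration, while an unsupported or absent header charset falls back to the prescan result.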
[TIKA-2100] Html Parser does not keep the html tag attributes - Tika - [issue]
...Parsing a very simple HTML document like <!DOCTYPE html><html lang="en"><head><title>Page Title</title></head><body><h1 align="left">My First Hea...    Author: Gerard Bouchar, 2018-05-30, 22:04
[TIKA-2652] HtmlParser generates incorrect meta tags - Tika - [issue]
...Whatever attributes the input HTML meta tags have, Tika's output meta tags can only have a "name" and a "content" attribute. This produces invalid HTML meta tags in the output. For instance, the following val...    Author: Gerard Bouchar, 2018-05-25, 10:15