Thank you for this information. Since this is very much related to Any23 and microdata parsing, I’m going to ask what I believe is a related question in this same thread so everything stays organized in one place:
I noticed that a lot of job boards, such as dice.com, monster.com, etc., use http://schema.org/JobPosting information; however, many seem to use <script type="application/ld+json">…</script> rather than RDF.
In summer 2017, Google announced structured data guidance for jobs: https://developers.google.com/search/docs/data-types/job-posting
and a testing tool to validate your HTML: https://search.google.com/structured-data/testing-tool
I verified a few sample listings from the above-mentioned job boards with Google’s testing tool and they validate OK.
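For anyone who wants to repeat that check locally without the Google tool, here is a minimal sketch using only the Python standard library. It pulls the contents of every <script type="application/ld+json"> block out of an HTML string and parses it as JSON. The names JsonLdCollector, extract_json_ld, and SAMPLE_HTML are hypothetical, and the sample page is made up for illustration; it is not taken from any real job board, and this is not how Any23 or Nutch does its extraction.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        # The type attribute may be missing or valueless, so fall back to "".
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip blocks that are not valid JSON


def extract_json_ld(html):
    """Return a list of decoded JSON-LD objects found in the HTML string."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    return collector.items


# Hypothetical sample page, invented for this sketch.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "JobPosting", "title": "Example Engineer"}
</script>
</head><body><p>Job description here.</p></body></html>
"""

if __name__ == "__main__":
    for item in extract_json_ld(SAMPLE_HTML):
        print(item.get("@type"), "-", item.get("title"))
```

If this finds a JobPosting object in a saved copy of a listing, the JSON-LD is at least present in the raw HTML, which narrows the problem down to the extractor configuration rather than the page itself.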
So after looking at http://any23.apache.org/getting-started.html for the supported extractors, I see that Any23 says it supports JSON-LD input, so I added this to nutch-site.xml to override the same property in nutch-default.xml:
<description>Comma-separated list of Any23 extractors (a list of extractors is available here: http://any23.apache.org/getting-started.html)</description>
I expected to see additional information from nutch parsechecker after adding the jsonld extractors; however, I see NO changes to the parsed Any23-Triples microdata.
What might I be doing wrong?