FWIW, I now understand what I was missing that made me think Manifold
was running Tika when it wasn't. It turns out that Alfresco uses Tika
internally, and when you get a document from Alfresco (using the
WebScripts connector, anyway) the set of fields you get includes all
the image metadata and whatnot (for image files). I never realized
this because I don't typically use Alfresco for images. But when I
added extra logging to the Alfresco WebScripts connector code to spit
out the incoming field set, I saw things like:

Found property exif:yResolution = 72.0
Found property cm:owner = admin
Found property exif:isoSpeedRatings = 400
Found property exif:fNumber = 3.5
Found property sys:node-uuid = 0516a5cc-fc04-4512-a4ed-b595b7c3908b
Found property exif:pixelYDimension = 2048
Found property exif:resolutionUnit = Inch
Found property exif:dateTimeOriginal = 2005-01-09T16:00:55Z
Found property sys:locale = en_GB

which explains why the Solr connector was trying to save fields like
exif_fNumber and exif_resolutionUnit. This came up because the
Alfresco instance I'm experimenting with has the default sample
workspace, which includes images and other things I don't normally
touch.

As for managing all this so my history doesn't fill up with those
failure messages, I thought about creating a "WhitelistFieldTransform"
as a transformation connection that drops any field not on a
whitelist. Two questions:
1. Does this seem like a reasonable approach, or is there a better way?
2. If this is reasonable and I create such a filter, would there be
any interest in having it contributed back to MCF?
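
To make the idea concrete, here is a minimal sketch of just the filtering step, written against plain Java collections. It deliberately does not use any ManifoldCF connector interfaces: the class name WhitelistFieldFilter, its filter method, and the single-String field values are all assumptions for illustration, and a real transformation connection would still need to be wired into the MCF pipeline.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of ManifoldCF: keeps only whitelisted fields.
class WhitelistFieldFilter {
    private final Set<String> allowed;

    WhitelistFieldFilter(Set<String> allowed) {
        // Defensive copy so later changes to the caller's set have no effect.
        this.allowed = new LinkedHashSet<>(allowed);
    }

    // Returns a new map holding only the entries whose key is whitelisted;
    // the input map is left unmodified.
    Map<String, String> filter(Map<String, String> fields) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (allowed.contains(e.getKey())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Sample values taken from the log output earlier in this message.
        Map<String, String> incoming = new LinkedHashMap<>();
        incoming.put("cm:owner", "admin");
        incoming.put("exif:fNumber", "3.5");
        incoming.put("sys:locale", "en_GB");
        incoming.put("exif:resolutionUnit", "Inch");

        WhitelistFieldFilter filter = new WhitelistFieldFilter(
            new LinkedHashSet<>(Arrays.asList("cm:owner", "sys:locale")));

        // Only cm:owner and sys:locale survive; the exif:* fields are dropped.
        System.out.println(filter.filter(incoming));
    }
}
```

The sketch matches on the property names as they arrive from Alfresco (with the colon), so it would run before whatever renames them to the underscore form that Solr sees.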
On Sun, Oct 15, 2017 at 10:11 AM, Karl Wright <[EMAIL PROTECTED]> wrote: