Regarding why I didn't choose to load data with the flatfile loader script...
I want to be able to SEND enrichment data to Metron rather than have to set up cron jobs to PULL data. At the moment I'm trying to prove that the process works with a simple data source. In the future we will want enrichment data in Metron that comes from systems (e.g. HR databases) that I won't have access to, so we will need someone to be able to send us the data.
> Carolyn: just call the flat file loader from a script processor...
I didn't believe that would work in my environment. I'm pretty sure the script has dependencies on various Metron JARs, not least for the row id hashing algorithm. I suppose this would require at least a partial install of Metron alongside NiFi, and would introduce additional work on the NiFi cluster for any Metron upgrade. In some (enterprise) environments there might be separation of ownership between NiFi and Metron.
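To illustrate the dependency: the loader writes rows under a salted key so that lookups spread evenly across HBase regions. The sketch below is purely illustrative (it is NOT Metron's actual algorithm — that lives in the Metron JARs, which is exactly the problem), just the general shape of a salted enrichment row key:

```python
import hashlib

def enrichment_row_key(indicator: str, enrichment_type: str,
                       salt_bytes: int = 4) -> bytes:
    """Illustrative salted row key (NOT Metron's real algorithm).

    A short hash prefix spreads writes across HBase regions; the
    type and indicator keep the rest of the key self-describing.
    """
    salt = hashlib.md5(indicator.encode("utf-8")).digest()[:salt_bytes]
    return (salt
            + enrichment_type.encode("utf-8")
            + b"\x00"
            + indicator.encode("utf-8"))

# Two indicators of the same type land under different salt prefixes.
key = enrichment_row_key("10.0.0.1", "dns")
```

The point being: unless you reproduce the real key algorithm byte-for-byte, anything written outside the loader is invisible to Metron's enrichment lookups.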
I also prefer not to have a Java app calling a bash script which in turn spawns a new Java process, with logs or error output that might get swallowed up invisibly. Somewhere down the line this could hold up effective troubleshooting.
> Simon: I have actually written a stellar processor, which applies stellar to all FlowFile attributes...
> Simon: what didn't you like about the flatfile loader script?
The flatfile loader script has worked fine for me when prepping enrichment data in test systems; however, it was a bit of a chore to get the JSON configuration files set up, especially for "wide" data sources that may have 15-20 fields, e.g. Active Directory.
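For reference, a minimal CSV extractor config of the sort I mean is below — the field names and positions are made up, and the exact schema may differ between Metron versions, so treat it as a sketch rather than a template:

```json
{
  "config": {
    "columns": {
      "hostname": 0,
      "ip": 1,
      "owner": 2
    },
    "indicator_column": "ip",
    "type": "dns",
    "separator": ","
  },
  "extractor": "CSV"
}
```

Multiply the `columns` map out to 15-20 positional entries per source and the chore becomes apparent.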
More broadly, I want to embrace the streaming data paradigm and to avoid batch jobs where possible. With the DNS example, you might imagine a future where the enrichment data is streamed based on DHCP registrations, DNS update events, etc. In principle this could reduce the window of time in which we might enrich a data source with out-of-date data.
From: Carolyn Duby [mailto:[EMAIL PROTECTED]]
Sent: 12 June 2018 20:33
To: [EMAIL PROTECTED]
Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON
I like the streaming enrichment solutions but it depends on how you are getting the data in. If you get the data in a CSV file, just call the flat file loader from a script processor. No special NiFi required.
If the enrichments don’t arrive in bulk, the streaming solution is better.
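Concretely, the script-processor call would be something along these lines — the paths, table name ("enrichment") and column family ("t") are examples, so verify the flags against the flatfile_loader.sh in your Metron install:

```shell
# Hypothetical invocation from a NiFi ExecuteStreamCommand processor.
# All paths and HBase names below are illustrative.
"$METRON_HOME/bin/flatfile_loader.sh" \
  -i /data/enrichments/dns.csv \
  -e /data/enrichments/dns_extractor.json \
  -t enrichment \
  -c t
```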
Solutions Engineer, Northeast
On 6/12/18, 1:08 PM, "Simon Elliston Ball" <[EMAIL PROTECTED]> wrote:
>Good solution. The streaming enrichment writer makes a lot of sense for
>this, especially if you're not using huge enrichment sources that need
>the batch based loaders.
>As it happens I have written most of a NiFi processor to handle this
>use case directly - both non-record and Record based, especially for Otto :).
>The one thing we need to figure out now is where to host that, and how
>to handle releases of a nifi-metron-bundle. I'll probably get round to
>putting the code in my github at least in the next few days, while we
>figure out a more permanent home.
>Charlie, out of curiosity, what didn't you like about the flatfile loader script?
>On 12 June 2018 at 18:00, Charles Joynt <[EMAIL PROTECTED]> wrote:
>> Thanks for the responses. I appreciate the willingness to look at
>> creating a NiFi processor. That would be great!
>> Just to follow up on this (after a week looking after the "ops" side
>> of dev-ops): I really don't want to have to use the flatfile loader
>> script, and I'm not going to be able to write a Metron-style HBase
>> key generator any time soon, but I have had some success with a different approach.