I am looking for a product (open source or not), something like Talend or Pentaho, in which I can design the ETL (from and to Kafka) and run it in Storm/IronCount, or maybe even in Hadoop Map/Reduce. The product should be complete and support connections to many data sources and targets. In that sense, if you know of a Kafka connector for Talend or Pentaho, that would be great.
Thanks again.

On 01/07/2013 12:28 AM, David Arthur wrote:
On Jan 6, 2013, at 11:11pm, Guy Doulberg wrote:

Interesting - we build ETLs on top of Hadoop using Cascading (an open source workflow API), which has a lot of what it calls "Taps" for connecting to data sources and sinks.
But I haven't heard of a Kafka Tap. Should be possible to implement, though.
One issue is that Hadoop is batch oriented, so there's a bit of an impedance mismatch when you've got a streaming data source, but from experience it's possible to get that to work.
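One common way to bridge the batch/streaming mismatch (an assumption here, not something described in the thread) is to run the batch job periodically over a bounded offset range and record the last offset consumed, so each run picks up where the previous one stopped. The sketch below uses hypothetical in-memory classes (`PartitionLog`, `OffsetBatchJob`) in place of a real Kafka partition and Hadoop job; only the offset bookkeeping is meant to carry over.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one Kafka partition: an append-only log addressed by offset.
// This is NOT the Kafka client API; it only models offset bookkeeping.
class PartitionLog {
    private final List<String> messages = new ArrayList<>();

    void append(String message) { messages.add(message); }

    // Offset one past the last message currently in the log.
    long endOffset() { return messages.size(); }

    // Returns a copy of the messages in the half-open range [from, to).
    List<String> read(long from, long to) {
        return new ArrayList<>(messages.subList((int) from, (int) to));
    }
}

// Each run() behaves like one batch job: it consumes a bounded offset range,
// so the job terminates even though the source keeps growing.
class OffsetBatchJob {
    // Hypothetical: a real job would persist this between runs (e.g. in HDFS).
    private long committedOffset = 0;

    List<String> run(PartitionLog log) {
        long end = log.endOffset();                  // freeze the upper bound for this run
        List<String> batch = log.read(committedOffset, end);
        List<String> out = new ArrayList<>();
        for (String m : batch) {
            out.add(m.toUpperCase());                // placeholder "transform" step
        }
        committedOffset = end;                       // advance only after the batch is processed
        return out;
    }

    long committedOffset() { return committedOffset; }
}
```

Advancing the offset only after a batch has been processed means a failed run is simply re-read on the next attempt, at the cost of possibly reprocessing some messages.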
Just to be clear - a Kafka 'Tap' of sorts exists in contrib: it scans Hadoop records, which may be ETL'd first, and emits new Kafka events.

On Mon, Jan 7, 2013 at 9:57 AM, Ken Krugler <[EMAIL PROTECTED]> wrote:
Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
I previously posted a link to contrib in this thread. No, it's not a Cascading Tap; it's a pair of complete jobs: one to read Kafka events into HDFS, and one to generate Kafka events from HDFS. ETL can happen in between.

On Jan 7, 2013 1:51 PM, "Ken Krugler" <[EMAIL PROTECTED]> wrote:
On Jan 7, 2013, at 2:05pm, Russell Jurney wrote:

Thanks, I missed that - all I saw was the long URL to the Talend integration doc on Hortonworks. Some Cascading integration notes, just for posterity:
Having a Kafka Tap/Scheme would make integration easy. I see there are KafkaInputFormat and KafkaOutputFormat classes in contrib, which is great, though they would have to be back-ported to the older Hadoop APIs in order to work with Cascading. Also, Cascading passes all data around as the key (the value is always NullWritable), whereas the Kafka input/output formats do the opposite.
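The key/value mismatch described above comes down to moving the payload from one side of the record to the other at the boundary between the two systems. The sketch below models that adapter with hypothetical stand-in types (`KV` for a Hadoop key/value record, `Nothing` for `NullWritable`) rather than the real Hadoop, Cascading or Kafka contrib classes, and it assumes the Kafka-side key carries nothing needed downstream.

```java
// Hypothetical stand-in for a Hadoop (key, value) record; not a real Hadoop type.
final class KV<K, V> {
    final K key;
    final V value;

    KV(K key, V value) {
        this.key = key;
        this.value = value;
    }
}

// Hypothetical stand-in for NullWritable: a singleton "no data" marker.
enum Nothing { INSTANCE }

final class KeyValueSwap {
    // Cascading-style record (data in the key, empty value)
    // -> Kafka-format-style record (data in the value, empty key).
    static <T> KV<Nothing, T> toValueCarried(KV<T, Nothing> record) {
        return new KV<>(Nothing.INSTANCE, record.key);
    }

    // Kafka-format-style record -> Cascading-style record.
    // Whatever the incoming key holds is discarded (an assumption of this sketch).
    static <T> KV<T, Nothing> toKeyCarried(KV<?, T> record) {
        return new KV<>(record.value, Nothing.INSTANCE);
    }
}
```

In a real integration this conversion would presumably sit inside a custom Kafka Tap/Scheme; here it is reduced to two pure functions so the direction of the swap is easy to see.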