How to manually commit offsets in Spark Kafka direct streaming?
I have looked around quite a bit but couldn't find a satisfactory answer to this. Maybe I'm missing something. Please help.
We have a Spark Streaming application consuming a Kafka topic, and it needs to ensure end-to-end processing (e.g., updating a database) before advancing the Kafka offsets. We are building transaction support into the streaming system, guaranteeing that each message is processed (transformed) and, more importantly, output.
I have read up on Kafka DirectStreams. It says that for robust failure recovery in direct-streaming mode, Spark checkpointing should be enabled, which stores the offsets along with the checkpoints. The offset management is done internally (by setting Kafka config params such as "auto.offset.reset", "auto.commit.enable", "auto.offset.interval.ms"). It does not speak of how (or whether) we can customize committing the offsets ourselves (e.g., once we've loaded the database). In other words: can we set "auto.commit.enable" to false and manage the offsets (not unlike a DB connection) ourselves?
Any guidance/help is appreciated.
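To make the question concrete, here is a sketch of the kind of control we are after, assuming the spark-streaming-kafka-0-10 integration (the `HasOffsetRanges`/`CanCommitOffsets` casts come from that package; `streamingContext`, the broker address, group id, and topic name are placeholders, and note that in the new consumer the auto-commit param is spelled "enable.auto.commit"):

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010._

// Disable Kafka's own auto-commit so that our code controls the offsets.
val kafkaParams = Map[String, Object](
  "bootstrap.servers"  -> "broker1:9092",
  "key.deserializer"   -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id"           -> "our-consumer-group",
  "auto.offset.reset"  -> "earliest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

val stream = KafkaUtils.createDirectStream[String, String](
  streamingContext,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](Seq("our-topic"), kafkaParams)
)

stream.foreachRDD { rdd =>
  // Capture the offset ranges before any shuffle or repartition.
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

  // ... transform the batch and write the results to the database here ...

  // Only after the output has succeeded, commit the offsets back to Kafka.
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}
```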
The article below is a good start to understanding this approach:

spark-kafka-achieving-zero-data-loss
Furthermore, the article suggests using a ZooKeeper client directly, which can be replaced with KafkaSimpleConsumer as well. The advantage of using ZooKeeper/KafkaSimpleConsumer is that monitoring tools which depend on ZooKeeper-saved offsets can take advantage of them. The offset information can also be saved on HDFS or some other reliable service.
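If you manage the offsets yourself rather than committing them back to Kafka, the pattern from the article looks roughly like the following sketch. The `readOffsets`/`saveOffsets` helpers are hypothetical (not from the article or the Spark API) and would be backed by ZooKeeper, HDFS, or the database itself, so that results and offsets can be stored in one transaction; `streamingContext`, `kafkaParams`, and the topic name are assumed to be defined as usual:

```scala
import org.apache.kafka.common.TopicPartition
import org.apache.spark.streaming.kafka010._

// Hypothetical helpers: persist/read offsets in ZooKeeper, HDFS, or the DB.
def readOffsets(): Map[TopicPartition, Long] = ???          // last stored offsets
def saveOffsets(ranges: Array[OffsetRange]): Unit = ???     // store new offsets

// Start the direct stream from the externally stored offsets.
val stream = KafkaUtils.createDirectStream[String, String](
  streamingContext,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](
    Seq("our-topic"), kafkaParams, readOffsets())
)

stream.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  // Write the results and the offsets in the same transaction; on restart,
  // the application resumes from readOffsets(), giving exactly-once output:
  // db.withTransaction { write results; saveOffsets(offsetRanges) }
  saveOffsets(offsetRanges)
}
```

Committing via `commitAsync` keeps the offsets visible to Kafka-based monitoring tools; storing them alongside the output data is what allows a true atomic "process and advance" step.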