How to manually commit offsets in Spark Kafka direct streaming?


I have looked around quite a bit but haven't found a satisfactory answer to this. Maybe I'm missing something; please help.

We have a Spark Streaming application consuming a Kafka topic, and it needs to ensure end-to-end processing before advancing the Kafka offsets, e.g. updating a database. We are building transaction support into the streaming system, guaranteeing that each message is processed (transformed) and, more importantly, output.

I have read up on Kafka DirectStreams. The documentation says that for robust failure recovery in direct-streaming mode, Spark checkpointing should be enabled, which stores the offsets along with the checkpoints. Offset management is otherwise done internally (by setting the Kafka config params "auto.offset.reset", "auto.commit.enable", "auto.offset.interval.ms"). It does not say how (or whether) we can customize committing the offsets (e.g. once we've loaded the data into the database). In other words: can we set "auto.commit.enable" to false and manage the offsets ourselves (not unlike a DB connection)?
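For what it's worth, the spark-streaming-kafka-0-10 integration does support exactly this pattern: disable auto-commit, do the output action first, then commit the batch's offset ranges back to Kafka via `CanCommitOffsets.commitAsync`. Below is a minimal Scala sketch; the broker address, topic name, group id, and the `saveToDatabase` call are placeholders for your own setup:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

object ManualCommitExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("manual-offset-commit")
    val ssc  = new StreamingContext(conf, Seconds(5))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",          // placeholder broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "my-consumer-group",       // placeholder group
      "auto.offset.reset"  -> "earliest",
      "enable.auto.commit" -> (false: java.lang.Boolean) // we commit manually
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("my-topic"), kafkaParams)
    )

    stream.foreachRDD { rdd =>
      // Capture this batch's offset ranges before any transformation.
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

      // Do the end-to-end work first: transform and write to the database.
      rdd.foreachPartition { records =>
        // saveToDatabase(records)  // hypothetical output action
      }

      // Only after the output has succeeded, commit the offsets to Kafka.
      stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note that `commitAsync` is asynchronous and the commit happens on a later batch boundary, so the guarantee is at-least-once: if the job dies between the database write and the commit, that batch will be replayed, and the database write should therefore be idempotent or transactional.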

Any guidance/help would be appreciated.

The article below is a good starting point for understanding this approach:

spark-kafka-achieving-zero-data-loss

Furthermore, the article suggests using a ZooKeeper client directly, which can also be replaced with Kafka's SimpleConsumer. The advantage of going through ZooKeeper (or SimpleConsumer) is that monitoring tools which depend on offsets saved in ZooKeeper can take advantage of them. Alternatively, the offset information can be saved on HDFS or another reliable storage service.
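To make the ZooKeeper variant concrete, here is a sketch that persists each partition's ending offset using Apache Curator, under the `/consumers/<group>/offsets/<topic>/<partition>` path layout that Kafka's old ZooKeeper-based consumers used (so ZooKeeper-aware monitoring tools can read it). The ZooKeeper quorum address is a placeholder, and the path convention is an assumption you should match to whatever your monitoring tools expect:

```scala
import java.nio.charset.StandardCharsets
import org.apache.curator.framework.CuratorFrameworkFactory
import org.apache.curator.retry.ExponentialBackoffRetry
import org.apache.spark.streaming.kafka010.OffsetRange

object ZkOffsetStore {
  // Placeholder ZooKeeper quorum; retry with exponential backoff.
  private val zk = CuratorFrameworkFactory.newClient(
    "localhost:2181", new ExponentialBackoffRetry(1000, 3))
  zk.start()

  /** Save the ending offset of each partition after the batch's output
    * action has succeeded, so a restart can resume from these values. */
  def save(group: String, offsetRanges: Seq[OffsetRange]): Unit = {
    offsetRanges.foreach { or =>
      val path = s"/consumers/$group/offsets/${or.topic}/${or.partition}"
      val data = or.untilOffset.toString.getBytes(StandardCharsets.UTF_8)
      if (zk.checkExists().forPath(path) == null)
        zk.create().creatingParentsIfNeeded().forPath(path, data)
      else
        zk.setData().forPath(path, data)
    }
  }
}
```

You would call `ZkOffsetStore.save(groupId, offsetRanges)` at the end of each `foreachRDD` batch, and on startup read these paths back to build the `fromOffsets` map passed to the direct stream.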

