How to manually commit offset in Spark Kafka direct streaming?


I looked around hard but didn't find a satisfactory answer to this. Maybe I'm missing something. Please help.

We have a Spark Streaming application consuming a Kafka topic, and it needs to ensure end-to-end processing (e.g. updating a database) before advancing the Kafka offsets. This is much like building transaction support within the streaming system: guaranteeing that each message is processed (transformed) and, more importantly, output.

I have read about Kafka DirectStreams. The documentation says that for robust failure recovery in DirectStreaming mode, Spark checkpointing should be enabled, which stores the offsets along with the checkpoints. But the offset management is done internally (by setting Kafka config params such as ["auto.offset.reset", "auto.commit.enable", "auto.commit.interval.ms"]). It does not speak of how (or if) we can customize committing offsets (once we've loaded a database, for example). In other words, can we set "auto.commit.enable" to false and manage the offsets (not unlike a DB connection) ourselves?

Any guidance/help is appreciated.

The article below could be a good start for understanding the approach.

spark-kafka-achieving-zero-data-loss
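
As a rough illustration of what managing the offsets yourself can look like, here is a minimal Python sketch that writes a batch's output and the next offset to read in one database transaction, so the offset only advances if the output was actually stored. It is not taken from the linked article. SQLite from the standard library is only a stand-in for your database; the table names and the functions `open_store`, `process_batch` and `load_offset` are hypothetical, and the Spark and Kafka calls are left out entirely.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the (stand-in) database and create the two tables if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "topic TEXT, part INTEGER, msg_offset INTEGER, value TEXT, "
        "PRIMARY KEY (topic, part, msg_offset))"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS offsets ("
        "topic TEXT, part INTEGER, next_offset INTEGER, "
        "PRIMARY KEY (topic, part))"
    )
    return conn


def process_batch(conn, topic, partition, records):
    """Store transformed records and the new offset in ONE transaction.

    `records` is a list of (offset, value) pairs, standing in for one
    partition's slice of a micro-batch.
    """
    if not records:
        return
    with conn:  # commits on success, rolls back if anything raises
        for offset, value in records:
            conn.execute(
                "INSERT INTO results (topic, part, msg_offset, value) "
                "VALUES (?, ?, ?, ?)",
                (topic, partition, offset, value.upper()),  # toy transform
            )
        conn.execute(
            "INSERT OR REPLACE INTO offsets (topic, part, next_offset) "
            "VALUES (?, ?, ?)",
            (topic, partition, records[-1][0] + 1),
        )


def load_offset(conn, topic, partition):
    """Return the offset to resume from (0 if nothing was committed yet)."""
    row = conn.execute(
        "SELECT next_offset FROM offsets WHERE topic = ? AND part = ?",
        (topic, partition),
    ).fetchone()
    return row[0] if row else 0
```

On restart you would read `load_offset(...)` and start consuming from there. A batch that fails midway leaves both tables unchanged, so it is simply replayed.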

Furthermore, the article suggests using a ZooKeeper client directly, which can also be replaced by something like KafkaSimpleConsumer. The advantage of using ZooKeeper/KafkaSimpleConsumer is that monitoring tools which depend on the offsets saved in ZooKeeper can take advantage of them. The information can also be saved on HDFS or another reliable service.
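
If you go the "other reliable service" route, the core of it is just a durable read and write of the offsets per topic and partition. Below is a small Python sketch of that idea, using a local JSON file as a stand-in for HDFS or ZooKeeper; the file layout and the names `save_offsets` and `load_offsets` are assumptions made up for this example. It writes to a temporary file first and then renames it over the old one, so a crash in the middle of a write does not leave a half-written offsets file behind.

```python
import json
import os
import tempfile


def save_offsets(path, offsets):
    """Persist a dict like {"topic:partition": next_offset} to `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(offsets, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # rename over the old file (atomic on POSIX)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_offsets(path):
    """Return the saved offsets, or an empty dict if none were saved yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}
```

You would call `save_offsets` only after the batch's output has been written successfully, and `load_offsets` once at startup to decide where to resume.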

