apache kafka - How to implement Processor API with Exactly-once mode -


i'm studying kafka stream , using processor api implement use case. code below shows process method forwards message downstream , aborts before calling commit. causes stream reprocessed , duplicates message on sink.

public void process(string key, string value) {      context.forward(key, value);      ..      ..     //killed      context.commit(); } 

processing.guarantee parameter:

streamsconfiguration.put(streamsconfig.processing_guarantee_config, streamsconfig.exactly_once); 

is there way apply forwarding when invoking commit statement. if not, correct approach implement exactly-once mode.

thank you

make sure sink in read_committed consumer mode see committed messages. if messages written output topic before transaction aborted, upon abort, messages still there, not marked @ committed. second time through transaction completes messages , commit marker added output topic. if read without being in read_committed mode see messages (including uncommitted ones) , may appear duplicates because see aborted results , committed results.

from 0.11 javadoc here https://kafka.apache.org/0110/javadoc/index.html?org/apache/kafka/clients/consumer/kafkaconsumer.html

transactions introduced in kafka 0.11.0 wherein applications can write multiple topics , partitions atomically. in order work, consumers reading these partitions should configured read committed data. can achieved by setting isolation.level=read_committed in consumer's configuration.

in read_committed mode, consumer read transactional messages have been committed. continue read non-transactional messages before. there no client-side buffering in read_committed mode. instead, end offset of partition read_committed consumer offset of first message in partition belonging open transaction. offset known 'last stable offset'(lso).

a read_committed consumer read till lso , filter out transactional messages have been aborted. lso affects behavior of seektoend(collection) , endoffsets(collection) read_committed consumers, details of in each method's documentation. finally, fetch lag metrics adjusted relative lso read_committed consumers. partitions transactional messages include commit or abort markers indicate result of transaction. there markers not returned applications, yet have offset in log. result, applications reading topics transactional messages see gaps in consumed offsets. these missing messages transaction markers, , filtered out consumers in both isolation levels. additionally, applications using read_committed consumers may see gaps due aborted transactions, since messages not returned consumer , yet have valid offsets.


Comments

Popular posts from this blog

ubuntu - PHP script to find files of certain extensions in a directory, returns populated array when run in browser, but empty array when run from terminal -

php - How can i create a user dashboard -

javascript - How to detect toggling of the fullscreen-toolbar in jQuery Mobile? -