Kafka rack-id and min in-sync replicas


Kafka has introduced a rack id (broker.rack) to provide redundancy in case a whole rack fails. There is a min.insync.replicas setting that specifies the minimum number of replicas that need to be in sync before the producer receives an acknowledgement (with acks=-1 / acks=all configured). There is also an unclean.leader.election.enable setting that specifies whether a replica that is not in sync can be elected leader.
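These three settings live in the broker/topic configuration. A minimal sketch of the relevant properties (the property names are standard Kafka configuration; the values are illustrative):

```
# server.properties on each broker (value differs per rack)
broker.rack=rack1

# broker default, can also be set per topic
min.insync.replicas=2

# disallow electing an out-of-sync replica as leader
unclean.leader.election.enable=false
```

On the producer side, the corresponding setting is `acks=all` (equivalently `acks=-1`).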

So, given the following scenario:

  • Two racks: rack 1 and rack 2.
  • Replication factor = 4.
  • min.insync.replicas = 2.
  • Producer acks = -1 (all).
  • unclean.leader.election.enable = false.
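Assuming the standard Kafka CLI, a topic matching this scenario could be created roughly like this (the broker address and topic name are made up for illustration):

```
kafka-topics.sh --create \
  --bootstrap-server broker1:9092 \
  --topic events \
  --partitions 1 \
  --replication-factor 4 \
  --config min.insync.replicas=2 \
  --config unclean.leader.election.enable=false
```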

The aim is at-least-once message delivery, redundancy of nodes, and tolerance of a rack failure.

Is it possible that at some moment the 2 in-sync replicas both come from rack 1, the producer receives an ack, and at that point rack 1 crashes (before the replicas on rack 2 are in sync)? That would mean rack 2 contains only unclean replicas, and no producer would be able to add messages to the partition, grinding it to a halt. Because the remaining replicas are unclean, no new leader would be elected in this case.

Is this analysis correct, or is there something under the hood that ensures the replicas forming the min in-sync set are on different racks?
Since replicas on the same rack have lower latency to each other, the above scenario seems reasonably likely.

The scenario is shown in the image below:

(image omitted)

Yes, I think it is possible, because Kafka maintains the ISR according to which replicas are actually in sync at runtime, not according to the intended placement.

In the words of https://engineering.linkedin.com/kafka/intra-cluster-replication-apache-kafka:

For each partition of a topic, we maintain an in-sync replica set (ISR). This is the set of replicas that are alive and have fully caught up with the leader (note that the leader is always in the ISR). When a partition is created initially, every replica is in the ISR. When a new message is published, the leader waits until it reaches all replicas in the ISR before committing the message. If a follower replica fails, it is dropped out of the ISR and the leader continues to commit new messages with fewer replicas in the ISR. Notice that now, the system is running in an under-replicated mode.

And in the words of https://cwiki.apache.org/confluence/display/kafka/kafka+replication:

After a configured timeout period, the leader will drop the failed follower from its ISR and writes will continue on the remaining replicas in the ISR. If the failed follower comes back, it first truncates its log to the last checkpointed HW (high watermark). It then starts to catch up on all messages after its HW from the leader. When the follower fully catches up, the leader will add it back to the current ISR.

The min.insync.replicas you mentioned is only a lower limit; the ISR size does not depend on it. The setting means that if the producer's acks is "all" and the ISR size is less than the minimum, Kafka will refuse to write the message.
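That write gate can be stated as a tiny predicate. This is an illustrative model of the broker-side check, not actual Kafka code:

```python
def broker_accepts_write(isr_size: int, min_insync_replicas: int, acks: str) -> bool:
    """Model of the broker-side check: with acks='all' (or '-1'), a write is
    rejected (NotEnoughReplicas) when the ISR has shrunk below the minimum."""
    if acks in ("all", "-1"):
        return isr_size >= min_insync_replicas
    # acks=0 and acks=1 do not consult min.insync.replicas at all
    return True

print(broker_accepts_write(4, 2, "all"))  # True: full ISR
print(broker_accepts_write(1, 2, "all"))  # False: ISR shrank below the minimum
print(broker_accepts_write(1, 2, "1"))    # True: acks=1 ignores the minimum
```

Note that the minimum only blocks *new* writes; it does nothing to control *which* replicas make up the ISR.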

So at first the ISR is {1,2,3,4}, and if broker 3 or 4 goes down or falls behind, it is kicked out of the ISR. Then the case you mentioned can happen: when rack 1's brokers fail, only an unclean leader election could bring the partition back, and you have disabled it.
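The failure timeline above can be sketched as a toy simulation. The replica IDs and rack mapping are illustrative; this models the ISR bookkeeping described in the quoted docs, it is not Kafka code:

```python
# Toy model of one partition: replicas 1,2 on rack1, replicas 3,4 on rack2.
rack = {1: "rack1", 2: "rack1", 3: "rack2", 4: "rack2"}
min_insync_replicas = 2

isr = {1, 2, 3, 4}  # initially, every replica is in the ISR

# Step 1: the rack2 followers fall behind and are dropped from the ISR.
isr -= {3, 4}

# Step 2: an acks=all write is still acknowledged, because
# ISR size (2) >= min.insync.replicas (2) -- yet both members share a rack.
ack = len(isr) >= min_insync_replicas
print(ack)  # True

# Step 3: rack1 crashes, taking every in-sync replica with it.
isr -= {r for r in isr if rack[r] == "rack1"}

# Step 4: with unclean.leader.election.enable=false, only ISR members
# are electable as leader -- and the ISR is now empty.
electable = sorted(isr)
print(electable)  # [] -> no leader can be elected; the partition is unavailable
```

The model shows why nothing in the ISR machinery itself guarantees rack diversity: the ISR is just "whoever is caught up right now".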

