CopyPastor

Detecting plagiarism made easy.

Score: 0.8010852932929993; Reported for: String similarity

Possible Plagiarism

Plagiarized on 2020-10-19
by mike

Original Post

Original - Posted on 2013-06-20
by Lundahl



            

In general, I do not think it is a good design to force a producer to partition the data based on the consumer. A Kafka topic should decouple the producer and the consumer and encapsulate them from each other.
There are two main reasons not to try to achieve this:
- A Kafka topic is meant to be consumed by multiple consumer groups, and they are (hopefully) all independent of each other in terms of consumer threads.
- A consumer group and its consumers are not stable: one of them could die and a rebalance could happen. You would then need a sticky partition assignment strategy, which brings more complexity into your consumer (see the configuration sketch below). And what if one of the 5 consumers dies for good? You would not be able to read the messages of its four partitions. Remember that a consumer group is a "moving thing", and I recommend letting Kafka handle it as much as possible.
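For reference, here is a minimal sketch of what opting into a sticky assignment strategy looks like with the standard Java consumer client; the broker address, group id and topic are placeholders, not taken from the question.

```java
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.CooperativeStickyAssignor;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class StickyConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-consumer-group");       // placeholder group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // The extra piece of configuration (and complexity) mentioned above: a sticky assignor
        // keeps partitions with the same consumer across rebalances where possible.
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                CooperativeStickyAssignor.class.getName());

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // ... subscribe and poll as usual
    }
}
```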
I understand this might not actually answer your question. If you want proper balancing, you should match the number of partitions with the number of consumer threads and ensure on the producer side that all messages are produced in a balanced way across the partitions.
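As a rough sketch of that producer-side balancing (the topic name and broker address here are invented for the example): records sent without a key are spread over the partitions by the default partitioner, round-robin in older clients and in sticky batches in newer ones.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class BalancedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 100; i++) {
                // No key: the default partitioner distributes the records across all partitions
                producer.send(new ProducerRecord<>("my-topic", "message-" + i));
            }
        }
    }
}
```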
When structuring your data for Kafka, it really depends on how it's meant to be consumed.
In my mind, a topic is a grouping of messages of a similar type that will be consumed by the same type of consumer, so in the example above I would just have a single topic, and if you decide to push some other kind of data through Kafka you can add a new topic for that later.
Topics are registered in ZooKeeper, which means that you might run into issues if you try to add too many of them, e.g. in the case where you have a million users and have decided to create a topic per user.
Partitions, on the other hand, are a way to parallelize the consumption of the messages, and the total number of partitions in a broker cluster needs to be at least the same as the number of consumers in a consumer group for the partitioning feature to make sense. Consumers in a consumer group will split the burden of processing the topic between themselves according to the partitioning, so that one consumer will only be concerned with messages in the partitions it is itself "assigned to".
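As a rough sketch of how that split works with the Java consumer client (the topic name, group id and broker address are placeholders): every instance started with the same group.id subscribes to the topic, and Kafka assigns each instance a subset of the partitions.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class GroupMember {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-consumer-group");       // same id in every instance
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            while (true) {
                // Each instance only receives records from the partitions assigned to it
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```

Start this program twice with the same group.id and the topic's partitions are divided between the two instances; start more instances than there are partitions and the extra ones sit idle.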
The partition can either be set explicitly by using a partition key on the producer side or, if no key is provided, a random partition will be selected for every message.
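A minimal sketch of the keyed case (the topic, keys and values here are invented): records that share a key always hash to the same partition, while records sent without a key are spread out by the partitioner.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Same key "user-42": both records are written to the same partition
            producer.send(new ProducerRecord<>("user-events", "user-42", "logged_in"));
            producer.send(new ProducerRecord<>("user-events", "user-42", "clicked_button"));

            // No key: the partitioner picks a partition for the record
            producer.send(new ProducerRecord<>("user-events", "page_viewed"));
        }
    }
}
```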

        