Interface Partitioner is inefficient #75

dcsobral · 2018-07-06T22:41:50Z

The Partitioner interface design is inefficient. Its generatePartitionedPath method takes a topic, which is immutable per task, plus an encodedPartition, which is per-record, so the topic-dependent part of the path is recomputed for every record. That leads to workarounds such as confluentinc/kafka-connect-hdfs#224, in which generatePartitionedPath is ignored and a separate method is called that reproduces the behavior of DefaultPartitioner's implementation, minus the partition information.

Instead, a base method should take only the topic, and encodedPartition should take that static result together with the SinkRecord.
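To make the proposal concrete, here is a minimal sketch of what such a split could look like. It is not the actual Partitioner API: the names TwoStepPartitioner, topicPath, partitionedPath and RecordStub are hypothetical, RecordStub only stands in for Kafka Connect's SinkRecord so that the example compiles on its own, and the "topics/" prefix and "partition=" layout are illustrative assumptions.

```java
// Hypothetical stand-in for Kafka Connect's SinkRecord, reduced to the
// two fields this sketch needs so the example is self-contained.
final class RecordStub {
    final String topic;
    final int kafkaPartition;

    RecordStub(String topic, int kafkaPartition) {
        this.topic = topic;
        this.kafkaPartition = kafkaPartition;
    }
}

// Hypothetical two-step interface: the topic-only part is computed once,
// and the per-record part reuses that result.
interface TwoStepPartitioner {
    // Depends only on the topic, so the caller can compute it once and cache it.
    String topicPath(String topic);

    // Called for every record; receives the cached topic path.
    String partitionedPath(String topicPath, RecordStub record);
}

// Illustrative implementation that partitions by Kafka partition number.
final class KafkaPartitionPartitioner implements TwoStepPartitioner {
    @Override
    public String topicPath(String topic) {
        // Assumed directory layout, chosen for illustration only.
        return "topics/" + topic;
    }

    @Override
    public String partitionedPath(String topicPath, RecordStub record) {
        return topicPath + "/partition=" + record.kafkaPartition;
    }
}
```

With this shape, a writer would call topicPath once per topic, keep the result, and pass it to partitionedPath for each record, so a connector that only needs the topic directory has no reason to bypass the interface.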


The issue being fixed is that the current interface design makes Partitioner effectively impossible to extend, unless an implementation needs no parameters beyond those defined by the connectors using the partitioner. The new design puts the onus of turning properties into a configuration map on the partitioner class. Because those classes use recommenders and also use the common storage configuration, the interface was extended with two recommender getters, with a default implementation. TimestampExtractor gets the same treatment, as it is used by some of the Partitioner implementations. While this does break the API, the existing API is difficult to extend to begin with, which is the reason for the change.

dcsobral mentioned this issue Jul 16, 2018

Partitioner classes misuse PartitionerConfig #66

Closed
