Description
Introduce new cell indexing concept
Summary
Currently, when enabled, the bin pack first fit distribution in the auctioneer depends on rep indexes, which are mapped to bosh cell indexes. Relying on bosh cell indexes is fragile, since they are an internal bosh data structure and not part of its API.
We have a use case in which the linear distribution of bosh cell indexes is broken. This happens when we add an additional AZ to the diego-cell instance group. During the deployment:
- bosh creates the VMs from the new AZ with indexes higher than the total number of cells
- bosh deletes the same number of VMs from the old AZs, leaving unused indexes
- bosh updates the VMs from the new AZ
As a result, cell indexes can end up non-consecutive. For example, cell indexes [0, 1, 2, 3, 4, 5] would become [0, 1, 2, 3, 6, 7]. The new order of cell indexes has a negative impact in two ways:
- it becomes hard to configure (and predict) diego.auctioneer.bin_pack_first_fit_weight deterministically so that cell resource utilisation follows a smooth curve.
- since the diego-cells from the new AZ are indexed highest (6 and 7 in our example), they will be the least utilised, which decreases stability.
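To make the index gap concrete, here is a minimal sketch of the deployment steps above. It is only a model for illustration: the function names are hypothetical, and it assumes that the new VMs take the next free indexes and that the highest old indexes are the ones deleted, which matches the example but is not guaranteed by bosh.

```python
def add_az(indexes, moved):
    """Model adding an AZ that takes over `moved` cells (illustrative only)."""
    # Step 1: VMs in the new AZ get indexes above every index currently in use.
    start = max(indexes) + 1
    new = list(range(start, start + moved))
    # Step 2: the same number of VMs is deleted from the old AZs.
    # Assumption: the highest old indexes are the ones removed.
    kept = sorted(indexes)[:len(indexes) - moved]
    return kept + new


def gaps(indexes):
    """Return the unused indexes between 0 and the highest index in use."""
    return sorted(set(range(max(indexes) + 1)) - set(indexes))


after = add_az([0, 1, 2, 3, 4, 5], moved=2)
print(after)        # [0, 1, 2, 3, 6, 7]
print(gaps(after))  # [4, 5]
```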
Describe alternatives you've considered
We think that it would be best to keep the current mechanism of reading indexes directly from the bosh spec and introduce a way to override it.
If we think of the current distribution of cell indexes as a random variable (in the statistical sense), it is expected to be uniform, and we should keep this behaviour. We can introduce a new property of the rep job that takes a natural number x (<= number of cells). The rep would then generate a random number between 0 and x and use it as the cell index, overriding the bosh one. This way:
- the uniform distribution of indexes across cells will be preserved after a migration to more AZs.
- the higher indexes won't be clustered only within one of the AZs.
- indexes are changed every time there is a cell update.
The suggestion would require changes only to the diego-release repo.
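A rough sketch of the proposed override, to make the idea easier to discuss. Everything here is hypothetical: `effective_index` and its parameters are not existing diego-release names, and the sketch assumes "between 0 and x" means the half-open range 0..x-1.

```python
import random


def effective_index(bosh_index, x=None, rng=random):
    """Return the index the rep would report (hypothetical helper).

    bosh_index: the index read from the bosh spec (current behaviour).
    x:          value of the proposed rep job property; None means "not set".
    rng:        source of randomness, injectable for testing.
    """
    if x is None:
        # Property not set: keep reading the index from the bosh spec.
        return bosh_index
    if x < 1:
        raise ValueError("x must be a natural number >= 1")
    # Assumption: draw uniformly from 0..x-1.
    return rng.randrange(x)


rng = random.Random(42)  # seeded only to make the example repeatable
bosh_indexes = [0, 1, 2, 3, 6, 7]
overridden = {i: effective_index(i, x=len(bosh_indexes), rng=rng)
              for i in bosh_indexes}
print(overridden)
```

Because each cell draws independently in this sketch, two cells can end up with the same index. Whether that is acceptable for the auctioneer is one of the points we would like feedback on.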
We would like to hear your opinion on this, as well as any other suggestions for solving the problem. We would also be glad to contribute to this enhancement.
Additional Text Output, Screenshots, or contextual information
We use bin pack first fit distribution to make pushing "big" apps (e.g. requiring more than 32G of memory) possible. Since the highest indexes are given only to diego-cells from the new AZ, those cells would be the least utilised and the only ones able to host "big" apps. Hence it would be hard to place such app instances in the other AZs.
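The effect can be illustrated with a toy model. This is plain first fit by index, not the weighted scoring the auctioneer actually applies, and the cell size (64G), app sizes and AZ names are made-up values for the example.

```python
def first_fit(free, app_mem):
    """Place an app on the lowest-index cell with enough free memory.

    free: dict mapping cell index -> free memory in G (mutated in place).
    Returns the chosen cell index, or None if no cell has room.
    """
    for idx in sorted(free):
        if free[idx] >= app_mem:
            free[idx] -= app_mem
            return idx
    return None


# Cell indexes after the AZ migration; 6 and 7 live in the new AZ.
az_of = {0: "z1", 1: "z1", 2: "z2", 3: "z2", 6: "z3", 7: "z3"}
free = {idx: 64 for idx in az_of}

# 32 small apps of 8G each fill cells 0..3 completely.
for _ in range(32):
    first_fit(free, 8)

# "Big" apps (40G) now only fit on the highest-index cells.
placements = [first_fit(free, 40) for _ in range(3)]
print(placements)                                     # [6, 7, None]
print([az_of[p] for p in placements if p is not None])  # ['z3', 'z3']
```

In this model both big instances land in the new AZ and the third cannot be placed at all, which is the situation described above.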