messaging: move topic handling to vtgate#5181

Closed
derekperkins wants to merge 1 commit into vitessio:master from nozzle:vschema-topics

@derekperkins
Member

I think the design of topics as implemented in #5011 is flawed. Topics are handled at the vttablet level, which causes unexpected data loss in two use cases and cannot be fixed at the tablet layer:

  1. Single Keyspace: It should be possible for a messaging table in one keyspace to subscribe to a topic from another keyspace. Currently, the topic won’t be aware of subscribers from other keyspaces, because subscriptions are defined in a table-level comment, and so it will never fan out to them.
  2. Requires identical sharding column: The sharding decision has already been made at the vtgate level, so inserting into subscriber message tables that are sharded differently would result in data integrity problems, since it violates sharding guarantees.

IMO, the correct solution is to handle topics at vtgate, which can fan out inserts across keyspaces and maintain data integrity regardless of sharding columns. This has the further benefit of taking a burden off vttablet, which will matter as we add more advanced features like subscription filtering (#5180).
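To make the vtgate-level fan-out concrete, here is a rough sketch, in Python rather than Vitess's Go. Every name in it (`Subscriber`, `shard_for`, `fan_out`) is hypothetical, and the hash-modulo routing is only a stand-in for a real vindex. The point is that each subscriber table is routed by its own sharding column, wherever its keyspace lives.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscriber:
    """Hypothetical subscription entry, as it might be described in the vschema."""
    keyspace: str
    table: str
    sharding_column: str
    shard_count: int


def shard_for(value, shard_count):
    """Stand-in for a vindex: hash the sharding value and pick a shard index."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def fan_out(message, subscribers):
    """Plan one insert per subscriber, routed by that subscriber's own column."""
    plan = []
    for sub in subscribers:
        shard = shard_for(message[sub.sharding_column], sub.shard_count)
        plan.append((sub.keyspace, sub.table, shard))
    return plan


# Two subscribers in different keyspaces, sharded on different columns.
subscribers = [
    Subscriber("orders", "order_msgs", "id", 4),
    Subscriber("billing", "invoice_msgs", "customer_id", 2),
]
plan = fan_out({"id": 1, "customer_id": 42, "body": "created"}, subscribers)
```

The resulting `plan` holds one `(keyspace, table, shard)` entry per subscriber, which is the routing decision that cannot be made once the insert has already reached a single tablet.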

In this PR, I am proposing new vschema fields/messages to describe these subscriptions. I have also included a rough draft for what subscription filters might look like, to support the need for subscriptions to live in the vschema. The actual proto definition and implementation details are out of the scope of this PR. I will remove all these fields once the overall design is approved.

Signed-off-by: Derek Perkins <derek@derekperkins.com>
@derekperkins derekperkins requested a review from sougou September 11, 2019 09:54
@derekperkins
Member Author

Another benefit of registering topic tables in the vschema is that they aren't reliant on a subscriber. In the current design, a topic exists only while at least one subscriber is subscribed, so removing the last subscription breaks anyone publishing to the topic. The correct behavior would be to return early and successfully if there are no subscribers.
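A minimal sketch of that behavior, assuming a hypothetical `publish` helper and modeling each subscriber table as a plain Python list. The topic is registered on its own, so an empty subscriber list is a valid state rather than an error.

```python
def publish(topic, message, subscriptions):
    """Deliver message to every subscriber table of a registered topic.

    `subscriptions` maps a registered topic name to its subscriber tables.
    A topic with zero subscribers is still registered, so publishing to it
    succeeds and simply delivers nothing.
    """
    tables = subscriptions[topic]
    if not tables:
        return 0  # succeed early instead of failing the publisher
    for table in tables:
        table.append(dict(message))  # copy so subscribers don't share state
    return len(tables)


inbox = []
delivered = publish("signup", {"id": 1}, {"signup": [inbox]})
delivered_to_nobody = publish("signup", {"id": 2}, {"signup": []})
```

Here `delivered` counts the single subscriber, while `delivered_to_nobody` is zero and no exception is raised.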

@derekperkins
Member Author

We just stumbled onto another flaw with the current implementation. We often use sequences to generate message table ids, but that no longer functions correctly.

@sougou
Contributor

sougou commented Sep 16, 2019

This changes the nature of the feature quite a bit. We should first look at undoing #5011. That will keep the vttablet code more maintainable.

We also have to consider other issues related to this approach. The big one is: what about transactions? If you don't enable 2PC, messages can get lost.

As for filtering, I think we can apply what we learned from vreplication. Specify the filtering rule as a select statement, which leads us to the inevitable question:

Could this be achieved through vreplication? The topic would be a physical table, and message tables would become vreplication targets. This will guarantee that no messages will be lost. However, there is some write amplification because the topic becomes a physical table, and it also needs to be purged.
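As a rough illustration of the idea only, the sketch below uses SQLite and polls by primary key. It is not how vreplication works internally (which follows the binlog), and the table names, the `FILTER` rule, and `catch_up` are all hypothetical. It shows a physical topic table, a filtering rule expressed as a select statement, and a subscriber that resumes from its last position so nothing is skipped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE topic (id INTEGER PRIMARY KEY, kind TEXT, body TEXT);
    CREATE TABLE signup_msgs (id INTEGER PRIMARY KEY, body TEXT);
    """
)

# Hypothetical filtering rule, written as a select against the topic table.
FILTER = "SELECT id, body FROM topic WHERE kind = 'signup'"


def catch_up(conn, last_id):
    """Copy matching topic rows newer than last_id; return the new position."""
    rows = conn.execute(
        f"SELECT id, body FROM ({FILTER}) WHERE id > ? ORDER BY id", (last_id,)
    ).fetchall()
    conn.executemany("INSERT INTO signup_msgs (id, body) VALUES (?, ?)", rows)
    return rows[-1][0] if rows else last_id


conn.executemany(
    "INSERT INTO topic (kind, body) VALUES (?, ?)",
    [("signup", "alice"), ("login", "bob")],
)
pos = catch_up(conn, 0)

# A message published while the subscriber was idle is picked up on the next run.
conn.execute("INSERT INTO topic (kind, body) VALUES ('signup', 'carol')")
pos = catch_up(conn, pos)
copied = conn.execute("SELECT id, body FROM signup_msgs ORDER BY id").fetchall()
```

Because the topic rows persist, a subscriber that falls behind only needs its saved position to resume, which is the property that makes this route attractive.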

@derekperkins
Member Author

I would definitely revert most of #5011 as a part of this change, but we're already using that style of topics, so we would want a replacement merged before removing the older style of topics.

I REALLY like the idea of doing it via vreplication:

PRO:

  • constant time inserts: currently more/slower subscribers will slow down inserts
  • more reliable: currently all message tables have to have compatible schemas or inserts will fail. vreplication should help prevent those failures, manipulate the output if necessary, and if there is still an incompatibility, the subscriber has a chance to fix it and catch up rather than just miss messages
  • historical access to messages: having messages separately in a topic table allows for selective retention rates so subscribers could process messages created before them
  • using SQL to filter leans into Vitess strengths and doesn't force us to create our own query language for subscriber filtering
  • having all real tables simplifies the design
  • solves issues from above: cross-keyspace, different table designs, sequences + 2PC

CON:

  • requires selective vreplication: you wouldn't want any updates/deletes to be replicated from topics to subscribers (this may already be handled, I'm not sure)
  • write amplification: if there is only 1 subscriber, you're doing twice as much work, but the percentage overhead decreases as more subscribers are added
  • vreplication only supports a subset of SQL - we'd want to use JSON functions for filtering, but I don't believe those are available.
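The write-amplification point can be put as simple arithmetic. The sketch below assumes each row insert costs one unit and ignores purging of the topic table; `write_amplification` is a hypothetical helper, not anything in Vitess.

```python
def write_amplification(subscribers):
    """Ratio of writes with a physical topic table to writes without one.

    Direct fan-out writes one row per subscriber. Going through a topic
    table adds one extra row for the topic itself.
    """
    if subscribers < 1:
        raise ValueError("need at least one subscriber")
    return (subscribers + 1) / subscribers


one = write_amplification(1)    # 2.0: twice as much work
four = write_amplification(4)   # 1.25: 25% overhead
```

The extra cost is always one row per message, so the relative overhead is 1/N for N subscribers.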

I think we should definitely move forward with this powered by vreplication. I think that the configuration should still live in the vschema, so I believe that the design of this PR is still a step in the right direction.

@sougou
Contributor

sougou commented Sep 24, 2019

Closing this. We've agreed to use the vreplication route. Also, we're leaning towards a separate place to store this type of metadata, as the vschema is mainly used for routing requests.

@sougou sougou closed this Sep 24, 2019