
Project Proposal: Vitess #67

Merged
caniszczyk merged 4 commits into cncf:master from sougou:vitess
Feb 5, 2018

Conversation

@sougou (Contributor) commented Nov 13, 2017

Original doc:
https://docs.google.com/document/d/1p7gqlpQNJpZtsolHeX6vXR4NXXwGrCMsCz8rSi5jsBA/edit#

I've made some minor changes based on the formatting of the other
proposals. The vendor list was very big (182 lines), so I shortened it
by listing top-level orgs in some cases.

cc @bgrant0607 @caniszczyk


*Sponsor / Advisor from TOC*: Brian Grant <briangrant@google.com>

*Unique Identifier*: grpc

I think this UID is already taken.

@sougou (Contributor Author)

Oops! Fixed :)


*External Dependencies*: Full list: https://github.com/youtube/vitess/blob/master/vendor/vendor.json. Top level orgs:
Contributor

can you list the respective licenses here?

@sougou (Contributor Author)

Done. Found some oddities. I've provided links for those.

@bassam (Contributor) commented Dec 13, 2017

+1 non-binding

I'm excited to see this project become part of the CNCF. MySQL is the most widely adopted RDBMS, and Vitess helps solve some of the fundamental issues around its scalability and usability.

@caniszczyk (Contributor)

As an update the Vitess team presented to the CNCF Storage WG today: https://docs.google.com/presentation/d/1xgDO8zr3Tmic4NV9DOp_cVPC5F_ncCsXmOVjELlokiQ/edit#slide=id.g1d26bc3f31_0_61

@bassam (Contributor) commented Dec 14, 2017

@sougou what is the recommended DR approach with Vitess? I assume that the universe of data that needs to be protected is vitess state in etcd plus all shard state in mysql. What can you say about consistency guarantees between vitess state and shard state? Is it possible to restore the entire cluster back to a point in time?

@sougou (Contributor Author) commented Dec 14, 2017

For those concerned about DR, the main approach is that you run everything distributed across multiple data centers:

  • The global lockserver should run as a multi-DC quorum, so the data survives a single DC going dark. However, this is not a hard requirement, because the data can be reconstructed manually; it mainly contains info about keyspaces and shard ranges.
  • If a non-master DC goes down, nothing is lost. You just move the traffic to another DC. When it comes back up, replication will catch up and serving will resume.
  • If a DC experiences total data loss and comes back up empty, you just initialize it as if you're bringing up a new DC.
  • If a master DC goes down, then you can failover the master to another DC and resume serving write traffic in the new DC. This usually results in a few seconds of downtime per master.

At YouTube, we run the masters in a replication mode called semi-sync. This ensures that at least one other replica has received the data for every transaction that gets committed. Here, we take a calculated risk: we consider it sufficient for any replica to receive the data, even one in the same DC. This has served us well so far.

We could be more paranoid and require a replica outside of the current DC to provide the semi-sync ack. However, that would slow down our transactions, which we're not willing to tolerate at this point. But this option is available for someone who wants "no transaction to be lost".
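For reference, enabling semi-sync in stock MySQL looks roughly like the following. This is a sketch of the standard `rpl_semi_sync` plugin setup, not Vitess-specific configuration (Vitess manages these settings for you):

```sql
-- On the master: load and enable the semi-sync master plugin.
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
-- Commit blocks until at least this many replicas ack each transaction.
SET GLOBAL rpl_semi_sync_master_wait_for_slave_count = 1;

-- On each replica: load and enable the semi-sync slave plugin.
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
```

The tradeoff described above is in which replica is allowed to send the ack: any replica (possibly same-DC, faster commits) versus a remote-DC replica (stronger durability, slower commits).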

This pulls us into the subject of distributed durability. Much research has been done here, and much more is due.

@bassam (Contributor) commented Dec 14, 2017

Thanks @sougou, I now have a better understanding of the multi-datacenter approach and its tradeoffs.

I'm still curious about the consistency guarantees between shard config state in etcd and the actual shards themselves. Consider the case where I would like to "clone" a vitess cluster, is it sufficient to snapshot state in etcd first then backup each of the mysql shards? Do I need to quiesce sharding/resharding before I can safely do that? Is it possible to grab a consistent point-in-time "snapshot" of the entire cluster?

@sougou (Contributor Author) commented Dec 14, 2017

The shard config state itself is fairly static, because resharding is a human decision. We reshard at YouTube 'often', but that means once every 2-3 months. Other users of Vitess reshard even less often.

In terms of cloning, every DC is a clone. You can choose to stop replication for a DC and take a backup of all the data. As mentioned in the limitations, Vitess doesn't have the ability to give you a cross-shard consistent view of the data. This same limitation carries over when taking backups: you can only take a backup of the latest data for a database, so it would be very difficult to stop all replication at a transactionally consistent point.

Is there a particular use case you have in mind? In general, users haven't asked for this. Those that need to see data 'as of a certain time' generally add timestamps to those rows and then query them from the live system. This has become the preferred approach because it saves you from having to separately provision snapshot databases.
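As an illustration of that timestamped-rows approach (a hypothetical append-only `orders` table with an `updated_at` column; this is plain SQL against the live system, not a Vitess feature):

```sql
-- Fetch each order's most recent state as of a chosen point in time,
-- using per-row timestamps instead of a database-level snapshot.
SELECT o.*
FROM orders o
JOIN (
  SELECT order_id, MAX(updated_at) AS updated_at
  FROM orders
  WHERE updated_at <= '2017-12-01 00:00:00'
  GROUP BY order_id
) latest ON o.order_id = latest.order_id
        AND o.updated_at = latest.updated_at;
```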

@bassam (Contributor) commented Dec 15, 2017

@sougou no specific scenario in mind, just trying to understand Vitess a bit more. Thanks for your answers.

@sougou (Contributor Author) commented Dec 15, 2017

Sounds good. Let me know if that didn't answer your questions, or if you have any follow-up ones.

@bgrant0607 (Contributor)

@sougou A question from the CNCF storage WG:

How does Vitess compare to the MySQL Operator presented at Kubecon? Are there any plans to add Operator-like functionality, such as configurability via Kubernetes CRDs?

https://youtu.be/J7h0F34iBx0?t=652
https://schd.ws/hosted_files/kccncna17/4d/MySQL%20on%20Kubernetes.pdf
https://dyn.com/blog/mysql-on-kubernetes/

@bgrant0607 (Contributor) commented Jan 10, 2018

BTW, here's a Vitess demo: https://youtu.be/J7h0F34iBx0?t=1513

@enisoc commented Jan 10, 2018

The MySQL Operator hasn't been released yet AFAIK, but based on our discussions with @CaptTofu as he was preparing for that talk, I think the comparison is the same as Vitess vs MySQL in general. If MySQL alone is a good fit for you, MySQL Operator will help you run it on Kubernetes. If you need middleware like Vitess on top of MySQL, MySQL Operator won't remove that need.

A prototype Vitess Operator is in progress now. We originally started going down this path with a Helm chart whose values.yaml was designed to look much like a CRD, except that it was expanded on the client side by Go template code. What I'm doing now is moving that logic into a server-side controller for a VitessCluster CRD, as an example of using kube-metacontroller to write Operators.
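To make the CRD idea concrete, a VitessCluster object might look something like the following. This is a hypothetical sketch only; the actual schema is still being prototyped in the metacontroller work, and every field name here is illustrative:

```yaml
# Hypothetical VitessCluster custom resource (illustrative field names only).
apiVersion: vitess.io/v1alpha1
kind: VitessCluster
metadata:
  name: example
spec:
  cells: [us-east-1, us-west-1]   # datacenters / failure domains
  keyspaces:
  - name: commerce
    shards: ["-80", "80-"]        # keyspace-id ranges
    tabletsPerShard: 3            # one master-eligible tablet plus replicas
```

A server-side controller would then reconcile this spec into the underlying StatefulSets, Services, and Vitess topology records, replacing the client-side Go template expansion of the Helm chart.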

@bassam (Contributor) commented Jan 10, 2018

@enisoc do you see the Vitess Operator work going into the Vitess project/repo, or will it be separate?

@enisoc commented Jan 10, 2018

@bassam Initially I plan to post it in the kube-metacontroller repo, since metacontroller's API is still evolving and I want to keep the examples up to date. After the first versioned release of metacontroller, it would make sense to move the Vitess Operator into either the vitess repo, or into its own repo under a vitess-owned org.

@derekperkins

@bassam I've been working with @enisoc on the Kubernetes integrations, and unsurprisingly, since Vitess has been running in containers from the beginning, it's a perfect match. Where the MySQL Operator will necessarily have to deal with growing volume claims and increasing requests/limits, Vitess is much more predictable in terms of resource consumption. Instead of growing a single MySQL instance, it's not out of the question that the Vitess Operator could split/merge shards behind the scenes, protecting you from resource waste and/or hot spots in your data, all by horizontally scaling pods.

@clintkitson

Excellent @enisoc @derekperkins.

@clintkitson commented Jan 15, 2018

During the SWG call I made a comment to @bgrant0607 regarding the operators. I think today Vitess does great things to solve MySQL's scalability limitations by abstracting control-plane and data-plane activity. But this is really only valuable if you have scaling problems with MySQL.

I believe projects that enable a cloud native experience are important for the TOC to consider. In the case of data services like this, there would be a couple of key things that can be addressed to enable this experience.

  1. Consumers - How are the data services consumed by an application? Do the data services integrate with the CO so that a consumer can define an application that uses the service without manual interaction? Is there integration with a standard consumption API (Open Service Broker) and the K8s service catalog? i.e., deploy an application, specify a requirement for SQL storage, and have MySQL table space or instances created automatically with connection info advertised to the application.

  2. Providers - How are the data services operated? Are the lifecycle operations and scaling of the application handled automatically? I believe this question was addressed above through developing a K8s operator.


*Description*:

Vitess is a database clustering system for horizontal scaling of MySQL. Using the terminology from the link:http://db.cs.cmu.edu/papers/2016/pavlo-newsql-sigmodrec2016.pdf[Pavlo and Aslett NewSQL survey article], Vitess is “sharding middleware”. By encapsulating shard-routing logic, Vitess allows application code and database queries to remain agnostic to the distribution of data onto multiple shards. You can split and merge shards as your needs change, with an atomic cutover step that is performed in seconds. Vitess has been serving all YouTube database traffic since 2011, and has grown to encompass tens of thousands of MySQL nodes. It has also gained increasing adoption in the community with about fifteen companies currently in the pipeline, some of whom have already gone into production. For more details, see the link:http://vitess.io/overview/[Vitess overview].
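To illustrate what "sharding middleware" means in practice, here is a toy sketch of the range-based routing that Vitess encapsulates so applications don't have to. This is not Vitess code; the shard names follow Vitess's range-naming convention, but the 8-bit keyspace-id space and four-shard layout are invented for the example:

```python
# Toy illustration of range-based shard routing (not Vitess code).
# Vitess hides this kind of logic behind its query-routing layer.
import bisect

# Hypothetical layout: an 8-bit keyspace-id space split into 4 shards,
# each named by the [lower, upper) range it covers.
SHARD_UPPER_BOUNDS = [0x40, 0x80, 0xC0, 0x100]
SHARD_NAMES = ["-40", "40-80", "80-c0", "c0-"]

def shard_for(keyspace_id: int) -> str:
    """Return the shard whose [lower, upper) range contains keyspace_id."""
    return SHARD_NAMES[bisect.bisect_right(SHARD_UPPER_BOUNDS, keyspace_id)]
```

Because this mapping lives in the middleware, splitting "80-c0" into "80-a0" and "a0-c0" only changes the routing tables; application queries stay unchanged, which is what makes the atomic cutover step possible.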
Contributor

There was a request to use the term "orchestration" here. I propose inserting the following sentence after the first:

"Vitess orchestrates management of MySQL instances and intermediates requests to the cluster."

and the following one (borrowed from the Vitess overview), just before the sentence about serving YouTube traffic:

"Vitess also supports and automatically handles various scenarios, including master failover and data backups."

since that functionality is key to operating in a cloud-native environment -- Vitess is about more than just scaling and sharding.

@sougou (Contributor Author)

Done.

As proposed by @bgrant0607 in the review comments.

*Statement on alignment with CNCF mission*:

NoSQL storage systems were designed to scale out, but focus on unstructured and non-transactional data. However, it is complex to migrate or build applications that truly need transactions, indexes, and joins over structured data using NoSQL. NewSQL storage systems such as Vitess fill that gap, and enable more applications to migrate to cloud-native architectures and to scale out. Vitess was built to be cloud-native for use within Google, and can link:http://vitess.io/getting-started/[run on Kubernetes].
Contributor

@sougou

How do you feel about replacing "NewSQL storage systems such as Vitess" with the following:

"NewSQL storage systems and database orchestration systems such as Vitess"

?

We changed the "storage system" terminology in the description at the top, but we missed it here.

@sougou (Contributor Author)

I can change it to "Database orchestration systems such as Vitess".

No need to mention NewSQL at all.

@sougou (Contributor Author)

Done.

@derekperkins

@clintkitson I know that Vitess bills itself as a MySQL sharding solution, but it really is so much more and, in my opinion, should be the default tool that Kubernetes users turn to. The growth of ProxySQL shows that there is a significant market for MySQL middleware, even without sharding. Vitess provides most, if not all, of the same features that ProxySQL does: efficiently pooling queries, offloading authentication, and rewriting harmful queries, while future-proofing companies that may need to shard later, and it supports failover orchestration as well. Even if someone never had to shard, they would still see enormous benefits from migrating.

To your point about service catalog / broker, I'm not super familiar with that, but since Vitess understands the MySQL protocol, any traction there for MySQL would equally apply to Vitess.

@enisoc commented Jan 24, 2018

FYI regarding the Vitess Operator work mentioned above, the WIP is now posted here: GoogleCloudPlatform/metacontroller#10

@caniszczyk caniszczyk merged commit aca63fc into cncf:master Feb 5, 2018
@caniszczyk (Contributor) commented Feb 5, 2018

Welcome Vitess! We'll be working with the Vitess community over the next few weeks to welcome them to the CNCF project family and move over to https://github.com/vitessio

https://lists.cncf.io/g/cncf-toc/topic/result_vitess_project/10289386?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,10289386

+1 TOC binding votes (8 / 9):

+1 non-binding community votes:
