
Project Proposal: Vitess #67

Merged
caniszczyk merged 4 commits into cncf:master from sougou:vitess
Feb 5, 2018

Conversation

@sougou (Contributor) commented Nov 13, 2017

Original doc:
https://docs.google.com/document/d/1p7gqlpQNJpZtsolHeX6vXR4NXXwGrCMsCz8rSi5jsBA/edit#

I've made some minor changes based on the formatting of the other
proposals. The vendor list was very big (182 lines), so I shortened it
by listing top-level orgs in some cases.

cc @bgrant0607 @caniszczyk


*Sponsor / Advisor from TOC*: Brian Grant <briangrant@google.com>

*Unique Identifier*: grpc

I think this UID is already taken.

@sougou (Contributor Author)

Oops! Fixed :)


*External Dependencies*: Full list: https://github.com/youtube/vitess/blob/master/vendor/vendor.json. Top level orgs:
Contributor

can you list the respective licenses here?

@sougou (Contributor Author)

Done. Found some oddities. I've provided links for those.

@bassam (Contributor) commented Dec 13, 2017

+1 non-binding

I'm excited to see this project become part of the CNCF. MySQL is the most widely adopted RDBMS, and Vitess helps solve some of the fundamental issues around its scalability and usability.

@caniszczyk (Contributor)

As an update the Vitess team presented to the CNCF Storage WG today: https://docs.google.com/presentation/d/1xgDO8zr3Tmic4NV9DOp_cVPC5F_ncCsXmOVjELlokiQ/edit#slide=id.g1d26bc3f31_0_61

@bassam (Contributor) commented Dec 14, 2017

@sougou what is the recommended DR approach with Vitess? I assume that the universe of data that needs to be protected is vitess state in etcd plus all shard state in mysql. What can you say about consistency guarantees between vitess state and shard state? Is it possible to restore the entire cluster back to a point in time?

@sougou (Contributor Author) commented Dec 14, 2017

For those concerned about DR, the main approach is that you run everything distributed across multiple data centers:

  • The global lockserver should run as a multi-DC quorum, so the data survives a single DC going dark. However, this is not a hard requirement, because the data can be reconstructed manually; it mainly contains info about keyspaces and shard ranges.
  • If a non-master DC goes down, nothing is lost. You just move the traffic to another DC. When it comes back up, replication will catch up and serving will resume.
  • If a DC experiences total data loss and comes back up empty, you just initialize it as if you're bringing up a new DC.
  • If a master DC goes down, then you can failover the master to another DC and resume serving write traffic in the new DC. This usually results in a few seconds of downtime per master.

At YouTube, we run the masters in a replication mode called semi-sync. This ensures that at least one other replica has received the data for every transaction that gets committed. Here, we take a calculated risk: we consider it sufficient for any replica to receive the data, even one in the same DC. This has served us well so far.

We could be more paranoid and require a replica outside of the current DC to provide the semi-sync ack. However, that would slow down our transactions, which we're not willing to tolerate at this point. But this option is available for someone who wants "no transaction to be lost".
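For reference, enabling semi-sync in stock MySQL looks roughly like the following. This is a sketch of the standard `rpl_semi_sync` plugin setup, not Vitess-specific configuration (Vitess manages these settings for you):

```sql
-- On the master: load and enable the semi-sync master plugin.
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
-- Commit blocks until at least this many replicas ack each transaction.
SET GLOBAL rpl_semi_sync_master_wait_for_slave_count = 1;

-- On each replica: load and enable the semi-sync slave plugin.
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
```

The tradeoff described above is in which replica is allowed to send the ack: any replica (possibly same-DC, faster commits) versus a remote-DC replica (stronger durability, slower commits).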

This pulls us into the subject of distributed durability. Much research has been done here, and much more is due.

@bassam (Contributor) commented Dec 14, 2017

Thanks @sougou, I now have a better understanding of the multi-datacenter approach and its tradeoffs.

I'm still curious about the consistency guarantees between shard config state in etcd and the actual shards themselves. Consider the case where I would like to "clone" a vitess cluster, is it sufficient to snapshot state in etcd first then backup each of the mysql shards? Do I need to quiesce sharding/resharding before I can safely do that? Is it possible to grab a consistent point-in-time "snapshot" of the entire cluster?

@sougou (Contributor Author) commented Dec 14, 2017

The shard config state itself is fairly static, because resharding is a human decision. We reshard at YouTube 'often', but that means once every 2-3 months. Other users of Vitess reshard even less often.

In terms of cloning, every DC is a clone. You can choose to stop replication for a DC and take a backup of all the data. As mentioned in the limitations, Vitess doesn't have the ability to give you a cross-shard consistent view of the data. This same limitation carries over when taking backups: you can only take a backup of the latest data for a database, so it would be very difficult to stop all replication at a transactionally consistent point.

Is there a particular use case you have in mind? In general, users haven't asked for this. Those that need to see data 'as of a certain time' generally add timestamps to those rows and then query them from the live system. This has become the preferred approach because it saves you from having to separately provision snapshot databases.
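As an illustration of that timestamped-rows approach (a hypothetical append-only `orders` table with an `updated_at` column; this is plain SQL against the live system, not a Vitess feature):

```sql
-- Fetch each order's most recent state as of a chosen point in time,
-- using per-row timestamps instead of a database-level snapshot.
SELECT o.*
FROM orders o
JOIN (
  SELECT order_id, MAX(updated_at) AS updated_at
  FROM orders
  WHERE updated_at <= '2017-12-01 00:00:00'
  GROUP BY order_id
) latest ON o.order_id = latest.order_id
        AND o.updated_at = latest.updated_at;
```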

@bassam (Contributor) commented Dec 15, 2017

@sougou no specific scenario in mind, just trying to understand Vitess a bit more. Thanks for your answers.

@sougou (Contributor Author) commented Dec 15, 2017

Sounds good. Let me know if that didn't answer your questions, or if you have any follow-up ones.

@bgrant0607 (Contributor)

@sougou A question from the CNCF storage WG:

How does Vitess compare to the MySQL Operator presented at Kubecon? Are there any plans to add Operator-like functionality, such as configurability via Kubernetes CRDs?

https://youtu.be/J7h0F34iBx0?t=652
https://schd.ws/hosted_files/kccncna17/4d/MySQL%20on%20Kubernetes.pdf
https://dyn.com/blog/mysql-on-kubernetes/

@bgrant0607 (Contributor) commented Jan 10, 2018

BTW, here's a Vitess demo: https://youtu.be/J7h0F34iBx0?t=1513

@enisoc commented Jan 10, 2018

The MySQL Operator hasn't been released yet AFAIK, but based on our discussions with @CaptTofu as he was preparing for that talk, I think the comparison is the same as Vitess vs MySQL in general. If MySQL alone is a good fit for you, MySQL Operator will help you run it on Kubernetes. If you need middleware like Vitess on top of MySQL, MySQL Operator won't remove that need.

A prototype Vitess Operator is in progress now. We originally started going down this path with a Helm chart whose values.yaml was designed to look much like a CRD, except that it was expanded on the client side by Go template code. What I'm doing now is moving that logic into a server-side controller for a VitessCluster CRD, as an example of using kube-metacontroller to write Operators.
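To make the CRD idea concrete, a VitessCluster object might look something like the following. This is a hypothetical sketch only; the actual schema is still being prototyped in the metacontroller work, and every field name here is illustrative:

```yaml
# Hypothetical VitessCluster custom resource (illustrative field names only).
apiVersion: vitess.io/v1alpha1
kind: VitessCluster
metadata:
  name: example
spec:
  cells: [us-east-1, us-west-1]   # datacenters / failure domains
  keyspaces:
  - name: commerce
    shards: ["-80", "80-"]        # keyspace-id ranges
    tabletsPerShard: 3            # one master-eligible tablet plus replicas
```

A server-side controller would then reconcile this spec into the underlying StatefulSets, Services, and Vitess topology records, replacing the client-side Go template expansion of the Helm chart.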

@bassam (Contributor) commented Jan 10, 2018

@enisoc do you see the Vitess Operator work going into the Vitess project/repo, or will it be separate?

@enisoc commented Jan 10, 2018

@bassam Initially I plan to post it in the kube-metacontroller repo, since metacontroller's API is still evolving and I want to keep the examples up to date. After the first versioned release of metacontroller, it would make sense to move the Vitess Operator into either the vitess repo, or into its own repo under a vitess-owned org.

@derekperkins

@bassam I've been working with @enisoc on the Kubernetes integrations, and unsurprisingly, since Vitess has been running in containers from the beginning, it's a perfect match. Where the MySQL Operator will necessarily have to deal with growing volume claims and increasing requests/limits, Vitess is much more predictable in terms of resource consumption. Instead of growing a single MySQL instance, it's not out of the question that the Vitess Operator could split/merge shards behind the scenes, protecting you from resource waste and/or hot spots in your data, all by horizontally scaling pods.

@clintkitson

Excellent @enisoc @derekperkins.

@clintkitson commented Jan 15, 2018

During the SWG call I made a comment to @bgrant0607 regarding the operators. I think today Vitess does great things to solve MySQL's scalability limitations by abstracting control-plane and data-plane activity. But this is really only valuable if you have scaling problems with MySQL.

I believe projects that enable a cloud native experience are important for the TOC to consider. In the case of data services like this, there would be a couple of key things that can be addressed to enable this experience.

  1. Consumers - How are the data services consumed by an application? Do the data services integrate with the CO so that a consumer can define an application that uses the service without manual interaction? Is there integration with a standard consumption API (Open Service Broker) and the K8s service catalog? i.e., deploy an application, specify a requirement for SQL storage, and have MySQL table space or instances created automatically with connection info advertised to the application.

  2. Providers - How are the data services operated? Are the lifecycle operations and scaling of the application handled automatically? I believe this question was addressed above through developing a K8s operator.


*Description*:

Vitess is a database clustering system for horizontal scaling of MySQL. Using the terminology from the link:http://db.cs.cmu.edu/papers/2016/pavlo-newsql-sigmodrec2016.pdf[Pavlo and Aslett NewSQL survey article], Vitess is “sharding middleware”. By encapsulating shard-routing logic, Vitess allows application code and database queries to remain agnostic to the distribution of data onto multiple shards. You can split and merge shards as your needs change, with an atomic cutover step that is performed in seconds. Vitess has been serving all YouTube database traffic since 2011, and has grown to encompass tens of thousands of MySQL nodes. It has also gained increasing adoption in the community with about fifteen companies currently in the pipeline, some of whom have already gone into production. For more details, see the link:http://vitess.io/overview/[Vitess overview].
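To illustrate what "sharding middleware" means in practice, here is a toy sketch of the range-based routing that Vitess encapsulates so applications don't have to. This is not Vitess code; the shard names follow Vitess's range-naming convention, but the 8-bit keyspace-id space and four-shard layout are invented for the example:

```python
# Toy illustration of range-based shard routing (not Vitess code).
# Vitess hides this kind of logic behind its query-routing layer.
import bisect

# Hypothetical layout: an 8-bit keyspace-id space split into 4 shards,
# each named by the [lower, upper) range it covers.
SHARD_UPPER_BOUNDS = [0x40, 0x80, 0xC0, 0x100]
SHARD_NAMES = ["-40", "40-80", "80-c0", "c0-"]

def shard_for(keyspace_id: int) -> str:
    """Return the shard whose [lower, upper) range contains keyspace_id."""
    return SHARD_NAMES[bisect.bisect_right(SHARD_UPPER_BOUNDS, keyspace_id)]
```

Because this mapping lives in the middleware, splitting "80-c0" into "80-a0" and "a0-c0" only changes the routing tables; application queries stay unchanged, which is what makes the atomic cutover step possible.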
Contributor

There was a request to use the term "orchestration" here. I propose inserting the following sentence after the first:

"Vitess orchestrates management of MySQL instances and intermediates requests to the cluster."

and the following one (borrowed from the Vitess overview), just before the sentence about serving YouTube traffic:

"Vitess also supports and automatically handles various scenarios, including master failover and data backups."

since that functionality is key to operating in a cloud-native environment -- Vitess is about more than just scaling and sharding.

@sougou (Contributor Author)

Done.

As proposed by @bgrant0607 in the review comments.

*Statement on alignment with CNCF mission*:

NoSQL storage systems were designed to scale out, but focus on unstructured and non-transactional data. However, it is complex to migrate or build applications that truly need transactions, indexes, and joins over structured data using NoSQL. NewSQL storage systems such as Vitess fill that gap, and enable more applications to migrate to cloud-native architectures and to scale out. Vitess was built to be cloud-native for use within Google, and can link:http://vitess.io/getting-started/[run on Kubernetes].
Contributor

@sougou

How do you feel about replacing "NewSQL storage systems such as Vitess" with the following:

"NewSQL storage systems and database orchestration systems such as Vitess"

?

We changed the "storage system" terminology in the description at the top, but we missed it here.

@sougou (Contributor Author)

I can change it to "Database orchestration systems such as Vitess".

No need to mention NewSQL at all.

@sougou (Contributor Author)

Done.

@derekperkins

@clintkitson I know that Vitess bills itself as a MySQL sharding solution, but it really is so much more and, in my opinion, should be the default tool that Kubernetes users turn to. The growth of ProxySQL shows that there is a significant market for MySQL middleware, even without sharding. Vitess provides most, if not all, of the same features that ProxySQL does: efficiently pooling queries, offloading authentication, and rewriting harmful queries, while future-proofing companies that may need to shard later, and it supports failover orchestration as well. Even if someone never had to shard, they would still see enormous benefits from migrating.

To your point about service catalog / broker, I'm not super familiar with that, but since Vitess understands the MySQL protocol, any traction there for MySQL would equally apply to Vitess.

@enisoc commented Jan 24, 2018

FYI regarding the Vitess Operator work mentioned above, the WIP is now posted here: GoogleCloudPlatform/metacontroller#10

@caniszczyk caniszczyk merged commit aca63fc into cncf:master Feb 5, 2018
@caniszczyk (Contributor) commented Feb 5, 2018

Welcome Vitess! We'll be working with the Vitess community over the next few weeks to welcome them to the CNCF project family and move over to https://github.com/vitessio

https://lists.cncf.io/g/cncf-toc/topic/result_vitess_project/10289386?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,10289386

+1 TOC binding votes (8 / 9):

+1 non-binding community votes:
