Look ma, multiple DB support (Part I of the DB saga) by cdecker · Pull Request #2924 · ElementsProject/lightning

cdecker · 2019-08-08T18:50:45Z

Data integrity and data safety are paramount for nodes in the Lightning Network
since loss of data, or use of outdated data, can directly lead to loss of
funds. c-lightning currently stores all its information in a sqlite3
database, and offers the opportunity to synchronously replicate any changes to
other storages through the use of the db_write hook.

To further bolster our data resilience, and allow for a variety of more
enterprisey deployments, it is desirable to allow node operators to switch out
the sqlite3 backend for something else. In this PR I implement a DB
abstraction framework that allows other DBMSs to be configured as backends, and
future PRs will add concrete implementations of those drivers.

This PR consists of a couple of distinct things, but I wanted to make sure it
actually works before submitting a partial solution.

- Compile time SQL statement extraction, rewriting for each supported DBMS,
  and runtime lookup of the rewritten queries.
- Creation of a minimal common API that all DBMSs that I looked at can
  support, enabling the abstraction over the specific APIs. The API is very
  much inspired by sqlite3 but I made sure that it can be easily mapped to
  MySQL and Postgresql. In addition we have a number of helper functions to
  bolt down some avenues for errors:
  - Both bindings and column access use 0-based indexing, unlike sqlite3's
    1-based bindings (which I found out were inspired by printf). This
    caused a couple of misunderstandings in the past.
  - Runtime statement checking such as bind checking on execution (statement
    placeholders without a binding will result in failing the execution).
  - Compile-time instrumentation (placeholders are counted during extraction
    and allow minimizing allocations at runtime).
  - Allow numbered placeholders to avoid repetitive bindings (not implemented
    yet).
- Implementation of the sqlite3 driver, now using the uniform API, and
  keeping all sqlite3-specific code in a separate file that is easily
  configured out if the library is missing or incompatible.
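
To make the placeholder counting mentioned above concrete, here is a minimal sketch in C. In this PR the counting is done by the Python extraction tool, so the C version below only illustrates the idea; the function name `count_placeholders` and the rule of skipping `?` inside single-quoted literals are assumptions, not the tool's actual logic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count '?' placeholders in a query, ignoring any
 * that appear inside single-quoted SQL string literals. An escaped quote
 * ('') toggles the flag twice, so it leaves the state unchanged. */
static size_t count_placeholders(const char *query)
{
	size_t count = 0;
	int in_literal = 0;

	assert(query != NULL);
	for (const char *p = query; *p; p++) {
		if (*p == '\'')
			in_literal = !in_literal;
		else if (*p == '?' && !in_literal)
			count++;
	}
	return count;
}
```

Knowing the count ahead of time means the array of bindings for a statement can be allocated once, at the right size, when the statement is prepared.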

Since the "S" in SQL is a marketing lie, and SQL statements are not portable
across DBMSs I had to come up with a method to write portable code. The
solution I went for consists of marking the SQL statements that are to be
extracted with two macros (SQL and NAMED_SQL), which then allows a tool
(devtools/sql-rewrite.py) to extract them and rewrite them into variants
that work for the specific DBMS that we are rewriting for. The queries are
then stored in a large array inside of wallet/gen_db_${dbms}.c and looked up
using a unique name (autogenerated in the case of the SQL macro).
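
As a rough illustration of the marker-and-lookup idea, the sketch below shows what a generated table and its lookup could look like. Everything here is hypothetical: the `struct query_entry` layout, the `find_query` helper, the use of the original query text as the lookup name, and the `$1`-style rewritten text (shown only as an example of a dialect that numbers its placeholders). The real macro and generated file may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed marker macro: expands to the plain string for the compiler,
 * while an external tool can search the sources for SQL(...) calls. */
#define SQL(x) x

/* Hypothetical entry in a generated per-DBMS query table. */
struct query_entry {
	const char *name;    /* lookup key (here: the original query text) */
	const char *query;   /* text rewritten for the target DBMS */
	size_t placeholders; /* counted at extraction time */
};

/* Stand-in for the contents of a generated wallet/gen_db_${dbms}.c. */
static const struct query_entry generated_queries[] = {
	{ "SELECT val FROM vars WHERE name = ?;",
	  "SELECT val FROM vars WHERE name = $1;", 1 },
	{ "UPDATE vars SET val = ? WHERE name = ?;",
	  "UPDATE vars SET val = $1 WHERE name = $2;", 2 },
};

/* Linear lookup by name; returns NULL if the query was never extracted. */
static const struct query_entry *find_query(const char *name)
{
	size_t n = sizeof(generated_queries) / sizeof(generated_queries[0]);

	assert(name != NULL);
	for (size_t i = 0; i < n; i++) {
		if (strcmp(generated_queries[i].name, name) == 0)
			return &generated_queries[i];
	}
	return NULL;
}
```

A caller would write `find_query(SQL("SELECT val FROM vars WHERE name = ?;"))` and receive the variant for whichever DBMS the table was generated for.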

When preparing a DB statement we look up the rewritten statement in the array
and initialize a struct db_stmt, to which we can attach bindings. When
executing we perform a number of checks and run the query against the selected
DBMS.
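
The prepare, bind and check sequence could be sketched as follows. This is not the PR's `struct db_stmt`; the type `stmt_sketch` and all function names are made up for illustration, and only integer bindings are modelled. It shows 0-based binding positions and a pre-execution check that every placeholder has a binding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical binding slot: only integers are modelled here. */
struct binding {
	bool bound;
	long long value;
};

/* Hypothetical stand-in for a prepared statement. */
struct stmt_sketch {
	const char *query;
	size_t num_bindings;
	struct binding *bindings;
};

/* Allocate one slot per placeholder; calloc leaves every slot unbound. */
static struct stmt_sketch *stmt_prepare(const char *query, size_t placeholders)
{
	struct stmt_sketch *stmt = malloc(sizeof(*stmt));

	assert(stmt != NULL);
	stmt->query = query;
	stmt->num_bindings = placeholders;
	stmt->bindings = calloc(placeholders ? placeholders : 1,
				sizeof(*stmt->bindings));
	assert(stmt->bindings != NULL);
	return stmt;
}

/* Positions are 0-based, matching column access. */
static void stmt_bind_int(struct stmt_sketch *stmt, size_t pos, long long v)
{
	assert(pos < stmt->num_bindings);
	stmt->bindings[pos].bound = true;
	stmt->bindings[pos].value = v;
}

/* The check run before execution: every placeholder needs a binding. */
static bool stmt_all_bound(const struct stmt_sketch *stmt)
{
	for (size_t i = 0; i < stmt->num_bindings; i++) {
		if (!stmt->bindings[i].bound)
			return false;
	}
	return true;
}

static void stmt_free(struct stmt_sketch *stmt)
{
	free(stmt->bindings);
	free(stmt);
}
```

An execute function built on this would refuse to run (or abort) when `stmt_all_bound` returns false, which is the failure mode described for placeholders without a binding.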

The DB-specific implementation of the common API is in wallet/db_${dbms}.c
and adheres to a common set of functions defined in
wallet/db_common.h. wallet/db.c itself only dispatches calls to the
DB-specific implementations through the use of a struct db_config instance
(yes, it is a virtual dispatch table, I implemented dynamic dispatch in a
non-OOP language...), but none of this should ever leak outside of the files
mentioned here, and it'll just look like a normal DB from outside.
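
For readers unfamiliar with the pattern, a dispatch table of this kind can be sketched in a few lines of C. The names below (`backend_config`, `backend_by_name`, `fake_sqlite3_exec`) are invented for the example and the single `exec` callback is a simplification; the actual struct db_config in this PR has its own set of members.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical statement handle shared by all backends. */
struct backend_stmt {
	const char *query;
	int executed;
};

/* Hypothetical driver table: one instance per supported DBMS. */
struct backend_config {
	const char *name;
	int (*exec)(struct backend_stmt *stmt);
};

/* Stand-in driver function; a real driver would call into its library. */
static int fake_sqlite3_exec(struct backend_stmt *stmt)
{
	stmt->executed = 1;
	return 0;
}

static const struct backend_config backends[] = {
	{ "sqlite3", fake_sqlite3_exec },
};

/* Select a driver by name, e.g. from a configuration option. */
static const struct backend_config *backend_by_name(const char *name)
{
	size_t n = sizeof(backends) / sizeof(backends[0]);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(backends[i].name, name) == 0)
			return &backends[i];
	}
	return NULL;
}

/* Generic entry point: callers never see which driver is behind it. */
static int backend_exec(const struct backend_config *cfg,
			struct backend_stmt *stmt)
{
	assert(cfg != NULL && cfg->exec != NULL);
	return cfg->exec(stmt);
}
```

Adding another DBMS under this scheme means adding one more entry to the table, with no change to the callers of `backend_exec`.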

Still to do for this PR:

- Migrate SQL statements in db.c.
- Migrate SQL statements in wallet.c.
- Remove traces of sqlite3 in the non-sqlite3-specific files.

Once I have these missing pieces I will remove the draft status, so don't feel
forced to review this until I do, but feedback is always welcome 😉

Still to do for future PRs:

- Implement MySQL driver
- Implement Postgresql driver
- Change hard to rewrite queries into simpler ones (e.g., INSERT OR REPLACE INTO)

jb55 · 2019-08-08T23:45:16Z

very cool

ZmnSCPxj · 2019-08-08T23:50:55Z

> Since the "S" in SQL is a marketing lie, and SQL statements are not portable
> across DBMSs I had to come up with a method to write portable code. The
> solution I went for consists of marking the SQL statements that are to be
> extracted with two macros (SQL and NAMED_SQL), which then allows a tool
> (devtools/sql-rewrite.py) to extract them and rewrite them into variants
> that work for the specific DBMS that we are rewriting for.

So we now have our own SQL variant?

Which is not to say it is a bad solution, mostly a gripe. I encountered this kind of issue before (GLSL shaders), and sometimes this is the only solution you can come up with without going whole hog and creating your own language.

> yes, it is a virtual dispatch table, I implemented dynamic dispatch in a
> non-OOP language...

C is an OOP language (for some definition of "is"). It is just that you have to implement OO yourself. See the shenanigans that gtk gets up to in its C code, for example.

jb55 · 2019-08-08T23:53:43Z

> Which is not to say it is a bad solution, mostly a gripe. I encountered this kind of issue before (GLSL shaders), and sometimes this is the only solution you can come up with without going whole hog and creating your own language.

it's an interesting approach, I don't feel that strongly about it yet. the alternative being programming to some ORM which is never much fun. As long as it doesn't require that much hacking to get postgresql working, then replication will be much easier.

rustyrussell

I think xgettext -kSQL makes this problem much simpler, and it's a tool which already exists for this purpose.

The resulting .po file format is trivial to parse, or we could literally use dgettext here too, but this would be a little weird if we ever support real localization of strings.

rustyrussell · 2019-08-09T03:59:01Z

wallet/db.c


+	/* Since these queries will be treated as read-only they need to start
+	 * with "SELECT" and have no side-effects. */
+	assert(strncmp(query, "SELECT", 6) == 0);


We have 'strstarts()' for this BTW.
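
A prefix check of this kind can be sketched as below. The helper `starts_with` is a local stand-in written for this example and is only assumed to behave like the `strstarts()` mentioned above; `looks_read_only` is likewise a made-up name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Local stand-in for a prefix helper: true if str begins with prefix. */
static bool starts_with(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/* Hypothetical wrapper for the read-only check in the diff above. */
static bool looks_read_only(const char *query)
{
	assert(query != NULL);
	return starts_with(query, "SELECT");
}
```

Compared with a bare `strncmp(query, "SELECT", 6)`, the helper removes the hand-counted length, which is easy to get wrong when the prefix changes.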

rustyrussell · 2019-08-09T03:59:55Z

wallet/db.h

+ * This macro is used to annotate SQL queries that might need rewriting for
+ * different SQL dialects. It is used both as a marker for the query
+ * extraction logic in devtools/sql-rewrite.py to identify queries, as well as
+ * a way to swap out the query text with it's name so that the query execution


"with its name"

niftynei

really nice draft

niftynei · 2019-08-20T00:50:51Z

devtools/sql-rewrite.py

+    "sqlite3": Sqlite3Rewriter(),
+}
+
+template = Template("""#ifndef LIGHTNINGD_WALLET_GEN_DB_${f.upper()}


put template in external file?

niftynei · 2019-08-20T00:57:24Z

wallet/Makefile

+wallet/db_sqlite3.c: wallet/gen_db_sqlite3.c
+
+wallet/gen_db_sqlite3.c: devtools/sql-rewrite.py wallet/db.c wallet/wallet.c wallet/test/run-db.c
+	LD_LIBRARY_PATH=${LIBCLANG_PATH} devtools/sql-rewrite.py -f sqlite3 > wallet/gen_db_sqlite3.c


I believe you can use $@ as the destination/target filename here, instead of spelling it out again.

niftynei · 2019-08-20T01:00:44Z

wallet/db.c

 	db->filename = tal_strdup(db, filename);
 	db->sql = sql;
+
+	for (size_t i=0; i<num_configs; i++)


spacing here is very Go-like >.<

sometimesyoudon'twantspacesandit'sstillperfectlyreadable.

But maybe not here :)

niftynei · 2019-08-20T01:03:19Z

wallet/db_common.h

-	struct db_query *queries;
-	size_t num_queries;
+	/* Is this a read-only query? If it is there's no need to tell plugins
+	 * about it. */


niftynei · 2019-08-20T01:12:42Z

wallet/db_sqlite3.c

+	char *errmsg;
+	err = sqlite3_exec(db->conn, "COMMIT;", NULL, NULL, &errmsg);
+	if (err != SQLITE_OK)
+		fatal("Failed to begin a transaction: %s", errmsg);


begin -> commit

niftynei · 2019-08-20T01:14:45Z

wallet/db_sqlite3.c

+		}
+	}
+
+	if (err != SQLITE_OK) {


it's a bit weird semantically to be checking if the 'err' is OK. Consider renaming to result?

niftynei · 2019-08-20T01:14:56Z

wallet/db_sqlite3.c

+	}
+
+	err = sqlite3_step(stmt->inner_stmt);
+	if (err != SQLITE_DONE) {


same as previous comment.

I pretty much copied this from the old code, and there I had gotten it from the sqlite3 example code :-)

cdecker · 2019-08-28T21:29:38Z

Ok, I think this is now good to go. I went ahead and migrated all the places where we used the legacy implementation to the new, indirect implementation, and everything seems to be working ok.

I simplified the interface for querying a bit, removed a lot of redundancy and implemented the binding and column accessor functions to use the abstraction layer as well.

I left the migration commits unsquashed for now so they are a bit less daunting, but once the PR has the required ACKs I'll squash them into a single commit.

cdecker · 2019-08-28T21:30:14Z

PING @rustyrussell @niftynei @ZmnSCPxj

jb55 · 2019-08-28T22:53:33Z

@cdecker I assume you squash the fixups now that it's ready to review?

cdecker · 2019-08-29T06:47:35Z

> @cdecker I assume you squash the fixups now that it's ready to review?

Keeping the migration fixups separate until the review is done in order to make the review less daunting, but I'll squash before merging. @bitcoin-bot should be able to infer that the squashed version and the unsquashed version are the same (same changes) and re-apply ACKs automatically :-)

This gets rid of the two parallel execution paths of read-only and write queries, by explicitly stating with each query whether it is a read-only query, we only need to remember the ones marked as write queries. Signed-off-by: Christian Decker <decker.christian@gmail.com>

I was hoping to get rid of these by using "ON CONFLICT" upserts, however sqlite3 only started supporting them in version 3.24.0 which is newer than some of our deployment targets. Signed-off-by: Christian Decker <decker.christian@gmail.com>

This has a slight side-effect of removing the actual begin and commit statements from the `db_write` hooks, but they are mostly redundant anyway (no harm in grouping pre-init statements into one transaction, and we know that each post-init call is supposed to be wrapped anyway). Signed-off-by: Christian Decker <decker.christian@gmail.com>

This is likely the last part we need to completely encapsulate the part of the sqlite3 API that we were using. Like the `db_count_changes` call I decided to pass in the `struct db_stmt` since really they refer to the statement that was executed and not the db. Signed-off-by: Christian Decker <decker.christian@gmail.com>

We now have a much stronger consistency check from the combination of transaction wrapping and tal memory leak detection. Transaction wrapping ensures that each statement is executed before the transaction is committed. The commit is also driven by the `io_loop`, which means that it is no longer possible for us to have statements outside of transactions and transactions are guaranteed to commit at the round's end. By adding the tal-awareness we can also get a much better indication as to whether we have un-freed statements flying around, which we can test at the end of the round as well. Signed-off-by: Christian Decker <decker.christian@gmail.com>

cdecker · 2019-09-05T21:40:07Z

Ok, fixed the rebase issue I was having problems with yesterday.

Ready for review @rustyrussell 😉

rustyrussell · 2019-09-05T23:39:28Z

Ack 969f469

cdecker added wallet labels Aug 8, 2019

cdecker requested review from ZmnSCPxj, niftynei and rustyrussell August 8, 2019 18:50

cdecker self-assigned this Aug 8, 2019

cdecker force-pushed the multi-db branch 4 times, most recently from 921da08 to 9c845d1 Compare August 8, 2019 21:13

rustyrussell added this to the 0.7.3 milestone Aug 9, 2019

rustyrussell reviewed Aug 9, 2019

niftynei reviewed Aug 20, 2019

cdecker force-pushed the multi-db branch 4 times, most recently from fe79c77 to b2a5434 Compare August 22, 2019 22:10

cdecker force-pushed the multi-db branch 3 times, most recently from 2fb3e94 to 19e9459 Compare August 28, 2019 21:18

cdecker marked this pull request as ready for review August 28, 2019 21:22

cdecker force-pushed the multi-db branch from 19e9459 to fd80965 Compare August 28, 2019 21:47

cdecker mentioned this pull request Aug 29, 2019

dbbackup: A minimal database backup plugin lightningd/plugins#40

Closed

cdecker added 24 commits September 5, 2019 13:23

wallet: Move the db_fatal definition so we can use it in drivers

ce41328

Signed-off-by: Christian Decker <decker.christian@gmail.com>

wallet: Move the struct db definition to db_common.h

7e5a14c

All drivers will have to reach into it, so put it in a place that is reachable from the drivers, along with all other definitions. Signed-off-by: Christian Decker <decker.christian@gmail.com>

db: Implement skaffolding for the dispatch of DB-specific functions

ec3e5cb

These functions implement the lookup of the query, and the dispatch to the DB-specific functions that do the actual heavy lifting. Signed-off-by: Christian Decker <decker.christian@gmail.com>

db: Implement the sqlite3 driver

d726142

This is the DB-specific counterpart to the previous commit. Signed-off-by: Christian Decker <decker.christian@gmail.com>

db: Switch to new DB asbtraction for DB migrations

89dae2c

These do not require the ability to iterate over the result, hence they can be migrated already. Signed-off-by: Christian Decker <decker.christian@gmail.com>

wallet: Call db_stmt_free from the db_stmt destructor automatically

643446b

This is much more in line with the rest of our memory management. Signed-off-by: Christian Decker <decker.christian@gmail.com>

db: Track whether a db_stmt has been executed

91500cd

For some of the query methods in the next step we need to have an idea of whether the stmt was executed (db_step function) so let's track that explicitly. Signed-off-by: Christian Decker <decker.christian@gmail.com>

db: Implement basic query capabilities

4207120

This is the first step towards being able to extract information from query rows. Only the most basic types are exposed, the others will be built on top of these primitives. Signed-off-by: Christian Decker <decker.christian@gmail.com>

db: Add setup and teardown function to DB

a09a1e1

These are used to do one-time initializations and wait for pending statements before closing. Signed-off-by: Christian Decker <decker.christian@gmail.com>

db: Migrate to DB abstraction layer in db.c

55f53d8

Signed-off-by: Christian Decker <decker.christian@gmail.com>

db: Add more type-safe bindings to the interface

7e24a75

Signed-off-by: Christian Decker <decker.christian@gmail.com>

db: Add type-safe column access functions

fa413ed

These are based on top of the basic column access functions, and act as a small type-safe wrapper, that also does a bit of validation. Signed-off-by: Christian Decker <decker.christian@gmail.com>

db: Migrate invoices.c to new abstraction layer

4aacd62

Signed-off-by: Christian Decker <decker.christian@gmail.com>

db: Migrate wallet.c to the new abstraction layer

58cdb46

Signed-off-by: Christian Decker <decker.christian@gmail.com>

db: Move statement expansion into the driver

c431f63

It's better to let the driver decide when and how to expand. It can then report the expanded statement back to the dispatch through the `db_changes_add` function. Signed-off-by: Christian Decker <decker.christian@gmail.com>

db: Switch to indirect db_last_insert_id version

755df7c

We are about to delete all the `sqlite3`-specific code from `db.c` and this is one of the last uses of the old interface. Signed-off-by: Christian Decker <decker.christian@gmail.com>

db: Switch to indirect db close

7402aea

Signed-off-by: Christian Decker <decker.christian@gmail.com>

db: Remove sqlite3 from db.c and db.h

340d0de

Now that all the users are migrated to the abstraction layer we can remove the legacy implementation. Signed-off-by: Christian Decker <decker.christian@gmail.com>

db: Extract db config lookup into its own function

969f469

Signed-off-by: Christian Decker <decker.christian@gmail.com> Suggested-by: Rusty Russell <@rustyrussell>

cdecker force-pushed the multi-db branch from aef674a to 969f469 Compare September 5, 2019 12:40

rustyrussell merged commit 58f4489 into ElementsProject:master Sep 5, 2019

cdecker mentioned this pull request Sep 13, 2019

Postgresql driver for the multi-database abstraction #3057

Merged

3 tasks

darosior mentioned this pull request Oct 14, 2019

Backup and other feature requests #2400

Closed
