- Several minor logging changes.
- Make the gevent MySQL driver more efficient at avoiding needless waits.
- Due to a bug in MySQL (incorrectly rounding the 'minute' value of a timestamp up), TIDs generated in the last half second of a minute would suddenly jump ahead by 4,266,903,756 integers (a full minute).
- Fix leaking an internal value for
innodb_lock_timeout
across commits on MySQL. This could lead totpc_vote
blocking longer than desired. See :issue:`331`.
Improve the safety of the persistent local cache in high-concurrency environments using older versions of SQLite. Perform a quick integrity check on startup and refuse to use the cache files if they are reported corrupt.
Switch the order in which object locks are taken: try shared locks first and only then attempt exclusive locks. Shared locks do not have to block, so a quick lock timeout here means that a
ReadConflictError
is inevitable. This works best on PostgreSQL and MySQL 8, which support true non-blocking locks. On MySQL 5.7, non-blocking locks are emulated with a 1s timeout. See :issue:`310`.Note
The transaction machinery will retry read conflict errors by default. The more rapid detection of them may lead to extra retries if there was a process still finishing its commit. Consider adding small sleep backoffs to retry logic.
Fix MySQL to immediately rollback its transaction when it gets a lock timeout, while still in the stored procedure on the database. Previously it would have required a round trip to the Python process, which could take an arbitrary amount of time while the transaction may have still been holding some locks. (After :issue:`310` they would only be shared locks, but before they would have been exclusive locks.) This should make for faster recovery in heavily loaded environments with lots of conflicts. See :issue:`313`.
Make MySQL clear its temp tables using a single round trip. Truncation is optional and disabled by default. See :issue:`319`.
Fix PostgreSQL to not send the definition of the temporary tables for every transaction. This is only necessary for the first transaction.
Improve handling of commit and rollback, especially on PostgreSQL. We now generate many fewer unneeded rollbacks. See :issue:`289`.
Stop checking the status of
readCurrent
OIDs twice.Make the gevent MySQL driver yield more frequently while getting large result sets. Previously it would block in C to read the entire result set. Now it yields according to the cursor's
arraysize
. See :issue:`315`.Polling for changes now iterates the cursor instead of using
fetchall()
. This can reduce memory usage and provide better behaviour in a concurrent environment, depending on the cursor implementation.Add three environment variables to control the odds of whether any given poll actually suggests shifted checkpoints. These are all floating point numbers between 0 and 1. They are
RELSTORAGE_CP_REPLACEMENT_CHANCE_WHEN_FULL
(default to 0.7, i.e., 70%),RELSTORAGE_CP_REPLACEMENT_BEGIN_CONSIDERING_PERCENT
(default 0.8) andRELSTORAGE_CP_REPLACEMENT_CHANCE_WHEN_CLOSE
(default 0.2). (There are corresponding class variables on the storage cache that could also be set.) Use values of1
,1
and0
to restore the old completely deterministic behaviour. It's not clear whether these will be useful, so they are not officially options yet but they may become so. Feedback is appreciated! See :issue:`323`.
- Eliminate runtime dependency on ZEO. See :issue:`293`.
- Fix a rare race condition allocating OIDs on MySQL. See :issue:`283`.
- Optimize the
loadBefore
method. It appears to be mostly used in the tests. - Fix the blob cache cleanup thread to use a real native thread if we're monkey-patched by gevent, using gevent's thread pool. Previously, cleaning up the blob cache would block the event loop for the duration. See :issue:`296`.
- Improve the thread safety and resource usage of blob cache cleanup. Previously it could spawn many useless threads.
- When caching a newly uploaded blob for a history free storage, if there's an older revision of the blob in the cache, and it is not in use, go ahead and preemptively remove it from disk. This can help prevent the cache size from growing out of hand and limit the number of expensive full cache checks required. See :issue:`297`.
- Change the default value of the configuration setting
shared-blob-dir
to false, meaning that the default is now to use a blob cache. If you were using shared blobs before, you'll need to explicitly set a value forshared-blob-dir
totrue
before starting RelStorage. - Add an option,
blob-cache-size-check-external
, that causes the blob cache cleanup process to run in a subprocess instead of a thread. This can free up the storage process to handle requests. This is not recommended on Windows. (python -m relstorage.blobhelper.cached /path/to/cache size_in_bytes
can be used to run a manual cleanup at any time. This is currently an internal implementation detail.) - Abort storage transactions immediately when an exception occurs.
Previously this could be specified by setting the environment
variable
RELSTORAGE_ABORT_EARLY
. Aborting early releases database locks to allow other transactions to make progress immediately. See :issue:`50`. - Reduce the strength of locks taken by
Connection.readCurrent
so that they don't conflict with other connections that just want to verify they haven't changed. This also lets us immediately detect a conflict error with an in-progress transaction that is trying to alter those objects. See :issue:`302`. - Make databases that use row-level locks (MySQL and PostgreSQL) raise specific exceptions on failures to acquire those locks. A different exception is raised for rows a transaction needs to modify compared to rows it only needs to read. Both are considered transient to encourage transaction middleware to retry. See :issue:`303`.
- Move more of the vote phase of transaction commit into a database stored procedure on MySQL and PostgreSQL, beginning with taking the row-level locks. This eliminates several more database round trips and the need for the Python thread (or greenlet) to repeatedly release and then acquire the GIL while holding global locks. See :issue:`304`.
- Make conflict resolution require fewer database round trips, especially on PostgreSQL and MySQL, at the expense of using more memory. In the ideal case it now only needs one (MySQL) or two (PostgreSQL) queries. Previously it needed at least twice the number of trips as there were conflicting objects. On both databases, the benchmarks are 40% to 80% faster (depending on cache configuration).
Eliminate a few extra round trips to the database on transaction completion: One extra
ROLLBACK
in all databases, and one query against thetransaction
table in history-preserving databases. See :issue:`159`.Prepare more statements used during regular polling.
Gracefully handle certain disconnected exceptions when rolling back connections in between transactions. See :issue:`280`.
Fix a cache error ("TypeError: NoneType object is not subscriptable") when an object had been deleted (such as through undoing its creation transaction, or with
multi-zodb-gc
).Implement
IExternalGC
for history-preserving databases. This lets them be used with zc.zodbdgc, allowing for multi-database garbage collection (see :issue:`76`). Note that you must pack the database after runningmulti-zodb-gc
in order to reclaim space.Caution!
It is critical that
pack-gc
be turned off (set to false) in a multi-database and that onlymulti-zodb-gc
be used to perform garbage collection.
Make
RelStorage.pack()
also accept a TID from the RelStorage database to pack to. The usual Unix timestamp form for choosing a pack time can be ambiguous in the event of multiple transactions within a very short period of time. This is mostly a concern for automated tests.Similarly, it will accept a value less than 0 to mean the most recent transaction in the database. This is useful when machine clocks may not be well synchronized, or from automated tests.
Remove vestigial top-level thread locks. No instance of RelStorage is thread safe.
RelStorage is an
IMVCCStorage
, which means that each ZODBConnection
gets its own new storage object. No visible storage state is shared among Connections. Connections are explicitly documented as not being thread safe. Since 2.0, RelStorage's Connection instances have taken advantage of that fact to be a little lighter weight through not being thread safe. However, they still paid the overhead of locking method calls and code complexity.The top-level storage (the one belonging to a
ZODB.DB
) still used heavyweight locks in earlier releases.ZODB.DB.storage
is documented as being only useful for tests, and theDB
object itself does not expose any operations that use the storage in a way that would require thread safety.The remaining thread safety support has been removed. This simplifies the code and reduces overhead.
If you were previously using the
ZODB.DB.storage
object, or aRelStorage
instance you constructed manually, from multiple threads, instead make sure each thread has a distinctRelStorage.new_instance()
object.A
RelStorage
instance now only implements the appropriate subset of ZODB storage interfaces according to its configuration. For example, if there is no configuredblob-dir
, it won't implementIBlobStorage
, and ifkeep-history
is false, it won't implementIStorageUndoable
.Refactor RelStorage internals for a cleaner separation of concerns. This includes how (some) queries are written and managed, making it easier to prepare statements, but only those actually used.
On MySQL, move allocating a TID into the database. On benchmarks of a local machine this can be a scant few percent faster, but it's primarily intended to reduce the number of round-trips to the database. This is a step towards :issue:`281`. See :pr:`286`.
On MySQL, set the connection timezone to be UTC. This is necessary to get values consistent between
UTC_TIMESTAMP
,UNIX_TIMESTAMP
,FROM_UNIXTIME
, and Python'stime.gmtime
, as used for comparing TIDs.On MySQL, move most steps of finishing a transaction into a stored procedure. Together with the TID allocation changes, this reduces the number of database queries from:
1 to lock + 1 to get TID + 1 to store transaction (0 in history free) + 1 to move states + 1 for blobs (2 in history free) + 1 to set current (0 in history free) + 1 to commit = 7 or 6 (in history free)
down to 1. This is expected to be especially helpful for gevent deployments, as the database lock is held, the transaction finalized and committed, and the database lock released, all without involving greenlets or greenlet switches. By allowing the GIL to be released longer it may also be helpful for threaded environments. See :issue:`281` and :pr:`287` for benchmarks and specifics.
Caution!
MySQL 5.7.18 and earlier contain a severe bug that causes the server to crash when the stored procedure is executed.
Make PyMySQL use the same precision as mysqlclient when sending floating point parameters.
Automatically detect when MySQL stored procedures in the database are out of date with the current source in this package and replace them.
- As for MySQL, move allocating a TID into the database.
- As for MySQL, move most steps of finishing a transaction into a stored procedure. On psycopg2 and psycopg2cffi this is done in a single database call. With pg8000, however, it still takes two, with the second call being the COMMIT call that releases locks.
- Speed up getting the approximate number of objects
(
len(storage)
) in a database by using the estimates collected by the autovacuum process or analyzing tables, instead of asking for a full table scan.
- Reduce the time that MySQL will wait to perform OID garbage collection on startup. See :issue:`271`.
- Fix several instances where RelStorage could attempt to perform
operations on a database connection with outstanding results on a
cursor. Some database drivers can react badly to this, depending on
the exact circumstances. For example, mysqlclient can raise
ProgrammingError: (2014, "Commands out of sync; you can't run this command now")
. See :issue:`270`. - Fix the "gevent MySQLdb" driver to be cooperative during
commit
androllback
operations. Previously, it would block the event loop for the entire time it took to send the commit or rollback request, the server to perform the request, and the result to be returned. Now, it frees the event loop after sending the request. See :issue:`272`. - Call
set_min_oid
less often if a storage is just updating existing objects, not creating its own. - Fix an occasional possible deadlock in MySQL's
set_min_oid
. See :pr:`276`.
Add support for the ZODB 5
connection.prefetch(*args)
API. This takes either OIDs (obj._p_oid
) or persistent ghost objects, or an iterator of those things, and asks the storage to load them into its cache for use in the future. In RelStorage, this uses the shared cache and so may be useful for more than one thread. This can be 3x or more faster than loading objects on-demand. See :issue:`239`.Stop chunking blob uploads on PostgreSQL. All supported PostgreSQL versions natively handle blobs greater than 2GB in size, and the server was already chunking the blobs for storage, so our layer of extra chunking has become unnecessary.
Important
The first time a storage is opened with this version, blobs that have multiple chunks will be collapsed into a single chunk. If there are many blobs larger than 2GB, this could take some time.
It is recommended you have a backup before installing this version.
To verify that the blobs were correctly migrated, you should clean or remove your configured blob-cache directory, forcing new blobs to be downloaded.
Fix a bug that left large objects behind if a PostgreSQL database containing any blobs was ever zapped (with
storage.zap_all()
). Thezodbconvert
command, thezodbshootout
command, and the RelStorage test suite could all zap databases. Running thevacuumlo
command included with PostgreSQL will free such orphaned large objects, after which a regularvacuumdb
command can be used to reclaim space. See :issue:`260`.Conflict resolution can use data from the cache, thus potentially eliminating a database hit during a very time-sensitive process. Please file issues if you encounter any strange behaviour when concurrently packing to the present time and also resolving conflicts, in case there are corner cases.
Packing a storage now invalidates the cached values that were packed away. For the global caches this helps reduce memory pressure; for the local cache this helps reduce memory pressure and ensure a more useful persistent cache (this probably matters most when running on a single machine).
Make MySQL use
ON DUPLICATE KEY UPDATE
rather thanREPLACE
. This can be friendlier to the storage engine as it performs an in-placeUPDATE
rather than aDELETE
followed by anINSERT
. See :issue:`189`.Make PostgreSQL use an upsert query for moving rows into place on history-preserving databases.
Support ZODB 5's parallel commit feature. This means that the database-wide commit lock is taken much later in the process, and held for a much shorter time than before.
Previously, the commit lock was taken during the
tpc_vote
phase, and held while we checkedConnection.readCurrent
values, and checked for (and hopefully resolved) conflicts. Other transaction resources (such as other ZODB databases in a multi-db setup) then got to vote while we held this lock. Finally, intpc_finally
, objects were moved into place and the lock was released. This prevented any other storage instances from checking forreadCurrent
or conflicts while we were doing that.Now,
tpc_vote
is (usually) able to checkConnection.readCurrent
and check and resolve conflicts without taking the commit lock. Only intpc_finish
, when we need to finally allocate the transaction ID, is the commit lock taken, and only held for the duration needed to finally move objects into place. This allows other storages for this database, and other transaction resources for this transaction, to proceed with voting, conflict resolution, etc, in parallel.Consistent results are maintained by use of object-level row locking. Thus, two transactions that attempt to modify the same object will now only block each other.
There are two exceptions. First, if the
storage.restore()
method is used, the commit lock must be taken very early (beforetpc_vote
). This is usually only done as part of copying one database to another. Second, if the storage is configured with a shared blob directory instead of a blob cache (meaning that blobs are only stored on the filesystem) and the transaction has added or mutated blobs, the commit lock must be taken somewhat early to ensure blobs can be saved (after conflict resolution, etc, but before the end oftpc_vote
). It is recommended to store blobs on the RDBMS server and use a blob cache. The shared blob layout can be considered deprecated for this reason).In addition, the new locking scheme means that packing no longer needs to acquire a commit lock and more work can proceed in parallel with regular commits. (Though, there may have been some regressions in the deletion phase of packing speed MySQL; this has not been benchmarked.)
Note
If the environment variable
RELSTORAGE_LOCK_EARLY
is set when RelStorage is imported, then parallel commit will not be enabled, and the commit lock will be taken at the beginning of the tpc_vote phase, just like before: conflict resolution and readCurrent will all be handled with the lock held.This is intended for use diagnosing and temporarily working around bugs, such as the database driver reporting a deadlock error. If you find it necessary to use this setting, please report an issue at https://github.com/zodb/relstorage/issues.
See :issue:`125`.
Deprecate the option
shared-blob-dir
. Shared blob dirs prevent using parallel commits when blobs are part of a transaction.Remove the 'umysqldb' driver option. This driver exhibited failures with row-level locking used for parallel commits. See :issue:`264`.
Migrate all remaining MySQL tables to InnoDB. This is primarily the tables used during packing, but also the table used for allocating new OIDs.
Tables will be converted the first time a storage is opened that is allowed to create the schema (
create-schema
in the configuration; default is true). For large tables, this may take some time, so it is recommended to finish any outstanding packs before upgrading RelStorage.If schema creation is not allowed, and required tables are not using InnoDB, an exception will be raised. Please contact the RelStorage maintainers on GitHub if you have a need to use a storage engine besides InnoDB.
This allows for better error detection during packing with parallel commits. It is also required for MySQL Group Replication. Benchmarking also shows that creating new objects can be up to 15% faster due to faster OID allocation.
Things to be aware of:
- MySQL's general conversion notes suggest that if you had tuned certain server parameters for MyISAM tables (which RelStorage only used during packing) it might be good to evaluate those parameters again.
- InnoDB tables may take more disk space than MyISAM tables.
- The
new_oid
table may temporarily have more rows in it at one time than before. They will still be garbage collected eventually. The change in strategy was necessary to handle concurrent transactions better.
See :issue:`188`.
Fix an
OperationalError: database is locked
that could occur on startup if multiple processes were reading or writing the cache database. See :issue:`266`.
- Zapping a storage now also removes any persistent cache files. See :issue:`241`.
- Zapping a MySQL storage now issues
DROP TABLE
statements instead ofDELETE FROM
statements. This is much faster on large databases. See :issue:`242`. - Workaround the PyPy 7.1 JIT bug using MySQL Connector/Python. It is no longer necessary to disable the JIT in PyPy 7.1.
- On PostgreSQL, use PostgreSQL's efficient binary
COPY FROM
to store objects into the database. This can be 20-40% faster. See :issue:`247`. - Use more efficient mechanisms to poll the database for current TIDs when verifying serials in transactions.
- Silence a warning about
cursor.connection
from pg8000. See :issue:`238`. - Poll the database for the correct TIDs of older transactions when loading from a persistent cache, and only use the entries if they are current. This restores the functionality lost in the fix for :issue:`249`.
- Increase the default cache delta limit sizes.
- Fix a race condition accessing non-shared blobs when the blob cache limit was reached which could result in blobs appearing to be spuriously empty. This was only observed on macOS. See :issue:`219`.
- Fix a bug computing the cache delta maps when restoring from persistent cache that could cause data from a single transaction to be stale, leading to spurious conflicts.
- Drop support for PostgreSQL versions earlier than 9.6. See :issue:`220`.
- Make MySQL and PostgreSQL use a prepared statement to get transaction IDs. PostgreSQL also uses a prepared statement to set them. This can be slightly faster. See :issue:`246`.
- Make PostgreSQL use a prepared statement to move objects to their final destination during commit (history free only). See :issue:`246`.
- Fix an issue with persistent caches written to from multiple instances sometimes getting stale data after a restart. Note: This makes the persistent cache less useful for objects that rarely change in a database that features other actively changing objects; it is hoped this can be addressed in the future. See :issue:`249`.
Add support for Python 3.7.
Drop support for Python 3.4.
Drop support for Python 2.7.8 and earlier.
Drop support for ZODB 4 and ZEO 4.
Officially drop support for versions of MySQL before 5.7.9. We haven't been testing on anything older than that for some time, and older than 5.6 for some time before that.
Drop the
poll_interval
parameter. It has been deprecated with a warning and ignored since 2.0.0b2. See :issue:`222`.Drop support for pg8000 older than 1.11.0.
Drop support for MySQL Connector/Python older than 8.0.16. Many older versions are known to be broken. Note that the C extension, while available, is not currently recommended due to internal errors. See :issue:`228`.
Test support for MySQL Connector/Python on PyPy. See :issue:`228`.
Caution!
Prior to PyPy 7.2 or RelStorage 3.0a3, it is necessary to disable JIT inlining due to a PyPy bug with
struct.unpack
.Drop support for PyPy older than 5.3.1.
Drop support for the "MySQL Connector/Python" driver name since it wasn't possible to know if it would use the C extension or the Python implementation. Instead, explicitly use the 'Py' or 'C' prefixed name. See :pr:`229`.
Drop the internal and undocumented environment variables that could be used to force configurations that did not specify a database driver to use a specific driver. Instead, list the driver in the database configuration.
Opening a RelStorage configuration object read from ZConfig more than once would lose the database driver setting, reverting to 'auto'. It now retains the setting. See :issue:`231`.
Fix Python 3 with mysqlclient 1.4. See :issue:`213`.
Drop support for mysqlclient < 1.4.
Make driver names in RelStorage configurations case-insensitive (e.g., 'MySQLdb' and 'mysqldb' are both valid). See :issue:`227`.
Rename the column
transaction.empty
totransaction.is_empty
for compatibility with MySQL 8.0, whereempty
is now a reserved word. The migration will happen automatically when a storage is first opened, unless it is configured not to create the schema.Note
This migration has not been tested for Oracle.
Note
You must run this migration before attempting to upgrade a MySQL 5 database to MySQL 8. If you cannot run the upgrade through opening the storage, the statement is
ALTER TABLE transaction CHANGE empty is_empty BOOLEAN NOT NULL DEFAULT FALSE
.Stop getting a warning about invalid optimizer syntax when packing a MySQL database (especially with the PyMySQL driver). See :issue:`163`.
Add
gevent MySQLdb
, a new driver that cooperates with gevent while still using the C extensions ofmysqlclient
to communicate with MySQL. This is now recommended overumysqldb
, which is deprecated and will be removed.Rewrite the persistent cache implementation. It now is likely to produce much higher hit rates (100% on some benchmarks, compared to 1-2% before). It is currently slower to read and write, however. This is a work in progress. See :pr:`243`.
Add more aggressive validation and, when possible, corrections for certain types of cache consistency errors. Previously an
AssertionError
would be raised with the message "Detected an inconsistency between RelStorage and the database...". We now proactively try harder to avoid that situation based on some educated guesses about when it could happen, and should it still happen we now reset the cache and raise a type ofTransientError
allowing the application to retry. A few instances where previously incorrect data could be cached may now raise such aTransientError
. See :pr:`245`.
- Avoid deleting attributes of DB driver modules we import. Fixes :issue:`206` reported by Josh Zuech.
- Document that installing RelStorage from source requires a working CFFI compilation environment. Fixes :issue:`187`, reported by Johannes Raggam.
- Test with MySQL Connector/Python 8.0.6, up from 2.1.5. Note that PyPy 5.8.0 is known to not work with MySQL Connector/Python (although PyPy 5.6.0 did).
Implemented the storage
afterCompletion
method, which allows RelStorage storages to be notified of transaction endings for transactions that don't call the two-phase commit API. This allows resources to be used more efficiently because it prevents RDBMS transactions from being held open.Fixes: :issue:`147` (At least for ZODB 5.2.)
Oracle: Fix two queries that got broken due to the performance work in 2.1a1.
MySQL: Workaround a rare issue that could lead to a
TypeError
when getting new OIDs. See :issue:`173`.The
len
of a RelStorage instance now correctly reflects the approximate number of objects in the database. Previously it returned a hardcoded 0. See :issue:`178`.MySQL: Writing blobs to the database is much faster and scales much better as more blobs are stored. The query has been rewritten to use existing primary key indexes, whereas before it used a table scan due to deficiencies in the MySQL query optimizer. Thanks to Josh Zuech and enfold-josh. See :issue:`175`.
- 3.6.0 final release is tested on CI servers.
- Substantial performance improvements for PostgreSQL, both on reading and writing. Reading objects can be 20-40% faster. Writing objects can be 15-25% faster (the most benefit will be seen by history-free databases on PostgreSQL 9.5 and above). MySQL may have a (much) smaller improvement too, especially for small transactions. This was done through the use of prepared statements for the most important queries and the new 'ON CONFLICT UPDATE' syntax. See :pr:`157` and :issue:`156`.
- The umysqldb driver no longer attempts to automatically reconnect on a closed cursor exception. That fails now that prepared statements are in use. Instead, it translates the internal exception to one that the higher layers of RelStorage recognize as requiring reconnection at consistent times (transaction boundaries).
- Add initial support for the MySQL Connector/Python driver. See :issue:`155`.
- Backport ZODB #140 to older versions of ZODB. This improves write performance, especially in multi-threaded scenarios, by up to 10%. See :pr:`160`.
- MySQL temporary tables now use the InnoDB engine instead of MyISAM. See :pr:`162`.
- MySQL and Postgres now use the same optimized methods to get the latest TID at transaction commit time as they do at poll time. This is similar to :issue:`89`.
- MySQL now releases the commit lock (if acquired) during pre-pack with GC of a history-free storage at the same time as PostgreSQL and Oracle did (much earlier). Reported and initial fix provided in :pr:`9` by jplouis.
- Writing persistent cache files has been changed to reduce the risk of stale temporary files remaining. Also, files are kept open for a shorter period of time and removed in a way that should work better on Windows.
- RelStorage is now tested on Windows for MySQL and PostgreSQL thanks to AppVeyor.
- Add support for Python 3.6.
- The MySQL adapter will now produce a more informative error if it gets an unexpected result taking the commit lock. Reported by Josh Zuech.
- Compatibility with transaction 2.0 on older versions of ZODB (prior
to the unreleased version that handles encoding meta data for us),
newer versions of ZODB (that do the encoding), while maintaining
compatibility with transaction 1.x. In particular, the
history
method consistently returns bytes for username and description. - In very rare cases, persistent cache files could result in a corrupt cache state in memory after loading them, resulting in AttributeErrors until the cache files were removed and the instance restarted. Reported in :issue:`140` by Carlos Sanchez.
- List CFFI in setup_requires for buildout users.
Add the ability to limit the persistent cache files size. Thanks to Josh Zuech for the suggestion, which led to the next change.
Move the RelStorage shared cache to a windowed-LFU with segmented LRU instead of a pure LRU model. This can be a nearly optimal caching strategy for many workloads. The caching code itself is also faster in all tested cases.
It's especially helpful when using persistent cache files together with a file size limit, as we can now ensure we write out the most frequently useful data to the file instead of just the newest.
For more information see :issue:`127` and :pr:`128`. Thanks to Ben Manes for assistance talking through issues related to the cache strategy.
For write-heavy workloads, you may want to increase
cache_delta_size_limit
.The internal implementation details of the cache have been completely changed. Only the
StorageCache
class remains unchanged (though that's also an implementation class). CFFI is now required, and support for PyPy versions older than 2.6.1 has been dropped.On CPython, use LLBTrees for the cache delta maps. This allows using a larger, more effective size while reducing memory usage. Fixes :issue:`130`.
Persistent cache files use the latest TID in the cache as the file's modification time. This allows a more accurate choice of which file to read at startup. Fixes :issue:`126`.
Fix packing of history-preserving Oracle databases. Reported in :issue:`135` by Peter Jacobs.
- Use
setuptools.find_packages
andinclude_package_data
to ensure wheels have all the files necessary. This corrects an issue with the 2.0.0b5 release on PyPI. See :issue:`121` by Carlos Sanchez.
Supporting new databases should be simpler due to a code restructuring. Note that many internal implementation classes have moved or been renamed.
The umysqldb support handles query transformations more efficiently.
umysqldb now raises a more informative error when the server sends too large a packet.
Note
If you receive "Socket receive buffer full" errors, you are likely experiencing this issue in ultramysql and will need a patched version, such as the one provided in this pull request.
The local persistent cache file format has been changed to improve reading and writing speed. Old files will be cleaned up automatically. Users of the default settings could see improvements of up to 3x or more on reading and writing.
Compression of local persistent cache files has been disabled by default (but there is still an option to turn it back on). Operational experience showed that it didn't actually save that much disk space, while substantially slowing down the reading and writing process (2-4x).
Add an option,
cache-local-dir-read-count
to limit the maximum number of persistent local cache files will be used to populate a storages's cache. This can be useful to reduce startup time if cache files are large and workers have mostly similar caches.
- Add experimental support for umysqldb as a MySQL driver for Python
2.7. This is a gevent-compatible driver implemented in C for speed.
Note that it may not be able to store large objects (it has been
observed to fail for a 16M object---it hardcodes a
max_allowed_packet
of exactly 16MB for read and write buffers), and has been observed to have some other stability issues.
- Add support for ZODB 5. RelStorage continues to run on ZODB 4 >= 4.4.2.
- Add support for tooling to help understand RelStorage cache behaviour. This can help tune cache sizes and the choice to use Memcached or not. See :issue:`106` and :pr:`108`.
- Fix a threading issue with certain database drivers.
- Support for cx_Oracle versions older than 5.0 has been dropped. 5.0 was released in 2008.
- Support for PostgreSQL 8.1 and earlier has been dropped. 8.2 is likely to still work, but 9.0 or above is recommended. 8.2 was released in 2006 and is no longer supported by upstream. The oldest version still supported by upstream is 9.1, released in 2011.
- Using ZODB >= 4.4.2 (but not 5.0) is recommended to avoid deprecation warnings due to the introduction of a new storage protocol. The next major release of RelStorage will require ZODB 4.4.2 or above and should work with ZODB 5.0.
- Change the recommended and tested MySQL client for Python 2.7 away from the unmaintained MySQL-python to the maintained mysqlclient (the same one used by Python 3).
- PyMySQL now works and is tested on Python 3.
- A pure-Python PostgreSQL driver, pg8000, now works and is tested on all platforms. This is a gevent-compatible driver. Note that it requires a PostgreSQL 9.4 server or above for BLOB support.
- Support explicitly specifying the database driver to use. This can
be important when there is a large performance difference between
drivers, and more than one might be installed. (Also, RelStorage no
longer has the side-effect of registering
PyMySQL
asMySQLdb
andpsycopg2cffi
aspsycopg2
.) See :issue:`86`.
- Memcache connections are explicitly released instead of waiting for
GC to do it for us. This is especially important with PyPy and/or
python-memcached
. See :issue:`80`. - The
poll-interval
option is now ignored and polling is performed when the ZODB Connection requests it (at transaction boundaries). Experience with delayed polling has shown it typically to do more harm than good, including introducing additional possibilities for error and leading to database performance issues. It is expected that most sites won't notice any performance difference. A larger discussion can be found in :issue:`87`.
- Support a persistent on-disk cache. This can greatly speed up
application warmup after a restart (such as when deploying new code).
Some synthetic benchmarks show an 8-10x improvement. See :issue:`92`
for a discussion, and see the options
cache-local-dir
andcache-local-dir-count
. - Instances of :class:`.RelStorage` no longer use threading locks by default and hence are not thread safe. A ZODB :class:`Connection <ZODB.interfaces.IConnection>` is documented as not being thread-safe and must be used only by a single thread at a time. Because RelStorage natively implements MVCC, each Connection has a unique storage object. It follows that the storage object is used only by a single thread. Using locks just adds unneeded overhead to the common case. If this is a breaking change for you, please open an issue. See :pr:`91`.
- MySQL uses (what should be) a slightly more efficient poll query. See :issue:`89`.
- The in-memory cache allows for higher levels of concurrent operation via finer-grained locks. For example, compression and decompression are no longer done while holding a lock.
- The in-memory cache now uses a better approximation of a LRU algorithm with less overhead, so more data should fit in the same size cache. (For best performance, CFFI should be installed; a warning is generated if that is not the case.)
- The in-memory cache is now smart enough not to store compressed objects that grow during compression, and it uses the same compression markers as zc.zlibstorage to avoid double-compression. It can also gracefully handle changes to the compression format in persistent files.
- Update the ZODB dependency from ZODB3 3.7.0 to ZODB 4.3.1. Support for ZODB older than 3.10 has been removed; ZODB 3.10 may work, but only ZODB 4.3 is tested.
- Remove support for Python 2.6 and below. Python 2.7 is now required.
- Add support for PyPy on MySQL and PostgreSQL using PyMySQL and psycopg2cffi respectively. PyPy can be substantially faster than CPython in some scenarios; see :pr:`23`.
- Add initial support for Python 3.4+ for MySQL (using mysqlclient), PostgreSQL, and Oracle.
- Fixed
loadBefore
of a deleted/undone object to correctly raise a POSKeyError instead of returning an empty state. (Revealed by updated tests for FileStorage in ZODB 4.3.1.) - Oracle: Packing should no longer produce LOB errors. This partially reverts the speedups in 1.6.0b2. Reported in :issue:`30` by Peter Jacobs.
- :meth:`.RelStorage.registerDB` and :meth:`.RelStorage.new_instance` now work with storage wrappers like zc.zlibstorage. See :issue:`70` and :issue:`71`.
- zodbconvert: The
--incremental
option is supported with a FileStorage (or any storage that implementsIStorage.lastTransaction()
) as a destination, not just RelStorages. - zodbconvert: The
--incremental
option works correctly with a RelStorage as a destination. See :pr:`22`. With contributions by Sylvain Viollon, Mauro Amico, and Peter Jacobs. Originally reported by Jan-Wijbrand Kolman. - PostgreSQL:
zodbconvert --clear
should be much faster when the destination is a PostgreSQL schema containing lots of data. NOTE: There can be no other open RelStorage connections to the destination, or any PostgreSQL connection in general that might be holding locks on the RelStorage tables, orzodbconvert
will block indefinitely waiting for the locks to be released. Partial fix for :issue:`16` reported by Chris McDonough. zodbconvert
andzodbpack
use :mod:`argparse` instead of :mod:`optparse` for command line handling.
- MySQL: Use the "binary" character set to avoid producing "Invalid utf8 character string" warnings. See :issue:`57`.
- Conflict resolution uses the locally cached state instead of re-reading it from the database (they are guaranteed to be the same). See :issue:`38`.
- Conflict resolution reads all conflicts from the database in one query, instead of querying for each individual conflict. See :issue:`39`.
- PostgreSQL no longer encodes and decodes object state in Base64 during database communication thanks to database driver improvements. This should reduce network overhead and CPU usage for both the RelStorage client and the database server. psycopg2 2.4.1 or above is required; 2.6.1 or above is recommended. (Or psycopg2cffi 2.7.4.)
- PostgreSQL 9.3: Support
commit-lock-timeout
. Contributed in :pr:`20` by Sean Upton.
- Raise a specific exception when acquiring the commit lock (:exc:`~relstorage.adapters.interfaces.UnableToAcquireCommitLockError`) or pack lock (:exc:`~relstorage.adapters.interfaces.UnableToAcquirePackUndoLockError`) fails. See :pr:`18`.
RelStorage.lastTransaction()
is more consistent with FileStorage and ClientStorage, returning a useful value in more cases.- Oracle: Add support for getting the database size. Contributed in :pr:`21` by Mauro Amico.
- Support :class:`ZODB.interfaces.IExternalGC` for history-free
databases, allowing multi-database garbage collection with
zc.zodbdgc
. See :issue:`47`.
- Travis CI is now used to run RelStorage tests against MySQL and PostgreSQL on every push and pull request. CPython 2 and 3 and PyPy are all tested with the recommended database drivers.
- Documentation has been reorganized and moved to readthedocs.
- Updated the buildout configuration to just run relstorage tests and to select which databases to use at build time.
- Tests: Basic integration testing is done on Travis CI. Thanks to Mauro Amico.
RelStorage.lastTransaction()
is more consistent with FileStorage and ClientStorage, returning a useful value in more cases.- zodbconvert: The
--incremental
option is supported with a FileStorage (or any storage that implementsIStorage.lastTransaction()
) as a destination, not just RelStorages. - zodbconvert: The
--incremental
option is supported with a RelStorage as a destination. See :pr:`22`. With contributions by Sylvain Viollon, Mauro Amico, and Peter Jacobs. Originally reported by Jan-Wijbrand Kolman. - Oracle: Packing should no longer produce LOB errors. This partially reverts the speedups in 1.6.0b2. Reported in :issue:`30` by Peter Jacobs.
- Tests: Use the standard library doctest module for compatibility with newer zope.testing releases.
- Packing: Significantly reduced the RAM consumed by graph traversal during the pre_pack phase. (Tried several methods; encoded 64 bit IISets turned out to be the most optimal.)
- Packing: Used cursor.fetchmany() to make packing more efficient.
- The local cache is now more configurable and uses
zlib
compression by default. - Added support for
zodburi
, which means you can open a storage using "postgres:", "mysql:", or "oracle:" URIs. - Packing: Reduced RAM consumption while packing by using IIBTree.Set instead of built-in set objects.
- MySQL 5.5: The test suite was freezing in checkBackwardTimeTravel. Fixed.
- Added performance metrics using the perfmetrics package.
- zodbconvert: Add an --incremental option to the zodbconvert script, letting you convert additional transactions at a later date, or update a non-live copy of your database, copying over missing transactions.
- Replication: Added the ro-replica-conf option, which tells RelStorage to use a read-only database replica for load connections. This makes it easy for RelStorage clients to take advantage of read-only database replicas.
- Replication: When the database connection is stale (such as when RelStorage switches to an asynchronous replica that is not yet up to date), RelStorage will now raise ReadConflictError by default. Ideally, the application will react to the error by transparently retrying the transaction, while the database gets up to date. A subsequent transaction will no longer be stale.
- Replication: Added the revert-when-stale option. When this option is true and the database connection is stale, RelStorage reverts the ZODB connection to the stale state rather than raise ReadConflictError. This option is intended for highly available, read-only ZODB clients. This option would probably confuse users of read-write ZODB clients, whose changes would sometimes seem to be temporarily reverted.
- Caching: Use the database name as the cache-prefix by default. This will hopefully help people who accidentally use a single memcached for multiple databases.
- Fixed compatibility with persistent 4.0.5 and above.
- Packing: Lowered garbage collection object reference finding log level to debug; this stage takes mere seconds, even in large sites, but could produce 10s of thousands of lines of log output.
- RelStorage was opening a test database connection (and was leaving it idle in a transaction with recent ZODB versions that support IMVCCStorage.) RelStorage no longer opens that test connection.
- zodbconvert: Avoid holding a list of all transactions in memory.
- Just after installing the database schema, verify the schema was created correctly. This affects MySQL in particular.
PostgreSQL: Fixed another minor compatibility issue with PostgreSQL 9.0. Packing raised an error when the client used old an version of libpq.
Delete empty transactions in batches of 1000 rows instead of all in one go, to prevent holding the transaction lock for longer than absolutely necessary.
Oracle: Fix object reference downloading performance for large RelStorage databases during the garbage collection phase of a pack.
Oracle, PostgreSQL: Switch to storing ZODB blob in chunks up to 4GB (the maximum supported by cx_Oracle) or 2GB (PostgreSQL maximum blob size) to maximize blob reading and writing performance.
The PostgreSQL blob_chunk schema changed to support this, see notes/migrate-to-1.5.txt to update existing databases.
zodbconvert: When copying a database containing blobs, ensure the source blob file exists long enough to copy it.
Better packing based on experience with large databases. Thanks to Martijn Pieters!
- Added more feedback to the packing process. It'll now report during batch commit how much of the total work has been completed, but at most every .1% of the total number of transactions or objects to process.
- Renamed the --dry-run option to --prepack and added a --use-prepack-state to zodbpack. With these 2 options the pre-pack and pack phases can be run separately, allowing re-use of the pre-pack analysis data or even delegating the pre-pack phase off to a separate server.
- Replaced the packing duty cycle with a nowait locking strategy. The pack operation will now request the commit lock but pauses if it is already taken. It releases the lock after every batch (defaulting to 1 second processing). This makes the packing process faster while at the same time yielding to regular ZODB commits when busy.
- Do not hold the commit lock during pack cleanup while deleting rows from the object reference tables; these tables are pack-specific and regular ZODB commits never touch these.
Added an option to control schema creation / updating on startup. Setting the
create-schema
option to false will let you use RelStorage without a schema update.Fixed compatibility with PostgreSQL 9.0, which is capable of returning a new 'hex' type to the client. Some builds of psycopg2 return garbage or raise an error when they see the new type. The fix was to encode more SQL query responses using base 64.
With the new shared-blob-dir option set to false, it was possible for a thread to read a partially downloaded blob. Fixed. Thanks for the report from Maurits van Rees.
Support for "shared-blob-dir false" now requires ZODB 3.9 or better. The code in the ZODB 3.8 version of ZODB.blob is not compatible with BlobCacheLayout, leading to blob filename collisions.
- Added a state_size column to object_state, making it possible to query the size of objects without loading the state. The new column is intended for gathering statistics. A schema migration is required.
- Added more logging during zodbconvert to show that something is happening and give an indication of how far along the process is.
- Fixed a missing import in the blob cache cleanup code.
- Added a --dry-run option to zodbpack.
- Replaced the graph traversal portion of the pack code with a more efficient implementation using Python sets (instead of SQL). The new code is much faster for packing databases with deeply nested objects.
- Added an option to store ZODB blobs in the database. The option is called "shared-blob-dir" and it behaves very much like the ZEO option of the same name. Blobs stored in the database are broken into chunks to reduce the impact on RAM.
- Require setuptools or distribute. Plain distutils is not sufficient.
- Fixed compatibility with ZODB 3.10. As reported by JĂźrgen Herrmann, there was a problem with conflict errors. The RelStorage patch of the sync() method now works with ZODB 3.10.
- Fixed a bug in packing history-free databases. If changes were made to the database during the pack, the pack code could delete too many objects. Thanks to Chris Withers for writing test code that revealed the bug. A schema migration is required for history-free databases; see notes/migration-to-1.4.txt.
- Enabled logging to stderr in zodbpack.
- Oracle: always connect in threaded mode. Without threaded mode, clients of Oracle 11g sometimes segfault.
- Made compatible with ZODB 3.10.0b7.
- Enabled ketama and compression in pylibmc_wrapper. Both options are better for clusters. [Helge Tesdal]
- Oracle: Use a more optimal query for POSKeyError logging. [Helge Tesdal]
- Fixed a NameError that occurred when getting the history of an object where transaction extended info was set. [Helge Tesdal]
- Worked around an Oracle RAC bug: apparently, in a RAC environment, the read-only transaction mode does not isolate transactions in the manner specified by the documentation, so Oracle users now have to use serializable isolation like everyone else. It's slower but more reliable.
- Use the client time instead of the database server time as a factor in the transaction ID. RelStorage was using the database server time to reduce the need for synchronized clocks, but in practice, that policy broke tests and did not really avoid the need to synchronize clocks. Also, the effect of unsynchronized clocks is predictable and manageable: you'll get bunches of transactions with sequential timestamps.
- If the database returns an object from the future (which should never happen), generate a ReadConflictError, hopefully giving the application a chance to recover. The most likely causes of this are a broken database and threading bugs.
- Always update the RelStorage cache when opening a database connection for loading, even when no ZODB Connection is using the storage. Otherwise, code that used the storage interface directly could cause the cache to fall out of sync; the effects would be seen in the next ZODB.Connection.
- Added a ZODB monkey patch that passes the "force" parameter to the sync method. This should help the poll-interval option do its job better.
- Fixed a subtle bug in the cache code that could lead to an AssertionError indicating a cache inconsistency. The inconsistency was caused by after_poll(), which was ignoring the randomness of the order of the list of recent changes, leading it to sometimes put the wrong transfer ID in the "delta_after" dicts. Also expanded the AssertionError with debugging info, since cache inconsistency can still be caused by database misconfiguration and mismatched client versions.
- Oracle: updated the migration notes. The relstorage_util package is not needed after all.
History-preserving storages now replace objects on restore instead of just inserting them. This should solve problems people were having with the zodbconvert utility.
Oracle: call the DBMS_LOCK.REQUEST function directly instead of using a small package named
relstorage_util
. Therelstorage_util
package was designed as a secure way to access the DBMS_LOCK package, but the package turned out to be confusing to DBAs and provided no real security advantage. People who have already deployed RelStorage 1.4.x on Oracle need to do the following:GRANT EXECUTE ON DBMS_LOCK TO <zodb_user>;
You can also drop the
relstorage_util
package. Keep therelstorage_op
package.Made compatible with ZODB 3.10.
MySQL: specify the transaction isolation mode for every connection, since the default is apparently not necessarily "read committed" anymore.
- Auto-reconnect in new_oid().
- Include all test subpackages in setup.py.
- Raise an error if MySQL reverts to MyISAM rather than using the InnoDB storage engine.
- Added the keep-history option. Set it to false to keep no history. (Packing is still required for garbage collection and blob deletion.)
- Added the replica-conf and replica-timeout options. Set replica-conf to a filename containing the location of database replicas. Changes to the file take effect at transaction boundaries.
- Expanded the option documentation in README.txt.
- Revised the way RelStorage uses memcached. Minimized the number of trips to both the cache server and the database.
- Added an in-process pickle cache that serves a function similar to the ZEO cache.
- Added a wrapper module for pylibmc.
- Store operations now use multi-insert and multi-delete SQL statements to reduce the effect of network latency.
- Renamed relstorage.py to storage.py to overcome import issues. Also moved the Options class to options.py.
- Updated the patch for ZODB 3.7 and 3.8 to fix an issue with blobs and subtransactions.
- Divided the implementation of database adapters into many small objects, making the adapter code more modular. Added interfaces that describe the duties of each part.
- Oracle: Sped up restore operations by sending short blobs inline.
- Oracle: Use a timeout on commit locks. This requires installation of a small PL/SQL package that can access DBMS_LOCK. See README.txt.
- Oracle: Used PL/SQL bulk insert operations to improve write performance.
- PostgreSQL: use the documented ALTER SEQUENCE RESTART WITH statement instead of ALTER SEQUENCE START WITH.
- Moved MD5 sum computation to the adapters so they can choose not to use MD5.
- Changed loadSerial to load from the store connection only if the load connection can not provide the object requested.
- Stopped wrapping database disconnect exceptions. Now the code catches and handles them directly.
- Use the store connection rather than the load connection for OID allocation.
- Detect and handle backward time travel, which can happen after failover to an out-of-date asynchronous slave database. For simplicity, invalidate the whole ZODB cache when this happens.
- Replaced the speed test script with a separately distributed package,
zodbshootout
. - Added the
zodbpack
script.
- Added support for a blob directory. No BlobStorage wrapper is needed. Cluster nodes will need to use a shared filesystem such as NFS or SMB/CIFS.
- Added the blob-dir parameter to the ZConfig schema and README.txt.
- In Oracle, trim transaction descriptions longer than 2000 bytes.
- When opening the database for the first time, don't issue a warning about the inevitable POSKeyError on the root OID.
- If RelStorage tries to unpickle a corrupt object state during packing, it will now report the oid and tid in the log.
- RelStorage now implements IMVCCStorage, making it compatible with ZODB 3.9.0b1 and above.
- Removed two-phase commit support from the PostgreSQL adapter. The feature turned out to be unnecessary.
- Added MySQL 5.1.34 and above to the list of supportable databases.
- Fixed minor test failures under Windows. Windows is now a supportable platform.
Information about older releases can be found :doc:`here <HISTORY>`.