Set autocommit to 0 #4113

BlankEclair · 2024-12-10T12:26:42Z

This helps soften the blow of cockups where erroneous SQL statements are run directly against the database, as it would be trivial to ROLLBACK if statements were not automatically committed right after they were run.

When using the mariadb CLI client and MediaWiki's sql.php, a COMMIT must be explicitly run in order to actually save data to disk. Exiting without committing will automatically rollback the changes without a warning.

Incidents that would not have happened if this were the case:

Summary by CodeRabbit

New Features
- Introduced a new configuration option for transaction handling in the MariaDB settings.
Documentation
- Added comments to provide guidance on the new configuration option and its implications.

This helps soften the blow of cockups where erroneous SQL statements are run directly against the database, as it would be trivial to ROLLBACK if statements were not automatically committed right after they were run. Incidents that would not have happened if this were the case: * https://meta.miraheze.org/wiki/Tech:Incidents/2024-11-13-mhglobal-data-loss-and-recovery * https://meta.miraheze.org/wiki/Tech:Incidents/2024-12-08-mw-permissions-data-loss-and-recovery

coderabbitai · 2024-12-10T12:26:49Z

Walkthrough

The changes involve the addition of a new configuration parameter autocommit set to 0 in the [mysqld] section of the MariaDB configuration file. This setting modifies transaction handling by preventing automatic commits of transactions unless explicitly specified. Comments have been included to clarify the purpose of this setting and to warn against re-enabling it without a rollback mechanism. No other parameters or sections of the configuration file have been altered.

Changes

File Path	Change Summary
modules/mariadb/templates/config/mw.cnf.erb	Added `autocommit = 0` under the `[mysqld]` section with explanatory comments.

Poem

In the garden of code, a new rule takes flight,
Autocommit's set to zero, oh what a sight!
Transactions now linger, waiting for cue,
A gentle reminder, to think things through.
With caution we tread, in this digital space,
Hoping our changes will find their right place! 🐇✨

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR. (Experiment)
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

BlankEclair · 2024-12-10T13:47:00Z

So it looks like that this isn't the magic bullet that it is. From #wikimedia-data-persistence:

[10/12/2024 23:17] <BlankEclair> hello, does wikimedia have autocommit set to 0 or something similar to prevent a major cockup when somebody mistypes a sql statement when directly connecting to the database? asking because i couldn't grep for (relevent results of) "autocommit" in operations/puppet
[11/12/2024 00:41] <marostegui> BlankEclair: We don't allow connections directly to the database. In any case, we cannot afford autocommit=0 due to performance
[11/12/2024 00:42] <BlankEclair> that's a much simpler answer, thanks ^^;
[11/12/2024 00:42] <marostegui> BlankEclair: You are welcome 
[11/12/2024 00:43] <BlankEclair> i'm curious about the performance point
[11/12/2024 00:44] <marostegui> BlankEclair: Mostly IO, but also, with setting it to 0 you have very high risks o creating long holds and blocking lots of other connections (and increase contention and hence deadlocks). In some other environments I've experience huge ibdata1 grows, to hundreds of GBs too
[11/12/2024 00:45] <marostegui> Setting autocommit=1 is way better for perfoamnce (also keep in mind we use RAID controllers with BBU) so we have that there with WB policy
[11/12/2024 00:46] <BlankEclair> alright, thanks again ^_^

paladox · 2024-12-10T14:06:07Z

You can use START TRANSACTION; and if necessary then type ROLLBACK.

BlankEclair · 2024-12-10T14:06:40Z

Yeah, but I'm kinda worried in case if someone forgot

BlankEclair · 2024-12-10T15:14:01Z

[11/12/2024 02:05] <RhinosF1> marostegui: do you have any other suggestions for preventing people doing major cockups while connected to the database?
[11/12/2024 02:06] <RhinosF1> And yes this cause 2 people for us (full ops on Miraheze) have managed to cock up big time
[11/12/2024 02:06] <RhinosF1> One with a bad delete and a bad update
[11/12/2024 02:07] <marostegui> RhinosF1: I'd suggest not letting anything to be run directly there but via MW or whatever other software is in place and (hopefully) with good measures to prevent issues (eg: do not allow deletes without a where, limit etc)
[11/12/2024 02:08] <marostegui> RhinosF1: Eventually having a good testing environment or CI that can catch common queries errors (and of course progressive rollouts, backups, etc)
[11/12/2024 02:08] <RhinosF1> marostegui: these were humans writing bad queries that have full sudo and didn't check what they had typed
[11/12/2024 02:08] <marostegui> RhinosF1: It is impossible to stop human errors, but if you can limit them to the most, that's something. But the root user will always exist of course
[11/12/2024 02:09] <marostegui> RhinosF1: Why did they have sudo on the first place? :)
[11/12/2024 02:09] <RhinosF1> marostegui: because they are miraheze's equivalent of wmf ops is the short answer
[11/12/2024 02:09] <marostegui> RhinosF1: Still no one should update the database just like that without going through some proper software in the middle
[11/12/2024 02:10] <RhinosF1> Ok fair
[11/12/2024 02:10] <RhinosF1> What do wmf use
[11/12/2024 02:10] <marostegui> RhinosF1: I'd never ever touch any row in production
[11/12/2024 02:10] <marostegui> RhinosF1: MW?
[11/12/2024 02:10] <RhinosF1> marostegui: always use sql.php ?
[11/12/2024 02:10] <RhinosF1> Does that auto create transactions so you can rollback?
[11/12/2024 02:11] <marostegui> RhinosF1: First I simply don't do it, and second, it needs to go though whatever MW uses, because me updating a row in production probablky doesn't trigger another million things that a proper script going through MW does (eg: maybe cache needs to be invalidated?)
[11/12/2024 02:12] <RhinosF1> marostegui: oh right so you suggest updating rows should always be done via a maint script or something then
[11/12/2024 02:12] <marostegui> RhinosF1: Absolutely
[11/12/2024 02:12] <RhinosF1> ah right
[11/12/2024 02:13] <marostegui> RhinosF1: If we are talking about the admin part,that's different of course

RhinosF1 · 2024-12-10T17:28:18Z

@coderabbitai review

coderabbitai · 2024-12-10T17:28:24Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)

modules/mariadb/templates/config/mw.cnf.erb (1)
20-23: Enhance documentation with performance implications

While the comments clearly explain the safety rationale, they should also document:

Performance implications (increased I/O, potential for long transaction holds)

Risk of deadlocks

Impact on connection pooling

Consider adding these details to the comments:
 # Helps soften the blow of cockups such as T12886 and T12974
 # Please, please, please don't reenable unless if there is a substitute
 # to easily rolling back bad SQL statements when manually interfacing with
 # the database.
+# Note: This setting may impact performance due to increased I/O and
+# potential for deadlocks. Monitor transaction times and connection pools.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between c3e5115 and a1dba7f.

📒 Files selected for processing (1)

modules/mariadb/templates/config/mw.cnf.erb (1 hunks)

🔇 Additional comments (1)

modules/mariadb/templates/config/mw.cnf.erb (1)

24-24: Consider alternative approaches to prevent data loss

While setting autocommit=0 provides a safety net, it comes with significant risks:

Performance Impact:
- Increased I/O overhead
- Higher risk of deadlocks
- Potential memory pressure from long-running transactions
Architectural Concerns:
- May conflict with connection pooling (thread_pool_size=32)
- Could affect application-level transaction management
- Risk of forgotten transactions blocking resources

Consider these alternatives:

Application-level safeguards:

-- Use explicit transactions where needed
START TRANSACTION;
-- Your SQL here
-- COMMIT or ROLLBACK

Implement a proxy layer that:
- Enforces explicit transaction control
- Logs all direct database access
- Provides automatic rollback on session end
Create restricted database roles that:
- Require explicit transaction control
- Limit dangerous operations
- Enable auditing of direct access

Let's check for any existing transaction management patterns:

✅ Verification successful

Let me gather more information about the database configuration and transaction patterns.

The previous search didn't find SQL files, so let's also explicitly search for them:

And let's check the specific context around the autocommit setting:

Let me search for any application code that might rely on autocommit behavior.

Based on the MariaDB configuration file and the context available, I can now provide a final assessment.

Setting autocommit=0 is a justified safety measure

The configuration comment explicitly states the rationale:

# Helps soften the blow of cockups such as T12886 and T12974
# Please, please, please don't reenable unless if there is a substitute
# to easily rolling back bad SQL statements when manually interfacing with
# the database.

The configuration is safe because:

It's used alongside appropriate transaction isolation level (READ-COMMITTED)
Thread pooling settings are properly configured (thread_pool_size = 32)
Lock wait timeout is reasonably set (innodb_lock_wait_timeout = 120)
Transaction log settings are optimized (innodb-flush-log-at-trx-commit = 1)

While autocommit=0 can impact performance, the benefits of transaction safety outweigh the costs in this environment, especially for manual database operations.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for transaction-related patterns in SQL files
rg -i "START TRANSACTION|BEGIN|COMMIT|ROLLBACK" --type sql

# Look for database connection configuration
rg -i "PDO|mysqli|DB::connection" --type php

Length of output: 731

Script:

#!/bin/bash
# Search for database configuration files and their content
fd -e cnf -e conf -e ini -X cat

# Look for any PHP files that might contain database interaction code
fd -e php -X grep -l "mysql" {}

# Check for any documentation about transaction handling
fd README -X cat {}
fd INSTALL -X cat {}

Length of output: 65750

Script:

fd -e sql -X cat {}

Length of output: 23

Script:

# Get context of the MariaDB config file
cat modules/mariadb/templates/config/mw.cnf.erb

Length of output: 4368

Script:

# Search for transaction-related code
rg -i "BEGIN|COMMIT|ROLLBACK|autocommit|START TRANSACTION" --type php

# Search for database connection configuration files
fd -e php -X grep -l "PDO|mysqli" {}

# Look for any documentation about database transactions
rg -i "transaction|autocommit" README

Length of output: 199

coderabbitai · 2024-12-10T17:29:56Z

modules/mariadb/templates/config/mw.cnf.erb

+# Helps soften the blow of cockups such as T12886 and T12974
+# Please, please, please don't reenable unless if there is a substitute
+# to easily rolling back bad SQL statements when manually interfacing with
+# the database.
+autocommit                     = 0


⚠️ Potential issue

Critical: Reconsider global autocommit disable

Based on the PR discussion and configuration analysis, disabling autocommit globally presents significant risks:

As noted by Marostegui, Wikimedia avoids this approach due to performance concerns

The current configuration shows heavy use of connection pooling and optimized thread handling

The real issue appears to be unrestricted direct database access

Recommended approach:

Keep autocommit=1 but implement:

Restricted database roles for direct access

Mandatory access through MediaWiki's abstraction layer

Automated backup before direct SQL execution

SQL statement validation proxy

If proceeding with autocommit=0:

Start with a subset of connections

Monitor performance metrics

Implement session timeouts

Add transaction monitoring

Would you like help creating:

A restricted database role configuration?

A SQL validation proxy implementation?

Performance monitoring queries?

Please suggest more options for how we can deal with bad SQLs, consider something that would allow setting this for only cli sessions.

@RhinosF1, certainly! Here are some options to help you deal with bad SQL statements and configure autocommit=0 specifically for CLI sessions:

Use init_connect to Disable Autocommit for Non-SUPER Users:

Set the init_connect global variable to execute SET autocommit=0; whenever a user connects:

SET GLOBAL init_connect = 'SET autocommit=0;';

Note: init_connect does not apply to users with the SUPER privilege. Ensure that CLI users do not have SUPER privileges to enforce this setting.

Conditional init_connect Based on User or Host:

Modify init_connect to conditionally disable autocommit based on the connecting user:

SET GLOBAL init_connect = " IF (CURRENT_USER() = 'cli_user'@'localhost') THEN SET autocommit=0; END IF; ";

Replace 'cli_user'@'localhost' with the appropriate user and host for CLI sessions.

Client-Side Configuration in my.cnf:

Update the MySQL client configuration to disable autocommit for CLI sessions:

[mysql] init-command=SET autocommit=0

Add this to the global /etc/mysql/my.cnf or users' personal ~/.my.cnf files under the [mysql] section. This ensures that whenever the mysql CLI is used, autocommit is disabled for that session.

Create a Wrapper Script for the MySQL CLI:

Develop a shell script that sets autocommit to 0 upon connection:

#!/bin/bash mysql --init-command="SET autocommit=0;" "$@"

Have users invoke this script instead of the mysql command to ensure autocommit is disabled during their sessions.

Restrict Direct Database Access:

Limit direct access to the database by:

Using application interfaces (like MediaWiki's sql.php) that handle transactions properly.

Providing limited privileges to users who require CLI access.

Encouraging the use of transactions (START TRANSACTION;) explicitly during manual interventions.

Implement a SQL Proxy with Session Initialization:

Utilize a SQL proxy (e.g., ProxySQL or MaxScale) to intercept connections and set session variables like autocommit=0 for specific users or hosts.

Educate and Enforce Best Practices:

Train operators on safe SQL practices, emphasizing the importance of transactions and proper commit/rollback procedures.

Create guidelines or checklists for performing manual SQL operations.

Additional Suggestions:

Audit and Logging:

Enable detailed logging for CLI sessions to monitor and audit queries executed directly on the database.

Automate Backups Before Manual Changes:

Implement scripts that create backups or snapshots before allowing direct SQL execution, providing a safety net in case of errors.

Review and Approval Processes:

Require review or approval for manual SQL statements that modify data, reducing the likelihood of mistakes.

Feel free to let me know if you'd like assistance implementing any of these options or exploring them further.

RhinosF1 · 2024-12-10T17:35:38Z

@BlankEclair: thoughts on any of the other options?

BlankEclair · 2024-12-10T23:24:51Z

DB things are not my specialty. Perhaps we can try to automatically run `SET autocommit = 0` when using sql.php or the mariadb CLI client, but please run everything through someone who knows what they're doing.

redbluegreenhat self-requested a review December 10, 2024 14:56

coderabbitai bot reviewed Dec 10, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Set autocommit to 0 #4113

Set autocommit to 0 #4113

BlankEclair commented Dec 10, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Dec 10, 2024 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

BlankEclair commented Dec 10, 2024

paladox commented Dec 10, 2024

BlankEclair commented Dec 10, 2024

BlankEclair commented Dec 10, 2024

RhinosF1 commented Dec 10, 2024

coderabbitai bot commented Dec 10, 2024

coderabbitai bot left a comment

coderabbitai bot Dec 10, 2024 •

edited

Loading

RhinosF1 Dec 10, 2024

coderabbitai bot Dec 10, 2024

RhinosF1 commented Dec 10, 2024

BlankEclair commented Dec 10, 2024 via email

Set autocommit to 0 #4113

Are you sure you want to change the base?

Set autocommit to 0 #4113

Conversation

BlankEclair commented Dec 10, 2024 • edited by coderabbitai bot Loading

Summary by CodeRabbit

coderabbitai bot commented Dec 10, 2024 • edited Loading

Walkthrough

Changes

Poem

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

BlankEclair commented Dec 10, 2024

paladox commented Dec 10, 2024

BlankEclair commented Dec 10, 2024

BlankEclair commented Dec 10, 2024

RhinosF1 commented Dec 10, 2024

coderabbitai bot commented Dec 10, 2024

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Dec 10, 2024 • edited Loading

Choose a reason for hiding this comment

RhinosF1 Dec 10, 2024

Choose a reason for hiding this comment

coderabbitai bot Dec 10, 2024

Choose a reason for hiding this comment

RhinosF1 commented Dec 10, 2024

BlankEclair commented Dec 10, 2024 via email

BlankEclair commented Dec 10, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Dec 10, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

coderabbitai bot Dec 10, 2024 •

edited

Loading