Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set autocommit to 0 #4113

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

BlankEclair
Copy link
Contributor

@BlankEclair BlankEclair commented Dec 10, 2024

This helps soften the blow of cockups where erroneous SQL statements are run directly against the database, as it would be trivial to ROLLBACK if statements were not automatically committed right after they were run.

When using the mariadb CLI client and MediaWiki's sql.php, a COMMIT must be explicitly run in order to actually save data to disk. Exiting without committing will automatically rollback the changes without a warning.

Incidents that would not have happened if this were the case:

Summary by CodeRabbit

  • New Features

    • Introduced a new configuration option for transaction handling in the MariaDB settings.
  • Documentation

    • Added comments to provide guidance on the new configuration option and its implications.

This helps soften the blow of cockups where erroneous SQL statements
are run directly against the database, as it would be trivial to
ROLLBACK if statements were not automatically committed right after
they were run.

Incidents that would not have happened if this were the case:
* https://meta.miraheze.org/wiki/Tech:Incidents/2024-11-13-mhglobal-data-loss-and-recovery
* https://meta.miraheze.org/wiki/Tech:Incidents/2024-12-08-mw-permissions-data-loss-and-recovery
Copy link
Contributor

coderabbitai bot commented Dec 10, 2024

Walkthrough

The changes involve the addition of a new configuration parameter autocommit set to 0 in the [mysqld] section of the MariaDB configuration file. This setting modifies transaction handling by preventing automatic commits of transactions unless explicitly specified. Comments have been included to clarify the purpose of this setting and to warn against re-enabling it without a rollback mechanism. No other parameters or sections of the configuration file have been altered.

Changes

File Path Change Summary
modules/mariadb/templates/config/mw.cnf.erb Added autocommit = 0 under the [mysqld] section with explanatory comments.

Poem

In the garden of code, a new rule takes flight,
Autocommit's set to zero, oh what a sight!
Transactions now linger, waiting for cue,
A gentle reminder, to think things through.
With caution we tread, in this digital space,
Hoping our changes will find their right place! 🐇✨


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Experiment)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@BlankEclair
Copy link
Contributor Author

So it looks like that this isn't the magic bullet that it is. From #wikimedia-data-persistence:

[10/12/2024 23:17] <BlankEclair> hello, does wikimedia have autocommit set to 0 or something similar to prevent a major cockup when somebody mistypes a sql statement when directly connecting to the database? asking because i couldn't grep for (relevent results of) "autocommit" in operations/puppet
[11/12/2024 00:41] <marostegui> BlankEclair: We don't allow connections directly to the database. In any case, we cannot afford autocommit=0 due to performance
[11/12/2024 00:42] <BlankEclair> that's a much simpler answer, thanks ^^;
[11/12/2024 00:42] <marostegui> BlankEclair: You are welcome 
[11/12/2024 00:43] <BlankEclair> i'm curious about the performance point
[11/12/2024 00:44] <marostegui> BlankEclair: Mostly IO, but also, with setting it to 0 you have very high risks o creating long holds and blocking lots of other connections (and increase contention and hence deadlocks). In some other environments I've experience huge ibdata1 grows, to hundreds of GBs too
[11/12/2024 00:45] <marostegui> Setting autocommit=1 is way better for perfoamnce (also keep in mind we use RAID controllers with BBU) so we have that there with WB policy
[11/12/2024 00:46] <BlankEclair> alright, thanks again ^_^

@paladox
Copy link
Contributor

paladox commented Dec 10, 2024

You can use START TRANSACTION; and if necessary then type ROLLBACK.

@BlankEclair
Copy link
Contributor Author

Yeah, but I'm kinda worried in case if someone forgot

@redbluegreenhat redbluegreenhat self-requested a review December 10, 2024 14:56
@BlankEclair
Copy link
Contributor Author

[11/12/2024 02:05] <RhinosF1> marostegui: do you have any other suggestions for preventing people doing major cockups while connected to the database?
[11/12/2024 02:06] <RhinosF1> And yes this cause 2 people for us (full ops on Miraheze) have managed to cock up big time
[11/12/2024 02:06] <RhinosF1> One with a bad delete and a bad update
[11/12/2024 02:07] <marostegui> RhinosF1: I'd suggest not letting anything to be run directly there but via MW or whatever other software is in place and (hopefully) with good measures to prevent issues (eg: do not allow deletes without a where, limit etc)
[11/12/2024 02:08] <marostegui> RhinosF1: Eventually having a good testing environment or CI that can catch common queries errors (and of course progressive rollouts, backups, etc)
[11/12/2024 02:08] <RhinosF1> marostegui: these were humans writing bad queries that have full sudo and didn't check what they had typed
[11/12/2024 02:08] <marostegui> RhinosF1: It is impossible to stop human errors, but if you can limit them to the most, that's something. But the root user will always exist of course
[11/12/2024 02:09] <marostegui> RhinosF1: Why did they have sudo on the first place? :)
[11/12/2024 02:09] <RhinosF1> marostegui: because they are miraheze's equivalent of wmf ops is the short answer
[11/12/2024 02:09] <marostegui> RhinosF1: Still no one should update the database just like that without going through some proper software in the middle
[11/12/2024 02:10] <RhinosF1> Ok fair
[11/12/2024 02:10] <RhinosF1> What do wmf use
[11/12/2024 02:10] <marostegui> RhinosF1: I'd never ever touch any row in production
[11/12/2024 02:10] <marostegui> RhinosF1: MW?
[11/12/2024 02:10] <RhinosF1> marostegui: always use sql.php ?
[11/12/2024 02:10] <RhinosF1> Does that auto create transactions so you can rollback?
[11/12/2024 02:11] <marostegui> RhinosF1: First I simply don't do it, and second, it needs to go though whatever MW uses, because me updating a row in production probablky doesn't trigger another million things that a proper script going through MW does (eg: maybe cache needs to be invalidated?)
[11/12/2024 02:12] <RhinosF1> marostegui: oh right so you suggest updating rows should always be done via a maint script or something then
[11/12/2024 02:12] <marostegui> RhinosF1: Absolutely
[11/12/2024 02:12] <RhinosF1> ah right
[11/12/2024 02:13] <marostegui> RhinosF1: If we are talking about the admin part,that's different of course

@RhinosF1
Copy link
Contributor

@coderabbitai review

Copy link
Contributor

coderabbitai bot commented Dec 10, 2024

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
modules/mariadb/templates/config/mw.cnf.erb (1)

20-23: Enhance documentation with performance implications

While the comments clearly explain the safety rationale, they should also document:

  • Performance implications (increased I/O, potential for long transaction holds)
  • Risk of deadlocks
  • Impact on connection pooling

Consider adding these details to the comments:

 # Helps soften the blow of cockups such as T12886 and T12974
 # Please, please, please don't reenable unless if there is a substitute
 # to easily rolling back bad SQL statements when manually interfacing with
 # the database.
+# Note: This setting may impact performance due to increased I/O and
+# potential for deadlocks. Monitor transaction times and connection pools.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between c3e5115 and a1dba7f.

📒 Files selected for processing (1)
  • modules/mariadb/templates/config/mw.cnf.erb (1 hunks)
🔇 Additional comments (1)
modules/mariadb/templates/config/mw.cnf.erb (1)

24-24: Consider alternative approaches to prevent data loss

While setting autocommit=0 provides a safety net, it comes with significant risks:

  1. Performance Impact:

    • Increased I/O overhead
    • Higher risk of deadlocks
    • Potential memory pressure from long-running transactions
  2. Architectural Concerns:

    • May conflict with connection pooling (thread_pool_size=32)
    • Could affect application-level transaction management
    • Risk of forgotten transactions blocking resources

Consider these alternatives:

  1. Application-level safeguards:

    -- Use explicit transactions where needed
    START TRANSACTION;
    -- Your SQL here
    -- COMMIT or ROLLBACK
  2. Implement a proxy layer that:

    • Enforces explicit transaction control
    • Logs all direct database access
    • Provides automatic rollback on session end
  3. Create restricted database roles that:

    • Require explicit transaction control
    • Limit dangerous operations
    • Enable auditing of direct access

Let's check for any existing transaction management patterns:

✅ Verification successful

Let me gather more information about the database configuration and transaction patterns.

The previous search didn't find SQL files, so let's also explicitly search for them:

And let's check the specific context around the autocommit setting:


Let me search for any application code that might rely on autocommit behavior.


Based on the MariaDB configuration file and the context available, I can now provide a final assessment.

Setting autocommit=0 is a justified safety measure

The configuration comment explicitly states the rationale:

# Helps soften the blow of cockups such as T12886 and T12974
# Please, please, please don't reenable unless if there is a substitute
# to easily rolling back bad SQL statements when manually interfacing with
# the database.

The configuration is safe because:

  1. It's used alongside appropriate transaction isolation level (READ-COMMITTED)
  2. Thread pooling settings are properly configured (thread_pool_size = 32)
  3. Lock wait timeout is reasonably set (innodb_lock_wait_timeout = 120)
  4. Transaction log settings are optimized (innodb-flush-log-at-trx-commit = 1)

While autocommit=0 can impact performance, the benefits of transaction safety outweigh the costs in this environment, especially for manual database operations.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for transaction-related patterns in SQL files
rg -i "START TRANSACTION|BEGIN|COMMIT|ROLLBACK" --type sql

# Look for database connection configuration
rg -i "PDO|mysqli|DB::connection" --type php

Length of output: 731


Script:

#!/bin/bash
# Search for database configuration files and their content
fd -e cnf -e conf -e ini -X cat

# Look for any PHP files that might contain database interaction code
fd -e php -X grep -l "mysql" {}

# Check for any documentation about transaction handling
fd README -X cat {}
fd INSTALL -X cat {}

Length of output: 65750


Script:

fd -e sql -X cat {}

Length of output: 23


Script:

# Get context of the MariaDB config file
cat modules/mariadb/templates/config/mw.cnf.erb

Length of output: 4368


Script:

# Search for transaction-related code
rg -i "BEGIN|COMMIT|ROLLBACK|autocommit|START TRANSACTION" --type php

# Search for database connection configuration files
fd -e php -X grep -l "PDO|mysqli" {}

# Look for any documentation about database transactions
rg -i "transaction|autocommit" README

Length of output: 199

Comment on lines +20 to +24
# Helps soften the blow of cockups such as T12886 and T12974
# Please, please, please don't reenable unless if there is a substitute
# to easily rolling back bad SQL statements when manually interfacing with
# the database.
autocommit = 0
Copy link
Contributor

@coderabbitai coderabbitai bot Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Critical: Reconsider global autocommit disable

Based on the PR discussion and configuration analysis, disabling autocommit globally presents significant risks:

  1. As noted by Marostegui, Wikimedia avoids this approach due to performance concerns
  2. The current configuration shows heavy use of connection pooling and optimized thread handling
  3. The real issue appears to be unrestricted direct database access

Recommended approach:

  1. Keep autocommit=1 but implement:

    • Restricted database roles for direct access
    • Mandatory access through MediaWiki's abstraction layer
    • Automated backup before direct SQL execution
    • SQL statement validation proxy
  2. If proceeding with autocommit=0:

    • Start with a subset of connections
    • Monitor performance metrics
    • Implement session timeouts
    • Add transaction monitoring

Would you like help creating:

  1. A restricted database role configuration?
  2. A SQL validation proxy implementation?
  3. Performance monitoring queries?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please suggest more options for how we can deal with bad SQLs, consider something that would allow setting this for only cli sessions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RhinosF1, certainly! Here are some options to help you deal with bad SQL statements and configure autocommit=0 specifically for CLI sessions:

  1. Use init_connect to Disable Autocommit for Non-SUPER Users:

    • Set the init_connect global variable to execute SET autocommit=0; whenever a user connects:

      SET GLOBAL init_connect = 'SET autocommit=0;';
    • Note: init_connect does not apply to users with the SUPER privilege. Ensure that CLI users do not have SUPER privileges to enforce this setting.

  2. Conditional init_connect Based on User or Host:

    • Modify init_connect to conditionally disable autocommit based on the connecting user:

      SET GLOBAL init_connect = "
        IF (CURRENT_USER() = 'cli_user'@'localhost') THEN
          SET autocommit=0;
        END IF;
      ";
    • Replace 'cli_user'@'localhost' with the appropriate user and host for CLI sessions.

  3. Client-Side Configuration in my.cnf:

    • Update the MySQL client configuration to disable autocommit for CLI sessions:

      [mysql]
      init-command=SET autocommit=0
      
    • Add this to the global /etc/mysql/my.cnf or users' personal ~/.my.cnf files under the [mysql] section. This ensures that whenever the mysql CLI is used, autocommit is disabled for that session.

  4. Create a Wrapper Script for the MySQL CLI:

    • Develop a shell script that sets autocommit to 0 upon connection:

      #!/bin/bash
      mysql --init-command="SET autocommit=0;" "$@"
    • Have users invoke this script instead of the mysql command to ensure autocommit is disabled during their sessions.

  5. Restrict Direct Database Access:

    • Limit direct access to the database by:

      • Using application interfaces (like MediaWiki's sql.php) that handle transactions properly.
      • Providing limited privileges to users who require CLI access.
      • Encouraging the use of transactions (START TRANSACTION;) explicitly during manual interventions.
  6. Implement a SQL Proxy with Session Initialization:

    • Utilize a SQL proxy (e.g., ProxySQL or MaxScale) to intercept connections and set session variables like autocommit=0 for specific users or hosts.
  7. Educate and Enforce Best Practices:

    • Train operators on safe SQL practices, emphasizing the importance of transactions and proper commit/rollback procedures.
    • Create guidelines or checklists for performing manual SQL operations.

Additional Suggestions:

  • Audit and Logging:

    • Enable detailed logging for CLI sessions to monitor and audit queries executed directly on the database.
  • Automate Backups Before Manual Changes:

    • Implement scripts that create backups or snapshots before allowing direct SQL execution, providing a safety net in case of errors.
  • Review and Approval Processes:

    • Require review or approval for manual SQL statements that modify data, reducing the likelihood of mistakes.

Feel free to let me know if you'd like assistance implementing any of these options or exploring them further.

@RhinosF1
Copy link
Contributor

@BlankEclair: thoughts on any of the other options?

@BlankEclair
Copy link
Contributor Author

BlankEclair commented Dec 10, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants