Migrate to WordPress.com corrupts emojis #100535
Labels
[Feature Group] Content Management
Features related to the tools and screens that admins use to manage their site's core content.
[Feature] Site Migration
Features related to site migrations to WPcom
Needs triage
Ticket needs to be triaged
[Pri] High
Address as soon as possible after BLOCKER issues
[Product] WordPress.com
All features accessible on and related to WordPress.com.
[Status] Auto-allocated
[Status] Escalated to Product Ambassadors
[Status] Priority Review Triggered
Quality squad has been notified of this issue in #dotcom-triage-alerts
[Type] Bug
When a feature is broken and / or not performing as intended
Context and steps to reproduce
When migrating sites to WordPress.com from various hosts, we often see corrupted emojis. We are not certain of the root cause. However, a self-hosted site typically defaults to utf8mb4/utf8mb4_unicode_520_ci, since that is what core has used since 2016.
WordPress.com UTF-8 sites use utf8mb4/utf8mb4_general_ci. While those values are being forced for the DB_CHARSET and DB_COLLATE constants in the site's meta, we've also noticed that various collation and charset variables are still set to latin1/latin1_swedish_ci as shown here:
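As a hypothetical illustration (not the output referenced above), the check described here could be sketched as follows. It assumes the variables were collected with `SHOW VARIABLES LIKE 'character_set_%'` and `SHOW VARIABLES LIKE 'collation_%'` and loaded into a dict. The function name, the ignore list, and the sample values are assumptions made for this sketch and are not taken from a real site.

```python
# Variables that are not expected to be utf8mb4 even on a healthy server,
# so they are skipped (assumption for this sketch).
IGNORED = {"character_set_filesystem", "character_set_system"}


def find_non_utf8mb4(variables):
    """Return the charset/collation variables whose value is not utf8mb4-based."""
    suspects = {}
    for name, value in variables.items():
        if name in IGNORED:
            continue
        if not (name.startswith("character_set_") or name.startswith("collation_")):
            continue
        if not value.startswith("utf8mb4"):
            suspects[name] = value
    return suspects


# Hypothetical sample values, for illustration only.
sample = {
    "character_set_client": "utf8mb4",
    "character_set_connection": "utf8mb4",
    "character_set_database": "latin1",
    "character_set_results": "utf8mb4",
    "character_set_server": "latin1",
    "character_set_system": "utf8",
    "collation_connection": "utf8mb4_general_ci",
    "collation_database": "latin1_swedish_ci",
    "collation_server": "latin1_swedish_ci",
}

for name, value in find_non_utf8mb4(sample).items():
    print(f"{name} = {value}")
```

Any variable this reports is a candidate for where a latin1 conversion could occur during import, though that alone does not confirm the root cause.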
When a self-hosted site with emojis in post titles, comments, etc. is migrated with Migrate to WordPress.com, the emojis are all corrupted and stored as replacement characters or mojibake. Here is a recent example of a wp_comments table with this issue where all of the question marks should be emojis:
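One plausible mechanism for the two symptoms, which this report does not confirm as the root cause, can be sketched in plain Python. The sketch treats latin1 as ISO-8859-1, which is a simplification of MySQL's latin1, and the helper names are hypothetical. An emoji is 4 bytes in UTF-8, so a lossy conversion to a single-byte charset yields a question mark, while reading the UTF-8 bytes as latin1 yields mojibake.

```python
def lossy_convert(text):
    """Convert text to latin1, replacing unrepresentable characters with '?'."""
    return text.encode("latin-1", errors="replace").decode("latin-1")


def misread_as_latin1(text):
    """Reinterpret the UTF-8 bytes of text as latin1, producing mojibake."""
    return text.encode("utf-8").decode("latin-1")


comment = "Nice post 👍"

# The thumbs-up emoji (U+1F44D) needs 4 bytes in UTF-8, which is why it
# requires utf8mb4 rather than a 3-byte or single-byte charset.
print(len("👍".encode("utf-8")))

# Lossy conversion: the emoji becomes a question mark.
print(lossy_convert(comment))

# Misinterpretation: the emoji becomes four unrelated latin1 characters.
print(repr(misread_as_latin1(comment)))
```

The question-mark case is not reversible because the original bytes are discarded, which fits the workaround of migrating the site a second time.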
Steps to reproduce:
Site owner impact
Fewer than 20% of the total website/platform users
Severity
Moderate
What other impact(s) does this issue have?
Platform revenue
If a workaround is available, please outline it here.
Workarounds are to use an alternative migration plugin such as All-in-One WP Migration (AIOWPM) or to import the database manually. When this happens, Happiness Engineers (HEs) or users essentially need to migrate the site twice.
Platform
Atomic