Migrate to WordPress.com corrupts emojis #100535

Open
mgozdis opened this issue Feb 27, 2025 · 0 comments
Labels
[Feature Group] Content Management: Features related to the tools and screens that admins use to manage their sites core content.
[Feature] Site Migration: Features related to site migrations to WPcom
Needs triage: Ticket needs to be triaged
[Pri] High: Address as soon as possible after BLOCKER issues
[Product] WordPress.com: All features accessible on and related to WordPress.com.
[Status] Auto-allocated
[Status] Escalated to Product Ambassadors
[Status] Priority Review Triggered: Quality squad has been notified of this issue in #dotcom-triage-alerts
[Type] Bug: When a feature is broken and / or not performing as intended

mgozdis commented Feb 27, 2025

Context and steps to reproduce

When migrating sites to WordPress.com from various hosts, we often see corrupted emojis. We are not certain of the root cause; however, a self-hosted site typically defaults to utf8mb4/utf8mb4_unicode_520_ci, since that is what core has used since 2016.

WordPress.com UTF-8 sites use utf8mb4/utf8mb4_general_ci. While those values are being forced for the DB_CHARSET and DB_COLLATE constants in the site's meta, we've also noticed that various collation and charset variables are still set to latin1/latin1_swedish_ci as shown here:

Image
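
A minimal sketch of how the mismatch described above could be flagged automatically, assuming the server variables have already been read into a dictionary (for example from the output of `SHOW VARIABLES`). The helper name `find_charset_mismatches` is hypothetical, and the sample values are illustrative only; they are not copied from the screenshot.

```python
# Hypothetical helper. The variable names are real MySQL system variables,
# but the sample values below are illustrative, not taken from the screenshot.
EXPECTED_CHARSET = "utf8mb4"
EXPECTED_COLLATION = "utf8mb4_general_ci"

# These two variables are not expected to be utf8mb4 on a healthy server,
# so they are skipped to avoid false positives.
IGNORED = {"character_set_filesystem", "character_set_system"}


def find_charset_mismatches(variables, charset=EXPECTED_CHARSET,
                            collation=EXPECTED_COLLATION):
    """Return {name: value} for charset/collation variables that differ
    from the expected charset and collation."""
    mismatches = {}
    for name, value in variables.items():
        if name in IGNORED:
            continue
        if name.startswith("character_set_") and value != charset:
            mismatches[name] = value
        elif name.startswith("collation_") and value != collation:
            mismatches[name] = value
    return mismatches


# Illustrative input only (assumed values).
SAMPLE_VARIABLES = {
    "character_set_client": "utf8mb4",
    "character_set_connection": "utf8mb4",
    "character_set_database": "latin1",
    "character_set_server": "latin1",
    "character_set_system": "utf8mb3",
    "collation_connection": "utf8mb4_general_ci",
    "collation_server": "latin1_swedish_ci",
}

if __name__ == "__main__":
    for name, value in sorted(find_charset_mismatches(SAMPLE_VARIABLES).items()):
        print(f"{name} = {value}")
```

A check like this only reports which variables disagree with the forced DB_CHARSET/DB_COLLATE values; it does not establish which of them, if any, is responsible for the corruption.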

When a self-hosted site with emojis in post titles, comments, etc. is migrated with Migrate to WordPress.com, the emojis all become corrupted and are stored as replacement characters/mojibake. Here is a recent example of a wp_comments table with this issue, where all of the question marks should be emojis:

Image
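
The root cause is not confirmed, but the symptom is consistent with text passing through a latin1 step somewhere in the pipeline. The sketch below uses Python's own codecs to model the two classic failure shapes: a lossy conversion to latin1 (each emoji becomes `?`) and UTF-8 bytes misread as latin1 (mojibake). The function names are hypothetical, and Python's `latin-1` codec is only an approximation of MySQL's `latin1` character set.

```python
def store_via_latin1(text: str) -> str:
    """Model a lossy conversion to latin1: any character that latin1
    cannot represent is replaced with '?'."""
    return text.encode("latin-1", errors="replace").decode("latin-1")


def read_utf8_as_latin1(text: str) -> str:
    """Model mojibake: the UTF-8 bytes of the text are misinterpreted
    as latin1, one character per byte."""
    return text.encode("utf-8").decode("latin-1")


if __name__ == "__main__":
    title = "Launch day \U0001F680"  # ends with a rocket emoji
    print(store_via_latin1(title))            # Launch day ?
    print(read_utf8_as_latin1("caf\u00e9"))   # cafÃ©
```

The first function matches the question marks seen in the wp_comments example; once a character has been replaced this way, the original emoji cannot be recovered from the stored value, which is why a second migration is needed.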

Steps to reproduce:

  1. Create a self-hosted site that uses utf8mb4/utf8mb4_unicode_520_ci by default, like WordPress core (note that sites on JN seem to migrate fine, as JN uses utf8mb4/utf8mb4_general_ci, the same as Dotcom)
  2. Create a post with emojis in the title
  3. Migrate the self-hosted site to WordPress.com with Migrate to WordPress.com
  4. See data corruption with emojis
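
Step 4 can be checked mechanically rather than by eye. The following is a rough sketch, assuming post titles have been exported from both the source and the migrated site as `{post_id: title}` dictionaries; the function name and the sample data are hypothetical.

```python
def find_corrupted_titles(source, migrated):
    """Return the sorted IDs of posts whose migrated title is missing
    or differs from the title on the source site."""
    corrupted = []
    for post_id, original in source.items():
        if migrated.get(post_id) != original:
            corrupted.append(post_id)
    return sorted(corrupted)


if __name__ == "__main__":
    # Illustrative data only.
    source_titles = {1: "Hello world", 2: "Launch day \U0001F680"}
    migrated_titles = {1: "Hello world", 2: "Launch day ?"}
    print(find_corrupted_titles(source_titles, migrated_titles))  # [2]
```

The same comparison could be applied to comment content or any other column that may hold emojis.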

Site owner impact

Fewer than 20% of the total website/platform users

Severity

Moderate

What other impact(s) does this issue have?

Platform revenue

If a workaround is available, please outline it here.

The workarounds are to use an alternative migration plugin such as AIOWPM or to import the database manually. Either way, HEs/users essentially need to migrate the site twice.

Platform

Atomic

mgozdis added the [Feature Group] Content Management, [Feature] Site Migration, [Product] WordPress.com, [Type] Bug, and Needs triage labels Feb 27, 2025
matticbot added the [Status] Priority Review Triggered and [Status] Auto-allocated labels Feb 27, 2025
mgozdis changed the title from "Migrate to Wordpress.com corrupts emojis" to "Migrate to WordPress.com corrupts emojis" Feb 27, 2025
github-actions bot added the [Pri] High label Feb 27, 2025