emoji-test-regex-pattern offers Java- and JavaScript-compatible regular expression patterns to match all emoji symbols and sequences listed in the emoji-test.txt file provided as part of Unicode® Technical Standard #51.
These patterns can then be embedded into source code as part of projects such as emoji-regex.
This repository contains a script that generates this regular expression pattern based on Unicode data. Because of this, the pattern can easily be updated whenever new emoji are added to the Unicode standard.
emoji-test-regex-pattern also includes other useful assets! For every supported version of the Emoji standard, emoji-test-regex-pattern generates the following files:
- index.txt: a list of all the emoji matched by the other files in that folder, alongside their code points in U+XXXX notation.
- index-strings.txt: a list of all the emoji matched by the other files in that folder.
- cpp-re2.txt: a regular expression pattern matching all the emoji in index.txt, compatible with the C++ RE2 library.
- css.txt: a CSS unicode-range value matching each code point in any of the emoji sequences in index.txt.
- java.txt: a Java-compatible regular expression pattern matching all the emoji in index.txt.
- javascript.txt: a JavaScript-compatible regular expression pattern matching all the emoji in index.txt, for use in regular expressions without the u flag.
- javascript-u.txt: a JavaScript-compatible regular expression pattern matching all the emoji in index.txt, for use in regular expressions with the u flag.
- javascript-v.txt: a JavaScript-compatible regular expression pattern matching all the emoji in index.txt, for use in regular expressions with the v flag.
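As a rough illustration of how one of the JavaScript pattern files might be consumed, the sketch below compiles pattern text into a RegExp with the flag that matches the file. It assumes the text has already been read from disk (for example with Node.js fs.readFileSync). The helper name compileEmojiPattern and the short stand-in pattern are hypothetical and not part of this project; the real generated files are far longer.

```javascript
// Hypothetical helper: turn the text of a pattern file into a global RegExp.
// `flags` should match the file: '' for javascript.txt, 'u' for
// javascript-u.txt, 'v' for javascript-v.txt.
function compileEmojiPattern(patternText, flags = '') {
  // A file may end with a trailing newline; strip surrounding whitespace so
  // it does not become part of the last alternative.
  return new RegExp(patternText.trim(), 'g' + flags);
}

// Stand-in for the contents of javascript-u.txt (assumption, for illustration
// only). It matches U+1F44D with an optional skin tone modifier, or
// U+2764 followed by U+FE0F.
const standInPattern =
  String.raw`\u{1F44D}[\u{1F3FB}-\u{1F3FF}]?|\u2764\uFE0F` + '\n';

const emojiRegex = compileEmojiPattern(standInPattern, 'u');
const matches = 'ok \u{1F44D}\u{1F3FD} and \u2764\uFE0F'.match(emojiRegex);
console.log(matches.length); // 2
```

The flag passed to the helper has to correspond to the file the text came from, since each variant is written for a specific regular expression mode.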
Note that although Unicode Emoji UTS#51 follows the versioning system used by the Unicode Standard, the version numbers can be different. For example, when Unicode 13.0 was released, so was Emoji 13.0. But later, Emoji 13.1 was published while the Unicode version number remained at 13.0. Therefore, we use the Emoji version as specified in UTS#51 (and not the version of the Unicode Standard itself) to version the different patterns:
dist/emoji-13.0/index.txt
dist/emoji-13.0/index-strings.txt
dist/emoji-13.0/cpp-re2.txt
dist/emoji-13.0/css.txt
dist/emoji-13.0/java.txt
dist/emoji-13.0/javascript.txt
dist/emoji-13.0/javascript-u.txt
dist/emoji-13.0/javascript-v.txt
dist/emoji-13.1/*.txt
dist/emoji-14.0/*.txt
dist/emoji-15.0/*.txt
dist/emoji-15.1/*.txt
dist/emoji-16.0/*.txt
dist/latest/*.txt
See the dist/ folder.
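The folder layout above can be expressed as a small path builder. This is only a sketch: the function name assetPath is hypothetical, and it assumes the dist/emoji-<version>/ and dist/latest/ layout shown in the listing.

```javascript
// Hypothetical helper: build the relative path of a generated asset.
// `emojiVersion` is an Emoji (UTS#51) version such as '13.1', or 'latest'.
function assetPath(emojiVersion, fileName) {
  const folder = emojiVersion === 'latest' ? 'latest' : `emoji-${emojiVersion}`;
  return `dist/${folder}/${fileName}`;
}

console.log(assetPath('13.1', 'javascript-u.txt')); // dist/emoji-13.1/javascript-u.txt
console.log(assetPath('latest', 'java.txt'));       // dist/latest/java.txt
```

Note that the version argument is the Emoji version, not the Unicode Standard version, so '13.1' is valid here even though there was no Unicode 13.1.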
- Update the Unicode data dependency in package.json by running the following commands:

      # Example: Emoji 17.0 (UTS#51) is released, and its data is included in the @unicode/unicode-17.0.0 package.
      npm install unicode-emoji-17.0@npm:@unicode/unicode-17.0.0@latest --save-dev

- Generate the new output:

      npm run build

- Verify that dist contains the new file.
- Send a pull request with the changes, and get it reviewed & merged.
- On the main branch, bump the version number in package.json:

      npm version patch -m 'Release v%s'

  Instead of patch, use minor or major as needed. Note that this produces a Git commit + tag.
- Push the release commit and tag:

      git push && git push --tags

  Our CI then automatically publishes the new release to npm.
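The verification step in the list above (checking that dist contains the new output) could be partly automated along these lines. This is a sketch under assumptions: the helper missingAssets is hypothetical and not part of this repository, and the expected file names are simply those listed earlier in this document.

```javascript
// File names generated for each Emoji version, as listed earlier in this document.
const EXPECTED_ASSETS = [
  'index.txt', 'index-strings.txt', 'cpp-re2.txt', 'css.txt',
  'java.txt', 'javascript.txt', 'javascript-u.txt', 'javascript-v.txt',
];

// Hypothetical helper: given the file names found in one version folder,
// return the expected assets that are missing from it.
function missingAssets(presentFileNames) {
  const present = new Set(presentFileNames);
  return EXPECTED_ASSETS.filter((name) => !present.has(name));
}

// Example with an incomplete folder listing.
console.log(missingAssets(['index.txt', 'java.txt', 'css.txt']));
```

In practice the listing could come from Node.js fs.readdirSync on the new version folder; an empty result would mean every expected file is present.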
Mathias Bynens
This project is a fork of emoji-regex, with a different goal. emoji-test-regex-pattern is available under the same MIT license as the original project.