-
-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Unicode characters in slugs #640
Conversation
Should we still convert all characters to (localized) lower-case here? Done Or should we just do the bare minimum and let any SSG do the rest? But then why even have slug customization -- the CMS wouldn't even know what the slug was going to be, then. |
OK, everybody, this is ready for review. Please throw some test cases at it, and let me know if you find a unique one. |
80690e4
to
dcc7a73
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work @tech4him1, thank you! Just a couple of small change requests, after which this should be good to go.
src/backends/backend.js
Outdated
@@ -42,7 +43,7 @@ const slugFormatter = (template = "{{slug}}", entryData) => { | |||
return identifier; | |||
}; | |||
|
|||
return template.replace(/\{\{([^\}]+)\}\}/g, (_, field) => { | |||
let slug = template.replace(/\{\{([^\}]+)\}\}/g, (_, field) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although we allow some mutable ops for dense logic or areas where performance is a concern, we generally want to avoid them. I'd recommend something more like:
const slug = template.replace(...)
.method()
.method();
I'd also move sanitizeFilename
and the trailing character replacement to a new sanitizeSlug
export from urlHelper
, so you can just return sanitizeSlug(slug)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, but I'm not sure if I have it the best way. Could you please review sanitizeSlug
?
src/lib/urlHelper.js
Outdated
@@ -12,6 +12,22 @@ export function getNewEntryUrl(collectionName, direct) { | |||
return getUrl(`/collections/${ collectionName }/entries/new`, direct); | |||
} | |||
|
|||
// Unreserved chars from RFC3987. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you be a little bit more verbose in this comment, maybe even add a reference link? You did a lot of research, just want to capture some of it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, sounds good, because I sure don't want to have to do it over again :).
@erquhart Can you please review the last commit on this and see if you have any improvements on the |
src/lib/urlHelper.js
Outdated
export function sanitizeSlug(str, { replacement = '-' }) { | ||
if (!isString(str)) throw "`sanitizeSlug` only accepts strings as input."; | ||
if (!isString(replacement)) throw "the `sanitizeSlug` replacement character must be a string."; | ||
let slug = str; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd still like to avoid mutations - we can use a couple of Lodash functions (flow
and partialRight
) to go immutable without the cruft:
const sanitize = flow([
partialRight(sanitizeIRI, { replacement }),
partialRight(sanitizeFilename, { replacement }),
]);
const sanitizedSlug = sanitize(slug);
// After declaring replacement expressions:
const normalizedSlug = slug
.replace(...)
.replace(...);
Be sure to keep your source comments, those are helpful.
} | ||
}); | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you bring back your source comment explaining this regex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@erquhart OK, done. Are the comments clear enough? Also, do you have any suggestions on making the |
src/lib/urlHelper.js
Outdated
return result; | ||
} | ||
|
||
export function sanitizeSlug(str, { replacement = '-' } = {}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use an options object here? Do we expect more complex arguments in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dopry No, I was just keeping the interface the same as sanitizeFilename
and sanitizeIRI
. Also, I think sanitizeSlug(slug, {replacement: '-'})
is clearer than sanitizeSlug(slug, '-')
.
Is there any particular reason that you don't think it should be there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use Intellisense so I get hinted variable names and types usually. If you were to write a jsdoc block for this you'd have to create a typedef for each options object. When there is only one option there isn't much function purpose to use it.
I personally think it's much easier to understand...
function sanitizeSlug(slug, replacement = '-') {} but this is a matter of preference and I'd approve with or without... just curious at this point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dopry I'm talking about calling the function -- where sanitizeSlug(slug, {replacement: '-'})
is clearer than sanitizeSlug(slug, '-')
.
src/lib/urlHelper.js
Outdated
const doubleReplacement = new RegExp('(?:' + escapeRegExp(replacement) + ')+', 'g'); | ||
const trailingReplacment = new RegExp(escapeRegExp(replacement) + '$'); | ||
const normalizedSlug = sanitizedSlug | ||
.replace(doubleReplacement, '-') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use replacement instead of '-'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
return normalizedSlug; | ||
} | ||
|
||
export function urlize(string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
urlize is no longer used, it should be removed so we're not dragging unused code around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, but there are a couple reasons I don't want to remove it yet, plus it gets trimmed from the bundle anyway. Thanks for noting this, though, I'll look into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What reasons? It's in the git history, we can always dig it up if we need it again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@biilmann added it. Also, we weren't using it before, either, so I think it should be a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
- Summary
Right now, we only support Latin chars in slugs/filenames (we strip accents from chars if we can, and drop everything else). This isn't very user-friendly, and latest browsers (including IE11) support Unicode, so we might as well allow users to use it.
We still have to sanitize the slugs to make sure that the filenames are safe , but other than that, we don't have to do anything else.This PR upgrades the slug formatter to use the new IRI standards (RFC 3987), which support other than latin-1 chars in slugs, instead of the URI one (RFC 3986).
One other thing I did here was to sanitize all types of slugs, not just the input titles. This should make it much easier to do slug customization later.
As a side effect, this closes #607.
- Test plan
Lots of testing needs done on this, and we should probably announce it as well (for users that were previously having accents stripped).
Tests were also added.
Example Test Cases:
escrzy
ěščřžý
日本語のタイトル
duckgooseelephant
duck-goose-elephant
netlify-cms-planning-working-in-sprints
netlify-cms-planning-working-in-sprints
this-that-and-the-other
this-that-and-the-other
- Description for the changelog
Support Unicode characters in slugs.
- A picture of a cute animal (not mandatory but encouraged)
![wynand-van-poortvliet-364366](https://user-images.githubusercontent.com/20345941/31055305-04c7e068-a67e-11e7-8726-19ef72dbcd06.jpg)