Japanese test improvements #962
Hi @wallace11, thanks! I'm afraid this won't pass. The current algorithm first splits the text into a sequence of words, using "separating characters" such as whitespace and punctuation as boundaries. But now the dates are surrounded by text: … Edit: the CI complains about formatting; this can be fixed by running …
@eikek Regarding spaces, that's the thing: in "normal" Japanese there are no spaces between words. That's exactly why I wanted to create proper Japanese tests, to see whether they catch that. I looked at some of my documents, and indeed on some of them the date is part of the first sentence or of the title (which is also a sentence). Do you think it would be possible to fix that?
@wallace11 No worries! (You would only need to install sbt for this.) Thanks for your explanation! I just read on Wikipedia that there are no spaces in Japanese :) Well, I guess this means doing it completely differently here. If you have some documents you could share, that would help! That way I could run this against some "real" data. I might be able to remove all characters that are not Arabic numerals or the characters for year/month/day… maybe this gives some results.
Not very efficient, but it should work to find the position of dates in Japanese text.
@wallace11 I just pushed a rather crude fix :-). It preprocesses the text and removes all characters that don't take part in a date. Your tests should pass now. You could try this against your documents. I can merge this, and a few minutes later a nightly version will be published.
@eikek
@wallace11 Great 😃! Sounds like a weekend 😉 Thank you for your help!
Hi there,
Here are some more sensible Japanese tests.
I hope that they pass 😆