Problem getting the baseform right for inflected adjectives and verbs #5
Comments
Is it possible that you are not using the latest version of SimpleNLG-DE? Previous versions had an issue with the indexation of variants, fixed in the latest version (#2), which could lead to this behaviour.
I use SimpleNLG-DE v1.1.1 from Maven. The tests ported to C# pass without problems against the IKVM build of SimpleNLG-DE, but loading the 42 MB MucLex.xml is slow with the IKVM approach. I am porting the code from the current master branch to C#. I am still learning SimpleNLG-DE and have not yet fully tracked what was changed to create SimpleNLG-DE from the parent code base, which is tailored for English.
But if the tests pass, then the base form is the same. I am not sure I understand what the problem is.
The first approach, the IKVM version, uses SimpleNLG-DE.jar 1.1.1 from Maven without porting Java to C#; performance is the issue there. The second approach ports the existing Java code in the master branch to C#, which promises far better performance when loading MucLex.xml. However, I need to track how the SimpleNLG-DE Java code creates the inflected WordElement (with the right base form) used in the tests cited in the issue description. My port of that part is wrong: the C# version from the second approach fails to provide the right base form for the inflected words used in those tests.
If you follow the issue I linked above, you will find the exact commit where this was introduced / fixed in SimpleNLG-DE (commit d77058a).
@DaBr01 Thanks, this is a good starting point for learning SimpleNLG-DE.
@DaBr01 Question 1: which part of the code deals with capitalization of nouns?

    [Fact]
    public void CreateAMoreComplexSentence1()
    {
        SPhraseSpec sentence = nlgFactory.createClause();
        NPPhraseSpec subject = nlgFactory.createNounPhrase("der hund");
        VPPhraseSpec verb = nlgFactory.createVerbPhrase("jagen");
        NPPhraseSpec object1 = nlgFactory.createNounPhrase("george");
        sentence.setSubject(subject);
        sentence.setVerb(verb);
        sentence.setObject(object1);
        string output = realiser.realiseSentence(sentence);
        Assert.Equal("Der Hund jagt George.", output);
    }

I managed to get "jagt" from "jagen". However, I could not get the capitalization george => George and hund => Hund. I would appreciate your help.
Uff, I really don't know that by heart :) A search in the repo might be helpful: https://github.com/search?q=repo%3Asebischair%2FSimpleNLG-DE+capital&type=code It looks like there is a function capitaliseFirstLetter in the OrthographyProcessor.
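For illustration only, capitalising the first letter of a word can be sketched as below. This is a standalone helper that borrows the name capitaliseFirstLetter from the search result; it is not the code in OrthographyProcessor, whose signature and surrounding logic (for example, how it decides which words are nouns) are not shown in this thread. The class name CapitaliseSketch is hypothetical.

```java
import java.util.Locale;

// Hypothetical standalone helper; NOT the SimpleNLG-DE implementation.
final class CapitaliseSketch {

    // Returns the word with its first character upper-cased, rest unchanged.
    static String capitaliseFirstLetter(String word) {
        if (word == null || word.isEmpty()) {
            return word; // nothing to capitalise
        }
        String first = word.substring(0, 1).toUpperCase(Locale.GERMAN);
        return first + word.substring(1);
    }

    public static void main(String[] args) {
        // Capitalise the noun and the sentence-initial article by hand.
        String article = capitaliseFirstLetter("der");
        String noun = capitaliseFirstLetter("hund");
        System.out.println(article + " " + noun);
    }
}
```

In a port, the interesting part is usually not this helper but where it is called from, and whether the word has been recognised as a noun by then.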
NOTE: this is ported to C#.
Question 1: At which step of creating the inflected WordElement could this go wrong, such that the base form is set incorrectly?
SimpleNLG-DE/src/test/java/simplenlgde/morphology/BasewordTest.java
Line 51 in 5c831cb
baseformAdj2 remains "gute" instead of "gut"
SimpleNLG-DE/src/test/java/simplenlgde/morphology/BasewordTest.java
Line 64 in 5c831cb
baseformvp2 remains "bin" instead of "sein"
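Both symptoms are consistent with a lexicon lookup that falls back to the surface form because the inflected variant is not indexed, although that is not confirmed as the cause here. A minimal sketch of the idea, assuming a plain map from variant to base form; the class VariantIndexSketch and its methods are hypothetical and are not SimpleNLG-DE's lexicon API:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of variant indexing; NOT SimpleNLG-DE code.
final class VariantIndexSketch {

    // Maps every known surface form (lower-cased) to its base form.
    private final Map<String, String> baseFormByVariant = new HashMap<>();

    // Registers a base form together with its inflected variants.
    void addEntry(String baseForm, String... variants) {
        baseFormByVariant.put(baseForm.toLowerCase(Locale.GERMAN), baseForm);
        for (String variant : variants) {
            baseFormByVariant.put(variant.toLowerCase(Locale.GERMAN), baseForm);
        }
    }

    // Returns the base form, or the input itself if the variant is unknown.
    String lookupBaseForm(String surfaceForm) {
        String key = surfaceForm.toLowerCase(Locale.GERMAN);
        return baseFormByVariant.getOrDefault(key, surfaceForm);
    }

    public static void main(String[] args) {
        VariantIndexSketch lexicon = new VariantIndexSketch();
        lexicon.addEntry("gut", "gute", "guter", "gutes");
        lexicon.addEntry("sein", "bin", "bist", "ist");
        System.out.println(lexicon.lookupBaseForm("gute")); // indexed variant
        System.out.println(lexicon.lookupBaseForm("bin"));  // indexed variant
        System.out.println(lexicon.lookupBaseForm("rote")); // not indexed
    }
}
```

If the port registers only the base form and skips the variants while loading the lexicon, the lookup in this sketch returns "gute" and "bin" unchanged, which matches the two failing assertions.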