Problem getting the baseform right for inflected adjectives and verbs #5

Open
GeorgeS2019 opened this issue Feb 27, 2024 · 8 comments


@GeorgeS2019

NOTE: this is ported to C#.

Question 1: At which step of creating an inflected WordElement could this go wrong, so that the base form ends up wrong?

public void adjectiveBasewordTest()
{
	string[] baseWords = { "gut", "gut", "gut", "gut" };
	string[] inflectedWords = { "gute", "gutes", "gutem", "guten" };

	for (int i = 0; i < baseWords.Length; i++)
	{
		AdjPhraseSpec adj1 = nlgFactory.createAdjectivePhrase(baseWords[i]);
		AdjPhraseSpec adj2 = nlgFactory.createAdjectivePhrase(inflectedWords[i]);

		// Both base forms are expected to be "gut".
		var baseformAdj1 = ((WordElement)adj1.getAdjective()).getBaseForm();
		var baseformAdj2 = ((WordElement)adj2.getAdjective()).getBaseForm();
	}
}

baseformAdj2 remains "gute" instead of "gut".

		string[] baseword = { "sein", "sein", "sein", "sein", "gehen", "gehen" };
		string[] inflected = { "bin", "bist", "ist", "sind", "ging", "gingen" };

		for (int i = 0; i < baseword.Length; i++)
		{
			VPPhraseSpec vp1 = nlgFactory.createVerbPhrase(baseword[i]);
			VPPhraseSpec vp2 = nlgFactory.createVerbPhrase(inflected[i]);

			// Both base forms are expected to be the infinitive ("sein" or "gehen").
			var baseformvp1 = ((WordElement)vp1.getVerb()).getBaseForm();
			var baseformvp2 = ((WordElement)vp2.getVerb()).getBaseForm();
		}

baseformvp2 remains "bin" instead of "sein".

@DaBr01
Contributor

DaBr01 commented Feb 27, 2024

Is it possible that you are not using the latest version of SimpleNLG-DE? Previous versions had an issue with the indexing of variants, fixed in the latest version (#2), which could lead to this behaviour.
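
To illustrate what indexing variants means for this problem: a lexicon can only map an inflected form such as "gute" back to "gut" if each variant was registered under its own key when the lexicon was loaded. The following is a minimal Java sketch of that idea only. It is not SimpleNLG-DE's code; the class `VariantIndex` and its methods `addEntry` and `baseFormOf` are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not part of SimpleNLG-DE.
public class VariantIndex {
    // Maps every known surface form (base or inflected) to its base form.
    private final Map<String, String> baseFormByVariant = new HashMap<>();

    // Registers the base form and each inflected variant under its own key.
    public void addEntry(String baseForm, List<String> inflectedForms) {
        baseFormByVariant.put(baseForm, baseForm);
        for (String form : inflectedForms) {
            baseFormByVariant.put(form, baseForm);
        }
    }

    // Falls back to the input itself when the word was never indexed.
    public String baseFormOf(String word) {
        return baseFormByVariant.getOrDefault(word, word);
    }

    public static void main(String[] args) {
        VariantIndex index = new VariantIndex();
        index.addEntry("gut", List.of("gute", "gutes", "gutem", "guten"));
        index.addEntry("sein", List.of("bin", "bist", "ist", "sind"));

        System.out.println(index.baseFormOf("gutem"));
        System.out.println(index.baseFormOf("bin"));
        System.out.println(index.baseFormOf("gehen"));
    }
}
```

In this sketch, a variant that was never indexed falls through to the fallback and comes back unchanged, which would look like the symptom reported above ("gute" staying "gute"). Whether the port fails in the same way is an assumption to verify against the linked fix.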

@GeorgeS2019
Author

GeorgeS2019 commented Feb 27, 2024

@DaBr01

I use SimpleNLG-DE v1.1.1 from Maven. The tests ported to C# pass against the IKVM build of SimpleNLG-DE without problems, but loading the 42 MB MucLex.xml is slow with the IKVM approach.

I am porting the code from the current master branch to C#.

I am still learning SimpleNLG-DE and have not yet fully tracked what was changed to create it from the parent code, which is tailored for English.

@DaBr01
Contributor

DaBr01 commented Feb 27, 2024

But if the tests pass, then the base form is the same. I am not sure I understand what the problem is.

@GeorgeS2019
Author

GeorgeS2019 commented Feb 27, 2024

The first approach runs SimpleNLG-DE.jar 1.1.1 from Maven through IKVM, without porting any Java to C#.

Performance is the issue with that approach.

The second approach ports the existing Java code in the master branch to C#. It promises far better performance when loading MucLex.xml than the first approach.

However, I need to track how the SimpleNLG-DE Java code creates the inflected WordElement (with the right base form) used in the tests cited above.

My port does not do this correctly yet: the C# version from the second approach fails to provide the right base form for the inflected words used in the tests above.

@DaBr01
Contributor

DaBr01 commented Feb 27, 2024

If you follow the issue I linked above, you will find the exact commit where this was introduced and fixed in SimpleNLG-DE (commit d77058a).

@GeorgeS2019
Author

@DaBr01 Thanks, this is a good starting point for learning SimpleNLG-DE.

@GeorgeS2019
Author

GeorgeS2019 commented Feb 28, 2024

@DaBr01
Good morning, the above tests pass now.

Question 1: Which part of the code deals with capitalization of nouns?

	[Fact]
	public void CreateAMoreComplexSentence1()
	{
		SPhraseSpec sentence = nlgFactory.createClause();

		NPPhraseSpec subject = nlgFactory.createNounPhrase("der hund");
		VPPhraseSpec verb = nlgFactory.createVerbPhrase("jagen");
		NPPhraseSpec object1 = nlgFactory.createNounPhrase("george");

		sentence.setSubject(subject);
		sentence.setVerb(verb);
		sentence.setObject(object1);

		string output = realiser.realiseSentence(sentence);

		Assert.Equal("Der Hund jagt George.", output);
	}

I managed to get "jagt" from "jagen". However, I could not get the nouns capitalized (george => George and hund => Hund).

I would appreciate your help.

@DaBr01
Contributor

DaBr01 commented Feb 28, 2024

Uff I really don't know that by heart :) A search in the repo might be helpful: https://github.com/search?q=repo%3Asebischair%2FSimpleNLG-DE+capital&type=code

Looks like there is a function capitaliseFirstLetter in the OrthographyProcessor.
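
For orientation, capitalising the first letter of a word can be sketched in a few lines of Java. This is an assumption about what such a helper does, not the body of the library's function: the method below only borrows the name `capitaliseFirstLetter` mentioned above, and the class `CapitalisationSketch` is hypothetical. It also says nothing about how SimpleNLG-DE decides which words to capitalise.

```java
// Hypothetical illustration; not the SimpleNLG-DE implementation.
public class CapitalisationSketch {
    // Upper-cases the first character and leaves the rest of the word unchanged.
    public static String capitaliseFirstLetter(String word) {
        if (word == null || word.isEmpty()) {
            return word;
        }
        return Character.toUpperCase(word.charAt(0)) + word.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(capitaliseFirstLetter("hund"));   // prints "Hund"
        System.out.println(capitaliseFirstLetter("george")); // prints "George"
    }
}
```

In a port, the open question is where such a step is applied to noun elements during realisation, which the repository search linked above should help locate.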
