Fix Chinese Search Bug #245

Open: wants to merge 2 commits into master

@7emotions commented Jun 29, 2024

  • I have read the instructions and confirm that this PR is not a questionnaire submission or modification.

Description

As discussed in #138, searching by a university's full name currently returns no results.

Bug Reproduction

As the following screenshots show, NJUST, for example, is included in the dataset, but searching for it in the search box returns no results.

[Screenshots]

Effect of the Fix

As the following screenshot shows, NJUST, including its multiple campuses, is now displayed correctly.

[Screenshot]

In addition, search suggestions now appear while typing.

[Screenshot]
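The PR does not say how the suggestions are produced. As a rough, hypothetical illustration of prefix-based suggestions in general (not necessarily what mkdocs or this PR does), a sorted list of titles can be searched with Python's `bisect` module: every title sharing a prefix sits in one contiguous run that starts at the prefix's insertion point. The function name `suggest` and the titles are made up for the example.

```python
import bisect


def suggest(sorted_titles, prefix, limit=5):
    """Return up to `limit` titles that start with `prefix`.

    `sorted_titles` must already be sorted; all titles sharing the prefix then
    form one contiguous run beginning at the bisect_left insertion point.
    """
    results = []
    i = bisect.bisect_left(sorted_titles, prefix)
    while i < len(sorted_titles) and len(results) < limit:
        if not sorted_titles[i].startswith(prefix):
            break  # left the run of matching titles
        results.append(sorted_titles[i])
        i += 1
    return results


# Hypothetical page titles, sorted once up front.
TITLES = sorted(["北京大学", "南京大学", "南京理工大学", "南京理工大学紫金学院"])
print(suggest(TITLES, "南京理工"))  # ['南京理工大学', '南京理工大学紫金学院']
```

Each keystroke then costs one binary search plus at most `limit` comparisons, which keeps suggestions cheap even for a large list of titles.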

PS: Some samples with overly long titles cannot be written to disk when the site is built, so I took the liberty of filtering them out. As it turns out, all of the filtered samples are negative samples.

@7emotions (Author) commented:

@aisuneko

@Rachel030219 (Contributor) left a comment:

Why are there so many minor and completely irrelevant code-style changes?

@7emotions (Author) commented:

I have reset the commits. Sorry for the extra work I caused earlier.

@7emotions requested a review from Rachel030219 on July 18, 2024, 08:34
```python
# Skip samples whose names are too long to be written when building the site
if len(filename) > 150:
    continue
```

@7emotions (Author) commented:

I made this change because writing files failed when a filename was too long. It is unrelated to the main purpose of this PR.
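`len(filename)` counts characters, whereas the per-name limit on common filesystems (255 bytes on ext4, for example) applies to the encoded bytes. Assuming that limit is what the build ran into, a byte-based check is more precise for Chinese titles, since each CJK character takes 3 bytes in UTF-8. A minimal sketch; the helper `name_too_long` and the constant are hypothetical, and the PR itself uses `len(filename) > 150`:

```python
# Hypothetical alternative check; the PR itself uses `len(filename) > 150`.
NAME_MAX_BYTES = 255  # typical per-component limit on Linux filesystems such as ext4


def name_too_long(filename: str, limit: int = NAME_MAX_BYTES) -> bool:
    """Return True if the UTF-8 encoding of `filename` exceeds `limit` bytes."""
    return len(filename.encode("utf-8")) > limit


# 86 CJK characters already take 258 bytes, while 200 ASCII characters still fit.
print(name_too_long("南" * 86 + ".md"))  # True
print(name_too_long("a" * 200 + ".md"))  # False
```

With a character-count threshold, an 86-character Chinese title would pass the check yet still fail to write, while a 200-character ASCII name would be skipped even though it fits.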

@7emotions (Author) commented Jul 18, 2024

Additionally, implementing keyword search requires modifying the mkdocs source so that it uses jieba for Chinese word segmentation. That modification lives outside this repository's code, so I would like to know whether this approach is acceptable.

Related changes can be found here.

[Screenshots: WeChat Image_20240718170053, WeChat Image_20240718171016]
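Why segmentation matters can be seen in a simplified model of a search index. Assuming the index splits text into terms only at whitespace and hyphens (the documented default `separator` of the mkdocs search plugin), an unspaced run of Chinese becomes a single term, so a query for just the university name never equals any indexed term. The sketch below is a hypothetical toy, not mkdocs code or the code in this PR: so that it runs without dependencies, it substitutes a greedy forward-maximum-matching segmenter over a tiny made-up vocabulary for jieba.

```python
import re

# Roughly mirrors the mkdocs search plugin's default separator (whitespace and hyphens).
SEPARATOR = re.compile(r"[\s\-]+")

# Hypothetical toy vocabulary; jieba ships a large dictionary instead.
VOCAB = {"南京", "北京", "理工", "大学", "宿舍", "食堂", "空调"}


def whitespace_tokens(text):
    """Split only on the separator, as an unsegmented index would."""
    return [tok for tok in SEPARATOR.split(text) if tok]


def fmm_segment(text, vocab, max_len=4):
    """Greedy forward maximum matching: take the longest vocab word at each position."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:  # unknown characters become single tokens
                tokens.append(piece)
                i += size
                break
    return tokens


def segmented_tokens(text):
    """Split on the separator first, then segment each remaining chunk."""
    return [w for chunk in whitespace_tokens(text) for w in fmm_segment(chunk, VOCAB)]


def build_index(docs, tokenize):
    """Inverted index mapping each token to the set of document ids containing it."""
    index = {}
    for doc_id, text in enumerate(docs):
        for tok in tokenize(text):
            index.setdefault(tok, set()).add(doc_id)
    return index


def search(index, query, tokenize):
    """Return sorted ids of documents that contain every query token."""
    hits = [index.get(tok, set()) for tok in tokenize(query)]
    return sorted(set.intersection(*hits)) if hits else []


# Hypothetical page texts, written without spaces as Chinese usually is.
DOCS = ["南京理工大学的宿舍有空调", "北京大学的食堂很好"]

ws_index = build_index(DOCS, whitespace_tokens)
seg_index = build_index(DOCS, segmented_tokens)
print(search(ws_index, "南京理工大学", whitespace_tokens))  # []
print(search(seg_index, "南京理工大学", segmented_tokens))  # [0]
```

With jieba installed, `jieba.lcut_for_search(chunk)` could be tried in place of `fmm_segment(chunk, VOCAB)`; its exact segmentation depends on jieba's dictionary, so the outputs above hold only for the toy vocabulary.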
