Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add op/extract/split Unit Test #186

Merged
merged 4 commits into from
Feb 29, 2024
Merged

Add op/extract/split Unit Test #186

merged 4 commits into from
Feb 29, 2024

Conversation

Sdddell
Copy link
Contributor

@Sdddell Sdddell commented Feb 19, 2024

Add the following tests:

  • test_markdown_header_splitter
    • cannot pass custom headers, e.g. <h1> xx </h1>
  • test_recursive_character_splitter
  • test_pattern_splitter_op
  • test_splitter_factory

@CambioML
Copy link
Owner

@Sdddell It looks like the build is failing due to formatting issue. Please make sure you follow https://www.notion.so/goldpiggy/Python-Linter-and-formatter-Setup-30fb3b81f0904af889832e4c697c5ec9?pvs=4 to properly setup your vs code, reformat, and submit your PR again.

def setUp(self):
self.splitter = MarkdownHeaderSplitter("test_splitter")

def test_call(self):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

provide a meaningful function name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is testing the special method def __call__(self, x). Should I just rename it to test___call__() or test_speical_call()?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe call it test_callable_object_construction

self.assertEqual(output_nodes[0].value_dict["text"], ["# Header ## Content"])
self.assertEqual(output_nodes[1].value_dict["text"], ["# Header", "## Content"])

def test_header_splitter(self):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can split into two test cases if you are intended to test single_header and multiple_header:
test_header_splitter_with_single_header
test_header_splitter_with_multiple_header

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's testing processing Sequence[Node].

markdown_str = " <h1># Header</h1> \n Content"
headers_to_split_on_list0 = []
headers_to_split_on_list1 = [("\n", "h2")]
headers_to_split_on_list2 = [("<h1>", "h1")]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does headers_to_split_on_listx mean here?

self.assertEqual(result, ['<h1># Header</h1>\nContent'])
self.assertEqual(result0, [])
self.assertEqual(result1, [])
# self.assertEqual(result2, ['<h1># Header</h1>\nContent'])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this is commented out?

def setUp(self):
self.splitter = PatternSplitter("test_splitter")

def test_call(self):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto: provide a meaningful function name

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe call it test_callable_object_construction

@Sdddell
Copy link
Contributor Author

Sdddell commented Feb 20, 2024

Fix styles.

@goldmermaid
Copy link
Collaborator

It looks like styling issue is gone. However, it looks like the build is failing due to unittest not passing. @Sdddell

@Sdddell
Copy link
Contributor Author

Sdddell commented Feb 21, 2024

The failing test case is about checking if markdown_header_splitter can process the custom markdown header (<h1>). Is that a valid test case, or do I need to comment out it?

@CambioML
Copy link
Owner

The failing test case is about checking if markdown_header_splitter can process the custom markdown header (<h1>). Is that a valid test case, or do I need to comment out it?

@SayaZhang Could you please help to take a look.

@Sdddell
Copy link
Contributor Author

Sdddell commented Feb 24, 2024

Remove custom html-style header test of markdown header splitter.

def test_list(self):
splitters = SplitterOpsFactory.list()

self.assertEqual(len(splitters), 3)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe try assertListEqual or assertEqual directly rather than comparing element by element


output_nodes = self.splitter([node0, node1, node2, node3])

self.assertEqual(len(output_nodes), 4)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here maybe try assertEqual directly rather than comparing element by element

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@SayaZhang
Copy link
Collaborator

LGTM

@SayaZhang SayaZhang merged commit 3d8bb57 into CambioML:main Feb 29, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants