Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added serialization of the trie into a memory buffer. #12

Merged
merged 1 commit into from
Jan 23, 2021

Conversation

KOLANICH
Copy link
Contributor

No description provided.

@KOLANICH
Copy link
Contributor Author

KOLANICH commented Jan 19, 2020

@thep, @tlwg

Copy link
Contributor

@thep thep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Symbols in libdatrie.map are versioned. Instead of inserting new symbols to old version block, please create a new version block (DATRIE_0.2.13), deriving from DATRIE_0.2.7, and add new symbols there. For libdatrie.def, simple insertion is fine.

Copy link
Contributor

@thep thep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer function naming to start with target object type, so they look like OOP object methods. Please rename get_trie_serialized_size(). get_alpha_map_serialized_size(), get_tail_serialized_size() to trie_get_serialized_size(), alpha_map_get_serialized_size(), tail_get_serialized_size() respectively.

With this, please also update documentation for trie_get_serialized_size(), as it's not for file creation.

datrie/tail.c Outdated Show resolved Hide resolved
datrie/tail.c Outdated Show resolved Hide resolved
datrie/tail.c Show resolved Hide resolved
datrie/tail.c Show resolved Hide resolved
tests/test_serialization.c Outdated Show resolved Hide resolved
@thep
Copy link
Contributor

thep commented Jan 20, 2021

Sorry for some comments that might introduce some confusion. TrieChar is supposed to be the type of raw characters used in trie node transitions, as opposed to AlphaChar for the Unicode characters in the real world. TrieChar happens to be unsigned char in the current code, but it's not guaranteed to always be so. For example, there is a request in Issue #1 to increase TrieChar size.

In low-level code like serialization, I think we need to be strict on bytes and use unsigned char where byte is actually meant. So, please revert the TrieChar replacements and use the explicit unsigned char instead. You can assume TrieChar to be of one-byte size for now. We can fix this later if TrieChar enhancement actually happens. Sorry for the confusion.

@KOLANICH
Copy link
Contributor Author

I think we need to be strict on bytes and use unsigned char

unsigned char is not guaranteed to be a byte. uint8_t is.

@thep
Copy link
Contributor

thep commented Jan 21, 2021

I think we need to be strict on bytes and use unsigned char

unsigned char is not guaranteed to be a byte. uint8_t is.

Yeah, we can do so to the whole project once we decide to move to C99. However, this reminds me of the similar typedefs in datrie/typedefs.h. We can use uint8 for this purpose, then.

@KOLANICH KOLANICH force-pushed the fileless branch 5 times, most recently from bf760ef to fa19e1a Compare January 23, 2021 06:45
@thep
Copy link
Contributor

thep commented Jan 23, 2021

Please also add this change set to tests/Makefile.am, to get the test run.

diff --git a/tests/Makefile.am b/tests/Makefile.am
index 1c32dc2..41b55cb 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -7,6 +7,7 @@ TESTS = \
 	test_iterator \
 	test_store-retrieve \
 	test_file \
+	test_serialization \
 	test_nonalpha \
 	test_null_trie \
 	test_term_state \
@@ -19,6 +20,7 @@ check_PROGRAMS = \
 	test_iterator \
 	test_store-retrieve \
 	test_file \
+	test_serialization \
 	test_nonalpha \
 	test_null_trie \
 	test_term_state \
@@ -54,6 +56,12 @@ test_file_SOURCES = \
 	$(NULL)
 test_file_LDADD = $(top_builddir)/datrie/libdatrie.la
 
+test_serialization_SOURCES = \
+	test_serialization.c \
+	utils.c \
+	$(NULL)
+test_serialization_LDADD = $(top_builddir)/datrie/libdatrie.la
+
 test_nonalpha_SOURCES = \
 	test_nonalpha.c \
 	utils.c \

@thep
Copy link
Contributor

thep commented Jan 23, 2021

Merged. Thank you very much for your contribution!

@KOLANICH KOLANICH deleted the fileless branch January 23, 2021 19:05
@KOLANICH
Copy link
Contributor Author

And thank you for the reviews.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants