
Represent problem data in atomspace #8

Closed

Yidnekachew wants to merge 5 commits into opencog:master from Yidnekachew:represent-problem-data-in-atomspace

Collaborator

Yidnekachew commented Jun 8, 2018 (edited)

Implemented as per #3

Yidnekachew and others added 5 commits (June 8, 2018 13:58):

06320ba implement atomese representation of a problem data
8d04ad5 add command line argument to use atomese
2593c14 update README
a95ab37 add guile as a dependency
1d2eda5 add unit tests for atomese representation

ngeiswei reviewed


CMakeLists.txt

@@ -458,11 +473,13 @@ FIND_PACKAGE(Doxygen)
               ADD_SUBDIRECTORY(doc EXCLUDE_FROM_ALL)
               # Show a summary of what we got
+              SUMMARY_ADD("AtomSpace" "A weighted and typed hypergraph database" HAVE_ATOMSPACE)

Member

ngeiswei Jun 12, 2018

I don't think you need to add that since AtomSpace is a requirement anyway.

moses/moses/main/table-problems.cc

+                  if(_tpp.use_atomese)
+              		read_atomese_table(pms);
+                  else
+                      read_combo_table(pms);

Member

ngeiswei Jun 12, 2018

There's a mixture of spaces and tabs here. You should keep the indentation consistent within a file, so either use spaces here or reformat the entire table-problems.cc to use tabs; it's up to you (reformatting is fine as long as it goes into a separate commit within this PR).

tests/comboreduct/atomese_representation/atomese_reprUTest.cxxtest

+              class atomese_reprUTest : public CxxTest::TestSuite
+              {
+              private:
+                  AtomSpace *as;

Member

ngeiswei Jun 12, 2018

All your new code is tab-indented; why not indent this one with tabs too?

Member

ngeiswei Jun 12, 2018

Oh I see, it's from @masrb, but still.

ngeiswei reviewed


tests/comboreduct/atomese_representation/atomese_reprUTest.cxxtest

+                  {
+                      eval->eval(
+                          "(load-from-path \"tests/comboreduct/atomese_representation/real_data_result.scm\")");
+                      Handle expected = eval->eval_h("(cog-execute! real_data_repr)");

Member

ngeiswei Jun 12, 2018

I don't think you need to execute real_data_repr, as it's supposed to have already been executed at load time.

tests/comboreduct/atomese_representation/atomese_reprUTest.cxxtest

+                  {
+                      eval->eval(
+                          "(load-from-path \"tests/comboreduct/atomese_representation/boolean_data_result.scm\")");
+                      Handle expected = eval->eval_h("(cog-execute! boolean_data_repr)");

Member

ngeiswei Jun 12, 2018

Same remark about executing.

ngeiswei reviewed


moses/comboreduct/atomese_representation/atomese_representation.h

+               * @param repr
+               * @return
+               */
+              Handle& get_atomese_representation(std::istream& in, Handle& repr);

Member

ngeiswei Jun 12, 2018 (edited)

I don't see any benefit in having the Handle both returned and taken as an argument. Streaming functions usually return the stream as well as taking it as an argument in order to chain streaming, but judging from its usage I don't think you need chaining here. So I suggest either of the following signatures:

Handle get_atomese_representation(std::istream& in)
std::istream& get_atomese_representation(std::istream& in, Handle& repr)

The first because it is simpler, the second because it allows chaining streaming, should you ever need it. I would probably go with the first (I don't foresee chaining, since CSV data usually lives in a standalone file), but it's up to you to evaluate which one is best.
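As a minimal sketch of the two suggested signatures: `Handle` below is a hypothetical stand-in (a vector of the non-empty input lines), not the AtomSpace `Handle`, and the body builds no Atomese. Only the shape of the interface is the point.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the AtomSpace Handle: just the non-empty
// lines read from the stream. No Atomese is built here.
using Handle = std::vector<std::string>;

// Option 1: simplest form, the result is returned by value.
Handle get_atomese_representation(std::istream& in)
{
    Handle repr;
    std::string line;
    while (std::getline(in, line))
        if (!line.empty())
            repr.push_back(line);
    return repr;
}

// Option 2: stream-style form, the result goes into an output argument
// and the stream is returned so that calls could be chained.
std::istream& get_atomese_representation(std::istream& in, Handle& repr)
{
    repr = get_atomese_representation(in);
    return in;
}
```

Overloading on arity lets both coexist here, though in practice one would pick a single form; option 1 keeps the call site to one expression.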

ngeiswei reviewed


moses/comboreduct/atomese_representation/atomese_representation.cc

+              	type_tree tt;
+              	bool has_header, is_sparse;
+              	const string& target_feature = "";
+              	const string& timestamp_feature = "";

Member

ngeiswei Jun 12, 2018

OK, it's actually valid C++: "" is converted to a temporary string whose lifetime is extended by binding it to the const reference, as explained here http://en.cppreference.com/w/cpp/language/reference_initialization. However, I don't see what benefit it brings; you might as well remove the &, no?
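A small self-contained illustration of the point (the function names are made up for this sketch): both forms behave identically, because the temporary std::string created from "" lives as long as the reference bound to it.

```cpp
#include <string>

// Binding a const reference to "" creates a temporary std::string whose
// lifetime is extended to match the reference, so this is well-defined.
// (A non-const std::string& could not bind to the temporary at all.)
std::string label_with_reference()
{
    const std::string& target_feature = "";
    return target_feature.empty() ? "<none>" : target_feature;
}

// The same thing without the reference: a plain const local is clearer
// and behaves identically.
std::string label_with_value()
{
    const std::string target_feature = "";
    return target_feature.empty() ? "<none>" : target_feature;
}
```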

moses/comboreduct/atomese_representation/atomese_representation.cc

+              	// TODO: find a better way of finding the domain (boolean, real).
+              	// For instance, the inputs could be contins where as the outputs boolean.
+              	type_node otype = get_type_node(get_signature_output(tt));

Member

ngeiswei Jun 12, 2018

As you note, it is a temporary shortcoming, but it's OK for now.
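One possible direction for the TODO, sketched under assumptions: infer a domain per column from its tokens instead of from the signature's output type alone, so contin inputs and a boolean output can coexist. The rule used here (a column is boolean if every token is "0" or "1", otherwise contin) and the names `Domain` and `infer_domains` are hypothetical; MOSES's own table attribute inference may work differently.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-column domain, standing in for combo's type nodes.
enum class Domain { Boolean, Contin };

// A column is boolean only if every token in it is "0" or "1".
// row.at(col) throws std::out_of_range on a short row.
Domain infer_column_domain(const std::vector<std::vector<std::string>>& rows,
                           std::size_t col)
{
    for (const auto& row : rows) {
        const std::string& tok = row.at(col);
        if (tok != "0" && tok != "1")
            return Domain::Contin;
    }
    return Domain::Boolean;
}

// Infer one domain per column, using the first row's width.
std::vector<Domain> infer_domains(const std::vector<std::vector<std::string>>& rows)
{
    std::vector<Domain> domains;
    if (rows.empty())
        return domains;
    for (std::size_t c = 0; c < rows.front().size(); ++c)
        domains.push_back(infer_column_domain(rows, c));
    return domains;
}
```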

moses/comboreduct/atomese_representation/atomese_representation.cc

+              using namespace std;
+              using namespace combo;
+              Handle& create_repr(const vector<string>& labels, HandleSeq rows,

Member

ngeiswei Jun 12, 2018

It would be more efficient to pass rows as a const ref, like labels.

Member

ngeiswei Jun 12, 2018

Also, same remark as for get_atomese_representation, why not just return a Handle instead of taking the output Handle as ref argument?
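A sketch of `create_repr` with both suggestions applied: `rows` taken by const reference and the result returned by value. To stay self-contained it builds the textual form of the atoms; `Handle`, `HandleSeq`, `make_link` and the header encoding are placeholders standing in for the AtomSpace `createLink` calls and `parse_header` from the diff.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins: a "Handle" is just the textual form of an atom.
using Handle = std::string;
using HandleSeq = std::vector<Handle>;

// Placeholder for createLink: "(Type out1 out2 ...)".
static Handle make_link(const std::string& type, const HandleSeq& outgoing)
{
    std::string s = "(" + type;
    for (const Handle& h : outgoing)
        s += " " + h;
    return s + ")";
}

// Inputs by const reference (no copy of rows), result by value instead
// of through a Handle& out-parameter. The header encoding is made up.
Handle create_repr(const std::vector<std::string>& labels, const HandleSeq& rows)
{
    HandleSeq header;
    for (const std::string& l : labels)
        header.push_back("(SchemaNode \"" + l + "\")");
    return make_link("SimilarityLink",
                     {make_link("ListLink", header), make_link("SetLink", rows)});
}
```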

moses/comboreduct/atomese_representation/atomese_representation.cc

+              		repr = createLink(SIMILARITY_LINK,
+              		                  parse_header(labels),
+              		                  createLink(rows, SET_LINK));
+              }

Member

ngeiswei Jun 12, 2018

Oh, and repr is not returned.

moses/comboreduct/atomese_representation/atomese_representation.cc

+              	}
+              	// TODO: Add other i/o varieties here.
+              }

Member

ngeiswei Jun 12, 2018

repr is not returned either.
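Returning by value also makes the missing-return problem easy to rule out: flowing off the end of a non-void function is undefined behavior in C++, so every branch, including the TODO one, should return or throw. In the hedged sketch below, `OutputType` and `repr_kind` are hypothetical stand-ins for the combo type dispatch in the diff.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for combo's type_node values.
enum class OutputType { Boolean, Contin, Enum };

// Every control path returns or throws, so callers never get an unset
// result and -Wreturn-type (GCC/Clang) has nothing to report.
std::string repr_kind(OutputType otype)
{
    switch (otype) {
    case OutputType::Boolean:
        return "boolean table";
    case OutputType::Contin:
        return "real table";
    default:
        // TODO: add other i/o varieties here; until then, fail loudly.
        throw std::invalid_argument("unsupported output type");
    }
}
```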

Member

ngeiswei commented Jun 12, 2018

@Yidnekachew I understand that in the long run it is preferable to load data directly from a file into Atomese. However, for now it might be simpler to first load the data into a table or ctable (or whatever is at your disposal) and then convert it into Atomese. For instance, your loader doesn't support an optional header, and maybe other things I am overlooking.

The other option, perhaps the best one, would be to refactor the table loaders so you can reuse as much of them as possible (as you've already done with inferTableAttributes and tokenizeRow), thus keeping the code compact without paying the overhead of creating a temporary table.
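A rough sketch of the first option (parse into an intermediate table, then convert), assuming a plain comma-separated file. `SimpleTable`, `tokenize` and `load_table` are hypothetical; in MOSES the tokenization would come from the existing table-loading helpers such as tokenizeRow, and the header heuristic here (any non-numeric token in the first line) is only an assumption.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical intermediate table: parse once, convert to Atomese later.
struct SimpleTable {
    std::vector<std::string> labels;
    std::vector<std::vector<std::string>> rows;
};

// Split a line on a separator (stand-in for the real row tokenizer).
static std::vector<std::string> tokenize(const std::string& line, char sep = ',')
{
    std::vector<std::string> toks;
    std::stringstream ss(line);
    std::string tok;
    while (std::getline(ss, tok, sep))
        toks.push_back(tok);
    return toks;
}

// True if the whole token parses as a number.
static bool is_number(const std::string& s)
{
    std::istringstream ss(s);
    double d;
    ss >> d;
    return !ss.fail() && ss.eof();
}

// Optional header: the first line is a header if any token is non-numeric;
// otherwise placeholder labels col0, col1, ... are generated.
SimpleTable load_table(std::istream& in)
{
    SimpleTable t;
    std::string line;
    bool first = true;
    while (std::getline(in, line)) {
        if (line.empty())
            continue;
        std::vector<std::string> toks = tokenize(line);
        if (first) {
            first = false;
            bool header = false;
            for (const auto& tok : toks)
                if (!is_number(tok))
                    header = true;
            if (header) {
                t.labels = toks;
                continue;
            }
            for (std::size_t i = 0; i < toks.size(); ++i)
                t.labels.push_back("col" + std::to_string(i));
        }
        t.rows.push_back(toks);
    }
    return t;
}
```

The conversion step can then work on `SimpleTable` alone, so header handling and similar loader options live in one place.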

Member

ngeiswei commented Jun 12, 2018

@Yidnekachew I think it's better if you close this PR and create a new one, as there are enough changes to apply, and there's a conflict (I know GitHub may resolve it at the press of a button, but that still introduces some junk in the git history).

Member

ngeiswei commented Jun 12, 2018

Yes, so what I mean is that you would rebase your branch onto master, remove your commits via a soft reset (so as not to lose the changes), apply your modifications, and then create cohesive commits, possibly by selecting hunks of code rather than whole files (if necessary).

Collaborator Author

Yidnekachew commented Jun 12, 2018

@ngeiswei Thanks for the comments! I'm closing this PR now.

Yidnekachew closed this
