Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Represent problem data in atomspace #8

Closed
wants to merge 5 commits into from
Closed

Represent problem data in atomspace #8

wants to merge 5 commits into from

Conversation

Yidnekachew
Copy link
Collaborator

@Yidnekachew Yidnekachew commented Jun 8, 2018

Implemented as per #3

@@ -458,11 +473,13 @@ FIND_PACKAGE(Doxygen)
ADD_SUBDIRECTORY(doc EXCLUDE_FROM_ALL)

# Show a summary of what we got
SUMMARY_ADD("AtomSpace" "A weighted and typed hypergraph database" HAVE_ATOMSPACE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need to add that since AtomSpace is a requirement anyway.

if(_tpp.use_atomese)
read_atomese_table(pms);
else
read_combo_table(pms);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a mixture of spaces and tabs here. You want to respect the same indentation format within the same file. So you may either use whitespaces or reformat the entire table-problems.cc to use tabs, it's up to you (as long as such reformatting takes place in a different commit, within that PR, it's OK).

class atomese_reprUTest : public CxxTest::TestSuite
{
private:
AtomSpace *as;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All your new code is tab indented, why not have this one tab indented too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, it's from @masrb but still.

{
eval->eval(
"(load-from-path \"tests/comboreduct/atomese_representation/real_data_result.scm\")");
Handle expected = eval->eval_h("(cog-execute! real_data_repr)");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need to execute real_data_repr as it's supposed to have already been executed at loading time.

{
eval->eval(
"(load-from-path \"tests/comboreduct/atomese_representation/boolean_data_result.scm\")");
Handle expected = eval->eval_h("(cog-execute! boolean_data_repr)");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same remark about executing.

* @param repr
* @return
*/
Handle& get_atomese_representation(std::istream& in, Handle& repr);
Copy link
Member

@ngeiswei ngeiswei Jun 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any benefit in having the Handle both returned and taken as argument. Usually streaming functions do return the stream as well as taking it in argument in order to chain streaming, but seeing its usage I don't even think you would need to chain streaming. So basically you I suggest either of the following signatures

  1. Handle get_atomese_representation(std::istream& in)
  2. std::istream& get_atomese_representation(std::istream& in, Handle& repr)
    The first one because it is simpler, and the second one because it allows, will you ever need it, to chain streaming. I probably go with 1 (cause I don't see chaining streaming in the future since csv info usually live in a standalone file) but it's up to you to evaluate which one is best.

type_tree tt;
bool has_header, is_sparse;
const string& target_feature = "";
const string& timestamp_feature = "";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, it's actually valid C++, what happens is that "" is converted to a string before initializing the string reference, as explained here http://en.cppreference.com/w/cpp/language/reference_initialization. However I don't see what benefits it brings, you might just as well remove &, wouldn't you?


// TODO: find a better way of finding the domain (boolean, real).
// For instance, the inputs could be contins where as the outputs boolean.
type_node otype = get_type_node(get_signature_output(tt));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you note it is a temporary shortcoming, but it's OK for now.

using namespace std;
using namespace combo;

Handle& create_repr(const vector<string>& labels, HandleSeq rows,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be more efficient to pass rows as a const ref, like labels.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, same remark as for get_atomese_representation, why not just return a Handle instead of taking the output Handle as ref argument?

repr = createLink(SIMILARITY_LINK,
parse_header(labels),
createLink(rows, SET_LINK));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and repr is not returned.

}

// TODO: Add other i/o varieties here.
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

repr is not returned either.

@ngeiswei
Copy link
Member

@Yidnekachew I understand that in the long run it is preferable to load data directly from a file to Atomese. However for now it might be simpler to load the data first into a table or ctable (or whatever is at your disposal) and then convert it into Atomese. For instance your loader doesn't support optional header, and maybe other things that I am overseeing.

The other option, perhaps the best one, would be to refactor the table loaders so you can reuse as much as possible from that (as you've done for instance with inferTableAttributes and tokenizeRow) thus keeping a compact code while not having to pay the overhead of creating an temporary table.

@ngeiswei
Copy link
Member

@Yidnekachew I think it's better if you close this PR and create a new one, as there is enough changes to apply, and there's a conflict (I know github may resolve that at the press of button but it's still introduces some junk in the git history).

@ngeiswei
Copy link
Member

Yes, so what I mean, is that you would rebase your branch onto the master, then remove your commits via a soft reset (to not loose the changes), then apply your modifications, and then create cohesive commits possible by selecting hunks of code rather whole files (if necessary).

@Yidnekachew
Copy link
Collaborator Author

@ngeiswei Thanks for the comments! I'm closing this PR now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants