Right now I am not sure if there is a reason for subjects to be a set from DocumentDirectory but a list from DocumentFile, so I'm just reporting this as a possible bug.
You are right, this is an inconsistency. It could be resolved either way. And it should be resolved, since it already caused a bug in a release!
It's quite common to assume that the order of subjects doesn't matter, and this is the case for all the current algorithms (SVC being a special case, since it can handle only one subject per document). Also, the vector representation for subjects is order-agnostic. So I think it would make sense to change DocumentFile to use a set instead of a list.
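To illustrate why the mismatch matters, here is a minimal sketch in plain Python. It is not the project's actual code: `normalize_subjects`, `pick_single_subject`, and the example URIs are hypothetical names made up for this illustration. It only shows that a list and a set with the same members behave differently, and how converting to a set at the boundary would make both corpus types look the same to a backend.

```python
def normalize_subjects(subjects):
    """Hypothetical helper: return subjects as a frozenset,
    whatever iterable (list, tuple, set) was passed in."""
    return frozenset(subjects)


def pick_single_subject(subjects):
    """Hypothetical helper for a backend that accepts one subject per
    document: choose deterministically instead of indexing, because
    sets do not support subjects[0]."""
    return min(subjects)


# Made-up subject URIs, standing in for what the two corpus types yield.
from_file = ["http://example.org/s2", "http://example.org/s1"]       # list
from_directory = {"http://example.org/s1", "http://example.org/s2"}  # set

# A list never compares equal to a set, even with identical members.
lists_and_sets_differ = from_file != from_directory

# Indexing works on the list but raises TypeError on the set.
try:
    from_directory[0]
    set_is_indexable = True
except TypeError:
    set_is_indexable = False

# After normalization both sources are interchangeable.
same_after_normalizing = (
    normalize_subjects(from_file) == normalize_subjects(from_directory)
)
single = pick_single_subject(normalize_subjects(from_file))
```

Using `min` here is just one arbitrary but deterministic way to pick a single subject; any rule that does not depend on element order would do.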
PR #501 was a quick fix for training SVC backend on fulltext corpus, but it did not address the underlying reason:
Right now I am not sure if there is a reason for subjects to be a set from DocumentDirectory but a list from DocumentFile, so I'm just reporting this as a possible bug.