
For OCGNN models, update hypersphere radius, and filter training data with active_mask #95

Merged
merged 10 commits (Jan 3, 2024)

Conversation

ahmed3amerai
Contributor

This pull request follows up on the discussion about updating the hypersphere radius. The modifications can be summarized as follows.

  • Set the default warmup value to math.inf, so that the hypersphere radius and centre are updated in every epoch.
  • Added an is_train flag parameter so that the radius is updated during training only.
  • Updated the centre c of the hypersphere before the first epoch, instead of initialising it with zeros. My experiments show that if the initial features are not centred around 0, initialising with zeros hurts performance, since the centre of the hypersphere starts far from the majority of data points.
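The centre and radius logic described above can be sketched in plain Python. This is a simplified illustration, not PyGOD's actual implementation: `init_center` and `update_radius` are hypothetical helper names, and the quantile-style radius update follows the Deep SVDD soft-boundary formulation that OCGNN builds on.

```python
import math

def init_center(embeddings, eps=0.001):
    """Initialise the centre c as the mean of the initial embeddings,
    rather than a zero vector (sketch of the PR's change)."""
    dim = len(embeddings[0])
    c = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    # nudge coordinates too close to zero up to eps so they stay trainable
    # (simplified version of the usual Deep SVDD-style safeguard)
    return [eps if abs(x) < eps else x for x in c]

def update_radius(distances, nu=0.1):
    """Set the radius to the (1 - nu)-quantile of the distances to the
    centre; with warmup = math.inf this runs in every training epoch."""
    srt = sorted(distances)
    idx = min(len(srt) - 1, math.floor((1 - nu) * len(srt)))
    return srt[idx]
```

With a centre initialised from the data, the hypersphere starts near the bulk of the embeddings, so the radius estimate is meaningful from the first epoch onward.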

Moreover, it follows up on the discussion about allowing the use of training/validation/testing masks.

  • Modified the script to filter the data with an active_mask variable, so that only the active nodes are considered in decision_score_, decision_function, and loss_func.
  • To use it, set data.active_mask = data.train_mask before training and data.active_mask = data.test_mask before testing.
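The filtering described above amounts to something like the following. This is a minimal list-based sketch with a hypothetical `masked_scores` helper; the real code operates on torch tensors, where the same effect is boolean-mask indexing.

```python
def masked_scores(scores, active_mask):
    """Keep only the entries whose mask flag is True, mirroring how
    active_mask restricts decision_score_, decision_function, and
    loss_func to the active nodes (list-based sketch)."""
    return [s for s, keep in zip(scores, active_mask) if keep]
```

With tensors this reduces to `scores[data.active_mask]`, after assigning `data.active_mask = data.train_mask` before training and `data.active_mask = data.test_mask` before testing, as described above.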

Regards,
Ahmed Aly

@ahmed3amerai
Contributor Author

  • I fixed all models to run smoothly; however, I am unsure whether models other than OCGNN require updating the hypersphere radius, since I was focusing mainly on OCGNN. Theoretically, any model that depends on hypersphere learning should require similar handling.

  • To enable active_mask to filter train/validation/test data, I enabled it in OCGNN. Similarly, I enabled it in models built on DeepDetector, namely AdONE, AnomalyDAE, CONAD, DOMINANT, GUIDE, GAE, and DONE, by filtering all vectors passed to loss_func with active_mask. (Please confirm that logic.)

  • The other models need further study of their algorithms before active_mask can be enabled.

@kayzliu
Member

kayzliu commented Dec 19, 2023

Hi Ahmed, thanks a lot for your contributions. However, I have some concerns.

  • active_mask: as discussed in #50 (How do you use different masks in PyGOD datasets?), we are focusing on unsupervised settings so far. Thus, there is no need to mask any node in the training data, and we do not encourage users to do so.
  • is_train: I am a little confused by "training only." For the transductive setting, the score is ready after training. For the inductive setting, warmup is not used when predict is called.
  • c: we are happy to take this change if it helps.

@ahmed3amerai
Contributor Author

ahmed3amerai commented Dec 19, 2023

It's my pleasure to contribute. Regarding your concerns:

  • For active_mask: I understand that some models do not require masking training nodes in unsupervised settings. However, since OCGNN is a one-class classifier, it requires training on only one class of data, as described in the original paper, and testing on both classes. To achieve that in a transductive setting, I used masks.
  • I used is_train to avoid updating the hypersphere radius or centre when predict is called. If we initialise warmup=math.inf, the condition self.warmup > 0 remains true even when predict is called. I am happy to drop it if you see a better way to ensure that without the is_train flag.

@kayzliu
Member

kayzliu commented Dec 21, 2023

Thanks for your reply. That makes sense. I am still considering whether we should add this mask for the only one-class model in our library. This one-class setting is actually very strict: the training data cannot contain any outliers, which may be too idealistic for real-world applications.

For the is_train flag, a more elegant approach is to set warmup to the number of training steps, which can be done by passing a parameter. So there is no need to change this part.
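The counter-based alternative can be sketched as follows. `RadiusUpdater` is a hypothetical class, and the quantile-style update is again the Deep SVDD-style formulation assumed above: once the warmup counter is exhausted, later calls (e.g. from predict) leave the radius untouched, so no is_train flag is needed.

```python
class RadiusUpdater:
    """Decrement a warmup counter instead of passing an is_train flag:
    the radius is updated only for the first `warmup` calls (the
    training epochs), then frozen for any later calls such as predict."""

    def __init__(self, warmup):
        self.warmup = warmup  # set to the number of training epochs
        self.r = 0.0

    def step(self, distances, nu=0.1):
        if self.warmup > 0:
            # (1 - nu)-quantile of the distances to the centre
            srt = sorted(distances)
            idx = min(len(srt) - 1, int((1 - nu) * len(srt)))
            self.r = srt[idx]
            self.warmup -= 1
        return self.r
```

Setting warmup to the epoch count makes `self.warmup > 0` false by the time predict runs, which achieves the same guard as is_train without an extra parameter in the loss function's signature.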

@ahmed3amerai
Contributor Author

Just to confirm: are there any tasks you need from my side for this pull request?

@@ -108,17 +108,20 @@ def loss_func(self, emb):
score : torch.Tensor
Outlier scores of shape :math:`N` with gradients.
"""
if is_train:
if self.warmup > 0:
Member

We can merge this part without the is_train flag.

@kayzliu kayzliu left a comment

Can we keep only the part I marked and revert the rest?

@ahmed3amerai
Contributor Author

I kept the part you marked and reverted the rest of the changes.

@kayzliu
Member

kayzliu commented Jan 3, 2024

Thanks for reverting. But I still see that several other files were changed, including the cache files.

@ahmed3amerai
Contributor Author

Yes, those were cache files and trailing whitespace. It should be clean now.

@coveralls

coveralls commented Jan 3, 2024

Pull Request Test Coverage Report for Build 7393446504

  • 6 of 6 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.001%) to 99.573%

Totals (Coverage Status)
  • Change from base Build 7283849275: +0.001%
  • Covered lines: 3029
  • Relevant lines: 3042

💛 - Coveralls

@kayzliu kayzliu self-requested a review January 3, 2024 05:15
@kayzliu kayzliu left a comment

LGTM

@kayzliu kayzliu merged commit 3342243 into pygod-team:main Jan 3, 2024
14 checks passed
@kayzliu
Member

kayzliu commented Jan 3, 2024

Thanks for your effort. PR Merged.
