Documentation unclear about LabelModel strategy #1462
Comments
From the code, I guess they have implemented the approach in "Learning the Structure of Generative Models without Labeled Data".
@glf1030, thanks for the reply. Can you please point me to where in the code the ideas of using the inverse covariance matrix and robust PCA to infer the graph structure are implemented? I tried to dig into the code, but it was not clear. Also, I wish there were a way to visualize the learned graph structure for debugging purposes.
Hi, I am also struggling to understand the code. I read the paper "Training Complex Models with Multi-Task Weak Supervision", and I guess that Algorithm 1 from that paper is implemented in the current code. I am not quite sure... OK, I think I am getting a little closer: `_loss_mu` appears to be the step that updates z in Algorithm 1 above.
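For readers trying to connect the loss in the code to the paper, here is a minimal, illustrative sketch (not Snorkel's actual implementation; all names are made up) of the matrix-completion objective in Algorithm 1 of the AAAI'19 paper: find a vector z such that Sigma_O^{-1} + z z^T vanishes on the set Omega of entries corresponding to conditionally independent LF pairs.

```python
import numpy as np

# Illustrative sketch (NOT Snorkel's actual code) of the matrix-completion
# objective: minimize || P_Omega(Sigma_O^{-1} + z z^T) ||_F^2 over z, where
# Omega masks entries that must be zero under conditional independence.
def completion_loss(z, sigma_o_inv, omega_mask):
    """Frobenius loss of the masked residual Sigma_O^{-1} + z z^T."""
    residual = (sigma_o_inv + np.outer(z, z)) * omega_mask
    return float(np.sum(residual ** 2))

# Toy check: if Sigma_O^{-1} = D - z0 z0^T for some diagonal D, then z0
# zeroes the loss on the off-diagonal entries.
z0 = np.array([1.0, 2.0, 3.0])
sigma_o_inv = np.diag([5.0, 6.0, 7.0]) - np.outer(z0, z0)
omega = 1.0 - np.eye(3)                   # off-diagonal entries only
loss_at_z0 = completion_loss(z0, sigma_o_inv, omega)
```

Under this reading, a gradient-based update of z against this loss would correspond to the role `_loss_mu` plays in the training loop, though that mapping is my guess rather than a statement about the codebase.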
Hi @cdeepakroy @glf1030, thanks for the deep dive here! The currently implemented model is based on the algorithm in the paper @glf1030 is pointing to, Training Complex Models with Multi-Task Weak Supervision, published in AAAI'19. We will add a line or two similar to the above clarifying this in the docstring.

In this approach, given a set of dependencies between the labeling functions (LFs), we compute the statistics of how different cliques of labeling functions agree and disagree with each other, and then use a matrix completion-style approach to recover the accuracy parameters of the LFs.

Regarding the model being learned: currently we learn a model in which we assume the LFs are conditionally independent given the unobserved true label Y, a common assumption in weak supervision / crowd modeling approaches. And, going beyond the data programming paper we published in NeurIPS'16 (referenced above), we actually estimate different LF accuracies for each label class, i.e. we estimate a separate accuracy parameter for each (LF, class) pair.
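To make the "agreement/disagreement statistics" concrete, here is a small sketch, under assumed conventions (abstain encoded as -1, as in Snorkel v0.9; array names are illustrative), of the one-hot "augmented" label matrix and the second-moment matrix whose blocks record how pairs of LFs agree and disagree:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 1000, 3, 2                      # data points, LFs, classes
L = rng.integers(-1, k, size=(n, m))      # -1 = abstain, else a class label

# "Augmented" one-hot matrix: one indicator column per (LF, class) pair.
L_aug = np.zeros((n, m * k))
for j in range(m):
    for c in range(k):
        L_aug[:, j * k + c] = (L[:, j] == c)

# O[a, b] = fraction of points where indicators a and b both fire; the
# off-diagonal blocks are the pairwise agreement/disagreement rates that a
# matrix completion-style approach consumes as its observable statistics.
O = (L_aug.T @ L_aug) / n
```

The diagonal of each LF's block sums to that LF's coverage (fraction of non-abstains), which is a quick way to sanity-check the construction.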
Thanks for your reply. As for "we will also add support for (a) modeling LF dependencies and (b) estimating the LF dependency structure, as we have supported in previous versions of the code and published on (e.g. see our ICML'17 and ICML'19 papers)": I wonder which version of the code has implemented such a strategy? @ajratner The corresponding code, I think, is: I wonder:
@ajratner Thank you for the clarification. I will be looking forward to the future release that provides implementations for (a) modeling LF dependencies and (b) estimating the LF dependency structure.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
FWIW, I'm looking forward to this. I'm not very hands-on with PyTorch, but it seems to me that it's a matter of updating the loss function. I can try raising a PR if it's (2).
Hi @chaturv3di, first of all, thanks so much for the offer of help! Unfortunately, the integration with the current v0.9 label model in a robust form (i.e. non-research code) is something that we think has some non-trivial aspects and design decisions involved, and so it's something the core team plans to handle. I don't have a timeline for this as yet, but will keep you updated!
Issue description

It is not clear from the documentation what strategy is used in `snorkel.labeling.LabelModel()`. The description in the docstring says it is "A conditionally independent LabelModel to learn LF weights and assign training labels". I am guessing that in this strategy the labeling functions are assumed to be independent when conditioned on the true label, as described in the section named "Independent Labeling Functions" on page 4 of the paper Data Programming: Creating Large Training Sets, Quickly.

However, the blog post Introducing the New Snorkel says that in snorkel v0.9 a robust PCA / low-rank + sparse approach is used to automatically learn the dependency / correlation structure between the labeling functions, as described in the paper Learning Dependency Structures for Weak Supervision Models. This approach seems much more promising to me than the aforementioned conditionally independent version, and I want to use it for my use case.

Can anyone confirm which of the above two strategies is implemented by `snorkel.labeling.LabelModel()`?
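To make the "conditionally independent" assumption concrete, here is a small simulation of the generative model described in the Data Programming paper (the accuracies and sizes are made-up illustration values, and abstains are omitted for simplicity): each LF votes the true label with its own fixed accuracy, independently of the other LFs once the true label is fixed.

```python
import numpy as np

rng = np.random.default_rng(42)
n, m = 5000, 3
acc = np.array([0.9, 0.75, 0.6])          # assumed per-LF accuracies
y = rng.integers(0, 2, size=n)            # latent true label in {0, 1}

# Conditionally independent LFs: each votes the true label with probability
# acc[j], else flips it, independently of the other LFs given y.
correct = rng.random((n, m)) < acc
L = np.where(correct, y[:, None], 1 - y[:, None])

# Per-LF accuracies are recoverable from agreement with y, and within a
# fixed class the LF votes are (approximately) uncorrelated -- which is
# exactly what the conditional-independence model asserts.
emp_acc = (L == y[:, None]).mean(axis=0)
cond_corr = np.corrcoef(L[y == 0][:, 0], L[y == 0][:, 1])[0, 1]
```

Under the robust PCA / structure-learning alternative, the point is precisely that `cond_corr`-style quantities need not be near zero for dependent LF pairs, and the sparse part of the low-rank + sparse decomposition is meant to pick those pairs out.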