Documentation unclear about LabelModel strategy #1462
Comments
From the code, I guess they have implemented the approach in "Learning the Structure of Generative Models without Labeled Data".
@glf1030, thanks for the reply. Can you please point me to where in the code the ideas of using the inverse covariance matrix and robust PCA to infer the graph structure are implemented? I tried to dig into the code, but it was not clear. Also, I wish there were a way to visualize the learned graph structure for debugging purposes.
Hi, I am also struggling to understand the code. I read the paper "Training Complex Models with Multi-Task Weak Supervision", and I guess that Algorithm 1 from that paper is implemented in the current code. I am not quite sure... OK, I think I am getting a little closer: `_loss_mu` appears to be the step that updates z in Algorithm 1 above.
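For readers trying to connect the loss in the code to the paper, here is a minimal, illustrative sketch (not Snorkel's actual implementation; all names are made up) of the matrix-completion objective in Algorithm 1 of the AAAI'19 paper: find a vector z such that Sigma_O^{-1} + z z^T vanishes on the set Omega of entries corresponding to conditionally independent LF pairs.

```python
import numpy as np

# Illustrative sketch (NOT Snorkel's actual code) of the matrix-completion
# objective: minimize || P_Omega(Sigma_O^{-1} + z z^T) ||_F^2 over z, where
# Omega masks entries that must be zero under conditional independence.
def completion_loss(z, sigma_o_inv, omega_mask):
    """Frobenius loss of the masked residual Sigma_O^{-1} + z z^T."""
    residual = (sigma_o_inv + np.outer(z, z)) * omega_mask
    return float(np.sum(residual ** 2))

# Toy check: if Sigma_O^{-1} = D - z0 z0^T for some diagonal D, then z0
# zeroes the loss on the off-diagonal entries.
z0 = np.array([1.0, 2.0, 3.0])
sigma_o_inv = np.diag([5.0, 6.0, 7.0]) - np.outer(z0, z0)
omega = 1.0 - np.eye(3)                   # off-diagonal entries only
loss_at_z0 = completion_loss(z0, sigma_o_inv, omega)
```

Under this reading, a gradient-based update of z against this loss would correspond to the role `_loss_mu` plays in the training loop, though that mapping is my guess rather than a statement about the codebase.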
Hi @cdeepakroy @glf1030, thanks for the deep dive here! The currently implemented model is based on the algorithm in the paper @glf1030 is pointing to, Training Complex Models with Multi-Task Weak Supervision, published in AAAI'19. We will add a line or two similar to the above clarifying this in the docstring.

In this approach, given a set of dependencies between the labeling functions (LFs), we compute the statistics of how different cliques of labeling functions agree and disagree with each other, and then use a matrix completion-style approach to recover the accuracy parameters of the LFs.

Regarding the model being learned: currently we learn a model in which we assume the LFs are conditionally independent given the unobserved true label Y, a common assumption in weak supervision / crowd modeling approaches. And, going beyond the data programming paper we published in NeurIPS'16 (referenced above), we actually estimate different LF accuracies for each label class, i.e. we estimate a separate accuracy parameter for each (LF, class) pair.
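To make the "agreement/disagreement statistics" concrete, here is a small sketch, under assumed conventions (abstain encoded as -1, as in Snorkel v0.9; array names are illustrative), of the one-hot "augmented" label matrix and the second-moment matrix whose blocks record how pairs of LFs agree and disagree:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 1000, 3, 2                      # data points, LFs, classes
L = rng.integers(-1, k, size=(n, m))      # -1 = abstain, else a class label

# "Augmented" one-hot matrix: one indicator column per (LF, class) pair.
L_aug = np.zeros((n, m * k))
for j in range(m):
    for c in range(k):
        L_aug[:, j * k + c] = (L[:, j] == c)

# O[a, b] = fraction of points where indicators a and b both fire; the
# off-diagonal blocks are the pairwise agreement/disagreement rates that a
# matrix completion-style approach consumes as its observable statistics.
O = (L_aug.T @ L_aug) / n
```

The diagonal of each LF's block sums to that LF's coverage (fraction of non-abstains), which is a quick way to sanity-check the construction.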
Thanks for your reply. As for "we will also add support for (a) modeling LF dependencies and (b) estimating the LF dependency structure, as we have supported in previous versions of the code and published on (e.g. see our ICML'17 and ICML'19 papers)": I wonder which version of the code has implemented such a strategy? @ajratner The corresponding code, I think, is: I wonder:
@ajratner Thank you for the clarification. I will be looking forward to the future release that provides implementations for (a) modeling LF dependencies and (b) estimating the LF dependency structure.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
FWIW, I'm looking forward to this. I'm not very hands-on with PyTorch, but it seems to me that it's a matter of updating the loss function. I can try raising a PR if it's (2).
Hi @chaturv3di, first of all, thanks so much for the offer of help! Unfortunately, the integration with the current v0.9 label model in a robust form (i.e. non-research code) is something that we think has some non-trivial aspects and design decisions involved, and so it's something the core team plans to handle. I don't have a timeline for this as yet, but will keep you updated!
Issue description

It is not clear from the documentation what strategy is used in `snorkel.labeling.LabelModel()`. The description in the docstring says it is "A conditionally independent LabelModel to learn LF weights and assign training labels". I am guessing that in this strategy the labeling functions are assumed to be independent when conditioned on the true label, as described in the section named "Independent Labeling Functions" on page 4 of the paper Data Programming: Creating Large Training Sets, Quickly.

However, the blog post Introducing the New Snorkel says that in snorkel v0.9 a robust PCA / low-rank + sparse approach is used to automatically learn the dependency / correlation structure between the labeling functions, as described in the paper Learning Dependency Structures for Weak Supervision Models. This approach seems much more promising to me than the aforementioned conditionally independent version, and I want to use it for my use case.

Can anyone confirm which of the above two strategies is implemented by `snorkel.labeling.LabelModel()`?
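To make the "conditionally independent" assumption concrete, here is a small simulation of the generative model described in the Data Programming paper (the accuracies and sizes are made-up illustration values, and abstains are omitted for simplicity): each LF votes the true label with its own fixed accuracy, independently of the other LFs once the true label is fixed.

```python
import numpy as np

rng = np.random.default_rng(42)
n, m = 5000, 3
acc = np.array([0.9, 0.75, 0.6])          # assumed per-LF accuracies
y = rng.integers(0, 2, size=n)            # latent true label in {0, 1}

# Conditionally independent LFs: each votes the true label with probability
# acc[j], else flips it, independently of the other LFs given y.
correct = rng.random((n, m)) < acc
L = np.where(correct, y[:, None], 1 - y[:, None])

# Per-LF accuracies are recoverable from agreement with y, and within a
# fixed class the LF votes are (approximately) uncorrelated -- which is
# exactly what the conditional-independence model asserts.
emp_acc = (L == y[:, None]).mean(axis=0)
cond_corr = np.corrcoef(L[y == 0][:, 0], L[y == 0][:, 1])[0, 1]
```

Under the robust PCA / structure-learning alternative, the point is precisely that `cond_corr`-style quantities need not be near zero for dependent LF pairs, and the sparse part of the low-rank + sparse decomposition is meant to pick those pairs out.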