Use pytest and make GitHub Workflow for tests #183
base: master
Conversation
Test workflows are passing now that we xfail and skip classes with failing tests.
See comments below. Can you also add some details to the PR body on why tests are being skipped? What errors are you seeing? If they're trivial to fix, let's just fix them. If there are more significant issues, we can xfail or skip for now, but it would be good to have some context in the PR body so that we can be sure these changes aren't introducing regressions.
Only minor request is to remove a comment; otherwise this looks good!
It looks like many of the tests have been annotated to be skipped or xfailed.
@embr So I just ran the test suite locally.
Not sure exactly what the problem is here, but it seems like a bad dependency pin, at least going by the errors I'm seeing.
@peytondmurray What version of Python did you use?
@embr There are many, many errors, so for space I am including just the first two. The vast majority of them are the same pickling error. This is in a fresh Python 3.9.20 virtual environment.
Interesting. I was able to run the tests using a fresh 3.10.12 virtual environment and get basically the same pickle errors on my machine. I get the following errors (again only first two shown) with 3.11.10:
It seems that there is some issue with pickling when these tests are run under pytest.
Wait, you get different errors depending on the Python version? Just for clarification:
With respect to the first issue, it looks like protocol buffers changed at some point, and an env variable was added which has an impact on how message sharing happens. @smokestacklightnin did you try setting this env variable?
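The thread doesn't name the variable, so the sketch below uses PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION, a real protobuf setting, purely as an example of how such a workaround could be wired into the test setup; it may not be the variable meant here.

```python
# conftest.py (sketch only). PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION is used
# as an illustrative example of a protobuf env var; it must be set before
# protobuf is first imported, which is why a conftest.py is a natural spot.
import os

os.environ.setdefault("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION", "python")
```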
I'm able to reproduce the pickling issues:
It looks like somewhere we're trying to pickle a generator.
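For reference, generators simply aren't picklable, which matches the traceback. A minimal standard-library reproduction of the error class:

```python
import pickle

gen = (x for x in range(3))
pickle.dumps(gen)  # raises: TypeError: cannot pickle 'generator' object
```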
So looking at this a little more closely, here's the test in question:

```python
class FlipCountTest(testutil.TensorflowModelAnalysisTest):
  ...

  def testFlipCount(self):
    computations = flip_count.FlipCount(
        thresholds=[0.3],
        counterfactual_prediction_key='counterfactual_pred_key',
        example_id_key='example_id_key',
    ).computations(example_weighted=True)
    binary_confusion_matrix = computations[0]
    matrices = computations[1]
    metrics = computations[2]

    # TODO(b/171180441): Handle absence of ground truth labels in
    # counterfactual examples while computing flip count metrics.
    examples = [
        {
            'labels': None,
            'predictions': np.array([0.5]),
            'example_weights': np.array([1.0]),
            'features': {
                'counterfactual_pred_key': np.array([0.7]),
                'example_id_key': np.array(['id_1']),
            },
        },
        {
            'labels': None,
            'predictions': np.array([0.1, 0.7]),  # to test flattening
            'example_weights': np.array([3.0]),
            'features': {
                'counterfactual_pred_key': np.array([1.0, 0.1]),
                'example_id_key': np.array(['id_2']),
            },
        },
        {
            'labels': None,
            'predictions': np.array([0.5, 0.2]),
            'example_weights': np.array([2.0]),
            'features': {
                'counterfactual_pred_key': np.array([0.2, 0.4]),
                'example_id_key': np.array(['id_3']),
            },
        },
        {
            'labels': None,
            'predictions': np.array([0.2, 0.1]),
            'example_weights': np.array([1.0]),
            'features': {
                'counterfactual_pred_key': np.array([0.4, 0.5]),
                'example_id_key': np.array(['id_4']),
            },
        },
    ]

    with beam.Pipeline() as pipeline:
      # pylint: disable=no-value-for-parameter
      result = (
          pipeline
          | 'Create' >> beam.Create(examples)
          | 'Process' >> beam.Map(metric_util.to_standard_metric_inputs, True)
          | 'AddSlice' >> beam.Map(lambda x: ((), x))
          | 'ComputeBinaryConfusionMatrix'
          >> beam.CombinePerKey(binary_confusion_matrix.combiner)
          | 'ComputeMatrices'
          >> beam.Map(
              lambda x: (x[0], matrices.result(x[1]))
          )  # pyformat: ignore
          | 'ComputeMetrics' >> beam.Map(lambda x: (x[0], metrics.result(x[1])))
      )
      # pylint: enable=no-value-for-parameter

      def check_result(got):
        try:
          self.assertLen(got, 1)
          got_slice_key, got_metrics = got[0]
          self.assertEqual(got_slice_key, ())
          self.assertLen(got_metrics, 6)
          self.assertDictElementsAlmostEqual(
              got_metrics,
              {
                  metric_types.MetricKey(
                      name='flip_count/[email protected]',
                      example_weighted=True,
                  ): 5.0,
                  metric_types.MetricKey(
                      name='flip_count/[email protected]',
                      example_weighted=True,
                  ): 7.0,
                  metric_types.MetricKey(
                      name='flip_count/[email protected]',
                      example_weighted=True,
                  ): 6.0,
                  metric_types.MetricKey(
                      name='flip_count/[email protected]',
                      example_weighted=True,
                  ): 7.0,
              },
          )
          self.assertAllEqual(
              got_metrics[
                  metric_types.MetricKey(
                      name='flip_count/[email protected]',
                      example_weighted=True,
                  )
              ],
              np.array([['id_2'], ['id_3']]),
          )
          self.assertAllEqual(
              got_metrics[
                  metric_types.MetricKey(
                      name='flip_count/[email protected]',
                      example_weighted=True,
                  )
              ],
              np.array([['id_2'], ['id_3'], ['id_4']]),
          )
        except AssertionError as err:
          raise util.BeamAssertException(err)

      util.assert_that(result, check_result, label='result')

  ...
```
It also looks like the assertions inside check_result aren't actually being exercised: as proof of this, you can comment out any of the lines inside check_result and the test still passes.
So after a little more investigation, it looks like others are seeing the same issue: pytest-dev/pytest#12448. That looks like what's going on here.
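For illustration, here's a minimal sketch of that failure mode, assuming the pytest-managed test instance ends up holding a generator somewhere. Standard pickle is used here, while Beam uses its own pickler, but the failure class is the same: anything that references the test instance drags its attributes into the pickle.

```python
import pickle


class FakeTestCase:
    """Stand-in for a unittest.TestCase instance run under pytest."""

    def __init__(self):
        # Placeholder for internals that can live on the test instance;
        # a generator is used because generators cannot be pickled.
        self._outcome = (x for x in range(3))

    def check_result(self, got):
        # Bound method: serializing it also serializes `self`.
        assert got


case = FakeTestCase()
try:
    # Beam has to serialize the callback handed to util.assert_that; a bound
    # method (or a closure over self) pulls in the whole test instance.
    pickle.dumps(case.check_result)
except TypeError as err:
    print(err)  # cannot pickle 'generator' object
```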
This is a **temporary** fix to make tests pass
The crux of the problem is that using unittest-style assertions inside the check_result callback means Beam has to pickle the test case itself, and under pytest that instance isn't picklable.
Although the Apache Beam testing guidelines don't explicitly say not to use unittest-style assertions in these callbacks, I propose xfailing these tests and writing an issue (or multiple issues) to rewrite them to follow the Apache Beam testing guidelines. @embr How does that sound?
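For concreteness, the Beam-style version would look roughly like the toy sketch below; the pipeline and expected values are made up for illustration, not taken from the real tests. The matcher is built from plain data via equal_to, so nothing Beam pickles references the TestCase instance.

```python
import apache_beam as beam
from apache_beam.testing.test_pipeline import TestPipeline
from apache_beam.testing.util import assert_that, equal_to


def test_double_sketch():
    with TestPipeline() as pipeline:
        result = (
            pipeline
            | "Create" >> beam.Create([1, 2, 3])
            | "Double" >> beam.Map(lambda x: x * 2)
        )
        # equal_to builds the matcher from plain data, so the callback Beam
        # serializes never touches the test class.
        assert_that(result, equal_to([2, 4, 6]))
```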
Thanks for digging into this. My preference would be to keep test coverage at this stage if we have any reasonable options. Specifically, I worry that removing a large part of our test coverage will hide unrelated issues that arise as part of moving to pytest. Of course, if we have no options other than a large migration, then breaking the PR into manageable chunks makes sense. Ideally we can find a way to keep test coverage while working through the large migration. For example, I looked into the Apache Beam testing guidelines you linked. Alternatively, can we find a way to keep using the existing assertions, but avoid pickling the test case?
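One possible shape for that, sketched under the assumption that the checks can live in a module-level function (the toy pipeline and names below are illustrative, not from the actual tests): keep assertion-style checks inside the callback, but make the callback free of any reference to self so Beam's pickler never sees the test instance.

```python
import apache_beam as beam
import numpy as np
from apache_beam.testing.test_pipeline import TestPipeline
from apache_beam.testing.util import BeamAssertException, assert_that


def _check_result(got):
    # Module-level function: Beam can pickle it without dragging in a
    # TestCase instance. Plain asserts/numpy checks replace self.assert*.
    try:
        assert len(got) == 1
        np.testing.assert_allclose(got[0], 6.0)
    except AssertionError as err:
        raise BeamAssertException(err)


def test_sum_sketch():
    with TestPipeline() as pipeline:
        result = (
            pipeline
            | beam.Create([1.0, 2.0, 3.0])
            | beam.CombineGlobally(sum)
        )
        assert_that(result, _check_result)
```

The trade-off is losing the richer failure messages from the unittest assertion helpers.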
Absolutely understandable. @smokestacklightnin Let's swap back to running the tests the way we were before, then.
The problems persist even when using that approach.
Sounds good.
@smokestacklightnin If you're lucky enough that the Bazel build issues don't affect your environment, please go ahead and finish the conversion to pytest.
Changes in this PR:
- Removed the `if __name__ == "__main__"` section from tests because it is not necessary, or even run, when using pytest (typical form shown below). Nontrivial functionality from those sections was preserved.
- Added a `.gitignore` file to the project.
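For context, the removed sections were typically the standard runner boilerplate of roughly the form below (the exact contents varied per file), which pytest ignores because it discovers and runs tests itself:

```python
import tensorflow as tf

# Typical runner boilerplate removed from the test modules; this block never
# executes when the tests are collected and run by pytest.
if __name__ == '__main__':
  tf.test.main()
```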