Adding new metric spec causing OOM #115

Open
axelning opened this issue Mar 2, 2021 · 0 comments

axelning commented Mar 2, 2021

System information

  • Have I specified the code to reproduce the issue (Yes/No): yes
  • Environment in which the code is executed (e.g., Local (Linux/MacOS/Windows), Interactive Notebook, Google Cloud, etc.): Ubuntu 18.04
  • TFX version: 0.26.1
  • Python version: 3.6.7
  • TensorFlow version: 2.3.2

Describe the current behavior
While following the tutorial, I added an extra tfma.EvalConfig to the Evaluator, as shown here:

import tensorflow_model_analysis as tfma
from tensorflow_model_analysis import constants

eval_config = tfma.EvalConfig(
		model_specs=[
			# This assumes a serving model with signature 'serving_default'. If
			# using estimator based EvalSavedModel, add signature_name: 'eval' and
			# remove the label_key.
			tfma.ModelSpec(label_key='Label',
			               model_type=constants.TF_GENERIC
			               )
		],
		metrics_specs=[
			tfma.MetricsSpec(
				# The metrics added here are in addition to those saved with the
				# model (assuming either a keras model or EvalSavedModel is used).
				# Any metrics added into the saved model (for example using
				# model.compile(..., metrics=[...]), etc) will be computed
				# automatically.
				metrics=[
					tfma.MetricConfig(class_name='ExampleCount'),
					tfma.MetricConfig(
						class_name='BinaryAccuracy',
						threshold=tfma.MetricThreshold(
							value_threshold=tfma.GenericValueThreshold(
								lower_bound={'value': 0.5}),
							change_threshold=tfma.GenericChangeThreshold(
								direction=tfma.MetricDirection.HIGHER_IS_BETTER,
								absolute={'value': -1e-10})))
				]
			)
		],
		slicing_specs=[
			# An empty slice spec means the overall slice, i.e. the whole dataset.
			tfma.SlicingSpec(),
			# Data can be sliced along a feature column. In this case, data is
			# sliced along feature column trip_start_hour.
			# tfma.SlicingSpec(feature_keys=['trip_start_hour'])
		])
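
For context, in the tutorial this config is wired into the TFX Evaluator component roughly like this (a minimal sketch; example_gen and trainer are the standard tutorial pipeline handles and are assumptions here, not part of this report):

from tfx.components import Evaluator

# Sketch only: example_gen and trainer are the upstream tutorial components.
evaluator = Evaluator(
    examples=example_gen.outputs['examples'],
    model=trainer.outputs['model'],
    eval_config=eval_config)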

This calls the function _keys_and_metrics_from_specs(metrics_specs) in tfma/metrics/metrics_spec.py:477.

That function in turn calls from_config() in tensorflow/python/keras/engine/base_layer.py:697.

It looks like this call constructs a new Keras layer, and that operation grabs all of the GPU memory, causing OOM for everything that runs afterwards.
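
If the OOM really comes from TensorFlow initializing the GPU runtime while those metric objects are rebuilt via from_config(), one common mitigation is to let TensorFlow grow its GPU allocation on demand instead of reserving the whole device up front. A minimal sketch (a generic TensorFlow 2.x setting, not a confirmed fix for this issue):

import tensorflow as tf

# Ask TensorFlow to allocate GPU memory as needed rather than grabbing the
# entire device when the runtime initializes. This must run before any GPU
# op, i.e. before the Evaluator / metric objects are built.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)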

Describe the expected behavior
Well, another OOM issue; it looks like TF needs emergency surgery on its memory management. Building the metric specs should not exhaust GPU memory.

Standalone code to reproduce the issue

Name of your Organization (Optional)

Other info / logs
Just a normal OOM error.
