Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NFC] Log reader changes for MLGO environments. #242

Merged
merged 7 commits into from
May 16, 2023
Merged

Conversation

jacob-hegna
Copy link
Collaborator

No description provided.

@jacob-hegna jacob-hegna changed the title Log reader changes for MLGO environments. [NFC] Log reader changes for MLGO environments. May 16, 2023
@jacob-hegna jacob-hegna requested a review from mtrofin May 16, 2023 02:51
@@ -86,6 +86,11 @@
}


def convert_dtype_to_ctype(dtype: str) -> Any:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> Tuple(type, tf.DType) or something

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -229,7 +252,7 @@ def _add_feature(se: tf.train.SequenceExample, spec: tf.TensorSpec,


def read_log_as_sequence_examples(
fname: str) -> Dict[str, tf.train.SequenceExample]:
fname: str,) -> Dict[str, tf.train.SequenceExample]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the comma here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -74,7 +74,7 @@
'int32_t': (ctypes.c_int32, tf.int32),
'uint32_t': (ctypes.c_uint32, tf.uint32),
'int64_t': (ctypes.c_int64, tf.int64),
'uint64_t': (ctypes.c_uint64, tf.uint64)
'uint64_t': (ctypes.c_uint64, tf.uint64),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the comma necessary?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -95,7 +100,8 @@ def create_tensorspec(d: Dict[str, Any]) -> tf.TensorSpec:
return tf.TensorSpec(
name=name,
shape=tf.TensorShape(shape),
dtype=_element_type_name_to_dtype[element_type_str])
dtype=_element_type_name_to_dtype[element_type_str],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the comma necessary?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -108,6 +114,7 @@ class LogReaderTensorValue:

Endianness is assumed to be the same as the log producer's.
"""

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spurious change

@@ -120,6 +127,14 @@ def __init__(self, spec: tf.TensorSpec, buffer: bytes):
def spec(self):
return self._spec

@property
def buffer(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need this and the len property? should they have a small unit test, too?

also, maybe buffer -> raw_bytes to make it clear it's that, not another way to dereference the typed view

and then len -> do you need to use it as a length of the raw buffer? if so, maybe call it like that and return the _len multiplied by the scalar type size.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I convert the data given by the log_reader to a numpy array, and the easiest way to do that was to use https://numpy.org/doc/stable/reference/generated/numpy.frombuffer.html.

I use numpy in the environments instead of tensorflow because numpy is universally used in every ML framework (tf, pytorch, jax) and I want to start decoupling things from TF when they don't fundamentally require it.

I'm not sure if they need unit tests - if you want I can add them, but they're not doing anything surprising. I think if the other tests fail it should catch any issues here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the object itself acts as a buffer, it has an indexer and a len.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np has its own notion of a buffer interface which is pretty specific, and if we naively try to pass one of these objects to np.frombuffer we get the error:

 TypeError: a bytes-like object is required, not 'LogReaderTensorValue'

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ha! interesting. OK. so can we do with just a raw_bytes, looks like len() wouldn't be needed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, replaced raw_bytes and len with a to_numpy method directly, and added a unit test for it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a, neat. Thanks!

compiler_opt/rl/log_reader_test.py Outdated Show resolved Hide resolved
Copy link
Collaborator Author

@jacob-hegna jacob-hegna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for all the spurious changes is that I realized my vim was autoformatting the python files not using yapf, but instead using the internal google formatting tool... so I would save the file, thinking it was formatting, upload the commit, see that yapf failed in CI, then use yapf and reupload. So, all the spurious changes were because of the google internal formatter and yapf (also a google python formatter) fighting each other. went back through and tried to revert all the spurious things.

@@ -74,7 +74,7 @@
'int32_t': (ctypes.c_int32, tf.int32),
'uint32_t': (ctypes.c_uint32, tf.uint32),
'int64_t': (ctypes.c_int64, tf.int64),
'uint64_t': (ctypes.c_uint64, tf.uint64)
'uint64_t': (ctypes.c_uint64, tf.uint64),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -86,6 +86,11 @@
}


def convert_dtype_to_ctype(dtype: str) -> Any:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -95,7 +100,8 @@ def create_tensorspec(d: Dict[str, Any]) -> tf.TensorSpec:
return tf.TensorSpec(
name=name,
shape=tf.TensorShape(shape),
dtype=_element_type_name_to_dtype[element_type_str])
dtype=_element_type_name_to_dtype[element_type_str],
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -120,6 +127,14 @@ def __init__(self, spec: tf.TensorSpec, buffer: bytes):
def spec(self):
return self._spec

@property
def buffer(self):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I convert the data given by the log_reader to a numpy array, and the easiest way to do that was to use https://numpy.org/doc/stable/reference/generated/numpy.frombuffer.html.

I use numpy in the environments instead of tensorflow because numpy is universally used in every ML framework (tf, pytorch, jax) and I want to start decoupling things from TF when they don't fundamentally require it.

I'm not sure if they need unit tests - if you want I can add them, but they're not doing anything surprising. I think if the other tests fail it should catch any issues here.

@@ -229,7 +252,7 @@ def _add_feature(se: tf.train.SequenceExample, spec: tf.TensorSpec,


def read_log_as_sequence_examples(
fname: str) -> Dict[str, tf.train.SequenceExample]:
fname: str,) -> Dict[str, tf.train.SequenceExample]:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

compiler_opt/rl/log_reader_test.py Outdated Show resolved Hide resolved
@mtrofin
Copy link
Collaborator

mtrofin commented May 16, 2023

lgtm

@jacob-hegna jacob-hegna merged commit 9d00bcf into main May 16, 2023
@jacob-hegna jacob-hegna deleted the log_reader_change branch May 16, 2023 20:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants