[NFC] Log reader changes for MLGO environments. #242
compiler_opt/rl/log_reader.py
Outdated
@@ -86,6 +86,11 @@
 }


+def convert_dtype_to_ctype(dtype: str) -> Any:
-> Tuple(type, tf.DType) or something
fixed
compiler_opt/rl/log_reader.py
Outdated
@@ -229,7 +252,7 @@ def _add_feature(se: tf.train.SequenceExample, spec: tf.TensorSpec,


 def read_log_as_sequence_examples(
-    fname: str) -> Dict[str, tf.train.SequenceExample]:
+    fname: str,) -> Dict[str, tf.train.SequenceExample]:
why the comma here?
fixed
compiler_opt/rl/log_reader.py
Outdated
@@ -74,7 +74,7 @@
     'int32_t': (ctypes.c_int32, tf.int32),
     'uint32_t': (ctypes.c_uint32, tf.uint32),
     'int64_t': (ctypes.c_int64, tf.int64),
-    'uint64_t': (ctypes.c_uint64, tf.uint64)
+    'uint64_t': (ctypes.c_uint64, tf.uint64),
is the comma necessary?
fixed
compiler_opt/rl/log_reader.py
Outdated
@@ -95,7 +100,8 @@ def create_tensorspec(d: Dict[str, Any]) -> tf.TensorSpec:
   return tf.TensorSpec(
       name=name,
       shape=tf.TensorShape(shape),
-      dtype=_element_type_name_to_dtype[element_type_str])
+      dtype=_element_type_name_to_dtype[element_type_str],
is the comma necessary?
fixed
compiler_opt/rl/log_reader.py
Outdated
@@ -108,6 +114,7 @@ class LogReaderTensorValue:

   Endianness is assumed to be the same as the log producer's.
   """
+
spurious change
compiler_opt/rl/log_reader.py
Outdated
@@ -120,6 +127,14 @@ def __init__(self, spec: tf.TensorSpec, buffer: bytes):
   def spec(self):
     return self._spec

+  @property
+  def buffer(self):
why do you need this and the len property? should they have a small unit test, too?
also, maybe buffer -> raw_bytes to make it clear it's that, not another way to dereference the typed view
and then len -> do you need to use it as a length of the raw buffer? if so, maybe call it like that and return the _len multiplied by the scalar type size.
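For illustration, the naming suggestion above could look roughly like the sketch below. `TypedView`, `raw_bytes`, and `raw_byte_length` are hypothetical names for this example only and are not taken from the PR; the only point is that the element count and the byte count are kept distinct.

```python
import ctypes


class TypedView:
  """Hypothetical stand-in for the class under review (names are assumptions)."""

  def __init__(self, ctype, buffer: bytes):
    self._ctype = ctype
    self._buffer = buffer
    # Number of scalar elements in the buffer, not the number of bytes.
    self._len = len(buffer) // ctypes.sizeof(ctype)

  def __len__(self) -> int:
    # Element count, matching what the typed indexer would iterate over.
    return self._len

  @property
  def raw_bytes(self) -> bytes:
    # The untyped bytes, as opposed to the typed view.
    return self._buffer

  @property
  def raw_byte_length(self) -> int:
    # Element count multiplied by the scalar type size.
    return self._len * ctypes.sizeof(self._ctype)


# 24 zero bytes interpreted as int64 values give 3 elements.
view = TypedView(ctypes.c_int64, bytes(24))
```

Here `len(view)` is the element count and `view.raw_byte_length` is the size of the underlying buffer in bytes.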
I convert the data given by the log_reader to a numpy array, and the easiest way to do that was to use https://numpy.org/doc/stable/reference/generated/numpy.frombuffer.html.
I use numpy in the environments instead of tensorflow because numpy is universally used in every ML framework (tf, pytorch, jax) and I want to start decoupling things from TF when they don't fundamentally require it.
I'm not sure if they need unit tests - if you want I can add them, but they're not doing anything surprising. I think if the other tests fail it should catch any issues here.
But the object itself acts as a buffer, it has an indexer and a len.
np has its own notion of a buffer interface which is pretty specific, and if we naively try to pass one of these objects to np.frombuffer we get the error:
TypeError: a bytes-like object is required, not 'LogReaderTensorValue'
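A minimal sketch of why an indexer and a length are not enough: Python's buffer protocol is separate from `__getitem__` and `__len__`. The example uses the standard library's `memoryview`, which rejects objects that do not export a buffer, as a stand-in for `np.frombuffer`. `IndexableOnly` and `exposes_buffer` are hypothetical names made up for this example.

```python
class IndexableOnly:
  """Hypothetical class with an indexer and a length, but no buffer export."""

  def __init__(self, data: bytes):
    self._data = data

  def __getitem__(self, index: int) -> int:
    return self._data[index]

  def __len__(self) -> int:
    return len(self._data)


def exposes_buffer(obj) -> bool:
  """Returns True if obj can be wrapped by memoryview (i.e. is bytes-like)."""
  try:
    memoryview(obj)
  except TypeError:
    # Raised for objects that do not implement the buffer protocol.
    return False
  return True


payload = bytes([1, 2, 3, 4])
wrapped = IndexableOnly(payload)
```

`exposes_buffer(payload)` holds for the raw `bytes`, while `exposes_buffer(wrapped)` does not, even though `wrapped` supports indexing and `len()`.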
ha! interesting. OK. so can we do with just a raw_bytes, looks like len() wouldn't be needed?
ok, replaced raw_bytes and len with a to_numpy method directly, and added a unit test for it.
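For readers following along, a `to_numpy` method of this kind might look roughly like the sketch below. This is not the code from the PR: `TensorValueSketch` and its constructor arguments are assumptions for this example, and it takes a NumPy dtype directly rather than a `tf.TensorSpec`. It relies on native byte order, consistent with the docstring's note on endianness.

```python
import numpy as np


class TensorValueSketch:
  """Hypothetical stand-in; not the actual LogReaderTensorValue."""

  def __init__(self, shape, dtype, buffer: bytes):
    self._shape = tuple(shape)
    self._dtype = np.dtype(dtype)
    self._buffer = buffer

  def to_numpy(self) -> np.ndarray:
    # np.frombuffer returns a 1-D view over the bytes without copying;
    # the view is read-only because bytes objects are immutable.
    flat = np.frombuffer(self._buffer, dtype=self._dtype)
    return flat.reshape(self._shape)


# Six int64 values serialized in native byte order.
raw = np.arange(6, dtype=np.int64).tobytes()
value = TensorValueSketch(shape=(2, 3), dtype=np.int64, buffer=raw)
array = value.to_numpy()
```

A unit test for such a method could compare `array` against the values that were serialized and check the shape and dtype.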
ah, neat. Thanks!
The reason for all the spurious changes: I realized my vim was autoformatting the Python files with the internal Google formatting tool instead of yapf. I would save the file thinking it was formatted, upload the commit, see yapf fail in CI, then run yapf and re-upload. So all the spurious changes came from the Google internal formatter and yapf (also a Google Python formatter) fighting each other. I went back through and tried to revert all the spurious things.
lgtm
No description provided.