-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ML DataLoaders #115
ML DataLoaders #115
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will you run make style
on this before I place in review comments?
Went to review and looks like there are still a couple conflicts @al-rigazzi |
Experiment.create_run_settings has been implmented which is a factory method for creating run settings types. The run_command default is "auto" which attempts to automatically detect the run_command on the system to use.
this functionality may be reused in other places within the library, so the create_run_settings function was placed inside smartsim/settings and is now called from the experiment Local tests have been adapted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple things
- make sure to throw the deprecation wanring for
smartsim.tf
and lets let the user import that for a release. basically an empty module with the dep and an import works here. use thewarnings
library and throwDeprecationWarning
- Some re-naming for brevity
- Lets add tests for all of these in the
tests/backends
folder. - Use the logger.
- Lotta copy paste, lets keep the comments to what users need to see and direct them to the places they need to go for more information if necessary. These should be easy to grok in a VSCode like editor
- instead of
smartsim.ml.tf.data
lets have those imports resolve tosmartsim.ml.tf
for brevity. - Workers comment may be in a differnt ticket but lets expose that if the functionality works and be sure to document the decision there.
- tests tests tests tests
smartsim/ml/data.py
Outdated
batch_key = form_name(self.sample_prefix, self.batch_idx, sub_index) | ||
self.client.put_tensor(batch_key, samples) | ||
if self.verbose: | ||
print(f"Put batch {batch_key}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logger instead here?
doc/tutorials/machine_learning.rst
Outdated
|
||
from SmartRedis import Client | ||
# simulation initialization code | ||
client = Client(cluster=False, address=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note here that address is needed if not launched through SmartSim
doc/tutorials/machine_learning.rst
Outdated
|
||
# producer | ||
producer_script = "producer.py" | ||
settings = RunSettings("python", exe_args=producer_script) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create_run_settings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
doc/tutorials/machine_learning.rst
Outdated
# producer | ||
producer_script = "producer.py" | ||
settings = RunSettings("python", exe_args=producer_script) | ||
uploader_model = exp.create_model("producer", settings) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can enable key prefixing here instead and save a line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had no clue we could do that.
self.init_sources() | ||
self.init_samples() | ||
|
||
def log(self, message): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logger??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, added logger now!
smartsim/ml/data.py
Outdated
`auto`. | ||
|
||
- When specifying `auto`, the user must also specify | ||
`uploader_name`. BatchDownloader will get all needed information |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like you might have done some copy paste here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, now I fixed all names and only kept docstrings of base classes
smartsim/ml/data.py
Outdated
:type smartredis_cluster: bool | ||
:param smartredis_address: Address of Redis client as <ip_address>:<port> | ||
:type smartredis_address: str | ||
:param replica_rank: When BatchDownloader is used in a distributed setting, indicates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have these an sub-indicies? this doesn't make sense to me, can you explain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this was confusing. The sub-indices are those of the uploader, I renamed them as uploader_ranks
, which is just an int
.
) | ||
|
||
@staticmethod | ||
def worker_init_fn(worker_id): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is the user specifying number of workers?
Does this spawn threads or processes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both are handled by the super-class torch.DataLoader
and corresponding settings.
from os import environ | ||
from time import sleep | ||
import numpy as np | ||
from mpi4py import MPI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets make sure we have notes in here that specify you need MPI4Py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put it in the Jupyter notebook, should be enough, but I'm open to suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Supper cool looking stuff!! Here's some quick notes for consideration:
- I'm gonna second shortening up the docstrings to essential information. As someone coming from the outside, it took me while to try and figure out how the sub-classes differed while searching through the boilerplate.
- In docstrings, multi-line bullet points should end in
\
when moving to next line - Found a couple of minor errors in the docs.
- Marked some places in where I thought the code-style seemed a bit off. Address as you see fit.
doc/tutorials/machine_learning.rst
Outdated
and one application (the ``training_service``) downloading the samples to train a DNN. | ||
|
||
A richer example, entirely implemented in Python, is available as a Jupyter Notebook in the | ||
``tutorials`` section of the SmartSim repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a link for reader convenience?
doc/tutorials/machine_learning.rst
Outdated
from smartsim.settings import RunSettings | ||
|
||
db = Orchestrator(port=6780) | ||
exp = ("online-training", launcher="local") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exp = Experiment("online-training", launcher="local")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch
doc/tutorials/machine_learning.rst
Outdated
producer_script = "producer.py" | ||
settings = RunSettings("python", exe_args=producer_script) | ||
uploader_model = exp.create_model("producer", settings) | ||
uploader_model.attach_generator_files(to_copy=script) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to_copy=producer_script
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
smartsim/ml/torch/data.py
Outdated
else: | ||
sources = [] | ||
|
||
print(f"{worker_id}: {sources}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replace with logger?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, this was too verbose anyhow. I removed it.
@@ -0,0 +1,40 @@ | |||
import numpy as np |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np
never accessed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@@ -0,0 +1,55 @@ | |||
import numpy as np |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np
not accessed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
dataset.init_sources() | ||
overall_sources = dataset.sources | ||
|
||
worker_id = worker_info.id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the worker_id
param used before being overwritten here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I defer to the Torch docs I took this from. My best guess is that it is not.
Adds various dimensions to the CI build matrix for SmartSim. The build matrix now uses MacOS & Ubuntu, GNU8, RedisAI 1.2.3 & 1.2.5, and Python 3.7-3.9. The build matrix excludes building with RedisAI 1.2.5 when on MacOS as RedisAI temporarily removed support for MacOS in 1.2.4 and 1.2.5 [ committed by @EricGustin and @Spartee ] [ reviewed by @Spartee ]
* Remove np from step.py and requirements Create a helper function called get_base_36_repr so that we can remove numpy from step.py Remove numpy from requirements.txt Pin requirements.dev to a specific version of numpy * Remove numpy from requirements * Remove numpy from setup [ committed by @EricGustin ] [ reviewed by @Spartee ]
Edits to the Tutorials section of the documentation to highlight refined api [ committed by @MattToast ] [ reviewed by @Spartee ]
This PR adds the ability to build the documentation inside a docker build so that users don't have to worry about the specific build steps. The build exports the docs folder into the current clone so that the dev doesn't have to do it themselves. build with ``make docks``. The manual documentation build is still available with make docs. There is also a new make tutorials that builds the tutorials into a developer docker with the current clone of SmartSim. This will eventually have a prod docker build too whch downloads the pinned version of smartsim for the latest release. [ committed by @Spartee ] [ reviewed by @al-rigazzi ]
Remove some tutorial files that appear to have been duplicated. [ committed by @MattToast ] [ reviewed by @Spartee ]
Edits to the Tutorials section of the documentation to highlight refined api [ committed by @MattToast ] [ reviewed by @Spartee ]
This PR adds the ability to build the documentation inside a docker build so that users don't have to worry about the specific build steps. The build exports the docs folder into the current clone so that the dev doesn't have to do it themselves. build with ``make docks``. The manual documentation build is still available with make docs. There is also a new make tutorials that builds the tutorials into a developer docker with the current clone of SmartSim. This will eventually have a prod docker build too whch downloads the pinned version of smartsim for the latest release. [ committed by @Spartee ] [ reviewed by @al-rigazzi ]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Great work! Excited to see the propagate into the community.
This PR adds ML Data Loaders for Keras/TF and PyTorch.
They follow the download-and-store approach, not the streaming one (thus this could be added). Samples are downloaded from the DB and kept as copy of the data set (which means that they are not discarded after one epoch).
Specializations required some fine tuning, but the behavior is the same for TF and PyTorch. There are super-classes for all downloaders.