Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NeuralynxIO: support List[str] for filename parameter #1440

Merged
merged 37 commits into from
Jul 24, 2024

Conversation

NxNiki
Copy link
Contributor

@NxNiki NxNiki commented Mar 19, 2024

Add a feature to allow multiple files as filename.

Now user can create a list of filenames and assign it to the argument include_filenames of neo.io.NeuralynxIO(dirname='xxx', include_filenames=session_data) to read the specific files.

If there are multiple files for a single channel, they should be consistent for all channels.

@zm711 zm711 changed the title support List[str] for filename NeuralynxIO: support List[str] for filename parameter Mar 20, 2024
@zm711
Copy link
Contributor

zm711 commented Mar 20, 2024

I guess one question I would have overall is why overload the filename argument? Unfortunately I don't think we would change (like change filename->filenames) since it is a core part, but why not add another argument include_filenames to mirror the exclude_filenames. I'll ping @samuelgarcia here so he can decide if he is okay with the overloading filename or not.

@zm711 zm711 requested a review from samuelgarcia March 20, 2024 12:25
@samuelgarcia
Copy link
Contributor

Maybe w could use filenames to avoid confusion ? even if it makes the signature a bit heavy...

@zm711
Copy link
Contributor

zm711 commented Mar 20, 2024

I think having filename and filenames would be super confusing and misused a lot. Would you be okay with include_filenames then it would still be dirname='path', filename=None, and include_filenames=['file0', 'file1']?

Or are you suggesting switch filename to filenames and have some broken code?

@NxNiki
Copy link
Contributor Author

NxNiki commented Mar 20, 2024

I think having filename and filenames would be super confusing and misused a lot. Would you be okay with include_filenames then it would still be dirname='path', filename=None, and include_filenames=['file0', 'file1']?

Or are you suggesting switch filename to filenames and have some broken code?

I don't feel it is necessary to create separate variables for single and multiple input files in the arguments. Maybe we can use filenames as the argument and create self.filename = filenames if filenames is a string and do self.filename = filenames[0] if len(filenames)==1?

The local filenames is also created when dirname is used. I don't feel using it for multiple input files adds more confusion here.

@zm711
Copy link
Contributor

zm711 commented Mar 20, 2024

@NxNiki,

I agree that's the better overall solution, but changing the argument name could break users' code. So my question for Sam was whether he was okay to do a breaking change at the api level or if he wanted to work around the current api to protect the old way of doing things.

@NxNiki
Copy link
Contributor Author

NxNiki commented Mar 20, 2024

@NxNiki,

I agree that's the better overall solution, but changing the argument name could break users' code. So my question for Sam was whether he was okay to do a breaking change at the api level or if he wanted to work around the current api to protect the old way of doing things.

Maybe add a deprecate warning?

@apdavison
Copy link
Member

I think this is perhaps an opportunity to formalize IO constructor arguments across all IO classes, before we get to the 1.0 release.

I think it will be confusing to have some IO classes that accept a list of things as the first argument, and others only a single thing. To solve the current issue, I support @zm711's suggestion of adding include_filenames, but then let's also remove the filename argument (or rather, add a deprecation warning for now, remove it in v1.0)

@NxNiki
Copy link
Contributor Author

NxNiki commented Mar 21, 2024

I think this is perhaps an opportunity to formalize IO constructor arguments across all IO classes, before we get to the 1.0 release.

I think it will be confusing to have some IO classes that accept a list of things as the first argument, and others only a single thing. To solve the current issue, I support @zm711's suggestion of adding include_filenames, but then let's also remove the filename argument (or rather, add a deprecation warning for now, remove it in v1.0)

Yeah, that makes sense. Shall we ignore dirname and filename when using include_filenames or add files to filename/files in dirname? The later option seems logical, but the logic is kind of complex and the use case may be rare in practice.

@apdavison
Copy link
Member

Shall we ignore dirname and filename when using include_filenames or add files to filename/files in dirname? The later option seems logical, but the logic is kind of complex and the use case may be rare in practice.

I don't use NeuralynxIO very much, so I will defer to those who do, but would suggest:

  1. If include_filenames is not empty, then filename must be empty (otherwise raise exception), and
  2. If dirname is not empty, then the contents of include_filenames should be interpreted as paths relative to dirname.

Copy link
Contributor

@zm711 zm711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple comments about the current implementation.

neo/rawio/neuralynxrawio/neuralynxrawio.py Outdated Show resolved Hide resolved
neo/rawio/neuralynxrawio/neuralynxrawio.py Outdated Show resolved Hide resolved
@NxNiki
Copy link
Contributor Author

NxNiki commented Mar 22, 2024

Hi, I am trying to run pytest, but it seems the environment cannot be correctly set up. Neither setup.py nor the pyproject.toml works for me.

@zm711
Copy link
Contributor

zm711 commented Mar 22, 2024

@NxNiki,

I routinely set up new environments building neo from source. What is the exact error you're getting. Note that we are still trying to work on making sure all tests work on Windows so testing may not be perfect on Windows machines (outside of core tests).

@NxNiki
Copy link
Contributor Author

NxNiki commented Mar 23, 2024

@NxNiki,

I routinely set up new environments building neo from source. What is the exact error you're getting. Note that we are still trying to work on making sure all tests work on Windows so testing may not be perfect on Windows machines (outside of core tests).

I am running the code on a Mac (M1 chip). The error when I use poetry to install the package is:

poetry install
[tool.poetry] section not found in /Users/XinNiuAdmin/Documents/python-neo/pyproject.toml.

I can use pip to install the packages with pip install . But running pytest has the following error:

======================================================================================================= test session starts =======================================================================================================
platform darwin -- Python 3.9.6, pytest-8.1.1, pluggy-1.4.0
rootdir: /Users/XinNiuAdmin/Documents/python-neo
configfile: pyproject.toml
collected 1389 items / 1 error                                                                                                                                                                                                    

============================================================================================================= ERRORS ==============================================================================================================
_________________________________________________________________________________________ ERROR collecting neo/test/iotest/test_tiffio.py _________________________________________________________________________________________
ImportError while importing test module '/Users/XinNiuAdmin/Documents/python-neo/neo/test/iotest/test_tiffio.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
neo/test/iotest/test_tiffio.py:3: in <module>
    from PIL import Image
E   ModuleNotFoundError: No module named 'PIL'
===================================================================================================== short test summary info =====================================================================================================
ERROR neo/test/iotest/test_tiffio.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
======================================================================================================== 1 error in 1.01s =========================================================================================================

I guess this is also due to the incorrect pyproject.toml?

@zm711
Copy link
Contributor

zm711 commented Mar 23, 2024

@NxNiki,

If you want to run full tests you should do

pip install -e .[test]

You are missing a package. This is fine if you don't want to download a bunch of extra packages, but then rather than run pytest at the root you'll need to run individual tests instead. We don't have a poetry section in our code and I don't use poetry so I'm not sure how to install with poetry.

@NxNiki
Copy link
Contributor Author

NxNiki commented Apr 30, 2024

Hi, Sorry for the delay. I think this PR is ready to review. Let me know if there are additional comments.

@NxNiki NxNiki requested review from zm711 and alejoe91 April 30, 2024 19:05
Copy link
Contributor

@zm711 zm711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another round of comments @NxNiki.

It is very close. I think we would want to simplify the logic a bit in a few places where I indicated. You want to try those? and then I'll do another read. Just ping me again when you're ready.

neo/io/neuralynxio.py Outdated Show resolved Hide resolved
neo/io/neuralynxio.py Outdated Show resolved Hide resolved
neo/rawio/neuralynxrawio/neuralynxrawio.py Outdated Show resolved Hide resolved
neo/rawio/neuralynxrawio/neuralynxrawio.py Outdated Show resolved Hide resolved
neo/rawio/neuralynxrawio/neuralynxrawio.py Outdated Show resolved Hide resolved
neo/rawio/neuralynxrawio/neuralynxrawio.py Outdated Show resolved Hide resolved
neo/rawio/neuralynxrawio/neuralynxrawio.py Outdated Show resolved Hide resolved
neo/rawio/neuralynxrawio/neuralynxrawio.py Outdated Show resolved Hide resolved
neo/rawio/neuralynxrawio/neuralynxrawio.py Outdated Show resolved Hide resolved
neo/rawio/neuralynxrawio/neuralynxrawio.py Show resolved Hide resolved
@NxNiki
Copy link
Contributor Author

NxNiki commented May 8, 2024

Hi @zm711, thanks for giving valuable suggestions. The PR is ready for another round of review.

@zm711
Copy link
Contributor

zm711 commented May 8, 2024

I'll give it another full read this weekend!

Copy link
Contributor

@zm711 zm711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last suggestion and one last discussion point, but then I think we are basically there.

Comment on lines +165 to +166
elif isinstance(include_filenames, str):
include_filenames = [include_filenames]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the one I'm still a little torn on. I would prevent to force include_filenames to be a list, but I think it is okay if we accept a string in the case of only wanting one file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under the hood, include_filenames has to be a list, or the iteration at line 220 would loop over each string character.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I wasn't clear. I was saying why should we accept a string at all (so delete line 165 and 166). Since our new argument is filenames shouldn't we just always make the users give a list (or any iterable)? Why should we accept a single file rather than a list of a single file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked on Chatgpt and it seems quite common to allow for flexible input types in this case. I am open to both ways though.

Here is the answer from chatgpt:

In Python, when designing an I/O interface to read a list of files, it's generally user-friendly to allow for flexible input types. Here are the considerations for both approaches you mentioned:

Allowing a string or list as input:

Pros: More convenient for users who often deal with a single file. It reduces the overhead of wrapping a filename in a list, making the function easier to use in scripts or interactive sessions where only one file is processed.
Cons: The function implementation needs to handle different types of input, checking whether the input is a string or a list. This can make the function slightly more complex.
Strictly requiring a list of strings:

Pros: Consistency in input type can simplify the function's logic, as it always expects a list. This can make the code simpler and potentially less error-prone when dealing with multiple files.
Cons: Less convenient for cases where only a single file is involved, as the user must wrap the filename in a list even if it's just one.
Given these considerations, a common practice is to design the interface to accept both strings and lists. This approach enhances usability without significantly complicating the implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apdavison did you have opinions on this? I prefer to be more restrictive, but I know other parts of Neo's API is relatively permissive. I think you prefer more flexibility too @samuelgarcia, so I'm fine with being outvoted here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a tough call, but I can't see that accepting a string will cause major problems, so I'm happy to go with the current interface.

neo/rawio/neuralynxrawio/neuralynxrawio.py Outdated Show resolved Hide resolved
@zm711
Copy link
Contributor

zm711 commented May 21, 2024

@NxNiki, we have a conference that a bunch of us are attending next week, but we will try to get final review done the week after that!

@NxNiki
Copy link
Contributor Author

NxNiki commented May 21, 2024

@NxNiki, we have a conference that a bunch of us are attending next week, but we will try to get final review done the week after that!

Thanks for the update. I hope you have a nice trip!

@zm711
Copy link
Contributor

zm711 commented Jun 6, 2024

@apdavison did you want to check out the argument reorganization on this one before we have our meeting?

@zm711
Copy link
Contributor

zm711 commented Jul 2, 2024

@apdavison, just wanted to ping you again for giving this one a read!

@NxNiki
Copy link
Contributor Author

NxNiki commented Jul 23, 2024

@apdavison, just wanted to ping you again for giving this one a read!

Hi, any updates? Let me know if there are additional comments or concerns.

@zm711 zm711 requested a review from apdavison July 23, 2024 21:31
@apdavison apdavison merged commit ad38785 into NeuralEnsemble:master Jul 24, 2024
2 checks passed
@apdavison
Copy link
Member

Many thanks @NxNiki for all your work on this, and sorry for the delayed response from my side.

@NxNiki
Copy link
Contributor Author

NxNiki commented Jul 24, 2024

Many thanks @NxNiki for all your work on this, and sorry for the delayed response from my side.

NP! This is a great learning opportunity for me. Thanks for the discussions and comments!

@@ -183,33 +213,24 @@ def _parse_header(self):
event_annotations = []

if self.rawmode == "one-dir":
filenames = sorted(os.listdir(self.dirname))
dirname = self.dirname
filenames = os.listdir(self.dirname)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am realising only now that this little has an bi big impact because the file are unsorted and so channel are not ordered

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You think this is Anthony's issue? This was just merged yesterday, so I guess we could confirm that the issue showed up, but makes sense that could be the problem. You plan to just open the PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry for missing this issue. Do we also want to sort include_filenames? One edge case would be the user using os.listdir to collect the files and pass it to include_filenames.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No wory. @Antho2422 did a quick fix.
For include_filenames, I think we do not need the sorter() because it makes that the user can providde a list in the desired order. no ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants