Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

the project missing sample_name, but with sample_table_index set correctly, cannot run in looper #459

Closed
Tracked by #457
donaldcampbelljr opened this issue Dec 5, 2023 · 9 comments
Labels

Comments

@donaldcampbelljr
Copy link
Contributor

No description provided.

@donaldcampbelljr
Copy link
Contributor Author

Unclear if this is actually a looper issue

@khoroshevskyi
Copy link
Member

I just checked all peppy tests related to this issue, and it seems, this is not a peppy issue...

@nsheff
Copy link
Contributor

nsheff commented Dec 18, 2023

Is there some way of getting the sample's index (whether it be from sample_name or some custom attribute) from a peppy sample object?

donaldcampbelljr added a commit to pepkit/looper that referenced this issue Dec 18, 2023
@khoroshevskyi
Copy link
Member

  • You can specify different sample id. By default it is 'sample_name' (This functionality works fine)
  • to get sample by id (whether it is sample_name or other id) you can use this function: prj.get_sample("sample_id")
  • to get sample_table index: prj.sample_table_index

  • There is no function that will return id (if you don't know sample_table_index). So, sample.get_id() is not implemented.
  • Or, for example, does peppy allow you to use sample.sample_name when someone uses sample.sample as index with sample_table_index: sample - this is not implemented either.

@donaldcampbelljr
Copy link
Contributor Author

Traceback from Looper:

/home/drc/GITHUB/looper/master/looper/venv/bin/python -m looper run --looper-config .looper.yaml 
Looper version: 1.6.0a2
Command: run
Using default divvy config. You may specify in env var: ['DIVCFG']

Pipestat compatible: False
Traceback (most recent call last):
  File "/home/drc/GITHUB/looper/master/looper/venv/lib/python3.10/site-packages/peppy/simple_attr_map.py", line 45, in __getattr__
    return self._mapped_attr[item]
KeyError: 'sample_name'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/drc/GITHUB/looper/master/looper/looper/__main__.py", line 8, in <module>
    sys.exit(main())
  File "/home/drc/GITHUB/looper/master/looper/looper/cli_looper.py", line 715, in main
    return run(args, rerun=(args.command == "rerun"), **compute_kwargs)
  File "/home/drc/GITHUB/looper/master/looper/looper/looper.py", line 457, in __call__
    name=sample.sample_name,
  File "/home/drc/GITHUB/looper/master/looper/venv/lib/python3.10/site-packages/peppy/simple_attr_map.py", line 47, in __getattr__
    raise AttributeError(f"Attribute not found: {item}")
AttributeError: Attribute not found: sample_name

Process finished with exit code 1

This is if I attempt to use project config:

pep_version: 2.0.0
sample_table: sample_annotation.csv
sample_table_index: not_sample_name

with sample annotation:

not_sample_name,library,file,toggle
frog_1,anySampleType,data/frog1_data.txt,1
frog_2,anySampleType,data/frog2_data.txt,1

@donaldcampbelljr
Copy link
Contributor Author

image

@donaldcampbelljr
Copy link
Contributor Author

donaldcampbelljr commented Jan 17, 2024

Confirmed that the issue is here in Peppy:

peppy/peppy/project.py

Lines 641 to 661 in c0affde

for sample in self.samples:
if self.st_index not in sample:
message = (
f"{CFG_SAMPLE_TABLE_KEY} is missing '{self.st_index}' column; "
f"you must specify {CFG_SAMPLE_TABLE_KEY}s in {self.st_index} or derive them"
)
if self.st_index != SAMPLE_NAME_ATTR:
try:
custom_sample_name = sample[self.st_index]
except KeyError:
raise InvalidSampleTableFileException(
f"Specified {CFG_SAMPLE_TABLE_KEY} index ({self.st_index}) does not exist"
)
sample[SAMPLE_NAME_ATTR] = custom_sample_name
_LOGGER.warning(
message
+ f"using specified {CFG_SAMPLE_TABLE_KEY} index ({self.st_index}) instead. "
+ f"Setting name: {custom_sample_name}"
)
else:
raise InvalidSampleTableFileException(message)

@khoroshevskyi
Copy link
Member

There is a reason why we don't want to do that. This change was made in this commit: 3d5495b

So I think this is correct behaviour

@donaldcampbelljr donaldcampbelljr removed the status in PEP Mar 19, 2024
@donaldcampbelljr
Copy link
Contributor Author

After discussion, it was determined that this is something that should be solved within Peppy. We will return to using sample.sample_name as a way to easily identify the sample identifier regardless of whether or not the user provided a different identifier via sample_table_index. Generally, we want the user to call items via bracket notation sample[sample_name] (since attmap has been removed). However, sample_name will be elevated such that the user can easily get the sample identifier via attribute access, e.g. sample.sample_name.

This fix should happen in the above previously referenced Peppy code, and Looper changes should be reverted to access the same.sample_name attribute.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants