Skip to content

Conversation

@apaasch
Copy link
Collaborator

@apaasch apaasch commented Mar 8, 2024

Task structure to store boost histogram objects as TGraphAsymmErrors in rootfiles.
Main goal is to do this indepently from Root using uproot.
This is not yet possible but there is ongoing work from the uproot team addressing this issue.
Updates can be followed in this PR from uproot5.

As long as this feature is not yet implement the conversion is done in two steps:

  1. Extract values from the boost histogram and store as arrays, which can be used as an input for a TGraph object. Store the dictionary as a pickel file.
  2. Extract the arrays from the pickle file and use them as input for a TGraphAsymmError. This step is done independently from CF, since no Root module is implemented in the Sandboxes.

In the long term, the second step is included in the first one and remove the dependency from Root.

Additional

  • Move category IDs to base task

Copy link
Member

@dsavoiu dsavoiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, @apaasch. It looks OK to me so far, I only have a few comments related to code style and variable naming, but this should be easy to fix. One should also maybe clarify if the convert_to_root.py script is meant to run while columnflow is sourced or completely outside (it would also be good to add this information to the file).

raise RuntimeError(
"no single dataset found in config matching "
f"process `{self.branch_data.process}`"
"no single dataset found in config matching ",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No comma needed here since the string continues on the next line.

Suggested change
"no single dataset found in config matching ",
"no single dataset found in config matching "

Copy link
Collaborator Author

@apaasch apaasch Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

raise RuntimeError(
"no single user-supplied dataset matched "
f"process `{self.branch_data.process}`"
"no single user-supplied dataset matched ",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"no single user-supplied dataset matched ",
"no single user-supplied dataset matched "

Copy link
Collaborator Author

@apaasch apaasch Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

print(f"{arg}: {getattr(args, arg)}")

# Construct the file path
base_path = "/nfs/dust/cms/user/paaschal/WorkingArea/Analysis/JERC/DiJet/data/dijet_store/analysis_dijet/"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this script be run inside a columnflow environment (i.e. after sourcing setup.sh)? If so, it would be good to avoid hardcoding the path here and use the environment variables insteda (e.g. os.getenv("CF_STORE_LOCAL")).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it can be run within the CF environment. You suggestion will be added to the next commit.

apaasch and others added 3 commits March 12, 2024 10:11
Redo flake8 changes
Change naming scheme and structur in root task to be more efficient and readable.

Co-authored-by: Daniel Savoiu <[email protected]>
Copy link
Collaborator Author

@apaasch apaasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, all your suggestion have been added.

print(f"{arg}: {getattr(args, arg)}")

# Construct the file path
base_path = "/nfs/dust/cms/user/paaschal/WorkingArea/Analysis/JERC/DiJet/data/dijet_store/analysis_dijet/"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it can be run within the CF environment. You suggestion will be added to the next commit.

raise RuntimeError(
"no single dataset found in config matching "
f"process `{self.branch_data.process}`"
"no single dataset found in config matching ",
Copy link
Collaborator Author

@apaasch apaasch Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

raise RuntimeError(
"no single user-supplied dataset matched "
f"process `{self.branch_data.process}`"
"no single user-supplied dataset matched ",
Copy link
Collaborator Author

@apaasch apaasch Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

Copy link
Member

@dsavoiu dsavoiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes! Looks good to me, but there is one small change I would make to the convert_to_root.py script (see below).

Copy link
Member

@dsavoiu dsavoiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks good to me now. Let's merge!

@dsavoiu dsavoiu merged commit 2dcdeaa into uhh-cms:master Mar 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants