[BUG]: outside of artifact store bounds #2986

Open
1 task done
Aadik1ng opened this issue Sep 4, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@Aadik1ng

Aadik1ng commented Sep 4, 2024

System Information

SYSTEM_INFO: {'os': 'windows', 'windows_version_release': '10', 'windows_version': '10.0.19045', 'windows_version_service_pack': 'SP0', 'windows_version_os_type':
'Multiprocessor Free'}

What happened?

Initiating a new run for the pipeline: data_ingestion_pipeline.
Executing a new run.
Caching is disabled by default for data_ingestion_pipeline.
Using user: default
Using stack: stack_1
orchestrator: default
artifact_store: local_artifact_store
You can visualize your pipeline runs in the ZenML Dashboard. In order to try it locally, please run zenml up.

Relevant log output

Initiating a new run for the pipeline: data_ingestion_pipeline.
Executing a new run.
Caching is disabled by default for data_ingestion_pipeline.
Using user: default
Using stack: stack_1
  orchestrator: default
  artifact_store: local_artifact_store
You can visualize your pipeline runs in the ZenML Dashboard. In order to try it locally, please run zenml up.
Failed to execute data ingestion pipeline: File `D:data\artifacts\data_ingestion_step\logs` is outside of artifact store bounds `data/artifacts`

CURRENT STACK

Name: stack_1
ID: d1765057-c27f-4e5e-ac3e-e7cee6a45797
User: default / a8bf9397-512c-4f9a-9266-f24b3ea10921
Workspace: default / 5e1c6e98-5302-47d4-bbc7-b97d24d1def3

ORCHESTRATOR: default

Name: default
ID: 33520fcf-456f-438d-b894-a765938a6b5e
Type: orchestrator
Flavor: local
Configuration: {}
Workspace: default / 5e1c6e98-5302-47d4-bbc7-b97d24d1def3

ARTIFACT_STORE: local_artifact_store

Name: local_artifact_store
ID: abae7772-933e-40fb-8893-0eb1bd855a9e
Type: artifact_store
Flavor: local
Configuration: {'path': 'data/artifacts'}
User: default / a8bf9397-512c-4f9a-9266-f24b3ea10921
Workspace: default / 5e1c6e98-5302-47d4-bbc7-b97d24d1def3

Reproduction steps

import logging
import os
from typing import Annotated, Dict

from zenml import get_step_context, pipeline, step

# load_config, extract_from_pdf and log_extraction_info are project helpers
# that are not shown in this report.

# Module-level logger so the except block can use it even when the step
# fails before logging is configured.
logger = logging.getLogger(__name__)


@step
def data_ingestion_step(config_path: str) -> Annotated[Dict[str, str], "paths"]:
    try:
        # Load configuration
        config = load_config(config_path)
        pdf_path = config['balance_sheet_pdf']

        # Use ZenML's artifact store context to determine base paths
        context = get_step_context()
        base_dir = context.get_output_artifact_uri()

        # Construct directories within the artifact store
        image_dir = os.path.join(base_dir, config.get('image_dir', 'data/artifacts/images'))
        table_dir = os.path.join(base_dir, config.get('table_dir', 'data/artifacts/tables'))
        other_dir = os.path.join(base_dir, config.get('other_dir', 'data/artifacts/other_content'))

        os.makedirs(image_dir, exist_ok=True)
        os.makedirs(table_dir, exist_ok=True)
        os.makedirs(other_dir, exist_ok=True)

        # Set up logging within the artifact store
        log_file_path = os.path.join(base_dir, 'logs', 'data_ingestion.log')
        os.makedirs(os.path.dirname(log_file_path), exist_ok=True)
        logging.basicConfig(
            filename=log_file_path,
            filemode='a',
            level=logging.INFO,
            format='%(asctime)s - %(levelname)s - %(message)s'
        )

        logger.info(f"Image directory: {image_dir}")
        logger.info(f"Table directory: {table_dir}")
        logger.info(f"Other content directory: {other_dir}")

        # Extract data from PDF
        extract_from_pdf(pdf_path, image_dir, table_dir, other_dir)

        # Log extraction info
        log_extraction_info(pdf_path, image_dir, table_dir, other_dir)

        # Return paths as a dictionary
        return {
            "image_dir": image_dir,
            "table_dir": table_dir,
            "other_dir": other_dir
        }
    except Exception as e:
        logger.error(f"Error in data ingestion step: {e}")
        raise


@pipeline
def data_ingestion_pipeline(config_path: str):
    data_ingestion_step(config_path=config_path)


if __name__ == "__main__":
    config_path = 'config.yml'  # Adjust this path if needed
    try:
        # Run the pipeline with caching disabled
        data_ingestion_pipeline = data_ingestion_pipeline.with_options(enable_cache=False)
        data_ingestion_pipeline(config_path=config_path)
    except Exception as e:
        print(f"Failed to execute data ingestion pipeline: {e}")

Code of Conduct

  • I agree to follow this project's Code of Conduct
@Aadik1ng Aadik1ng added the bug Something isn't working label Sep 4, 2024
@schustmi
Contributor

schustmi commented Sep 5, 2024

This seems like an issue with Windows paths. Can you just quickly verify whether the following code also fails for you on your local machine please:

from zenml import pipeline, step

@step
def logging_step() -> None:
  print("Some log message")

@pipeline
def p():
  logging_step()

if __name__ == "__main__":
  p()

@Aadik1ng
Author

Aadik1ng commented Sep 5, 2024

FileNotFoundError: File D:\data\artifacts\logging_step\logs is outside of artifact store bounds data/artifacts
PS D:> & C:/Users/aadit/AppData/Local/Microsoft/WindowsApps/python3.11.exe D:\src\rough.py
Initiating a new run for the pipeline: p.
Executing a new run.
Using user: default
Using stack: stack_1
orchestrator: default
artifact_store: local_artifact_store
You can visualize your pipeline runs in the ZenML Dashboard. In order to try it locally, please run zenml up.
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ D:\src\rough.py:12 in <module> │
│ │
│ 9 logging_step() │
│ 10 │
│ 11 if __name__ == "__main__": │
│ ❱ 12 p() │
│ 13 │
│ │
│ C:\Users\aadit\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCa │
│ che\local-packages\Python311\site-packages\zenml\new\pipelines\pipeline.py:1382 in __call__ │
│ │
│ 1379 │ │ │ return self.entrypoint(*args, **kwargs) │
│ 1380 │ │ │
│ 1381 │ │ self.prepare(*args, **kwargs) │
│ ❱ 1382 │ │ return self._run(**self._run_args) │
│ 1383 │ │
│ 1384 │ def _call_entrypoint(self, *args: Any, **kwargs: Any) -> None: │
│ 1385 │ │ """Calls the pipeline entrypoint function with the given arguments. │
│ │
│ C:\Users\aadit\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCa │
│ che\local-packages\Python311\site-packages\zenml\new\pipelines\pipeline.py:771 in _run │
│ │
│ 768 │ │ │ │ │ │ "zenml up." │
│ 769 │ │ │ │ │ ) │
│ 770 │ │ │ │
│ ❱ 771 │ │ │ deploy_pipeline( │
│ 772 │ │ │ │ deployment=deployment_model, stack=stack, placeholder_run=run │
│ 773 │ │ │ ) │
│ 774 │ │ │ if run: │
│ │
│ C:\Users\aadit\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCa │
│ che\local-packages\Python311\site-packages\zenml\new\pipelines\run_utils.py:153 in │
│ deploy_pipeline │
│ │
│ 150 │ │ │ # placeholder run to stay in the database │
│ 151 │ │ │ Client().delete_pipeline_run(placeholder_run.id) │
│ 152 │ │ │
│ ❱ 153 │ │ raise e │
│ 154 │ finally: │
│ 155 │ │ constants.SHOULD_PREVENT_PIPELINE_EXECUTION = previous_value │
│ 156 │
│ │
│ C:\Users\aadit\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCa │
│ che\local-packages\Python311\site-packages\zenml\new\pipelines\run_utils.py:141 in │
│ deploy_pipeline │
│ │
│ 138 │ previous_value = constants.SHOULD_PREVENT_PIPELINE_EXECUTION │
│ 139 │ constants.SHOULD_PREVENT_PIPELINE_EXECUTION = True │
│ 140 │ try: │
│ ❱ 141 │ │ stack.deploy_pipeline(deployment=deployment) │
│ 142 │ except Exception as e: │
│ 143 │ │ if ( │
│ 144 │ │ │ placeholder_run │
│ │
│ C:\Users\aadit\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCa │
│ che\local-packages\Python311\site-packages\zenml\stack\stack.py:853 in deploy_pipeline │
│ │
│ 850 │ │ Returns: │
│ 851 │ │ │ The return value of the call to orchestrator.run_pipeline(...). │
│ 852 │ │ """ │
│ ❱ 853 │ │ return self.orchestrator.run(deployment=deployment, stack=self) │
│ 854 │ │
│ 855 │ def _get_active_components_for_step( │
│ 856 │ │ self, step_config: "StepConfiguration" │
│ │
│ C:\Users\aadit\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCa │
│ che\local-packages\Python311\site-packages\zenml\orchestrators\base_orchestrator.py:187 in run │
│ │
│ 184 │ │ environment = get_config_environment_vars(deployment=deployment) │
│ 185 │ │ │
│ 186 │ │ try: │
│ ❱ 187 │ │ │ result = self.prepare_or_run_pipeline( │
│ 188 │ │ │ │ deployment=deployment, stack=stack, environment=environment │
│ 189 │ │ │ ) │
│ 190 │ │ finally: │
│ │
│ C:\Users\aadit\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCa │
│ che\local-packages\Python311\site-packages\zenml\orchestrators\local\local_orchestrator.py:78 in │
│ prepare_or_run_pipeline │
│ │
│ 75 │ │ │ │ │ step_name, │
│ 76 │ │ │ │ ) │
│ 77 │ │ │ │
│ ❱ 78 │ │ │ self.run_step( │
│ 79 │ │ │ │ step=step, │
│ 80 │ │ │ ) │
│ 81 │
│ │
│ C:\Users\aadit\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCa │
│ che\local-packages\Python311\site-packages\zenml\orchestrators\base_orchestrator.py:207 in │
│ run_step │
│ │
│ 204 │ │ │ step=step, │
│ 205 │ │ │ orchestrator_run_id=self.get_orchestrator_run_id(), │
│ 206 │ │ ) │
│ ❱ 207 │ │ launcher.launch() │
│ 208 │ │
│ 209 │ @staticmethod
│ 210 │ def requires_resources_in_orchestration_environment( │
│ │
│ C:\Users\aadit\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCa │
│ che\local-packages\Python311\site-packages\zenml\orchestrators\step_launcher.py:164 in launch │
│ │
│ 161 │ │ │
│ 162 │ │ if step_logging_enabled: │
│ 163 │ │ │ # Configure the logs │
│ ❱ 164 │ │ │ logs_uri = step_logging.prepare_logs_uri( │
│ 165 │ │ │ │ self._stack.artifact_store, │
│ 166 │ │ │ │ self._step.config.name, │
│ 167 │ │ │ ) │
│ │
│ C:\Users\aadit\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCa │
│ che\local-packages\Python311\site-packages\zenml\logging\step_logging.py:87 in prepare_logs_uri │
│ │
│ 84 │ ) │
│ 85 │ │
│ 86 │ # Create the dir │
│ ❱ 87 │ if not artifact_store.exists(logs_base_uri): │
│ 88 │ │ artifact_store.makedirs(logs_base_uri) │
│ 89 │ │
│ 90 │ # Delete the file if it already exists │
│ │
│ C:\Users\aadit\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCa │
│ che\local-packages\Python311\site-packages\zenml\artifact_stores\base_artifact_store.py:151 in │
│ __call__ │
│ │
│ 148 │ │ has_self = bool(args and isinstance(args[0], BaseArtifactStore)) │
│ 149 │ │ │
│ 150 │ │ # sanitize inputs for relevant args and kwargs, keep rest unchanged │
│ ❱ 151 │ │ args = tuple( │
│ 152 │ │ │ self._sanitize_potential_path( │
│ 153 │ │ │ │ arg, │
│ 154 │ │ │ ) │
│ │
│ C:\Users\aadit\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCa │
│ che\local-packages\Python311\site-packages\zenml\artifact_stores\base_artifact_store.py:152 in │
│ <genexpr> │
│ │
│ 149 │ │ │
│ 150 │ │ # sanitize inputs for relevant args and kwargs, keep rest unchanged │
│ 151 │ │ args = tuple( │
│ ❱ 152 │ │ │ self._sanitize_potential_path( │
│ 153 │ │ │ │ arg, │
│ 154 │ │ │ ) │
│ 155 │ │ │ if i + has_self in self.path_args │
│ │
│ C:\Users\aadit\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCa │
│ che\local-packages\Python311\site-packages\zenml\artifact_stores\base_artifact_store.py:133 in │
│ _sanitize_potential_path │
│ │
│ 130 │ │ │ path = path.replace(ntpath.sep, posixpath.sep) │
│ 131 │ │ │ self._validate_path(path) │
│ 132 │ │ else: │
│ ❱ 133 │ │ │ self._validate_path(str(Path(path).absolute().resolve())) │
│ 134 │ │ │
│ 135 │ │ return path │
│ 136 │
│ │
│ C:\Users\aadit\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCa │
│ che\local-packages\Python311\site-packages\zenml\artifact_stores\base_artifact_store.py:98 in │
│ _validate_path │
│ │
│ 95 │ │ │ │ bounds. │
│ 96 │ │ """ │
│ 97 │ │ if not path.startswith(self.fixed_root_path): │
│ ❱ 98 │ │ │ raise FileNotFoundError( │
│ 99 │ │ │ │ f"File {path} is outside of " │
│ 100 │ │ │ │ f"artifact store bounds {self.fixed_root_path}" │
│ 101 │ │ │ ) │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
FileNotFoundError: File D:\data\artifacts\logging_step\logs is outside of artifact store bounds data/artifacts
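
The comparison at the bottom of the traceback can be reproduced without ZenML. This is a minimal sketch, assuming (from the traceback only) that validation is a plain `str.startswith` of the resolved absolute path against the configured root. `naive_in_bounds` and `anchored_in_bounds` are hypothetical helpers for illustration, not ZenML functions, and the working directory `D:\` is an assumption taken from the PowerShell prompt. `PureWindowsPath` is used so the sketch behaves the same on any OS.

```python
from pathlib import PureWindowsPath


def naive_in_bounds(path: str, root: str) -> bool:
    # Hypothetical stand-in for the prefix check shown in the traceback.
    return path.startswith(root)


def anchored_in_bounds(path: str, root: str, cwd: str) -> bool:
    # Anchor a relative root to the working directory, then compare
    # path components instead of raw strings.
    absolute_root = PureWindowsPath(cwd) / root
    return PureWindowsPath(path).is_relative_to(absolute_root)


configured_root = "data/artifacts"  # path registered on the artifact store
resolved_path = r"D:\data\artifacts\logging_step\logs"  # from the error message

naive_result = naive_in_bounds(resolved_path, configured_root)
anchored_result = anchored_in_bounds(
    resolved_path, configured_root, cwd="D:\\"  # assumed working directory
)
print(naive_result, anchored_result)  # False True
```

A relative root such as `data/artifacts` can never be a string prefix of an absolute, backslash-separated path, which would explain the error regardless of where the files actually live. Whether that is the whole story inside ZenML is not confirmed here.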

@Aadik1ng
Author

Aadik1ng commented Sep 5, 2024

These are my environment variables:

HOMEDRIVE: C:
HOMEPATH: \Users\aadit
LOCALAPPDATA: C:\Users\aadit\AppData\Local
LOGONSERVER: \DESKTOP-C4QBLHV
NUMBER_OF_PROCESSORS: 16
ORIGINAL_XDG_CURRENT_DESKTOP: undefined
OS: Windows_NT
PATH:
C:\Users\aadit\AppData\Local\zenml;
PATHEXT: .COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC;.CPL

ZENML_HOME: C:\Users\aadit\AppData\Local\zenml
ZES_ENABLE_SYSMAN: 1

@schustmi
Contributor

schustmi commented Sep 5, 2024

Are you using the default local artifact store? Or did you register a custom one?

@Aadik1ng
Author

Aadik1ng commented Sep 5, 2024

I was using a custom artifact store in my previous stack, but everything is on the default now:
Initiating a new run for the pipeline: simple_ml_pipeline.
Executing a new run.
Using user: default
Using stack: default
orchestrator: default
artifact_store: default

@Aadik1ng
Author

Aadik1ng commented Sep 5, 2024

(PS) D:\Dir> zenml artifact-store describe default
Artifact_Store 'default' of flavor 'local' with id 'd6305633-a8f6-45ea-87f6-ae469da61fcf' is owned by user '-'.
No configuration options are set for this component.
No labels are set for this component.
No connector is set for this component.

The default artifact store did not seem to be set up properly, so I switched to a new one: I created a local artifact store called balance with data/artifacts as its path.

(PS)D:\Dir> zenml artifact-store describe balance
Artifact_Store 'Balance' of flavor 'local' with id 'bc735317-765d-4e0a-99c7-dbf0728f4702' is owned by user 'default'.
'Balance' ARTIFACT_STORE Component
Configuration (ACTIVE)
┏━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┓
┃ COMPONENT_PROPERTY │ VALUE ┃
┠────────────────────┼────────────────┨
┃ PATH │ data/artifacts ┃
┗━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━┛
No labels are set for this component.
No connector is set for this component.

@schustmi
Contributor

schustmi commented Sep 5, 2024

What do you mean by "the default artifact store was not set properly"? Did you run into issues when running pipelines with the default artifact store?

I think with your custom artifact store, it's somehow messing up the volumes. Did you explicitly register it with D:\data\artifacts?

@Aadik1ng
Author

Aadik1ng commented Sep 5, 2024

When I was using the default artifact store I got the same error, but with these paths:

Failed to execute data ingestion pipeline: File C:\Users\aadit\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\Roaming\zenml\local_stores\5b773d37-c170-4a39-b1f5-d6a20fec3d5b\data_ingestion_step\logs is outside of artifact store bounds C:\Users\aadit\AppData\Roaming\zenml\local_stores\5b773d37-c170-4a39-b1f5-d6a20fec3d5b

So I changed the artifact store to a custom one and set its path to data/artifacts. I got the same error, but this time the path was:

D:\data\artifacts..

PS D:\RAG-on-Balance_Sheet> zenml artifact-store register balance --flavor=local
You are configuring a stack component that is using local resources while connected to a remote ZenML server. The stack component may not be usable from other hosts or by other users. You should consider using a non-local stack component alternative instead.
Successfully registered artifact_store balance.
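
The two paths in the default-store error above can be compared component by component. This is a rough sketch using only the strings from that error message. The variable names are made up for illustration. The interpretation is an assumption: the pattern looks consistent with the Microsoft Store build of Python redirecting writes under `AppData` into the package's private `LocalCache` folder, which the thread does not confirm.

```python
import ntpath
from pathlib import PureWindowsPath

# Both strings are copied from the error message reported for the default store.
resolved = PureWindowsPath(
    r"C:\Users\aadit\AppData\Local\Packages"
    r"\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\Roaming"
    r"\zenml\local_stores\5b773d37-c170-4a39-b1f5-d6a20fec3d5b"
    r"\data_ingestion_step\logs"
)
root = PureWindowsPath(
    r"C:\Users\aadit\AppData\Roaming"
    r"\zenml\local_stores\5b773d37-c170-4a39-b1f5-d6a20fec3d5b"
)

# The resolved location is not under the configured root...
inside_root = resolved.is_relative_to(root)

# ...the two only share the AppData prefix...
shared_prefix = ntpath.commonpath([str(resolved), str(root)])

# ...yet the store-specific tail (zenml\local_stores\<id>) is identical.
same_store_tail = resolved.parts[-5:-2] == root.parts[-3:]

print(inside_root, shared_prefix, same_store_tail)
```

If the redirection assumption holds, the same store directory is being reached through two different absolute paths, so any bounds check that compares resolved paths would reject it.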
