[Feature Request] Add glob/iglob API for pattern-based file listing across different storage backends #1634

Open
shenshanf opened this issue Feb 11, 2025 · 0 comments

shenshanf commented Feb 11, 2025

Currently, mmengine.fileio.list_dir_or_file does not support glob pattern matching when listing files. Python's built-in glob.glob exists, but it only works with the local filesystem and cannot be used with other storage backends.

Proposed Solution

Add two new API functions:

def glob(pattern, *, recursive=False, backend_args=None):
    """Return a list of paths matching a pathname pattern.
    """
    pass

def iglob(pattern, *, recursive=False, backend_args=None):
    """Return an iterator yielding paths matching a pathname pattern.
    """
    pass
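To make the proposal more concrete, here is a minimal sketch of how the two functions might be layered on top of any recursive file lister. It is not MMEngine's implementation. The names iglob_sketch, glob_sketch, walk_local and the list_files parameter are hypothetical and stand in for backend_args; the sketch assumes '/'-separated paths, supports only *, ? and ** (no [...] character classes), and always lists the base directory recursively, which is simple but not efficient.

```python
import os
import re
from typing import Callable, Iterable, Iterator, List

_MAGIC = re.compile(r'[*?]')  # wildcards this sketch understands


def _translate(pattern: str, recursive: bool) -> re.Pattern:
    """Turn a '/'-separated glob pattern into a regex.

    '*' and '?' never cross '/'. A '**' segment spans any number of
    directories only when recursive=True; otherwise it behaves like '*'.
    """
    segments = pattern.split('/')
    parts = []
    for i, seg in enumerate(segments):
        last = i == len(segments) - 1
        if seg == '**' and recursive:
            parts.append('.*' if last else '(?:.*/)?')
            continue
        out = ''.join('[^/]*' if ch == '*' else '[^/]' if ch == '?'
                      else re.escape(ch) for ch in seg)
        parts.append(out if last else out + '/')
    return re.compile(''.join(parts) + r'\Z')


def walk_local(base: str) -> Iterator[str]:
    """Hypothetical lister: yield every file under base as a relative path.

    Assumption: a backend-aware call such as list_dir_or_file with
    recursive listing could play this role for remote storage.
    """
    root = base or '.'
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            yield rel.replace(os.sep, '/')


def iglob_sketch(pattern: str, *, recursive: bool = False,
                 list_files: Callable[[str], Iterable[str]] = walk_local
                 ) -> Iterator[str]:
    """Yield paths matching pattern, using list_files to enumerate files."""
    segments = pattern.split('/')
    # The leading wildcard-free segments form the directory to list.
    n = 0
    while n < len(segments) - 1 and not _MAGIC.search(segments[n]):
        n += 1
    base = '/'.join(segments[:n])
    if n and not base:
        base = '/'  # pattern rooted directly at the filesystem root
    regex = _translate('/'.join(segments[n:]), recursive)
    for rel in list_files(base):
        if regex.match(rel):
            yield rel if not n else base.rstrip('/') + '/' + rel


def glob_sketch(pattern: str, *, recursive: bool = False,
                list_files: Callable[[str], Iterable[str]] = walk_local
                ) -> List[str]:
    """List variant of iglob_sketch; sorted here for stable results."""
    return sorted(iglob_sketch(pattern, recursive=recursive,
                               list_files=list_files))
```

Passing the lister as a callable keeps the matching logic independent of the storage backend: only the lister needs to know how to talk to S3, a local disk, or anything else.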

Example usage:

from mmengine.fileio import glob

# List all jpg files in a directory
files = glob('s3://bucket/path/*.jpg', backend_args={'access_key': '...'})

# Recursively find all .png files 
files = glob('local/path/**/*.png', recursive=True)

The current workaround requires manual filtering:

from mmengine.fileio import list_dir_or_file
import fnmatch

files = list_dir_or_file('s3://path/', list_dir=False)
jpg_files = [f for f in files if fnmatch.fnmatch(f, '*.jpg')] 
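One subtlety of the fnmatch-based workaround is worth illustrating: unlike glob, fnmatch lets * match path separators, so once a listing returns nested relative paths the filter no longer means "files in this directory". The short example below uses a hypothetical list of relative paths in place of a real listing call.

```python
import fnmatch

# Hypothetical relative paths, as a recursive listing might return them.
files = ['a.jpg', 'notes.txt', 'sub/b.jpg', 'sub/deep/c.jpg']

# fnmatch's '*' also matches '/', so nested files pass the filter too.
any_depth = [f for f in files if fnmatch.fnmatchcase(f, '*.jpg')]

# Rejecting paths that contain '/' restricts the match to the top level,
# which is what glob's '*.jpg' would mean.
top_level = [f for f in files
             if '/' not in f and fnmatch.fnmatchcase(f, '*.jpg')]

print(any_depth)
print(top_level)
```

A dedicated glob API could hide this difference from callers instead of leaving each project to rediscover it.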

Having a backend-agnostic glob implementation would:

  1. Provide consistent pattern matching across different storage backends
  2. Simplify file filtering without manual pattern matching
  3. Match the functionality users expect from standard file operations
  4. Improve code readability when working with specific file patterns

Would appreciate feedback on this proposal. Thank you!

@shenshanf changed the title from "[Feature Request] Add glob pattern support to list_dir_or_file for cross-backend file filtering" to "[Feature Request] Add glob/iglob API for pattern-based file listing across different storage backends markdownCopy## Feature Description" on Feb 11, 2025
@shenshanf changed the title from "[Feature Request] Add glob/iglob API for pattern-based file listing across different storage backends markdownCopy## Feature Description" to "[Feature Request] Add glob/iglob API for pattern-based file listing across different storage backends" on Feb 11, 2025