optional dependencies #103

Open
gagb opened this issue Dec 17, 2024 · 7 comments

@gagb
Contributor

gagb commented Dec 17, 2024

          Unsure about this - perhaps should be an optional dep?

Originally posted by @casperdcl in #100 (comment)

@afourney
Member

@gagb @casperdcl Yeah, more generally, a lot of these should be optional dependencies.

Ideally we would have something like:

pip install "markitdown[ocr,openai,yt_transcript]"

and so on, to optionally pull in some of the more esoteric or heavy dependencies. We could then include or exclude the converters accordingly.
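A minimal sketch of how "include or exclude the converters accordingly" might work, assuming each converter declares the top-level modules it needs. The converter names and the `OPTIONAL_CONVERTERS` mapping are hypothetical, not MarkItDown's actual API. `importlib.util.find_spec` checks whether a top-level module can be found without actually importing it.

```python
import importlib.util

# Hypothetical mapping: converter name -> top-level modules it needs.
OPTIONAL_CONVERTERS = {
    "plain_text": [],                          # core converter, no extra deps
    "llm_image_caption": ["openai"],
    "youtube": ["youtube_transcript_api"],
    "audio": ["pydub", "speech_recognition"],
}


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None


def available_converters(requirements=OPTIONAL_CONVERTERS) -> list:
    """Return the names of converters whose dependencies are all installed."""
    return [
        name
        for name, modules in requirements.items()
        if all(is_importable(m) for m in modules)
    ]


if __name__ == "__main__":
    print("enabled converters:", available_converters())
```

A converter with no requirements is always enabled. The others only appear when the matching extra has been installed.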

What do you think?

afourney self-assigned this Dec 17, 2024
@gagb
Contributor Author

gagb commented Dec 18, 2024

I like this, but there is so much appeal in the simplicity of just running pip install markitdown.

@casperdcl

casperdcl commented Dec 18, 2024

Aliases are quite easy to implement...

pip install markitdown[all] could be made identical to pip install markitdown[ocr,llm,yt].

Pretty common Pythonicity.
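The aliasing described above can be sketched as building setuptools' `extras_require` mapping with an `all` entry that is the union of every other group. The group names and the assignment of packages to groups are illustrative assumptions. The package names are taken from the install log in this thread, and no package is assigned to `ocr` because the thread doesn't name one.

```python
# Illustrative extras groups (an assumption, not markitdown's real packaging).
EXTRAS = {
    "llm": ["openai"],
    "yt": ["youtube-transcript-api"],
    "audio": ["pydub", "SpeechRecognition"],
}


def with_all_alias(extras):
    """Return a new mapping plus an "all" group: every dependency, deduplicated and sorted."""
    combined = sorted({dep for deps in extras.values() for dep in deps})
    return {**extras, "all": combined}


if __name__ == "__main__":
    # In a setup.py this result would be passed as
    # setuptools.setup(..., extras_require=with_all_alias(EXTRAS)).
    for group, deps in with_all_alias(EXTRAS).items():
        print(f"{group}: {deps}")
```

With that in place, pip install markitdown[all] would pull in the same packages as listing every group by name.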

btw you should probably rename this issue "optional dependencies" or similar. Also you can use a markdown quote block (>) rather than a code block in the description 😉

gagb changed the title from "Unsure about this - perhaps should be an optional dep?" to "optional dependencies" Dec 18, 2024
@SigireddyBalasai
Contributor

I think we could just make all the dependencies optional and have the script install them when needed, the way Ultralytics does it: if a package is needed, it is installed at runtime, and updates are also applied at runtime.

@casperdcl

casperdcl commented Dec 18, 2024

Whoa, at most you could do:

try:
    import openai
except ImportError as exc:
    raise ImportError("please `pip/conda install openai` or `pip install markitdown[llm]`") from exc
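A variation on the pattern above, sketched under the assumption that a converter module is always imported even when its extra is missing. It records the import failure once at load time and raises the actionable error only when the feature is actually used, so a bare `import markitdown` would keep working. `try_import`, `require`, `MissingOptionalDependency` and `llm_caption` are hypothetical names, not part of markitdown.

```python
import importlib


class MissingOptionalDependency(ImportError):
    """Raised when a feature needs an optional extra that is not installed."""


def try_import(module_name):
    """Return (module, None) if importable, else (None, the ImportError raised)."""
    try:
        return importlib.import_module(module_name), None
    except ImportError as exc:
        return None, exc


def require(module, error, pip_hint):
    """Return `module`, or raise an actionable error chained to the original one."""
    if module is None:
        raise MissingOptionalDependency(
            f"this feature needs an optional dependency; try `{pip_hint}`"
        ) from error
    return module


# Attempted once at module load; a missing package does not break this import.
openai, _openai_error = try_import("openai")


def llm_caption(image_path):
    """Hypothetical LLM-backed step: the missing-dependency error surfaces only here."""
    client_lib = require(openai, _openai_error, "pip install markitdown[llm]")
    # A real converter would build a client from `client_lib` and call it here.
    return f"{client_lib.__name__} would caption {image_path}"
```

Chaining with `from error` keeps the original `ImportError` visible in the traceback, which helps when the package is installed but fails to import for some other reason.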

Meanwhile, side effects like this are highly discouraged:

import os
import sys

try:
    import openai
except ImportError:
    # Side effect: silently modifies the user's environment at import time
    os.system(f"{sys.executable} -m pip install openai")
    import openai

@SigireddyBalasai
Contributor

SigireddyBalasai commented Dec 18, 2024

I think something like this:

import importlib
import importlib.metadata
import subprocess
import sys


def _pip(*args):
    """Run pip for the current interpreter and return its exit code."""
    # pip has no supported in-process API (pip.main is internal), so call it as a subprocess.
    return subprocess.call([sys.executable, "-m", "pip", *args])


def check_and_install_module(module_name, package_name=None, check_for_updates=False):
    """
    Check if a Python module is installed. Optionally check and perform updates.

    Args:
        module_name (str): Name of the module to import (e.g. "yaml")
        package_name (str, optional): Distribution name on PyPI if it differs
            from the module name (e.g. "PyYAML"). Defaults to module_name.
        check_for_updates (bool, optional): Whether to check and perform updates. Defaults to False.

    Returns:
        dict: A dictionary with installation/update status
    """
    package_name = package_name or module_name
    try:
        # Try to import the module
        importlib.import_module(module_name)
        print(f"Module {module_name} is already installed.")

        # Check for updates if requested
        if check_for_updates:
            try:
                # Get current installed version (importlib.metadata replaces pkg_resources)
                current_version = importlib.metadata.version(package_name)

                # Perform update
                print(f"Updating {package_name}...")
                if _pip("install", "--upgrade", package_name) == 0:
                    # Read the new version from disk; already-imported code is not reloaded
                    new_version = importlib.metadata.version(package_name)
                    print(f"{package_name}: {current_version} -> {new_version}")
                    return {
                        'installed': True,
                        'updated': new_version != current_version,
                        'old_version': current_version,
                        'new_version': new_version
                    }
                print(f"Failed to update {package_name}")

            except Exception as update_error:
                print(f"Error checking/updating {package_name}: {update_error}")

        return {'installed': True, 'updated': False}

    except ImportError:
        print(f"Module {module_name} not found. Attempting to install...")

        try:
            # Use pip to install the package
            if _pip("install", package_name) == 0:
                # Make the import system notice the newly installed package
                importlib.invalidate_caches()
                # Verify the module is now importable
                importlib.import_module(module_name)
                print(f"Successfully installed {package_name}")
                return {'installed': True, 'updated': False}
            print(f"Failed to install {package_name}")

        except Exception as e:
            print(f"Failed to install {package_name}. Error: {e}")

        return {'installed': False, 'updated': False}

# Example usage
if __name__ == "__main__":
    # Check and install pandas
    print(check_and_install_module('pandas'))

    # Check, install, and update requests
    print(check_and_install_module('requests', check_for_updates=True))

@robfitzgerald

👍 to the idea of using optional dependencies. I wanted to try using markitdown as a global install (pip install -U markitdown) in my base Python environment, but when I did that, I got the whole kitchen sink:

Installing collected packages: pytz, pydub, puremagic, XlsxWriter, tzdata, speechrecognition, soupsieve, sniffio, pydantic-core, Pillow, pathvalidate, numpy, lxml, jiter, h11, et-xmlfile, defusedxml, cobble, annotated-types, youtube-transcript-api, python-pptx, pydantic, pandas, openpyxl, mammoth, httpcore, cryptography, beautifulsoup4, anyio, pdfminer-six, markdownify, httpx, openai, markitdown

There are a lot of users out there who won't have the system privileges to bring in this many dependencies (or who will stay away because it doesn't make sense). It seems like it should be possible to install just the minimum set needed for the minimum stated functionality: "MarkItDown is a utility for converting various files to Markdown."
