Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Both the parsers in vast_utils.py are broken. #43

Open
fake-name opened this issue Jun 24, 2024 · 3 comments
Open

Both the parsers in vast_utils.py are broken. #43

fake-name opened this issue Jun 24, 2024 · 3 comments

Comments

@fake-name
Copy link

I'm trying to figure out how the parse_env() function works. So far, it seems like it doesn't work for anything:

from vastai_client.vast_utils import parse_env
env_strs = [
"'-e INTERRO_USER=mlbox_2'",
"'-e ASR_MODEL=base'",
]


for env_str in env_strs:
	ret = parse_env(env_str)

	print("Env: %s -> '%s'" % (env_str, ret))

returns:

Env: '-e INTERRO_USER=mlbox_2' -> '{}'
Env: '-e ASR_MODEL=base' -> '{}'

The ASR_MODEL example string is literally from the docstring of the function. Nothing I do seems to ever cause it to return anything but an empty dict.

@fake-name
Copy link
Author

fake-name commented Jun 24, 2024

Ok, so there are a BUNCH of completely valid environment variable options that are broken.

  • Leading spaces (completely valid) break the parsing.
  • Multiple spaces between any arguments break the parsing.
  • Surrounding the string in quotes (like the vast.ai documentation and the function docstring say you should use) breaks the parsing.
  • Dots in environment variables (One of my env variables is a URL) breaks the parsing.
  • Quoting the value for your environment variable breaks the parsing.
  • Spaces in the env variables (which are completely valid if quoted) break the parsing.

It seems like the parser thingie here was kind of ad-hoc for a single use case, and it completely falls over if you do anything else.

@fake-name
Copy link
Author

fake-name commented Jun 24, 2024

Here is a rewritten parse_env() that should handle most valid things you can pass as environment variables:

import shlex
import itertools


def parse_env(envs: str | None) -> dict[str, str]:
	"""Parse the environment variables provided to the docker container.

	Args:
		envs (str | None): String with environment variables: "'-e ASR_MODEL=base'".

	Returns
	-------
		dict[str, str]: The parsed environment variables.
	"""

	result: dict[str, str] = {}
	if envs is None:
		return result

	# Handle wrapping quotes because the docs suck
	if (envs.strip().startswith("'") and envs.strip().endswith("'")) or \
		(envs.strip().startswith('"') and envs.strip().endswith('"')):
		envs = shlex.split(envs)

		assert len(envs) == 1, "Quoted envs argument did not parse into a single item after doing initial quote unwrap"
		envs = envs[0]

	split = shlex.split(envs)

	for first, second in itertools.pairwise(split):
		if first == '-e':

			name, value = second.split('=', 1)
			result[name] = value

		if first == '-p':
			if set(second).issubset(set("0123456789:")):
				result["-p " + second] = "1"


	return result

It mainly fixes things by virtue of delegating all the shell command line parsing complexity to shlex, which is part of the stdlib and exists for literally this exact issue.

@fake-name fake-name changed the title Environment variables just..... do not work. Both the parsers in vast_utils.py are broken. Jun 24, 2024
@fake-name
Copy link
Author

fake-name commented Jun 24, 2024

Ok, so parse_query() is ALSO broken:


query_strs = [
	"num_gpus=1 gpu_name in [RTX_6000Ada, A40] gpu_ram>40 rentable=true verified=false",
]


for env_str in query_strs:
	ret = parse_query(env_str)

	print("Query: '%s' -> '%s'" % (env_str, ret))
python3 env_wat.py
Traceback (most recent call last):
  File "/media/Scripts/book_cat/env_wat.py", line 319, in <module>
    ret = parse_query(env_str)
          ^^^^^^^^^^^^^^^^^^^^
  File "/media/Scripts/book_cat/env_wat.py", line 286, in parse_query
    v[op_name] = value.replace('_', ' ')
                 ^^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'replace'

It appears that the in [list] syntax just causes it to barf and fail.

This query string works fine for the vastai command line interface. Looking at the regexes in parse_query() I think there was some intention to support the in syntax, but in any event it doesn't work.

Edit: No, it's just broken period. num_gpus=1 gpu_name=RTX_6000Ada gpu_ram>40 rentable=true verified=false ALSO fails to parse.

fake-name pushed a commit to fake-name/vastai_client that referenced this issue Jun 27, 2024
This replaces the hand-rolled, broken shell argument parser with `shlex`, which has a full argument parser that handles all the weird corner cases of command line arguments.
Note that `parse_query()` is still broken (see Barahlush#43).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant