Issue with running HEIM #3080

Open
sudhir-mcw opened this issue Oct 22, 2024 · 3 comments
Labels: documentation, HEIM (Text2Image), user question

sudhir-mcw commented Oct 22, 2024

Hi @teetone,
I am trying out HEIM with the command below (taken from the HEIM documentation) and am running into the following issue:

helm-run --run-entries mscoco:model=huggingface/stable-diffusion-v1-4 --suite my-heim-suite --max-eval-instances 1

HuggingFaceDiffusersClient error: Failed to import diffusers.pipelines.stable_diffusion because of the following error (look up to see its traceback):
'Config' object has no attribute 'define_bool_state'
Request failed. Retrying (attempt #2) in 10 seconds... (See above for error details)

File "helm/src/helm/benchmark/window_services/window_service_factory.py", line 17, in get_window_service
model_deployment: Optional[ModelDeployment] = get_model_deployment(model_deployment_name)
File "helm/src/helm/benchmark/model_deployment_registry.py", line 132, in get_model_deployment
raise ValueError(f"Model deployment {name} not found")
ValueError: Model deployment openai/clip-vit-large-patch14 not found

0%| | 0/1 [00:35<?, ?it/s]
} [37.279s]
Traceback (most recent call last):
  File "helm/src/helm/benchmark/run.py", line 380, in <module>
    main()
  File "helm/src/helm/common/hierarchical_logger.py", line 104, in wrapper
    return fn(*args, **kwargs)
  File "helm/src/helm/benchmark/run.py", line 351, in main
    run_benchmarking(
  File "helm/src/helm/benchmark/run.py", line 128, in run_benchmarking
    runner.run_all(run_specs)
  File "helm/src/helm/benchmark/runner.py", line 226, in run_all
    raise RunnerError(f"Failed runs: [{failed_runs_str}]")
helm.benchmark.runner.RunnerError: Failed runs: ["mscoco:model=huggingface_stable-diffusion-v1-4"]

Here is some information about my setup:
conda environment with Python 3.9.20
I installed HEIM from source rather than from pip, since the pip version was taking a long time to resolve dependencies.
Here are the steps I used to install:

cd helm
pip install -r requirements.txt
pip install -e .[all]
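
For completeness, the conda environment itself was set up roughly like this (the environment name matches the paths in the logs below; exact flags assumed):

conda create -n crfm-helm python=3.9
conda activate crfm-helm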

I checked the community forum and also tried bumping jax to a newer version, but still no luck:

jax==0.4.30
jaxlib==0.4.30
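
(From what I have read, the 'define_bool_state' error usually means flax is calling a jax config API that was removed from newer jax releases, so the installed jax/jaxlib/flax combination matters. A quick way to list the installed versions, assuming a Unix shell:)

pip list | grep -E "jax|flax|diffusers"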

Is there any other installation or quick-start documentation for HEIM apart from heim.md in the docs?

sudhir-mcw changed the title from "Issues with running HEIM" to "Issue with running HEIM" on Oct 22, 2024
yifanmai (Collaborator)

The likely cause is that you have not run install-heim-extras.sh as explained in the HEIM docs; could you try that and see if that fixes things?

Sorry that this was not clearly explained in the documentation; I've updated it to make things clearer.
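
For reference, the extra step looks roughly like this (assuming install-heim-extras.sh sits at the root of the helm repository, as described in the HEIM docs):

cd helm
bash install-heim-extras.sh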

yifanmai added the documentation, user question, and HEIM (Text2Image) labels on Oct 22, 2024
sudhir-mcw (Author)

Hi @yifanmai, thanks for the reply.
I tried again after running install-heim-extras.sh, but the process is now interrupted with the following error:

AestheticsMetric() {
    Parallelizing computation on 1 items over 4 threads {

100%|██████████| 1/1 [00:08<00:00, 8.58s/it]
} [8.579s]
} [8.58s]
CLIPScoreMetric(multilingual=False) {
Parallelizing computation on 1 items over 4 threads {
0%| | 0/1 [00:00<?, ?it/s]
} [0.002s]
} [0.002s]
} [14.125s]
} [6m14.466s]
Error when running mscoco:model=huggingface_stable-diffusion-v1-4:
Traceback (most recent call last):
  File "/mnt/gpu-perf-test-storage/sudhir/helm/src/helm/benchmark/runner.py", line 216, in run_all
    self.run_one(run_spec)
  File "/mnt/gpu-perf-test-storage/sudhir/helm/src/helm/benchmark/runner.py", line 307, in run_one
    metric_result: MetricResult = metric.evaluate(
  File "/mnt/gpu-perf-test-storage/sudhir/helm/src/helm/benchmark/metrics/metric.py", line 143, in evaluate
    results: List[List[Stat]] = parallel_map(
  File "/mnt/gpu-perf-test-storage/sudhir/helm/src/helm/common/general.py", line 235, in parallel_map
    results = list(tqdm(executor.map(process, items), total=len(items), disable=None))
  File "/mnt/gpu-perf-test-storage/sudhir/miniconda3/envs/crfm-helm/lib/python3.9/site-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/mnt/gpu-perf-test-storage/sudhir/miniconda3/envs/crfm-helm/lib/python3.9/concurrent/futures/_base.py", line 609, in result_iterator
    yield fs.pop().result()
  File "/mnt/gpu-perf-test-storage/sudhir/miniconda3/envs/crfm-helm/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/mnt/gpu-perf-test-storage/sudhir/miniconda3/envs/crfm-helm/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/mnt/gpu-perf-test-storage/sudhir/miniconda3/envs/crfm-helm/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/mnt/gpu-perf-test-storage/sudhir/helm/src/helm/benchmark/metrics/metric.py", line 77, in process
    self.metric.evaluate_generation(
  File "/mnt/gpu-perf-test-storage/sudhir/helm/src/helm/benchmark/metrics/image_generation/clip_score_metrics.py", line 58, in evaluate_generation
    prompt = WindowServiceFactory.get_window_service(model, metric_service).truncate_from_right(prompt)
  File "/mnt/gpu-perf-test-storage/sudhir/helm/src/helm/benchmark/window_services/window_service_factory.py", line 17, in get_window_service
    model_deployment: Optional[ModelDeployment] = get_model_deployment(model_deployment_name)
  File "/mnt/gpu-perf-test-storage/sudhir/helm/src/helm/benchmark/model_deployment_registry.py", line 130, in get_model_deployment
    raise ValueError(f"Model deployment {name} not found")
ValueError: Model deployment openai/clip-vit-large-patch14 not found

100%|██████████| 1/1 [06:14<00:00, 374.49s/it]
} [6m21.356s]
Traceback (most recent call last):
  File "/mnt/gpu-perf-test-storage/sudhir/miniconda3/envs/crfm-helm/bin/helm-run", line 8, in <module>
    sys.exit(main())
  File "/mnt/gpu-perf-test-storage/sudhir/helm/src/helm/common/hierarchical_logger.py", line 104, in wrapper
    return fn(*args, **kwargs)
  File "/mnt/gpu-perf-test-storage/sudhir/helm/src/helm/benchmark/run.py", line 350, in main
    run_benchmarking(
  File "/mnt/gpu-perf-test-storage/sudhir/helm/src/helm/benchmark/run.py", line 127, in run_benchmarking
    runner.run_all(run_specs)
  File "/mnt/gpu-perf-test-storage/sudhir/helm/src/helm/benchmark/runner.py", line 225, in run_all
    raise RunnerError(f"Failed runs: [{failed_runs_str}]")
helm.benchmark.runner.RunnerError: Failed runs: ["mscoco:model=huggingface_stable-diffusion-v1-4"]

It runs fine up through the aesthetics metric but stops at the CLIP score calculation. Is there any configuration I am missing?
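
In case it helps with debugging, here is a minimal check of whether that deployment is visible to the registry, reusing the lookup function from the traceback above (depending on the HELM version, the built-in configs may need to be loaded first):

python -c "from helm.benchmark.model_deployment_registry import get_model_deployment; print(get_model_deployment('openai/clip-vit-large-patch14'))"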

yifanmai (Collaborator)

I'm able to reproduce this myself. @teetone would you know what's happening here?
