convert-hf-to-gguf.py Qwen-72B-Chat model conversion gets Killed #5156

Closed
dfengpo opened this issue Jan 27, 2024 · 12 comments
Labels
bug (Something isn't working), stale

Comments

dfengpo commented Jan 27, 2024

I ran python convert-hf-to-gguf.py /Qwen-72B-Chat and I am getting the same error:
blk.33.ffn_down.weight, n_dims = 2, torch.bfloat16 --> float16
blk.33.ffn_up.weight, n_dims = 2, torch.bfloat16 --> float16
blk.33.ffn_gate.weight, n_dims = 2, torch.bfloat16 --> float16
blk.34.attn_qkv.bias, n_dims = 1, torch.bfloat16 --> float32
blk.34.attn_qkv.weight, n_dims = 2, torch.bfloat16 --> float16
blk.34.attn_output.weight, n_dims = 2, torch.bfloat16 --> float16
blk.34.attn_norm.weight, n_dims = 1, torch.bfloat16 --> float32
blk.34.ffn_norm.weight, n_dims = 1, torch.bfloat16 --> float32
blk.34.ffn_up.weight, n_dims = 2, torch.bfloat16 --> float16
blk.34.ffn_down.weight, n_dims = 2, torch.bfloat16 --> float16
blk.34.ffn_gate.weight, n_dims = 2, torch.bfloat16 --> float16
blk.35.attn_qkv.bias, n_dims = 1, torch.bfloat16 --> float32
blk.35.attn_qkv.weight, n_dims = 2, torch.bfloat16 --> float16
blk.35.attn_output.weight, n_dims = 2, torch.bfloat16 --> float16
blk.35.attn_norm.weight, n_dims = 1, torch.bfloat16 --> float32
blk.35.ffn_norm.weight, n_dims = 1, torch.bfloat16 --> float32
Killed

What does “Killed” mean here?
@ggerganov @slaren @prusnak

prusnak (Collaborator) commented Jan 27, 2024

I assume you have run out of memory. How much RAM do you have?

dfengpo (Author) commented Jan 27, 2024

I assume you have run out of memory. How much RAM do you have?

I have 64 GB of RAM and 32 CPU cores.

ngxson (Collaborator) commented Jan 27, 2024

The process was most likely killed by the kernel's OOM killer because the system ran out of memory: https://stackoverflow.com/questions/726690/what-killed-my-process-and-why
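
As a quick sanity check before re-running the conversion, here is a minimal sketch (assuming Linux and the /Qwen-72B-Chat path from the report above; not part of the llama.cpp scripts) that compares the model's on-disk size with the RAM the kernel currently reports as available:

    import pathlib

    # Hypothetical check: total size of the model files vs. MemAvailable from /proc/meminfo.
    model_dir = pathlib.Path("/Qwen-72B-Chat")  # path taken from the report above
    model_bytes = sum(p.stat().st_size for p in model_dir.rglob("*") if p.is_file())

    with open("/proc/meminfo") as f:
        meminfo = dict(line.split(":", 1) for line in f)
    avail_gib = int(meminfo["MemAvailable"].split()[0]) / 2**20  # MemAvailable is reported in kB

    print(f"model on disk: {model_bytes / 2**30:.1f} GiB, "
          f"RAM available: {avail_gib:.1f} GiB")

If the model files are much larger than the available memory and the converter materializes everything at once, an OOM kill is the expected outcome.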

Galunid (Collaborator) commented Jan 27, 2024

It looks like a bug. The offending line is

model_kv = dict(self.get_tensors())

which makes the script load the whole (80 GB+) model into memory instead of using mmap from torch.

@lmxin123 Could you try the script with the changes below?

diff --git a/convert-hf-to-gguf.py b/convert-hf-to-gguf.py
index 7a0a8c3d..8cef8429 100755
--- a/convert-hf-to-gguf.py
+++ b/convert-hf-to-gguf.py
@@ -996,9 +996,8 @@ class QwenModel(Model):
 
     def write_tensors(self):
         block_count = self.hparams["num_hidden_layers"]
-        model_kv = dict(self.get_tensors())
         tensor_map = gguf.get_tensor_name_map(self.model_arch, block_count)
-        for name, data_torch in model_kv.items():
+        for name, data_torch in self.get_tensors():
             # we don't need these
             if name.endswith(".rotary_emb.inv_freq"):
                 continue
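
For anyone who wants to see the difference in isolation, here is a minimal standalone sketch (not the actual convert-hf-to-gguf.py code; get_tensors below is a stand-in for Model.get_tensors()): wrapping the generator in dict() drains it up front, so every tensor is resident in RAM at once, while iterating it directly keeps only the current tensor alive.

    def get_tensors():
        # Stand-in for Model.get_tensors(): yields (name, tensor) pairs one at a
        # time, e.g. backed by mmap'ed or lazily loaded checkpoint shards.
        for i in range(3):
            yield f"blk.{i}.ffn_up.weight", bytearray(1024)  # placeholder "tensor"

    # Old behaviour: dict() drains the generator, so all tensors sit in memory
    # before the first one is processed.
    model_kv = dict(get_tensors())
    for name, data in model_kv.items():
        pass  # every tensor is still referenced by model_kv here

    # Patched behaviour: each tensor can be processed and garbage-collected
    # before the next one is loaded.
    for name, data in get_tensors():
        pass  # only the current tensor is held at a time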

Galunid added the bug label and removed the bug-unconfirmed label on Jan 27, 2024
dfengpo (Author) commented Jan 28, 2024

(Quoting @Galunid's suggested patch above.)

Thank you for your response. Unfortunately I don't know Python, so I'm unable to test your modifications. I look forward to an official update with the fix.

timopb commented Jan 28, 2024

(Quoting @Galunid's suggested patch above.)

I'm having the same issue converting falcon-40b on a machine with 24 GB of RAM; the process gets killed, most likely due to lack of memory. I applied the patch from your response, but unfortunately it didn't help.

Galunid (Collaborator) commented Jan 28, 2024

@lmxin123 Could you try with this script? https://gist.github.com/Galunid/c169dd4078c9cb11e8d8a4a8888eab2b
Just copy its contents into convert-hf-to-gguf.py and run it as you normally would.

Galunid (Collaborator) commented Jan 28, 2024

@timopb Falcon is a separate issue; the patch above does not apply to it.

arch-btw (Contributor) commented Feb 4, 2024

I'm having the same problem even with the new script by @Galunid.

It's not just loading the original model into RAM; it's also writing the new model to RAM first instead of writing it to disk.
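
To illustrate that second point with a generic sketch (plain Python and numpy, not the gguf-py writer API): buffering every converted tensor before writing keeps the whole output in RAM, while streaming each tensor to the output file keeps peak memory near the size of a single tensor.

    import numpy as np

    def converted_tensors():
        # Hypothetical source of already-converted tensors.
        for i in range(3):
            yield f"blk.{i}.ffn_up.weight", np.zeros((4, 4), dtype=np.float16)

    # Buffered: peak RAM grows with the full size of the output model.
    buffered = [data.tobytes() for _, data in converted_tensors()]
    with open("model-buffered.bin", "wb") as f:
        for blob in buffered:
            f.write(blob)

    # Streamed: each tensor is written (and can be freed) as soon as it is converted.
    with open("model-streamed.bin", "wb") as f:
        for _, data in converted_tensors():
            f.write(data.tobytes())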

github-actions bot commented Mar 18, 2024

This issue is stale because it has been open for 30 days with no activity.

github-actions bot added the stale label on Mar 18, 2024

github-actions bot commented Apr 2, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions bot closed this as completed on Apr 2, 2024

okwinds commented Apr 6, 2024

This is most likely an out-of-memory (OOM) kill. You can work around it by adding swap space: create a swap file and enable it as virtual memory. Here's how:

  1. First, create a swap file sized according to your model. For example, to allocate 20 GB (20G):

    sudo fallocate -l 20G /swapfile

  2. Next, change the permissions of the swap file to ensure it is only accessible by the root user:

    sudo chmod 600 /swapfile

  3. Now, turn the file into a swap area that the system can use:

    sudo mkswap /swapfile

  4. Activate the swap file so that it is ready for use:

    sudo swapon /swapfile

With the swap file in place, you should be able to convert your model without hitting an OOM error.

After the conversion has finished successfully, you can disable the swap file and delete it to free up the space. Here's how:

  1. Turn off the swap file:

    sudo swapoff /swapfile

  2. Remove the swap file:

    sudo rm /swapfile

By following these steps, you can manage your system's memory resources and avoid OOM errors during model conversion.

By the way, you can use the free -h command to check your memory and swap status and confirm the swap file has been set up successfully.
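
For a programmatic version of that check (a minimal sketch, assuming Linux and the /swapfile path used above), the same information can be read from /proc/swaps:

    # List active swap areas and confirm /swapfile is among them.
    with open("/proc/swaps") as f:
        lines = f.read().splitlines()

    print(lines[0])  # header: Filename  Type  Size  Used  Priority
    active = [line for line in lines[1:] if line.startswith("/swapfile")]
    print("swapfile is active" if active else "swapfile is NOT active")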

Good luck~
