About Downloading with PyDrive #10

Open
xuedue opened this issue Apr 12, 2021 · 19 comments

Comments

@xuedue

xuedue commented Apr 12, 2021

Hello author, thank you for sharing.

I ran into the "quota exceeded" error and want to download with PyDrive. But in Step 2, after clicking to enable the Drive API, I can't find the "Desktop app" option:
[screenshot]
I want to know how to select "Desktop app" and download the client configuration.

Thanks again.

@royorel
Owner

royorel commented Apr 12, 2021

Hi @xuedue,

It looks like Google has updated that page. Just follow these instructions in the prerequisites section:
[screenshot]

Specifically bullets 3 & 4. What PyDrive really needs is the credentials file.

I will update the readme file accordingly soon; sorry about the confusion.
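For reference, PyDrive's `GoogleAuth` reads the OAuth client configuration from a file named `client_secrets.json` in the working directory by default, and a client created with the "Desktop app" application type is saved as JSON with a top-level `"installed"` key (a web client uses `"web"`). The sketch below is a hypothetical sanity check (`load_client_secrets` and `client_type` are illustrative names, not part of PyDrive or this repository); it only inspects the JSON and does not contact Google.

```python
import json
from pathlib import Path


def load_client_secrets(path="client_secrets.json"):
    """Read the downloaded OAuth client configuration (hypothetical helper)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def client_type(config):
    """Return "installed" (Desktop app) or "web"; raise ValueError otherwise."""
    for kind in ("installed", "web"):
        if kind in config:
            # Both client types must carry an ID and a secret.
            missing = [k for k in ("client_id", "client_secret") if k not in config[kind]]
            if missing:
                raise ValueError(f"'{kind}' section is missing {missing}")
            return kind
    # An API key or a service-account key has neither section.
    raise ValueError("no 'installed' or 'web' section; not an OAuth client configuration")
```

Running `client_type(load_client_secrets())` from the directory you start the script in should return `"installed"` for a Desktop app client.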

@xuedue
Author

xuedue commented Apr 13, 2021

Thank you for your reply.

I have downloaded client_secrets.json.
[screenshot]
[screenshot]

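After adding the --pydrive flag, I ran the script but hit two errors.

Sometimes the error is TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
[screenshot]

Sometimes it is the error mentioned in another issue, ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host.

[screenshot]

I tried many times, but I still don't know how to solve this problem.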
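Both errors come from the operating system's socket layer (WinError 10060 is a connection timeout, 10054 a reset by the remote end), so they point at the network path rather than the script. If they are only intermittent, one workaround is to retry each file a few times with exponential backoff. The wrapper below is a minimal sketch; `with_retries` and its parameters are hypothetical, not part of this repository. It catches Python's built-in `TimeoutError` and `ConnectionError` (which covers `ConnectionResetError`), as seen in the tracebacks above; errors wrapped by `requests` would need to be added, and no amount of retrying helps if a firewall blocks Google Drive outright.

```python
import time


def with_retries(fn, *args, attempts=5, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs), retrying on transient connection errors.

    Hypothetical helper: waits base_delay, 2*base_delay, 4*base_delay, ...
    between attempts and re-raises the last error once attempts run out.
    """
    for attempt in range(attempts):
        try:
            return fn(*args, **kwargs)
        except (TimeoutError, ConnectionError):
            if attempt == attempts - 1:
                raise  # give up: the problem is probably not transient
            sleep(base_delay * 2 ** attempt)
```

For example, the PyDrive call in the download thread could be wrapped as `with_retries(pydrive_utils.pydrive_download, drive, spec['file_url'], spec['file_path'])`.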

@royorel
Owner

royorel commented Apr 13, 2021

@xuedue

These errors seem like local connection or firewall issues; there is nothing I can do about it.

@xuedue
Author

xuedue commented Apr 14, 2021

> @xuedue
>
> These errors seem like local connection or firewall issues; there is nothing I can do about it.

Thanks for your reply. I have resumed the download now, but is this dataset really as large as shown below?

[screenshot]

Assuming each image is about 100 KB, the dataset should be nearly 7 GB.
[screenshot]
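The estimate is simple arithmetic. As a rough check, assuming FFHQ's 70,000 images and an average of about 100 KB per 256x256 image (the poster's assumption, not a measured figure):

```python
NUM_IMAGES = 70_000      # FFHQ contains 70,000 images
AVG_BYTES = 100 * 1000   # assumed ~100 KB per 256x256 image

total_bytes = NUM_IMAGES * AVG_BYTES
print(f"{total_bytes / 1e9:.1f} GB ({total_bytes / 2**30:.2f} GiB)")  # 7.0 GB (6.52 GiB)
```

Note the difference between decimal gigabytes and the binary gibibytes that Windows Explorer reports as "GB".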

@royorel
Owner

royorel commented Apr 14, 2021

@xuedue,

The script downloads the in-the-wild images of the original FFHQ dataset because we take slightly larger crops than the original FFHQ dataset. The downloaded in-the-wild images are deleted after processing and aren't kept, but their overall size is indeed 0.93TB. The final dataset size depends on the final image resolution (defined by the user): if you save 1024x1024 images, the size will be about 90GB, slightly larger than the original FFHQ dataset because of the segmentation maps. For 256x256 resolution, the final size should indeed be around 7.5GB.

@xuedue
Author

xuedue commented Apr 14, 2021

> @xuedue,
>
> The script downloads the in-the-wild images of the original FFHQ dataset because we take slightly larger crops than the original FFHQ dataset. The downloaded in-the-wild images are deleted after processing and aren't kept. The final dataset size depends on the final image resolution (defined by the user): if you save 1024x1024 images, the size will be about 90GB, slightly larger than the original FFHQ dataset because of the segmentation maps. For 256x256 resolution, the final size should indeed be around 7.5GB.

Thank you for your reply.

I just ran get_ffhq_aging.bat without any modification, so the dataset should be about 7.5GB after downloading.

[screenshot]

Sorry to ask for confirmation again.

I want to know whether my disk only needs more than 7.5GB of free space, and whether no images other than the saved 256x256 ones will be kept on my disk. I'm a little afraid that all 0.93TB of data will end up stored on my disk.

@royorel
Owner

royorel commented Apr 14, 2021

The data kept on your disk shouldn't exceed about 7.5 GB. You can look at the download script and see that each thread deletes the in-the-wild image right after processing is done:

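```python
import os
import queue
import sys

import requests
import pydrive_utils  # this repository's PyDrive helper module
# download_file() and align_in_the_wild_image() are defined in the same download script.


def _download_thread(spec_queue, exception_queue, stats, dst_dir, output_size, drive, download_kwargs):
    with requests.Session() as session:
        while True:
            try:
                # get_nowait() avoids blocking forever if another thread takes the
                # last item between an empty() check and a blocking get().
                spec = spec_queue.get_nowait()
            except queue.Empty:
                break
            try:
                if drive is not None:  # --pydrive: go through the Google Drive API
                    pydrive_utils.pydrive_download(drive, spec['file_url'], spec['file_path'])
                else:
                    download_file(session, spec, stats, **download_kwargs)
                if spec['file_path'].endswith('.png'):
                    # Crop/align the in-the-wild image, then delete the large original.
                    align_in_the_wild_image(spec, dst_dir, output_size)
                    os.remove(spec['file_path'])
            except Exception:
                exception_queue.put(sys.exc_info())  # passed back to the caller
            with stats['lock']:
                stats['files_done'] += 1
```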

So during download, the maximum number of in-the-wild images on your disk will be num_threads (default is 32).
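Putting the numbers from this thread together gives a rough estimate of peak disk usage during the download. The sketch below assumes the in-the-wild images average about 0.93 TB / 70,000 ≈ 13 MB each (individual files vary, so this is an average, not a worst case) and uses `shutil.disk_usage` to compare against the free space on the target drive; `peak_disk_bytes` and `enough_space` are hypothetical helpers, not part of the repository.

```python
import shutil

NUM_IMAGES = 70_000
IN_THE_WILD_TOTAL = 0.93e12  # ~0.93 TB of in-the-wild images (figure from this thread)
FINAL_256_TOTAL = 7.5e9      # ~7.5 GB final dataset at 256x256 (figure from this thread)


def peak_disk_bytes(num_threads=32, final_total=FINAL_256_TOTAL):
    """Rough peak: the finished dataset plus one in-flight original per thread."""
    avg_in_the_wild = IN_THE_WILD_TOTAL / NUM_IMAGES
    return final_total + num_threads * avg_in_the_wild


def enough_space(path=".", num_threads=32, margin=1.2):
    """True if the drive holding `path` has the estimate plus a 20% margin free."""
    return shutil.disk_usage(path).free >= margin * peak_disk_bytes(num_threads)


print(f"estimated peak: {peak_disk_bytes() / 1e9:.1f} GB")  # estimated peak: 7.9 GB
```

With the default 32 threads, the in-flight originals add only about 0.4 GB on average to the final 7.5 GB.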

@xuedue
Author

xuedue commented Apr 16, 2021

I am sorry to ask again.

When I used PyDrive to download, I encountered a problem. Partway through the download, the program reports the following error. I'm confused that the program doesn't report an error at the beginning but does so during the download.

I spent two days retrying the download and searching for solutions, but the problem is still unsolved.

[screenshot]

@royorel
Owner

royorel commented Apr 16, 2021

Hi @xuedue,

This also seems like a local machine issue that isn't related to the downloading code.

Googling your error message suggests it might be a proxy issue. Here is the most relevant result:
aws/aws-cli#5773
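If a proxy might be involved, a first step is to see which proxy settings Python picks up. `requests` honours the standard `HTTP_PROXY`/`HTTPS_PROXY`/`NO_PROXY` environment variables, and the standard library's `urllib.request.getproxies()` reports what it detects from the environment (and, on Windows, from the system Internet settings). The snippet below is a small diagnostic sketch; `describe_proxies` is a hypothetical name. PyDrive goes through `httplib2`, which applies its own proxy logic, so treat this as a first check rather than a guarantee of what every library will use.

```python
import urllib.request


def describe_proxies():
    """Print and return the proxies the standard library detects, e.g. {'https': 'http://host:port'}."""
    proxies = urllib.request.getproxies()
    if not proxies:
        print("No proxy detected; Python connects directly.")
    for scheme, url in sorted(proxies.items()):
        print(f"{scheme:>6} -> {url}")
    return proxies


describe_proxies()
```

If a school or campus proxy shows up here, it is worth asking the network administrators whether long-running HTTPS downloads from Google Drive are throttled or cut off.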

@xuedue
Author

xuedue commented Apr 17, 2021

> Hi @xuedue,
>
> This also seems like a local machine issue that isn't related to the downloading code.
>
> Googling your error message suggests it might be a proxy issue. Here is the most relevant result:
> aws/aws-cli#5773

Well, but I'm just using the school network.

I want to know whether the 256x256 resolution dataset differs from the original NVIDIA FFHQ dataset in anything other than resolution.

Could you provide the resize code?

If I resize the original FFHQ dataset images to 256x256, does that mean I have obtained the image data and annotation data (ffhq_aging_labels.csv) from your paper?

Sorry to bother you again.

@royorel
Owner

royorel commented Apr 17, 2021

> Well, but I'm just using the school network.

It might be an issue with your school's network, then.

> I want to know whether the 256x256 resolution dataset differs from the original NVIDIA FFHQ dataset in anything other than resolution.

There are differences in the dataset: we take larger crops, which is why we start from the in-the-wild images.

> Could you provide the resize code?

The alignment code is this function:

```python
def align_in_the_wild_image(spec, dst_dir, output_size, transform_size=4096, enable_padding=True):
    ...  # body omitted here; see the repository's download script
```
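For context, the alignment in NVIDIA's original FFHQ script derives an oriented crop square from facial landmarks: the eye centres and mouth corners fix the rotation, centre and size of the crop, and the image is then transformed to the output resolution. The sketch below reproduces only that crop-geometry step in plain Python, taking the four points directly instead of the 68 dlib landmarks. The `scale` parameter is a hypothetical knob added to illustrate a looser crop; the actual enlargement used by `align_in_the_wild_image` in FFHQ-Aging is not stated in this thread, so check that function for the real values.

```python
import math


def ffhq_crop_quad(eye_left, eye_right, mouth_left, mouth_right, scale=1.0):
    """Return the 4 corners of the oriented crop square (FFHQ-style geometry).

    Points are (x, y) in image coordinates (y grows downward). scale=1.0
    follows the original FFHQ rule; larger values give a looser crop
    (hypothetical parameter, not FFHQ-Aging's actual setting).
    """
    eye_avg = ((eye_left[0] + eye_right[0]) / 2, (eye_left[1] + eye_right[1]) / 2)
    eye_to_eye = (eye_right[0] - eye_left[0], eye_right[1] - eye_left[1])
    mouth_avg = ((mouth_left[0] + mouth_right[0]) / 2, (mouth_left[1] + mouth_right[1]) / 2)
    eye_to_mouth = (mouth_avg[0] - eye_avg[0], mouth_avg[1] - eye_avg[1])

    # Horizontal axis: eye-to-eye vector combined with eye-to-mouth rotated by 90 degrees.
    x = (eye_to_eye[0] + eye_to_mouth[1], eye_to_eye[1] - eye_to_mouth[0])
    norm = math.hypot(*x)
    half = max(math.hypot(*eye_to_eye) * 2.0, math.hypot(*eye_to_mouth) * 1.8) * scale
    x = (x[0] / norm * half, x[1] / norm * half)
    # Vertical axis: x rotated by 90 degrees, pointing down the face.
    y = (-x[1], x[0])
    # Centre sits slightly below the eyes, towards the mouth.
    c = (eye_avg[0] + eye_to_mouth[0] * 0.1, eye_avg[1] + eye_to_mouth[1] * 0.1)
    return [
        (c[0] - x[0] - y[0], c[1] - x[1] - y[1]),  # top-left
        (c[0] - x[0] + y[0], c[1] - x[1] + y[1]),  # bottom-left
        (c[0] + x[0] + y[0], c[1] + x[1] + y[1]),  # bottom-right
        (c[0] + x[0] - y[0], c[1] + x[1] - y[1]),  # top-right
    ]
```

This also shows why simply resizing the aligned FFHQ images cannot reproduce a larger crop: the extra border around the head is not in those images at all, so it has to come from the in-the-wild originals.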

> If I resize the original FFHQ dataset images to 256x256, does that mean I have obtained the image data and annotation data (ffhq_aging_labels.csv) from your paper?

No, since the crops are different. If you wish, you can run the segmentation code on the original FFHQ dataset to get the correct segmentation maps. The annotation data is correct regardless of the image size or cropping method (the age, gender and the rest of the labels don't change). However, since the crop size is different, you won't get exactly the same results as we got in the paper, because tighter crops don't capture the change in head shape over the years as well.

@woshixiaozhou
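
> @xuedue
> These errors seem like local connection or firewall issues; there is nothing I can do about it.
>
> Thanks for your reply. I have resumed the download now, but is this dataset really as large as shown below?
>
> [screenshot]
>
> Assuming each image is about 100 KB, the dataset should be nearly 7 GB.
> [screenshot]

I ran into the same problem. How did you solve it?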

@royorel
Owner

royorel commented Apr 18, 2021

@woshixiaozhou, please use English in comments.

@ahripanto

Hi there!
I have a similar issue. After creating an OAuth2 client ID and downloading client_secrets.json, I can authorize the script.
But I get the quota limit error:

```
googleapiclient.errors.HttpError: <HttpError 403 when requesting https://www.googleapis.com/drive/v2/files/1Tkyob6bsb0POmg8gg-XXXXXXXXX?alt=json returned "User Rate Limit Exceeded. Rate of requests for user exceed configured project quota. You may consider re-evaluating expected per-user traffic to the API and adjust project quota limits accordingly. You may monitor aggregate quota usage and adjust limits in the API Console: https://console.developers.google.com/apis/api/drive.googleapis.com/quotas?project=2260XXXXXX". Details: "[{'domain': 'usageLimits', 'reason': 'userRateLimitExceeded', 'message': 'User Rate Limit Exceeded. Rate of requests for user exceed configured project quota. You may consider re-evaluating expected per-user traffic to the API and adjust project quota limits accordingly. You may monitor aggregate quota usage and adjust limits in the API Console: https://console.developers.google.com/apis/api/drive.googleapis.com/quotas?project=2260XXXXXX', 'extendedHelp': 'https://console.developers.google.com/apis/api/drive.googleapis.com/quotas?project=2260XXXXX'}]">
```

It sounds like the script tries to access Tero Karras's Google Drive and not my own.

@royorel
Owner

royorel commented Apr 22, 2021

Hi @ahripanto,

What we saw initially, which led us to add the PyDrive option, was that we got a quota limit error with the original download script (which matches the script provided by Nvidia) even though we were able to manually download the same file from Nvidia's Google Drive. Using PyDrive eliminated this issue because it uses exactly the same API as Google Drive itself. However, if the quota limit is indeed exceeded and you can't download a file manually, PyDrive will not solve the issue. In that case it's a hard limit imposed by Google, and there's nothing to be done except wait for the quota to be released.
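When a download fails with an `HttpError` like the one above, the `reason` field in the error body indicates which limit was hit. A rate limit such as `userRateLimitExceeded` (the reason in the log above) is request throttling that Google's Drive API guidance says to handle by slowing down and retrying with exponential backoff, whereas a per-file reason such as `downloadQuotaExceeded` corresponds to the hard limit described here, where only waiting helps. The sketch below is a hypothetical helper (`error_reasons` and `should_retry` are illustrative names). It assumes the body has the usual Google API shape `{"error": {"errors": [{"reason": ...}]}}`, consistent with the details printed in the log; with `googleapiclient`, the raw body is available as the exception's `content` attribute.

```python
import json

# Reasons assumed to be transient throttling that is worth retrying with backoff.
RETRYABLE_REASONS = {"userRateLimitExceeded", "rateLimitExceeded"}


def error_reasons(body):
    """Extract the 'reason' codes from a Google API JSON error body (str or bytes)."""
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    error = json.loads(body).get("error")
    if not isinstance(error, dict):  # e.g. OAuth errors like {"error": "invalid_grant"}
        return []
    return [e.get("reason") for e in error.get("errors", [])]


def should_retry(body):
    """True for rate limiting; False for per-file quotas such as downloadQuotaExceeded."""
    return any(reason in RETRYABLE_REASONS for reason in error_reasons(body))
```

In a download loop this would look like `except HttpError as e: if should_retry(e.content): ...`, backing off before the next attempt and giving up otherwise.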

The script indeed tries to access Nvidia's Google Drive and not your own. The dataset is only shared with you; it's not actually located in your personal Google Drive. There are 2 reasons for that:

  1. Storing the full dataset would require about 1TB of space, and not everyone has that much in their personal Google Drive account.
  2. To avoid copyright issues. If a person requests to have their image deleted from the original FFHQ dataset (Nvidia provides that option), it will automatically be removed from our dataset as well. If you held a copy of the dataset, that image wouldn't be removed, and you (and we, for providing a script to do that) would be violating copyrights.

@ahripanto

Hi @royorel,
thanks for your explanation. I'm currently working on a solution to put the dataset in an AWS bucket and share it over torrent, to avoid this quota limit for all other users.
I will report back once I have successfully backed up this dataset.

@royorel
Owner

royorel commented Apr 26, 2021

@woshixiaozhou I'm asking once again to use English.

@wangtingwei1993

@xuedue
Hi,

It seems that you have downloaded the dataset successfully. Could you please share the image 50958.png (256x256 pixels) with me? This image is broken (0 bytes) in my case. Thanks a lot!

@royorel
Owner

royorel commented May 21, 2021

Hi @wangtingwei1993,

We cannot directly share images. However, the non-cropped in-the-wild image can be found at:
https://drive.google.com/file/d/13T3T9oVe0KfjRdjcOmLMjsPSQtK8ZlbX/view?usp=sharing

After downloading it, you can apply the rest of the script to it.
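A 0-byte file like this usually means a download or processing step was interrupted. Before regenerating anything, a quick scan for other empty or truncated PNGs in the output folder shows whether more files are affected. The sketch below checks each `.png` for the standard 8-byte PNG signature, which also catches empty files; `find_broken_pngs` is a hypothetical helper, and `root` would be whatever output directory you passed to the script.

```python
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # every valid PNG file starts with these 8 bytes


def find_broken_pngs(root):
    """Return sorted paths of .png files under root that are empty or lack a PNG header."""
    broken = []
    for path in Path(root).rglob("*.png"):
        with path.open("rb") as f:
            header = f.read(len(PNG_SIGNATURE))
        if header != PNG_SIGNATURE:  # a 0-byte file reads as b"" and fails here too
            broken.append(path)
    return sorted(broken)
```

This only validates the header; a file that was cut off midway through its image data would pass and needs a full decode (for example with an image library) to detect.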
