Cannot dump to bin after Yahoo data download finishes #1685

Open
chujun-L opened this issue Oct 27, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@chujun-L

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4681/4681 [8:14:52<00:00, 6.34s/it]
[]
2023-10-27 04:38:09.199 | INFO | data_collector.base:_collector:195 - error symbol nums: 0
2023-10-27 04:38:09.199 | INFO | data_collector.base:_collector:196 - current get symbol nums: 4681
2023-10-27 04:38:09.199 | INFO | data_collector.base:collector_data:209 - 1 finish.
2023-10-27 04:38:09.199 | INFO | data_collector.base:collector_data:216 - total 4681, error: 0
2023-10-27 04:38:09.199 | INFO | collector:download_index_data:236 - get bench data: csi300(000300)......
2023-10-27 04:38:14.459 | INFO | collector:download_index_data:236 - get bench data: csi100(000903)......
2023-10-27 04:38:19.705 | INFO | collector:download_index_data:236 - get bench data: csi500(000905)......
2023-10-27 04:38:24.962 | INFO | data_collector.utils:get_calendar_list:68 - get calendar list: ALL......
2023-10-27 04:38:36.805 | WARNING | data_collector.utils:wrapper:491 - _get_calendar: 1 :2008-05-->Expecting value: line 1 column 1 (char 0)
2023-10-27 04:38:58.022 | INFO | data_collector.utils:get_calendar_list:106 - end of get calendar list: ALL.
[43216:MainThread](2023-10-27 04:38:58,023) INFO - qlib.Initialization - [config.py:416] - default_conf: client.
[43216:MainThread](2023-10-27 04:38:58,460) INFO - qlib.Initialization - [init.py:74] - qlib successfully initialized based on client settings.
[43216:MainThread](2023-10-27 04:38:58,461) INFO - qlib.Initialization - [init.py:76] - data_path={'__DEFAULT_FREQ': PosixPath('/ranzhi/alex/qlib_data')}
2023-10-27 04:39:04.391 | INFO | data_collector.base:normalize:312 - normalize data......
0%| | 0/4474 [00:01<?, ?it/s]
[43216:MainThread](2023-10-27 04:39:30,975) ERROR - qlib.workflow - [utils.py:41] - An exception has been raised[TypeError: cannot convert the series to <class 'float'>].
File "scripts/data_collector/yahoo/collector.py", line 1223, in
fire.Fire(Run)
File "/ranzhi/miniconda3/envs/myqlib/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/ranzhi/miniconda3/envs/myqlib/lib/python3.8/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/ranzhi/miniconda3/envs/myqlib/lib/python3.8/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "scripts/data_collector/yahoo/collector.py", line 1198, in update_data_to_bin
self.normalize_data_1d_extend(qlib_data_1d_dir)
File "scripts/data_collector/yahoo/collector.py", line 1082, in normalize_data_1d_extend
yc.normalize()
File "/ranzhi/alex/qlib/scripts/data_collector/base.py", line 317, in normalize
for _ in worker.map(self._executor, file_list):
File "/ranzhi/miniconda3/envs/myqlib/lib/python3.8/concurrent/futures/process.py", line 484, in _chain_from_iterable_of_lists
for element in iterable:
File "/ranzhi/miniconda3/envs/myqlib/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
yield fs.pop().result()
File "/ranzhi/miniconda3/envs/myqlib/lib/python3.8/concurrent/futures/_base.py", line 444, in result
return self.__get_result()
File "/ranzhi/miniconda3/envs/myqlib/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
TypeError: cannot convert the series to <class 'float'>

@chujun-L chujun-L added the bug Something isn't working label Oct 27, 2023
@SunsetWolf
Collaborator

I downloaded the data first and then normalized it manually, and I could not reproduce the problem you describe. So I'm guessing you are using the update_data_to_bin method? That method is currently being modified; please wait until PR 1641 is merged and then retry it.

@ElonJustin7

Excuse me, do you know why the error "ValueError: 2002-01-->Expecting value: line 1 column 1 (char 0)" occurs when normalizing data downloaded from Yahoo Finance with the collector script? It seems to be raised by _get_calendar. Thanks!!
Screenshot 2023-11-23 21 35 49
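
For context on the message itself: "Expecting value: line 1 column 1 (char 0)" is the text Python's standard `json` module produces when asked to decode an empty or non-JSON string. A minimal sketch follows, assuming (not confirmed in this thread) that `_get_calendar` decodes a JSON response fetched over the network. The helper `parse_calendar_payload` is hypothetical and is not part of qlib.

```python
import json


def parse_calendar_payload(text: str):
    """Hypothetical helper (not a qlib function): decode a response body
    and report non-JSON input with a more descriptive error."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # An empty body or an HTML error page (for example from a proxy
        # or a blocked request) ends up here.
        raise ValueError(
            f"response is not JSON ({exc}); body starts with {text[:40]!r}"
        ) from exc


# Reproduce the message from the log with an empty response body.
try:
    json.loads("")
except json.JSONDecodeError as exc:
    message = str(exc)

print(message)
```

If that assumption holds, the error would point at the network response rather than at the downloaded CSV files, so checking what the calendar request actually returns may be a useful first step.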

@SunsetWolf
Collaborator

I noticed that you are asking many questions at the same time, scattered across different issues, and it is hard to understand why.
My attempt to normalize the data with the yahoo.collector script did not reproduce the issue you raised; the raw data I used covered 2000 to 2023.
Can you provide more information to help me reproduce this issue?

@ElonJustin7

Thank you for your attention! I later resolved the issue. Because of firewall restrictions, I need a VPN to download stock data from Yahoo Finance. However, with the VPN enabled, normalizing the data fails with the error mentioned above. Turning the VPN off lets normalization succeed. This means the update_data_to_bin method, which downloads and then normalizes in a single run, will hit a problem whether the VPN is on or off.

So, if you cannot access Yahoo Finance directly, my suggestion is to enable the VPN and download the data locally first, and then turn the VPN off before normalizing the data. :)
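
The two-step workaround above could be scripted roughly as follows. This is only a sketch under stated assumptions: the script path is taken from the traceback earlier in the thread, while the `download_data` and `normalize_data` subcommands and their `--source_dir`, `--normalize_dir`, `--region` and `--interval` flags are recalled from the collector's README and should be checked against your checkout (for example with `--help`). The function names `build_steps` and `run_step` are made up for illustration.

```python
import subprocess
import sys

# Script path as it appears in the traceback; adjust to your checkout.
COLLECTOR = "scripts/data_collector/yahoo/collector.py"


def build_steps(source_dir, normalize_dir, region="CN", interval="1d"):
    """Return (download_cmd, normalize_cmd) as argv lists.

    The subcommand and flag names are assumptions; verify them before use.
    """
    common = ["--region", region, "--interval", interval]
    download = [sys.executable, COLLECTOR, "download_data",
                "--source_dir", str(source_dir)] + common
    normalize = [sys.executable, COLLECTOR, "normalize_data",
                 "--source_dir", str(source_dir),
                 "--normalize_dir", str(normalize_dir)] + common
    return download, normalize


def run_step(cmd):
    """Run one step and raise CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True)


# Intended use, run by hand so the VPN can be toggled between the steps:
#   download, normalize = build_steps("data/source", "data/normalized")
#   run_step(download)    # VPN on
#   run_step(normalize)   # VPN off
```

Keeping the steps as separate commands is the point of the design: it leaves a gap in which the network setup can be changed, which a single update_data_to_bin run does not.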
