ValueError: Metadata mismatch found in from_delayed. #107

Open
wangjianqiao111 opened this issue May 29, 2024 · 2 comments

Comments

@wangjianqiao111

Please make sure that this is a bug.

System information

  • OS Platform and Distribution (e.g., CentOS 7.6): Linux
  • Python version: 3.10
  • HyperGBM version: 0.3.2
  • Other Python packages (run pip list):
    Package Version

anyio 4.3.0
argon2-cffi 23.1.0
argon2-cffi-bindings 21.2.0
arrow 1.3.0
asttokens 2.4.1
async-lru 2.0.4
attrs 23.2.0
Babel 2.15.0
bcrypt 4.1.3
beautifulsoup4 4.12.3
bleach 6.1.0
catboost 1.2.5
certifi 2024.2.2
cffi 1.16.0
charset-normalizer 3.3.2
click 8.1.7
cloudpickle 3.0.0
coloredlogs 15.0.1
comm 0.2.2
cryptography 3.4
cx-Oracle 8.3.0
cycler 0.12.1
dask 2024.5.1
dask-expr 1.1.1
dask-glm 0.3.2
dask-ml 2024.4.4
debugpy 1.8.1
decorator 5.1.1
defusedxml 0.7.1
dill 0.3.8
distributed 2024.5.1
exceptiongroup 1.2.1
executing 2.0.1
fastjsonschema 2.19.1
featuretools 1.31.0
Flask 1.1.2
Flask-Cors 3.0.10
Flask-OpenTracing 1.1.0
flatbuffers 1.12
fonttools 4.51.0
fqdn 1.5.1
fsspec 2024.5.0
ftputil 5.0.4
future 1.0.0
google 3.0.0
graphviz 0.20.3
greenlet 3.0.3
grpcio 1.63.0
grpcio-opentracing 1.1.4
grpcio-reflection 1.34.1
gunicorn 20.1.0
h11 0.14.0
holidays 0.48
httpcore 1.0.5
httpx 0.27.0
humanfriendly 10.0
hypergbm 0.3.2
hypernets 0.3.2
ibm-db 3.2.0
ibm-db-sa 0.4.0
idna 3.7
imbalanced-learn 0.12.2
importlib_metadata 7.1.0
importlib_resources 6.4.0
iniconfig 2.0.0
ipykernel 6.29.4
ipython 8.24.0
ipython-genutils 0.2.0
ipywidgets 8.1.2
isoduration 20.11.0
itsdangerous 2.2.0
jaeger-client 4.4.0
jedi 0.19.1
jieba 0.42.1
Jinja2 3.1.4
joblib 1.4.2
json5 0.9.25
jsonpointer 2.4
jsonschema 4.22.0
jsonschema-specifications 2023.12.1
jupyter_client 8.6.1
jupyter_core 5.7.2
jupyter-events 0.10.0
jupyter-lsp 2.2.5
jupyter_server 2.14.0
jupyter_server_terminals 0.5.3
jupyterlab 4.2.0
jupyterlab_pygments 0.3.0
jupyterlab_server 2.27.1
jupyterlab_widgets 3.0.10
kiwisolver 1.4.5
lightgbm 4.3.0
llvmlite 0.42.0
locket 1.0.0
MarkupSafe 2.1.5
matplotlib 3.5.3
matplotlib-inline 0.1.7
minio 7.1.17
mistune 3.0.2
mpmath 1.3.0
msgpack 1.0.8
multipledispatch 1.0.0
mysqlclient 2.1.1
nbclient 0.10.0
nbconvert 7.16.4
nbformat 5.10.4
nest-asyncio 1.6.0
nltk 3.8.1
notebook 7.2.0
notebook_shim 0.2.4
numba 0.59.1
numpy 1.26.4
onnx 1.16.0
onnxruntime 1.18.0
opencv-python 4.9.0.80
opentracing 2.4.0
overrides 7.7.0
packaging 24.0
pandas 2.2.2
pandocfilters 1.5.1
paramiko 3.4.0
parso 0.8.4
partd 1.4.2
pexpect 4.9.0
pika 1.3.2
pillow 10.3.0
pip 24.0
platformdirs 4.2.2
plotly 5.22.0
pluggy 1.5.0
prettytable 3.10.0
prometheus_client 0.20.0
prompt-toolkit 3.0.43
protobuf 3.19.0
psutil 5.9.8
psycopg2-binary 2.9.9
ptyprocess 0.7.0
pure-eval 0.2.2
pure-sasl 0.6.2
py4j 0.10.9.7
pyarrow 14.0.1
pycparser 2.22
pycryptodome 3.20.0
Pygments 2.18.0
pygraphviz 1.11
PyHive 0.7.0
pymssql 2.2.9
PyNaCl 1.5.0
pyparsing 3.1.2
pyrsistent 0.20.0
pyspark 3.5.1
pytest 8.2.1
python-dateutil 2.9.0.post0
python-json-logger 2.0.7
pytz 2024.1
PyYAML 5.4.1
pyzmq 26.0.3
referencing 0.35.1
regex 2024.5.15
requests 2.31.0
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rpds-py 0.18.1
sasl 0.3.1
scikit-learn 1.4.2
scikit-plot 0.3.7
scipy 1.13.0
seldon-core 1.10.0
semantic-version 2.10.0
Send2Trash 1.8.3
setuptools 69.2.0
setuptools-rust 1.9.0
shap 0.45.1
six 1.16.0
sklearn-pandas 2.2.0
sklearn2pmml 0.107.1
slicer 0.0.8
sniffio 1.3.1
sortedcontainers 2.4.0
soupsieve 2.5
sparse 0.15.2
SQLAlchemy 2.0.30
stack-data 0.6.3
sympy 1.12
tblib 3.0.0
tenacity 8.3.0
teradatasql 20.0.0.12
teradatasqlalchemy 20.0.0.1
terminado 0.18.1
threadloop 1.0.2
threadpoolctl 3.5.0
thrift 0.13.0
thrift-sasl 0.4.3
tinycss2 1.3.0
tomli 2.0.1
toolz 0.12.1
tornado 6.4
tqdm 4.66.4
traitlets 5.14.3
types-python-dateutil 2.9.0.20240316
typing_extensions 4.11.0
tzdata 2024.1
uri-template 1.3.0
urllib3 1.25.9
wcwidth 0.2.13
webcolors 1.13
webencodings 0.5.1
websocket-client 1.8.0
Werkzeug 3.0.3
wheel 0.43.0
widgetsnbextension 4.0.10
woodwork 0.31.0
xgboost 2.0.3
XlsxWriter 3.2.0
zict 3.0.0
zipp 3.18.2

Describe the current behavior
'''2024-05-29 14:55:30 [ERROR] Traceback (most recent call last):
2024-05-29 14:55:30 [ERROR] File "/opt/aps/python/lib/python3.10/runpy.py", line 196, in _run_module_as_main
2024-05-29 14:55:30 [ERROR] return _run_code(code, main_globals, None,
2024-05-29 14:55:30 [ERROR] File "/opt/aps/python/lib/python3.10/runpy.py", line 86, in _run_code
2024-05-29 14:55:30 [ERROR] exec(code, run_globals)
2024-05-29 14:55:30 [ERROR] File "/opt/pylib/dc_runtime.zip/datacanvas/shell.py", line 144, in
2024-05-29 14:55:30 [ERROR] File "/opt/pylib/dc_runtime.zip/datacanvas/shell.py", line 132, in
2024-05-29 14:55:30 [ERROR] File "/opt/pylib/dc_runtime.zip/datacanvas/shell.py", line 41, in get_args_func
2024-05-29 14:55:30 [ERROR] File "/opt/pylib/dc_runtime.zip/datacanvas/shell.py", line 73, in _execfile
2024-05-29 14:55:30 [ERROR] File "main.py", line 126, in
2024-05-29 14:55:30 [ERROR] step.fit(df_train=df_train, df_test=df_test)
2024-05-29 14:55:30 [ERROR] File "/opt/aps/workdir/code_120b1eff-da34-4ae4-b9eb-752dd0f776ba/hypergbm_step.py", line 131, in fit
2024-05-29 14:55:30 [ERROR] experiment = make_experiment(log_level='INFO', verbose=1, use_cache=False, **hypergbm_params_input)
2024-05-29 14:55:30 [ERROR] File "/opt/aps/python/lib/python3.10/site-packages/hypergbm/experiment.py", line 226, in make_experiment
2024-05-29 14:55:30 [ERROR] experiment = _make_experiment(hyper_model_cls, train_data,
2024-05-29 14:55:30 [ERROR] File "/opt/aps/python/lib/python3.10/site-packages/hypernets/experiment/_maker.py", line 378, in make_experiment
2024-05-29 14:55:30 [ERROR] id = hasher(dict(X_train=X_train, y_train=y_train, X_test=X_test, X_eval=X_eval, y_eval=y_eval,
2024-05-29 14:55:30 [ERROR] File "/opt/aps/python/lib/python3.10/site-packages/hypernets/tabular/data_hasher.py", line 20, in __call__
2024-05-29 14:55:30 [ERROR] for x in self._iter_data(data):
2024-05-29 14:55:30 [ERROR] File "/opt/aps/python/lib/python3.10/site-packages/hypernets/tabular/dask_ex/_data_hasher.py", line 21, in _iter_data
2024-05-29 14:55:30 [ERROR] yield from super()._iter_data(data)
2024-05-29 14:55:30 [ERROR] File "/opt/aps/python/lib/python3.10/site-packages/hypernets/tabular/data_hasher.py", line 58, in _iter_data
2024-05-29 14:55:30 [ERROR] yield from self._iter_data(v)
2024-05-29 14:55:30 [ERROR] File "/opt/aps/python/lib/python3.10/site-packages/hypernets/tabular/dask_ex/_data_hasher.py", line 15, in _iter_data
2024-05-29 14:55:30 [ERROR] yield from self._iter_dask_dataframe(data)
2024-05-29 14:55:30 [ERROR] File "/opt/aps/python/lib/python3.10/site-packages/hypernets/tabular/dask_ex/_data_hasher.py", line 30, in _iter_dask_dataframe
2024-05-29 14:55:30 [ERROR] meta={name: 'u8'}).compute()
2024-05-29 14:55:30 [ERROR] File "/opt/aps/python/lib/python3.10/site-packages/dask_expr/_collection.py", line 476, in compute
2024-05-29 14:55:30 [ERROR] return DaskMethodsMixin.compute(out, **kwargs)
2024-05-29 14:55:30 [ERROR] File "/opt/aps/python/lib/python3.10/site-packages/dask/base.py", line 375, in compute
2024-05-29 14:55:30 [ERROR] (result,) = compute(self, traverse=False, **kwargs)
2024-05-29 14:55:30 [ERROR] File "/opt/aps/python/lib/python3.10/site-packages/dask/base.py", line 661, in compute
2024-05-29 14:55:30 [ERROR] results = schedule(dsk, keys, **kwargs)
2024-05-29 14:55:30 [ERROR] File "/opt/aps/python/lib/python3.10/site-packages/dask/dataframe/utils.py", line 424, in check_meta
2024-05-29 14:55:30 [ERROR] raise ValueError(
2024-05-29 14:55:30 [ERROR] ValueError: Metadata mismatch found in from_delayed.
2024-05-29 14:55:30 [ERROR] Partition type: pandas.core.frame.DataFrame
2024-05-29 14:55:30 [ERROR] +-------------+--------+----------+
2024-05-29 14:55:30 [ERROR] | Column      | Found  | Expected |
2024-05-29 14:55:30 [ERROR] +-------------+--------+----------+
2024-05-29 14:55:30 [ERROR] | 'contact'   | object | string   |
2024-05-29 14:55:30 [ERROR] | 'default'   | object | string   |
2024-05-29 14:55:30 [ERROR] | 'education' | object | string   |
2024-05-29 14:55:30 [ERROR] | 'housing'   | object | string   |
2024-05-29 14:55:30 [ERROR] | 'job'       | object | string   |
2024-05-29 14:55:30 [ERROR] | 'loan'      | object | string   |
2024-05-29 14:55:30 [ERROR] | 'marital'   | object | string   |
2024-05-29 14:55:30 [ERROR] | 'month'     | object | string   |
2024-05-29 14:55:30 [ERROR] | 'poutcome'  | object | string   |
2024-05-29 14:55:30 [ERROR] | 'y'         | object | string   |
2024-05-29 14:55:30 [ERROR] +-------------+--------+----------+
2024-05-29 14:55:31 [ERROR] errorCode is 1'''
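
For anyone triaging a similar log, the mismatch table can be pulled out programmatically to see which columns are affected and how the dtypes differ. The following is only a rough sketch using the Python standard library; the helper name `parse_mismatches` and the shortened `sample_log` are made up for illustration and are not part of HyperGBM, Hypernets, or Dask.

```python
import re

# Matches one data row of the mismatch table, e.g. "| 'job' | object | string |".
# The header row is skipped because its first cell ("Column") is not quoted.
ROW = re.compile(
    r"\|\s*'(?P<column>[^']+)'\s*\|\s*(?P<found>\S+)\s*\|\s*(?P<expected>\S+)\s*\|"
)


def parse_mismatches(log_text):
    """Return {column: (found_dtype, expected_dtype)} for each table row (hypothetical helper)."""
    return {
        m.group("column"): (m.group("found"), m.group("expected"))
        for m in ROW.finditer(log_text)
    }


# A shortened excerpt of the log above, used only as sample input.
sample_log = """\
2024-05-29 14:55:30 [ERROR] | Column | Found | Expected |
2024-05-29 14:55:30 [ERROR] | 'contact' | object | string |
2024-05-29 14:55:30 [ERROR] | 'job' | object | string |
2024-05-29 14:55:30 [ERROR] | 'y' | object | string |
"""

mismatches = parse_mismatches(sample_log)
for column, (found, expected) in mismatches.items():
    print(f"{column}: partition has {found}, meta expects {expected}")
```

In this report every listed column shows the same pattern: the computed partition carries `object` while the expected metadata says `string`.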

Describe the expected behavior

'''The run completes successfully.'''
Standalone code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate
the problem. If possible, please share a link to Jupyter notebook.

Are you willing to submit PR?(Yes/No)

Other info / logs
Include any logs or source code that would be helpful to diagnose the problem.
If including tracebacks, please include the full traceback. Large logs and files
should be attached.

@wangjianqiao111
Author

The error above occurs when running HyperGBM with Dask on a binary classification task.

@lixfz
Collaborator

lixfz commented May 30, 2024

You can downgrade pandas to version 1.5.x and try again.
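
Before trying the downgrade, it may help to confirm which pandas version the failing environment actually has (the package list in the report shows pandas 2.2.2). The snippet below is a minimal sketch using only the standard library; the helpers `major_version` and `needs_downgrade` are hypothetical names, and the pip version specifier in the message is just one possible way to pin the 1.5.x line.

```python
from importlib import metadata


def major_version(version_string):
    """Return the leading integer of a version string such as '2.2.2'."""
    return int(version_string.split(".")[0])


def needs_downgrade(version_string, max_major=1):
    """True when the major version is above the suggested 1.x line (hypothetical helper)."""
    return major_version(version_string) > max_major


try:
    installed = metadata.version("pandas")
except metadata.PackageNotFoundError:
    installed = None

if installed is None:
    print("pandas is not installed in this environment")
elif needs_downgrade(installed):
    # One possible pin for the 1.5.x line; adjust to your own constraints.
    print(f"pandas {installed} found; consider: pip install 'pandas>=1.5,<1.6'")
else:
    print(f"pandas {installed} is already on the 1.x line")
```

Other packages in the environment may require a newer pandas, so pip could report dependency conflicts after the downgrade; whether that applies here is not confirmed in this thread.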
