An all-in-one, low-code command-line toolbox that makes data scientists' daily work easier.
It supports common data cleaning and machine learning methods from Pandas, Sklearn, Seaborn, etc.

Add /your_path/Bobatea_datatool/bin to your PATH in ~/.bashrc,
for example: "PATH=/Bobatea_datatool/bin:$PATH"
$ source ~/.bashrc
Then run install.sh:
$ bash install.sh
command | function |
---|---|
index | add index to data |
header | add header to data |
take | take wanted columns |
drop | drop unwanted columns |
range | take the wanted range of rows |
merge | merge two datasheets together |
mergebyix | merge multiple csv files by index |
group | group by the wanted column, output multiple CSV files |
csvf | feature value filter |
t | data transpose |
log | apply logarithm |
mean | apply arithmetic mean |
pca | apply principal component analysis (PCA) |
tsne | apply t-SNE |
scatter | plot scatter plot |
kmean | plot 2D k-means |
pair | plot pair plot |
curve | plot line curve plot |
hcluster | plot hierarchical cluster plot |
These commands combine with standard shell tools and operators such as cat, |, and >, so you can build data pipelines faster and more easily.
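To illustrate why such commands compose with pipes, here is a minimal sketch of a CSV filter that reads from one stream and writes to another. It is not the toolbox's actual implementation: the function name `drop_columns` and its parameters are hypothetical, and it assumes the first input row is a header.

```python
import csv
import io


def drop_columns(in_stream, out_stream, unwanted):
    """Copy CSV rows from in_stream to out_stream, omitting the named columns.

    Hypothetical helper for illustration; assumes the first row is a header.
    """
    reader = csv.reader(in_stream)
    writer = csv.writer(out_stream, lineterminator="\n")
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to write
    # Positions of the columns we keep, in their original order.
    keep = [i for i, name in enumerate(header) if name not in unwanted]
    writer.writerow([header[i] for i in keep])
    for row in reader:
        writer.writerow([row[i] for i in keep])


# Small in-memory demonstration.
demo_in = io.StringIO("a,b,c\n1,2,3\n4,5,6\n")
demo_out = io.StringIO()
drop_columns(demo_in, demo_out, {"b"})
print(demo_out.getvalue(), end="")
```

Passing `sys.stdin` and `sys.stdout` instead of the in-memory streams would turn the same function into a filter that can sit between `cat` and the next command in a pipeline.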
Step 1. Get the data:
$ wget http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data
Step 2. Add a header, choose features, run PCA, and plot the result, all in less than one minute:
$ cat iris.data | header --h "sepal_length,sepal_width,petal_length,petal_width,species" | drop species | pca | scatter
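For comparison, the header, drop, and pca stages of this pipeline can be sketched roughly in Python. This is an approximation under stated assumptions, not the toolbox's own code: the function `iris_pca` is hypothetical, PCA is computed with a plain NumPy SVD and two components are assumed, a six-row sample copied from iris.data is used so the sketch runs offline, and the scatter step is omitted.

```python
import io

import numpy as np
import pandas as pd

# Six rows copied from iris.data so the sketch runs without a download.
SAMPLE = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
"""

COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]


def iris_pca(csv_text, n_components=2):
    """Hypothetical equivalent of: header | drop species | pca."""
    # header: the raw file has no header row, so the names are supplied here.
    df = pd.read_csv(io.StringIO(csv_text), header=None, names=COLUMNS)
    # drop species: keep only the numeric feature columns.
    features = df.drop(columns=["species"]).to_numpy(dtype=float)
    # pca: center the data, then project onto the leading right singular vectors.
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_components].T
    names = [f"pc{i + 1}" for i in range(n_components)]
    return pd.DataFrame(projected, columns=names, index=df.index)


components = iris_pca(SAMPLE)
print(components)
```

The resulting two columns are what a scatter plot would then display, one point per input row.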
For more examples, please check out ./test/