Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
-
Updated
Jul 7, 2024 - Python
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
Google, Naver multiprocess image web crawler (Selenium)
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
学习记录的一些笔记,以及所看得一些电子书eBooks、视频资源和平常收纳的一些自己认为比较好的博客、网站、工具。涉及大数据几大组件、Python机器学习和数据分析、Linux、操作系统、算法、网络等
A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.
Simple-IT-English: smart wordbook from community for community
AthenaCLI is a CLI tool for AWS Athena service that can do auto-completion and syntax highlighting.
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Interview coding questions and experiences for several companies merged into one repository
Scalable Bloom Filter implemented in Python
A cross-platform Echarts dashboard application,Powerpoint-like, designed based on Excel data, with the capability to update data remotely.supports line, spline, area, areaspline, column, bar, pie, scatter, angular gauges, arearange, areasplinerange, columnrange, bubble, box plot, error bars, funnel, waterfall. 支持柱状图、条形图、折线图、曲线图、折线填充图、曲线填充图、气泡图、扇形图。
Possibly the fastest DataFrame-agnostic quality check library in town.
Apache Spark 3 - Structured Streaming Course Material
Add a description, image, and links to the bigdata topic page so that developers can more easily learn about it.
To associate your repository with the bigdata topic, visit your repo's landing page and select "manage topics."