Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
Updated Jul 1, 2024 - Go
A collection of handy Bash One-Liners and terminal tricks for data processing and Linux system maintenance.
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.
A light-weight, flexible, and expressive statistical data testing library
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
Concurrent and multi-stage data ingestion and data processing with Elixir
Large-scale pretraining for dialogue
Extract Transform Load for Python 3.5+
Python Stream Processing
Source code accompanying the book Data Science on the Google Cloud Platform by Valliappa Lakshmanan (O'Reilly, 2017)
Kubernetes-native platform to run massively parallel data/streaming jobs
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Data and tools for generating and inspecting OLMo pre-training data.
Large-scale pretrained models for goal-directed dialog
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
HStreamDB is an open-source, cloud-native streaming database for IoT and beyond. Modernize your data stack for real-time applications.
Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
Advanced and Fast Data Transformation in R