Merged
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -218,7 +218,7 @@ Since new expressions are a very common feature request, we wanted to make it ea
#### Step 1: Implement the function in Rust

Add your function to the appropriate crate (`daft-functions-json`, `daft-functions-utf8`, etc.).
-For more advanced use cases, see existing implementations in [daft-functions-utf8](src/daft-functions-utf8/src/lib.rs)
+For more advanced use cases, see existing implementations in [daft-functions-utf8](https://github.com/Eventual-Inc/Daft/blob/main/src/daft-functions-utf8/src/lib.rs)

```rs
// This prelude defines all required ScalarUDF dependencies.
24 changes: 12 additions & 12 deletions README.rst
@@ -2,20 +2,20 @@

|CI| |PyPI| |Latest Tag| |Coverage| |Slack|

-`Website <https://www.getdaft.io>`_ • `Docs <https://docs.getdaft.io>`_ • `Installation <https://docs.getdaft.io/en/stable/install/>`_ • `Daft Quickstart <https://docs.getdaft.io/en/stable/quickstart/>`_ • `Community and Support <https://github.com/Eventual-Inc/Daft/discussions>`_
+`Website <https://www.daft.ai>`_ • `Docs <https://docs.daft.ai>`_ • `Installation <https://docs.daft.ai/en/stable/install/>`_ • `Daft Quickstart <https://docs.daft.ai/en/stable/quickstart/>`_ • `Community and Support <https://github.com/Eventual-Inc/Daft/discussions>`_

Daft: Unified Engine for Data Analytics, Engineering & ML/AI
============================================================


-`Daft <https://www.getdaft.io>`_ is a distributed query engine for large-scale data processing using Python or SQL, implemented in Rust.
+`Daft <https://www.daft.ai>`_ is a distributed query engine for large-scale data processing using Python or SQL, implemented in Rust.

* **Familiar interactive API:** Lazy Python Dataframe for rapid and interactive iteration, or SQL for analytical queries
* **Focus on the what:** Powerful Query Optimizer that rewrites queries to be as efficient as possible
* **Data Catalog integrations:** Full integration with data catalogs such as Apache Iceberg
* **Rich multimodal type-system:** Supports multimodal types such as Images, URLs, Tensors and more
* **Seamless Interchange**: Built on the `Apache Arrow <https://arrow.apache.org/docs/index.html>`_ In-Memory Format
-* **Built for the cloud:** `Record-setting <https://blog.getdaft.io/p/announcing-daft-02-10x-faster-io>`_ I/O performance for integrations with S3 cloud storage
+* **Built for the cloud:** `Record-setting <https://www.daft.ai/blog/announcing-daft-02>`_ I/O performance for integrations with S3 cloud storage

**Table of Contents**

@@ -44,12 +44,12 @@ Installation

Install Daft with ``pip install daft``.

-For more advanced installations (e.g. installing from source or with extra dependencies such as Ray and AWS utilities), please see our `Installation Guide <https://docs.getdaft.io/en/stable/install/>`_
+For more advanced installations (e.g. installing from source or with extra dependencies such as Ray and AWS utilities), please see our `Installation Guide <https://docs.daft.ai/en/stable/install/>`_

Quickstart
^^^^^^^^^^

-Check out our `quickstart <https://docs.getdaft.io/en/stable/quickstart/>`_!
+Check out our `quickstart <https://docs.daft.ai/en/stable/quickstart/>`_!

In this example, we load images from an AWS S3 bucket's URLs and resize each image in the dataframe:

@@ -77,16 +77,16 @@ Benchmarks
----------
|Benchmark Image|

-To see the full benchmarks, detailed setup, and logs, check out our `benchmarking page. <https://docs.getdaft.io/en/stable/resources/benchmarks/tpch/>`_
+To see the full benchmarks, detailed setup, and logs, check out our `benchmarking page. <https://docs.daft.ai/en/stable/resources/benchmarks/tpch/>`_


More Resources
^^^^^^^^^^^^^^

-* `Daft Quickstart <https://docs.getdaft.io/en/stable/quickstart/>`_ - learn more about Daft's full range of capabilities including dataloading from URLs, joins, user-defined functions (UDF), groupby, aggregations and more.
-* `User Guide <https://docs.getdaft.io/en/stable/>`_ - take a deep-dive into each topic within Daft
-* `API Reference <https://docs.getdaft.io/en/stable/api/>`_ - API reference for public classes/functions of Daft
-* `SQL Reference <https://docs.getdaft.io/en/stable/sql/>`_ - Daft SQL reference
+* `Daft Quickstart <https://docs.daft.ai/en/stable/quickstart/>`_ - learn more about Daft's full range of capabilities including dataloading from URLs, joins, user-defined functions (UDF), groupby, aggregations and more.
+* `User Guide <https://docs.daft.ai/en/stable/>`_ - take a deep-dive into each topic within Daft
+* `API Reference <https://docs.daft.ai/en/stable/api/>`_ - API reference for public classes/functions of Daft
+* `SQL Reference <https://docs.daft.ai/en/stable/sql/>`_ - Daft SQL reference

Contributing
------------
@@ -108,7 +108,7 @@ The data that we collect is:
2. **Metadata-only:** We do not collect any of our users’ proprietary code or data
3. **For development only:** We do not buy or sell any user data

-Please see our `documentation <https://docs.getdaft.io/en/stable/resources/telemetry/>`_ for more details.
+Please see our `documentation <https://docs.daft.ai/en/stable/resources/telemetry/>`_ for more details.

.. image:: https://static.scarf.sh/a.png?x-pxid=31f8d5ba-7e09-4d75-8895-5252bbf06cf6

@@ -131,7 +131,7 @@ Related Projects
| `Dask DF <https://github.com/dask/dask>`_ | No | Python object | Yes | No | Some(Pandas) | Yes |
+---------------------------------------------------+-----------------+---------------+-------------+-----------------+-----------------------------+-------------+

-Check out our `engine comparison page <https://docs.getdaft.io/en/stable/resources/engine_comparison/>`_ for more details!
+Check out our `engine comparison page <https://docs.daft.ai/en/stable/resources/engine_comparison/>`_ for more details!

License
-------
3 changes: 2 additions & 1 deletion daft/dataframe/dataframe.py
@@ -839,7 +839,8 @@ def write_json(
Note:
This call is **blocking** and will execute the DataFrame when called

-!!! Currently only supported with the Native runner!
+Warning:
+Currently only supported with the Native runner!
"""
if write_mode not in ["append", "overwrite", "overwrite-partitions"]:
raise ValueError(
84 changes: 51 additions & 33 deletions docs/SUMMARY.md
@@ -1,41 +1,45 @@
* Daft User Guide
* Guide
* [Overview](index.md)
* [Installation](install.md)
* [Quickstart](quickstart.md)
* [Core Concepts](core_concepts.md)
* I/O
* [Overview](io/index.md)
* [Amazon Web Services](io/aws.md)
* [Apache Hudi](io/hudi.md)
* [Apache Iceberg](io/iceberg.md)
* [Delta Lake](io/delta_lake.md)
* [Hugging Face Datasets](io/huggingface.md)
* [Microsoft Azure](io/azure.md)
* [SQL](io/sql.md)
* [SQL](sql_overview.md)
* [Sessions](sessions.md)
* Catalogs
* [Overview](catalogs/index.md)
* [AWS Glue](catalogs/glue.md)
* [AWS S3 Tables](catalogs/s3tables.md)
* [Unity Catalog](catalogs/unity_catalog.md)
* [Spark Connect](spark_connect.md)
* [Distributed Computing](distributed.md)
* [Integrations](integrations.md)
* Advanced
* [Managing Memory Usage](advanced/memory.md)
* [Partitioning](advanced/partitioning.md)
* [Observability](advanced/observability.md)
* Resources
* [Architecture](resources/architecture.md)
* [Engine Comparison](resources/engine_comparison.md)
* [Tutorials](resources/tutorials.md)
* [Benchmarks](resources/benchmarks/tpch.md)
* [Telemetry](resources/telemetry.md)
* Migration Guide
* [Coming from Dask](migration/dask_migration.md)
* Modalities
* [Overview](modalities/index.md)
* [Custom Modalities](modalities/custom.md)
* [URLs and Files](modalities/urls.md)
* [Text](modalities/text.md)
* [Images](modalities/images.md)
* [JSON and Nested Data](modalities/json.md)
* Data Connectors
* [Overview](connectors/index.md)
* [Custom Connectors](connectors/custom.md)
* [AWS Glue](connectors/glue.md)
* [AWS S3 Tables](connectors/s3tables.md)
* [Apache Hudi](connectors/hudi.md)
* [Apache Iceberg](connectors/iceberg.md)
* [Azure Blob Store](connectors/azure.md)
* [Delta Lake](connectors/delta_lake.md)
* [Hugging Face Datasets](connectors/huggingface.md)
* [S3](connectors/aws.md)
* [SQL Databases](connectors/sql.md)
* [Unity Catalog (Databricks)](connectors/unity_catalog.md)
* Running Custom Python Code
* [Overview](custom-code/index.md)
* [User-Defined Functions (UDFs)](custom-code/udfs.md)
* [Working with GPUs](custom-code/gpu.md)
* [External APIs](custom-code/apis.md)
* [Scaling Out and Deployment](distributed.md)
* Optimization and Debugging
* [Overview](optimization/index.md)
* [Architecture](optimization/architecture.md)
* [Managing Memory Usage](optimization/memory.md)
* [Partitioning](optimization/partitioning.md)
* [Observability](optimization/observability.md)
* [Benchmarks](benchmarks/index.md)
* [Community <sup>↗</sup>](http://www.daft.ai/slack)
* [Contributing](contributing-to-daft.md)
* [Roadmap](roadmap.md)
* [Release Notes <sup>↗</sup>](https://github.com/Eventual-Inc/Daft/releases)
* [Usage Telemetry](telemetry.md)
* Python API
* [Overview](api/index.md)
* [I/O](api/io.md)
@@ -50,6 +54,7 @@
* [Data Types](api/datatypes.md)
* [Aggregations](api/aggregations.md)
* [Series](api/series.md)
+* [Spark Connect](api/spark_connect.md)
* [Configuration](api/config.md)
* [Miscellaneous](api/misc.md)
* SQL Reference
@@ -60,3 +65,16 @@
* [USE](sql/statements/use.md)
* [Data Types](sql/datatypes.md)
* [Window Functions](sql/window_functions.md)
+
+<!--
+TODO
+* [Custom Connectors](connectors/custom.md)
+* [CSV](connectors/csv.md)
+* [Google Cloud Storage (GCS)](connectors/gcs.md)
+* [HTTP](connectors/http.md)
+* [JSON](connectors/json.md)
+* [Lance](connectors/lance.md)
+* [Parquet](connectors/parquet.md)
+* [Turbopuffer](connectors/turbopuffer.md)
+* [WARC (Web ARChive)](connectors/warc.md)
+-->
4 changes: 3 additions & 1 deletion docs/api/catalogs_tables.md
@@ -1,6 +1,8 @@
# Catalogs and Tables

-Daft integrates with various catalog implementations using its `Catalog` and `Table` interfaces. These are high-level APIs to manage catalog objects (tables and namespaces), while also making it easy to leverage Daft's existing `daft.read_` and `df.write_` APIs for open table formats like [Iceberg](../io/iceberg.md) and [Delta Lake](../io/delta_lake.md). Learn more about [Catalogs & Tables](../catalogs/index.md) in Daft User Guide.
+Daft integrates with various catalog implementations using its `Catalog` and `Table` interfaces. These are high-level APIs to manage catalog objects (tables and namespaces), while also making it easy to leverage Daft's existing `daft.read_` and `df.write_` APIs for open table formats like [Iceberg](../connectors/iceberg.md) and [Delta Lake](../connectors/delta_lake.md).
+
+<!-- Learn more about [Catalogs & Tables](../catalogs/index.md) in Daft User Guide. -->

::: daft.catalog.Catalog
options:
2 changes: 1 addition & 1 deletion docs/api/io.md
@@ -1,6 +1,6 @@
# I/O

-Daft offers a variety of approaches to creating a DataFrame from reading various data sources (in-memory data, files, data catalogs, and integrations) and writing to various data sources. See more about [I/O](../io/index.md) in Daft User Guide.
+Daft offers a variety of approaches to creating a DataFrame from reading various data sources (in-memory data, files, data catalogs, and integrations) and writing to various data sources. See more about other [Connectors](../connectors/index.md) in Daft User Guide.

## Input

File renamed without changes.
File renamed without changes.
164 changes: 0 additions & 164 deletions docs/catalogs/index.md

This file was deleted.

5 changes: 2 additions & 3 deletions docs/io/aws.md → docs/connectors/aws.md
@@ -1,7 +1,6 @@
-# Amazon Web Services
+# Reading from and Writing to AWS S3

-Daft is able to read/write data to/from AWS S3, and understands natively the URL protocol `s3://` as referring to data that resides
-in S3.
+Daft has [record-setting](https://www.daft.ai/blog/announcing-daft-02) performance when reading from and writing to AWS S3, and understands natively the URL protocol `s3://` as referring to data that resides in S3.

## Authorization/Authentication
