add the dat tables to the readme #44

Merged (2 commits), Jan 29, 2024
173 changes: 173 additions & 0 deletions README.md
@@ -60,6 +60,179 @@ For an example implementation of this, see the example PySpark tests in `tests/p

TBD.

## Generated tables

**all_primitive_types**

Table containing all non-nested types.

```
+----+-----+-----+-----+----+-------+-------+-----+-------------+-------+----------+-------------------+
|utf8|int64|int32|int16|int8|float32|float64| bool|       binary|decimal|    date32|          timestamp|
+----+-----+-----+-----+----+-------+-------+-----+-------------+-------+----------+-------------------+
|   0|    0|    0|    0|   0|    0.0|    0.0| true|           []| 10.000|1970-01-01|1970-01-01 00:00:00|
|   1|    1|    1|    1|   1|    1.0|    1.0|false|         [00]| 11.000|1970-01-02|1970-01-01 01:00:00|
|   2|    2|    2|    2|   2|    2.0|    2.0| true|      [00 00]| 12.000|1970-01-03|1970-01-01 02:00:00|
|   3|    3|    3|    3|   3|    3.0|    3.0|false|   [00 00 00]| 13.000|1970-01-04|1970-01-01 03:00:00|
|   4|    4|    4|    4|   4|    4.0|    4.0| true|[00 00 00 00]| 14.000|1970-01-05|1970-01-01 04:00:00|
+----+-----+-----+-----+----+-------+-------+-----+-------------+-------+----------+-------------------+
```
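The rows above follow a regular pattern in the row index. The sketch below reproduces that pattern with the Python standard library only. It is inferred from the printed table, not taken from the generator code, and the helper name `primitive_row` is hypothetical.

```python
from datetime import date, datetime, timedelta
from decimal import Decimal


def primitive_row(i: int) -> dict:
    """Build row i following the pattern visible in the table (inferred)."""
    return {
        "utf8": str(i),
        "int64": i,
        "int32": i,
        "int16": i,
        "int8": i,
        "float32": float(i),
        "float64": float(i),
        "bool": i % 2 == 0,          # true on even rows
        "binary": bytes(i),          # i zero bytes
        "decimal": Decimal(10 + i).quantize(Decimal("0.001")),  # scale 3
        "date32": date(1970, 1, 1) + timedelta(days=i),
        "timestamp": datetime(1970, 1, 1) + timedelta(hours=i),
    }


rows = [primitive_row(i) for i in range(5)]
```

A connector test could compare values it reads against `rows`, after converting its own types to these Python types.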

**basic_append**

A basic table with two append writes.

```
+------+------+-------+
|letter|number|a_float|
+------+------+-------+
|     a|     1|    1.1|
|     b|     2|    2.2|
|     c|     3|    3.3|
|     d|     4|    4.4|
|     e|     5|    5.5|
+------+------+-------+
```

**basic_partitioned**

A basic partitioned table.

```
+------+------+-------+
|letter|number|a_float|
+------+------+-------+
|     b|     2|    2.2|
|  NULL|     6|    6.6|
|     c|     3|    3.3|
|     a|     1|    1.1|
|     a|     4|    4.4|
|     e|     5|    5.5|
+------+------+-------+
```
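To illustrate how these rows might be distributed across partitions, here is a minimal sketch. It assumes the partition column is `letter` (the README does not say which column is used) and uses Hive-style directory names, including the `__HIVE_DEFAULT_PARTITION__` placeholder that Spark conventionally writes for NULL partition values. The helper `partition_dir` is hypothetical.

```python
from collections import defaultdict

# Rows copied from the table above; None stands in for NULL.
rows = [
    ("b", 2, 2.2),
    (None, 6, 6.6),
    ("c", 3, 3.3),
    ("a", 1, 1.1),
    ("a", 4, 4.4),
    ("e", 5, 5.5),
]


def partition_dir(value):
    """Hive-style directory name for a `letter` value (assumed layout)."""
    name = "__HIVE_DEFAULT_PARTITION__" if value is None else value
    return f"letter={name}"


# Group the non-partition columns by partition directory.
partitions = defaultdict(list)
for letter, number, a_float in rows:
    partitions[partition_dir(letter)].append((number, a_float))
```

Under these assumptions the six rows land in five partitions, with both `a` rows sharing one directory.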

**multi_partitioned**

Multiple levels of partitioning, with boolean, timestamp, and decimal partition columns.

```
+-----+-------------------+--------------------+---+
| bool|               time|              amount|int|
+-----+-------------------+--------------------+---+
|false|1970-01-02 08:45:00|12.00000000000000...|  3|
| true|1970-01-01 00:00:00|200.0000000000000...|  1|
| true|1970-01-01 12:30:00|200.0000000000000...|  2|
+-----+-------------------+--------------------+---+
```

**multi_partitioned_2**

Multiple levels of partitioning, with boolean, timestamp, and decimal partition columns.

```
+-----+-------------------+--------------------+---+
| bool|               time|              amount|int|
+-----+-------------------+--------------------+---+
|false|1970-01-02 08:45:00|12.00000000000000...|  3|
| true|1970-01-01 00:00:00|200.0000000000000...|  1|
| true|1970-01-01 12:30:00|200.0000000000000...|  2|
+-----+-------------------+--------------------+---+
```

**nested_types**

Table containing various nested types.

```
+---+------------+---------------+--------------------+
| pk|      struct|          array|                 map|
+---+------------+---------------+--------------------+
|  0| {0.0, true}|            [0]|                  {}|
|  1|{1.0, false}|         [0, 1]|            {0 -> 0}|
|  2| {2.0, true}|      [0, 1, 2]|    {0 -> 0, 1 -> 1}|
|  3|{3.0, false}|   [0, 1, 2, 3]|{0 -> 0, 1 -> 1, ...|
|  4| {4.0, true}|[0, 1, 2, 3, 4]|{0 -> 0, 1 -> 1, ...|
+---+------------+---------------+--------------------+
```
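The nested values also appear to follow a pattern in `pk`. The sketch below reproduces it in plain Python. It is inferred from the printed table: the maps for rows 3 and 4 are truncated in the output, so their full contents are extrapolated, and the key and value types of the map are assumed to be integers. The struct is modeled as a tuple because its field names are not shown. `nested_row` is a hypothetical helper.

```python
def nested_row(pk: int) -> dict:
    """Build the row with primary key pk, following the visible pattern."""
    return {
        "pk": pk,
        "struct": (float(pk), pk % 2 == 0),   # (float, bool); field names unknown
        "array": list(range(pk + 1)),         # [0, 1, ..., pk]
        "map": {k: k for k in range(pk)},     # {0 -> 0, ..., pk-1 -> pk-1}
    }


rows = [nested_row(pk) for pk in range(5)]
```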

**no_replay**

Table with a checkpoint and prior commits cleaned up.

```
+------+---+----------+
|letter|int|      date|
+------+---+----------+
|     a| 93|1975-06-01|
|     b|753|2012-05-01|
|     c|620|1983-10-01|
|     a|595|2013-03-01|
|  NULL|653|1995-12-01|
+------+---+----------+
```

**no_stats**

Table with no stats.

```
+------+---+----------+
|letter|int|      date|
+------+---+----------+
|     a| 93|1975-06-01|
|     b|753|2012-05-01|
|     c|620|1983-10-01|
|     a|595|2013-03-01|
|  NULL|653|1995-12-01|
+------+---+----------+
```

**stats_as_structs**

Table with a checkpoint in which stats are written only as structs (not JSON).

```
+------+---+----------+
|letter|int|      date|
+------+---+----------+
|     a| 93|1975-06-01|
|     b|753|2012-05-01|
|     c|620|1983-10-01|
|     a|595|2013-03-01|
|  NULL|653|1995-12-01|
+------+---+----------+
```

**with_checkpoint**

Table with a checkpoint.

```
+------+---+----------+
|letter|int|      date|
+------+---+----------+
|     a| 93|1975-06-01|
|     b|753|2012-05-01|
|     c|620|1983-10-01|
|     a|595|2013-03-01|
|  NULL|653|1995-12-01|
+------+---+----------+
```

**with_schema_change**

Table whose schema was changed using `overwriteSchema=True`.

```
+----+----+
|num1|num2|
+----+----+
|  22|  33|
|  44|  55|
|  66|  77|
+----+----+
```
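The tables in this section look like the output of PySpark's `DataFrame.show()`. If you want to compare them against a connector's results in a test, a small parser can turn such a listing into rows. This is a minimal sketch, not part of the project: `parse_show_table` is a hypothetical helper, all values come back as strings, and cells that contain `|` or were truncated with `...` are not handled.

```python
def parse_show_table(text: str) -> list:
    """Parse a show()-style ASCII table into a list of dicts (string values)."""
    lines = [line.strip() for line in text.strip().splitlines()]
    # Keep header and data lines; border lines start with "+".
    data = [line for line in lines if line.startswith("|")]

    def cells(line: str) -> list:
        # Drop the outer pipes, split on the inner ones, trim padding.
        return [cell.strip() for cell in line[1:-1].split("|")]

    header = cells(data[0])
    return [dict(zip(header, cells(line))) for line in data[1:]]


table = """
+----+----+
|num1|num2|
+----+----+
|  22|  33|
|  44|  55|
|  66|  77|
+----+----+
"""

rows = parse_show_table(table)
```

Because cells are stripped, the parser does not depend on the column padding being intact.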

## Models

The test cases contain several JSON files to be read by connector tests. To make it easier to read them, we provide [JSON schemas](https://json-schema.org/) for each of the file types in `out/schemas/`. They can be read to understand