[docs](load) add data type docs #1521

Merged
merged 8 commits into from
Dec 27, 2024
147 changes: 147 additions & 0 deletions docs/data-operate/import/complex-types/array.md
@@ -0,0 +1,147 @@
---
{
"title": "ARRAY",
"language": "en"
}
---

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

`ARRAY<T>` is an array of items of type T. It cannot be used as a key column.

- Before version 2.0, it was supported only in Duplicate model tables.
- Starting from version 2.0, it is also supported in the non-key columns of Unique model tables.

T can be any of the following types:

```sql
BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE,
DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING
```

## CSV format import

### Step 1: Prepare the data

Create the following CSV file, `test_array.csv`.
The column separator is `|` rather than a comma, so that it is not confused with the commas inside the array.

```
1|[1,2,3,4,5]
2|[6,7,8]
3|[]
4|null
```
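The sample file can also be generated from in-memory data. The following is a minimal Python sketch, not part of Doris; `format_array_row` is a hypothetical helper name, and it assumes integer elements so that no quoting is needed. It shows why the `|` separator keeps the commas inside the array literal unambiguous.

```python
def format_array_row(row_id, values, sep="|"):
    """Render one line: the id, the column separator, then the array literal."""
    if values is None:
        literal = "null"  # mirrors the `null` row in the sample file
    else:
        literal = "[" + ",".join(str(v) for v in values) + "]"
    return f"{row_id}{sep}{literal}"


rows = [(1, [1, 2, 3, 4, 5]), (2, [6, 7, 8]), (3, []), (4, None)]
csv_text = "\n".join(format_array_row(i, v) for i, v in rows) + "\n"
print(csv_text, end="")
```

Writing `csv_text` to `test_array.csv` reproduces the file shown above.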

### Step 2: Create a table in the database

```sql
CREATE TABLE `array_test` (
    `id`      INT        NOT NULL,
    `c_array` ARRAY<INT> NULL
)
DUPLICATE KEY(`id`)
DISTRIBUTED BY HASH(`id`) BUCKETS 1
PROPERTIES (
    "replication_allocation" = "tag.location.default: 1"
);
```

### Step 3: Load data

```bash
curl --location-trusted \
    -u "root":"" \
    -H "column_separator:|" \
    -H "columns: id, c_array" \
    -T "test_array.csv" \
    http://localhost:8040/api/testdb/array_test/_stream_load
```
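Stream Load answers with a JSON body, so a script can check the result before querying the table. The sketch below is a hedged illustration: it assumes the response contains the `Status` and `NumberLoadedRows` fields, and both `load_succeeded` and the sample body (trimmed to those two fields) are made up for this example.

```python
import json


def load_succeeded(response_text, expected_rows):
    """Return True when the response reports success and the expected row count."""
    result = json.loads(response_text)
    return (
        result.get("Status") == "Success"
        and result.get("NumberLoadedRows") == expected_rows
    )


# Illustrative response body, trimmed to the two fields used here.
sample = '{"Status": "Success", "NumberLoadedRows": 4}'
print(load_succeeded(sample, 4))
```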

### Step 4: Check the imported data

```sql
mysql> SELECT * FROM array_test;
+------+-----------------+
| id   | c_array         |
+------+-----------------+
|    1 | [1, 2, 3, 4, 5] |
|    2 | [6, 7, 8]       |
|    3 | []              |
|    4 | NULL            |
+------+-----------------+
4 rows in set (0.01 sec)
```

## JSON format import

### Step 1: Prepare the data

Create the following JSON file, `test_array.json`:

```json
[
{"id":1, "c_array":[1,2,3,4,5]},
{"id":2, "c_array":[6,7,8]},
{"id":3, "c_array":[]},
{"id":4, "c_array":null}
]
```
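When the rows already exist as program data, the same file can be produced with a JSON serializer instead of being written by hand. This is a minimal sketch using Python's standard `json` module; the variable names are illustrative only.

```python
import json

records = [
    {"id": 1, "c_array": [1, 2, 3, 4, 5]},
    {"id": 2, "c_array": [6, 7, 8]},
    {"id": 3, "c_array": []},
    {"id": 4, "c_array": None},  # serialized as JSON null
]

# One top-level array holding one object per row.
json_text = json.dumps(records)
print(json_text)
```

Writing `json_text` to `test_array.json` gives a file with the same content as the one above, apart from whitespace.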

### Step 2: Create a table in the database

```sql
CREATE TABLE `array_test` (
    `id`      INT        NOT NULL,
    `c_array` ARRAY<INT> NULL
)
DUPLICATE KEY(`id`)
DISTRIBUTED BY HASH(`id`) BUCKETS 1
PROPERTIES (
    "replication_allocation" = "tag.location.default: 1"
);
```

### Step 3: Load data

```bash
curl --location-trusted \
    -u "root":"" \
    -H "format:json" \
    -H "columns: id, c_array" \
    -H "strip_outer_array:true" \
    -T "test_array.json" \
    http://localhost:8040/api/testdb/array_test/_stream_load
```
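The `strip_outer_array:true` header tells the load that the file is one top-level JSON array whose elements are the rows. The Python sketch below only illustrates that idea on the client side. It is not how Doris implements it, and `rows_from_outer_array` is a hypothetical name.

```python
import json


def rows_from_outer_array(json_text, columns=("id", "c_array")):
    """Treat each element of the top-level array as one row."""
    document = json.loads(json_text)
    if not isinstance(document, list):
        raise ValueError("expected a top-level JSON array")
    # Pick the listed columns from each object; a missing key becomes None.
    return [tuple(item.get(col) for col in columns) for item in document]


sample = '[{"id":1,"c_array":[1,2,3]},{"id":4,"c_array":null}]'
print(rows_from_outer_array(sample))
```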

### Step 4: Check the imported data

```sql
mysql> SELECT * FROM array_test;
+------+-----------------+
| id   | c_array         |
+------+-----------------+
|    1 | [1, 2, 3, 4, 5] |
|    2 | [6, 7, 8]       |
|    3 | []              |
|    4 | NULL            |
+------+-----------------+
4 rows in set (0.01 sec)
```
149 changes: 149 additions & 0 deletions docs/data-operate/import/complex-types/json.md
@@ -0,0 +1,149 @@
---
{
"title": "JSON",
"language": "en"
}
---

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

The JSON data type stores JSON data efficiently in a binary format and allows access to its internal fields through JSON functions.

By default, a JSON value can be up to 1048576 bytes (1 MB). The limit can be raised to at most 2147483643 bytes (about 2 GB) through the `string_type_length_soft_limit_bytes` configuration.
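A rough client-side check can flag oversized documents before a load. The sketch below assumes that the UTF-8 length of the serialized text is a reasonable proxy for the stored size. Doris stores JSON in a binary format, so the real size may differ. `within_limit` is a hypothetical helper.

```python
import json

DEFAULT_LIMIT_BYTES = 1048576  # default quoted above (1 MB)


def within_limit(value, limit=DEFAULT_LIMIT_BYTES):
    """Check the UTF-8 size of the serialized JSON text against a byte limit."""
    encoded = json.dumps(value, ensure_ascii=False).encode("utf-8")
    return len(encoded) <= limit


print(within_limit({"name": "tom", "age": 35}))
print(within_limit({"blob": "x" * DEFAULT_LIMIT_BYTES}))
```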

Compared to storing JSON strings in a regular STRING column, the JSON type has two main advantages:

- JSON format validation during data insertion.
- A more efficient binary storage format, which gives faster access to internal JSON fields through functions such as `json_extract` than through the `get_json_xx` functions.

Note: In version 1.2.x, the JSON type was named JSONB. To maintain compatibility with MySQL, it was renamed to JSON starting from version 2.0.0. Older tables can still use the previous name.
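The validation advantage can be approximated on the client to catch malformed documents before they reach the load. This sketch uses Python's standard `json` parser, which is not identical to the validator in Doris (for example, Python accepts `NaN` by default). `is_valid_json` is a hypothetical helper.

```python
import json


def is_valid_json(text):
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json('{"name": "tom", "age": 35}'))
print(is_valid_json('{"name": "tom", "age": }'))
```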

## CSV format import

### Step 1: Prepare the data

Create the following CSV file, `test_json.csv`.
The column separator is `|` rather than a comma, so that it is not confused with the commas inside the JSON.

```
1|{"name": "tom", "age": 35}
2|{"name": null, "age": 28}
3|{"name": "micheal", "age": null}
4|{"name": null, "age": null}
5|null
```
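Lines in this shape can be produced by serializing each value and joining it to the id with the separator. This is a minimal Python sketch; `format_json_row` is a hypothetical helper, and it assumes that no string value contains the `|` character.

```python
import json


def format_json_row(row_id, value, sep="|"):
    """Render one line: the id, the column separator, then the JSON text."""
    return f"{row_id}{sep}{json.dumps(value)}"


rows = [
    (1, {"name": "tom", "age": 35}),
    (2, {"name": None, "age": 28}),
    (5, None),  # serialized as the bare word null
]
for row_id, value in rows:
    print(format_json_row(row_id, value))
```

If a value could contain `|`, choose a different `column_separator` for the load.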

### Step 2: Create a table in the database

```sql
CREATE TABLE json_test (
    id     INT  NOT NULL,
    c_json JSON NULL
)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES (
    "replication_allocation" = "tag.location.default: 1"
);
```

### Step 3: Load data

```bash
curl --location-trusted \
    -u "root":"" \
    -H "column_separator:|" \
    -H "columns: id, c_json" \
    -T "test_json.csv" \
    http://localhost:8040/api/testdb/json_test/_stream_load
```

### Step 4: Check the imported data

```sql
mysql> SELECT * FROM json_test;
+------+-------------------------------+
| id   | c_json                        |
+------+-------------------------------+
|    1 | {"name":"tom","age":35}       |
|    2 | {"name":null,"age":28}        |
|    3 | {"name":"micheal","age":null} |
|    4 | {"name":null,"age":null}      |
|    5 | null                          |
+------+-------------------------------+
5 rows in set (0.01 sec)
```

## JSON format import

### Step 1: Prepare the data

Create the following JSON file, `test_json.json`:

```json
[
{"id": 1, "c_json": {"name": "tom", "age": 35}},
{"id": 2, "c_json": {"name": null, "age": 28}},
{"id": 3, "c_json": {"name": "micheal", "age": null}},
{"id": 4, "c_json": {"name": null, "age": null}},
{"id": 5, "c_json": null}
]
```

### Step 2: Create a table in the database

```sql
CREATE TABLE json_test (
    id     INT  NOT NULL,
    c_json JSON NULL
)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES (
    "replication_allocation" = "tag.location.default: 1"
);
```

### Step 3: Load data

```bash
curl --location-trusted \
    -u "root":"" \
    -H "format:json" \
    -H "columns: id, c_json" \
    -H "strip_outer_array:true" \
    -T "test_json.json" \
    http://localhost:8040/api/testdb/json_test/_stream_load
```

### Step 4: Check the imported data

```sql
mysql> SELECT * FROM json_test;
+------+-------------------------------+
| id   | c_json                        |
+------+-------------------------------+
|    1 | {"name":"tom","age":35}       |
|    2 | {"name":null,"age":28}        |
|    3 | {"name":"micheal","age":null} |
|    4 | {"name":null,"age":null}      |
|    5 | NULL                          |
+------+-------------------------------+
5 rows in set (0.01 sec)
```