Skip to content

Commit 270b9da

Browse files
matkuliakamotl
authored andcommitted
Cluvio: Starter tutorial
1 parent 26db150 commit 270b9da

File tree

3 files changed

+231
-12
lines changed

3 files changed

+231
-12
lines changed

docs/integrate/cluvio/index.md

Lines changed: 12 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -10,7 +10,10 @@
1010
:::{rubric} About
1111
:::
1212

13-
[Cluvio] is a programmable and interactive dashboarding platform — your analytics
13+
[Cluvio] is a cloud-based business intelligence and analytics solution that
14+
enables businesses to analyze data through a dashboard.
15+
16+
It provides a programmable and interactive dashboarding platform — your analytics
1417
cockpit. Run queries, filter your results, choose the most vivid way to display them,
1518
and share them with your colleagues and partners without efforts.
1619

@@ -31,8 +34,8 @@ your entire company in few minutes.
3134
::::{grid} 2
3235

3336
:::{grid-item-card} Data Analysis with Cluvio and CrateDB
34-
:link: https://community.cratedb.com/t/data-analysis-with-cluvio-and-cratedb/1571
35-
:link-type: url
37+
:link: cluvio-tutorial
38+
:link-type: ref
3639
Explore how to leverage the power of Cluvio, a modern data analysis platform
3740
with CrateDB Cloud as the underlying database.
3841
:::
@@ -44,6 +47,12 @@ with CrateDB Cloud as the underlying database.
4447
[CrateDB and Cluvio]
4548
```
4649

50+
:::{toctree}
51+
:maxdepth: 1
52+
:hidden:
53+
Tutorial <tutorial>
54+
:::
55+
4756

4857
[Cluvio]: https://www.cluvio.com/
4958
[CrateDB and Cluvio]: https://cratedb.com/integrations/cratedb-and-cluvio

docs/integrate/cluvio/tutorial.md

Lines changed: 218 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,218 @@
1+
(cluvio-tutorial)=
2+
# Data Analysis with Cluvio and CrateDB
3+
4+
## Introduction
5+
6+
In this tutorial, we'll explore how to leverage the power of [Cluvio](https://www.cluvio.com), a modern data analysis platform with [CrateDB Cloud](https://console.cratedb.cloud/) as the underlying database.
7+
8+
## Prerequisites
9+
10+
* [CrateDB Cloud cluster up and running](https://crate.io/docs/cloud/en/latest/tutorials/cluster-deployment/index.html)
11+
* [Cluvio account](https://www.cluvio.com/)
12+
13+
14+
## Set up CrateDB
15+
16+
Deploying a CrateDB cloud cluster has never been easier, simply follow our tutorial [here](https://crate.io/docs/cloud/en/latest/tutorials/cluster-deployment/stripe.html#cluster-deployment-stripe) and you can have a cluster up and running within minutes. We offer a CRFREE plan which offers up to 2 vCPUs, 2 GiB of memory, and 8 GiB of storage completely for free. Ideal for small-scale testing and evaluation purposes.
17+
18+
### Load data into CrateDB cluster
19+
20+
In this tutorial we'll use 2 tables as our datasource. [flights](http://stat-computing.org/dataexpo/2009) and [airports](https://openflights.org/data.php) from January of 2008.
21+
22+
#### Create tables
23+
24+
Once you have access to the Admin UI of your cluster, execute these statements in the console:
25+
26+
```sql
27+
CREATE TABLE airports (
28+
code character varying(3) NOT NULL,
29+
name character varying(100) NOT NULL,
30+
city character varying(50) NOT NULL,
31+
country character varying(50) NOT NULL,
32+
latitude double precision NOT NULL,
33+
longitude double precision NOT NULL,
34+
elevation integer NOT NULL
35+
);
36+
```
37+
38+
```sql
39+
CREATE TABLE flights (
40+
year integer,
41+
month integer,
42+
day_of_month integer,
43+
day_of_week integer,
44+
dep_time integer,
45+
crs_dep_time integer,
46+
arr_time integer,
47+
crs_arr_time integer,
48+
unique_carrier varchar(6),
49+
flight_num integer,
50+
tail_num varchar(8),
51+
actual_elapsed_time integer,
52+
crs_elapsed_time integer,
53+
air_time integer,
54+
arr_delay integer,
55+
dep_delay integer,
56+
origin varchar(3),
57+
dest varchar(3),
58+
distance integer,
59+
taxi_in integer,
60+
taxi_out integer,
61+
cancelled integer,
62+
cancellation_code varchar(1),
63+
diverted varchar(1),
64+
carrier_delay integer,
65+
weather_delay integer,
66+
nas_delay integer,
67+
security_delay integer,
68+
late_aircraft_delay integer,
69+
dep_timestamp timestamp,
70+
arr_timestamp timestamp
71+
);
72+
```
73+
74+
This creates 2 empty tables in your database. `flights` and `airports`, with the correct data types of the columns.
75+
76+
#### Import data
77+
78+
Now you should import the data into the tables. We will use Console "Import" feature in this example. Use the following links:
79+
80+
* airports - https://s3.amazonaws.com/crate.sampledata/flights/dataset-airports.csv.gz
81+
* flights - https://s3.amazonaws.com/crate.sampledata/flights/dataset-flights.csv.gz
82+
83+
![.csv import|690x315](https://us1.discourse-cdn.com/flex020/uploads/crate/original/2X/0/059227c592c98e2025b64c1d2d22e20c24624359.png)
84+
85+
Make sure to use your pre-created tables in the "Table name" field, otherwise the column types may be created incorrectly. Do this for both .csv files:
86+
87+
![Import summary|690x224](https://us1.discourse-cdn.com/flex020/uploads/crate/original/2X/3/364db0dc98d56d8e27e274fd072cb5f076cf1110.png)
88+
89+
After this your tables should no longer be empty. `airports` contains 5876 records, and `flights` 150 000 records.
90+
91+
## Connect CrateDB to Cluvio
92+
93+
Now that you have the dataset to visualize, you can connect your cluster to Cluvio.
94+
95+
In Cluvio, navigate to `Settings` -> `Datasources` -> `Add datasource`:
96+
97+
![Connecting CrateDB to Cluvio|649x500](https://us1.discourse-cdn.com/flex020/uploads/crate/original/2X/9/9e05b3e35de868cc4cc33efb232afcf892652276.png){width=640}
98+
99+
`Host` and `Password` will understandably differ for you.
100+
101+
After filling out the details for your cluster, press `Next: Test Connection`. If the credentials are correct, you should see this success message:
102+
103+
![Successful connection message|690x110](https://us1.discourse-cdn.com/flex020/uploads/crate/original/2X/2/2cd3cd0945b219ca0010ce672ee9808324ba4cfa.png){width=640}
104+
105+
## Dashboards
106+
107+
A dashboard is the main point of Cluvio. It is a collection of interactive reports, giving great insight into any area of your data. These are the types of charts Cluvio offers:
108+
109+
* [Table Chart](https://docs.cluvio.com/chart-types/table-chart)
110+
* [Pivot / Cohort Table Chart](https://docs.cluvio.com/chart-types/pivot-cohort-table-chart)
111+
* [Number Chart](https://docs.cluvio.com/chart-types/number-chart)
112+
* [Pie Chart](https://docs.cluvio.com/chart-types/pie-chart)
113+
* [Line Chart](https://docs.cluvio.com/chart-types/line-chart)
114+
* [Bar Chart](https://docs.cluvio.com/chart-types/bar-chart)
115+
* [Map Chart](https://docs.cluvio.com/chart-types/map-chart)
116+
* [Gauge Chart](https://docs.cluvio.com/chart-types/gauge-chart)
117+
* [XY / Bubble Chart](https://docs.cluvio.com/chart-types/xy-bubble-chart)
118+
* [Word Cloud Chart](https://docs.cluvio.com/chart-types/word-cloud-chart)
119+
* [Histogram Chart](https://docs.cluvio.com/chart-types/histogram-chart)
120+
121+
![Example dashboard|690x343](https://us1.discourse-cdn.com/flex020/uploads/crate/original/2X/4/4cc716d7a71c91476dfe2affa55881d39afe5d93.png)
122+
123+
Now, let's create some and see how Cluvio works. Head to **[Dashboards](https://app.cluvio.com/dashboards)** -> `New Dashboard`. After naming your Dashboard, you can create your first report. Click the `New report` in the upper right.
124+
125+
### Number of flights and delays
126+
127+
The first piece information you might be interested in, for a given period, is the number of flights and average delays of departures and arrivals. This is the code for this report:
128+
129+
```
130+
SELECT
131+
COUNT(*) AS "Number of flights",
132+
AVG(dep_delay) AS "Average Departure Delay",
133+
AVG(arr_delay) AS "Average Arrival Delay"
134+
FROM doc.flights
135+
ORDER BY 1
136+
```
137+
This is a pretty simple query that counts the number of rows in the `flights` as the number of flights, and averages values in the `dep_delay` and `arr_delay` for the departure delays and arrival delays respectively.
138+
139+
![Number of flights and delays|690x117](https://us1.discourse-cdn.com/flex020/uploads/crate/original/2X/4/4841404a21b56cb1e5b92736af8b79656b0912ec.png){width=800}
140+
141+
To see the information displayed this way, you need to switch to "Number" chart after running query.
142+
143+
### Country distribution
144+
145+
This query looks at the country distribution in the `airports` table:
146+
147+
```
148+
SELECT country,
149+
COUNT(1)
150+
FROM doc.airports
151+
GROUP BY country
152+
ORDER BY 2 DESC
153+
```
154+
155+
In this one, it's suitable to use pie chart to better see the distribution. We also used the `Value(%)` option for the legend, and edited the legend to show up to 25 values (countries).
156+
157+
![Country distribution|690x452](https://us1.discourse-cdn.com/flex020/uploads/crate/original/2X/2/2f11e42d61e93395267f847b3ee91d5be0d076f9.png){width=800}
158+
159+
## Filters
160+
161+
[Filters](https://app.cluvio.com/settings/filters) offer a great way to quickly specify the condition under which you want to display your data.
162+
163+
In the `flights` table in `day_of_week` column 1 represents Monday, 2 means Tuesday, etc. Using that, we can create a filter to display data for a specific day of the week without changing the SQL in our reports.
164+
165+
```
166+
VALUES
167+
(1, 'Monday'),
168+
(2, 'Tuesday'),
169+
(3, 'Wednesday'),
170+
(4, 'Thursday'),
171+
(5, 'Friday'),
172+
(6, 'Saturday'),
173+
(7, 'Sunday')
174+
ORDER BY 1
175+
```
176+
177+
Now we can filter the data by day of the week:
178+
179+
![Using filter to display data for specific day of the week|690x255](https://us1.discourse-cdn.com/flex020/uploads/crate/original/2X/9/90335d44316d329ebe6d70a6a63879dec52ee5e8.png){width=800}
180+
181+
Find out more about filters [here](https://docs.cluvio.com/filters/overview).
182+
183+
## SQL snippets
184+
185+
SQL snippets are small reusable pieces of code that can make your work easier within larger dataset. They are managed [here](https://app.cluvio.com/settings/sql-snippets).
186+
187+
We used them to create JOIN statements:
188+
189+
```
190+
JOIN doc.airports AS origin_airport ON flights.origin = origin_airport.code
191+
JOIN doc.airports AS dest_airport ON flights.dest = dest_airport.code
192+
```
193+
194+
This snippet creates two joins between the `flights` and `airports` tables, aliasing the `airports` table as `origin_airport` and `dest_airport` for the origin and destination airports, respectively.
195+
196+
Then create a report using the snippet:
197+
198+
```
199+
SELECT flights.year,
200+
flights.month,
201+
origin_airport.city AS origin_city,
202+
dest_airport.city AS destination_city,
203+
COUNT(*) AS number_of_flights
204+
FROM doc.flights
205+
[join_airports] -- Reference to the SQL Snippet
206+
WHERE [flight_filters] -- Reference our filters
207+
GROUP BY flights.year, flights.month, origin_airport.city, dest_airport.city
208+
ORDER BY number_of_flights DESC
209+
LIMIT 100;
210+
```
211+
212+
Using the SQL snippets and filters, we can quickly find out what is the most popular destination departing from Los Angeles (LAX) on a Tuesday. Pretty cool.
213+
214+
![Popular destinations|690x309](https://us1.discourse-cdn.com/flex020/uploads/crate/original/2X/7/7257af47c58215459e1fe2de135a966c19fedbd5.png)
215+
216+
## Conclusion
217+
218+
That's it for this tutorial. If using Cluvio could help you make sense of your data, feel free to head to [Cloud Console](https://console.cratedb.cloud/), connect your cluster to [Cluvio](https://app.cluvio.com/) and get started! Make sure to visit their [documentation](https://docs.cluvio.com/) to explore all the features.

docs/topic/bi/index.md

Lines changed: 1 addition & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -7,15 +7,7 @@
77
Integrations of CrateDB with other tools, specifically related to business
88
analytics and intelligence software.
99

10-
11-
## PowerBI
12-
10+
- {ref}`cluvio`
1311
- {ref}`powerbi`
14-
15-
## Rill
16-
1712
- {ref}`rill`
18-
19-
## Tableau
20-
2113
- {ref}`tableau`

0 commit comments

Comments
 (0)