docs/integrate/locust/tutorial.md
+50 -46 (50 additions, 46 deletions)
@@ -3,25 +3,16 @@
3
3
4
4
## Introduction
5
5
6
-
As with every other database, users want to run performance tests to get a feel for the performance of their workload.
6
+
As with any database, you’ll want to run performance tests to understand
7
+
your workload’s behavior.
7
8
8
9
CrateDB offers a couple of tools that can be used for specific use cases. For example, [nodeIngestBench][] lets you run high-performance ingest benchmarks against a CrateDB cluster, and the [TimeIt][] function in the cr8 toolkit measures the runtime of a given SQL statement on a cluster.
We use Locust as the framework to run load tests with a customizable set of SQL statements. [Locust][] is a great, flexible, open-source (Python) framework that can swarm the database with users and get the RPS (request per second) for different queries. This small blog shows how to use Locust to load test CrateDB in your environment.
11
+
Use Locust to run load tests with a customizable set of SQL statements. [Locust][] is a flexible, open‑source Python framework that can swarm the database with users and report RPS (requests per second) per query. This tutorial shows how to use Locust to load test CrateDB in your environment.
14
12
15
-
For this blog, I’m running a 3-node cluster created in a local docker environment as described in this [tutorial][].
For this tutorial, we use a 3‑node local Docker cluster (see this [tutorial][]).
19
14
20
-
First, we must set up the data model and load some data. I’m using [DBeaver][] to connect in this case, but this can be done by either the [CrateDB CLI tools][] or the Admin UI that comes with either the self- or [fully-managed][] CrateDB solution.
First, set up the data model and load data. This example uses [DBeaver][], but you can also use the [CrateDB CLI tools][] or the Admin UI in self‑managed or [fully-managed][] CrateDB.
25
16
26
17
Create the following tables:
27
18
@@ -45,11 +36,11 @@ CREATE TABLE IF NOT EXISTS "weekly_aggr_weather_data"(
45
36
);
46
37
```
47
38
48
-
Create the user used further down the line.
39
+
Create the user for the load test.
49
40
```sql
50
-
CREATE USER locust with (password = 'load_test');
51
-
GRANT ALL PRIVILEGES ON table weather_data to locust;
52
-
GRANT ALL PRIVILEGES ON table weekly_aggr_weather_data to locust;
41
+
CREATE USER locust WITH (password = 'load_test');
42
+
GRANT ALL PRIVILEGES ON table weather_data TO locust;
43
+
GRANT ALL PRIVILEGES ON table weekly_aggr_weather_data TO locust;
53
44
```
54
45
55
46
Load some data into the `weather_data` table by using the following statement.
@@ -60,7 +51,7 @@ FROM 'https://github.com/crate/cratedb-datasets/raw/main/cloud-tutorials/data_we
60
51
WITH (format = 'csv', compression = 'gzip', empty_string_as_null = true);
61
52
```
62
53
63
-
The `weather_data` table should now have 70k rows of data.
54
+
The `weather_data` table now contains roughly 70k rows.
64
55
65
56
```text
66
57
select count(*) from weather_data;
@@ -70,11 +61,11 @@ count(*)|
70
61
70000|
71
62
```
72
63
73
-
We leave the other table empty as that one will be populated as part of the load test.
64
+
Leave `weekly_aggr_weather_data` empty; the load test populates it.
74
65
75
66
## Install Locust
76
67
77
-
In this case, I installed Locust on my Mac, but in an acceptance environment, you probably want to run this Locust on one or more “driver” machines. Especially when you want to push the database, you will need enough firepower on the driver side to push the database.
68
+
Install Locust locally for a quick start. In staging or production‑like testing, run Locust on one or more driver machines to generate sufficient load.
78
69
79
70
On Python (3.9 or later), install Locust as well as the CrateDB driver:
80
71
```bash
@@ -89,8 +80,10 @@ locust -V
89
80
90
81
## Run Locust
91
82
92
-
Start with a simple test to ensure the connectivity is there and you can connect to the database. Copy the code below and write to a file named `locustfile.py`.
93
-
Besides the pure Locust execution, it also contains a CrateDB-specific implementation, connecting to CrateDB using our Python driver, instead of a plain HTTP client.
83
+
Start with a simple connectivity check.
84
+
Copy the code below into a file named `locustfile.py`.
85
+
It uses a CrateDB-specific client built on the Python driver rather than
# This is what makes the request actually get logged in Locust
148
141
self._request_event.fire(**request_meta)
@@ -178,7 +171,7 @@ Some explanation on some of the code above ☝️
178
171
179
172
The class `CrateDBClient` implements how to connect to CrateDB and details on how to measure requests. `CrateDBUser` represents a Locust-generated user based on the `CrateDBClient`.
180
173
181
-
In the actual Locust configuration, with the `wait_time = between(1, 5)`, you can control the number of queries and the randomization of the queries by using between. This will execute the different queries with a random interval between 1 and 5 sec. Another option that will give you more control over the amount of executed queries per second is using the `wait_time = constant_throughput(1.0)`, which will execute 1 of the queries per second for every user, or if you set it to `(2.0)`, will execute two queries every second.
174
+
In Locust, `wait_time = between(1, 5)` makes each user wait a random 1 to 5 seconds between tasks. To control throughput more precisely, use `wait_time = constant_throughput(1.0)`, which runs at most one task per second per user (set to `2.0` for two tasks per second).
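To get a feel for how the two `wait_time` settings translate into load, the following back-of-the-envelope sketch estimates steady-state RPS. It does not use Locust itself; the helper names `estimated_rps_between` and `estimated_rps_constant_throughput` and the `avg_response` parameter (average query time in seconds) are assumptions made for illustration, and real results depend on cluster latency and driver capacity.

```python
from math import inf


def estimated_rps_between(users: int, min_wait: float, max_wait: float,
                          avg_response: float) -> float:
    """Rough RPS for wait_time = between(min_wait, max_wait).

    Each user repeats: run one task (avg_response seconds), then wait
    a uniformly random time whose mean is (min_wait + max_wait) / 2.
    """
    mean_cycle = (min_wait + max_wait) / 2 + avg_response
    return users / mean_cycle


def estimated_rps_constant_throughput(users: int, tasks_per_second: float,
                                      avg_response: float) -> float:
    """Rough upper bound on RPS for wait_time = constant_throughput(x).

    A user cannot start a new task before the previous one finishes,
    so the per-user rate is capped at 1 / avg_response.
    """
    max_rate = 1 / avg_response if avg_response > 0 else inf
    return users * min(tasks_per_second, max_rate)


if __name__ == "__main__":
    # Hypothetical numbers: 10 users, queries averaging 50 ms.
    print(estimated_rps_between(10, 1, 5, 0.05))
    print(estimated_rps_constant_throughput(10, 1.0, 0.05))
```

With `between(1, 5)` the mean wait of 3 seconds dominates, so ten users produce only a few requests per second, while `constant_throughput(1.0)` targets one request per second per user as long as queries finish quickly enough.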
182
175
183
176
For every query you want to include in your test, you will need to create a block like this:
184
177
@@ -205,11 +198,11 @@ Define the number of users and the spawn rate. As this is an initial test, we le
205
198
206
199
As you can see, is 1 query being executed with an RPS of 1. The number of failures should be 0. If you stop the test and start a New test with ten users, you should get an RPS of 10.
205
+
Locust executes one query at ~1 RPS (requests per second) with zero failures. If you stop and start a new test with 10 users, you’ll see ~10 RPS.
# This is what makes the request actually get logged in Locust
346
339
self._request_event.fire(**request_meta)
@@ -368,15 +361,17 @@ class QuickstartUser(CrateDBUser):
368
361
369
362
@task(5)
370
363
def query01(self):
364
+
city = random.choice(self.cities)
371
365
self.client.send_query(
372
-
f"""
366
+
"""
373
367
SELECT location, ROUND(AVG(temperature)) AS avg_temp
374
368
FROM weather_data
375
-
WHERE location = '{random.choice(self.cities)}'
369
+
WHERE location = ?
376
370
GROUP BY location
377
-
ORDER BY 2 DESC
371
+
ORDER BY avg_temp DESC
378
372
""",
379
373
"Avg Temperature per City",
374
+
params=(city,),
380
375
)
381
376
382
377
@task(1)
@@ -417,14 +412,15 @@ class QuickstartUser(CrateDBUser):
417
412
418
413
@task(5)
419
414
def query04(self):
415
+
city = random.choice(self.cities)
420
416
self.client.send_query(
421
-
f"""
417
+
"""
422
418
WITH minmax AS (
423
419
SELECT location,
424
420
MIN(timestamp) AS mintstamp,
425
421
MAX(timestamp) AS maxtstamp
426
422
FROM weather_data
427
-
WHERE location = '{random.choice(self.cities)}'
423
+
WHERE location = ?
428
424
GROUP BY location
429
425
)
430
426
SELECT a.timestamp,
@@ -435,9 +431,10 @@ class QuickstartUser(CrateDBUser):
435
431
FROM weather_data a, minmax b
436
432
WHERE a.location = b.location
437
433
AND a.timestamp BETWEEN b.mintstamp AND b.maxtstamp
438
-
ORDER BY 1;
434
+
ORDER BY a.timestamp;
439
435
""",
440
436
"Bridge the Gaps per City",
437
+
params=(city,),
441
438
)
442
439
443
440
@task(1)
@@ -461,11 +458,9 @@ class QuickstartUser(CrateDBUser):
461
458
462
459
```
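The switch from f-string interpolation to `?` placeholders with `params=(city,)` can be illustrated in isolation. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in, because it needs no running cluster and uses the same `?` placeholder style; it is not the CrateDB driver, and the sample rows and helper names are made up for the example.

```python
import sqlite3

# In-memory stand-in for the weather_data table (hypothetical sample rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather_data (location TEXT, temperature REAL)")
conn.executemany(
    "INSERT INTO weather_data VALUES (?, ?)",
    [("Vienna", 10.0), ("Vienna", 12.0), ("O'Fallon", 20.0)],
)

QUERY = """
    SELECT location, ROUND(AVG(temperature)) AS avg_temp
    FROM weather_data
    WHERE location = ?
    GROUP BY location
    ORDER BY avg_temp DESC
"""


def avg_temp_bound(city: str) -> list:
    """Pass the city as a bound parameter; quoting is handled by the driver."""
    return conn.execute(QUERY, (city,)).fetchall()


def avg_temp_interpolated(city: str) -> list:
    """Build the SQL text with an f-string; breaks on values containing quotes."""
    sql = f"SELECT location FROM weather_data WHERE location = '{city}'"
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(avg_temp_bound("O'Fallon"))
    try:
        avg_temp_interpolated("O'Fallon")
    except sqlite3.OperationalError as exc:
        print("interpolated query failed:", exc)
```

Besides avoiding quoting problems and SQL injection, bound parameters keep the statement text identical across requests, so only the parameter value varies between runs.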
463
460
464
-
Note that the weight (of query01 and query04) is five compared to the rest, which has a weight of 1, which means that the likelihood that two queries will execute is five times higher than the others. This shows how you can influence the weight of the different queries.
465
-
466
-
Let’s run this load test and see what happens.
461
+
Queries 01 and 04 have weight 5; Locust schedules them ~5× as often as the others (weight 1). Use weights to shape your query mix.
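The effect of the weights can be approximated with a small simulation. This is a sketch of weighted random selection using only the standard library, not Locust's own scheduler; the task names `query02` and `query03` and the number of weight-1 tasks are assumptions for illustration.

```python
import random
from collections import Counter

# Weights as described above: query01 and query04 at 5, the others at 1.
# query02 and query03 are hypothetical placeholders for the weight-1 tasks.
TASK_WEIGHTS = {"query01": 5, "query02": 1, "query03": 1, "query04": 5}


def simulate_task_mix(weights: dict, picks: int, seed: int = 42) -> Counter:
    """Draw `picks` tasks at random, proportionally to their weights."""
    rng = random.Random(seed)  # fixed seed so repeated runs are comparable
    names = list(weights)
    chosen = rng.choices(names, weights=[weights[n] for n in names], k=picks)
    return Counter(chosen)


if __name__ == "__main__":
    mix = simulate_task_mix(TASK_WEIGHTS, 12_000)
    for name, count in sorted(mix.items()):
        print(f"{name}: {count}")
```

Over many picks, the weight-5 tasks should show up roughly five times as often as each weight-1 task, which is the query mix you would expect to see in the Locust statistics.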
467
462
468
-
I started the run with 100 users.
463
+
Let’s run this load test and see what happens. The following run was started with 100 users.
@@ -482,3 +477,12 @@ If you want to download the locust data, you can do that on the last tab.
482
477
## Conclusion
483
478
484
479
When you want to run a load test against a CrateDB Cluster with multiple queries, Locust is a great and flexible tool that lets you quickly define a load test and see what numbers regarding users and RPS are possible for that particular setup.