-
Notifications
You must be signed in to change notification settings - Fork 9
Getting Started V2
Whilst I tried to make the syntactical difference between querying and scanning minimal, DynamoDB has some limitations and differences to what you can do in a Query
and Scan
request. For instance, whilst you can use the full range of comparison operators ("EQ", "NE", "GT", etc.) in a Scan request, only a subset of these operators ("EQ", "GT", "GE", "LT", "LE", "BEGINS_WITH" and "BETWEEN") can be used in a Query request.
To learn more about the supported operations, read this and this API doc.
The external DSL outlined below is only intended to be used to perform a Query
and Scan
request, the code provided in the repository only allows you to use them as such. If you're looking to Get
an item by keys (hash and range), you should continue to use the existing mechanisms to do so. This library does not intent to replace all existing functionalities of the AWS SDK but to make it easy to perform Query
and Scan
operations.
The rest of this guide assumes that you have at least basic understanding of DynamoDB and its data model. If you're not familiar with DynamoDB already, please refer to its API documentation here to help getting yourself familiarized, believe me, it'll pay off in the long run because it's such an awesome product!
The query syntax outlined below is only applicable when working with
AmazonDynamoDBClient
andDynamoDBContext
types under theAmazon.DynamoDBv2
top level namespace introduced inAWSSDK
v1.5.18.0. If you're working with the mirrored types in theAmazon.DynamoDB
namespace, then please refer to the syntax outlined here instead.
A Query
request is only permitted if your table defines both a hash and a range key. A Query
allows you to specify the hash key value and optionally specify at most one comparison against the range key value, if no range key comparison is specified then all available range keys will be returned.
The basic format of a Query
can be expressed as:
SELECT AttributeName1, AttributeName2, ..., AttributeNameN
FROM TableName
WHERE HashKeyAttributeName = HashKeyValue
AND RangeKeyAttributeName operator RangeKeyValue(s)
ORDER DESC
LIMIT N
WITH (NoConsistentRead, PageSize(n), NoReturnedCapacity, Index(IndexName, AllAttributesFlag))
where:
-
AttributeName1
toAttributeNameN
are the names of the attributes you want to return with the query, and if you want to return all attributes, you can also just use asterisk (*) instead -
TableName
is the name of the table, which at the time of writing, the library does not allow table names to be any case variants of 'where' or 'limit' in order to avoid subtle bugs such as:SELECT * FROM WHERE ThreadId = 2012
from creeping into your code because
TableName
had been parsed as "WHERE ThreadId = 2012" -
HashKeyAttributeName
andRangeKeyAttributeName
are the names of the attributes used for theHash
andRange
keys -
HashKeyValue
andRangeKeyValue
can be either numeric (integer or floating number) or string (which must be enclosed in double quotes (")) -
WHERE
clause is non-optional, and must specify one and only one value for the hash key with the '=' operator -
an optional comparison can be made against the range key by specifying an
operator
with one of more values for the operator -
operator
is one of the allowed comparison operators, i.e.-
=
:SELECT * FROM Employees WHERE FirstName = \"Yan\" AND Age = 30
-
>=
:SELECT * FROM Employees WHERE FirstName = \"Yan\" AND Age >= 30
-
>
:SELECT * FROM Employees WHERE FirstName = \"Yan\" AND Age > 30
-
<=
:SELECT * FROM Employees WHERE FirstName = \"Yan\" AND Age <= 30
-
<
:SELECT * FROM Employees WHERE FirstName = \"Yan\" AND Age < 30
-
BEGINS WITH
:SELECT * FROM Employees WHERE FirstName = \"Yan\" AND Subject BEGINS WITH \"Dynamo\"
-
BETWEEN .. AND ..
:SELECT * FROM Employees WHERE FirstName = \"Yan\" AND Age BETWEEN 10 AND 30
-
-
ORDER
clause is optional, and can be followed by eitherASC
orDESC
to search forward or backward respectively. If not specified, the default behaviour (search forward) will be used. -
LIMIT
clause is optional, and can be used to limit the number of results that are returned. -
WITH
clause is optional, and can be used to set optional parameters for the query:-
NoConsistentRead
: by default the DSL will issue query requests with conssitent read, theNoConsistentRead
option will force the DSL to use eventual consistency instead -
PageSize
: by defaultDynamoDB
will return as many items as available in your table that can fit into the max 1MB return size per request, but this creates strong bursts of activities againstDynamoDB
which is likely to cause other operations to be throttled. You can use thePageSize
option to set the max number of items that are returned per query request to reduce the amount of bursts your generate againstDynamoDB
-
NoReturnedCapacity
: only useful when used with the low-levelAmazonDynamoDBClient
, specifying this option stopsDynamoDB
from returning the consumed capacity unit in the response -
Index
: used to specify that the query should be made against aLocal Secondary Index
of the specifiedIndexName
, theAllAttributesFlag
is aboolean
value (eithertrue
orfalse
) which indicates whether the query should return ALL attributes (which might involve fetching from the table and hence consuming more capacity units) or just the PROJECTED attributes in the index.
-
-
the keywords
SELECT
,FROM
,WHERE
,LIMIT
,ORDER
,WITH
and the operators are all case insensitive
A Scan
request is more permissive but also more expensive to run (both in terms of latency and read units consumed which translates directly to running cost). A 'Scan' allows you to filter on arbitrary fields in your data using a wide range of comparison operators.
The basic format of a Scan
can be expressed as:
SELECT AttributeName1, AttributeName2, ..., AttributeNameN
FROM TableName
WHERE AttributeName1 operator1 AttributeValue1
AND AttributeName2 operator2 AttributeValue2
AND ...
AND AttributeNameN operatorN AttributeValueN
LIMIT N
WITH (PageSize(n), NoReturnedCapacity, Segments(m))
where:
-
AttributeName1
toAttributeNameN
are the names of the attributes you want to return with the query, and if you want to return all attributes, you can also just use asterisk (*) instead -
TableName
is the name of the table, which at the time of writing, the library does not allow table names to be any case variants of 'where' or 'limit' in order to avoid subtle bugs such as:SELECT * FROM WHERE Year = 2012
from creeping into your code because
TableName
had been parsed as "WHERE Year = 2012" -
WHERE
clause is optional, and can specify one or more comparisons against the values of attributes using one of the supported operators -
if
WHERE
clause is not specified, then all rows in the table will be returned -
AttributeValue1
toAttributeValueN
can be either numeric (integer or floating number) or string (which must be enclosed in double quotes (")) -
operator
is one of the allowed comparison operators, i.e.-
=
:SELECT * FROM Employees WHERE FirstName = \"Yan\"
-
!=
:SELECT * FROM Employees WHERE FirstName != \"Yan\"
-
>=
:SELECT * FROM Employees WHERE Age >= 18
-
>
:SELECT * FROM Employees WHERE Age > 18
-
<=
:SELECT * FROM Employees WHERE Age <= 99
-
<
:SELECT * FROM Employees WHERE Age < 99
-
CONTAINS
:SELECT * FROM Employees WHERE Interests CONTAINS \"DynamoDB\"
-
NOT CONTAINS
:SELECT * FROM Employees WHERE LuckyNumbers NOT CONTAINS 11
-
BEGINS WITH
:SELECT * FROM Employees WHERE DateJoined BEGINS WITH \"2010\"
-
IS NULL
:SELECT * FROM Employees WHERE FavouriteTeam IS NULL
-
IS NOT NULL
:SELECT * FROM Employees WHERE FavouriteTeam IS NOT NULL
-
BETWEEN .. AND ..
:SELECT * FROM Employees WHERE Age BETWEEN 18 AND 30
-
IN (...)
:SELECT * FROM Employees WHERE Age IN (18, 25, 32, 39)
-
-
LIMIT
clause is optional, and can be used to limit the number of results that are returned. -
WITH
clause is optional, and can be used to set optional parameters for the scan:-
PageSize
: by defaultDynamoDB
will return as many items as available in your table that can fit into the max 1MB return size per request, but this creates strong bursts of activities againstDynamoDB
which is likely to cause other operations to be throttled. You can use thePageSize
option to set the max number of items that are returned per scan request to reduce the amount of bursts your generate againstDynamoDB
-
NoReturnedCapacity
: only useful when used with the low-levelAmazonDynamoDBClient
, specifying this option stopsDynamoDB
from returning the consumed capacity unit in the response -
Segments
: Amazon announced support for parallel Scans back in May 2013. To execute a scan in parallel to boost overall time required to complete the scan, you can use theSegments
option. Please refer to theDynamoDB
best practice guide on when to use parallal scans and how to choose the correct number of segments.
-
-
the keywords
SELECT
,FROM
,WHERE
,LIMIT
,WITH
and the operators are all case insensitive
If all you want to find out is the number of matching items rather than the actual items themselves, then you can also use the COUNT
keyword in place of SELECT
.
For instance, for a query count:
COUNT *
FROM TableName
WHERE HashKeyAttributeName = HashKeyValue
AND RangeKeyAttributeName operator RangeKeyValue(s)
LIMIT N
or a scan count:
COUNT *
FROM TableName
WHERE AttributeName1 operator1 AttributeValue1
AND AttributeName2 operator2 AttributeValue2
AND ...
AND AttributeNameN operatorN AttributeValueN
LIMIT N
Please note that DynamoDB
does not allow attribute names to be specified when performing a count operation, so a valid count query will always start with COUNT * FROM ...
.
Also, at the time of writing, count queries can only be used with the low-level AmazonDynamoDBClient
class. This is because there's no support for performing counts from the DynamoDBContext
class, which as an ORM layer I think it's sensible decision.
For now, there are extension methods for the AmazonDynamoDBClient
and DynamoDBContext
classes to allow you to execute a Query
or Scan
using a query string written in the syntax described above.
In the DynamoDbV2.SQL.Execution
namespace, there exists a number of extension methods to asynchronously or synchronously execute a Query
or Scan
:
/// Executes a query asynchronously and returns the results
[<Extension>]
static member QueryAsync : AmazonDynamoDBClient * string -> Async<QueryResponse>
/// Executes a query asynchronously as a task and returns the results
[<Extension>]
static member QueryAsyncAsTask : AmazonDynamoDBClient * string -> Task<QueryResponse>
/// Executes a query synchronously and returns the results
[<Extension>]
static member Query : AmazonDynamoDBClient * string -> QueryResponse
/// Executes a scan asynchronously and returns the results
[<Extension>]
static member ScanAsync : AmazonDynamoDBClient * string -> Async<ScanResponse>
/// Executes a scan asynchronously as a task and returns the results
[<Extension>]
static member ScanAsyncAsTask : AmazonDynamoDBClient * string -> Task<ScanResponse>
/// Executes a scan synchronously and returns the results
[<Extension>]
static member Scan : AmazonDynamoDBClient * string -> ScanResponse
Examples of the query syntax can be found here.
In the DynamoDb.SQL.Execution
namespace, there are also two extension methods for DynamoDBContext
class to synchronously execute a Query
or Scan
:
/// Executes a query synchronously and returns the results
[<Extension>]
static member ExecQuery<'T> : DynamoDBContext * string -> IEnumerable<'T>
/// Executes a scan synchronously and returns the results
[<Extension>]
static member ExecScan<'T> : DynamoDBContext * string -> IEnumerable<'T>
I had decided to name the methods ExecQuery
and ExecScan
intentionally to avoid clashing with the existing Query<T>(object hashKey)
and Scan<T>(object hashKey)
methods. This is because methods that take in object as argument always leave doubt in developers' minds, and can introduce subtle runtime bugs if someone had forgotten to use the DynamoDb.SQL.Execution
namespace and attempted to run a Query
/Scan
using a string written in the above DSL syntax, in which case the DSL string will be interpreted as the hash key instead and mostly likely return nothing...
Examples of the scan syntax can be found here.
- the usual DynamoDB limitations also supplies here, so a maximum of 1MB of data can be returned per API call, there's currently no support to automatically use
LastEvaluatedKey
to make the follow up request. - DynamoDB supports a number of data types (String, String Set, Number, Number Set, Binary, Binary Set), but for the time being this library only supports the most commonly used data types of String and Number.