I recently found myself putting some effort in trying to handle reading items from a DynamoDB table and returning a Pandas Dataframe. Basically, I wanted to abstract some complexity away from available Boto3 read actions, and handle once for all the headache of thinking about keys, query, scan, etc.: since I'm pretty happy with the results, I decided to open-sourcing it here while it's being evaluated as candidate for wr.dynamodb.read_items
in aws/aws-sdk-pandas#1867.
Since the code should be auto-consistent, if this won't be included in awswrangler stable codebase it's probable that sooner or later I will decide to package it and publish to pip (I tried to stick to awswrangler best practices as much as I can, so it should be an handy drop-in replacement).
I am aware of the recent addition of wr.dynamodb.read_partiql_query
in aws/aws-sdk-pandas#1390, as well as its currently issue as reported in aws/aws-sdk-pandas#1571, but the below proposed solution does not involve PartiQL: my goal was to avoid as much as possible the risks that come with its usage towards a DynamoDB table, regarding possible translation of a given query to a full scan op (see for example the disclaimer in the docs).
- automatically switch between available DynamoDB read actions, choosing the optimal one as defined in this hierarchy
get_item > batch_get_item > query > scan
(inspiration from here and here) - handle response pagination ("manually", without using paginators)
- handle possible
UnprocessedKeys
inbatch_get_item
response - support condition expressions both as
string
andKey/Attr
fromboto3.dynamodb.conditions
- automatically handles botocore client error involving DynamoDB reserved keywords via
ExpressionAttributeNames
substitutions - prevent unwanted full table scan thanks to the
allow_full_scan
kwarg which defaults toFalse
- allow attributes selection via
columns
kwarg, which corresponds to Boto3ProjectionExpression
- support both strongly and eventually consistent reads with the
consistent
kwarg, which defaults toFalse
- support limiting the number of returning items with the
max_items_evaluated
kwarg (a kind of anhead()
method for the table!) - returning consumed read capacity units
- anything related to local or global secondary table index
- parallel scan
The last features are unchecked because I considered them out of scope, at least for the moment.
Reading 5 random items from a table
import awswrangler as wr
df = wr.dynamodb.read_items(table_name='my-table', max_items_evaluated=5)
Strongly-consistent reading of a given partition value from a table
import awswrangler as wr
df = wr.dynamodb.read_items(table_name='my-table', partition_values=['my-value'], consistent=True)
Reading items pairwise-identified by partition and sort values, from a table with a composite primary key
import awswrangler as wr
df = wr.dynamodb.read_items(
table_name='my-table',
partition_values=['pv_1', 'pv_2'],
sort_values=['sv_1', 'sv_2']
)
Reading items while retaining only specified attributes, automatically handling possible collision with DynamoDB reserved keywords
import awswrangler as wr
df = wr.dynamodb.read_items(
table_name='my-table',
partition_values=['my-value'],
columns=['connection', 'other_col'] # connection is a reserved keyword, managed under the hood!
)
Reading all items from a table explicitly allowing full scan
import awswrangler as wr
df = wr.dynamodb.read_items(table_name='my-table', allow_full_scan=True)
Reading items matching a KeyConditionExpression expressed with boto3.dynamodb.conditions.Key
import awswrangler as wr
from boto3.dynamodb.conditions import Key
df = wr.dynamodb.read_items(
table_name='my-table',
key_condition_expression=Key('key_1').eq('val_1') and Key('key_2').eq('val_2')
)
Same as above, but with KeyConditionExpression as string
import awswrangler as wr
df = wr.dynamodb.read_items(
table_name='my-table',
key_condition_expression='key_1 = :v1 and key_2 = :v2',
expression_attribute_values={':v1': 'val_1', ':v2': 'val_2'},
)
Reading items matching a FilterExpression expressed with boto3.dynamodb.conditions.Attr
import awswrangler as wr
from boto3.dynamodb.conditions import Attr
df = wr.dynamodb.read_items(
table_name='my-table',
filter_expression=Attr('my_attr').eq('this-value')
)
Same as above, but with FilterExpression as string
import awswrangler as wr
df = wr.dynamodb.read_items(
table_name='my-table',
filter_expression='my_attr = :v',
expression_attribute_values={':v': 'this-value'}
)
Reading items involving an attribute which collides with DynamoDB reserved keywords
import awswrangler as wr
df = wr.dynamodb.read_items(
table_name='my-table',
filter_expression='#operator = :v',
expression_attribute_names={'#operator': 'operator'},
expression_attribute_values={':v': 'this-value'}
)
I tested all the above-checked features with relatively simple dummy tables, so it probably requires more focused test sessions. Anyway, all the tests have been done with Python 3.10.8 and following package versions:
pandas==1.5.2
botocore==1.29.16
boto3==1.26.16
awswrangler==2.17.0