Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 38 #63

Merged
merged 27 commits into from
Oct 9, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
b7b6ec0
Add boto3 and moto dependencies
eigenbeam Sep 18, 2024
b1f333b
Add ability to publish to Kinesis
eigenbeam Sep 18, 2024
f1d400b
Add aws unit tests
eigenbeam Sep 19, 2024
880766b
Fix formatting in pyproject
eigenbeam Sep 19, 2024
b6e8cd9
Refactor test setup into fixtures
eigenbeam Sep 19, 2024
6099e18
Add docstrings and cleanup code.
eigenbeam Sep 19, 2024
f3223a3
Add README instructions for configuratin AWS creds
eigenbeam Sep 19, 2024
8ecb2f3
Update example ini file
eigenbeam Sep 19, 2024
6ff2c35
Add validation of configuration values
eigenbeam Sep 19, 2024
0592161
Add script and instructions to setup AWS env
eigenbeam Sep 19, 2024
1588974
Extract configuration into its own module
eigenbeam Oct 3, 2024
13903f8
Fix merge and module extraction-induced mistakes
eigenbeam Oct 3, 2024
93000c1
WIP: Add cli option for # granules to proces
eigenbeam Oct 2, 2024
62b7aa4
Update gitignore
eigenbeam Oct 2, 2024
ae7d8c0
clean up rebase oversights, refine tests
juliacollins Oct 4, 2024
328c8f3
Merge branch 'main' into issue-38
eigenbeam Oct 8, 2024
34c4d3f
Merge branch 'issue-36' into issue-38
eigenbeam Oct 8, 2024
be17d09
Add scope comment to the aws unit tests
eigenbeam Oct 8, 2024
c89f061
WIP: Lookup the kinesis stream arn by name
eigenbeam Oct 8, 2024
3336383
Add pytest-watcher to run tests continuously
eigenbeam Oct 9, 2024
9de7656
Change kinesis stream validation to use stream name
eigenbeam Oct 9, 2024
ab7b0de
Modify aws module to use the stream name, not arn.
eigenbeam Oct 9, 2024
9a475c3
WIP: Use the kinesis stream name, not ARN
eigenbeam Oct 9, 2024
034dbb9
Modify the config to interpolate values with environment
eigenbeam Oct 9, 2024
2f6ac5a
Merge pull request #72 from nsidc/issue-37
eigenbeam Oct 9, 2024
8a624fe
Remove unused show_config function from metgen module
eigenbeam Oct 9, 2024
2fa38ad
Remove duplicate configuration & mapping
eigenbeam Oct 9, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -1,2 +1,3 @@
__pycache__
dist
dist
example/test.ini
64 changes: 63 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -75,6 +75,58 @@ With the Poetry shell running, start the instameta tool and verify that it’s w
init
process

## AWS Credentials

In order to process science data and stage it for Cumulus, you must first create & setup your AWS
credentials. Several options for doing this are given here:

### Manually Creating Configuration Files

First, create a directory in your user's home directory to store the AWS configuration:

$ mkdir -p ~/.aws

In the `~/.aws` directory, create a file named `config` with the contents:

[default]
region = us-west-2
output = json

In the `~/.aws` directory, create a file named `credentials` with the contents:

[default]
aws_access_key_id = TBD
aws_secret_access_key = TBD

Finally, restrict the permissions of the directory and files:

$ chmod -R go-rwx ~/.aws

When you obtain the AWS key pair (not covered here), edit the `~/.aws/credentials` file
and replace `TBD` with the public and secret key values.

### Using the AWS CLI

You may install (or already have it installed) the AWS Command Line Interface on the
machine where you are running the tool. Follow the
[AWS CLI Install instructions](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)
for the platform on which you are running.

Once you have the AWS CLI, you can use it to create the `~/.aws` directory and the
`config` and `credentials` files:

$ aws configure

You will be prompted to enter your AWS public access and secret key values, along with
the AWS region and CLI output format. The AWS CLI will create and populate the directory
and files with your values.

If you require access to multiple AWS accounts, each with their own configuration--for
example, different accounts for pre-production vs. production--you can use the AWS CLI
'profile' feature to manage settings for each account. See the [AWS configuration
documentation](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html#cli-configure-files-using-profiles)
for the details.

## Usage

* Show the help text:
Expand All @@ -89,6 +141,12 @@ With the Poetry shell running, start the instameta tool and verify that it’s w

$ instameta info --config example/modscg.ini

* Process science data and stage it for Cumulus:

# Source the AWS profile (once) before running 'process'-- use 'default' or a named profile
$ source scripts/env.sh default
$ instameta process --config example/modscg.ini

* Exit the Poetry shell:

$ exit
Expand All @@ -114,10 +172,14 @@ TBD

$ poetry install

### Running tests:
### Run tests:

$ poetry run pytest

### Run tests when source changes (uses [pytest-watcher](https://github.com/olzhasar/pytest-watcher)):

$ poetry run ptw . --now --clear

## Credit

This content was developed by the National Snow and Ice Data Center with funding from
Expand Down
5 changes: 4 additions & 1 deletion example/modscg.ini
Original file line number Diff line number Diff line change
Expand Up @@ -9,5 +9,8 @@ provider = FORTY_TWO
[Destination]
local_output_dir = ./json
ummg_dir = ummg
kinesis_arn = abcd-1234-wxyz-0666
kinesis_stream_name = abcd-${environment}-1234-wxyz-0666
write_cnm_file = True

[Settings]
checksum_type = SHA256
1,646 changes: 1,641 additions & 5 deletions poetry.lock

Large diffs are not rendered by default.

5 changes: 4 additions & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -12,11 +12,14 @@ python = "^3.12"
click = "^8.1.7"
pyfiglet = "^1.0.2"
netCDF4 = "^1.6.5"

rich = "^13.7.1"
boto3 = "^1.35.22"

[tool.poetry.group.test.dependencies]
pytest = "^8.3.2"
moto = {extras = ["all"], version = "^5.0.14"}

pytest-watcher = "^0.4.3"
[tool.poetry.group.dev.dependencies]
ruff = "^0.5.5"
mypy = "^1.11.1"
Expand Down
25 changes: 25 additions & 0 deletions scripts/env.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
#!/bin/bash

if (( $# != 1 )); then
echo "Usage: source env.sh aws_profile_name"
echo " where aws_profile_name is an AWS CLI named profile"
echo " https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-profiles.html"
exit 1
else
export AWS_PROFILE=$1

AWS_ACCESS_KEY_ID=$(aws configure get aws_access_key_id --profile "$AWS_PROFILE")
AWS_SECRET_ACCESS_KEY=$(aws configure get aws_secret_access_key --profile "$AWS_PROFILE")
AWS_REGION=$(aws configure get region --profile "$AWS_PROFILE" || echo "$AWS_DEFAULT_REGION")
eigenbeam marked this conversation as resolved.
Show resolved Hide resolved
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)
eigenbeam marked this conversation as resolved.
Show resolved Hide resolved

export AWS_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY
export AWS_REGION
export AWS_ACCOUNT_ID

echo "AWS environment:"
echo " AWS_PROFILE: $AWS_PROFILE"
echo " AWS_REGION: $AWS_REGION"
echo " AWS_ACCOUNT_ID: $AWS_ACCOUNT_ID"
fi
27 changes: 27 additions & 0 deletions src/nsidc/metgen/aws.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
import boto3


KINESIS_PARTITION_KEY = "metgenc-duck"

def kinesis_stream_exists(stream_name):
client = boto3.client("kinesis", region_name="us-west-2")
try:
summary = client.describe_stream_summary(StreamName=stream_name)
return True
except Exception as e:
return False

def post_to_kinesis(stream_name, cnm_message):
"""Posts a message to a Kinesis stream."""
client = boto3.client("kinesis", region_name="us-west-2")
try:
result = client.put_record(
StreamName=stream_name,
Data=cnm_message,
PartitionKey=KINESIS_PARTITION_KEY
)
print(f'Published CNM message {cnm_message} to stream: {stream_name}')
return result['ShardId']
except Exception as e:
print(e)
raise e
29 changes: 20 additions & 9 deletions src/nsidc/metgen/cli.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
import click

from nsidc.metgen import config
from nsidc.metgen import metgen
from nsidc.metgen import constants

Expand All @@ -12,29 +13,39 @@ def cli():
pass

@cli.command()
@click.option('--config', help='Path to configuration file to create or replace')
@click.option('-c', '--config', help='Path to configuration file to create or replace')
def init(config):
"""Populates a configuration file based on user input."""
click.echo(metgen.banner())
config = metgen.init_config(config)
click.echo(f'Initialized the metgen configuration file {config}')

@cli.command()
@click.option('--config', help='Path to configuration file to display', required=True)
def info(config):
@click.option('-c', '--config', 'config_filename', help='Path to configuration file to display', required=True)
def info(config_filename):
"""Summarizes the contents of a configuration file."""
click.echo(metgen.banner())
configuration = metgen.configuration(metgen.config_parser(config))
configuration = config.configuration(config.config_parser_factory(config_filename), {})
configuration.show()

@cli.command()
@click.option('--config', help='Path to configuration file', required=True)
@click.option('--env', help='environment', default=constants.DEFAULT_CUMULUS_ENVIRONMENT, show_default=True)
def process(config, env=constants.DEFAULT_CUMULUS_ENVIRONMENT):
@click.option('-c', '--config', 'config_filename', help='Path to configuration file', required=True)
@click.option('-e', '--env', help='environment', default=constants.DEFAULT_CUMULUS_ENVIRONMENT, show_default=True)
@click.option('-n', '--number', help="Process at most 'count' granules.", metavar='count', required=False, default=-1)
@click.option('-wc', '--write-cnm', is_flag=True, help="Write CNM messages to files.")
def process(config_filename, env, write_cnm, number):
"""Processes science data files based on configuration file contents."""
click.echo(metgen.banner())
configuration = metgen.configuration(metgen.config_parser(config), env)
metgen.process(configuration)
overrides = {
'write_cnm_file': write_cnm,
'number': number
}
configuration = config.configuration(config.config_parser_factory(config_filename), overrides, env)
try:
metgen.process(configuration)
except Exception as e:
print("\nUnable to process data: " + str(e))
exit(1)
click.echo(f'Processed granules using the configuration file {config}')

if __name__ == "__main__":
Expand Down
104 changes: 104 additions & 0 deletions src/nsidc/metgen/config.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,104 @@
import configparser
import dataclasses
from datetime import datetime, timezone
import os.path
import uuid

from nsidc.metgen import aws
from nsidc.metgen import constants


@dataclasses.dataclass
class Config:
environment: str
data_dir: str
auth_id: str
version: str
provider: str
local_output_dir: str
ummg_dir: str
kinesis_stream_name: str
write_cnm_file: bool
checksum_type: str
number: int

def show(self):
# TODO add section headings in the right spot (if we think we need them in the output)
print()
print('Using configuration:')
for k,v in self.__dict__.items():
print(f' + {k}: {v}')

def enhance(self, producer_granule_id):
mapping = dataclasses.asdict(self)
collection_details = self.collection_from_cmr(mapping)

mapping['auth_id'] = collection_details['auth_id']
mapping['version'] = collection_details['version']
mapping['producer_granule_id'] = producer_granule_id
mapping['submission_time'] = datetime.now(timezone.utc).isoformat()
mapping['uuid'] = str(uuid.uuid4())

return mapping

# Is the right place for this function?
def collection_from_cmr(self, mapping):
# TODO: Use auth_id and version from mapping object to retrieve collection
# metadata from CMR, including formatted version number, temporal range, and
# spatial coverage.
return {
'auth_id': mapping['auth_id'],
'version': mapping['version']
}

def config_parser_factory(configuration_file):
if configuration_file is None or not os.path.exists(configuration_file):
raise ValueError(f'Unable to find configuration file {configuration_file}')
cfg_parser = configparser.ConfigParser(interpolation=configparser.ExtendedInterpolation())
cfg_parser.read(configuration_file)
return cfg_parser


def _get_configuration_value(environment, section, name, value_type, config_parser, overrides, default=None):
vars = { 'environment': environment }
if overrides.get(name) is None:
if value_type is bool:
return config_parser.getboolean(section, name, fallback=default)
elif value_type is int:
return config_parser.getint(section, name, fallback=default)
else:
value = config_parser.get(section, name, vars=vars, fallback=default)
print(name, vars, value)
return value
else:
return overrides.get(name)

def configuration(config_parser, overrides, environment=constants.DEFAULT_CUMULUS_ENVIRONMENT):
try:
return Config(
environment,
_get_configuration_value(environment, 'Source', 'data_dir', str, config_parser, overrides),
_get_configuration_value(environment, 'Collection', 'auth_id', str, config_parser, overrides),
_get_configuration_value(environment, 'Collection', 'version', int, config_parser, overrides),
_get_configuration_value(environment, 'Collection', 'provider', str, config_parser, overrides),
_get_configuration_value(environment, 'Destination', 'local_output_dir', str, config_parser, overrides),
_get_configuration_value(environment, 'Destination', 'ummg_dir', str, config_parser, overrides),
_get_configuration_value(environment, 'Destination', 'kinesis_stream_name', str, config_parser, overrides),
_get_configuration_value(environment, 'Destination', 'write_cnm_file', bool, config_parser, overrides, False),
_get_configuration_value(environment, 'Settings', 'checksum_type', str, config_parser, overrides, 'SHA256'),
_get_configuration_value(environment, 'Settings', 'number', int, config_parser, overrides, -1),
)
except Exception as e:
return Exception('Unable to read the configuration file', e)

def validate(configuration):
"""Validates each value in the configuration."""
validations = [
['data_dir', lambda dir: os.path.exists(dir), 'The data_dir does not exist.'],
['local_output_dir', lambda dir: os.path.exists(dir), 'The local_output_dir does not exist.'],
# ['ummg_dir', lambda dir: os.path.exists(dir), 'The ummg_dir does not exist.'], ## Not sure what validation to do
eigenbeam marked this conversation as resolved.
Show resolved Hide resolved
['kinesis_stream_name', lambda name: aws.kinesis_stream_exists(name), 'The kinesis stream does not exist.'],
]
errors = [msg for name, fn, msg in validations if not fn(getattr(configuration, name))]
return len(errors) == 0, errors

Loading