Parallel extraction (#386)

* feat: add mapping sets to kernel (#337)
* refactor: mapping set model
* chore: joining tables - submission, mapping
* chore: add input to mapping set model
* fix: (kernel-test) mapping set
* fix: remove unused containers
* fix: db network
* fix: set mappingset input to nullable
* chore: mappingset migration
* fix: (kernel) mapping set to mapping relationship
* fix: (kernel) extraction on only active mappings within a set
* chore: (kernel) migration clean-up
* revert: migration
* chore: merge migrations
* fix: kernel model update
* fix: filter typo
* chore: update kernel ERD
* chore: migrate mappings to mapping set
* fix: clear mapping projectschemas then readd
* fix: format code
* fix: format code
* fix: pylint whitespaces
* feat: (ui) mappingsets to pipelines (#361)
* feat: ui contract model
* fix: kernel migration
* fix: kernel migrations
* feat: ui mapping set
* feat: react component update
* test: (ui) pipeline fetch
* fix: remove comments
* fix: readonly message
* fix: model default_name
* fix: loop mapping set pages
* fix: test add pipeline
* fix: artefacts generation with mapping sets (#350)
* fix: tweaking models (100% coverage)
* fix: tweaking filters (100% coverage)
* fix(views): distinct count in stats
* fix: upsert project artefacts with mapping sets
* fix: API urls
* fix(common): submit to kernel with mapping set
* fix(odk): submission to kernel
* fix(odk): generated mappings are read only
* fix: naming naming naming
* fix: tweaking
* fix(couchdb): adapt to mapping set model
* fix: reactivate UI tests in travis (#371)
* fix: reactivate UI tests in travis
* test: sass-lint rules
* fix mappingset migration (#379)
* chore: add mappingset to client test_fixtures
* chore: swap mapping.id for mappingset.id in submission generation
* fix: use model from swagger for submissions in integration tests
* fix: remove useless test and change scope of entity generation
* feat (ui): pipeline fetch/publish using mapping set (#373)
* fix: tweaking models (100% coverage)
* fix: tweaking filters (100% coverage)
* fix(views): distinct count in stats
* fix: upsert project artefacts with mapping sets
* fix: API urls
* fix(common): submit to kernel with mapping set
* fix(odk): submission to kernel
* fix(odk): generated mappings are read only
* feat: ui contract model
* fix: kernel migration
* fix: kernel migrations
* feat: ui mapping set
* feat: react component update
* test: (ui) pipeline fetch
* fix: remove comments
* fix: readonly message
* fix: naming naming naming
* fix: model default_name
* fix: loop mapping set pages
* fix: test add pipeline
* fix: tweaking
* fix: pipeline:contracts
* feat: pipeline publish
* fix(ui): test
* fix: temp deactivate couch-sync tests
* test (ui): mapping set 100%
* fix: project name
* fix: ui test consistency
* chore: readonly class
* chore: selected pipeline readonly class
* added styling for readonly-pipeline in overview screen
* added styling to readonly-pipeline navbar
* added styling for read-only text inputs
* better presentation of mapping-definitions json textarea.
* fix (ui): pipeline - contract infix
* fix (ui): fix pipeline view
* fix(ui): test
* fix (ui): check pipelines length
* fix (ui): migration fix
* fix(ui): contract migration
* fix: artefacts names (#387)
* fix(ui): filter pipelines - redux
* feat(kernel): create an empty mapping along with the passthrough one (#389)
* feat(kernel): create empty mapping
* fix: run_entity_extraction
* fixed css grid and added word break for long titles without breaking space (#397)
* docs(ui): fix model comments (#401)
* feat(kernel): include a random input in the generated mapping (#402)
* feat: include input in generated mapping
* fix: do not duplicate constants
* fix(ui): the derived data must have an id field with UUID content (#400)
* fix: the schema must have an id field with UUID content
* fix: apply only to derived schemas
* fix: also derived entity type
* fix: cleaning
* fix: check id field in EntityTypes list
* test: implement id rule
* fix: including docs
eHealthAfrica · Oct 10, 2018 · da1e5fb
1 parent bbcb105

Showing 61 changed files with 2,720 additions and 1,043 deletions.
diff --git a/aether-common-module/README.md b/aether-common-module/README.md
@@ -84,10 +84,10 @@ Possible responses:
 - `Always Look on the Bright Side of Life!!!` ✘
 - `Brought to you by eHealth Africa - good tech for hard places` ✔
 
-#### To make submissions linked to an existing project artefact (mapping).
+#### To push submissions linked to an existing project artefact (mapping set).
 
 ```python
-aether.common.kernel.utils.submit_to_kernel(submission, mapping_id, submission_id=None)
+aether.common.kernel.utils.submit_to_kernel(submission, mappingset_id, submission_id=None)
 ```
 
 ### Conf section

diff --git a/aether-common-module/aether/common/kernel/tests/test_utils.py b/aether-common-module/aether/common/kernel/tests/test_utils.py
@@ -140,12 +140,12 @@ def test__test_connection_get_fail(self, mock_get, mock_head):
             )
 
     @mock.patch.dict('os.environ', AETHER_ENV_MOCK)
-    def test_submit_to_kernel__without_mapping_id(self):
+    def test_submit_to_kernel__without_mappingset_id(self):
         self.assertRaises(
             Exception,
             utils.submit_to_kernel,
-            submission={'a': 1},
-            mapping_id=None,
+            submission={},
+            mappingset_id=None,
         )
 
     @mock.patch.dict('os.environ', AETHER_ENV_MOCK)
@@ -154,22 +154,22 @@ def test_submit_to_kernel__without_submission(self):
             Exception,
             utils.submit_to_kernel,
             submission=None,
-            mapping_id=1,
+            mappingset_id=1,
         )
 
     @mock.patch('requests.put')
     @mock.patch('requests.post')
     @mock.patch.dict('os.environ', AETHER_ENV_MOCK)
     def test_submit_to_kernel__without_submission_id(self, mock_post, mock_put):
-        utils.submit_to_kernel(submission={'_id': 'a'}, mapping_id=1, submission_id=None)
+        utils.submit_to_kernel(submission={'_id': 'a'}, mappingset_id=1, submission_id=None)
         mock_put.assert_not_called()
         mock_post.assert_called()
 
     @mock.patch('requests.put')
     @mock.patch('requests.post')
     @mock.patch.dict('os.environ', AETHER_ENV_MOCK)
     def test_submit_to_kernel__with_submission_id(self, mock_post, mock_put):
-        utils.submit_to_kernel(submission={'_id': 'a'}, mapping_id=1, submission_id=1)
+        utils.submit_to_kernel(submission={'_id': 'a'}, mappingset_id=1, submission_id=1)
         mock_put.assert_called()
         mock_post.assert_not_called()
 

diff --git a/aether-common-module/aether/common/kernel/utils.py b/aether-common-module/aether/common/kernel/utils.py
@@ -139,16 +139,16 @@ def get_data(url):
     return results
 
 
-def submit_to_kernel(submission, mapping_id, submission_id=None):
+def submit_to_kernel(submission, mappingset_id, submission_id=None):
     '''
     Push the submission to Aether Kernel
     '''
 
     if submission is None:
         raise errors.SubmissionError(_('Cannot make submission without content!'))
 
-    if mapping_id is None:
-        raise errors.SubmissionError(_('Cannot make submission without mapping!'))
+    if mappingset_id is None:
+        raise errors.SubmissionError(_('Cannot make submission without mapping set!'))
 
     if submission_id:
         # update existing doc
@@ -164,7 +164,7 @@ def submit_to_kernel(submission, mapping_id, submission_id=None):
         url,
         json={
             'payload': submission,
-            'mapping': mapping_id,
+            'mappingset': mappingset_id,
         },
         headers=get_auth_header(),
     )
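
Note on the hunk above: the renamed argument only changes the key sent to the kernel (`mappingset` instead of `mapping`); the create-or-update decision is untouched. A minimal sketch of that decision follows, assuming a hypothetical helper `build_submission_request` that is not part of the commit. It returns the HTTP verb and JSON body only, uses `ValueError` as a stand-in for `errors.SubmissionError`, and leaves out the URL and auth headers because the diff does not show them.

```python
def build_submission_request(submission, mappingset_id, submission_id=None):
    """Return (http_method, json_body) for a kernel submission.

    Hypothetical helper for illustration; the real submit_to_kernel also
    builds the URL and sends the request.
    """
    if submission is None:
        # stand-in for errors.SubmissionError
        raise ValueError('Cannot make submission without content!')
    if mappingset_id is None:
        raise ValueError('Cannot make submission without mapping set!')

    # An existing submission is updated (PUT); a new one is created (POST).
    method = 'PUT' if submission_id else 'POST'
    body = {'payload': submission, 'mappingset': mappingset_id}
    return method, body


method, body = build_submission_request({'_id': 'a'}, mappingset_id=1)
print(method, body)
```

This mirrors the updated tests: without `submission_id` only `requests.post` is expected, with it only `requests.put`.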
diff --git a/aether-couchdb-sync-module/aether/sync/api/couchdb_sync.py b/aether-couchdb-sync-module/aether/sync/api/couchdb_sync.py
@@ -194,5 +194,5 @@ def post_to_aether(document, aether_id=False):
         )
 
     return kernel_utils.submit_to_kernel(submission=document,
-                                         mapping_id=str(schema.kernel_id),
+                                         mappingset_id=str(schema.kernel_id),
                                          submission_id=aether_id)
diff --git a/aether-couchdb-sync-module/aether/sync/api/tests/test_couchdb_sync.py b/aether-couchdb-sync-module/aether/sync/api/tests/test_couchdb_sync.py
@@ -35,7 +35,7 @@
 from . import clean_couch
 
 
-SUBMISSION_FK = 'mapping'
+SUBMISSION_FK = 'mappingset'
 headers_testing = kernel_utils.get_auth_header()
 device_id = 'test_import-from-couch'
 

diff --git a/aether-kernel/aether/kernel/admin.py b/aether-kernel/aether/kernel/admin.py
@@ -29,13 +29,18 @@ class ProjectAdmin(CompareVersionAdmin):
 
 class MappingAdmin(CompareVersionAdmin):
     form = forms.MappingForm
+    list_display = ('id', 'name', 'revision',)
+    readonly_fields = ('id',)
+
+
+class MappingSetAdmin(CompareVersionAdmin):
     list_display = ('id', 'name', 'revision', 'project',)
     readonly_fields = ('id',)
 
 
 class SubmissionAdmin(CompareVersionAdmin):
     form = forms.SubmissionForm
-    list_display = ('id', 'revision', 'mapping', 'map_revision',)
+    list_display = ('id', 'revision', 'mappingset',)
     readonly_fields = ('id',)
 
 
@@ -57,6 +62,7 @@ class EntityAdmin(CompareVersionAdmin):
 
 
 admin.site.register(models.Project, ProjectAdmin)
+admin.site.register(models.MappingSet, MappingSetAdmin)
 admin.site.register(models.Mapping, MappingAdmin)
 admin.site.register(models.Submission, SubmissionAdmin)
 admin.site.register(models.Schema, SchemaAdmin)

diff --git a/aether-kernel/aether/kernel/api/avro_tools.py b/aether-kernel/aether/kernel/api/avro_tools.py
@@ -22,9 +22,14 @@
 for details.
 '''
 
-import collections
-import copy
-import uuid
+import random
+
+from collections import namedtuple
+from copy import deepcopy
+from os import urandom
+from string import ascii_letters
+from uuid import uuid4
+
 
 # Constants used by AvroValidator to distinguish between avro types
 # ``int`` and ``long``.
@@ -55,6 +60,87 @@
 NAMESPACE = 'org.ehealthafrica.aether'
 
 
+def random_string():
+    return ''.join(random.choice(ascii_letters) for i in range(random.randint(1, 30)))
+
+
+def random_avro(schema):
+    '''
+    Generates a random value based on the given AVRO schema.
+    '''
+
+    name = schema.get('name')
+    avro_type = schema['type']
+    if isinstance(avro_type, list):  # UNION or NULLABLE
+        # ["null", "int", "string", {"type": "record", ...}]
+        avro_type = [t for t in avro_type if t != NULL]  # ignore NULL
+        if len(avro_type) == 1:  # it was NULLABLE
+            avro_type = avro_type[0]
+
+    if __has_type(avro_type):  # {"type": {"type": "zzz", ...}}
+        schema = avro_type
+        avro_type = avro_type.get('type')
+
+    if avro_type == NULL:
+        return None
+
+    if avro_type == BOOLEAN:
+        return True if random.random() > 0.5 else False
+
+    if avro_type in [BYTES, FIXED]:
+        return urandom(schema.get('size', 8))
+
+    if avro_type == INT:
+        return random.randint(INT_MIN_VALUE, INT_MAX_VALUE)
+
+    if avro_type == LONG:
+        return random.randint(LONG_MIN_VALUE, LONG_MAX_VALUE)
+
+    if avro_type in [FLOAT, DOUBLE]:
+        return random.random() + random.randint(INT_MIN_VALUE, INT_MAX_VALUE)
+
+    if avro_type == STRING:
+        if name == 'id':
+            return str(uuid4())  # "id" fields contain a UUID
+        return random_string()
+
+    if avro_type == ENUM:
+        return random.choice(schema['symbols'])
+
+    if avro_type == RECORD:
+        return {
+            f['name']: random_avro(f)
+            for f in schema.get('fields', [])
+        }
+
+    if avro_type == MAP:
+        values = schema.get('values')
+        map_type = values if __has_type(values) else {'type': values}
+        return {
+            random_string(): random_avro(map_type)
+            for i in range(random.randint(1, 5))
+        }
+
+    if avro_type == ARRAY:
+        items = schema.get('items')
+        array_type = items if __has_type(items) else {'type': items}
+        return [
+            random_avro(array_type)
+            for i in range(random.randint(1, 5))
+        ]
+
+    if isinstance(avro_type, list):  # UNION
+        # choose one random type and generate value
+        # ["int", "string", {"type": "record", ...}]
+        ut = avro_type[random.randint(0, len(avro_type) - 1)]
+        ut = ut if __has_type(ut) else {'type': ut}
+        return random_avro(ut)
+
+    # TODO: named types  ¯\_(ツ)_/¯
+
+    return None
+
+
 class AvroValidationException(Exception):
     pass
 
@@ -70,7 +156,7 @@ class AvroValidationException(Exception):
 #
 #     indicates that the expected type at path "$.a.b" was a union of
 #     'null' and 'string'. The actual value was 1.
-AvroValidationError = collections.namedtuple(
+AvroValidationError = namedtuple(
     'AvroValidationError',
     ['expected', 'datum', 'path'],
 )
@@ -333,9 +419,11 @@ def avro_schema_to_passthrough_artefacts(item_id, avro_schema):
     '''
 
     if not item_id:
-        item_id = str(uuid.uuid4())
+        item_id = str(uuid4())
+
+    definition = deepcopy(avro_schema)
+    sample = random_avro(definition)
 
-    definition = copy.deepcopy(avro_schema)
     # assign default namespace
     if not definition.get('namespace'):
         definition['namespace'] = NAMESPACE
@@ -377,7 +465,15 @@ def avro_schema_to_passthrough_artefacts(item_id, avro_schema):
         'definition': {
             'entities': {name: item_id},
             'mapping': rules,
-        }
+        },
+        # this is an auto-generated mapping that shouldn't be modified manually
+        'is_read_only': True,
+        'is_active': True,
+        'input': sample,  # include a data sample
     }
 
     return schema, mapping
+
+
+def __has_type(avro_type):
+    return isinstance(avro_type, dict) and avro_type.get('type')
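
Note on `random_avro`: to see what kind of `input` sample ends up in the generated mapping, here is a reduced sketch of the same idea. `sample_value` is a hypothetical stand-in that handles only nullable unions, strings, booleans and records; it compares against the Avro type names as string literals, on the assumption that the module constants (`NULL`, `STRING`, ...) hold those names. Everything else returns `None`.

```python
import random
from string import ascii_letters
from uuid import UUID, uuid4


def random_string():
    # 1 to 30 random ASCII letters, as in the diff
    return ''.join(random.choice(ascii_letters) for _ in range(random.randint(1, 30)))


def sample_value(field):
    """Simplified, hypothetical version of random_avro (subset of types)."""
    avro_type = field['type']
    if isinstance(avro_type, list):
        # nullable union such as ["null", "string"]: drop "null"
        non_null = [t for t in avro_type if t != 'null']
        avro_type = non_null[0] if len(non_null) == 1 else None

    if avro_type == 'string':
        # "id" fields get a UUID, other strings get random letters
        return str(uuid4()) if field.get('name') == 'id' else random_string()
    if avro_type == 'boolean':
        return random.random() > 0.5
    if avro_type == 'record':
        return {f['name']: sample_value(f) for f in field.get('fields', [])}
    return None  # types not covered by this sketch


schema = {
    'name': 'Person',
    'type': 'record',
    'fields': [
        {'name': 'id', 'type': 'string'},
        {'name': 'nickname', 'type': ['null', 'string']},
        {'name': 'active', 'type': 'boolean'},
    ],
}
sample = sample_value(schema)
UUID(sample['id'])  # raises ValueError if the id is not a valid UUID
print(sample)
```

The values differ on every run, so the sample is only useful as a shape-correct placeholder, which matches how the commit uses it (`'input': sample`).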
diff --git a/aether-kernel/aether/kernel/api/filters.py b/aether-kernel/aether/kernel/api/filters.py
@@ -39,26 +39,71 @@ class Meta:
 
 
 class MappingFilter(filters.FilterSet):
+    mappingset = filters.CharFilter(
+        method='mappingset_filter',
+    )
+    projectschema = filters.CharFilter(
+        method='projectschema_filter',
+    )
+
+    def mappingset_filter(self, queryset, name, value):
+        if is_uuid(value):
+            return queryset.filter(mappingset__pk=value)
+        else:
+            return queryset.filter(mappingset__name=value)
+
+    def projectschema_filter(self, queryset, name, value):
+        if is_uuid(value):
+            return queryset.filter(projectschemas__in=[value])
+        else:
+            return queryset.filter(projectschemas__name__in=[value])
+
     class Meta:
         fields = '__all__'
         exclude = ('definition',)
         model = models.Mapping
 
 
+class MappingSetFilter(filters.FilterSet):
+    project = filters.CharFilter(
+        method='project_filter',
+    )
+
+    def project_filter(self, queryset, name, value):
+        if is_uuid(value):
+            return queryset.filter(project__pk=value)
+        else:
+            return queryset.filter(project__name=value)
+
+    class Meta:
+        fields = '__all__'
+        exclude = ('input',)
+        model = models.MappingSet
+
+
 class SubmissionFilter(filters.FilterSet):
     instanceID = filters.CharFilter(
         field_name='payload__meta__instanceID',
     )
     project = filters.CharFilter(
         method='project_filter',
     )
+    mappingset = filters.CharFilter(
+        method='mappingset_filter',
+    )
 
     def project_filter(self, queryset, name, value):
         if is_uuid(value):
             return queryset.filter(project__pk=value)
         else:
             return queryset.filter(project__name=value)
 
+    def mappingset_filter(self, queryset, name, value):
+        if is_uuid(value):
+            return queryset.filter(mappingset__pk=value)
+        else:
+            return queryset.filter(mappingset__name=value)
+
     class Meta:
         fields = '__all__'
         exclude = ('payload',)
@@ -100,6 +145,9 @@ class EntityFilter(filters.FilterSet):
     project = filters.CharFilter(
         method='project_filter',
     )
+    mapping = filters.CharFilter(
+        method='mapping_filter',
+    )
 
     def project_filter(self, queryset, name, value):
         if is_uuid(value):
@@ -113,6 +161,12 @@ def schema_filter(self, queryset, name, value):
         else:
             return queryset.filter(projectschema__schema__name=value)
 
+    def mapping_filter(self, queryset, name, value):
+        if is_uuid(value):
+            return queryset.filter(mapping__pk=value)
+        else:
+            return queryset.filter(mapping__name=value)
+
     class Meta:
         fields = '__all__'
         exclude = ('payload',)
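
Note on the filters: `mappingset_filter`, `project_filter` and `mapping_filter` all repeat one pattern, which is to look up by primary key when the value is a UUID and by name otherwise. A framework-free sketch of that dispatch is below. `lookup_for` is a hypothetical helper, and `is_uuid` is written here as an assumption about what the imported helper does, since the diff does not show its body.

```python
from uuid import UUID


def is_uuid(value):
    """Assumed behaviour of the is_uuid helper used by the filters."""
    try:
        UUID(str(value))
        return True
    except ValueError:
        return False


def lookup_for(relation, value):
    """Build the keyword lookup the filter methods pass to queryset.filter()."""
    suffix = 'pk' if is_uuid(value) else 'name'
    return {f'{relation}__{suffix}': value}


print(lookup_for('mappingset', '6d5b3c9e-0f0a-4a6e-9c1b-2f3a4b5c6d7e'))
print(lookup_for('project', 'demo'))
```

With such a helper each method body would reduce to `return queryset.filter(**lookup_for('mappingset', value))`. `projectschema_filter` uses `__in` lookups and would need its own variant.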