EM-DAT Extraction Transform #128

Rup-Narayan-Rajbanshi · 2025-01-23T12:30:47Z

Changes

Add Extraction and Transformation for EMDAT

This PR doesn't introduce any:

This PR contains valid:

tests
permission checks (tests here too)
translations

thenav56

Looks good. A few comments.

thenav56 · 2025-01-27T07:57:53Z

apps/etl/extraction/sources/emdat/extract.py

+        # Get latest emdat extraction object so that we do not need to fetch historical data
+        latest_extraction = (
+            ExtractionData.objects.filter(
+                source=ExtractionData.Source.EMDAT, status=ExtractionData.Status.SUCCESS, resp_data__isnull=False
+            )
+            .exclude(source_validation_status=ExtractionData.ValidationStatus.NO_DATA)
+            .order_by("-created_at")
+            .first()
+        )
+        if latest_extraction:
+            with latest_extraction.resp_data.open() as data_file:
+                data = data_file.read()
+
+            data_json = json.loads(data)
+            if data_json["data"]["public_emdat"]:
+                total_hazard_objects = data_json["data"]["public_emdat"]["total_available"]
+                # total_hazard_objects is passed as offset not to fetch historical data
+                variables = {"offset": total_hazard_objects, "include_hist": False, "classif": classification_keys}


Let's move this outside try/catch as we aren't handling this there.

@thenav56 we need to use this inside the loop, so we cannot move it outside the try/catch. I am using this variable to fetch data for the latest year.

thenav56 · 2025-01-27T08:01:05Z

apps/etl/extraction/sources/emdat/extract.py

+            with latest_extraction.resp_data.open() as data_file:
+                data = data_file.read()
+
+            data_json = json.loads(data)
+            if data_json["data"]["public_emdat"]:


We should maybe add another JSON field in ExtractionData to store this kind of information (maybe metadata?). Then define a dataclass/schema for that field for each data source if required, and get this information directly instead of loading the raw dataset each time.

@thenav56 I added a JSON field in PR #129 for this functionality.
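A minimal sketch of the metadata idea from this thread, assuming a hypothetical `EmdatMetadata` dataclass whose dict form is stored in a JSON column. The class name, its methods, and the sample payload are illustrative only and are not taken from PR #129; the only detail borrowed from the diff is the `data -> public_emdat -> total_available` lookup.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class EmdatMetadata:
    """Hypothetical per-source schema for a JSON metadata column."""

    total_available: int

    @classmethod
    def from_response(cls, payload: dict[str, Any]) -> Optional["EmdatMetadata"]:
        # Same lookup as in the diff: data -> public_emdat -> total_available.
        public_emdat = (payload.get("data") or {}).get("public_emdat")
        if not public_emdat:
            return None
        return cls(total_available=int(public_emdat["total_available"]))

    def to_json_field(self) -> dict[str, Any]:
        # Plain dict, suitable for storing in a JSON column.
        return asdict(self)

    @classmethod
    def from_json_field(cls, value: dict[str, Any]) -> "EmdatMetadata":
        return cls(total_available=int(value["total_available"]))


# Computed once, when the extraction is saved (sample payload is made up) ...
raw = json.loads('{"data": {"public_emdat": {"total_available": 1200, "data": []}}}')
meta = EmdatMetadata.from_response(raw)
stored = meta.to_json_field() if meta else None

# ... and read back later without reopening the raw response file.
restored = EmdatMetadata.from_json_field(stored)
```

The point of the design is that later runs read a few bytes of structured metadata rather than parsing the whole stored response.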

thenav56 · 2025-01-27T08:03:15Z

apps/etl/extraction/sources/emdat/extract.py

+            data_json = json.loads(data)
+            if data_json["data"]["public_emdat"]:
+                total_hazard_objects = data_json["data"]["public_emdat"]["total_available"]
+                # total_hazard_objects is passed as offset not to fetch historical data


Let's prefix this comment with XXX:, as this is a hack.

@thenav56 this is removed in the next commit, as we are now using a year filter to fetch data.
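A rough sketch of what a year filter could look like in place of the offset hack. The helper name `build_year_variables` is hypothetical, and the `from` / `to` keys are assumed from the commit title "Add filter from, to into emdat query" only; the actual query variables in the merged code may differ. The `include_hist` and `classif` keys are copied from the quoted diff.

```python
from datetime import date
from typing import Any, Optional


def build_year_variables(
    classification_keys: list[str],
    year: Optional[int] = None,
) -> dict[str, Any]:
    """Build query variables restricted to one year (defaults to the current year)."""
    target_year = year if year is not None else date.today().year
    return {
        # "from" and "to" are assumed variable names, not confirmed by the diff.
        "from": target_year,
        "to": target_year,
        "include_hist": False,
        "classif": classification_keys,
    }


# The classification key below is a placeholder value.
variables = build_year_variables(["example-key"], year=2025)
```

Unlike an offset derived from `total_available`, a year range does not depend on the record count staying stable between runs.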

thenav56 · 2025-01-27T08:04:35Z

apps/etl/etl_tasks/emdat.py

+    extraction_id = import_hazard_data()
+
+    # Transform the data from emdat
+    transform_emdat_data(extraction_id)


Any reason we aren't running this in separate celery tasks?

@thenav56 I did this so that the transformation starts only after the extraction ends. The whole extraction object is saved in a row, so I pass extraction_id into transform_emdat_data() and do the required transformation there.
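The ordering requirement in this reply (transform only after extraction has finished, passing along only the row id) can be illustrated without any task queue. The sketch below is plain Python with stand-in functions and a made-up id. With Celery, the `chain` primitive gives the same guarantee across separate tasks: each task starts after the previous one succeeds and receives its return value as the first argument. Whether the project adopted that is not stated in this thread.

```python
from typing import Any, Callable


def run_in_order(first: Callable[[], Any], *rest: Callable[[Any], Any]) -> Any:
    """Run steps sequentially, feeding each return value to the next step."""
    result = first()
    for step in rest:
        result = step(result)
    return result


calls: list[str] = []


def import_hazard_data() -> int:
    # Stand-in for the extraction step; returns the id of the saved row.
    calls.append("extract")
    return 42  # hypothetical extraction_id


def transform_emdat_data(extraction_id: int) -> str:
    # Stand-in for the transformation step; it only needs the id.
    calls.append(f"transform:{extraction_id}")
    return f"transformed {extraction_id}"


outcome = run_in_order(import_hazard_data, transform_emdat_data)
```

Because only the id crosses the boundary between steps, the two steps could be split into separate tasks without changing what is passed between them.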

thenav56 · 2025-01-27T08:05:33Z

apps/etl/transform/sources/emdat.py

+    """
+    ext_instance = ExtractionData.objects.filter(id=extraction_id).first()
+    if ext_instance and ext_instance.source_validation_status == ExtractionData.ValidationStatus.NO_DATA:
+        logger.error(


logger.error(

Let's use warning here; using logger.error will send an alert to Sentry.
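A small standard-library sketch of the suggested change: an expected "no data" outcome is logged at WARNING, so a handler that reacts only to ERROR and above stays quiet. The logger name, the `LevelRecorder` handler, and `report_no_data` are hypothetical; the recorder merely stands in for whatever error-reporting integration is configured in the project.

```python
import logging


class LevelRecorder(logging.Handler):
    """Collects the level name of every record it receives."""

    def __init__(self) -> None:
        super().__init__()
        self.levels: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.levels.append(record.levelname)


logger = logging.getLogger("emdat_transform_sketch")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False
recorder = LevelRecorder()
logger.addHandler(recorder)


def report_no_data(extraction_id: int) -> None:
    # Expected condition rather than a failure, so log below ERROR.
    logger.warning("No data available for extraction %s", extraction_id)


report_no_data(1)

# Records that an ERROR-level alerting hook would have picked up.
error_like = [lvl for lvl in recorder.levels if lvl in ("ERROR", "CRITICAL")]
```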

thenav56 · 2025-01-27T08:07:39Z

apps/etl/transform/sources/emdat.py

+        logger.error(
+            "No data available",
+            exc_info=True,
+            extra={"source": ExtractionData.Source.EMDAT, "extraction_id": ext_instance.id},


extra={"source": ExtractionData.Source.EMDAT, "extraction_id": ext_instance.id},

This is not required here

Rup-Narayan-Rajbanshi

@thenav56 I have replied to your comments. Please have a look.

Rup-Narayan-Rajbanshi force-pushed the feature/extract-emdat branch from 2551d53 to 08bbaa4 Compare January 23, 2025 12:35

Rup-Narayan-Rajbanshi marked this pull request as ready for review January 24, 2025 12:40

Rup-Narayan-Rajbanshi requested review from emmanuelmathot, thenav56 and ranjan-stha January 24, 2025 12:40

Rup-Narayan-Rajbanshi force-pushed the feature/extract-emdat branch from f99d8cc to 8341b0e Compare January 27, 2025 08:03

thenav56 requested changes Jan 27, 2025

Rup-Narayan-Rajbanshi force-pushed the feature/extract-emdat branch 2 times, most recently from d8883ad to f99d8cc Compare January 27, 2025 09:52

Rup-Narayan-Rajbanshi commented Jan 28, 2025

Rup-Narayan-Rajbanshi changed the title ~~Feature/extract emdat~~ EM-DAT Extraction Transform Jan 28, 2025

Rup-Narayan-Rajbanshi added 4 commits January 28, 2025 17:54

Add extraction.

543f5a4

Add transformation for EMDAT

001afa9

Add extraction for historical and latest data.

9104c78

Add filter from, to into emdat query.

cae3c0c

Rup-Narayan-Rajbanshi force-pushed the feature/extract-emdat branch from 5826b39 to cae3c0c Compare January 28, 2025 12:20

frozenhelium approved these changes Feb 4, 2025

frozenhelium requested a review from thenav56 February 4, 2025 04:10

thenav56 approved these changes Feb 4, 2025

frozenhelium merged commit 1ce874b into develop Feb 4, 2025

frozenhelium deleted the feature/extract-emdat branch February 4, 2025 04:14
