diff --git a/examples/advanced/finance-end-to-end/README.md b/examples/advanced/finance-end-to-end/README.md
new file mode 100644
index 0000000000..d438085108
--- /dev/null
+++ b/examples/advanced/finance-end-to-end/README.md
@@ -0,0 +1,360 @@
+# End-to-End Process Illustration of Federated XGBoost Methods
+
+This example demonstrates an end-to-end process for credit card fraud detection using XGBoost.
+
+The original dataset is based on the [kaggle credit card fraud dataset](https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud).
+
+To make the end-to-end process more realistic for financial applications, we manually duplicated the records to extend the data time span from 2 days to over 2 years and added random transactional information. As our primary goal is to showcase the process, there is no need to focus too much on the data itself.
+
+The end-to-end process consists of the following steps:
+
+## Step 1: Data Preparation
+
+In a real-world application, this step is not necessary, since each site would already have its own data.
+
+* To prepare the data, we expand the credit card data by adding additional randomly generated columns,
+including sender and receiver BICs, currency, etc.
+* We then split the data based on the Sender BIC. Each sender represents one financial institution,
+thus serving as one site (client) for federated learning.
+
+We illustrate this step in the notebook [prepare_data](./prepare_data.ipynb). The resulting dataset looks like the following:
+
+![data](./figures/generated_data.png)
+
+Once we have this synthetic data, we would like to split it into:
+* historical data (the oldest data) -- 55%
+* training data -- 35%
+* test data -- the remaining 10%
+
+```
+Historical DataFrame size: 626575
+Training DataFrame size: 398729
+Testing DataFrame size: 113924
+```
+Next, we split the data among the different clients, i.e. the different Sender_BICs.
+For example, for Sender = Bank_1 with BIC = ZHSZUS33,
+the client directory is **ZHSZUS33_Bank_1**.
+
+For this site, we will have three files:
+```
+/tmp/dataset/horizontal_credit_fraud_data/ZHSZUS33_Bank_1/history.csv
+/tmp/dataset/horizontal_credit_fraud_data/ZHSZUS33_Bank_1/test.csv
+/tmp/dataset/horizontal_credit_fraud_data/ZHSZUS33_Bank_1/train.csv
+```
+![split_data](./figures/split_data.png)
+
+The Python code for data generation is located at [prepare_data.py](./utils/prepare_data.py).
+
+## Step 2: Feature Analysis
+
+In this stage, we would like to analyze the data, understand the features, and derive (and encode) secondary features that can be more useful for building the model.
+
+Towards this goal, there are two options:
+1. **Feature Enrichment**: This process involves adding new features based on the existing data. For example, we can calculate the average transaction amount for each currency and add this as a new feature.
+2. **Feature Encoding**: This process involves encoding the current features and transforming them into an embedding space via machine learning models. The model can be either pre-trained or trained with the candidate dataset.
+
+Considering that the only two numerical features in the dataset are "Amount" and "Time", we will perform feature enrichment first. Optionally, we can also perform feature encoding. In this example, we use a graph neural network (GNN): we train the GNN model in a federated, unsupervised fashion and then use the model to encode the features for all sites.
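+
+To make option 1 concrete, below is a minimal, hypothetical pandas sketch of such an enrichment. The column names follow the generated data shown above; the feature name `currency_avg_amount` is only illustrative, and the actual enrichment used in this example is described in Step 2.1 below.
+
+```
+import pandas as pd
+
+# Load one site's training data (path follows the layout created in Step 1)
+df = pd.read_csv("/tmp/dataset/horizontal_credit_fraud_data/ZHSZUS33_Bank_1/train.csv")
+
+# Option 1 (feature enrichment): average transaction amount per currency,
+# merged back onto every transaction as a new hand-crafted feature
+avg_amount = (
+    df.groupby("Currency")["Amount"]
+    .mean()
+    .rename("currency_avg_amount")   # illustrative feature name
+    .reset_index()
+)
+df = df.merge(avg_amount, on="Currency")
+```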
+
+### Step 2.1: Rule-based Feature Enrichment
+In this step, we enrich the data by adding a few new derived features to illustrate the process.
+Whether such enrichment makes sense is task and data dependent; essentially, this process adds hand-crafted features to the classifier inputs.
+
+#### Single-site operation example: enrichment
+Since all sites follow the same procedure, we only need to look at one site. For example, we will look at the site with
+the name "ZHSZUS33_Bank_1".
+
+The data enrichment process involves the following steps:
+
+1. **Grouping by Currency**: Calculate hist_trans_volume, hist_total_amount, and hist_average_amount for each currency.
+2. **Aggregation for Training and Test Data**: Aggregate the data in 1-hour intervals, grouped by currency. The aggregated value is then divided by hist_trans_volume, and this new column is named x2_y1.
+3. **Repeating for Beneficiary BIC**: Perform the same process for Beneficiary_BIC to generate another feature called x3_y2.
+4. **Merging Features**: Merge the two enriched features based on Time and Beneficiary_BIC.
+
+The resulting dataset looks like this:
+![enrich_data](./figures/enrichment.png)
+
+We save the enriched data into new CSV files:
+```
+/tmp/dataset/horizontal_credit_fraud_data/ZHSZUS33_Bank_1/train_enrichment.csv
+/tmp/dataset/horizontal_credit_fraud_data/ZHSZUS33_Bank_1/test_enrichment.csv
+```
+#### Single-site operation example: additional processing
+After feature enrichment, we can normalize the numerical features and perform one-hot encoding for the categorical
+features. Without loss of generality, we skip the categorical feature encoding in this example to avoid significantly increasing
+the file size (from 11 MB to over 2 GB).
+
+Similar to the feature enrichment process, we consider only one site for now. The steps are straightforward:
+we apply the scaler transformation to the numerical features and then merge them back with the categorical features.
+
+```
+    scaler = MinMaxScaler()
+
+    # Fit and transform the numerical data
+    numerical_normalized = pd.DataFrame(scaler.fit_transform(numerical_features), columns=numerical_features.columns)
+
+    # Combine the normalized numerical features with the categorical features
+    df_combined = pd.concat([categorical_features, numerical_normalized], axis=1)
+```
+The files are then saved with the "_normalized.csv" suffix:
+
+```
+/tmp/dataset/horizontal_credit_fraud_data/ZHSZUS33_Bank_1/train_normalized.csv
+/tmp/dataset/horizontal_credit_fraud_data/ZHSZUS33_Bank_1/test_normalized.csv
+```
+
+#### Federated Enrichment and Normalization for All Sites
+We can easily convert the notebook code into Python code for federated execution on each site.
+
+##### Task code
+To convert the single-site enrichment code to federated learning, refer to [enrich.py](./nvflare/enrich.py).
+
+The main execution flow is the following:
+```
+def main():
+    print("\n enrichment starts \n ")
+
+    args = define_parser()
+
+    input_dir = args.input_dir
+    output_dir = args.output_dir
+
+    site_name = args.site_name  # assumed: in the standalone version the site name comes from a CLI argument
+    print(f"\n {site_name =} \n ")
+
+    merged_dfs = enrichment(input_dir, site_name)
+
+    for ds_name in merged_dfs:
+        save_to_csv(merged_dfs[ds_name], output_dir, site_name, ds_name)
+
+```
+To change this code into federated ETL code, we just add a few lines:
+
+`flare.init()` to initialize the flare library,
+`etl_task = flare.receive()` to receive the global message from NVFlare,
+and `end_task = GenericTask()`, `flare.send(end_task)` to send a message back to the controller.
+
+```
+def main():
+    print("\n enrichment starts \n ")
+
+    args = define_parser()
+    flare.init()
+
+    input_dir = args.input_dir
+    output_dir = args.output_dir
+
+    site_name = flare.get_site_name()
+    print(f"\n {site_name =} \n ")
+
+    # receive the global message from NVFlare
+    etl_task = flare.receive()
+    merged_dfs = enrichment(input_dir, site_name)
+
+    for ds_name in merged_dfs:
+        save_to_csv(merged_dfs[ds_name], output_dir, site_name, ds_name)
+
+    # send a message back to the controller indicating the end of the task
+    end_task = GenericTask()
+    flare.send(end_task)
+```
+
+A similar adaptation is required for the normalization code; refer to [pre_process.py](./nvflare/pre_process.py) for details.
+
+##### Job code
+The job code is executed to trigger and dispatch the ETL tasks from the previous step.
+For this purpose, we wrote the following script: [enrich_job.py](./nvflare/enrich_job.py)
+
+```
+def main():
+    args = define_parser()
+
+    site_names = args.sites
+    work_dir = args.work_dir
+    job_name = args.job_name
+    task_script_path = args.task_script_path
+    task_script_args = args.task_script_args
+
+    job = FedJob(name=job_name)
+
+    # Define the enrich_ctrl workflow and send to server
+    enrich_ctrl = ETLController(task_name="enrich")
+    job.to(enrich_ctrl, "server", id="enrich")
+
+    # Add clients
+    for site_name in site_names:
+        executor = ScriptExecutor(task_script_path=task_script_path, task_script_args=task_script_args)
+        job.to(executor, site_name, tasks=["enrich"], gpu=0)
+
+    if work_dir:
+        print(f"{work_dir=}")
+        job.export_job(work_dir)
+
+    if not args.config_only:
+        job.simulator_run(work_dir)
+```
+Here we define an ETLController for the server and a ScriptExecutor for the client-side ETL script.
+
+Similarly, the normalization job code is in [pre_process_job.py](./nvflare/pre_process_job.py).
+
+### (Optional) Step 2.2: GNN Training & Feature Encoding
+Based on the raw features, or combined with the derived features from **Step 2.1**, we can use machine learning models to encode the features.
+In this example, we use a federated GNN to learn and generate the feature embeddings.
+
+First, we construct a graph based on the transaction data. Each node represents a transaction, and the edges represent the relationships between transactions. We then use the GNN to learn the embeddings of the nodes, which represent the transaction features.
+
+#### Single-site operation example: graph construction
+Since all transactions at a site share the same Sender_BIC, we use the following rules to define a graph edge:
+1. The two transactions have the same Receiver_BIC.
+2. The time difference between the two transactions is smaller than 6000.
+
+The resulting graph is shown below: essentially an undirected graph with transactions (identified by `UETR`) as nodes and edges connecting two nodes that satisfy the above two rules.
+![edge_map](./figures/edge_map.png)
+
+#### Single-site operation example: GNN training and encoding
+We use the graph constructed in the previous step to train the GNN model. The GNN model is trained in a federated, unsupervised fashion, and the embeddings are generated for each transaction.
+The GNN training procedure is similar to the unsupervised Protein Classification task in our [GNN example](../gnn/README.md), with customized data preparation steps.
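+
+For reference, a condensed sketch of this unsupervised training loop (GraphSAGE with link prediction, adapted from the accompanying [gnn_train_encode notebook](./notebooks/gnn_train_encode.ipynb)) is shown below; `node_features` and `edge_index` are assumed to be the tensors produced by the graph construction step.
+
+```
+import torch
+import torch.nn.functional as F
+from torch_geometric.data import Data
+from torch_geometric.loader import LinkNeighborLoader
+from torch_geometric.nn import GraphSAGE
+
+# node_features: [num_transactions, num_features], edge_index: [2, num_edges]
+train_data = Data(x=node_features, edge_index=edge_index)
+loader = LinkNeighborLoader(train_data, batch_size=2048, shuffle=True,
+                            neg_sampling_ratio=1.0, num_neighbors=[10, 10])
+
+model = GraphSAGE(in_channels=node_features.shape[1], hidden_channels=64,
+                  num_layers=2, out_channels=64)
+optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
+
+for epoch in range(1, 101):
+    model.train()
+    for batch in loader:
+        optimizer.zero_grad()
+        h = model(batch.x, batch.edge_index)
+        # inner product between the two endpoints of each sampled (positive/negative) edge
+        link_pred = (h[batch.edge_label_index[0]] * h[batch.edge_label_index[1]]).sum(dim=-1)
+        loss = F.binary_cross_entropy_with_logits(link_pred, batch.edge_label)
+        loss.backward()
+        optimizer.step()
+```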
+
+The results of the GNN training are:
+- a GNN model
+- the embeddings of the transactions (in this example, of dimension 64)
+![embedding](./figures/embeddings.png)
+
+#### Federated GNN Training and Encoding for All Sites
+Similar to Step 2.1, we can easily convert the notebook code into Python code for federated execution on each site. For simplicity, we will skip the code examples for this step.
+Please refer to the scripts:
+- [graph_construct.py](./nvflare/graph_construct.py) and [graph_construct_job.py](./nvflare/graph_construct_job.py) for graph construction
+- [gnn_train_encode.py](./nvflare/gnn_train_encode.py) and [gnn_train_encode_job.py](./nvflare/gnn_train_encode_job.py) for GNN training and encoding
+
+## Step 3: Federated Training of XGBoost
+Now that we have the enriched/encoded features, the last step is to run federated XGBoost over them.
+Below is the XGBoost job code:
+
+```
+def main():
+    args = define_parser()
+
+    site_names = args.sites
+    work_dir = args.work_dir
+    job_name = args.job_name
+    root_dir = args.input_dir
+    file_postfix = args.file_postfix
+
+    num_rounds = 10
+    early_stopping_rounds = 10
+    xgb_params = {
+        "max_depth": 8,
+        "eta": 0.1,
+        "objective": "binary:logistic",
+        "eval_metric": "auc",
+        "tree_method": "hist",
+        "nthread": 16,
+    }
+
+    job = FedJob(name=job_name)
+
+    # Define the controller workflow and send to server
+    controller = XGBFedController(
+        num_rounds=num_rounds,
+        training_mode="horizontal",
+        xgb_params=xgb_params,
+        xgb_options={"early_stopping_rounds": early_stopping_rounds},
+    )
+    job.to(controller, "server")
+
+    # Add clients
+    for site_name in site_names:
+        executor = FedXGBHistogramExecutor(data_loader_id="data_loader")
+        job.to(executor, site_name, gpu=0)
+        data_loader = CreditCardDataLoader(root_dir=root_dir, file_postfix=file_postfix)
+        job.to(data_loader, site_name, id="data_loader")
+
+    if work_dir:
+        job.export_job(work_dir)
+
+    if not args.config_only:
+        job.simulator_run(work_dir)
+```
+
+In this code, all we need to write is a customized `CreditCardDataLoader`, which is an `XGBDataLoader`;
+the rest of the code is handled by the XGBoost Controller and Executor. For simplicity, we only load the numerical features in this example.
+
+## End-to-end Experiment
+You can run this from the command line interface (CLI) or orchestrate it using a workflow tool such as Airflow.
+Here, we will demonstrate how to run this with the simulator. You can always export the job configuration and run
+it anywhere in a real deployment.
+
+Assuming you have already downloaded the credit card dataset and the creditcard.csv file is located in the current directory:
+
+* prepare data
+```
+python ./utils/prepare_data.py -i ./creditcard.csv -o /tmp/nvflare/xgb/credit_card
+```
+>Note: All Sender BICs are considered clients. They are:
+> * 'ZHSZUS33_Bank_1'
+> * 'SHSHKHH1_Bank_2'
+> * 'YXRXGB22_Bank_3'
+> * 'WPUWDEFF_Bank_4'
+> * 'YMNYFRPP_Bank_5'
+> * 'FBSFCHZH_Bank_6'
+> * 'YSYCESMM_Bank_7'
+> * 'ZNZZAU3M_Bank_8'
+> * 'HCBHSGSG_Bank_9'
+> * 'XITXUS33_Bank_10'
+> In total, there are 10 banks.
+
+* enrich data
+```
+cd nvflare
+python enrich_job.py -c 'ZNZZAU3M_Bank_8' 'SHSHKHH1_Bank_2' 'FBSFCHZH_Bank_6' 'YMNYFRPP_Bank_5' 'WPUWDEFF_Bank_4' 'YXRXGB22_Bank_3' 'XITXUS33_Bank_10' 'YSYCESMM_Bank_7' 'ZHSZUS33_Bank_1' 'HCBHSGSG_Bank_9' -p enrich.py -a "-i /tmp/nvflare/xgb/credit_card/ -o /tmp/nvflare/xgb/credit_card/"
+cd ..
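+# Flags used above (and in the following job commands):
+#   -c  list of client site names (one per bank)
+#   -p  task script executed on each client
+#   -a  arguments forwarded to the task script (input/output directories)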
+``` + +* pre-process data +``` +cd nvflare +python pre_process_job.py -c 'YSYCESMM_Bank_7' 'FBSFCHZH_Bank_6' 'YXRXGB22_Bank_3' 'XITXUS33_Bank_10' 'HCBHSGSG_Bank_9' 'YMNYFRPP_Bank_5' 'ZHSZUS33_Bank_1' 'ZNZZAU3M_Bank_8' 'SHSHKHH1_Bank_2' 'WPUWDEFF_Bank_4' -p pre_process.py -a "-i /tmp/nvflare/xgb/credit_card -o /tmp/nvflare/xgb/credit_card/" +cd .. +``` + +* construct graph +``` +cd nvflare +python graph_construct_job.py -c 'YSYCESMM_Bank_7' 'FBSFCHZH_Bank_6' 'YXRXGB22_Bank_3' 'XITXUS33_Bank_10' 'HCBHSGSG_Bank_9' 'YMNYFRPP_Bank_5' 'ZHSZUS33_Bank_1' 'ZNZZAU3M_Bank_8' 'SHSHKHH1_Bank_2' 'WPUWDEFF_Bank_4' -p graph_construct.py -a "-i /tmp/nvflare/xgb/credit_card -o /tmp/nvflare/xgb/credit_card/" +cd .. +``` + +* GNN Training and Encoding +``` +cd nvflare +python gnn_train_encode_job.py -c 'YSYCESMM_Bank_7' 'FBSFCHZH_Bank_6' 'YXRXGB22_Bank_3' 'XITXUS33_Bank_10' 'HCBHSGSG_Bank_9' 'YMNYFRPP_Bank_5' 'ZHSZUS33_Bank_1' 'ZNZZAU3M_Bank_8' 'SHSHKHH1_Bank_2' 'WPUWDEFF_Bank_4' -p gnn_train_encode.py -a "-i /tmp/nvflare/xgb/credit_card -o /tmp/nvflare/xgb/credit_card/" +cd .. +``` + + +* XGBoost Job + +We run XGBoost Job on two types of data: normalized, and GNN embeddings +For normalized data, we run the following command +``` +cd nvflare +python xgb_job.py -c 'YSYCESMM_Bank_7' 'FBSFCHZH_Bank_6' 'YXRXGB22_Bank_3' 'XITXUS33_Bank_10' 'HCBHSGSG_Bank_9' 'YMNYFRPP_Bank_5' 'ZHSZUS33_Bank_1' 'ZNZZAU3M_Bank_8' 'SHSHKHH1_Bank_2' 'WPUWDEFF_Bank_4' -i /tmp/nvflare/xgb/credit_card -w /tmp/nvflare/workspace/xgb/credit_card/ +cd .. +``` +Below is the output of last round of training (starting round = 0) +``` +... +[9] eval-auc:0.67596 train-auc:0.70582 +``` +For GNN embeddings, we run the following command +``` +cd nvflare +python xgb_job_embed.py -c 'YSYCESMM_Bank_7' 'FBSFCHZH_Bank_6' 'YXRXGB22_Bank_3' 'XITXUS33_Bank_10' 'HCBHSGSG_Bank_9' 'YMNYFRPP_Bank_5' 'ZHSZUS33_Bank_1' 'ZNZZAU3M_Bank_8' 'SHSHKHH1_Bank_2' 'WPUWDEFF_Bank_4' -i /tmp/nvflare/xgb/credit_card -w /tmp/nvflare/workspace/xgb/credit_card_embed/ +cd .. +``` +Below is the output of last round of training (starting round = 0) +``` +... +[9] eval-auc:0.53788 train-auc:0.61659 +``` +For this example, the normalized data performs better than the GNN embeddings. This is expected as the GNN embeddings are produced with randomly generated transactional information, which adds noise to the data. + diff --git a/examples/advanced/finance-end-to-end/feature_enrichment.ipynb b/examples/advanced/finance-end-to-end/feature_enrichment.ipynb deleted file mode 100644 index 884a0066b0..0000000000 --- a/examples/advanced/finance-end-to-end/feature_enrichment.ipynb +++ /dev/null @@ -1,1106 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "e6d10159-9c02-4bdd-ad6a-f9b7e19ac575", - "metadata": {}, - "source": [ - "## Feature Enrichment\n", - "\n", - "### Historical data enrichment\n", - "\n", - "Pick one client (Site, aka sender_BIC) to do the enrichment as every site will be the same process" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "id": "7130bd7a-bda0-4592-818f-bd65c505baa3", - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "site_input_dir = \"/tmp/dataset/horizontal_credit_fraud_data/\"\n", - "site_name = \"ZHSZUS33_Bank_1\"" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "id": "9375ffaa-1143-43f5-b1a3-3ef45918e4bf", - "metadata": { - "tags": [] - }, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
TimeAmountClassSender_BICReceiver_BICUETRCurrencyBeneficiary_BICCurrency_Country
00.02.690ZHSZUS33YSYCESMMR7PCTKF9R1PVGXRXU9AB3JAUDZNZZAU3MAustralia
1200.03.670ZHSZUS33ZNZZAU3M28P261NQ3D4WIZUY4RDXFOUSDXITXUS33United States
2900.03.680ZHSZUS33YMNYFRPP2XJ54L8ED31VMBC1MYIK8LAUDZNZZAU3MAustralia
31700.034.090ZHSZUS33XITXUS33Y3ZW8BUEF5UTB5LWVNEFPGGBPYXRXGB22United Kingdom
42900.020.530ZHSZUS33XITXUS33FHOWZR8Q77BXKIZHAC0781USDZHSZUS33United States
..............................
6255839325300.05.000ZHSZUS33FBSFCHZHEC9HYAUYQ3UARN1CMXER1CAUDZNZZAU3MAustralia
6255939325900.061.000ZHSZUS33ZNZZAU3M6CT5WHMATEO4Z6UYDECPWRUSDXITXUS33United States
6256039325900.01.000ZHSZUS33ZHSZUS33GFCUM49U6M2LRN5NBEB9PKGBPYXRXGB22United Kingdom
6256139327200.074.750ZHSZUS33ZHSZUS33BLP8GYMXG6JWR104DT3Z8DUSDZHSZUS33United States
6256239327800.00.990ZHSZUS33YMNYFRPP5QEHQCHK8JFTVNCYC4KQKGSGDHCBHSGSGSingapore
\n", - "

62563 rows × 9 columns

\n", - "
" - ], - "text/plain": [ - " Time Amount Class Sender_BIC Receiver_BIC \\\n", - "0 0.0 2.69 0 ZHSZUS33 YSYCESMM \n", - "1 200.0 3.67 0 ZHSZUS33 ZNZZAU3M \n", - "2 900.0 3.68 0 ZHSZUS33 YMNYFRPP \n", - "3 1700.0 34.09 0 ZHSZUS33 XITXUS33 \n", - "4 2900.0 20.53 0 ZHSZUS33 XITXUS33 \n", - "... ... ... ... ... ... \n", - "62558 39325300.0 5.00 0 ZHSZUS33 FBSFCHZH \n", - "62559 39325900.0 61.00 0 ZHSZUS33 ZNZZAU3M \n", - "62560 39325900.0 1.00 0 ZHSZUS33 ZHSZUS33 \n", - "62561 39327200.0 74.75 0 ZHSZUS33 ZHSZUS33 \n", - "62562 39327800.0 0.99 0 ZHSZUS33 YMNYFRPP \n", - "\n", - " UETR Currency Beneficiary_BIC Currency_Country \n", - "0 R7PCTKF9R1PVGXRXU9AB3J AUD ZNZZAU3M Australia \n", - "1 28P261NQ3D4WIZUY4RDXFO USD XITXUS33 United States \n", - "2 2XJ54L8ED31VMBC1MYIK8L AUD ZNZZAU3M Australia \n", - "3 Y3ZW8BUEF5UTB5LWVNEFPG GBP YXRXGB22 United Kingdom \n", - "4 FHOWZR8Q77BXKIZHAC0781 USD ZHSZUS33 United States \n", - "... ... ... ... ... \n", - "62558 EC9HYAUYQ3UARN1CMXER1C AUD ZNZZAU3M Australia \n", - "62559 6CT5WHMATEO4Z6UYDECPWR USD XITXUS33 United States \n", - "62560 GFCUM49U6M2LRN5NBEB9PK GBP YXRXGB22 United Kingdom \n", - "62561 BLP8GYMXG6JWR104DT3Z8D USD ZHSZUS33 United States \n", - "62562 5QEHQCHK8JFTVNCYC4KQKG SGD HCBHSGSG Singapore \n", - "\n", - "[62563 rows x 9 columns]" - ] - }, - "execution_count": 2, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "import os\n", - "import random\n", - "import string\n", - "\n", - "import pandas as pd\n", - "history_file_name = os.path.join(site_input_dir, site_name,\"history.csv\" )\n", - "df_history = pd.read_csv(history_file_name)\n", - "df_history" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "id": "3fe8e513-f041-4165-88b1-3b21607ca734", - "metadata": { - "tags": [] - }, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
Currencyhist_trans_volumehist_total_amounthist_average_amount
0AUD125721094630.7587.068943
1CHF124941090937.4687.316909
2GBP124961121443.9989.744237
3SGD124601121692.5090.023475
4USD125411124650.2389.677875
\n", - "
" - ], - "text/plain": [ - " Currency hist_trans_volume hist_total_amount hist_average_amount\n", - "0 AUD 12572 1094630.75 87.068943\n", - "1 CHF 12494 1090937.46 87.316909\n", - "2 GBP 12496 1121443.99 89.744237\n", - "3 SGD 12460 1121692.50 90.023475\n", - "4 USD 12541 1124650.23 89.677875" - ] - }, - "execution_count": 3, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "\n", - "\n", - "history_summary = df_history.groupby('Currency').agg(\n", - " hist_trans_volume=('UETR', 'count'),\n", - " hist_total_amount=('Amount', 'sum'),\n", - " hist_average_amount=('Amount', 'mean')\n", - ").reset_index()\n", - "\n", - "history_summary" - ] - }, - { - "cell_type": "markdown", - "id": "025ac920-c1c3-401f-b420-18c39b7d04d2", - "metadata": {}, - "source": [ - "# Enrich Feature with Currency" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "id": "7aa07b6d-dc96-45e6-a467-8c770cafb84e", - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "import pandas as pd\n", - "dataset_names = [\"train\", \"test\"]\n", - "results = {}\n", - "\n", - "temp_ds_df = {}\n", - "temp_resampled_df = {}\n", - "\n", - "\n", - "for ds_name in dataset_names:\n", - " file_name = os.path.join(site_input_dir, site_name , f\"{ds_name}.csv\" )\n", - " ds_df = pd.read_csv(file_name)\n", - " ds_df['Time'] = pd.to_datetime(ds_df['Time'], unit='s')\n", - "\n", - " # Set the Time column as the index\n", - " ds_df.set_index('Time', inplace=True)\n", - " \n", - " resampled_df = ds_df.resample('1H').agg(\n", - " trans_volume=('UETR', 'count'),\n", - " total_amount=('Amount', 'sum'),\n", - " average_amount=('Amount', 'mean')\n", - " ).reset_index()\n", - " \n", - " temp_ds_df[ds_name] = ds_df\n", - " temp_resampled_df[ds_name] = resampled_df\n", - " \n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "a2e86bc5-e8ad-41f5-b343-29595a378c03", - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "for ds_name in dataset_names:\n", - " \n", - " ds_df = temp_ds_df[ds_name]\n", - " resampled_df = temp_resampled_df[ds_name]\n", - " \n", - " c_df = ds_df[['Currency']].resample('1H').agg({'Currency': 'first'}).reset_index()\n", - " # Add Currency_Country to the resampled data by joining with the original DataFrame\n", - " resampled_df2 = pd.merge(resampled_df, \n", - " c_df,\n", - " on='Time'\n", - " )\n", - " resampled_df3 = pd.merge(resampled_df2, \n", - " history_summary,\n", - " on='Currency'\n", - " )\n", - " resampled_df4 = resampled_df3.copy()\n", - " resampled_df4['x2_y1'] = resampled_df4['average_amount']/resampled_df4['hist_trans_volume']\n", - " \n", - " ds_df = ds_df.sort_values('Time')\n", - " resampled_df4 = resampled_df4.sort_values('Time')\n", - " merged_df = pd.merge_asof(ds_df, resampled_df4, on='Time' )\n", - " \n", - " merged_df = merged_df.drop(columns=['Currency_y']).rename(columns={'Currency_x': 'Currency'})\n", - "\n", - " \n", - " results[ds_name] = merged_df\n", - " \n", - " \n", - " \n", - "\n", - "print(results)" - ] - }, - { - "cell_type": "markdown", - "id": "7051468f-2de0-4e41-a227-7fad4c9110af", - "metadata": { - "tags": [] - }, - "source": [ - "# Enrich feature for beneficiary country" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "id": "605095b7-a514-4346-b984-3590d79d13e4", - "metadata": { - "tags": [] - }, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
Beneficiary_BIChist_trans_volumehist_total_amounthist_average_amount
0FBSFCHZH124941090937.4687.316909
1HCBHSGSG124601121692.5090.023475
2XITXUS336211572653.9392.199957
3YXRXGB22124961121443.9989.744237
4ZHSZUS336330551996.3087.203207
5ZNZZAU3M125721094630.7587.068943
\n", - "
" - ], - "text/plain": [ - " Beneficiary_BIC hist_trans_volume hist_total_amount hist_average_amount\n", - "0 FBSFCHZH 12494 1090937.46 87.316909\n", - "1 HCBHSGSG 12460 1121692.50 90.023475\n", - "2 XITXUS33 6211 572653.93 92.199957\n", - "3 YXRXGB22 12496 1121443.99 89.744237\n", - "4 ZHSZUS33 6330 551996.30 87.203207\n", - "5 ZNZZAU3M 12572 1094630.75 87.068943" - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "\n", - "history_summary2 = df_history.groupby('Beneficiary_BIC').agg(\n", - " hist_trans_volume=('UETR', 'count'),\n", - " hist_total_amount=('Amount', 'sum'),\n", - " hist_average_amount=('Amount', 'mean')\n", - ").reset_index()\n", - "\n", - "history_summary2" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "edabd7be-4864-4964-9e25-df543d5985c6", - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "import pandas as pd\n", - "dataset_names = [\"train\", \"test\"]\n", - "results2 = {}\n", - "for ds_name in dataset_names:\n", - " ds_df = temp_ds_df[ds_name]\n", - " resampled_df = temp_resampled_df[ds_name]\n", - " \n", - " c_df = ds_df[['Beneficiary_BIC']].resample('1H').agg({'Beneficiary_BIC': 'first'}).reset_index()\n", - " \n", - " # Add Beneficiary_BIC to the resampled data by joining with the original DataFrame\n", - " resampled_df2 = pd.merge(resampled_df, \n", - " c_df,\n", - " on='Time'\n", - " )\n", - " \n", - " resampled_df3 = pd.merge(resampled_df2, \n", - " history_summary2,\n", - " on='Beneficiary_BIC'\n", - " )\n", - " \n", - " \n", - " resampled_df4 = resampled_df3.copy()\n", - " resampled_df4['x3_y2'] = resampled_df4['average_amount']/resampled_df4['hist_trans_volume']\n", - " \n", - " ds_df = ds_df.sort_values('Time')\n", - " resampled_df4 = resampled_df4.sort_values('Time')\n", - "\n", - " merged_df2 = pd.merge_asof(ds_df, resampled_df4, on='Time' )\n", - " merged_df2 = merged_df2.drop(columns=['Beneficiary_BIC_y']).rename(columns={'Beneficiary_BIC_x': 'Beneficiary_BIC'})\n", - " \n", - " \n", - " results2[ds_name] = merged_df2\n", - "\n", - "print(results2)" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "id": "a44309a2-e252-458d-a9dc-2691aea9360f", - "metadata": { - "tags": [] - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "/tmp/dataset/horizontal_credit_fraud_data/ZHSZUS33_Bank_1/train_enrichment.csv\n", - "/tmp/dataset/horizontal_credit_fraud_data/ZHSZUS33_Bank_1/test_enrichment.csv\n" - ] - }, - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
TimeClassAmountSender_BICReceiver_BICUETRCurrencyBeneficiary_BICCurrency_Countrytrans_volumetotal_amountaverage_amounthist_trans_volumehist_total_amounthist_average_amountx2_y1x3_y2
01971-04-01 04:30:000348.06ZHSZUS33YXRXGB22MV2B0B0S1NUTY8OCOEQ2QEUSDXITXUS33United States4422.18105.545125411124650.2389.6778750.0084160.016993
11971-04-01 04:35:0002.69ZHSZUS33YMNYFRPPCQD9INGI7GJATKWRK0D44ZSGDHCBHSGSGSingapore4422.18105.545125411124650.2389.6778750.0084160.016993
21971-04-01 04:40:00016.63ZHSZUS33XITXUS33IJXYXLV8SF72RU3MRSJ542CHFFBSFCHZHSwitzerland4422.18105.545125411124650.2389.6778750.0084160.016993
31971-04-01 04:51:40054.80ZHSZUS33XITXUS33B1850ZUIHTMT61N7HMIZYMCHFFBSFCHZHSwitzerland4422.18105.545125411124650.2389.6778750.0084160.016993
41971-04-01 05:16:40031.96ZHSZUS33ZHSZUS334BBLS9B31LWHZFF17RODX1GBPYXRXGB22United Kingdom4292.6473.160124961121443.9989.7442370.0058550.005855
......................................................
408041972-03-10 19:01:40012.99ZHSZUS33WPUWDEFFEBY8SA8UZOWNNJ2X7OUBZ2USDXITXUS33United States112.9912.990125411124650.2389.6778750.0010360.002091
408051972-03-10 21:30:00052.34ZHSZUS33YXRXGB223D4772259A6PY7Q7XVJ302GBPYXRXGB22United Kingdom2272.62136.310124961121443.9989.7442370.0109080.010908
408061972-03-10 21:36:400220.28ZHSZUS33YSYCESMMZ5VK0S69KASH3B82M6W5XVUSDZHSZUS33United States2272.62136.310124961121443.9989.7442370.0109080.010908
408071972-03-10 22:30:00060.50ZHSZUS33YXRXGB22HA4WJAB98YR8M9FIE0C2A1USDXITXUS33United States285.2942.645125411124650.2389.6778750.0034000.006866
408081972-03-10 22:58:20024.79ZHSZUS33ZHSZUS339SJQ6WVX8CGS0P1DYYGQ45GBPYXRXGB22United Kingdom285.2942.645125411124650.2389.6778750.0034000.006866
\n", - "

40809 rows × 17 columns

\n", - "
" - ], - "text/plain": [ - " Time Class Amount Sender_BIC Receiver_BIC \\\n", - "0 1971-04-01 04:30:00 0 348.06 ZHSZUS33 YXRXGB22 \n", - "1 1971-04-01 04:35:00 0 2.69 ZHSZUS33 YMNYFRPP \n", - "2 1971-04-01 04:40:00 0 16.63 ZHSZUS33 XITXUS33 \n", - "3 1971-04-01 04:51:40 0 54.80 ZHSZUS33 XITXUS33 \n", - "4 1971-04-01 05:16:40 0 31.96 ZHSZUS33 ZHSZUS33 \n", - "... ... ... ... ... ... \n", - "40804 1972-03-10 19:01:40 0 12.99 ZHSZUS33 WPUWDEFF \n", - "40805 1972-03-10 21:30:00 0 52.34 ZHSZUS33 YXRXGB22 \n", - "40806 1972-03-10 21:36:40 0 220.28 ZHSZUS33 YSYCESMM \n", - "40807 1972-03-10 22:30:00 0 60.50 ZHSZUS33 YXRXGB22 \n", - "40808 1972-03-10 22:58:20 0 24.79 ZHSZUS33 ZHSZUS33 \n", - "\n", - " UETR Currency Beneficiary_BIC Currency_Country \\\n", - "0 MV2B0B0S1NUTY8OCOEQ2QE USD XITXUS33 United States \n", - "1 CQD9INGI7GJATKWRK0D44Z SGD HCBHSGSG Singapore \n", - "2 IJXYXLV8SF72RU3MRSJ542 CHF FBSFCHZH Switzerland \n", - "3 B1850ZUIHTMT61N7HMIZYM CHF FBSFCHZH Switzerland \n", - "4 4BBLS9B31LWHZFF17RODX1 GBP YXRXGB22 United Kingdom \n", - "... ... ... ... ... \n", - "40804 EBY8SA8UZOWNNJ2X7OUBZ2 USD XITXUS33 United States \n", - "40805 3D4772259A6PY7Q7XVJ302 GBP YXRXGB22 United Kingdom \n", - "40806 Z5VK0S69KASH3B82M6W5XV USD ZHSZUS33 United States \n", - "40807 HA4WJAB98YR8M9FIE0C2A1 USD XITXUS33 United States \n", - "40808 9SJQ6WVX8CGS0P1DYYGQ45 GBP YXRXGB22 United Kingdom \n", - "\n", - " trans_volume total_amount average_amount hist_trans_volume \\\n", - "0 4 422.18 105.545 12541 \n", - "1 4 422.18 105.545 12541 \n", - "2 4 422.18 105.545 12541 \n", - "3 4 422.18 105.545 12541 \n", - "4 4 292.64 73.160 12496 \n", - "... ... ... ... ... \n", - "40804 1 12.99 12.990 12541 \n", - "40805 2 272.62 136.310 12496 \n", - "40806 2 272.62 136.310 12496 \n", - "40807 2 85.29 42.645 12541 \n", - "40808 2 85.29 42.645 12541 \n", - "\n", - " hist_total_amount hist_average_amount x2_y1 x3_y2 \n", - "0 1124650.23 89.677875 0.008416 0.016993 \n", - "1 1124650.23 89.677875 0.008416 0.016993 \n", - "2 1124650.23 89.677875 0.008416 0.016993 \n", - "3 1124650.23 89.677875 0.008416 0.016993 \n", - "4 1121443.99 89.744237 0.005855 0.005855 \n", - "... ... ... ... ... 
\n", - "40804 1124650.23 89.677875 0.001036 0.002091 \n", - "40805 1121443.99 89.744237 0.010908 0.010908 \n", - "40806 1121443.99 89.744237 0.010908 0.010908 \n", - "40807 1124650.23 89.677875 0.003400 0.006866 \n", - "40808 1124650.23 89.677875 0.003400 0.006866 \n", - "\n", - "[40809 rows x 17 columns]" - ] - }, - "execution_count": 8, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "final_results = {}\n", - "for name in results:\n", - " df = results[name]\n", - " df2 = results2[name]\n", - " df3 = df2[[\"Time\", \"Beneficiary_BIC\", \"x3_y2\"]].copy()\n", - " df4 = pd.merge(df, df3, on=['Time', 'Beneficiary_BIC'])\n", - " final_results[name] = df4\n", - "\n", - " \n", - "for name in final_results:\n", - " site_dir = os.path.join(site_input_dir, site_name)\n", - " os.makedirs(site_dir, exist_ok=True)\n", - " enrich_file_name = os.path.join(site_dir, f\"{name}_enrichment.csv\")\n", - " print(enrich_file_name)\n", - " final_results[name].to_csv(enrich_file_name) \n", - " \n", - "final_results[\"train\"]" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "id": "47c958c3-bf73-4ab3-a66f-414be10870ea", - "metadata": { - "tags": [] - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[01;34m/tmp/dataset/horizontal_credit_fraud_data/\u001b[0m\n", - "├── \u001b[01;34mFBSFCHZH_Bank_6\u001b[0m\n", - "│   ├── history.csv\n", - "│   ├── test.csv\n", - "│   └── train.csv\n", - "├── \u001b[01;34mHCBHSGSG_Bank_9\u001b[0m\n", - "│   ├── history.csv\n", - "│   ├── test.csv\n", - "│   └── train.csv\n", - "├── history.csv\n", - "├── \u001b[01;34mSHSHKHH1_Bank_2\u001b[0m\n", - "│   ├── history.csv\n", - "│   ├── test.csv\n", - "│   └── train.csv\n", - "├── test.csv\n", - "├── train.csv\n", - "├── \u001b[01;34mWPUWDEFF_Bank_4\u001b[0m\n", - "│   ├── history.csv\n", - "│   ├── test.csv\n", - "│   └── train.csv\n", - "├── \u001b[01;34mXITXUS33_Bank_10\u001b[0m\n", - "│   ├── history.csv\n", - "│   ├── test.csv\n", - "│   └── train.csv\n", - "├── \u001b[01;34mYMNYFRPP_Bank_5\u001b[0m\n", - "│   ├── history.csv\n", - "│   ├── test.csv\n", - "│   └── train.csv\n", - "├── \u001b[01;34mYSYCESMM_Bank_7\u001b[0m\n", - "│   ├── history.csv\n", - "│   ├── test.csv\n", - "│   └── train.csv\n", - "├── \u001b[01;34mYXRXGB22_Bank_3\u001b[0m\n", - "│   ├── history.csv\n", - "│   ├── test.csv\n", - "│   └── train.csv\n", - "├── \u001b[01;34mZHSZUS33_Bank_1\u001b[0m\n", - "│   ├── history.csv\n", - "│   ├── test.csv\n", - "│   ├── test_enrichment.csv\n", - "│   ├── train.csv\n", - "│   └── train_enrichment.csv\n", - "└── \u001b[01;34mZNZZAU3M_Bank_8\u001b[0m\n", - " ├── history.csv\n", - " ├── test.csv\n", - " └── train.csv\n", - "\n", - "10 directories, 35 files\n" - ] - } - ], - "source": [ - "! tree {site_input_dir}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "791ba1db-0ccf-4b31-b838-828d8c6a98a6", - "metadata": {}, - "outputs": [], - "source": [ - "ls -al /tmp/dataset/horizontal_credit_fraud_data/ZHSZUS33_Bank_1/" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "eae3d95a-180a-4fb6-b006-1fc1c144c5c4", - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "! 
find /tmp/dataset/horizontal_credit_fraud_data/ZHSZUS33_Bank_1/ -exec wc -l {} \\;" - ] - }, - { - "cell_type": "markdown", - "id": "02b30e6d-6433-4b42-9950-d1ca3d83e697", - "metadata": {}, - "source": [] - }, - { - "cell_type": "markdown", - "id": "f9966065-80cb-4f85-adab-8c44f01fc8d1", - "metadata": {}, - "source": [ - "Let's go back to the [XGBoost Notebook](./xgboost.ipynb)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "28efd726-3a92-49bd-ac4f-b70627f1df57", - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "85517284-0593-4c5a-ab02-cf31024b88db", - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "nvflare_example", - "language": "python", - "name": "nvflare_example" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.19" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/examples/advanced/finance-end-to-end/figures/edge_map.png b/examples/advanced/finance-end-to-end/figures/edge_map.png new file mode 100644 index 0000000000..62c0cc3b3c Binary files /dev/null and b/examples/advanced/finance-end-to-end/figures/edge_map.png differ diff --git a/examples/advanced/finance-end-to-end/figures/embeddings.png b/examples/advanced/finance-end-to-end/figures/embeddings.png new file mode 100644 index 0000000000..2864d580ee Binary files /dev/null and b/examples/advanced/finance-end-to-end/figures/embeddings.png differ diff --git a/examples/advanced/finance-end-to-end/figures/enrichment.png b/examples/advanced/finance-end-to-end/figures/enrichment.png new file mode 100644 index 0000000000..2d7b8c2cca Binary files /dev/null and b/examples/advanced/finance-end-to-end/figures/enrichment.png differ diff --git a/examples/advanced/finance-end-to-end/figures/generated_data.png b/examples/advanced/finance-end-to-end/figures/generated_data.png new file mode 100644 index 0000000000..46a1aeef1f Binary files /dev/null and b/examples/advanced/finance-end-to-end/figures/generated_data.png differ diff --git a/examples/advanced/finance-end-to-end/figures/split_data.png b/examples/advanced/finance-end-to-end/figures/split_data.png new file mode 100644 index 0000000000..b717a6cb5d Binary files /dev/null and b/examples/advanced/finance-end-to-end/figures/split_data.png differ diff --git a/examples/advanced/finance-end-to-end/images/enrichment.png b/examples/advanced/finance-end-to-end/images/enrichment.png deleted file mode 100644 index 68f70b57e3..0000000000 Binary files a/examples/advanced/finance-end-to-end/images/enrichment.png and /dev/null differ diff --git a/examples/advanced/finance-end-to-end/images/generated_data.png b/examples/advanced/finance-end-to-end/images/generated_data.png deleted file mode 100644 index 4428a5f12d..0000000000 Binary files a/examples/advanced/finance-end-to-end/images/generated_data.png and /dev/null differ diff --git a/examples/advanced/finance-end-to-end/images/split_data.png b/examples/advanced/finance-end-to-end/images/split_data.png deleted file mode 100644 index b0d739bfc4..0000000000 Binary files a/examples/advanced/finance-end-to-end/images/split_data.png and /dev/null differ diff --git a/examples/advanced/finance-end-to-end/notebooks/feature_enrichment.ipynb 
b/examples/advanced/finance-end-to-end/notebooks/feature_enrichment.ipynb new file mode 100644 index 0000000000..13ca766500 --- /dev/null +++ b/examples/advanced/finance-end-to-end/notebooks/feature_enrichment.ipynb @@ -0,0 +1,319 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "e6d10159-9c02-4bdd-ad6a-f9b7e19ac575", + "metadata": {}, + "source": [ + "## Feature Enrichment\n", + "\n", + "### Historical data enrichment\n", + "\n", + "Pick one client (Site, aka sender_BIC) to do the enrichment as every site will be the same process" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7130bd7a-bda0-4592-818f-bd65c505baa3", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "site_input_dir = \"/tmp/dataset/horizontal_credit_fraud_data/\"\n", + "site_name = \"ZHSZUS33_Bank_1\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9375ffaa-1143-43f5-b1a3-3ef45918e4bf", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import os\n", + "import random\n", + "import string\n", + "\n", + "import pandas as pd\n", + "history_file_name = os.path.join(site_input_dir, site_name,\"history.csv\" )\n", + "df_history = pd.read_csv(history_file_name)\n", + "df_history" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3fe8e513-f041-4165-88b1-3b21607ca734", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "history_summary = df_history.groupby('Currency').agg(\n", + " hist_trans_volume=('UETR', 'count'),\n", + " hist_total_amount=('Amount', 'sum'),\n", + " hist_average_amount=('Amount', 'mean')\n", + ").reset_index()\n", + "\n", + "history_summary" + ] + }, + { + "cell_type": "markdown", + "id": "025ac920-c1c3-401f-b420-18c39b7d04d2", + "metadata": {}, + "source": [ + "# Enrich Feature with Currency" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7aa07b6d-dc96-45e6-a467-8c770cafb84e", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import pandas as pd\n", + "dataset_names = [\"train\", \"test\"]\n", + "results = {}\n", + "\n", + "temp_ds_df = {}\n", + "temp_resampled_df = {}\n", + "\n", + "\n", + "for ds_name in dataset_names:\n", + " file_name = os.path.join(site_input_dir, site_name , f\"{ds_name}.csv\" )\n", + " ds_df = pd.read_csv(file_name)\n", + " ds_df['Time'] = pd.to_datetime(ds_df['Time'], unit='s')\n", + "\n", + " # Set the Time column as the index\n", + " ds_df.set_index('Time', inplace=True)\n", + " \n", + " resampled_df = ds_df.resample('1H').agg(\n", + " trans_volume=('UETR', 'count'),\n", + " total_amount=('Amount', 'sum'),\n", + " average_amount=('Amount', 'mean')\n", + " ).reset_index()\n", + " \n", + " temp_ds_df[ds_name] = ds_df\n", + " temp_resampled_df[ds_name] = resampled_df\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a2e86bc5-e8ad-41f5-b343-29595a378c03", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "for ds_name in dataset_names:\n", + " \n", + " ds_df = temp_ds_df[ds_name]\n", + " resampled_df = temp_resampled_df[ds_name]\n", + " \n", + " c_df = ds_df[['Currency']].resample('1H').agg({'Currency': 'first'}).reset_index()\n", + " # Add Currency_Country to the resampled data by joining with the original DataFrame\n", + " resampled_df2 = pd.merge(resampled_df, \n", + " c_df,\n", + " on='Time'\n", + " )\n", + " resampled_df3 = pd.merge(resampled_df2, \n", + " history_summary,\n", + " on='Currency'\n", + " )\n", + " resampled_df4 = resampled_df3.copy()\n", + " 
resampled_df4['x2_y1'] = resampled_df4['average_amount']/resampled_df4['hist_trans_volume']\n", + " \n", + " ds_df = ds_df.sort_values('Time')\n", + " resampled_df4 = resampled_df4.sort_values('Time')\n", + " \n", + " merged_df = pd.merge_asof(ds_df, resampled_df4, on='Time' )\n", + " merged_df = merged_df.drop(columns=['Currency_y']).rename(columns={'Currency_x': 'Currency'})\n", + " \n", + " results[ds_name] = merged_df\n", + " \n", + "print(results)" + ] + }, + { + "cell_type": "markdown", + "id": "7051468f-2de0-4e41-a227-7fad4c9110af", + "metadata": { + "tags": [] + }, + "source": [ + "# Enrich feature for beneficiary country" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "605095b7-a514-4346-b984-3590d79d13e4", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "\n", + "history_summary2 = df_history.groupby('Beneficiary_BIC').agg(\n", + " hist_trans_volume=('UETR', 'count'),\n", + " hist_total_amount=('Amount', 'sum'),\n", + " hist_average_amount=('Amount', 'mean')\n", + ").reset_index()\n", + "\n", + "history_summary2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "edabd7be-4864-4964-9e25-df543d5985c6", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import pandas as pd\n", + "dataset_names = [\"train\", \"test\"]\n", + "results2 = {}\n", + "for ds_name in dataset_names:\n", + " ds_df = temp_ds_df[ds_name]\n", + " resampled_df = temp_resampled_df[ds_name]\n", + " \n", + " c_df = ds_df[['Beneficiary_BIC']].resample('1H').agg({'Beneficiary_BIC': 'first'}).reset_index()\n", + " \n", + " # Add Beneficiary_BIC to the resampled data by joining with the original DataFrame\n", + " resampled_df2 = pd.merge(resampled_df, \n", + " c_df,\n", + " on='Time'\n", + " )\n", + " \n", + " resampled_df3 = pd.merge(resampled_df2, \n", + " history_summary2,\n", + " on='Beneficiary_BIC'\n", + " )\n", + " \n", + " \n", + " resampled_df4 = resampled_df3.copy()\n", + " resampled_df4['x3_y2'] = resampled_df4['average_amount']/resampled_df4['hist_trans_volume']\n", + " \n", + " ds_df = ds_df.sort_values('Time')\n", + " resampled_df4 = resampled_df4.sort_values('Time')\n", + "\n", + " merged_df2 = pd.merge_asof(ds_df, resampled_df4, on='Time' )\n", + " merged_df2 = merged_df2.drop(columns=['Beneficiary_BIC_y']).rename(columns={'Beneficiary_BIC_x': 'Beneficiary_BIC'})\n", + " \n", + " results2[ds_name] = merged_df2\n", + "\n", + "print(results2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a44309a2-e252-458d-a9dc-2691aea9360f", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "final_results = {}\n", + "for name in results:\n", + " df = results[name]\n", + " df2 = results2[name]\n", + " df3 = df2[[\"Time\", \"Beneficiary_BIC\", \"x3_y2\"]].copy()\n", + " df4 = pd.merge(df, df3, on=['Time', 'Beneficiary_BIC'])\n", + " final_results[name] = df4\n", + "\n", + " \n", + "for name in final_results:\n", + " site_dir = os.path.join(site_input_dir, site_name)\n", + " os.makedirs(site_dir, exist_ok=True)\n", + " enrich_file_name = os.path.join(site_dir, f\"{name}_enrichment.csv\")\n", + " print(enrich_file_name)\n", + " final_results[name].to_csv(enrich_file_name) \n", + " \n", + "final_results[\"train\"]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "47c958c3-bf73-4ab3-a66f-414be10870ea", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "! 
tree {site_input_dir}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "791ba1db-0ccf-4b31-b838-828d8c6a98a6", + "metadata": {}, + "outputs": [], + "source": [ + "ls -al /tmp/dataset/horizontal_credit_fraud_data/ZHSZUS33_Bank_1/" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "eae3d95a-180a-4fb6-b006-1fc1c144c5c4", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "! find /tmp/dataset/horizontal_credit_fraud_data/ZHSZUS33_Bank_1/ -exec wc -l {} \\;" + ] + }, + { + "cell_type": "markdown", + "id": "f9966065-80cb-4f85-adab-8c44f01fc8d1", + "metadata": {}, + "source": [ + "Let's go back to the [XGBoost Notebook](../xgboost.ipynb)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d8855463-ce23-44e5-b0ad-4e05d256ba8d", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "nvflare_example", + "language": "python", + "name": "nvflare_example" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.18" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/examples/advanced/finance-end-to-end/notebooks/gnn_train_encode.ipynb b/examples/advanced/finance-end-to-end/notebooks/gnn_train_encode.ipynb new file mode 100644 index 0000000000..cc688bf729 --- /dev/null +++ b/examples/advanced/finance-end-to-end/notebooks/gnn_train_encode.ipynb @@ -0,0 +1,319 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "3d892b5e-2f3b-4182-bedb-d332bfc3a353", + "metadata": {}, + "source": [ + "# GNN Training and Encoding\n", + "\n", + "* Train a GNN based on enriched features in an unsupervised fashion, and use the resulting model to encode the input features." 
+ ] + }, + { + "cell_type": "markdown", + "id": "b8498bf1-d368-4d15-a5bf-559eb6e3918b", + "metadata": {}, + "source": [ + "## Load Data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "db9d04f0-a64d-457b-aacf-1a3737e07e12", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "site_input_dir = \"/tmp/dataset/horizontal_credit_fraud_data/\"\n", + "site_name = \"ZHSZUS33_Bank_1\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2d84f89f-fe0a-4387-92a2-49ca9143c141", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import os\n", + "import pandas as pd\n", + "\n", + "dataset_names = [\"train\", \"test\"]\n", + "df_feats = {}\n", + "df_edges = {}\n", + "for ds_name in dataset_names:\n", + " # Get feature and class\n", + " file_name = os.path.join(site_input_dir, site_name, f\"{ds_name}_normalized.csv\")\n", + " df = pd.read_csv(file_name, index_col=0)\n", + " # Drop irrelevant columns\n", + " df = df.drop(columns=[\"Currency_Country\",\n", + " \"Beneficiary_BIC\",\n", + " \"Currency\",\n", + " \"Receiver_BIC\",\n", + " \"Sender_BIC\"]) \n", + " df_feats[ds_name] = df\n", + " # Get edge map\n", + " file_name = os.path.join(site_input_dir, site_name, f\"{ds_name}_edgemap.csv\")\n", + " df = pd.read_csv(file_name, header=None)\n", + " # Add column names to the edge map\n", + " df.columns = [\"UETR_1\", \"UETR_2\"]\n", + " df_edges[ds_name] = df" + ] + }, + { + "cell_type": "markdown", + "id": "a95b6b9d-7046-4ed4-8a7e-ce1f74ddf694", + "metadata": { + "tags": [] + }, + "source": [ + "## Prepared Data for Unsupervised GNN Training" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5bd5be54-c5e7-43c7-ad4f-de29a09bc7ec", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import torch\n", + "\n", + "node_ids = {}\n", + "node_features = {}\n", + "edge_indices = {}\n", + "weights = {}\n", + "labels = {}\n", + "\n", + "for ds_name in dataset_names:\n", + " df_feat_class = df_feats[ds_name]\n", + " df_edge = df_edges[ds_name]\n", + "\n", + " # Sort the data by UETR\n", + " df_feat_class = df_feat_class.sort_values(by=\"UETR\").reset_index(drop=True)\n", + "\n", + " # Generate UETR-index map with the feature list\n", + " node_id = df_feat_class[\"UETR\"].values\n", + " map_id = {j: i for i, j in enumerate(node_id)} # mapping nodes to indexes\n", + " node_ids[ds_name] = node_id\n", + " \n", + " # Get class labels\n", + " labels[ds_name] = df_feat_class[\"Class\"].values\n", + "\n", + " # Map UETR to indexes in the edge map\n", + " edges = df_edge.copy()\n", + " edges.UETR_1 = edges.UETR_1.map(map_id)\n", + " edges.UETR_2 = edges.UETR_2.map(map_id)\n", + " edges = edges.astype(int)\n", + "\n", + " # for undirected graph\n", + " edge_index = np.array(edges.values).T\n", + " edge_index = torch.tensor(edge_index, dtype=torch.long).contiguous()\n", + " edge_indices[ds_name] = edge_index\n", + " weights[ds_name] = torch.tensor([1] * edge_index.shape[1], dtype=torch.float)\n", + "\n", + " # UETR mapped to corresponding indexes, drop UETR and class\n", + " node_feature = df_feat_class.drop([\"UETR\", \"Class\"], axis=1).copy()\n", + " node_feature = torch.tensor(np.array(node_feature.values), dtype=torch.float)\n", + " node_features[ds_name] = node_feature\n" + ] + }, + { + "cell_type": "markdown", + "id": "70b192a7-05be-4591-b937-7bab878277ac", + "metadata": { + "tags": [] + }, + "source": [ + "## Unsupervised GNN Training" + ] + }, + { + "cell_type": "code", + 
"execution_count": null, + "id": "f326a613-e683-4f67-810d-aece3d90349e", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import torch.nn.functional as F\n", + "from torch.utils.tensorboard import SummaryWriter\n", + "from torch_geometric.data import Data\n", + "from torch_geometric.loader import LinkNeighborLoader\n", + "from torch_geometric.nn import GraphSAGE\n", + "\n", + "output_dir = os.path.join(site_input_dir, site_name)\n", + "DEVICE = \"cuda:0\"\n", + "writer = SummaryWriter(output_dir)\n", + "epochs = 100\n", + "\n", + "# Converting data to PyG graph data format\n", + "train_data = Data(\n", + " x=node_features['train'], edge_index=edge_indices['train'], edge_attr=weights['train']\n", + ")\n", + "\n", + "# Define the dataloader for graphsage training\n", + "loader = LinkNeighborLoader(\n", + " train_data,\n", + " batch_size=2048,\n", + " shuffle=True,\n", + " neg_sampling_ratio=1.0,\n", + " num_neighbors=[10, 10],\n", + " num_workers=6,\n", + " persistent_workers=True,\n", + ")\n", + "\n", + "# Model\n", + "model = GraphSAGE(\n", + " in_channels=node_features['train'].shape[1],\n", + " hidden_channels=64,\n", + " num_layers=2,\n", + " out_channels=64,\n", + ")\n", + "optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)\n", + "model.to(DEVICE)\n", + "\n", + "for epoch in range(1, epochs + 1):\n", + " model.train()\n", + " running_loss = instance_count = 0\n", + "\n", + " for data in loader:\n", + " # get the inputs data\n", + " data = data.to(DEVICE)\n", + " # zero the parameter gradients\n", + " optimizer.zero_grad()\n", + " # forward + backward + optimize\n", + " h = model(data.x, data.edge_index)\n", + " h_src = h[data.edge_label_index[0]]\n", + " h_dst = h[data.edge_label_index[1]]\n", + " link_pred = (h_src * h_dst).sum(dim=-1) # Inner product.\n", + " loss = F.binary_cross_entropy_with_logits(link_pred, data.edge_label)\n", + " loss.backward()\n", + " optimizer.step()\n", + " # add record\n", + " running_loss += float(loss.item()) * link_pred.numel()\n", + " instance_count += link_pred.numel()\n", + " print(f\"Epoch: {epoch:02d}, Loss: {running_loss / instance_count:.4f}\")\n", + " writer.add_scalar(\"train_loss\", running_loss / instance_count, epoch)\n", + "\n", + "# Save the model\n", + "torch.save(model.state_dict(), os.path.join(output_dir, \"model.pt\"))" + ] + }, + { + "cell_type": "markdown", + "id": "d7a5b581-1688-4c43-a83a-f3b152d05729", + "metadata": { + "tags": [] + }, + "source": [ + "## GNN Inference - Encoding the Raw Feature" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9dfe6156-1049-41c5-82d5-b81fa1814160", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Load the model and perform inference / encoding\n", + "model_enc = GraphSAGE(\n", + " in_channels=node_features['train'].shape[1],\n", + " hidden_channels=64,\n", + " num_layers=2,\n", + " out_channels=64,\n", + ")\n", + "model_enc.load_state_dict(torch.load(os.path.join(output_dir, \"model.pt\")))\n", + "model_enc.eval()\n", + "\n", + "embeds = {}\n", + "# Perform encoding\n", + "for ds_name in dataset_names:\n", + " h = model_enc(node_features[ds_name], edge_indices[ds_name])\n", + " embed = pd.DataFrame(h.cpu().detach().numpy())\n", + " # Add column names as V_0, V_1, ... 
V_63\n",
+    "    embed.columns = [f\"V_{i}\" for i in range(embed.shape[1])]\n",
+    "    # Concatenate the node ids and class labels with the encoded features\n",
+    "    embed[\"UETR\"] = node_ids[ds_name]\n",
+    "    embed[\"Class\"] = labels[ds_name]\n",
+    "    # Move the UETR and Class columns to the front\n",
+    "    embed = embed[[\"UETR\", \"Class\"] + [col for col in embed.columns if col not in [\"UETR\", \"Class\"]]]\n",
+    "    embed.to_csv(os.path.join(output_dir, f\"{ds_name}_embedding.csv\"), index=False)\n",
+    "    embeds[ds_name] = embed"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c1b8925c-6890-4a45-a9c4-f80399b463cc",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "! tree /tmp/dataset/horizontal_credit_fraud_data/ZHSZUS33_Bank_1"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5adcd468-edaf-4759-ac2d-09902811c97a",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "embeds[\"train\"]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8591e4e1-74b1-465c-8124-eaf9829a6a8e",
+   "metadata": {},
+   "source": [
+    "Let's go back to the [XGBoost Notebook](../xgboost.ipynb)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d926970e-a4e9-41a7-a166-0d11f8e9e320",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "nvflare_example",
+   "language": "python",
+   "name": "nvflare_example"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.18"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/examples/advanced/finance-end-to-end/notebooks/graph_construct.ipynb b/examples/advanced/finance-end-to-end/notebooks/graph_construct.ipynb
new file mode 100644
index 0000000000..cd02ae469b
--- /dev/null
+++ b/examples/advanced/finance-end-to-end/notebooks/graph_construct.ipynb
@@ -0,0 +1,197 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "3d892b5e-2f3b-4182-bedb-d332bfc3a353",
+   "metadata": {},
+   "source": [
+    "# Graph Construction Step\n",
+    "\n",
+    "* Construct the graph for each site's transaction data\n",
+    "\n",
+    "Each node represents a transaction, and the edges represent the relationships between transactions. Since all transactions at a site share the same Sender_BIC, we define the graph edges using the following rules:\n",
+    "\n",
+    "1. The two transactions have the same Receiver_BIC.\n",
+    "2. The time difference between the two transactions is smaller than 6000.\n",
+    "\n",
+    "Note that in real applications, such rules should be designed according to the characteristics of the candidate data."
+ ] + }, + { + "cell_type": "markdown", + "id": "b8498bf1-d368-4d15-a5bf-559eb6e3918b", + "metadata": {}, + "source": [ + "### Load Data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "db9d04f0-a64d-457b-aacf-1a3737e07e12", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "site_input_dir = \"/tmp/dataset/horizontal_credit_fraud_data/\"\n", + "site_name = \"ZHSZUS33_Bank_1\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2d84f89f-fe0a-4387-92a2-49ca9143c141", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "import pandas as pd\n", + "dataset_names = [\"train\", \"test\"]\n", + "datasets = {}\n", + "\n", + "for ds_name in dataset_names:\n", + " file_name = os.path.join(site_input_dir, site_name, f\"{ds_name}.csv\" )\n", + " df = pd.read_csv(file_name)\n", + " datasets[ds_name] = df\n", + " print(df)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ccdc785e-9597-4083-b74a-2cacb25b20cb", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "df.columns" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5bd5be54-c5e7-43c7-ad4f-de29a09bc7ec", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import pandas as pd\n", + "\n", + "edge_maps = {}\n", + "\n", + "info_columns = ['Time', 'Receiver_BIC', 'UETR']\n", + "time_threshold = 6000\n", + "\n", + "for ds_name in dataset_names:\n", + " df = datasets[ds_name]\n", + " \n", + " # Find transaction pairs that are within the time threshold\n", + " # First sort the table by 'Time'\n", + " df = df.sort_values(by=\"Time\")\n", + " # Keep only the columns that are needed for the graph edge map\n", + " df = df[info_columns]\n", + "\n", + " # Then for each row, find the next rows that is within the time threshold\n", + " graph_edge_map = []\n", + " for i in range(len(df)):\n", + " # Find the next rows that is:\n", + " # - within the time threshold\n", + " # - has the same Receiver_BIC\n", + " j = 1\n", + " while (i + j < len(df) and df[\"Time\"].values[i + j] < df[\"Time\"].values[i] + time_threshold):\n", + " if (df[\"Receiver_BIC\"].values[i + j] == df[\"Receiver_BIC\"].values[i]):\n", + " graph_edge_map.append([df[\"UETR\"].values[i], df[\"UETR\"].values[i + j]])\n", + " j += 1\n", + "\n", + " print(f\"Generated edge map for {ds_name}, in total {len(graph_edge_map)} valid edges for {len(df)} transactions\")\n", + "\n", + " edge_maps[ds_name] = pd.DataFrame(graph_edge_map) \n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7780ab4d-7d1d-4eda-96e1-eed9243eff11", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "edge_maps[\"train\"]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f326a613-e683-4f67-810d-aece3d90349e", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "for name in edge_maps:\n", + " site_dir = os.path.join(site_input_dir, site_name)\n", + " os.makedirs(site_dir, exist_ok=True)\n", + " edge_map_file_name = os.path.join(site_dir, f\"{name}_edgemap.csv\")\n", + " print(\"save to = \", edge_map_file_name)\n", + " # save to csv file without header and index\n", + " edge_maps[name].to_csv(edge_map_file_name, header=False, index=False)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c1b8925c-6890-4a45-a9c4-f80399b463cc", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "! 
tree /tmp/dataset/horizontal_credit_fraud_data/ZHSZUS33_Bank_1" + ] + }, + { + "cell_type": "markdown", + "id": "8591e4e1-74b1-465c-8124-eaf9829a6a8e", + "metadata": {}, + "source": [ + "Let's go back to the [XGBoost Notebook](../xgboost.ipynb)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d926970e-a4e9-41a7-a166-0d11f8e9e320", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "nvflare_example", + "language": "python", + "name": "nvflare_example" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.18" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/examples/advanced/finance-end-to-end/pre_process.ipynb b/examples/advanced/finance-end-to-end/notebooks/pre_process.ipynb similarity index 79% rename from examples/advanced/finance-end-to-end/pre_process.ipynb rename to examples/advanced/finance-end-to-end/notebooks/pre_process.ipynb index 1a12df720a..d47167f261 100644 --- a/examples/advanced/finance-end-to-end/pre_process.ipynb +++ b/examples/advanced/finance-end-to-end/notebooks/pre_process.ipynb @@ -21,7 +21,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "id": "db9d04f0-a64d-457b-aacf-1a3737e07e12", "metadata": { "tags": [] @@ -91,27 +91,12 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "id": "ccdc785e-9597-4083-b74a-2cacb25b20cb", "metadata": { "tags": [] }, - "outputs": [ - { - "data": { - "text/plain": [ - "Index(['Unnamed: 0', 'Time', 'Class', 'Amount', 'Sender_BIC', 'Receiver_BIC',\n", - " 'UETR', 'Currency', 'Beneficiary_BIC', 'Currency_Country',\n", - " 'trans_volume', 'total_amount', 'average_amount', 'hist_trans_volume',\n", - " 'hist_total_amount', 'hist_average_amount', 'x2_y1', 'x3_y2'],\n", - " dtype='object')" - ] - }, - "execution_count": 5, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "df.columns" ] @@ -154,8 +139,7 @@ " \n", " # Combine the normalized numerical features with the categorical features\n", " df_combined = pd.concat([categorical_features, numerical_normalized], axis=1)\n", - " \n", - " \n", + " \n", "# # one-hot encoding\n", "# df_combined = pd.get_dummies(df_combined, columns=category_columns)\n", "\n", @@ -175,57 +159,41 @@ }, "outputs": [], "source": [ - " \n", "for name in processed_dfs:\n", " site_dir = os.path.join(site_input_dir, site_name)\n", " os.makedirs(site_dir, exist_ok=True)\n", " pre_processed_file_name = os.path.join(site_dir, f\"{name}_normalized.csv\")\n", " print(pre_processed_file_name)\n", - " processed_dfs[name].to_csv(pre_processed_file_name) \n" + " processed_dfs[name].to_csv(pre_processed_file_name) " ] }, { "cell_type": "code", - "execution_count": 8, + "execution_count": null, "id": "c1b8925c-6890-4a45-a9c4-f80399b463cc", "metadata": { "tags": [] }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[01;34m/tmp/dataset/horizontal_credit_fraud_data/ZHSZUS33_Bank_1\u001b[0m\n", - "├── history.csv\n", - "├── test.csv\n", - "├── test_enrichment.csv\n", - "├── test_normalized.csv\n", - "├── train.csv\n", - "├── train_enrichment.csv\n", - "└── train_normalized.csv\n", - "\n", - "0 directories, 7 files\n" - ] - } - ], + "outputs": [], "source": [ "! 
tree /tmp/dataset/horizontal_credit_fraud_data/ZHSZUS33_Bank_1" ] }, - { - "cell_type": "markdown", - "id": "e0a33628-acc7-4f42-b2fa-d066699e23eb", - "metadata": {}, - "source": [] - }, { "cell_type": "markdown", "id": "8591e4e1-74b1-465c-8124-eaf9829a6a8e", "metadata": {}, "source": [ - "Let's go back to the [XGBoost Notebook](./xgboost.ipynb)" + "Let's go back to the [XGBoost Notebook](../xgboost.ipynb)" ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "26989ac6-cedf-4c9d-8b25-60e0af758cfe", + "metadata": {}, + "outputs": [], + "source": [] } ], "metadata": { @@ -244,7 +212,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.19" + "version": "3.8.18" } }, "nbformat": 4, diff --git a/examples/advanced/finance-end-to-end/prepare_data.ipynb b/examples/advanced/finance-end-to-end/notebooks/prepare_data.ipynb similarity index 50% rename from examples/advanced/finance-end-to-end/prepare_data.ipynb rename to examples/advanced/finance-end-to-end/notebooks/prepare_data.ipynb index ff1b09a9b7..76d9264ae2 100644 --- a/examples/advanced/finance-end-to-end/prepare_data.ipynb +++ b/examples/advanced/finance-end-to-end/notebooks/prepare_data.ipynb @@ -14,6 +14,7 @@ "metadata": {}, "source": [ "## Prepare Data\n", + "First download data from [kaggle credit card fraud dataset](https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud) and save it to the below `data_path`\n", "### Based on the riginal data, add randome synthentic data to make full dataset\n", "* expand time in seconds x 200 times to cover 26 months\n", "* double the data record size\n", @@ -24,60 +25,43 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "id": "84fe43c3-2e99-414f-91ef-b104578d8b0e", "metadata": { "tags": [] }, "outputs": [], "source": [ - "data_path=\"creditcard.csv\"\n", + "data_path=\"../creditcard.csv\"\n", "out_folder=\"/tmp/dataset/horizontal_credit_fraud_data\"\n", "\n", "import shutil\n", "import os\n", "\n", "if os.path.exists(out_folder):\n", - " shutil.rmtree(out_folder)\n" + " shutil.rmtree(out_folder)" ] }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "id": "09248e1f-2066-459d-bf20-8ffc47b7f272", "metadata": { "tags": [] }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "284808 creditcard.csv\n" - ] - } - ], + "outputs": [], "source": [ "! wc -l {data_path}" ] }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "id": "5b8edae6-b906-4631-9294-dbe2e11391f1", "metadata": { "tags": [] }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "old_max_days=1.9999074074074075\n", - "min_months=0.0, max_months=26.665432098765432\n" - ] - } - ], + "outputs": [], "source": [ "# %load_ext cudf.pandas\n", "import argparse\n", @@ -192,223 +176,17 @@ " return df\n", "\n", "# Add random BIC and currency details to the DataFrame\n", - "df = generate_random_details(df)\n", - "\n" + "df = generate_random_details(df)" ] }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "id": "25b36fbf-f6b4-4a85-a022-748c21e6e309", "metadata": { "tags": [] }, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " Time Amount Class Sender_BIC Receiver_BIC \\\n", - "0 0.0 149.62 0 FBSFCHZH WPUWDEFF \n", - "1 0.0 2.69 0 ZHSZUS33 YSYCESMM \n", - "2 100.0 378.66 0 HCBHSGSG FBSFCHZH \n", - "3 100.0 123.50 0 YXRXGB22 YMNYFRPP \n", - "4 200.0 69.99 0 XITXUS33 FBSFCHZH \n", - "... ... ... ... ... ... \n", - "1139223 69116200.0 0.77 0 XITXUS33 SHSHKHH1 \n", - "1139224 69116300.0 24.79 0 ZHSZUS33 ZHSZUS33 \n", - "1139225 69116400.0 67.88 0 YXRXGB22 WPUWDEFF \n", - "1139226 69116400.0 10.00 0 WPUWDEFF WPUWDEFF \n", - "1139227 69116800.0 217.00 0 YMNYFRPP YSYCESMM \n", - "\n", - " UETR Currency Beneficiary_BIC Currency_Country \n", - "0 V4ID8QTCIROHAP683AOX78 AUD ZNZZAU3M Australia \n", - "1 R7PCTKF9R1PVGXRXU9AB3J AUD ZNZZAU3M Australia \n", - "2 RP1SBN0Q5U58XBS8LQNE0J USD ZHSZUS33 United States \n", - "3 MAPFA8RU98VZP4MD6VFN1J USD ZHSZUS33 United States \n", - "4 3WX5XAGWK7F3CXRX6RZZK3 USD ZHSZUS33 United States \n", - "... ... ... ... ... \n", - "1139223 BEEX2F5NEHDU3YV8G17005 GBP YXRXGB22 United Kingdom \n", - "1139224 9SJQ6WVX8CGS0P1DYYGQ45 GBP YXRXGB22 United Kingdom \n", - "1139225 CGUZH7AV1YPIQCLCQMAWV6 AUD ZNZZAU3M Australia \n", - "1139226 9FZFL7WK3AA7K5C0Q6X5W3 SGD HCBHSGSG Singapore \n", - "1139227 AGKF0NGK83CJQTT5CU36PA AUD ZNZZAU3M Australia \n", - "\n", - "[1139228 rows x 9 columns]" - ] - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "df" ] @@ -431,28 +209,16 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "id": "47961d9f-c0fb-47fc-b901-5512be98ebf0", "metadata": { "tags": [] }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Historical DataFrame size: 626575\n", - "Training DataFrame size: 398729\n", - "Testing DataFrame size: 113924\n" - ] - } - ], + "outputs": [], "source": [ - "\n", "# Sort the DataFrame by the Time column\n", "df = df.sort_values(by='Time').reset_index(drop=True)\n", "\n", - "\n", "# Calculate the number of samples for each split\n", "total_size = len(df)\n", "historical_size = int(total_size * 0.55)\n", @@ -475,14 +241,12 @@ "# Display sizes of each dataset\n", "print(f\"Historical DataFrame size: {len(df_history)}\")\n", "print(f\"Training DataFrame size: {len(df_train)}\")\n", - "print(f\"Testing DataFrame size: {len(df_test)}\")\n", - "\n", - "\n" + "print(f\"Testing DataFrame size: {len(df_test)}\")" ] }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "id": "785c6028-e792-450b-a294-a6460b03fd9f", "metadata": { "tags": [] @@ -499,23 +263,12 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "id": "8cb273b7-4273-414b-9ef9-0da9f7d3e839", "metadata": { "tags": [] }, - "outputs": [ - { - "data": { - "text/plain": [ - "'/tmp/dataset/horizontal_credit_fraud_data'" - ] - }, - "execution_count": 7, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "out_folder" ] @@ -529,7 +282,7 @@ }, "outputs": [], "source": [ - "!ls -al {out_folder}\n" + "!ls -al {out_folder}" ] }, { @@ -551,8 +304,7 @@ "source": [ "## Split Data for differnt Client sites\n", "\n", - "Now, split train, test, history data evenly for n = 2 training sites (Clients)\n", - "\n" + "Now, split train, test, history data according to Sender_BICs" ] }, { @@ -564,7 +316,6 @@ }, "outputs": [], "source": [ - "\n", "files = [\"history\", \"train\", \"test\"]\n", "client_names = set()\n", "\n", @@ -585,11 +336,7 @@ " group.to_csv(filename, index=False)\n", " 
print(f\"Saved {name} {f} transactions to {filename}\")\n", "\n", - "print(client_names)\n", - " \n", - "\n", - "\n", - " \n" + "print(client_names)" ] }, { @@ -628,63 +375,10 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": null, "id": "1c2c1e5e-8d95-4bc0-97bc-25e15e878433", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[01;34m/tmp/dataset/horizontal_credit_fraud_data/\u001b[0m\n", - "├── \u001b[01;34mFBSFCHZH_Bank_6\u001b[0m\n", - "│   ├── history.csv\n", - "│   ├── test.csv\n", - "│   └── train.csv\n", - "├── \u001b[01;34mHCBHSGSG_Bank_9\u001b[0m\n", - "│   ├── history.csv\n", - "│   ├── test.csv\n", - "│   └── train.csv\n", - "├── history.csv\n", - "├── \u001b[01;34mSHSHKHH1_Bank_2\u001b[0m\n", - "│   ├── history.csv\n", - "│   ├── test.csv\n", - "│   └── train.csv\n", - "├── test.csv\n", - "├── train.csv\n", - "├── \u001b[01;34mWPUWDEFF_Bank_4\u001b[0m\n", - "│   ├── history.csv\n", - "│   ├── test.csv\n", - "│   └── train.csv\n", - "├── \u001b[01;34mXITXUS33_Bank_10\u001b[0m\n", - "│   ├── history.csv\n", - "│   ├── test.csv\n", - "│   └── train.csv\n", - "├── \u001b[01;34mYMNYFRPP_Bank_5\u001b[0m\n", - "│   ├── history.csv\n", - "│   ├── test.csv\n", - "│   └── train.csv\n", - "├── \u001b[01;34mYSYCESMM_Bank_7\u001b[0m\n", - "│   ├── history.csv\n", - "│   ├── test.csv\n", - "│   └── train.csv\n", - "├── \u001b[01;34mYXRXGB22_Bank_3\u001b[0m\n", - "│   ├── history.csv\n", - "│   ├── test.csv\n", - "│   └── train.csv\n", - "├── \u001b[01;34mZHSZUS33_Bank_1\u001b[0m\n", - "│   ├── history.csv\n", - "│   ├── test.csv\n", - "│   └── train.csv\n", - "└── \u001b[01;34mZNZZAU3M_Bank_8\u001b[0m\n", - " ├── history.csv\n", - " ├── test.csv\n", - " └── train.csv\n", - "\n", - "10 directories, 33 files\n" - ] - } - ], + "outputs": [], "source": [ "!tree /tmp/dataset/horizontal_credit_fraud_data/" ] @@ -694,13 +388,13 @@ "id": "30661d30-7032-4bde-9fb2-ce67897a2f55", "metadata": {}, "source": [ - "Let's go back to the [XGBoost Notebook](./xgboost.ipynb)" + "Let's go back to the [XGBoost Notebook](../xgboost.ipynb)" ] }, { "cell_type": "code", "execution_count": null, - "id": "87163fff-d4cb-485c-ba59-3582104dfcda", + "id": "e6a448c4-a7c3-4ec6-a44b-bd8d123a3200", "metadata": {}, "outputs": [], "source": [] @@ -722,7 +416,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.19" + "version": "3.8.18" } }, "nbformat": 4, diff --git a/examples/advanced/finance-end-to-end/enrich.py b/examples/advanced/finance-end-to-end/nvflare/enrich.py similarity index 100% rename from examples/advanced/finance-end-to-end/enrich.py rename to examples/advanced/finance-end-to-end/nvflare/enrich.py diff --git a/examples/advanced/finance-end-to-end/enrich_job.py b/examples/advanced/finance-end-to-end/nvflare/enrich_job.py similarity index 97% rename from examples/advanced/finance-end-to-end/enrich_job.py rename to examples/advanced/finance-end-to-end/nvflare/enrich_job.py index ac9fbab9ea..73d170050f 100644 --- a/examples/advanced/finance-end-to-end/enrich_job.py +++ b/examples/advanced/finance-end-to-end/nvflare/enrich_job.py @@ -31,7 +31,6 @@ def main(): job = FedJob(name=job_name) - # Define the enrich_ctrl workflow and send to server enrich_ctrl = ETLController(task_name="enrich") job.to(enrich_ctrl, "server", id="enrich") diff --git a/examples/advanced/finance-end-to-end/nvflare/gnn_train_encode.py b/examples/advanced/finance-end-to-end/nvflare/gnn_train_encode.py new file mode 100644 
index 0000000000..b50abea8d2 --- /dev/null +++ b/examples/advanced/finance-end-to-end/nvflare/gnn_train_encode.py @@ -0,0 +1,233 @@ +# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import argparse +import os + +import numpy as np +import pandas as pd +import torch +import torch.nn.functional as F +from torch.utils.tensorboard import SummaryWriter +from torch_geometric.data import Data +from torch_geometric.loader import LinkNeighborLoader +from torch_geometric.nn import GraphSAGE + +DEVICE = "cuda:0" + +# (1) import nvflare client API +import nvflare.client as flare + + +def edge_index_gen(df_feat_class, df_edges): + # Sort the data by UETR + df_feat_class = df_feat_class.sort_values(by="UETR").reset_index(drop=True) + + # Generate UETR-index map with the feature list + node_id = df_feat_class["UETR"].values + map_id = {j: i for i, j in enumerate(node_id)} # mapping nodes to indexes + + # Get class labels + label = df_feat_class["Class"].values + + # Map UETR to indexes in the edge map + edges = df_edges.copy() + edges.UETR_1 = edges.UETR_1.map(map_id) + edges.UETR_2 = edges.UETR_2.map(map_id) + edges = edges.astype(int) + + # for undirected graph + edge_index = np.array(edges.values).T + edge_index = torch.tensor(edge_index, dtype=torch.long).contiguous() + weight = torch.tensor([1] * edge_index.shape[1], dtype=torch.float) + + # UETR mapped to corresponding indexes, drop UETR and class + node_feat = df_feat_class.drop(["UETR", "Class"], axis=1).copy() + node_feat = torch.tensor(np.array(node_feat.values), dtype=torch.float) + + return node_feat, edge_index, weight, node_id, label + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument( + "-i", + "--data_path", + type=str, + default="/tmp/dataset/credit_data", + ) + parser.add_argument( + "--epochs", + type=int, + default=1, + ) + parser.add_argument( + "-o", + "--output_path", + type=str, + default="/tmp/dataset/credit_data", + ) + args = parser.parse_args() + + # (2) initializes NVFlare client API + flare.init() + site_name = flare.get_site_name() + + # Set up tensorboard + writer = SummaryWriter(os.path.join(args.output_path, site_name)) + + # Load the data + dataset_names = ["train", "test"] + + node_features = {} + edge_indices = {} + weights = {} + node_ids = {} + labels = {} + + for ds_name in dataset_names: + # Get feature and class + file_name = os.path.join(args.data_path, site_name, f"{ds_name}_normalized.csv") + df = pd.read_csv(file_name, index_col=0) + # Drop irrelevant columns + df = df.drop(columns=["Currency_Country", "Beneficiary_BIC", "Currency", "Receiver_BIC", "Sender_BIC"]) + df_feat_class = df + # Get edge map + file_name = os.path.join(args.data_path, site_name, f"{ds_name}_edgemap.csv") + df = pd.read_csv(file_name, header=None) + # Add column names to the edge map + df.columns = ["UETR_1", "UETR_2"] + df_edges = df + + # Preprocess data + node_feat, edge_index, weight, node_id, label = edge_index_gen(df_feat_class, df_edges) + 
node_features[ds_name] = node_feat + edge_indices[ds_name] = edge_index + weights[ds_name] = weight + node_ids[ds_name] = node_id + labels[ds_name] = label + + # Converting training data to PyG graph data format + train_data = Data(x=node_features["train"], edge_index=edge_indices["train"], edge_attr=weights["train"]) + + # Define the dataloader for graphsage training + loader = LinkNeighborLoader( + train_data, + batch_size=2048, + shuffle=True, + neg_sampling_ratio=1.0, + num_neighbors=[10, 10], + num_workers=6, + persistent_workers=True, + ) + + # Model + model = GraphSAGE( + in_channels=node_features["train"].shape[1], + hidden_channels=64, + num_layers=2, + out_channels=64, + ) + + while flare.is_running(): + # (3) receives FLModel from NVFlare + input_model = flare.receive() + print(f"current_round={input_model.current_round}/{input_model.total_rounds}") + + # (4) loads model from NVFlare + model.load_state_dict(input_model.params) + + # (5) perform encoding for both training and test data + def gnn_encode(model_param, node_feature, edge_index, id, label): + # Load the model and perform inference / encoding + model_enc = GraphSAGE( + in_channels=node_feature.shape[1], + hidden_channels=64, + num_layers=2, + out_channels=64, + ) + model_enc.load_state_dict(model_param) + model_enc.to(DEVICE) + model_enc.eval() + node_feature = node_feature.to(DEVICE) + edge_index = edge_index.to(DEVICE) + + # Perform encoding + h = model_enc(node_feature, edge_index) + embed = pd.DataFrame(h.cpu().detach().numpy()) + # Add column names as V_0, V_1, ... V_63 + embed.columns = [f"V_{i}" for i in range(embed.shape[1])] + # Concatenate the node ids and class labels with the encoded features + embed["UETR"] = id + embed["Class"] = label + # Move the UETR and Class columns to the front + embed = embed[["UETR", "Class"] + [col for col in embed.columns if col not in ["UETR", "Class"]]] + return embed + + # Only do encoding for the last round + if input_model.current_round == input_model.total_rounds - 1: + print("Encoding the data with the final model") + for ds_name in dataset_names: + embed = gnn_encode( + input_model.params, + node_features[ds_name], + edge_indices[ds_name], + node_ids[ds_name], + labels[ds_name], + ) + embed.to_csv(os.path.join(args.output_path, site_name, f"{ds_name}_embedding.csv"), index=False) + + optimizer = torch.optim.Adam(model.parameters(), lr=1e-3) + model.to(DEVICE) + steps = args.epochs * len(loader) + for epoch in range(1, args.epochs + 1): + model.train() + running_loss = instance_count = 0 + for data in loader: + # get the inputs data + data = data.to(DEVICE) + # zero the parameter gradients + optimizer.zero_grad() + # forward + backward + optimize + h = model(data.x, data.edge_index) + h_src = h[data.edge_label_index[0]] + h_dst = h[data.edge_label_index[1]] + link_pred = (h_src * h_dst).sum(dim=-1) # Inner product. 
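+                # Unsupervised link prediction: the inner product scores each sampled
+                # node pair, and data.edge_label (1 for true edges, 0 for the negative
+                # pairs drawn by LinkNeighborLoader) is the target of the BCE loss below.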
+ loss = F.binary_cross_entropy_with_logits(link_pred, data.edge_label) + loss.backward() + optimizer.step() + # add record + running_loss += float(loss.item()) * link_pred.numel() + instance_count += link_pred.numel() + print(f"Epoch: {epoch:02d}, Loss: {running_loss / instance_count:.4f}") + writer.add_scalar( + "train_loss", running_loss / instance_count, input_model.current_round * args.epochs + epoch + ) + + print("Finished Training") + # Save the model + torch.save(model.state_dict(), os.path.join(args.output_path, site_name, "model.pt")) + + # (6) construct trained FL model + output_model = flare.FLModel( + params=model.cpu().state_dict(), + metrics={"loss": running_loss}, + meta={"NUM_STEPS_CURRENT_ROUND": steps}, + ) + # (7) send model back to NVFlare + flare.send(output_model) + + +if __name__ == "__main__": + main() diff --git a/examples/advanced/finance-end-to-end/nvflare/gnn_train_encode_job.py b/examples/advanced/finance-end-to-end/nvflare/gnn_train_encode_job.py new file mode 100644 index 0000000000..16518a74c9 --- /dev/null +++ b/examples/advanced/finance-end-to-end/nvflare/gnn_train_encode_job.py @@ -0,0 +1,116 @@ +# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +import argparse + +from torch_geometric.nn import GraphSAGE + +from nvflare import FedJob +from nvflare.app_common.workflows.fedavg import FedAvg +from nvflare.app_opt.pt.job_config.model import PTModel +from nvflare.job_config.script_runner import ScriptRunner + + +def main(): + args = define_parser() + + site_names = args.sites + n_clients = len(site_names) + + work_dir = args.work_dir + task_script_path = args.task_script_path + task_script_args = args.task_script_args + + job = FedJob(name="gnn_train_encode_job") + + # Define the controller workflow and send to server + controller = FedAvg( + num_clients=n_clients, + num_rounds=args.num_rounds, + ) + job.to(controller, "server") + + # Define the model + model = GraphSAGE( + in_channels=10, + hidden_channels=64, + num_layers=2, + out_channels=64, + ) + job.to(PTModel(model), "server") + + # Add clients + for site_name in site_names: + executor = ScriptRunner(script=task_script_path, script_args=task_script_args) + job.to(executor, site_name) + + if work_dir: + print(f"{work_dir=}") + job.export_job(work_dir) + + if not args.config_only: + job.simulator_run(work_dir) + + +def define_parser(): + parser = argparse.ArgumentParser() + parser.add_argument( + "-c", + "--sites", + nargs="*", # 0 or more values expected => creates a list + type=str, + default=[], # default if nothing is provided + help="Space separated site names", + ) + parser.add_argument( + "-n", + "--num_rounds", + type=int, + default=100, + help="number of FL rounds", + ) + parser.add_argument( + "-w", + "--work_dir", + type=str, + nargs="?", + default="/tmp/nvflare/jobs/xgb/workdir", + help="work directory, default to '/tmp/nvflare/jobs/xgb/workdir'", + ) + + parser.add_argument( + "-p", + "--task_script_path", + type=str, + nargs="?", 
+ help="task script", + ) + + parser.add_argument( + "-a", + "--task_script_args", + type=str, + nargs="?", + default="", + help="", + ) + + parser.add_argument("-co", "--config_only", action="store_true", help="config only mode, will not run simulator") + + return parser.parse_args() + + +if __name__ == "__main__": + main() diff --git a/examples/advanced/finance-end-to-end/nvflare/graph_construct.py b/examples/advanced/finance-end-to-end/nvflare/graph_construct.py new file mode 100644 index 0000000000..ae90c2f24b --- /dev/null +++ b/examples/advanced/finance-end-to-end/nvflare/graph_construct.py @@ -0,0 +1,125 @@ +# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +import argparse +import os + +import pandas as pd + +# (1) import nvflare client API +import nvflare.client as flare + +dataset_names = ["train", "test"] +datasets = {} + + +def main(): + print("\n pre-process starts \n ") + args = define_parser() + input_dir = args.input_dir + output_dir = args.output_dir + + flare.init() + site_name = flare.get_site_name() + + # receives global message from NVFlare + etl_task = flare.receive() + + print("\n receive task \n ") + edge_maps = edge_map_gen(input_dir, site_name) + + save_edge_map(output_dir, edge_maps, site_name) + + print("end task") + + # send message back the controller indicating end. 
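+    # The received task object doubles as the reply: mark it as done and send it
+    # back so the server-side ETL workflow knows this site has finished.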
+ etl_task.meta["status"] = "done" + flare.send(etl_task) + + +def save_edge_map(output_dir, edge_maps, site_name): + for name in edge_maps: + site_dir = os.path.join(output_dir, site_name) + os.makedirs(site_dir, exist_ok=True) + + edge_map_file_name = os.path.join(site_dir, f"{name}_edgemap.csv") + print("save to = ", edge_map_file_name) + # save to csv file without header and index + edge_maps[name].to_csv(edge_map_file_name, header=False, index=False) + + +def edge_map_gen(input_dir, site_name): + edge_maps = {} + info_columns = ["Time", "Receiver_BIC", "UETR"] + time_threshold = 6000 + for ds_name in dataset_names: + + file_name = os.path.join(input_dir, site_name, f"{ds_name}.csv") + df = pd.read_csv(file_name) + datasets[ds_name] = df + + # Find transaction pairs that are within the time threshold + # First sort the table by 'Time' + df = df.sort_values(by="Time") + # Keep only the columns that are needed for the graph edge map + df = df[info_columns] + + # Then for each row, find the next rows that is within the time threshold + graph_edge_map = [] + for i in range(len(df)): + # Find the next rows that is: + # - within the time threshold + # - has the same Receiver_BIC + j = 1 + while i + j < len(df) and df["Time"].values[i + j] < df["Time"].values[i] + time_threshold: + if df["Receiver_BIC"].values[i + j] == df["Receiver_BIC"].values[i]: + graph_edge_map.append([df["UETR"].values[i], df["UETR"].values[i + j]]) + j += 1 + + print( + f"Generated edge map for {ds_name}, in total {len(graph_edge_map)} valid edges for {len(df)} transactions" + ) + + edge_maps[ds_name] = pd.DataFrame(graph_edge_map) + + return edge_maps + + +def define_parser(): + parser = argparse.ArgumentParser() + + parser.add_argument( + "-i", + "--input_dir", + type=str, + nargs="?", + default="/tmp/dataset/credit_data", + help="input directory where csv files for each site are expected, default to /tmp/dataset/credit_data", + ) + + parser.add_argument( + "-o", + "--output_dir", + type=str, + nargs="?", + default="/tmp/dataset/credit_data", + help="output directory, default to '/tmp/dataset/credit_data'", + ) + + return parser.parse_args() + + +if __name__ == "__main__": + main() diff --git a/examples/advanced/finance-end-to-end/nvflare/graph_construct_job.py b/examples/advanced/finance-end-to-end/nvflare/graph_construct_job.py new file mode 100644 index 0000000000..81f8033c30 --- /dev/null +++ b/examples/advanced/finance-end-to-end/nvflare/graph_construct_job.py @@ -0,0 +1,100 @@ +# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
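+
+# Job configuration script: defines an ETLController for the "graph_construct" task on
+# the server and a ScriptRunner per site that executes the graph-construction task
+# script, then exports the job and/or runs it in the simulator.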
+ + +import argparse + +from nvflare import FedJob +from nvflare.app_common.workflows.etl_controller import ETLController +from nvflare.job_config.script_runner import ScriptRunner + + +def main(): + args = define_parser() + + site_names = args.sites + work_dir = args.work_dir + job_name = args.job_name + task_script_path = args.task_script_path + task_script_args = args.task_script_args + + job = FedJob(name=job_name) + + graph_construct_ctrl = ETLController(task_name="graph_construct") + job.to(graph_construct_ctrl, "server", id="graph_construct") + + # Add clients + for site_name in site_names: + executor = ScriptRunner(script=task_script_path, script_args=task_script_args) + job.to(executor, site_name, tasks=["graph_construct"]) + + if work_dir: + print(f"{work_dir=}") + job.export_job(work_dir) + + if not args.config_only: + job.simulator_run(work_dir) + + +def define_parser(): + parser = argparse.ArgumentParser() + parser.add_argument( + "-c", + "--sites", + nargs="*", # 0 or more values expected => creates a list + type=str, + default=[], # default if nothing is provided + help="Space separated site names", + ) + parser.add_argument( + "-n", + "--job_name", + type=str, + nargs="?", + default="credit_card_graph_construct_job", + help="job name, default to xgb_job", + ) + parser.add_argument( + "-w", + "--work_dir", + type=str, + nargs="?", + default="/tmp/nvflare/jobs/xgb/workdir", + help="work directory, default to '/tmp/nvflare/jobs/xgb/workdir'", + ) + + parser.add_argument( + "-p", + "--task_script_path", + type=str, + nargs="?", + help="task script", + ) + + parser.add_argument( + "-a", + "--task_script_args", + type=str, + nargs="?", + default="", + help="", + ) + + parser.add_argument("-co", "--config_only", action="store_true", help="config only mode, will not run simulator") + + return parser.parse_args() + + +if __name__ == "__main__": + main() diff --git a/examples/advanced/finance-end-to-end/pre_process.py b/examples/advanced/finance-end-to-end/nvflare/pre_process.py similarity index 100% rename from examples/advanced/finance-end-to-end/pre_process.py rename to examples/advanced/finance-end-to-end/nvflare/pre_process.py diff --git a/examples/advanced/finance-end-to-end/pre_process_job.py b/examples/advanced/finance-end-to-end/nvflare/pre_process_job.py similarity index 99% rename from examples/advanced/finance-end-to-end/pre_process_job.py rename to examples/advanced/finance-end-to-end/nvflare/pre_process_job.py index ce7c4a1c92..fbeba5df44 100644 --- a/examples/advanced/finance-end-to-end/pre_process_job.py +++ b/examples/advanced/finance-end-to-end/nvflare/pre_process_job.py @@ -81,6 +81,7 @@ def define_parser(): nargs="?", help="task script", ) + parser.add_argument( "-a", "--task_script_args", diff --git a/examples/advanced/finance-end-to-end/xgb_data_loader.py b/examples/advanced/finance-end-to-end/nvflare/xgb_data_loader.py similarity index 97% rename from examples/advanced/finance-end-to-end/xgb_data_loader.py rename to examples/advanced/finance-end-to-end/nvflare/xgb_data_loader.py index 0b88e21e46..a5eef83dad 100644 --- a/examples/advanced/finance-end-to-end/xgb_data_loader.py +++ b/examples/advanced/finance-end-to-end/nvflare/xgb_data_loader.py @@ -30,6 +30,7 @@ def __init__(self, root_dir: str, file_postfix: str): self.file_postfix = file_postfix for name in self.dataset_names: self.base_file_names[name] = name + file_postfix + self.numerical_columns = [ "Timestamp", "Amount", @@ -53,6 +54,8 @@ def load_data(self) -> Tuple[xgb.DMatrix, xgb.DMatrix]: for ds_name 
in self.dataset_names: print("\nloading for site = ", self.client_id, f"{ds_name} dataset \n") file_name = os.path.join(self.root_dir, self.client_id, self.base_file_names[ds_name]) + print(file_name) + print(self.numerical_columns) df = pd.read_csv(file_name) data_num = len(data) diff --git a/examples/advanced/finance-end-to-end/nvflare/xgb_embed_data_loader.py b/examples/advanced/finance-end-to-end/nvflare/xgb_embed_data_loader.py new file mode 100644 index 0000000000..aec50a5711 --- /dev/null +++ b/examples/advanced/finance-end-to-end/nvflare/xgb_embed_data_loader.py @@ -0,0 +1,64 @@ +# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +import os +from typing import Tuple + +import pandas as pd +import xgboost as xgb + +from nvflare.app_opt.xgboost.data_loader import XGBDataLoader + + +class CreditCardEmbedDataLoader(XGBDataLoader): + def __init__(self, root_dir: str, file_postfix: str): + self.dataset_names = ["train", "test"] + self.base_file_names = {} + self.root_dir = root_dir + self.file_postfix = file_postfix + for name in self.dataset_names: + self.base_file_names[name] = name + file_postfix + self.numerical_columns = [f"V_{i}" for i in range(64)] + + def initialize( + self, client_id: str, rank: int, data_split_mode: xgb.core.DataSplitMode = xgb.core.DataSplitMode.ROW + ): + super().initialize(client_id, rank, data_split_mode) + + def load_data(self) -> Tuple[xgb.DMatrix, xgb.DMatrix]: + data = {} + for ds_name in self.dataset_names: + print("\nloading for site = ", self.client_id, f"{ds_name} dataset") + file_name = os.path.join(self.root_dir, self.client_id, self.base_file_names[ds_name]) + print(file_name) + print(self.numerical_columns) + print("\n") + df = pd.read_csv(file_name) + data_num = len(data) + + # split to feature and label + y = df["Class"] + x = df[self.numerical_columns] + data[ds_name] = (x, y, data_num) + + # training + x_train, y_train, total_train_data_num = data["train"] + dmat_train = xgb.DMatrix(x_train, label=y_train, data_split_mode=self.data_split_mode) + + # validation + x_valid, y_valid, total_valid_data_num = data["test"] + dmat_valid = xgb.DMatrix(x_valid, label=y_valid, data_split_mode=self.data_split_mode) + + return dmat_train, dmat_valid diff --git a/examples/advanced/finance-end-to-end/xgb_job.py b/examples/advanced/finance-end-to-end/nvflare/xgb_job.py similarity index 100% rename from examples/advanced/finance-end-to-end/xgb_job.py rename to examples/advanced/finance-end-to-end/nvflare/xgb_job.py diff --git a/examples/advanced/finance-end-to-end/nvflare/xgb_job_embed.py b/examples/advanced/finance-end-to-end/nvflare/xgb_job_embed.py new file mode 100644 index 0000000000..26e76b60a0 --- /dev/null +++ b/examples/advanced/finance-end-to-end/nvflare/xgb_job_embed.py @@ -0,0 +1,120 @@ +# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import argparse + +from xgb_embed_data_loader import CreditCardEmbedDataLoader + +from nvflare import FedJob +from nvflare.app_opt.xgboost.histogram_based_v2.fed_controller import XGBFedController +from nvflare.app_opt.xgboost.histogram_based_v2.fed_executor import FedXGBHistogramExecutor + + +def main(): + args = define_parser() + + site_names = args.sites + work_dir = args.work_dir + job_name = args.job_name + root_dir = args.input_dir + file_postfix = args.file_postfix + + num_rounds = 10 + early_stopping_rounds = 10 + xgb_params = { + "max_depth": 8, + "eta": 0.1, + "objective": "binary:logistic", + "eval_metric": "auc", + "tree_method": "hist", + "nthread": 16, + } + + job = FedJob(name=job_name) + + # Define the controller workflow and send to server + controller = XGBFedController( + num_rounds=num_rounds, + data_split_mode=0, + secure_training=False, + xgb_params=xgb_params, + xgb_options={"early_stopping_rounds": early_stopping_rounds}, + ) + job.to(controller, "server") + + # Add clients + for site_name in site_names: + executor = FedXGBHistogramExecutor(data_loader_id="data_loader") + job.to(executor, site_name) + data_loader = CreditCardEmbedDataLoader(root_dir=root_dir, file_postfix=file_postfix) + job.to(data_loader, site_name, id="data_loader") + + if work_dir: + print("work_dir=", work_dir) + job.export_job(work_dir) + + if not args.config_only: + job.simulator_run(work_dir) + + +def define_parser(): + parser = argparse.ArgumentParser() + parser.add_argument( + "-c", + "--sites", + nargs="*", # 0 or more values expected => creates a list + type=str, + default=[], # default if nothing is provided + help="Space separated site names", + ) + parser.add_argument( + "-n", + "--job_name", + type=str, + nargs="?", + default="xgb_job", + help="job name, default to xgb_job", + ) + parser.add_argument( + "-w", + "--work_dir", + type=str, + nargs="?", + default="/tmp/nvflare/jobs/xgb/workdir", + help="work directory, default to '/tmp/nvflare/jobs/xgb/workdir'", + ) + parser.add_argument( + "-i", + "--input_dir", + type=str, + nargs="?", + default="", + help="root directory for input data", + ) + parser.add_argument( + "-p", + "--file_postfix", + type=str, + nargs="?", + default="_embedding.csv", + help="file ending postfix, such as '.csv', or '_embedding.csv'", + ) + + parser.add_argument("-co", "--config_only", action="store_true", help="config only mode, will not run simulator") + + return parser.parse_args() + + +if __name__ == "__main__": + main() diff --git a/examples/advanced/finance-end-to-end/readme.md b/examples/advanced/finance-end-to-end/readme.md deleted file mode 100644 index 579748f474..0000000000 --- a/examples/advanced/finance-end-to-end/readme.md +++ /dev/null @@ -1,398 +0,0 @@ -# End-to-End Process Illustration of Federated XGBoost Methods - -This example demonstrates the use of an end-to-end process for credit card fraud detection using XGBoost. - -The original dataset is based on the [kaggle credit card fraud dataset](https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud). 
- -To illustrate the end-to-end process, we manually duplicated the records to extend the data time span from 2 days to over 2 years. Don't focus too much on the data itself, as our primary goal is to showcase the process. - -The overall steps of the end-to-end process include the following: - -## Prepare Data - -In a real-world application, this step is not necessary. - -* To prepare the data, we expand the credit card data by adding additional randomly generated columns, -including sender and receiver BICs, currency, etc. -* We then split the data based on the Sender BIC. Each Sender represents one financial institution, -thus serving as one site (client) for federated learning. - -We illustrate this step in the notebook [prepare_data] (./prepare_data.ipynb). The resulting dataset looks like the following: - -![data](images/generated_data.png) - -Once we have the this synthetic data, we like to split the data into -* historical data ( oldest data) -- 55% -* training data 35 % -* test data remaining 10% - -``` -Historical DataFrame size: 626575 -Training DataFrame size: 398729 -Testing DataFrame size: 113924 -``` -Next we will split the data among different clients, i.e. different Sender_BICs. -For example: Sender = JPMorgan_Case, BIC =CHASUS33 -the client directory is **CHASUS33_JPMorgan_Chase** - -For this site, we will have three files. -``` -5343086 Jul 29 08:31 history.csv -977888 Jul 29 08:31 test.csv -3409228 Jul 29 08:31 train.csv -``` -![split_data](images/split_data.png) -The python code for data generation is located at [prepare_data.py] (./prepare_data.py) - -## Initial Analysis - -We choose one site data for analysis - -### Feature Engineering -In this process, we will enrich the data and add a few new derived features to illustrate the process. -Whether this enrichment makes sense or not is not important, as you can always replace these steps with procedures that -make sense to you. - -Since all sites follow the same procedures, we only need to look at one site. For example, we will look at the site with -the name "CHASUS33_JPMorgan_Chase." - -The data enrichment process involves the following steps: - -1. **Grouping by Currency**: Calculate hist_trans_volume, hist_total_amount, and hist_average_amount for each currency. -2. **Aggregation for Training and Test Data**: Aggregate the data in 1-hour intervals, grouped by currency. The aggregated value is then divided by hist_trans_volume, and this new column is named x2_y1. -3. **Repeating for Beneficiary BIC**: Perform the same process for Beneficiary_BIC to generate another feature called x3_y2. -4. **Merging Features**: Merge the two enriched features based on Time and Beneficiary_BIC. -The resulting dataset looks like this: - -The resulting Dataset looks like this. -![enrich_data](images/enrichment.png) - -We save the enriched data into a new csv file. -``` -CHASUS33_JPMorgan_Chase/train_enrichment.csv -CHASUS33_JPMorgan_Chase/test_enrichment.csv -``` -### Pre-processing -Once we enrich the features, we need to normalize the numerical features and perform one-hot encoding for the categorical -features. However, we will skip the categorical feature encoding in this example to avoid significantly increasing -the file size (from 11 MB to over 2 GB). - -Similar to the feature enrichment process, we will consider only one site for now. The steps are straightforward: -we apply the scaler transformation to the numerical features and then merge them back with the categorical features. 
- -``` - scaler = MinMaxScaler() - - # Fit and transform the numerical data - numerical_normalized = pd.DataFrame(scaler.fit_transform(numerical_features), columns=numerical_features.columns) - - # Combine the normalized numerical features with the categorical features - df_combined = pd.concat([categorical_features, numerical_normalized], axis=1) -``` -the file is then saved to "_normalized.csv" - -``` -CHASUS33_JPMorgan_Chase/train_normalized.csv -CHASUS33_JPMorgan_Chase/test_normalized.csv -``` -## Federated ETL - -We can easily convert the notebook code into the python code -### Feature Enrichment - -#### ETL Script -convert the enrichment code for one-site to the federated learning is easy -look at the [enrich.py](enrich.py) -we capture the logic of enrichment in - -```python -def enrichment(input_dir, site_name) -> dict: - # code skipped -``` -the main function will be similar to the following. - -``` -def main(): - print("\n enrichment starts \n ") - - args = define_parser() - - input_dir = args.input_dir - output_dir = args.output_dir - - site_name = - print(f"\n {site_name =} \n ") - - merged_dfs = enrichment(input_dir, site_name) - - for ds_name in merged_dfs: - save_to_csv(merged_dfs[ds_name], output_dir, site_name, ds_name) - -``` -change this code to Federated ETL code, we just add few lines of code - -flare.init() - -etl_task = flare.receive() - -end_task = GenericTask() - -flare.send(end_task) - -``` - -def main(): - print("\n enrichment starts \n ") - - args = define_parser() - flare.init() - - input_dir = args.input_dir - output_dir = args.output_dir - - site_name = flare.get_site_name() - print(f"\n {site_name =} \n ") - - # receives global message from NVFlare - etl_task = flare.receive() - merged_dfs = enrichment(input_dir, site_name) - - for ds_name in merged_dfs: - save_to_csv(merged_dfs[ds_name], output_dir, site_name, ds_name) - - # send message back the controller indicating end. - end_task = GenericTask() - flare.send(end_task) -``` -This is the feature enrichment script. - -#### ETL Job - -Federated ETL requires both server-side and client-side code. The above ETL script is the client-side code. -To complete the setup, we need server-side code to configure and specify the federated job. -For this purpose, we wrote the following script: [enrich_job.py](enrich.py) - -``` - -def main(): - args = define_parser() - - site_names = args.sites - work_dir = args.work_dir - job_name = args.job_name - task_script_path = args.task_script_path - task_script_args = args.task_script_args - - job = FedJob(name=job_name) - - # Define the enrich_ctrl workflow and send to server - enrich_ctrl = ETLController(task_name="enrich") - job.to(enrich_ctrl, "server", id="enrich") - - # Add clients - for site_name in site_names: - executor = ScriptExecutor(task_script_path=task_script_path, task_script_args=task_script_args) - job.to(executor, site_name, tasks=["enrich"], gpu=0) - - if work_dir: - print(f"{work_dir=}") - job.export_job(work_dir) - - if not args.config_only: - job.simulator_run(work_dir) -``` -Here we define a ETLController for server, and ScriptExecutor for client side ETL script. - -### Pre-process - -#### ETL Script - -Converting the pre-processing code for one site to federated learning is straightforward. -Refer to the [pre_process.py](pre_process.py) script for details. 
- -``` - -dataset_names = ["train", "test"] -datasets = {} - -def main(): - args = define_parser() - input_dir = args.input_dir - output_dir = args.output_dir - - flare.init() - site_name = flare.get_site_name() - etl_task = flare.receive() - processed_dfs = process_dataset(input_dir, site_name) - save_normalized_files(output_dir, processed_dfs, site_name) - - end_task = GenericTask() - flare.send(end_task) - -``` -#### ETL Job - -This is almost identical to the Enrichment job, besides the task name - -``` -def main(): - args = define_parser() - - site_names = args.sites - work_dir = args.work_dir - job_name = args.job_name - task_script_path = args.task_script_path - task_script_args = args.task_script_args - - job = FedJob(name=job_name) - - pre_process_ctrl = ETLController(task_name="pre_process") - job.to(pre_process_ctrl, "server", id="pre_process") - - # Add clients - for site_name in site_names: - executor = ScriptExecutor(task_script_path=task_script_path, task_script_args=task_script_args) - job.to(executor, site_name, tasks=["pre_process"], gpu=0) - - if work_dir: - job.export_job(work_dir) - - if not args.config_only: - job.simulator_run(work_dir) -``` -## Federated Training of XGBoost - -Now we have enriched and normalized features, we can directly run XGBoost. -Here is the xgboost job code - -``` -def main(): - args = define_parser() - - site_names = args.sites - work_dir = args.work_dir - job_name = args.job_name - root_dir = args.input_dir - file_postfix = args.file_postfix - - num_rounds = 10 - early_stopping_rounds = 10 - xgb_params = { - "max_depth": 8, - "eta": 0.1, - "objective": "binary:logistic", - "eval_metric": "auc", - "tree_method": "hist", - "nthread": 16, - } - - job = FedJob(name=job_name) - - # Define the controller workflow and send to server - - controller = XGBFedController( - num_rounds=num_rounds, - training_mode="horizontal", - xgb_params=xgb_params, - xgb_options={"early_stopping_rounds": early_stopping_rounds}, - ) - job.to(controller, "server") - - # Add clients - for site_name in site_names: - executor = FedXGBHistogramExecutor(data_loader_id="data_loader") - job.to(executor, site_name, gpu=0) - data_loader = CreditCardDataLoader(root_dir=root_dir, file_postfix=file_postfix) - job.to(data_loader, site_name, id="data_loader") - if work_dir: - job.export_job(work_dir) - - if not args.config_only: - job.simulator_run(work_dir) -``` - -In this code, all we need to write is ```CreditCardDataLoader```, which is XGBDataLoader, -the rest of code is handled by XGBoost Controller and Executor. - -in -``` -class CreditCardDataLoader(XGBDataLoader): -``` -we only loaded the numerical feature in this example, in your case, you might to chagne that. - -## Running end-by-end code -You can run this from the command line interface (CLI) or orchestrate it using a workflow tool such as Airflow. -Here, we will demonstrate how to run this from a simulator. You can always export the job configuration and run -it anywhere in a real deployment. 
- -Assuming you have already downloaded the credit card dataset and the creditcard.csv file is located in the current directory: - -* prepare data -``` -python prepare_data.py -i ./creditcard.csv -o /tmp/nvflare/xgb/credit_card -``` ->>note -> -> All Sender SICs are considered clients: they are -> * 'BARCGB22_Barclays_Bank' -> * 'BSCHESMM_Banco_Santander' -> * 'CITIUS33_Citibank' -> * 'SCBLSGSG_Standard_Chartered_Bank' -> * 'UBSWCHZH80A_UBS' -> * 'BNPAFRPP_BNP_Paribas' -> * 'CHASUS33_JPMorgan_Chase' -> * 'HSBCHKHH_HSBC' -> * 'ANZBAU3M_ANZ_Bank' -> * 'DEUTDEFF_Deutsche_Bank' -> Total 10 banks - -* enrich data - - -``` -python enrich_job.py -c CHASUS33_JPMorgan_Chase HSBCHKHH_HSBC DEUTDEFF_Deutsche_Bank BARCGB22_Barclays_Bank BNPAFRPP_BNP_Paribas UBSWCHZH80A_UBS BSCHESMM_Banco_Santander ANZBAU3M_ANZ_Bank SCBLSGSG_Standard_Chartered_Bank CITIUS33_Citibank -p enrich.py -a "-i /tmp/nvflare/xgb/credit_card/ -o /tmp/nvflare/xgb/credit_card/" -``` - -* pre-process data - -``` -python pre_process_job.py -c CHASUS33_JPMorgan_Chase HSBCHKHH_HSBC DEUTDEFF_Deutsche_Bank BARCGB22_Barclays_Bank BNPAFRPP_BNP_Paribas UBSWCHZH80A_UBS BSCHESMM_Banco_Santander ANZBAU3M_ANZ_Bank SCBLSGSG_Standard_Chartered_Bank CITIUS33_Citibank -p pre_process.py -a "-i /tmp/nvflare/xgb/credit_card -o /tmp/nvflare/xgb/credit_card/" - -``` - -* XGBoost Job -Finally we take the normalized data and run XGboost Job - -``` -python xgb_job.py -c CHASUS33_JPMorgan_Chase HSBCHKHH_HSBC DEUTDEFF_Deutsche_Bank BARCGB22_Barclays_Bank BNPAFRPP_BNP_Paribas UBSWCHZH80A_UBS BSCHESMM_Banco_Santander ANZBAU3M_ANZ_Bank SCBLSGSG_Standard_Chartered_Bank CITIUS33_Citibank -i /tmp/nvflare/xgb/credit_card -w /tmp/nvflare/workspace/xgb/credit_card/ -``` -Here is the output of last 9th and 10th round of training (starting round = 0) -``` -... 
- -[19:58:27] [8] eval-auc:0.67126 train-auc:0.71717 -[19:58:27] [8] eval-auc:0.67126 train-auc:0.71717 -[19:58:27] [8] eval-auc:0.67126 train-auc:0.71717 -[19:58:27] [8] eval-auc:0.67126 train-auc:0.71717 -[19:58:27] [8] eval-auc:0.67126 train-auc:0.71717 -[19:58:27] [8] eval-auc:0.67126 train-auc:0.71717 -[19:58:27] [8] eval-auc:0.67126 train-auc:0.71717 -[19:58:27] [8] eval-auc:0.67126 train-auc:0.71717 -[19:58:27] [8] eval-auc:0.67126 train-auc:0.71717 -[19:58:27] [8] eval-auc:0.67126 train-auc:0.71717 - -[19:58:30] [9] eval-auc:0.67348 train-auc:0.71769 -[19:58:30] [9] eval-auc:0.67348 train-auc:0.71769 -[19:58:30] [9] eval-auc:0.67348 train-auc:0.71769 -[19:58:30] [9] eval-auc:0.67348 train-auc:0.71769 -[19:58:30] [9] eval-auc:0.67348 train-auc:0.71769 -[19:58:30] [9] eval-auc:0.67348 train-auc:0.71769 -[19:58:30] [9] eval-auc:0.67348 train-auc:0.71769 -[19:58:30] [9] eval-auc:0.67348 train-auc:0.71769 -[19:58:30] [9] eval-auc:0.67348 train-auc:0.71769 -[19:58:30] [9] eval-auc:0.67348 train-auc:0.71769 -[07:33:54] Finished training -``` - - - diff --git a/examples/advanced/finance-end-to-end/prepare_data.py b/examples/advanced/finance-end-to-end/utils/prepare_data.py similarity index 100% rename from examples/advanced/finance-end-to-end/prepare_data.py rename to examples/advanced/finance-end-to-end/utils/prepare_data.py diff --git a/examples/advanced/finance-end-to-end/xgboost.ipynb b/examples/advanced/finance-end-to-end/xgboost.ipynb index 0163088d21..eca92a0d88 100644 --- a/examples/advanced/finance-end-to-end/xgboost.ipynb +++ b/examples/advanced/finance-end-to-end/xgboost.ipynb @@ -9,9 +9,7 @@ "\n", "This notebooks shows the how do we convert and existing tabular credit data, enrich and pre-process data using one-site (like centralized dataset) and then convert this centralized process into a federated ETL steps, easily. Then construct a federated XGBoost, the only thing user need to define is the XGboost data loader. \n", "\n", - "## Install requirements\n", - "\n", - "\n" + "## Install requirements\n" ] }, { @@ -33,9 +31,10 @@ "tags": [] }, "source": [ - "## Data Prepare Data \n", + "## Step 1: Data Preparation \n", + "First, we prepare the data by adding random transactional information to the base creditcard dataset following the below script:\n", "\n", - "* [prepare data](./prepare_data.ipynb)\n" + "* [prepare data](./notebooks/prepare_data.ipynb)" ] }, { @@ -43,20 +42,51 @@ "id": "f69c008a-b19b-4c1a-b3c4-c376eccf53ba", "metadata": {}, "source": [ - "## Feature Enrichment\n", + "## Step 2: Feature Analysis\n", + "\n", + "For this stage, we would like to analyze the data, understand the features, and derive (and encode) secondary features that can be more useful for building the model.\n", + "\n", + "Towards this goal, there are two options:\n", + "1. **Feature Enrichment**: This process involves adding new features based on the existing data. For example, we can calculate the average transaction amount for each currency and add this as a new feature. \n", + "2. **Feature Encoding**: This process involves encoding the current features and transforming them to embedding space via machine learning models. This model can be either pre-trained, or trained with the candidate dataset.\n", + "\n", + "Considering the fact that the only two numerical features in the dataset are \"Amount\" and \"Time\", we will perform feature enrichment first. Optionally, we can also perform feature encoding. 
In this example, we use graph neural network (GNN): we will train the GNN model in a federated unsupervised fashion, and then use the model to encode the features for all sites. " + ] + }, + { + "cell_type": "markdown", + "id": "05dcf825-6e31-4d10-9968-2f353eaa4cea", + "metadata": {}, + "source": [ + "### Step 2.1: Rule-based Feature Enrichment\n", + "\n", + "#### Single-site Enrichment and Additional Processing\n", + "The detailed feature enrichment step is illustrated using one site as example: \n", "\n", - "We can first examine how the feature enrichment is processed using just one-site. \n", + "* [feature_enrichments with-one-site](./notebooks/feature_enrichment.ipynb)\n", "\n", - "* [feature_enrichments with-one-site](./feature_enrichment.ipynb)\n", + "Similarly, we examine the additional pre-processing step using one site: \n", "\n", - "in order to run feature job on each site similar to above feature enrichment steps, we wrote an enrichment ETL job.\n", + "* [pre-processing with one-site](./notebooks/pre_process.ipynb)\n" + ] + }, + { + "cell_type": "markdown", + "id": "8bc8bb99-a253-415e-8953-91af62ef22a2", + "metadata": {}, + "source": [ + "#### Federated Job to Perform on All Sites\n", + "In order to run feature enrichment and processing job on each site similar to above steps, we wrote federated ETL job scripts for client-side based on single-site implementations.\n", "\n", - "[enrichment script](./enrich.py)\n", + "* [enrichment script](./nvflare/enrich.py)\n", + "* [pre-processing script](./nvflare/pre_process.py) \n", "\n", - "Define a job to trigger running enrichnment script on each site: \n", + "Then we define job scripts on server-side to trigger and coordinate running client-side scripts on each site: \n", "\n", - "[enrich_job.py](./enrich_job.py)\n", + "* [enrich_job.py](./nvflare/enrich_job.py)\n", + "* [pre-processing-job](./nvflare/pre_process_job.py)\n", "\n", + "Example script as below:\n", "```\n", "# Define the enrich_ctrl workflow and send to server\n", " enrich_ctrl = ETLController(task_name=\"enrich\")\n", @@ -66,41 +96,39 @@ " for site_name in site_names:\n", " executor = ScriptExecutor(task_script_path=task_script_path, task_script_args=task_script_args)\n", " job.to(executor, site_name, tasks=[\"enrich\"], gpu=0)\n", - "```\n", - "\n", - "\n" + "```" ] }, { "cell_type": "markdown", - "id": "8bc8bb99-a253-415e-8953-91af62ef22a2", + "id": "8068c808-fce8-4f27-b0cf-76b486a24903", "metadata": {}, "source": [ - "## Pre-Processing \n", + "### (Optional) Step 2.2: GNN-based Feature Encoding\n", + "Based on raw features, or combining the derived features from **Step 2.1**, we can use machine learning models to encode the features. \n", + "In this example, we use federated GNN to learn and generate the feature embeddings.\n", "\n", - "We exam examine the steps for pre-processing using only one-site (one client) \n", + "First, we construct a graph based on the transaction data. Each node represents a transaction, and the edges represent the relationships between transactions. 
We then use the GNN to learn the embeddings of the nodes, which represent the transaction features.\n", "\n", - "* [pre-processing with one-site](./pre_process.ipynb)\n", + "#### Single-site operation example: graph construction\n", + "The detailed graph construction step is illustrated using one site as example:\n", "\n", - "Based on one-site, we create the pre-processing script\n", + "* [graph_construction with one-site](./notebooks/graph_construct.ipynb)\n", "\n", - "* [pre-processing script](./pre_process.py) \n", + "The detailed GNN training and encoding step is illustrated using one site as example:\n", "\n", - "then we define the pre-processing job to coordinate the pre-processing for all sites\n", + "* [gnn_training_encoding with one-site](./notebooks/gnn_train_encode.ipynb)\n", "\n", - "* [pre-processing-job](./pre_process_job.py)\n", + "#### Federated Job to Perform on All Sites\n", + "In order to run feature graph construction job on each site similar to the enrichment and processing steps, we wrote federated ETL job scripts for client-side based on single-site implementations.\n", "\n", - "```\n", - " pre_process_ctrl = ETLController(task_name=\"pre_process\")\n", - " job.to(pre_process_ctrl, \"server\", id=\"pre_process\")\n", + "* [graph_construction script](./nvflare/graph_construct.py)\n", + "* [gnn_train_encode script](./nvflare/gnn_train_encode.py)\n", "\n", - " # Add clients\n", - " for site_name in site_names:\n", - " executor = ScriptExecutor(task_script_path=task_script_path, task_script_args=task_script_args)\n", - " job.to(executor, site_name, tasks=[\"pre_process\"], gpu=0)\n", + "Similarily, we define job scripts on server-side to trigger and coordinate running client-side scripts on each site: \n", "\n", - "```\n", - " Similarly to the ETL job, we simply issue a task to trigger pre-process running pre-process script. " + "* [graph_construction_job.py](./nvflare/graph_construct_job.py)\n", + "* [gnn_train_encode_job.py](./nvflare/gnn_train_encode_job.py)" ] }, { @@ -110,39 +138,15 @@ "tags": [] }, "source": [ + "## Step 3: Federated XGBoost \n", "\n", - " def load_data(self, client_id: str, split_mode: int) -> Tuple[xgb.DMatrix, xgb.DMatrix]:\n", - " data = {}\n", - " for ds_name in self.dataset_names:\n", - " print(\"\\nloading for site = \", client_id, f\"{ds_name} dataset \\n\")\n", - " file_name = os.path.join(self.root_dir, client_id, self.base_file_names[ds_name])\n", - " df = pd.read_csv(file_name)\n", - " data_num = len(data)\n", + "Now that we have the data ready, either enriched and normalized features, or GNN feature embeddings, we can fit them with XGBoost. NVIDIA FLARE has already has written XGBoost Controller and Executor for the job. All we need to provide is the data loader to fit into the XGBoost.\n", "\n", - " # split to feature and label\n", - " y = df[\"Class\"]\n", - " x = df[self.numerical_columns]\n", - " data[ds_name] = (x, y, data_num)\n", - "\n", - "\n", - " # training\n", - " x_train, y_train, total_train_data_num = data[\"train\"]\n", - " data_split_mode = DataSplitMode(split_mode)\n", - " dmat_train = xgb.DMatrix(x_train, label=y_train, data_split_mode=data_split_mode)\n", + "To specify the controller and executor, we need to define a Job. 
You can find the job construction in\n", "\n", - " # validation\n", - " x_valid, y_valid, total_valid_data_num = data[\"test\"]\n", - " dmat_valid = xgb.DMatrix(x_valid, label=y_valid, data_split_mode=data_split_mode)\n", - "\n", - " return dmat_train, dmat_valid\n", - "## Define XGBoost Job \n", + "* [xgb_job.py](./nvflare/xgb_job.py). \n", "\n", - "Now that we have the data ready, We can fit the data into XGBoost. NVIDIA FLARE has already has written XGBoost Controller and Executor for the job. All we need to provide is the data loader to fit into the XGBoost\n", - "To specify the controller and executor, we need to define a Job. You can find the job construction can be find in\n", - "\n", - "* [xgb_job.py](./xgb_job.py). \n", - "\n", - "Here is main part of the code\n", + "Below is main part of the code\n", "\n", "```\n", " controller = XGBFedController(\n", @@ -167,11 +171,9 @@ " * test__normalized.csv\n", " \n", "\n", - "Notice we assign defined a [```CreditCardDataLoader```](./xgb_data_loader.py), this a XGBLoader we defined to load the credit card dataset. \n", + "Notice we assign defined a [```CreditCardDataLoader```](./nvflare/xgb_data_loader.py), this a XGBLoader we defined to load the credit card dataset. \n", "\n", "```\n", - "\n", - "\n", "import os\n", "from typing import Optional, Tuple\n", "\n", @@ -227,8 +229,6 @@ " dmat_valid = xgb.DMatrix(x_valid, label=y_valid, data_split_mode=data_split_mode)\n", "\n", " return dmat_train, dmat_valid\n", - "\n", - "\n", "```\n", "\n", "We are now ready to run all the code" @@ -239,12 +239,11 @@ "id": "036417d1-ad58-4835-b59b-fae94aafded3", "metadata": {}, "source": [ - "## Run all the Jobs\n", + "## Run All the Jobs End-to-end\n", "Here we are going to run each job in sequence. For real-world use case,\n", "\n", "* prepare data is not needed, as you already have the data\n", - "* feature enrichment scripts need to be define based on your own enrichment rules\n", - "* pre-processing, you also need to change the pre-process script to define normalization and categorical encodeing\n", + "* feature enrichment / encoding scripts need to be defined based on your own technique\n", "* for XGBoost Job, you will need to write your own data loader \n", "\n", "Note: All Sender SICs are considered clients: they are \n", @@ -259,6 +258,7 @@ "* 'HCBHSGSG_Bank_9'\n", "* 'XITXUS33_Bank_10' \n", "Total 10 banks\n", + "\n", "### Prepare Data" ] }, @@ -271,7 +271,7 @@ }, "outputs": [], "source": [ - "! python3 prepare_data.py -i ./creditcard.csv -o /tmp/nvflare/xgb/credit_card" + "! python3 ./utils/prepare_data.py -i ./creditcard.csv -o /tmp/nvflare/xgb/credit_card" ] }, { @@ -291,7 +291,9 @@ }, "outputs": [], "source": [ - "! python enrich_job.py -c 'ZNZZAU3M_Bank_8' 'SHSHKHH1_Bank_2' 'FBSFCHZH_Bank_6' 'YMNYFRPP_Bank_5' 'WPUWDEFF_Bank_4' 'YXRXGB22_Bank_3' 'XITXUS33_Bank_10' 'YSYCESMM_Bank_7' 'ZHSZUS33_Bank_1' 'HCBHSGSG_Bank_9' -p enrich.py -a \"-i /tmp/nvflare/xgb/credit_card/ -o /tmp/nvflare/xgb/credit_card/\"\n" + "%cd nvflare\n", + "! python3 enrich_job.py -c 'ZNZZAU3M_Bank_8' 'SHSHKHH1_Bank_2' 'FBSFCHZH_Bank_6' 'YMNYFRPP_Bank_5' 'WPUWDEFF_Bank_4' 'YXRXGB22_Bank_3' 'XITXUS33_Bank_10' 'YSYCESMM_Bank_7' 'ZHSZUS33_Bank_1' 'HCBHSGSG_Bank_9' -p enrich.py -a \"-i /tmp/nvflare/xgb/credit_card/ -o /tmp/nvflare/xgb/credit_card/\"\n", + "%cd .." ] }, { @@ -311,7 +313,29 @@ }, "outputs": [], "source": [ - "! 
python pre_process_job.py -c 'YSYCESMM_Bank_7' 'FBSFCHZH_Bank_6' 'YXRXGB22_Bank_3' 'XITXUS33_Bank_10' 'HCBHSGSG_Bank_9' 'YMNYFRPP_Bank_5' 'ZHSZUS33_Bank_1' 'ZNZZAU3M_Bank_8' 'SHSHKHH1_Bank_2' 'WPUWDEFF_Bank_4' -p pre_process.py -a \"-i /tmp/nvflare/xgb/credit_card -o /tmp/nvflare/xgb/credit_card/\"" + "%cd nvflare\n", + "! python3 pre_process_job.py -c 'YSYCESMM_Bank_7' 'FBSFCHZH_Bank_6' 'YXRXGB22_Bank_3' 'XITXUS33_Bank_10' 'HCBHSGSG_Bank_9' 'YMNYFRPP_Bank_5' 'ZHSZUS33_Bank_1' 'ZNZZAU3M_Bank_8' 'SHSHKHH1_Bank_2' 'WPUWDEFF_Bank_4' -p pre_process.py -a \"-i /tmp/nvflare/xgb/credit_card -o /tmp/nvflare/xgb/credit_card/\"\n", + "%cd .." + ] + }, + { + "cell_type": "markdown", + "id": "530f95e5-d104-43d3-8320-dd077d885799", + "metadata": {}, + "source": [ + "### Construct Graph" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a775b7a4-32de-4791-b17f-bc8291a5f885", + "metadata": {}, + "outputs": [], + "source": [ + "%cd nvflare\n", + "! python graph_construct_job.py -c 'YSYCESMM_Bank_7' 'FBSFCHZH_Bank_6' 'YXRXGB22_Bank_3' 'XITXUS33_Bank_10' 'HCBHSGSG_Bank_9' 'YMNYFRPP_Bank_5' 'ZHSZUS33_Bank_1' 'ZNZZAU3M_Bank_8' 'SHSHKHH1_Bank_2' 'WPUWDEFF_Bank_4' -p graph_construct.py -a \"-i /tmp/nvflare/xgb/credit_card -o /tmp/nvflare/xgb/credit_card/\"\n", + "%cd .." ] }, { @@ -331,7 +355,9 @@ }, "outputs": [], "source": [ - "! python xgb_job.py -c 'YSYCESMM_Bank_7' 'FBSFCHZH_Bank_6' 'YXRXGB22_Bank_3' 'XITXUS33_Bank_10' 'HCBHSGSG_Bank_9' 'YMNYFRPP_Bank_5' 'ZHSZUS33_Bank_1' 'ZNZZAU3M_Bank_8' 'SHSHKHH1_Bank_2' 'WPUWDEFF_Bank_4' -i /tmp/nvflare/xgb/credit_card -w /tmp/nvflare/workspace/xgb/credit_card/" + "%cd nvflare\n", + "! python3 xgb_job.py -c 'YSYCESMM_Bank_7' 'FBSFCHZH_Bank_6' 'YXRXGB22_Bank_3' 'XITXUS33_Bank_10' 'HCBHSGSG_Bank_9' 'YMNYFRPP_Bank_5' 'ZHSZUS33_Bank_1' 'ZNZZAU3M_Bank_8' 'SHSHKHH1_Bank_2' 'WPUWDEFF_Bank_4' -i /tmp/nvflare/xgb/credit_card -w /tmp/nvflare/workspace/xgb/credit_card/\n", + "%cd .." ] }, { @@ -343,7 +369,7 @@ "source": [ "## Prepare Job for POC and Production\n", "\n", - "This seems to work well with Job running in simulator. Now we are ready to run in a POC mode, so we can simulate the deployment in localhost or simply deploy to production. \n", + "With job running well in simulator, we are ready to run in a POC mode, so we can simulate the deployment in localhost or simply deploy to production. \n", "\n", "All we need is the job definition. we can use job.export_job() method to generate the job configuration and export to given directory. For example, in xgb_job.py, we have the following\n", "\n", @@ -368,7 +394,9 @@ }, "outputs": [], "source": [ - "! python xgb_job.py -co -w /tmp/nvflare/workspace/xgb/credit_card/config -c 'YSYCESMM_Bank_7' 'FBSFCHZH_Bank_6' 'YXRXGB22_Bank_3' 'XITXUS33_Bank_10' 'HCBHSGSG_Bank_9' 'YMNYFRPP_Bank_5' 'ZHSZUS33_Bank_1' 'ZNZZAU3M_Bank_8' 'SHSHKHH1_Bank_2' 'WPUWDEFF_Bank_4' -i /tmp/nvflare/xgb/credit_card " + "%cd nvflare\n", + "! python xgb_job.py -co -w /tmp/nvflare/workspace/xgb/credit_card/config -c 'YSYCESMM_Bank_7' 'FBSFCHZH_Bank_6' 'YXRXGB22_Bank_3' 'XITXUS33_Bank_10' 'HCBHSGSG_Bank_9' 'YMNYFRPP_Bank_5' 'ZHSZUS33_Bank_1' 'ZNZZAU3M_Bank_8' 'SHSHKHH1_Bank_2' 'WPUWDEFF_Bank_4' -i /tmp/nvflare/xgb/credit_card \n", + "%cd .." ] }, { @@ -459,7 +487,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.19" + "version": "3.8.18" } }, "nbformat": 4,