{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "7d79e1a1-c367-45be-bf44-215e8e6ddd64",
   "metadata": {
    "tags": []
   },
   "source": [
    "# Using the Scatter widget\n",
    "\n",
    "This example requires pandas. If you don't have it installed yet, run the cell below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b6d437fd-b576-4ac3-9e02-8ebf6f0ccd8e",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install pandas"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d2b0b351-9385-4b88-a252-b47ad19bfe62",
   "metadata": {},
   "source": [
    "Now we can import `trident_chemwidgets`, along with `pandas`, which we use to load our CSV dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d2bbcd65-b38e-436f-8f37-fd248d01e95f",
   "metadata": {},
   "outputs": [],
   "source": [
    "import trident_chemwidgets as tcw\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "58c4d403-6d8f-4d25-a39d-5a16c95a8340",
   "metadata": {},
   "outputs": [],
   "source": [
    "# First we import our dataset with pandas\n",
    "dataset = pd.read_csv('./zinc_subset.csv')\n",
    "dataset.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "21807b51-ca88-4cb2-ac98-51c9865819da",
   "metadata": {},
   "source": [
    "Once we have our data, we can use the Scatter widget to display an interactive scatter plot for exploring, splitting, or subsetting the data set. The Scatter widget accepts the following keyword arguments:\n",
    "\n",
    "- `data`: the dataset in pandas data frame format\n",
    "- `smiles`: the name of the column containing the molecular structure in SMILES format\n",
    "- `x`: the name of the column to plot along the x-axis\n",
    "- `y`: the name of the column to plot along the y-axis\n",
    "- `x_label`: (optional) the x-axis label to display, defaults to the string specified by `x` if a label is not provided\n",
    "- `y_label`: (optional) the y-axis label to display, defaults to the string specified by `y` if a label is not provided"
   ]
  },
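  {
   "cell_type": "markdown",
   "id": "3c1a9e52-7b4d-4f0e-9a61-2d8f5c0b7e13",
   "metadata": {},
   "source": [
    "Before building the widget, it can help to confirm that the columns passed as `smiles`, `x`, and `y` actually exist in the data frame. The cell below is a minimal sketch using plain pandas, not part of `trident_chemwidgets`. It assumes the column names used in this example (`smiles`, `mwt`, `logp`) and that the two axis columns are numeric."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8e4b2f70-1d36-4c9a-b5a7-6f0c3d9e2a41",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Column names assumed from this example; adjust them to match your own dataset\n",
    "required_columns = ['smiles', 'mwt', 'logp']\n",
    "\n",
    "# Collect any expected column that is absent from the data frame\n",
    "missing = [col for col in required_columns if col not in dataset.columns]\n",
    "if missing:\n",
    "    raise KeyError(f'Missing columns in dataset: {missing}')\n",
    "\n",
    "# Summary statistics for the two axis columns (assumed to be numeric)\n",
    "dataset[['mwt', 'logp']].describe()"
   ]
  },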
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6afe53c3-5913-483b-84fd-e906204c6999",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Now we can create the scatter plot, using mwt for the x-axis and logp for the y-axis\n",
    "scatter = tcw.Scatter(data=dataset, smiles='smiles', x='mwt', y='logp', x_label='Molecular Weight', y_label='logP')\n",
    "scatter"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7da8627b-2a23-47e9-aae0-fc493b6e0298",
   "metadata": {},
   "source": [
    "In the example above, you have several ways to visualize the structures present in the underlying plot data. First, you can hover over any point in the plot and a tooltip with the structure will appear. This can be useful for identifying outliers in your data set.\n",
    "\n",
    "You can also click and drag anywhere in the plot body to select a subset of the data. Your selected data points will be highlighted on the plot as colored points in a gray box. If you click the `SHOW STRUCTURES` button after you have selected the data points, a gallery of the molecular structures will be displayed to the right of the plot. If you then click `SAVE SELECTION`, the selected data points will be saved to an internal variable called `selection` that can be accessed as shown below. You do not need to click `SHOW STRUCTURES` before clicking `SAVE SELECTION`, though the gallery of selected structures will be displayed once `SAVE SELECTION` is clicked."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f99f81f2-d3ac-4c1e-ab84-4ec404a81782",
   "metadata": {},
   "outputs": [],
   "source": [
    "# We can use the `selection` property to access the subset of molecules selected in the scatter plot above\n",
    "# The selection is returned as a subset of the original pandas data frame\n",
    "scatter.selection"
   ]
  },
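  {
   "cell_type": "markdown",
   "id": "5f7d0b18-2a64-4e3c-8c95-1b9e6a4d7f20",
   "metadata": {},
   "source": [
    "As a possible next step, the saved selection can be handled like any other pandas data frame. The sketch below assumes `scatter.selection` holds a data frame once `SAVE SELECTION` has been clicked. What it holds before any selection is saved is not covered in this example, so the code checks for that case. The output file name `scatter_selection.csv` is only a placeholder."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c2a96d3e-0f81-4b57-a4e2-7d5b1c8f3a69",
   "metadata": {},
   "outputs": [],
   "source": [
    "selected = scatter.selection\n",
    "\n",
    "# Guard against the case where nothing has been saved yet (behavior assumed, not confirmed here)\n",
    "if isinstance(selected, pd.DataFrame) and not selected.empty:\n",
    "    # Write the selected molecules to a CSV file (placeholder file name)\n",
    "    selected.to_csv('scatter_selection.csv', index=False)\n",
    "    print(f'Saved {len(selected)} selected molecules')\n",
    "else:\n",
    "    print('No selection saved yet; click SAVE SELECTION in the widget first')"
   ]
  },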
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ec5e3d82-de3e-45f4-9ab5-9e887015d665",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}