diff --git a/tutorials/kdd22/Session 2 Extreme Multi-label Ranking with PECOS.ipynb b/tutorials/kdd22/Session 2 Extreme Multi-label Ranking with PECOS.ipynb index 81f6ce5..fdb579f 100644 --- a/tutorials/kdd22/Session 2 Extreme Multi-label Ranking with PECOS.ipynb +++ b/tutorials/kdd22/Session 2 Extreme Multi-label Ranking with PECOS.ipynb @@ -11,7 +11,19 @@ "\n", "

\n", "\n", - "As shown in the above figure, to address the XMR problem, PECOS conceptually consists of three stages, including semantic label indexing, machine-learned matching, and ranking. For more details about XMR problem and model formulation, please refer to presentations in the PECOS Day. In this part of the tutorial, we will use XR-Linear as an example to demonstrate how to use PECOS to tackle real-world problems and understrand the model architecture in PECOS." + "As shown in the above figure, to address the XMR problem, PECOS conceptually consists of three stages, including semantic label indexing, machine-learned matching, and ranking. In this part of the tutorial, we will use XR-Linear as an example to demonstrate how to use PECOS to tackle real-world problems and understrand the model architecture in PECOS.\n", + "\n", + "### Install PECOS through Python PIP" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "6d9fa78b", + "metadata": {}, + "outputs": [], + "source": [ + "! pip install libpecos" ] }, { @@ -232,7 +244,7 @@ "\n", "In PECOS, numerical features of instances can be in either a [dense NumPy matrix](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html) or a [Compressed Sparse Row (CSR) matrix](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html) of shape `(nr_inst, nr_feat)`, where `nr_inst` and `nr_feat` are numbers of instances and features. Similary, labels of instances can be also presented as a dense or a sparse matrix of shape `(nr_inst, nr_labels)`, where `nr_labels` is the number of labels in the XMR problem. Note that for the sparse format, training labels should be a [Compressed Sparse Column (CSC) matrix](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csc_matrix.html) while testing labels should be a CSR matrix for the purpose of computational efficiency. 
For convenience, PECOS also provides APIs for loading features and labels from binary files in arbitary formats.\n", "\n", - "In addition to numerical features, PECOS also supports handling text data with transformer. Please refer to [Part 2](Part%202%20-%20Text%20Processing.ipynb) in this tutorial for more details about text processing in PECOS." + "In addition to numerical features, PECOS also supports handling text data with transformer models." ] }, { @@ -330,7 +342,7 @@ "source": [ "### Training XR-Linear Negative Sampling and Sparsification\n", "\n", - "Negative sampling plays an important role in solving the XMR problem. PECOS currently provides two negative sampling schemes, including Teacher Forcing Negatives (TFN) and Matcher Aware Negatives (MAN). Please refer to [our report](https://arxiv.org/pdf/2010.05878.pdf)) and presentations in the [PECOS Day](https://w.amazon.com/bin/view/Search/MIDAS/Projects/PECOS/PecosDay/) for more details about negative sampling schemes.\n", + "Negative sampling plays an important role in solving the XMR problem. PECOS currently provides two negative sampling schemes, including Teacher Forcing Negatives (TFN) and Matcher Aware Negatives (MAN). Please refer to [our report](https://arxiv.org/pdf/2010.05878.pdf) for more details about negative sampling schemes.\n", "\n", "To reduce model sizes and improve efficiency, PECOS conduct model sparsification with a hyper-parameter `threshold`. The model weights with absolute values smaller than the threshold will be discarded."
] diff --git a/tutorials/kdd22/Session 3 Approximate Nearest Neighbor Search in PECOS.ipynb b/tutorials/kdd22/Session 3 Approximate Nearest Neighbor Search in PECOS.ipynb index 2574bd8..21b54aa 100644 --- a/tutorials/kdd22/Session 3 Approximate Nearest Neighbor Search in PECOS.ipynb +++ b/tutorials/kdd22/Session 3 Approximate Nearest Neighbor Search in PECOS.ipynb @@ -53,6 +53,24 @@ "* building the indexer (training)\n", "* inference (testing).\n", "\n", + "### Install PECOS through Python PIP" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f6df49a3", + "metadata": {}, + "outputs": [], + "source": [ + "! pip install libpecos" + ] + }, + { + "cell_type": "markdown", + "id": "abb5ff7e", + "metadata": {}, + "source": [ "### Data Loading" ] }, diff --git a/tutorials/kdd22/Session 4 Utilities in PECOS.ipynb b/tutorials/kdd22/Session 4 Utilities in PECOS.ipynb index 18cd0c8..8175799 100644 --- a/tutorials/kdd22/Session 4 Utilities in PECOS.ipynb +++ b/tutorials/kdd22/Session 4 Utilities in PECOS.ipynb @@ -7,7 +7,19 @@ "source": [ "# Utilities in PECOS\n", "\n", - "PECOS provides various useful interfaces and utility functions for XMR problems and related tasks. In this session, we will introduce how to tackle arbitrary data formats for XMR, and then present some utilities in PECOS for efficient matrix operations and hierarchical clustering." + "PECOS provides various useful interfaces and utility functions for XMR problems and related tasks. In this session, we will introduce how to tackle arbitrary data formats for XMR, and then present some utilities in PECOS for efficient matrix operations and hierarchical clustering.\n", + "\n", + "### Install PECOS through Python PIP" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4eba0f0b", + "metadata": {}, + "outputs": [], + "source": [ + "! 
pip install libpecos" ] }, { diff --git a/tutorials/kdd22/Session 5 XR-Transformer cookbook and Distributed PECOS.ipynb b/tutorials/kdd22/Session 5 XR-Transformer cookbook and Distributed PECOS.ipynb index 684272f..871fda8 100644 --- a/tutorials/kdd22/Session 5 XR-Transformer cookbook and Distributed PECOS.ipynb +++ b/tutorials/kdd22/Session 5 XR-Transformer cookbook and Distributed PECOS.ipynb @@ -9,7 +9,18 @@ "In many XMC applications, XR-Transformer is able to yield better performance than XR-Linear due to better extraction of semantic information. However, unlike the linear models, the training hyper-parameters need to be carefully set to achieve the best performance. Naively using the default setting will often lead to sub-optimal results.\n", "\n", "In this section, we will discuss about crucial components in training a good XR-Transformer model.\n", - "\n" + "\n", + "### Install PECOS through Python PIP" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f7f4acc8", + "metadata": {}, + "outputs": [], + "source": [ + "! pip install libpecos" ] }, { @@ -25,7 +36,7 @@ "* **Step2**: Fine-tune the transformer encoder on the chosen levels of the preliminary HLT.\n", "* **Step3**: Concatenate final instance embeddings and sparse features and train the linear rankers on the refined HLT.\n", "\n", - "

\n", + "

\n", "\n" ] }, diff --git a/tutorials/kdd22/imgs/pecos_xrtransformer.png b/tutorials/kdd22/imgs/pecos_xrtransformer.png new file mode 100644 index 0000000..10ff1d7 Binary files /dev/null and b/tutorials/kdd22/imgs/pecos_xrtransformer.png differ