This repository contains part of the software for the thesis (skripsi) titled "Assessment Study on ktrain Library for Text Processing" ("Studi Pengkajian Library ktrain untuk Area Text Processing"). The thesis used ktrain versions 0.25.3 to 0.26.2, but all code in this repository was tested with ktrain 0.28.3. The ktrain source code is available at https://github.com/amaiya/ktrain.
- GNU/Linux OS based on Debian 11 Testing.
- Python 3.8.7
- 6C/12T CPU
- 16GB RAM
- GTX 1060 6GB
- You don't need to install every library if you only want to run a specific notebook.
- You'll need to install Jupyter Notebook/Lab yourself.
- Using a virtual environment (such as venv) is strongly recommended.
pip install \
ktrain==0.28.3 \
torch==1.8.1 \
tensorflow==2.7.0 \
transformers==4.10.3 \
https://github.com/amaiya/stellargraph/archive/refs/heads/no_tf_dep_082.zip \
https://github.com/amaiya/eli5/archive/refs/heads/tfkeras_0_10_1.zip \
bokeh==2.3.0 \
wikiextractor==3.0.6 \
beautifulsoup4==4.9.3 \
graphviz==0.18 \
torchviz==0.0.2
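After installing, it can be useful to confirm that the environment matches the pinned versions. The following is a minimal sketch, not part of the thesis code: the names `PINNED`, `installed_version` and `find_mismatches` are hypothetical helpers introduced here for illustration. It assumes Python 3.8 or newer (for `importlib.metadata`), skips the two packages installed from GitHub archive URLs, and ignores local version suffixes such as `+cu111`.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the pip command above.
# The stellargraph and eli5 forks are installed from URLs and are omitted here.
PINNED = {
    "ktrain": "0.28.3",
    "torch": "1.8.1",
    "tensorflow": "2.7.0",
    "transformers": "4.10.3",
    "bokeh": "2.3.0",
    "wikiextractor": "3.0.6",
    "beautifulsoup4": "4.9.3",
    "graphviz": "0.18",
    "torchviz": "0.0.2",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Return {package: (pinned, installed)} for every package that differs.

    `installed` is None when the package is missing. A local version suffix
    (for example "1.8.1+cu111") is ignored when comparing.
    """
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        base = found.split("+", 1)[0] if found is not None else None
        if base != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{name}: pinned {wanted}, installed {found or 'not installed'}")
```

Running the script inside the virtual environment prints one line per package that is missing or has a different version, and prints nothing when everything matches. Since not every library is needed for every notebook, a "not installed" line is only a problem if the notebook you want to run imports that package.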
This repository is organized into three directories: 01_feature_demonstration, 02_study_cases, and 03_miscellaneous. Each directory contains a README file that gives more detailed information.
The 01_feature_demonstration directory contains a few Jupyter notebooks that demonstrate ktrain features. Most notebooks are not included because they are similar to the tutorials and examples available in the ktrain repository.
The 02_study_cases directory contains all Jupyter notebooks and scripts used for the three study cases. The goal of these study cases is to evaluate ktrain's performance on real-life datasets. Other libraries and existing research are used as a basis for the evaluation. The tasks covered by the study cases are:
- Zero Shot Classification
- Open Domain Question Answering
- Document Similarity
The 03_miscellaneous directory contains some Jupyter notebooks and scripts that were helpful during my thesis.