Skip to content

wiseaidev/pydist2

Repository files navigation

pydist2

Documentation Status

pydist2 is a python library that provides a set of methods for calculating distances between observations. There are two main classes:

  • pdist1 which calculates the pairwise distances between observations in one matrix and returns a distance matrix.
  • pdist2 computes the distances between observations in two matrices and also returns a distance matrix.

Usage

pdist1(P, metric = "euclidean", matrix=False)
pdist2(P, Q, metric = "minkowski", exp = 3)

Arguments:

  • two matrices P and Q.
  • metric: The distance function to use.
  • exp: The exponent of the Minkowski distance.

Installation

The pydist2 library is available on Pypi. Thus, you can install the latest available version using pip:

pip install pydist2

Supported Python versions

pydist2 has been tested with Python 3.7 and 3.8.

For more information, please checkout the documentation which is available at readthedocs.

This program and the accompanying materials are made available under the terms of the MIT License.

Progress & Features

  • ☑ Commit the first code's version.

  • ☑ Support the following list of distances.

  • ☑ Display the distance in a matrix form(a combination for each pair of points):

    >>> X = np.array([[100, 100],[0, 100],[100, 0], [500, 400], [300, 600]])
    >>> pdist1(X,matrix=True) # by default, metric = 'euclidean'
    array([[100.    , 100.    , 100.    ,   0.    , 100.    ],
           [100.    , 100.    , 100.    , 100.    ,   0.    ],
           [500.    , 100.    , 100.    , 500.    , 400.    ],
           [538.5165, 100.    , 100.    , 300.    , 600.    ],
           [141.4214,   0.    , 100.    , 100.    ,   0.    ],
           [583.0952,   0.    , 100.    , 500.    , 400.    ],
           [583.0952,   0.    , 100.    , 300.    , 600.    ],
           [565.6854, 100.    ,   0.    , 500.    , 400.    ],
           [632.4555, 100.    ,   0.    , 300.    , 600.    ],
           [282.8427, 500.    , 400.    , 300.    , 600.    ]])
    

where the first column represents the distance between each pair of observations. for instance, the euclidean distance between (100. , 100.) and ( 0. , 100.) is 100.

  • ☑ Support numpy arrays of the same size only.

Todo list

  • ☐ Re-validate the correctness of the distances equations.
  • ☐ Performance tests & vectorization.
  • ☐ Adding new distances.
  • ☐ Adding a squared form of the distance.
  • ☐ Support tuples and list.
  • ☐ Remove numpy from dependencies.
  • ☐ Write more test cases.
  • ☐ Handling Exceptions.
  • ☐ Restructure the docs.