Commit 1e8cdb1

Merge pull request #104 from capitalone/release-v0.7.2

Release v0.7.2

Faisal authored Feb 11, 2021
2 parents 1f58830 + 09af9f4 commit 1e8cdb1
Showing 16 changed files with 482 additions and 283 deletions.
78 changes: 78 additions & 0 deletions .github/workflows/python-package.yml
@@ -0,0 +1,78 @@
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions

name: Python package

on:
  push:
    branches: [develop, master]
  pull_request:
    branches: [develop, master]

jobs:
  build:

    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: [3.6, 3.7, 3.8, 3.9]
        spark: [2.4.3, 2.4.4, 2.4.5, 3.0.0, 3.0.1]
        exclude:
          - python-version: 3.8
            spark: 2.4.3
          - python-version: 3.8
            spark: 2.4.4
          - python-version: 3.8
            spark: 2.4.5
          - python-version: 3.9
            spark: 2.4.3
          - python-version: 3.9
            spark: 2.4.4
          - python-version: 3.9
            spark: 2.4.5
    env:
      PYTHON_VERSION: ${{ matrix.python-version }}
      SPARK_VERSION: ${{ matrix.spark }}

    steps:
      - uses: actions/checkout@v2

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}

      - name: Setup Java JDK
        uses: actions/setup-java@v1.4.3
        with:
          java-version: 1.8

      - name: Install Spark (2.4.x)
        if: matrix.spark == '2.4.3' || matrix.spark == '2.4.4' || matrix.spark == '2.4.5'
        run: |
          wget -q -O spark.tgz https://archive.apache.org/dist/spark/spark-${{ matrix.spark }}/spark-${{ matrix.spark }}-bin-hadoop2.7.tgz
          tar xzf spark.tgz
          rm spark.tgz
          echo "SPARK_HOME=${{ runner.workspace }}/datacompy/spark-${{ matrix.spark }}-bin-hadoop2.7" >> $GITHUB_ENV
          echo "${{ runner.workspace }}/datacompy/spark-${{ matrix.spark }}-bin-hadoop2.7/bin" >> $GITHUB_PATH
      - name: Install Spark (3.0.x)
        if: matrix.spark == '3.0.0' || matrix.spark == '3.0.1'
        run: |
          wget -q -O spark.tgz https://archive.apache.org/dist/spark/spark-${{ matrix.spark }}/spark-${{ matrix.spark }}-bin-hadoop3.2.tgz
          tar xzf spark.tgz
          rm spark.tgz
          echo "SPARK_HOME=${{ runner.workspace }}/datacompy/spark-${{ matrix.spark }}-bin-hadoop3.2" >> $GITHUB_ENV
          echo "${{ runner.workspace }}/datacompy/spark-${{ matrix.spark }}-bin-hadoop3.2/bin" >> $GITHUB_PATH
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install pytest pytest-spark pypandoc
          python -m pip install pyspark==${{ matrix.spark }}
          python -m pip install .[dev,spark]
      - name: Test with pytest
        run: |
          python -m pytest tests/
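A note on the matrix above: the ``exclude`` entries drop every Spark 2.4.x build on Python 3.8 and 3.9, since the Spark 2.4 line predates support for those interpreter versions. As a rough standalone sketch (not part of the commit), the effective job list can be reproduced like so::

    from itertools import product

    # Mirror of the workflow's matrix and exclude rules, for illustration only.
    python_versions = ["3.6", "3.7", "3.8", "3.9"]
    spark_versions = ["2.4.3", "2.4.4", "2.4.5", "3.0.0", "3.0.1"]
    excluded = {(py, sp) for py in ("3.8", "3.9") for sp in ("2.4.3", "2.4.4", "2.4.5")}

    jobs = [combo for combo in product(python_versions, spark_versions) if combo not in excluded]
    print(len(jobs))  # 14 of the 20 combinations actually run

Each surviving job downloads the matching Spark tarball from archive.apache.org, exports ``SPARK_HOME``, and then runs pytest.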
14 changes: 9 additions & 5 deletions .pre-commit-config.yaml
@@ -1,7 +1,11 @@
 repos:
-  - repo: https://github.com/psf/black
-    rev: stable
+  - repo: https://github.com/psf/black
+    rev: 20.8b1
     hooks:
-      - id: black
-        language_version: python3.7
-        types: [python]
+      - id: black
+        types: [file, python]
+        language_version: python3.7
+  - repo: https://github.com/pre-commit/mirrors-isort
+    rev: v5.7.0
+    hooks:
+      - id: isort
24 changes: 0 additions & 24 deletions .travis.yml

This file was deleted.

1 change: 1 addition & 0 deletions CODEOWNERS
@@ -0,0 +1 @@
* @fdosani @elzzhu @jborchma
1 change: 0 additions & 1 deletion MANIFEST.in
@@ -1,4 +1,3 @@
 include README.rst
 include LICENSE
 include requirements.txt
-include test-requirements.txt
17 changes: 5 additions & 12 deletions README.rst
@@ -1,5 +1,5 @@
-.. image:: https://travis-ci.org/capitalone/datacompy.svg?branch=master
-    :target: https://travis-ci.org/capitalone/datacompy
+.. image:: https://github.com/capitalone/datacompy/workflows/Python%20package/badge.svg
+    :target: https://github.com/capitalone/datacompy/actions
 .. image:: https://img.shields.io/badge/code%20style-black-000000.svg
     :target: https://github.com/ambv/black
@@ -233,15 +233,8 @@ Using SparkCompare on Databricks
 Contributors
 ------------
 
-We welcome your interest in Capital One’s Open Source Projects (the "Project").
-Any Contributor to the project must accept and sign a CLA indicating agreement to
-the license terms. Except for the license granted in this CLA to Capital One and
-to recipients of software distributed by Capital One, you reserve all right, title,
-and interest in and to your contributions; this CLA does not impact your rights to
-use your own contributions for any other purpose.
+We welcome and appreciate your contributions! Before we can accept any contributions, we ask that you please be sure to
+sign the `Contributor License Agreement (CLA) <https://cla-assistant.io/capitalone/datacompy>`_.
 
-- `Link to Individual CLA <https://docs.google.com/forms/d/19LpBBjykHPox18vrZvBbZUcK6gQTj7qv1O5hCduAZFU/viewform>`_
-- `Link to Corporate CLA <https://docs.google.com/forms/d/e/1FAIpQLSeAbobIPLCVZD_ccgtMWBDAcN68oqbAJBQyDTSAQ1AkYuCp_g/viewform>`_
-
-This project adheres to the `Open Source Code of Conduct <https://developer.capitalone.com/resources/code-of-conduct>`_.
+This project adheres to the `Open Source Code of Conduct <https://developer.capitalone.com/resources/code-of-conduct/>`_.
 By participating, you are expected to honor this code.
2 changes: 1 addition & 1 deletion datacompy/_version.py
@@ -13,4 +13,4 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-__version__ = "0.7.1"
+__version__ = "0.7.2"
19 changes: 10 additions & 9 deletions datacompy/core.py
@@ -571,14 +571,15 @@ def report(self, sample_count=10):
             ].to_string()
             report += "\n\n"
 
-        report += "Sample Rows with Unequal Values\n"
-        report += "-------------------------------\n"
-        report += "\n"
-        for sample in match_sample:
-            report += sample.to_string()
-            report += "\n\n"
-
-        if self.df1_unq_rows.shape[0] > 0:
+        if sample_count > 0:
+            report += "Sample Rows with Unequal Values\n"
+            report += "-------------------------------\n"
+            report += "\n"
+            for sample in match_sample:
+                report += sample.to_string()
+                report += "\n\n"
+
+        if min(sample_count, self.df1_unq_rows.shape[0]) > 0:
             report += "Sample Rows Only in {} (First 10 Columns)\n".format(self.df1_name)
             report += "---------------------------------------{}\n".format("-" * len(self.df1_name))
             report += "\n"
@@ -587,7 +588,7 @@ def report(self, sample_count=10):
             report += self.df1_unq_rows.sample(unq_count)[columns].to_string()
             report += "\n\n"
 
-        if self.df2_unq_rows.shape[0] > 0:
+        if min(sample_count, self.df2_unq_rows.shape[0]) > 0:
             report += "Sample Rows Only in {} (First 10 Columns)\n".format(self.df2_name)
             report += "---------------------------------------{}\n".format("-" * len(self.df2_name))
             report += "\n"
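The practical effect of this change: all three sample sections now honor ``sample_count``, so a caller can request a summary-only report. A minimal sketch of the new behavior (hypothetical toy frames, not from the repo)::

    import pandas as pd
    import datacompy

    df1 = pd.DataFrame({"id": [1, 2, 3], "value": [10, 20, 30]})
    df2 = pd.DataFrame({"id": [1, 2, 4], "value": [10, 99, 40]})

    compare = datacompy.Compare(df1, df2, join_columns="id")

    # Default behavior: sample sections for unequal values and per-side unique rows.
    print(compare.report(sample_count=10))

    # After this change, sample_count=0 suppresses all three sample sections.
    print(compare.report(sample_count=0))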
52 changes: 39 additions & 13 deletions docs/source/developer_instructions.rst
@@ -6,23 +6,19 @@ Guidance for developers.
 Pre-Commit Hooks
 ----------------
 
-We use the excellent `pre-commit <https://pre-commit.com/>`_ to run the excellent
-`black <https://github.com/ambv/black>`_ on all changes before commits. ``pre-commit`` is included
-in the test requirements below, and you'll have to run ``pre-commit install`` once per environment
-before committing changes, or else manually install ``black`` and run it. If you have ``pre-commit``
-installed, trying to commit a change will first run black against any changed Python files, and force
-you to add/commit any changes.
+We use the excellent `pre-commit <https://pre-commit.com/>`_ to run several hooks on all changes before commits.
+``pre-commit`` is included in the ``dev`` extra installs. You'll have to run ``pre-commit install`` once per environment
+before committing changes.
 
-The reason behind running black as a pre-commit hook is to let a machine make style decisions, based
-on the collective wisdom of the Python community. The only change made from the default black setup
-is to allow lines up to 100 characters long.
+The reason behind running black, isort, and others as a pre-commit hook is to let a machine make style decisions, based
+on the collective wisdom of the Python community.
 
 Generating Documentation
 ------------------------
 
-You will need to ``pip install`` the test requirements::
+You will need to ``pip install`` the ``dev`` requirements::
 
-    pip install -r test-requirements.txt
+    pip install -e .[dev]
 
 Then from the root of the repo you can type::
@@ -50,8 +46,8 @@ Run ``python -m pytest`` to run all unittests defined in the subfolder
 Management of Requirements
 --------------------------
 
-Requirements of the project should be added to ``requirements.txt``. Optional
-requirements used only for testing are added to ``test-requirements.txt``.
+Requirements of the project should be added to ``requirements.txt``. Optional requirements used only for testing,
+documentation, or code quality are added to ``EXTRAS_REQUIRE`` in ``setup.py``.


Release Guide
@@ -74,3 +70,33 @@ disconnected and independent from the code: ``git checkout --orphan gh-pages``.
 
 The repo has a ``Makefile`` in the root folder which has helper commands such as ``make sphinx``, and
 ``make ghpages`` to help streamline building and pushing docs once they are setup right.
+
+
+Generating distribution archives (PyPI)
+---------------------------------------
+
+After each release the package will need to be uploaded to PyPI. The instructions below are taken
+from `packaging.python.org <https://packaging.python.org/tutorials/packaging-projects/#generating-distribution-archives>`_.
+
+Update / install ``setuptools``, ``wheel``, and ``twine``::
+
+    pip install --upgrade setuptools wheel twine
+
+Generate distributions::
+
+    python setup.py sdist bdist_wheel
+
+Under the ``dist`` folder you should have something as follows::
+
+    dist/
+      datacompy-0.1.0-py3-none-any.whl
+      datacompy-0.1.0.tar.gz
+
+
+Finally upload to PyPI::
+
+    # test pypi
+    twine upload --repository-url https://test.pypi.org/legacy/ dist/*
+
+    # real pypi
+    twine upload dist/*
12 changes: 5 additions & 7 deletions docs/source/install.rst
@@ -17,17 +17,15 @@ PyPI (basic)
 
 A Conda environment or virtual environment is highly recommended:
 
-conda (installs dependencies from Conda)
+conda (installs dependencies from Conda Forge)
 ----------------------------------------
 
 ::
 
-    conda create --name test python=3
+    conda create --name test python=3.7
     source activate test
-    git clone https://github.com/capitalone/datacompy.git
-    cd datacompy
-    conda install --file requirements.txt
-    pip install git+https://github.com/capitalone/datacompy.git
+    conda config --add channels conda-forge
+    conda install datacompy
 
 
@@ -38,5 +36,5 @@ virtualenv (install dependencies from PyPI)
 
     virtualenv env
    source env/bin/activate
     pip install --upgrade setuptools pip
-    pip install git+https://github.com/capitalone/datacompy.git
+    pip install datacompy
 
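Either route can be sanity-checked after the fact; for example (output depends on the release installed)::

    import datacompy

    print(datacompy.__version__)  # e.g. "0.7.2"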
8 changes: 0 additions & 8 deletions pyproject.toml

This file was deleted.

2 changes: 1 addition & 1 deletion requirements.txt
@@ -1,2 +1,2 @@
-pandas>=0.19.0,!=0.23.*
+pandas>=0.25.0
 numpy>=1.11.3
6 changes: 6 additions & 0 deletions setup.cfg
@@ -0,0 +1,6 @@
[isort]
multi_line_output=3
include_trailing_comma=True
force_grid_wrap=0
use_parentheses=True
line_length=88
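For illustration (not part of the commit): ``multi_line_output=3`` is isort's "vertical hanging indent" style, and together with ``use_parentheses`` and the trailing comma it wraps long imports in the same shape black produces at ``line_length=88``. A hypothetical over-long import would be rewritten like so::

    # Before isort (one over-long import line):
    # from datacompy.core import Compare, calculate_max_diff, columns_equal, temp_column_name

    # After isort with the settings above (vertical hanging indent):
    from datacompy.core import (
        Compare,
        calculate_max_diff,
        columns_equal,
        temp_column_name,
    )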
34 changes: 31 additions & 3 deletions setup.py
@@ -1,6 +1,7 @@
-from setuptools import setup, find_packages
 import os
 
+from setuptools import find_packages, setup
+
 CURR_DIR = os.path.abspath(os.path.dirname(__file__))
 
 with open(os.path.join(CURR_DIR, "README.rst"), encoding="utf-8") as file_open:
@@ -9,11 +10,38 @@
 with open("requirements.txt", "r") as requirements_file:
     raw_requirements = requirements_file.read().strip().split("\n")
 
-INSTALL_REQUIRES = [line for line in raw_requirements if not (line.startswith("#") or line == "")]
+INSTALL_REQUIRES = [
+    line for line in raw_requirements if not (line.startswith("#") or line == "")
+]
 
 
 exec(open("datacompy/_version.py").read())
 
 
+# No versioning on extras for dev, always grab the latest
+EXTRAS_REQUIRE = {
+    "spark": ["pyspark>=2.2.0"],
+    "docs": ["sphinx", "sphinx_rtd_theme"],
+    "tests": [
+        "pytest",
+        "pytest-cov",
+    ],
+    "qa": [
+        "pre-commit",
+        "black",
+        "isort",
+    ],
+    "build": ["twine", "wheel"],
+}
+
+EXTRAS_REQUIRE["dev"] = (
+    EXTRAS_REQUIRE["tests"]
+    + EXTRAS_REQUIRE["docs"]
+    + EXTRAS_REQUIRE["qa"]
+    + EXTRAS_REQUIRE["build"]
+)
+
+
 setup(
     name="datacompy",
     version=__version__,
@@ -23,7 +51,7 @@
     license="Apache-2.0",
     packages=find_packages(),
     install_requires=INSTALL_REQUIRES,
-    extras_require={"spark": ["pyspark>=2.2.0"]},
+    extras_require=EXTRAS_REQUIRE,
     package_data={"": ["templates/*"]},
     zip_safe=False,
 )
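As a standalone illustration (mirroring the dict above outside of ``setup.py``), the composed ``dev`` extra simply concatenates the other lists, so ``pip install -e .[dev]`` pulls in test, docs, QA, and build tooling in one shot while ``spark`` stays opt-in::

    # Recreates the EXTRAS_REQUIRE composition to show what "dev" resolves to.
    EXTRAS_REQUIRE = {
        "spark": ["pyspark>=2.2.0"],
        "docs": ["sphinx", "sphinx_rtd_theme"],
        "tests": ["pytest", "pytest-cov"],
        "qa": ["pre-commit", "black", "isort"],
        "build": ["twine", "wheel"],
    }
    EXTRAS_REQUIRE["dev"] = (
        EXTRAS_REQUIRE["tests"]
        + EXTRAS_REQUIRE["docs"]
        + EXTRAS_REQUIRE["qa"]
        + EXTRAS_REQUIRE["build"]
    )

    print(EXTRAS_REQUIRE["dev"])
    # ['pytest', 'pytest-cov', 'sphinx', 'sphinx_rtd_theme',
    #  'pre-commit', 'black', 'isort', 'twine', 'wheel']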
7 changes: 0 additions & 7 deletions test-requirements.txt

This file was deleted.
