Commit 3fa510b

Merge branch 'master' into format
Author: Matteo Felici
2 parents 3e6d836 + 5fdd6f5, commit 3fa510b

266 files changed, +7420 -5952 lines changed

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ repos:
   rev: 19.10b0
   hooks:
     - id: black
-      language_version: python3.7
+      language_version: python3
 - repo: https://gitlab.com/pycqa/flake8
   rev: 3.7.7
   hooks:

README.md

Lines changed: 3 additions & 3 deletions
@@ -20,7 +20,7 @@

 ## What is it?

-**pandas** is a Python package providing fast, flexible, and expressive data
+**pandas** is a Python package that provides fast, flexible, and expressive data
 structures designed to make working with "relational" or "labeled" data both
 easy and intuitive. It aims to be the fundamental high-level building block for
 doing practical, **real world** data analysis in Python. Additionally, it has
@@ -154,11 +154,11 @@ For usage questions, the best place to go to is [StackOverflow](https://stackove
 Further, general questions and discussions can also take place on the [pydata mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata).

 ## Discussion and Development
-Most development discussion is taking place on github in this repo. Further, the [pandas-dev mailing list](https://mail.python.org/mailman/listinfo/pandas-dev) can also be used for specialized discussions or design issues, and a [Gitter channel](https://gitter.im/pydata/pandas) is available for quick development related questions.
+Most development discussions take place on github in this repo. Further, the [pandas-dev mailing list](https://mail.python.org/mailman/listinfo/pandas-dev) can also be used for specialized discussions or design issues, and a [Gitter channel](https://gitter.im/pydata/pandas) is available for quick development related questions.

 ## Contributing to pandas [![Open Source Helpers](https://www.codetriage.com/pandas-dev/pandas/badges/users.svg)](https://www.codetriage.com/pandas-dev/pandas)

-All contributions, bug reports, bug fixes, documentation improvements, enhancements and ideas are welcome.
+All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.

 A detailed overview on how to contribute can be found in the **[contributing guide](https://pandas.pydata.org/docs/dev/development/contributing.html)**. There is also an [overview](.github/CONTRIBUTING.md) on GitHub.

asv_bench/benchmarks/arithmetic.py

Lines changed: 1 addition & 1 deletion
@@ -466,7 +466,7 @@ def setup(self, offset):
         self.rng = rng

     def time_apply_index(self, offset):
-        offset.apply_index(self.rng)
+        self.rng + offset


 class BinaryOpsMultiIndex:

asv_bench/benchmarks/indexing.py

Lines changed: 2 additions & 2 deletions
@@ -158,9 +158,9 @@ def time_boolean_rows_boolean(self):
 class DataFrameNumericIndexing:
     def setup(self):
         self.idx_dupe = np.array(range(30)) * 99
-        self.df = DataFrame(np.random.randn(10000, 5))
+        self.df = DataFrame(np.random.randn(100000, 5))
         self.df_dup = concat([self.df, 2 * self.df, 3 * self.df])
-        self.bool_indexer = [True] * 5000 + [False] * 5000
+        self.bool_indexer = [True] * 50000 + [False] * 50000

     def time_iloc_dups(self):
         self.df_dup.iloc[self.idx_dupe]
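
Note on the hunk above: the frame size and the boolean indexer grow together, presumably because a boolean indexer needs one flag per row. The sketch below illustrates that constraint with plain Python lists rather than pandas; `select_rows` is a hypothetical helper introduced only for this illustration and is not part of the benchmark or of the pandas API.

```python
from itertools import compress


def select_rows(rows, mask):
    """Keep the rows whose mask entry is True (hypothetical helper, not pandas)."""
    if len(mask) != len(rows):
        # A mask sized for the old row count would no longer line up.
        raise ValueError(f"mask has {len(mask)} entries for {len(rows)} rows")
    return list(compress(rows, mask))


n_rows = 100_000  # row count used by the updated benchmark
rows = list(range(n_rows))
mask = [True] * (n_rows // 2) + [False] * (n_rows // 2)
selected = select_rows(rows, mask)
```

Here the first half of the rows is kept. A mask still sized for the previous 10000-row frame would be rejected by the length check.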

asv_bench/benchmarks/io/json.py

Lines changed: 6 additions & 0 deletions
@@ -53,12 +53,18 @@ def time_read_json_lines(self, index):
     def time_read_json_lines_concat(self, index):
         concat(read_json(self.fname, orient="records", lines=True, chunksize=25000))

+    def time_read_json_lines_nrows(self, index):
+        read_json(self.fname, orient="records", lines=True, nrows=25000)
+
     def peakmem_read_json_lines(self, index):
         read_json(self.fname, orient="records", lines=True)

     def peakmem_read_json_lines_concat(self, index):
         concat(read_json(self.fname, orient="records", lines=True, chunksize=25000))

+    def peakmem_read_json_lines_nrows(self, index):
+        read_json(self.fname, orient="records", lines=True, nrows=15000)
+

 class ToJSON(BaseIO):
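
Note on the hunk above: the two new benchmarks pass `nrows` together with `lines=True`, so only the first records of a line-delimited JSON file are read. As a rough illustration of that idea, and not of how pandas implements it, a stdlib-only sketch could stop parsing after a fixed number of lines. `read_json_lines` and its parameters are hypothetical names chosen for this example.

```python
import io
import json
from itertools import islice


def read_json_lines(handle, nrows=None):
    """Parse at most `nrows` records from a line-delimited JSON stream."""
    lines = (line for line in handle if line.strip())  # skip blank lines
    # islice stops pulling from the stream once nrows records were taken;
    # nrows=None means "read everything".
    return [json.loads(line) for line in islice(lines, nrows)]


sample = io.StringIO('{"a": 1}\n{"a": 2}\n{"a": 3}\n')
first_two = read_json_lines(sample, nrows=2)
```

Because the generator is consumed lazily, the third record in `sample` is never parsed. Avoiding that extra work is the kind of saving the `time_` and `peakmem_` variants are meant to measure.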

ci/azure/windows.yml

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ jobs:
   CONDA_PY: "36"
   PATTERN: "not slow and not network"

-py37_np141:
+py37_np18:
   ENV_FILE: ci/deps/azure-windows-37.yaml
   CONDA_PY: "37"
   PATTERN: "not slow and not network"

ci/deps/azure-windows-37.yaml

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ dependencies:
   - matplotlib=2.2.*
   - moto
   - numexpr
-  - numpy=1.14.*
+  - numpy=1.18.*
   - openpyxl
   - pyarrow=0.14
   - pytables

ci/deps/travis-36-locale.yaml

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ dependencies:
   - numexpr
   - numpy
   - openpyxl
-  - pandas-gbq=0.8.0
+  - pandas-gbq=0.12.0
   - psycopg2=2.6.2
   - pymysql=0.7.11
   - pytables

doc/source/development/contributing.rst

Lines changed: 33 additions & 5 deletions
@@ -136,6 +136,10 @@ want to clone your fork to your machine::
 This creates the directory `pandas-yourname` and connects your repository to
 the upstream (main project) *pandas* repository.

+Note that performing a shallow clone (with ``--depth==N``, for some ``N`` greater
+or equal to 1) might break some tests and features as ``pd.show_versions()``
+as the version number cannot be computed anymore.
+
 .. _contributing.dev_env:

 Creating a development environment
@@ -270,7 +274,7 @@ Creating a Python environment (pip)
 If you aren't using conda for your development environment, follow these instructions.
 You'll need to have at least Python 3.6.1 installed on your system.

-**Unix**/**Mac OS**
+**Unix**/**Mac OS with virtualenv**

 .. code-block:: bash

@@ -286,7 +290,31 @@ You'll need to have at least Python 3.6.1 installed on your system.
    python -m pip install -r requirements-dev.txt

    # Build and install pandas
-   python setup.py build_ext --inplace -j 0
+   python setup.py build_ext --inplace -j 4
+   python -m pip install -e . --no-build-isolation --no-use-pep517
+
+**Unix**/**Mac OS with pyenv**
+
+Consult the docs for setting up pyenv `here <https://github.com/pyenv/pyenv>`__.
+
+.. code-block:: bash
+
+   # Create a virtual environment
+   # Use an ENV_DIR of your choice. We'll use ~/Users/<yourname>/.pyenv/versions/pandas-dev
+
+   pyenv virtualenv <version> <name-to-give-it>
+
+   # For instance:
+   pyenv virtualenv 3.7.6 pandas-dev
+
+   # Activate the virtualenv
+   pyenv activate pandas-dev
+
+   # Now install the build dependencies in the cloned pandas repo
+   python -m pip install -r requirements-dev.txt
+
+   # Build and install pandas
+   python setup.py build_ext --inplace -j 4
    python -m pip install -e . --no-build-isolation --no-use-pep517

 **Windows**
@@ -312,7 +340,7 @@ should already exist.
    python -m pip install -r requirements-dev.txt

    # Build and install pandas
-   python setup.py build_ext --inplace -j 0
+   python setup.py build_ext --inplace -j 4
    python -m pip install -e . --no-build-isolation --no-use-pep517

 Creating a branch
@@ -1275,8 +1303,8 @@ Performance matters and it is worth considering whether your code has introduced
 performance regressions. pandas is in the process of migrating to
 `asv benchmarks <https://github.com/spacetelescope/asv>`__
 to enable easy monitoring of the performance of critical pandas operations.
-These benchmarks are all found in the ``pandas/asv_bench`` directory. asv
-supports both python2 and python3.
+These benchmarks are all found in the ``pandas/asv_bench`` directory, and the
+test results can be found `here <https://pandas.pydata.org/speed/pandas/#/>`__.

 To use all features of asv, you will need either ``conda`` or
 ``virtualenv``. For more details please check the `asv installation
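
Note on the shallow-clone paragraph added to this file: the git option is normally spelled ``--depth=N`` or ``--depth N``, with a single equals sign, and git records the shallow boundary in a ``shallow`` file inside the repository's ``.git`` directory. A minimal sketch for detecting that situation before running version-dependent tests could look as follows. ``looks_shallow`` is a hypothetical helper, and the sketch assumes a regular checkout where ``.git`` is a directory.

```python
import tempfile
from pathlib import Path


def looks_shallow(repo_dir):
    """Return True if `repo_dir` has a `.git/shallow` file.

    Assumption: a regular checkout where `.git` is a directory; worktrees and
    submodules, where `.git` is a file, are not handled.
    """
    return (Path(repo_dir) / ".git" / "shallow").is_file()


# Demonstration on throwaway directories rather than on a real clone.
with tempfile.TemporaryDirectory() as tmp:
    full = Path(tmp) / "full"
    shallow = Path(tmp) / "shallow"
    (full / ".git").mkdir(parents=True)
    (shallow / ".git").mkdir(parents=True)
    # Stand-in for the commit ids git would write into this file.
    (shallow / ".git" / "shallow").write_text("0" * 40 + "\n")
    results = (looks_shallow(full), looks_shallow(shallow))
```

If the check reports a shallow clone, ``git fetch --unshallow`` retrieves the missing history, which should make the tags needed for the version number available again.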

doc/source/ecosystem.rst

Lines changed: 31 additions & 0 deletions
@@ -153,6 +153,23 @@ A good implementation for Python users is `has2k1/plotnine <https://github.com/h
 Spun off from the main pandas library, the `qtpandas <https://github.com/draperjames/qtpandas>`__
 library enables DataFrame visualization and manipulation in PyQt4 and PySide applications.

+`D-Tale <https://github.com/man-group/dtale>`__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+D-Tale is a lightweight web client for visualizing pandas data structures. It
+provides a rich spreadsheet-style grid which acts as a wrapper for a lot of
+pandas functionality (query, sort, describe, corr...) so users can quickly
+manipulate their data. There is also an interactive chart-builder using Plotly
+Dash allowing users to build nice portable visualizations. D-Tale can be
+invoked with the following command
+
+.. code:: python
+
+    import dtale; dtale.show(df)
+
+D-Tale integrates seamlessly with jupyter notebooks, python terminals, kaggle
+& Google Colab. Here are some demos of the `grid <http://alphatechadmin.pythonanywhere.com/>`__
+and `chart-builder <http://alphatechadmin.pythonanywhere.com/charts/4?chart_type=surface&query=&x=date&z=Col0&agg=raw&cpg=false&y=%5B%22security_id%22%5D>`__.

 .. _ecosystem.ide:

@@ -303,6 +320,20 @@ provide a pandas-like and pandas-compatible toolkit for analytics on multi-
 dimensional arrays, rather than the tabular data for which pandas excels.


+.. _ecosystem.io:
+
+IO
+--
+
+`BCPandas <https://github.com/yehoshuadimarsky/bcpandas>`__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+BCPandas provides high performance writes from pandas to Microsoft SQL Server,
+far exceeding the performance of the native ``df.to_sql`` method. Internally, it uses
+Microsoft's BCP utility, but the complexity is fully abstracted away from the end user.
+Rigorously tested, it is a complete replacement for ``df.to_sql``.
+
+
 .. _ecosystem.out-of-core:

 Out-of-core
