One dimentional ndarray
(N-dimentional array) with axis labels.
index
: Pandas representsSeries
as a numerical array.index
is stored seperately.
years = [1776, 1867, 1821]
countries = ['USA', 'Canada', 'Mexico']
indep = pd.Series(data=years, index=countries)
indep
USA 1776 Canada 1867 Mexico 1821 dtype: int64
indep['USA'] # or indep[0]
1776
pd.Series({ 'Sam': 5, 'Frank': 10, 'Spike': 7 })
Sam 5 Frank 10 Spike 7 dtype: int64
q1 = { 'Japan': 80, 'China': 450, 'India': 200, 'USA': 250 }
q2 = { 'Brazil': 100,'China': 500, 'India': 210,'USA': 260 }
sales_Q1 = pd.Series(q1)
sales_Q2 = pd.Series(q2)
sales_Q2.keys()
Index(['Brazil', 'China', 'India', 'USA'], dtype='object')
Same as in numpy.
sales_Q1 * 3
Japan 240 China 1350 India 600 USA 750 dtype: int64
Pandas uses NaN
for missing values.
sales_Q1 + sales_Q2
Brazil NaN China 950.0 India 410.0 Japan NaN USA 510.0 dtype: float64
- Use
.add()
(or other operator methods) to specify afill_value
.
sales_Q1.add(sales_Q2, fill_value=0)
Brazil 100.0 China 950.0 India 410.0 Japan 80.0 USA 510.0 dtype: float64
A group of Series
objects that share the same index.
d = np.random.randint(0, 100, (4, 3))
i = np.array(['CA', 'NY', 'AZ', 'TX'], dtype='U2')
c = np.array(['Jan', 'Feb', 'Mar'], dtype='U3')
df = pd.DataFrame(d, index=i, columns=c)
df.info()
<class 'pandas.core.frame.DataFrame'> Index: 4 entries, CA to TX Data columns (total 3 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Jan 4 non-null int64 1 Feb 4 non-null int64 2 Mar 4 non-null int64 dtypes: int64(3) memory usage: 128.0+ bytes
columns
df.columns
Index(['Jan', 'Feb', 'Mar'], dtype='object')
index
df.index
Index(['CA', 'NY', 'AZ', 'TX'], dtype='object')
head
,tail
First or last 5 rows. Count may be changed.info
Gives information about a dataframe.describe
Gives basic statistical information on numerical columns.
df.describe()
Jan Feb Mar count 4.000000 4.000000 4.00000 mean 14.000000 39.500000 44.25000 std 8.406347 31.171568 18.20943 min 5.000000 4.000000 23.00000 25% 9.500000 28.000000 35.75000 50% 13.000000 37.000000 43.50000 75% 17.500000 48.500000 52.00000 max 25.000000 80.000000 67.00000
transpose
Transpose (or flip) columns and rows.
df.describe().transpose()
count mean std min 25% 50% 75% max Jan 4.0 14.00 8.406347 5.0 9.50 13.0 17.5 25.0 Feb 4.0 39.50 31.171568 4.0 28.00 37.0 48.5 80.0 Mar 4.0 44.25 18.209430 23.0 35.75 43.5 52.0 67.0
drop
Drops a row or a column depending on the axis.
df.drop('Mar', axis=1) # axis=0 row, axis=1 column
# df.drop('Mar', axis=1, inplace=True)
Jan Feb CA 5 4 NY 25 36 AZ 11 80 TX 15 38
inplace
may be used to permenantly delete. However, there are signs it will
be deprecated. Assigning the resulting dataframe to itself is more preferable.
- Grab a single column. Returns a pandas Series.
df['Jan']
CA 5 NY 25 AZ 11 TX 15 Name: Jan, dtype: int64
type(df['Jan'])
<class 'pandas.core.series.Series'>
- Grab multiple columns.
df[['Jan', 'Feb']]
Jan Feb CA 5 4 NY 25 36 AZ 11 80 TX 15 38
type(df[['Jan', 'Feb']])
<class 'pandas.core.frame.DataFrame'>
df['Z Feb'] = np.round((df['Feb'] - df['Feb'].mean()) / df['Feb'].std(), 2)
df['Z Feb']
CA -1.14 NY -0.11 AZ 1.30 TX -0.05 Name: Z Feb, dtype: float64
df = pd.read_csv('tips.csv')
df.head()
total_bill tip sex smoker day ... size price_per_person Payer Name CC Number Payment ID 0 16.99 1.01 Female No Sun ... 2 8.49 Christy Cunningham 3560325168603410 Sun2959 1 10.34 1.66 Male No Sun ... 3 3.45 Douglas Tucker 4478071379779230 Sun4608 2 21.01 3.50 Male No Sun ... 3 7.00 Travis Walters 6011812112971322 Sun4458 3 23.68 3.31 Male No Sun ... 2 11.84 Nathaniel Harris 4676137647685994 Sun5260 4 24.59 3.61 Female No Sun ... 4 6.15 Tonya Carter 4832732618637221 Sun2251
:
[5 rows x 11 columns]
Any primary key may be used as an index.
dfp = df.set_index('Payment ID')
dfp.head()
total_bill tip sex smoker ... size price_per_person Payer Name CC Number Payment ID ... Sun2959 16.99 1.01 Female No ... 2 8.49 Christy Cunningham 3560325168603410 Sun4608 10.34 1.66 Male No ... 3 3.45 Douglas Tucker 4478071379779230 Sun4458 21.01 3.50 Male No ... 3 7.00 Travis Walters 6011812112971322 Sun5260 23.68 3.31 Male No ... 2 11.84 Nathaniel Harris 4676137647685994 Sun2251 24.59 3.61 Female No ... 4 6.15 Tonya Carter 4832732618637221
:
[5 rows x 10 columns]
To reset index.
dfp.reset_index().head()
Payment ID total_bill tip sex smoker ... time size price_per_person Payer Name CC Number 0 Sun2959 16.99 1.01 Female No ... Dinner 2 8.49 Christy Cunningham 3560325168603410 1 Sun4608 10.34 1.66 Male No ... Dinner 3 3.45 Douglas Tucker 4478071379779230 2 Sun4458 21.01 3.50 Male No ... Dinner 3 7.00 Travis Walters 6011812112971322 3 Sun5260 23.68 3.31 Male No ... Dinner 2 11.84 Nathaniel Harris 4676137647685994 4 Sun2251 24.59 3.61 Female No ... Dinner 4 6.15 Tonya Carter 4832732618637221
:
[5 rows x 11 columns]
- Integer based indexing.
df.iloc[0]
total_bill 16.99 tip 1.01 sex Female smoker No day Sun time Dinner size 2 price_per_person 8.49 Payer Name Christy Cunningham CC Number 3560325168603410 Payment ID Sun2959 Name: 0, dtype: object
- Label based indexing.
dfp.loc["Sun2959"]
total_bill 16.99 tip 1.01 sex Female smoker No day Sun time Dinner size 2 price_per_person 8.49 Payer Name Christy Cunningham CC Number 3560325168603410 Name: Sun2959, dtype: object
- Indexing multiple rows.
dfp.iloc[0:4] # pythonic slicing may be used
dfp.loc[['Sun2959', 'Sun5260']].transpose()
Payment ID Sun2959 Sun5260 total_bill 16.99 23.68 tip 1.01 3.31 sex Female Male smoker No No day Sun Sun time Dinner Dinner size 2 2 price_per_person 8.49 11.84 Payer Name Christy Cunningham Nathaniel Harris CC Number 3560325168603410 4676137647685994
.append()
method has been deprecated. Use.concat()
instead.
row = df.iloc[0]
pd.concat([df, row.to_frame().T]).head()
total_bill tip sex smoker day ... size price_per_person Payer Name CC Number Payment ID 0 16.99 1.01 Female No Sun ... 2 8.49 Christy Cunningham 3560325168603410 Sun2959 1 10.34 1.66 Male No Sun ... 3 3.45 Douglas Tucker 4478071379779230 Sun4608 2 21.01 3.5 Male No Sun ... 3 7.0 Travis Walters 6011812112971322 Sun4458 3 23.68 3.31 Male No Sun ... 2 11.84 Nathaniel Harris 4676137647685994 Sun5260 4 24.59 3.61 Female No Sun ... 4 6.15 Tonya Carter 4832732618637221 Sun2251
:
[5 rows x 11 columns]