Pandas Tricks

Basics

Imports:

import pandas as pd
import numpy as np

Shuffle rows (returns a new dataframe; the index must be unique):

df.reindex(np.random.permutation(df.index))
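An alternative that also works when the index has duplicate labels is df.sample with frac=1. A minimal sketch; the example frame and the seed value are made up for illustration:

```python
import pandas as pd

# Hypothetical example frame; column names are illustrative only
df = pd.DataFrame({"x": range(5), "y": list("abcde")})

# frac=1 draws every row exactly once (no replacement), i.e. a full shuffle.
# random_state is an arbitrary seed so the order is reproducible.
shuffled = df.sample(frac=1, random_state=0)
```

The original index labels travel with the rows, so shuffled.sort_index() gives back the original order.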

Renumber the index from 0 again (e.g. after a shuffle):

df.reset_index(drop=True)
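To see what drop=True does, compare it with the default on a small made-up frame (the values and index labels here are hypothetical):

```python
import pandas as pd

# Frame whose index is out of order, as it would be after a shuffle
df = pd.DataFrame({"x": [10, 20, 30]}, index=[2, 0, 1])

# drop=True throws the old labels away and numbers the rows 0, 1, 2
renumbered = df.reset_index(drop=True)

# Without drop=True the old labels are kept in a new column named "index"
kept = df.reset_index()
```

Both calls return a new dataframe; df itself is left unchanged unless you assign the result back.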

Drop rows that contain NaN:

df.dropna()
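dropna takes a few options that control which rows count as "containing NaN". A short sketch on a made-up frame (column names a and b are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [1.0, 2.0, np.nan, np.nan],
})

# Default (how="any"): drop a row if any column is NaN
any_nan = df.dropna()

# how="all": drop a row only if every column is NaN
all_nan = df.dropna(how="all")

# subset: only look at the listed columns when deciding
subset = df.dropna(subset=["a"])
```

Here any_nan keeps row 0, all_nan keeps rows 0 to 2, and subset keeps rows 0 and 2.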

Split dataframe in half (with an odd number of rows, the first half gets the extra row):

df1, df2 = np.array_split(df, 2)
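Recent pandas releases may emit a FutureWarning when np.array_split is called on a dataframe. A plain iloc slice gives the same two halves without going through NumPy. A minimal sketch on a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"x": range(5)})

# Round up so the first half gets the extra row for odd lengths,
# which matches how np.array_split divides the rows
half = (len(df) + 1) // 2
df1, df2 = df.iloc[:half], df.iloc[half:]
```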

Restructuring

From matrix to i,j,value:

import numpy as np
import pandas as pd
from string import ascii_lowercase, ascii_uppercase

# Create nrows x ncols dataframe
nrows = 5
ncols = 3
a = np.random.rand(nrows, ncols)
df = pd.DataFrame(a)
df.columns = list(ascii_lowercase[:ncols])  # 'a', 'b', 'c'
df.index = list(ascii_uppercase[:nrows])    # 'A' ... 'E'

# Restructure to i,j,value dataframe
# (the i and j columns come out named level_0 and level_1)
df.T.unstack().reset_index(name='value')

# .T makes the enumeration go row by row (A/a, A/b, A/c, B/a, ...);
# without it, unstack() goes column by column (a/A, a/B, ...)
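A shorter way to get the same row-by-row layout is df.stack(). A sketch on a small made-up matrix; the i/j column names are my own choice, not something pandas sets. Note that in some pandas versions stack() drops NaN values by default, so the two approaches can differ if the matrix has missing entries.

```python
import numpy as np
import pandas as pd

# Small hypothetical 2 x 3 matrix with labelled rows and columns
df = pd.DataFrame(
    np.arange(6).reshape(2, 3),
    index=["A", "B"],
    columns=["a", "b", "c"],
)

# Approach from above: transpose, then unstack
long_unstack = df.T.unstack().reset_index(name="value")

# Same layout via stack(): (row label, column label, value)
long_stack = df.stack().reset_index(name="value")

# Give the two label columns readable names
long_stack.columns = ["i", "j", "value"]
```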

Merging

Basic merge on shared columns (inner, outer, left, right):

df1 = pd.DataFrame({'A': [1,2,3], 'B': [1,2,3]})
"""
   A  B
0  1  1
1  2  2
2  3  3
"""
 
df2 = pd.DataFrame({'A': [3,4,5], 'C': [1,2,3]})
"""
   A  C
0  3  1
1  4  2
2  5  3
"""
 
df1.merge(df2)  # inner (default)
"""
   A  B  C
0  3  3  1
"""
 
df1.merge(df2, how='outer')  # B and C become float because of the NaNs
"""
   A    B    C
0  1  1.0  NaN
1  2  2.0  NaN
2  3  3.0  1.0
3  4  NaN  2.0
4  5  NaN  3.0
"""
 
df1.merge(df2, how='left')  # C becomes float because of the NaNs
"""
   A  B    C
0  1  1  NaN
1  2  2  NaN
2  3  3  1.0
"""
 
df1.merge(df2, how='right')  # B becomes float because of the NaNs
"""
   A    B  C
0  3  3.0  1
1  4  NaN  2
2  5  NaN  3
"""
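When debugging a merge it helps to name the key explicitly with on= and to ask pandas where each row came from with indicator=True. A sketch that rebuilds the same two frames:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]})
df2 = pd.DataFrame({"A": [3, 4, 5], "C": [1, 2, 3]})

# on="A" spells out the join key instead of relying on shared column names.
# indicator=True adds a "_merge" column with left_only / right_only / both.
merged = df1.merge(df2, on="A", how="outer", indicator=True)

# Map each key to its origin for a quick look
origin = dict(zip(merged["A"], merged["_merge"]))
```

Only A == 3 exists in both frames, so it is the only key marked both; 1 and 2 are left_only, 4 and 5 are right_only.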
