NumPy
This section is devoted to NumPy tricks.
Sliding window (1D)
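I couldn't find a simple sliding window function with a configurable stride in NumPy (versions 1.20 and later do provide numpy.lib.stride_tricks.sliding_window_view for the stride-1 case), so I've implemented this one: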
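import numpy as np

def sliding_1d(a, size, stride=1):
    # Start index of the last window
    last_i = len(a) - size
    # The windows must cover the array exactly, with nothing left over
    assert last_i % stride == 0
    num_seq = last_i // stride + 1
    # Row k of idx holds the indices k*stride ... k*stride + size - 1
    idx = np.arange(size)[None, :] + stride * np.arange(num_seq)[:, None]
    return a[idx]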
Use it like this:
a = np.arange(10)
# a = array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
sliding_1d(a, 2, stride=2)
# array([[0, 1],
#        [2, 3],
#        [4, 5],
#        [6, 7],
#        [8, 9]])
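For comparison, here is a minimal sketch built on numpy.lib.stride_tricks.sliding_window_view, assuming NumPy 1.20 or later is installed. The helper name sliding_1d_view is hypothetical (it is not part of NumPy): it takes every stride-1 window and then keeps every stride-th one.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def sliding_1d_view(a, size, stride=1):
    # Hypothetical helper: build all stride-1 windows, then keep every `stride`-th row.
    # The result is a read-only view into `a`, not a copy.
    return sliding_window_view(a, size)[::stride]

a = np.arange(10)
windows = sliding_1d_view(a, 2, stride=2)
# array([[0, 1],
#        [2, 3],
#        [4, 5],
#        [6, 7],
#        [8, 9]])
```

Unlike sliding_1d, this version silently drops trailing elements that don't fill a complete window instead of failing an assertion.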
Pandas
This section is devoted to Pandas tricks.
Basics
Imports:
import pandas as pd
import numpy as np
Shuffle rows:
df.reindex(np.random.permutation(df.index))
Make index incremental again (e.g. after a shuffle):
df.reset_index(drop=True)
Drop rows that contain NaN:
df.dropna()
Split dataframe in half:
df1,df2 = np.array_split(df, 2)
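Each of the snippets above returns a new dataframe rather than modifying df in place, so the results have to be assigned. A small sketch on made-up data (the column names and values are assumptions for illustration); the split here uses positional slicing with iloc as an alternative to np.array_split:

```python
import numpy as np
import pandas as pd

# Made-up example data: one row contains a NaN
df = pd.DataFrame({'x': [1.0, 2.0, np.nan, 4.0, 5.0], 'y': list('abcde')})

# Drop the NaN row; df itself is left unchanged
clean = df.dropna()

# Shuffle the rows, then make the index incremental again
shuffled = clean.reindex(np.random.permutation(clean.index))
shuffled = shuffled.reset_index(drop=True)

# Split in half by position
half = len(shuffled) // 2
df1, df2 = shuffled.iloc[:half], shuffled.iloc[half:]
```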
Restructuring
From matrix to i,j,value:
from string import ascii_lowercase
import numpy as np
import pandas as pd
# Create nrows x ncols dataframe
nrows = 5
ncols = 3
a = np.random.rand(nrows,ncols)
df = pd.DataFrame(a)
df.columns = list(ascii_lowercase)[0:ncols]
df.index = list(ascii_lowercase.upper())[0:nrows]
# Restructure to i,j,value dataframe
df.T.unstack().reset_index(name='value')
# Note that I use .T because I like row-by-row enumeration
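By default the two label columns come out named level_0 and level_1. A minimal sketch on a small deterministic matrix (the data and the names i and j are assumptions for illustration) that names the index levels before resetting the index:

```python
import numpy as np
import pandas as pd

# Small 2 x 3 matrix so the result is easy to check by eye
df = pd.DataFrame(np.arange(6).reshape(2, 3), index=['A', 'B'], columns=['a', 'b', 'c'])

# Same restructuring as above, but naming the two index levels first
# so the columns come out as i, j, value
long = df.T.unstack().rename_axis(['i', 'j']).reset_index(name='value')
#    i  j  value
# 0  A  a      0
# 1  A  b      1
# 2  A  c      2
# 3  B  a      3
# 4  B  b      4
# 5  B  c      5
```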
Merging
Basic merge on shared columns (inner, outer, left, right):
df1 = pd.DataFrame({'A': [1,2,3], 'B': [1,2,3]})
"""
   A  B
0  1  1
1  2  2
2  3  3
"""
df2 = pd.DataFrame({'A': [3,4,5], 'C': [1,2,3]})
"""
   A  C
0  3  1
1  4  2
2  5  3
"""
df1.merge(df2) # inner (default)
"""
   A  B  C
0  3  3  1
"""
df1.merge(df2, how='outer')
"""
   A    B    C
0  1  1.0  NaN
1  2  2.0  NaN
2  3  3.0  1.0
3  4  NaN  2.0
4  5  NaN  3.0
"""
df1.merge(df2, how='left')
"""
   A  B    C
0  1  1  NaN
1  2  2  NaN
2  3  3  1.0
"""
df1.merge(df2, how='right')
"""
   A    B  C
0  3  3.0  1
1  4  NaN  2
2  5  NaN  3
"""
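When it isn't obvious which side a row came from, merge's indicator parameter can help. A minimal sketch reusing the two dataframes from this section; naming the key with on='A' is my addition here, to make the join column explicit instead of relying on shared column names:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 2, 3]})
df2 = pd.DataFrame({'A': [3, 4, 5], 'C': [1, 2, 3]})

# indicator=True adds a _merge column whose values are
# 'left_only', 'right_only' or 'both'
merged = df1.merge(df2, on='A', how='outer', indicator=True)

# Keep only the rows that have no match in df2
only_left = merged[merged['_merge'] == 'left_only']
```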