-
Yummy 3D plots
Very nice interactive 3D plots with Plotly.

    import plotly.graph_objects as go
    import numpy as np
    import pandas as pd

    # Read data from a csv
    Z = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/api_docs/mt_bruno_elevation.csv').values

    # Actually not necessary to provide X and Y…
    X = np.linspace(0, 1000, Z.shape[0])
    Y = np.linspace(0, 1000, Z.shape[1])

    fig = go.Figure(data=[go.Surface(x=X, y=Y, z=Z)])
    fig.update_layout(title='Mt Bruno Elevation', […]
-
How to fill missing dates in Pandas
Create a pandas dataframe with a date column:

    import pandas as pd
    import datetime

    TODAY = datetime.date.today()
    ONE_WEEK = datetime.timedelta(days=7)
    ONE_DAY = datetime.timedelta(days=1)

    df = pd.DataFrame({'dt': [TODAY-ONE_WEEK, TODAY-3*ONE_DAY, TODAY], 'x': [42, 45, 127]})

The dates have gaps:

               dt    x
    0  2018-11-19   42
    1  2018-11-23   45
    2  2018-11-26  127

Now, fill in the missing dates: r = […]
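One possible way to fill the gaps is to reindex against a full daily date range. The sketch below is not necessarily the approach the post takes. It assumes fixed dates matching the sample output above, and it assumes that missing days should get `x = 0`; both choices are illustrative.

```python
import datetime

import pandas as pd

# Fixed dates (an assumption, chosen to match the sample output above)
df = pd.DataFrame({
    "dt": [
        datetime.date(2018, 11, 19),
        datetime.date(2018, 11, 23),
        datetime.date(2018, 11, 26),
    ],
    "x": [42, 45, 127],
})

# Convert to datetime64 so the column lines up with a DatetimeIndex
df["dt"] = pd.to_datetime(df["dt"])

# Every calendar day between the first and the last observed date
full_range = pd.date_range(start=df["dt"].min(), end=df["dt"].max(), freq="D")

# Reindex on the full range; days without data get x = 0 (an assumption)
filled = (
    df.set_index("dt")
      .reindex(full_range, fill_value=0)
      .rename_axis("dt")
      .reset_index()
)

print(filled)
```

If zero is not a sensible default for the data, drop `fill_value` to leave the new rows as `NaN` and handle them later, for example with `ffill()`.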
-
Cosine similarity in Python
Cosine similarity is the normalised dot product between two vectors. It is called “cosine” similarity because the dot product equals the product of the Euclidean magnitudes of the two vectors and the cosine of the angle between them, so dividing by the magnitudes leaves only the cosine. If you want, read more about cosine similarity and dot products on Wikipedia. Here is how […]
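The definition above can be sketched in a few lines of plain Python. This is a minimal illustration and not necessarily the code from the post; the function name `cosine_similarity` and the choice to raise on zero vectors are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Return the dot product of a and b divided by the product of their norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")

    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))

    # The angle is undefined if either vector has zero length
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")

    return dot / (norm_a * norm_b)


print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # parallel vectors
print(cosine_similarity([1, 0], [0, 1]))        # orthogonal vectors
```

With NumPy arrays the same quantity is `np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))`.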
-
How to explore two-dimensional data with a heatmap
https://seaborn.pydata.org/generated/seaborn.heatmap.html
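A minimal sketch of the `seaborn.heatmap` call documented at the link above. The data here is random and purely illustrative (an assumption), and the `Agg` backend is selected only so the snippet also runs outside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; not needed inside Jupyter

import numpy as np
import seaborn as sns

# Made-up 6x8 matrix of values in [0, 1) (illustrative data only)
rng = np.random.default_rng(0)
data = rng.random((6, 8))

# Draw one coloured cell per matrix entry and print the value inside it
ax = sns.heatmap(data, annot=True, fmt=".2f", cmap="OrRd")
ax.set_xlabel("column")
ax.set_ylabel("row")

ax.figure.savefig("heatmap.png", dpi=144)
```

In a notebook, the `matplotlib.use` and `savefig` lines can be dropped and the figure shows up inline.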
-
How to display a Choropleth map in Jupyter Notebook
Here is the code:

    %matplotlib inline
    import geopandas as gpd
    import matplotlib as mpl  # make rcParams available (optional)

    mpl.rcParams['figure.dpi'] = 144  # increase dpi (optional)

    world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
    world = world[world.name != 'Antarctica']  # remove Antarctica (optional)
    world['gdp_per_person'] = world.gdp_md_est / world.pop_est

    g = world.plot(column='gdp_per_person', cmap='OrRd', scheme='quantiles')
    g.set_facecolor('#A8C5DD')  # make the ocean blue (optional)

Here […]
-
(Integer) Linear Programming in Python
Step one:

    brew install glpk
    pip install pulp

Step two:

    from pulp import *

    prob = LpProblem("test1", LpMinimize)

    # Variables
    x = LpVariable("x", 0, 4, cat="Integer")
    y = LpVariable("y", -1, 1, cat="Integer")
    z = LpVariable("z", 0, cat="Integer")

    # Objective
    prob += x + 4*y + 9*z

    # Constraints
    prob += x + y <= 10
    prob += […]
-
How to select top-k items from each group in SQL
Here is an analytical query that you (and I) will often need if you work in e-commerce, marketing or a similar domain. It answers the question: within each group of items (e.g. partitioned by territory, age group or something else), what are the top-k items for some utility function over the items (e.g. the […]
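One common way to express a top-k-per-group query is a window function: number the rows within each partition, then keep the first k. The sketch below runs such a query against an in-memory SQLite database from Python; it is not necessarily the query from the post. The `sales` table, its columns and the sample rows are hypothetical, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: one row per (territory, item) with its revenue
conn.execute("CREATE TABLE sales (territory TEXT, item TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("north", "apples", 120.0),
        ("north", "pears", 80.0),
        ("north", "plums", 95.0),
        ("south", "apples", 60.0),
        ("south", "pears", 150.0),
        ("south", "plums", 40.0),
    ],
)

K = 2  # how many items to keep per group

# Rank rows inside each territory by revenue, then keep ranks 1..K
query = """
SELECT territory, item, revenue
FROM (
    SELECT territory, item, revenue,
           ROW_NUMBER() OVER (
               PARTITION BY territory
               ORDER BY revenue DESC
           ) AS rn
    FROM sales
)
WHERE rn <= ?
ORDER BY territory, rn
"""

top_k = conn.execute(query, (K,)).fetchall()
for row in top_k:
    print(row)
```

`ROW_NUMBER()` breaks ties arbitrarily; `RANK()` or `DENSE_RANK()` would keep tied items together, which can return more than k rows per group.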