(Integer) Linear Programming in Python

Step one:

brew install glpk
pip install pulp

Step two:

from pulp import * 
 
prob = LpProblem("test1", LpMinimize) 
 
# Variables 
x = LpVariable("x", 0, 4, cat="Integer") 
y = LpVariable("y", -1, 1, cat="Integer") 
z = LpVariable("z", 0, cat="Integer") 
 
# Objective 
prob += x + 4*y + 9*z 
 
# Constraints 
prob += x+y <= 5 
prob += x+z >= 10 
prob += -y+z == 7 
 
GLPK().solve(prob) 
 
# Solution 
for v in prob.variables():
    print v.name, "=", v.varValue 
 
print "objective=", value(prob.objective)

In the documentation there are further examples, e.g. one to minimise the cost of producing cat food.

How to select top-k items from each group in SQL

Here is an analytical query that you (and I) will often need to do if you work in e-commerce, marketing or similar domain. It answers the question, within each group of items (e.g. partitioned by territory, age groups or something else) what are the top-k items for some utility function over the items (e.g. the number of units sold)?

Assume we have a table, aggregated_sales(id INT, item TEXT, country TEXT, total_sold INT). Here is a sample:

ID  ITEM              COUNTRY  TOTAL_SOLD
1   Basketball        DK       125
2   Basketball        US       432943
3   Leather football  FO       64773 
4   Leather football  DK       56230
5   Boxing gloves     CU       9812

In SQL, here is how to get the top-3 selling items by country:

SELECT id, item, country, total_sold
FROM (
  SELECT
    *,
    row_number() OVER (PARTITION BY country ORDER BY total_sold DESC) as rn
  FROM aggregated_sales
) t
WHERE rn = 1;

Bonus info: the table sample was generated from a CSV file with the command column -s , -t filename.csv.