How to select top-k items from each group in SQL

Here is an analytical query that you (and I) will often need to do if you work in e-commerce, marketing or similar domain. It answers the question, within each group of items (e.g. partitioned by territory, age groups or something else) what are the top-k items for some utility function over the items (e.g. the number of units sold)?

Assume we have a table, aggregated_sales(id INT, item TEXT, country TEXT, total_sold INT). Here is a sample:

ID  ITEM              COUNTRY  TOTAL_SOLD
1   Basketball        DK       125
2   Basketball        US       432943
3   Leather football  FO       64773 
4   Leather football  DK       56230
5   Boxing gloves     CU       9812

In SQL, here is how to get the top-3 selling items by country:

SELECT id, item, country, total_sold
FROM (
  SELECT
    *,
    row_number() OVER (PARTITION BY country ORDER BY total_sold DESC) as rn
  FROM aggregated_sales
) t
WHERE rn = 1;

Bonus info: the table sample was generated from a CSV file with the command column -s , -t filename.csv.

Leave a Comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.