Category: Mathematics

  • How to merge two disjoint random samples?

    The problem: Given two random samples, s1 and s2, of size k over two disjoint populations, p1 and p2, how to combine the two k-sized random samples into one k-sized random sample over p1 ∪ p2?

    The solution: k times, draw an element s1 ∪ s2; with probability d1 = |p1| / |p1 ∪ p2|, draw the next element from p1; with probability d2 = 1 – d1 draw the next element from p2.

    (the solution was on stackoverflow)

    In python:

    import random
    import numpy
    
    # sizes
    e1 = 1000
    e2 = 1000000
    
    # populations
    p1 = xrange(e1)
    p2 = xrange(e1, e2)
    
    # sample size
    k = 500
    
    # random samples
    s1 = random.sample(p1, k)
    s2 = random.sample(p2, k)
    
    # merge samples
    merge = []
    for i in range(k):
      if s1 and s2:
        merge.append(s1.pop() if random.random < len(p1) / float(len(p1)+len(p2)) else s2.pop())
      elif s1:
        merge.append(s1.pop())
      else:
        merge.append(s2.pop())
    
    # Validate
    hist = numpy.histogram(merge, bins=[0,500000,1000000])
    # The two bins should be roughly equal, i.e. the error should be small.
    print abs(hist[0][0] - hist[0][1]) / float(k)
    
    # alternatively, use filter to count values below 500K
    print abs(len(filter(lambda x: x<500000, merge)) - 250) / 500.0
    
  • Ordering cheat sheet

    Non-strict orders: ≤

    The symbol ≤ denotes a generalization of "less than or equal", and it defines either a partial or total ordering over a set P (in the table below a,b ∈ P):

    Constraint (Non-strict) partial order (Non-strict) total order
    Reflexivity: a ≤ a x x
    Antisymmetry: if a ≤ b and b ≤ a then a = b x x
    Transitivity: if a ≤ b and b ≤ c then a ≤ c x x
    Totality: either a ≤ b or b ≤ a x

    Strict orders: <

    The symbol < denotes a generalization of "less than", and it defines either a partial or total ordering over a set P (in the table below a,b ∈ P):

    Constraint (Strict) partial order (Strict) total order
    Irreflexivity: ¬(a < a) x x
    Asymmetry: if a < b then ¬(b < a) x x
    Transitivity: if a < b and b < c then a < c x x
    Totality: either a < b or b < a x

    Note the difference between asymmetry and antisymmetry.

    Type of relation Constraint
    Asymmetric relation if a < b then ¬(b < a)
    Antisymmetric relation
    (two equivalent definitions)
    if a ≤ b and b ≤ a then a = b
    if a ≠ b then ¬(b ≤ a)
  • Metrics cheat sheet

    Question: When can a distance function d(x,y) be called metric, pseudo-metric, quasi-metric or semi-metric?

    Constraint Metric Pseudo Quasi Semi
    Non-negativity: d(x,y) ≥ 0 x x x x
    Identity of indiscernibles: d(x,y)=0 ⇒ x=y x x x
    Symmetry: d(x,y) = d(y,x) x x x
    Triangle inequality: d(x,z) ≤ d(x,y)+d(y,z) x x x

    Table derived from Wikipedia article on metric spaces: http://en.wikipedia.org/wiki/Metric_(mathematics)