How to merge two disjoint random samples?

The problem: Given two random samples, s1 and s2, of size k over two disjoint populations, p1 and p2, how to combine the two k-sized random samples into one k-sized random sample over p1 ∪ p2?

The solution: k times, draw an element s1 ∪ s2; with probability d1 = |p1| / |p1 ∪ p2|, draw the next element from p1; with probability d2 = 1 – d1 draw the next element from p2.

(the solution was on stackoverflow)

In python:

import random
import numpy
 
# sizes
e1 = 1000
e2 = 1000000
 
# populations
p1 = xrange(e1)
p2 = xrange(e1, e2)
 
# sample size
k = 500
 
# random samples
s1 = random.sample(p1, k)
s2 = random.sample(p2, k)
 
# merge samples
merge = []
for i in range(k):
  if s1 and s2:
    merge.append(s1.pop() if random.random < len(p1) / float(len(p1)+len(p2)) else s2.pop())
  elif s1:
    merge.append(s1.pop())
  else:
    merge.append(s2.pop())
 
# Validate
hist = numpy.histogram(merge, bins=[0,500000,1000000])
# The two bins should be roughly equal, i.e. the error should be small.
print abs(hist[0][0] - hist[0][1]) / float(k)
 
# alternatively, use filter to count values below 500K
print abs(len(filter(lambda x: x<500000, merge)) - 250) / 500.0

Ordering cheat sheet

Non-strict orders: ≤

The symbol ≤ denotes a generalization of “less than or equal”, and it defines either a partial or total ordering over a set P (in the table below a,b ∈ P):

Constraint (Non-strict) partial order (Non-strict) total order
Reflexivity: a ≤ a x x
Antisymmetry: if a ≤ b and b ≤ a then a = b x x
Transitivity: if a ≤ b and b ≤ c then a ≤ c x x
Totality: either a ≤ b or b ≤ a x

Strict orders: <

The symbol < denotes a generalization of “less than”, and it defines either a partial or total ordering over a set P (in the table below a,b ∈ P):

Constraint (Strict) partial order (Strict) total order
Irreflexivity: ¬(a < a) x x
Asymmetry: if a < b then ¬(b < a) x x
Transitivity: if a < b and b < c then a < c x x
Totality: either a < b or b < a x

Note the difference between asymmetry and antisymmetry.

Type of relation Constraint
Asymmetric relation if a < b then ¬(b < a)
Antisymmetric relation
(two equivalent definitions)
if a ≤ b and b ≤ a then a = b
if a ≠ b then ¬(b ≤ a)

Metrics cheat sheet

Question: When can a distance function d(x,y) be called metric, pseudo-metric, quasi-metric or semi-metric?

Constraint Metric Pseudo Quasi Semi
Non-negativity: d(x,y) ≥ 0 x x x x
Identity of indiscernibles: d(x,y)=0 ⇒ x=y x x x
Symmetry: d(x,y) = d(y,x) x x x
Triangle inequality: d(x,z) ≤ d(x,y)+d(y,z) x x x

Table derived from Wikipedia article on metric spaces: http://en.wikipedia.org/wiki/Metric_(mathematics)