N choose K, with T targets. Probability of choosing at least one target
Looking at n choose k, but you want to get at least one of a target t. For example you have a bag of blue marbles, which contains some number of red marbles. N is the total number of marbles in the bag. K is the number of marbles being chosen. T is the number of red marbles.
This is probably well known, but I worked through it a few different ways anyway…
First a simulation (C++):
#include <iostream>
#include <vector>
using namespace std;
int main(int argc,char **argv) {
int n = 3000;
int k = 50;
int t = 30; // targets
vector<int> items(n,0);
for(int i=0;i<t;i++) {
int itm=rand()%items.size();
for(;items[itm] !=0;) itm = rand()%items.size();
items[itm] = 1;
//items[i] = 1;
}
//cout << "items: ";
//for(int i=0;i<items.size();i++) cout << items[i] << " ";
//cout << endl;
vector<int> count(t,0);
// rounds of choosing
int rounds=100000;
for(int j=0;j<rounds;j++) {
cout << "round: " << j << endl;
int tcount=0;
vector<int> selected;
for(int i=0;i<k;i++) {
int itm = rand()%items.size();
for(;std::count(selected.begin(), selected.end(), itm);) itm = rand()%items.size();
selected.push_back(itm);
//cout << "selected: " << itm << " " << items[itm] << endl;
if(i%100000==0) cout << i << endl;
}
//count items on target
for(int i=0;i<selected.size();i++) {
if(items[selected[i]] == 1) tcount++;
}
cout << "tcount: " << tcount << endl;
count[tcount]++;
}
int morezero=0;
for(int i=0;i<t;i++) {
if((count[i] > 0) || (i==0)) cout << i << " " << count[i] << endl;
if(i>=1) morezero+=count[i];
}
cout << "One or more: " << morezero << endl;
cout << "One or more fraction: " << (double)morezero/rounds << endl;
}
Next I calculated this by looking at the number of combinations calculating the fraction of combinations containing at least one target and total number of combinations:
from math import comb
import sys
def totalcomb(n, k, t):
return comb(n,k)
def targetcomb(n, k, t):
total = 0
total += comb(t,k)
for i in range(k-1,0,-1): #(i=k-1;i>0;i--)
# print(t," ",i, ", ", n, " ", k-i)
# Ways of choosing i items out of t. Multiply by
total += comb(t,i)*comb(n-t,k-i)
return total
n = int(sys.argv[1])
k = int(sys.argv[2])
t = int(sys.argv[3])
print (targetcomb(n,k,t)/totalcomb(n,k,t))
print (targetcomb(n,k,t))
print (totalcomb(n,k,t))
Then in terms of probability, using the probability that each draw does not contain a target:
from math import comb
import sys
def tprb(n, k, t):
totalp = 1
for i in range(0,k,1):
p = 1 - (t / (n - i))
totalp = totalp * p
return 1-totalp
n = int(sys.argv[1])
k = int(sys.argv[2])
t = int(sys.argv[3])
print (tprb(n,k,t))
All approaches give the same answer, the last is the least computationally taxing…