Hoeffding's Inequality
1.1 Assignment
For the oral exam, students are asked to present 3 different machine learning techniques. They can be chosen from the problems given in Yaser S. Abu-Mostafa, Malik Magdon-Ismail and Hsuan-Tien Lin, "Learning from Data". Example problems are:
- Problems 1.4 and 1.5 (pp. 34-35). These problems study the Perceptron Learning Algorithm (PLA) on artificial data sets. The first one asks to investigate the PLA on data sets of different size and dimension, and the second one asks to study a variant that includes an extra parameter, the learning rate η.
- Problem 3.3 (p. 109) studies the PLA and its pocket variant on an original non-linearly separable dataset and a third order polynomial feature transformation of that set.
- Problem 4.4 (p. 155) sets up an experimental framework to study overfitting.
- Problem 4.24 (p. 163) investigates linear regression with weight decay regularization.
- Also in the Problems sections of the e-Chapters on Neural Networks, Support Vector Machines, and Similarity-Based Methods you will find assignments that you can do.
Alternatively, you can also present a project of your own choice, but you should discuss this with me beforehand. That project should involve the same amount of work as the assignment described above.
1.2 Requirements
1. Jupyter notebook using the Python ecosystem, i.e. the libraries numpy and scipy to implement the algorithms and matplotlib (or seaborn) to display the results of the experiments.
2. The notebook should be self-contained: it contains text explaining the machine learning algorithm and the hypotheses investigated in the assignment. Relevant theory is also mentioned.
3. Results are visualized.
1.3 Example
Cf. below
2 Introduction
In this notebook we study the Hoeffding inequality.
- We do the coin experiment described in Exercise 1.10 of "Learning from Data" (p. 23). Its purpose is to illustrate the difference between verifying a hypothesis and learning the best hypothesis.
- The Hoeffding inequality only holds when some conditions are met. We check the inequality on a number of distributions. For some of them these conditions hold, for others they do not.
- We compare the Hoeffding and Chebyshev inequalities. The latter is valid for every distribution with finite variance but gives a looser bound than Hoeffding when Hoeffding is applicable.
But first of all we implement some functions to classify and visualize the probability distributions of the module scipy.stats.
3 Imports
Import the module numpy, from scipy the stats module, and from math the function ceil since it returns an integer as opposed to np.ceil, which returns a floating point number.
In [ 1 ]: import numpy as np
import scipy.stats as ss
from math import ceil
%precision 10
Out[ 1 ]: '%.10f'
Import the module numba. It provides an easy way to vectorize Python functions. The resulting vectorized function can then be applied to a whole range of values at once, cf. the cells below with the @vectorize decorator.
In [ 2 ]: from numba import vectorize, boolean, float64, int64
Import whatever is needed to generate plots.
In [ 3 ]: from matplotlib import pyplot as plt
plt.style.use('seaborn-whitegrid')
%matplotlib inline
4 Probability distributions
We use some probability distributions from the module scipy.stats. These can be classified as
- continuous or discrete, e.g. the Gaussian and Beta distributions are continuous while the binomial and Poisson distributions are discrete
- bounded or unbounded in range, e.g. the binomial and Beta distributions are bounded while the Gaussian and Poisson distributions are unbounded.
The function plot_rv visualizes these distributions.
4.1 Some distributions with bounded range
In [ 4 ]: B_30 = ss.bernoulli(p=0.3)
B_50 = ss.bernoulli(p=0.5)
U = ss.uniform()
Beta_5_5 = ss.beta(5, 5)
4.2 Some distributions with unbounded range
In [ 5 ]: G = ss.norm(loc=0, scale=1)
P_1 = ss.poisson(mu=1)
P_5 = ss.poisson(mu=5)
Chi2_5 = ss.chi2(df=5, loc=0, scale=1)
In [ 6 ]: RVs = [B_30, B_50, U, Beta_5_5, G, P_1, P_5, Chi2_5]
4.3 Classify distributions
In [ 7 ]: def continuous_p(rv):
    rv_name = rv.dist.name
    return isinstance(getattr(ss, rv_name), ss.rv_continuous)

def discrete_p(rv):
    rv_name = rv.dist.name
    return isinstance(getattr(ss, rv_name), ss.rv_discrete)
In [ 8 ]: def bounded_by_unit_interval_p(rv):
    return rv.a == 0.0 and rv.b == 1.0

def bounded_p(rv):
    return np.NINF < rv.a and rv.b < np.PINF
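As an illustration of how these predicates might be used on the distributions defined above, one could tabulate the class and range of every element of RVs. This is a sketch, not one of the original cells:

# Sketch (not an original cell): classify every distribution in RVs defined above.
for rv in RVs:
    kind = 'discrete' if discrete_p(rv) else 'continuous'
    rng = 'bounded' if bounded_p(rv) else 'unbounded'
    print('{:12s} {:12s} {}'.format(rv.dist.name, kind, rng))

For example, the Poisson distributions should come out as discrete and unbounded, while the Beta distribution is continuous and bounded.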
4.4 Visualize distributions
In [ 9 ]: PPF_BOUNDS = {(False, False): {'ppf_min': 0.0,  'ppf_max': 1.0},
              (False, True):  {'ppf_min': 0.0,  'ppf_max': 0.99},
              (True, False):  {'ppf_min': 0.01, 'ppf_max': 1.0},
              (True, True):   {'ppf_min': 0.01, 'ppf_max': 0.99}}

def get_ppf_bounds(rv):
    global PPF_BOUNDS
    bounds = PPF_BOUNDS[(rv.a == np.NINF, rv.b == np.PINF)]
    return bounds['ppf_min'], bounds['ppf_max']

def get_domain_rv(rv):
    '''Returns the part of the domain that will be visualized.'''
    ppf_min, ppf_max = get_ppf_bounds(rv)
    start, end = rv.ppf([ppf_min, ppf_max]).astype(np.int64)
    kwargs = dict(num=end - start + 1, dtype=int) if discrete_p(rv) else dict(endpoint=True)
    return np.linspace(start, end, **kwargs)

def get_ticks(rv, max_ticks=10):
    '''Returns the ticks on the x-axis. The maximum number is given by max_ticks and
    the ticks correspond with integer values equally spaced between 0 and max_value.'''
    domain = get_domain_rv(rv)
    max_value = domain[-1]
    offset = 0 if max_value % max_ticks == 0 else 1
    step = max_value // max_ticks + offset
    nb_ticks = max_value // step + 1 + offset
    ticks = np.arange(step * nb_ticks, step=step)
    return ticks
In [ 10 ]: def plot_rv(rv):
    '''Plot a distribution from scipy.stats'''
    fig, ax = plt.figure(), plt.axes()
    domain = get_domain_rv(rv)
    if discrete_p(rv):
        f, label, style = rv.pmf, 'pmf', 'ro'
        ax.vlines(domain, 0, f(domain), color='b', alpha=0.5, lw=10)
        ax.set(xticks=get_ticks(rv))
    else:
        f, label, style = rv.pdf, 'pdf', 'r-'
    ax.plot(domain, f(domain), style, label=label)
    rv_name = rv.dist.name
    title = '{} ({})'.format(rv_name, ', '.join(['{} = {:0.2f}'.format(k, v)
                                                 for k, v in rv.kwds.items()]))
    ax.set(xlabel=r'$x$',
           ylabel=r'$p(x)$',
           title=title)
In [ 11 ]: plot_rv(P_5)
5 Coin Experiment
The purpose of the coin experiment is to illustrate the difference between verifying a hypothesis $h_m \in \mathcal{H} = \{h_1, h_2, \dots, h_M\}$ in a finite hypothesis set $\mathcal{H}$, i.e.

$$\mathbb{P}\left[\,|E_{in}(h_m) - E_{out}(h_m)| > \epsilon\,\right] \le 2 e^{-2\epsilon^2 N}$$

and learning the best hypothesis $g \in \mathcal{H}$. In the latter case, $g$ depends on the dataset $\mathcal{D}$ used during learning and the bound is looser, i.e. it now also contains $M$:

$$\mathbb{P}\left[\,|E_{in}(g) - E_{out}(g)| > \epsilon\,\right] \le 2 M e^{-2\epsilon^2 N}$$

In the experiment we run a computer simulation where we flip 1,000 fair coins. Each coin is flipped independently 10 times. The experiment is repeated 1,000,000 times. Let's focus on 3 coins as follows:

1. $c_1$ is the first coin flipped,
2. $c_{rand}$ is a coin you choose at random,
3. $c_{min}$ is the coin that had the minimum frequency of heads (pick the earlier one in case of a tie).

Let $\nu_1$, $\nu_{rand}$ and $\nu_{min}$ be the fractions of heads obtained for the above three coins. The first two coins play the role of fixed hypotheses, i.e. the first and a random one, while the last coin plays the role of the final hypothesis $g$.
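To see the gap between the two bounds concretely, here is a sketch of the arithmetic for the parameters of this experiment ($N = 10$, $\epsilon = 0.3$, $M = 1000$ coins):

$$2 e^{-2\epsilon^2 N} = 2 e^{-1.8} \approx 0.33, \qquad 2 M e^{-2\epsilon^2 N} = 2000\, e^{-1.8} \approx 330.$$

The single-hypothesis bound is informative, while the union bound over all coins exceeds 1 and is therefore vacuous; this is why $c_{min}$ can violate the former without contradicting Hoeffding.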
In [ 12 ]: def get_coin(heads, idx_coin=0):
    '''Input: the number of heads for each coin in each experiment.
    Output: the number of heads of the coin with given index.'''
    return heads[idx_coin, :]

def get_max(heads):
    '''Input: the number of heads for each coin per experiment.
    Output: the maximum number of heads per experiment.'''
    return np.max(heads, axis=0)

def get_min(heads):
    '''Input: the number of heads for each coin per experiment.
    Output: the minimum number of heads per experiment.'''
    return np.min(heads, axis=0)

def get_random(heads):
    '''Input: the number of heads for each coin per experiment.
    Output: the number of heads of a random coin per experiment.'''
    nb_coins, nb_exps = heads.shape
    rnd_indices = np.random.randint(low=0, high=nb_coins, size=nb_exps)
    # return heads.T[np.arange(nb_exps), rnd_indices]
    return heads[rnd_indices, np.arange(nb_exps)]
In [ 13 ]: def run_experiments(success_rate, nb_trials, nb_coins, nb_exps):
    # Generate the trials for each coin and repeat this for each experiment
    flips = np.random.binomial(n=1, p=success_rate, size=(nb_trials, nb_coins, nb_exps))
    # Count the number of heads for each coin and repeat this for each experiment
    heads = np.sum(flips, axis=0)
    coin_1, coin_rnd, coin_min = get_coin(heads, idx_coin=0), get_random(heads), get_min(heads)
    return success_rate, flips, heads, coin_1, coin_rnd, coin_min
6 Run and visualize an experiment
6.1 Inspect
In [ 14 ]: def get_experiment_flips(flips, idx_exp):
    '''Input: the flips of each coin in each experiment.
    Output: the flips of each coin in the experiment with given index.'''
    return flips[:, :, idx_exp]

def get_coin_flips(flips, idx_coin, idx_exp=0):
    '''Input: the flips of each coin in each experiment.
    Output: the flips of the coin in the experiment with given indices.'''
    nb_trials, nb_coins, nb_exps = flips.shape
    assert idx_coin < nb_coins and idx_exp < nb_exps
    exp_flips = get_experiment_flips(flips, idx_exp)
    coin_flips = exp_flips[:, idx_coin]
    return coin_flips
6.2 Visualize
In [ 15 ]: def plot_histogram(mu, flips, heads, heads_1, heads_rnd, heads_min):
    # Get the data
    nb_trials, nb_coins, nb_exps = flips.shape
    max_value = nb_trials + 1
    data = [heads_1, heads_rnd, heads_min]
    # Plot the 3 histograms
    fig, ax = plt.figure(), plt.axes()
    colors, labels = ['yellow', 'red', 'blue'], [r'$H_1$', r'$H_{rnd}$', r'$H_{min}$']
    edges = np.linspace(start=-0.5, stop=max_value + 0.5, endpoint=True, num=max_value + 2)
    kwargs = dict(bins=edges, density=True, histtype='stepfilled', alpha=0.5)  # density replaces the removed normed keyword
    ax.hist(data, label=labels, color=colors, **kwargs)
    # Plot the binomial
    x = np.arange(max_value)
    binom = ss.binom(n=nb_trials, p=mu)
    ax.plot(x, binom.pmf(x), 'ko--', label='Binom')
    # Give the information of the plot.
    title = 'Estimate of number of heads\n' + \
            r'for $N$ = {0:d} trials when the success rate $\mu$ = {1:1.1f}'.format(nb_trials, mu)
    ax.set(xlabel=r'Number of heads $n$', xticks=get_ticks(rv=binom),
           ylabel=r'Relative Frequency', title=title)
    ax.legend(loc='best')
6.3 Take notice
It takes some time to run the experiment as described above. Therefore, we use 1,000 coins and repeat the experiment only 10,000 times.
In [ 16 ]: MU, FLIPS, HEADS, HEADS_1, HEADS_rnd, HEADS_min = run_experiments(success_rate=0.5, nb_trials=10,
                                                                   nb_coins=1000, nb_exps=10000)
6.3.1 Inspect the outcome of the first N experiments
In [ 17 ]: # FIRST_N = 5
# FLIPS.shape, HEADS.shape, HEADS_1[:FIRST_N], HEADS_rnd[:FIRST_N], HEADS_min[:FIRST_N]
In [ 18 ]: #get_coin_flips(flips=FLIPS, idx_coin=0, idx_exp=0)
6.3.2 Visualize the outcome of all experiments
In [ 19 ]: plot_histogram(mu=MU, flips=FLIPS, heads=HEADS, heads_1=HEADS_1,
               heads_rnd=HEADS_rnd, heads_min=HEADS_min)
6.4 Conclusion
From the figure above, where $N = 10$, and taking for instance $\epsilon = 0.3$, we can conclude that the Hoeffding bound holds for both coins $c_1$ and $c_{rnd}$ but not for $c_{min}$. For the specified values the Hoeffding bound is $2 e^{-2\epsilon^2 N} \approx 33\%$, while the probability of the event $\mathbb{P}\left[\,|\mu - \nu| > \epsilon\,\right]$ is approximated by the area of the bars at 0, 1, 9, and 10. For $c_1$ and $c_{rnd}$ these areas are clearly less than 1/3 of the total area, while for $c_{min}$ that area is more than 90% and exceeds the Hoeffding bound. More details are given in the sections below.
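The claim above can also be checked numerically. The following is a minimal sketch, not one of the original cells, that assumes the arrays MU, HEADS_1, HEADS_rnd and HEADS_min produced by run_experiments in Section 6.3:

# Sketch: empirical P[|nu - mu| > eps] for the three coins versus the Hoeffding bound.
eps, N = 0.3, 10
hoeffding = 2 * np.exp(-2 * eps**2 * N)
for name, heads in [('c_1', HEADS_1), ('c_rnd', HEADS_rnd), ('c_min', HEADS_min)]:
    nu = heads / N                       # fraction of heads per experiment
    p_bad = np.mean(np.abs(nu - MU) > eps)
    print('{:6s} P[|nu - mu| > eps] = {:.3f}  (Hoeffding bound = {:.3f})'.format(name, p_bad, hoeffding))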
7 Bernoulli and binomial distributions
Flipping a coin with success rate $\mu$ is represented by a Bernoulli random variable $X_n \sim \mathrm{Ber}(\mu)$, $n = 1, \dots, N$. Its distribution is given by $\mathbb{P}(X = 1) = \mu$ and $\mathbb{P}(X = 0) = 1 - \mu$. We interpret the outcome Head as 1 and Tail as 0. The mean $\mathbb{E}(X)$ and variance $\mathbb{V}(X)$ of this distribution are $\mu$ and $\mu(1 - \mu)$, respectively. If we have $N$ independent trials of this experiment then the number of heads in the sequence of $N$ outcomes has a binomial distribution $B(N, \mu)$, where the parameter $N$ is the number of trials and $\mu$ the success rate of the coin. Its mean and variance are $N\mu$ and $N\mu(1 - \mu)$, respectively. The sample mean of a sequence of $N$ trials will be represented by $\nu = \frac{\#1}{N}$ where $\#1$ is the number of 1s in the sequence.
The probability that we find $n$ successes in the $N$ trials is given by

$$\mathbb{P}(X = n) = \binom{N}{n} \mu^n (1 - \mu)^{N - n}$$

where

$$\binom{N}{n} = \frac{N!}{(N - n)!\, n!}$$
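As a quick sanity check of these formulas, one can compare them against scipy.stats.binom. This cell is a sketch added for illustration and is not part of the original notebook; scipy.special.comb is used for the binomial coefficient:

# Sketch: verify the binomial mean, variance and pmf formula for N = 10, mu = 0.3.
from scipy.special import comb           # comb(N, n) = N! / ((N - n)! n!)
N, mu = 10, 0.3
binom = ss.binom(n=N, p=mu)
mean, var = binom.stats(moments='mv')
print(mean, N * mu)                      # mean of B(N, mu) versus N*mu
print(var, N * mu * (1 - mu))            # variance of B(N, mu) versus N*mu*(1-mu)
n = 4
print(binom.pmf(n), comb(N, n) * mu**n * (1 - mu)**(N - n))   # pmf versus the formula above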
7.1 Plot the binomial distribution for $N$ trials and success rate $\mu$
We plot the probability mass function (pmf) and the cumulative distribution function (cdf) of the binomial distribution for the success rates $\mu = 0.1, 0.2, 0.3, 0.4, 0.5$ and a given number of trials $N$. The function plot_binomials below plots either the pmf or the cdf of the binomial distribution for a given success rate $\mu$ when the number of trials $N$ is given.
In [ 20 ]: def plot_binomials(nb_trials, cdf=False, success_rates=[0.1, 0.2, 0.3, 0.4, 0.5]):
    fig, ax = plt.figure(), plt.axes()
    colors = ['r', 'g', 'b', 'y', 'k']
    x = np.arange(nb_trials + 1)
    for success_rate, color in zip(success_rates, colors):
        binom = ss.binom(n=nb_trials, p=success_rate)
        f = binom.cdf if cdf else binom.pmf
        marker = '{0:s}o--'.format(color)
        ax.plot(x, f(x), marker, label=r'$\mu$ = {0:0.1f}'.format(success_rate))
    ax.set(xlabel=r'$N\nu$', xticks=get_ticks(rv=binom),
           ylabel=r'Probability $p(N\nu|\mu, N)$',
           title=r'Binomial distribution for $N$ = {0:d} trials'.format(nb_trials))
    ax.legend(loc='best')
In [ 21 ]: plot_binomials(nb_trials= 10 , cdf= True )
7.2 Evaluating $\mathbb{P}\left[\,|\mu - \nu| > \epsilon\,\right]$ when $N\nu$ has a binomial distribution
1. For a given $\mu$ and $\epsilon$ we evaluate $|\mu - \nu| > \epsilon$ for all $\nu$. Note that we compare $\nu$ and $\mu$ multiplied by $N$ instead, and that $n = N\nu$ is the number of heads in the sample. This is always an integer between 0 and $N$.
2. Next we select the numbers of heads $n$ for which $|N\mu - n| > N\epsilon$ is True.
3. Finally we sum the probabilities of these $n$.
In [ 22 ]: def make_event_p(mu, eps, N):
    @vectorize([boolean(int64)])
    def event_p(heads):
        return np.abs(N*mu - heads) > N*eps
    return event_p
In [ 23 ]: def prob(pmf, event):
    return np.sum(pmf(event))
7.2.1 Example evaluation
First,
1. Specify the success rate $\mu$, tolerance $\epsilon$ and number of trials $N$.
2. Generate the domain of the binomial distribution: $\{0, 1, \dots, N\}$.
3. Return the event with the specified parameters.
In [ 24 ]: mu, eps, N = 0.5, 0.1, 10
domain = np.arange(N + 1)
event_p = make_event_p(mu, eps, N)
Next,
4. Determine for each element in the sample space whether it belongs to the specified event.
5. Select the elements belonging to the event.
6. Sum their probabilities using the probability mass function of the binomial distribution.
7. This gives the probability of that event.
In [ 25 ]: selected = event_p(domain)
bad_event = domain[selected]
pmf = ss.binom(n=N, p=mu).pmf
prob(pmf, bad_event)
Out[ 25 ]: 0.3437500000
8 Inequalities
8.1 Chebyshev Inequality
The Chebyshev inequality applies to any random variable with finite variance $\sigma^2$ but it provides a much looser bound than Hoeffding when the latter is applicable. Let $X_1, X_2, \dots, X_N$ be independent and identically distributed random variables with finite variance $\sigma^2$ and let $\bar{X} = \frac{1}{N}\sum_{n=1}^{N} X_n$ be their sample mean. Then

$$\mathbb{P}\left[\,|\bar{X} - \mathbb{E}(X)| > \epsilon\,\right] \le \frac{\sigma^2}{N \epsilon^2}$$
Note that finite variance implies finite mean.
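To make the bound concrete, here is a small worked example added for illustration, using the fair-coin setting from the sections above ($\mu = 0.5$, so $\sigma^2 = \mu(1 - \mu) = 0.25$): with $N = 100$ and $\epsilon = 0.1$,

$$\mathbb{P}\left[\,|\bar{X} - 0.5| > 0.1\,\right] \le \frac{0.25}{100 \cdot 0.01} = 0.25.$$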
8.2 Hoeffding Inequality
The Hoeffding inequality states that when these $X_n$ are bounded by the interval $[0, 1]$ then

$$\mathbb{P}\left[\,|\bar{X} - \mathbb{E}(X)| > \epsilon\,\right] \le 2 e^{-2 \epsilon^2 N}$$

When the $X_n$ are bounded by the interval $[a, b]$ the inequality becomes

$$\mathbb{P}\left[\,|\bar{X} - \mathbb{E}(X)| > \epsilon\,\right] \le 2 e^{-\frac{2 \epsilon^2 N}{(b - a)^2}}$$
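As a small illustration of how the two bounds compare, the following sketch (not one of the original cells; the values of mu, eps and N are chosen only for illustration) evaluates both bounds together with the exact binomial probability for a fair coin and increasing $N$:

# Sketch: Hoeffding vs. Chebyshev bound on P[|nu - mu| > eps] for a fair coin (mu = 0.5).
mu, eps = 0.5, 0.1
for N in [10, 100, 1000]:
    hoeffding = 2 * np.exp(-2 * eps**2 * N)
    chebyshev = mu * (1 - mu) / (N * eps**2)     # sigma^2 = mu*(1 - mu) for a Bernoulli variable
    # Exact probability that the number of heads n satisfies |n/N - mu| > eps
    n = np.arange(N + 1)
    exact = np.sum(ss.binom(n=N, p=mu).pmf(n)[np.abs(n / N - mu) > eps])
    print('N = {:4d}: exact = {:.4f}, Chebyshev = {:.4f}, Hoeffding = {:.4f}'.format(N, exact, chebyshev, hoeffding))

For large $N$ the Hoeffding bound decays exponentially in $N$ while the Chebyshev bound only decays like $1/N$, which is what the comparison above is meant to show.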