Example from Nassim Taleb for
How about that problem? pic.twitter.com/d8R7outZLy
— Nassim Nicholas Taleb (@nntaleb) April 6, 2018
An archer stands one meter away from a wall and shoots uniformly at random to his right, with the angle between zero and $\pi/2$. Mark a spot on the wall directly in front of the archer. What is the average distance between the arrow's mark and that spot?
Intuitively it seems plausible that this average distance does not exist. This can be examined analytically and visualized with Monte Carlo Simulations.
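The angle $\theta$ is uniformly distributed between 0 and $\pi/2$:
$$\theta \sim U\left(0, \frac{\pi}{2}\right)$$
Thus the distance becomes a random variable depending on the angle:
$$D = \tan(\theta)$$
The average distance corresponds to the expected value $E(D)$ of $D = \tan(\theta)$.
The angle is continuously distributed with the probability density function $f(x) = \frac{2}{\pi}$ for $0 \le x < \frac{\pi}{2}$, so we could try to calculate $E(D)$ as:
$$E(D) = \int_0^{\pi/2} \tan(x) \, \frac{2}{\pi} \, dx$$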
The upper bound $\pi/2$ of the integral lies outside the domain of $\tan(x)$, so the integral cannot be evaluated directly with the antiderivative. It can, however, be treated as an improper integral: the idea is to „sneak“ up on the upper bound from within the domain of $\tan(x)$ and examine the limit of the integral. Let’s try:
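We set the upper bound to $b < \frac{\pi}{2}$ and then find out what happens to the value of the integral for $b \to \frac{\pi}{2}$:
$$\int_0^{b} \tan(x) \, \frac{2}{\pi} \, dx = \frac{2}{\pi} \Big[ -\ln(\cos(x)) \Big]_0^{b} = -\frac{2}{\pi} \ln(\cos(b))$$
For $b \to \frac{\pi}{2}$ the result is $\cos(b) \to 0^+$ and thus
$$\lim_{b \to \pi/2^-} -\frac{2}{\pi} \ln(\cos(b)) = \infty$$
or, to be precise, this limit does not exist, and thus the integral for the expected value does not exist either. The analytical examination shows the nonexistence of the expected value $E(D)$ for the problem at hand.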
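The limit argument can also be checked numerically. The following is a minimal sketch that evaluates the closed form of the truncated integral, $-\frac{2}{\pi}\ln(\cos(b))$, for upper bounds $b$ that approach $\pi/2$. The function name `truncated_expectation` and the chosen gap sizes are illustrative assumptions, not part of the original analysis.

```python
import math


def truncated_expectation(b):
    """Value of (2/pi) * integral of tan(x) from 0 to b, for 0 <= b < pi/2."""
    return -(2.0 / math.pi) * math.log(math.cos(b))


# Move the upper bound closer and closer to pi/2 (gap sizes are arbitrary).
gaps = [10.0 ** (-k) for k in range(1, 9)]
values = [truncated_expectation(math.pi / 2 - gap) for gap in gaps]

for gap, value in zip(gaps, values):
    print(f"b = pi/2 - {gap:.0e}: truncated integral = {value:.4f}")
```

Each time the gap to $\pi/2$ shrinks by a factor of ten, the value grows by roughly the same amount, so the sequence keeps increasing instead of settling on a finite number.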
How does this nonexistence show itself in (simulated) data? Let’s find out using some Monte Carlo and Python!
The simulation example shows the sample means of the angle $\theta$ and the distance $D$ as approximations for $E(\theta)$ and $E(D)$.
```python
# Jupyter magic: renders plots inline. Remove this line when running as a plain script.
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

n_sim = 1000
angle_means = np.zeros(n_sim)
distance_means = np.zeros(n_sim)

# For each sample size n = 1..n_sim, draw a fresh sample and record both sample means.
for i in range(n_sim):
    angle = np.random.uniform(low=0, high=np.pi / 2.0, size=(i + 1,))
    distance = np.tan(angle)
    angle_means[i] = np.mean(angle)
    distance_means[i] = np.mean(distance)

# Mean of the angles: expected to stabilize.
plt.plot(angle_means)
plt.title("Mean of Angles as a Function of n")
plt.xlabel("n")
plt.ylabel("Mean of Angle")
plt.grid()
plt.savefig("angle_means.png", bbox_inches='tight')
plt.show()

# Mean of the distances, on a log scale because of the extreme values.
plt.plot(distance_means)
plt.title("Mean of Distances as a Function of n")
plt.ylabel("Mean of Distances")
plt.xlabel("n")
plt.yscale('log')
plt.grid()
plt.savefig("distance_means.png", bbox_inches='tight')
plt.show()
```
It can be observed that the mean of the „unproblematic“ angle variable stabilizes and converges to its expected value of $\pi/4 \approx 0.785$. No convergence is visible for the mean of the „problematic“ distance variable. This is what data look like when the first moment ($E(D)$) does not exist. During the first 100 „shots“ the mean appears to rise, because the asymmetric distribution allows very large positive distance values but no correspondingly large negative ones.
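The simulation draws a fresh sample for every n. As a complementary view, here is a minimal sketch that tracks the running mean within one single sequence of shots. The seed, the number of shots, and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# Assumption: seed and sample size are arbitrary, chosen only for reproducibility.
rng = np.random.default_rng(seed=0)
n_shots = 100_000

angle = rng.uniform(low=0.0, high=np.pi / 2.0, size=n_shots)
distance = np.tan(angle)

# Running mean after 1, 2, ..., n_shots shots of the same sequence.
counts = np.arange(1, n_shots + 1)
running_angle_mean = np.cumsum(angle) / counts
running_distance_mean = np.cumsum(distance) / counts

for n in (100, 1_000, 10_000, 100_000):
    print(f"n = {n:>6}: angle mean = {running_angle_mean[n - 1]:.4f}, "
          f"distance mean = {running_distance_mean[n - 1]:.4f}")
```

In such a run the angle mean should settle near 0.785, while the distance mean can jump whenever a single shot lands very close to the upper bound of the angle.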
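Perhaps you wonder why the code allows $\pi/2$ as an input value for the tan function. The reason is that the tangent is evaluated numerically here: `np.pi/2.0` is only a floating-point approximation of $\pi/2$, so `np.tan` returns a very large but finite number instead of failing. In addition, `np.random.uniform` is documented to sample from the half-open interval [low, high), so the upper bound itself is normally not drawn.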
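This numerical behaviour is easy to inspect directly. The short sketch below evaluates the tangent and cosine at the floating-point upper bound; the variable names are illustrative only.

```python
import numpy as np

# Floating-point approximation of pi/2, as used for the upper bound in the simulation.
upper = np.pi / 2.0

# The cosine at this point is tiny but not exactly zero ...
cos_at_upper = np.cos(upper)

# ... so the tangent is a very large, yet finite, number.
tan_at_upper = np.tan(upper)

print("cos(np.pi/2) =", cos_at_upper)
print("tan(np.pi/2) =", tan_at_upper)
print("finite:", np.isfinite(tan_at_upper))
```

The printed tangent is on the order of 1e16, which explains why the simulation never raises an error even for angles extremely close to the upper bound.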