Topic: Statistics (Page 3)

You are looking at all articles with the topic "Statistics". We found 65 matches.

πŸ”— St. Petersburg paradox

πŸ”— Economics πŸ”— Statistics

The St. Petersburg paradox or St. Petersburg lottery is a paradox related to probability and decision theory in economics. It is based on a particular (theoretical) lottery game that leads to a random variable with infinite expected value (i.e., infinite expected payoff) but nevertheless seems to be worth only a very small amount to the participants. The St. Petersburg paradox is a situation where a naive decision criterion which takes only the expected value into account predicts a course of action that presumably no actual person would be willing to take. Several resolutions are possible.
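To make the gap concrete, here is a minimal Python simulation (an illustration, not from the article; `st_petersburg_payoff` is just a descriptive name): the analytic expected value 2Β·(1/2) + 4Β·(1/4) + 8Β·(1/8) + … diverges, yet the empirical mean of even a million plays stays modest.

```python
import random

def st_petersburg_payoff():
    """Play one round: flip a fair coin until tails; the pot doubles on each heads."""
    payoff = 2
    while random.random() < 0.5:  # heads: keep doubling
        payoff *= 2
    return payoff

# Expected value is 1 + 1 + 1 + ... = infinity, but the sample mean stays small.
n = 1_000_000
print(sum(st_petersburg_payoff() for _ in range(n)) / n)
```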

The paradox takes its name from its resolution by Daniel Bernoulli, one-time resident of the eponymous Russian city, who published his arguments in the Commentaries of the Imperial Academy of Science of Saint Petersburg (Bernoulli 1738). However, the problem was invented by Daniel's cousin, Nicolas Bernoulli, who first stated it in a letter to Pierre Raymond de Montmort on September 9, 1713 (de Montmort 1713).

πŸ”— 68–95–99.7 Rule

πŸ”— Mathematics πŸ”— Statistics

In statistics, the 68–95–99.7 rule, also known as the empirical rule, is a shorthand used to remember the percentage of values that lie within an interval estimate in a normal distribution: 68%, 95%, and 99.7% of the values lie within one, two, and three standard deviations of the mean, respectively.

In mathematical notation, these facts can be expressed as follows, where Pr() is the probability function, X is an observation from a normally distributed random variable, ΞΌ (mu) is the mean of the distribution, and Οƒ (sigma) is its standard deviation:

Pr(ΞΌ βˆ’ 1Οƒ ≀ X ≀ ΞΌ + 1Οƒ) β‰ˆ 68.27%
Pr(ΞΌ βˆ’ 2Οƒ ≀ X ≀ ΞΌ + 2Οƒ) β‰ˆ 95.45%
Pr(ΞΌ βˆ’ 3Οƒ ≀ X ≀ ΞΌ + 3Οƒ) β‰ˆ 99.73%
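As a quick sanity check (an illustration, not part of the article), these percentages follow from Pr(|X βˆ’ ΞΌ| ≀ kΟƒ) = erf(k/√2) for a normal variable, which Python's standard library can evaluate directly:

```python
from math import erf, sqrt

# For a normal distribution, Pr(|X - mu| <= k*sigma) = erf(k / sqrt(2)).
for k in (1, 2, 3):
    print(f"{k} sigma: {erf(k / sqrt(2)):.4%}")
# 1 sigma: 68.2689%   2 sigma: 95.4500%   3 sigma: 99.7300%
```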

The usefulness of this heuristic especially depends on the question under consideration.

In the empirical sciences, the so-called three-sigma rule of thumb (or 3Οƒ rule) expresses a conventional heuristic that nearly all values are taken to lie within three standard deviations of the mean, and thus it is empirically useful to treat 99.7% probability as near certainty.

In the social sciences, a result may be considered "significant" if its confidence level is of the order of a two-sigma effect (95%), while in particle physics, there is a convention of a five-sigma effect (99.99994% confidence) being required to qualify as a discovery.

A weaker three-sigma rule can be derived from Chebyshev's inequality, stating that even for non-normally distributed variables, at least 88.8% of cases should fall within properly calculated three-sigma intervals. For unimodal distributions, the probability of being within the interval is at least 95% by the Vysochanskij–Petunin inequality. There may be certain assumptions for a distribution that force this probability to be at least 98%.
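For illustration (assuming the standard forms of the two inequalities), the bounds quoted in this paragraph can be computed directly:

```python
# Chebyshev: Pr(|X - mu| >= k*sigma) <= 1/k**2 for any finite-variance distribution.
# Vysochanskij-Petunin (unimodal): Pr(|X - mu| >= k*sigma) <= 4/(9*k**2) for k > sqrt(8/3).
for k in (2, 3):
    print(f"k={k}: Chebyshev >= {1 - 1/k**2:.1%}, Vysochanskij-Petunin >= {1 - 4/(9*k**2):.1%}")
# k=3 gives the ~88.9% and ~95.1% figures cited above.
```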

πŸ”— Winsorized Mean

πŸ”— Statistics

A winsorized mean is a statistical measure of central tendency, much like the mean and median, and even more similar to the truncated mean. It is the mean computed after winsorizing: replacing given parts of a probability distribution or sample at the high and low end with the most extreme remaining values, typically for an equal amount at both extremes; often 10 to 25 percent of each end is replaced. The winsorized mean can equivalently be expressed as a weighted average of the truncated mean and the quantiles at which the sample is clipped.
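A minimal sketch of a symmetric 10% winsorized mean (the `winsorized_mean` helper is hypothetical, written with NumPy for brevity):

```python
import numpy as np

def winsorized_mean(x, proportion=0.10):
    """Mean after clamping the lowest and highest `proportion` of values
    to the nearest remaining order statistics (symmetric winsorization)."""
    x = np.sort(np.asarray(x, dtype=float))
    g = int(proportion * len(x))   # number of values replaced at each end
    if g > 0:
        x[:g] = x[g]               # clamp the low tail up
        x[-g:] = x[-g - 1]         # clamp the high tail down
    return x.mean()

# The outliers 1 and 95 are pulled in, giving 8.5 rather than the raw mean 16.4.
print(winsorized_mean([1, 5, 6, 7, 8, 9, 10, 11, 12, 95], 0.10))
```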

πŸ”— Delphi Method

πŸ”— Statistics πŸ”— Futures studies

The Delphi method or Delphi technique (DEL-fy; also known as Estimate-Talk-Estimate or ETE) is a structured communication technique, originally developed as a systematic, interactive forecasting method that relies on a panel of experts. The technique can also be adapted for use in face-to-face meetings, and is then called mini-Delphi. Delphi has been widely used for business forecasting and has certain advantages over another structured forecasting approach, prediction markets.

Delphi is based on the principle that forecasts (or decisions) from a structured group of individuals are more accurate than those from unstructured groups. The experts answer questionnaires in two or more rounds. After each round, a facilitator or change agent provides an anonymised summary of the experts' forecasts from the previous round as well as the reasons they provided for their judgments. Thus, experts are encouraged to revise their earlier answers in light of the replies of other members of their panel. It is believed that during this process the range of the answers will decrease and the group will converge towards the "correct" answer. Finally, the process is stopped after a predefined stop criterion (e.g., number of rounds, achievement of consensus, stability of results), and the mean or median scores of the final rounds determine the results.
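As a rough sketch of the aggregation-and-stopping logic only (real Delphi rounds also feed back anonymised reasoning, which no code captures; the numeric forecasts and the stability criterion below are illustrative assumptions):

```python
import statistics

def delphi_rounds(estimates_by_round, tolerance=0.05, max_rounds=4):
    """Return round-by-round medians, stopping once the median moves by
    less than `tolerance` (relative) between rounds."""
    medians = []
    for round_estimates in estimates_by_round[:max_rounds]:
        medians.append(statistics.median(round_estimates))
        if len(medians) > 1 and abs(medians[-1] - medians[-2]) <= tolerance * abs(medians[-2]):
            break  # stability-of-results stop criterion
    return medians

# Hypothetical panel of five experts revising a forecast over three rounds:
print(delphi_rounds([[40, 70, 55, 90, 30], [50, 65, 55, 75, 45], [55, 60, 55, 70, 50]]))
```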

Special attention has to be paid to the formulation of the Delphi theses and the definition and selection of the experts in order to avoid methodological weaknesses that severely threaten the validity and reliability of the results.

πŸ”— Secretary Problem

πŸ”— Mathematics πŸ”— Statistics

The secretary problem is a problem that demonstrates a scenario involving optimal stopping theory. The problem has been studied extensively in the fields of applied probability, statistics, and decision theory. It is also known as the marriage problem, the sultan's dowry problem, the fussy suitor problem, the googol game, and the best choice problem.

The basic form of the problem is the following: imagine an administrator who wants to hire the best secretary out of n rankable applicants for a position. The applicants are interviewed one by one in random order. A decision about each particular applicant is to be made immediately after the interview. Once rejected, an applicant cannot be recalled. During the interview, the administrator gains information sufficient to rank the applicant among all applicants interviewed so far, but is unaware of the quality of yet unseen applicants. The question is about the optimal strategy (stopping rule) to maximize the probability of selecting the best applicant. If the decision could be deferred to the end, the problem would be trivially solved by tracking the running maximum (and who achieved it) and selecting the overall maximum at the end. The difficulty is that the decision must be made immediately.

The shortest rigorous proof known so far is provided by the odds algorithm (Bruss 2000). It implies that the optimal win probability is always at least 1/e (where e is the base of the natural logarithm), and that this bound holds in much greater generality (2003). The optimal stopping rule prescribes always rejecting the first ~n/e applicants interviewed and then stopping at the first applicant who is better than every applicant interviewed so far (or continuing to the last applicant if this never occurs). This strategy is sometimes called the 1/e stopping rule, because the probability of stopping at the best applicant with it is about 1/e already for moderate values of n. One reason the secretary problem has received so much attention is that the optimal policy (the stopping rule) is simple and selects the single best candidate about 37% of the time, irrespective of whether there are 100 or 100 million applicants.
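A short simulation of the ~n/e rejection strategy (illustrative, not from the article) reproduces the β‰ˆ37% success rate:

```python
import math
import random

def secretary_trial(n):
    """One trial: reject the first ~n/e applicants, then take the first
    applicant better than everyone seen so far. True if the best was chosen."""
    ranks = list(range(n))           # n distinct qualities, higher is better
    random.shuffle(ranks)
    r = round(n / math.e)            # size of the rejection phase
    threshold = max(ranks[:r]) if r > 0 else -1
    for quality in ranks[r:]:
        if quality > threshold:
            return quality == n - 1  # stopped; was it the overall best?
    return ranks[-1] == n - 1        # forced to take the last applicant

trials = 100_000
print(sum(secretary_trial(100) for _ in range(trials)) / trials)  # near 1/e ~ 0.368
```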

πŸ”— The Erlang Distribution

πŸ”— Statistics

The Erlang distribution is a two-parameter family of continuous probability distributions with support x ∈ [0, ∞). The two parameters are:

  β€’ a positive integer k, the "shape", and
  β€’ a positive real number Ξ», the "rate". The "scale" ΞΌ, the reciprocal of the rate, is sometimes used instead.

The Erlang distribution with shape parameter k = 1 simplifies to the exponential distribution. It is a special case of the gamma distribution. It is the distribution of a sum of k independent exponential variables with mean 1/Ξ» each.
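That sum-of-exponentials characterisation gives a one-line sampler; a small sketch (illustrative) checks that the sample mean matches k/Ξ»:

```python
import random

def erlang_sample(k, lam):
    """Draw from Erlang(k, lam) as the sum of k independent
    Exponential(lam) variables (mean 1/lam each)."""
    return sum(random.expovariate(lam) for _ in range(k))

k, lam, n = 3, 2.0, 200_000
samples = [erlang_sample(k, lam) for _ in range(n)]
print(sum(samples) / n)   # should be near k/lam = 1.5
```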

The Erlang distribution was developed by A. K. Erlang to examine the number of telephone calls which might be made at the same time to the operators of the switching stations. This work on telephone traffic engineering has been expanded to consider waiting times in queueing systems in general. The distribution is also used in the field of stochastic processes.

πŸ”— Galton Board

πŸ”— Statistics

The bean machine, also known as the Galton Board or quincunx, is a device invented by Sir Francis Galton to demonstrate the central limit theorem, in particular that with sufficient sample size the binomial distribution approximates a normal distribution. Among its applications, it afforded insight into regression to the mean or "regression to mediocrity".
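A text-mode sketch of the idea (illustrative only): each ball makes an independent left/right choice at every row of pegs, so the bin counts follow Binomial(rows, 1/2) and the printed histogram traces the familiar bell shape:

```python
import random
from collections import Counter

def galton(rows=12, balls=10_000):
    """Each ball takes `rows` fair left/right steps; final bin counts are
    Binomial(rows, 1/2), approximating a normal curve for large `rows`."""
    bins = Counter(sum(random.random() < 0.5 for _ in range(rows)) for _ in range(balls))
    for slot in range(rows + 1):
        print(f"{slot:2d} {'#' * (bins[slot] // 50)}")

galton()
```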

πŸ”— Fuzzy Logic

πŸ”— Mathematics πŸ”— Philosophy πŸ”— Statistics

Fuzzy logic is a form of many-valued logic in which the truth values of variables may be any real number between 0 and 1, inclusive. It is employed to handle the concept of partial truth, where the truth value may range between completely true and completely false. By contrast, in Boolean logic, the truth values of variables may only be the integer values 0 or 1.

The term fuzzy logic was introduced with the 1965 proposal of fuzzy set theory by Lotfi Zadeh. Fuzzy logic had, however, been studied since the 1920s, as infinite-valued logicβ€”notably by Łukasiewicz and Tarski.

Fuzzy logic is based on the observation that people make decisions based on imprecise and non-numerical information. Fuzzy models or sets are mathematical means of representing vagueness and imprecise information (hence the term fuzzy). These models have the capability of recognising, representing, manipulating, interpreting, and utilising data and information that are vague and lack certainty.
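A minimal sketch of Zadeh's standard connectives (one common choice among many t-norms; the function names are illustrative):

```python
def f_not(a): return 1.0 - a
def f_and(a, b): return min(a, b)   # Zadeh's standard t-norm
def f_or(a, b): return max(a, b)    # Zadeh's standard t-conorm

warm, humid = 0.7, 0.4              # partial truths in [0, 1]
print(f_and(warm, humid))           # 0.4: "warm AND humid" is 0.4 true
print(f_or(warm, f_not(humid)))     # 0.7
```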

Fuzzy logic has been applied to many fields, from control theory to artificial intelligence.

πŸ”— Curse of dimensionality

πŸ”— Computing πŸ”— Mathematics πŸ”— Statistics πŸ”— Cognitive science

The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces (often with hundreds or thousands of dimensions) that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience. The expression was coined by Richard E. Bellman when considering problems in dynamic programming.

Cursed phenomena occur in domains such as numerical analysis, sampling, combinatorics, machine learning, data mining and databases. The common theme of these problems is that when the dimensionality increases, the volume of the space increases so fast that the available data become sparse. This sparsity is problematic for any method that requires statistical significance. In order to obtain a statistically sound and reliable result, the amount of data needed to support the result often grows exponentially with the dimensionality. Also, organizing and searching data often relies on detecting areas where objects form groups with similar properties; in high dimensional data, however, all objects appear to be sparse and dissimilar in many ways, which prevents common data organization strategies from being efficient.
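The "sparse and dissimilar" effect can be seen directly: for random points in the unit cube, the ratio of the farthest to the nearest distance from the origin shrinks toward 1 as the dimension grows, so nearest-neighbour contrast vanishes (a rough illustration, not from the article):

```python
import math
import random

def distance_spread(dim, n_points=500):
    """Ratio of farthest to nearest distance from the origin for random
    points in the unit cube; it approaches 1 as `dim` grows."""
    dists = [math.sqrt(sum(random.random() ** 2 for _ in range(dim)))
             for _ in range(n_points)]
    return max(dists) / min(dists)

for dim in (2, 10, 100, 1000):
    print(dim, round(distance_spread(dim), 2))
```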

πŸ”— History of the Monte Carlo method

πŸ”— Computing πŸ”— Computer science πŸ”— Mathematics πŸ”— Physics πŸ”— Statistics

Monte Carlo methods, or Monte Carlo experiments, are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. The underlying concept is to use randomness to solve problems that might be deterministic in principle. They are often used in physical and mathematical problems and are most useful when it is difficult or impossible to use other approaches. Monte Carlo methods are mainly used in three problem classes: optimization, numerical integration, and generating draws from a probability distribution.
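The classic toy example of such sampling (illustrative): estimate Ο€ by integrating the quarter-circle indicator over random points in the unit square, where the hit fraction converges to Ο€/4.

```python
import random

# Points (x, y) uniform on the unit square; x**2 + y**2 <= 1 marks the quarter circle.
n = 1_000_000
inside = sum(random.random() ** 2 + random.random() ** 2 <= 1.0 for _ in range(n))
print(4 * inside / n)   # approaches 3.14159... as n grows
```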

In physics-related problems, Monte Carlo methods are useful for simulating systems with many coupled degrees of freedom, such as fluids, disordered materials, strongly coupled solids, and cellular structures (see cellular Potts model, interacting particle systems, McKean–Vlasov processes, kinetic models of gases).

Other examples include modeling phenomena with significant uncertainty in inputs such as the calculation of risk in business and, in mathematics, evaluation of multidimensional definite integrals with complicated boundary conditions. In application to systems engineering problems (space, oil exploration, aircraft design, etc.), Monte Carlo–based predictions of failure, cost overruns and schedule overruns are routinely better than human intuition or alternative "soft" methods.

In principle, Monte Carlo methods can be used to solve any problem having a probabilistic interpretation. By the law of large numbers, integrals described by the expected value of some random variable can be approximated by taking the empirical mean (a.k.a. the sample mean) of independent samples of the variable. When the probability distribution of the variable is parameterized, mathematicians often use a Markov chain Monte Carlo (MCMC) sampler. The central idea is to design a judicious Markov chain model with a prescribed stationary probability distribution. That is, in the limit, the samples being generated by the MCMC method will be samples from the desired (target) distribution. By the ergodic theorem, the stationary distribution is approximated by the empirical measures of the random states of the MCMC sampler.
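As a sketch of the MCMC idea described here (a bare-bones random-walk Metropolis sampler, not any particular library's API; the target is a standard normal specified by its log-density up to a constant):

```python
import math
import random

def metropolis(log_density, x0=0.0, steps=50_000, scale=1.0):
    """Random-walk Metropolis: builds a Markov chain whose stationary
    distribution is the (unnormalised) target density."""
    x, chain = x0, []
    for _ in range(steps):
        proposal = x + random.gauss(0.0, scale)
        # Accept with probability min(1, target(proposal) / target(x)).
        if random.random() < math.exp(min(0.0, log_density(proposal) - log_density(x))):
            x = proposal
        chain.append(x)
    return chain

# Per the ergodic theorem quoted above, the empirical mean of the chain
# approximates the expected value under the target distribution.
chain = metropolis(lambda x: -0.5 * x * x)
print(sum(chain) / len(chain))   # near 0
```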

In other problems, the objective is generating draws from a sequence of probability distributions satisfying a nonlinear evolution equation. These flows of probability distributions can always be interpreted as the distributions of the random states of a Markov process whose transition probabilities depend on the distributions of the current random states (see McKean–Vlasov processes, nonlinear filtering equation). In other instances we are given a flow of probability distributions with an increasing level of sampling complexity (path spaces models with an increasing time horizon, Boltzmann–Gibbs measures associated with decreasing temperature parameters, and many others). These models can also be seen as the evolution of the law of the random states of a nonlinear Markov chain. A natural way to simulate these sophisticated nonlinear Markov processes is to sample multiple copies of the process, replacing in the evolution equation the unknown distributions of the random states by the sampled empirical measures. In contrast with traditional Monte Carlo and MCMC methodologies, these mean-field particle techniques rely on sequential interacting samples. The terminology mean field reflects the fact that each of the samples (a.k.a. particles, individuals, walkers, agents, creatures, or phenotypes) interacts with the empirical measures of the process. When the size of the system tends to infinity, these random empirical measures converge to the deterministic distribution of the random states of the nonlinear Markov chain, so that the statistical interaction between particles vanishes.

Despite its conceptual and algorithmic simplicity, the computational cost associated with a Monte Carlo simulation can be staggeringly high. In general the method requires many samples to get a good approximation, which may incur an arbitrarily large total runtime if the processing time of a single sample is high. Although this is a severe limitation in very complex problems, the embarrassingly parallel nature of the algorithm allows this large cost to be reduced (perhaps to a feasible level) through parallel computing strategies in local processors, clusters, cloud computing, GPU, FPGA, etc.
