Topic: Statistics (Page 2)

You are looking at all articles with the topic "Statistics". We found 65 matches.


πŸ”— Anna Karenina Principle

πŸ”— Statistics πŸ”— Sociology

The Anna Karenina principle states that a deficiency in any one of a number of factors dooms an endeavor to failure. Consequently, a successful endeavor (subject to this principle) is one where every possible deficiency has been avoided.

The name of the principle derives from Leo Tolstoy's book Anna Karenina, which begins:

All happy families are alike; each unhappy family is unhappy in its own way.

In other words: happy families share a common set of attributes which lead to happiness, while any of a variety of attributes can cause an unhappy family. This concept has been generalized to apply to several fields of study.

In statistics, the term Anna Karenina principle is used to describe significance tests: there are any number of ways in which a dataset may violate the null hypothesis and only one in which all the assumptions are satisfied.


πŸ”— Kalman Filter

πŸ”— Mathematics πŸ”— Statistics πŸ”— Systems πŸ”— Robotics πŸ”— Systems/Control theory

In statistics and control theory, Kalman filtering, also known as linear quadratic estimation (LQE), is an algorithm that uses a series of measurements observed over time, containing statistical noise and other inaccuracies, and produces estimates of unknown variables that tend to be more accurate than those based on a single measurement alone, by estimating a joint probability distribution over the variables for each timeframe. The filter is named after Rudolf E. KΓ‘lmΓ‘n, one of the primary developers of its theory.

The Kalman filter has numerous applications in technology. A common application is for guidance, navigation, and control of vehicles, particularly aircraft, spacecraft and dynamically positioned ships. Furthermore, the Kalman filter is a widely applied concept in time series analysis used in fields such as signal processing and econometrics. Kalman filters also are one of the main topics in the field of robotic motion planning and control and can be used in trajectory optimization. The Kalman filter also works for modeling the central nervous system's control of movement. Due to the time delay between issuing motor commands and receiving sensory feedback, use of the Kalman filter supports a realistic model for making estimates of the current state of the motor system and issuing updated commands.

The algorithm works in a two-step process. In the prediction step, the Kalman filter produces estimates of the current state variables, along with their uncertainties. Once the outcome of the next measurement (necessarily corrupted with some amount of error, including random noise) is observed, these estimates are updated using a weighted average, with more weight being given to estimates with higher certainty. The algorithm is recursive. It can run in real time, using only the present input measurements and the previously calculated state and its uncertainty matrix; no additional past information is required.
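
The two-step recursion is easiest to see in one dimension. Below is a minimal sketch of a scalar Kalman filter under assumptions not stated above: a random-walk state with process-noise variance q and direct measurements with noise variance r. The function and parameter names (kalman_step, x, p, z, q, r) are illustrative only.

```python
def kalman_step(x, p, z, q, r):
    """One predict/update cycle of a scalar Kalman filter.

    x, p : prior state estimate and its variance
    z    : new (noisy) measurement
    q, r : process-noise and measurement-noise variances
    """
    # Prediction step: project the state forward (random-walk model, so the
    # estimate itself is unchanged) and grow its uncertainty.
    x_pred = x
    p_pred = p + q

    # Update step: weighted average of prediction and measurement, with the
    # weight (the Kalman gain) set by the relative uncertainties.
    k = p_pred / (p_pred + r)          # Kalman gain, between 0 and 1
    x_new = x_pred + k * (z - x_pred)  # more weight to the more certain source
    p_new = (1 - k) * p_pred           # updated uncertainty
    return x_new, p_new


# Usage: filter a noisy, roughly constant signal. Only the previous estimate
# and its variance are carried forward, as described above.
x, p = 0.0, 1.0
for z in [1.2, 0.9, 1.1, 1.05, 0.95]:
    x, p = kalman_step(x, p, z, q=1e-4, r=0.25)
print(x, p)
```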

Optimality of the Kalman filter assumes that the errors are Gaussian. In the words of Rudolf E. KΓ‘lmΓ‘n: "In summary, the following assumptions are made about random processes: Physical random phenomena may be thought of as due to primary random sources exciting dynamic systems. The primary sources are assumed to be independent gaussian random processes with zero mean; the dynamic systems will be linear." Regardless of Gaussianity, however, if the process and measurement covariances are known, the Kalman filter is the best possible linear estimator in the minimum mean-square-error sense.

Extensions and generalizations of the method have also been developed, such as the extended Kalman filter and the unscented Kalman filter, which work on nonlinear systems. The underlying model is a hidden Markov model where the state space of the latent variables is continuous and all latent and observed variables have Gaussian distributions. Kalman filtering has also been used successfully in multi-sensor fusion and in distributed sensor networks to develop distributed or consensus Kalman filters.


πŸ”— Jaccard Index

πŸ”— Computer science πŸ”— Statistics

The Jaccard index, also known as the Jaccard similarity coefficient, is a statistic used for gauging the similarity and diversity of sample sets. It was developed by Grove Karl Gilbert in 1884 as his ratio of verification (v) and is now frequently referred to as the Critical Success Index in meteorology. It was later developed independently by Paul Jaccard, who originally gave it the French name coefficient de communautΓ©, and independently formulated again by T. Tanimoto, so the names Tanimoto index and Tanimoto coefficient are also used in some fields. They are, however, generally identical, taking the ratio of intersection over union. The Jaccard coefficient measures similarity between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}.$$

Note that by design, $0 \leq J(A, B) \leq 1$; if A and B are disjoint (and not both empty), then J(A, B) = 0. The Jaccard coefficient is widely used in computer science, ecology, genomics, and other sciences where binary or binarized data are used. Both exact and approximate methods are available for hypothesis testing with the Jaccard coefficient.

Jaccard similarity also applies to bags, i.e., multisets. This has a similar formula, but the symbols mean bag intersection and bag sum (not union); the maximum value is 1/2.

$$J(A, B) = \frac{|A \cap B|}{|A \uplus B|} = \frac{|A \cap B|}{|A| + |B|}.$$

The Jaccard distance, which measures dissimilarity between sample sets, is complementary to the Jaccard coefficient and is obtained by subtracting the Jaccard coefficient from 1, or, equivalently, by dividing the difference of the sizes of the union and the intersection of two sets by the size of the union:

$$d_J(A, B) = 1 - J(A, B) = \frac{|A \cup B| - |A \cap B|}{|A \cup B|}.$$

An alternative interpretation of the Jaccard distance is as the ratio of the size of the symmetric difference $A \triangle B = (A \cup B) - (A \cap B)$ to the size of the union. Jaccard distance is commonly used to calculate an n Γ— n matrix for clustering and multidimensional scaling of n sample sets.

This distance is a metric on the collection of all finite sets.
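
As a concrete sketch of the set definitions above (the helper names jaccard and jaccard_distance are illustrative, and returning 1 for two empty sets is one common convention for the otherwise undefined 0/0 case):

```python
def jaccard(a, b):
    """Jaccard similarity coefficient of two finite sets: |A ∩ B| / |A βˆͺ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention for the 0/0 case
    return len(a & b) / len(a | b)

def jaccard_distance(a, b):
    """Jaccard distance: 1 minus the Jaccard coefficient."""
    return 1.0 - jaccard(a, b)

# Usage: the sets share 2 of 4 distinct elements.
print(jaccard({1, 2, 3}, {2, 3, 4}))           # 0.5
print(jaccard_distance({1, 2, 3}, {2, 3, 4}))  # 0.5
```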

There is also a version of the Jaccard distance for measures, including probability measures. If $\mu$ is a measure on a measurable space $X$, then we define the Jaccard coefficient by

$$J_\mu(A, B) = \frac{\mu(A \cap B)}{\mu(A \cup B)},$$

and the Jaccard distance by

$$d_\mu(A, B) = 1 - J_\mu(A, B) = \frac{\mu(A \triangle B)}{\mu(A \cup B)}.$$

Care must be taken if $\mu(A \cup B) = 0$ or $\infty$, since these formulas are not well defined in those cases.

The MinHash min-wise independent permutations locality sensitive hashing scheme may be used to efficiently compute an accurate estimate of the Jaccard similarity coefficient of pairs of sets, where each set is represented by a constant-sized signature derived from the minimum values of a hash function.
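
A minimal sketch of the MinHash idea follows, with salted hashes standing in for the min-wise independent permutations; the names minhash_signature and estimate_jaccard are illustrative, not from a particular library.

```python
import hashlib
import random

def minhash_signature(items, num_hashes=128, seed=0):
    """Constant-size signature: the minimum salted hash value per 'hash function'."""
    rng = random.Random(seed)
    salts = [rng.getrandbits(64) for _ in range(num_hashes)]
    signature = []
    for salt in salts:
        minimum = min(
            int.from_bytes(
                hashlib.blake2b(f"{salt}:{item}".encode(), digest_size=8).digest(),
                "big")
            for item in items)
        signature.append(minimum)
    return signature

def estimate_jaccard(sig_a, sig_b):
    """Fraction of positions where the two signatures agree estimates J(A, B)."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

# Usage: two overlapping sets with true J(A, B) = 100/300 = 1/3; the estimate
# approaches this value as the number of hash functions grows.
A = {f"item{i}" for i in range(0, 200)}
B = {f"item{i}" for i in range(100, 300)}
print(estimate_jaccard(minhash_signature(A), minhash_signature(B)))
```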


πŸ”— United States incarceration rate

πŸ”— United States πŸ”— Crime πŸ”— Statistics

In September 2013, the incarceration rate of the United States of America was the highest in the world at 716 per 100,000 of the national population. While the United States represents about 4.4 percent of the world's population, it houses around 22 percent of the world's prisoners. Corrections (which includes prisons, jails, probation, and parole) cost around $74 billion in 2007 according to the U.S. Bureau of Justice Statistics.

At the end of 2016, the Prison Policy Initiative estimated that in the United States, about 2,298,300 people were incarcerated out of a population of 324.2 million. This means that 0.7% of the population was behind bars. Of those who were incarcerated, about 1,316,000 people were in state prison, 615,000 in local jails, 225,000 in federal prisons, 48,000 in youth correctional facilities, 34,000 in immigration detention camps, 22,000 in involuntary commitment, 11,000 in territorial prisons, 2,500 in Indian Country jails, and 1,300 in United States military prisons.


πŸ”— Chernoff face

πŸ”— Mathematics πŸ”— Statistics

Chernoff faces, invented by Herman Chernoff in 1973, display multivariate data in the shape of a human face. The individual parts, such as the eyes, ears, mouth, and nose, represent values of the variables by their shape, size, placement, and orientation. The idea behind using faces is that humans easily recognize faces and notice small changes in them without difficulty. Chernoff faces handle each variable differently. Because the features of the faces vary in perceived importance, the way in which variables are mapped to the features should be carefully chosen (e.g. eye size and eyebrow slant have been found to carry significant weight).
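
A minimal sketch of the mapping idea using matplotlib, with a hypothetical choice of four variables (each scaled to [0, 1]) controlling face width, eye size, eyebrow slant, and mouth curvature; this is an illustration of the principle, not Chernoff's original encoding.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse

def chernoff_face(ax, face_width, eye_size, brow_slant, mouth_curve):
    """Draw one face whose features encode four variables scaled to [0, 1]."""
    # Face outline: width varies with the first variable.
    ax.add_patch(Ellipse((0, 0), 1.0 + face_width, 2.0, fill=False))
    # Eyes: size varies with the second variable.
    for x in (-0.35, 0.35):
        ax.add_patch(Ellipse((x, 0.3), 0.1 + 0.2 * eye_size,
                             0.1 + 0.1 * eye_size, fill=False))
    # Eyebrows: slant varies with the third variable.
    tilt = 0.2 * (brow_slant - 0.5)
    ax.plot([-0.5, -0.2], [0.55 - tilt, 0.55 + tilt], "k-")
    ax.plot([0.2, 0.5], [0.55 + tilt, 0.55 - tilt], "k-")
    # Mouth: larger values curve the mouth into a smile, smaller into a frown.
    xs = np.linspace(-0.4, 0.4, 50)
    ax.plot(xs, -0.5 - (mouth_curve - 0.5) * (0.4 - xs**2 / 0.4), "k-")
    ax.set_xlim(-1.5, 1.5); ax.set_ylim(-1.5, 1.5)
    ax.set_aspect("equal"); ax.axis("off")

# Usage: three hypothetical observations, one face each.
fig, axes = plt.subplots(1, 3, figsize=(9, 3))
for ax, row in zip(axes, [(0.2, 0.8, 0.9, 0.9), (0.8, 0.3, 0.1, 0.2),
                          (0.5, 0.5, 0.5, 0.5)]):
    chernoff_face(ax, *row)
plt.show()
```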


πŸ”— Berkson's Paradox

πŸ”— Statistics

Berkson's paradox, also known as Berkson's bias or Berkson's fallacy, is a result in conditional probability and statistics which is often found counterintuitive, and hence is a veridical paradox. It is a complicating factor arising in statistical tests of proportions. Specifically, it arises when there is an ascertainment bias inherent in a study design. The effect is related to the explaining-away phenomenon in Bayesian networks, and to conditioning on a collider in graphical models.

It is often described in the fields of medical statistics or biostatistics, as in the original description of the problem by Joseph Berkson.
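
A small simulation sketch of the effect, using a stylised hospital-admission setting of my own construction: two conditions that are independent in the population become negatively correlated once we condition on admission (the collider).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Two independent binary traits in the general population.
condition_a = rng.random(n) < 0.10
condition_b = rng.random(n) < 0.10

# Selection: a person appears in the study only if admitted, and admission
# requires at least one of the conditions (admission is the collider).
admitted = condition_a | condition_b

def corr(x, y):
    return np.corrcoef(x.astype(float), y.astype(float))[0, 1]

print("population correlation: ", round(corr(condition_a, condition_b), 3))  # ~0
print("in-hospital correlation:",
      round(corr(condition_a[admitted], condition_b[admitted]), 3))          # negative
```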


πŸ”— Queueing Theory

πŸ”— Computing πŸ”— Mathematics πŸ”— Statistics πŸ”— Systems πŸ”— Systems/Operations research

Queueing theory is the mathematical study of waiting lines, or queues. A queueing model is constructed so that queue lengths and waiting time can be predicted. Queueing theory is generally considered a branch of operations research because the results are often used when making business decisions about the resources needed to provide a service.
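
As a small illustration of the kind of prediction such models give, here is a sketch for the textbook M/M/1 queue (Poisson arrivals at rate Ξ», one server with exponential service at rate ΞΌ); the M/M/1 model is a standard example rather than something described above, mm1_metrics is an illustrative name, and the closed-form results hold only when Ξ» < ΞΌ.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Steady-state metrics of an M/M/1 queue (requires arrival_rate < service_rate)."""
    lam, mu = arrival_rate, service_rate
    if lam >= mu:
        raise ValueError("queue is unstable: arrival rate must be below service rate")
    rho = lam / mu  # server utilisation
    return {
        "utilisation": rho,
        "mean_number_in_system": rho / (1 - rho),   # L
        "mean_number_waiting": rho**2 / (1 - rho),  # Lq
        "mean_time_in_system": 1 / (mu - lam),      # W (Little's law: L = Ξ»W)
        "mean_time_waiting": rho / (mu - lam),      # Wq
    }

# Usage: 9 customers/hour arriving at a server that handles 10/hour.
print(mm1_metrics(9, 10))  # high utilisation leads to long queues and waits
```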

Queueing theory has its origins in research by Agner Krarup Erlang, who created models to describe the system of the Copenhagen Telephone Exchange, a Danish company. The ideas have since seen applications in telecommunication, traffic engineering, and computing, and, particularly in industrial engineering, in the design of factories, shops, offices, and hospitals, as well as in project management.


πŸ”— Nelson Rules

πŸ”— Statistics

Nelson rules are a method in process control of determining whether some measured variable is out of control (unpredictable versus consistent). Rules for detecting "out-of-control" or non-random conditions were first postulated by Walter A. Shewhart in the 1920s. The Nelson rules were first published in the October 1984 issue of the Journal of Quality Technology in an article by Lloyd S. Nelson.

The rules are applied to a control chart on which the magnitude of some variable is plotted against time. The rules are based on the mean value and the standard deviation of the samples.

The eight Nelson rules apply to a chart of a single variable's value.

A second chart, the moving range chart, can also be used, but only with rules 1, 2, 3, and 4. Such a chart plots the moving range (the maximum value minus the minimum value of N adjacent points) against the time at which each range is sampled.

An example moving range: if N = 3 and the values are 1, 3, 5, 3, 3, 2, 4, 5, then the sets of adjacent points are (1,3,5), (3,5,3), (5,3,3), (3,3,2), (3,2,4), (2,4,5), giving moving range values of 5βˆ’1, 5βˆ’3, 5βˆ’3, 3βˆ’2, 4βˆ’2, 5βˆ’2 = 4, 2, 2, 1, 2, 3.
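
A short sketch that reproduces the moving-range computation above (the function name moving_ranges is illustrative):

```python
def moving_ranges(values, n=3):
    """Range (maximum minus minimum) of each window of n adjacent points."""
    return [max(values[i:i + n]) - min(values[i:i + n])
            for i in range(len(values) - n + 1)]

# The example above: N = 3 over the values 1, 3, 5, 3, 3, 2, 4, 5.
print(moving_ranges([1, 3, 5, 3, 3, 2, 4, 5]))  # [4, 2, 2, 1, 2, 3]
```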

Applying these rules indicates when a potential "out of control" situation has arisen. However, there will always be some false alerts, and the more rules that are applied, the more false alerts there will be. For some processes, it may be beneficial to omit one or more rules. Equally, there may be some missed alerts, where a specific "out of control" situation is not detected. Empirically, though, the detection accuracy is good.


πŸ”— Non-transitive dice

πŸ”— Statistics

A set of dice is nontransitive if it contains three dice, A, B, and C, with the property that A rolls higher than B more than half the time, and B rolls higher than C more than half the time, but it is not true that A rolls higher than C more than half the time. In other words, a set of dice is nontransitive if the binary relation – X rolls a higher number than Y more than half the time – on its elements is not transitive.

It is possible to find sets of dice with the even stronger property that, for each die in the set, there is another die that rolls a higher number than it more than half the time. Using such a set of dice, one can invent games which are biased in ways that people unused to nontransitive dice might not expect (see the example below).
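
A brief sketch that checks one nontransitive set by enumerating all face pairs; the three dice used here are a standard textbook example rather than one taken from the text above.

```python
from itertools import product
from fractions import Fraction

def p_beats(x, y):
    """Probability that fair die x rolls strictly higher than fair die y."""
    wins = sum(a > b for a, b in product(x, y))
    return Fraction(wins, len(x) * len(y))

# A classic nontransitive trio: each die beats the next one in the cycle
# A -> B -> C -> A with probability 5/9.
A = [2, 2, 4, 4, 9, 9]
B = [1, 1, 6, 6, 8, 8]
C = [3, 3, 5, 5, 7, 7]

print(p_beats(A, B), p_beats(B, C), p_beats(C, A))  # 5/9 5/9 5/9
```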


πŸ”— Kullback–Leibler Divergence

πŸ”— Mathematics πŸ”— Physics πŸ”— Statistics

In mathematical statistics, the Kullback–Leibler divergence (also called relative entropy and I-divergence), denoted $D_{\text{KL}}(P \parallel Q)$, is a type of statistical distance: a measure of how one probability distribution P is different from a second, reference probability distribution Q. A simple interpretation of the KL divergence of P from Q is the expected excess surprise from using Q as a model when the actual distribution is P. While it is a distance, it is not a metric, the most familiar type of distance: it is not symmetric in the two distributions (in contrast to variation of information), and it does not satisfy the triangle inequality. Instead, in terms of information geometry, it is a type of divergence, a generalization of squared distance, and for certain classes of distributions (notably an exponential family), it satisfies a generalized Pythagorean theorem (which applies to squared distances).

In the simple case, a relative entropy of 0 indicates that the two distributions in question have identical quantities of information. Relative entropy is a nonnegative function of two distributions or measures. It has diverse applications, both theoretical, such as characterizing the relative (Shannon) entropy in information systems, randomness in continuous time-series, and information gain when comparing statistical models of inference; and practical, such as applied statistics, fluid mechanics, neuroscience and bioinformatics.
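
For discrete distributions the divergence is $D_{\text{KL}}(P \parallel Q) = \sum_x P(x)\,\log\frac{P(x)}{Q(x)}$; a minimal sketch using the natural logarithm (so the result is in nats), with kl_divergence as an illustrative name:

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) for discrete distributions given as aligned probability lists.

    Terms with p_i == 0 contribute 0; any q_i == 0 with p_i > 0 makes the
    divergence infinite.
    """
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue
        if q_i == 0:
            return math.inf
        total += p_i * math.log(p_i / q_i)
    return total

# Usage: note the asymmetry described above.
p = [0.5, 0.4, 0.1]
q = [0.4, 0.4, 0.2]
print(kl_divergence(p, q), kl_divergence(q, p))  # the two values differ
```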
