Statistics 101

• Published on
20-Jul-2015

A little bit of statistics

P( waow | news ) =?

Posterior probability

In case of independent items,

P(Observations | θ) = P(Observation1 | θ) x P(Observation2 | θ) x ... x P(ObservationZ | θ)

Bayes theorem

Bayes: P(θ | observations) x P(observations) = P(observations | θ) x P(θ)

So: P(θ | observations) = P(observations | θ) x P(θ) / P(observations)

So, by independent items + Bayes,

P(θ | observations) is proportional to P(θ) x P(obs1 | θ) x ... x P(obsZ | θ)

Definitions:

MAP (maximum a posteriori): find θ* such that P(θ* | observations) is maximal

BPE (Bayesian posterior expectation): find θE = expectation of (θ | observations)

Maximum likelihood: MAP with P(θ) uniform

there are other possible tools

ErrorEstimate = Expectation of (θ - estimator)^2
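As a minimal sketch of these definitions, the snippet below evaluates a posterior on a grid for a toy problem that is not from the slides: θ is an unknown success probability, the observations are independent successes and failures, and the prior is uniform (so MAP coincides with maximum likelihood). The names `grid_posterior` and `expected_sq_error` and the grid size are illustrative choices.

```python
def grid_posterior(successes, failures, prior=None, n_grid=1001):
    """Posterior of a success probability theta on a regular grid over [0, 1]."""
    thetas = [k / (n_grid - 1) for k in range(n_grid)]
    if prior is None:
        prior = [1.0] * n_grid  # uniform prior: MAP = maximum likelihood
    # Independent observations: the likelihood is a product of per-item terms.
    unnorm = [p * t**successes * (1 - t)**failures for p, t in zip(prior, thetas)]
    total = sum(unnorm)  # plays the role of P(observations)
    return thetas, [u / total for u in unnorm]

thetas, post = grid_posterior(successes=3, failures=1)

# MAP: the theta with the largest posterior probability.
theta_map = max(zip(post, thetas))[1]

# BPE: the posterior expectation of theta.
theta_bpe = sum(t * w for t, w in zip(thetas, post))

def expected_sq_error(estimate):
    """Posterior expectation of (theta - estimate)^2."""
    return sum(w * (t - estimate) ** 2 for t, w in zip(thetas, post))
```

In this toy case the two estimators differ (the MAP sits at the posterior peak, the BPE at the posterior mean), and the BPE has the smaller expected squared error, which is one reason to prefer it when that criterion matters.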

log-likelihood

Instead of probas, use log-probas.

Because: products become sums ==> more precise on a computer for very small probabilities
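A small illustration of why log-probabilities help, assuming 100 independent observations that each have probability 1e-5 (made-up numbers): the plain product underflows to zero in double precision, while the sum of logs stays finite and can still be compared across parameter values.

```python
import math

probs = [1e-5] * 100  # 100 independent observations with tiny probabilities

product = 1.0
for p in probs:
    product *= p  # the true value 1e-500 is below the smallest double

log_likelihood = sum(math.log(p) for p in probs)  # finite, about -1151.3
```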

Finding the MAP (or other estimates)

Dimension 1: Golden Search (unimodal)

Grid Search (multimodal, slow)

Robust search (compromise)

Newton-Raphson (unimodal, precise, expensive computations)

Large dimension: Jacobi algorithm

Or Gauss-Seidel, or Newton, or NEWUOA, or ...
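For the one-dimensional unimodal case, golden search can be sketched as follows. This is a generic golden-section search under the assumption that the function has a single maximum on the interval; the function name, signature and tolerance are illustrative, not taken from the slides.

```python
import math

def golden_section_max(f, lo, hi, tol=1e-6):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    x1 = hi - inv_phi * (hi - lo)
    x2 = lo + inv_phi * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:
            # The maximum cannot be in [lo, x1]: drop that part.
            lo, x1, f1 = x1, x2, f2
            x2 = lo + inv_phi * (hi - lo)
            f2 = f(x2)
        else:
            # The maximum cannot be in [x2, hi]: drop that part.
            hi, x2, f2 = x2, x1, f1
            x1 = hi - inv_phi * (hi - lo)
            f1 = f(x1)
    return (lo + hi) / 2
```

Each iteration reuses one of the two previous evaluations, so only one new call to `f` is needed per step. For a log-posterior, `f` would be the log-probability as a function of the single parameter being optimized.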

Jacobi algorithm for maximizing in dimension D>1

x=clever initialization, if possible

While ( ||x' - x|| > epsilon ):

    x' = current x

    For each parameter x(i), optimize it by a 1-dim algorithm with just a few iterates

Jacobi = great when the objective function can be restricted to 1 parameter and is then much faster to evaluate
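The loop above can be sketched in Python as follows. Several choices here are assumptions rather than the slides' method: the 1D optimizer is a plain ternary search on a fixed-radius interval around the current value, it runs more than "a few" iterates to keep the sketch simple, and coordinates are updated in place (which is strictly closer to Gauss-Seidel; a strict Jacobi variant would optimize every coordinate against x').

```python
def maximize_1d(g, lo, hi, iterations=50):
    """Ternary search on [lo, hi]; assumes g is unimodal on that interval."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if g(m1) < g(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

def coordinate_maximize(f, x0, radius=5.0, epsilon=1e-6, max_sweeps=1000):
    """Maximize f over a list of parameters, one coordinate at a time."""
    x = list(x0)
    for _ in range(max_sweeps):
        previous = list(x)  # x' = current x
        for i in range(len(x)):
            def along_axis(v, i=i):
                # f restricted to parameter i, the others held fixed
                return f(x[:i] + [v] + x[i + 1:])
            x[i] = maximize_1d(along_axis, x[i] - radius, x[i] + radius)
        change = max(abs(a - b) for a, b in zip(x, previous))
        if change <= epsilon:  # stop when ||x' - x|| <= epsilon
            break
    return x
```

In an IRT setting the restriction to one parameter is where the speed comes from: changing one student's ability only affects the likelihood terms of that student, so `along_axis` could evaluate just those terms instead of the full objective.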

Jacobi algorithm for maximizing in dimension D>1

x=clever initialization, if possible

While ( ||x' - x|| > epsilon ):

    x' = current x

    For each parameter x(i), optimize it with one iteration of robust search

    But don't decrease the interval if the optimum is close to the current bounds

Possible use

Computing student's abilities, given item parameters

Computing item parameters, given student abilities

Computing both item parameters and student abilities (need plenty of data)
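As a sketch of the first use case (abilities given item parameters), the code below computes the MAP ability of one student under a 3PL model. It assumes the common logistic form of the 3PL curve without the 1.7 scaling constant, a Gaussian prior on the ability, and a grid search; the function names, the grid bounds and the prior parameters are illustrative.

```python
import math

def p_correct_3pl(theta, a, b, c):
    """3PL probability of a correct answer (logistic form, no 1.7 scaling)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def log_posterior(theta, responses, items, mu=0.0, sigma=1.0):
    """Log of prior x likelihood for one student, up to an additive constant."""
    value = -0.5 * ((theta - mu) / sigma) ** 2  # Gaussian prior (assumed)
    for correct, (a, b, c) in zip(responses, items):
        p = p_correct_3pl(theta, a, b, c)
        value += math.log(p if correct else 1.0 - p)
    return value

def map_ability(responses, items, lo=-4.0, hi=4.0, steps=801):
    """Grid search for the MAP ability (slow, but safe if multimodal)."""
    grid = [lo + k * (hi - lo) / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda t: log_posterior(t, responses, items))
```

For example, with three items `[(1.0, -1.0, 0.2), (1.0, 0.0, 0.2), (1.0, 1.0, 0.2)]`, a student who answers all of them correctly gets a positive MAP ability and a student who fails all of them gets a negative one. Swapping the roles (fixing abilities and searching over a, b, c per item) gives the second use case.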

Priors

How to know P(θ)?

Keep in mind that difficulties and abilities are translation invariant ==> so you need a reference

==> possibly reference = average = 0
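A short check of the translation invariance, assuming the simplest case a=1 and c=0 and made-up values: shifting all abilities and all difficulties by the same constant leaves every success probability unchanged, so the data alone cannot fix the origin. Here the reference is chosen as "average ability = 0", which is one possible reading of the slide.

```python
import math

def p_correct(theta, b):
    """Success probability with a=1, c=0: depends only on theta - b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

abilities = [0.3, 1.2, -0.5]      # hypothetical values
difficulties = [0.1, -0.4, 0.9]   # hypothetical values

# Fix the reference: shift everything so that the average ability is 0.
shift = sum(abilities) / len(abilities)
centered_abilities = [t - shift for t in abilities]
centered_difficulties = [b - shift for b in difficulties]
```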

If you have a big database and trust your model (3PL ?), you can use Jacobi+MAP.

What if you don't like Jacobi's result?

Too slow? (better initialization, larger epsilon, better 1D algorithm, better implementation...)

Epsilon too large?

Maybe you use MAP whereas you want BPE?

==> If you get convergence and don't like the result, it's not because of Jacobi, it's because of the criterion.

Maybe not enough data?

Initializing IRT parameters?

Rough approximations for IRT parameters:

Abilities (θ)

Item parameters (a,b,c in 3PL models)

Priors can be very convenient for that.

Find θ with quantiles!
1. Rank students per performance.
2. Draw the cumulative distribution of abilities.
3. Project each rank onto the abilities axis: worst student at 1/(N+1), medium student in between, best student at N/(N+1).

Equation version for approximating abilities

If you have a prior (e.g. Gaussian), then a simple solution:

Rank students per score on the test (rank 1 = best).

Student i over N is initialized at the prior's quantile 1 - i/(N+1).

E.g. with a Gaussian prior (mu, sigma): ability(i) = mu + sigma * norminv(1 - i/(N+1)), with norminv e.g. as in http://www.wilmott.com/messageview.cfm?catid=10&threadid=38771
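The quantile initialization can be sketched with Python's standard library, where `statistics.NormalDist.inv_cdf` plays the role of norminv. The function name `initial_abilities` and the tie-breaking rule are assumptions made for this example.

```python
from statistics import NormalDist

def initial_abilities(scores, mu=0.0, sigma=1.0):
    """Initial ability per student from the prior's quantiles, by score rank."""
    n = len(scores)
    prior = NormalDist(mu, sigma)
    # Rank 1 = best score; ties are broken by position (an arbitrary choice).
    order = sorted(range(n), key=lambda s: scores[s], reverse=True)
    abilities = [0.0] * n
    for rank, student in enumerate(order, start=1):
        abilities[student] = prior.inv_cdf(1 - rank / (n + 1))
    return abilities
```

With three students the best one is placed at the 0.75 quantile of the prior, the middle one at the median (ability mu) and the worst one at the 0.25 quantile.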

Equation version for approximating item parameters

Much harder! There are formulas based on correlation. It's a very rough approximation. How to estimate b if c=0?

Approximating item parameters

Much harder! There are formulas based on correlation. It's a very rough approximation.

How to estimate b = difficulty if c=0?

Simple solution: assume a=1 (discrimination)

Use the curve, or approximate b = 4.8 x (1/2 - proba(success))
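The linear approximation is a one-liner; the sketch below only assumes that proba(success) is estimated by the observed fraction of correct answers on the item. The function name is hypothetical.

```python
def approx_difficulty(successes, attempts):
    """Rough difficulty from the observed success rate, assuming a=1, c=0."""
    p_success = successes / attempts
    return 4.8 * (0.5 - p_success)
```

An item answered correctly by half of the students gets b = 0, easier items get negative b and harder items positive b. Since the success rate lies in [0, 1], this rule can only produce values between -2.4 and 2.4, which is one reason to treat it as an initialization rather than a final estimate.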

If you know students' abilities, it's much easier
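When abilities are known, the difficulty of one item can be found by a 1D search. The sketch below assumes a=1, c=0 and no prior on b: in that case the likelihood is maximal when the expected number of correct answers equals the observed number, and the expected number decreases as b grows, so bisection is enough. The function name and the search bounds are assumptions.

```python
import math

def difficulty_given_abilities(abilities, correct, lo=-30.0, hi=30.0):
    """Maximum-likelihood difficulty b for one item, assuming a=1 and c=0."""
    observed = sum(correct)
    if observed == 0 or observed == len(correct):
        raise ValueError("all answers identical: b is unbounded without a prior")

    def expected(b):
        # Expected number of correct answers if the difficulty were b.
        return sum(1.0 / (1.0 + math.exp(-(t - b))) for t in abilities)

    for _ in range(100):  # bisection on a monotone function
        mid = (lo + hi) / 2
        if expected(mid) > observed:
            lo = mid  # item looks too easy at this b: raise b
        else:
            hi = mid
    return (lo + hi) / 2
```

The error case is where a prior on b becomes convenient: with a prior, an item that everybody solves (or nobody solves) still gets a finite MAP difficulty.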

And for difficulty of items?
Use curve or approximation...

Codes

IRT in R: there are packages, it's free, and R is a widely supported language for statistics.

IRT in Octave: we started our implementation, but it is still very preliminary:

No handling of missing data (the main strength of IRT) ==> though this would be easy to add

No user-friendly interface to data

Others? I did not check

==> Cross-validation for comparing?

How to get the percentile from the ability

percentile is norm-cdf( (θ* - mu) / sigma ) (some languages have normcdf included)

Slow/precise implementation of norm-cdf: http://stackoverflow.com/questions/2328258/cumulative-normal-distribution-function-in-c-c

Fast implementation of norm-cdf: http://finance.bi.no/~bernt/gcc_prog/recipes/recipes/node23.html

Maybe use a fast exp, if you want to save time :-)
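In Python the percentile formula can be written with the error function from the standard library, as a sketch assuming the abilities follow the Gaussian prior (mu, sigma). The function name is illustrative.

```python
import math

def percentile_from_ability(theta, mu=0.0, sigma=1.0):
    """Fraction of the prior population with ability below theta."""
    z = (theta - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

A student at the prior mean is at the 50th percentile, and a student one sigma above the mean is at about the 84th. `statistics.NormalDist(mu, sigma).cdf(theta)` gives the same value.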