Statistics 101

Published on 20-Jul-2015


A little bit of statistics

P( waow | news ) = ?

Posterior probability

In case of independent items,
P( observations | θ ) = P( observation1 | θ ) x P( observation2 | θ ) x ... x P( observationZ | θ )

Bayes theorem

Bayes:
P( θ | observations ) P( observations ) = P( observations | θ ) P( θ )

So:
P( θ | observations ) = P( observations | θ ) x P( θ ) / P( observations )

So, by independent items + Bayes:
P( θ | observations ) is proportional to
P( θ ) x P( obs1 | θ ) x ... x P( obsZ | θ )

Definitions:
MAP (maximum a posteriori): find θ* such that P( θ* | observations ) is maximal.
BPE (Bayesian posterior expectation): find E = expectation of ( θ | observations ).
Maximum likelihood: MAP with P( θ ) uniform.
There are other possible tools.

ErrorEstimate = Expectation( ( θ - estimator )^2 )

Log-likelihood

Instead of probabilities, use log-probabilities.
Because products become sums ==> more precise on a computer for very small probabilities.

Finding the MAP (or other estimates)

Dimension 1:
- Golden search (unimodal)
- Grid search (multimodal, slow)
- Robust search (compromise)
- Newton-Raphson (unimodal, precise, expensive computations)

Large dimension:
- Jacobi algorithm
- or Gauss-Seidel, or Newton, or NEWUOA, or ...

Jacobi algorithm for maximizing in dimension D > 1

x = clever initialization, if possible
Repeat until ||x' - x|| <= epsilon:
  x' = current x
  For each parameter x(i), optimize it by a 1-dimensional algorithm, with just a few iterates.

Variant: optimize each x(i) with one iteration of robust search, but don't decrease the search interval if the optimum is close to the current bounds.

Jacobi is great when the objective function can be restricted to 1 parameter and then be evaluated much faster.
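Below is a minimal Octave sketch of the Jacobi loop above, assuming a generic objective f (typically a log-likelihood or log-posterior) and using a few golden-section iterates as the 1-D algorithm; the function names, bounds and toy objective are illustrative, not from the original slides.

    1;  % script file (Octave idiom: allows function definitions below)

    function t = golden_1d(g, a, b, iters)
      % Golden-section search for a maximum of g on [a, b] (unimodal case).
      phi = (sqrt(5) - 1) / 2;
      for k = 1:iters
        c = b - phi * (b - a);
        d = a + phi * (b - a);
        if (g(c) > g(d))
          b = d;   % maximum lies in [a, d]
        else
          a = c;   % maximum lies in [c, b]
        endif
      endfor
      t = (a + b) / 2;
    endfunction

    function v = set_coord(v, i, t)  % helper: copy of v with v(i) = t
      v(i) = t;
    endfunction

    function x = jacobi_maximize(f, x0, lo, hi, epsilon)
      x = x0;
      do
        xprev = x;
        for i = 1:numel(x)                  % optimize each parameter in turn
          g = @(t) f(set_coord(x, i, t));   % f restricted to coordinate i
          x(i) = golden_1d(g, lo(i), hi(i), 10);  % just a few 1-D iterates
        endfor
      until (norm(x - xprev) <= epsilon)
    endfunction

    % Toy example: a separable concave objective, maximized at (1, -2).
    f = @(x) -(x(1) - 1)^2 - (x(2) + 2)^2;
    x = jacobi_maximize(f, [0 0], [-5 -5], [5 5], 1e-4)

Strictly speaking, updating x(i) in place within a sweep is the Gauss-Seidel flavour the slides also mention; a pure Jacobi sweep would compute all coordinate updates from xprev.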
Possible uses

- Computing students' abilities, given item parameters.
- Computing item parameters, given students' abilities.
- Computing both item parameters and students' abilities (needs plenty of data).

Priors

How to know P( θ )?
Keep in mind that difficulties and abilities are translation invariant ==> so you need a reference ==> possibly reference = average = 0.
If you have a big database and trust your model (3PL?), you can use Jacobi + MAP.

What if you don't like Jacobi's result?

- Too slow? (better initialization, larger epsilon, better 1-D algorithm, better implementation...)
- Epsilon too large?
- Maybe you use MAP whereas you want BPE? ==> If you get convergence and don't like the result, it's not because of Jacobi, it's because of the criterion.
- Maybe not enough data?

Initializing IRT parameters?

Rough approximations for IRT parameters:
- abilities (θ);
- item parameters (a, b, c in 3PL models).
Priors can be very convenient for that.

Find θ with quantiles!
1. Rank students per performance.
2. Cumulative distribution.
3. Projections.
[Figure: students projected through the prior's cumulative distribution onto the ABILITIES axis; the best student lands at quantile N/(N+1), the worst at 1/(N+1), the medium student in between.]

Equation version for approximating abilities

If you have a prior (e.g. Gaussian), then a simple solution:
- Rank students per score on the test.
- For student i over N, initialize at the prior's quantile 1 - i/(N+1).
E.g. with a Gaussian prior (mu, sigma), then ability(i) = mu + sigma * norminv(1 - i/(N+1)), with norminv e.g. as in http://www.wilmott.com/messageview.cfm?catid=10&threadid=38771 (a sketch is given at the end of this transcript).

Approximating item parameters

Much harder! There are formulas based on correlation; it's a very rough approximation.
How to estimate b = difficulty if c = 0? A simple solution:
- Assume a = 1 (discrimination).
- Use the curve, or approximate b = 4.8 x (1/2 - proba(success)).
If you know students' abilities, it's much easier.
And for the difficulty of items? Use the curve or the approximation...

Codes

IRT in R: there are packages, it's free, and R is a widely supported language for statistics.
IRT in Octave: we started our implementation, but it is still very preliminary:
- no missing data (the main strength of IRT) ==> though this would be easy;
- no user-friendly interface to data.
Others? I did not check.
==> Cross-validation for comparing?

How to get the percentile from the ability

The percentile is normcdf( (theta* - mu) / sigma ). (Some languages have normcdf included.)
Slow/precise implementation of normcdf: http://stackoverflow.com/questions/2328258/cumulative-normal-distribution-function-in-c-c
Fast implementation of normcdf: http://finance.bi.no/~bernt/gcc_prog/recipes/recipes/node23.html
Maybe a fast exp too, if you want to save time :-)
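For the percentile formula just above, a minimal Octave version using the built-in erf instead of an external normcdf implementation (the function name pct_from_ability is mine, not from the slides):

    function p = pct_from_ability(theta, mu, sigma)
      % Percentile of ability theta under a Gaussian prior (mu, sigma):
      % normcdf(z) = (1 + erf(z / sqrt(2))) / 2, with Octave's built-in erf.
      z = (theta - mu) / sigma;
      p = 0.5 * (1 + erf(z / sqrt(2)));
    endfunction

    % Example: one sigma above the mean is about the 84th percentile.
    % pct_from_ability(1, 0, 1)   % => 0.8413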
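And going back to the initialization slides: a sketch of the quantile trick for abilities and of the b = 4.8 x (1/2 - success rate) rule for difficulties, with norminv written via Octave's built-in erfinv rather than the code linked above (function and variable names are mine):

    1;  % script file (function definitions below)

    function theta = init_abilities(scores, mu, sigma)
      % Student ranked i-th of N (rank 1 = best score) gets the Gaussian
      % prior's quantile 1 - i/(N+1); norminv(q) = sqrt(2) * erfinv(2q - 1).
      N = numel(scores);
      [ignored, order] = sort(scores, "descend");
      rk(order) = 1:N;                     % rk(j) = rank of student j
      q = 1 - rk / (N + 1);                % target quantiles
      theta = mu + sigma * sqrt(2) * erfinv(2 * q - 1);
    endfunction

    function b = init_difficulties(success_rate)
      % Assumes a = 1 and c = 0; easy items get negative difficulties.
      b = 4.8 * (0.5 - success_rate);
    endfunction

    % Example: 5 students' raw scores and 3 items' empirical success rates.
    theta = init_abilities([12 7 9 15 4], 0, 1)
    b = init_difficulties([0.9 0.5 0.2])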