Descriptive Statistics II



<ul><li> 1. Descriptive Statistics-II, Dr Mahmoud Alhussami</li></ul><p> 2. Shapes of Distribution: A third important property of data, after location and dispersion, is its shape. Distributions of quantitative variables can be described in terms of a number of features, many of which are related to the distribution's physical appearance or shape when presented graphically: modality; symmetry and skewness; degree of skewness; kurtosis.
3. Modality: The modality of a distribution concerns how many peaks or high points there are. A distribution with a single peak, that is, one value with a high frequency, is a unimodal distribution.
4. Modality: A distribution with two or more peaks is called a multimodal distribution.
5. Symmetry and Skewness: A distribution is symmetric if it could be split down the middle to form two halves that are mirror images of one another. In asymmetric distributions, the peaks are off center, with a bulk of scores clustering at one end and a tail trailing off at the other end. Such distributions are often described as skewed. When the longer tail trails off to the right, this is a positively skewed distribution, e.g. annual income. When the longer tail trails off to the left, this is called a negatively skewed distribution, e.g. age at death.
6. Symmetry and Skewness: Shape can be described by degree of asymmetry (i.e., skewness). mean &gt; median: positive or right-skewness. mean = median: symmetric or zero-skewness. mean &lt; median: negative or left-skewness. Positive skewness can arise when the mean is increased by some unusually high values. Negative skewness can arise when the mean is decreased by some unusually low values.
7. [Figure: three panels labeled Left skewed, Right skewed, Symmetric]
8. Shapes of the Distribution: Three common shapes of frequency distributions: (A) symmetrical and bell shaped; (B) positively skewed, or skewed to the right; (C) negatively skewed, or skewed to the left. March 28, 2013
9. 
Shapes of the Distribution: Three less common shapes of frequency distributions: (A) bimodal; (B) reverse J-shaped; (C) uniform.
10. [Figure caption: This guy took a VERY long time!]
11. Degree of Skewness: A skewness index can readily be calculated by most statistical computer programs in conjunction with frequency distributions. The index has a value of 0 for a perfectly symmetric distribution, a positive value if there is a positive skew, and a negative value if there is a negative skew. A skewness index that is more than twice the value of its standard error can be interpreted as a departure from symmetry.
12. Measures of Skewness or Symmetry: Pearson's skewness coefficient is nonalgebraic and easily calculated. It is also useful for quick estimates of symmetry. It is defined as: skewness = (mean - median) / SD. Fisher's measure of skewness is based on deviations from the mean to the third power.
13. Pearson's skewness coefficient: For a perfectly symmetrical distribution, the mean will equal the median, and the skewness coefficient will be zero. If the distribution is positively skewed, the mean will be more than the median and the coefficient will be positive. If the coefficient is negative, the distribution is negatively skewed and the mean is less than the median. Skewness values will fall between -1 and +1 SD units. Values falling outside this range indicate a substantially skewed distribution. Hildebrand (1986) states that skewness values above 0.2 or below -0.2 indicate severe skewness.
14. Assumption of Normality: Many of the statistical methods that we will apply require the assumption that a variable or variables are normally distributed. With multivariate statistics, the assumption is that the combination of variables follows a multivariate normal distribution. Since there is not a direct test for multivariate normality, we generally test each variable individually and assume that they are multivariate normal if they are individually normal, though this is not necessarily the case.
15. 
Evaluating normality: There are both graphical and statistical methods for evaluating normality. Graphical methods include the histogram and normality plot. Statistical methods include diagnostic hypothesis tests for normality, and a rule of thumb that says a variable is reasonably close to normal if its skewness and kurtosis have values between -1.0 and +1.0. None of the methods is absolutely definitive.
16. Transformations: When a variable is not normally distributed, we can create a transformed variable and test it for normality. If the transformed variable is normally distributed, we can substitute it in our analysis. Three common transformations are the logarithmic transformation, the square root transformation, and the inverse transformation. All of these change the measuring scale on the horizontal axis of a histogram to produce a transformed variable that is mathematically equivalent to the original variable.
17. Types of Data Transformations: For moderate skewness, use a square root transformation. For substantial skewness, use a log transformation. For severe skewness, use an inverse transformation.
18. Computing Explore descriptive statistics: To compute the statistics needed for evaluating the normality of a variable, select the Explore command from the Descriptive Statistics menu.
19. Adding the variable to be evaluated: First, click on the variable to be included in the analysis to highlight it. Second, click on the right arrow button to move the highlighted variable to the Dependent List.
20. Selecting statistics to be computed: To select the statistics for the output, click on the Statistics command button.
21. Including descriptive statistics: First, click on the Descriptives checkbox to select it. Clear the other checkboxes. Second, click on the Continue button to complete the request for statistics.
22. Selecting charts for the output: To select the diagnostic charts for the output, click on the Plots command button.
23. 
Including diagnostic plots and statistics: First, click on the None option button on the Boxplots panel, since boxplots are not as helpful as other charts in assessing normality. Second, click on the Normality plots with tests checkbox to include normality plots and the hypothesis tests for normality. Third, click on the Histogram checkbox to include a histogram in the output. You may want to examine the stem-and-leaf plot as well, though I find it less useful. Finally, click on the Continue button to complete the request.
24. Completing the specifications for the analysis: Click on the OK button to complete the specifications for the analysis and request SPSS to produce the output.
25. The histogram: An initial impression of the normality of the distribution can be gained by examining the histogram. In this example, the histogram shows a substantial violation of normality caused by an extremely large value in the distribution. [Figure: histogram of TIME SPENT ON THE INTERNET; vertical axis Frequency; Std. Dev = 15.35, Mean = 10.7]
26. The normality plot: The problem with the normality of this variable's distribution is reinforced by the normality plot. If the variable were normally distributed, the red dots would fit the green line very closely. In this case, the red points in the upper right of the chart indicate the severe skewing caused by the extremely large data values. [Figure: Normal Q-Q Plot of TOTAL TIME SPENT ON THE INTERNET; horizontal axis Observed Value, vertical axis Expected Normal]
27. The test of normality: Tests of Normality for TOTAL TIME SPENT ON THE INTERNET: Kolmogorov-Smirnov (a): Statistic = .246, df = 93, Sig. = .000; Shapiro-Wilk: Statistic = .606, df = 93, Sig. = .000. (a. Lilliefors Significance Correction.) Problem 1 asks about the results of the test of normality. Since the sample size is larger than 50, we use the Kolmogorov-Smirnov test. If the sample size were 50 or less, we would use the Shapiro-Wilk statistic instead. 
The null hypothesis for the test of normality states that the actual distribution of the variable is equal to the expected distribution, i.e., the variable is normally distributed. Since the probability associated with the test of normality (&lt; 0.001) is less than or equal to the level of significance (0.01), we reject the null hypothesis and conclude that total hours spent on the Internet is not normally distributed. (Note: we report the probability as ...
... 564; 1-4: 86; 5-14: 127; 15-24: 490; 25-34: 66; 35-44: 806; 45-54: 1,425; 55-64: 3,511; 65-74: 6,932; 75-84: 10,101; 85+: 9,825; Total: 34,524.
86. Frequency Table: Columns: Data Intervals, Frequency, Cumulative Frequency, Relative Frequency (%), Cumulative Relative Frequency (%). Rows (interval: frequency, cumulative frequency): 10-19: 5, 5; 20-29: 18, 23; 30-39: 10, 33; 40-49: 13, 46; 50-59: 4, 50; 60-69: 4, 54; 70-79: 2, 56.
87. Cumulative Relative Frequency: The cumulative relative frequency is the percentage of persons having a measurement less than or equal to the upper boundary of the class interval. E.g., the cumulative relative frequency for the 3rd interval of our data example is 8.8 + 33.3 + 17.5 = 59.6%. We say that 59.6% of the children have weights below 39.5 pounds.
88. Number of Intervals: There is no clear-cut rule on the number of intervals or classes that should be used. With too many intervals, the data may not be summarized enough for a clear visualization of how they are distributed. With too few intervals, the data may be over-summarized and some of the details of the distribution may be lost.
89. Presenting Data: A chart is a visual representation of a frequency distribution that helps to gain insight about what the data mean. It is built with lines, area &amp; text. Ex: bar chart, pie chart.
90. Bar Chart: Simplest form of chart. Used to display nominal or ordinal data. [Figure: bar chart of ETHICAL ISSUES SCALE ITEM 8, ACTING AGAINST YOUR OWN PERSONAL/RELIGIOUS VIEWS; categories Never, Seldom, Sometimes, Frequently; vertical axis PERCENT]
91. 
Horizontal Bar Chart: [Figure: horizontal bar chart of CLINICAL PRACTICE AREA, with bars for practice areas such as Acute Care, Critical Care and Gerontology; horizontal axis PERCENT]
92. Cluster Bar Chart: [Figure: clustered bar chart of PERCENT by RN HIGHEST EDUCATION (Diploma, Associate Degree, Bachelor Degree, Post Bac), clustered by Employment (Full time RN, Part time RN, Self employed)]
93. Pie Chart: Alternative to bar chart. Circle partitioned into percentage distributions of qualitative variables, with total area of 100%. [Figure: pie chart with slices Doctorate NonNursing, Doctorate Nursing, MS NonNursing, MS Nursing, Juris Doctor, Diploma-Nursing, BS NonNursing, AD Nursing, BS Nursing, Missing]
94. Histogram: Appropriate for interval, ratio and sometimes ordinal data. Similar to bar charts, but bars are placed side by side. Often used to represent both frequencies and percentages. Most histograms have from 5 to 20 bars.
95. Histogram: [Figure: histogram of SF-36 VITALITY SCORES; vertical axis FREQUENCY; Std. Dev = 22.17, Mean = 61.6, N = 439.00]
96. Pictures of Data: Histograms: Blood pressure data on a sample of 113 men. [Figure: histogram of the systolic blood pressure for 113 men; horizontal axis Systolic BP (mmHg), vertical axis Number of Men] Each bar spans a width of 5 mmHg on the horizontal axis. The height of each bar represents the number of individuals with SBP in that range.
97. Frequency Polygon: First place a dot at the midpoint of the upper base of each rectangular bar. The points are connected with straight lines. At the ends, the points are connected to the midpoints of the previous and succeeding intervals (these intervals have zero frequency). [Figure: frequency polygon of children's weights; horizontal axis marked from 4.5 to 84.5 in steps of 10]
98. 
Hallmarks of a Good Chart: Simple &amp; easy to read. Placed correctly within text. Use color only when it has a purpose, not solely for decoration. Make sure others can understand the chart; try it out on somebody first. Remember: a poor chart is worse than no chart at all.
99. Cumulative Frequency Plot: Place a point whose horizontal coordinate is the upper class boundary and whose vertical coordinate is the corresponding cumulative frequency. Each point represents the cumulative relative frequency, and the points are connected with straight lines. The left end is connected to the lower boundary of the first interval that has data. [Figure: cumulative frequency plot, Weights of Daycare Children; horizontal axis Weight Range, marked from 9.5 to 89.5 in steps of 10; vertical axis Percent of Children]
100. Coefficient of Correlation: Measure of linear association between 2 continuous variables. Setting: two measurements are made for each observation. The sample consists of pairs of values and you want to determine the association between the variables.
101. Association Examples: Example 1: association between a mother's weight and the birth weight of her child. 2 measurements: mother's weight and baby's weight. Both are continuous measures. Example 2: association between a risk factor and a disease. 2 measurements: disease status and risk factor status. Both are dichotomous measurements.
102. Correlation Analysis: When you have 2 continuous measurements, you use correlation analysis to determine the relationship between the variables. Through correlation analysis you can calculate a number that relates to the strength of the linear association.
103. Types of Relationships: There are 2 types of relationships. Deterministic relationship: the values of the 2 variables are related through an exact mathematical formula. Statistical relationship: this is not a perfect relationship!!!
104. 
Scatter Plots and Association: You can plot the 2 variables in a scatter plot (one of the types of charts in SPSS/Excel). The pattern of the dots in the plot indicates the statistical relationship between the variables (the strength and the direction). Positive relationship: the pattern goes from lower left to upper right. Negative relationship: the pattern goes from upper left to lower right. The more the dots cluster around a straight line, the stronger the linear relationship.
105. Birth Weight Data: x = birth weight in ounces; y = increase in weight between the 70th and 100th days of life, expressed as a percentage of birth weight. (x (oz), y (%)) pairs: (112, 63), (111, 66), (107, 72), (119, 52), (92, 75), (80, 118), (81, 120), (84, 114), (118, 42), (106, 72), (103, 90), (94, 91).
106. Pearson Correlation Coefficient: [Figure: scatter plot, Birth Weight Data; horizontal axis Birth Weight (in ounces), vertical axis Increase in Birth Weight (%)]
107. Calculations of Correlation Coefficient: In Excel: Go to the TOOLS menu and select DATA ANALYSIS. Highlight CORRELATION and click OK. Enter the INPUT RANGE (2 columns of data that contain x and y). Select the cells where you want the answer to be placed and click OK.
108. Pearson Correlation Results: Correlation matrix: x (oz) with x (oz) = 1; y (%) with x (oz) = -0.94629; y (%) with y (%) = 1. Pearson Correlation Coefficient = -0.946. Interpretation: values near 1 indicate a strong positive linear relationship; values near -1 indicate a strong negative linear relationship; values near 0 indicate a weak linear association.
109. CAUTION!!!! Interpreting the correlation c...</p>
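Slide 12 defines Pearson's skewness coefficient as (mean - median) / SD and describes Fisher's measure as based on third-power deviations from the mean. The sketch below is one possible reading of those two definitions, not the formula any particular statistics package uses. The function names and both samples are hypothetical. The slides do not say which SD is meant, so the choice of the sample SD for Pearson's coefficient and of the simple moment ratio for Fisher's measure is an assumption.

```python
import statistics

def pearson_skewness(data):
    """Pearson's skewness coefficient as on slide 12: (mean - median) / SD."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    sd = statistics.stdev(data)  # assumption: sample SD (n - 1 denominator)
    return (mean - median) / sd

def fisher_skewness(data):
    """Moment-based skewness: mean cubed deviation over the cubed population SD."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((v - mean) ** 2 for v in data) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in data) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical samples, invented for illustration only.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]
symmetric = [1, 2, 3, 4, 5]

print(pearson_skewness(right_skewed), fisher_skewness(right_skewed))
print(pearson_skewness(symmetric), fisher_skewness(symmetric))
```

For the right-skewed sample the mean (4.7) exceeds the median (3), so both measures come out positive; for the symmetric sample both are zero, matching the rules on slides 6 and 13.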

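Slides 16 and 17 suggest creating a transformed variable (square root, logarithm, or inverse) and testing it again. A minimal sketch of that workflow follows, using a hypothetical positively skewed sample and a simple moment-based skewness as the check. Both the data and the helper name `moment_skewness` are assumptions made for illustration. How much each transformation helps depends on the data, so the transformed variable should always be re-examined rather than assumed normal.

```python
import math

def moment_skewness(data):
    """Third central moment divided by the cubed population SD."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    return m3 / m2 ** 1.5

# Hypothetical sample; all values are positive, so log and 1/x are defined.
raw = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]

transformed = {
    "raw": raw,
    "square root": [math.sqrt(v) for v in raw],
    "logarithm": [math.log(v) for v in raw],
    "inverse": [1 / v for v in raw],
}

# Re-check skewness after each transformation, as slide 16 recommends.
skew_by_name = {name: moment_skewness(values) for name, values in transformed.items()}
for name, skew in skew_by_name.items():
    print(f"{name:12s} skewness = {skew:.2f}")
```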

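The correlation reported on slide 108 can be checked by hand from the twelve birth-weight pairs on slide 105. The sketch below computes Pearson's r from the usual sums of squares and cross-products. The pairing of the values is read from the flattened slide table, so it is an assumption, and `pearson_r` is a hypothetical helper name.

```python
import math

# (x, y) pairs as read from the slide-105 table (pairing is an assumption):
# x = birth weight (oz), y = weight increase between days 70 and 100 (% of birth weight)
x = [112, 111, 107, 119, 92, 80, 81, 84, 118, 106, 103, 94]
y = [63, 66, 72, 52, 75, 118, 120, 114, 42, 72, 90, 91]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(x, y)
print(round(r, 5))
```

With this pairing the result agrees with the -0.946 shown on slide 108: a strong negative linear relationship, so heavier babies at birth tended to show a smaller percentage increase in weight.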