A Gentle Introduction to Stata - Oregon State acock/hdfs361/stata1_3.pdf · A Gentle Introduction to…

  • Published on
    27-Jul-2018


  • A Gentle Introduction to Stata

    Alan Acock, Oregon State University

    A Stata Press Publication, STATA CORPORATION, College Station, Texas

  • Stata Press, 4905 Lakeway Drive, College Station, Texas 77845

    Copyright © 2005 by StataCorp LP. All rights reserved. Typeset in LaTeX 2ε. Printed in the United States of America.

    10 9 8 7 6 5 4 3 2 1

    ISBN !!

    This book is protected by copyright. All rights are reserved. No part of this book may be reproduced, stored in a retrieval system, or transcribed, in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise) without the prior written permission of StataCorp LP.

    Stata is a registered trademark of StataCorp LP. LaTeX 2ε is a trademark of the American Mathematical Society.

  • Acknowledgements

    I would like to acknowledge the support of the Stata staff who have worked with me on this project. Special thanks go to Lisa Gilmore, the Production Manager, xxxxx, the Copy Editor, and xxx for verifying all the commands used in this volume. I also want to thank my students who have tested my ideas for the book. They are too numerous to mention, but special thanks go to Patricia Meierdiercks and Shannon Wanless.

    Bennet Fauber, during the time he was affiliated with Stata Corporation, provided hours and hours of support on all aspects of this project. He taught me the LaTeX 2ε document preparation system used by Stata Press, and his patience with many of my problems and mistakes has inspired me to have more patience with my own students. Bennet also had major input on the topical coverage and organization of this volume. He provided the initial draft of chapter 4, and his superior expertise on Stata commands, data management, and do-files was critical. Bennet also provided extensive editorial suggestions and substantive editing for the first three chapters of the book. Whatever quality this book has owes an enormous debt to Bennet's conceptual and technical contributions. The book's completion owes a lot to his encouragement.

    Finally, I would like to thank my wife, Toni Acock, for her support and for her tolerance of my endless excuses for why I could not do things. She had to pick up many tasks I should have done, and she usually smiled when told it was because I had to finish this book.

  • Contents

    Preface

    Notation and Typography

    1 Getting Started

    1.1 Introduction

    1.2 The Stata Screen

    1.3 Using an existing dataset

    1.4 An example of a short Stata session

    1.5 Conventions

    1.6 Chapter Summary

    1.7 Exercises

    2 Entering Data

    2.1 Creating a dataset

    2.2 An example questionnaire

    2.3 Develop a coding system

    2.4 Entering data

    2.4.1 Labeling values

    2.5 Saving your dataset

    2.6 Checking the data

    2.7 Chapter Summary

    2.8 Exercises

    3 Preparing Data for Analysis

    3.1 Introduction

    3.2 Plan your work

    3.3 Create value labels

    3.4 Reverse code variables

    3.5 Create and modify variables

    3.6 Create scales

    3.7 Save some of your data

    3.8 Summary

    3.9 Exercises

    Author index

    Subject index

  • List of Tables

    2.1 Example questionnaire

    2.2 Example codebook

    2.3 Example coding sheet

    3.1 Sample project task list

    3.2 NLSY97 sample codebook entries

    3.3 Reverse coding plan

    3.4 Arithmetic symbols

  • List of Figures

    1.1 Stata's opening screen

    1.2 The Prefs > Save Windowing Preferences menu

    1.3 The Prefs > Stata compact setting appearance menu

    1.4 The Stata screen layout used in this book

    1.5 Stata's tool bar

    1.6 Stata command to open cancer dataset

    1.7 Histogram of age

    1.8 Histogram dialog box

    1.9 The tabs on the histogram dialog box

    1.10 The Title tab of the histogram dialog box

    1.11 The Options tab of the histogram dialog box

    1.12 First attempt at an improved histogram

    1.13 Final histogram of age

    2.1 Data editor window

    2.2 Variable name and variable label

    2.3 Define schemes dialog box

    2.4 Define schemes dialog box

    2.5 Describe dataset dialog box

    3.1 Create new variable

    3.2 Recode: specifying recode rules on the main tab

    3.3 Recode: specifying new variable name on the options tab

    3.4 Create new variable

    3.5 Two-way tabulation dialog

    3.6 The extended generate dialog

    3.7 The extended generate dialog

    3.8 Selecting variables to drop

    3.9 Selecting observations with an expression

  • Preface

    This book was written with a particular reader in mind. This reader needs to learn Stata, has no prior experience with other statistical software packages, and is learning social statistics. When I learned Stata myself, I found no books that I felt were written explicitly for this reader. There are certainly excellent books on Stata, but they assume prior experience with other packages such as SAS or SPSS, a fairly advanced working knowledge of statistics, or both. These books are able to move more quickly to more advanced topics, but they leave my intended reader in the dust. Readers who have more background in statistical software and statistics than I am assuming here will be able to read chapters quickly and even skip sections. The goal is to move the true beginner to a level of competence using Stata.

    With our target reader in mind, I make far more use of the Stata menu system than any other book about Stata. Advanced users may not see the value in using the menus, and the more people learn about Stata, the less they will rely on the menus. Even when using the menu system, it is still important to save a record of the sequence of commands you ran. Even though I rely on commands much more than menus in my own work, I still find value in the menus. They include many options that I might not have known or might have forgotten. This is most evident with graphs, where the visual quality can be greatly enhanced using the menu system.

    To illustrate the menu system as well as graphics, I have included over 80 figures, many of which show menus. There are numerous tables and extensive Stata results that are presented as they appear on the screen and are given a substantive interpretation. This is done in the belief that beginning Stata users need to learn more than just how to produce the results. It is also necessary to go through the results and interpret them.

    I have tried to use real data. There are a few examples where it is just much easier to illustrate a point with hypothetical data, but for the most part, I use data that are in the public domain. The General Social Survey for 2002 is used in many chapters, as is the National Longitudinal Survey of Youth, 1997. I've simplified the files by dropping many of the variables in the original datasets, but I've kept all of the observations. I have tried to pick examples from a variety of social science fields, and I have included additional variables so that instructors as well as readers can make additional examples and exercises that are tailored to their discipline. People who are used to working with statistics books that have contrived data with just a few observations, presumably so work can be done by hand, may be surprised to see over 1,000 observations in our datasets. Working with these files provides better experience for other real-world data analysis.

    The exercises use the same datasets that are used in the rest of the book. A number of the exercises require some data management prior to estimating a model. This is done in the belief that learning data management requires a lot of practice and cannot be isolated in a single chapter or single set of exercises.

    This book takes the student through much of what is done in introductory and intermediate statistics courses. We cover descriptive statistics, charts, graphs, tests of significance for simple tables, tests for one and two variables, correlation and regression, analysis of variance, multiple regression, and logistic regression. By combining this with an introduction to creating and managing a dataset, students are well prepared to go even further. More advanced statistical analysis using Stata is often even simpler from a programming point of view than what we cover. If an intermediate course goes beyond what we do with logistic regression to multinomial logistic regression, for example, the programming is simple enough. The command logit is simply replaced with the command mlogit. The added complexity of these advanced statistics is the statistics themselves and not the Stata commands that implement them. Therefore, although more advanced statistics are not included in this book, the reader who learns these statistics will be more than able to learn the corresponding Stata commands from the Stata documentation and help system.

    I assume the reader is running Stata on a Windows-based PC. Stata works as well or better on Macs and Unix systems. Readers who are running Stata on one of those systems will have to make minor adjustments.

    Alan C. Acock, Corvallis, Oregon
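
    As a minimal sketch of the logit-to-mlogit point, the two commands are written the same way. The example assumes the auto dataset that ships with Stata (it is not one of this book's datasets), and it treats rep78 as an unordered outcome purely to illustrate the syntax, not as a recommended model.

    ```stata
    * Load an example dataset shipped with Stata (an assumption for illustration)
    sysuse auto, clear

    * Binary outcome: logistic regression
    logit foreign mpg weight

    * Outcome with more than two categories: same layout, different command
    mlogit rep78 mpg weight
    ```

    Only the command name and the outcome variable change; the list of predictors is written the same way in both lines.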

  • Notation and Typography

    We designed this book for you to learn by doing, so we expect you to read this book while sitting at a computer so you can try using the sequences of commands contained in the book to replicate our results. In this way, you will be able to generalize these sequences to suit your own needs.

    Generally, we use the typewriter font command to refer to Stata commands, syntax, and variables. We show commands as they will appear in Stata output. That includes the . prompt, which is not part of the command and should not be typed.

    Except for some very small expository datasets, all the data we use in this book are freely available for you to download. For example,

    . use http://www.stata-press.com/data/!!!/gss 2002.dta, clear

    Try it.
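
    To practice the same pattern without a download, a similar sketch can use a dataset installed with Stata. The choice of the cancer dataset here is an assumption for illustration, and the commands are written without the . prompt, as you would type them.

    ```stata
    * The clear option discards any data already in memory before loading
    sysuse cancer, clear

    * List the variables in the dataset along with their labels
    describe
    ```

    The use command with a web address works the same way: the dataset replaces whatever is in memory, and describe shows what was loaded.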

    This text complements the material in the Stata manuals but does not replace it. For example, how to generate tables and graphs is shown in Chapters 5 and 6, but these are only a few of the possibilities described in the Stata Reference Manual. Our hope is to give you sufficient background that you can use the manuals effectively.

  • 1 Getting Started

    1.1 Introduction

    1.2 The Stata Screen

    1.3 Using an existing dataset

    1.4 An example of a short Stata session

    1.5 Conventions

    1.6 Chapter Summary

    1.7 Exercises

    1.1 Introduction

    This book was written with the belief that the best way to learn data analysis is to actually do it with real data. These days, doing statistics means doing statistics with a computer.

    Work along with the book

    Although it isn't absolutely necessary, you'll probably find it very helpful to have Stata running while you read this book so you can follow along and experiment for yourself when you have a question about something. Having your hands on a keyboard and replicating the instructions in this book will make the lessons that much more effective, but more importantly, you'll get in the habit of just trying something new when you think of it and seeing what happens. In the end, that is how you will really learn how Stata works. The other great advantage to following along is that you can save the examples we do for future use.

    Stata is a powerful tool for analyzing data. Stata can make statistics and data analysis fun because it does so much of the tedious work for you. A new user of Stata should start by using the menus. As you learn more about Stata, you will be able to do more sophisticated analyses with Stata commands. Commands can be saved in files, which Stata calls do-files, so a series of commands can be run all at once, which can save a great deal of time. Learning Stata well is an investment that will pay off in saved time later. Stata is constantly being extended with new capabilities, which can be installed by Stata itself from the internet. Stata is a program that grows with you.
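
    A do-file is simply a text file of commands, one per line. The sketch below shows what a short one might contain; the file name session1.do is hypothetical, and the cancer dataset shipped with Stata is assumed here only because it has an age variable to summarize and plot.

    ```stata
    * session1.do: a hypothetical do-file that runs a short session in one step
    sysuse cancer, clear

    * Summary statistics for age
    summarize age

    * Histogram of age, with counts rather than densities on the vertical axis
    histogram age, frequency
    ```

    With these lines saved in the current working directory, typing do session1 in the Command window runs them in order, and the file doubles as a record of what was done.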

    Stata is a command-driven program. It has a remarkably simple command structure that you use to tell it what you want it to do. You can use a menu to generate the commands (this is a great way to learn the commands or prompt yourself if you don't reme...
