Using Stata to Automate Summary Statistics in Longitudinal Data
by William Buchanan

Many students find Stata difficult to use at first because of its command-line interface; I certainly did when I started using it. However, one of Stata's greatest strengths is the flexibility it gives you to run different statistical procedures and to automate your work. So I want to share a few tips that can help you generate the summary statistics that will be part of your dissertation.

The first problem you may encounter is the “shape” of the data. Although it may seem natural to store a single row of data for each subject, statistical packages expect a different layout when they analyze longitudinal data. The first step, then, is to get your data into the correct shape. For example, your data may look something like this:

SubjectID   Y12009   Y12010   Y12011   X12009   X12010   X12011
1           75       85       100      5        10       12
2           10       35       50       8        9        13
3           50       62       76       1        3        7

In Stata, data laid out like this, with one row per subject and a separate column for each year, is called “wide.” To run a longitudinal analysis, however, you need to get the data into “long” format, with one row per subject per year, like this:

SubjectID   Year   Y1    X1
1           2009   75    5
1           2010   85    10
1           2011   100   12
2           2009   10    8
2           2010   35    9
2           2011   50    13
3           2009   50    1
3           2010   62    3
3           2011   76    7

In this case, the transformation is simple. The following command converts the data from wide format to long format:

reshape long Y1 X1, i(SubjectID) j(Year)

Here, you are telling Stata that the variables whose names begin with the stubs Y1 and X1 hold repeated measurements over time. The i(SubjectID) option names the variable that uniquely identifies each subject, and j(Year) names a new variable that will store the suffix that follows each stub (2009, 2010, or 2011). Stata then creates one row for every combination of subject and year, so the three subjects in the example become nine rows. Note that Stata variable names are case-sensitive, so the name you give in j() is the name you must use later (Year, not year). Once your data are formatted correctly, it is easy to automate your summary statistics with a few simple Stata programming commands.
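Before trusting a reshape on your full dataset, it can help to try it on a small copy and list the result. The sketch below assumes you type in the example wide data from above with Stata's input command (real data would be loaded from your own file instead); it reshapes the data to long format and lists the rows so you can confirm that three subjects observed in three years produce nine rows.

```stata
* Sketch: enter the example wide data from above, then reshape it to long
clear
input SubjectID Y12009 Y12010 Y12011 X12009 X12010 X12011
1 75 85 100  5 10 12
2 10 35  50  8  9 13
3 50 62  76  1  3  7
end

* Stubs Y1 and X1; the suffix (2009 to 2011) goes into the new variable Year
reshape long Y1 X1, i(SubjectID) j(Year)

* Check the result: 3 subjects x 3 years should give 9 rows
sort SubjectID Year
list SubjectID Year Y1 X1, sepby(SubjectID)

* To return to the original wide layout, you could type:
* reshape wide
```

Because Stata remembers the most recent reshape, typing reshape wide with no further arguments should restore the wide layout if you need it.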

Stata has several commands for creating loops, such as forvalues, foreach, and while. Each one repeats the same block of commands, once for every value you supply. With longitudinal data, the time variable (Year) is a natural thing to loop over:

forvalues i = 2009/2011 {
    sum Y1 X1 if Year == `i'
}

The code above produces summary statistics for your dependent (Y1) and independent (X1) variables separately for each year. It does this with the forvalues command, which loops over the years, or over any other range of numbers that you provide. The forvalues command creates a local macro (a named placeholder whose contents Stata substitutes wherever you refer to it) that holds the current value of the loop. In the example above, the macro is called i. The rest of the command tells Stata that i takes the values 2009 through 2011 in increments of 1, so 2009/2011 is shorthand for 2009, 2010, 2011. The loop body begins with an opening curly brace, {, which must appear at the end of the forvalues line, and ends with a closing curly brace, }, which must appear on a line by itself. Stata repeats the commands between the braces once for each value of i. In this case, we told Stata to run the sum (summarize) command to produce summary statistics for Y1 and X1 when Year is equal to (==) the current value of the macro.
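Hard-coding 2009/2011 works when you know exactly which years are in the file. A more general variation, sketched below under the assumption that the long data from above are in memory, uses levelsof to collect the distinct values of Year into a local macro (named years here purely for illustration) and then uses foreach to loop over them, so the loop adapts if years are added or missing. The last line shows that Stata's by prefix gives similar output without an explicit loop.

```stata
* Sketch: enter the long example data from above
clear
input SubjectID Year Y1 X1
1 2009  75  5
1 2010  85 10
1 2011 100 12
2 2009  10  8
2 2010  35  9
2 2011  50 13
3 2009  50  1
3 2010  62  3
3 2011  76  7
end

* Store the distinct values of Year in a local macro called years
levelsof Year, local(years)

* Loop over whatever years are actually present in the data
foreach y of local years {
    display as text _n "Summary statistics for `y'"
    summarize Y1 X1 if Year == `y'
}

* A loop-free alternative that produces similar output
bysort Year: summarize Y1 X1
```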

When you refer to a local macro, you must enclose its name in a left single quote (the backtick, `) and a right single quote (the ordinary apostrophe, '); that is why you see `i' in the example above. Macros are also useful outside of loops and can help you run your analysis more quickly. For example, you could create a local macro that holds all of your control variables:

local controlvariables c1 c2 c3 c4 c5 c6 c7 c8 c9

This lets you save time when running your different statistical models:

regress y1 x1 `controlvariables'

regress y1 x2 `controlvariables'

regress y1 x3 `controlvariables'
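The two ideas above can be combined by using the control-variable macro inside a loop over the predictors. In the sketch below, the variables y1, x1 through x3, and c1 through c9 are simulated purely for illustration (they are hypothetical and not part of the example dataset), and foreach runs one regression per predictor. Run the whole block at once: a local macro exists only within the do-file, or the batch of selected lines, in which it is defined.

```stata
* Hypothetical data: all variables below are simulated for illustration only
clear
set seed 12345
set obs 200
generate x1 = rnormal()
generate x2 = rnormal()
generate x3 = rnormal()
forvalues k = 1/9 {
    generate c`k' = rnormal()
}
* Arbitrary coefficients chosen only to generate an outcome
generate y1 = 1 + 0.5*x1 + 0.3*x2 + rnormal()

* Store the control variables once
local controlvariables c1 c2 c3 c4 c5 c6 c7 c8 c9

* One regression per predictor, each with the same controls
foreach x of varlist x1 x2 x3 {
    regress y1 `x' `controlvariables'
}
```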

I hope these tips are helpful as you begin analyzing your data and move forward with your research. If you have any questions, please feel free to contact me through the network, and I can work with you to develop your understanding of the Stata software package.

