import - Use values from Stata to R data.frame but show labels instead of values
I am quite new to R, and more used to Stata.
I managed to read my database from Stata into an R data.frame using library(foreign).
data <- read.dta("mydata.dta", convert.dates = TRUE, convert.factors = TRUE, missing.type = FALSE, convert.underscore = FALSE, warn.missing.labels = TRUE)
The values (in the sense of the Stata language) are not imported; only the labels are imported.
Let me explain a little more. Assume I want to manipulate an education variable called "edu". In the Stata language, I use the numeric values instead of the labels to manipulate the variable, and the data editor shows the labels, as long as I have defined the labels. Assume for instance that the variable "edu" takes the values 10 to 40, and that the following code associates a label with each value:
label define lib_edu 10 "less high-school degree" 20 "12th grade or higher, no college degree" 30 "undergraduate level (2 to 4 years of college)" 40 "graduate level (5 years of college or more)", add;
label values edu lib_edu;
Then, when I want to manipulate the variable, I only need to use the values. For example, if I want to drop from the dataset the people with the label "less high-school degree", I do:
drop if edu==10
But in the imported R data.frame, the labels are imported as factors. Each factor is associated with a level that does not correspond to the Stata values, since the numbering restarts at 1. Meanwhile, I cannot use the levels to manipulate the variable. If I want to drop from the dataset the people with the label "less high-school degree", I have to write the entire label:
data <- data[data$edu!="less high-school degree",]
which is not convenient at all, especially when the label is long and complex.
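To make the mismatch concrete, here is a minimal sketch in base R. The four rows are made up for illustration and only stand in for what read.dta returns with convert.factors = TRUE; the label text is taken from the label define example, and the names edu_labels, internal_codes and kept are hypothetical.

```r
# Hypothetical stand-in for the imported column: the labels become factor levels.
edu_labels <- c("less high-school degree",
                "12th grade or higher, no college degree",
                "undergraduate level (2 to 4 years of college)",
                "graduate level (5 years of college or more)")
data <- data.frame(
  id  = 1:4,
  edu = factor(edu_labels[c(1, 3, 2, 1)], levels = edu_labels)
)

# The internal codes count up from 1; they are not the Stata values 10, 20, 30, 40.
internal_codes <- as.integer(data$edu)
print(internal_codes)

# Filtering therefore has to spell out the full label text.
kept <- data[data$edu != "less high-school degree", ]
print(kept$id)
```

In this sketch the first row has internal code 1 even though its Stata value would be 10, which is why a filter such as edu == 10 cannot be carried over directly.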
Is it possible to do the same as in Stata, that is: manipulate the numeric values while the data.frame shows the labels, given that the data was exported from Stata?
Thanking you in advance.
You can approach this problem from 2 directions: 1. You can drop the value labels within Stata before you import the data into R, or 2. You can change the data import settings for the data.frame within R. Which of these 2 routes is easier will depend to a degree on the version of Stata you have and the format of the data.
Option 1:
If you want to do this within Stata, I recommend first reading about and possibly installing the "label utilities" package from SSC: ssc inst labutil. This package contains, among many other useful tools for manipulating labels, the labdtch or "label detach" command, which will dissociate the value labels from the actual values in the Stata data. Obviously, do this before importing the data into R.
Option 2:
If your data has been saved using Stata version 13, the R package readstata13 will save you time and effort. Read about the package: see the manual on CRAN.
If you are using the readstata13 option, you will need a combination of the commands get.label and/or get.label.name, and then use them as inputs to get.origin.codes to get what you are looking for.
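The idea behind that combination can be sketched in base R without the package: a label table maps each label to its original Stata code, and looking up each factor entry in that table recovers the codes. The sketch below does not call readstata13. The label table is written by hand from the label define example in the question, and the rows and the names label_table and edu_code are assumptions made for illustration.

```r
# Assumed label table: names are the labels, values are the original Stata codes.
label_table <- c(
  "less high-school degree"                       = 10,
  "12th grade or higher, no college degree"       = 20,
  "undergraduate level (2 to 4 years of college)" = 30,
  "graduate level (5 years of college or more)"   = 40
)

# Hypothetical imported data: edu arrives as a factor of labels.
data <- data.frame(
  id  = 1:5,
  edu = factor(names(label_table)[c(1, 3, 2, 1, 4)],
               levels = names(label_table))
)

# Look up each label in the table to recover the original code.
data$edu_code <- unname(label_table[as.character(data$edu)])

# Filter on the short numeric code; the label column stays available for display.
data <- data[data$edu_code != 10, ]
print(data)
```

Keeping the code in a separate column, rather than overwriting edu, means the labels are still shown when the data.frame is printed while the filtering uses the numbers.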
Finally, if using readstata13 is not an option, you should try specifying as.numeric(levels(f))[f] in your import command in R. For the reasons and more details, see this StackOverflow question.
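As a hedged illustration of what that idiom does: it converts a factor back to numbers by parsing the text of its levels, so it recovers the Stata values only when the levels are themselves numeric strings such as "10" and "20". The vector below is made up for illustration, and the names internal and original are hypothetical.

```r
# Hypothetical factor whose levels are numeric strings.
f <- factor(c("10", "30", "20", "10", "40"))

# as.integer() returns the internal codes, which restart at 1.
internal <- as.integer(f)

# Parsing the levels and indexing by the factor returns the original numbers.
original <- as.numeric(levels(f))[f]

print(internal)
print(original)
```

If the levels are text labels instead, as.numeric(levels(f)) yields NA for each of them, so the idiom does not help in that case.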
I recommend trying to accomplish this through R if possible, as it will give you a more reproducible workflow. If you end up doing this through Stata, include a short comment in your R file explaining what you did in Stata before importing the data.