Clive--
While my answer to Cristina Solera may leave something to be desired,
I think much of your reply is incorrect--perhaps you misread the
original post and my reply to it? Writing
Y = B0 + B1*X + B2*Z + B3*XZ + e
the variable X is an indicator, and Z is missing whenever X is zero,
because it is theoretically undefined (what is the characteristic of a
partner when there is no partner?). My point was that you can only
identify the "effect" of Z when X is one, since you only observe Z
when X is one, and this is like an interaction term XZ where XZ is
zero when X is zero (ignoring the fact that the product XZ would
actually be missing). That is, there is no Z and XZ per se, but it is
useful to think of partner's occ score as XZ. Including a dummy X and
a continuous variable that is missing where X is zero would of course
lead to X being dropped and the estimation being constrained to the
sample where X==1... the continuous variable can be seen as a main
effect, or an interaction--or neither--and you can only see its
"effect" in the subsample where it is defined.
My point was largely intended to address the interpretation of the
"effect" of X in this setting--by reframing the inclusion of a
variable W=max(Z,0) as the inclusion of XZ without including Z, it
makes clear that there is potential bias (of several varieties).
Thinking of the extra variable as XZ clarifies the calculation of
marginal effects--the simplistic approach would be to calculate the
"effect" of X as B1+B3*EZ where EZ is the mean of Z for the estimation
sample, not B1+B3 as you wrote... but I believe the model is more like
a logit, and certainly not the linear model you have written out. In
this setting, I think one would prefer to impute X=0 for all cases and
predict the probability of exit (Y=1), then impute X=1 for all cases
and predict the probability of exit. Think of it as a simple logit:
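    * fit the logit on the observed data
    logit exit married marriedpocc, cluster(id)
    * predicted probability of exit with everyone set to unmarried
    replace married=0
    replace marriedpocc=married*pocc
    predict exit0, pr
    * predicted probability of exit with everyone set to married
    * (pocc is undefined for the originally unmarried cases)
    replace married=1
    replace marriedpocc=married*pocc
    predict exit1, pr
    generate biased_fx_marriage=exit1-exit0
    summarize biased_fx_marriage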
The difference of these predictions would be an estimate of the
marginal effect, right? And you might bootstrap the standard errors,
or figure out the analytic SE. The trouble is, W=max(Z,0) (or the
interaction term XZ) is zero for all the unmarried folk, and the
estimate of exit1 (the prediction when replacing married=1 everywhere)
you'd like to calculate is impossible.
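For comparison, the simplistic linear calculation B1+B3*EZ mentioned earlier could be sketched roughly as below. This is only an illustration on simulated data, not Cristina's model: the variable names (y, married, pocc, marriedpocc), the data-generating values, and the choice to take EZ as the mean of pocc among the married cases in the estimation sample are all assumptions made for the example.

```stata
* Simulated data (hypothetical): pocc is missing whenever married==0
clear
set seed 2006
set obs 1000
generate byte married = runiform() < 0.6
generate pocc = 20 + 60*runiform() if married==1
* The "interaction" is set to zero for the unmarried
generate marriedpocc = cond(married==1, pocc, 0)
generate y = 1 + 0.5*married + 0.02*marriedpocc + rnormal()

* Linear model with X and XZ only (no separate main effect of Z)
regress y married marriedpocc

* EZ: mean of pocc among married cases in the estimation sample
summarize pocc if e(sample) & married==1, meanonly
local EZ = r(mean)

* B1 + B3*EZ, with its standard error from the estimated VCE
lincom married + `EZ'*marriedpocc
scalar fx    = r(estimate)
scalar fx_se = r(se)
```

Here -lincom- reports the standard error of the linear combination directly, so the variance-covariance terms need not be assembled by hand. None of this addresses the bias concerns above.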
Moving to a set of indicators, e.g. {notmarr, poorhusband,
richhusband}, based on values of Z (occ score for partner) also
clarifies the problems with calculating the effect of X in such a
model. Note, however, my emphasis in the original (in the part
omitted and replaced by [...] in the reply) on the lack of any clear
interpretation for the coefficients, given all the obvious sources of
bias.
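A minimal sketch of the indicator approach, again on simulated data, might look like the following. The variable names other than the three indicators, the tercile cutpoints, and the simulated outcome are assumptions made purely for illustration; the excluded category is married with a partner in the middle third of pocc.

```stata
* Simulated data (hypothetical): pocc is missing whenever married==0
clear
set seed 1012
set obs 1000
generate id = _n
generate byte married = runiform() < 0.6
generate pocc = 20 + 60*runiform() if married==1
generate byte exit = runiform() < 0.3

* Tercile cutpoints of partner's score among the married only
_pctile pocc if married==1, percentiles(33.3 66.7)
scalar cut_lo = r(r1)
scalar cut_hi = r(r2)

* Mutually exclusive indicators; middle third of partners is the base
generate byte notmarr     = married==0
generate byte poorhusband = married==1 & pocc <= scalar(cut_lo)
generate byte richhusband = married==1 & pocc >  scalar(cut_hi) & !missing(pocc)

logit exit notmarr poorhusband richhusband, cluster(id)
```

Each coefficient is then a contrast with the middle-third married group, which at least makes plain what comparison is being estimated.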
I welcome comment, just to be clear--but please reread the original
statement of the problem, and my response, carefully.
--Austin
On 10/12/06, Clive Nicholas <Clive.Nicholas@newcastle.ac.uk> wrote:
Austin Nichols replied to Cristina Solera:
> Under the most generous assumptions, you can only identify the
> "effect" of partner's occupational score for those with partners, i.e.
> you can get a coefficient on the interaction married*partneroccscore
> but not on the main effect partneroccscore, letting the interaction be
> zero whenever someone has no partner (assuming haspartner==married in
> this context). If you include both variables, the "marginal effect"
> of moving from married==0 to married==1 will be much trickier to
> estimate. You might prefer a set of indicators {notmarr, poorhusband,
> richhusband}, or somesuch (with the excluded category the middle third
> of partners' occupational scores, perhaps).
[...]
I know nothing of discrete time-duration models, but I do know that if
you're going to fit an interactive version of such a model, the
interaction _and_ both of its composite variables must be included so as
not to improperly constrain the latter variables to be zero (which could
well lead to the remaining parameter estimates being biased), thus:
Y = B0 + B1*X + B2*Z + B3*XZ + e.
The -haspartner- (B1) variable could only be interpreted when the -occscore-
(B2) variable equals zero. Cristina never defined the scoring for this
variable, but if an interaction (B3) of the two variables is sensible
given her data, model and theory, and "occupational score" is never 0,
then B1, even if it _can_ be estimated, is meaningless anyway.
Estimating the marginal effect of B1 would simply be achieved by
calculating B1 + B3. Calculating its standard error is a bit trickier,
but it would be given by:
SE(ME_B1) = sqrt(var(B1) + Z^2*var(B3) + 2*Z*cov(B1,B3))
but, of course, this needs access to the variance-covariance matrix, and
only Cristina has that! See the link here
http://homepages.nyu.edu/~mrg217/interaction.html#literature
by Brambor, Clark and Golder for more on this, which also contains useful
Stata code to automate the calculation of MEs.
In essence, the main action is in the interaction: but the model has to be
properly estimated and its underlying theoretical expectations sound.
CLIVE NICHOLAS