Identifying the Variables
In the given dataset, the censoring (status) variable is stroke: a case is right-censored if stroke=0, that is, if the patient did not have a stroke while under observation.
The time-to-event variable is followed; it shows how long each patient was observed before suffering a stroke or leaving the study (being censored).
Tied events are events that occurred at the same time (Borucka, 2014). As the frequency table for the variable followed in Appendix 1 shows, there were no tied events in the data; all 31 cases with followed=32.02 are censored.
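The check for ties described above can be sketched in a few lines of Python. This is only an illustration: the function name tied_event_times and the toy data are assumptions, not part of the original analysis, which relied on a frequency table. Only uncensored cases (status 1) count as events, so censored cases that share a follow-up time do not create ties.

```python
from collections import Counter

def tied_event_times(times, status):
    """Return {time: count} for follow-up times shared by two or more events.

    Only cases with status == 1 (event observed) are counted; censored
    cases sharing a time are not tied events.
    """
    counts = Counter(t for t, s in zip(times, status) if s == 1)
    return {t: n for t, n in counts.items() if n > 1}

# Hypothetical toy data, NOT the study dataset
followed = [5.1, 7.3, 7.3, 12.0, 32.02, 32.02, 32.02]
stroke = [1, 1, 0, 1, 0, 0, 0]

print(tied_event_times(followed, stroke))  # {}
```

An empty result corresponds to the situation in this dataset: several cases share the maximum follow-up time, but because all of them are censored, no events are tied.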
For the research question “Is there an association between hypertension and the time a person was followed before experiencing a stroke?”, hypertension is the independent variable.
Testing the Assumption of Proportionality Using the Kaplan-Meier Method
The hazard plot for the given dataset, using time=followed, status=stroke, and factor=hypertension, is as follows:
The plot of hazard functions can be used to evaluate whether the assumption of proportionality of hazards is violated (Forthofer, Lee, & Hernandez, 2007).
It appears that the baseline hazards (the blue line, which depicts hazards for the group without hypertension) are proportional to the hazards in the tested group (the green line, which represents hazards for the group with hypertension) at least over some intervals:
- From 0 to approximately 10 months, the hazards do not appear proportional, because the baseline hazards remain constant and very close to 0, while the hazards in the tested group increase;
- From nearly 10 to roughly 22 months, the hazards appear proportional, because both lines grow steadily, even though the hazards in the tested group grow faster;
- From approximately 22 to 32.02 months, the hazards also seem to be proportional, although the ratio appears to differ from that of the previous interval, because the hazards in the tested group go up considerably faster.
If the hazards in the test (experimental) group are proportional to those in the baseline group, the assumption of proportional hazards is satisfied, and the Cox proportional hazards model can be built; there is no need to look for alternative tests or to adjust the test (Forthofer et al., 2007).
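The visual check can be complemented with a rough numerical one. If the hazards are proportional, the cumulative hazard in one group is a constant multiple of the cumulative hazard in the other, so their ratio should stay roughly the same over time. The sketch below assumes the Nelson-Aalen estimator of the cumulative hazard (the original plot was produced by the Kaplan-Meier procedure in the statistical package), and the function names and toy data are hypothetical.

```python
def nelson_aalen(times, status):
    """Return [(t, H(t))] at each distinct event time.

    H(t) is the running sum of (events at t) / (number at risk just before t).
    """
    data = sorted(zip(times, status))
    n_at_risk = len(data)
    cum_hazard = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0    # events observed exactly at time t
        leaving = 0   # all cases (events or censored) leaving the risk set at t
        while i < len(data) and data[i][0] == t:
            events += data[i][1]
            leaving += 1
            i += 1
        if events > 0:
            cum_hazard += events / n_at_risk
            curve.append((t, cum_hazard))
        n_at_risk -= leaving
    return curve

def cumhaz_at(curve, t):
    """Step-function value of the cumulative hazard at time t (0 before the first event)."""
    value = 0.0
    for event_time, h in curve:
        if event_time <= t:
            value = h
    return value

# Hypothetical toy data (months; 1 = stroke, 0 = censored), NOT the study dataset
no_htn_t = [14, 20, 26, 30, 32, 32, 32, 32]
no_htn_s = [1, 0, 1, 1, 0, 0, 0, 0]
htn_t = [4, 9, 12, 15, 19, 24, 28, 32]
htn_s = [1, 1, 0, 1, 1, 1, 1, 0]

base = nelson_aalen(no_htn_t, no_htn_s)
test = nelson_aalen(htn_t, htn_s)

# A ratio that stays roughly constant across time points is consistent
# with proportional hazards; a ratio that drifts is a warning sign.
for t in (16, 24, 32):
    h0, h1 = cumhaz_at(base, t), cumhaz_at(test, t)
    print(t, round(h0, 3), round(h1, 3), round(h1 / h0, 2))
```

Such a comparison is only informal; it does not replace the plot or a formal test of the assumption.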
The Cox Proportional Hazards Test
Reporting and Interpreting the Test
The results of the test (time = followed, status = stroke, covariates = hypertension(Cat)) are as follows:
The variable hypertension significantly predicted the difference in survival (that is, in the occurrence of a stroke): B(1)=-1.081, p<.001. The hazard ratio for the variable hypertension was Exp(B)=.339 (95% confidence interval: .208-.553), which means that for a patient without hypertension, the stroke hazard is approximately .339 times that of a patient with hypertension.
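The arithmetic behind the hazard ratio can be reproduced directly from the reported coefficient. The short sketch below uses only the values reported above; the variable names are illustrative. It also inverts the ratio to express the same result from the point of view of patients with hypertension.

```python
import math

b = -1.081                      # reported coefficient B for hypertension
ci_low, ci_high = 0.208, 0.553  # reported 95% CI for Exp(B)

# Hazard ratio: no hypertension relative to hypertension
hr = math.exp(b)
print(round(hr, 3))  # 0.339

# The same effect expressed for hypertension relative to no hypertension
hr_inverse = math.exp(-b)
print(round(hr_inverse, 2))  # 2.95

# Inverting a ratio swaps and inverts its confidence limits
print(round(1 / ci_high, 2), round(1 / ci_low, 2))  # 1.81 4.81
```

In other words, the reported result is equivalent to saying that the stroke hazard for a patient with hypertension is roughly 2.95 times that of a patient without it.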
The hazard plot shows how the cumulative hazard of stroke changes over time for the groups with and without hypertension. The horizontal axis shows time, whereas the vertical axis shows the cumulative hazard, which equals the negative log of the survival probability (Forthofer et al., 2007). Clearly, the hazard grows considerably faster in the group with hypertension.
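The relationship between the two quantities on the plot, H(t) = -ln S(t), can be illustrated as follows. The function name and the survival probabilities are hypothetical and serve only to show how the vertical axis relates to survival.

```python
import math

def cumulative_hazard(survival_probability):
    """Cumulative hazard H(t) = -ln S(t), defined for 0 < S(t) <= 1."""
    return -math.log(survival_probability)

# Hypothetical survival probabilities, NOT values from the study
for s in (0.95, 0.9, 0.5):
    print(s, round(cumulative_hazard(s), 3))
```

The lower the survival probability, the higher the cumulative hazard, which is why the curve for the group with more strokes rises faster.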
Effects of the Presence of Ties
The order in which events occur plays an important role in calculating the partial likelihood function: each time an event takes place, terms describing the states of all the subjects who are at risk at that moment are added. Therefore, if two events are tied, it is not clear which participant should be treated as having the event and which ought to be counted as still being at risk (Borucka, 2014, p. 95). This hinders the calculation of the partial likelihood function (Borucka, 2014).
To address the problem of ties, Borucka (2014) describes five methods. The simplest is to subtract a very small random number from the times of the tied events; it is stated to be quite effective. Two more complicated methods are the Efron and Breslow approximations, of which the Efron approximation is claimed to be better. Finally, the discrete model and the exact expression are argued to result in the best model fit but are cumbersome.
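To make the difference between the two approximations concrete, the sketch below computes the contribution of a single event time to the log partial likelihood under each of them, for a model with one covariate. The formulas are the commonly cited textbook forms of the Breslow and Efron approximations and are stated here as an assumption rather than reproduced from Borucka (2014); the function name, coefficient, and data are hypothetical.

```python
import math

def tie_contributions(beta, x_events, x_risk):
    """Log partial-likelihood contribution of one event time.

    x_events: covariate values of the subjects whose event occurs at this time
    x_risk:   covariate values of everyone at risk at this time
              (including the subjects in x_events)
    Returns (breslow, efron).
    """
    d = len(x_events)
    numerator = sum(beta * x for x in x_events)
    risk_sum = sum(math.exp(beta * x) for x in x_risk)
    tied_sum = sum(math.exp(beta * x) for x in x_events)

    # Breslow: every tied event uses the full risk set in the denominator
    breslow = numerator - d * math.log(risk_sum)

    # Efron: the k-th tied event removes a fraction k/d of the tied
    # subjects' weight from the denominator
    efron = numerator - sum(
        math.log(risk_sum - (k / d) * tied_sum) for k in range(d)
    )
    return breslow, efron

# Hypothetical example: two tied events among four subjects at risk
print(tie_contributions(0.5, [1, 0], [1, 0, 1, 0]))

# With a single event (no tie) the two approximations coincide
print(tie_contributions(0.5, [1], [1, 0, 0]))
```

When there are no ties, as in the present dataset, the choice between these methods does not affect the result.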
References
Borucka, J. (2014). Methods for handling tied events in the Cox proportional hazard model. Studia Oeconomica Posnaniensia, 2(2), 91-106.
Forthofer, R. N., Lee, E. S., & Hernandez, M. (2007). Biostatistics: A guide to design, analysis, and discovery (2nd ed.). Burlington, MA: Elsevier Academic Press.
Appendix 1
Frequencies table for the variable followed: