INTRODUCTION
Tobacco use is an important risk factor for the growing burden of non-communicable diseases in Southeast Asia1. More than 4000 people die every day due to tobacco use and secondhand smoke in the region1. In Nepal, tobacco use has a high prevalence and is responsible for more than 27000 deaths annually2. A review of the literature has shown that the mean age of smoking initiation (AOI) ranged from 10.2 to 18.6 years in different populations3. In the Nepalese population, those who initiated smoking before the age of 16 years were addicted to nicotine4.
The Surgeon General's report asserts that adolescents who initiate smoking at an early age are more likely to become addicted to cigarettes and have more difficulty quitting in adulthood5,6. The majority of Nepalese adults who smoke started smoking before the age of 18 years4. Adolescence is the entry point through which tobacco addiction enters the population5. There is evidence that AOI is linked with the risk of cardiovascular disease and lung cancer, and that high smoking prevalence is associated with sleep apnea syndrome7-9. Therefore, the age of smoking initiation and the smoking prevalence are two crucial indicators for policymakers in tobacco control and public health6.
Most tobacco control studies published in scientific journals test hypotheses using the frequentist approach10,11. The frequentist approach estimates the p-value, which is the probability of obtaining the observed results, or more extreme ones, assuming the null hypothesis (H0) is true12-16. Under this approach, the lower the p-value, the stronger the evidence against the null hypothesis. However, scientific publications are prone to false positive results when analyses aim at making statistical results significant (p<0.05) rather than at testing a hypothesis17. This misuse is widespread at all levels and needs to be addressed18,19. One alternative to the p-value approach is the Bayesian approach, which includes the Bayes factor (BF). The BF has become popular in recent years as a substitute for p-values because it compares how well competing hypotheses predict the observed data12-16,18-19.
The study aims to compare the prevalence of smoking in the STEPS survey data of 2019 with that in the STEPS survey data of 2008 and 2013, using both the frequentist and the Bayesian approach20-22. In addition, the age of smoking initiation is compared between genders for the STEPS survey data of 201920.
METHODS
Study design and sampling techniques
This is an analysis of secondary data collected in the WHO STEPS surveys of 2008, 2013 and 2019 in Nepal, carried out to compare the smoking prevalence rate and the mean age of smoking initiation20-22. The STEPS survey is a nationally representative cross-sectional study conducted to collect up-to-date information on NCD risk factors. In 2019, 5593 individuals from all seven provinces were selected by multistage cluster sampling. A total of 259 clusters/wards were selected as primary sampling units, with 37 clusters from each province. From each cluster, 25 households were selected by systematic sampling. One individual aged 15–69 years was selected randomly, and data were collected on Android tablets20. In 2013, 4200 individuals aged 15–69 years were selected by multistage cluster sampling from 70 Ilakas of Nepal22. In 2008, 4328 respondents aged 15–64 years were selected from 15 districts of Nepal through cluster sampling23. The detailed sampling strategy is available in the STEPS survey 2019 report20-22.
Study variables
The following variables were selected as per the objective of the study20-22. A current smoker was defined as a person who had smoked tobacco products in the last 30 days. Smoking prevalence was computed from current smoking behavior. Sex was recorded as male or female for respondents who had initiated smoking or smoked their first cigarette, and age of smoking initiation was defined as the age at which an individual initiated smoking or smoked his/her first cigarette.
The data on the current smoking rate were collected from STEPS survey 2008 and 2013 reports because of the unavailability of raw data. We extracted two variables, i.e. current smoking behavior and the age of smoking initiation from the STEPS survey 2019 data set available in the supplementary section of the published article23. Then, the statistical analysis was planned to compare the prevalence rate between surveys and the mean age of smoking initiation between males and females for the STEPS survey 201920.
Statistical analysis
Both frequentist and Bayesian approaches are applied to draw inferences on study variables. First, the following hypothesis is set up for the frequentist approach and later it is used for the Bayesian approach.
Hypothesis I
Null hypothesis (H0): The smoking prevalence rates between 2013 and 2019 are similar (H0: δ=0).
Alternative hypothesis (H-): The smoking prevalence rate declined between 2013 and 2019 (H-: δ<0).
Hypothesis II
Null hypothesis (H0): The smoking prevalence rate is similar between 2008 and 2019 (H0: δ=0).
Alternative hypothesis (H-): The smoking prevalence rate declined between 2008 and 2019 (H-: δ<0).
Hypothesis III
Null hypothesis (H0): There is no difference in the mean age of smoking initiation between males and females (STEPS survey 2019) (H0: δ=0).
Alternative hypothesis (H1): There is a difference in the mean age of smoking initiation between males and females (STEPS survey 2019) (H1: δ≠0).
In the above, δ is the difference in proportion or effect size.
For the frequentist approach, the proportion test and the t-test were applied for smoking prevalence and AOI, respectively. The significance level was set at 0.05. A p-value is a conditional probability whose calculation assumes that H0 is true, i.e. p(Evidence|H0)13,24,25. The Bayesian inference approach focuses on the probability of a statistical hypothesis given the sample data, i.e. p(Hi|Evidence). The Bayes factor measures the odds favoring the alternative hypothesis against the null hypothesis and vice versa12-16,26. The Bayes factor can be computed for both two-tailed and one-tailed tests. For the two-tailed test, the Bayes factor BF01 represents a test of the null hypothesis (H0) against the alternative hypothesis (H1). Likewise, BF10 represents a test of H1 against H0 and is the reciprocal of BF01, i.e. BF10 = 1/BF01. For the one-tailed test, the following Bayes factors are computed and presented: BF0+ (H0 vs H+) and BF0- (H0 vs H-), and vice versa. Bayes factors range from 0 to ∞, and a Bayes factor of 1 indicates that both hypotheses predicted the data equally well26. If the value of BF10 is above 1, the data provide evidence for the alternative hypothesis; if it is below 1, the data provide evidence for the null hypothesis. For example, if the value of BF10 is 3, then the data are three times more likely under H1 than under H026. Alternatively, data supporting the null or alternative hypothesis can be visually presented using the probability wheel or pizza plot shown in Supplementary file Figure 126. The Bayes factor explains how the prior beliefs about the value of parameter θ change into posterior beliefs about the value of parameter θ. The posterior distribution can be summarized by a 95% credible interval for the amount of change or effect size (δ)13,24-28.
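The relations between the Bayes factors described above can be written compactly. The following is a sketch in standard notation, where D denotes the observed data (a symbol introduced here for illustration only):

```latex
\mathrm{BF}_{10} = \frac{p(D \mid H_1)}{p(D \mid H_0)}, \qquad
\mathrm{BF}_{01} = \frac{1}{\mathrm{BF}_{10}}, \qquad
\frac{p(H_1 \mid D)}{p(H_0 \mid D)} = \mathrm{BF}_{10} \times \frac{p(H_1)}{p(H_0)}
```

The last identity states that the posterior odds equal the Bayes factor multiplied by the prior odds, so with equal prior odds the Bayes factor alone gives the posterior odds.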
We tested the hypotheses by computing BF0- for the proportion of smoking and BF01 for the mean AOI. We also present a 95% credible interval for the posterior median effect size for both the mean and the proportion.
Sequential analysis is a robust visual technique for monitoring the sampling plan in original research, and it shows how the evidence develops as the data accumulate27. The output figure of the sequential analysis also shows the convergence of the BF across sample sizes, which helps to decide whether to stop collecting data once a pre-defined BF is achieved27. A BF value of 1 serves as the reference for deciding which hypothesis the evidence favors. The plot grades the evidence for a hypothesis from anecdotal to very strong depending on the value of the Bayes factor27. For a continuous variable, the robustness analysis reports the Bayes factor under four settings of the Cauchy prior width (r): the maximum attainable Bayes factor, the user prior (r=0.707), the wide prior (r=1), and the ultra-wide prior (r=1.414); similar Bayes factors across these settings imply robustness17,26,27. A default Cauchy prior width of r=1/√2 (0.707) was used for this analysis26.
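The grading of evidence mentioned above can be illustrated with a short sketch. It assumes the commonly used cut-offs of 3, 10, 30 and 100 for anecdotal, moderate, strong, very strong and extreme evidence; the function name `classify_bf01` and the exact boundaries are illustrative assumptions and are not taken from the survey analysis itself.

```python
from typing import Tuple


def classify_bf01(bf01: float) -> Tuple[str, str]:
    """Return (evidence label, favored hypothesis) for a Bayes factor BF01.

    The thresholds 3, 10, 30 and 100 are an assumed, commonly used scheme.
    """
    if bf01 <= 0:
        raise ValueError("A Bayes factor must be positive.")
    if bf01 == 1:
        return ("no evidence", "neither")
    # Values above 1 favor H0; values below 1 favor the alternative.
    favored = "H0" if bf01 > 1 else "H1"
    # Invert values below 1 so that a single scale serves both directions.
    strength = bf01 if bf01 > 1 else 1.0 / bf01
    if strength < 3:
        label = "anecdotal"
    elif strength < 10:
        label = "moderate"
    elif strength < 30:
        label = "strong"
    elif strength < 100:
        label = "very strong"
    else:
        label = "extreme"
    return (label, favored)


if __name__ == "__main__":
    for value in (56.59, 12.54, 2.38e-43):
        print(value, classify_bf01(value))
```

Under these assumed cut-offs, a Bayes factor between 10 and 30 is graded as strong and one between 30 and 100 as very strong.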
The raincloud plot was constructed to check the normality of the continuous data, i.e. AOI. The data were found to be skewed and were therefore log10-transformed to make the distribution approximately normal28. Next, we computed the skewness to check the normality of the transformed data; a value between -1 and 1 was taken to indicate that the data are approximately normal. All statistical analyses were performed using JASP, a free, user-friendly and flexible open-source software, with its default settings for Bayesian analysis26. JASP performed the Bayesian analysis through a Markov chain simulation-based estimation method26,27. As the STEPS survey 2008 collected data for ages 15–64 years, we extracted the data for the age group 15–64 years from the STEPS survey 2019 as well, so that a comparison could be made with the STEPS survey 2008. In this way, we obtained a sample size of 5281 for 2019, although the sample size reported for this survey was 5593. We were unable to perform a Bayesian analysis between STEPS survey 2008 and 2013 because of the inaccessibility of raw data for 2013.
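The effect of the log10 transformation on skewness can be sketched as follows. The ages are hypothetical values chosen to be right-skewed, not survey data, and the moment-based skewness formula used here is an assumption that may differ slightly from the bias-corrected estimator reported by statistical packages.

```python
import math


def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical ages of smoking initiation in years (illustrative only).
ages = [12, 14, 15, 15, 16, 17, 17, 18, 20, 22, 25, 30, 40]
log_ages = [math.log10(a) for a in ages]

skew_raw = skewness(ages)
skew_log = skewness(log_ages)

print(f"skewness of raw ages:   {skew_raw:.2f}")
print(f"skewness of log10 ages: {skew_log:.2f}")
```

For right-skewed positive data such as these, the transformed values have a smaller skewness than the raw values, which is the rationale for the transformation.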
Ethical considerations
This study utilized publicly accessible de-identified secondary data from nationally representative surveys conducted in 2008, 2013, and 201920-22. Written consent was obtained from each participant in the surveys. If a study participant was under 18 years of age, an assent form was used and permission was obtained from the guardian. Each participant's privacy was protected at all times.
RESULTS
Smoking behavior of respondents
The STEPS survey 2019 reports a tobacco smoking prevalence of 17.1% (SE=1.0) for the age group 15–69 years (n=5593). To match the age group 15–64 years considered in the STEPS survey 2008, the 2019 data were restricted to the age group 15–64 years (n=5281), which yielded a prevalence of 16.6% (SE=1.0) for respondents who smoked tobacco products. The previous two STEPS surveys reported that 18.5% (2013) and 26.2% (2008) of the respondents smoked tobacco products. Compared with these surveys, the prevalence of smoking in 2019 was lower by 1.4 percentage points (2013 vs 2019, p=0.86) and 9.6 percentage points (2008 vs 2019, p<0.001).
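A one-sided test of two proportions of the kind used for the 2008 vs 2019 comparison can be sketched as below. This is a simplified, unweighted illustration: it treats the reported prevalences and sample sizes as coming from simple random samples and ignores the cluster design and survey weights, so it is not expected to reproduce the reported p-values exactly. The function name `one_sided_prop_ztest` is hypothetical.

```python
import math


def one_sided_prop_ztest(p_new, n_new, p_old, n_old):
    """z-test of H0: p_new = p_old against H-: p_new < p_old (pooled SE)."""
    pooled = (p_new * n_new + p_old * n_old) / (n_new + n_old)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_new + 1 / n_old))
    z = (p_new - p_old) / se
    # Lower-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(-z / math.sqrt(2))
    return z, p_value


# Reported prevalences and sample sizes, treated as unweighted (assumption).
z, p = one_sided_prop_ztest(0.166, 5281, 0.262, 4328)
print(f"z = {z:.2f}, one-sided p < 0.001: {p < 0.001}")
```

A design-based analysis would replace the pooled standard error with standard errors that account for clustering and weighting.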
Hypothesis I: Comparison of the smoking prevalence between 2013 and 2019 surveys
When the smoking prevalence for 2019 is compared with the smoking prevalence for 2013 (17.1% vs 18.5%, Bayesian hypothesis: H0: δ=0 vs H-: δ<0), the BF0- is found to be 56.59. This means that the data favor the null hypothesis over the alternative hypothesis by a factor of about 57. Supplementary file Figure 2 shows a posterior median effect size of 0.183 with a 95% credible interval of 0.173–0.195. The grey dot on the prior line is below that on the posterior line (Supplementary file Figure 2), which indicates evidence in favor of the null hypothesis. The probability wheel also shows strong evidence in support of the null hypothesis (i.e. nearly equal to 1/30 in the graph of Supplementary file Figure 2). Further, the sequential analysis (Supplementary file Figure 3) supports these results by showing very strong evidence for the null hypothesis, because the Bayes factor lies above 1 and below 1000.
Hypothesis II: Comparison of the smoking prevalence between 2008 and 2019 surveys
The smoking prevalence for 2019 is found to be lower than the smoking prevalence for 2008 (16.6% vs 26.2%, Bayesian hypothesis: H0: δ=0 vs H-: δ<0). The Bayes factor (BF0-) is found to be 2.38×10⁻⁴³, which is essentially zero and indicates evidence in favor of the alternative hypothesis (Supplementary file Figure 4). This means that the prevalence of smoking declined by about 10 percentage points between 2008 and 2019. The posterior median effect size is 0.182 with a 95% credible interval of 0.172–0.193. The grey dot on the prior distribution line is above that on the solid posterior line, indicating that the result is in favor of the alternative hypothesis (Supplementary file Figure 4). The area covered by the probability wheel is similar to the area shown for BF10=30 in Supplementary file Figure 1, which supports the alternative hypothesis. The sequential analysis (Supplementary file Figure 5) reveals strong evidence for the alternative hypothesis (H-) because most of the values fall above 1.
Hypothesis III: Comparison of age of smoking initiation between males and females
Supplementary file Figure 6 shows the distribution of the age of smoking initiation of the respondents who currently smoke tobacco products for 2019. The data become widely dispersed beyond 30 years for both males and females. The data are right-skewed for both males (Sk=1.78) and females (Sk=1.83). The median AOI is 17 (IQR: 15–20) years for both males (n=630) and females (n=393). After log-transformation, AOI is approximately normally distributed and the value of skewness lies between -1 and 1 (males: Sk=0.18; females: Sk=0.38) (Supplementary file Figure 7).
The mean log AOI for males and females was 1.24 (95% confidence limit: 1.23–1.25) and 1.24 (1.23–1.26), respectively. The frequentist approach shows that there is no difference in the mean log AOI between male and female respondents (t= -0.46, df=1021, p=0.65).
The BF01 is 12.54, which means that the data are nearly 13 times more likely under the null hypothesis than under the alternative hypothesis. In Supplementary file Figure 8, the grey dot on the solid line (posterior) is above the corresponding dot on the dashed line (prior), indicating that the data are in favor of the null hypothesis. The median posterior effect size is -0.03 (95% credible interval: -0.154 to 0.096). Supplementary file Figure 9 shows strong evidence for H0 across the different prior widths (r), with Bayes factors ranging from 12.55 to 24.92 (BF>10 strongly supports H0). It likewise shows strong evidence for the null hypothesis, as the Bayes factor lies above 1.
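To illustrate how a Bayes factor such as BF01=12.54 translates into a posterior probability of the kind displayed in a probability wheel, the following sketch converts it under the assumption of equal prior probabilities for H0 and H1. The function name `posterior_prob_h0` and the default prior of 0.5 are assumptions made for illustration.

```python
def posterior_prob_h0(bf01, prior_prob_h0=0.5):
    """Posterior probability of H0 given BF01 and a prior probability of H0."""
    prior_odds = prior_prob_h0 / (1 - prior_prob_h0)
    # Posterior odds equal the Bayes factor times the prior odds.
    posterior_odds = bf01 * prior_odds
    return posterior_odds / (1 + posterior_odds)


bf01 = 12.54
bf10 = 1 / bf01  # reciprocal relation between the two Bayes factors

print(round(bf10, 4))                     # 0.0797
print(round(posterior_prob_h0(bf01), 3))  # 0.926
```

With equal prior probabilities, a BF01 of 12.54 therefore corresponds to a posterior probability of about 0.93 for the null hypothesis; a different prior would change this value.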
DISCUSSION
The Bayesian approach has been widely used in the practice of medical research and tobacco control intervention12,29-30. The present study provides valuable information on the application of Bayesian analysis which is useful for tobacco control strategy.
Our study included two important variables of tobacco control measures, i.e. the prevalence rate of smoking and the age of smoking initiation. First, we compared smoking prevalence between the three STEPS surveys from 2008 to 2019 20,21. The BF provided strong evidence for the alternative hypothesis, i.e. smoking prevalence declined between 2008 and 2019. During this period, the country implemented various tobacco control programs such as health warnings and advertisement bans, raising taxes on tobacco, a ban on the promotion and sponsorship of tobacco-related products, pictorial warnings on packages, and anti-tobacco campaigns3,31. Although efforts to control tobacco use were implemented throughout the period, there was no apparent change in the prevalence of smoking during its latter part, 2013–2019. Political upheaval, close ties of politicians with the tobacco industry, and lack of coordination between different ministries may be reasons for the unchanging smoking prevalence rate32-34. This finding is particularly relevant for policymakers and stakeholders seeking to identify other causes behind it, which calls for in-depth and meticulous review and research.
Second, we compared the mean age (log age) of smoking initiation between males and females for the STEPS survey 2019. The BF indicates that the evidence is 13 times stronger in favor of the null hypothesis, i.e. no difference in the mean age of smoking initiation. Policymakers might use information about the relationship between smoking initiation and demographic factors to explore tailored interventions. Further studies could assess disparities in AOI based on factors such as gender, education level, place of residence, parental education, exposure to anti- and pro-tobacco messaging, tobacco-related knowledge, friends smoking, peer pressure, etc.35,36.
Data were transformed to ensure that they met the model assumptions necessary for both frequentist and Bayesian analysis, in particular that the data were normally distributed. Our findings show that the frequentist and Bayesian approaches give similar results, which may be due to the large sample size and to both analyses relying on the parametric approach.
Strengths and limitations
The strengths of our study are that both traditional and Bayesian hypothesis tests were presented and compared; besides the Bayes factor, the effect sizes were presented with credible intervals to evaluate how sensitive the study was in detecting an effect; default prior values were used for analysis; the sample is random and representative of the Nepalese population; and the age adjustment was done to perform frequentist and Bayesian analysis comparing the prevalence rate between STEPS surveys 2008 and 2019.
There are some limitations of the study. The high non-response rate, the over-representation of women, and the unequal classification of study areas between surveys can influence statistical inference that assumes perfectly random selection. Because raw data for the STEPS survey 2013 were unavailable, it was not feasible to perform a Bayesian analysis for the comparison with the STEPS survey 2008, and the significance of the prevalence difference between these two surveys could not be determined. This study did not account for confounding effects, such as sociodemographic variables and family history of smoking, that are associated with smoking prevalence and age of smoking initiation. Despite these limitations, the study has presented two important variables (smoking prevalence and age of smoking initiation) in tobacco control programs.
CONCLUSIONS
When data on smoking prevalence and age of smoking initiation from a nationally representative sample were analyzed using a Bayesian method, more precise results were achieved, which are critical for reducing tobacco consumption as part of any preventive strategy. The findings of this study suggest that immediate efforts should be made to understand the underlying causes of the unchanged smoking prevalence over the last five years.