Thursday, January 21, 2010

Rating Scales: limitations & ideas for change

A visitor's comment from one of my previous posts reminded me of an issue I'd thought about before.

In mental health research, symptom scales are often used to measure therapeutic improvement. In depression, the most common scales are the Hamilton Depression Rating Scale (HDRS), the Montgomery-Åsberg Depression Rating Scale (MADRS), or sometimes the Beck Depression Inventory (BDI). The first two involve an interviewer assigning a score to a variety of different symptoms or signs. The last is a scale which is filled out by the patient.

Here are examples of questions from the HDRS, with associated ranges of scoring:
depressed mood (0-4); decreased work & activities (0-4); social withdrawal (0-4); sexual symptoms (0-2); GI symptoms (0-2); weight loss (0-2); weight gain (0-2); appetite increase (0-3); increased eating (0-3); carbohydrate craving (0-3); insomnia (0-6); hypersomnia (0-4); general somatic symptoms (0-2); fatigue (0-4); guilt (0-4); suicidal thoughts/behaviours (0-4); psychological manifestations of anxiety (0-4); somatic manifestations of anxiety (0-4); hypochondriasis (0-4); insight (0-2); motor slowing (0-4); agitation (0-4); diurnal variation (0-2); reverse diurnal variation (0-3); depersonalization (0-4); paranoia (0-3); OCD symptoms (0-2)

One can see from this list that depressive syndromes which have many physical manifestations will obviously score much higher. The highest possible score on the 29-item HDRS is 89. It is likely that physical manifestations of acute depression resolve more quickly, particularly in response to medications. Therefore, the finding that more severe depressions have better response to medication could be simply an artifact of the fact that physical symptoms respond better and more quickly to physical treatments.
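As a quick check of the arithmetic, the item ranges in the list above can be summed directly. This is only a sketch: the dictionary transcribes the maximum score for each entry as listed in this post (27 entries, with insomnia shown as a single 0-6 entry), and the names `HDRS_MAX` and `max_total` are illustrative, not part of any official scoring tool.

```python
# Maximum score per item, transcribed from the list in this post.
HDRS_MAX = {
    "depressed mood": 4, "decreased work & activities": 4,
    "social withdrawal": 4, "sexual symptoms": 2, "GI symptoms": 2,
    "weight loss": 2, "weight gain": 2, "appetite increase": 3,
    "increased eating": 3, "carbohydrate craving": 3, "insomnia": 6,
    "hypersomnia": 4, "general somatic symptoms": 2, "fatigue": 4,
    "guilt": 4, "suicidal thoughts/behaviours": 4,
    "psychological anxiety": 4, "somatic anxiety": 4,
    "hypochondriasis": 4, "insight": 2, "motor slowing": 4,
    "agitation": 4, "diurnal variation": 2,
    "reverse diurnal variation": 3, "depersonalization": 4,
    "paranoia": 3, "OCD symptoms": 2,
}


def max_total(item_max):
    """Highest possible total: every item scored at its maximum."""
    return sum(item_max.values())


print(max_total(HDRS_MAX))  # prints 89
```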

A person who is eating and sleeping poorly, is tired, feels and looks physically ill, who is not working, who is not seeing friends as much, and whose symptoms fluctuate in the day, would already get an HDRS score of up to 30 -- without actually feeling depressed or anxious at all! A person feeling very depressed, struggling through life with little pleasure, meaning, satisfaction, or joy -- but sleeping ok, eating ok, and forcing themselves through daily routines such as work, social relationships, etc. -- might only get a score of 4-6 on this scale.

I acknowledge that the many questions on the HDRS cover a variety of important symptom areas, and improvement in any one of these domains can be very significant.

But -- a big problem of the scale, for me, is that the relative significance of the different symptoms is arbitrarily fixed by the structure of the questionnaire. So, for example, are the 4 points for fatigue of equivalent importance to the 4 points for guilt, or social withdrawal, or depressed mood? Would different individuals rate the relative importance of these symptoms differently? Maybe some people might prefer to sleep better, rather than socialize with greater ease. Also, perhaps some of the symptom questions deserve to be "non-linear," or context-dependent. So, for example, perhaps mild or intermittent depressed mood might deserve a score of only "1". Moderately depressed mood might warrant a score of "5". Severe depressive mood might warrant a score of "20". Or, relentless moderate symptoms over a period of years might warrant a score of "20", while only short-term or episodic moderate symptoms might warrant a score of "5".

It would be interesting to change the weighting of these symptom scores, on an individualized basis.
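A minimal sketch of what individualized weighting could look like, assuming each person supplies their own importance weights and that some items get a non-linear severity mapping. The weights, the mapping values (loosely following the 1 / 5 / 20 example above, with the in-between value invented), and the function name are all hypothetical illustrations rather than a validated scoring method.

```python
def weighted_score(item_scores, weights, severity_map=None):
    """Total score after an optional non-linear mapping and per-person weights.

    item_scores:  {item: raw score}
    weights:      {item: importance}; items not listed default to 1.0
    severity_map: {item: {raw score: mapped score}}; unlisted values pass through
    """
    total = 0.0
    for item, raw in item_scores.items():
        mapped = raw
        if severity_map:
            mapped = severity_map.get(item, {}).get(raw, raw)
        total += weights.get(item, 1.0) * mapped
    return total


# Hypothetical raw scores for one person.
scores = {"depressed mood": 2, "insomnia": 2, "social withdrawal": 2}

# Hypothetical non-linear mapping for depressed mood.
mood_map = {"depressed mood": {0: 0, 1: 1, 2: 5, 3: 12, 4: 20}}

# Hypothetical preference: this person cares most about sleeping better.
prefers_sleep = {"insomnia": 2.0}

print(weighted_score(scores, {}))                       # conventional sum
print(weighted_score(scores, {}, mood_map))             # non-linear mood
print(weighted_score(scores, prefers_sleep, mood_map))  # plus personal weights
```

The same raw answers give different totals depending on whose priorities are applied, which is the point of the proposal.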

Also, it would be interesting to see the results of depression treatment studies portrayed with all the separate symptom categories broken down (i.e. to see how the treatment changed each item on the HDRS). Many researchers or statisticians would object that presenting, or drawing conclusions about, so many results at once would weaken the statistical significance of each. Statistically, a so-called "Bonferroni correction" is applied when multiple hypotheses are tested simultaneously: if n hypotheses are tested, the significance threshold for each individual test is divided by n (for example, 0.05/n instead of 0.05), so each result has to meet a much stricter standard. Based on this statistical idea, most researchers prefer to analyze just a single quantity, such as the HDRS score, instead of looking at each component of the score separately.
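To make the cost of the Bonferroni correction concrete, here is a small sketch. The p-values are invented for illustration, and the function names are my own; the only real content is the rule that each test is compared against alpha divided by the number of tests.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_items(p_values, alpha=0.05):
    """Items whose p-value survives the corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [item for item, p in p_values.items() if p <= threshold]


# Hypothetical per-item p-values from an imagined treatment study.
p_values = {"insomnia": 0.0004, "depressed mood": 0.03, "fatigue": 0.20}

print(bonferroni_threshold(0.05, 29))  # threshold if all 29 items are tested
print(significant_items(p_values))     # items surviving correction for 3 tests
```

With 29 items, each comparison would need a p-value below roughly 0.0017, so an item-level effect that looks convincing on its own (such as the hypothetical 0.03 above) no longer counts as significant.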

But, this analysis dilutes the data from any study, in the same way that the analysis of artworks in a museum would be diluted if each piece were summarized only by its mass or area.

A more complete analysis would portray every category at once. A graphical presentation would be reasonable, perhaps taking the form of a 3-d surface (once again). The x-axis could represent the different symptom areas (or scores on each item on the HDRS); the y-axis could represent time; and the z-axis could represent the severity. With this analysis, we could say that we are not actually making n hypotheses--we are making a single hypothesis, that the multifactorial pattern of symptom results, manifest as a 3-d surface, is changing over time. Each individual patient's symptom changes, in every symptom category, could be represented on the graph. In this way, no data, or analytic possibility, would be lost or diluted. The reader would be able to inspect every part of the data from the study, and perhaps notice interesting relationships which the original researchers had not considered.
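The data behind such a surface is just a grid of severities indexed by symptom and time. The sketch below assumes a hypothetical record layout (each patient has a score for every item at every assessment week) and invented numbers; it builds the grid of mean severities, which could then be handed to a plotting tool.

```python
def mean_surface(records, items, weeks):
    """Return grid[i][t]: mean severity of items[i] at weeks[t] across patients."""
    grid = []
    for item in items:
        row = []
        for week in weeks:
            values = [r["scores"][week][item] for r in records]
            row.append(sum(values) / len(values))
        grid.append(row)
    return grid


# Hypothetical data: two patients, two items, three assessments.
records = [
    {"id": "A", "scores": {0: {"insomnia": 4, "depressed mood": 3},
                           4: {"insomnia": 1, "depressed mood": 3},
                           8: {"insomnia": 1, "depressed mood": 2}}},
    {"id": "B", "scores": {0: {"insomnia": 2, "depressed mood": 4},
                           4: {"insomnia": 1, "depressed mood": 3},
                           8: {"insomnia": 1, "depressed mood": 2}}},
]
items = ["insomnia", "depressed mood"]   # x-axis
weeks = [0, 4, 8]                        # y-axis

surface = mean_surface(records, items, weeks)  # z-axis values
for item, row in zip(items, surface):
    print(item, row)
```

In this made-up example insomnia drops sharply by week 4 while mood barely moves, a difference the total score alone would hide. Individual patients could be drawn the same way by skipping the averaging step.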

Some patterns of change with different treatments could present in the following ways, as shown in such a 3-d surface:
1) some symptoms improve dramatically with time, while others are much slower to change, or don't change at all. In depression treatment studies, sleep or appetite might change very quickly with a potent antihistaminic drug...this would immediately lead to pronounced improvement on the overall HDRS score, but might not be associated with any significant improvement in mood, energy, concentration, etc.
2) some symptoms might improve immediately, but deteriorate right back to baseline or worse after a few weeks or months. Benzodiazepine treatment would produce such a pattern, in terms of sleep or anxiety improvement. A medication which is sedating but addictive might cause rapid HDRS improvement, but only a careful look at individual category changes over a long period of time would allow us to see the addiction/tolerance pattern. Some people drink alcohol to treat their anxiety symptoms -- such a behaviour might rapidly improve their HDRS scores! But of course, the scores would return to worse than baseline within a few weeks or months. And the person would probably have new symptoms and problems on top of their original ones. So, we must be cautious about getting too excited about claims of rapid HDRS change!
3) some treatments might cause a global change in most or all symptoms...this would be the goal of most treatment strategies. Such a pattern would imply that the multi-symptom syndrome (in this case, the "major depressive disorder" construct) is in fact valid, with all of its components improving together in response to a single treatment.
4) some combined treatments might work well together...for example, a treatment which helps substantially with energy or concentration (such as a stimulant), together with a treatment which helps with mood, socialization, optimism, or anxiety (such as psychotherapy, or an antidepressant). These treatments on their own might appear to be equivalent if only the total HDRS score is considered (since each would reduce symptom points overall); the synergistic effect would only be apparent by looking at each symptom domain separately.
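The fourth pattern can be illustrated with a toy calculation. Everything here is hypothetical: the four symptom domains, the baseline scores, and the two treatment profiles are invented, and the effects are assumed to add up simply, which real treatments need not do.

```python
# Hypothetical baseline item scores for one person.
baseline = {"energy": 4, "concentration": 3, "mood": 4, "anxiety": 3}

# Hypothetical changes produced by each treatment on its own.
change_a = {"energy": -3, "concentration": -2}  # stimulant-like profile
change_b = {"mood": -3, "anxiety": -2}          # psychotherapy-like profile


def apply_changes(scores, *changes):
    """Apply one or more per-item changes, never going below zero."""
    out = dict(scores)
    for change in changes:
        for item, delta in change.items():
            out[item] = max(0, out[item] + delta)
    return out


def total(scores):
    return sum(scores.values())


after_a = apply_changes(baseline, change_a)
after_b = apply_changes(baseline, change_b)
after_both = apply_changes(baseline, change_a, change_b)

print(total(baseline), total(after_a), total(after_b), total(after_both))
print(after_a)
print(after_b)
```

Judged by total score, the two treatments look identical; only the per-item view shows that they act on different domains and could be worth combining.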

Finally, I think it is important to look at very broad, simple indicators of quality of life, or of general improvement. The "CGI" scale is one example, although it is awkward and imprecise in design, and most likely prone to bias.

Quality of life scales are important as well, in my opinion, since they look at overall satisfaction with life, rather than merely a collection of symptoms.

In practice, only a discussion with the person receiving the treatment can really assess whether it is worthwhile to continue the treatment or not. In such a discussion, the subjective pros and cons of the treatment can be weighed. Even if the treatment has had a minimal impact on a rating score, it might be subjectively beneficial to the person receiving it. And even if the treatment has produced large rating score changes, it might not be the person's preference to continue. I suppose the role of a prescriber is mainly to facilitate such a dialog, and contradict the patient's wishes only if the treatment is objectively causing harm.

Health benefits of dietary nut intake

Dietary nut intake is strongly associated with a variety of health benefits, particularly a lower risk of developing cardiovascular disease. Here is a link to a recent review of the subject:

This 2009 article describes a carefully controlled, inpatient, 4-day randomized study in which subjects were given a breakfast containing walnuts; or a "placebo" breakfast containing the same number of calories, and the same amount of carbs & fat, but no walnuts. The results showed that a breakfast containing walnuts leads to a significantly greater feeling of satiation (contentment and satisfaction with respect to food), at lunchtime:

Therefore, eating walnuts, as part of a balanced diet, is likely to maintain a feeling of satiation, and therefore reduce some of the physiological drives which can contribute to unhealthy eating behaviours.

This is a reference to a large prospective study of over 50 000 women followed over 8 years. The results included a multivariate analysis controlling for many other factors, such as physical activity, smoking, other dietary habits, etc. There was a slight reduction in weight gain or obesity in those who included more nuts in their diet, and in fact the more frequent the nut intake, the lower the risk of obesity:

With respect to mental health, I think that a balanced, healthy diet is important. Lifestyle habits, including nutritional choices, which reduce risk of cardiovascular disease, are likely also to reduce risk of degenerative brain disease. Walnuts are a source of omega-3 fatty acids, for which there is modest evidence of beneficial effects on mood.

Treatment of eating disorders requires deliberate attention to healthy, regular nutritional habits. Many individuals with eating disorders exclude certain types of food from their diets, based on an unfounded belief that the exclusion would lead to improved control of appetite or caloric intake.

Nuts in particular clearly deserve to be part of a healthy diet, unless there are issues such as food allergy.

Wednesday, January 13, 2010

Antidepressants only effective in severest depression?

A recent article in JAMA by Fournier et al. is a meta-analysis of antidepressant treatment effects assessed in relation to depression severity. Here's the reference:

The results show that antidepressants have a significant advantage over placebo only for very severe depression (corresponding to Hamilton Depression Rating Scale scores of at least 25).

The analysis is quite well-done, and the results are also presented in a graphical form clearly showing a linear increase in antidepressant effect as baseline depression scores increase.

The authors observe that antidepressants are most commonly prescribed to people who have milder depressions--a population in which they show that medications arguably do not work.

Here are a few of my criticisms of this study:

1) the duration of each trial included in the meta-analysis was between 6 and 11 weeks. In my opinion, depressive disorders are long-term, highly recurrent problems, which have a natural course of at least 6-11 months, not 6-11 weeks. Treatments to address mood disorders of any severity require much longer durations. The short duration could cause a significant under-estimation of treatment effects.

2) the study, like many, looks at "depression alone." In most real-life situations, outside of a research study, individuals have several different problems, such as mild depression + social anxiety, or mild depression + panic attacks, etc. The presence of other symptoms, particularly anxiety symptoms, most likely would increase the likelihood of antidepressants helping.

3) Milder depressions, just like more severe depressions, may actually improve more consistently with a "second step" such as combination with psychotherapy, or combining two different antidepressants. The mildness of a medical syndrome does not necessarily mean that the effective treatments need only to be "mild."

4) Milder depressive syndromes may be more prone to misdiagnosis.

5) current "resolution" to measure treatment effects in depression is quite poor. "Depression" is a very broad category. An analogy could be considering "abdominal pain" to be a diagnostic category. If "abdominal pain" is the only category, and is simply rated on a severity scale (rather than subcategorized to obtain a precise diagnosis), and the treatment offered for "abdominal pain" is appendectomy, then we would probably see no difference in treatment effectiveness between appendectomy and placebo. This is because appendectomy is only effective to treat appendicitis (a subset of the abdominal pain population), and is either ineffective or harmful in treating abdominal pain patients without appendicitis (except, perhaps, for those patients who have a placebo improvement of psychosomatic or factitious abdominal pain, an improvement which they attribute to having surgery).

We currently do not have the science to subcategorize depression in a more clinically meaningful way (there are subcategorization schemes, but they don't have much relevance in terms of treatment).

But we do have a research method which could improve "resolution":
-instead of comparing two populations of depressed individuals, one group receiving antidepressant (or some other treatment), and the other receiving placebo (or some other alternative), the study design could instead be to offer every individual courses of placebo, alternating with antidepressant (or "treatment one" alternating with "treatment two"). Each course of treatment would have to last an adequate length of time. The analysis would aim to show whether there is a subset of individuals who respond to the antidepressant, or a subset of individuals who do better with placebo. The averaged results over the whole group might show that antidepressant effects do not differ from placebo (just like appendectomy might not differ from placebo in treating "abdominal pain"), but the individualized result could show that some individuals improve substantially with the antidepressant (just like appendectomy would save the lives of the small group of "abdominal pain" patients who have appendicitis).
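Here is a toy version of that analysis, to show how a group average can hide a responsive subset. The numbers are invented so that the group-level difference is exactly zero: two hypothetical "responders" improve much more on the antidepressant, while eight others do slightly better on placebo. The record layout, the names, and the responder cutoff of 5 points are all assumptions made for illustration.

```python
from statistics import mean


def per_patient_difference(courses):
    """Mean improvement on drug courses minus mean improvement on placebo courses."""
    return mean(courses["drug"]) - mean(courses["placebo"])


def find_responders(patients, min_difference):
    """Patients whose drug courses beat their placebo courses by the cutoff."""
    return [pid for pid, courses in patients.items()
            if per_patient_difference(courses) >= min_difference]


# Hypothetical improvement (in scale points) over alternating courses.
patients = {}
for pid in ("r1", "r2"):                      # respond to the drug
    patients[pid] = {"placebo": [1, 3], "drug": [11, 13]}
for n in range(1, 9):                         # do slightly better on placebo
    patients[f"n{n}"] = {"placebo": [4, 5], "drug": [2, 2]}

differences = [per_patient_difference(c) for c in patients.values()]

print(mean(differences))              # group-level drug-minus-placebo difference
print(find_responders(patients, 5))   # individuals with a clear drug benefit
```

A real analysis would also need to deal with carry-over between courses, the order of treatments, and chance variation within each person, none of which this sketch attempts.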


In the meantime, though, I think it is reasonable to recognize that antidepressants are less consistently helpful when symptoms are less severe.

Wednesday, January 6, 2010

A Gene-Environment-Phenotype Surface

I've been thinking of a way to describe the interaction between genes, environment, and phenotype qualitatively as a mathematical surface.

In this model, the x-axis would represent the range of genetic variation relevant to a given trait. If it was a single gene, the x-axis could represent all existing gene variants in the population. Or, the idea could be extended such that the x-axis could represent all possible variants of the gene (including the absence of the gene, represented as "negative infinity" on the x-axis). The middle of the x-axis (x=0) would represent the average expression of the relevant gene in the population.

The y-axis would represent the range of environmental variation relevant to a given trait. y=0 would represent the average environmental history in the population. y="negative infinity" would represent the most extreme possible environmental adversity. y="positive infinity" would represent the most extreme possible environmental enrichment.

The z-axis would represent the phenotype. For example, it could represent height, IQ, extroversion, conscientiousness, etc.

In my opinion, current expressions of "heritability" represent something like the partial derivative dz/dx at x=0 and y=0; or perhaps, since the calculation is based on a population sample, heritability would be the average of derivatives dz/dx over various sampled (x,y) points near x=0 and y=0.

Conventional heritability calculations give a severely limited portrait of the role of genes on phenotype, since they condense the information from what is really a 3-dimensional surface into a single number (the heritability). This is like looking at a sculpture, then being told that the sculpture can be represented by a single number such as "0.6", based on the average tilt on the top centre of the artwork.

A more comprehensive idea of heritability would be to consider that it is the gradient, a component of which is dz/dx. This gradient would not be a fixed quantity, but could be considered a function of x and y.
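To make the idea concrete, the sketch below picks an arbitrary smooth surface and estimates its gradient numerically at a few points. The function `z` is purely hypothetical (it is not fitted to any data), and treating the average of dz/dx near the origin as a stand-in for "heritability" follows the loose analogy in this post rather than the formal statistical definition.

```python
import math


def z(x, y):
    """Hypothetical phenotype surface, chosen only for illustration."""
    return math.tanh(x) / (1.0 + y * y) + math.tanh(y)


def gradient(f, x, y, h=1e-5):
    """Central-difference estimate of (dz/dx, dz/dy) at the point (x, y)."""
    dz_dx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dz_dy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return dz_dx, dz_dy


def mean_dz_dx(f, points):
    """Average genetic slope over a sample of (x, y) points."""
    return sum(gradient(f, x, y)[0] for x, y in points) / len(points)


# A small sample of points around the population mean (0, 0).
near_origin = [(dx, dy) for dx in (-0.2, 0.0, 0.2) for dy in (-0.2, 0.0, 0.2)]

print(gradient(z, 0.0, 0.0))       # slopes at the population mean
print(gradient(z, 3.0, 0.0))       # same environment, unusual genotype
print(gradient(z, 0.0, 2.0))       # same genotype, unusual environment
print(mean_dz_dx(z, near_origin))  # single-number summary near the origin
```

On this made-up surface the genetic slope is steep at the origin but much flatter for an unusual genotype or an unusual environment, so the single number computed near the origin describes only a small patch of the surface.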

It is particularly interesting to me to consider other properties of this surface, such as what is the derivative dz/dy at different values of y and x? This would determine the ease with which environmental change could change a phenotype regardless of genotype.

A variety of different shapes for this surface could occur:

1) z could plateau (asymptotically) as y approaches infinity. This implies that the phenotype could not be changed beyond a certain point, regardless of the degree of environmental enrichment.
2) z could appear to plateau as y increases, but this is only because we do not yet have existing environments y>p, where p is the best current enriched environment. It may be that z could increase substantially at some point y>j, where j>p. I believe this is the case for most medical and psychiatric problems. It implies that we must develop better environments. Furthermore, it may be that for some genotypes (values of x), z plateaus as y increases, but for other genotypes z changes more dynamically. This implies that some people may inherit greater or lesser sensitivity to environmental change.
3) dz/dx could be very high near the origin (x,y)=(0,0), leading to a high conventional estimate of heritability; but at different values of (x,y), dz/dx could be much smaller. Therefore, it may be that for some individual genomes or environmental histories, genetic effects may be much less relevant, despite what appears to be "high heritability" in a trait.
4) dz/dx could be very low near the origin, but much higher at other values of (x,y). Therefore, despite conventional calculations of heritability being low, there could be substantial genetic effects on phenotype for individuals with genotypes or environmental histories which are farther from the population mean.
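Two of the shapes in the list above, a plateau in z as y increases and a sensitivity to environment that depends on genotype, can be sketched with another invented surface. The logistic `sensitivity` function and the tanh plateau are arbitrary choices made only so the numbers are easy to inspect; nothing here is estimated from real data.

```python
import math


def sensitivity(x):
    """Hypothetical genotype-dependent responsiveness to environment (0 to 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def z(x, y):
    """Hypothetical phenotype: environmental benefit saturates as y grows."""
    return sensitivity(x) * math.tanh(y)


def dz_dy(x, y, h=1e-5):
    """Central-difference estimate of the environmental slope at (x, y)."""
    return (z(x, y + h) - z(x, y - h)) / (2 * h)


print(dz_dy(0.0, 0.0))   # average genotype, average environment
print(dz_dy(0.0, 4.0))   # average genotype, highly enriched environment
print(dz_dy(3.0, 0.0))   # environmentally sensitive genotype
print(dz_dy(-3.0, 0.0))  # environmentally insensitive genotype
```

On this surface, further enrichment of an already enriched environment changes almost nothing, while at an average environment the payoff from environmental change differs roughly twentyfold between the two hypothetical genotypes.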

The idea of x itself being fixed in an individual may also not be entirely accurate, since we now know of epigenetic effects. Also, evolving technology may allow us to change x therapeutically.

In order to describe such a "surface", many more data points would need to be analyzed, and some of these might be impossible to obtain in the current population.

But I think this idea might qualitatively improve our understanding of gene-environment interaction, in ways that could have practical applications (current heritability estimates are typically 0.5 for almost anything you can think of--this fact seems intuitively obvious, but is not very helpful to inspire therapy or change, can sometimes increase a person's sense of resignation about the possibility of therapeutic change, and can distort understanding about the relative impacts of genes and non-genetic environment).