Garth Kroeker: Rating Scales: limitations & ideas for change

A visitor's comment from one of my previous posts reminded me of an issue I'd thought about before.

In mental health research, symptom scales are often used to measure therapeutic improvement. In depression, the most common scales are the Hamilton Depression Rating Scale (HDRS), the Montgomery-Åsberg Depression Rating Scale (MADRS), or sometimes the Beck Depression Inventory (BDI). The first two examples involve an interviewer assigning a score to a variety of different symptoms or signs. The last example is a scale which is filled out by the patient.

Here are examples of questions from the HDRS, with associated ranges of scoring:
depressed mood (0-4); decreased work & activities (0-4); social withdrawal (0-4); sexual symptoms (0-2); GI symptoms (0-2); weight loss (0-2); weight gain (0-2); appetite increase (0-3); increased eating (0-3); carbohydrate craving (0-3); insomnia (0-6); hypersomnia (0-4); general somatic symptoms (0-2); fatigue (0-4); guilt (0-4); suicidal thoughts/behaviours (0-4); psychological manifestations of anxiety (0-4); somatic manifestations of anxiety (0-4); hypochondriasis (0-4); insight (0-2); motor slowing (0-4); agitation (0-4); diurnal variation (0-2); reverse diurnal variation (0-3); depersonalization (0-4); paranoia (0-3); OCD symptoms (0-2)

One can see from this list that depressive syndromes which have many physical manifestations will obviously score much higher. The highest possible score on the 29-item HDRS is 89. It is likely that physical manifestations of acute depression resolve more quickly, particularly in response to medications. Therefore, the finding that more severe depressions have better response to medication could be simply an artifact of the fact that physical symptoms respond better and more quickly to physical treatments.
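As a quick check of the arithmetic, here is a minimal Python sketch that sums the maximum scores in the list above. The dictionary name is made up for illustration. Note that the list has 27 entries rather than 29; the assumption here is that some entries (such as insomnia, 0-6) pool more than one item of the full scale.

```python
# Maximum score for each entry in the list above (names copied from the post).
# HDRS_MAX is a hypothetical name used only for this sketch.
HDRS_MAX = {
    "depressed mood": 4, "decreased work & activities": 4,
    "social withdrawal": 4, "sexual symptoms": 2, "GI symptoms": 2,
    "weight loss": 2, "weight gain": 2, "appetite increase": 3,
    "increased eating": 3, "carbohydrate craving": 3, "insomnia": 6,
    "hypersomnia": 4, "general somatic symptoms": 2, "fatigue": 4,
    "guilt": 4, "suicidal thoughts/behaviours": 4,
    "psychological manifestations of anxiety": 4,
    "somatic manifestations of anxiety": 4, "hypochondriasis": 4,
    "insight": 2, "motor slowing": 4, "agitation": 4,
    "diurnal variation": 2, "reverse diurnal variation": 3,
    "depersonalization": 4, "paranoia": 3, "OCD symptoms": 2,
}

# Highest possible total if every entry is scored at its maximum.
max_total = sum(HDRS_MAX.values())
print(len(HDRS_MAX), "entries, maximum total =", max_total)
```

The total comes to 89, which agrees with the maximum quoted above.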

A person who is eating and sleeping poorly, is tired, feels and looks physically ill, who is not working, who is not seeing friends as much, and whose symptoms fluctuate in the day, would already get an HDRS score of up to 30 -- without actually feeling depressed or anxious at all! A person feeling very depressed, struggling through life with little pleasure, meaning, satisfaction, or joy -- but sleeping ok, eating ok, and forcing self through daily routines such as work, social relationships, etc. -- might only get a score of 4-6 on this scale.

I acknowledge that the many questions on the HDRS cover a variety of important symptom areas, and improvement in any one of these domains can be very significant.

But -- a big problem of the scale, for me, is that the relative significance of the different symptoms is arbitrarily fixed by the structure of the questionnaire. So, for example, are the 4 points for fatigue of equivalent importance to the 4 points for guilt, or social withdrawal, or depressed mood? Would different individuals rate the relative importance of these symptoms differently? Maybe some people might prefer to sleep better, rather than socialize with greater ease. Also, perhaps some of the symptom questions deserve to be "non-linear," or context-dependent. So, for example, perhaps mild or intermittent depressed mood might deserve a score of only "1". Moderately depressed mood might warrant a score of "5". Severe depressive mood might warrant a score of "20". Or, relentless moderate symptoms over a period of years might warrant a score of "20", while only short-term or episodic moderate symptoms might warrant a score of "5".

It would be interesting to change the weighting of these symptom scores, on an individualized basis.
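To make the idea concrete, here is a minimal Python sketch of individualized and non-linear weighting. Everything in it is an assumption for illustration: the item scores, the personal weights, the function names, and the mapping of mood severity to 0, 1, 5 and 20 (borrowed loosely from the numbers suggested above). It is not part of any published scale.

```python
# Hypothetical item scores for one person.
scores = {"depressed mood": 3, "fatigue": 2, "guilt": 1, "social withdrawal": 2}

# Hypothetical personal weights: how much each symptom matters to this person.
weights = {"depressed mood": 3.0, "fatigue": 1.0, "guilt": 0.5, "social withdrawal": 1.5}

# Hypothetical non-linear scoring for depressed mood.
MOOD_POINTS = {"none": 0, "mild": 1, "moderate": 5, "severe": 20}


def standard_total(item_scores):
    """Plain sum: every point counts the same, as on the usual scale."""
    return sum(item_scores.values())


def weighted_total(item_scores, item_weights):
    """Sum in which each item is multiplied by a personal weight (default 1)."""
    return sum(item_weights.get(item, 1.0) * score
               for item, score in item_scores.items())


def nonlinear_mood_points(level):
    """Look up the points for a mood severity label."""
    return MOOD_POINTS[level]


print(standard_total(scores))             # every item weighted equally
print(weighted_total(scores, weights))    # items weighted by personal importance
print(nonlinear_mood_points("moderate"))  # non-linear points for one item
```

With these made-up numbers the plain total is 8 while the weighted total is 14.5, so the same set of answers can look quite different depending on whose priorities set the weights.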

Also, it would be interesting to see the results of depression treatment studies portrayed with all the separate symptom categories broken down (i.e. to see how the treatment changed each item on the HDRS). Many researchers or statisticians would complain that to portray, or draw conclusions about, so many results at once would weaken the statistical significance of any one of them. Statistically, a so-called "Bonferroni correction" is called for if multiple hypotheses are being tested simultaneously: if n hypotheses are tested, the p-value threshold for calling any single result significant is divided by n (for example, 0.05/n instead of 0.05). Based on this statistical idea, most researchers prefer to analyze just a single quantity, such as the HDRS score, instead of looking at each component of the score separately.
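For readers who want to see the Bonferroni idea in numbers, here is a minimal Python sketch. The function names and the p-values are hypothetical; the only thing taken from the statistics is the standard rule of dividing the significance threshold by the number of tests.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold after a Bonferroni correction."""
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """For each p-value, report whether it survives the correction."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


# Made-up p-values for three separate symptom items.
p_values = [0.001, 0.01, 0.04]
print(bonferroni_threshold(0.05, len(p_values)))
print(significant_after_bonferroni(p_values))

# Testing each of the 27 listed entries separately would be far stricter.
print(bonferroni_threshold(0.05, 27))
```

In this made-up case the third result (p = 0.04) would count as significant on its own, but not once three tests are corrected for, which is the loss of power that researchers are trying to avoid.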

But, this analysis dilutes the data from any study, in the same way that the analysis of artworks in a museum would be diluted if each piece were summarized only by its mass or area.

A more complete analysis would portray every category at once. A graphical presentation would be reasonable, perhaps taking the form of a 3-d surface (once again). The x-axis could represent the different symptom areas (or scores on each item on the HDRS); the y-axis could represent time; and the z-axis could represent the severity. With this analysis, we could say that we are not actually making n hypotheses--we are making a single hypothesis, that the multifactorial pattern of symptom results, manifest as a 3-d surface, is changing over time. Each individual patient's symptom changes, in every symptom category, could be represented on the graph. In this way, no data, or analytic possibility, would be lost or diluted. The reader would be able to inspect every part of the data from the study, and perhaps notice interesting relationships which the original researchers had not considered.
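Here is a minimal Python sketch of the data behind such a surface, using only the standard library. The patients, items, weeks and scores are all made up, and `mean_surface` is a hypothetical helper; a real study would have many more patients and items, and would pass the resulting grid to a plotting tool.

```python
from statistics import mean

# Axes of the surface: symptom items (x) and time in weeks (y).
items = ["depressed mood", "insomnia", "fatigue"]
weeks = [0, 2, 4]

# scores[patient][week_index][item_index]; all numbers are invented.
scores = {
    "patient A": [[3, 4, 3], [3, 1, 3], [2, 1, 2]],
    "patient B": [[4, 6, 2], [4, 2, 2], [3, 1, 1]],
}


def mean_surface(all_scores):
    """Average severity (z) for every week/item cell, across patients."""
    patients = list(all_scores.values())
    n_weeks = len(patients[0])
    n_items = len(patients[0][0])
    return [
        [mean(p[w][i] for p in patients) for i in range(n_items)]
        for w in range(n_weeks)
    ]


surface = mean_surface(scores)
for week, row in zip(weeks, surface):
    print("week", week, dict(zip(items, row)), "total:", sum(row))
```

In this invented example the total falls steadily, but looking across the rows shows that almost all of the early drop comes from insomnia, while depressed mood has not moved at all by week 2. Each patient's own grid could be shown in the same way instead of the average.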

Some patterns of change with different treatments could present in the following ways, as shown in such a 3-d surface:
1) some symptoms improve dramatically with time, while others are much slower to change, or don't change at all. In depression treatment studies, sleep or appetite might change very quickly with a potent antihistaminic drug...this would immediately lead to pronounced improvement on the overall HDRS score, but might not be associated with any significant improvement in mood, energy, concentration, etc.
2) some symptoms might improve immediately, but deteriorate right back to baseline or worse after a few weeks or months. Benzodiazepine treatment would produce such a pattern, in terms of sleep or anxiety improvement. A medication which is sedating but addictive might cause rapid HDRS improvement, but only a careful look at individual category changes over a long period of time would allow us to see the addiction/tolerance pattern. Some people drink alcohol to treat their anxiety symptoms -- such a behaviour might rapidly improve their HDRS scores! But of course, the scores would return to worse than baseline within a few weeks or months. And the person would probably have new symptoms and problems on top of their original ones. So, we must be cautious about getting too excited about claims of rapid HDRS change!
3) some treatments might cause a global change in most or all symptoms...this would be the goal of most treatment strategies. Such a pattern would imply that the multi-symptom syndrome (in this case, the "major depressive disorder" construct) is in fact valid, with all of its components improving together under a single treatment.
4) some combined treatments might work well together...for example, a treatment which helps substantially with energy or concentration (such as a stimulant), together with a treatment which helps with mood, socialization, optimism, or anxiety (such as psychotherapy, or an antidepressant). These treatments on their own might appear to be equivalent if only the total HDRS score is considered (since each would reduce symptom points overall); the synergistic effect would only be apparent by looking at each symptom domain separately.
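The last pattern can be illustrated with a minimal Python sketch. The symptom domains, the scores, and the helper `change_by_item` are all hypothetical, and the combined result simply assumes that the two effects add up without interfering with each other.

```python
# Invented baseline scores in four symptom domains.
baseline = {"depressed mood": 4, "anxiety": 3, "fatigue": 4, "concentration": 3}

# Invented outcomes: one treatment helps energy and concentration,
# the other helps mood and anxiety.
after_stimulant = {"depressed mood": 4, "anxiety": 3, "fatigue": 2, "concentration": 1}
after_therapy = {"depressed mood": 2, "anxiety": 1, "fatigue": 4, "concentration": 3}


def change_by_item(before, after):
    """Points of improvement in each domain (positive means better)."""
    return {item: before[item] - after[item] for item in before}


stimulant_change = change_by_item(baseline, after_stimulant)
therapy_change = change_by_item(baseline, after_therapy)

# Assumption: used together, the two sets of improvements simply add.
after_both = {
    item: baseline[item] - stimulant_change[item] - therapy_change[item]
    for item in baseline
}

for label, result in [("stimulant", after_stimulant),
                      ("therapy", after_therapy),
                      ("both", after_both)]:
    print(label, "total:", sum(result.values()), result)
```

Both single treatments bring the invented total from 14 down to 10, so they look identical on the summary score; only the per-domain breakdown shows that they help different things and might be worth combining.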

Finally, I think it is important to look at very broad, simple indicators of quality of life, or of general improvement. The "CGI" scale is one example, although it is awkward and imprecise in design, and most likely prone to bias.

Quality of life scales are important as well, in my opinion, since they look at overall satisfaction with life, rather than merely a collection of symptoms.

In practice, only a discussion with the person receiving the treatment can really assess whether it is worthwhile to continue the treatment or not. In such a discussion, the subjective pros and cons of the treatment can be weighed. Even if the treatment has had a minimal impact on a rating score, it might be subjectively beneficial to the person receiving it. And even if the treatment has produced large rating score changes, it might not be the person's preference to continue. I suppose the role of a prescriber is mainly to facilitate such a dialog, and contradict the patient's wishes only if the treatment is objectively causing harm.

5 comments:

Rach said...: are these indexes/surveys used only once, or are they meant to be used as an ongoing assessment tool?; January 23, 2010 at 5:00 AM
GK said...: Hi Rach,

Usually the scales are used on an ongoing basis, though sometimes just for an initial assessment to help with "diagnosis." Most often they would be used in a research study: individuals would usually need to have a minimum score on a scale in order to qualify for the study. Then, the rating scales would be repeated every week or so, for the duration of the study (typically over a few months). The study would usually depict the change of total rating scale scores, of treatment vs. placebo.

Some therapists do rating scales with patients as a part of regular follow-up. I have heard of some therapists doing this every single session. I have never done this, because I think it interferes with a naturally flowing conversation. Also, it uses up at least 5-10 minutes of time every session, which in my opinion would often interrupt a patient's sense of freedom and control of the frame, and impose more (possibly unwelcome) formality on the therapeutic relationship. However, rating scales in therapy sessions could be useful in some cases; one advantage of rating scales is that they compel us to keep track of a wide range of symptoms every time, and invite us to ask questions about important areas that we might forget to consider otherwise.; January 24, 2010 at 1:58 PM
Rach said...: Garth,
Slightly off topic question, but it doesn't look like you've addressed it on the blog:

What are your thoughts on self-help workbooks? Are there books/other materials you suggest for people?; January 28, 2010 at 7:51 AM
Anonymous said...: http://www.ncbi.nlm.nih.gov/pubmed/20619943

Self-reported versus clinician-rated symptoms of depression as outcome measures in psychotherapy research on depression: a meta-analysis.

(HRSD and BDI)

Interesting
-----
Physician ratings of improvement following psychotherapy are slightly but significantly higher than the patient self-report ratings; March 13, 2011 at 10:48 PM
GK said...: Thanks, another interesting reference.

A phenomenon I've often thought about is that recovery from depression and other psychological problems seems to happen in stages. Often, aspects of mood which are more apparent to others improve sooner than aspects which are apparent only to oneself.

I don't see this necessarily as a contradiction between self-report and clinician-report, but as evidence of this type of staged recovery process.

It would be important for the clinician who observes some kind of objective improvement not to leave the patient feeling dismissed or poorly empathized with, if the patient is not feeling the same kind of positive improvement that the clinician sees.; March 14, 2011 at 2:51 PM

Garth Kroeker

Thursday, January 21, 2010
