This post was inspired by the following tweet by my friend Jeff Stier.
A good start, but woefully insufficient https://t.co/hKWLzPLc8q
— Jeff Stier (@JeffaStier), February 9, 2017
After reading the article I came up with 10 much more relevant keys and tweeted them to Jeff. Thinking about it a bit more (I only spent 30 minutes writing the 10 tweets), I realized that expanding on these points might be useful to those wanting some concrete information behind those 10 tweets. It turns out that by the tenth tweet there’s a real argument showing why the tobacco control policy of abstinence can’t succeed while a policy supporting harm reduction can. I’ll present each of the ten tweets and expand on them.
1. Do the investigators have expertise and training in the subject?
Kidney doctors don’t do brain surgery for a reason.
Expertise is a funny thing. Education surely plays a key role, but not the only one. Anyone who has studied skills acquisition, or done a CLEP test prep, can tell you that education only grants a body of knowledge – not the wisdom needed to correctly apply it. That wisdom comes from experience.
There is another qualification that falls under this heading and that is the focus of the experience. For example, let’s take a research paper finding that some chemical causes some condition. It is important to look at the qualifications of the people doing the research. One should expect that the researchers have experience with the condition, that they have diligently made efforts to determine that the chemical is or is not the cause, and that they have at least a working theory as to why that chemical, and not some other agent, is the cause.
2. Does the study contradict what was known before?
Extraordinary claims require extraordinary proof.
It has almost become a cottage industry to produce results that contradict established knowledge. Everything from nutrition and disease to climate and education has seen headlines claiming that some “study” is changing the way we look at things. The studies themselves are often the product of experts working far afield of their education and experience to begin with. But most important, there is no explanation in the study as to why this new finding should supersede the prior body of knowledge.
Occasionally something comes along that really does change how we look at things. But if you look carefully at these kinds of studies, they are chock full of answers explaining how and why their findings differ from the original, established knowledge. They invite experts to critique the work, and often cite such critiques within the work itself. Authors will often express gratitude for their detractors as having been of great value in refining their findings. There is never an accusation against the character of those who provided the previous knowledge, or of the critics of the current work.
3. Was the study design based on real-world conditions?
Mice are not men. Cells are not people.
Cancer and many other diseases have been “cured” an uncountable number of times, but only within the confines of a Petri dish. Such is the way of science. Most of Edison’s filaments worked in a light bulb, but only one had the right characteristics to be the successful conclusion of the experiments. The road of scientific knowledge is littered with failure.
Headlines are chock full of examples where the reporter got a press release about a new study that finds that X is the latest answer to problem Y. The problem here is that reporters are poor curators of scientific facts. While what they write may be true, they and their editors sensationalize findings that often haven’t even been tested against real-world conditions. When such tests are performed the promised miracle never appears – not the thing a newspaper is going to publish above the fold with a bold headline.
Unfortunately for most readers this leaves them with fanciful ideas that are completely worthless. Even the idea that there is a “cure” for cancer is ludicrous – the basic genetics and cellular anomalies that produce cancer cells are so varied that there is no single treatment that could possibly arrest, reverse, or prevent cancer.
There is another problem as well: real-world conditions are often idealized to make experiments easier to perform. Creating ideal conditions is not only an obvious compromise, but often involves oversimplification of complex processes that interact with each other. Nevertheless, journals and the press will often headline the findings, completely ignoring the fact that the evidence isn’t real-world.
4. Does the data support the conclusion?
What data may have been left out, what questions weren’t asked.
Paper after paper, particularly in the social health sciences arena, draws conclusions that are not supported by the data presented. Yet tens of millions of web impressions, magazines, and newspapers are sold on headlines that report these kinds of conclusions. It has become so prevalent that the term “junk science” was invented to categorize such research.
This does not mean that there aren’t good studies being done, or reported – just that these few good works are often drowned by the avalanche of studies that show conclusions that the data simply cannot support.
The basic problem is that the conclusions are based on an incomplete set of questions. A classic example is the claim that something is “linked to” something else. Anyone who has taken a freshman level course in statistics learns that correlation is not causation. Where science is concerned, truth often contradicts correlation. An everyday example is the question of which body revolves around the other: the sun around the earth, or the earth around the sun. The sun’s apparent movement through the sky naturally leads to the conclusion that the sun MUST revolve around the earth. But once more questions were asked about the path the sun takes through the sky, it became obvious that the earth revolves around the sun, in an elliptical orbit.
In health matters, claims that X leads to Y are fraught with the correlation vs. causation problem. In matters that involve the behavior of people, a single cause of a disease with a definitive explanation is almost never found. One of the key questions that must be asked incessantly is “is this the only cause?” – the more complex the problem, the more difficult it is to find a single cause. Instead there are often many contributors – some internal, like genetics, some external. The question often becomes which factor is the greatest contributor. Even this is a perilous question, as there may be several minor contributors that add up to a greater effect than any single major factor.
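To make the correlation-is-not-causation trap concrete, here is a minimal simulation sketch. The ice-cream/sunburn scenario and all the numbers are invented for illustration (they are not from the article); Python is just a convenient notation:

```python
import random

random.seed(0)

# Invented confounder: hours spent outdoors drives BOTH ice-cream
# consumption and sunburn severity.  Neither causes the other.
hours_out = [random.uniform(0, 8) for _ in range(10_000)]
ice_cream = [h + random.gauss(0, 1) for h in hours_out]
sunburn = [h + random.gauss(0, 1) for h in hours_out]

def correlation(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

# Strongly "linked" in the data, yet eating ice cream causes no sunburn.
print(round(correlation(ice_cream, sunburn), 2))
```

A naive study of only `ice_cream` and `sunburn` would find them strongly “linked”; only by asking more questions – here, about time outdoors – does the real driver appear.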
5. Is there a policy recommendation? Was the policy effect studied?
Were all consequences considered/disclosed.
It is inevitable that science gets called upon to support public policy. After all, our most reliable method for discovering truths about our world ought to inform our decisions about how best to adjust ourselves and our environment. But science evolves slowly, and public policy based on science must evolve with it – yet rarely is public policy written with that evolution in mind.
Instead many papers get published with the goal of manipulating public opinion. Often, in the conclusions, policy recommendations are made – like claims that the prevalence of fat, sugar, or violence on television must be changed to solve health problems. Unfortunately most of these papers are based on junk science to begin with, but more importantly the consequences of the policy recommendations are never fully disclosed. Policy changes that affect public behavior have consequences. Often those consequences rear their heads as “unintended consequences” after a law or regulation is enacted.
Take, for example, mandatory bicycle helmets. No one actually tested what would happen if mandatory helmet laws were enacted. Sure, sciency polls were conducted and people responded, predictably, that helmets would be a good thing. But the “unintended consequence” was that once mandatory helmet laws came into effect, head injuries did drop. Why? Kids stopped riding bicycles in large numbers! The health implications of that lack of activity are far more harmful than a handful of head injuries.
6. Is the study size large enough to ensure the effect seen isn’t random?
Beware small study size with big conclusions.
There are lots of studies that purport to prove that something is effective. But look under the hood and they are very small studies for the effect they are supposed to support. Often small studies show a huge effect, when in fact the real effect is almost non-existent.
To see how this happens let’s take two examples. In the first we’ll pick a hundred people and randomly assign them into two groups of 50 each. One group of 50 gets a fake treatment (called a placebo) while the other gets the new Vitamin X. We test each of the 100 people before and after the treatment for attention span. We find that 25 of the 50 in our placebo group had a lower attention span, while 10 of the 50 in our Vitamin X group had a greater attention span. Our conclusion is that Vitamin X improves attention span in 20% of those who take it. Sounds good, doesn’t it?
Now let’s run the same experiment but increase the population ten times. We now have 500 people in each arm of the study. Our results are tabulated and we get 25 people with a lower attention span in our placebo group, but only 5 people with greater attention spans in our treatment group. Our conclusion is quite a bit different now, isn’t it? Vitamin X only improved the attention span in 1% of our treatment group. That’s not nearly as good, is it?
So what determines the correct sample size? Statisticians will tell you that you need a sample size large enough so that the variance of the thing you are measuring is less than 5% in the pre-test population (a 95% confidence interval). In our first example the placebo group had a 50% variance (half the sample had a lower attention span). That means that the sample size was too small to measure the effect. Our second example had a variance of 5% in the placebo group, indicating that the sample size was large enough to find an effect.
The point here is that sample size matters. It not only tells us whether we have a large enough sample to have some confidence the results are real; knowing the variance of the sample also lets us determine how much the result really changed.
[Apologies to my statistics professor are in order here. The examples and analysis given are far from real statistical methods. To a general audience bombarded by the kinds of statistical evidence found on product labels and in headlines, I hope they convey enough to provoke curiosity – perhaps more people can understand how good statistical science has been perverted to serve propaganda.]
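For the curious, the small-sample problem can be seen directly with a quick simulation. This is a sketch, not a re-creation of the Vitamin X numbers above or a proper power analysis; the 1% true effect, the 50% base rate, and the sample sizes are arbitrary choices for illustration:

```python
import random
import statistics

random.seed(42)

def apparent_effect(n, true_effect=0.01, base_rate=0.5):
    """One placebo-controlled trial with n people per arm.

    Each person 'improves' with probability base_rate in the placebo
    arm and base_rate + true_effect in the treatment arm.  Returns the
    effect the trial appears to show: treated rate minus placebo rate.
    """
    placebo = sum(random.random() < base_rate for _ in range(n))
    treated = sum(random.random() < base_rate + true_effect for _ in range(n))
    return treated / n - placebo / n

# Repeat the trial many times at two sample sizes.  The true effect is
# a mere 1%, but small trials scatter widely around it, so a single
# small trial can easily "find" an effect many times larger.
small = [apparent_effect(50) for _ in range(1000)]
large = [apparent_effect(500) for _ in range(1000)]

print("spread of results with 50 per arm: ", round(statistics.pstdev(small), 3))
print("spread of results with 500 per arm:", round(statistics.pstdev(large), 3))
```

The spread of apparent effects shrinks as the sample grows (by roughly the square root of the size increase), which is exactly why a huge effect from a tiny study deserves suspicion.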
7. Has/can the study be replicated? Are the results consistent?
Pons & Fleischmann couldn’t duplicate their own experiment.
In 1989 the world was shaken as newspapers, TV broadcasts, and even scientific magazines heralded the advent of a new power source on the horizon: cold fusion. A dream come true – getting more power out of a system than you put into it – cold fusion was a promise of a world with unlimited energy. It seemed that Stanley Pons and Martin Fleischmann had cracked the code that would enable this glorious new age.
But alas it was not to be. While there are those who still believe that there was some government cover up or corporate shenanigans afoot to foil our heroes’ progress, nothing is further from the truth. What happened was science at its finest. Upon publication of their paper many scientists around the world wanted to duplicate their experiment. They used the setup described in the paper and got different results. They wrote the authors for clarification, and invited them to their labs to oversee the repetition of the experiment. The result? The original experiment could not be duplicated. Indeed, when Pons & Fleischmann were invited to reproduce their own experiment, they couldn’t produce the same results.
Science relies on being able to reproduce results consistently to validate a hypothesis (idea) and turn it into a working theory. To go from theory to the vaunted acclaim of “law” it must hold true, everywhere, all the time. These are deliberately high bars that maintain the integrity of science – the very reason people trust science.
In matters of health, particularly public health, it seems that these high standards have become rather flexible. There are many papers published that show results that cannot be duplicated, yet are taken as gospel by policy experts and public health ministries. Often we hear that the research is so very expensive that we simply can’t afford to duplicate the results, and that the public can’t wait for this new discovery. Basing policy decisions on such a loose interpretation of science isn’t scientific, it’s propaganda.
8. What is the journal’s acceptance in the field, is it widely read?
Beware, niche publications are that way for a reason.
Anybody who has ever done real research finds out quickly that results matter. Grants are given to produce results. All that study and experimentation has to count for something right? In the research world there is a saying: Publish or Perish. In order to get or maintain grant funding it is necessary to publish results.
There are good publications in every field but there is only so much “space” in each volume, so getting published in a prestigious journal is hard. To solve this dilemma niche publications have sprung up to fill the gap. If you’re studying the effect of some drug or diet on islet cells and can’t get space in NEJM or Lancet, you would likely find Pancreas to be a good place to publish your paper. Why? Because it’s likely read by people doing what you are doing, studying things having to do with the pancreas. All well and good if Pancreas is a subsidiary of a well known publisher and the editorial and review policies are just as top notch. Problem is, more often than not, such publications are created by like-minded researchers who just want to get their papers published to ensure funding.
It gets even worse with the advent of pay-to-place journals. These journals often have some publicly notable people on their editorial board, but not in any number large enough to overrule the editor. In taking a fee to publish they become paper mills, willing to publish anything. There are examples where researchers have deliberately published pure, obvious junk and have seen their publication cited in the press. Why? Because these paper mills sell prestige by using publicists’ techniques to gain audience in the general press. Unfortunately the regular public has no knowledge of what’s going on; they only know a “study” was published that said something important.
9. Is the peer review critical?
Put more weight on critically peer reviewed work. Beware echo chamber reviews.
One of the key things that separate the above publications is peer review. What peer review is supposed to be is an open, critical, dialog between experts in the field regarding the research results presented. Think of it as giving a paper in front of your peers about something new you’ve discovered. Naturally your peers will have questions about your methods, your results, your conclusions. Sometimes their questions will point out something weak, missing, or in error. When that happens, in science, you go back and strengthen, fill in, or correct the findings.
The quality of a journal primarily rests on its peer review process. There are two main problems that occur though. The first problem is one where the reviewers don’t represent a range of views on the subject. Reviewers are invited by the editorial board to review submissions to the journal. If too many reviewers of a particular viewpoint are selected it creates an uncritical review panel for publications that share their viewpoint, and an overly critical panel for publications that may go against their views.
The second problem comes from the reviewers themselves. Being a reviewer adds to one’s resume, so it is a good thing to be selected. But as a reviewer it becomes difficult to openly criticize colleagues. This conflict of interest should be obvious. Often this leads to a brief, cursory review of a paper. Questions which should be asked don’t get posed. When this happens the authors don’t get the feedback they need to fix or withdraw a paper until after it is published, when the rest of the experts get a chance to read the work and find the faults that should have been discovered during peer review. I’m not talking about just subtle faults, but glaring errors in method, data, or results.
Journalists know very little about peer review and the public only knows that “Scientists discover …”. By the time the errors are found the “study” is part of the public consciousness and there is no good way to remove it. There is an old saying: “What is heard cannot be forgotten”. That is why junk-science propaganda becomes entrenched in the public mind, and public policy gets supported. All because critical peer review was ignored.
10. What is the real size of the effect? Compared to what?
Everything is relative, know what it’s relative to.
It seems that every day there is some headline claiming that something increases or decreases the risk of some disease by some fantastic amount. But what does that really mean? How should we work public policy to account for these headlines?
To the general public increasing the risk of a disease by 50% means that someone is 50% more likely to get the disease. No matter how many ways one tries to explain risk ratios, this is what they will believe. It is also complete bunk.
The reality behind these numbers and how they are reported is responsible for the public’s perception. Again, this falls squarely on the journalists who, not being schooled in public health statistics, just report the numbers. The key information isn’t presented at all. So the public mind tries to fill in that lack of information and believes the number is an increase in risk, period.
I’ll try to simplify the problem so that someone without a background in all this science can understand it. To do this we’ll start with something really basic: the chance of drawing two pair in a poker hand. The odds are about 20 to 1 against – out of every 21 hands dealt, roughly 1 will contain two pair. Figured as a probability, the risk of drawing two pair is about 5%.

Now, let’s suppose we increase the probability (risk) of drawing two pair by 50%. That changes our number from 5% to 7.5%. What does that do to the frequency? It goes from roughly 1 hand in 20 to roughly 1 hand in 13. We didn’t cut the interval in half (to 1 in 10) even though we increased the probability by half.
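For readers who want to check the poker numbers, the exact probability of two pair is easy to compute with basic combinatorics (a quick sketch; the exact figure is closer to 1 hand in 21 than the rounded 1 in 20 used above):

```python
from math import comb

# Exact chance of being dealt exactly two pair in five cards:
# pick 2 ranks for the pairs, 2 suits within each pair, then one of
# the 44 remaining cards (11 other ranks x 4 suits) for the fifth card.
p = comb(13, 2) * comb(4, 2) ** 2 * 44 / comb(52, 5)
print(f"P(two pair) = {p:.4f}, i.e. about 1 hand in {1 / p:.0f}")

# Increase that probability by 50% and see what happens to the
# "1 in N" frequency:
boosted = 1.5 * p
print(f"boosted     = {boosted:.4f}, i.e. about 1 hand in {1 / boosted:.0f}")
```

A 50% jump in probability shortens the “1 in N” interval by a third, not by half – the same asymmetry the text describes.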
Now let’s move to something more health related, like the chance of getting cancer. According to the NIH calculator, the number of people diagnosed with cancer of any kind, across genders, for those between the ages of 20 and 50, averages 397 per 1000. In other words, 397 out of every 1000 people aged 20 to 50 years old will get diagnosed with cancer sometime in their life. That’s a probability of 39.7%, or about 1 in every 2.5 people.

Let’s increase the risk by 50%. We go from 39.7% to 59.55% – now about 1 in every 1.7 people will get diagnosed with cancer.
Now if something really increased cancer risk that much people would be dropping like flies right? So is that actually happening? No it isn’t. Why? The clear answer is that not everybody engages in the behavior that increases the risk.
But even if we account for that effect there’s still a sizeable number right? So cutting that behavior should reduce the number significantly right? Not exactly.
Let’s look at smoking. Smokers are said to increase their risk of getting cancer by at least 50% (often more, but this keeps the math easy). Roughly 20% of the population smokes, so 1 in every 5 people carries the higher risk. Assume the population is 100 million (again, to keep the math easy). 80% of that population has a 39.7% risk while the other 20% has a 59.55% risk. So the result will be:

80 million non-smokers at 39.7% risk – about 32 million will get cancer

20 million smokers at 59.55% risk – about 12 million will get cancer

All totaled, about 44 million of our 100 million will get cancer.

Now let’s reduce the population who smoke by 50% by enacting strict laws.

90 million non-smokers – about 36 million will get cancer

10 million smokers – about 6 million will get cancer

For a total of about 42 million of our 100 million.

WTF? That’s right, halving the number of smokers doesn’t change the number of people who get cancer by much. It reduces cancer diagnoses by only about 2 million, or under 5% – barely moving the needle.

So let’s eliminate smoking altogether, that should make a big difference right? Let’s see:

100 million non-smokers – about 40 million will get cancer

We’ve reduced cancer diagnoses by about 4 million, or roughly 9%. In the meantime we’ve had to enact a 100% effective prohibition on smoking, something that isn’t possible in the real world. Just look at the total prohibition on illegal drugs, which has been a disaster both in failing to eliminate drugs and in the high costs associated with enforcement.

Now, what if we instead cut the extra cancer risk from smoking in half? So instead of a 50% increase in risk, smokers get a 25% increase. Here are the numbers:

Smokers go from a 39.7% risk to about 49.6%.

At 20% smokers – about 10 million get cancer.

Our total becomes about 42 million cancer diagnoses – the same result as halving the number of smokers, achieved without forcing a single person to quit.

And if we reduce the extra risk to just 5% for smokers (switching to smokeless tobacco or e-cigarettes is estimated to carry less than 5% of the risk of cigarettes):

Smokers go from a 39.7% risk to about 41.7%.

At 20% smokers – about 8 million get cancer.

Our total becomes about 40 million cancer diagnoses – nearly as good as a total prohibition.

Hey, wait a minute! You mean reducing the risk can do as much as, or more than, reducing the behavior? Exactly! Perhaps this is why those who promote lower risk tobacco products are trying to make headlines; maybe they really do know something important.
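The scenario arithmetic can be verified with a short script. It is a sketch that carries the text’s assumptions – 100 million people, a 39.7% baseline lifetime risk, 20% smoking prevalence – without intermediate rounding, so the totals come out slightly different from the rounded figures in the text:

```python
def expected_cases(population, smoker_share, base_risk, relative_increase):
    """Expected lifetime cancer diagnoses in a mixed population."""
    smokers = population * smoker_share
    smoker_risk = base_risk * (1 + relative_increase)
    return (population - smokers) * base_risk + smokers * smoker_risk

POP, BASE = 100_000_000, 0.397  # population size and baseline lifetime risk

for label, share, extra in [
    ("status quo: 20% smoke, +50% risk", 0.20, 0.50),
    ("half the smokers,      +50% risk", 0.10, 0.50),
    ("prohibition: nobody smokes      ", 0.00, 0.50),
    ("same smokers,          +25% risk", 0.20, 0.25),
    ("same smokers,           +5% risk", 0.20, 0.05),
]:
    cases = expected_cases(POP, share, BASE, extra)
    print(f"{label}: {cases / 1e6:.1f} million diagnoses")
```

Notice that halving the excess risk and halving the number of smokers produce exactly the same total – excess cases are the product of prevalence and excess risk, so halving either factor halves them – while cutting the excess risk to 5% gets close to what total prohibition would achieve.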
That is why the effect size and what it’s relative to is important.