Sunday, August 12, 2018

Data Torture and Dumb Analyses: Missteps With Big Data


Robert A. Harrington, MD: Hello. This is Bob Harrington from Stanford University. We’ll be having an interesting podcast today on theheart.org | Medscape Cardiology with a good friend and colleague, Frank Harrell.
There is no question that we’re living in an unprecedented time in regard to biomedical research. We have an incredible discovery engine going on right now where we can measure virtually any human biologic process. This includes the various “omics” (genomics, proteomics, metabolomics) and also things that measure continuously a variety of physiologic measurements, like heart rate, temperature, and heart rate variability.
All of this has given us the ability to collect enormous amounts of data on individuals. It’s also given us a tremendous ability to analyze those data in ways that perhaps we’ve not been able to do before, in part because of cloud computing and increasingly advanced computational methods. Many of us are interested in the concept of how to take this continually accruing information and include things like social media, GPS tracking, and zip code to gain insights into human health and disease that is beyond what we’ve been able to do before.
I’m really privileged to have as a guest my long-time friend and colleague, Dr Frank Harrell. Frank is professor of biostatistics at Vanderbilt University School of Medicine. He’s also an expert statistical advisor to the US Food and Drug Administration Center for Drug Evaluation and Research and their biostatistics group. Frank is the perfect person to have a conversation with about how we’ve arrived at this point in history in biomedical research. We can hear his ideas on the opportunities, challenges, and potential pitfalls he sees, particularly as we talk about some of the new advanced computational methods, including machine learning, neural networks, and so on. Frank, thanks for joining me here on theheart.org | Medscape Cardiology.
Frank E. Harrell, PhD: Absolute pleasure to be here, Bob.

Complexities of Having Vast Amounts of Data

Harrington: Do you want to give some broad comments on how you are thinking about the enormous amount of data that is helping inform the human health experience, and some challenges that it leaves for the community?
Harrell: Yes, it’s hard to know where to start because there is a genomics view of things and then there are all of the other fields—including, as you mentioned, modern, continual physiologic monitoring, which I actually believe has more promise than most of the other methods.
Harrington: Yes, let’s stay away from genomics right now and talk about the larger context of data.
Harrell: Okay. The sheer vastness of the data, along with its ready availability, is a challenge to everyone, but I think a lot of issues are not really well understood by clinician researchers and some biostatisticians. One of the things you learn over time is that you don’t necessarily get smarter as you are in a job for more and more decades, but you do gain perspective.
One of the perspectives that statisticians get good at over time is knowing how much information content is needed to make a certain conclusion about something. Whether you’re trying to better diagnose patients, better prognosticate, or compare therapies, a certain amount of information is needed in order to have any hope of answering a question.
There is a separate question about bias, and that is a really big issue in treatment comparisons. Even when you are not doing treatment comparisons, knowing what the limitations of the data are is something that a lot of people are not yet good at. They have this mistaken impression that because of the nature and ready availability of data, the data must have the information buried inside of it somewhere that allows you to answer almost any question.

Data ‘Torture’

Harrell: Somebody tweeted the other day, and I was quick to react to it, that they felt that there are new causal inference methods that can tell you in real time as a clinical trial is underway which patients are receiving the most benefit from the treatment. I just pointed out that that is mathematically impossible to do. It’s almost impossible to do at the end of the study, but while the study is unfolding, it’s really hard to do. There is this kind of crazy analogy between data torture and human torture. We know in human torture—and there is lots of evidence for this—that if you torture a human to obtain information, the human will confess to whatever the torturer wants to hear.
The same will happen with data. If you torture data, the data will confess and tell you what you want to hear. Then the researcher kind of moves on and tries to make use of that, but it’s not reliable. It’s no more reliable with data than it is with torturing humans. There is this belief that if you use modern methods, all of a sudden there is more information in the data than there ever was.
You are seeing people apply machine learning, especially to more rare diseases like specific types of cancer, where they are trying to find out who is likely to have metastasis, or whatever they are trying to predict. They may have a limited number of patients, but they may have an unlimited number of possible features, like protein expression, gene expression, and SNPs [single nucleotide polymorphisms]. And now we are hearing all of this hype about the microbiome and all sorts of other “omes.”
If you have a limited number of subjects and you have tens of thousands of possible predictors, there is no mathematical way for that type of research to actually work. With one exception: if there is a smoking gun which somehow the whole world missed and no one published on before (which is unlikely). If there is a smoking gun, like, “If you have that characteristic, then everyone has a disease; and if you don’t have it, then no one has the disease,” you can find it, no matter what else is thrown into the data. That is just not the way things happen with research in the modern era.
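The point about few subjects and many candidate features can be illustrated with a small simulation. The sketch below is not from the interview: the sample size (50), the number of features (2,000), the seed, and the helper names are all illustrative assumptions. Every feature is pure noise, yet the best one still shows a sizeable apparent correlation with the outcome.

```python
import math
import random


def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_noise_correlation(n_subjects=50, n_features=2000, seed=1):
    """Largest |r| between a random outcome and purely random features.

    The sizes and the seed are hypothetical choices for illustration only.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_subjects)]
    best = 0.0
    for _ in range(n_features):
        # Each feature is independent noise, unrelated to the outcome.
        feature = [rng.gauss(0, 1) for _ in range(n_subjects)]
        best = max(best, abs(pearson_r(feature, outcome)))
    return best


if __name__ == "__main__":
    print(f"best |r| among noise features: {best_noise_correlation():.2f}")
```

Screening more features on the same subjects can only raise the best apparent correlation, which is one way to see how a "discovery" can be tortured out of data that contains no signal at all.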

Sample Size

Harrell: I blogged about this from the standpoint of, how many subjects would you need to [do a good] study on a single patient characteristic and relate that to something? You can think about this logically. The minimum sample size you would need to do something complex, like neural network, is going to be greater than if you had preselected one feature and wanted to see how that relates to patient outcome. At the heart of that is estimating something like a correlation coefficient. How many patients does it take to estimate a single correlation coefficient?
The answer is, it’s over 300 patients to estimate only that. That is with a highly focused prespecified single candidate feature for prediction. If that takes more than 300 and you publish a complicated machine learning result with fewer than 100 people that used more than 1000 candidate features, the hope of that result actually being sustained is just zero. Maybe you will recall back-to-back papers, maybe 10 years ago, on determining variants that predict breast cancer risk.[1,2]
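One common way to arrive at a figure of this size is the Fisher z approximation for the confidence interval of a correlation. The sketch below assumes a true correlation near zero, a 95% interval, and a target margin of error of ±0.1. Those targets and the function name are assumptions for illustration, not values stated in the interview.

```python
import math
from statistics import NormalDist


def n_for_correlation_margin(margin, confidence=0.95):
    """Approximate n so the CI for a correlation near 0 has the given half-width.

    Uses the Fisher z transform, where atanh(r) has standard error
    1 / sqrt(n - 3). The margin and confidence level are assumed targets.
    """
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z_crit / math.atanh(margin)) ** 2 + 3)


if __name__ == "__main__":
    for margin in (0.2, 0.1, 0.05):
        print(margin, n_for_correlation_margin(margin))
```

Under these assumptions, a margin of ±0.1 calls for a sample in the high 300s, which is consistent with the "over 300" figure quoted above. Halving the margin roughly quadruples the required sample.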
They used the same sort of cohorts of women, same sort of screening—SNPs and GWAS [genome-wide association studies]—and everything was similar in the setup. In these two papers, the findings had not a single SNP in common. It was a stunning example of the impossibility of learning that much from so little.
Harrington: What is the road forward, Frank? Certainly, one of the opportunities today is the vastness of the data, and sometimes we are so enamored by the vastness that we can get lost in it.
There are tools that can help us make sense of the data, but in some ways what I’m hearing you say is that basic principles still apply. Not forgetting about your type 1 error is one of the issues that you are getting at here in terms of your false discovery rate. How we even think about visualizing the data might be helpful as we are looking for things. Do you want to talk about the type 1 error and data visualization, two topics that you’ve spent a lot of time on?
Harrell: Yes, I’d like to talk about things that are almost that.
Harrington: Okay.

False-Negative Rate

Harrell: The false-discovery rate, which is related to type 1 error, is a big deal, but people give far too little attention to the false-negative rate. People are publishing things that are announcing discoveries that are just barely publishable. It might be an odds ratio of 1.3 or something, and not clinically predictive of anything. They are ruling out a whole vast number of features that didn’t pass their feature screening, not really realizing that their false-negative rate was off the charts.
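The false-negative problem can be made concrete with a rough power calculation. The sketch below uses the Fisher z approximation and assumes 100 subjects, a true correlation of 0.15 for a genuinely informative feature, and a Bonferroni-corrected threshold across 1,000 candidate features. All of these numbers and the function name are illustrative assumptions rather than figures from the interview.

```python
import math
from statistics import NormalDist


def screening_false_negative_rate(true_r, n_subjects, n_features, alpha=0.05):
    """Approximate chance that a truly predictive feature fails screening.

    Assumes each feature is tested on its correlation with the outcome,
    two-sided, at a Bonferroni-corrected level of alpha / n_features.
    """
    std_normal = NormalDist()
    per_test_alpha = alpha / n_features
    z_crit = std_normal.inv_cdf(1 - per_test_alpha / 2)
    # On the Fisher z scale the test statistic is roughly N(shift, 1).
    shift = math.atanh(true_r) * math.sqrt(n_subjects - 3)
    power = (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)
    return 1 - power


if __name__ == "__main__":
    fnr = screening_false_negative_rate(true_r=0.15, n_subjects=100, n_features=1000)
    print(f"false-negative rate: {fnr:.3f}")
```

With these assumed inputs, a modestly informative feature is screened out nearly every time, so its absence from the published list says very little about whether it carries signal.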
There is a real lack of appreciation of reliability of discoveries and reliability of nondiscoveries—especially the latter. I think that is really holding back research. People are dismissing things that do have information, and part of the reasoning is because they are seeking parsimony. I like to say that parsimony is the enemy of predictive accuracy. Nature has so many pathways and genetic backup systems and everything, and parsimony is not the way nature works. It’s the way things work sometimes in physics, but not so much in biology.
The idea that almost all research that you see published in a discovery mode is an attempt to be parsimonious is where it’s going seriously wrong. Better methods of analyzing the data will say, “What sort of signal is there if we don’t try to understand the signal?” The first step is to measure the signal that is predictive.
Are you trying to diagnose colon cancer? If you have suitable data with enough cases of colon cancer and controls, you can start to analyze it. You may find that there is a signal hidden among these thousands of variables to the tune of R = 0.4 in predicting a final diagnosis of colon cancer. Then you are content to publish a paper where the R = 0.04. My conclusion from that would be, there is a 0.36 of signal that you have no idea about because you tried to name names. You tried to be parsimonious and that is where you went wrong. That sort of research is really hard to justify. If you are only recovering one tenth of the signal for what your aim is, whether it’s diagnostic, prognostic, or something else, you are publishing something that gets on your curriculum vitae and counts in promotion, but it’s never found to be of clinical utility, and you quickly abandon the place where much of the signal was, in what you call losing features.
You abandoned that and were content to publish something that had almost no signal at all, but it was statistically significant. That is a lack of understanding about how multiple factors work together and what pathways are. I just see that as a rampant problem in imaging research, genetics, proteomics, and probably in microbiome, which I’ve had less exposure to.

Focusing on the Right Variables

Harrell: There is a different problem, and I would love your comment on this. There is a lack of understanding by many researchers about what sort of variables are really the ones they need to be concentrating on.
A fantastic meta-analysis[3] showed that the history of genetic research in risk factors for depression is just a series of conflicting results with weak signals. They put it all together and tried to estimate how much of depression can be explained by genetic forces versus capturing the life events of the person. How many tragedies (eg, loss of a spouse, loss of a child) had the person suffered?
They showed that life events just made fun of the genetic factors; there was no comparison. A lot of predictive exercises go forward where people are not really taking this into account. I heard a geneticist from the University of Washington say once, “If I had a choice of measuring cholesterol or knowing that someone was predisposed to hyperlipidemia, I’d measure the cholesterol every time.”
Harrington: A paper published during the past year or so[4] was looking at machine learning techniques. They say that the machine was better than the cardiologist at predicting cardiovascular events. Then they list all of the variables that the machine identified as being highly predictive. One of the variables that was most important was “no data available.” That points out your issue that you really need an understanding of what the biologic processes or what the clinical imperatives are.
Frank, it takes me back to the days of the Duke databank, with clinicians and statisticians talking about what were they observing that seemed to carry importance in the clinical setting, and then bringing that back and formally testing it. It is an exercise that we don’t want to forget. It should not be a black box. We should be thinking about what are the observations—biologically, clinically—that seem to be important.
Harrell: We spent a lot of time breaking things down into logical components that you could understand clinically, and they were highly predictive. What is a good way to score obstructive coronary artery disease? What is a good way to score ischemia, and what are the different manifestations of ischemia? What are the different manifestations of heart failure, and how do you put all of that together? How do you score peripheral vascular disease and so on? And we created indexes to summarize each of these phenomena.
That led to great stability over years and years of analyzing the data, instead of looking for individual features. The clinical interpretation was always there. People need to take into account what it is that is going to make sense, be predictive, and be useful for clinical decision-making. The paper you were referring to may be the same one that I saw, where they showed that if you used a lot of medical tests, you had the result of the test; and whether or not the test was ordered, the thing that was predictive was the physician test-ordering behavior. The machine learning algorithm at no point found that it needed to use the results of any of it. That is really interesting, because when you think about transporting that to another clinical setting where the practice patterns and test ordering are different, but maybe the meaning of the test results are not that different, I think they missed the boat.
Harrington: Yes, I agree with you. Frank, I could keep talking with you all day about machine learning and new ways of thinking about data. The lesson I’m taking out of this is, remember some basic principles of statistics as we think about doing good clinical research. Thank you for joining me here on Medscape Cardiology today.
My guest today has been Dr Frank Harrell, a professor of biostatistics at Vanderbilt University School of Medicine. Frank, thanks for joining us.

‘No Doubt’ Kratom Is an Opioid With High Abuse Potential


One of the two major psychoactive constituents in kratom has high abuse potential and may also increase the intake of other opiates, new research shows.
The finding contradicts claims by kratom makers that the substance has no abuse potential and supports the US Food and Drug Administration’s (FDA’s) view that kratom is an opioid.
Derived from the plant Mitragyna speciosa, kratom is receiving increased attention as an alternative to traditional opiates and as a replacement therapy for opiate dependence. Mitragynine (MG) and 7-hydroxymitragynine (7-HMG) are the two major psychoactive constituents of kratom. Although MG and 7-HMG share behavioral and analgesic effects with morphine, their reinforcing effects have not been fully established.
Results of a series of experiments with rats show that MG does not have abuse or addiction potential and reduces morphine intake, “desired characteristics of candidate pharmacotherapies for opiate addiction and withdrawal,” Scott Hemby, PhD, Department of Basic Pharmacological Sciences, High Point University, High Point, North Carolina, and colleagues report.
In contrast, 7-HMG should be considered a kratom constituent with “high abuse potential that may also increase the intake of other opiates,” the investigators note.
The study was published online June 27 in Addiction Biology.

“Intriguing” Data

“The study tells us that the most abundant alkaloid in kratom, MG, does not have abuse liability and actually decreases subsequent opiate intake. The 7-HMG data are intriguing because it does seem to have abuse liability,” Hemby told Medscape Medical News.
However, he said, “it’s important to remember that 7-HMG is about 2% of the alkaloid compound of the plant, whereas MG is about 60%. That’s about a 30-fold difference between those two alkaloids. That suggests to me that it probably wouldn’t be reinforcing if kratom were taken as a whole plant with all the alkaloids and everything else in it.”
One reason this is important to study, he said, is that there is evidence that levels of 7-HMG are elevated in certain strains of kratom or certain products that are being sold. This could be the result of deliberate adulteration of the product or the way the plant is harvested.
“For instance, if you leave it out in the sun to dry after it’s harvested, a fair amount of the MG will be converted into 7-HMG. So it could be the way the plant is harvested and not an intentional adulteration,” said Hemby.
It’s also concerning, he said, that people are starting to recognize that higher levels of 7-HMG seem to be associated with pleasure or euphoria. “No one has sold 7-HMG on its own, but it’s possible that that could happen, and so it’s important to know that there is a possibility of abuse of that particular compound,” said Hemby.
He emphasized that the current experiments did not assess kratom itself, only the two psychoactive compounds of the plant. “My guess is, based on the ratio of MG to 7-HMG, it would not have abuse liability, but we are undertaking studies to look at that,” Hemby noted.
The FDA is cracking down on kratom. There are no FDA-approved uses for kratom, and the agency has advised against using kratom or its psychoactive compounds MG and 7-HMG in any form and from any manufacturer.
Kratom has been linked to more than 40 deaths. As previously reported by Medscape Medical News, a recent analysis of kratom by FDA scientists found that its compounds act like prescription-strength opioids. The findings led the FDA to label kratom an opioid.
“There is no doubt that kratom is an opioid. What the FDA said was perfectly correct,” Hemby told Medscape Medical News.
The study was supported by funding from the Fred Wilson School of Pharmacy, High Point University, and by funding from the National Institutes of Health. The authors have no relevant disclosures.
Addiction Biol. Published online June 27, 2018.

Valsartan Recall Expanded: FDA Probes Outside China, Other Products


More valsartan products are under recall, according to an announcement from the FDA.
Added to the list of products under recall are valsartan-containing products manufactured by Hetero Labs Limited in India, labeled as Camber Pharmaceuticals in the U.S. However, not all Camber valsartan products distributed in the U.S. are being recalled, the agency noted in an updated statement.
The trouble stems from the discovery of N-nitrosodimethylamine (NDMA) impurities in the recalled Camber products. Hetero Labs makes the active pharmaceutical ingredient for these valsartan products using a process akin to that of Zhejiang Huahai Pharmaceuticals, the Chinese supplier to affected companies in the first round of valsartan recalls announced in mid-July.
NDMA is a probable human carcinogen, according to lab tests.
Hetero Labs tests show their valsartan has too much NDMA, albeit at levels that are generally lower than what was discovered in the active pharmaceutical ingredient manufactured by Zhejiang.
FDA is testing valsartan products for NDMA and working with other manufacturers of valsartan active pharmaceutical ingredient to see if they might be at risk of NDMA formation in their manufacturing processes. Additionally, the agency is investigating whether other angiotensin II receptor blockers are also at risk of NDMA contamination.
Warnings of NDMA impurities in valsartan first emerged in Europe and the United Kingdom in early July.

FDA Warns on ‘Improper’ Use of Rupture of Membrane Tests in Pregnant Women


Tests that detect rupture of membranes (ROM) in pregnant women should not be used by themselves to diagnose this condition, the FDA said on Wednesday.
Citing 15 fetal deaths and multiple reports of health complications in pregnant women related to improper use of these tests, the agency reiterated that ROM tests should only be used in conjunction with other clinical assessments to manage patients.
“[T]he FDA has received information which indicates that health care providers may be over-relying on ROM test results when making critical patient management decisions, despite labeling instructions warning against this practice,” the agency said in a letter to healthcare providers.
In a news release, the FDA stated that these tests may provide a false negative result, and “providers may incorrectly assume ROM has not occurred” without additional clinical assessment.
“Our review of the risks associated with improper use of these ROM tests is ongoing, but we want to be transparent with providers and patients about the information that we have indicating an issue, and provide recommendations to minimize the risks,” Courtney Lias, PhD, director of the Division of Chemistry and Toxicology Devices in the FDA’s Center for Devices and Radiological Health, said in a statement.
ROM tests are point-of-care diagnostics that analyze vaginal secretions, and can help inform a provider when a rupture of membranes has occurred, the agency said. But the agency added that when these devices were reviewed through the pre-market clearance pathway, it “concurred with the manufacturers’ labeling recommendations warning providers to not use these tests independently.”
In addition, the FDA noted a voluntary recall of Amnisure ROM Test Strips, under which 40,500 Amnisure tests have been recalled due to a “device malfunction.” However, the recall is unrelated to improper use of the tests, and the agency is “not aware of adverse events related to the recalled products.”

Bundling Doesn’t Cut Medicare Payments for Medical Conditions


Bundling of payments for five common medical conditions is not associated with changes in Medicare payments per episode or health outcomes, according to a study published in the July 19 issue of the New England Journal of Medicine.
Karen E. Joynt Maddox, M.D., M.P.H., from Washington University in St. Louis, and colleagues used Medicare claims (2013 through 2015) to identify admissions for the five most commonly selected medical conditions in the Bundled Payments for Care Improvement (BPCI) initiative: congestive heart failure (CHF), pneumonia, chronic obstructive pulmonary disease (COPD), sepsis, and acute myocardial infarction (AMI). Changes in standardized Medicare payments per episode of care (defined as the hospitalization plus 90 days after discharge) were compared for these conditions at BPCI hospitals and matched control hospitals.
The researchers found that at baseline, the average Medicare payment per episode of care across the five conditions at BPCI hospitals was $24,280, which decreased to $23,993 during the intervention period (P = 0.41). Over the same time period, control hospitals had an average payment for all episodes of $23,901, which decreased to $23,503 (P = 0.08; difference in differences, P = 0.79). There were no significant differences in clinical complexity, length of stay, emergency department use or readmission within 30 or 90 days after hospital discharge, or death within 30 or 90 days after admission between the intervention and control hospitals from baseline to the intervention period.
“For such bundling to work for medical conditions, however, more time, new care strategies and partnerships, or additional incentives may be required,” the authors write.
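The difference-in-differences comparison can be checked by hand from the payment figures quoted above. The sketch below uses only the raw averages reported in this summary. The study's own estimate comes from its statistical model, so this unadjusted arithmetic is an illustration only, and the function name is hypothetical.

```python
def difference_in_differences(treated_before, treated_after,
                              control_before, control_after):
    """Change in the treated group minus change in the control group."""
    return (treated_after - treated_before) - (control_after - control_before)


if __name__ == "__main__":
    bpci_change = 23_993 - 24_280      # change at BPCI hospitals, in dollars
    control_change = 23_503 - 23_901   # change at control hospitals, in dollars
    did = difference_in_differences(24_280, 23_993, 23_901, 23_503)
    print(bpci_change, control_change, did)  # -287 -398 111
```

Payments fell by slightly more at control hospitals than at BPCI hospitals, so the raw difference in differences is a small positive number, in line with the nonsignificant result reported.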

Working Out After Baby


Losing weight about 6 months after giving birth lowers a woman’s risk of being overweight in the future.
The best strategy to get back to pre-baby weight is a combination of diet and exercise, rather than diet alone. That’s because exercise boosts heart health and helps preserve muscle when you’re limiting calorie intake. It also takes more calories to maintain muscle than to maintain fat, which means you’ll burn more even at rest.
Once your doctor gives the OK, ease back into your exercise routine. Take a gradual but steady approach. Each day, eat a little less and exercise a little more. You might start by taking baby on short walks. Resist trying to see instant results. Rapid weight loss isn’t healthy, especially if you’re breast-feeding.
Because women who breast-feed are at temporary risk of loss in bone mineral density, do weight-bearing exercises, such as strength training. This will minimize bone loss and decrease your risk of osteoporosis in later years.
Also, take a complete approach to exercise by including various types. One study found that a combination of strength training and aerobic exercise three days a week over 16 weeks reduced body fat and increased lean mass, even without dieting.
Take a few precautions to make exercise safer, however.
Avoid working out in extreme temperatures and high humidity. Have a nutritious snack about an hour before your chosen activity, and drink some water before, during and after to stay properly hydrated.
You’ll also feel more comfortable if you wear clothing that allows for a full range of motion while offering needed support. That includes a supportive bra — if your breasts have changed since giving birth, you might need a new one in a different size.
More information
The American College of Obstetricians and Gynecologists has answers to frequently asked questions about exercise after pregnancy to get you started safely.

Risk-taking, antisocial teens 5 times more likely to die young


Adolescents with serious conduct and substance use problems are five times more likely to die prematurely than their peers, with roughly one in 20 dying by their 30s, according to new CU Boulder research.
The study, published today in the journal Addiction, also suggests that while drug and alcohol use among adolescents draws more attention, antisocial behavior — including rule-breaking tendencies — may be a more powerful predictor of early mortality.
“This research makes it clear that youth identified with conduct problems are at extreme risk for premature mortality, beyond that which can be explained by substance use problems, and in critical need of greater resources,” said lead author Richard Border, a graduate student with the Institute for Behavioral Genetics.
For the study, Border and his colleagues looked at death rates among 1,463 adolescents who had been arrested or referred to counseling for substance use problems and/or “conduct disorder,” a mental health disorder characterized by rule-breaking, aggression toward others, property destruction and deceitfulness or thievery.
They also followed 1,399 of their siblings and a control group of 904 adolescents of similar age and demographic background.
The researchers decided to do the study after making a troubling discovery while following up with subjects from the ongoing Genetics of Antisocial Drug Dependence study, launched in 1993: Several had already died. They used mortality data from the National Death Index to determine how many.
With an average follow-up age of 32.7 years, they found that 62 of the original study subjects — more than 4 percent — had died, compared to less than 1 percent of controls. Siblings of the study subjects also had higher mortality rates, with about 2.4 percent dying.
Substance-related deaths were the most common, along with traffic related deaths, suicides and deaths resulting from assaults.
“To see detailed, hard data from a cohort of adolescents we have been interviewing face-to-face over the years really makes tangible the dangers that these youth are facing as they go into adulthood,” said co-author John Hewitt, IBG director. “It’s a strikingly poor outcome and should be a major public health concern.”
When the researchers further analyzed the data, they were surprised to discover that while both conduct disorder and substance use severity were associated with increased mortality risk, conduct disorder was a more powerful independent risk factor.
“We pay a lot of attention to substance use and it is definitely important, but we don’t put as much attention on rule breaking,” said Hewitt. “Perhaps we should.”
Between 6 and 16 percent of boys and between 2 and 9 percent of girls meet the criteria to be diagnosed with conduct disorder, previous studies show. Previous CU Boulder research suggests that genetic variants may play a role in making a child more prone to risk-taking or antisocial behaviors.
Because the study focused on youth whose conduct was serious enough they had been arrested or referred to therapy, it’s uncertain to what degree the findings apply to the broader population.
But the takeaway is clear, said Hewitt.
“If you have an adolescent who is exhibiting extreme conduct problems, seek help. It is not just a matter of stopping them from doing bad things. It could be a matter of keeping them alive.”
Story Source:
Materials provided by University of Colorado at Boulder. Note: Content may be edited for style and length.

Journal Reference:
  1. Richard Border, Robin P. Corley, Sandra A. Brown, John K. Hewitt, Christian J. Hopfer, Shannon K. McWilliams, Sally Ann Rhea, Christen L. Shriver, Michael C. Stallings, Tamara L. Wall, Kerri E. Woodward, Soo Hyun Rhee. Independent predictors of mortality in adolescents ascertained for conduct disorder and substance use problems, their siblings and community controls. Addiction, 2018; DOI: 10.1111/add.14366