
Monday, December 7, 2020

The Power of Numbers to Deceive

Large numbers are impressive, so people often fail to put them into perspective. That’s why advertisers and advocates love to use them to get more sales or to garner support for a cause. Examples pop up daily, especially in this age of COVID-19.

 

Every day the local and national news media blast out the latest coronavirus totals. Millions of cases and over a quarter million deaths get our attention. Relative to other causes those are very big numbers, and it is a serious problem, but no one reminds us that 15 million cases is less than five percent of the population. Furthermore, I have never heard a news organization report the number of people who have recovered, which is surely in the millions as well. People seem to land at one of two extremes: either they don’t take it seriously enough or they are needlessly terrified of catching it and dying. Perhaps if the news gave a more honest, measured account, more people would react with appropriate moderation.

 

I found another example in a health report. “The Federal Trade Commission is sending 70,142 checks and PayPal payments totaling $3,864,824 to consumers nationwide who bought Quell, a wearable device that supposedly would treat chronic pain throughout the body when placed below the knee.” The company was ordered to pay out almost $4 million for misrepresenting its product! Consumers who were gullible enough to buy the device will be reimbursed about $55.10 each, for a product advertised on sale for $99 – not exactly a money-back guarantee – but the $4 million figure seems impressive.

 

Class action lawsuits follow the same pattern. Chipotle, accused of falsely advertising its food as GMO-free, settled for $6.5 million. After the lawyers took 30%, customers were left with a claim of $2 per meal, capped at $30 per household, and it “could be less than this depending on how many claims are made.” Again, a big number reduced to peanuts per person.

 

Similarly, Johnson & Johnson, a favorite target for lawsuits, was ordered to pay $6.3 million in the Infants’ Tylenol settlement. That came out to $2.15 a bottle.

 

The number of home burglaries in 2018 was 685,766, about half of all burglaries (1,230,149), and about half of those home burglaries happened because someone didn’t lock the doors. But when you shop for a home security system, you will hear, “A burglary happens once every 26 seconds.”


When a big number is spread across many people, it doesn’t add up to very much per person.
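
For readers who like to check the math, here is a quick Python sketch using the figures quoted above; the only operation involved is division.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # about 31.5 million seconds

# "A burglary happens once every 26 seconds" is just the annual total divided out
total_burglaries = 1_230_149  # 2018 figure quoted above
print(f"One burglary every {SECONDS_PER_YEAR / total_burglaries:.0f} seconds")

# The "almost $4 million" FTC payout, spread across every check
ftc_total, ftc_checks = 3_864_824, 70_142
print(f"About ${ftc_total / ftc_checks:.2f} per consumer")  # about $55.10
```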

 

In some cases, the news media imply big numbers through tone alone. A couple of years ago they wanted us to believe that school shootings were common, but what happened to school shooting news when the presidential campaigns got into full swing? The problem didn’t go away, but the news coverage did.

 

According to Education Week, twenty-five shooting incidents occurred on school grounds or during school-sponsored events, resulting in five student deaths, only one under the age of 14. Any such death is tragic, but that’s five out of 56.4 million students. Should anyone panic over such a minuscule probability?


Headlines about extremely rare events like shark attacks are easy to ignore, but when kids are involved, it’s different. Parents are terrified at the prospect of their child being abducted. Nosey neighbors report them for letting kids walk alone, and authorities respond. An elementary school in South Carolina won’t let kids whose mother wants them to walk home leave school without an adult. “Today, only 10% of American kids walk to school, down from about 50% in 1969.”


Reuters tries to reassure parents: “Kidnapped children make headlines, but abduction is rare….On average, fewer than 350 people under the age of 21 have been abducted by strangers in the United States per year since 2010.” Then Parents.com tells them, “Every 40 seconds in the United States, a child becomes missing or is abducted.” But they don’t mention that only 0.1% of missing children are abducted by strangers, that over 95% simply ran away, and that 99.8% are later found alive.

 

It’s an endless battle against the media, politicians and charities using big numbers to sell products, push ideas and raise funds. The best defense is perspective.

Monday, April 20, 2020

Understanding Experiments

When most people hear the word experiment, they picture a scientist in a lab coat with bubbling beakers and test tubes of mysterious liquids. That is not the real meaning, and because of this misconception, news of the latest study is often misleading.

An experiment is a rigorous process of testing an idea with the intention of either solving a problem or improving a situation. Necessary first steps include defining the problem to be solved and determining how results will be measured. No one can claim improvement without measurements to compare (before and after). In the first case, the experiment is successful if the problem goes away. In the second, success is judged by the measurable amount of improvement.

When a well-designed experiment is successful, the implication is that the same solution can be applied widely to other situations: to solve the same problem elsewhere or to achieve the same amount of improvement. (Usually others replicate the experiment before findings are accepted.)

To achieve satisfactory results, any experiment must be well designed. Sloppy studies lead to problematic conclusions, ones that can’t be counted on to solve anything. Even the best experiments can yield bad information, simply because the world is diverse and fluky things happen.

Many of the experiments or studies the public hears about in the media relate to drugs or other remedies, and a strict procedure must be followed to ensure the conclusions are valid; otherwise, drugs and other remedies reach the market without proof that they are safe and effective.

In drug studies researchers try to choose a sufficiently large sample, because the larger the sample, the lower the chances of getting some oddball results just because you happened upon some atypical participants. A bigger sample tends to average out any unusual individual readings. 

The next step is to divide the sample into two parts that look as much alike as possible: the same proportions by sex, race, education, income, background, location, or any other feature that might affect the results. Ideally, researchers understand all the characteristics that might influence results and can match the two groups on those characteristics; when that isn’t possible, random assignment to one group or the other does the balancing.
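
As an illustration (not the procedure any particular study uses), here is a minimal Python sketch of simple random assignment; real trials often add stratification to force the groups to match on known characteristics like those listed above.

```python
import random

def assign_groups(participants, seed=42):
    """Randomly split participants into treatment and control groups.

    Randomizing is what balances the characteristics nobody thought
    to measure -- on average, both groups end up looking alike.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

treatment, control = assign_groups(f"subject_{i}" for i in range(200))
```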

One group is treated – given the pill, the information, or another treatment – while the other, the control group, is given a fake pill (placebo), a sham treatment, or left alone.

Afterward the groups are compared statistically to see if the change (hopefully an improvement) in the first group is significantly better than the change in the second. (Yes, there is often a change in the control group, merely because they know they are participating in the experiment and believe the placebo is a real remedy.)
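
One common way to make that comparison is a two-sample t-test. The sketch below (which requires SciPy) uses made-up improvement scores, so treat the numbers as placeholders.

```python
from scipy import stats

# Hypothetical before-to-after improvement for each participant
treated = [4.1, 3.8, 5.0, 2.9, 4.4, 3.6, 4.8, 3.2]
placebo = [1.0, 2.1, 0.4, 1.8, 0.9, 1.5, 1.2, 1.9]  # note: not zero

t_stat, p_value = stats.ttest_ind(treated, placebo)
print(f"p = {p_value:.4f}")  # a small p-value means the gap is
                             # unlikely to be a fluke of sampling
```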

Ideally the people administering the test also do not know who is in which group; when neither the subjects nor the testers know, the experiment is double-blind.

We know from the news that even these careful experiments can go wrong. Sometimes drugs are withdrawn from the market due to problems discovered only after release, when the drug reaches a much larger “sample,” the general population. That is why I have come down so hard and so often on vitamins and other dietary supplements, where the law exempts them from any research at all to prove safety and effectiveness. (They rely on endorsement, not proof, and use weasel words to imply effectiveness.)

Many times in the past I have also criticized experiments for sloppy design. Sometimes researchers don’t define the problem until after the test is done, then announce the findings with a press release. Sometimes the samples are too small due to budget constraints or laziness. Sometimes they rely on self-reporting, so there is no real measurement.

Finally, businesses and educators like to say they are experimenting. In both cases it is rarely true: they don’t set up two groups to compare, and they measure by gut feel.

Businesses are in too much of a hurry to do it right. They just try things, sometimes several at once, so no one knows which ones had a positive, negative, or neutral influence on the final outcome.

In education they are still arguing about measurement. Teachers don’t want to be paid based on test scores, but they haven’t suggested a more acceptable criterion for measuring results. Yet school systems continue to try new methods and approaches without ever satisfying the first step – deciding how to objectively measure real improvement.

Without understanding experiments, it’s too easy to be fooled by people who don’t know what they are doing and by people who do but are just trying to sell us a bill of goods. 

Monday, August 7, 2017

Linked To Does NOT Mean Causes

Recently I came across this Washington Post headline:  “59,000 farmer suicides in India over 30 years may be linked to climate change, study says.”  The article explains that a researcher looked back almost 50 years comparing data and climate information and “concluded that temperature may have ‘a strong influence’ on suicide rates during the growing season.”  Note the words “may have a strong [but unspecified] influence.”  The researcher goes on to project an increase in the number of “lives lost to self-harm” in India.

What are we to make of this?  The number of suicides correlates to the temperature.  Is Global Warming to blame?

Correlation is a mathematical expression of how closely any two measurements move relative to each other.  If the first one gets larger at exactly the same rate as the second, the correlation equals 1, a perfect positive correlation.  If the first one decreases at exactly the same rate that the second increases, the correlation equals -1, a perfect negative correlation.

In our imperfect world this rarely happens, so a correlation of 90% seems to indicate that the two measurements are moving very closely together.  A correlation close to zero shows no mathematical relationship.  The math is not difficult.  A laptop can do it easily and show the graph with points either closely tracking each other for a strong correlation or looking like a random scattering of points for a weak (near zero) correlation.
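
For the curious, computing it takes one line; here is a tiny Python example with made-up numbers that track each other closely.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])  # rises almost in lockstep with x

r = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient
print(f"r = {r:.3f}")        # close to +1: a strong positive correlation
```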

The most important thing to understand about correlation is that correlation is not causation!  This is a major point of emphasis in every first-year statistics class.  Just because two things vary together does not mean they are related in any way.

Lots of things get bigger together and are related, like the size of a tree and the amount of lumber available from the tree or the number of pizzas (or amount of beer) needed to feed guests at a party – more people = more pizza.  Gas mileage (MPG) may be related to the size of a truck, the power of the engine or the weight of the load.  These correlations are easy to explain.

When two measures are correlated, sometimes one influences the other, or perhaps they are both affected by some unseen common factor.  Often though, two measures are mathematically related but have no logical link.  Some websites specialize in finding odd examples of these supposed relationships.

Here is one I’ve used before, telling about a study that correlated dog ownership with eating eggrolls, eating cabbage with having an innie bellybutton, and many others.  Another site, with lovely graphs, correlates the divorce rate in Maine with consumption of margarine, consumption of cheese with the number of people who died by becoming tangled in their bed sheets, and several more entertaining examples.  There are the joking relationships between the stock market and skirt lengths, or the stock market and the conference of the team that won the last Super Bowl.  I wrote last year about the journalist who compared many measurements and found a surprising correlation between eating dark chocolate and weight loss in one data set.  He later admitted that it was bad science and should not be taken seriously.  If you look hard enough, you can find all kinds of weird examples of data that correlate just by coincidence.  That’s what makes it tricky.
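
You can watch coincidence at work with a short simulation: generate one random series, compare it against a thousand other random series, and keep the best match.  None of them are related to anything, yet one will look “impressively” correlated.

```python
import numpy as np

rng = np.random.default_rng(0)
target = rng.normal(size=20)              # 20 "annual" data points
candidates = rng.normal(size=(1000, 20))  # 1,000 unrelated random series

best = max(np.corrcoef(target, c)[0, 1] for c in candidates)
print(f"Best of 1,000 random series: r = {best:.2f}")
# Search enough data sets and a strong correlation appears by chance alone.
```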

The use of correlation is quite common in studies about health and other areas.  Studies find a mathematical relationship and use the magic word linked.  An action or habit is linked to a healthy or unhealthy outcome.  Eating A is linked to longer or shorter life.  The relationships researchers find may be of the direct, pizza-supply-to-number-of-guests kind, or of the cheese-to-death-by-bed-sheets kind.  But at that point it’s only math, not reality.

This is what came to mind when I saw that headline.  To be fair, the researcher tries to explain how there can be an other-than-mathematical relationship.  “High temperatures in the growing season reduce crop yields, putting economic pressure on India's farmers.”  I think any farmer can tell you that weather often increases economic pressure, but how often does that lead to suicide?  Don’t we need more than just two sets of numbers to compare on a graph?  Is this reality or just math?


Whenever critical thinkers hear the word linked, they know someone has found a mathematical relationship.  Researchers hope to discover a real relationship so they can use the word causes, as in action A causes outcome B, but that is much more difficult.  Understanding this is very important to keep from panicking over the latest headline, insisting that someone do something, or trying to persuade friends and family to change eating habits to avoid catastrophe.  You don’t have to be a math or statistical wizard to know that linked is not necessarily the same as causes.

Monday, May 11, 2015

How Large is the So-Called Gender Gap?

Last time I introduced the idea of a psycho fact, information that has been so frequently and commonly passed along that it is accepted as true without the need for further investigation or proof.  It falls into the category of “everyone knows that” because everyone says it or hears it so often.  The textbook example is the idea that we use only a small fraction of our brains and have so much untapped potential.  (This notion is a staple among the self-help gurus and motivational speaker crowd.)

Another bit of information that has been repeated so many times that it has become accepted as fact is that a woman is paid only 77 cents for every dollar a man earns, usually implying that this is true for exactly the same job even when age, education, experience, and all other factors are considered.  Related to this is the idea that men outnumber women in the STEM (Science, Technology, Engineering and Math) fields due both to societal pressures portraying these fields as unfeminine and to a belief by some that women have less aptitude in these areas.

The most recent challenge to the first idea appeared in Forbes as an in-depth examination of the employment and salary data from the UK, which shows, when comparing all male workers to all female workers, a similar gap to that found in the US; and it explains why the UK Statistics Authority characterizes this conclusion as “highly misleading.”  They emphasize the need for accurate statistics as a guide to government policy.  To do that “we need to know how much of that [gap] is because of choices that people make over working hours, what job they do, the flexibility they might prefer over pay and so on, and then see what’s left which might be the result of direct discrimination.”  Unless such a careful comparison is made, the numbers are worthless in guiding policy decisions.

Digging deeper into the data clarifies the problems a bit.  “Given the society we do have, rather than the one that some of us might like, it’s really no surprise at all that more women work part time than men, what with juggling child care duties. And it’s also no surprise at all that part timers make less per hour than full timers.”  Some data actually contradicts the premise, showing that women working part time make 3.4% more than men do, but this is offset by the fact that four times as many women opt for part-time work.  This choice factor is only one component of a highly complex calculation, and many other factors must be considered to make a fair comparison.  The simplistic comparison of all men to all women produces the popular, but “highly misleading,” 77-cents-due-primarily-to-discrimination conclusion that many in the US immediately jump to.

On the second point, a recent report from the PBS News Hour challenges the notions that STEM fields are unattractive to women and that this is why women are so greatly outnumbered by men.  “On closer inspection, it turns out that these ‘truths’ are nothing more than assumptions, and that these assumptions are inconsistent with the facts.”  They go on to describe, and show graphically, some STEM fields and academic majors where women actually outnumber men.  Despite these facts, well-intentioned institutions continue to set up educational opportunities and other strategies to try to deprogram young girls from hating math.  It’s a solution to a non-problem, often funded by taxpayer dollars or charitable donations that could be better directed.

What is the purpose of the constant repetition of these two related psycho facts?  They likely remain popular because they place women, in general, in victim status, giving some the opportunity to vent and to take out their frustrations when things don’t seem to be going their way.  That is not affirming, not inspiring, not supportive, but it is not intended to be.  Rather, the purpose seems to be to give power to the rescuers who promise results through strikes, protests, or legislative action; rescuers whose main interest is not justice or equality but the retention and growth of their own political popularity and power, even when it is based on a poorly developed and “highly misleading” statistic.


That is not to say there is no discrimination at all, but those who rely on these sloppy generalizations choose to wallow in victimhood and be taken advantage of by manipulative public figures rather than taking responsibility for their own individual situations.

Monday, December 1, 2014

Arrests, Proportions and Critical Thinking


In the aftermath of the incident in Ferguson, MO, there have been several reactions.  One was a recommendation that more police forces adopt personal video devices, with the expectation that a reviewable video of a scene will be more reliable than the possibly conflicting testimonies of those involved or of other witnesses, especially when one or more persons involved may have been killed and cannot speak for themselves.  A second reaction is a call to reduce biased actions by police officers as they carry out their duties, along with the assumption that more diversity in the police force will be a step toward solving the problem.

The second is highlighted in this article from the Indianapolis Star.  Following the lead of USA Today, the columnist discusses in detail the disproportionate number of blacks who are arrested relative to the distribution of the population.  USA Today reviewed arrest statistics for numerous cities across the country.  The Star columnist wants to bring this message home to his readers by pointing out that the state is not “squeaky clean” when it comes to these statistics.  Like many cities in other states, several Indiana cities have a worse record than average and even worse than Ferguson, MO itself.  For example, Johnson County shows the highest racial disparity as “black people are nearly nine times as likely to be arrested as people of other races” and in Carmel, an affluent suburb of Indianapolis, “blacks are more than six times as likely to be arrested as others.”

The writer acknowledges that the explanation of the disparities “— educational and economic gaps that influence crime, or biased policing — isn't clear.”  Various police representatives point out that comparisons have been made between arrests and the makeup of the local population while not all arrests involve citizens of the city or county in question.  Arrests of visitors drawn to the area by a large shopping mall or other attraction may skew the results.

That is all well and good, but why pay so much attention to these kinds of numbers in the first place?  Admittedly, the explanation of the disparities is not clear, and other factors may well be involved, but it’s tempting to cite these statistics because they do tend to reinforce perceived issues.  But how much faith can we really put in them without some additional evidence?  Take another example:  there are slightly more women than men in the state of Indiana, yet in 2013 more than 5 times as many men as women were admitted to prison.  Is this by itself, by any stretch of the imagination, evidence of bias or profiling?  Likewise the US Department of Education shows a racial disparity among high school graduation rates, yet we see no outcry against teachers.
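
It helps to see how such a figure is produced.  The sketch below uses made-up counts, but the arithmetic is exactly what sits behind any “X times as likely” claim: a ratio of two per-capita rates, carrying no information about why the rates differ.

```python
# Hypothetical counts -- the arithmetic, not the data, is the point
arrests    = {"group_a": 90,    "group_b": 200}
population = {"group_a": 1_000, "group_b": 20_000}

rate_a = arrests["group_a"] / population["group_a"]  # 0.09 per person
rate_b = arrests["group_b"] / population["group_b"]  # 0.01 per person
print(f"group_a is {rate_a / rate_b:.0f}x as likely to be arrested")  # 9x
```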

The point here is that we mustn’t be too quick to jump to conclusions.  Sometimes these disparities are truly symptoms of underlying problems, but sometimes it’s just the way things happen without any sinister implications.

Monday, May 5, 2014

Worthless Information


Facts are facts, but sometimes we get information that is incomplete and, as such, does us no good at all.  These facts don’t make us smarter or better informed.  They just sit there on the page begging for further information that never arrives.

Take this example from last week.  The writer talks about the number of people who have signed up for insurance under the Affordable Care Act.  She cites several statistics, including which plan was most popular, how many young people enrolled in private plans, and the total number covered by Medicaid.  These facts are based on the new data released by the Department of Health and Human Services.

A couple of paragraphs into the article, data about the distribution by sex and race are given:  “The new data showed that 54% of those enrolling in insurance were women, while 63% of all enrollees were white. Of the remaining enrollees, the HHS report showed, 17% were African American, 11% Hispanic, 8% Asian, 1% multiracial, 0.3% American Indian/Alaska native and 0.1% native Hawaiian/Pacific islander.”  This is the part that makes no sense without further explanation, yet none is given.  The percentages just hang there, giving the impression that white women benefited most from the law.  In at least one newspaper, a secondary headline emphasized these numbers.  The question that should come to mind in reading this is:  what numbers should we use for comparison?

Of the people eligible, what percent were white women or Hispanic or multiracial?  Are these numbers representative?  Do they show what was expected, or is there some surprise there?  Nowhere is this answered.  The numbers just sit on the page looking like important information.

Since the expected numbers are not available, the percent of the total population can serve as a rough reference.  Those numbers are: white 79.96%, black 12.85%, Asian 4.43%, Amerindian and Alaska native 0.97%, native Hawaiian and other Pacific islander 0.18%, two or more races 1.61%.  The US Census Bureau considers Hispanic to mean persons of Spanish/Hispanic/Latino origin “who may be of any race or ethnic group.”  Despite the disclaimer, this source shows Hispanics as 15.1%.  Possibly HHS uses different definitions from the US Census Bureau.  Even with this as a reference, the original percentages are not very helpful.
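
As a sketch of the comparison the article should have made, the snippet below divides each group’s share of enrollees by its share of the population, using the figures quoted in this post.  Even this is rough, since the right baseline is the eligible population, which we are never given.

```python
# Shares of ACA enrollees (HHS) vs. shares of the population (Census),
# both as quoted in this post; percentages, so the units cancel
enrolled = {"white": 63.0, "black": 17.0, "asian": 8.0}
baseline = {"white": 79.96, "black": 12.85, "asian": 4.43}

for group in enrolled:
    print(f"{group}: {enrolled[group] / baseline[group]:.2f}x population share")
# white 0.79x, black 1.32x, asian 1.81x -- a more informative picture
# than the raw percentages alone
```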

The lesson here is that even when journalists and government agencies think they are being helpful and informative by releasing and printing facts, they often are not.  It’s just ink on a page and not useful at all.  They don’t seem to realize it, so we must be ever on the alert for useless information and always prepared to ignore it.