Tuesday, February 18, 2014

Why it burns when you P.

The smaller the p-value, the better the result. True or false?




If you answered "True", then that is why it burns when you use and interpret p-values. It hurts us all. Stop it.






The p-value seems to have become the judge, jury, and executioner of scientific results. As I go further into my PhD, I see just how in love we are with p-values -- and all the signs of an abusive relationship. No matter how many horrors the p-value wreaks upon us, we just know it won't do it again. And no matter how much we misuse and manipulate the p-value, it just won't leave us. It's here to stay, and that's love, right?! Love or hate, it is pretty hard to imagine a world without p-values, but such a world did exist: p-values have been around for less than a century. And before that: nothing was statistically significant! Yet scientists -- Darwin, Mendel, Newton, and Einstein among them -- somehow managed to test hypotheses, come to conclusions, and develop theories anyway.



[Google Ngram snapshot, Feb 17, 2014]

Side note: It is not strictly true that the world was completely without significance before the p-value. For example, John Arbuthnot seems to have stumbled upon a significant result in 1710, and the term "statistically significant" seems to come up around the year 1600 according to Google Ngram (above). However, "p-values", "statistical significance", and "statistically significant results" in their modern incarnations are a product of the 1920s.



So who invented the p-value, what was it originally used for, and how is it used in modern data analysis? Could science and data analysis exist without p-values once again? Should it? Many of you (i.e. my readers, i.e. referrer spam bots) might have felt a sense of shock at the thought that the almighty p-value ever had an inventor. What mere mortal could have created the mighty metric for "significance" itself? Rather than tell you, I encourage you, Spam Bot, to read "The Lady Tasting Tea" by David Salsburg. One result of inventing the p-value that is almost certainly statistically significant is the number of scientific papers that have included a p-value since. More interesting is that the p-value has become so omnipresent, so omnipotent, so mythological and magical that few if any cite its mere mortal inventor when they use it. If everyone did cite R. A. Fisher, he would quite possibly be the most cited scientist ever, and by no small margin (okay, I told you, Spam Bot... but I still think you should read that book).

The p-value has been a source of controversy ever since its invention. Unfortunately, the controversy has mostly gone on in the coolest statistics circles, which it is statistically unlikely that you or any of your ancestors were ever in. Good news! Due to increasingly bad science -- perhaps a consequence of the rise, abuse, misuse, and generally poor understanding of p-values -- the controversy has reached the foreground.

There are already plenty of people who have provided prose preaching about the promiscuous rise and predicted plummet of the p-value's popularity, and who promote posterior probabilities and other alternatives such as reporting effect sizes and confidence intervals. So I do not want to do my own version of that. See the end for recommended books and articles covering these issues. Rather, I want to ask you a few questions (feel free to answer in the comments section). This is not a comprehensive set of questions, but they are questions that I think any scientist or data analyst should have their own thoughts on. In fact, it would be amazing if average Joe Citizen had thoughts on these questions about the almighty p-value and related topics such as the False Discovery Rate, reproducibility, and validity. All of us are blasted in the face with statistically significant results on a daily basis. Popular views on whether eggs and coffee are good or bad for your health change all the time because of statistical significance. So, which is it: good or bad?!?! Can statistical significance ever answer those health questions definitively?






Take a moment to think about each of the following questions. There are some starting points to explore these topics afterward, but mostly I expect that you know how to use Google. For the uninitiated, remember that p-values get smaller with "higher significance" -- hence the tempting, but horribly wrong conclusion of "the smaller the p-value, the better the result":

1. What is a p-value? 

2. What does a p-value mean to you?

3. What does a p-value mean to your colleagues?

4. Are results better when the p-value is smaller (i.e. "more significant")?

5. What factors make a p-value tiny anyway?

6. When you are scanning a paper or report and see "highly significant" (tiny) p-values, how does that affect how you perceive the results?

7. When should you or anyone else report a p-value? 

8. Is it always necessary to include a p-value?

9. What is an effect size?

10. Does a tiny p-value guarantee a large effect size?

11. Is it possible to get a highly significant (tiny) p-value with a tiny effect size?

12. How does "n" (the number of data points) affect the p-value? (See the simulation sketch after this list.)

13. What is a null hypothesis? or what is the null distribution?

14. Is there more than one possible null distribution to use for a given test? 

15. How do you pick a null distribution?

16. Is there a correct null distribution? or multiple correct null distributions ("nulls")? or mostly incorrect nulls? or do nulls have no inherent correctness at all?

17. What does the null distribution you use say about your data? 

18. If your p-value is "highly significant" and lets you conclude that your data almost certainly did not come from your null distribution -- a negative assertion -- does that give you any positive assertions to make about your data?

19. If your p-value is "not significant" and so you fail to reject the null hypothesis, does that mean the null hypothesis is therefore true?

20. If your p-value is "not significant" and so you fail to reject the null hypothesis, does that imply that your alternative hypothesis is therefore false?

21. If your p-value is "significant", would it come out statistically significant again if you repeated the experiment? What is the probability it would be significant again? Is that knowable? 

22. Is it possible to get a highly significant (tiny) p-value when your data have indeed come from the null distribution?

23. When you do 100 tests with a cut-off for significant p-values at 0.05, and all the tests are in fact from the null distribution, how many do you expect to be counted as significant anyway? That is, how many false positives will there be out of the 100 when the significance cutoff is 0.05? And if you do 2.5 million tests with a significance cutoff of 0.00001, how many false positives would you expect to come through? (See the simulation sketch after this list.)

24. When doing multiple tests in general, how can you limit the number of false positives that come through? How will that affect the number of true positives that come through?

25. What is the Bonferroni correction?

26. What is the False Discovery Rate (FDR)? …and what are q-values?

27. Does the False Discovery Rate control all types of false discoveries?

28. What types of false positives are not accounted for when controlling the FDR?

29. When reading a paper or report, if the FDR is tiny, how does that affect how you perceive the results?

30. What is Bayes' Rule?

31. What is a prior probability? a likelihood? a posterior probability?

32. What is a prior distribution?

33. What is a posterior distribution?

34. Compare null distributions to prior distributions. 

35. Compare p-values and posterior probabilities.

36. Compare confidence intervals to credible intervals.

37. What is reproducibility? reliability? accuracy? validity?

38. What is the Irreproducible Discovery Rate (IDR)?

39. If a result is highly reproducible, does this ensure it is valid? 


40. If a result is not reproducible, does this guarantee it is not valid?

41. If results are both reproducible and valid, does this guarantee the interpretation is correct?

42. Are the non-reproducible elements across replicates always "noise"? Or could they be signal?

43. Is it possible that the most reproducible elements across replicates are in fact the noise? 
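
Questions 11, 12, 22, 23, and 25 are easy to poke at yourself with a quick simulation. Here is a minimal sketch -- assuming Python with NumPy and SciPy, with every number chosen purely for illustration:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Questions 22-23: all 100 tests are drawn from the null distribution,
# so every "significant" result is, by construction, a false positive.
n_tests, alpha = 100, 0.05
false_positives = 0
for _ in range(n_tests):
    a = rng.normal(0.0, 1.0, size=30)
    b = rng.normal(0.0, 1.0, size=30)  # same distribution as a: the null is true
    if stats.ttest_ind(a, b).pvalue < alpha:
        false_positives += 1
print(false_positives, "false positives out of", n_tests)  # expect about alpha * n_tests = 5

# Question 25: the Bonferroni correction simply divides alpha by the number of tests,
# so each test must now pass p < 0.0005 to be called significant.
bonferroni_cutoff = alpha / n_tests

# Questions 11-12: a tiny effect becomes "highly significant" once n is huge.
a = rng.normal(0.00, 1.0, size=1_000_000)
b = rng.normal(0.01, 1.0, size=1_000_000)  # shifted by 1/100 of a standard deviation
print(stats.ttest_ind(a, b).pvalue)  # microscopic p-value, negligible effect size

Run it a few times with different seeds: the false-positive count hovers around 5, and the second p-value stays microscopic even though a shift of 0.01 standard deviations is practically nothing.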


Books:

"The Lady Tasting Tea" -- David Salsburg
-- A good read: a brief history of statistics, the birth and rise of statistical ideas, and the characters involved.

-- James Schwartz
-- Also a great history of statistics, though not strictly relevant to the p-value topic. Think of it as the pre-P history. I mainly included it here because when I was done reading it, I read "The Lady Tasting Tea" and thought that in some ways the latter picked up where this book left off. Many of the great statisticians were interested in genetics -- that is still true even today.

"Mathematical Statistics with Applications" -- Wackerly, Mendenhall, Scheaffer

Articles (chronicles) concerning the p-value battles:


Significance Questing -- Rothman, 1986

Other articles (some of which concern p-value battles -- e.g. motivation for Bayesian Inference):

Wiki pages:

Other blogs, classes, etc that get into the topic of p-values -- for and against:

Hypothesis Testing -- slides by Vladimir Janis

The Statistical Significance Testing Controversy: A Critical Analysis  <-- R. Chris Fraley's syllabus for his class of that title. At bare minimum, this is a very extensive set of references on these topics, organized into points such as why to abandon significance tests, why to keep them, effect size, statistical power, etc.


IDR - a very new statistical idea:

Tuesday, February 11, 2014

The Oil-and-Water of Website Design

Hear ye, hear ye: I have a meagre-but-workable knowledge of designing and maintaining small websites. 

Over the years I've figured out some helpful website design/management practices that are probably well known and obvious to real website managers, but that I think are useful to anyone who likes to dabble. The underlying theme is layering and separation. Modularity. For example, the aesthetics of the website---the logo, the masthead---should be developed independently of the coding (and preferably beforehand, to help guide the coding). The coding itself naturally separates into content, on-screen actions, and layout---that is, your HTML, your JavaScript, and your CSS.

I'm going to skip a bunch of personal history and go straight to a case study: my lab's website. A few years ago I thought it would be good for my lab to have a web presence, so I thought: (1) We need a cool name, (2) We need a cool logo, (3) We need a cool website. 

Turns out "cool" is defined such that "Terrestrial Lab" is a cool name and that this is a cool logo:

[logo image]

The definition is also flexible enough such that www.terrestrial-lab.org is a cool website. At the least, these will be our starting assumptions!

Before I made this website, my advisor often referred to our lab as the NJIT Middle and Upper Atmospheric Group. While that name was descriptive, I thought something like "Terrestrial Lab" was catchier. We're not studying rocks, oceans, volcanoes, or a plethora of other "terrestrial" features (hell, most of science fits under the "terrestrial" umbrella), so the name might seem a misnomer. But we do study space weather activity with a focus on the Earth's lower, middle, and upper atmosphere, and our work does make up the "terrestrial" component of our broader group, the "Center for Solar-Terrestrial Research." So 1+1=2, and it's a great name!

With a hip new name, we could also have an easy-to-remember website URL. Our old website was TSMTM.net --- pretty dang easy to remember... that is, if you're familiar with the -ospheres: troposphere, stratosphere, mesosphere, thermosphere, magnetosphere. But if you're anything other than an atmospheric scientist, I wouldn't feel bad if it just looks like a random string of letters. And anyway, if you're anything like me, all you see at a glance is Teenage Mutant Ninja Turtles.

What's a hip new name without an eye-catching logo? I wanted something identifiable. Something simple and obvious from afar. My idea was to make a logo that someone could mindlessly doodle in class. Something that would look good as a sticker, or on a T-shirt. Most importantly, it had to do all this while making us look like a serious group of scientists (we're actually just a bunch of clowns, so this is important!).

I started with a triangle: simple, serious, pointy. I played around with it on a piece of paper, adding bits of text until I got things looking how I wanted. Then I went digital with Adobe Illustrator after the idea was already down. I had to play around with fonts for a while, but I finally nailed it --- with printed T-shirts and all:

[photo of the final logo, printed on T-shirts]


Great, but a logo by itself isn't enough for a website: one also needs a masthead of sorts. Some kind of graphic. Something that is nice looking, professional, and somehow relevant. I had always thought the periodograms I produced with magnetometer data from Antarctica had an aesthetic appeal, so I started there. Overall, this should be a fun part. Just play around. This was my final product:

[masthead graphic]

Ok, so we have a name, a logo, and a main-page masthead design. Now it's time to get your hands dirty and code. I'm going to assume you know the basics (some HTML, a sprinkle of JavaScript, a dash of CSS) and just list some things that have helped me. For starters, my website looks like this:

[screenshot of the TL website]

It's important to realize that the TL website looked like this even when I had much less efficient coding practices. I did not yet fully appreciate the oil-and-water of website design --- the natural separation of content, on-screen actions, and layout! The focus of my pointers below is on efficiency, which has nothing to do with the final product, but everything to do with maintaining the final product without going insane. 

1. Internal vs. External Cascading Style Sheets: Formerly, I used internal style sheets as my primary CSS, with manual in-line overrides where necessary. This was fine when I first created the site as a proof-of-concept to show my advisor, but it quickly became awkward to maintain the site and develop new pages. If I wanted to change the CSS of one page, I'd have to manually go around and change all the other pages as well---or just say "to heck with consistency!"

Solution: I moved my site's primary CSS into an external style sheet (a .css file), and I can still do manual overrides both (locally) in-line, as before, and (page-globally) in the header using the <style> tag. (Look at the code below and/or Google these things if you want to learn more.)
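
As a sketch of those three levels (the style rules here are made up for illustration, not the TL site's actual CSS):

<link rel="stylesheet" type="text/css" href="TL_in_style.css">
<!-- the external file above holds the site-wide rules -->
<style>
  /* page-global override: applies to every h2 on this page only */
  h2 { color: #336699; }
</style>

<h2 style="color: firebrick;">And an in-line override beats both, for this one element.</h2>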


2. Internal vs. External JavaScript: Same story. I used to keep it internal to each page using the <script> tag. Although I rarely modify the JavaScript on the site, the individual pages look much cleaner without it, which lets me focus on a page's content. Each page now calls an external .js file, which I only need to edit once if the need arises, not laboriously for each page.

3. The Navigation Bar: The meat of the TL website is the navigation bar at the top. It needs to remain constant as a user peruses the pages, which means it somehow has to exist within each page's HTML file. One terrible way to do it is to copy and paste the nav bar's HTML into each web page's HTML file. That's fine until you want to modify the nav bar---to add new sections, delete old links, or edit the associated JavaScript or CSS. One small change to the nav bar then requires manually going into each file and... oh man! How time-consuming. I used to actually do this. It was tiresome and oppressive; it stamped out creativity and made website maintenance a chore.

Solution: Again, nothing new conceptually: separate the shared content from the page-specific content. It's possible to keep the nav bar in an external HTML file and simply have each webpage of the TL site refer to that file. This dramatically changes the web design and management experience. How to do it? The answer is Server-Side Includes (SSI), which allows one HTML file to call upon an external HTML file. The only caveat for using SSI is that you must change your webpage extensions from .html (or .htm) to .shtml.

To learn basic SSI (which is likely all you need), the Wikipedia article is good enough.

The code pasted below shows more than meets the eye. These few lines in an HTML file used to be tens of lines, maybe more: I used to have both JS scripts and a huge block of CSS in the head of every page. Now the website's server simply pastes those in before the page gets sent to the user's browser. Same for the SSI and my nav bar. That's it. The server does all the hard work before the page ever goes live in someone's browser.

<head>
<title>CSTR - Terrestrial Laboratory</title>
<script type="text/javascript" src="TL_script1.js"></script>
<script type="text/javascript" src="TL_script2.js"></script>
<link rel="stylesheet" type="text/css" href="TL_in_style.css">
</head>


<body>
<!--#include virtual="topLinkBar.html"-->
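<!-- the directive above is SSI: the server swaps it for the contents of topLinkBar.html before the page is sent to the browser -->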

<!-- all other HTML code for the page -->

</body>

And so the secret to sanity is the separation of seemingly singular suppositions! (And, of course, the mastery of alliteration.)