Data is the New Oil

And Science itself is sitting on the data motherlode. Every year, we as researchers produce trillions of dollars of the stuff and, again, that is just a single year. We have over a century of science’s “dinosaur bones” in statistical data in the stacks. However, like the oil in the ground, data is of limited value in its raw form. Oil only becomes truly valuable when we can extract it, refine it, and distribute it in an accessible form, such as gasoline powering an electrical generator. Our goal, yes “ours” as any project of this size has to be collective, is to convert science’s data into an accessible form, and that begins with building a data refinery. Fortunately, there is already a scientific data refinery available: “meta-analysis.”

Meta-analysis is designed for the extraction and refinement of scientific findings. Once done, a meta-analysis empirically demonstrates the strength of relationships or interventions found across a body of research, and thus meta-analyses are some of the most practically accessible and useful tools for the public’s understanding of science. Unfortunately, and I can attest through multiple tours of duty, meta-analysis is also a brutal process. It often starts with a massive hand-culling of hundreds or even thousands of articles; once the relevant studies are located, the results they contain are manually extracted and then quantitatively aggregated. It takes years, and that’s what we are going to change.
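To make "quantitatively aggregated" a little more concrete, here is a minimal sketch in Python of one common way to pool results: a fixed-effect, inverse-variance weighted average. It is an illustration only, not the method used in any of the meta-analyses mentioned in this post. The function name `pool_fixed_effect` and the three effect sizes and variances are made up for the example.

```python
import math


def pool_fixed_effect(effects, variances):
    """Pool study effect sizes with inverse-variance weights (fixed-effect model).

    effects   -- one effect size per study (hypothetical values in this sketch)
    variances -- the sampling variance of each effect size
    Returns the pooled effect and its standard error.
    """
    if not effects or len(effects) != len(variances):
        raise ValueError("need one variance per effect size")
    # More precise studies (smaller variance) get larger weights.
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return pooled, standard_error


# Three made-up studies, purely for illustration.
effects = [0.30, 0.10, 0.20]
variances = [0.01, 0.04, 0.02]
pooled, standard_error = pool_fixed_effect(effects, variances)
print(f"pooled effect = {pooled:.3f}, SE = {standard_error:.3f}")
```

The arithmetic is the easy part. The years go into finding the studies and extracting the numbers that feed it.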

Some of the details of what we are trying to accomplish are outlined in the open access article “Cloud-Based Meta-Analysis to Bridge Science and Practice.” It shows all the magic that happens when we have converted a sizeable chunk of a field into an analyzable meta-analytic database. For example, we used it to help create two of the world’s largest meta-analyses (i.e., “The Effects of Personality on Job Satisfaction and Life Satisfaction” and “The Happy Culture: A Theoretical, Meta-Analytic, and Empirical Review of the Relationship Between Culture and Wealth and Subjective Well-Being”). But it is missing a critical step.

Though we tried to improve the efficiency, it took us five years and too many dollars to create the underlying database. Furthermore, it is single coded, meaning each study was coded by only one person, so for smaller meta-analyses I wouldn’t trust the data. Consequently, it is a great proof of concept and you can use it to jump-start other meta-analyses (just as long as you double code the data yourself). Still, moving forward, we need to increase our coding speed and efficiency by orders of magnitude. Here’s how we are going to do it and how you can contribute.

Building Better Tools

With the advent of the R Statistical Platform, massive advances have been made regarding how we analyze meta-analytic data (e.g., http://www.metafor-project.org/doku.php). How we code the data in the first place, however, is often outdated. You would be surprised how many key analyses, even in meta-analytic labs, are still done in an Excel spreadsheet or the equivalent. We are building a cloud-based coding platform that we all can use, one that we can update continuously. If you have programmers, funding for them, or just a suggestion, we need you here.

Create Taxonomies

One of the earliest criticisms of meta-analysis was by Hans Eysenck, who called it “mega-silliness.” He was referring to the commensurability problem, that is, what should be grouped with what else. Sometimes we want apples to be grouped with oranges, both being fruit, but never with orangutans. These ontologies need to be explicit and transparent, so we know exactly what is grouped together. A complete scientific ontology system, as provided by the company Ontotext, runs around $400,000 US. While we are raising funds for this, if you like organizing, this may be for you.
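As a rough picture of what "explicit and transparent" can mean, here is a minimal sketch in Python of a toy taxonomy stored as child-to-parent links. It is a hypothetical illustration, not how Ontotext or any real ontology system works. The `PARENT` table and both function names are invented for the example.

```python
# Toy taxonomy: each label points to its parent category (all hypothetical).
PARENT = {
    "orange": "fruit",
    "apple": "fruit",
    "fruit": "food",
    "orangutan": "primate",
    "primate": "animal",
}


def ancestors(label):
    """Return the label followed by every category above it, nearest first."""
    chain = [label]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def shared_category(a, b):
    """Return the nearest category containing both labels, or None if there is none."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    return None


print(shared_category("apple", "orange"))      # grouped under "fruit"
print(shared_category("orange", "orangutan"))  # None: nothing in common here
```

Because the grouping rule lives in a table anyone can read, a skeptic can see exactly why two measures were or were not combined.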

Coding

As Zooniverse, the massive scientific crowd coding site, showed us, there is a lot of spare scientific brainpower eager to be put to good use. We need hundreds of coders to tackle a field effectively, and they certainly are out there. To get people coding articles, we need recruitment, selection, training, and incentive components. You can help with all four of these. The last of them, incentives, we are presently handling in the traditional way. So…

Publishing

The coin of the realm is getting top tier articles, and meta-analysis is the way to do it. If our platform helps you publish, I think that does the trick for motivation. I will be very disappointed if we don’t enable not just publications, but the biggest and most important publications of our field. Coding teams have the option of keeping the data they coded private for a few years while they seek publication, but the end game is open science and transparency.

Connecting and Collaborating

The number of coauthors on papers reflects the scope of the endeavor and the degree of successful collaboration. Some places have teams of coders, but theory and research writing aren’t their forte. Elsewhere, we have the opposite. Connecting the two enables bigger and better meta-analyses, so that is part of our mandate. We want to better enable teams from multiple universities around the world to work together on definitive projects. Maybe your contribution is simply being able to manage the process. We all have roles to play. Define yours.

Where does this all lead us? Someplace remarkable, which we will talk about together in subsequent posts. For the final word, I’m leaving it to Kamila Markram, the CEO of Frontiers and a leader in the Open Science movement: https://blog.frontiersin.org/2017/04/19/open-science-can-save-the-planet/

Even though the world spends $2.3 Trillion on research to produce around 2 million research articles every year, still today, about 90% of our science results are locked away behind expensive paywalls, not widely available to the public, companies and even many researchers themselves. This produces the biggest bottleneck in today’s knowledge economy and society, stifling innovation and slowing down solutions to some of the biggest challenges humanity faces today: diseases and climate change. “Imagine how we could accelerate innovation, stimulate economic growth and all the solutions we need to live on a healthy and sustainable planet, if we were to open up our science fully and allow a free flow of scientific knowledge.”
