By Jon Brock
The article below details why we need to get data right, and even more so metadata: compiling bad or inadequate data does not make it more reliable; it likely compounds the error.
This is particularly problematic in academic science, especially in medicine and nutrition research, as John Ioannidis has clearly shown in his work.
Scientific data can go “absent without leave” for a number of different reasons:
- Scientists don’t archive their data properly and they lose track of it, can’t make sense of it, or their hard drive dies and they don’t have a backup. This happens surprisingly (and embarrassingly) often.
- Scientists begin a study but abandon it before it is completed due to lack of funds, unpromising preliminary results, or other priorities. The data might be useful in combination with data from other studies, but it’s not publishable on its own.
- Scientists selectively publish data that supports a particular theory. Inconvenient data are quietly forgotten.
- Scientists try to publish data but are unsuccessful because the results aren’t considered interesting enough by the scientific journals.
- Knowing how difficult it will be to publish a null result, scientists prioritise writing up studies that gave them more publishable results.
The end result is what’s become known as the “file drawer problem”. The published scientific literature represents only a small and biased sample of the research that has actually been conducted. The rest is stuffed away at the back of the metaphorical filing cabinet.
There’s a lot of wasted effort here — data collected and then not used. But the bigger problem is the bias in what is published.
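To see why the bias, not just the waste, is the real problem, here is a minimal sketch (not from the article) of the file drawer problem in action. It assumes a small true effect, simulates many underpowered studies, and “publishes” only those that clear p < 0.05; the study count, sample size, and threshold are illustrative assumptions, not anything reported by Ioannidis or the linked piece.

```python
# Illustrative sketch of publication bias: filter studies on significance and
# the published literature systematically overstates the true effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect = 0.2            # assumed small true effect (in standard-deviation units)
n_studies, n_per_study = 1000, 20   # many small, underpowered studies

published = []
for _ in range(n_studies):
    sample = rng.normal(true_effect, 1.0, n_per_study)
    t, p = stats.ttest_1samp(sample, 0.0)
    if p < 0.05:             # only "interesting" results make it out of the file drawer
        published.append(sample.mean())

print(f"True effect:                      {true_effect}")
print(f"Studies 'published':              {len(published)} of {n_studies}")
print(f"Mean effect in published studies: {np.mean(published):.2f}")
```

Run it and the published studies are a small fraction of the total, and their average effect is several times larger than the true one. A meta-analysis built only on that published sample would compound the error rather than correct it, which is exactly the point about combining bad or incomplete data.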
https://towardsdatascience.com/why-metadata-matters-ab7253ea35c7