COVID-19 highlights the need for advances in scientific data management, specialists say

There is no universally adopted system or standard for collecting, documenting and sharing the massive amount of data from research on the disease, noted an expert who took part in the Second Latin American and Caribbean Scientific Data Management Workshop.

Published on 03/15/2021

By Elton Alisson  |  Agência FAPESP – The COVID-19 pandemic has highlighted the need to accelerate the implementation of systems for sharing scientific data that scientists anywhere in the world can easily access and use to advance knowledge on this and future public health emergencies.

Although studies conducted globally on the disease and the virus that causes it, SARS-CoV-2, have produced a diversity of results and data, there is no universally adopted system or standard for collecting, documenting, and sharing all this research output, said Ingrid Dillo, Co-Chair of the Research Data Alliance (RDA), in a presentation on the first day of the Second Latin American and Caribbean Scientific Data Management Workshop, held online on February 10, 2021.

“The spread of COVID-19 led to a rapid and massive research response with diverse outputs representing a challenge to interoperability,” Dillo said. “The pandemic has presented us with the challenge of enabling scientific data sharing as rapidly as possible while at the same time guaranteeing data quality. We have to strike a balance between fast collection, sharing, and accuracy.”

The lack of data sharing agreements among countries or organizations makes meeting this challenge all the harder, alongside differences in countries’ data infrastructures and systems, she added.

To help the global scientific community address these challenges, RDA set up a working group with more than 600 participants working in different areas across all continents. The group held many virtual meetings over a three-month period (April-June 2020) and produced a set of guidelines to help researchers and data stewards use best practices to maximize the efficiency of their response to COVID-19 and develop a blueprint for crisis management in the future. It also created recommendations for funders and policymakers.

“The aim of the guidelines and recommendations was to foster timely sharing of high-quality data and appropriate responses to public health emergencies,” Dillo said.

The 600-strong community was divided into subgroups and teams that discussed four main research areas: clinical data, epidemiology, omics (genomics, proteomics, metabolomics, etc.), and social sciences. Four cross-cutting themes were also defined: community participation, research software, indigenous data guidelines, and legal and ethical considerations.

One of the key recommendations is that governments, research institutions, and funders should foster global open science through policy and investment to expedite the flow of data across jurisdictions and sectors.

Another is that early publication and release of research data and novel research software should be encouraged.

“All the working group’s recommendations are very important in the context of the pandemic and future public health emergencies. We’re looking to see if they can be embedded in the data management principles for research on infectious diseases but they have a much wider relevance and could have an even greater impact,” Dillo said. “They also apply to multiple stakeholders. Overall, they should lead to better science and faster solutions. We hope they will be applicable to different disciplines in the long run, as they are quite generic.”

Health data management and sharing involve many challenges. “The first is that health data is fragmented, often individualized by patient, and scattered among different management systems. It’s hard to aggregate all this data,” said Maurício Lima Barreto, an epidemiologist who works as a researcher at FIOCRUZ Bahia (the Bahia State unit of Oswaldo Cruz Foundation, a health technology institution subordinate to the Brazilian Ministry of Health) and as a professor at the Federal University of Bahia (UFBA).

The difficulty of integrating the data arises not only from scientific and processing issues but also from ethical and legal aspects, which are now in the spotlight in Brazil following the passage of the General Data Protection Law (LGPD), noted Barreto. He heads FIOCRUZ Bahia’s Center for Health Data and Knowledge Integration, which aims to interconnect dozens of public systems to integrate health data from 100 million Brazilians.

Advances in other areas

Participants in the event noted that fields beyond health, astronomy, social sciences, and high-energy physics, which have a long history of research data management and sharing, have also advanced in this respect. Examples include agricultural, earth, and environmental sciences.

In 2013, RDA set up an interest group to discuss food and agricultural data. The group has more than 260 members on all continents, representing stakeholders in managing food and agricultural research data production, aggregation, and consumption.

“The initiative has become a forum for sharing experiences and providing visibility to research and innovation in food and agricultural data, as well as a space for networking and blending ideas related to data management and interoperability,” said Hilary Hanahoe, RDA Secretary-General.

One of the outputs of this interest group is a set of “39 hints” to facilitate the use of “semantics to enhance the interoperability of data on agriculture.” The suggestions were adopted by the UN’s Food and Agriculture Organization (FAO).

“The interest group works alongside major international public and private initiatives that involve food and agriculture data sharing and management,” Hanahoe said.

In the environmental area, a group of researchers at the University of São Paulo’s Engineering School (POLI-USP) in Brazil, in partnership with colleagues in France, the United Kingdom, the United States, Japan, and Australia, plans to develop new tools for the sharing and reuse of data on the socio-economic impacts of nature reserves and conservation units on local communities.

The tools and metrics resulting from the project, which is known for short as PARSEC and is supported by FAPESP under the auspices of an agreement with the Belmont Forum, will enable better prediction and mitigation of the effects of actions that disrupt historical land-use practices and threaten local communities. The results will be useful to some 300,000 researchers worldwide who handle earth, space, and environmental data, said Pedro Luiz Pizzigatti Correa, a professor at POLI-USP and principal investigator for the project.

“We will combine satellite images with deep machine learning to predict poverty in the regions analyzed,” Pizzigatti Correa said.

In earth sciences, an international group of scientists who specialize in remote sensing is working through the NASA-supported National Snow and Ice Data Center (NSIDC) on the management and distribution of data on the snow, ice, glaciers, frozen ground, and climate interactions that make up the planet’s cryosphere.

The term cryosphere refers to the places where low temperatures freeze water and turn it into ice, such as the Arctic, the Antarctic, and high mountain regions.

The project’s scientists and data management professionals work with data providers and users to create or publish products, tools and resources that ensure that past, present and future scientific data remain open and accessible to all those who wish to study Earth and its climate.

The cryosphere-related data curated by NSIDC ranges from small text files to terabytes of remote sensing data from NASA’s Earth Observing System satellite program.

“As an example of this mission, in 2018 NASA launched a polar-orbiting satellite called ICESat-2 primarily to look at sea ice and land ice in the Arctic and Antarctic, but it also collects data on everything in between,” said Steve Tanner, a researcher and project manager at NSIDC.

The satellite’s LIDAR-based instrument, called ATLAS (Advanced Topographic Laser Altimeter System), constantly sends pulses of green laser light down to Earth and measures the time they take to reflect off the ground and return to the receiver telescope. By matching those times with the satellite’s location in space, mission scientists determine the exact height of a specific point on Earth’s surface, from glaciers and mountains to trees, buildings, and sea ice, as well as the depth of rivers.
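
The timing principle described above can be illustrated with a minimal sketch. It is not the mission's actual processing chain: it assumes the pulse travels straight down and ignores atmospheric delay, and the function names and numbers are hypothetical, chosen only for illustration.

```python
# Speed of light in vacuum, metres per second (exact by SI definition).
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def range_from_round_trip(round_trip_s: float) -> float:
    """One-way distance covered by a pulse that took round_trip_s to return."""
    # The pulse travels down and back, so the one-way distance is half the path.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


def surface_height(satellite_altitude_m: float, round_trip_s: float) -> float:
    """Height of the reflecting surface above the altitude's reference level.

    Simplifying assumptions (not from the article): the pulse points straight
    down (nadir) and is not slowed by the atmosphere.
    """
    return satellite_altitude_m - range_from_round_trip(round_trip_s)


# Hypothetical values for illustration only: a satellite 500 km above the
# reference level, and a round-trip time consistent with a surface 1,200 m high.
altitude_m = 500_000.0
round_trip_s = 2 * (altitude_m - 1_200.0) / SPEED_OF_LIGHT_M_PER_S
print(surface_height(altitude_m, round_trip_s))
```

Because light covers roughly 30 centimeters in a nanosecond, even a tiny timing error shifts the computed height noticeably, which is why the satellite's position must be known precisely.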

“A lot of the data from this satellite can be useful to coastal nations, nations with many inland waterways, and countries with large forests and agricultural areas like Brazil,” Tanner said. “NASA concentrates on making all this data publicly available to the world.” He urged the participants in the workshop to visit NSIDC’s portal, where a team of experts can answer questions on these topics and help users download datasets.

NSIDC is a member of the World Data System (WDS), an interdisciplinary body of the International Science Council (ISC) whose goals include assuring open and equitable access to scientific data. 

“We encourage open and full free access to the scientific data stored in the repositories of all research organizations. We understand that there are circumstances in which this isn’t possible for ethical or legal reasons, but in all other cases scientific data must be made available with the shortest possible time delay,” said Alex de Sherbinin, Chair of WDS’s Scientific Committee.

Another principle advocated by WDS focuses on assuring the authenticity, quality, and integrity of shared scientific data. “Trustworthiness is particularly important in the age of post-truth,” de Sherbinin said. “We really need to nurture trust in data proactively.”


The aim of the workshop, which was organized by FAPESP under the aegis of its Research Program on eScience and Data Science in partnership with WDS, RDA, and the Brazilian Academy of Sciences (ABC), was to discuss best practices in data repository management, and trends and prospects for scientific data systems. 

A previous edition was held in 2018 and resulted in the creation by ABC of a working group to draw up standards and guidelines relating to the treatment of open-access data for the advancement of science and technology in Brazil.

“In August 2020, the group issued a document on the challenges of scientific data management in Brazil. It is now preparing a new document focusing on open science,” said Luiz Davidovich, President of ABC. 

Davidovich highlighted the leadership in this area of Claudia Bauzer Medeiros, who acted as the moderator of the event. Medeiros is a professor at the University of Campinas’s Institute of Computing (IC-UNICAMP) in the state of São Paulo, and a member of ABC, WDS’s Scientific Committee, and the Steering Committee of the FAPESP Research Program on eScience and Data Science.

“FAPESP is one of the pioneers in open science initiatives in Latin America and the Caribbean,” said Luiz Eugênio Mello, FAPESP’s Scientific Director. “It is the first research funding agency in South America to require data management plans for all project submissions. This, in turn, is increasing awareness in the research community of the importance of data sharing as part of good scientific practices.”

The entire livestreamed event was recorded and can be watched at: www.youtube.com/watch?v=AB3BCPr43Ok.


Source: https://agencia.fapesp.br/35302