To date, the novel coronavirus has caused 3.9 million deaths worldwide. At the start of the pandemic, investigations into the origins of SARS-CoV-2 were hampered by a lack of access to information from China, where the first cases appeared.
Now, a Seattle-based researcher has recovered deleted files from Google Cloud that contain 13 partial genetic sequences from some of the earliest cases of COVID-19 in Wuhan.
The sequences don’t tip the scales towards or away from any of the many theories about how SARS-CoV-2 originated. For example, they do not support the theory that the virus leaked from a high-security laboratory in Wuhan. And yet, the data suggest that the novel coronavirus was already circulating before the first major outbreak was detected at a seafood market in Wuhan.
To pinpoint exactly how and where the virus originated, scientists need to find the so-called progenitor virus from which all other strains are derived. So far, the earliest sequences have mostly come from cases at the Huanan Seafood Market in Wuhan. It was originally speculated that SARS-CoV-2 first appeared in late December 2019. However, cases dating from early December, and as far back as November, of that year had no link to the market. This suggests that the virus originated somewhere else.
The cases linked to the market carry three mutations that are absent from virus samples detected outside the market weeks later. Viruses without these mutations more closely match the coronaviruses found in horseshoe bats. Scientists are confident that the new coronavirus somehow originated in bats, so it is logical to assume that the progenitor did not have these mutations either.
And now Jesse Bloom, a Howard Hughes Medical Institute investigator based in Seattle, has found that the deleted sequence data (some of it probably from the earliest samples of the virus) also lack these mutations.
About a year ago, 241 genetic sequences from coronavirus patients disappeared from the Sequence Read Archive, an online database maintained by the National Institutes of Health (NIH).
Bloom noticed the missing sequences when he came across a spreadsheet in a study published in May 2020 in the journal PeerJ. They were part of the Wuhan University project PRJNA612766 and had supposedly been uploaded to the archive. When the scientist searched the archive database for the sequences, he received the message “Items not found.”
His investigation revealed that the deleted sequences were collected by Wuhan University Hospital. A preprint of the study based on these sequences indicates that they came from nasal swab samples taken from outpatients with suspected COVID-19 at the start of the epidemic.
Bloom was unable to find any explanation as to why the sequences were removed, and his emails to the study authors were not answered.
The scientist notes that “there is no convincing scientific reason for deleting the data.” The sequences fully correspond to the samples described in the paper, and no corrections to the paper have been issued. In addition, the study emphasizes that the samples were obtained from humans voluntarily, and the sequencing shows no evidence of plasmid or sample contamination. “It seems likely that the sequences were removed to hide their existence,” Bloom concludes.
An article describing his findings was published on the bioRxiv preprint server.