SFARI Base streamlines research
Unique Database of Simplex Families Aids Autism Investigators
Andrew Gerber was finishing his poster presentation a week before the 2009 International Meeting for Autism Research (IMFAR) when he made a surprising discovery. Using data from SFARI Base, the Simons Foundation’s database of phenotypic and genetic data about families on the autism spectrum, the Columbia University researcher found it was easier to predict the day-to-day functioning of a child on the spectrum by interviewing parents than by sending a clinician.
Suddenly, an email popped into Gerber’s inbox, notifying him that data from hundreds more families had just been added to SFARI Base. There was no way he would be able to incorporate the new data with so little time, he figured, but he decided to try anyway. He went to the SFARI website, downloaded the file, ran his program, and “suddenly, I'd gone from 300 [subjects] to 680,” he recalled.
Gerber’s experience with SFARI Base shows the software’s potential to streamline and accelerate the research process. The database is designed to allow researchers easy access to a huge collection of phenotypic and genetic information. SFARI Base houses information from the Simons Simplex Collection (SSC), a project that will establish a permanent repository of genetic samples from 2,000 families, each of which has only one child with an autism spectrum disorder and unaffected parents.
SFARI Base is one of the largest databases of its kind. When the collection reaches its goal, the SSC’s 2,000 families will be about double the number currently included in Autism Speaks’ Autism Genetic Resource Exchange. As of May 2009, SFARI Base holds data for approximately 800 families and adds data from 200 new families each quarter.
The size of the database makes ease of use particularly important. SFARI Base allows users to navigate and extract data using nothing more than a standard Web browser. The data are comprehensive and robust: trained researchers at 13 university-affiliated research clinics collect blood samples as well as 4,000 to 6,000 phenotypic variables for each subject, and submit the information via specialized local computer systems.
Approved scientists who use SFARI Base (read the Researcher Welcome Packet to learn how) can access the data directly through the Web, where the database offers specialized tools for viewing, comparing and navigating the data. Researchers who don’t know quite what they’re looking for can ask staff to help them find it. Researchers can also request DNA extracted from blood and from immortalized cell lines, as well as cell lines and plasma. After one year, researchers must submit their data back to SFARI Base, which makes it “an ever-growing pool of new information,” said Stephen Johnson, informatics director for the Simons Foundation.
SFARI Base's data sharing policies are unique, said Leon Rozenblit, president of Prometheus Research, the New Haven, CT–based company that developed the database technology. Most databases require researchers to submit their articles, results or raw files — but researchers enter all of their actual research data into SFARI Base. “That’s very different and extremely exciting,” Rozenblit explained. It will make it possible for researchers to compare other researchers’ work to their own and “combine data sets in new and innovative ways,” he said.
The ever-changing nature of the database demands that SFARI Base be adaptive.
SFARI Base is the first database to store simplex autism information. These simplex families will help researchers investigate the role that de novo mutations play in autism. Data from families with multiple children on the spectrum (“multiplex” families) provide important information about inherited autism causes, whereas simplex data illuminate how autism develops in the womb or after birth. Germline mutations are found in about ten percent of cases, according to Elizabeth Varga, a research coordinator at The Research Institute at Nationwide Children’s Hospital in Columbus, OH.
Families on the autism spectrum sometimes participate in more than one research study, making it difficult to know whether one study’s findings are completely independent from another’s. Those participating in the SSC are identified by a serial number that keeps their identities anonymous but allows them to be recognized, based on their demographic information, when they participate in future research. With the help of the National Institutes of Health, SFARI Base uses novel global unique identifiers to identify and track its research subjects. (SFARI is currently recruiting families that have never participated in other studies, but they will have the freedom to participate in future research.) Not only will these National Database for Autism Research identifiers ensure that scientists aren’t studying the same subjects multiple times, but they will also allow researchers to combine data from different studies more easily, Johnson said.
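An identifier that hides a participant's identity yet recognizes the same person across studies can be built in several ways. One common approach is a keyed one-way hash over normalized demographic fields. The Python sketch below illustrates that general idea only. It is not the actual SFARI Base or National Database for Autism Research algorithm, and the function name, the choice of fields, the key and the identifier length are all hypothetical.

```python
import hashlib
import hmac


def pseudonymous_id(first_name, last_name, birth_date, birth_city, secret_key):
    """Return a 16-character identifier derived from demographic fields.

    Hypothetical illustration: the fields, the key handling and the output
    length are assumptions, not the real identifier scheme.
    """
    fields = [first_name, last_name, birth_date, birth_city]
    # Normalize so that trivial differences in case or spacing
    # do not produce a different identifier for the same person.
    normalized = "|".join(field.strip().lower() for field in fields)
    # A keyed one-way hash: the identifier cannot be reversed into the
    # demographic data, and cannot be recomputed without the key.
    digest = hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16].upper()


# Made-up example data and key, for illustration only.
key = b"example-key-not-for-real-use"
id_a = pseudonymous_id("Ada", "Lovelace", "1815-12-10", "London", key)
id_b = pseudonymous_id(" ada ", "LOVELACE", "1815-12-10", "london", key)
id_c = pseudonymous_id("Ada", "Lovelace", "1815-12-11", "London", key)
```

Here `id_a` and `id_b` are equal because they describe the same person, so a second study enrolling that participant would arrive at the same identifier, while `id_c` differs because the birth date differs. A production scheme would also need to handle misspellings, missing fields and collisions, which this sketch ignores.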
As researchers uncover more about autism spectrum disorders, SFARI Base will evolve too. Researchers will be able to add entirely new phenotypic markers or types of genetic data.
“SFARI Base is at the core of much of what we want to accomplish at SFARI,” said Gerald D. Fischbach, the scientific director of SFARI. “Biological data are useless if they are not shared in a way that will enhance the research of the larger scientific community. SFARI Base is more than a repository of information. The information is first rate, to be sure, but it will also provide innovative methods of exploring a vast amount of genetic and phenotypic data. The data contained in SFARI Base will be linked to other databases in order to optimize chances for a better understanding of autism.”