Millions of dollars and thousands of hours are lost navigating the issues surrounding the sharing of healthcare big data and collaboration around it. As more genomic data is generated and stored on different computational platforms, analyzing that data across multiple cloud platforms remains a major challenge, especially in terms of cost and runtime.
To tackle this challenge, we built Swarm, an open-source framework for federated cloud computation that keeps data motion to a minimum and facilitates crosstalk between genomic datasets stored on different cloud platforms. We demonstrate its utility through common genomic variant queries across BigQuery on Google Cloud Platform (GCP), Athena on Amazon Web Services (AWS), Apache Presto, and MySQL.
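The idea of minimal data motion can be illustrated with a small sketch. This is not Swarm's code or API. It assumes two in-memory SQLite databases standing in for two cloud platforms, and the table names (`variants`, `frequencies`, `moved`), the toy rows, and the helper `federated_lookup` are all hypothetical. The sketch filters on the platform that holds the variants, moves only the matching rows, and joins them on the platform that holds the annotations.

```python
import sqlite3


def make_platform(create_sql, insert_sql, rows):
    """Create an in-memory database that stands in for one cloud platform."""
    conn = sqlite3.connect(":memory:")
    conn.execute(create_sql)
    conn.executemany(insert_sql, rows)
    return conn


# Hypothetical "platform A": variant calls (toy data).
platform_a = make_platform(
    "CREATE TABLE variants (chrom TEXT, pos INTEGER, ref TEXT, alt TEXT)",
    "INSERT INTO variants VALUES (?, ?, ?, ?)",
    [("1", 100, "A", "G"), ("1", 250, "C", "T"),
     ("2", 300, "G", "A"), ("2", 900, "T", "C")],
)

# Hypothetical "platform B": allele frequencies (toy data).
platform_b = make_platform(
    "CREATE TABLE frequencies (chrom TEXT, pos INTEGER, alt TEXT, af REAL)",
    "INSERT INTO frequencies VALUES (?, ?, ?, ?)",
    [("1", 100, "G", 0.12), ("2", 300, "A", 0.03), ("2", 900, "C", 0.40)],
)


def federated_lookup(source, target, chrom):
    """Annotate one chromosome's variants while moving as few rows as possible."""
    # Step 1: filter where the data lives, keeping only the columns needed.
    subset = source.execute(
        "SELECT chrom, pos, alt FROM variants WHERE chrom = ?", (chrom,)
    ).fetchall()

    # Step 2: move only that subset to the other platform.
    target.execute("CREATE TEMP TABLE moved (chrom TEXT, pos INTEGER, alt TEXT)")
    target.executemany("INSERT INTO moved VALUES (?, ?, ?)", subset)

    # Step 3: run the join on the platform that holds the second dataset.
    result = target.execute(
        "SELECT m.chrom, m.pos, m.alt, f.af "
        "FROM moved AS m JOIN frequencies AS f "
        "ON m.chrom = f.chrom AND m.pos = f.pos AND m.alt = f.alt "
        "ORDER BY m.pos"
    ).fetchall()

    # Clean up the temporary copy so nothing lingers on the target.
    target.execute("DROP TABLE moved")
    return len(subset), result


rows_moved, matches = federated_lookup(platform_a, platform_b, "2")
```

In this toy setup only the two chromosome 2 rows leave platform A, rather than the whole `variants` table. A real deployment would replace the SQLite connections with each platform's own query client and would also have to handle authentication, transfer, and cost accounting.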
Compared to single-cloud platforms, the Swarm framework significantly reduced computational costs, runtime delays, and the risks of security breaches and privacy violations.
Swarm makes data sharing, analysis, and collaboration easier within and between organizations and institutions, saving considerable time and money. If you’re interested in learning more about Swarm or using it, please email the team: Innovations@stanford.edu