institute and many more
- Cloud environment for analysis and interpretation of genomic and genome-adjacent data
- Allow users to analyze protected data while maintaining security and compliance
- Integrate multiple analysis and visualization environments

A Galaxy for Cancer Genomics Research
Goecks, Taylor, Blankenberg, Nekrutenko
- Cloud environment for analysis and interpretation of cancer genomes and related datasets
- Integrate with dozens of other tools funded under ITCR that #usegalaxy
- Integrate with existing resources like the Cancer Research Data Commons and NCI cancer clouds

AnVIL

Both require bringing Galaxy into certified secure environments (FISMA compliant), maintaining infrastructure-level isolation between individual users to ensure data remains protected
data to the researcher
- Copying/moving data is costly
- Harder to enforce security
- Redundant infrastructure
- Siloed compute

Goal: Bring researcher to the data
- Reduced redundancy and costs
- Active threat detection and auditing
- Greater accessibility
- Elastic, shared compute
Gen3: Data models, indexing, querying
AnVIL / Analysis Environments: Jupyter Notebooks, RStudio, Galaxy, ...
- FISMA Moderate
- 2 ATOs
- Pursuing FedRAMP
All data use and analysis in a FISMA moderate environment
Implemented on
Primary data storage costs covered by AnVIL, user private data and compute billed directly through Google
for each user
More overhead than the current user experience
Can we provide secure Galaxy instances with (close to) the current Galaxy user experience?
[Diagram: Isolated Galaxy instances with a single interface. Labels: User 1, Anonymous User, Galaxy Multiplexer, User 1 Galaxy Instance, User Compute Containers, Unprivileged Galaxy Instance, Shared DB (No protected Data).]
[Diagram: Isolated Galaxy instances with a single interface. Labels: User 1, User 2, Anonymous User, Galaxy Multiplexer, User 1 Galaxy Instance, User 2 Galaxy Instance, User Compute Containers (one set per user instance), User 2 Isolated Resources, User Data and DB, Unprivileged Galaxy Instance, Shared DB (No protected Data).]
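The routing idea behind a multiplexer of this kind can be sketched in a few lines. This is only an illustration under assumptions: the `Multiplexer` and `Instance` names, the instance naming scheme, and the lazy creation of per-user instances are all hypothetical and are not taken from the Galaxy or AnVIL code.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Instance:
    """Hypothetical handle for one Galaxy instance."""
    name: str
    isolated: bool  # True when the instance has its own data, DB and compute


@dataclass
class Multiplexer:
    """Hypothetical single entry point that hides per-user instances."""
    unprivileged: Instance = field(
        default_factory=lambda: Instance("unprivileged-galaxy", isolated=False)
    )
    per_user: Dict[str, Instance] = field(default_factory=dict)

    def route(self, user: Optional[str]) -> Instance:
        # Anonymous visitors only ever reach the unprivileged instance,
        # which has no access to protected data.
        if user is None:
            return self.unprivileged
        # Authenticated users get their own isolated instance, created on
        # first use (assumption) and reused on later requests.
        if user not in self.per_user:
            self.per_user[user] = Instance(f"galaxy-{user}", isolated=True)
        return self.per_user[user]
```

From the user's side there is one URL; which instance answers depends only on who is logged in, and no two users ever share an instance.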
  ◦ These are already distributed by ❤CVMFS❤, but organized in an ad hoc manner due to the history of Galaxy
  ◦ Currently building an automated approach where metadata defining the complete set of reference and index data will live in GitHub, builds will be automated based on GitHub state, and successful builds deployed through ❤CVMFS❤ for replication to all sites
    - Intergalactic Data Commission: https://github.com/usegalaxy-eu/idc
• Common tools
  ◦ A common set of tools and a common tool menu organization are currently being defined. Tools and tool configuration will also be replicated through ❤CVMFS❤
  ◦ This will ensure both that users have the same experience across different usegalaxy.* instances, and that workflows can be moved between instances and still execute correctly and reproducibly
  ◦ Local custom tools will still be supported but clearly identified
• See gxadmin, common tools on ❤CVMFS❤ + build + installation, and other coordination efforts developing
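The metadata-driven build approach for reference data can be pictured as a reconcile loop: compare what the metadata asks for with what is already deployed, build what is missing, and publish only the builds that succeed. The sketch below is an assumption-laden illustration; the manifest shape (`genomes`, `indexers`) and the function names are hypothetical and do not reflect the actual IDC repository layout.

```python
from typing import Callable, Dict, List, Set, Tuple

# One build target: (genome, indexer), e.g. ("hg38", "bwa").
Build = Tuple[str, str]


def expand(manifest: Dict[str, List[str]]) -> Set[Build]:
    """Expand the (hypothetical) manifest into every genome/indexer pair."""
    return {(g, i) for g in manifest["genomes"] for i in manifest["indexers"]}


def reconcile(
    manifest: Dict[str, List[str]],
    deployed: Set[Build],
    build: Callable[[Build], bool],
) -> Set[Build]:
    """Return the set that should be published after this run.

    Targets already deployed are left alone; missing targets are built,
    and only builds that report success are added.
    """
    published = set(deployed)
    for target in sorted(expand(manifest) - deployed):
        if build(target):
            published.add(target)
    return published
```

Because the desired state lives in version-controlled metadata, rerunning the loop after a failed build simply retries the missing targets without touching what is already replicated.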
(~39 new since last year)
- Tools: contributors to galaxyproject/tools-iuc:
  - ~195 (~38 new since last year)
  - ...and the ever-vigilant Intergalactic Utilities Commission for handling these contributions and maintaining the quality of essential Galaxy tools
  - ...and everyone else who has contributed a tool to the ToolShed
- Training: contributors to galaxyproject/training-material
  - ~114 (~34 new since last year)
  - ...and everyone who has conducted or attended Galaxy Training
- Everyone who has contributed to Galaxy in other ways:
  - users, supporters, …
- Funding: NSF and NIH (to our team), and all of the funders of the Global Galaxy Community