connectors to drive access and exploitation
• Compute: Access, exchange and compute on sensitive data
• Standards: Integration and interoperability of data and services
• Training: Professional skills for managing and exploiting data
European life sciences infrastructure
This includes aspects such as controlled terminology/ontology and services for ML model description and sharing, alignment to the ELIXIR Tools and Interoperability Platforms, as well as defining best practices for Machine Learning-related reviewing.
• Machine Learning and reproducibility – This area focuses on defining best practices for developing, sharing and reusing Machine Learning approaches (including, but not limited to, Machine Learning models, algorithms, frameworks and protocols, including the DOME recommendations), while at the same time involving the existing approaches in the ELIXIR Tools Platform.
• Benchmarking of Machine Learning tools – To facilitate clear and objective comparison of ML-based tools, it is important to establish a benchmarking protocol; this may include datasets, protocols and services offered by the ELIXIR Tools Platform.
• Training for Machine Learning – Machine Learning has been identified by the ELIXIR Training Platform gap analysis task as an existing need. As such, a particular area of focus for this group will be to design and produce training resources for supporting the ELIXIR community, based on the standards and approaches established by the ELIXIR Training Platform.
relevant, robust, and generalisable synthetic data generation methodologies
• access to relevant, high-quality synthetic datasets
• Thanks to better availability of robust synthetic datasets for training data models, healthcare providers and industry should have a wider range of performant AI-based and other data-driven tools to support diagnostics, personalised treatment decision-making and prediction of health outcomes.
Synthetic data
Model (CDM) is an open community data standard, designed to standardize the structure and content of observational data and to enable efficient analyses that can produce reliable evidence. A central component of the OMOP CDM is the OHDSI standardized vocabularies.
• The OHDSI vocabularies allow organization and standardization of medical terms to be used across the various clinical domains of the OMOP common data model and enable standardized analytics that leverage the knowledge base when constructing exposure and outcome phenotypes and other features within characterization, population-level effect estimation, and patient-level prediction studies.
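As a rough illustration of what vocabulary standardization means in practice, the sketch below resolves source codes from different coding systems to one standard concept through a "Maps to" relationship. It is a toy in-memory model, not the OHDSI tooling: the `Concept` class, the `to_standard` helper and the concept IDs (1001, 2001, 2002) are hypothetical and chosen only for this example; the field names loosely follow the OMOP CONCEPT table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    concept_id: int       # hypothetical IDs, not real OMOP concept IDs
    concept_name: str
    vocabulary_id: str    # e.g. "SNOMED", "ICD10CM"
    concept_code: str     # code within the source vocabulary
    standard: bool        # True if this is a standard concept


# Toy concept table: one standard concept and two source concepts.
CONCEPTS = {
    1001: Concept(1001, "Type 2 diabetes mellitus", "SNOMED", "44054006", True),
    2001: Concept(2001, "Type 2 diabetes mellitus without complications",
                  "ICD10CM", "E11.9", False),
    2002: Concept(2002, "Diabetes mellitus type II, without complication",
                  "ICD9CM", "250.00", False),
}

# Toy "Maps to" relationship: source concept_id -> standard concept_id.
MAPS_TO = {2001: 1001, 2002: 1001}


def to_standard(vocabulary_id: str, concept_code: str) -> Optional[Concept]:
    """Return the standard concept for a source code, or None if unmapped."""
    for concept in CONCEPTS.values():
        if (concept.vocabulary_id, concept.concept_code) == (vocabulary_id, concept_code):
            if concept.standard:
                return concept
            target_id = MAPS_TO.get(concept.concept_id)
            return CONCEPTS.get(target_id) if target_id is not None else None
    return None


if __name__ == "__main__":
    for vocab, code in [("ICD10CM", "E11.9"), ("ICD9CM", "250.00")]:
        print(vocab, code, "->", to_standard(vocab, code))
```

Because both source codes resolve to the same standard concept, an analysis written against that concept can run unchanged on data that was originally coded in either system.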
characterization, incidence, estimation, prediction)
International community to develop tools, standards and best-practice recommendations in health data normalization and interoperability
European federated data network for evidence generation based on real-world data
compared to the original training sets: Kullback-Leibler (KL) divergence, pairwise correlation difference
How useful is this synthetic data for our downstream machine learning applications? Accuracy, F1-score, ROC, and AUC-ROC
Has any sensitive data been inadvertently synthesized by our model? Membership inference, re-identification and attribute inference attacks
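The two fidelity metrics named above can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions: KL divergence is estimated per numeric column from histograms over shared bins (the bin count and the smoothing constant `eps` are arbitrary choices), and pairwise correlation difference is taken as the Frobenius norm of the difference between the two Pearson correlation matrices. The function names and the random demo data are illustrative only.

```python
import numpy as np


def kl_divergence(real, synth, bins=10, eps=1e-9):
    """Estimate KL(real || synth) for one numeric column using shared histogram bins."""
    real = np.asarray(real, dtype=float)
    synth = np.asarray(synth, dtype=float)
    lo = min(real.min(), synth.min())
    hi = max(real.max(), synth.max())
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(real, bins=edges)
    q, _ = np.histogram(synth, bins=edges)
    # Smooth the counts so empty bins do not produce log(0) or division by zero.
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))


def pairwise_correlation_difference(real, synth):
    """Frobenius norm of the difference between the two correlation matrices."""
    corr_real = np.corrcoef(real, rowvar=False)    # columns are variables
    corr_synth = np.corrcoef(synth, rowvar=False)
    return float(np.linalg.norm(corr_real - corr_synth, ord="fro"))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(1000, 3))              # stand-in for a real table
    synth = rng.normal(loc=0.2, size=(1000, 3))    # stand-in for a synthetic table
    print("KL, column 0:", kl_divergence(real[:, 0], synth[:, 0]))
    print("PCD:", pairwise_correlation_difference(real, synth))
```

Both values are zero when the synthetic table is identical to the real one and grow as the marginal distributions or the correlation structure drift apart; categorical columns would need their category frequencies compared directly instead of histograms.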