is small but there are many files
❖ File size is large but there are just a few files
❖ Data size of bioinformatics
❖ 1,000,000,000 records for one subject (person) is normal
Born Baby × 500 GB = 35 TB
30,000 patients × 10,000 cells × 500 GB = 1.5 × 10^11 GB = 150 EB
from Dr. Yu-Tai Wang
1. Counted from current NGS data
2. Does not include civil medical institutes
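The patient estimate above can be checked with a few lines of Ruby. This is only a sketch of the arithmetic: it assumes decimal (SI) units, so 1 EB = 10^9 GB, and the method and parameter names are illustrative rather than taken from the talk.

```ruby
# Total storage in GB for a cohort, given the per-cell data size.
def total_gb(patients:, cells_per_patient:, gb_per_cell:)
  patients * cells_per_patient * gb_per_cell
end

# Convert GB to EB, assuming decimal units (1 EB = 10**9 GB).
def gb_to_eb(gigabytes)
  gigabytes.fdiv(10**9)
end

gb = total_gb(patients: 30_000, cells_per_patient: 10_000, gb_per_cell: 500)
puts "#{gb} GB = #{gb_to_eb(gb)} EB"
```

With binary units (1 EiB = 2^30 GiB) the figure would come out somewhat lower, so the choice of unit convention matters at this scale.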
Amazon EC2 cluster
❖ Hadoop cluster
❖ Many CPU cores
❖ Large memory
❖ High I/O efficiency
http://arstechnica.com/business/2012/05/amazons-hpc-cloud-supercomputing-for-the-99/
❖ Automatically builds a versioned Ensembl system (Perl)
❖ Includes database, queuing services and analysis tools
❖ Multiple sites and multiple species in one virtual machine
❖ Helps to build local and custom systems
from Tse-Ching Ho
Chef recipes
Provision VM with Chef recipes
Write Chef recipes
Export VM by VirtualBox
Set up Vagrantfile
Create Vagrant box by Veewee
Write definition of Vagrant box by Veewee
Ensembl VM Automation
from Tse-Ching Ho
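To make the "Vagrantfile" and "provision VM with Chef recipes" steps concrete, here is a minimal Vagrantfile sketch. It is not the configuration used in the talk: the box name `ensembl-base`, the recipe name `ensembl`, the cookbook path and the memory size are all hypothetical placeholders, and it uses the Vagrant version 2 configuration syntax.

```ruby
# Vagrantfile (sketch; all names and sizes below are assumptions)
Vagrant.configure("2") do |config|
  # Hypothetical base box, e.g. one built from a Veewee definition.
  config.vm.box = "ensembl-base"

  # VirtualBox-specific settings.
  config.vm.provider "virtualbox" do |vb|
    vb.memory = 4096 # assumed memory size in MB
  end

  # Provision the VM with Chef Solo using local cookbooks.
  config.vm.provision "chef_solo" do |chef|
    chef.cookbooks_path = "cookbooks"
    chef.add_recipe "ensembl" # hypothetical recipe name
  end
end
```

With such a file in place, `vagrant up` would boot the box and apply the listed recipes.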
of C/C++, bash, perl, java, ruby
❖ Covers both DNA and RNA resequencing analysis
❖ Enhanced quality control for DNA and RNA
❖ Distributed computing pipeline
❖ Supports PBS, LSF and SGE platforms (queuing systems)
from Hannah Lin
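One way a pipeline can support several queuing systems is to hide the submit command behind a single function. The sketch below is an assumption about how that could look, not the pipeline's actual code: `submit_command` and its arguments are hypothetical, and it relies only on the standard `qsub -N` (PBS, SGE) and `bsub -J` (LSF) job-name options.

```ruby
require "shellwords"

# Build the shell command that submits a job script to a queuing system.
# PBS and SGE both take `qsub -N <name> <script>`; LSF takes
# `bsub -J <name>` and reads the job script from standard input.
def submit_command(scheduler, job_name, script)
  name = Shellwords.escape(job_name)
  path = Shellwords.escape(script)
  case scheduler
  when :pbs, :sge then "qsub -N #{name} #{path}"
  when :lsf       then "bsub -J #{name} < #{path}"
  else raise ArgumentError, "unknown scheduler: #{scheduler.inspect}"
  end
end

puts submit_command(:pbs, "align", "align.sh")
puts submit_command(:lsf, "align", "align.sh")
```

Keeping the scheduler choice in one place means the analysis steps themselves do not need to know which cluster they run on.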
integration of discrete clinical research documents
❖ Original data are Excel/CSV files collected at different times by different people
❖ Neo4j is well suited to cleaning up such a massive data set
❖ Cooperation between biologist and programmer
from Wei-Ming Wu, Chia-Hsuan Lee
on Rails, run by JRuby
❖ ActiveRecord models for Oracle database
❖ activerecord-oracle_enhanced-adapter gem
❖ Import Excel files into a third-party GUI client
❖ Third-party server sends XML requests to the API server
from Wei-Ming Wu, Sean Wang
[Diagram] Components: (rails, jruby), CSIS (java, oracle), Third Party, Our Servers, Windows GUI
[Diagram] Flow labels: Send data by XML, Write into database, Read data by client program, Upload data, Parse request
from Wei-Ming Wu, Sean Wang
❖ ActiveRecord models for Oracle database
❖ activerecord-oracle_enhanced-adapter gem
❖ Users can define rules for checking data, usually values in filled-in forms
❖ Checking rules run daily, not before forms are filled in
from Wei-Ming Wu, Sean Wang
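The idea of user-defined checking rules applied to form values can be sketched in plain Ruby. Everything here is illustrative: the rule format, the field names (`:age`, `:sex`) and the helper names are assumptions, and the real system presumably reads records through its ActiveRecord models rather than from in-memory hashes.

```ruby
# Each rule names a form field and a predicate its value must satisfy.
# These two rules are hypothetical examples.
def example_rules
  [
    { name: "age in range", field: :age,
      check: ->(v) { v.is_a?(Integer) && (0..120).cover?(v) } },
    { name: "sex is coded", field: :sex,
      check: ->(v) { %w[M F].include?(v) } }
  ]
end

# Return one violation per (record, rule) pair whose check fails.
def run_checks(records, rules)
  records.flat_map do |record|
    rules.reject { |rule| rule[:check].call(record[rule[:field]]) }
         .map { |rule| { record_id: record[:id], rule: rule[:name] } }
  end
end

sample = [
  { id: 1, age: 34,  sex: "F" },
  { id: 2, age: 240, sex: "M" },
  { id: 3, age: 51,  sex: nil }
]
run_checks(sample, example_rules).each do |v|
  puts "record #{v[:record_id]}: #{v[:rule]} failed"
end
```

Because the checks run over stored records rather than at entry time, a job like this could be triggered once a day by a scheduler such as cron.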
ActiveRecord models for Oracle database
❖ activerecord-oracle_enhanced-adapter gem
❖ Assigns patients to different groups by a randomization method
❖ Cooperation between statistician and programmer
from Wei-Ming Wu, Sean Wang
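The slide does not say which randomization method is used. As one common possibility, the sketch below shows permuted-block randomization, which keeps group sizes balanced as patients enroll. The method name, group labels and block size are assumptions chosen for illustration.

```ruby
# Permuted-block randomization: within each full block every group appears
# equally often, in shuffled order. A final partial block takes the first
# labels of its shuffled block.
def block_randomize(patient_ids, groups, block_size:, rng: Random.new)
  unless (block_size % groups.size).zero?
    raise ArgumentError, "block_size must be a multiple of the group count"
  end

  patient_ids.each_slice(block_size).flat_map do |block|
    labels = (groups * (block_size / groups.size)).shuffle(random: rng)
    block.zip(labels)
  end.to_h
end

# A fixed seed makes the allocation reproducible for auditing.
assignment = block_randomize((1..8).to_a, %w[A B], block_size: 4, rng: Random.new(42))
assignment.each { |id, group| puts "patient #{id} -> group #{group}" }
```

In practice the statistician would choose the method, block size and stratification; the programmer's part is mainly to make the allocation reproducible and tamper-resistant.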
❖ ActiveRecord models for Oracle database
❖ activerecord-oracle_enhanced-adapter gem
❖ google_visualr gem for visualization
❖ Counts the number of projects, forms, fields, records and patients
from Wei-Ming Wu, Winnie Lui
management
❖ Data analysis and software
❖ Data processing and storage
❖ Application of bioinformatics in pharma research and development
http://www.giichinese.com.tw/report/bc268909-bioinformatics-technologies-global-markets.html