folding, etc. ‣ Computer aided design: automobile design, testing ‣ Chemical engineering: molecule designing and processes ‣ Economics: Risk analysis, automated trading ‣ Geophysics: Oil and gas exploration ‣ Weather forecasting and aerodynamics ‣ Basic and applied research in universities A high performance computer is a supercomputer which rely on combined power of, sometimes, hundred of thousands of individual processing units, running in parallel.
nation’s petroleum usage. Half of engine input goes in countering the aerodynamic drag! How to make trucks more fuel efficient? https://www.youtube.com/watch?v=TnzUapgfgeU
to a better device: like fast multi-core architecture, etc. Single machines have performance saturation: they don’t scale up beyond a point. For the largest model, we need to device a better strategy? Moore law does not help as adding more transistors to chip leads to excessive heating and you have to keep your cpu in liquid nitrogen!
collection of cores ‣ Many programs can run at same time on many cores on a particular node ‣ These programs are separated from each others by a thread ‣ This kind of programming using threads can be accomplished using OpenMP ‣ OpenMP is supported by almost all compilers but runs on one node while MPI runs on several nodes ‣ MPI is used between nodes and OpenMP within a node.
is shared, e.g. within a node, with multiple threads ‣ Message Passing Interface (MPI) is used for distributed memory architectures ‣ Here, the computation happens on each node and data is transferred between nodes ‣ CUDA uses CUDA-enabled graphic processing unit (GPU) for general purpose processing, GPGPU. ‣ GPUs operate at lower frequencies but have many more cores which makes it faster than CPUs ‣ It does not support the full C standard while both OpenMP and MPI do
floating point operation is mathematical operation like addition, multiplication. etc. ‣ A calculator which can do 10 FLOPS is considered functional ‣ FLOPS = # cores * Frequency * operations per cycle ‣ An intel process of 2.5 GHz gives around 10 GFLOPS (10*10^9 FLOPS) for four core machine. ‣ Fastest super computer can do PETAFLOPS (10^15 FLOPS) ‣ FLOPS can have different levels of precision - single or double precision ‣ Double precision (64 bits) takes more memory than single one and takes longer for calculations than single.
1km x 1km, with around 10^9 cells ‣ If we assume 100 FLOPS per cell, then we need, around 10^11 FLOPS ‣ To forecast at an interval of 10 min for 10 days would require around 10^14 FLOPS ‣ We will need again a large number of FLOPS for problems in astronomy where bodies are attracted to each other. ‣ So problem has N^2 FLOPS. A galaxy has 10^11 stars! ‣ Similar problem comes in accounting for electrostatic interactions in protein folding
simulation rather than actually building and crashing them. ‣ This can reduce cost in automobile designs and is of paramount importance in design of aeroplanes, etc. ‣ An engineer at e-commerce company like Flipkart may get thousand of hits per second ‣ A HPC should be used there to ensure that the web server responds to customer without delay ‣ HPC can also be used for scientific visualisation which leads to better insights.