Datasets
• “Knowledge discovery describes the process of automatically searching large volumes of data for patterns that can be considered knowledge about the data.”
• “Deriving knowledge and creating abstractions of the input data.”
• “Process of analyzing data to identify patterns or relationships.”
• Predicting future events, behaviors, estimating values, etc.
• Category One: find and explain the most variable elements of the data set, that is, find and explain the outliers. Example: find the unexpected stellar object in this sky sweep (essentially parallel).
• Category Two: understand the variations of the majority of the data set elements, with little interest in the outliers. Example: understand the buying habits of most of our customers (optionally parallel).
Neural Networks
• Require training,
• Resemble biological networks in structure,
• Not easy to use and to understand (opaque),
• Cannot deal with missing data,
• Must feed the dataset to the network many times (epochs).
• Extraction of if-then-else rules from data, based on statistical significance.
• Output: how likely it is that certain patterns of attributes occur together with other attributes in dataset objects (transparent).
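As a toy illustration of the statistical side of rule extraction (not the specific algorithm behind these slides), the following C sketch scores a hypothetical rule "IF A THEN B" on a small boolean dataset by its support and confidence and keeps it only if both clear a threshold. The attribute names, data values, and thresholds are made up for the example.

/* Minimal sketch (not the slides' algorithm): score a candidate rule
 * "IF attribute A THEN attribute B" on a tiny boolean dataset by its
 * support and confidence, and keep it only if both pass a threshold. */
#include <stdio.h>

#define N_OBJECTS 8

int main(void) {
    /* Hypothetical data: each index is one object, A and B are two boolean attributes. */
    int A[N_OBJECTS] = {1, 1, 0, 1, 1, 0, 1, 1};
    int B[N_OBJECTS] = {1, 1, 0, 1, 0, 0, 1, 1};

    int count_a = 0, count_ab = 0;
    for (int i = 0; i < N_OBJECTS; i++) {
        if (A[i]) {
            count_a++;
            if (B[i]) count_ab++;
        }
    }

    double support    = (double)count_ab / N_OBJECTS;  /* how often A and B co-occur   */
    double confidence = (double)count_ab / count_a;    /* P(B | A), estimated from data */

    printf("IF A THEN B: support=%.2f confidence=%.2f\n", support, confidence);
    if (support >= 0.3 && confidence >= 0.8)
        printf("rule accepted\n");
    else
        printf("rule rejected\n");
    return 0;
}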
Gaia
• Goal: create the most precise three-dimensional chart of our Galaxy by providing unprecedented positional and radial velocity measurements for about one billion stars.
• Launch Date: 2013
• Mission End: 2018 (5 years)
• CCDs for the Radial Velocity Spectrometer (RVS),
• 7 CCDs for the Red Photometer,
• 1 column of 7 CCDs for the Blue Photometer,
• 9 columns of 7 CCDs forming the Astrometric Field,
• 2 columns of 7 CCDs for the Sky Mapper,
• 1 column with 3 CCDs: two Basic Angle Monitors and one Wave Front Sensor.
Gaia Dataset Attributes (Dimensions)
Dataset attribute names [3]:

Attribute name   Meaning
log-f1           log of the first frequency
log-f2           log of the second frequency
log-af1h1-t      log amplitude, first harmonic, first frequency
log-af1h2-t      log amplitude, second harmonic, first frequency
log-af1h3-t      log amplitude, third harmonic, first frequency
log-af1h4-t      log amplitude, fourth harmonic, first frequency
log-af2h1-t      log amplitude, first harmonic, second frequency
log-af2h2-t      log amplitude, second harmonic, second frequency
log-cri'lO       amplitude ratio between harmonics of the first frequency
pdf12            phase difference between harmonics of the first frequency
varrat           variance ratio before and after first frequency subtraction
B-V              color index
V-I              color index

• Create some templates of given classes of variable stars and run the scanning law.
Types of Parallel Computing (Hardware-Oriented)
• SISD: Single Instruction, Single Data
• SIMD: Single Instruction, Multiple Data
• MISD: Multiple Instruction, Single Data
• MIMD: Multiple Instruction, Multiple Data
• Data-parallel: Same operations, different data
• Task-parallel: Different programs, different data
• Dataflow: Pipelined parallelism
• MIMD: Different programs, different data
• SPMD: Same program, different data
MPI Message Passing Interface
• Defined in the early 1990s by the MPI Forum.
• An API for communication between the nodes of a distributed-memory parallel computer (typically a workstation cluster).
• Bindings for Fortran, C, and C++.
• Low-level parts of the API:
  • fast transfer of data from the user program to the network,
  • support for the multiple modes of message synchronization available on HPC platforms.
• Higher-level parts of the API:
  • organization of process groups and the kind of collective communications seen in typical parallel applications.
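To make the point-to-point part of the API concrete, here is a minimal MPI program using the C binding (one of the three bindings listed above); it is an illustrative sketch, not code from the slides, and it assumes the program is launched with at least two processes (e.g. mpirun -np 2).

/* Minimal MPI point-to-point sketch: rank 0 sends one integer to rank 1. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id within the group  */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes in the group    */

    if (size >= 2) {
        if (rank == 0) {
            int payload = 42;
            MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            int payload;
            MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d from rank 0\n", payload);
        }
    }

    MPI_Finalize();
    return 0;
}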
• Processes communicate by sending and receiving messages.
• A general platform for Single Program Multiple Data (SPMD) parallel computing on distributed-memory architectures.
• Directly comparable with the PVM (Parallel Virtual Machine) environment.
• Introduced the important abstraction of a communicator: an object something like an N-way communication channel connecting all members of a group of cooperating processes.
  • Introduced partly to support using multiple parallel libraries without interference.
• Introduced the novel concept of datatypes, used to describe the contents of communication buffers.
  • Introduced partly to support “zero-copy” message transfer.
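The datatype idea can be sketched as follows: an MPI derived datatype (here MPI_Type_vector) describes a strided column of a row-major matrix, so the column can be sent straight from the original buffer without packing it into a temporary array first. This is an illustrative example assuming at least two processes, not code from the slides.

/* Derived-datatype sketch: send column 1 of a row-major 4x4 matrix directly. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double m[4][4];
    MPI_Datatype column;
    /* 4 blocks of 1 double each, separated by a stride of 4 doubles (one row). */
    MPI_Type_vector(4, 1, 4, MPI_DOUBLE, &column);
    MPI_Type_commit(&column);

    if (size >= 2) {
        if (rank == 0) {
            for (int i = 0; i < 4; i++)
                for (int j = 0; j < 4; j++)
                    m[i][j] = 10.0 * i + j;
            MPI_Send(&m[0][1], 1, column, 1, 0, MPI_COMM_WORLD);  /* column 1, no packing */
        } else if (rank == 1) {
            MPI_Recv(&m[0][1], 1, column, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 got column: %.0f %.0f %.0f %.0f\n",
                   m[0][1], m[1][1], m[2][1], m[3][1]);
        }
    }

    MPI_Type_free(&column);
    MPI_Finalize();
    return 0;
}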
MPI Message Passing Interface
• Communication operations ultimately go through instances of the Comm class.
• A communicator defines two things:
  • a group of processes: the participants in some kind of parallel task or subtask,
  • a communication context.
• The idea is that the same group of processes might be involved in more than one kind of “ongoing activity”.
• We don’t want these distinct “activities” to interfere with one another.
• We don’t want messages sent in the context of one activity to be accidentally received in the context of another; that would be a kind of race condition.
• Messages sent on one communicator can never be received on another.
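A small sketch of why separate contexts matter: MPI_Comm_dup gives the same group of processes a second communicator, so traffic belonging to, say, a parallel library can never be matched by receives posted on the application's communicator, even with identical source and tag. The "library" framing and variable names are illustrative; the program assumes at least two processes.

/* Two communicators over the same group keep two "activities" from interfering. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Comm library_comm;                       /* e.g. reserved for a parallel library */
    MPI_Comm_dup(MPI_COMM_WORLD, &library_comm);

    if (size >= 2) {
        if (rank == 0) {
            int a = 1, b = 2;
            MPI_Send(&a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* application traffic */
            MPI_Send(&b, 1, MPI_INT, 1, 0, library_comm);     /* library traffic     */
        } else if (rank == 1) {
            int a, b;
            /* Same source and tag, but the communicator keeps the two streams apart. */
            MPI_Recv(&a, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Recv(&b, 1, MPI_INT, 0, 0, library_comm, MPI_STATUS_IGNORE);
            printf("rank 1: a=%d (app), b=%d (library)\n", a, b);
        }
    }

    MPI_Comm_free(&library_comm);
    MPI_Finalize();
    return 0;
}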
MPI Message Passing Interface
• A communicator is associated with a fixed set of processes, which never changes in the lifetime of the group.
• The number of processes in the group associated with a communicator can be found with the Size() method of the Comm class.
• Each process in a group has a unique rank within the group, an integer value between 0 and Size() – 1. This value is returned by the Rank() method.
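A common SPMD idiom built on rank and size (the Rank() and Size() of the slides' object-oriented bindings, shown here with the equivalent C calls): each process handles its own slice of the indices, and a collective reduction combines the partial results on rank 0. The workload here (summing 0..N-1) is only a placeholder.

/* Rank/size sketch: split a sum across processes and reduce the partial results. */
#include <mpi.h>
#include <stdio.h>

#define N 1000

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank takes the indices i with i % size == rank. */
    long partial = 0, total = 0;
    for (int i = rank; i < N; i += size)
        partial += i;

    MPI_Reduce(&partial, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum 0..%d = %ld (expected %ld)\n", N - 1, total, (long)N * (N - 1) / 2);

    MPI_Finalize();
    return 0;
}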
• … in total
• Different architectures
• Latencies:
  • 0.2–210 ms during the day,
  • 0.2–66 ms at night.
• Bandwidth: 9 KB/s – 11 MB/s
• 80% efficiency