Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Importing and serving millions of records

Avatar for DamirSvrtan DamirSvrtan
February 01, 2018
150

Importing and serving millions of records

Avatar for DamirSvrtan

DamirSvrtan

February 01, 2018
Tweet

Transcript

  1. VVV

  2. COPY IN RUBY db_conn = ActiveRecord::Base.connection.raw_connection db_conn.copy_data('COPY measurements FROM STDIN')

    do CSV.foreach('measurements.csv', headers: true) do |row| db_conn.put_copy_data(row.fields + [Time.zone.now]) end end
  3. module Oxeanpedia module Argo class CopyData method_object :file_path CSV_FIELD_NAMES =

    [ 'ARGOS_ID', 'LATITUDE (degree_north)', 'LONGITUDE (degree_east)', 'PRES (decibar)', 'TEMP (degree_Celsius)' ] def call pgconn.copy_data(copy_statement, enco) do CSV.foreach(file_path, headers: true) do |row| pgconn.put_copy_data( ['argo'] + [date_field(row)] + row.fields(*CSV_FIELD_NAMES) + [file_path] ) end end end private def date_field(row) row['DATE (YYYY/MM/DD HH:MI:SS)'] || row['DATE (YYYY-MM-DDTHH:MI:SSZ)'] end def pgconn @pgconn ||= ActiveRecord::Base.connection.raw_connection end def time @time ||= Time.zone.now end def enco PG::TextEncoder::CopyRow.new end def copy_statement <<-HEREDOC.strip_heredoc COPY argo_measurements (source, measured_on, source_id, latitude, longitude, depth_in_meters, temperature_in_celsius, filename) FROM STDIN HEREDOC end end end end
  4. WITH moved_rows AS ( DELETE FROM argo_measurements am WHERE filename

    = '#{filename}' AND latitude IS NOT NULL AND longitude IS NOT NULL AND temperature_in_celsius IS NOT NULL AND depth_in_meters IS NOT NULL RETURNING am.* ) INSERT INTO measurements (source, source_id, latitude, longitude, depth_in_meters, temperature_in_celsius, measured_on, created_at, updated_at, location) SELECT source, source_id, latitude, longitude, depth_in_meters, temperature_in_celsius, measured_on, current_timestamp, current_timestamp, ST_SetSRID(ST_MakePoint(longitude, latitude),4326) as location FROM moved_rows ON CONFLICT (latitude, longitude, depth_in_meters, measured_on) DO NOTHING
  5. > PG string > PG Ruby row objects > ActiveRecord

    objects > ActiveRecordSerializer objects > JSON string as JSON API
  6. SELECT json_build_object( 'data', json_agg( json_build_object( 'id', m.id, 'type', 'measurements', 'attributes',

    json_build_object( 'latitude', m.latitude, 'longitude', m.longitude, 'depth_in_meters', m.depth_in_meters, 'temperature_in_celsius', m.temperature_in_celsius, 'measured_on', m.measured_on ) ) ) ) as result FROM ..