Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Importing and serving millions of records

DamirSvrtan
February 01, 2018
120

Importing and serving millions of records

DamirSvrtan

February 01, 2018
Tweet

Transcript

  1. VVV

  2. COPY IN RUBY db_conn = ActiveRecord::Base.connection.raw_connection db_conn.copy_data('COPY measurements FROM STDIN')

    do CSV.foreach('measurements.csv', headers: true) do |row| db_conn.put_copy_data(row.fields + [Time.zone.now]) end end
  3. module Oxeanpedia module Argo class CopyData method_object :file_path CSV_FIELD_NAMES =

    [ 'ARGOS_ID', 'LATITUDE (degree_north)', 'LONGITUDE (degree_east)', 'PRES (decibar)', 'TEMP (degree_Celsius)' ] def call pgconn.copy_data(copy_statement, enco) do CSV.foreach(file_path, headers: true) do |row| pgconn.put_copy_data( ['argo'] + [date_field(row)] + row.fields(*CSV_FIELD_NAMES) + [file_path] ) end end end private def date_field(row) row['DATE (YYYY/MM/DD HH:MI:SS)'] || row['DATE (YYYY-MM-DDTHH:MI:SSZ)'] end def pgconn @pgconn ||= ActiveRecord::Base.connection.raw_connection end def time @time ||= Time.zone.now end def enco PG::TextEncoder::CopyRow.new end def copy_statement <<-HEREDOC.strip_heredoc COPY argo_measurements (source, measured_on, source_id, latitude, longitude, depth_in_meters, temperature_in_celsius, filename) FROM STDIN HEREDOC end end end end
  4. WITH moved_rows AS ( DELETE FROM argo_measurements am WHERE filename

    = '#{filename}' AND latitude IS NOT NULL AND longitude IS NOT NULL AND temperature_in_celsius IS NOT NULL AND depth_in_meters IS NOT NULL RETURNING am.* ) INSERT INTO measurements (source, source_id, latitude, longitude, depth_in_meters, temperature_in_celsius, measured_on, created_at, updated_at, location) SELECT source, source_id, latitude, longitude, depth_in_meters, temperature_in_celsius, measured_on, current_timestamp, current_timestamp, ST_SetSRID(ST_MakePoint(longitude, latitude),4326) as location FROM moved_rows ON CONFLICT (latitude, longitude, depth_in_meters, measured_on) DO NOTHING
  5. > PG string > PG Ruby row objects > ActiveRecord

    objects > ActiveRecordSerializer objects > JSON string as JSON API
  6. SELECT json_build_object( 'data', json_agg( json_build_object( 'id', m.id, 'type', 'measurements', 'attributes',

    json_build_object( 'latitude', m.latitude, 'longitude', m.longitude, 'depth_in_meters', m.depth_in_meters, 'temperature_in_celsius', m.temperature_in_celsius, 'measured_on', m.measured_on ) ) ) ) as result FROM ..