Data Processing in PHP - PHPers 2024 Poznań

Norbert Orzechowicz

June 24, 2024
Transcript

  1. Orders Report
    • order_id – uuid
    • created_at – datetime
    • updated_at – datetime
    • discount – float (nullable)
    • address – structure{street: string, city: string, zip: string, country: string}
    • notes – list<string>
    • items – list<structure{sku: string, quantity: string, price: float}>
    https://flow-php.com
  2. Extraction
    • Database
    • File
    • API
    • Streams
    • Queues / Topics
  3. Transformation
    • Filtering
    • Merging
    • Cleaning
    • Grouping
    • Aggregating
    • Deduplicating
    • Sorting
    • Partitioning
  4. Output
    +----------------------+----------------------+----------------------+----------+----------------------+----------------------+----------------------+
    | order_id             | created_at           | updated_at           | discount | address              | notes                | items                |
    +----------------------+----------------------+----------------------+----------+----------------------+----------------------+----------------------+
    | 48f7b4b3-48dc-3095-8 | 2024-04-23T01:35:12+ | 2024-04-23T01:35:12+ |          | {"street":"56896 Pow | ["Sed cumque sit vol | [{"sku":"SKU_0005"," |
    | b8670686-1e52-36ee-9 | 2024-04-14T09:00:12+ | 2024-04-14T09:00:12+ |          | {"street":"596 Derek | ["Fugiat saepe atque | [{"sku":"SKU_0004"," |
    | dc052d5e-2b2c-3b2a-9 | 2024-03-03T08:03:02+ | 2024-03-03T08:03:02+ | 40.08    | {"street":"51760 Koe | ["Aliquid voluptatem | [{"sku":"SKU_0004"," |
    | 6984b96b-6a27-367f-9 | 2024-04-03T16:18:07+ | 2024-04-03T16:18:07+ |          | {"street":"9722 Doll | ["Est sit atque quos | [{"sku":"SKU_0004"," |
    | cb21141a-5494-33ea-9 | 2024-04-25T17:47:49+ | 2024-04-25T17:47:49+ | 2.38     | {"street":"11398 Abs | ["Est atque doloremq | [{"sku":"SKU_0005"," |
    | c9dc07fc-fa46-3f32-9 | 2024-03-27T12:44:03+ | 2024-03-27T12:44:03+ |          | {"street":"78980 Bri | ["Sit aut laudantium | [{"sku":"SKU_0003"," |
    | 9b828e2d-b509-3485-b | 2024-04-12T06:33:52+ | 2024-04-12T06:33:52+ |          | {"street":"6434 Chet | ["Ad consequuntur qu | [{"sku":"SKU_0005"," |
    | 6f619e18-05aa-306b-8 | 2024-06-10T21:17:45+ | 2024-06-10T21:17:45+ |          | {"street":"8038 Crai | ["Dolorum recusandae | [{"sku":"SKU_0005"," |
    | 7814b135-500f-3137-9 | 2024-05-14T07:39:00+ | 2024-05-14T07:39:00+ |          | {"street":"26190 Cor | ["Est quis necessita | [{"sku":"SKU_0005"," |
    +----------------------+----------------------+----------------------+----------+----------------------+----------------------+----------------------+
    10 rows
  5. Output
    +------------------------------------+-------------------------+-------------------------+--------+--------------------------------------------+--------------------------------+----------------------+
    |order_id                            |created_at               |updated_at               |discount|address                                     |notes                           |items                 |
    +------------------------------------+-------------------------+-------------------------+--------+--------------------------------------------+--------------------------------+----------------------+
    |7833e6cb-b123-37f7-bee5-c0fea6dd6787|2024-04-26T06:01:52+00:00|2024-04-26T06:01:52+00:00|2.09    |"{""street"":""64428 Nitzsche Locks""       |""city"":""Lake Deontechester"" |""zip"":""15111""     |
    |5aa4fb2b-7bc5-3d7c-a9a9-04e88f831b00|2024-01-23T22:53:49+00:00|2024-01-23T22:53:49+00:00|46.87   |"{""street"":""5751 Jamal Drive""           |""city"":""Port Delmer""        |""zip"":""57385""     |
    |77a297db-b911-3800-a017-7f16502f324f|2024-02-20T00:44:03+00:00|2024-02-20T00:44:03+00:00|29.19   |"{""street"":""831 Murphy Haven""           |""city"":""West Alessandroport""|""zip"":""65846-1195""|
    |ed030a18-df55-38ce-90a3-c3461032c150|2024-02-15T22:03:47+00:00|2024-02-15T22:03:47+00:00|null    |"{""street"":""8617 Lebsack Cape Suite 285""|""city"":""New Leonel""         |""zip"":""43725""     |
    |d4b7921e-7729-322f-89a1-51e1c5198678|2024-04-12T04:26:42+00:00|2024-04-12T04:26:42+00:00|10.44   |"{""street"":""523 Charlene Mount Apt. 694""|""city"":""Bruenstad""          |""zip"":""40291""     |
    |ff342e29-a6f8-3df3-b1d0-0adb376557fd|2024-03-24T08:49:27+00:00|2024-03-24T08:49:27+00:00|45.48   |"{""street"":""822 Carmel Common Apt. 560"" |""city"":""Abigailport""        |""zip"":""64470""     |
    |9771e63f-16e6-311a-a974-a0d60a06fea4|2024-02-14T00:23:18+00:00|2024-02-14T00:23:18+00:00|null    |"{""street"":""87471 Jaylon Place""         |""city"":""Cummingsmouth""      |""zip"":""11956-0536""|
    |eaf137da-c206-3252-a5f5-428b1d4eb4f1|2024-05-03T18:01:13+00:00|2024-05-03T18:01:13+00:00|19.16   |"{""street"":""4494 Kunze Tunnel Apt. 465"" |""city"":""Lake Sabinaland""    |""zip"":""60381-1971""|
    |64a4ee3d-66e3-3b5e-9830-562c051e6576|2024-01-02T17:57:03+00:00|2024-01-02T17:57:03+00:00|27.79   |"{""street"":""425 Oren Manors""            |""city"":""Lake Vincent""       |""zip"":""69860""     |
    |24099ba6-9131-3714-84da-d3a59ede3cd1|2024-01-27T22:47:54+00:00|2024-01-27T22:47:54+00:00|45.2    |"{""street"":""328 Daniel Inlet Apt. 768""  |""city"":""Jedediahville""      |""zip"":""77120-2693""|
    +------------------------------------+-------------------------+-------------------------+--------+--------------------------------------------+--------------------------------+----------------------+
  6. Output
       order_id             created_at           updated_at           discount  address              notes                items
    0  e13d7098-5a78-33...  2024-06-17T19:24...  2024-06-17T19:24...  12.45     {"street":"9742 ...  ["Doloremque cum...  [{"sku":"SKU_000...
    1  947df050-3abb-3f...  2024-02-23T19:18...  2024-02-23T19:18...  NaN       {"street":"37051...  ["Neque dolor et...  [{"sku":"SKU_000...
    2  6315f9e2-86bf-33...  2024-04-02T11:30...  2024-04-02T11:30...  47.10     {"street":"792 G...  ["Et porro fugia...  [{"sku":"SKU_000...
    3  4cccb632-fade-34...  2024-05-06T00:17...  2024-05-06T00:17...  19.76     {"street":"30203...  ["Aliquam saepe ...  [{"sku":"SKU_000...
    4  82384f8c-9adb-38...  2024-05-10T11:17...  2024-05-10T11:17...  NaN       {"street":"757 T...  ["Beatae nesciun...  [{"sku":"SKU_000...
    5  e3fcf736-0f8c-3d...  2024-01-25T20:14...  2024-01-25T20:14...  NaN       {"street":"9088 ...  ["Provident quam...  [{"sku":"SKU_000...
    6  b987a49a-b4c5-37...  2024-06-03T23:22...  2024-06-03T23:22...  NaN       {"street":"6867 ...  ["Quibusdam maio...  [{"sku":"SKU_000...
    7  663523a9-713b-33...  2024-03-22T23:31...  2024-03-22T23:31...  25.88     {"street":"1577 ...  ["In rem maxime ...  [{"sku":"SKU_000...
    8  6259fa2c-ec68-36...  2024-05-10T10:12...  2024-05-10T10:12...  21.67     {"street":"987 L...  ["Voluptatem non...  [{"sku":"SKU_000...
    9  f7153c83-34b6-37...  2024-02-26T09:20...  2024-02-26T09:20...  18.93     {"street":"2039 ...  ["Culpa error re...  [{"sku":"SKU_000...
  7. Dataset Processing Visualization
    The size of the data frame defines memory consumption.
    Memory = size of the columns in a row * number of rows (simplified version)
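The simplified formula above can be sketched as plain arithmetic. This is only a rough illustration: `estimateMemoryBytes` is a hypothetical helper, the column sizes are assumed averages, and real PHP memory usage will be higher because every value carries internal overhead.

```php
<?php

declare(strict_types=1);

/**
 * Rough estimate following the simplified formula:
 * memory = size of one row (sum of its column sizes) * number of rows.
 * Hypothetical helper; it ignores PHP's per-value overhead.
 *
 * @param array<string, int> $columnSizesInBytes assumed average size of each column
 */
function estimateMemoryBytes(array $columnSizesInBytes, int $rows): int
{
    return array_sum($columnSizesInBytes) * $rows;
}

// Assumed average column sizes for one row of a data frame, in bytes.
$columnSizes = ['order_id' => 36, 'created_at' => 25, 'updated_at' => 25, 'discount' => 8];

// A data frame holding 1000 such rows at once.
$estimate = estimateMemoryBytes($columnSizes, 1000);
```

Under this model, memory is bounded by how many rows are held in one data frame at a time, not by the total size of the dataset.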
  8. Native Types
    • Integer - scalar
    • String - scalar
    • Boolean - scalar
    • Float - scalar
    • Object
    • Resource
    • Null
    • Enum
    • Callable
    • Array
  9. Logical Types
    • DateTime
    • Uuid
    • Json
    • List
    • Map
    • Structure
    • XML
    • XMLElement
  10. Logical Types
    Logical types are more specific implementations of native types.
    Different programming languages provide different logical and native types.
  11. Logical Types
    • DateTime (object)
    • Uuid (object)
    • Json (array)
    • List (array)
    • Map (array)
    • Structure (array)
    • XML (object)
    • XMLElement (object)
  12. Logical Types: List
    A list is a collection of elements where each element is indexed by its position in the list.
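To make the list definition concrete, here is a minimal sketch in plain PHP (8.1 or newer, for `array_is_list`). The helper `isListOf` is hypothetical and is not part of any library; it only illustrates the idea of a position-indexed collection with a single element type.

```php
<?php

declare(strict_types=1);

/**
 * Checks that a PHP array is a list (keys 0..n-1, in order) and that
 * every element has the expected type. Hypothetical helper.
 */
function isListOf(array $values, string $type): bool
{
    // array_is_list() is true only for consecutive integer keys starting at 0.
    if (!array_is_list($values)) {
        return false;
    }

    foreach ($values as $value) {
        if (get_debug_type($value) !== $type) {
            return false;
        }
    }

    return true;
}

// Made-up sample values for the "notes" column (list<string>).
$notes = ['Leave at the door', 'Call before delivery'];

// Same values, but the keys are not positions, so this is not a list.
$notAList = [1 => 'Leave at the door', 5 => 'Call before delivery'];
```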
  13. Logical Types: Map
    A map (also known as a dictionary or associative array) stores key-value pairs, where each key is unique and associated with a single value.
    * In PHP, the main purpose of a map is to guarantee the types of keys and values, since a regular array does not enforce them.
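The point about regular arrays not enforcing key and value types can be illustrated with a small check in plain PHP. `isMapOf` is a hypothetical helper written for this example only, and the SKU data is made up.

```php
<?php

declare(strict_types=1);

/**
 * Checks that every key and every value of an array has the expected type.
 * Hypothetical helper; a plain PHP array performs no such check on its own.
 */
function isMapOf(array $values, string $keyType, string $valueType): bool
{
    foreach ($values as $key => $value) {
        if (get_debug_type($key) !== $keyType || get_debug_type($value) !== $valueType) {
            return false;
        }
    }

    return true;
}

// Made-up sample: map<string, int> of stock levels keyed by SKU.
$stockBySku = ['SKU_0003' => 12, 'SKU_0004' => 0];

// PHP silently converts integer-like string keys to integers,
// so this array no longer has string keys.
$castedKeys = ['10' => 1, '20' => 2];
```

The second array shows why an explicit map type is useful: the keys were written as strings, but PHP stores them as integers.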
  14. Nullability
    All types can be nullable, but not all programming languages handle nulls the same way.
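As a small illustration of why null handling matters, the sketch below averages a nullable float column such as `discount`. The function name and the decision to skip nulls (instead of counting them as zero) are assumptions made for this example.

```php
<?php

declare(strict_types=1);

/**
 * Averages a nullable float column, ignoring null values.
 * Returns null when there is no known value at all.
 *
 * @param array<int, float|null> $discounts
 */
function averageDiscount(array $discounts): ?float
{
    // Keep only the rows where the discount is known.
    $known = array_filter($discounts, static fn (?float $d): bool => $d !== null);

    if ($known === []) {
        return null;
    }

    return array_sum($known) / count($known);
}

// Two known discounts and two nulls.
$average = averageDiscount([40.08, null, 2.38, null]);
```

Treating the nulls as `0.0` instead would halve the result, which is the kind of difference that appears when data moves between tools with different null semantics.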
  15. Transformation is the process of converting, cleansing, and structuring data into a usable format.
    Example: transforming a string into a DateTime object.
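The code shown on the slide is not part of the transcript. A minimal sketch of such a transformation in plain PHP could look like this; `toDateTime` is a hypothetical helper, and the ATOM format is assumed from the sample timestamps in the output tables.

```php
<?php

declare(strict_types=1);

/**
 * Converts a string such as "2024-04-26T06:01:52+00:00" into a
 * DateTimeImmutable. Returns null when the string does not match
 * the ATOM format (Y-m-d\TH:i:sP).
 */
function toDateTime(string $value): ?DateTimeImmutable
{
    $date = DateTimeImmutable::createFromFormat(DateTimeInterface::ATOM, $value);

    // createFromFormat() returns false on a parse failure.
    return $date === false ? null : $date;
}

$createdAt = toDateTime('2024-04-26T06:01:52+00:00');
$invalid = toDateTime('not a date');
```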
  16. External sorting is a class of sorting algorithms that can handle amounts of data too large to fit into memory.
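One common form of external sorting is the external merge sort: sort small chunks in memory, write each sorted chunk to disk, then merge the chunks while reading them back one value at a time. The sketch below is an illustration in plain PHP, not the implementation used by any particular library; `externalSort` and its parameters are hypothetical, and it assumes `tmpfile()` succeeds.

```php
<?php

declare(strict_types=1);

/**
 * External merge sort sketch for integers. At most $chunkSize values are
 * held in memory while sorting, plus one value per chunk while merging.
 *
 * @param iterable<int> $values
 * @return Generator<int>
 */
function externalSort(iterable $values, int $chunkSize): Generator
{
    $files = [];
    $chunk = [];

    // Sorts one chunk and writes it to a temporary file, one value per line.
    $flush = static function (array $chunk) use (&$files): void {
        sort($chunk);
        $file = tmpfile();
        fwrite($file, implode("\n", $chunk) . "\n");
        rewind($file);
        $files[] = $file;
    };

    foreach ($values as $value) {
        $chunk[] = $value;
        if (count($chunk) === $chunkSize) {
            $flush($chunk);
            $chunk = [];
        }
    }
    if ($chunk !== []) {
        $flush($chunk);
    }

    // K-way merge: the heap holds the current smallest value of every file.
    $heap = new SplMinHeap();
    foreach ($files as $index => $file) {
        if (($line = fgets($file)) !== false) {
            $heap->insert([(int) $line, $index]);
        }
    }

    while (!$heap->isEmpty()) {
        [$value, $index] = $heap->extract();
        yield $value;

        // Refill the heap from the file the extracted value came from.
        if (($line = fgets($files[$index])) !== false) {
            $heap->insert([(int) $line, $index]);
        }
    }

    foreach ($files as $file) {
        fclose($file);
    }
}

$sorted = iterator_to_array(externalSort([5, 3, 9, 1, 7, 2, 8], 3), false);
```

The chunk size is the knob that trades memory for the number of temporary files.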
  17. First, we need to turn our orders dataset into an order line items dataset.
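Turning orders into line items means emitting one row per element of the `items` list while carrying over the order identifier. A minimal sketch in plain PHP, using a generator so that only one order is in memory at a time; `toLineItems` and the sample rows are made up for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Expands each order into one row per item.
 *
 * @param iterable<array{order_id: string, items: array<int, array<string, mixed>>}> $orders
 * @return Generator<array<string, mixed>>
 */
function toLineItems(iterable $orders): Generator
{
    foreach ($orders as $order) {
        foreach ($order['items'] as $item) {
            // Each line item keeps a reference to the order it belongs to.
            yield ['order_id' => $order['order_id']] + $item;
        }
    }
}

// Made-up sample orders following the column names of the Orders Report.
$orders = [
    ['order_id' => 'order-1', 'items' => [
        ['sku' => 'SKU_0004', 'quantity' => 2, 'price' => 10.5],
        ['sku' => 'SKU_0005', 'quantity' => 1, 'price' => 99.0],
    ]],
    ['order_id' => 'order-2', 'items' => [
        ['sku' => 'SKU_0003', 'quantity' => 4, 'price' => 5.25],
    ]],
];

$lineItems = iterator_to_array(toLineItems($orders), false);
```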
  18. A schema can be used either to validate a dataset or to improve extraction performance.
  19. When working with big datasets and complex transformations, schema validation is necessary to guarantee data quality.
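The idea of schema validation can be sketched in plain PHP as a check of each row against expected column types and nullability. This is an illustration only: `validateRow` and the array-based schema format are hypothetical and much simpler than a real schema definition.

```php
<?php

declare(strict_types=1);

/**
 * Validates one row against a schema of [type, nullable] pairs.
 * Returns a list of human-readable errors; an empty list means the row is valid.
 *
 * @param array<string, mixed> $row
 * @param array<string, array{0: string, 1: bool}> $schema
 * @return array<int, string>
 */
function validateRow(array $row, array $schema): array
{
    $errors = [];

    foreach ($schema as $column => [$type, $nullable]) {
        if (!array_key_exists($column, $row)) {
            $errors[] = "missing column: $column";
            continue;
        }

        $value = $row[$column];

        if ($value === null) {
            if (!$nullable) {
                $errors[] = "$column must not be null";
            }
            continue;
        }

        if (get_debug_type($value) !== $type) {
            $errors[] = "$column: expected $type, got " . get_debug_type($value);
        }
    }

    return $errors;
}

// Assumed, simplified schema for two columns of the Orders Report.
$schema = [
    'order_id' => ['string', false],
    'discount' => ['float', true],
];

$valid = validateRow(['order_id' => 'order-1', 'discount' => null], $schema);
$invalid = validateRow(['order_id' => null, 'discount' => '2.09'], $schema);
```

Failing early on a row like the second one is cheaper than discovering a string where a float was expected after several transformation steps.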
  20. Use cases
    • Building data storages (lakehouses/warehouses/lakes)
    • Generating reports
    • Consuming APIs
    • Synchronizing systems
    • Building projections
    • Converting dataset formats
    • Initial data analysis