Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Bye-Bye Query Spaghetti: Write Queries You'll A...

Avatar for Tobias Lampert Tobias Lampert
September 02, 2025

Bye-Bye Query Spaghetti: Write Queries You'll Actually Understand Using Pipelined SQL Syntax

Are your SQL queries becoming tangled webs that are difficult to decipher, debug, and maintain? This talk explores how to write shorter, more debuggable, and extensible SQL code using Pipelined SQL, an alternative syntax where queries are written as a series of orthogonal, understandable steps. We'll survey which databases and query engines currently support pipelined SQL natively or through extensions, and how it can be used on any platform by compiling pipelined SQL to any SQL dialect using open-source tools. A series of real-world examples, comparing traditional and pipelined SQL syntax side by side for a variety of use cases, will show you how to simplify existing code and make complex data transformations intuitive and manageable.

Avatar for Tobias Lampert

Tobias Lampert

September 02, 2025
Tweet

More Decks by Tobias Lampert

Other Decks in Technology

Transcript

  1. Bye-Bye Query Spaghetti: Write Queries You'll Actually Understand Using Pipelined

    SQL Syntax Tobias Lampert, Lotum media GmbH September 2025
  2. Icon About me Tobias Lampert Analytics Engineer, Team BI Lotum

    media GmbH Back end developer turned data+AI guy • Data Engineering • Data Science • Data Architecture • Data Platforms
  3. Icon Play together. We create mobile games that millions of

    friends and families play together every day. Bad Nauheim, Germany 50 Employees 1 Billion Downloads About Lotum
  4. Icon Lotum franchises >400M Downloads 4 Pics 1 Word >100M

    Downloads Word Blitz >100M Downloads Quiz Planet Classics: Word Games & More >1M Downloads
  5. Icon Analytics Engineering at Lotum extract raw data transform &

    load staging area Data Warehouse Analytics tracking data + other data sources ~300M events per day SQL SQL SQL
  6. Icon Things I don’t like about SQL Order of Operations

    Counterintuitive: SQL's logical execution order is different from its written order Confusing and Redundant Filtering Clauses multiple different keywords for filtering Lack of Reusability calculated column aliases cannot be reused in a WHERE within the same query Rigid Structure no sequential, multi-stage operations within a single query block Repetitive Boilerplate constant repetition of SELECT ... FROM ... WHERE ... makes queries very wordy Black Box Debugging Experience no standard way to step through the execution and inspect intermediate tables
  7. Icon SQL's Filtering Clauses SQL offers several ways to filter

    data, but they don't all happen at the same time: Understanding the order of operations is crucial for writing correct and efficient SQL. WHERE JOIN ON HAVING QUALIFY
  8. Icon SQL's Filtering Clauses Filters rows as they are read

    from the tables. Operates on individual rows before any grouping or aggregation takes place. WHERE Filters data as the tables are being joined. Happens during the FROM/JOIN phase, before WHERE. JOIN ON Filters the results of a GROUP BY clause. Operates on aggregated values, not individual rows. HAVING Filters the results of window functions. Acts like a HAVING clause for window functions. QUALIFY It’s all just filters!
  9. Icon SQL’s Rigid Structure SELECT … FROM … JOIN …

    WHERE … GROUP BY … HAVING … QUALIFY … ORDER BY … LIMIT SQL clause order is strictly enforced Limited to one WHERE, GROUP BY, HAVING Complex logic requires CTEs or subqueries (temporary tables for sequential steps) CTEs and subqueries create a nested inside-out data flow
  10. Icon Inside-out Data Flow Standard SQL's data flow starts from

    the most nested FROM clause Logic is difficult to trace In large queries, simply locating the starting point of the logic can be a challenge Each CTE step requires significant boilerplate code Even with CTEs, you can't read the query top-to-bottom, you still have to jump between named references SELECT … FROM ( SELECT … FROM mytable ) WITH step_one AS ( SELECT … FROM mytable ) SELECT … FROM step_one
  11. Icon What You Write Isn't What Runs Declarative Language SQL

    specifies the structure of the desired result, not the steps to achieve it Mismatched Execution Order The logical processing order of a query does not follow its written syntax Steep Learning Curve This counter-intuitive flow is hard to understand for newcomers. Mental Translation Developers need to translate written code into its logical execution flow Unexpected Results This mismatch can lead to confusing errors and unexpected query behavior.
  12. Icon What You Write Isn't What Runs SELECT s.sale_date, SUM(s.quantity),

    RANK() OVER (ORDER BY s.sale_date) FROM sales AS s JOIN customers USING(customer_id) WHERE s.country = 'Germany' GROUP BY s.sale_date HAVING SUM(s.quantity) > 3 ORDER BY s.sale_date from join filter aggregation filter sort window projection
  13. Icon Dataframe APIs pd.merge( left_df, right_df) .query("…") .groupby(…) .agg(…) .filter(…)

    .assign( another_column=… ) .query("…") left_df .join(right_df) .filter(…) .group_by(…) .agg(…) .filter(…) .with_columns(…) .filter(…) left_df .join(right_df) .filter(…) .groupBy(…) .agg(…) .filter(…) .withColumn(…) .filter(…) .select(…)
  14. Icon Dataframe APIs clear order of operations debug by inspecting

    intermediate DataFrame states apply the same type of transformation multiple times one consistent method for all data filtering
  15. Icon 2024: Google proposes Pipe Syntax Basic ideas: • SQL

    itself is good, but the syntactic structure isn’t • There is a large SQL ecosystem and existing userbase which has grown over decades • No alternative language has ever become mainstream • Address issues by fixing the syntactic structure of SQL instead of migrating away from it
  16. Icon What is SQL pipe syntax? Think of it like

    a data pipeline: Data flows from one step to the next, getting transformed along the way. Write database queries by chaining sequential operations together Pass the result of one step as the input to the next, creating a logical, top-to-bottom data flow The pipe character |> sends the data from one operator to the next Each operator takes a table as input and produces a new table as output Operators can be applied in any order, any number of times FROM WHERE JOIN SELECT GROUP BY ORDER BY LIMIT AS Many of these operators reuse the existing grammar for standard SQL clauses
  17. Icon What is SQL pipe syntax? input table SQL operator

    transformed output table Flow of operations filter join aggregation projection sort end result |>
  18. Icon Let’s write some pipe syntax SQL! Starting with a

    source In pipe syntax, you don't have to start with SELECT. You can begin directly with the data source, i.e. the FROM clause. FROM orders is perfectly valid pipe syntax SQL. FROM orders selects the orders table as the starting point for our data pipeline and returns the full table (all rows and columns). FROM orders
  19. Icon Let’s write some pipe syntax SQL! Filtering with WHERE

    The WHERE operator filters your data based on a condition you provide. It works just like the WHERE clause in standard SQL, but it's applied as a distinct step in the pipeline. In this query, the data from the orders table is "piped" to the WHERE operator, which filters for rows where the order status is 'F'. FROM orders |> WHERE orderstatus = 'F'
  20. Icon Let’s write some pipe syntax SQL! Selecting Columns The

    SELECT operator is used to choose which columns you want in your final result. In this query, the SELECT operator picks out only the customer_id and totalprice columns. Every operation in pipe syntax produces a table, so a final SELECT * is optional. It is implicit if you omit it. FROM orders |> WHERE orderstatus = 'F' |> SELECT customer_id, totalprice
  21. Icon Let’s write some pipe syntax SQL! The AGGREGATE operator

    is used for calculations like SUM(), COUNT(), and AVG(). Aggregates are calculated for each group. The result table only contains the grouping columns (dimensions) and aggregated columns (measures). Aggregating Data FROM orders |> WHERE orderstatus = 'F' |> SELECT customer_id, totalprice FROM orders |> WHERE orderstatus = 'F' |> SELECT customer_id, totalprice |> AGGREGATE SUM(totalprice) AS total_spent GROUP BY customer_id
  22. Icon Let’s write some pipe syntax SQL! SELECT customer_id, SUM(order_amount)

    AS total_spent FROM orders GROUP BY customer_id Traditional SQL FROM orders |> AGGREGATE SUM(order_amount) AS total_spent GROUP BY customer_id Pipe Syntax SQL SELECT customer_id, SUM(order_amount) AS total_spent FROM orders GROUP BY customer_id Much cleaner than traditional SQL: Repeating column names in both the SELECT and GROUP BY clauses is not necessary
  23. Icon Let’s write some pipe syntax SQL! The JOIN operator

    combines rows from two tables based on a related column. After a JOIN, columns from both tables are available for the next steps in your pipeline. You specify the table to join and the matching condition using ON. Different types like LEFT JOIN, RIGHT JOIN, OUTER JOIN are also supported. FROM orders |> JOIN customers ON orders.customer_id = customers.customer_id |> SELECT orders.id, customers.name Joining Tables
  24. Icon Let’s write some pipe syntax SQL! Chain multiple SQL

    operators to perform a complex task in a clear, step-by-step manner FROM orders |> WHERE orderstatus = 'F' |> AS o |> JOIN customers ON o.customer_id = customers.customer_id |> AGGREGATE SUM(totalprice) AS total_spent GROUP BY customers.name |> ORDER BY total_spent DESC |> SELECT total_spent, name |> LIMIT 5 Putting It All Together Remember: In pipe syntax, operators can be placed in any order - but the order affects the end result
  25. Icon Convenience Projection Operators Add a new calculated column while

    keeping the existing ones. FROM lineitem |> EXTEND quantity*price AS cost EXTEND Updates the values in an existing column. FROM orders |> SET status = LOWER(status) SET Removes a column from the output. FROM orders |> DROP comment DROP Renames specified columns. FROM orders |> RENAME price AS price_usd RENAME
  26. Icon Where does pipe syntax shine? multiple aggregations without inside-out

    data flow iterative development and debugging adding quick filters use aliases in any subsequent operator
  27. Icon Inside-out data flow Where does pipe syntax shine? Traditional

    SQL SELECT AVG(items_per_order) AS avg_items_per_order FROM ( SELECT order_id, COUNT(product_id) AS items_per_order FROM orders GROUP BY order_id ) Pipe SQL Syntax FROM orders |> AGGREGATE COUNT(product_id) AS items_per_order GROUP BY order_id |> AGGREGATE AVG(items_per_order) AS avg_items_per_order
  28. Icon Inside-out data flow Where does pipe syntax shine? Traditional

    SQL SELECT AVG(num_trips) AS avg_trips_per_year, payment_type FROM ( SELECT EXTRACT(YEAR FROM start_ts) AS year, payment_type, COUNT(*) AS num_trips FROM taxi_trips GROUP BY year, payment_type ) GROUP BY payment_type Pipe SQL Syntax FROM taxi_trips |> EXTEND EXTRACT(YEAR FROM start_ts) AS year |> AGGREGATE COUNT(*) AS num_trips GROUP BY year, payment_type |> AGGREGATE AVG(num_trips) AS avg_trips_per_year GROUP BY payment_type
  29. Icon SQL Pipe Syntax For any valid chain of n

    SQL pipe operators, any subset of the first m <= n operators also represents a valid query. You can run the query up to any point in the pipeline just by deleting or commenting out the lines below it. Inspect the intermediate data at each stage and pinpoint exactly where your logic is going wrong. Iterative Development & Debugging Where does pipe syntax shine? Traditional SQL Debugging a complex query often involves: • commenting out large sections of code • running subqueries in isolation to check their output Making sure the modified query is still valid and correct can be tricky!
  30. Icon Iterative Development & Debugging Where does pipe syntax shine?

    FROM orders |> WHERE orderstatus = 'F' |> JOIN customers ON orders.customer_id = customers.customer_id |> SUM(totalprice) AS total_spent GROUP BY customer_id |> ORDER BY total_spent DESC |> SELECT total_spent customers.name |> LIMIT 5 Each prefix of a query (up to a pipe character) is also a valid query
  31. Icon Adding Quick Filters Where does pipe syntax shine? SQL

    pipe operators may follow any valid SQL query Appending SQL pipe syntax to traditional SQL queries is perfectly valid! SELECT customer_id, SUM(totalprice) AS total_spent, CASE WHEN SUM(totalprice) > 10000 THEN 'High Spender' ELSE 'Low Spender' END AS tier FROM orders GROUP BY customer_id |> WHERE tier = 'High Spender'
  32. Icon Adding Quick Filters Where does pipe syntax shine? Running

    an existing query on a subset of the data: Add a WHERE at the very top FROM taxi_trips |> EXTEND EXTRACT(YEAR FROM start_ts) AS year |> AGGREGATE COUNT(*) AS num_trips GROUP BY year, payment_type |> AGGREGATE AVG(num_trips) AS avg_trips_per_year GROUP BY payment_type FROM taxi_trips |> WHERE trip_total > 10.0 |> EXTEND EXTRACT(YEAR FROM start_ts) AS year |> AGGREGATE COUNT(*) AS num_trips GROUP BY year, payment_type |> AGGREGATE AVG(num_trips) AS avg_trips_per_year GROUP BY payment_type
  33. Icon Simplified Keywords SELECT department_name, RANK() OVER (ORDER BY AVG(salary))

    AS department_rank FROM employees WHERE country = 'Germany' GROUP BY department_name HAVING COUNT(*) > 10 QUALIFY department_rank <= 5 FROM employees |> WHERE country = 'Germany' |> AGGREGATE AVG(salary) AS avg_salary, COUNT(*) AS num_employees GROUP BY department_name |> WHERE num_employees > 10 |> EXTEND RANK() OVER (ORDER BY avg_salary) AS department_rank |> WHERE department_rank <= 5 |> SELECT department_name, department_rank SQL Pipe Syntax Traditional SQL Where does pipe syntax shine?
  34. Icon Aliases: Define Once, Reuse Instantly Valid Alternative: WHERE EXTRACT(YEAR

    FROM order_date)= 2021 SQL pipe syntax FROM sales |> EXTEND EXTRACT(YEAR FROM order_date) AS sales_year |> SELECT sales_year, customer_id SELECT EXTRACT(YEAR FROM order_date) AS sales_year, customer_id FROM sales Traditional SQL INVALID WHERE sales_year = 2021 |> WHERE sales_year = 2021 VALID Where does pipe syntax shine?
  35. Icon CTE 2 I’ll use a linear flow everywhere! Complex

    queries with a tree-like structure may require nesting even in pipe syntax result Subqueries are valid in pipe syntax CTEs can be defined using the |> WITH operator JOIN CTE 1 source table CTE 1 CTE 2 source table Can be expressed using a linear operator structure Nested CTEs or subqueries:
  36. Icon Which engines support pipe syntax? version 4.0 and above

    (released May 2025) https://spark.apache.org/docs/latest/sql-pipe-syntax.html runtime 16.2 and above (released February 2025) https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-qry-pipeline https://cloud.google.com/bigquery/docs/pipe-syntax-guide • all GoogleSQL-based products, including BigQuery, F1 and Spanner • also available in ZetaSQL (open-source version of GoogleSQL)
  37. Icon Any other DBs / query engines? Community extension: install

    psql from community Very close to the Google/Databricks/Spark syntax EXTEND, SET and AGGREGATE not supported, uses traditional GROUP BY instead Team has experimented with pipe syntax in 2024 Maintainers “not overly impressed with the Google pipe syntax” Will add it if PostgreSQL adds support for it ➜ Very Unlikely for Now PR (from December 2024) is still work-in-progress ClickHouse CTO likes it “That's a very good feature, and we should continue.” ➜ Probably available in the future
  38. Icon No support for your engine? Use an SQL transpiler!

    pipe syntax SQL traditional SQL transpile import sqlglot sqlglot.transpile( your_sql_query, read="bigquery", write="mysql", pretty="true" ) supports pipe syntax since v26.25.0 (2025-06-03) https://github.com/tobymao/sqlglot
  39. Icon Can I use AI to write pipe syntax? explicitly

    request “Write the query in GoogleSQL pipe syntax.” traditional SQL pipe syntax SQL AI Beware of: • syntax hallucinations • claims like “there is no such thing as SQL pipe syntax” Easy fix: Provide the pipe syntax documentation in the context write queries in SQL pipe syntax translate traditional SQL to pipe syntax
  40. Icon …but I use a linter / formatter! supports SQL

    pipe syntax since version 3.4.1 (released 2025-06-13) FROM employees |> WHERE country = 'Germany' |> AS e |> INNER JOIN departments AS d ON e.department_id = d.id |> AGGREGATE AVG(e.salary) AS average_salary, COUNT(e.id) AS employee_count GROUP BY d.department_name |> WHERE employee_count > 10 Linting Formatting
  41. Icon SQL Pipe Syntax: Is It Right for You? You

    write complex analytical queries iteratively You're working in a modern data stack Readability for complex logic is a priority You are comfortable with a functional programming style You’re working with relational database engines like MySQL, SQL Server, or Oracle You need maximum performance on transactional databases You prefer a purely declarative approach Your queries are simple You have graph-like data flows
  42. Icon Thank you for your attention! Contact me on LinkedIn:

    linkedin.com/in/tlampert we’re hiring! lotum.com/jobs Fabian Wessel Tobi