• Latest
A Blazingly Fast DBMS With Full SQL Join Support

A Blazingly Fast DBMS With Full SQL Join Support

March 25, 2023
Vision Pro Apple Walks Through Mixed Reality Headset Design

Vision Pro Apple Walks Through Mixed Reality Headset Design

June 6, 2023
tvOS 17 beta 1 now available to developers

tvOS 17 beta 1 now available to developers

June 6, 2023
Tears of the Kingdom Speedrunners Are Taking Breaking the Game to a New Level

Tears of the Kingdom Speedrunners Are Taking Breaking the Game to a New Level

June 6, 2023
Apple will make dev kits available to help developers build and test apps for Vision Pro headset

Apple will make dev kits available to help developers build and test apps for Vision Pro headset

June 6, 2023
Boltgun’ & ‘Monster Menu’, Plus New Releases and Sales – TouchArcade

Boltgun’ & ‘Monster Menu’, Plus New Releases and Sales – TouchArcade

June 6, 2023
Have Some More Trailers For Black Desert Online’s Upcoming Land Of The Morning Light Expansion Before Next Week’s Launch

Have Some More Trailers For Black Desert Online’s Upcoming Land Of The Morning Light Expansion Before Next Week’s Launch

June 6, 2023
Marques Brownlee Goes Clubbing and Gets Rejected by Everyone

Marques Brownlee Goes Clubbing and Gets Rejected by Everyone

June 6, 2023
iOS 17 locks access to your Safari private browsing tabs behind Face ID auth

iOS 17 locks access to your Safari private browsing tabs behind Face ID auth

June 6, 2023
DevOps in Legacy Systems – DZone

DevOps in Legacy Systems – DZone

June 6, 2023
Marques Brownlee's 2-Phone Setup: Why He Carries Two Phones and What You Can Learn

Marques Brownlee's 2-Phone Setup: Why He Carries Two Phones and What You Can Learn

June 6, 2023
Install iOS 17 beta – How-to

Install iOS 17 beta – How-to

June 6, 2023
CPU vs. GPU Intensive Applications

CPU vs. GPU Intensive Applications

June 6, 2023
Advertise with us
Tuesday, June 6, 2023
Bookmarks
  • Login
  • Register
GetUpdated
  • Game Updates
  • Mobile Gaming
  • Playstation News
  • Xbox News
  • Switch News
  • MMORPG
  • Game News
  • IGN
  • Retro Gaming
  • Tech News
  • Apple Updates
  • Jailbreak News
  • Mobile News
  • Software Development
  • Photography
  • Contact
No Result
View All Result
GetUpdated
No Result
View All Result
GetUpdated
No Result
View All Result
ADVERTISEMENT

A Blazingly Fast DBMS With Full SQL Join Support

March 25, 2023
in Software Development
Reading Time:15 mins read
0 0
0
Share on FacebookShare on WhatsAppShare on Twitter


ClickHouse is an open-source real-time analytics database built and optimized for use cases requiring super-low latency analytical queries over large amounts of data. To achieve the best possible performance for analytical applications, combining tables in a data denormalization process is typical. Flattened tables help minimize query latency by avoiding joins at the cost of incremental ETL complexity, typically acceptable in return for sub-second queries. 

However, denormalizing data isn’t always practical for some workloads, for instance, those coming from more traditional data warehouses. Sometimes, part of the source data for analytical queries needs to remain normalized. These normalized tables take less storage and provide flexibility with data combinations, but they require joins at query time for certain types of analysis.

Fortunately, contrary to some misconceptions, joins are fully supported in ClickHouse! In addition to supporting all standard SQL JOIN types, ClickHouse provides additional JOIN types useful for analytical workloads and time-series analysis. ClickHouse allows you to choose between six different algorithms for the join execution or allow the query planner to adaptively choose and dynamically change the algorithm at runtime, depending on resource availability and usage.

Join Types Supported in Clickhouse

We use Venn diagrams and example queries on a normalized IMDB dataset originating from the relational dataset repository to explain the available join types in ClickHouse.

Instructions for creating and loading the tables are here. The dataset is also available in our playground for users wanting to reproduce queries.

We are going to use four tables from our example dataset:

Example dataset tables

The data in that four tables represent movies. A movie can have one or many genres. The roles in a movie are played by actors. The arrows in the diagram above represent foreign-to-primary-key-relationships. e.g., the movie_idcolumn of a row in the genres table contains theid value from a row in the movies table.

There is a many-to-many relationship between movies and actors. This many-to-many relationship is normalized into two one-to-many relationships by using the roles table. Each row in the roles table contains the values of the id fields of the movies table and the actors’ table. 

Inner Join

The Inner Join returns, for each pair of rows matching on join keys, the column values of the row from the left table, combined with the column values of the row from the right table. If a row has more than one match, then all matches are returned (meaning that the cartesian product is produced for rows with matching join keys).

This query finds the genre(s) for each movie by joining the movies table with the genres table:

SELECT
    m.name AS name,
    g.genre AS genre
FROM movies AS m
INNER JOIN genres AS g ON m.id = g.movie_id
ORDER BY
    m.year DESC,
    m.name ASC,
    g.genre ASC
LIMIT 10;

┌─name───────────────────────────────────┬─genre─────┐
│ Harry Potter and the Half-Blood Prince │ Action    │
│ Harry Potter and the Half-Blood Prince │ Adventure │
│ Harry Potter and the Half-Blood Prince │ Family    │
│ Harry Potter and the Half-Blood Prince │ Fantasy   │
│ Harry Potter and the Half-Blood Prince │ Thriller  │
│ DragonBall Z                           │ Action    │
│ DragonBall Z                           │ Adventure │
│ DragonBall Z                           │ Comedy    │
│ DragonBall Z                           │ Fantasy   │
│ DragonBall Z                           │ Sci-Fi    │
└────────────────────────────────────────┴───────────┘

10 rows in set. Elapsed: 0.126 sec. Processed 783.39 thousand rows, 21.50 MB (6.24 million rows/s., 171.26 MB/s.)

Note that the Inner keyword can be omitted. The behavior of the Inner Join can be extended or changed by using one of the following other join types.

(Left/Right/Full) Outer Join

(Left/Right/Full) Outer Join

The Left Outer Join behaves like Inner Join; plus, for non-matching left table rows, ClickHouse returns default values for the right table’s columns. 

A Right Outer Join query is similar and also returns values from non-matching rows from the right table together with default values for the columns of the left table.

A Full Outer Join query combines the left and right outer join and returns values from non-matching rows from the left and the right table, together with default values for the columns of the right and left table, respectively.  

Note that ClickHouse can be configured to return NULLs instead of default values (however, for performance reasons, that is less recommended).

This query finds all movies that have no genre by querying for all rows from the movies table that don’t have matches in the genres table and therefore gets (at query time) the default value 0 for the movie_id column:

SELECT m.name
FROM movies AS m
LEFT JOIN genres AS g ON m.id = g.movie_id
WHERE g.movie_id = 0
ORDER BY
    m.year DESC,
    m.name ASC
LIMIT 10;


┌─name──────────────────────────────────────┐
│ """Pacific War, The"""                    │
│ """Turin 2006: XX Olympic Winter Games""" │
│ Arthur, the Movie                         │
│ Bridge to Terabithia                      │
│ Mars in Aries                             │
│ Master of Space and Time                  │
│ Ninth Life of Louis Drax, The             │
│ Paradox                                   │
│ Ratatouille                               │
│ """American Dad"""                        │
└───────────────────────────────────────────┘

10 rows in set. Elapsed: 0.092 sec. Processed 783.39 thousand rows, 15.42 MB (8.49 million rows/s., 167.10 MB/s.)

Note that the Outer keyword can be omitted.

Cross Join

The Cross Join produces the full cartesian product of the two tables without considering join keys. Each row from the left table is combined with each row from the right table.

The following query, therefore, is combing each row from the movies table with each row from the genres table:

SELECT
    m.name,
    m.id,
    g.movie_id,
    g.genre
FROM movies AS m
CROSS JOIN genres AS g
LIMIT 10;

┌─name─┬─id─┬─movie_id─┬─genre───────┐
│ #28  │  0 │        1 │ Documentary │
│ #28  │  0 │        1 │ Short       │
│ #28  │  0 │        2 │ Comedy      │
│ #28  │  0 │        2 │ Crime       │
│ #28  │  0 │        5 │ Western     │
│ #28  │  0 │        6 │ Comedy      │
│ #28  │  0 │        6 │ Family      │
│ #28  │  0 │        8 │ Animation   │
│ #28  │  0 │        8 │ Comedy      │
│ #28  │  0 │        8 │ Short       │
└──────┴────┴──────────┴─────────────┘

10 rows in set. Elapsed: 0.024 sec. Processed 477.04 thousand rows, 10.22 MB (20.13 million rows/s., 431.36 MB/s.)

While the previous example query alone didn’t make much sense, it can be extended with a where clause for associating matching rows to replicate Inner join behavior for finding the genre(s) for each movie:

SELECT
    m.name,
    g.genre
FROM movies AS m
CROSS JOIN genres AS g
WHERE m.id = g.movie_id
ORDER BY
    m.year DESC,
    m.name ASC
LIMIT 10;

┌─name───────────────────────────────────┬─genre─────┐
│ Harry Potter and the Half-Blood Prince │ Action    │
│ Harry Potter and the Half-Blood Prince │ Adventure │
│ Harry Potter and the Half-Blood Prince │ Family    │
│ Harry Potter and the Half-Blood Prince │ Fantasy   │
│ Harry Potter and the Half-Blood Prince │ Thriller  │
│ DragonBall Z                           │ Action    │
│ DragonBall Z                           │ Sci-Fi    │
│ DragonBall Z                           │ Fantasy   │
│ DragonBall Z                           │ Comedy    │
│ DragonBall Z                           │ Adventure │
└────────────────────────────────────────┴───────────┘

10 rows in set. Elapsed: 0.441 sec. Processed 783.39 thousand rows, 21.50 MB (1.78 million rows/s., 48.78 MB/s.)

An alternative syntax for Cross Join specifies multiple tables in the from clause separated by commas.

ClickHouse is rewriting a Cross Join to an Inner Join if there are joining expressions in the where section of the query.

We can check that for the example query via EXPLAIN SYNTAX (that returns the syntactically optimized version into which a query gets rewritten before being executed):

EXPLAIN SYNTAX
SELECT
    m.name AS name,
    g.genre AS genre
FROM movies AS m
CROSS JOIN genres AS g
WHERE m.id = g.movie_id
ORDER BY
    m.year DESC,
    m.name ASC,
    g.genre ASC
LIMIT 10;

┌─explain─────────────────────────────────────┐
│ SELECT                                      │
│     name AS name,                           │
│     genre AS genre                          │
│ FROM movies AS m                            │
│ ALL INNER JOIN genres AS g ON id = movie_id │
│ WHERE id = movie_id                         │
│ ORDER BY                                    │
│     year DESC,                              │
│     name ASC,                               │
│     genre ASC                               │
│ LIMIT 10                                    │
└─────────────────────────────────────────────┘

11 rows in set. Elapsed: 0.077 sec.

The Inner Join clause in the syntactically optimized Cross Join query version contains the all keyword, that got explicitly added in order to keep the cartesian product semantics of the Cross Join even when being rewritten into an Inner Join, for which the cartesian product can be disabled.

And because, as mentioned above, the Outer keyword can be omitted for a Right Outer Join, and the optional all keyword can be added. You can write All Right Join and it will work all right.

Left/Right Semi Join

A Left Semi Join query returns column values for each row from the left table that has at least one join key match in the right table. Only the first found match is returned (the cartesian product is disabled).

A Right Semi Join query is similar and returns values for all rows from the right table with at least one match in the left table, but only the first found match is returned.

This query finds all actors/actresses that performed in a movie in 2023. Note that with a normal (Inner) join, the same actor/actress would show up more than one time if they had more than one role in 2023:

SELECT
    a.first_name,
    a.last_name
FROM actors AS a
LEFT SEMI JOIN roles AS r ON a.id = r.actor_id
WHERE toYear(created_at) = '2023'
ORDER BY id ASC
LIMIT 10;

┌─first_name─┬─last_name──────────────┐
│ Michael    │ 'babeepower' Viera     │
│ Eloy       │ 'Chincheta'            │
│ Dieguito   │ 'El Cigala'            │
│ Antonio    │ 'El de Chipiona'       │
│ José       │ 'El Francés'           │
│ Félix      │ 'El Gato'              │
│ Marcial    │ 'El Jalisco'           │
│ José       │ 'El Morito'            │
│ Francisco  │ 'El Niño de la Manola' │
│ Víctor     │ 'El Payaso'            │
└────────────┴────────────────────────┘

10 rows in set. Elapsed: 0.151 sec. Processed 4.25 million rows, 56.23 MB (28.07 million rows/s., 371.48 MB/s.)

Left/Right Anti Join

A Left Anti Join returns column values for all non-matching rows from the left table.

Similarly, the Right Anti Join returns column values for all non-matching right table rows.

An alternative formulation of our previous outer join example query is using an anti join for finding movies that have no genre in the dataset:

SELECT m.name
FROM movies AS m
LEFT ANTI JOIN genres AS g ON m.id = g.movie_id
ORDER BY
    year DESC,
    name ASC
LIMIT 10;

┌─name──────────────────────────────────────┐
│ """Pacific War, The"""                    │
│ """Turin 2006: XX Olympic Winter Games""" │
│ Arthur, the Movie                         │
│ Bridge to Terabithia                      │
│ Mars in Aries                             │
│ Master of Space and Time                  │
│ Ninth Life of Louis Drax, The             │
│ Paradox                                   │
│ Ratatouille                               │
│ """American Dad"""                        │
└───────────────────────────────────────────┘

10 rows in set. Elapsed: 0.077 sec. Processed 783.39 thousand rows, 15.42 MB (10.18 million rows/s., 200.47 MB/s.)

Left/Right/Inner Any Join

Left/Right/Inner Any Join

A Left Any Join is the combination of the Left Outer Join + the Left Semi Join, meaning that ClickHouse returns column values for each row from the left table, either combined with the column values of a matching row from the right table or combined with default column values for the right table, in case no match exists. If a row from the left table has more than one match in the right table, ClickHouse only returns the combined column values from the first found match (the cartesian product is disabled).

Similarly, the Right Any Join is the combination of the Right Outer Join + the Right Semi Join.

And the Inner Any Join is the Inner Join with a disabled cartesian product.

We demonstrate the Left Any Join with an abstract example using two temporary tables (left_table and right_table) constructed with the values table function:

WITH
    left_table AS (SELECT * FROM VALUES('c UInt32', 1, 2, 3)),
    right_table AS (SELECT * FROM VALUES('c UInt32', 2, 2, 3, 3, 4))
SELECT
    l.c AS l_c,
    r.c AS r_c
FROM left_table AS l
LEFT ANY JOIN right_table AS r ON l.c = r.c;

┌─l_c─┬─r_c─┐
│   1 │   0 │
│   2 │   2 │
│   3 │   3 │
└─────┴─────┘

3 rows in set. Elapsed: 0.002 sec.

This is the same query using a Right Any Join:

WITH
    left_table AS (SELECT * FROM VALUES('c UInt32', 1, 2, 3)),
    right_table AS (SELECT * FROM VALUES('c UInt32', 2, 2, 3, 3, 4))
SELECT
    l.c AS l_c,
    r.c AS r_c
FROM left_table AS l
RIGHT ANY JOIN right_table AS r ON l.c = r.c;

┌─l_c─┬─r_c─┐
│   2 │   2 │
│   2 │   2 │
│   3 │   3 │
│   3 │   3 │
│   0 │   4 │
└─────┴─────┘

5 rows in set. Elapsed: 0.002 sec.

This is the query with an Inner Any Join:

WITH
    left_table AS (SELECT * FROM VALUES('c UInt32', 1, 2, 3)),
    right_table AS (SELECT * FROM VALUES('c UInt32', 2, 2, 3, 3, 4))
SELECT
    l.c AS l_c,
    r.c AS r_c
FROM left_table AS l
INNER ANY JOIN right_table AS r ON l.c = r.c;

┌─l_c─┬─r_c─┐
│   2 │   2 │
│   3 │   3 │
└─────┴─────┘

2 rows in set. Elapsed: 0.002 sec.

ASOF Join 

The ASOF Join, implemented for ClickHouse in 2019 by Martijn Bakker and Artem Zuikov, provides non-exact matching capabilities. If a row from the left table doesn’t have an exact match in the right table, then the closest matching row from the right table is used as a match instead.

This is particularly useful for time-series analytics and can drastically reduce query complexity.

We will do time-series analytics of stock market data as an example. A quotes table contains stock symbol quotes based on specific times of the day. The price is updated every 10 seconds in our example data. A trades table lists symbol trades – a specific volume of a symbol got bought at a specific time:

Quotes and Trades tables

In order to calculate the concrete cost of each trade, we need to match the trades with their closest quote time.

This is easy and compact with the ASOF Join, where we use the ON clause for specifying an exact match condition and the AND clause for specifying the closest match condition — we are looking for the closest row from the quotes table exactly or before the date of a trade:

SELECT
    t.symbol,
    t.volume,
    t.time AS trade_time,
    q.time AS closest_quote_time,
    q.price AS quote_price,
    t.volume * q.price AS final_price
FROM trades t
ASOF LEFT JOIN quotes q ON t.symbol = q.symbol AND t.time >= q.time
FORMAT Vertical;

Row 1:
──────
symbol:             ABC
volume:             200
trade_time:         2023-02-22 14:09:05
closest_quote_time: 2023-02-22 14:09:00
quote_price:        32.11
final_price:        6422

Row 2:
──────
symbol:             ABC
volume:             300
trade_time:         2023-02-22 14:09:28
closest_quote_time: 2023-02-22 14:09:20
quote_price:        32.15
final_price:        9645

2 rows in set. Elapsed: 0.003 sec.

Note that the ON clause of the ASOF Join is required and specifies an exact match condition next to the non-exact match condition of the AND clause.

ClickHouse currently doesn’t support (yet) joins without any part of the join keys performing strict matching.

Summary

This blog post showed how ClickHouse supports all standard SQL Join types, plus specialized joins to power analytical queries. We described and demonstrated all supported join types.



Source link

ShareSendTweet
Previous Post

Incredibly frustrating yet incredibly fun

Next Post

Street Fighter 6 Collector’s Edition Is Up for Preorder

Related Posts

DevOps in Legacy Systems – DZone

June 6, 2023
0
0
DevOps in Legacy Systems – DZone
Software Development

I had a discussion with one of my then-manager-colleagues about ensuring the movement of a planned item even if the...

Read more

CPU vs. GPU Intensive Applications

June 6, 2023
0
0
CPU vs. GPU Intensive Applications
Software Development

CPU vs. GPU Intensive Applications Computing has become increasingly important in our daily lives and society as a whole. While...

Read more
Next Post

Street Fighter 6 Collector's Edition Is Up for Preorder

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

© 2021 GetUpdated – MW.

  • About
  • Advertise
  • Privacy & Policy
  • Terms & Conditions
  • Contact

No Result
View All Result
  • Game Updates
  • Mobile Gaming
  • Playstation News
  • Xbox News
  • Switch News
  • MMORPG
  • Game News
  • IGN
  • Retro Gaming
  • Tech News
  • Apple Updates
  • Jailbreak News
  • Mobile News
  • Software Development
  • Photography
  • Contact

Welcome Back!

Login to your account below

Forgotten Password? Sign Up

Create New Account!

Fill the forms bellow to register

All fields are required. Log In

Retrieve your password

Please enter your username or email address to reset your password.

Log In
Are you sure want to unlock this post?
Unlock left : 0
Are you sure want to cancel subscription?