SQL joins with Ecto

A review of SQL joins and how to do them in Ecto

Going back to basics

As back-end developers, SQL and relational databases are often a significant part of our daily workflows. Whilst I think that learning new things is an important part of being a developer, I find it is also quite useful to go back to basics once in a while. This is why in this article I am going to do both and we'll review the basics of SQL joins but we'll query the database using Ecto, a database wrapper for Elixir.

Because we are going to need some test data to play with, we'll also set up a sample Elixir project, add Ecto to it and configure it to connect to a local instance of Postgres.

What are joins?

In relational databases data is stored in tables. Joins allow us to combine rows from two or more tables in order to answer questions that require data from multiple sources. A somewhat classic example might be a customers table containing customer data and an orders table containing order details. If we wanted to display a list of orders with corresponding customer names we'd need to join the two tables together.

What is Ecto?

Ecto is a database wrapper library for the Elixir programming language. It allows us to access relational databases from Elixir applications without having to write plain SQL. It provides a feature-rich and readable query DSL which covers the vast majority of queries one might typically need. On top of queries, Ecto provides numerous other features, such as Schemas, which we are deliberately not going to use here to keep things as simple as possible.

Sample project

This and the following section include steps required to set up a project using Ecto and populate the database with some sample data. If you don't want to run code examples on your machine have a brief look at our sample data below and feel free to jump ahead to joins.

In order to let us play with Ecto in the IEx console we'll set up a sample project and add Ecto as a dependency. We'll use Elixir 1.7.3 and PostgreSQL which is a popular open-source relational database. Create a new Elixir project by running:

$ mix new joins --sup
$ cd joins

Now, open up mix.exs and add Ecto as a dependency:

# mix.exs

defp deps do
  [
    {:ecto, "~> 2.0"},
    {:postgrex, "~> 0.11"}
  ]
end

We have added Ecto and Postgrex, a PostgreSQL driver that Ecto uses to speak to PostgreSQL databases. We can go ahead and fetch them now.

$ mix deps.get

In Ecto, all communication between our application and the database happens through a module called Repo. Fortunately, Ecto comes with a mix task which will generate the repo for us. Let's go ahead and run the following:

$ mix ecto.gen.repo -r Joins.Repo

The output includes the following lines:

* creating lib/joins
* creating lib/joins/repo.ex
* updating config/config.exs

Not only does it tell us that the repo module has indeed been created for us, it also says that config/config.exs has been updated as well. This file is where we tell Ecto how to connect to the database so let's go ahead and update it now. We need to update the file to tell the app which repo to use and what username and password are needed to connect to the instance of PostgreSQL we are running locally.

# config/config.exs
config :joins, ecto_repos: [Joins.Repo]

config :joins, Joins.Repo,
  adapter: Ecto.Adapters.Postgres,
  database: "joins_repo",
  username: "user",
  password: "pass",
  hostname: "localhost"

The last things that is left is to make sure that the repo is included in our app's supervision tree by updating lib/joins/application.ex to say:

# lib/joins/application.ex

children = [
  Joins.Repo,
]

Now Ecto is configured to connect to the database server. We can verify this by asking Ecto to create the joins_repo database we've just defined in config.exs:

$ mix ecto.create

The database for Joins.Repo has been created

Great!

Sample data

Before we can start joining tables we need to have some tables and data to work with. For the purpose of this article we'll look at the following two tables: countries and coffee_exporters:

countries

id | country        | population
---|----------------|-----------
1  |         Brazil | 209288278
2  |        Vietnam | 95540800
3  | United Kingdom | 66181585
4  |       Colombia | 49065615

coffee_exporters

id | country_id | tons
---|------------|--------
1  |          1 | 2592000
2  |          2 | 1650000
3  |          4 | 810000
4  |          5 | 660000

To get this data into our database we need to create the tables first. We'll do so by generating a migration:

$ mix ecto.gen.migration create_countries_and_coffee_exporters
* creating priv/repo/migrations
* creating priv/repo/migrations/20181007132829_create_countries_and_coffee_exporters.exs

Open the newly created file (your timestamp at the beginning of the file name will be different) and edit is so that it says as follows:

# priv/repo/migrations/20181007132829_create_countries_and_coffee_exporters.exs

defmodule Joins.Repo.Migrations.CreateCountriesAndCoffeeExporters do
  use Ecto.Migration

  def change do
    create table(:countries) do
      add :country, :string
      add :population, :integer
    end

    create table(:coffee_exporters) do
      add :country_id, :integer
      add :tons, :integer
    end
  end
end

Now run the following:

$ mix ecto.migrate

17:24:27.525 [info]  == Running Joins.Repo.Migrations.CreateCountriesAndCoffeeExporters.change/0 forward
17:24:27.526 [info]  create table countries
17:24:27.532 [info]  create table coffee_exporters
17:24:27.537 [info]  == Migrated in 0.0s

As you can see, the tables have been successfully created.

The last step is to put some data in the database. We could do this with plain SQL but, because we are trying to learn Ecto here, we'll use Ecto for inserting data into the database. Open up the interactive console with iex -S mix and try the following:

iex(1)> Joins.Repo.insert_all("countries", [%{country: "Brazil", population: 209288278}, %{country: "Vietnam", population: 95540800}, %{country: "United Kingdom", population: 66181585}, %{country: "Colombia", population: 49065615}])
{4, nil}

iex(2)> Joins.Repo.insert_all("coffee_exporters", [%{country_id: 1, tons: 2592000}, %{country_id: 2, tons: 1650000}, %{country_id: 4, tons: 810000}, %{country_id: 5, tons: 660000}])
{4, nil}

We have successfully inserted our data. We can run a simple query to make sure all works as expected. Let's see all data in our countries table.

iex(3)> require Ecto.Query
Ecto.Query
iex(4)> Ecto.Query.from("countries", select: [:id,  :country, :population]) |>
...(4)> Joins.Repo.all()
[
  %{country: "Brazil", id: 1, population: 209288278},
  %{country: "Vietnam", id: 2, population: 95540800},
  %{country: "United Kingdom", id: 3, population: 66181585},
  %{country: "Colombia", id: 4, population: 49065615}
]

We got our data back as expected. Now, we can finally look at some joins.

Joins

Let's say we want to find out how many people live in Brazil. This is fairly simple, we need to query our countries table like so:

iex(5)> Ecto.Query.from("countries", select: [:population], where: [country: "Brazil"]) |>
...(5)> Joins.Repo.all()
[%{population: 209288278}]

Answering the question above was fairly simple because all data we needed lives in one table.

Inner join

Now, suppose that we want to find out how much coffee Brazil exports. The answer to this question is located in an entirely different database table. In order to find it we'd need to:

We can do so using a SQL join. We'll join coffee_exporters to countries and we'll use the country_id column as the foreign key. A foreign key means that we expect the countries table to have a record the id of which matches the value stored in the country_id column. We can do it in Ecto like so:

iex(6)> query = Ecto.Query.from c in "countries",
...(6)> join: ce in "coffee_exporters",
...(6)> on: ce.country_id == c.id,
...(6)> where: c.country == "Brazil",
...(6)> select: %{tons_of_coffee_exported: ce.tons}
iex(7)> Joins.Repo.one(query)
%{tons_of_coffee_exported: 2592000}

And we got our answer! If you think of two database tables as sets, an inner join can be thought of as the intersection of those two sets - it only returns those records which have been successfully joined and therefore exist in both tables. Some like to represent this as the following Venn diagram:

Venn diagram - inner join

In Ecto, an inner join can be achieved by using join and on clauses, as shown above. Note that, compared to our previous queries, we had to use in expressions to give our database tables aliases which we then used in where and select clauses, e.g. c.country or ce.tons. If you are familiar with SQL you'll immediately notice how similar they are to AS clauses.

Left join

We said that an inner join includes only those records which have been successfully joined, any records which don't match the on clause are ignored. A left join is different - it will always include the first database table (i.e. the left one), even if the other table doesn't have a corresponding row. In terms of Venn diagrams, it could be thought of as shown below.

Venn diagram - left join

Let's look at an example. You might have already noticed that the United Kingdom doesn't have a corresponding record in the coffee_exporters table. Let's see what would happen if we tried to see how much coffee it exports using a regular join. This time we'll also include the country's name in the results.

iex(8)> query = Ecto.Query.from c in "countries",
...(8)> join: ce in "coffee_exporters",
...(8)> on: ce.country_id == c.id,
...(8)> where: c.country == "United Kingdom",
...(8)> select: %{country: c.country, tons_of_coffee_exported: ce.tons}
iex(9)> Joins.Repo.all(query)
[]

We got an empty array back! This is because there aren't any results. Like we've just discussed, the UK doesn't have a matching record in the coffee_exporters table and it gets ignored by a regular (inner) join.

Let's see what happens when we use a left join instead.

iex(10)> query = Ecto.Query.from c in "countries",
...(10)> left_join: ce in "coffee_exporters",
...(10)> on: ce.country_id == c.id,
...(10)> where: c.country == "United Kingdom",
...(10)> select: %{country: c.country, tons_of_coffee_exported: ce.tons}
iex(11)> Joins.Repo.one(query)
%{country: "United Kingdom", tons_of_coffee_exported: nil}

Now the UK was returned! All we had to do was use a left_join clause instead of join.

When can a left join be useful? Imagine a country can only be considered a coffee exporter if it exists in the coffee_exporters table. We could then ask a question: which countries are not coffee exporters? The following query would give us an answer:

iex(12)> query = Ecto.Query.from c in "countries",
...(12)> left_join: ce in "coffee_exporters",
...(12)> on: ce.country_id == c.id,
...(12)> where: is_nil(ce.tons),
...(12)> select: %{country: c.country}
iex(13)> Joins.Repo.all(query)
[%{country: "United Kingdom"}]

Here we have changed the where clause of our query to say is_nil(ce.tons). By selecting only those countries with NULL in the tons column we correctly get just the UK back.

Right join

A right join is essentially a left join reversed - it is the records from the second (right) table that always make it into the final result, even if they don't have corresponding records in the first table.

Venn diagram - right join

There is a row in the coffee_exporters table that doesn't have a corresponding country in the countries table. We can find out which row it is using the following query. Notice how we are using right_join now.

iex(14)> query = Ecto.Query.from c in "countries",
...(14)> right_join: ce in "coffee_exporters",
...(14)> on: ce.country_id == c.id,
...(14)> where: is_nil(c.id),
...(14)> select: %{country_id: ce.country_id, tons_of_coffee_exported: ce.tons}
iex(15)> Joins.Repo.all(query)
[%{country_id: 5, tons_of_coffee_exported: 660000}]

And we can immediately see that the row that is missing a country is the one with country_id set to 5 (if you are curious the country that is missing is actually Indonesia).

Left and right joins are so similar I consider the right join to be a mirror reflection of the left join. The question above could actually be answered with a left join if we changed the order of tables in the query.

iex(16)> query = Ecto.Query.from ce in "coffee_exporters",
...(16)> left_join: c in "countries",
...(16)> on: ce.country_id == c.id,
...(16)> where: is_nil(c.id),
...(16)> select: %{country_id: ce.country_id, tons_of_coffee_exported: ce.tons}
iex(17)> Joins.Repo.all(query)
[%{country_id: 5, tons_of_coffee_exported: 660000}]

Full join

A full join is a combination of a left join and a right join, it includes all records regardless of whether they have a matching row in the other table or not. Imagine we want to find all table entries that don't have a matching row, regardless of which table they come from. The following does the trick.

iex(18)> query = Ecto.Query.from c in "countries",
...(18)> full_join: ce in "coffee_exporters",
...(18)> on: ce.country_id == c.id,
...(18)> where: is_nil(c.id) or is_nil(ce.country_id),
...(18)> select: %{country: c.country, country_id: ce.country_id}
iex(19)> Joins.Repo.all(query)
[%{country: nil, country_id: 5}, %{country: "United Kingdom", country_id: nil}]

Here we used full_join and tweaked the where clause to include both conditions we saw in previous queries. We got two results back: The UK from the left join example and the orphaned record from the right join example.

For completeness, let`s have a look at a Venn diagram for a full join: it includes all matching records from both tables.

Venn diagram - full join

Cross join

The last kind of a join we are going to look at today is the cross join, also known as a cartesian product. A cross join combines each row from table A with each row from table B returning all possible combinations (which we can then naturally filter with where clauses).

Imagine we want to find all countries which are more populous than the UK and we want to do so in one database query. In order to do so we'll us a cross join to join the countries table to itself:

iex(20)> query = Ecto.Query.from c1 in "countries",
...(20)> cross_join: c2 in "countries",
...(20)> where: c1.country == "United Kingdom" and c2.population > c1.population,
...(20)> select: %{country: c2.country}
iex(21)> Joins.Repo.all(query)
[%{country: "Brazil"}, %{country: "Vietnam"}]

Brazil and Vietnam are more populous than the UK. Looking at our original sample data it makes sense.

The essential thing to remember about the cross join is that it returns all possible combinations of rows from the two tables. When we apply where queries to it we start eliminating some row combinations and the cross join effectively becomes a regular join. Consider the following solution to the problem:

iex(22)> query = Ecto.Query.from c1 in "countries",
...(22)> join: c2 in "countries",
...(22)> on: c1.country == "United Kingdom" and c2.population > c1.population,
...(22)> select: %{country: c2.country}
iex(23)> Joins.Repo.all(query)
[%{country: "Brazil"}, %{country: "Vietnam"}]

We managed to obtain an identical result by using a regular join instead of a cross join and moving our where constraints to the on clause.

Summary

In this article we've reviewed SQL joins and practiced them using Ecto. We saw how Ecto's expressive syntax makes joins easy and gives us flexibility which is on par with using plain SQL. We also needed a sample app with some data so we have had a look at configuring Ecto for a brand new Elixir project as well.

We have barely scratched the surface of Ecto here. It includes many other tools which we could have used in the examples above, e.g. schemas or associations, but I've left them out for brevity. I highly recommend looking at Ecto's excellent documentation to find out what else is available.