Organizing complex SQL in Rails

October 6, 2016
Nowadays, Rails is the most popular framework for bootstrapping web services.

All services need to read and write their data somehow. In other words, they need to work with a database. Rails does this job really well in a modern and simple way.

As you may know, the framework brings you ActiveRecord (AR) which is find really useful. I think that AR is the most successful part of Rails despite many articles criticising its design and many other problems.

But these architectural issues are not the focus of this article. Let's talk about organizing SQL queries in Rails.

Scopes, relations, Arel simplify building general purpose SQL queries like select * from users where id = ?. You just need to break queries into logical blocks, organize scopes and combine them. But what if we have SQL-heavy queries with a lot of statements and even with CTE? Breaking such SQL queries into scopes is not a simple task and often it's not the best solution. After analyzing many of our projects we have collected the most popular approaches that help us solve many problems when we dive into the SQL-Hell. Let's talk about them.
Approach 1: Stick with Arel
Many people try to avoid writing plain SQL and stick with Arel. Some Arel extensions such as postgres_ext allow you to use advanced database features in Arel.

Arel is a great tool but only for small or mid-size queries. When we talk about queries with simple filters it's not a problem to break them and replace with several lines of Arel instructions.
  scope :pending,  -> { 
    t = QuovoLogin.arel_table
    where(
      t[:status].in(%w(login questions config syncing)).
        or(t[:status].eq(nil)).
        or(t[:in_progress].eq(true)).
        or(t[:migration_status].in(%w(update link))).
        or(t[:require_portfolio_link].eq(true))
    )
  }
But what if we talk about complex queries with grouping and aggregates? Sometimes Arel converts them into totally unreadable code.

Compare those two snippets:

Using Arel:
def allocated_in(allocation_keys)
  t = HoldingAllocation.arel_table

  allocations = HoldingAllocation.
    where(t[:kind].eq('class')).
    where(t[:holding_id].eq(Holding.arel_table[:id]))

  Holding.where(
            Arel::Nodes::Equality.new(
              Arel::Nodes::Grouping.new(allocations.select('count(*)').ast), 
              1
            )
          ).where(
            allocations.where(t[:key].in(allocation_keys)).exists
          ).to_a
end
Using SQL:
def allocated_in(allocation_keys)
  Holding.find_by_sql([<<-SQL, keys: allocation_keys])
    WITH allocated AS (
      SELECT holding_id FROM holding_allocations
      WHERE key IN :keys
        AND kind = 'class'
      GROUP by holding_id HAVING count(*) = 1
    )
    SELECT * FROM holdings
    WHERE id IN (SELECT * FROM allocated)
  SQL
end
Arel allows us to keep complex queries flexible and database-agnostic, but sometimes the cost of this is totally unreadable code that is hard to support. Such Arel queries are also hard to debug since you don't see final SQL in code – you need to search for it in logs or use the Rails console for SQL debugging. That is also not a perfect option.

Before you stick with Arel I strongly recommend considering beautiful plain SQL.
Approach 2: Plain SQL
If you keep SQL lines monolith and you don't need to combine them in different ways, then Plain SQL is a great option for you. In the real world, you can use this approach for pdf/csv/xls export:
def export
  sql = <<-SQL
    WITH balances AS (
      SELECT sum(current_value) AS value, COUNT(*) AS holdings_count, parent_account_id AS id
      FROM holdings
      WHERE parent_account_id IS NOT NULL
      GROUP BY parent_account_id
    ),
    owners AS (
      SELECT id, email, concat(users.first_name, ' ', users.last_name) AS fullname
      FROM users
    )
    SELECT accounts.name, owners.email, owners.fullname, balances.holdings_count, balances.value
    FROM accounts
    INNER JOIN balances ON balances.id = accounts.id
    INNER JOIN owners ON owners.id = accounts.user_id
    ORDER BY balances.value DESC
  SQL
  ActiveRecord::Base.connection.select_all(sql).rows.map do |account, email, fullname, holdings, value|
    [account, email, fullname, Integer(holdings), Float(value)]
  end
end
In most cases, such exports are not flexible. There are no user-defined filters so you don't need to pass a lot of options from Rails to SQL. In these cases, my recommendation is to keep SQL as it is, without converting to Arel.

It's easy to understand what's going on without multiple switching between logs ↔︎ pg ↔︎ code. It's easy to debug: just copy SQL from code and paste it into the pg console, and replace the placeholders with some values.
Note: Since you don't use Arel here, pay attention to options that allow you pass to the database in cases when you use string interpolation you need to sanitize them manually.
Approach 3: SQL Views
You can go deeper and replace plain SQL query with SQL view:
CREATE VIEW account_balances AS (
  SELECT sum(current_value) AS balance, parent_account_id AS account_id
  FROM holdings
  WHERE parent_account_id IS NOT NULL
  GROUP BY parent_account_id
)
It's a perfect option when you need to reference to aggregated view in different reports or implement full-text search across multiple tables.

SQL views have several benefits:
  • Replacing static SQL to database allows to keep the application code light.
  • Views open the door to advanced caching techniques such as materialized views that could drastically improve app performance.
  • You can join views in other SQL queries and reduce code duplication.

For ActiveRecord, there is no difference between an SQL table and a view so you can easily bind an AR model to the view and add extra scopes to the existing SQL view.
class AccountBalance < ActiveRecord::Base
  self.primary_key = 'account_id'
  scope :positive, -> { where('balance > 0') }
end
However, it's not easy to manage SQL views using Rails migrations but, the Thoughtbot scenic Gem simplifies this process.
Approach 4: In-place scopes
Sometimes you need to operate with totally flexible SQL queries. We already know that Arel is a key to query flexibility. But what if we also need to share Arel parts between several reports? As an option, we can use AR scopes chaining.

However, placing all such scopes in the model may bloat your model and make it hard to read. Using Rails concerns may solve a problem with bloating but what if you need this bunch of scopes only in a few specific places relevant to reports application code?

In this case, I recommend using ActiveRecord extending method. It allows you to organize scopes in modules and connect them when you need.

You can isolate scopes with reports from the rest of application code:
module Allocation::HoldingExt
  def allocated_in(keys)
    # arel code that returns relation
  end
end

scope = Holding.all.extending(Allocation::HoldingExt)
scope.allocated_in(%w(stocks bonds))
Conclusion
I described the four methods that we use everyday in Castle projects. It does not mean that in one project we use only one way. Sometimes we mix them for better performance/readability. In some projects, we have many plain, SQL queries that work perfectly.

Despite that fact, we still love Rails Active Record (at least the database queries part). It's not perfect and in some moments it's too overloaded but it's still the most useful part of Rails.