I was poking around in a Rails console, testing whether all the different calling styles for AR group behave identically under the hood and produce the same SQL. Mostly they do, but there is one to watch out for...
Setup with example models
Imagine a typical app scenario where we have three models and their associations
class User < ApplicationRecord
belongs_to :country
belongs_to :tenant
end
class Country < ApplicationRecord
has_many :users
end
class Tenant < ApplicationRecord
has_many :users
end
GROUP & COUNT
To quickly get the number of users per tenant, we could use something like the following
User.group(:tenant_id).count # { 1 => 10, 2 => 40 }Results in a Hash with Tenant IDs for keys, and the corresponding User count. 👍
Let's now imagine our query requires multi-column grouping. For example, if we wanted to query the number of users per tenant/country combination:
User.group(:tenant_id, :country_id).count
# { [1, "UK"] => 5, [1, "FR"] => 5, [2, "FR"] => 40 } ....
Again, results in a Hash that has an array of [Tenant ID, Country ID] combination for keys, and the corresponding User count. Tenant ID 1 has 5 Users from the UK and 5 Users from FR. 👍
While all the above examples are correct, the group method accepts multiple argument formats. All the following will execute. However, one of those should be avoided as it produces a different result. Can you guess which one?
User.group(:tenant_id, :country_id).count # 1
User.group(:tenant_id).group(:country_id).count # 2
User.group("tenant_id, country_id").count # 3
User.group("tenant_id", "country_id").count # 4
The right ways ✅
User.group(:tenant_id, :country_id).count # 1
User.group(:tenant_id).group(:country_id).count # 2
User.group("tenant_id", "country_id").count # 4
# => { [1, "UK"] => 5, [1, "FR"] => 5, [2, "FR"] => 40 }
SQL:
SELECT COUNT(*) AS "count_all",
"users"."tenant_id" AS "users_tenant_id",
"users"."country_id" AS "users_country_id"
FROM "users"
GROUP BY "users"."tenant_id", "users"."country_id"
The one to avoid ⛔️
The one to avoid is # 3, which uses a single string for multiple columns.
User.group("tenant_id, country_id").count # 3
# => { "UK" => 5, "FR" => 40 }SQL:
SELECT COUNT(*) AS "count_all",
tenant_id, country_id AS "tenant_id_country_id"
FROM "users"
GROUP BY tenant_id, country_id
Notice how ActiveRecord creates a single alias tenant_id_country_id for what it interprets as one expression. The database still groups correctly, but AR only uses the last column's value (country_id) as the hash key, causing rows with different tenant_id but the same country_id to overwrite each other. So even if you were to sum the values with wrongly_grouped_users.values.sum, the output would be wrong for any legitimate use case.
Mostly a note to self, but hopefully interesting to you too!