Fixing an ActiveRecord join constraint bug
This is a more technical dive into an issue I fixed in Rails. If you want a less technical/more personal blog, check out that blog.
Background:โ
All code examples and links are from rails Rails 7.2.1. This was resolved in Rails 7.2.2.
Glossary:
- Association - A high level declaration of how two models are related in Rails.
- Reflection - A detailed representation of how an association is generated in Rails.
- Reflection Chain - A list of reflections that connect two models. Used by Rails to generate SQL joins.
ActiveRecord builds a reflection chain that connects the parent model to it's joined association.
Given the following class:
class Parent < ActiveRecord::Base
has_one :child
has_one :grandchild, through: :child
end
Parent.joins(:child)
has a reflection chain of Parent -> Child
and Parent.joins(:grandchild)
has a reflection chain of Parent -> Child -> Grandchild
.
Problem:โ
.left_outer_joins()
resulted in incorrect SQL joins when a child association shared the same parent association but not entire association ancestry.
I found this problem while working on our internal record filtering gem. A simplified overview of our set up is below.
class PurchaseOrder < ActiveRecord::Base
belongs_to :created_by, class_name: "User"
has_one :created_by_contact, class_name: "Contact", through: :created_by, source: :scoped_contact
belongs_to :approved_by, class_name: "User"
has_one :approved_by_contact, class_name: "Contact", through: :approved_by, source: :scoped_contact
end
class User < ActiveRecord::Base
has_many :account_memberships, inverse_of: :user, dependent: :destroy
has_one :scoped_contact, class_name: "Contact", through: :scoped_account_membership, source: :contact
end
class AccountMembership < ActiveRecord::Base
belongs_to :contact
end
When we perform an inner join we get:
-- PurchaseOrder.joins(:created_by_contact, :approved_by_contact).to_sql
SELECT
"purchase_orders".*
FROM
"purchase_orders"
INNER JOIN "users" ON "users"."id" = "purchase_orders"."created_by_id"
INNER JOIN "account_memberships" ON "account_memberships"."user_id" = "users"."id"
INNER JOIN "contacts" ON "contacts"."id" = "account_memberships"."contact_id"
INNER JOIN "users" "approved_bies_purchase_orders_join" ON "approved_bies_purchase_orders_join"."id" = "purchase_orders"."approved_by_id"
INNER JOIN "account_memberships" "scoped_account_memberships_purchase_orders_join" ON "scoped_account_memberships_purchase_orders_join"."user_id" = "approved_bies_purchase_orders_join"."id"
INNER JOIN "contacts" "approved_by_contacts_purchase_orders" ON "approved_by_contacts_purchase_orders"."id" = "scoped_account_memberships_purchase_orders_join"."contact_id"
We see that we correctly join "purchase_orders"."created_by_id"
to contacts
and "purchase_orders"."approved_by_id"
to approved_by_contacts_purchase_orders
(an alias for contacts
).
However, if we want to use .left_joins
we see a problem.
-- PurchaseOrder.left_joins(:created_by_contact, :approved_by_contact)
SELECT
"purchase_orders".*
FROM
"purchase_orders"
LEFT OUTER JOIN "users" ON "users"."id" = "purchase_orders"."created_by_id"
LEFT OUTER JOIN "account_memberships" ON "account_memberships"."user_id" = "users"."id"
LEFT OUTER JOIN "contacts" ON "contacts"."id" = "account_memberships"."contact_id"
LEFT OUTER JOIN "contacts" "approved_by_contacts_purchase_orders" ON "approved_by_contacts_purchase_orders"."id" = "account_memberships"."contact_id"
We correctly join "purchase_orders"."created_by_id"
to contacts
but "purchase_orders"."created_by_id"
is also joined to approved_by_contacts_purchase_orders
.
We are missing the join to the approved_by_contact
๐ฑ
Explanation of Problem:โ
Given the following code:
PurchaseOrder.left_joins(:created_by_contact, :approved_by_contact)
The created_by_contact
association with the PurchaseOrder
model has a reflection chain of PurchaseOrder(created_by) -> User -> AccountMembership -> Contact
.
When left joining on created_by_contact
, the make_constraints
method caches the reflection chain at each level. The cache key is the last reflection item of the reflection chain.
The following cache is generated for the created_by_contact
reflection chain.
{
Contact: [PurchaseOrder(created_by), User, AccountMembership, Contact],
AccountMembership: [PurchaseOrder(created_by), User, AccountMembership],
User: [PurchaseOrder(created_by), User],
PurchaseOrder(created_by): [PurchaseOrder(created_by)]
}
Since created_by_contact
is the first association joined, every reflection chain item is added to the cache.
The approved_by_contact
association with the PurchaseOrder
model has a reflection chain of PurchaseOrder(approved_by) -> User -> AccountMembership -> Contact
.
When we add :approved_by_contact
to the .left_joins()
method, the make_constraints
method checks the cache and finds a match on the Contact
reflection key for the value of [PurchaseOrder(created_by), User, AccountMembership, Contact]
and stops building any further join constraints.
Solution:โ
Now that I know the problem, I can fix it. The make_constraints
method didn't have enough information to discern if the reflection chain was a full or partial match.
I updated the code within the make_constraints
method and passed the entire reflection chain to use as the cache key.
This allowed make_constraints
to successfully match on when the reflection chain was the same but not match in my case when the reflection chain was different.
I created a PR to Rails with my fix and it was merged after three days! Two months later, it was released as part of Rails v7.2.2.