Published: 2011-04-12

Updated: 2011-11-03

Tagged: postgresql, rails, sql, subqueries, web applications

SQL subqueries provide a convenient and elegant way to formulate otherwise rather lengthy SQL expressions. I will show one example in a Ruby on Rails web application in this post.

Using SQL-Subqueries in Web Applications

Performing Business Logic in SQL

Performing involved business logic within relational data-stores has been eschewed in the past by many application developers. Thanks to the spread of ORMs it was even considered bad style to write any handcrafted SQL within certain circles. It seems that this trend has somewhat swapped over into the Ruby on Rails community.

On the other hand, performing computations in the SQL database is in general faster. Moreover, the code tends to be much more concise and clear, at least for more complicated queries.

Opinions rise high in this matter. I am myself not deeply opposed to perform computation in SQL and I do leverage it more and more. It seems that it is becoming more accepted, see e.g. Geoffrey Grosenbach's screencast on PostgreSQL. Active Record itself is an ORM framework that provides relatively direct access to the underlying SQL system. This is possible due to Ruby being a very dynamic language. But it is also designed in this way on purpose:

Some object-relational mappers seek to eliminate the use of SQL entirely, hoping for object-oriented purity by forcing all queries through an OO layer. Active Record does not. It was built on the notion that SQL is neither dirty nor bad, just verbose in the trivial cases.

This quote is given by "David"(!) in Agile Web Development with Rails, 4th edition.

Subqueries

So, what I am going to discuss here is a particular variant of Subqueries, how these can help to formulate concise code, and also how to keep the number of intermediate views in bounds.

A subquery is essentially a separate query that is nested within an outer query. A subquery may or may not be independent from its outer query, and it may return a scalar or even multiple rows. I recommend consulting Learning SQL 2nd edition by Alan Beaulieu; it features as a whole chapter on subqueries. There is also a Subqueries section in PostgreSQL: Introduction and Concepts by Bruce Momjian.

Running Example

I am using code from a real project. I will not give the full schema definition since it would exceed what I am planning to squeeze in this post. The complete code can be retrieved from my github repository. The project is named Surveys. Conveniently, there is a sql-file that contains the schema definition for reference.

We are going to build a list of users of which each

has a designated virtualdesktop but
has not submitted to the currently oven survey.

I presume that the use case is rather clear. In the end, we will have to consult three virtual tables to compute the result. All of the following code is tested on PostgreSQL, it might need adjustment for other systems.

Views only

We contrast the use of subqueries with using traditional views first.

Our first view will lists the ids of all users which own a virtualdesktop. This is exactly the case when there exists a virtualdesktop that has a user_id reference.

-- all users which have a virtual desktop
              CREATE VIEW active_users AS
                select DISTINCT users.id as user_id from users
                JOIN virtualdesktops ON users.id = virtualdesktops.user_id ORDER BY user_id;

A query result might look like:

The next view lists all users that have made a submission to the currently ongoing survey.

-- all users which have submitted to the currently open survey
              CREATE VIEW users_with_current_submissions AS
                select DISTINCT user_id from questionnaires
                JOIN opensurveys
                  ON questionnaires.survey_id = opensurveys.id ORDER BY user_id;

There is only one user that has submitted to the current survey, for example:

 user_id
              ---------
                   123

Next, we build a status table:

-- status
              CREATE VIEW current_submission_status AS
                select active_users.user_id as id, active_users.user_id as user_id, (users_with_current_submissions.user_id is NOT NULL) as submitted from active_users
                LEFT OUTER JOIN users_with_current_submissions
                  ON active_users.user_id = users_with_current_submissions.user_id;

The query result could look like this:

 id  | user_id | submitted
              -----+---------+-----------
               123 |     123 | t
               127 |     127 | f
               128 |     128 | f

Finally, we can construct the desired result by performing an inner join with an additional condition:

CREATE VIEW users_wosubm_and_desk AS
                select users.id, users.uid from users
                JOIN current_submission_status
                  ON users.id = current_submission_status.id AND current_submission_status.sumbitted is false;

The query result doesn't contain the user with the id 123:

 id  |   uid
              -----+----------
               127 | client
               128 | Client-1

In the end, we used three intermediate views. The first view builds on an inner join, the second on an outer join, and the last one on an inner join with some logic. Presumably there should be a more concise way to do this using joins and views. It might be difficult to achieve the same level of elegance that subqueries provide.

Subqueries

Let us get to the final implementation with subqueries right away:

CREATE VIEW users_wosubm_and_desk AS
                select * from users WHERE id NOT IN
                  -- SUBQUERY all users which have submitted to the currently open survey
                  (select DISTINCT user_id from questionnaires JOIN opensurveys ON questionnaires.survey_id = opensurveys.id ORDER BY user_id)
                 AND id IN
                  -- SUBQUERY all users which have a virtual desktop
                  (select DISTINCT users.id as user_id from users JOIN virtualdesktops ON users.id = virtualdesktops.user_id ORDER BY user_id) ;

The following shows a query on the view (some columns omitted):

 id  | is_admin | is_client | is_manager |   uid    |
              -----+----------+-----------+------------+----------+-
               127 | f        | t         | f          | client   |
               128 | f        | t         | f          | Client-1 |

As I mentioned, there are various types of subqueries, which can be constructed. The main use case seems to be to construct and use subqueries that return a particular value. Each of our subqueries (potentially) returns several "rows" and hence they are to be used in conjunction of IN or NOT IN. Note in particular that the use of NOT IN spared us from construction any logic building on functions as in the case with solely using views with joins.

Conclusion

The use of subqueries can result in much more concise and clear code. They can help to avoid spamming the namespace with numerous trivial views.

On the downside, there are no intermediate views that can be tested in isolation. Subqueries that are "to large" might be hard to get right in the first place and costly to maintain.