MySQL :: MySQL 5.1 with Maria Reference Manual :: 3.6.4 The Rows Holding the Group-wise Maximum of a Certain Column

MySQL 5.1 with Maria Reference Manual :: 3 Tutorial :: 3.6 Examples of Common Queries :: 3.6.4 The Rows Holding the Group-wise Maximum of a Certain Column

« 3.6.3 Maximum of Column per Group

3.6.5 Using User-Defined Variables »

Section Navigation [Toggle]

3.6 Examples of Common Queries
3.6.1 The Maximum Value for a Column
3.6.2 The Row Holding the Maximum of a Certain Column
3.6.3 Maximum of Column per Group
3.6.4 The Rows Holding the Group-wise Maximum of a Certain Column
3.6.5 Using User-Defined Variables
3.6.6 Using Foreign Keys
3.6.7 Searching on Two Keys
3.6.8 Calculating Visits Per Day
3.6.9 Using AUTO_INCREMENT

3.6.4. The Rows Holding the Group-wise Maximum of a Certain Column

Task: For each article, find the dealer or dealers with the most expensive price.

This problem can be solved with a subquery like this one:

SELECT article, dealer, price
FROM   shop s1
WHERE  price=(SELECT MAX(s2.price)
              FROM shop s2
              WHERE s1.article = s2.article);

+---------+--------+-------+
| article | dealer | price |
+---------+--------+-------+
|    0001 | B      |  3.99 |
|    0002 | A      | 10.99 |
|    0003 | C      |  1.69 |
|    0004 | D      | 19.95 |
+---------+--------+-------+

The preceding example uses a correlated subquery, which can be inefficient (see Section 12.2.9.7, “Correlated Subqueries”). Other possibilities for solving the problem are to use an uncorrelated subquery in the FROM clause or a LEFT JOIN:

SELECT s1.article, dealer, s1.price
FROM shop s1
JOIN (
  SELECT article, MAX(price) AS price
  FROM shop
  GROUP BY article) AS s2
  ON s1.article = s2.article AND s1.price = s2.price;

SELECT s1.article, s1.dealer, s1.price
FROM shop s1
LEFT JOIN shop s2 ON s1.article = s2.article AND s1.price < s2.price
WHERE s2.article IS NULL;

The LEFT JOIN works on the basis that when s1.price is at its maximum value, there is no s2.price with a greater value and the s2 rows values will be NULL. See Section 12.2.8.1, “JOIN Syntax”.

Previous / Next / Up / Table of Contents

User Comments

Posted by Kasey Speakman on October 3 2009 10:49pm

[Delete] [Edit]

The examples here by MySQL are not that helpful if you are working with temporary tables and need to add conditions to the query... like prices only from this year, for instance.

The first example falls down with temporary tables because you can't select from the same temp table twice in the same query (get "can't reopen table" error). It also doesn't give you just distinct articles, if that's what you were after. It will show the same article twice in the case where you have 2 articles from different dealers for the same price.

The second example is both difficult to understand and cumbersome to manage if you have extra query conditions (i.e. don't want to select the whole table). With large tables, you will end up needing to repeat query conditions in inner and outer queries (or at least in the LEFT JOIN ON part) for query speed.

I found another method for group-wise max that is a bit more straightforward and less duplicating of conditions:

SELECT *
FROM (SELECT *
FROM shop
[WHERE conditions]
ORDER BY price DESC) AS s
GROUP BY article

The inner query orders all records by highest price. The outer query uses a group by, which simply grabs the first distinct article that it finds. In this case the first one it finds will be the highest one since we ordered them that way.

This query is probably not as efficient as others, but it's straightforward, and it's easier on query developers when they need to use temp tables or needed to query on less than an entire table. It's only slightly less efficient than the queries given by MySQL on this page, in my tests.

Posted by Robin Palotai on October 17 2009 11:28am

[Delete] [Edit]

Speakman: Your solution sounds good, but is it stable?
The manual tells that non-grouped fields have an indeterminate value if not all values of rows are the same.

Right now the value at the first encountered row may be used, but may it happen in the future releases that other random row will be selected? It would be nice to hear an insider's opinion.

Posted by O T on November 19 2009 3:40pm

[Delete] [Edit]

I suggest the following syntax which orders by column1 as usual but returns the corresponding column2 when given:

max( column1 [, column2 ] )

Posted by Rick James on December 7 2009 4:14pm

[Delete] [Edit]

Speakman's solution ("group by trick", as I call it) is stable and far more efficient than the methods given in the article.

This is Order(N*logN); the article is Order(N*N). That is, as the number of rows grows, the article's methods slow down quadratically -- a million rows would require a trillion operations.

Add your own comment.

Top / Previous / Next / Up / Table of Contents