That Report Isn’t Dense Enough

Recently I have written several SQL reports where there were just not enough rows. No, I wasn’t getting paid by the word, but the reports were misleading. Data is stored as a result of something happening, not when something doesn’t happen. Sometimes those missing rows are very interesting clues to business performance.

Adding missing rows is called data densification. For example, a purchasing report might show item and quantity levels for a given period, and the transactions that affected inventory within that period. If the month end inventory for a normally popular item is sufficient, but sales were low, that would be interesting. However, knowing that for 15 days of the month there were none available to sell which may have caused the low sales would be insightful.

In my most recent need for densification, a monthly sales line graph showed some sales variation throughout the month. However, the graph showed consistent sales.

That sales report is based on a SQL statement similar to this, which returns the total sales for each day:

SELECT TO_CHAR (sales_date, 'Mon DD, YYYY') sales_date,
       SUM (quantity * price) total_sales
  FROM sparse_sales
 WHERE sales_date BETWEEN to_date ('01-JAN-09', 'DD-MON-YY') AND to_date ('31-JAN-09', 'DD-MON-YY') +.99999
GROUP BY TO_CHAR (sales_date, 'Mon DD, YYYY')
ORDER BY sales_date;

Output:

SALES_DATETOTAL_SALES
Jan 05, 2009502.5
Jan 06, 2009125
Jan 15, 2009225
Jan 25, 2009758.75
Jan 26, 2009120
Jan 27, 2009337.5
Jan 30, 2009402
Widget Sales.
Widget Sales.

What about all those days between actual sales? How do we select something that isn’t there?

The answer is an outer join to another, unrelated table to fill in the missing rows. The E-Business Suite I support has a bom_calendar_dates table that I often use to densify sales data. It’s simply a list of consequtive dates for the next 50 years. If the database you are reporting from does not have a dates table, you might consider creating one for densification as well. Here’s a simple script to create a dates table in your schema of choice: Dates Table (Edit: If creating a database object is not an option for you, then you can use a row generator to accomplish the same thing. In fact, I now think this is preferable. Check out my updated post: An All SQL Option to Densifying Dates.)

First, we’re going to select the same report date range from our new calendar table:

SELECT c.dates
  FROM calendar c
 WHERE c.dates BETWEEN to_date ('01-JAN-09', 'DD-MON-YY') AND to_date ('31-JAN-09', 'DD-MON-YY');

Next, LEFT OUTER JOIN our original sales data (as an inline query) to the set of rows provided from the calendar table. A NVL() function will supply any missing sales numbers (using zero as the null value). I also replaced the date format mask with TRUNC() in the SELECT and GROUP BY clauses to allow for a clean join to the calendar dates column.

 SELECT TO_CHAR (c.dates, 'Mon DD, YYYY') sales_date,
       NVL (s.total_sales,0) total_sales
  FROM calendar c
  LEFT OUTER JOIN (SELECT TRUNC (sales_date) sales_date,
                          SUM (quantity * price) total_sales
                     FROM sparse_sales
                    WHERE sales_date BETWEEN to_date ('01-JAN-09', 'DD-MON-YY') AND to_date ('31-JAN-09', 'DD-MON-YY') +.99999
                  GROUP BY TRUNC (sales_date)) s ON c.dates = s.sales_date
 WHERE c.dates BETWEEN to_date ('01-JAN-09', 'DD-MON-YY') AND to_date ('31-JAN-09', 'DD-MON-YY')
ORDER BY c.dates;

Now our output shows a more complete picture:

SALES_DATETOTAL_SALES
Jan 01, 20090
Jan 02, 20090
Jan 03, 20090
Jan 04, 20090
Jan 05, 2009502.5
Jan 06, 2009125
Jan 07, 20090
Jan 08, 20090
Jan 09, 20090
Jan 10, 20090
Jan 11, 20090
Jan 12, 20090
Jan 13, 20090
Jan 14, 20090
Jan 15, 2009225
Jan 16, 20090
Jan 17, 20090
Jan 18, 20090
Jan 19, 20090
Jan 20, 20090
Jan 21, 20090
Jan 22, 20090
Jan 23, 20090
Jan 24, 20090
Jan 25, 2009758.75
Jan 26, 2009120
Jan 27, 2009337.5
Jan 28, 20090
Jan 29, 20090
Jan 30, 2009402
Jan 31, 20090
Widget Sales.
Widget Sales.

For several weeks sales were flat with increasing activity at the end of the month. Although this technique is most often applied to sparse dates, it can also be used to fill in missing number sequences as well. For some reports you may need to repeat the densification across some subset, such as for a set of products. For this a partition outer join is just the ticket, but that is material for another post.

What Do You Think?

* Required