Access Queries

© Richard Rasala, 2009

Last modified: November 30 2009, 9:50 PM

      

We begin with an overview of database concepts that will be made quite concrete in the subsequent examples.

Database tables contain the original data in the database and queries provide a means to create artificial tables whose information is derived from that original data. Tables are normally stored as data on disk. This data is loaded into computer memory as needed. Queries are stored as a set of instructions for building a query result table directly in computer memory. Each time a query is executed, it will execute the stored instructions to produce a query result table using the current values in the database. A query result table will therefore always be up-to-date.

Database tables are carefully designed to avoid redundant information, that is, information that depends on other information and therefore could be deduced. For example, a contact address will normally store only the zipcode because the city and state can be stored without redundancy in a separate zipcode-city-state table. The reason that database people take great pains to avoid redundant information is that redundancy frequently leads to errors. If data is redundant, people maintaining the data may change one part of the information and forget to change the related parts. Thus errors are introduced by virtue of forgetfulness.

Because database tables are structured to avoid redundancy, the information that one needs to use may be spread across multiple tables and may even need to be calculated. The purpose of queries is to make the information one needs available in an artficial table that may be carefully crafted by the person who designs the query. A key to the power of queries is that a query may be based not only on the original data tables but also on the result tables produced by earlier queries that serve as “helper queries”. This allows queries to be constructed in a series of layers that are easy to comprehend and test.

The structure of a database table consists of a sequence of named columns called fields that may contain text or some more specific data type such number or date. The data in the table falls in zero or more rows whose order is normally undefined so that the database system may choose an order that optimizes storage. By use of a query, one may return a sorted version of a table if desired.

The structure of a query result table is exactly the same as that of a table whose data originates on disk. It too has named columns called fields. What is interesting is that the cell data returned from a query may come directly from one cell in a table or helper query or may be calculated using data from multiple cells in one or more columns and/or rows in one or more tables and/or helper queries. Hence queries are quite general.

Since the information in an ordinary table or a query result table may be used in exactly the same way to define additional queries, when we use the word table below, we will include both situations.

A fundamental question in databases is how to relate the data that resides in two different tables. Let’s call the two tables Table1 and Table2. Then we can define a relation between the two tables if we can find a field Field1 in Table1 and a field Field2 in Table2 that potentially should contain values in common. In this situation, we define the notion of a link or join:

We say that a row in Table1 is linked or joined to a row in Table2 if the value in Field1 of the first row is identical to the value in Field2 of the second row.

Defining the relation between two tables will then enforce an important constraint on a query:

Once a relation is defined between Table1 and Table2 then a query will only consider pairs of linked rows from the two tables. In particular, if a row in a table links with nothing, then that row is ignored.

The most common case is that two tables have no direct relation or have exactly one direct relation.

There are cases when there may be more than one valid relation between two tables. In such cases, the constraints for linked rows may be even stronger. However, be warned that Access thinks that more than one relation is an error and will attempt to prevent you from doing that.

Note: If a relation is represented graphically in a program such as Access then it is shown as a broken line that connects Field1 in Table1 to Field2 in Table2. This graphical view of a relation explains the terminology link.

Let us elaborate a bit on what may happen when a relation creates links between the rows of two tables Table1 and Table2. Let us assume for simplicity that the common field in the two tables is named ID. Then there are various possible scenarios that are given standard names.

  • one-to-one: Each possible value of ID occurs at most once in Table1 and at most once in Table2.
  • one-to-many: Each possible value of ID occurs at most once in Table1 but may occur many times in Table2.
  • many-to-one: Each possible value of ID may occur many times in Table1 but occurs at most once in Table2.
  • many-to-many or indeterminate: Each possible value of ID may occur many times in both Table1 and Table2.

Some interesting things happen in the many-to-many scenario. We call this the principle of multiplicative explosion.

Assume that for some value v of ID there are m rows in Table1 and n rows in Table2. Then, when the tables are joined for a query, the system will generate the m*n possible combinations of the data obtained by selecting each row with ID v in Table1 and pairing it with each row with ID v in Table2.

This situation will be illustrated in a Simple Test Database.

The principal concepts involved in the construction of queries are the following.

  • Relations
    If more than one table or helper query is to be used to define a new query, then the relations between the tables and helper queries must be specified.
    In the normal situation, there must be a relation path that connects every table directly or indirectly with every other table. If this is not the case then either a relation has been omitted or there must be some further table involved in defining the query in order to “bridge the relation gap”.
    However, in some cases a table is introduced that intentionally has no join relation with the other tables or helper queries. For example, one can define a table with a list of parameters that can control a query and by this mechanism can accomplish the work of multiple queries in a single query. We will see an example of this in the Sample Queries.
  • Selected Fields
    A field in a new query may be chosen as one of the fields in one of the tables being used to construct the query. In that case, the field is a selected field and its name and data is copied verbatim from the corresponding source field. In practice, most fields are selected fields since one usually needs to retain access to some of the source data.
  • Calculated Fields
    A calculated field does work to produce a new value in a cell (text, number, or date) using data from the various source tables. An important part of the work in a database is to define calculated fields that provide useful information. There are several flavors of calculated field:
    • Horizontal or Linked Calculation
      A linked calculation combines cell values from cells in one or more columns within the rows that are linked to the given cell row via the relations that link the various tables.
    • Vertical or Aggregate Calculation
      An aggregate calculation performs an operation such as count, sum, average, etc., that combines information from one column and multiple rows in a query and displays the result in a single row of the query result table.
    • Combination Calculation
      A combination calculation performs an aggregate calculation over the result of a linked calculation. If done properly, this is a way to turn two steps into one.
  • Selection of Rows Via Criteria
    A criteria or constraint may be used to select rows in a query result table as those rows that satisfy the criteria. The rows that fail to satisfy the criteria are dropped from the query result table and their information is not used in any way.
    If a criteria is used in a query that does not involve aggregation as well, then one can optionally show the values of the tested field that passed the criteria. This can be very useful to show that the criteria works as desired.
    If a criteria is used in a query that also involves aggregation then the Where keyword must be used and it is impossible to show the individual field values that passed the criteria. Hence, this feature must be used with great caution since it is harder to verify correctness.

The processes of aggregation and row selection both reduce the number of rows in a query result table. Aggregation does this by combining information from the original rows. Selection does this by selecting some rows and discarding others.

Our philosophy of query design is to make each individual query as simple as possible. We want each query to do one task that is easy to specify, easy to comprehend, and easy to test. We aim to avoid complex queries that combine two or more steps because these are hard to develop and hard to understand. The only thing easy about a complex query is how easy is it is to make a mistake.

We propose a problem-solving style that uses a sequence of simple queries that gradually converge on the final solution to the problem. Because each individual query is simple, the query may be examined and tested to verify correctness. Once the problem as a whole is solved, there is a direct path to the answer that inspires confidence in the reasoning and the result.

There are some features in Access and its underlying SQL language that directly support the creation of complex queries. We will tend not to use these features since they work against our philosophy of simple queries that are layered in sequence.

Some database people argue for complex queries because they believe that such queries are more efficient. In some cases that is true but in other cases exactly the same work is done as when simple queries are used. Furthermore, correctness is more fundamental than efficiency. If a query is wrong, it does not matter how fast it is. We would argue, in fact, that if you plan to use complex queries, it is better to have a parallel solution that uses simple queries so you can compare the results as a test for correctness.

On the deepest level, the use of a sequence of simple queries to solve a database problem is an instance of a fundamental principle of computer science, namely, Divide and Conquer. It is best to solve problems by a sequence of steps that the human mind can grasp easily.

This tutorial aims to discuss databases tables and queries in Access. Access is an instance of a database system that provides a graphical user interface for the construction of queries. The key advantage of such a system is that the user may build a new query in a simple step-by-step fashion:

The rest of this tutorial is devoted to illustrating both the specific techniques for defining queries in Access and how our query philosophy plays out in the problem solving process.

This section describes a simple test database with 2 small tables, TableA and TableB.

Table A       TableB

The relation that links TableA to TableB will be via the ID field in each table. Notice that:

TableA has 3 rows with ID = 1 and 2 rows with ID = 2.

TableB has 2 rows with ID = 1 and 4 rows with ID = 2.

Therefore, the relation is many-to-many with respect to both values of the ID.

Observe that:

TableA has one additional column A with 5 values 7, 11, 13, 17, 19.

TableB has one additional column B with 6 values 23, 29, 31, 37, 43, 47.

Since these 11 data values are distinct, it will be easy to see in a query where each piece of data is coming from.

Our first query will be the inner join that shows the 3 distinct columns ID, A, B from the two tables. Keep in mind that when the rows of TableA and TableB are linked, the ID values are equal which is why we have 3 distinct values and not 4.

Inner Join Query

Before looking at the query result table, try to predict for yourself how many rows the table will have.

Inner Join Result Table

For some people, it may be surprising that the result table has 14 rows. Here is how the table is computed.

First consider ID=1. Since TableA has 3 such rows and TableB has 2 such rows, we must have 6=3*2 rows with ID=1 in the inner join. This accounts for all combinations of the 3 values 7, 11, 13 from TableA and the 2 values 23, 29 from TableB. You can see these 6 combinations in the first 6 rows of the query result.

Next consider ID=2. Since TableA has 2 such rows and TableB has 4 such rows, we must have 8=2*4 rows with ID=2 in the inner join. This accounts for all combinations of the 2 values 17, 19 from TableA and the 4 values 31, 37, 43, 47 from TableB. You can see these 8 combinations in the last 8 rows of the query result.

In particular, the total number of rows is 14=6+8.

Our next query, Product, in effect adds a calculated field to what we did in the inner join query. This field computes the product of data in columns A and B. Since this computation uses data in linked rows, it is a linked calculation.

Product Query

Let us display and explain the syntax of the calculated field.

Product: [A]*[B]

The calculated field is named Product. This name comes first and is highlighted in orange. The name must be followed by a colon. Finally, the colon is followed by the formula for the calcuation which is highlighted in blue. To distinguish names of related columns in a calculation, these must be enclosed in square brackets.

Product Result

It should be no surprise that this result has the same number of rows as the inner join. We need to have a row for each combination of A and B values that will be multiplied to form the products.

Notice that in the query result table there is no difference between the data columns that came from the source tables and the calculated data column. The fact that the results are treated uniformly is a key to why queries may be built on top of helper queries.

Our last query, Sum, will do an aggregate calculation by summing over the A and B columns and grouping by the ID value.

Sum Query

Sum Result

This result table has only 2 rows since there are only 2 distinct values of the ID. To help you understand the sums computed, we show query result table for inner join.

Inner Join Result

We now make an HTML table that shows how the data in the inner join query leads to the sums in the Sum query.

ID A B
1 62=7+11+13+7+11+13 156=23+23+23+29+29+29
2 144=17+19+17+19+17+19+17+19 316=31+31+37+37+43+43+47+47

Notice, in particular, that although the Sum query does not explicitly create the inner join it must nevertheless generate the data that would appear in the inner join in order to actually carry out the sums.

If you wish to play with the simple test database, here it is: TestDB.accdb

This tutorial will use the Access 2007 ComputerStore database ComputerStore.accdb

This database is the one used in the Fall 2009 version of the course CS 1100. To avoid confusion if the version used in the course should later change, we have provided the link above to the Fall 2009 version.

The ComputerStore database contains 5 tables that are similar to real-world databases except that the amount of data is tiny.

Computer Store Tables

The table structure illustrates the extent to which database people will go to avoid redundancy.

The Orders table contains 3 fields.

The OrderID is a key that identifies which OrderDetails rows belong to the order. An order may be associated with one or more rows in the OrderDetails table. These rows are called the line items of the order. The relation of Orders to OrderDetails is one-to-many since, as we have said, an order can have multiple line items.

The OrderID has been designated as a primary key for the table which means that the database system will not permit duplicate values of the OrderID in the Orders table.

The ContactID is a key that identifies the contact or customer who placed the order. The key locates a row in the Contacts table. The relation of Orders to Contacts is many-to-one since one contact can place multiple orders.

The OrderDate field holds the date of the order which is common to the order as a whole rather than to any of its line items.

The Orders table does not contain the total cost of an order. If this is needed, it must be a calculated field in a query.

The OrderDetails table contains 4 fields.

The OrdID associates a line item in OrderDetails with the order to which it belongs. This field is linked to OrderID in Orders. Since an order may have many line items, OrdID can take on the same value many times. In particular, OrdID is not a primary key.

The ProdID associates a line item in OrderDetails with the product to which it belongs. This field is linked to ProductID in Products. Since the same product may be purchased in many line items, ProdID can take on the same value many times. In particular, ProdID is not a primary key.

The Quantity field tells how many products of this kind were purchased on this order.

The UnitPrice fields tells what the price of the product was at the instant the order was placed.

The total cost of a line item is the Quantity times the UnitPrice. This is not stored in OrderDetails because that would be redundant. Therefore, if the total cost of a line item is needed, it must be a calculated field in a query.

The combination of OrdId and ProdId forms a super key that uniquely determines each row in the OrderDetails table.

The Products table contains 8 fields.

The ProductID is a primary key that uniquely determines a product in the Products table.

The ProductName field holds the full product name.

The Current Unit Price field hold that price that would apply if a new order were placed for the product at this time. Note that this field name happens to contains blank characters.

The UnitsInStock field tells how many units are available to be shipped.

The remaining four fields hold data relevant to shipping the product.

The Contacts table contains 6 fields.

The ContactId uniquely determines a contact. Although it was not designated as a primary key when the Contacts table was defined, it nevertheless plays the role of a primary key. The ContactId in Contacts links to the ContactId in Orders. Since one contact may place many orders, this is a one-to-many relation.

The remaining 5 fields contain address and telephone information.

Note that the Contacts table does not contain City or State information. Instead, the Zipcode field is used to obtain this information from the Zipcodes table. Since many contacts might have the same zipcode, this is a many-to-one relation.

The Contacts table does not contain information about the combined total of all orders placed by the contact (if any were placed). If this information is needed, it must be a calculated field in a query.

The ZipCodes table contains 3 fields.

The Zipcode field uniquely determines a row in the Zipcodes table. Although it was not designated as a primary key when the Zipcodes table was defined, it nevertheless plays the role of a primary key.

The remaining 2 fields contain the City and State in which the zipcode is located.

The above discussion has mentioned the relations or links between the 5 tables in the ComputerStore database. These are shown in the diagram below.

Computer Store Tables Linked

Data for Orders

The snapshot shows the 3 fields of the Orders table: OrderID, ContactID, OrderDate. From the snapshot, we see that Orders has 39 rows.

You may observe that the OrderID is shown with the capital letter O followed by 4 digits and that the ContactID is shown with the capital letter C followed by 4 digits. This is simply the result of a format applied to the data. The data is still an integer number.

Data for Order Details

The snapshot shows the 4 fields of the OrderDetails table: OrdID, ProdID, Quantity, UnitPrice. From the snapshot, we see that OrderDetails has 84 rows.

The rows of the OrderDetails table contain the line items of the various orders. Since there more line items (84) than orders (39), some orders must consist of multiple line items. Of course, in real life, most orders do consist of multiple line items.

Data for Products

The snapshot shows some of the 8 fields of the Products table. The full list of 8 fields is: ProductID, ProductName, Current Unit Price, UnitsInStock, Width, Height, Depth, Weight. From the snapshot, we see that Products has 33 rows.

Data for Contacts

The snapshot shows the 6 fields of the Contacts table: Contact ID, FirstName, LastName, Address, ZipCode, PhoneNumber. From the snapshot, we see that Contacts has 25 rows.

Data for ZipCodes

The snapshot shows the 3 fields of the ZipCodes table: ZipCode, City, State. The state is given via its two letter abbreviation. From the snapshot, we see that ZipCodes has 29 rows.

It is important to test queries as much as possible. One test is to see if the row count resulting from a query makes sense. If you expect a row count to be less than or equal to some value but it turns out to be larger then you know an error has taken place.

To help in testing, the table below summarizes the number of rows in the database tables.

Table Number of Rows
Orders 39
OrderDetails 84
Products 33
Contacts 25
ZipCodes 29

We will now describe some optional setup actions that we find convenient and that we recommend.

It is always possible to directly define the relationships between the given tables in the ComputerStore database but since these relationships remain the same it is convenient to define them once and for all. Here is how to do this.

Click the main tab Database Tools and then in the Show/Hide group on the left click Relationships.

You may see some tables in the relationships panel already. You should drag the remaining tables from the left hand panel into the relationships panel. If need be, make the small window for each table large enough to see all column (field) names and arrange the windows more or less as shown below.

Relationships panel with all tables

You should now define the 4 relationships between the 5 tables

  • Relate OrderID in Orders to OrdID in OrderDetails.
  • Relate ProdID in OrderDetails to ProductID in Products.
  • Relate ContactID in Orders to ContactID in Contacts.
  • Relate ZipCode in Contacts to ZipCode in ZipCodes.

To define each relationship, click on the field name in one table and drag to the corresponding field name in the related table. You will get a dialog for additional settings but simply click OK.

If you get a diagram that looks more or less like the one below, you are done and can close the relationships panel.

Relationships panel with all tables and relationships

By default, Access uses Overlapping Windows to view a database table or to view a database query in its design view or after being run. This default can be quite annoying since the boundaries of what is being viewed may be outside of the area available in Access to view the object. This means that one has to scroll simply to manage the window.

We prefer to use the view called Tabbed Documents. Each object then uses the full available space and one can switch between objects using tabs.

Here is how to select Tabbed Documents.

Open the database and then click on the Microsoft Office Icon in the upper left:

Office Icon

You will see the following multi-purpose menu dialog:

Office Menu Dialog

Click Access Options and then click Current Database to get:

Overlap Windows Setting

Under Document Window Options, select Tabbed Documents and be sure that Display Document Tabs is also selected as shown below:

Tabbed Documents Setting

Now click OK to save the choice of Tabbed Documents.

Access will insist that you close the database and reopen it before activating the Tabbed Documents choice. This is vexing.

Furthermore, Access will not allow you to set this default for all databases at once. You must do it one-by-one. This is even more vexing.

Nevertheless, we believe that using Tabbed Documents is worth the initial effort.

In this section, we will present 4 fundamental queries that deal with information that is frequently needed. In many cases, these queries can be applied directly. In more subtle cases, the ideas used in these queries may be adapted to achieve solutions. Thus, both the queries and their methodology are important.

These 4 queries illustrate the technique of widening. An original table or query is widened by adding one or more columns while retaining the original information. This means that the new query may serve as a complete replacement for the original.

To work with any of the financial aspects of the ComputerStore database, one must calculate the line item cost of each line item. The database formula for this is:

LineItemCost: [Quantity]*[UnitPrice]

We widen the OrderDetails table with the line item cost by constructing a query that uses all 4 columns of OrderDetails and then adds a linked calculated field LineItemCost via the formula above.

OrderDetailsWithLineItemCost Definition

Note: Due to lack of space, the above snapshot does not show the column OrdID.

If this query is executed then the query result table has 5 columns with the LineItemCost at the right.

OrderDetailsWithLineItemCost Data

Examine the second row of this result. Since the Quantity is 4, the LineItemCost is 4 times the UnitPrice.

The number of rows in this query result table is 84 which is as it should be because this query does not alter anything that comes from OrderDetails. There is no row selection or aggregation.

We wish to widen the Orders table with a column OrderTotal that calculates the total cost of the order. In order to do this, we must sum the LineItemCost for each line item associated with the order. However, the information needed for this computation is not in the Orders table but rather is in the query:

OrderDetailsWithLineItemCost

Hence, we must link the Orders table with this query. We must also do aggregation to perform the sum.

We therefore construct the query with all columns from Orders plus a column corresponding to LineItemCost that will be calculated by summing all line items costs for a given order ID. Notice that we have given the final column a name:

OrderTotal: LineItemCost

This means that OrderTotal will be used to name the final column instead of the default name SumOfLineItemCost which is a bit ugly.

OrdersWithOrderTotal Definition

If this query is executed then the query result table has 4 columns with the OrderTotal at the right.

OrdersWithOrderTotal Data

The number of rows in this query result table is 39 which is the size of the Orders table. That this equality happens is not automatic. It is a consequence of the fact that each order has at least one line item in the OrderDetails table. If, somehow, the Orders table had bogus orders with no line items, then we would not get the equality of the number of rows.

The general point is that the process of linking two tables/queries can eliminate rows since linking only examines common rows in both entities. We will see an example of such elimination in the next query.

We wish to widen the Contacts table with a column GrandTotal that calculates the grand total or combined total of the OrderTotal of all orders placed by the contact. We must do this using a sum of the OrderTotal amounts for each order with the same ContactID as the given contact. In particular, the information needed for this computation is not in the Orders table but rather is in the query:

OrdersWithOrderTotal

Hence, we must link the Contacts table with this query. We must also do aggregation to perform the sum.

We therefore construct the query with all columns from Contacts plus a column corresponding to OrderTotal that will be calculated by summing all order totals for a given contact ID. Notice that we have given the final column a name:

GrandTotal: OrderTotal

This means that GrandTotal will be used to name the final column instead of the default name SumOfOrderTotal which is a bit ugly.

ContactsWithGrandTotal Definition

Note: Due to lack of space, the above snapshot does not show the columns ContactID, FirstName.

If this query is executed then the query result table has 7 columns with the GrandTotal at the right.

ContactsWithGrandTotal Data

The number of rows in this query result table is 21 which is less than the size of the Contacts table which is 25. This means that 4 contacts have been placed in the Contacts table but have never placed an order.

This shows that the act of linking Contacts with the query OrdersWithOrderTotal reduces the number of contacts returned by the query even though all columns in the Contacts table are represented in the result. This emphasizes once again that linking is a process of matching rows that contain a common field, in this case, the ContactID.

In some situations, it is convenient to know immediately the product name of the product associated with a line item. We do this by widening the query:

OrderDetailsWithLineItemCost

Here is the definition of the new query:

OrderDetailsWithLineItemCostProductName Definition

Note: Due to lack of space, the above snapshot does not show the columns OrdID, ProdID.

If this query is executed then the query result table has 6 columns with the ProductName at the right.

OrderDetailsWithLineItemCostProductName Data

The number of rows in this query result table is 84 which is as it should be because this query does not alter anything that comes from OrderDetailsWithLineItemCost. There is no selection or aggregation.

The 4 queries defined in this section illustrate some fundamental principles.

  • Queries can and should be built in layers. For example:
    • OrderDetailsWithLineItemCost is built on
          OrderDetails
    • OrdersWithOrderTotal is built on
          OrderDetailsWithLineItemCost and Orders
    • ContactsWithGrandTotal is built on
          OrdersWithOrderTotal and Contacts
    • OrderDetailsWithLineItemCostProductName is built on
          OrderDetailsWithLineItemCost and Products
  • Widening allows you to build a new query that is more useful than the original table or query on which it is based.
  • During the widening process, some data may appear to be lost. When this happens, it is due to the fact that linking returns rows only when some common data field has identical data. This data loss is a sign that not all data in the original table or query possessed the desired match.

Of course, not all queries need to use widening. Most queries will focus on a desired set of columns from the source tables/queries and then do further work of some kind to achieve a specific goal. The role of widening queries is to create general purpose helper queries that may be used in multiple situations.

The fundamentals of our query style are to build queries in layers with each layer being a simple query. Let us now explain a bit more about how these layers will be defined.

The bedrock of all queries are the original database tables. If these can be used directly then they will be used.

One layer above the database tables are the widening queries that we defined in the previous section. These queries handle:

These queries are intentionally quite general so that may be used in multiple situations.

Most queries that are intended to solve specific problems will be streamlined to focus on the particular issues. The first step in defining such a query will be choose which database fields or columns are used in the query. Usually, you choose the minimal number of fields needed to proceed with the problem solution. Choosing the set of fields is called projection since it is similar to selecting some coordinates and ignoring others in a math problem.

Once the fields are chosen, then we recommend that you restrict yourself to 2 specific kinds of queries: row selection and aggregation. Although it is possible to combine these 2 kinds of queries, we strongly recommend that you do not do this. The reason is to greatly improve clarity.

In pure row selection, you put one or more criteria into the Criteria row of the field column to restrict which rows will satisfy the query. The most common criteria are:

  • Numerical criteria
  • Textual criteria

For numerical criteria, you use one of six operators to place a numerical constraint:

= equals
<> not equals
< less than
<= less than or equals
> greater than
>= greater than or equals

For instance, to restrict the rows to those rows for which a field has a value “greater than 1000” you would use the criteria >1000 in that field.

You may also use the six operators above for textual criteria. Use = to select rows that exactly match a text string in a particular field. Use <> to get the opposite effect, that is, select the rows that do not match a text string. Use the four inequality operators to select rows based on the dictionary or lexicographic ordering of strings.

For instance, in a query based on the query OrderDetailsWithLineItemCostProductName that is defined in the previous section, you may use the Criteria

<>"Laser Printer (network)"

to select all rows whose ProductName is something other than Laser Printer (network). Note that the quote marks are required to specify the string being tested.

As a related instance, consider:

<"Laser Printer (network)"

This selects all rows whose ProductName comes prior to Laser Printer (network) in dictionary order.

For more advanced textual criteria, you may use the “wildcard” character * in conjunction with Like to search for generalized text matches.

For instance, you may restrict to all products with the word “monitor” in the ProductName by placing the following Criteria in the ProductName field:

Like "*monitor*"

The “wildcard” character * is used to signal that anything before or after “monitor” is OK when performing the match. By the way, a match involving Like is not case sensitive.

To get mismatches with the text “monitor”, use the related Criteria:

Not Like "*monitor*"

In all instances, the string to test must be enclosed in quote marks with or without the use of wildcards. The operators that define the test must precede the string and must not be in quote marks.

A pure row selection is used to restrict the rows in the base query to the set of rows that match the criteria used. You do nothing else in a pure row selection query. You do no form of aggregation so you have no problems of reasoning about which comes first: row selection or aggregation. As a bonus, with a pure row selection, you may run the query and check to see that the matched rows are correct.

Note also that in a pure row selection you do not use the Totals button in Access and you do not explicitly use the Where modifier. Internally, in the SQL produced by Access, the SQL WHERE modifier is inserted into the correct position for you.

An aggregation query is initiated via the Totals button in Access. An aggregation will combine information from one or more rows in the base query into a single row in the result. The most important choices during an aggregation are:

Group By Collect into one row all rows whose Group By fields match. Use the matched data for these fields.
Count Count the number of matched rows.
Min Find the minimum of the numeric data in the matched rows.
Max Find the maximum of the numeric data in the matched rows.
Sum Sum the numeric data in the matched rows.
Avg Average the numeric data in the matched rows.
StDev Find the standard deviation of the numeric data in the matched rows.
Var Find the variance of the numeric data in the matched rows.
Expression Treat the data in the column header as an expression that will specify the aggregation.
Where Do an aggregation while doing a row selection based on the expression in the Criteria row.

We avoid the aggregation keywords First and Last because we have found it hard to predict what row will be returned by these aggregation operators. Since we cannot predict, we do not find the keywords to be valuable.

The query style we propose avoids the explicit use of Where. We think it is easier to remain sane if row selection and aggregation are handled in separate and simpler queries and you have complete control of when each query operation is invoked.

In this section, we will present a collection of sample queries. We will build the queries in layers using row selection and aggregation as needed. For clarity, we will avoid combining these techniques in a single query. Each sample will be introduced by a statement of the problem that will be solved. In some cases, a later sample will use the result of an earlier sample. This is consistent with our philosophy of building queries in layers.

Query Definition.

StateContactID Definition

StateContactID Data

Since the ContactID acts as a primary key for the Contacts table and since every contact zipcode is in the ZipCodes table, this result has 25 rows just like the Contacts table.

Query Definition. Note that it builds on StateContactID.

StateCountContactID Definition

StateCountContactID Data

You can easily add the counts for all states to get 25 which is the count of rows in the base query StateContactID that is aggregated here.

Query Definition. The contacts who ordered are those whose ContactID appears in the Orders table. Since a contact may place multiple orders, the ContactID may appear multiple times. To remove duplicates, we must use Group By.

ContactIDWhoOrdered Definition

ContactIDWhoOrdered Data

Notice that there are only 21 contacts in this result table even though the Contacts table has 25 rows. This means that 4 contacts did not place an order.

We could have enhanced this query by linking with the Contacts table and adding any fields such as FirstName, LastName, ....

Even more information is returned in the fundamental query ContactsWithGrandTotal discussed above. This query is built on a layer of queries rather than directly from the original tables as we have done here.

Query Definition. Notice that the query is built by joining two earlier queries StateContactID and ContactsWhoOrdered.

StateContactIDWhoOrdered Definition

StateContactIDWhoOrdered Data

Since StateContactID has 25 rows while ContactIDWhoOrdered has only 21 rows, the result table has 21 rows.

Query Definition. The query StateCountContactIDWhoOrdered uses the helper query StateContactIDWhoOrdered in exactly the same way that StateCountContactID uses the helper query StateContactID.

StateCountContactIDWhoOrdered Definition

StateCountContactIDWhoOrdered Data

You can easily add the counts for all states to get 21 which is the count of rows in the base query StateContactIDWhoOrdered that is aggregated here.

Many of the next samples will deal with revenue and will therefore use the 4 Fundamental Queries.

We show 3 approaches:

  • Method 1: Use OrderDetailsWithLineItemCost.
  • Method 2: Use OrdersWithOrderTotal.
  • Method 3: Use ContactsWithGrandTotal.

Query Definition for Method 1. Sum the LineItemCost over all line items.

TotalRevenue-1 Definition

Query Definition for Method 2. Sum the OrderTotal over all orders.

TotalRevenue-2 Definition

Query Definition for Method 3. Sum the GrandTotal over all contacts.

TotalRevenue-3 Definition

TotalRevenue-1 Data       TotalRevenue-2 Data       TotalRevenue-3 Data

As we see, the total revenue is $115,455.84 by all methods. This is a nice consistency check that adds confidence that the fundamental queries for OrderTotal and GrandTotal are correct.

Note, however, that since the LineItemCost query is the basis for all 3 results we do not get independent proof of the correctness of this fundamental query.

Use OrderDetailsWithLineItemCost. Replicate all columns so this query may be used as a helper query.

Query Definition.

LineItemCostGE1000 Definition

LineItemCostGE1000 Data

The query result table has 27 rows as compared to 84 rows in the original data before the row selection.

Use OrdersWithOrderTotal. Replicate all columns so this query may be used as a helper query.

Query Definition.

OrderTotalGE1000 Definition

OrderTotalGE1000 Data

The query result table has 18 rows as compared to 39 rows in the original data before the row selection.

Use ContactsWithGrandTotal. Replicate all columns so this query may be used as a helper query.

Query Definition.

ContactsWithGrandTotalGE1000 Definition

Note: Due to lack of space, the above snapshot does not show the column ContactID.

ContactsWithGrandTotalGE1000 Data

The query result table has 13 rows as compared to 21 rows in the original data before the row selection.

Query Definition.

MaximumOrderTotal Definition

MaximumOrderTotal Data

In this query, there are two small possibilities that must be handled. There could be more than one person who had an order whose total was exactly equal to the maximum OrderTotal. Also, a person who had an order with value equal to the maximum Order Total might have had more than one such order. The query must be structured to handle these unlikely possibilities with no errors.

Query Definition.

ContactsWithMaximumOrderTotal Definition

ContactsWithMaximumOrderTotal Data

Using the Parameters button to the right of the Totals button, we set up an interactive parameter named:

Minimum OrderTotal To Examine

If we do not set the Data Type, it will default to Text which will not work in this query since we need a numeric type. We select Decimal since we expect a number that could be read as a currency value.

Parameter Definition.

MinimumOrderTotal Parameter Definition

Query Definition.

MinimumOrderTotal Definition

Note: Due to lack of space, the above snapshot does not show the column LastName.

We illustrate the interactive nature of the query using a parameter value of $1500 supplied by the user. Notice that the user should not type $.

Parameter Value.

MinimumOrderTotal Parameter Value

Query Result Table.

MinimumOrderTotal Data

This query result table has 16 rows. Notice that contacts may appear more than once if they have made more than one order that meets the minimum value specified by the user in the dialog box.

This problem must be solved in 3 steps:

  • Create the Levels table that lists the expenditure levels for which we wish to find the OrderTotal’s that reached that level.
  • Create an outer join of Levels with OrdersWithOrderTotal that pairs the levels with the order totals that reach that level.
  • Create the summary that counts the number of order totals that reach each level.

Table Definition for Levels.

Extra Levels Table Definition

Since the Levels table will contain currency values, we set the Field Size to choose Decimal numeric data and we set the view Format to Currency.

Table Data for Levels.

Extra Levels Table Data

We entered 11 expenditure levels chosen on the general grounds of separating by $1000 for smaller orders and separating by $5000 for larger orders. By starting with $0, we will get the total number of orders as the first count.

We next create a query LevelsOrderTotal>=Level that constructs an outer join of Levels with OrdersWithOrderTotal. No relation connects these two tables. Notice the Criteria in the OrderTotal column. This Criteria constrains the OrderTotal to be greater than or equal to the corresponding Level when the join is constructed.

Query Definition for LevelsOrderTotal>=Level.

LevelsOrderTotal>=Level Definition

LevelsOrderTotal>=Level Data View 1       LevelsOrderTotal>=Level Data View 2

This result table has 108 rows. As the levels get larger, the number of associated order totals gets smaller. From the right hand view, you can see this clearly. There are 6 order totals >=5000 and only 2 order totals >=10000. To quantify this in general is the purpose of the final query.

To define LevelsCountOrderTotals>=Level, we need to Group By on the Level and Count on the OrderTotal.

Query Definition of LevelsCountOrderTotals>=Level.

LevelsCountOrderTotals>=Level Definition

LevelsCountOrderTotals>=Level Data

Naturally, the counts decrease as the Level of expenditure increases because fewer and fewer OrderTotal’s satisfy the constraint. In particular, we see the counts of 6 for >=5000 and 2 for >=10000 that we earlier observed directly.

Query Definition.

In the query OrdersWithOrderTotal, we used the helper query OrderDetailsWithLineItemCost to compute the order total. We did a GroupBy on OrderID together with a Sum on LineItemCost to sum all line items for the order.

In this problem, we use the query OrderDetailsWithLineItemCostProductName that extends OrderDetailsWithLineItemCost. We now do GroupBy on ProdID together with a Sum on LineItemCost to sum all line items for each product. This gives the total spent on each product.

Since the pair OrdId and ProdID form a super key for OrderDetails, we can think of these fields as a pair of mathematical axes. We can select one axis and sum parallel to the other axis. If we select the OrdID axis, the sum along the ProdID axis yields the order total. If we select the ProdID axis, the sum parallel to the OrdID axis yields the amount spent on each product.

TotalSpentOnProducts Definition

TotalSpentOnProducts Data

Query Definition.

TotalRevenue-4 Definition

TotalRevenue-4 Data

This is the same value of the total revenue that was computed earlier.

The next several queries explore how to restrict the line items to those that contain a fixed string or those that contain a parameter string that is interactively supplied by the user in a dialog box.

Query Definition.

In the discussion on Row Selection in Our Query Style above, we described how to use the keyword Like to find matches for strings found with a field of a table or query. There we said:

For instance, you may restrict to all products with the word “monitor” in the ProductName by placing the following Criteria in the ProductName field:

Like "*monitor*"

This technique is the heart of the query that solves this problem.

LineItemsLikeMonitor Definition

Note: Due to lack of space, the above snapshot does not show the column OrdID.

LineItemsLikeMonitor Data

The next query is the foundation for the 4 examples that follow. All examples will use pentium as the user-supplied parameter.

Parameter Definition.

We want to generalize the query in the previous example by asking the user what string to match instead of hard-wiring one particular string such as monitor into the query. Whenever possible, it is better to give the user flexibility when the query is run rather than require that the query itself be modified each time a change is desired.

The first step needed is to set up a query parameter. This consists of the prompt that will show up in the dialog box when the query is executed plus the data type of the expected user response. In this case the data type is Text since we will use the response to match text.

LineItemsThatMatchX Query Parameter

Note that the parameter prompt is enclosed in brackets.

Query Definition.

We set up the query to use the query parameter in a Like expression on the Criteria line of ProductName. Notice that the prompt from the query parameter appears in brackets. Since we do not want to force the user to manually type the wildcard * characters, we place these into the Like expression using quoted strings "*" and the concatenation operator &. It is convenient that Access uses the same concatenation operator as Excel so that what you know from one program works in the other.

LineItemsThatMatchX Definition

Note: Due to lack of space, the above snapshot does not show the first five columns of the query definition.

Parameter Dialog.

When the query is executed, the following empty dialog is shown:

ProductMatchDialog Empty

Parameter Dialog with Value.

For this and the next several examples, we will use pentium as the text to match in the ProductName.

ProductMatchDialog Pentium

Query Result Table

LineItemsThatMatchX Data

Notice that the matching process is not sensitive to the case of the text entered by the user. We entered pentium but we were able to also match Pentium.

Query Definition.

The design of this query uses LineItemsThatMatchX to compute the order totals in exactly the same way that the query OrdersWithOrderTotal uses OrderDetailsWithLineItemCost.

OrderTotalMatchX Definition

OrderTotalMatchX Data

Query Definition.

The design of this query uses OrderTotalMatchX to compute the amount spent by contacts in exactly the same way that the query ContactsWithGrandTotal uses OrdersWithOrderTotal.

ContactTotalMatchX Definition

ContactTotalMatchX Data

Query Definition.

This query computes the desired maximum directly from the query ContactTotalMatchX. It is very important to capture this maximum in a query result in order to do the next query.

ContactTotalMatchXMaximum Definition

ContactTotalMatchXMaximum Data

Query Definition.

The key aspect of the design of this query is to link ContactTotalMatchXMaximum with ContactTotalMatchX so we can pick out the contact or contacts that achieve the maximum value. The remaining links simply let us collect the contact information we wish to display along with the maximum value: ContactID, LastName, FirstName, State, City. The choice of information to display is a somewhat random choice. As long as we show the ContactID so we can identify the contact, we can show as little or as much as we wish from the tables Contacts and ZipCodes.

ContactInfoForMatchXMaximum Definition

Note: Due to lack of space, the above snapshot does not show the column ContactID.

ContactInfoForMatchXMaximum Data