|
|
Chapter 3:
Collections, Tables, and Indexes Excerpted from SQL/400 Developer's Guide, (c) Paul Conte and Mike Cravitz, 29th Street Press, 2000. Chapter Overview
Creating a Collection In SQL/400, a collection is an object that contains other database objects, such as tables, views, distinct types, indexes, and stored procedures.1 An SQL/400 collection is implemented as an OS/400 library object, which means that a collection can contain programs, message files, and other OS/400 objects in addition to SQL database objects. (See Chapter 1 if you need to refresh your memory about OS/400 libraries.) You create a collection using a statement such as the following:2 Create Collection AppDta This Data Definition Language (DDL) statement
Once you create a collection, you can create SQL base tables, views, and indexes, as well as other types of OS/400 objects (e.g., non-SQL files and programs) in the collection.3 The Create Collection statement has an optional In ASP clause that lets you designate which AS/400 auxiliary storage pool (ASP) OS/400 should use to store any objects created in the collection. If no ASP is specified, the default system ASP is used. ASPs are an AS/400 feature that the AS/400 system administrator can use to set up named groups of disk units. For information about setting up ASPs, see the manual OS/400 Backup and Recovery. Creating a Table In a relational database, a base table contains the actual data. To create a table with SQL/400, you enter a Create Table statement that specifies
Figure 3.1 shows the Create Table statement for a more complete Customer table than the one presented in Chapter 2.
When this statement is executed, SQL/400 creates an externally described physical file object (see Chapter 1 for a discussion of physical files) in the AppDta collection with the following attributes:
When the statement is executed, UDB/400 also automatically starts journaling row inserts, updates, and deletes in the table to the QSqJrn journal in the same collection.6 As you can see in Figure 3.1, the Create Table statement first lists the name of the table to be created and then lists the column and constraint definitions, separated by commas and enclosed in a set of parentheses. In the following sections of this chapter, we explain how to code column definitions. Chapter 4 covers constraints, including primary key constraints, such as the one used on Figure 3.1. In brief, this constraint specifies that no two rows will have the same CustID column value; this column thus serves as a unique identifier for each row. SQL is a free-format language, and you can use multiple lines for a statement as well as blanks between words. The example shows a coding style that puts each column definition and constraint on a separate line and aligns the similar parts of each column definition. Although this columnar style isn't required, it makes the statement much easier to read than an unaligned stream of text. SQL isn't case sensitive: "Create Table," "CREATE TABLE", and "CrEaTe TaBlE" are all correct. Be aware, however, that string literals are case sensitive, and 'x' isn't generally treated the same as 'X'. Also, when SQL/400 creates OS/400 file and field names for tables, the names are generally stored in OS/400 system dictionary files (see Chapter 2) as upper case. Thus, if you display a list of the tables in the AppDta collection, the table created by the statement in Figure 3.1 will be listed as CUSTOMER. However, you can still refer to the table as "Customer" or "customer" in SQL statements. SQL/400 Naming Options The table name used in Figure 3.1 is a qualified name, which includes the collection name (AppDta) before the unqualified table name (Customer). SQL/400 has two alternative approaches to names, including different syntax rules for qualified names:
The SQL naming option conforms closely to the naming conventions established in several official SQL standards (e.g., the ANSI standard), while the system naming option generally follows the rules that apply to OS/400 objects. When you use Interactive SQL (ISQL), execute a RunSqlStm (Run SQL Statement) command, or create an HLL program with embedded SQL, you specify which naming option to use. For example, when you execute a RunSqlStm command, you specify either Naming(*Sql) or Naming(*Sys); if you specify neither, the default is system naming. With SQL names, you use a period (.) as the separator: AppDta.Customer With system names, you use a slash (/) between the collection name and the table (or other object) name: AppDta/Customer When you specify an unqualified name, SQL/400 determines the implicit collection that qualifies the object based on the naming convention in effect (SQL or system), the type of SQL statement being executed, and several other options you can specify when you execute the statement. Chapter 13 provides more information about the SQL naming convention, which is the one we follow in this book.7 Also be aware that SQL has many reserved words, such as Create, Table, and Order, that have special meaning. If you want to use one of these reserved words as the name of a table, column, or other SQL object, you must use quotation marks (") around the name when it appears in an SQL statement. The following example shows how you would code an SQL Select statement that retrieves rows from a table named Order: Select * From "Order" Where CustID = 499320 Column Definitions On the Create Table statement, you code one or more column definitions after the new table name. Each column definition specifies the column name and a data type. Figure 3.2 shows the data types available with SQL/400. (The "Column Data Types" section later in this chapter discusses these data types in more detail.)
SQL/400 allows the following synonyms for data-type names:
For the VarChar, VarGraphic, Blob, Clob, and DbClob data types, you can optionally specify an Allocate clause after the data type and length. The Allocate clause specifies how many bytes should be reserved for data in the fixed portion of a row's data storage. For example, the column definition PartDesc VarChar( 500 ) Allocate( 50 ) specifies that the PartDesc column can contain character strings up to 500 bytes long and that the first 50 bytes are stored in the fixed portion of the row. Values longer than 50 bytes have their first 50 bytes stored in the fixed portion of the row and additional data stored in the variable portion of the row. Following the data type, you can optionally code either or both of the following clauses:8
In database terminology, null means "no value" or "not known," and a column that is null-capable lets you set the column to null, rather than an actual value, as a placeholder to represent one of these meanings. (Chapters 18 and 19 discuss the concept of null in greater detail.) If you specify Not Null when you define a column, the column is not null-capable; otherwise, it is. UDB/400 keeps track of whether a null-capable column is null or not null by setting an associated hidden bit. When the column is null, this hidden bit is 1; when the column has a normal value, this bit is 0. SQL statements can test whether a column is null using the Is Null test described in Chapter 5, as well as set a column null. In Figure 3.1, only the CrdLimit column is defined as null-capable. The Default clause specifies a default column value, which is used for the column when you insert a row through a view that doesn't include all the underlying table's columns (we discuss views in Chapter 6) or when an explicit value is not provided on an Insert statement (covered in Chapter 7). UDB/400 puts a default value in each column that isn't in the view or the Insert statement's column list. In Figure 3.1, the Status column has a default of blank, and the CrdLimit column has a default of null. In SQL/400, you can specify the Default keyword with or without an explicit value. The following examples show both alternatives. PersonName Char ( 30 ) Not Null
Default 'no name', ...
PersonName Char ( 30 ) Not Null
Default, ...
The table in Figure 3.3 shows the default values SQL/400 assigns to a column depending on the column type, whether a Default clause was specified (with or without an explicit default value), and whether the Not Null clause was specified. Note that SQL/400 does not assign any default for a column that specifies the Not Null clause but does not specify a Default clause. Column Data Types When a column is stored on disk or in memory, the AS/400 uses one or more eight-bit bytes to represent the value. For example, to represent characters, the AS/400 normally uses the Extended Binary-Coded Decimal Interchange Code (EBCDIC) representation, as we describe later. The AS/400 also has several encoding schemes for numeric values, each offering advantages and disadvantages. Although it's useful to know how data is encoded on the AS/400, the more important aspect to understand about SQL/400 column data types is the purpose and relative advantages and disadvantages of each data type. SQL provides data types for character and binary strings, numbers, date and time values, and Universal Resource Locators (URLs) that identify nondatabase files. In the following sections, we look at each of these data-type categories in more detail. String Data Types As we mentioned, the AS/400 normally uses an EBCDIC single-byte character set (SBCS) encoding to store characters.9 Because an SBCS encoding uses one byte of storage for each character, there are 256 possible characters, including the uppercase and lowercase letters, digits, punctuation marks, and an assortment of graphical symbols and control characters. For example, the letter A is represented in the default EBCDIC character set by the eight bits 11000001. UDB/400 also supports international applications with features for non-English, single-byte character sets and double-byte character sets (DBCS), which use two bytes for each character. One of the supported DBCS character sets is the widely used International Standards Organization (ISO) Universal Multiple-Octet Coded Character Set 2 (UCS-2). You select the character set for a table or specific column by specifying one of the available Coded Character Set Identifiers (CCSID).10 The Char column type defines a fixed-length column. Whatever length you specify for the column is the actual length of the character string that's stored in the database, including trailing blanks necessary to pad out shorter values. For example, if you define a column as Char(10) and you attempt to store a value of ABC, the actual value stored will be The Graphic column type defines a fixed-length double-byte (i.e., DBCS) character column. The VarChar column type defines a variable-length character column. For a VarChar column, you specify the maximum-length string that the column can hold. A column defined as VarChar(100) can hold strings up to 100 characters long. If you were to store the value ABC in this column, only three characters would be stored. For VarChar columns, UDB/400 also stores the current string length for the column's value 3 in this example. UDB doesn't store additional blanks to pad VarChar column values. VarGraphic columns are variable-length character strings that use one of the DBCS encodings.11 When you use VarChar or VarGraphic, it's usually a good idea to specify the Allocate keyword. As we mentioned earlier, this keyword specifies how much space is reserved within the row for the column, with excess data for any string that's longer than the Allocate value being placed in an overflow area. You can improve SQL performance, particularly for larger tables, by choosing an Allocate length that is greater than the length of most of the values that will be stored in the column. For example, suppose you have a description column defined as VarChar(200) in which roughly 80 percent of your descriptions are 100 or fewer characters in length. In this case, specifying Allocate(100) would be a good choice.
For data that's represented by a string of noncharacter bytes (e.g., a graphical image), you can add the For Bit Data clause to a Char or VarChar column: Photo VarChar( 20000 ) For Bit Data ... The For Bit Data clause tells UDB/400 that the contents of a column should never be interpreted as character data. Among other things, this instruction prevents UDB/400 from attempting to convert the data when it's transmitted to other systems. Note that Char and VarChar columns are limited to a maximum of 32 KBytes, so you may need to use one of the large object data types (discussed next) for binary data. With SQL/400, you can define columns to hold strings longer than the 32 KByte maximum of Char and VarChar columns. The strings can be binary (the Blob data type), single-byte character (the Clob data type), or double-byte character (the DbClob data type). Large objects can be as much as 15 MBytes in size, making them suitable for large text documents, graphical images, audio data, and other such information. Note that you use the Clob or DbClob data type for character text and the Blob data type for noncharacter data, such as images and audio. When defining a large object column, you specify the maximum length, as in the following example: EmploymentApplication Clob( 1000000 ) ... Although you can specify the Allocate keyword for large object columns, in most cases it's best to omit this keyword and let the column assume the default allocation value of 0. With the default allocation, all the object's data is stored in the overflow area. This generally is suitable for the type of data stored in large object columns. Choosing the Correct String Data Type To store single-byte character data, use a Char, VarChar, or Clob data type. To store double-byte character data, use Graphic, VarGraphic, or DbClob. And to store binary (noncharacter) data, use Char or VarChar with a For Bit Data clause or use Blob. To store strings that may be longer than the Char, VarChar, Graphic, or VarGraphic limit (generally 32 KBytes, as shown in Figure 3.2), you must use one of the large object data types (Blob, Clob, or DbClob). As an alternative to storing the data in the UDB/400 database, you can store it in an integrated file system (IFS) file and store a datalink (discussed later) in the database. For strings that will fit in a Char, VarChar, Graphic, or VarGraphic column, you must decide whether to use a fixed-length string (Char or Graphic) or a variable-length string (VarChar or VarGraphic). The easy category includes those strings that naturally have a fixed length, such as two-character state codes, five-character zip codes, and four-character zip-plus codes. Other columns are naturally varying in length; name columns and description columns are two examples. If a great deal of variation exists in the lengths of values, variable-length strings may save some disk space, but this often is not a major issue for strings with a relatively small maximum length. Variable-length columns add some processing overhead how much overhead depends on the number of values that are longer than the allocated space and thus are partially stored in the overflow area. Variable-length columns are more convenient for some string manipulations (as you'll see in Chapter 5) because you don't have to deal with the extra blanks that pad a fixed-length string. The guideline (admittedly, a somewhat arbitrary one) that we generally follow in this book is to specify Char or Graphic for columns whose maximum length is at most 100 and to specify VarChar or VarGraphic for columns whose maximum length is greater than 100. Numeric Data Types For numeric columns, you have a choice of several data types:
Each data type uses a different scheme to store numbers as a series of bytes in the table, and each has its advantages and disadvantages. Each numeric data type has a precision and a scale. The precision is the maximum number of digits (including digits to the right of the decimal point). The scale is the number of decimal places. For some column data types (e.g., Decimal), you can explicitly define the precision and scale, while for others (e.g., Integer), the precision and scale are completely determined by the data type. A Decimal column specifies that numbers are stored in packed-decimal format. When specifying Decimal, you generally specify the precision and the scale (if not 0). So to specify a packed-decimal column with a precision of 7 and a scale of 2, you would code Decimal(7, 2) or Dec(7, 2). A packed-decimal column is implemented by using a half-byte (four bits) for each digit and the low-order half-byte (rightmost four bits) of the low-order byte for the sign. The half-byte 1111 is used for positive values, and the half-byte 1101 is used for negative values. Thus, a four-byte packed-decimal field can hold a number up to seven digits long. The number +1234567 would be represented in a Dec(7, 0) column as Note that this also would be how +12345.67 would be represented in a Dec(7, 2) column because the decimal point is implicit, rather than actually stored in the data. A six-digit number would also require a four-byte packed-decimal field because a field must be an integral number of bytes; in a six-digit packed-decimal field, one half-byte would be unused. Thus, the number -123456 would be represented in a Dec(7, 0) column as In general, you should declare packed-decimal fields with an odd number for the precision. Packed decimal has been around for a long time on IBM systems and is both compact and fast to process (on the AS/400). Its main disadvantage is that it's not found on many other systems or as a built-in data type in some widely used languages, such as C/C++ and Java. (IBM's C and C++ compilers for the AS/400 do support packed-decimal fields, and Java has the BigDecimal class. In both languages, however, support for decimal data types is slower than for integer and floating point.) If portability or cross-system data access is a concern, be sure to check on the language and data translation capabilities that will be used. A Numeric column specifies that numbers are stored in zoned-decimal format. The precision and scale are specified the same way as for Decimal columns. Zoned decimal represents a number as one byte (eight bits) per digit. Thus, to hold a seven-digit number, you need a zoned-decimal column that occupies seven bytes. Each byte is simply the EBCDIC character representation of the digit, except the last byte has the high-order half-byte (leftmost four bits) set to indicate the number's sign (1111 for positive, 1101 for negative, or 1110 for unsigned). The number +123 would be represented in a Numeric(4, 0) column as the following four bytes: The number -123 would be represented in the same column as Zoned decimal is a legacy of computer punch card days and is not as compact or efficient to process as packed decimal or binary. In general, you should avoid the Numeric column data type. A SmallInt, Integer (Int), or BigInt column specifies a binary numeric format. A SmallInt column is two bytes and can hold integer values from -32,768 to +32,767. An Int column is four bytes and can hold integer values from In general, the binary format of integers represents a number as a base 2 value, with one bit used for the number's sign.12 The value +1234 would be represented in a SmallInt column as 0000 0100 1101 0010 On the AS/400, negative numbers are represented in binary format using "two's complement" encoding. This approach flips bits for negative numbers and then adds 1, so that -1 is represented by the bit pattern 11111111. This encoding lets the processor perform faster arithmetic. Thus, the value -1234 would be represented in a SmallInt column as 1111 1011 0010 1110 An Int column would represent +1234 as 0000 0000 0000 0000 0000 0100 1101 0010 and -1234 as 1111 1111 1111 1111 1111 1011 0010 1110 Binary fields are compact, and most computer systems and languages support binary values. As we said, SQL/400 BigInt columns can hold values up to approximately ±9 sextillion (a maximum of 19 digits), whereas Decimal columns can hold numbers up to 31 digits long. Also, you can specify a scale for a Decimal column but not for a SmallInt, Int, or BigInt column. Thus, for business applications, the Decimal column data type is a good choice for large numeric fields or those that require precise decimal fraction values (as is often the case in financial applications). The SmallInt, Int, and BigInt column data types are preferable for integers within the supported ranges. The Real, Double, and Float numeric data types specify a floating-point numeric format. Floating point is generally used for scientific or mathematical applications (and by C, C++, and Java programs) and is not very common in AS/400 business applications, although RPG IV and ILE COBOL/400 now include support for floating point. SQL/400 supports single-precision (Float(24) or Real) and double-precision (Float(53) or Double) floating-point columns. A single-precision column occupies four bytes, and its values can range from approximately 1.2 × 10-38 to approximately 3.4 ×10+38. A double-precision column occupies eight bytes, and its values can range from approximately 2.2 × 10-308 to approximately 1.8 × 10+308. Although SQL/400 has many variations on how you may specify floating-point types for columns, it's sufficient to use Float(24) for single-precision and Double for double-precision columns. Date, Time, and Timestamp Data Types UDB/400 supports special data types for date, time, and timestamp values. For these three data types, you specify the Date, Time, or Timestamp type in the column definition but not a length (or number of decimal positions) the system determines the length of these types of fields. SQL/400 supports date arithmetic on date columns. You'll learn more about this in Chapter 5. Date columns and date arithmetic can simplify the way many common business applications handle dates. UDB/400 stores all date fields in the same four-byte internal format on disk. When you execute SQL statements or compile a program that contains SQL statements, you specify an external format (and in some cases, the separator character) to use. SQL/400 uses this information to translate internal values when they're displayed or returned to your program. For example, with ISQL, the StrSql (Start SQL) command's DatFmt and DatSep parameters specify the date format that's used. Figure 3.4 shows the external formats available for date columns in UDB/400. For several of the date formats, you can also specify a separator character. The allowable values are slash Time columns work similarly to date columns. UDB/400 stores time values in three-byte fields internally. Figure 3.5 shows the time formats available in UDB/400. For some time formats, you can specify a separator, which can be a colon (':'), period ('.'), or blank (' '). The *USA time format does not include seconds (even though the internal storage of the field does). Timestamp columns use 10 bytes internally. The timestamp data type has a system-defined external format that looks like this: yyyy-mm-dd-hh.mm.ss.uuuuuu where uuuuuu is millionths of a second. Use timestamps when you need to maintain a precise record of the sequence in which events occur. Datalink Data Types Datalink columns are used to hold references to nondatabase files stored in the local integrated file system (IFS), in a remote AS/400's IFS, or in the file system of any attached system (e.g., Windows 2000 Server) that has IBM's DataLink Manager installed.13 In other words, instead of storing the data in the column itself, you can use a datalink to specify a file location for the value of the column. Columns defined as type DataLink contain a reference to the file (in the form of a URL) and, optionally, a comment. When defining a column as DataLink, you should specify a length large enough to hold the largest expected URL plus any comment you may want to associate with the link. Here is an example: ProductImage Datalink( 500 ) ... SQL and System Table Names As we said in Chapter 1, OS/400 object names have a maximum length of 10 characters. SQL table names, however, can be up to 128 characters. Because SQL tables are implemented as OS/400 physical files, each table always has an associated 10-character system name. If you specify a table name of 10 characters or less in the Create Table statement, the table's system name is the same as its SQL name. However, if you specify a table name longer than 10 characters, UDB/400 assigns a 10-character system name using the first five characters of your longer name followed by a five-digit sequence number. For example, if you create a table named PartSupplier, UDB/400 will assign a system name such as PartS00001. An SQL statement can refer to a table using either the table's full SQL name or its (possibly shorter) system name. Because you may work with OS/400 file objects using languages other than SQL (e.g., using CL to perform backup operations), it's preferable to have meaningful system names for all tables. After you create an SQL table (or view or index, as we discuss later), you can use the SQL Rename statement to change the system name. Here's an example of the statements you might use: Create Table PartSupplier ... Rename Table PartSupplier To System Name PartSuplr You can also use the Rename statement to change the SQL name, as in the following example: Rename Table Acct To Account When you create a new table for which you want an SQL name longer than 10 characters as well as a meaningful system name, the best practice is to first create the table with the system name and then use a Rename statement to change the SQL name: Create Table PartSuplr ... Rename Table PartSuplr To PartSupplier For System Name PartSuplr The advantage of this approach is that PartSuplr is used as the name for the physical file, the physical file member, and the file's record format. Creating the table with the SQL name and then using Rename to change the system name leaves the file member and record format names as their system-generated names (e.g., PartS00001). Be sure to include the Rename statement's For System Name clause when you follow this practice. Without this clause, the Rename statement will change the system name to a generated name, such as PartS00001. SQL and System Column Names In Figure 3.1 , we used column names that conformed to the 10-character limit that UDB/400 has for database file field names. As with tables, columns have an SQL name and a system name. If you specify a column name longer than 10 characters, UDB/400 assigns a 10-character system name using the first five characters of the longer name followed by a five-digit sequence number. And, as with table names, it's preferable to use meaningful system names for all columns.14 The Create Table statement makes it quite easy to explicitly assign system names for one or more columns, using the For clause as shown in Figure 3.6.
For columns with an SQL name of 10 characters or less, you don't have to specify a For clause if you want identical SQL and system names. In SQL statements, you can refer to a column by either the SQL name or the system name. Descriptive Text and Column Headings After you create a table, you can define descriptive text and longer comments for the table and its columns. Descriptive text is saved in the SQL catalog and is therefore available to be queried. Moreover, this text shows up with various AS/400 utilities. For example, you see it when prompting (pressing the F4 key) in ISQL. You can also define column headings. ISQL uses column headings when displaying the results of interactive queries. A Label On statement stores up to 50 characters of descriptive text for a table. The following example adds descriptive text for the Customer table: Label On Table Customer Is 'Current and past customers' To add descriptive text for columns, you use the form of the Label On statement shown in Figure 3.7.
After the table name, there are one or more entries for the column text, separated by commas. Each column text entry has the form column-name Text Is 'text-string' You can also define column headings with SQL/400 as shown in Figure 3.8.
The Text keyword is not used when defining column headings, and the text string is treated as three 20-character segments. The first 20 characters are used for the first line of the heading; characters 21 through 40 (if present) are the second heading line, and characters 41 through 60 (if present) are the third heading line. You must carefully align the contents of each string to get the right column headings. SQL also lets you specify longer comments up to 2,000 characters for tables, views, columns, and many other SQL objects and their components. Such comments are stored in the SQL catalog, along with descriptive information supplied by Label On statements. To define a comment for a table, you use a Comment On statement such as the following: Comment On Table Customer
Is 'Customer master file ...'
The following examples show how to add comments for a distinct type (covered in the next section) and an index: Comment On Distinct Type CustIdType Is 'Customer ID Data Type' Comment On Index CustNameIdx Is 'Index on Customer Name' Figure 3.9 shows a Comment On statement for column comments.
Notice the similarity to the Label On statement in Figure 3.7 except the statement name is different and the Comment On statement doesn't use the Text keyword in the column entries. (This Comment On example uses the same text as the Label On statement in Figure 3.7, but remember that long comments can be up to 2,000 characters while descriptive text is limited to 50 characters.) Creating and Using User-Defined Types SQL/400 lets you create a new column data type, known as a distinct type, based on one of the built-in data types listed in Figure 3.2.15The following statement creates the distinct type CustIdType as equivalent to a Dec(7, 0) data type: Create Distinct Type CustIDType As Dec( 7, 0 ) With Comparisons Once you create a distinct type, you can use it in column definitions: CustID CustIDType Not Null ... When you use a distinct type to define a column, SQL/400 creates the column using the based-on type Dec(7, 0) in this example. The Create Distinct Type statement's With Comparisons clause (the default in SQL/400) specifies that you can use equality (i.e., =) and inequality (e.g., <) tests between two columns defined with the same distinct type.16 SQL/400 generally lets you assign a column with the based-on type to a column with the distinct type for example, assigning a Dec(7, 0) column to a CustIdType column and vice versa. However, you can't automatically compare a distinct type to any other type of column except for the same distinct type. For example, you couldn't compare a Dec(7, 0) column to a CustIdType column unless you created your own user-defined function to do so. (Chapters 8 and 13 discuss user-defined functions.) Nor can you use operations such as addition for numbers or string concatenation for a distinct type, unless you create the appropriate function. (An SQL technique known as "casting" also enables comparison, concatenation, and arithmetic or other operations on distinct types; we cover casting in Chapter 8.) On the Create Distinct Type statement, you can specify a built-in type and a length, precision, or scale, if appropriate for the built-in type. You can also specify CCSID, For Bit Data, For SBCS Data, or For Mixed Data for any of the character data types (e.g., Char, VarChar, Clob). You can't specify Allocate, Not Null, Default, or any other column attributes; these must be explicitly specified when you code a column definition on the Create Table statement. The implicit default value (as discussed earlier) for a distinct type is the same as for the based-on type. Distinct types are often referred to as user-defined types (UDTs) and serve two main purposes. First, a UDT provides a convenient way to define columns that contain the same kind of data. In the example we've been using, every column that should hold a customer ID can be defined with CustIdType. This use of a UDT improves consistency as well. Second, because of the restrictions in using columns defined with a distinct type, you have tighter control over which operations are valid for columns. For example, SQL/400 won't let you add two columns that are defined with CustIdType a sensible restriction because customer IDs don't represent quantitative values. For UDTs that do need arithmetic or other operations, you can use user-defined functions, as we mentioned earlier, or casting. Distinct types provide some of the capabilities of domains as described in Chapter 18. However, because they don't let you restrict which values can be stored in a column, they fall short of implementing the full domain concept. Also, a distinct type can't be based on another distinct type only on one of the built-in data types. These limitations diminish the usefulness of distinct types; however, distinct types can help standardize the definition of primary key columns, and, when used in conjunction with user-defined functions, they can provide additional flexibility in your database definitions. Using Standard Data Types You can improve the consistency of your database and simplify application development by deciding on a set of standard data types to be used for column definitions. For example, you might decide to use VarChar(500) Allocate(100) Not Null Default as the standard data type for all columns containing descriptions (e.g., of products, courses). You might also decide to use Char(50) Not Null Default for columns that store one part of a person's name (e.g., first or last name). To keep track of standard data types, you should establish a data dictionary, either manually or by using a CASE tool17. Chapters 19 and 21 provide additional suggestions for using data dictionaries. For some or all of your standard data types, you might also choose to create SQL distinct types, as described above. However, keep in mind the limitations of distinct types (e.g., you can't specify Not Null) and the fact that you can standardize the data types you use regardless of whether or not you also use distinct types. Adding, Dropping, or Changing Table Columns After you create a table, you may need to add or drop (remove) a column or change a column definition. You use the Alter Table statement to do so. The following example shows how to add a column to the Customer table created in Figure 3.1: Alter Table Customer
Add Discount Dec( 5, 3 ) Not Null
Default 0
If you use the Not Null clause when you add a new column, you must also specify a Default clause so that UDB/400 can assign the default value for the column in all existing rows. If you don't specify Not Null or a Default clause, the column is set to null in all existing rows. Be aware that under the covers, UDB/400 actually creates a new OS/400 physical file object for the table, copies data from the old file, and then deletes the old file. All related objects, such as views and indexes, are disassociated from the old file and associated with the new file. This process is all handled by UDB/400, but it may take a while and the table may not be available for use during some of this time. (The same process occurs when you drop a column and for many situations in which you change a column definition.) Here's how you can drop a column: Alter Table Customer Drop Discount To change a column definition, you specify any of the column attributes (e.g., data type) that you want to change: Alter Table Customer
Alter Discount
Set Default 0.01
Any column attributes that you don't specify remain unchanged, and, of course, both new and unchanged attributes must be compatible (e.g., the column's data type and default value). Creating an Index Although you don't specify a particular order of rows on either the Create Table or the Create View (discussed in Chapter 6) statement, UDB/400 does use internal indexes for efficient row selection and ordering. UDB/400 automatically selects which index (or indexes) to use when a Data Manipulation Language (DML) statement, such as Select, is executed or when an SQL cursor embedded in an HLL program is opened. When you specify a primary, unique, or foreign key constraint (discussed in Chapter 4), UDB/400 creates or shares an internal index as part of the physical file created for a base table. You can create additional indexes using the SQL Create Index statement. The Create Index statement is fairly simple. You code an index name (which can't be the same name as any base table, view, alias, other index, or non-SQL file in the same collection) and then specify a single base table and the columns over which the index should be created.18 The following example creates an index over the ShipCity and CrdLimit columns of the Customer table: Create Index AppDta.CustCtyX01
On Customer
( ShipCity,
CrdLimit Desc )
When you create an SQL index, UDB/400 creates an OS/400 logical file with a keyed access path.19 You cannot access an SQL index directly with any SQL DML statement. In SQL, indexes are solely for the internal use of UDB/400, and their purpose is generally related to performance. As the above example shows, you can optionally add the Desc keyword after any column name to specify a descending order. You also can optionally specify Unique or Unique Where Not Null in an index definition: Create Unique Index AppDta.BldRoomX
On Building
( BldName,
RoomNbr )
The Unique keyword enforces unique key values, including null, the same way as explained earlier for a base table primary key constraint. If you specify Unique Where Not Null, UDB/400 enforces unique values, except for null, as explained in Chapter 4 for unique constraints. In general, you should use a primary key or unique constraint on the Create Table statement for a base table which causes UDB/400 to create an internal index rather than create a separate index with Unique or Unique Where Not Null. In UDB/400, independent indexes are primarily useful for two cases:
Note that you can specify only a single primary key constraint for each base table, so if you need to enforce a second fully unique key (that is, including nulls) for the table, you need a Create Unique Index statement. Because you can specify multiple unique constraints (which exclude nulls) on a base table, there's little reason to use a Create Unique Where Not Null Index statement in SQL/400. The type of indexes built by the examples discussed so far are called binary radix tree, or b-tree, indexes. As the name suggests, these indexes use a tree structure to store key values and associated record locations. You can create another type of index, known as an Encoded Vector Index (EVI). This type of index uses an internal table that maps key values to binary codes and a vector of these codes (one per row in the database table) to identify which records have a particular key value. EVIs can dramatically speed certain types of queries.20 Here's a Create Index statement for an EVI: Create Encoded Vector Index PartWarehouseIdx On Part ( WarehouseID ) Indexes affect database performance both positively and negatively. In brief, indexes can speed data retrieval, but they may slow updates due to the time UDB/400 spends updating the entries in the index (or indexes) over a base table as rows are inserted, deleted, or updated in the table. UDB/400 is fairly efficient at maintaining indexes, so you don't generally need to worry if you have five or fewer indexes over an active table or 10 to 20 over a less frequently modified table.21 And, for tables that are primarily read-only, with little update activity, there's not likely to be any significant overhead from having numerous indexes. However, for actively updated tables, you want to avoid an excessive amount of index maintenance, or you may slow down your application's database updates. Creating an Alias You can create an alternative name, called an alias, for any SQL/400 table or view.22 Here's an example that creates the PartSuplr alias for a PartSupplier table: Create Alias PartSuplr For PartSupplier You might use an alias when you rename an object and still have existing applications that refer to the old object name. You'll also find an alias helpful when you want to use SQL to access an existing OS/400 database file object that has multiple members (as explained in Chapter 1). The following statement creates the SaleJune alias that can be used to reference the June member of the Sale multimember file without an alias, there's no way in SQL to directly access a file member other than the first member: Create Alias SaleJune For Sale ( June ) The optional member name is placed in parentheses immediately after the table (or file) name. SQL DML statements can use an alias anywhere that the corresponding table or view name can be used. For example, the following statement retrieves all rows from the June member of the Sale file, using the SaleJune alias created above: Select * From SaleJune Although an alias can be invaluable in certain circumstances, you shouldn't use aliases just to create a variety of different names for the same object. In general, you should use one consistent name for each database table and view. Dropping Collections, Tables, Distinct Types, Indexes, and Aliases To delete any of the database objects we've looked at in this chapter, you use one of the following statements:
Guidelines for SQL/400 Names A good naming convention for SQL/400 objects goes a long way toward standardizing your database definitions. Table, view, index, column, and other names should meet three criteria:
Names should begin with a letter (AZ) and should use only letters and digits (09) in the rest of the name. Avoid national characters (e.g., #, $, and @ in the United States) and the underscore (_) because of international character set conflicts and HLL syntax restrictions. Don't use SQL reserved words (these are listed in an appendix of the DB2 UDB for AS/400 SQL Reference). It's much easier to work with a database if its names follow a regular pattern and use consistent and clear abbreviations. The value of these two principles is easy to see if you consider the problems that arise when names aren't formed this way. If you came across the column name CAL1, would you have any idea that it meant "customer address, line 1"? Even if you could guess what the C, A, and L stood for, would you know whether it was a shipping or a billing address? Although a more meaningful name such as ShipLine1 is not completely unambiguous, you have a reasonable chance of comprehending it when you first encounter it. In addition, once you've had a brief period to familiarize yourself with the table definition, you'll find it very easy to follow programs that use this column. Consistency aids comprehension as well especially when names must be formed with abbreviations. Consider the frequently occurring need to have a column that stores a count of something (e.g., how many of an item are ordered). If some columns use different abbreviations for "count" Cnt, Cou, Cn and others use a variety of abbreviations for "number (of)" Nbr, No, Num it becomes difficult to quickly recognize the meaning of a column name, especially when you're working with code that uses the columns. You can avoid this problem by following a simple naming convention, such as: The term "count" and its abbreviation Cnt are used for columns that store an integral count of some item. This convention specifies both the term that will be used and its abbreviation. The people who designed the OS/400 interface studied abbreviations extensively and came up with a simple and effective scheme for creating abbreviations. The two main principles they followed are:
Because system object and column names are limited to 10 characters, you'll inevitably have to squeeze one or more of the abbreviations in a system name. Thus, it's a good idea to develop a list of terms that appear in your applications and two- and three-character abbreviations for each of them. Figure 3.10 provides a partial list of terms that occur in many applications. Create and maintain your own list of terms using a word-processor document or a table. Here are some further naming guidelines:
Also, most application-oriented names shouldn't include any indication of the table that contains them. For example, in Figure 3.1, we use Name instead of CustName. When Name is defined in the Customer table, it's obvious that Name means "customer's name." If there were a separate Employee table that also had a Name column, it would be clear that Name in that table meant "employee's name." There's one case in which the column name should incorporate some indication of the containing table a column that serves as a table's primary key. Notice in Figure 3.1 that we followed this convention with the CustID column. The reason for including Cust in the field name is not that ID alone would be an inadequate name within the Customer table itself. Rather, the reason for this explicit indication of the containing file is that a CustID column may also be defined in other tables that have some relationship to the Customer file. For example, a Sale table would normally have a CustID column to contain the customer ID of the customer placing the order. Obviously, it wouldn't be a good idea to use just ID as the name of the customer ID column in the Sale table. (What name would you give the column that contained the order ID?) Because CustID (or a similar name) is the best choice for a column in a table that's related to the Customer table, it's good practice to use the same name in the Customer table itself. Coding Suggestions
CrtSrcPf AppDta/SqlClcSrc(Note that you use the slash, or /, character as a qualifier with OS/400 object names.)
Chapter Summary The Create Collection statement creates an SQL collection, which is an OS/400 library object, to contain SQL tables, views, indexes, packages, and other SQL and non-SQL objects. The Create Table statement creates a base table, which is a UDB/400 externally described, physical file object. A base table contains the actual data for an application. Data in SQL tables is stored as rows. Each table definition specifies one or more columns, which represent different properties (e.g., customer name) of an application object. Each column has an SQL name and a system name, a data type, and an explicit or implicit size. A column may optionally have an explicit default value. A column can be defined to allow or prohibit the null placeholder. The Label On and Comment On statements can be used to specify descriptive text for tables, indexes, columns, and other SQL objects and components. You can also use Label On to specify column headings of up to three lines. The Create Distinct Type statement lets you create a user-defined data type (UDT) based on one of the SQL/400 built-in data types. You can use a distinct type in a column definition. The Alter Table statement lets you add, drop, or change a base table column. The Create Index statement creates an SQL index, which is a logical file object with a keyed access path. An SQL index is created over a single base table and specifies one or more key columns. You can also create Encoded Vector Indexes (EVIs) to speed up certain types of queries. In general, UDB/400 uses indexes to improve performance, and SQL DML statements cannot directly use an SQL index to access data. The Drop Collection, Drop Table, Drop Distinct Type, Drop Index, and Drop Alias statements delete the respective SQL objects and (optionally) all dependent or contained objects.
Exercises
Footnotes If you have a question about our products or about an order you have placed, please contact the customer service department at (800) 650-1804 or (970) 203-2914 or via e-mail. If you have a technical question about this site, please contact us. Terms & Conditions | Privacy Policy Copyright © 2009 Penton Technology Media Loveland. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||