INDEX.HOW 12/23/95 Marilyn Price: 72332,2312 ------------------------------------------------------------- Indexing .dbf tables: Tips, Tricks, and Miscellaneous Tidbits ------------------------------------------------------------- This How To is a mixed bag of information about indexing .dbf tables that has been accumulated over the last ten years or so. If I attempted to attribute every one of these, the list of names would be longer than most of the tips. My thanks to all who contributed to this knowledge base over the years. -------- Contents -------- 1. What's the difference between the different types of indexes? 2. What are the limits involving indexes? 3. How can I mix data types in an index tag? 4. What about TRIM()? 5. Why can't I find what I'm SEEKing? 6. Can I make a search case insensitive? 7. Keymatch() won't let me reuse a deleted keyfield. Now what? 8. Can I create an index tag involving more than one table? 9. I need an index order where part of it is in ascending order and part is descending. How do I do this? 10. I have a complicated order that I need the data in. What's the easiest way to do this? 11. Are there any potential problems with using UNIQUE? 12. I need to append a large amount of information. Is there any way to speed this process up? 13. I need to replace a key field, but dBASE is only changing the first instance. What's going on and how can I get them all? ------------------------------------------------------------- What's the difference between the different types of indexes? ------------------------------------------------------------- There's an excellent write-up on the different types of index (production .mdx, non-production .mdx, and .ndx) files in the dBASE for DOS forum library 10 called QMNDX.FAQ (also available as TI1501 in the same library). Summarizing these documents, there is a time and place for each of these types of indexes. Production indexes are the most common to use. These are automatically opened and kept up to date whenever a table is opened. Non-production .mdx files and .ndx files must be explicitly opened to be updated. When might you choose to use a non-production .mdx file instead of a production .mdx file? A. If you need more than 47 tags for a table. B. Your table is a readonly table (such as one on a CD). C. These are seldom used tags. When might you choose to use an .ndx file instead of an .mdx file? A. When you need to generate a temporary index (for a report, for example) in a network environment. You can create an .ndx file without using the .dbf exclusively. Adding a tag to an .mdx file requires exclusive use of the table. B. Compatibility with older, external programs. -------------------------------------- What are the limits involving indexes? -------------------------------------- These values can also be found in Appendix B of the Language Reference manual. The maximum index key expression length is 220 bytes. (Example: UPPER(lName + fName) The maximum index FOR expression length is 220 bytes. (Example: FOR UPPER(state) = "TN" The maximum evaluated index key length is 100 bytes. The maximum number of index tags per .mdx file is 47. The maximum number of open .mdx files open per active .dbf table is one production .mdx plus up to 9 additional non-production indexes (either .mdx or .ndx) allowed. The maximum number of .ndx index files open per active .dbf table is 10. ----------------------------------------- How can I mix data types in an index tag? ----------------------------------------- To mix data types within a key field, convert the non-character field types to character. For best results, use these: Dates: DTOS(dField) This is usually recommended over DTOC(dField) because the result will leave the index in chronological order. If chronological order isn't essential, then either function can be used. If you're converting an older program to Visual dBASE, you may find an index expression like this: RIGHT(DTOC(dField), 2) + LEFT(DTOC(dField),5) or SUBSTR(DTOC(dField,7,2)+SUBSTR(DTOC(dField),1,2)+SUBSTR(DTOC(dField),4,2) Either of these will also get your date fields in chronological order (and this type expression was necessary in dBASE III+), but DTOS(dField) is not only easier to read, it takes less typing, makes a shorter expression, and is easier to remember. Numbers: STR(nField, nlen, ndec) While the Language Reference has always stated that, if omitted, STR() will assume a value of 10 for nlen and 0 for ndec, I've always had best results by explicitly stating a length and number of decimals when using STR() in an index key expression. Some inexplicable "Not Found" situations have been solved by explicitly stating the length and decimals in both the key expression and the seek value. Logicals: IIF(lField, "T", "F") Or assign any other _character_ values for the true and false portions of this function. For example, if you want the true values to precede the false ones, then perhaps using "1" and "2" will be better. ------------------ What about TRIM()? ------------------ Just about any function or udf can be used in an index expression, however, as stated in the Language Reference, the resulting expression _must_ evaluate to the same length regardless of which record of the table is used. That is, variable length expressions are a real no-no. Variable length expressions are _valid_ in the sense that they won't give you a syntax error. However, the potential for index file corruption and not found information is too high to risk. It's my understanding that, when Visual dBASE, when creating an index, will use the contents of the first record of the table to determine how long the evaluated key is. If that first record is blank, then, potentially, an index expression such as TRIM(lname) + TRIM(fname) could have a length of zero! Suppose, however, that the first record has Joe Smith in it. This would result in an expression length of 8. Now the second record has Samantha Gillenwater in it. Visual dBASE will attempt to store the 18 characters into an 8 character slot, overwriting all or part of the next two keys, or getting the end of the key value overwritten by a later value. However it works, variable length keys are problems waiting to happen. The Visual dBASE Language Reference manual has a very mild statement to that effect. It says "Don't use functions such as CHR(), LTRIM(), RTRIM(), TRIM(), or IIF() that vary the field length in the key expression." I prefer the wording in the DOS Language References. It states "The key expression must evaluate to a fixed-length key. You can create an index with a variable length key, but the index may not be reliable." That said, if you are careful to ensure that the resulting key expression always evaluates to a fixed length, then feel free to use any of these expressions. How about a few examples? Suppose you wanted to keep a list of books in order by author's name and as much of the title as you can. Since this list includes some technical tomes, some of these titles can get quite large. Remembering the 100 byte limit to a key expression, you decide to use TRIM() to help you out. An index expression like this: LEFT(TRIM(lName) + " " + TRIM(fName) + " " + TITLE, 100) will work. (Note: lName and fName are Character 20 and Title is Character 100) But it contains TRIM()!! Yes, it does. However, the function controlling the final length is the LEFT(..,100). Let's look at some evaluated keys for this file. First, let's look at one extreme, a blank record. Before LEFT() gets hold of it, the expression contains 102 spaces. After applying LEFT(), there are 100 spaces. Okay, what about the other extreme? Suppose you had a book called "Splendors of the Past: Exploring the Lost Cities of Antiquity" (I made this up) by Samantha Gillenwater. Before using LEFT(), this comes to 121 characters. After applying LEFT(), there are 100 characters! Same length. So, with care, these variable length functions _can_ be used, but it is the programmer's responsibility to ensure that the final key always evaluates to a fixed length. Another technique sometimes used is to index on lname - fname tag name. The minus in this case trims all the trailing spaces from the first fieldname and appends them to the end of the expression. This _does_ provide a fixed length expression. However, it can also provide some unexpected results. For example, names indexed this way would appear in this order: James Smith John Smithsonian Steve Smith Which may not be the order you really want. ---------------------------------- Why can't I find what I'm SEEKing? ---------------------------------- There are a number of reasons that you might not find what you are seeking. These include: A. What you're seeking must match the index expression. Suppose you have an index expression of: upper(lastname + firstname) and you're trying to find John Smith. There are a number of ways to _successfully_ find John Smith (if he's part of the table). These include: SEEK SMITH JOHN' && if exact is off mLast = Smith ' mFirst = John' SEEK upper(mLast + mFirst) && if exact is off mLast = left(mLast + space(20), 20) SEEK upper(mLast + mFirst && if exact is off These also assume that the length of the last name is 20 characters. Since the first name is not blank, you _must_ pad the last name to the exact same number of characters as the field name. This is true for any index expression where more than one field is involved and you have portions of both fields. B. Check the status of exact. You will get different results in attempting to match a partial index depending on whether exact is set on or off. With exact on, not only must you match the significant portion of the expression, you must match the entire expression, length and all. With exact off, you can perform partial look-ups. Each has its uses, just be aware of the differences. C. Another potential for not finding what you're looking for is using the str() function without explicitly stating all three parameters. If you are looking for the number 100, remember that these are _not_ equal as far as the seek command is concerned: 100' Str(100) Str(100,5,0) Str(100,6,1) D. From the information above, I think you will see that your best chance of getting SEEK to find what you want is to match the index expression. What if you are matching it exactly and Visual dBASE still can't find the record? This is the time to start thinking that the index itself may have gotten corrupted. It happens sometimes. When it does, your best bet is to delete the production .mdx file and recreate all the tags. One way to do this is to use the file CREIND.ZIP (written by Romain Strieff), which can be found in the library. This program reads the index tags then deletes the .mdx file and recreates them. For best results, _DO NOT_ reindex the files by simplying issuing the REINDEX command. If there is corruption in the file, this will not always get rid of it. The technique of deleting the .mdx file and recreating all tags from scratch is much safer. ------------------------------------- Can I make a search case insensitive? ------------------------------------- Pick one: upper() or lower() and use it with all your character based index tags. Use the same function when SEEKing and case is no longer a consideration. Which one you pick is a matter of personal preference, just be consistent for the sake of your sanity. Upper() makes the _result_ all upper case, while lower() makes the result all lower case. For example, suppose you wanted to index your table on lastname + firstname and make the result case insensitive. Your index command would look like this: index on upper(lastname + firstname) tag name and the corresponding search could be: seek upper(trim(mLast + mFirst)) Note: since this is a concatenated expression, make sure that the trim is done _after_ the fields are combined rather than before. That is, don't use seek upper(trim(mLast) + trim(mFirst)). Unless the lastname you're searching is the maximum size, you won't find anything. There were some problems with some of the DOS versions of dBASE involving tags on upper(field) that persuaded some people to switch to using lower(). However, unless you are experiencing problems, there's no real reason for selecting one over the other. If you are using SEEKER.CC (found with some of the sample programs in the forum libraries), then you may want to stick with upper(), as this is the function it uses. ------------------------------------------------------------ Keymatch() won't let me reuse a deleted keyfield. Now what? ------------------------------------------------------------ Unlike SEEK, KeyMatch() doesn't obey any current filtering you may have in place, including the status of SET DELETED. It's my understanding that the reason for this is because keymatch() uses the values stored in the index file, not the table, in determining if there is a pontential key duplication. One tip I saw recently was to change the index expression by adding FOR .NOT. DELETED() to the end of it. Since deleted records are now explicitly excluded from this index, Keymatch() should allow you to reuse this key number. Just remember, since the original record can still be recalled, you have the _potential_ of duplicating a key field. -------------------------------------------------------- Can I create an index tag involving more than one table? -------------------------------------------------------- Having an index tag based on fields in related tables is sometimes necessary (for a report, for example). It is recommended that this type index be created at the time it is needed and deleted as soon as possible. If it is necessary to keep the tag for a protracted length of time, be sure that the other table involved is _always_ opened first. If you open the tables in the opposite order, you will get an error message about a field not being found. Here's one tip posted recently about creating a temporary index involving two tables: USE PATIENT ORDER PACCTNO IN SELECT() USE INVGET IN SELECT() SELECT INVGET SET RELATION TO ACCTNO INTO PATIENT INDEX ON IIF(SEEK(ACCTNO,"PATIENT"), PATIENT->PATLN, SPACE(15)) + ; ACCTNO TO TEMP.NDX ------------------------------------------------------------------------ I need an index order where part of it is in ascending order and part is descending. How do I do this? ------------------------------------------------------------------------ It depends on what kind of field is needed in reverse order. One possibility is to use a udf() (see below). Here are a couple of others: A. Most of the index expression is in ascending order, but I also need the largest contributions at the top. An expression similar to this can be used: index on upper(department)+str(100000000-donation,10,0) tag donors What would a table in this order look like? Department Name Donation Accounting Jim Jones 100,000 Accounting John Doe 90,000 Accounting Susy Smith 5,000 Shipping Jeff Smith 10,000 Shipping John Brown 5,000 B. Most of the index expression is in ascending order, but I want the payments listed in reverse chronological order. That is, the most recent payment for a person should list first. One expression that can be used is this: index on upper(lastname+firstname) + str({12/31/99}-paydate, 5,0) ; tag payments (Note: This assumes American date format.) Of course, this would have to be modified as the year 2000 approaches. What does this calculation mean? It's the number of days between the payment date and the last day of 1999. When that is included in the index expression, then the most recent dates for a person will appear first, exactly what was requested! -------------------------------------------------------------------------- I have a complicated order that I need the data in. What's the easiest way to do this? -------------------------------------------------------------------------- In a word, udf()s. Udf()s can be used for index expressions or for the filtering portion of the index command. The only restriction on udfs in this regard is that the udf must be available _every_ time the table is opened. One place where this is especially useful is when part of the key needs to be descending and part ascending. Since the udf() can perform all kinds of manipulations, just about anything is possible. --------------------------------------------------- Are there any potential problems with using UNIQUE? --------------------------------------------------- The keyword UNIQUE can be handy for creating an index of unique values for a field in a table. Just be aware that there is a potential problem involved. Suppose you had a table with the following field values: 1. California 2. California 3. New Mexico 4. Tennessee 5. Tennessee 6. Virginia Indexing on this field UNIQUE will give you a list of: 1. California 3. New Mexico 4. Tennessee 6. Virginia Now, suppose, for whatever reason, that you need to delete record number 4. Since you "know" there are other Tennessee records, you would expect Tennessee to still show up in your index, wouldn't you? However, the index doesn't really know that there are other records with Tennessee on them, so, until the file is reindexed, Tennessee will _not_ appear in the index order. This may affect pop-ups, for example. This can be a major problem in network environments and/or with large files, where reindexing is difficult to do frequently. In this case, maintaining a separate table of your unique values may be preferable in the long run. ------------------------------------------------------------------------- I need to append a large amount of information. Is there any way to speed this process up? ------------------------------------------------------------------------- Quick tip: If you are appending a number of new records to an indexed table, you'll find that the operation is speeded up if you include the keyword REINDEX at the end of the command. For example: APPEND FROM tempfile SDF REINDEX If this keyword is missing, then Visual dBASE will update all the index tags in the production index file (or any open non-production files) as it appends each record. If the keyword is present, then Visual dBASE will append all the records, then reindex the entire table. In most cases, there will be a significant time difference between the two techniques. ------------------------------------------------------------------------- I need to replace a key field, but dBASE is only changing the first instance. What's going on and how can I get them all? ------------------------------------------------------------------------- If you are attempting to replace a key field for an active index using a REPLACE ALL or REPLACE WHILE or REPLACE FOR or as part of a SCAN..ENDSCAN construct or as part of a DO WHILE..ENDDO loop, you may find that only some of the key fields got changed. Here's one explanation of what's happening: Suppose you want to change all occurrences of customer code "5555" in an invoice file to "6789". So you SEEK the first occurrence of the old code and do a REPLACE REST custno WITH "6789" WHILE custno = "5555". Visual dBASE replace that first record, then looks at the index. Well, since the key field value has changed, the next record _in index order_ is no longer a "5555" record, so it stops. However, there are still records in the table with custno = "5555". How can you be sure you've gotten them all? Here's a couple of suggestions: 1. Suspend the index. Either turn the index off completely (SET ORDER TO) or activate a different index. Then REPLACE ALL .. FOR.. The disadvantage here is this: if you have a large file and are only replacing the key field for a small percentage of the records, this will take a lot more time than it needs to. 2. Use a technique like this: DO WHILE .NOT. SEEK(mCustno) REPLACE custno WITH newcustno ENDDO By repeating the SEEK, you're quickly finding the next record with the old customer number. This is a technique that I've used for several years. -------------------------------------------------------------------- DISCLAIMER: the author is a member of TeamB for dBASE, a group of volunteers who provide technical support for Borland on the DBASE and VDBASE forums on Compuserve. If you have questions regarding this .HOW document, or about dBASE/DOS or Visual dBASE, you can communicate directly with the author and TeamB in the appropriate forum on CIS. Technical support is not currently provided on the World-Wide Web, via the Internet or by private E-Mail on CIS by members of TeamB. .HOW files are created as a free service by members of TeamB to help users learn to use Visual dBASE more effectively. They are posted first on the Compuserve VDBASE forum, edited by both TeamB members and Borland Technical Support (to ensure quality), and then may be cross-posted to Borland's WWW Site. This .HOW file MAY NOT BE POSTED ELSEWHERE without the explicit permission of the author, who retains all rights to the document. Copyright 1995, Marilyn Price. All rights reserved. -------------------------------------------------------------------- EoHT: MLP