Welcome To The Home Of The Visual FoxPro Experts  
home. signup. forum. archives. search. google. articles. downloads. faq. members. weblogs. file info. rss. print.
MAKING RUSHMORE RUSH MORE

Introduction
Speed is an important factor in your application. The ability to be able to quickly find the record or records you need directly affects the usability of your application. In an application where you are only working with a few thousand records, you don't really need to worry too much about optimizing your application, but when you start working with tens of thousands or even millions of records, without optimizing your application, your users might find themselves waiting for your app to find records. Strangely, your users seem to think that their time is important and for some reason, waiting for a slow app tends to annoy them. The purpose of this article is to show you how to use Rushmore Optimisation to speed up your apps. The key to using Rushmore is to understand how it works. This allows you to make intelligent design decisions that will allow you to properly use Rushmore.






Looking at the Data
Before we begin looking at Rushmore, it is important to know a bit about how the data in your tables is stored on the disk. If you were to look at a table on your disk, you would not see columns and rows. Instead, you would see a long string of data. The first bunch of bytes would be the table header and if you were to take the time to decipher the header record, you would then be able to figure out the record width. Let's assume that your record width is 500 bytes. If you wish to look at the second record in that table, you would need to move the disk read head exactly 500 bytes from the end of the header record. To look at the 100th record, you would need to move the read head exactly 50,000 bytes from the end of the header record, and so on. Again, if you were to look at the table on your disk, you would also notice that the records in your table are not sorted in the primary key order. VFP adds new records by "tacking them on" to the end of the table. The first record in your table would be the first record added to the table and the last record would be the last record added. In other words, it is entirely possible that the first record in your table could be record ID one and the last record in your table could be record ID 2. If this is the case, moving from record ID 1 to record ID 2 means moving the read head the entire length of the table. If this is a large table, physically moving that read head is going to take some time. Remember, we are talking about mechanical speed here - SLOW!!!
Sorting the records as they are added simply takes far too much time. If you wanted to have your records stored in sorted order, as you add each record, you would need to use a temporary table to sort the records as they are added or modified. You would start by reading the existing table and you would compare your new record's key to that existing record. If the existing record's key is less than your new record's key, you would save that existing record to your temporary table and loop to the next record in your original table. You would then continue this loop until you find a record in your existing table with a larger record key then your new record. When you eventually find that record with a higher record key than your new record, you would then save the new record to your temporary table and then save your existing record. You would then need to loop through all of the remaining records in your original table and save them to your temporary table. You now delete your existing table, rename your temporary table using your original table's name and you would have a sorted table. Obviously, this is bound to take a fair amount of time and it gets worse - if you are going to add hundreds of records, you need to go through this sort procedure for each new record. In earlier versions of VFP there was a SORT command that did exactly this. Unfortunately the SORT command, when used on anything other than a very small table, was unbelievably slow. Sorting the records as you add them takes far too much time. This is why VFP simply adds new records to the end of the table.

Performing a Query in VFP
Before we look at Rushmore, we will look at how a query works without Rushmore. Note the following query:

SELECT * ;
	FROM MyTable ;
	WHERE cSomeField = lcSomeValue ;
	INTO CURSOR csrMyResults


Because the data in the table is not sorted, in order to select records that satisfy the WHERE condition, VFP must look at each record in turn. Records can be added in any order, so there is no logic that VFP can use to figure out if the next record might contain a record that meets the WHERE condition. The only way to do this is to actually read the record. Indeed, VFP must read ALL of the records in the table because any record might meet that WHERE criteria. It is important to note that with a VFP back end, the WHERE condition is evaluated at the workstation - not at the server. This means that for VFP to evaluate the WHERE condition, each record must be sent from the file server to the work station across the network. If you are working on a table with many fields and many records, shipping all of these records across the network can take a fair amount of time.



The key factors here are:

1) VFP must look at every record in the table and
2) Each record must be shipped across the network.

If Rushmore is to speed up this query, these are the two factors that must be dealt with. Rushmore does not actually look at the table. Instead, it looks at indexes.

Indexes
If we are to understand how Rushmore works, it is important to understand how indexes are constructed. Let's create an index that Rushmore might use in this query:

INDEX ON cSomeField TAG MyTag


VFP creates an index tag that contains an address reference to the record and a field reference to cSomeField. If you were to look at that index, you might see something that looks like this:

Address ReferenceField Reference 12345 Adams 32145 Brown ... ... 09814 Murphy 14356 Smith

As can be seen, the index is sorted in order of the field reference. The address reference refers to the address of the record in the actual table. Using the index, VFP knows that to find the record containing the "Murphy" record, it needs to go to the address represented by 09814 in the table.
It should be noted that an index has two characteristics that allow it to be used in optimizing your query. First of all, an index is small in comparison to the table itself. Let's assume that MyTable contains 50 columns for a total record size of 500 bytes. The index contains an Integer field address reference for 4 bytes and a field reference of 20 bytes for a total of 24 bytes. (This assumes that cSomeField is a 20 byte Character field. A VFP integer is 4 bytes in size.) In this example, the index is only 4.8% of the table size. On average, moving a record pointer from one record to another in the index will only take 4.8% of the time than it would in moving the record pointer from one record to the next in the table. Using the index, VFP need move the record pointer a much shorter distance.
The index may indeed be small enough that the whole thing can fit into memory. If VFP can fit the index into memory, then it can work much faster. When working with data on a disk, VFP must physically move the read head on the disk drive. Mechanically moving the read head is far slower than accessing things in memory.
Secondly, the index is sorted. This means that there is a logic that Rushmore can apply to see if the next record will meet the WHERE criteria. Rushmore uses a "B-Tree" algorithm over the index to determine if a record will meet the WHERE criteria. Rushmore begins by looking at the middle record in the index. Because that index is sorted, it can ask the question "is the field reference for this index record equal to, greater than or less than the value in the WHERE criteria?" If the field reference is less than the WHERE criteria value, then Rushmore knows that the next record cannot meet that WHERE condition. If the field reference for that index record is greater than the WHERE criteria value, then Rushmore knows that the next record might indeed meet that WHERE condition. In other words, by reading the middle record of the index, Rushmore is able to eliminate half of records in that table as possible matches. Rushmore then does the same thing recursively with the remaining possible matches. At read two, Rushmore will have eliminated 75% of the possible records. At read three, 87.5% of the records are eliminated. Rushmore continues to eliminate records until it runs out of records or until it finds an exact match. If it runs out of records, Rushmore returns false, meaning that there are no matching records. If it finds an exact match on any read, it returns true, meaning that a match has been found and therefore no further reads are needed. Indeed, if on the very first read, Rushmore finds that the index field reference is an exact match to the WHERE condition, it returns true and the search is complete.
You will remember that Rushmore needed to address two factors - reading every record and shipping all of those records across the network. By using an index, Rushmore needs only send the index across the network. It needs to send the entire index across the network, but because the index is only 4.8% of the size of the table, far less data must be shipped. By using a B-Tree algorithm, Rushmore need only read a small percentage of the index records. In the unlikely event that you have a very large a table with 130,000,000,000,000,000,000,000,000,000 records, Rushmore need only make a maximum of 100 index reads to determine if there is a match.

Where is Rushmore Used?
Rushmore is used to optimize a series of commands, including SELECT, AVERAGE, COUNT, SUM, BROWSE, COPY TO, REPLACE FOR, etc. (Check the help file for a complete list.) In most of these commands, Rushmore is used to apply a filter in a FOR condition. In the SELECT command, it can be used to apply a filter and as well, can be used in a JOIN condition. Note the following two queries:

SELECT Table1.SomeField, Table2.SomeOtherField ;
	FROM Table1 ;
	INNER JOIN Table2 ON Table1.FK = Table2.PK ;
	WHERE Table1.nSomeField = lnSomeField ;
	INTO CURSOR csrResults

SELECT Table1.SomeField, Table2.SomeOtherField ;
	FROM Table1, Table2 ;
	WHERE Table1.FK = Table2.PK AND ;
				Table1.nSomeField = lnSomeField ;
	INTO CURSOR csrResults


Both of these queries return the same results. As can be seen, a JOIN condition is treated as a sort of filter. In a JOIN condition however, Rushmore goes one step further. Rushmore only wants to loop through the larger table once. Remember, that once a match is found, Rushmore stops searching. It is likely that Rushmore will find the match in the smaller table much faster than it would in the larger table.

Optimizing your Query
Since we know that Rushmore uses indexes to optimize your query, you need to make sure that the correct indexes are in place. This naturally leads to the question "What indexes can Rushmore use?" Rushmore can use any index as long as the index is not filtered. You can create structural CDX indexes, non-structural CDX indexes or even simple IDX indexes and as long as these indexes are up-to-date and active, Rushmore will use them. For example,

INDEX ON MyTable.SomeField TO "Data\MyIndex.IDX"
INDEX ON SomeField TAG MyStructuralTag
INDEX ON SomeField TAG MyNonStructuralTag "Data\MyCompoundIndex.CDX"
SET INDEX TO "Data\MyIndex.IDX"
SET INDEX TO TAG MyStructuralTag ADDITIVE
SET INDEX TO TAG MyNonStructuralTag OF "Data\MyCompoundIndex.CDX" ADDITIVE


Not only will Rushmore use any type of index, it will use multiple indexes. You do not need to create a complex index over multiple fields. Instead, you can create a series of simple indexes over single fields:

*** Instead of
INDEX ON cField1+TRANSFORM(nField2) TAG ComplexTag
*** These are better
INDEX ON cField1 TAG Field1Tag
INDEX ON nField2 TAG Field2Tag


A series of simple indexes can be used by a number of different queries. That complex index can only be used for one purpose.
Rushmore CANNOT use filtered indexes however. If your index has a FOR clause, Rushmore will ignore it.

Adding Indexes - The Cost
Too easy! Create an index over every field in your table, don't include a FOR clause and your queries will always be optimized. Unfortunately there is a cost associated with creating indexes. Indexes ARE sorted. This means that when you append a record to your table, or when you modify a field that you have an index record over, you have to update the index. You will remember that process to sort records in a table - a similar process needs to be used when you update your indexes. The more indexes you have, the longer it will take to accomplish your update. Adding indexes to all of your fields is definitely NOT the way to go. Add only the indexes that you KNOW that you need. Properly defining your application before you start coding will ensure that you do know which indexes you need.

Rushmore and Deleted Records
When you issue a SET DELETED ON command, VFP automatically adds the equivalent of a "WHERE NOT DELETED()" or a "FOR NOT DELETED()" condition to your command or query. As with any other filter condition, Rushmore can use an index to optimize this "WERE NOT DELETED()" condition. As just discussed however, there is a cost associated with any index. If you only have a few deleted records in your table, optimizing this condition probably will probably not significantly affect the speed of the query. On the other hand, an index over DELETED() will have to be updated for every new record and each time you do delete a record. The decision to add an index over DELETED() will have to be made on a case by case basis. If you do decide to add an index over DELETED(), make sure it is a binary index:

INDEX ON DELETED() TAG DelTag BINARY


This will at least minimize the size of your index. In this index, you would have a four byte address reference and a one bit field reference for a total of only 33 bits (at 8 bits per byte.)

Full or Partial Optimization
Take a look at the help file on SYS(3054). This "handy dandy" little function will give you a report on how well Rushmore can optimize your query. Go to your command window and use SYS(3054) as follows:

SYS(3054,12) && Turn on SYS(3054)
SELECT ... INTO CURSOR MyCursor && Run your query
SYS(3054,0) && Turn SYS(3054) back off.


When NOT to use Rushmore
For the most part, Rushmore will almost always make your queries go faster. There are however, situations where Rushmore will actually slow the thing down. You will remember that Rushmore looks to see which table is the largest because it only wants to loop through the large table once. In a complex query that uses sub-queries and derived cursors, Rushmore sometimes gets this wrong. You end up cycling through the small derived cursor once and the larger cursor multiple times. Remember, those derived cursors don't have any indexes for Rushmore to use. When you find yourself in this sort of a situation, you can use the FORCE clause in your SELECT statement to ensure that sub-queries are evaluated in the order in which they appear, or you can break the query into a number of smaller queries where each of these small queries is fully optimized. A final query over the result cursors of these smaller queries serves to join those result cursors and is not optimizable, but by that point, you are only working with a small number of records so optimization of that final query does not really matter.

Conclusion
For those of you who have not yet tried using Rushmore, you are in for a bit of a treat. Your applications will run much faster. In JRR Tolkien's classic "The Lord of the Rings" Gandalf asks his horse Shadowfax to "Show us the meaning of haste." You can ask Rushmore to do the same.

ABOUT THE AUTHOR: KEN MURPHY

Ken Murphy Ken Murphy is a long time FoxPlus, FoxPro and Visual FoxPro developer who specializes in developing data-centric solutions for charities. Ken posts in a number of forums but can most often be found at www.foxite.com where he is also a site moderator. Ken was recently named as a Microsoft MVP (Visual FoxPro) 2007/2008. Ken is active in ministry and serves as a Roman Catholic chaplain at his local hospital, nursing home and at the medium security penitentiary Springhill Institution. Ken is currently taking courses towards a Masters degree in Theology and discerning a call to ordained ministry as a Roman Catholic Deacon. Ken and his wife Gail are active in their community and belong to a small theater group "The Pit Players" doing musical dinner theaters as fundraisers for local charities. In the past 5 years, the Pit Players have raised over $60,000 for these charities. Ken is the Musical Director (and sometimes actor) for the Pit Players. Ken lives in Springhill Nova Scotia, Canada - home of the famous singer Anne Murray, and site of the famous coal mining disasters of 1956 and 1958.

FEEDBACK

Nilson Rishi @ 1/6/2008 3:47:59 AM
Great, Ken.
From Nilson Rishi

Paul Gibson @ 1/10/2008 11:20:58 AM
Brilliant Ken, I have been a student of Rushmore for many years because of my constant battle to speed up a legacy application, I have not seen Rushmore explained so succinctly before.

It has also made me realise that I've been creating my DELETED indexes as Regular for many years and I should probably make them Binary for further improvements. Cheers.

Prasant T @ 1/12/2008 8:11:22 AM
Thanks Ken. This will be very usefull for me.

nyron williams @ 1/19/2008 11:27:35 PM
Wow!!
As Usual great knowledge being passed on by you Mr. Murphy. I can only hope to gain as much knowledge as you one day. Great post !!!

@ 1/31/2008 11:33:39 PM
A very well documented post, nothing but words of admiration. Thank you for sharing with us this knowledge.

Kayode Adeniyi @ 2/20/2008 11:55:20 AM
Thanks Ken for this wonderful eye-opener! Great Post.

As Oliver Twist that always ask for more, can we have a post from that focuses of MSSQL V2005. That is, can I optimise my SQL on SQLSERBER V2005 using VFP front-end.

Ken Murphy @ 2/20/2008 7:51:24 PM
Kayode,

Thankyou for the very kind words. Yes, you can use MSSQL 2K5 with VFP. All you need do is ensure that your queries meet the requirements of what ever breed of SQL that you are using and your ODBC drivers are compatible.

glenn nemesis @ 3/25/2008 6:18:01 AM
Wow, these are indeed very useful. Keep on posting new articles for us to read. Thanks, long life to you!

Josip Zohil @ 4/3/2008 12:16:25 PM
Thank you for sharing with us your knowledge about the speed factors in retrieving VFP data. I have some doubt about some claim in the article:
“You will remember that Rushmore needed to address two factors - reading every record and shipping all of those records across the network”. How can you prove this? May be, it is shipping only the selected fields in the query across the network? Statistically we can prove (see for example http://www.postuni.si/postuni/clanki/articles/vfpperformance/performance1.htm) that the query retrieval time increase with the size of the index tag and the number (size) of selected records and selected fields. In theory we need at the work station only the relevant branch of the index tree and the selected fields.

“It needs to send the entire index across the network, …”: Is this thru? The author also said: “…Rushmore need only read a small percentage of the index records…”. In the index (CDX) file there can be many index tags (between them for the query and rush more relevant and not relevant). Vfp send across the network only the relevant tag (or a branch) or all the tags? Statistically we can see (at least for large dbf files) that reducing the index file (for example 50%) by deleting for the query irrelevant tags, we reduce the retrieval time of the query (for example 10%). We do not know, if the query sends the entire index across the network? We retrieve, for example 10 records and we download the 50MB large CDX file!!! If this is true, this is not an optimal solution (from the point of view of the query retrieval time). It is optimal only from the point of view of rush more.

Ken Murphy @ 4/3/2008 12:41:57 PM
Only the required indexes need be sent, but the entire index (tag) needs to go. Remember, processing is done at the workstation. How will the workstation know which node to start with if it doesn't have the whole index?

The second part of the speed issue is building the result cursor. The bigger the cursor, the longer it is going to take.

@ 4/4/2008 10:56:49 AM
Thank you for the responses.
“Only the required indexes need be sent…” means that some processing is done at the server, not only at the workstation. “…but the entire index (tag) needs to go …”. The server has to separate the index tag from the CDX file, if entire index (tag) needs to go. This processing is done on the server. May be VFP done also other processing at the server (for example separate for the query relevant branch from the index tree or put the index in the cache)?

Yours claims are based on VFP documentation, experience, experiments or other sources?

“It needs to send the entire index across the network …” The basic question about this is:
We have a 50 MB CDX file with two tags of 2.5MB each involved in a query. Does VFP send across the wire less than 5MB, only 5 MB, all 50MB, something in the middle, or more than 50MB? The question is very important in planning the performances of the query and understand rush more.

exsoftindo @ 5/12/2008 9:20:28 AM
Thanks for your topic it's very help full for me as a junior VFP Programmer.

I'm looking forward for other topic that discuss the performance improvement in VFP programming over the network

Thank's Again

information austin @ 6/20/2008 8:18:42 PM
Great post! Learned alot. I can only wish that Ken was still between us and we could learn more and more.

Ankur Joshi @ 8/9/2009 1:18:10 PM
Today I read the post. And my confussions are solved out by now. A great article by a great knowledge sharing person.

MR GONZALEZ @ 6/21/2016 8:28:20 PM
POB 380160 bklyn ny 11238 I need a operation loan for seriver team for working open store for seriver as a bank call 9292042380 @FOXTICE thankyou 25.000 traing



Your Name: 
Your Feedback: 

Spam Protection:
Enter the code shown: