Welcome To The Home Of The Visual FoxPro Experts  
home. signup. forum. archives. search. google. articles. downloads. faq. members. weblogs. file info. rss.
 From: Cetin Basoz
  Where is Cetin Basoz?
 Izmir
 Turkey
 Cetin Basoz
 To: Jim Carls
  Where is Jim Carls?
 Nashville
 Tennessee - United States
 Jim Carls
 Tags
Subject: RE: Slow SELECT with subquery
Thread ID: 288745 Message ID: 288854 # Views: 32 # Ratings: 2
Version: Visual FoxPro 9 SP2 Category: Databases, Tables and SQL Server
Date: Friday, December 24, 2010 4:16:30 PM         
   


> I have a form, in a construction project oriented application, that displays a list of purchase orders in a grid. It's based on a view that includes a subquery that SUM()s the total cost and sell prices based on values in an order item and order item split tables (the parent child structure is Orders-> Items-> Item Splits). The problem I'm having is that the query is slow (20 secs or more) when the number of orders in the system gets into the tens of thousands. Although this might be expected when displaying all the orders in the system, it is just as slow when displaying a handful. I structured the view so that any replaceable conditions are in the subquery, which returns just the order and project keys in addition to the summed field values and that is inner joined to the rest of the view's tables. In order to implement a querying tool using replaceable conditions, the subquery includes joins for all the tables that might be referenced in a condition.
>
> I was thinking that having the conditions in the subquery would quickly sum only the records needed to join with the other tables supplying fields, but this has had no effect on the speed. I tested removing the macro expanded conditions and using a simple "project key equals variable" condition in the subquery, but if any speed was gained, it was minimal.
>
> I'm pretty sure the problem is in the subquery and its SUM() functions, because another form that displays all the order items in a similar grid (instead of just the order headers) shows no lag on the same data set, using the same set of tables that might be referenced in a query (and in fact, no other form in the system shows this kind of lag). The keys that connect the three tables, PO_Key and PO_Itm_Key, are all represented by index tags in the appropriate tables and I've included a deleted() tag in all three. Is there a way to structure the SQL to produce a faster result? My alternative seems to be to split the view into two, with the subquery portion called manually as an option to repopulate the totals (or I could add "total" fields to the order structure, which would be a real pain to maintain).
>
> I'm hoping there is simply some general SQL structure approach that I've missed, but I've posted the code below (slightly edited to remove unnecessary columns and for readability). Any guidance here would be appreciated. Thanks!
>
>
>
> CREATE SQL VIEW "lvOrderList" ;
> 	AS SELECT DISTINCT Vendors.VNDR_ID, Vendors.V_NAME, ;
> 	Orders.po_key, Orders.po_id,  ...more fields... ;
> 	Orders.closed,;
> 	Orders.voided, Orders.marked,;
> 	PADR(IIF(!EMPTY(Orders.voided), "VOIDED", ;
> 		IIF(Orders.closed, "CLOSED", ;
> 		IIF(!EMPTY(Orders.issued), "ISSUED", ;
> 		IIF(!EMPTY(Orders.quoted), "QUOTED", SPACE(8))))), 8) AS STATUS, ;
> 	IIF(!EMPTY(Orders.voided), Orders.voided, ;
> 		IIF(Orders.closed, CTOD(" / / "), ;
> 		IIF(!EMPTY(Orders.issued), Orders.issued, ;
> 		IIF(!EMPTY(Orders.quoted), Orders.quoted, CTOD(" / / "))))) AS STATUSDATE, ;
> 	IIF(Orders.ship_opt = 3, CONS_VNDR.VNDR_ID, ;
> 		IIF(Orders.ship_opt = 2, ship_vndr.VNDR_ID, ;
> 		IIF(Orders.addr_opt = 3, "ALTERNATE ", ;
> 		IIF(Orders.addr_opt = 2, "PROJECT   ", "CLIENT    ")))) AS SHIPDEST, ;
> 	(Orders.reqdeposit + Orders.reqdeptax + Orders.reqdepfrt) - Orders.deposit AS DEPOSITOUT, ;
> 	Clients.CLNT_ID, ;
> 	NVL(Projects.PROJ_ID, SPACE(10)) AS "PROJ_ID", ;
> 	CAST(NVL(OrderSummary.PO_COST, 0)  AS N(12,4)) AS PO_COST, ;
> 	CAST(NVL(OrderSummary.PO_SELL, 0)  AS N(12,4)) AS PO_SELL ;
> 	FROM (SELECT Orders.po_key, ;
> 		Orders.proj_key, ;
> 		CAST(SUM(NVL(IIF(!EMPTY(ItemSplits.Qty_Ship), ItemSplits.Qty_Ship, ItemSplits.Split_QTY), 0) * NVL(Items.ORDERCOST, 0)) AS N(12,4) NOT NULL ) AS PO_COST, ;
> 		CAST(SUM(NVL(IIF(!EMPTY(ItemSplits.Qty_Ship), ItemSplits.Qty_Ship, ItemSplits.Split_QTY) - ;
> 		IIF(!ItemSplits.Bill_Var, ItemSplits.Qty_Var, 0), 0) * NVL(Items.ORDERSELL, 0)) ;
> 		AS N(12,4) NOT NULL ) AS PO_SELL ;
> 		FROM FFEZ!Orders JOIN FFEZ!Vendors ON Vendors.vndr_key = Orders.vndr_key ;
> 		JOIN FFEZ!Clients ON Orders.clnt_key = Clients.clnt_key ;
> 		LEFT JOIN FFEZ!Projects ON Projects.proj_key = Orders.proj_key ;
> 		LEFT JOIN FFEZ!Items ON Orders.po_key = Items.po_key ;
> 		LEFT JOIN FFEZ!ItemSplits ON Items.PO_ITM_KEY = ItemSplits.PO_ITM_KEY ;
> 		LEFT JOIN FFEZ!Vendors AS ShipVendor ON Orders.ship_vndr = ShipVendor.vndr_key ;
> 		LEFT JOIN FFEZ!Vendors AS ConsVendor ON Orders.cons_key = ConsVendor.vndr_key ;
> 		LEFT JOIN FFEZ!Invoices ON ItemSplits.inv_key = Invoices.inv_key ;
> 		LEFT JOIN FFEZ!ChkReqs ON ItemSplits.requestkey = ChkReqs.requestkey ;
> 		WHERE (&?gcActiveFilter) .AND. &?GCGROUPFILTER .AND. EMPTY(ItemSplits.Split_ID) ;
> 		GROUP BY Orders.po_key, Orders.proj_key) AS OrderSummary ;
> 	JOIN FFEZ!Orders ON Orders.po_key = OrderSummary.po_key ;
> 	JOIN FFEZ!Vendors ON Vendors.vndr_key = Orders.vndr_key ;
> 	JOIN FFEZ!Clients ON Orders.clnt_key = Clients.clnt_key ;
> 	LEFT JOIN FFEZ!Projects ON Projects.proj_key = Orders.proj_key ;
> 	LEFT JOIN FFEZ!Vendors AS ship_vndr ON Orders.ship_vndr = ship_vndr.vndr_key ;
> 	LEFT JOIN FFEZ!Vendors AS CONS_VNDR ON Orders.cons_key = CONS_VNDR.vndr_key ;
> 	ORDER BY Orders.po_id, Orders.po_co_id
> 
> 


Since all fields are not there I am not sure if there were a need to include all those joins. With the partial field names looks like most of them are not needed. I didn't understand the purpose of joins in the subquery. Wouldn't it be just EXISTS subqueries instead of joins? Also I find it rather susceptible when a query has an aggregation operator like Sum() and yet contains joins. Maybe you should first do the query and sum() later or the other way around? That subquery may be something like:

SELECT sumSet.po_key, sumSet.proj_key, ... ;
from (;
SELECT Orders.po_key, ;
		Orders.proj_key, ;
ItemSplits.Qty_Ship, ItemSplits.Split_QTY, Items.ORDERCOST, ;
ItemSplits.Bill_Var, ItemSplits.Qty_Var,Items.ORDERSELL ;
FROM FFEZ!Orders ;
 LEFT JOIN FFEZ!Items ON Orders.po_key = Items.po_key ;
 LEFT JOIN FFEZ!ItemSplits ON Items.PO_ITM_KEY = ItemSplits.PO_ITM_KEY ;
where ;
exists (select 1 from FFEZ!Vendors where Vendors.vndr_key = Orders.vndr_key) and ;
exists (select 1 from FFEZ!Clients where Orders.clnt_key = Clients.clnt_key) and ;
ItemSplits.Split_ID = 0 and ;
(&?gcActiveFilter) AND &?GCGROUPFILTER ;
) sumSet ;
	GROUP BY sumSet.po_key, sumSet.proj_key


You are using Empty(ItemSplits.Split_ID) which wouldn't be optimized. Instead use something like:

ItemSplits.Split_ID == ''
ItemSplits.Split_ID = 0

depending on type. OTOH I didn't understand the purpose of that filtering:

* ...
 LEFT JOIN FFEZ!ItemSplits ON Items.PO_ITM_KEY = ItemSplits.PO_ITM_KEY ;
*...
 WHERE ... EMPTY(ItemSplits.Split_ID) ;


You accept nulls but filter out empty ones??? Maybe it is the way you need it but sounded like a dangerous filter to me.

Since it is unlikely that you would use such a view for updating, do you ever need to use a view? If you do not then you could use multiple successive SQLs which might be much more optimized (divide and conquer) instead of trying to get all the data, create a big cursor and then apply the really slow 'distinct' operator.

There is a chance that with multiple SQLs this might be very quick. I would say do not completely trust into Rushmore optimization either. Sometimes a query showing full optimization is in fact not optimized at all (rather put the data on network and keep an eye on network traffic, try different SQLs).

Also if this is important enough, you could create a matching model on SQL server express, run the query and check execution plans. The plans may not match how VFP executes but gives idea at least on optimizing.

Cetin Basoz

.Net has got better.Think about moving - check my blog:
Blog (main)
Blog (mirror)

ENTIRE THREAD

Slow SELECT with subquery Posted by Jim Carls @ 12/23/2010 4:06:07 PM
RE: Slow SELECT with subquery Posted by Tamar Granor @ 12/23/2010 10:04:55 PM
RE: Slow SELECT with subquery Posted by Jim Carls @ 12/23/2010 11:59:20 PM
RE: Slow SELECT with subquery Posted by Tamar Granor @ 12/24/2010 10:38:28 PM
RE: Slow SELECT with subquery Posted by Ugur YILMAZ @ 12/24/2010 1:05:01 PM
RE: Slow SELECT with subquery Posted by Cetin Basoz @ 12/24/2010 4:16:30 PM
RE: Slow SELECT with subquery Posted by Jim Carls @ 12/24/2010 8:34:58 PM