How to compare first row with second row in SQL

  • Hi Mathi,

    Please take all the table values into an array (a user-defined array is best, because you don't know the number of rows in the table). Then, when you save back into the table, just do the following: from the second row to the last row, array(i).value = array(i).value - array(i-1).value, and for the first row, array(0).value = 0.

    Thanks & regards,
    B. Ravindhar Reddy, Technical Member

  • Hi Mathi,

    I think you need to use the LAG function. Here's an example:

    SQL> create table tab1(id number, val number);
    Table created.
    SQL> insert into tab1 values (1,500);
    1 row created.
    SQL> insert into tab1 values (2,800);
    1 row created.
    SQL> insert into tab1 values (3,1000);
    1 row created.
    SQL> insert into tab1 values (4,2000);
    1 row created.
    SQL> insert into tab1 values (5,4500);
    1 row created.

    SQL> select * from tab1 order by 1;

           ID       VAL
    --------- ---------
            1       500
            2       800
            3      1000
            4      2000
            5      4500

    SQL> select id, val, val - lag(val,1) over (order by id) diff from tab1;

           ID       VAL      DIFF
    --------- --------- ---------
            1       500
            2       800       300
            3      1000       200
            4      2000      1000
            5      4500      2500

    Regards,
    Joao Bello

  • Please make use of LAG, an analytic function. Have a look at the following example:

    select ename, sal, sal - LAG(sal,1) over (order by ename) SALDIFF from EMP;

    Albert (+263 91 222 278)
    "If you think you are leading yet no-one is following you then you are merely taking a walk." - Chinese Proverb

    "csmathi via oracle-dev-l" 01/06/2004 05:28 AMPlease respond to

    To

    "asamatand..." ccSubject[oracle-dev-l] How to take Difference between second and first row and the result in second row using SQL

    # View Group Archive: http://ITtoolbox.com/hrd.asp?i=821

    Hi,

    I have the following data in a table:

    Unit Pord CAT EXCH  REPORTED  OP_STK PROC   BOOKING
    11   CF   CR  LIFFE 23-SEP-03 5946   1061   7007
    11   CF   CR  LIFFE 26-SEP-03 5946   1177   7123
    11   CF   CR  LIFFE 29-SEP-03 5946   1243   7189
    11   CF   CR  LIFFE 30-SEP-03 5946   1251.5 6526.7
    11   CF   CR  LIFFE 01-OCT-03 5946   1259   7205

    My requirement is: I need the difference between the second and first row (e.g. BOOKING 7123-7007), the third and second row (7189-7123), and so on. I need the result as follows, using SQL:

    Unit Pord CAT EXCH  REPORTED  OP_STK PROC   BOOKING DIFF
    11   CF   CR  LIFFE 23-SEP-03 5946   1061   7007    0
    11   CF   CR  LIFFE 26-SEP-03 5946   1177   7123    116
    11   CF   CR  LIFFE 29-SEP-03 5946   1243   7189    66
    11   CF   CR  LIFFE 30-SEP-03 5946   1251.5 6526.7  -962
    11   CF   CR  LIFFE 01-OCT-03 5946   1259   7205    678.3

    I believe we can achieve the above result using an analytic function, but I have never used any analytic function so far. I would appreciate it if someone could help me solve this issue.

    Thanks & regards,
    Mathi

    *Archives: http://www.OpenITx.com/g/oracle-dev-l.asp



  • Hi,

    How can it be achieved on 8.0.5?

    Regards,
    Younus

  • Unit Pord CAT EXCH  REPORTED  OP_STK PROC   BOOKING DIFF
    11   CF   CR  LIFFE 29-SEP-03 5946   1243   7189    66
    11   CF   CR  LIFFE 30-SEP-03 5946   1251.5 6526.7  -962
    11   CF   CR  LIFFE 01-OCT-03 5946   1259   7205    678.3

    Please describe these 3 rows where the DIFF is not correct. What is your requirement?

    Regards,
    Shobit Gupta

  • Csmathi,

    Yes, analytics can do it; try using lead():

    select Unit,
           Pord,
           CAT,
           EXCH,
           REPORTED,
           OP_STK,
           PROC,
           BOOKING,
           nvl(lead(BOOKING) over (order by BOOKING), 0) next_booking,
           nvl(lead(BOOKING) over (order by BOOKING), 0) - BOOKING diff
    from your_table;

    - ant

  • Hi Friends,

    Thank you very much for your responses. Joao Bello's query, which uses the LAG analytic function, works excellently and is exactly what I want. Thanks, Joao Bello. Antony's query, which uses the LEAD function, also gives the same result, except that the result starts from the second row onwards. Anyway, thanks Antony.

    Thank you, all of you.

    Mathi

  • Hi Mathi,

    Try this SQL (your_table stands in for the table name, which was lost in the original message; the inline-view alias is RNUM throughout):

    SELECT Unit, Pord, CAT, EXCH, REPORTED, OP_STK, PROC,
           A.BOOKING, DECODE(A.RNUM, 0, 0, B.booking - A.booking) diff
    FROM (SELECT Unit, Pord, CAT, EXCH, REPORTED, OP_STK, PROC,
                 BOOKING, rownum RNUM FROM your_table) A,
         (SELECT BOOKING, rownum - 1 RNUM FROM your_table) B
    WHERE A.RNUM = B.RNUM

    Hope this will solve your problem.

    Regards,
    prabs


  • The first suggestion was the best - use analytic functions...

    Unit Pord CAT EXCH  REPORTED  OP_STK PROC   BOOKING
    11   CF   CR  LIFFE 23-SEP-03 5946   1061   7007
    11   CF   CR  LIFFE 26-SEP-03 5946   1177   7123
    11   CF   CR  LIFFE 29-SEP-03 5946   1243   7189
    11   CF   CR  LIFFE 30-SEP-03 5946   1251.5 6526.7
    11   CF   CR  LIFFE 01-OCT-03 5946   1259   7205

    select unit, pord, cat, exch, reported, op_stk, proc, booking,
           LAG(booking, 1) over (order by reported) prev_booking
    from yourtab

    This will return as "prev_booking" the previous row's value of "booking" (1 back because of the 1), when ordered by the column "reported" (over the entire result set). Look in the documentation for LAG (or LEAD, to look at the next record).

    Dan

  • Hi,

    Use analytic functions.

    Regards,
    Chandra Sekhar

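Stepping outside the archived thread for a moment: the LAG approach the replies converge on can be verified with any engine that supports window functions. Below is a minimal sketch using Python's built-in sqlite3 module (window functions require SQLite 3.25+), with a simplified two-column version of Mathi's table: the table name `bookings`, ISO date strings instead of DD-MON-YY, and SQLite's COALESCE in place of Oracle's NVL are all adaptations, not part of the original posts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table bookings (reported text, booking real)")
conn.executemany(
    "insert into bookings values (?, ?)",
    [("2003-09-23", 7007), ("2003-09-26", 7123), ("2003-09-29", 7189),
     ("2003-09-30", 6526.7), ("2003-10-01", 7205)],
)

# DIFF = current BOOKING minus the previous row's BOOKING; the first row
# has no previous row, so COALESCE turns its NULL into the 0 Mathi asked for.
rows = conn.execute("""
    select reported, booking,
           coalesce(booking - lag(booking) over (order by reported), 0) as diff
    from bookings
    order by reported
""").fetchall()

for row in rows:
    print(row)  # diffs come out as 0, 116, 66, then roughly -662.3 and 678.3
```

Note that the -662.3 computed here differs from the -962 in Mathi's posted example output, which is the discrepancy Shobit's reply questions.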

A query optimization that dramatically reduces runtime for queries that use window functions.

The Simplified Problem

A common problem we see our customers solving in SQL is how to compare two rows per group in a dataset.  For example, say you have the following data in the product_prices table:

Product updated_on Price
widget_A 2019-07-03 12:00:00 $105
widget_A 2019-07-03 11:00:00 $110
widget_A 2019-07-03 10:00:00 $100
widget_B 2019-07-02 06:00:00 $255
widget_B 2019-07-02 05:00:00 $235

How do you efficiently SELECT the price change at each updated_on time for each Product?  I.e. we’d like to see a result set of:

Product Most recent updated time Price change
widget_A 2019-07-03 12:00:00 -$5
widget_A 2019-07-03 11:00:00 $10
widget_B 2019-07-02 06:00:00 $20

This type of problem arises in a lot of domains.  Another common example of this problem occurs when you want to compare the time difference between subsequent events. For example, you may have a table that logs timestamps for each page visit of a user to your website and you want to do an analysis on the time difference between visits.
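The page-visit variant can be sketched the same way. The `visits` table and its column names below are hypothetical (they appear in no example above); the point is that the same per-group previous-row comparison yields the gap between consecutive visits per user. Shown here with Python's built-in sqlite3 (window functions require SQLite 3.25+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table visits (user_id text, visited_at text)")
conn.executemany(
    "insert into visits values (?, ?)",
    [("u1", "2019-07-03 10:00:00"),
     ("u1", "2019-07-03 10:05:30"),
     ("u1", "2019-07-03 10:20:00"),
     ("u2", "2019-07-03 09:00:00")],
)

# Seconds between each visit and the same user's previous visit;
# each user's first visit has no previous row, so its gap is NULL.
rows = conn.execute("""
    select user_id, visited_at,
           (julianday(visited_at)
              - julianday(lag(visited_at) over (partition by user_id
                                                order by visited_at))) * 86400.0
             as gap_seconds
    from visits
    order by user_id, visited_at
""").fetchall()
```

For u1 this yields gaps of about 330 and 870 seconds; `julianday` is SQLite-specific, so on another engine you would use its own timestamp subtraction instead.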

A common but sub-optimal way we see customers solve this problem is by using the ROW_NUMBER() window function together with a self join.  The algorithm is straightforward: first select all your product prices and order them within each product by updated_on using the ROW_NUMBER() window function.  Then self join on the table, mapping each row to the row preceding it.  In SQL:

with ordered_prices as (
  select product,
         updated_on,
         price,
         row_number() over (partition by product order by updated_on desc) as row_number
  from product_prices
)
select op2.product,
       op2.updated_on,
       op2.price - op1.price as price_change
from ordered_prices op1 join ordered_prices op2
  on op1.product = op2.product and op1.row_number = op2.row_number + 1

This query produces the desired result set, but at the cost of a join.  For real-world (i.e. less contrived) examples, solutions like this also often need to incorporate additional sub-selects or other expensive operations which compound the effect of the join and can make the query run excruciatingly slow.

This class of problem can manifest itself in other ways as well: sometimes instead of a self join, the query may just select from ordered_prices where row_number=1 to get the most recent updated_on and price.  But the general pattern is the same: using ROW_NUMBER() to label rows for further processing/joining.
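For comparison, the ROW_NUMBER() self-join pattern can be run end to end against the sample product_prices data. A minimal sketch using Python's built-in sqlite3 (window functions require SQLite 3.25+); table and column names come from the example above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table product_prices (product text, updated_on text, price integer)")
conn.executemany(
    "insert into product_prices values (?, ?, ?)",
    [("widget_A", "2019-07-03 12:00:00", 105),
     ("widget_A", "2019-07-03 11:00:00", 110),
     ("widget_A", "2019-07-03 10:00:00", 100),
     ("widget_B", "2019-07-02 06:00:00", 255),
     ("widget_B", "2019-07-02 05:00:00", 235)],
)

# Number each product's rows newest-first, then self-join each row (op2)
# to the row numbered just after it (op1, the next-older price).
rows = conn.execute("""
    with ordered_prices as (
        select product, updated_on, price,
               row_number() over (partition by product
                                  order by updated_on desc) as row_number
        from product_prices
    )
    select op2.product, op2.updated_on, op2.price - op1.price as price_change
    from ordered_prices op1
    join ordered_prices op2
      on op1.product = op2.product
     and op1.row_number = op2.row_number + 1
    order by op2.product, op2.updated_on desc
""").fetchall()

for row in rows:
    print(row)  # price changes -5, 10, 20, matching the desired result set
```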

A Better Solution

If you’re using ROW_NUMBER() in a query just to apply filters or joins later on in your transformation, there may be a better way.  Rather than co-opting ROW_NUMBER() to compare neighboring rows, a better solution is to apply a different window function that was designed to solve this problem:  LAG().  The LAG window function returns the value from a row offset from the current row.  The resulting SQL becomes:

       select product,
              updated_on,
              price - lag(price) over (partition by product order by updated_on) as price_change
       from product_prices

This SQL orders the rows within each product by updated_on and subtracts the previous (older) row's price from the current row's price. This eliminates the need for the join and can greatly simplify the query plan. One note: this result set will include a NULL price_change for the oldest row of each product (i.e. where there is no previous row), but those rows can be trivially eliminated with a WHERE clause of price_change is not null when the results are used (or the query can be wrapped in a sub-select to remove them).
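Here is a runnable check of the LAG() approach against the same sample data, again via Python's built-in sqlite3 (window functions require SQLite 3.25+). In this sketch the window is ordered ascending, so each row subtracts the previous (older) price, and the NULL rows are filtered in an outer select:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table product_prices (product text, updated_on text, price integer)")
conn.executemany(
    "insert into product_prices values (?, ?, ?)",
    [("widget_A", "2019-07-03 12:00:00", 105),
     ("widget_A", "2019-07-03 11:00:00", 110),
     ("widget_A", "2019-07-03 10:00:00", 100),
     ("widget_B", "2019-07-02 06:00:00", 255),
     ("widget_B", "2019-07-02 05:00:00", 235)],
)

# LAG over ascending updated_on gives each row the previous (older) price;
# the oldest row per product has no previous row, so its change is NULL
# and is dropped by the outer WHERE.
rows = conn.execute("""
    select * from (
        select product, updated_on,
               price - lag(price) over (partition by product
                                        order by updated_on) as price_change
        from product_prices
    )
    where price_change is not null
    order by product, updated_on desc
""").fetchall()

for row in rows:
    print(row)  # -5, 10, 20: same result set as the self-join, with no join
```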

A Real-World Example

The seed for this blog post was planted when a potential customer came to us with the same question we hear all the time:  "Why are our queries slow?"  After instrumenting their cluster with Integrate.io, they were able to quickly identify their worst queries, find out why they were slow, and learn what to do about them.  In their case, one of the worst offending queries used the ROW_NUMBER() pattern described above and triggered a large number of Query Recommendations in their Integrate.io Dashboard.

Since the query exhibited the ROW_NUMBER() pattern above, it caused multiple unnecessary table scans, an expensive join, significant IO skew, and a large amount of intermediate data written to disk during execution.  The query operated over a single table of 876 million rows (not a huge table, but large enough to bring Amazon Redshift to its knees if not optimized properly) and was taking over 5.5 hours to execute!

After reviewing our Recommendations, they reimplemented their query using the LAG function to eliminate the join (which also greatly simplified the query by removing two additional sub-selects), and the execution time dropped to 30 seconds.  From 5.5 hours to 30 seconds, simply by reviewing targeted Query Recommendations and implementing a straightforward fix: talk about a good ROI on time spent optimizing SQL.  Their exact feedback after seeing the new query runtimes?  "OMFG. takes 30s. Wow"

Lessons Learned

Amazon Redshift is a fantastic general-purpose data warehouse. But since it is so easy to add data and new workloads to your cluster, it can be very hard to know where and how to start optimizing SQL queries. 

Analysts and data engineers often wonder which queries are the worst offenders and where limited SQL tuning effort should go to give the biggest bang-for-the-buck. Knowing where to put query tuning effort is crucial since seemingly simple-looking queries can sometimes have inefficiencies lurking which can compound to slow a query by several orders of magnitude. 

The good news: this is the problem that Integrate.io Query Recommendations was designed to solve. We help you pinpoint the root cause of query slowness so that you can spend your limited time optimizing the right queries in the right way.