Application Development Discussions
Join the discussions or start your own on all things application development, including tools and APIs, programming models, and keeping your skills sharp.

Query regarding using cursor and aRFC - performance

0 Kudos

I was trying to use a cursor to do some parallel processing and improve the performance of a database select.

I opened a cursor in the main program:

OPEN CURSOR WITH HOLD c FOR
  SELECT * FROM BSEG
    WHERE BUKRS IN BUKRS
      AND BELNR IN BELNR
      AND GJAHR IN GJAHR.

I then called an aRFC function module that processes the cursor using a FETCH statement (similar to the one below). To ensure that the cursor would still be available there, I also passed the cursor as a parameter to the RFC:

FETCH NEXT CURSOR c INTO TABLE it_bseg PACKAGE SIZE 3000.

Upon testing this program, it seems that the cursor becomes invalid/closed after the RFC has executed once. I think this is because the end of the RFC LUW implicitly closes the database cursor when the call finishes. Hence the second and subsequent calls to the RFC fail with the error that the cursor is invalid/not open.
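Roughly, the call looked like this (names changed; the function module Z_FETCH_PACKAGE, its parameter IV_CURSOR, and the callback form ON_RESULT are placeholders for what I actually used):

CALL FUNCTION 'Z_FETCH_PACKAGE'        " hypothetical RFC-enabled FM
  STARTING NEW TASK 'FETCH1'
  PERFORMING on_result ON END OF TASK  " collects the fetched package
  EXPORTING
    iv_cursor = c.                     " the cursor opened above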

Do you know if there is any way to keep the cursor valid until the end of the program, or until an explicit CLOSE CURSOR is issued?

Most SAP documentation seems to indicate that there is no way to influence this. However, as a cursor is a pure database function, I find it hard to believe that its validity cannot be controlled manually rather than being left to the SAP LUW.

Thanks in advance.

Best Regards,

Shrikant

1 ACCEPTED SOLUTION

Former Member
0 Kudos

First off, I cannot help with the open cursor question, but I would advise you to first look at the business requirements that lead to a select with multiple select-options against BSEG, which is a cluster table. Can you use the header tables to get the required data? At least use them to refine your requirements, so that if you do have to read BSEG you have the specific key values you need.

Most if not all of the information in BSEG can be found in these header and index tables, which are several orders of magnitude faster to pull data from:

BKPF Accounting Document Header
BSID Accounting: Secondary Index for Customers (Open Items)
BSIK Accounting: Secondary Index for Vendors (Open Items)
BSIM Secondary Index, Documents for Material
BSIS Accounting: Secondary Index for G/L Accounts (Open Items)
BSAD Accounting: Secondary Index for Customers (Cleared Items)
BSAK Accounting: Secondary Index for Vendors (Cleared Items)
BSAS Accounting: Secondary Index for G/L Accounts (Cleared Items)

11 REPLIES

former_member194613
Active Contributor
0 Kudos

> I then called an aRFC function module that processes the cursor using a FETCH statement (similar to the one below).

> To ensure that the cursor would still be available there, I also passed the cursor as a parameter to the RFC.

You want to do the fetches of a SELECT in an asynchronous RFC ... I don't understand ... it seems extremely weird. How is the result supposed to come back from the RFC? An RFC has a rather large overhead and is not suitable for parts of a SELECT.

Please rephrase your question; maybe it was only unclear.

0 Kudos

Hi,

My original issue was a long-running select on a database table using an index range scan, fetching more than 60 million records (the table has more than 500 million). The execution plan was quite good and the average time per record was around 0.7 ms. This select is the first select against the database and contains a large number of values (select-options and single values) in the WHERE clause.

I was hoping to parallel-process this using RFCs, so that multiple selects could run simultaneously, as I was advised that there are free resources on the database side. I could not just spawn the RFCs without letting them come back, as the required result is a summation over all the records. Hence, I have a table being passed back from the RFC to the main program.

Also, I did not want to simply split the selections and execute them in parallel, because the dialog process timeout in the system is quite low (12 minutes) and some of the selections could run longer than that. Hence, I was looking for a way to parallel-process the program to fetch data from a single table, relying on something like a package size to ensure that the RFC does not time out.

I do understand that there will be the additional burden of sending the data back to the calling program, but I felt that this must still be faster than fetching the records from the database sequentially.

After speaking to an Oracle database person, I also heard that the database itself can do some parallel processing in the case of full table scans or partitions. But this will not apply to this table as it stands, since it is not partitioned and the access is not a full table scan.

Hence, I am looking for a good way to parallel-process the database access somehow. I did test parallel processing without cursors, selecting data from a table through parallel RFCs, and it gave me some modest gains (30%). But that was only a test system and didn't have as much data as I would have liked.

Thanks in advance.


0 Kudos

Hi,

try to build packages based on the key:

MANDT
BUKRS
BELNR
GJAHR

You can use BELNR for building your "packages", e.g. BELNR BETWEEN v1 AND v2. Implement the range bounds (BETWEEN) as parameters, so that you can change the "package size" easily without modifying the program, in order to find out which package size is good for your system. Send the parameters (the BETWEEN bounds and the other values) to the aRFCs, so that they start independent selects with these parameters; a sketch follows below.

Talk with a Basis and/or database administrator regarding the DOP (degree of parallelism), since such a parallelized program may put your I/O and/or CPU subsystem under heavy load.
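A minimal sketch of this approach, assuming a hypothetical RFC-enabled function module Z_SUM_BSEG_PACKAGE that selects and aggregates one BELNR interval and returns a subtotal EV_SUM (all names, the throttle limit, and the NUMC handling are assumptions, not tested code):

REPORT z_parallel_sum.

PARAMETERS: p_maxnr  TYPE n LENGTH 10,        " highest document number
            p_pksize TYPE i DEFAULT 100000.   " documents per package

DATA: gv_total TYPE dmbtr,                    " running grand total
      gv_open  TYPE i,                        " aRFCs still running
      lv_from  TYPE n LENGTH 10 VALUE '0000000001',
      lv_to    TYPE n LENGTH 10,
      lv_task  TYPE char8.

START-OF-SELECTION.
  WHILE lv_from <= p_maxnr.
    lv_to = lv_from + p_pksize - 1.           " NUMC keeps the zero padding
    ADD 1 TO gv_open.
    lv_task = gv_open.                        " unique task name
    CALL FUNCTION 'Z_SUM_BSEG_PACKAGE'        " hypothetical FM
      STARTING NEW TASK lv_task
      DESTINATION IN GROUP DEFAULT
      PERFORMING on_package_done ON END OF TASK
      EXPORTING
        iv_belnr_from = lv_from
        iv_belnr_to   = lv_to
      EXCEPTIONS
        resource_failure = 1.
    IF sy-subrc <> 0.                         " no free work process:
      SUBTRACT 1 FROM gv_open.                " throttle and retry
      WAIT UNTIL gv_open < 10.
      CONTINUE.
    ENDIF.
    lv_from = lv_to + 1.
  ENDWHILE.

  WAIT UNTIL gv_open = 0.                     " all packages summed
  WRITE: / 'Total:', gv_total.

FORM on_package_done USING p_task TYPE clike.
  DATA lv_sum TYPE dmbtr.
  RECEIVE RESULTS FROM FUNCTION 'Z_SUM_BSEG_PACKAGE'
    IMPORTING ev_sum = lv_sum.
  ADD lv_sum TO gv_total.
  SUBTRACT 1 FROM gv_open.
ENDFORM.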

Kind regards,

Hermann


0 Kudos

Hi James,

Thanks for your reply.

Actually, I am well aware of these tables. BSEG was just an example I used because I am very familiar with FI. In this case, I wanted to test the program and then apply it to another table as well (GLPCA).

So here I am only looking at ways to technically improve performance and, if possible, use the concept to parallel-process select queries.

former_member194613
Active Contributor
0 Kudos

Still very weird.

If you want to parallelize the process, then you must create non-overlapping select statements and process them in separate processes. This should be possible with a controlling process.

The split of the OPEN and the FETCH is still not necessary.

And you should think about archiving; 500 million records in one table seems to be a lot.

0 Kudos

Hi Hermann,

Thanks for the reply. Yes, I could consider using a BELNR selection to reduce the size of each selection. This will again be a selection based more on knowledge of the application (of course with fine-tuning of the package size).

But then this approach will not be as efficient and as easy as letting the database take control of the iteration over this huge dataset, as is intended by the use of cursors with a package size.

I liked the idea that the databases themselves have some parallel processing options to fetch data if partitions are available or the table is being read with a full table scan. As I am mainly trying to introduce parallel processing to fetch data from the database (and there is no big ABAP runtime in this case), maybe the solution has to come from the database itself and not from ABAP, as the database is fully in control of our requirements here.

0 Kudos

Hi Shrikant,

> Yes, I could consider using a BELNR selection to reduce the size of each selection. This will again be a selection
> based more on knowledge of the application (of course with fine-tuning of the package size).
> But then this approach will not be as efficient and as easy as letting the database take control of the iteration
> over this huge dataset, as is intended by the use of cursors with a package size.

I'm not sure what you mean here: "let the database control the iteration of this huge dataset - with cursors".

Generally speaking, you have two options when it comes to parallel processing:

Parallelize processes in the ABAP stack. These processes access different, non-overlapping parts of your result set in parallel. Using BELNR could be an approach to build different, non-overlapping packages.

Database-side parallelization. This is possible for every action that uses multiblock I/O (full table scans and index fast full scans) and for partitioned segments. For non-partitioned segments and single-block I/O, database-side parallelization is not possible.

So, breaking this down, you have three options:

1.) parallelize in ABAP and build independent packages
2.) force the DB to do multiblock I/O (full table scan, index fast full scan) in parallel
3.) partition your table and indexes.

I think (it's just my opinion) that for an ERP (not BW/BI) system:

- option 3 is not a good option for OLTP systems (for several reasons); I see it very, very rarely or close to never in OLTP systems
- option 2 may read (much) more data than you need, although a very efficient I/O method would be used (multiblock I/O); see the sketch below
- option 1 requires programming in ABAP but will read exactly the data you need and can be controlled very well in the ABAP stack (parallel single-block I/O will be used)

In almost all cases similar to yours that I have seen, option 1 was used.
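To illustrate option 2 with a hedged sketch: on Oracle, a parallel full table scan can be requested from Open SQL via the %_HINTS addition. The table (GLPCA), the WHERE condition, the hypothetical parameter p_gjahr, and the degree of 8 are assumptions; the exact hint should be agreed with the DBA:

DATA lt_glpca TYPE STANDARD TABLE OF glpca.

" &TABLE& is replaced by the table name when the hint is passed to Oracle
SELECT * FROM glpca
  INTO TABLE lt_glpca
  WHERE ryear = p_gjahr                " p_gjahr: hypothetical parameter
  %_HINTS ORACLE 'FULL(&TABLE&) PARALLEL(&TABLE&,8)'.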

Kind regards,

Hermann

0 Kudos

Hi Hermann,

Thanks for highlighting the different options.

What I meant by "let the database control the iteration of this huge dataset - with cursors" is the cursor OPEN and FETCH commands. My understanding (I could be wrong here) is that the ABAP cursor syntax is translated into the database-specific native SQL cursor commands. So the database would keep the actual cursor on the table and keep sending out chunks of rows each time a FETCH is executed, based on the package size; see the sketch below.
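In a single process, the pattern I mean looks like this (the WHERE clause is abbreviated here):

DATA: c       TYPE cursor,
      lt_bseg TYPE STANDARD TABLE OF bseg.

OPEN CURSOR WITH HOLD c FOR
  SELECT * FROM bseg
    WHERE bukrs IN bukrs.              " further conditions omitted

DO.
  " the database keeps the cursor position and returns one chunk per fetch
  FETCH NEXT CURSOR c INTO TABLE lt_bseg PACKAGE SIZE 3000.
  IF sy-subrc <> 0.                    " no more rows
    EXIT.
  ENDIF.
  " ... process/aggregate the package here ...
ENDDO.

CLOSE CURSOR c.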

While specifying the document numbers, I will have to use logic to determine the from and to numbers for each package, and also ensure that the document numbers are valid (if I am using auto-increment to find the from and to range each time). I could also use the number range information from NRIV for this, so that no select is made against nonexistent document numbers; a sketch follows below.
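For example (a sketch only: RF_BELEG as the FI document number range object, the key fields used, and the parameters p_bukrs/p_gjahr are my assumptions and would need to be verified):

DATA: BEGIN OF ls_nriv,
        fromnumber TYPE nriv-fromnumber,
        tonumber   TYPE nriv-tonumber,
        nrlevel    TYPE nriv-nrlevel,  " last number actually assigned
      END OF ls_nriv,
      lt_nriv LIKE STANDARD TABLE OF ls_nriv.

SELECT fromnumber tonumber nrlevel
  FROM nriv
  INTO TABLE lt_nriv
  WHERE object    = 'RF_BELEG'         " FI documents (assumption)
    AND subobject = p_bukrs
    AND toyear    = p_gjahr.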

But in general, I think when the application decides the ranges there is less uniformity in the packages; for example, some documents may have too many line items, or some ranges may contain no document numbers at all.

If there is no other way, then I will have to consider something like this, incorporating logic to determine the packages each time. I completely agree with you that this is a valid approach that allows non-overlapping packages, but I felt I could do it more easily and better with database cursors.

That's why I was hoping to get the cursor feature to work with RFC, to keep things simple and let the database handle the cursor and the package size, while I just concentrate on parallelizing with parallel processes on the ABAP side. This would also have been a more general approach, applicable to many tables and not just this one, without worrying about which fields are most selective for building packages.

Also, can you help me understand why you think option 3 is a bad idea in OLTP, just out of curiosity? I have seen table partitioning in OLTP systems (not too frequently, but not very rarely either). I always felt that partitioning is a good idea in OLTP if tables are huge and the partitioning is based on a condition like fiscal year, company code, etc., mostly because access and changes to previous years are much rarer than to the current year, so it helps to keep the B-trees for the current year small. I definitely agree that partitioning on a random condition is a bad idea, as selects could then go across different partitions while accessing or changing data.

0 Kudos

Hi Shrikant,

OK, I see what you are trying to achieve. To be honest, I never thought about that or have seen such an implementation, and I don't know if it is possible.

Regarding parallelization, I have only seen parallelization in the database itself (one SAP process connected to one database process, which starts many children working on the task in parallel) or in the application logic (many SAP processes working on independent packages, each SAP process with one database process).

Regarding partitioning: my statement from yesterday was maybe too general. Of course it depends. I meant that in a typical OLTP system, where many concurrent users read rather small result sets, partitioning will probably not help much for performance. But even then there can be good reasons for partitioning when it comes to contention for resources due to concurrency in the database; here sometimes random approaches (hash partitions) are used to distribute the concurrent users across different resources. But I have seen such cases very rarely. Other reasons for partitioning can be administrative: keeping big data volumes manageable. In general I don't see it too often.

Back to your problem: regarding the parallel fetch, I don't know if it can work. Therefore I think you have to go for database or application parallelization. And as Siegfried said, thinking about archiving might be a good idea as well, but that is probably no short-term option, I guess. So short term, I think you are left with parallelization on the DB or application level.

Kind regards,

Hermann

0 Kudos

Hi Hermann & Siegfried,

Thank you for the reply.

Archiving, data avoidance, summarization, etc. are being considered for the long term. The customer has only 4 years of data, so at best they will only be able to archive 2 years of it.

Best Regards

Shrikant