ModelService remove() vs removeAll()

Former Member

Hi all, I have a requirement to remove all expired price rows from the database using a cron job.
What I'm currently doing is fetching the list of such PriceRowModels and passing it to modelService.removeAll(). There are 1,300,000+ models. The job ran for more than 2 hours before it had to be aborted, and the count had changed by a mere 4.

i.e. Before : 1,300,004
After: 1,300,000

I kept querying the count while the job was running, and it didn't change at all. I also tried bringing the total count down to 83,000+, but the issue was the same.

Any idea why this is happening?

Also, would it be better to just iterate over the list of models and call remove() on one model at a time?

Thanks.

Accepted Solutions (0)

Answers (6)

agrabovskis
Participant

The reported behaviour is due to transactions: if you remove data using removeAll(), all of it is removed in one transaction, and the changes are not visible outside the running transaction. Big transactions also put a lot of pressure on the DB (I would strongly suggest not affecting more than 10k rows within one transaction). What you should do instead is fetch a batch of models to be removed, pass it to removeAll(), commit the transaction, and start over. A batch size between 100 and 1000 is reasonable.
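The fetch, remove, commit, repeat loop described above can be sketched in plain Java. This is only an illustration under assumptions: `BatchedRemoval`, `removeInBatches`, `fetchBatch`, and `removeBatch` are hypothetical names, and an in-memory list stands in for the database. In a real job, `fetchBatch` would presumably run a limited FlexibleSearch query, and `removeBatch` would call modelService.removeAll() inside its own transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.IntFunction;

class BatchedRemoval {

    /**
     * Fetches up to batchSize items, removes them as one unit of work, and
     * repeats until the fetcher returns nothing. Returns the number removed.
     * Assumption: removeBatch really deletes the items it is given, otherwise
     * the same batch would be fetched again and the loop would never end.
     */
    static <T> int removeInBatches(IntFunction<List<T>> fetchBatch,
                                   Consumer<List<T>> removeBatch,
                                   int batchSize) {
        int removed = 0;
        while (true) {
            List<T> batch = fetchBatch.apply(batchSize);
            if (batch.isEmpty()) {
                break;
            }
            // One batch corresponds to one (hypothetical) transaction.
            removeBatch.accept(batch);
            removed += batch.size();
        }
        return removed;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the table of expired rows.
        List<Integer> store = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            store.add(i);
        }
        int removed = BatchedRemoval.<Integer>removeInBatches(
                n -> new ArrayList<>(store.subList(0, Math.min(n, store.size()))),
                batch -> store.removeAll(batch),
                1000);
        System.out.println("removed " + removed);
    }
}
```

Because each batch is committed on its own, a count query from outside the job should drop by one batch at a time instead of staying flat until the very end.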

Note that Groovy scripts in HAC are executed in an explicit transaction, hence there's effectively no difference between remove() and removeAll() there. If you're familiar with the transaction API in Hybris, you should have no problem committing the current transaction.

The impex approach might work because each line is processed in a separate transaction. Note that batch deletion via impex might suffer from the same issue as the script did.

former_member633554
Active Participant
0 Kudos

For Groovy script jobs that need to clean up a lot of data, remove() is safer. I find that removeAll() often causes GC/memory and other errors.

This is safe; watching the DB, you can see rows being removed one at a time.

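import de.hybris.platform.servicelayer.search.FlexibleSearchQuery

flexibleSearchService = spring.getBean "flexibleSearchService"
modelService = spring.getBean "modelService"

// DATE_SUB/NOW are MySQL functions; adjust the condition for other databases
FlexibleSearchQuery taskQuery = new FlexibleSearchQuery("select {pk} from {ProcessTaskLog join BusinessProcess as bp on {bp.pk}={ProcessTaskLog.process} } where {bp.modifiedTime} < DATE_SUB(NOW(), INTERVAL 1710 DAY)")

taskLogs = flexibleSearchService.search(taskQuery).result

// Remove the items one at a time (the closure must start on the same line as each)
taskLogs.each { modelService.remove(it) }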

This is dangerous from a memory point of view.

import de.hybris.platform.servicelayer.search.FlexibleSearchQuery

flexibleSearchService = spring.getBean "flexibleSearchService"
modelService = spring.getBean "modelService"

// Same query as above; the whole result list is handed to a single call
FlexibleSearchQuery taskQuery = new FlexibleSearchQuery("select {pk} from {ProcessTaskLog join BusinessProcess as bp on {bp.pk}={ProcessTaskLog.process} } where {bp.modifiedTime} < DATE_SUB(NOW(), INTERVAL 1710 DAY)")

taskLogs = flexibleSearchService.search(taskQuery).result

modelService.removeAll(taskLogs)

former_member618655
Active Participant
0 Kudos

We also resorted to batch-mode impex when we had to import 10 million orders/order entries during cutover. Make sure you run the batch-mode impex in legacy mode. You can also set the number of threads to 4, 6, or 8.

Former Member
0 Kudos

ModelService.removeAll() essentially just calls ModelService.remove() for each model in the list (or did last time I checked), which means it fires a separate SQL query for every item. If the latency between your application server and database is even 1ms, that's 1ms EXTRA per item to be removed, not even counting the additional overhead of processing separate queries.
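To put a number on that per-item latency, here is a small worked calculation in Java. The 1ms figure is the illustrative value from the paragraph above, not a measurement, and `RoundTripCost` and `roundTripOverhead` are hypothetical names. It assumes exactly one query per item and counts nothing but round-trip time.

```java
import java.time.Duration;

class RoundTripCost {

    /** Time spent purely on network round trips, assuming one query per item. */
    static Duration roundTripOverhead(long items, long latencyMillisPerQuery) {
        return Duration.ofMillis(Math.multiplyExact(items, latencyMillisPerQuery));
    }

    public static void main(String[] args) {
        // 1,300,000 rows (the count from the question) at 1 ms per query.
        Duration overhead = roundTripOverhead(1_300_000L, 1L);
        System.out.println(overhead.getSeconds() + " seconds of pure latency");
    }
}
```

At 1ms per query, the 1,300,000 rows from the question add up to 1,300 seconds, a little under 22 minutes of waiting on the network alone.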

The fastest way I've found to do this is to create a batch-mode ImpEx script, which wipes the table much more quickly. You can then fire this ImpEx (use legacy mode) from your code, and the items should be removed reasonably quickly.

Thanks, James

former_member537989
Contributor
0 Kudos

What I'm currently doing is fetching the list of such PriceRowModels and passing it to modelService.removeAll()

That's not practical because it requires all those models to be held in memory. Instead, try implementing MaintenanceCleanupStrategy with a reasonable fetch window.

Former Member
0 Kudos

Hello, we had a similar case with over 2.5 million "models". As far as we could tell from inspecting the source of modelService.removeAll(), the ModelService uses transactions and then iterates over the given list one item at a time using some kind of queue (please feel free to correct me if I got something wrong here). Long story short, we decided to go with the "impex solution": we used the ImportService and removed the models with a "remove" impex header. Hope it helps in some way.