We ran into an interesting issue when we sent a large batch of components and pages for publishing in an SDL Web Cloud and DXA environment. After many of these items had been published, publishing got stuck in different phases such as “Waiting for publishing” and “Waiting for Deployment”, and after a few hours it eventually failed for all items.
Different teams analyzed the problem and found that a few of those transactions (roughly 1 in several hundred) were giving the infamous “deploy duplicate binary” error, which is quite odd when DXA is in use.
After further analysis, it turned out that in some cases the failure of a single multimedia component eventually crashed the entire publishing ecosystem and failed every item in the publishing queue. Any further publishing also got stuck in the queue and eventually failed.
The failed multimedia items are large (PDF files of about 100 MB) and stay in memory until the default 10 retries are exhausted. This takes some time, and memory keeps being consumed in the meantime. If a large number of items is waiting in the publishing queue at the same time, memory utilization gets even worse, which leads to the publisher crashing with “Out of Memory”.
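To make the memory pressure concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch: the function names, the per-item queue overhead, the seconds per retry attempt and the heap size are all hypothetical values chosen for illustration, not SDL Web settings or measurements. Only the ~100 MB binary size and the 10 retries come from what we observed.

```python
def retained_memory_mb(failing_binaries_mb, queued_items, per_item_overhead_mb):
    """Estimate memory (MB) held while failing binaries wait out their retries."""
    # Assumption: each failing binary stays fully in memory until retries run out.
    held_by_failures = sum(failing_binaries_mb)
    # Assumption: every item still waiting in the queue costs a fixed overhead.
    held_by_queue = queued_items * per_item_overhead_mb
    return held_by_failures + held_by_queue


def retry_window_seconds(retries, seconds_per_attempt):
    """Estimate how long a failing binary stays in memory before it is given up on."""
    return retries * seconds_per_attempt


def exceeds_heap(failing_binaries_mb, queued_items, per_item_overhead_mb, heap_mb):
    """Return True when the estimated retained memory is larger than the heap."""
    retained = retained_memory_mb(failing_binaries_mb, queued_items, per_item_overhead_mb)
    return retained > heap_mb


if __name__ == "__main__":
    # Hypothetical scenario: five failing 100 MB PDFs, 600 queued items at 1 MB each,
    # and a 1024 MB heap. Only the 100 MB size and 10 retries come from the incident.
    failing = [100] * 5
    print("retained MB:", retained_memory_mb(failing, 600, 1.0))
    print("retry window (s):", retry_window_seconds(10, 30))
    print("exceeds heap:", exceeds_heap(failing, 600, 1.0, 1024))
```

Even with these made-up numbers, the point is visible: a handful of large failing binaries held for the full retry window, combined with a long queue, can be enough to push the publisher past its available memory.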
Moral of the story:
The publishing queue appears to be fully transaction-based and to execute each transaction batch in isolation. However, there seem to be unexpected factors, such as the memory consumed by retried items, that can affect the publishing process. So a failure of one item in the publishing queue CAN cause other batches of publishing items to fail.