Friday, March 9, 2012

Error writing data to same destination in single data flow

I am getting the following error running a data flow that splits the input data into multiple streams and writes the results of each stream to the same destination table:

"This operation conflicts with another pending operation on this transaction. The operation failed."

The flow starts with a single source table that has one row per student and multiple scores for each student. It does a few lookups and then splits the stream (using Multicast) in several layers, ultimately generating 25 destinations (one for each score to be recorded), all going to the same table (like a fact table). All of this runs under a transaction at the package level, which is distributed to a separate machine.

Apparently, I cannot have all of these streams inserting data into the same table at one time. I don't understand why not. In an OLTP system, many transactions are inserting records into the same table at once. Why can't I do that within the same transaction?

I suppose I can use a Union All to join the streams back together before writing to a single destination, but that seems like an unnecessary waste and it clutters the flow. Can anyone offer a different solution, or a reason why this fails in the first place?

Thanks in advance.

What type of load are you using? Fast load with table lock? You will certainly need to remove the table lock, and you may need to drop the fast load as well.

An OLTP system is not doing a bulk load, it is doing nice and slow atomic transactions.
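To make the contrast concrete, here is a minimal sketch using Python's sqlite3 (not SSIS; the table and column names are made up for illustration). OLTP-style work commits one small atomic transaction per row, so each writer holds locks only briefly, while a bulk load wraps the whole batch in a single transaction that holds its locks for the duration:

```python
import sqlite3

# Hypothetical illustration: OLTP-style row-at-a-time commits versus a
# single bulk-load transaction. Table name "scores" is invented for
# this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (student_id INTEGER, score REAL)")

rows = [(i, float(i % 100)) for i in range(1000)]

# OLTP style: each insert is its own short atomic transaction,
# committed immediately, so locks are released right away.
for row in rows:
    with conn:  # sqlite3's context manager commits on success
        conn.execute("INSERT INTO scores VALUES (?, ?)", row)

# Bulk-load style: one transaction covering the entire batch,
# holding its locks until the whole load commits.
with conn:
    conn.executemany("INSERT INTO scores VALUES (?, ?)", rows)

print(conn.execute("SELECT COUNT(*) FROM scores").fetchone()[0])  # 2000
```

Both paths load the same rows; the difference that matters here is how long the locks are held, which is why many OLTP writers can share a table while concurrent bulk loads with table locks cannot.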

I also don't see what is wrong with the Union All transform. If you can get the current design working, then I think you should also try the union, and let us know which is faster, as that would be the real reason to choose one over the other, not good looks!

Re-reading this, I see the mention of a distributed transaction, which matches the error better. Try using separate connections for each destination. If the only reason for using the transaction is to ensure integrity across the multiple destinations, that makes me lean toward the union even more, as a distributed transaction can add a fair amount of overhead.


Thanks for the reply and the suggestions. Sorry for the delay in following up; the automated alert for replies didn't work for me, so I didn't know there was a reply.

I have since rewritten it to use the Union All transform, and it works. It's not so much an issue of how it looks as the trouble required to build and maintain it. The current version of that transform does not make it easy to distinguish which input is which, so it's hard to map the appropriate columns correctly. And if you need to disconnect one stream from the Union All and reconnect it, the relative position of that stream moves and it gets renamed. It's just a hassle to get right. Elegance in appearance is oftentimes a clue to an elegant technical solution. Not always, but I'm a believer in a good "looking" solution.

You have a good suggestion about turning off the table lock and maybe the fast load option. I hadn't thought of that before I rewrote it. If I get a chance, I'll try the original approach and see if I can get it working.

The transaction is needed to ensure that either all the data is loaded or none of it is. If any part of the load fails, I want to rerun the entire package and load all the data again. This data flow isn't the only one in the package.
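That all-or-nothing requirement can be sketched in miniature with Python's sqlite3 (again, not SSIS; the `scores` table, column names, and sample streams are invented for illustration). Every stream's rows go through one transaction, and a failure anywhere rolls back everything:

```python
import sqlite3

def load_all_or_nothing(conn, streams):
    """Insert every stream's rows in one transaction; roll back all on any failure."""
    try:
        with conn:  # sqlite3 commits on success, rolls back on exception
            for stream in streams:
                conn.executemany(
                    "INSERT INTO scores (student_id, score) VALUES (?, ?)",
                    stream,
                )
        return True
    except sqlite3.Error:
        return False  # nothing from any stream was committed

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (student_id INTEGER, score REAL)")

good = [[(1, 90.0), (2, 85.5)], [(1, 72.0)]]
bad = [[(3, 88.0)], [(3, "oops", "too-many-columns")]]  # second stream fails

print(load_all_or_nothing(conn, good))  # True: 3 rows committed
print(load_all_or_nothing(conn, bad))   # False: all of it rolled back
print(conn.execute("SELECT COUNT(*) FROM scores").fetchone()[0])  # 3
```

The rollback in the failing case also discards the first stream's already-inserted row, which is exactly the behavior the package-level transaction is buying: a rerun starts from a clean slate.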

Thanks again.
