Possible concurrency bug in delta 1.0.0 - Scala delta

Hi, we have an in-house test that writes concurrently to the same Delta table, but to different partitions, from multiple threads. So this is concurrency within a single local Spark session on the local filesystem. This worked fine with Delta 0.8.0, but with Delta 1.0.0 the same test occasionally fails with some data loss. I am not sure a test like this is even appropriate given the limitations of local mode and the local filesystem, but I figured I'd let you know just in case.

The test is here: https://github.com/tresata-opensource/delta/commit/9370b4614631ecfb07e5ed08fa5a3b80d9afe814

Asked Oct 05 '21 05:10
koertkuipers

1 Answer:

Local file system implementations do not guarantee atomic rename-without-overwrite under the covers, so there is no guarantee that locally run concurrency tests will work.
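
To make that requirement concrete, here is a minimal sketch in plain Java NIO (so it runs without Spark) of the "create only if absent" behaviour a commit file needs. The class and method names (`PutIfAbsentSketch`, `commitIfAbsent`) and the temp directory are hypothetical, and this is not Delta's log store implementation; it only illustrates the semantics, assuming one file per table version.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Delta: illustrates the
// "fail if the target already exists" guarantee a commit file needs.
public class PutIfAbsentSketch {

    // Tries to create the commit file for `version`. Returns false if another
    // writer already created it. With CREATE_NEW, java.nio performs the
    // existence check and the file creation as one atomic step.
    public static boolean commitIfAbsent(Path logDir, long version, String content)
            throws IOException {
        // Assumed naming for illustration: zero-padded version number + ".json".
        Path target = logDir.resolve(String.format("%020d.json", version));
        try {
            Files.write(target, content.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE_NEW);
            return true;
        } catch (FileAlreadyExistsException e) {
            // Lost the race: the caller should re-read the log and retry
            // with the next version instead of overwriting.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path logDir = Files.createTempDirectory("log-sketch");
        System.out.println(commitIfAbsent(logDir, 0L, "writer A")); // prints true
        System.out.println(commitIfAbsent(logDir, 0L, "writer B")); // prints false
    }
}
```

If the existence check and the write were two separate steps, two writers could both pass the check and the later one could silently replace the earlier commit, which would look like the occasional data loss described in the question.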

I tried to clarify that in the docs; see the note at https://docs.delta.io/1.0.0/delta-storage.html

Answered May 26 '21 at 23:21
tdas