General Discussion
Hello All!
I have a general question: can high latency (from over 30 ms up to hundreds of ms) in data access lead to data corruption of any kind?
The customer sees high latency during peak utilization hours (CPU sometimes sustained at 100%, disk utilization sustained at ~100%).
From time to time, different applications have shown data corruption in some way. We have seen problems with virtual machines crashing, SQL databases, Exchange databases, the Active Directory DB and so on.
Does it make sense to correlate data corruption in such applications with the very high sustained latencies perceived by the clients?
Kind regards,
Pedro Rocha
Interesting question. I did some googling ->
https://www.microsoft.com/en-us/research/wp-content/uploads/2017/06/CorrOpt_SIGCOMM2017.pdf
Hi,
Thanks for answering! But I did not find anything in the paper really related to what I was asking about: high latency (on the storage side) and corruption of data handled by applications such as DBs and virtual machines.
Can you point out what exactly caught your attention?
Kind regards,
Pedro
Sorry, maybe I misunderstood then? Disregard that link.
Just thinking out loud here... part of that high latency is getting the ACK back to the client/host that requested the write, right? Host sends -> <travel time> -> storage gets -> storage writes -> storage sends the ACK back -> <travel time> -> host gets the ACK and knows the data is good.
Could 30 ms latency directly cause an issue... maybe? Maybe if a client/host went offline during that small window between the storage sending the ACK and the host receiving it? But is 30 ms latency an issue with databases... I'd go with yes.
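To make the write path above a bit more concrete, here is a minimal sketch in Python. It only adds up the steps of the chain (travel time, storage write, travel time back) and compares the total with a host-side timeout. All names (`WriteTiming`, `write_outcome`) and all numbers are made up for illustration; they do not come from any NetApp tool or from the customer's system.

```python
from dataclasses import dataclass


@dataclass
class WriteTiming:
    """Hypothetical timings for one write, in milliseconds."""
    travel_to_storage_ms: float  # host sends -> storage gets
    storage_write_ms: float      # storage writes the data
    travel_to_host_ms: float     # storage sends ACK -> host gets ACK

    def ack_latency_ms(self) -> float:
        # Latency as seen by the host: the full round trip until the ACK arrives.
        return (self.travel_to_storage_ms
                + self.storage_write_ms
                + self.travel_to_host_ms)

    def unacked_window_ms(self) -> float:
        # Window in which storage has already sent the ACK
        # but the host has not received it yet.
        return self.travel_to_host_ms


def write_outcome(timing: WriteTiming, host_timeout_ms: float) -> str:
    # Assumption: the host gives up on the write after host_timeout_ms.
    if timing.ack_latency_ms() <= host_timeout_ms:
        return "acknowledged"
    return "timed out"


# Illustrative values only.
slow_write = WriteTiming(travel_to_storage_ms=0.5,
                         storage_write_ms=29.0,
                         travel_to_host_ms=0.5)
print(slow_write.ack_latency_ms())        # total latency seen by the host
print(write_outcome(slow_write, 1000.0))  # compared against a 1 s timeout
```

In this simplified picture a slow ACK shows up only as a delay or a timeout on the host side. Whether that ever turns into corruption would depend on how the host and the application handle the late or missing ACK, which the sketch does not model.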
Curious what others have to say though.
Does anyone else have anything to add?
Just following up to report what the issue was.
We had a case opened with NetApp. First, NetApp asked us to replace the IOXM and the SAS adapter. That did not solve the issue. Finally, as a last resort, replacing the motherboard module did the trick. Even though there was no specific error or anything else pointing to the MB, that was the cause.
Regards,
Pedro.