Post-Mortem: GRANDPA Equivocations and the sysinfo Process Eventually Led to Mass Slashing on the Kusama Network
PolkaBase
2020-08-21 06:02
Kusama, the adventurous canary network, hit another small "twist" a month ago: slashing across the network and nodes dropping offline. After this series of trials, Kusama will come out more robust. PolkaBase walks you through an article interpreting the te




"Multiple bugs in the code caused nodes to drop off the Kusama network and lose the database recording the blocks they had already voted on. As a result, the same nodes double-signed these blocks when they restarted. A Kusama Council motion has compensated users for the slashes resulting from this issue."

On Friday, July 31st, two Kusama validators running runtime v2019 started crashing every few minutes, showing two apparent errors, and an issue was filed. At first glance, the problem seemed to lie with the validators' keys. This was later ruled out: the affected validators confirmed among themselves that they had not changed keys in the meantime. The issue also appeared to exist only on the Kusama network, not on Polkadot.

Digging a little further, the team realized that the slashing event on Kusama was caused by GRANDPA equivocations, which in turn originated from a file descriptor leak that crashed the nodes. The leak prevented nodes from writing the GRANDPA voter state (their votes in a given round) to disk. Nodes that lost this data voted again after restarting, this time for a more recent block than the one they had originally voted for. From GRANDPA's point of view, that is an equivocation.
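To make the failure mode concrete, here is a minimal sketch of a voter that writes its vote to disk before releasing it, and that repeats the stored vote after a restart instead of choosing a new block. It is an illustration only, not how Substrate persists GRANDPA state: the `VoteStore` class, the JSON file and the method names are all assumptions made for this example.

```python
import json
import os
import tempfile


class VoteStore:
    """Hypothetical store that durably records one vote per round."""

    def __init__(self, path):
        self.path = path
        self.votes = {}
        # On (re)start, reload whatever votes made it to disk.
        if os.path.exists(path):
            with open(path) as f:
                self.votes = json.load(f)

    def _persist(self, votes):
        # Write to a temp file, fsync, then atomically rename over the old file.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "w") as f:
                json.dump(votes, f)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp, self.path)
        except BaseException:
            os.unlink(tmp)
            raise

    def vote(self, round_number, block_hash):
        """Return the block to vote for in this round, never two different ones."""
        key = str(round_number)  # JSON object keys are strings
        if key in self.votes:
            return self.votes[key]  # already voted: repeat the same vote
        updated = {**self.votes, key: block_hash}
        # If this raises (for example because no file descriptors are left),
        # the vote is not recorded and must not be sent either.
        self._persist(updated)
        self.votes = updated
        return block_hash
```

With such a store, a node that voted for `0xaa` in round 5 and then restarts gets `0xaa` back when it asks to vote for `0xbb` in the same round. A node whose write never reached the disk has no such record, which is the situation described above.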

https://wiki.polkadot.network/

GRANDPA Equivocation: A validator signs two or more votes in the same round on different chains.
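The definition can be sketched as a simple check over a list of votes. This is a simplified illustration, not the GRANDPA implementation: it treats any two different block hashes from the same voter in the same round as an equivocation and ignores the distinction between vote stages. The `Vote` type and `find_equivocations` function are hypothetical names.

```python
from collections import defaultdict
from typing import NamedTuple


class Vote(NamedTuple):
    voter: str         # validator identity (hypothetical string id)
    round_number: int  # voting round
    block_hash: str    # hash of the block voted for


def find_equivocations(votes):
    """Map (voter, round) to the sorted block hashes for every double vote."""
    seen = defaultdict(set)
    for v in votes:
        seen[(v.voter, v.round_number)].add(v.block_hash)
    # Repeating the same vote is harmless; only differing hashes count.
    return {key: sorted(hashes) for key, hashes in seen.items() if len(hashes) > 1}
```

For example, if "alice" votes for `0xaa` and later for `0xbb` in round 7, the function returns an entry for `("alice", 7)`, while a validator that merely re-sends the same vote is not flagged.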

The combination of two events led to validators being heavily slashed some time after v0.8.15 (Kusama v2015) was released and the network upgraded. Authority discovery had been available at the runtime module level for some time, but had not been enabled by default on the client until this release, which also enabled GRANDPA equivocation reporting via unsigned extrinsics.

GRANDPA's new reporting feature: https://github.com/

Armed with this information, the team's main hypothesis was that the equivocations caused by the file descriptor leak had actually started happening earlier, but were only reported after the v0.8.15 upgrade in July: once the network was running this version, equivocations by crashed nodes began to be reported, which drew the attention of the relevant teams. Still, an investigation of the logs of the nodes run by Parity did not reveal any earlier equivocations (they would have been logged to the terminal).

Further research into the root cause of the file descriptor leak pointed to two main culprits: authority discovery and metrics collection. Authority discovery was using too many sockets to query data from the DHT (i.e. to discover the IP addresses of other authorities). For system metrics collection (e.g. CPU and memory), we rely on the sysinfo crate, which keeps a cache of file descriptors for every process in the system and for each process's threads (the data is obtained by reading from /proc).
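A leak of this kind can be spotted from outside the node by comparing the number of open file descriptors with the process limit. The following is a minimal sketch for Linux, assuming /proc is mounted. It inspects the current process only, and the function names and the 80% threshold are assumptions for illustration, not part of any Polkadot tooling.

```python
import os
import resource


def fd_usage():
    """Return (open_fds, soft_limit) for the current process.

    open_fds is None where /proc/self/fd is unavailable. The count includes
    the descriptor that listing the directory itself opens temporarily.
    """
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        open_fds = len(os.listdir("/proc/self/fd"))
    except OSError:
        open_fds = None
    return open_fds, soft_limit


def near_fd_limit(open_fds, soft_limit, threshold=0.8):
    """True when open descriptors reach the given fraction of the soft limit."""
    if open_fds is None or soft_limit == resource.RLIM_INFINITY:
        return False
    return open_fds >= threshold * soft_limit
```

Polling a check like this and alerting when it turns true gives an operator some warning before the process can no longer open the files it needs to write its state.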

The short-term solution is to disable authority discovery by default and to stop collecting system metrics. Once a workaround for the excessive socket use is in place, the authority discovery module will be re-enabled in a future release.

Until a new version is released, the Parity team recommends manually disabling authority discovery. In addition, whenever a node crashes, validators are advised to introduce a delay (1-2 minutes) before restarting it. This reduces the likelihood of the node equivocating in GRANDPA if its vote was not persisted to disk.
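The recommended restart delay can be sketched as a small supervisor loop. This is an illustration under stated assumptions rather than an official tool: the `supervise` function, its parameters and the 90-second default (picked from the middle of the 1-2 minute range) are hypothetical, and the command passed in is whatever starts your node.

```python
import subprocess
import time


def supervise(cmd, restart_delay_s=90.0, max_restarts=3,
              run=subprocess.call, sleep=time.sleep):
    """Run cmd, waiting restart_delay_s after each crash before restarting.

    Returns (last_exit_code, restarts). `run` and `sleep` can be replaced
    with stand-ins, which keeps the loop easy to test.
    """
    restarts = 0
    while True:
        exit_code = run(cmd)
        # Stop on a clean exit or once the restart budget is used up.
        if exit_code == 0 or restarts >= max_restarts:
            return exit_code, restarts
        # Wait before restarting so the node does not come straight back
        # and vote again without its persisted state.
        sleep(restart_delay_s)
        restarts += 1
```

Most service managers offer an equivalent restart-delay setting, which is usually the more practical place to configure this.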

After some discussion and development, Polkadot v0.8.22 was released, including the short-term fixes detailed above. All validators should upgrade and monitor the results. The Kusama Council has reverted all slashes caused by the bug, and in the same spirit a new discussion has started on restoring economic losses rather than validators' lost nominations.

https://matrix.to/

To keep up with developments, there are many ways to get involved in the Kusama community. Join the discussion on the Kusama channel:

https://wiki.polkadot.network/

Translation and editing: Shawn PolkaBase

https://polkadot.network/

Original link: https://polkadot.network
