Posts

Showing posts from January, 2018

xrp.ninja Ripple validator has been back to normal for two days

Image
There have been some problems with the validator ever since it crashed about a week ago. It no longer crashes since I added much more disk space, but it easily falls behind and needs several hours to catch up after each restart. I learned this from the state_accounting field in the output of "rippled server_info". I created a metric and a chart from it. This is what the chart looks like for the last week. The "Full" state means the validator is fully synced and participating in the consensus process. You can see it is now asymptotically approaching 100. The value is the percentage of uptime the validator has spent in each state. I figured out why the disk ran out of space: the validator fell behind, and when that happens, online delete is disabled. See this code: https://github.com/ripple/rippled/blob/fc0d64f5eec4386db7146251ab1a7fe880bec17c/src/ripple/app/misc/SHAMapStoreImp.cpp#L751 I saw some "Not deleting" messages in the l...
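As a rough illustration of how such a percentage can be derived, here is a minimal Python sketch. It assumes the JSON printed by "rippled server_info" has the shape result → info → state_accounting, where each state carries a duration_us string in microseconds; the sample numbers are made up, and the function name mode_percentages is hypothetical, not the script actually used for the chart.

```python
import json


def mode_percentages(server_info: dict) -> dict:
    """Return the share of uptime (in percent) spent in each server state."""
    # Assumed layout: {"result": {"info": {"state_accounting": {...}}}}
    accounting = server_info["result"]["info"]["state_accounting"]
    # duration_us is assumed to be a string holding microseconds.
    durations = {state: int(entry["duration_us"]) for state, entry in accounting.items()}
    total = sum(durations.values())
    if total == 0:
        return {state: 0.0 for state in durations}
    return {state: 100.0 * d / total for state, d in durations.items()}


# Made-up sample data, for illustration only.
sample = json.loads("""
{
  "result": {
    "info": {
      "state_accounting": {
        "connected":    {"duration_us": "10000000", "transitions": 3},
        "disconnected": {"duration_us": "5000000",  "transitions": 1},
        "full":         {"duration_us": "80000000", "transitions": 2},
        "syncing":      {"duration_us": "5000000",  "transitions": 2},
        "tracking":     {"duration_us": "0",        "transitions": 2}
      }
    }
  }
}
""")

percentages = mode_percentages(sample)
for state, pct in sorted(percentages.items()):
    print(f"{state}: {pct:.1f}%")
```

Because the durations are cumulative since the last restart, the "full" share only creeps toward 100 over time, which would match the asymptotic shape described above.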

xrp.ninja Ripple validator crashed last night due to low free disk space

Image
The above graph shows what happened. I did get an alert from GCE that disk usage was high. I have an alert policy that alerts me if disk usage is over 80% for more than 5 minutes. However, it was late at night, so I didn't get up, thinking the problem might resolve on its own. But it didn't. And GCE didn't keep alerting me, which surprised me. Rippled logged these two lines before it died:
2018-Jan-10 11:33:27 Application:FTL Remaining free disk space is less than 512MB
2018-Jan-10 11:33:27 Application:FTL Application::onStop took 23ms
So rippled killed itself: https://github.com/ripple/rippled/search?utf8=%E2%9C%93&q=%22Remaining+free+disk+space+is+less+than+%22&type=
Before that, the log was flooded with the following messages for 5 hours:
2018-Jan-10 10:55:08 LoadMonitor:WRN Job: recvGetLedger run: 1390ms wait: 0ms
2018-Jan-10 10:55:32 LoadMonitor:WRN Job: recvGetLedger run: 1250ms wait: 0ms
2018-Jan-10 10:55:32 LoadMonitor:WRN Job: recvGetLed...
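To make the alert condition concrete ("over 80% for more than 5 minutes"), here is a small Python sketch of a sustained-threshold check. It is only an illustration of the rule, not how GCE evaluates alert policies; the function name first_alert_time and the sample readings are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def first_alert_time(
    samples: Iterable[Tuple[int, float]],
    threshold: float = 80.0,
    hold_seconds: int = 300,
) -> Optional[int]:
    """Return the first timestamp at which usage has stayed above
    `threshold` for more than `hold_seconds`, or None if it never does.

    `samples` is a time-ordered sequence of (unix_seconds, percent_used).
    """
    above_since = None
    for ts, pct in samples:
        if pct > threshold:
            if above_since is None:
                above_since = ts  # start of the current violation window
            if ts - above_since > hold_seconds:
                return ts
        else:
            above_since = None  # usage dropped back, so reset the window
    return None


# Made-up readings taken once a minute: a short spike, then a sustained climb.
readings = [
    (0, 70.0), (60, 81.0), (120, 85.0), (180, 79.0),
    (240, 82.0), (300, 83.0), (360, 84.0), (420, 86.0),
    (480, 88.0), (540, 90.0), (600, 92.0),
]
alert_at = first_alert_time(readings)
print(alert_at)
```

A check like this fires once per violation window, so a monitor built on it would need a separate repeat or escalation rule to keep notifying while the condition persists.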

Ripple's Decentralization Strategy

Copied from the following link: https://www.xrpchat.com/topic/16362-rippled-0810-released/?do=findComment&comment=191441 mDuo13 wrote this. Kudos to him. To recap the Decentralization Strategy, here's a summary: Switch to using a validator list site (vl.ripple.com). This is where we are now. All rippled instances configured to use the site can automatically follow Ripple's updates to the recommended set of validators, in lockstep. In case you're curious, the validator list site publishes cryptographically signed recommendations of validators, so it's not easy to impersonate. And rippled caches the data it gets from the site, so the XRP Ledger won't go down even if vl.ripple.com is down for a while. (It might be tough to bring new rippled servers online while vl.ripple.com is down, but I think there are some protections against that, too.) Update the site and the existing validators to use validator tokens instead of master validator secret key...

xrp.ninja Ripple validator upgraded to 0.81.0

Image
Information about this version can be found here: https://ripple.com/dev-blog/rippled-version-0-81-0/ The upgrade happened at about 1:45pm local time. There were some behavior changes after the upgrade:

Monitoring a Ripple validator running in GCE

Image
GCE provides various kinds of metrics from which one can create dashboards, alerting policies, etc. However, there is no way to monitor the performance of rippled unless we write something ourselves. Fortunately, GCE allows creating custom metrics. So as a starting point, I decided to create a metric for the rippled build_version. This information is very useful. For example, you will be able to tell if the behavior of the server changes after a version change. However, I later learned that a custom metric can't have "STRING" as its value type. So I created an uptime metric with the build version as its label. It works just the same. Here is a screenshot of the chart created from this metric: Unfortunately, it seems I can't share this chart publicly, unlike charts created from built-in metrics, which can be shared publicly.
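The "numeric value plus label" idea can be sketched as follows. This Python example only builds a time-series payload in the JSON shape used by the Cloud Monitoring v3 REST API; it does not send anything. The metric type custom.googleapis.com/rippled/uptime, the GAUGE kind, the instance ID, the zone, and the sample server_info values are all assumptions made for illustration, not the actual setup behind the chart.

```python
from datetime import datetime, timezone

# Hypothetical metric name; custom metrics live under custom.googleapis.com/.
METRIC_TYPE = "custom.googleapis.com/rippled/uptime"


def build_uptime_series(info: dict, instance_id: str, zone: str, now: datetime) -> dict:
    """Build one time series: uptime as an INT64 value, build version as a label.

    `info` is assumed to be the "info" object from "rippled server_info",
    containing "uptime" (seconds) and "build_version". `now` must be
    timezone-aware.
    """
    end_time = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "metric": {
            "type": METRIC_TYPE,
            # The string that cannot be a metric value travels as a label.
            "labels": {"build_version": info["build_version"]},
        },
        "resource": {
            "type": "gce_instance",
            "labels": {"instance_id": instance_id, "zone": zone},
        },
        "metricKind": "GAUGE",  # assumption: each point is a fresh reading
        "valueType": "INT64",
        "points": [
            {
                "interval": {"endTime": end_time},
                # The REST API encodes int64 values as JSON strings.
                "value": {"int64Value": str(int(info["uptime"]))},
            }
        ],
    }


# Made-up values for illustration.
sample_info = {"build_version": "0.81.0", "uptime": 86400}
series = build_uptime_series(
    sample_info,
    instance_id="1234567890",  # hypothetical instance ID
    zone="us-central1-a",      # hypothetical zone
    now=datetime(2018, 1, 20, 12, 0, 0, tzinfo=timezone.utc),
)
print(series["metric"]["labels"], series["points"][0]["value"])
```

A payload like this would be wrapped as {"timeSeries": [series]} and posted to the project's timeSeries endpoint. Charting the metric grouped by the build_version label then shows when the version changed.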