Can accidental short circuit lead to data loss?

leon.andrec · February 2, 2020, 7:28pm

Running 4TB on rock64 4GB (Raspberry Pi clone)

I am doing ok I guess since April 2019, when I started my node on V3. My only problem is that my node crashes from time to time and then I just press the reset button on my rock64, the node restarts, and everything is back to normal.

I have been experimenting with implementing an external Arduino based restarting system. It should hijack the reset button to restart the node automatically when it crashes. I have yet to figure out the technical details, but today, during experimentation, I accidentally short-circuited my 12V power supply. 12V goes directly to the hard drive and through a 5V DC-DC step-down converter to my rock64.

As a consequence, I heard a chilling sound from the hard drive and from the 120W power supply. I checked the dashboard and after a restart, it showed 3.0TB of space available, and 507GB used.

After another restart, I was at 1.4TB available and 2.1TB used, that being the normal, expected value and the node seems to be running ok now for 20 minutes.

Should I worry? What happened? Has anyone had a similar experience?

Storgeez · February 2, 2020, 8:10pm

“Chilling sound” meaning bad sound?

It depends on a lot of factors. Upon reset, CPU execution is interrupted, meaning that if a file was only partly written or still cached in RAM, the remainder will never get written. Depending on the file system, this might corrupt the file system, but modern (journaling) file systems are usually protected against that. The next layer is file format corruption. A database, for example, might (and should) be protected in a similar way depending on the technology, but without confirmation don’t count on it. An interruption mid-transaction could corrupt the database and lead to data loss or the need for a repair.

Many people have had their DBs corrupted for various reasons, so take care not to have this happen.
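To illustrate the "file written part-way" problem, here is a minimal Python sketch of the usual write-to-temp-then-rename pattern. It is a general technique, not something the storage node is known to do, and the function name `atomic_write` is made up for this example. The idea is that a reset leaves either the complete old file or the complete new file, never half of one.

```python
import os
import tempfile


def atomic_write(path, data):
    """Write bytes to `path` so that a crash leaves the old or the new file.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the final rename
    # stays on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Ask the OS to push the data out of RAM cache to the drive.
            os.fsync(tmp.fileno())
        # Swap the finished file into place in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything went wrong before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Note that this only covers the file contents. A drive or USB bridge that misreports flushed data can still lose writes on a power cut.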

BrightSilence · February 2, 2020, 8:15pm

That sounds like your node was walking through the pieces again to recalculate how much space is actually used. I don’t know exactly what triggers that, though some of the recent updates did. It doesn’t necessarily mean corruption, but for obvious reasons this should be avoided anyway. The good thing is that the latest version checks for database corruption before starting, and if it finds a problem the node does not start. Since that doesn’t seem to be the case for you, you’re probably fine. It’s still possible that pieces your node was working on got corrupted, but that would probably not be more than a handful and isn’t likely to lead to disqualification.

Storgeez · February 2, 2020, 8:18pm

It does some kind of data enumeration every time the node is started, as far as I’ve seen.

But why does the node crash? Out of memory or something?

leon.andrec · February 2, 2020, 8:24pm

Storgeez · February 2, 2020, 8:29pm

I don’t know much about Linux, but is the PC value the program counter value? Looks like an OS crash. Are there any OS updates to install?

anon27637763 · February 2, 2020, 9:09pm

It should also be noted that an external USB drive connected to a Rock64 is going to be slow. A node that is storing 2.1 TB on a slow external drive is going to take quite a bit of time to read through the sqlite database… which could easily make the reported usage statistics slow to update after a node restart.

leon.andrec · February 2, 2020, 9:16pm

eagleye · February 2, 2020, 9:19pm

You don’t want to design hardware to reboot the system. You want to find out why it’s failing; an RPi should be stable all the time.

Every time you force a reboot you risk a database crash, and then you’ll be losing files, which leads to failing audits. You probably also need to check your database for errors.

Too many abnormal reboots could lead to DQ.
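For checking a database for errors, here is a minimal Python sketch using SQLite's built-in `PRAGMA integrity_check`. It assumes the database is a SQLite file, as mentioned earlier in the thread, and that the node is stopped while the check runs. The function name `check_sqlite` and the file name in the usage line are made up for this example, so substitute the actual database files of your node.

```python
import sqlite3


def check_sqlite(path):
    """Return the rows reported by PRAGMA integrity_check for one database.

    Hypothetical helper for illustration only.
    """
    # Open read-only through a URI so the check cannot modify the file.
    conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    finally:
        conn.close()
    # Each row is a one-column tuple; flatten to a list of strings.
    return [row[0] for row in rows]


# Usage (hypothetical file name):
# print(check_sqlite("example.db"))
```

A healthy database returns a single row containing `ok`. Anything else is a list of the problems SQLite found, and is worth dealing with before starting the node again.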

Storgeez · February 2, 2020, 9:30pm

That Chinese power supply and regulator are suspicious; make sure they aren’t the culprits. Insufficient decoupling and filtering on the power rails could feed spikes from your hard drive’s voice coils through the regulator, if it’s not a good one, and into your computer. Is the step-down regulator rated for high enough current? Why didn’t you simply use an old ATX power supply with integrated 5V and 3.3V rails?

Verify it’s not a software issue first though.

I made a similar device to reset my router. lol
But that’s a different story.

anon27637763 · February 2, 2020, 9:35pm

Nice and neat work!

However, it should be remembered that many of the SoC boards share peripheral bandwidth between devices. IIRC, on the RPi, the Ethernet connection bandwidth is shared with the USB interface. So, the 100 Mbps interface isn’t really 100 Mbps nor completely full duplex.

Your setup should work fine for lots of purposes. However, the SATA/USB interface is speed limited to the USB bandwidth which may or may not be shared with the Ethernet interface.

On the reboot issues… There are many things that could cause these SoC boards to reboot. A common cause is minor power fluctuations. You have a very nice power supply, but power fluctuations may still occur… on the board… when one or more peripheral devices begin drawing excess power… possibly dropping the voltage across one or more critical circuits.

On the whole, I would not recommend using any of the consumer/hobbyist SoC boards for a production application. The boards have many quirks and are simply too unstable.