Bitcoin seed security analysis
An anonymous Bitcoiner writes:
I’m trying to balance the concept of seed protection/memorization with seed entropy. I have added 10 random words to my seed (that my kids have memorized), along with deletion of the last seed word (which my kids have also memorized). I now intend to save these 33 words in plain sight… making them publicly available, to ensure they aren’t lost and forgotten. My question to you is, how safe is this wallet?
It’s always fascinating to see the custom cold storage solutions that folks come up with… so if I’m understanding it correctly, this is the proposed setup:
- Generate a random 24-word seed phrase
- Randomly insert 10 other words throughout these 24 words (I’m assuming they’re BIP39 words; otherwise they would stick out like a sore thumb to anyone who knows BIP39)
- Remove the last word (which I assume is the real last word, not a randomly added word)
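As a minimal sketch of the setup above, here it is with placeholder strings standing in for real BIP39 words. The `real_N`/`decoy_N` names and the use of Python’s `random` module are illustrative assumptions, not the reader’s actual procedure. The property that matters to an attacker is that the 23 remaining real words keep their original relative order.

```python
import random

def build_public_words(real_words, decoys, rng):
    """Drop the real last word, then scatter decoys among the remaining real words.

    Only the decoy positions are random; the real words keep their relative order.
    """
    public = list(real_words[:-1])               # remove the real 24th word
    for decoy in decoys:
        pos = rng.randrange(len(public) + 1)      # any gap, including both ends
        public.insert(pos, decoy)
    return public

rng = random.Random(7)
real = [f"real_{i}" for i in range(24)]          # placeholder 24-word seed
decoys = [f"decoy_{i}" for i in range(10)]       # placeholder added words
public = build_public_words(real, decoys, rng)

print(len(public))                                                # 33
print([w for w in public if w.startswith("real_")] == real[:23])  # True
```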
Anything that involves memorization is probably a bad idea. You didn’t mention how many kids you have or whether they regularly practice recalling the words to make sure their memories stay accurate, but a lot could go wrong here if your kids become unable to remember those words (and, presumably, their positions).
When you say “in plain sight” it’s not clear how public you are making this seed phrase. I certainly wouldn’t post it on the internet… here’s some back-of-the-napkin math:
If someone has your 33 public words and knows that there are 10 fake words and 1 missing real word (the final, checksum-bearing word) on the end, how hard would it be to brute force?
If it were me, I’d write a script that does roughly the following: find every possible combination of 23 words from the given list of 33 words, keeping them in their original order (the real words were interleaved with the fake ones, not shuffled). Then, for each of those lists of words, try all 2048 possible words for the 24th position.
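Here is a hedged Python sketch of that search. It is an illustration, not the author’s script: words are represented by their BIP39 wordlist indices (0 to 2047) so no wordlist file is needed, and the expensive part (PBKDF2 seed stretching, key derivation, and checking addresses for funds) is left out. One detail is worth adding: BIP39 encodes a 24-word phrase as 256 bits of entropy plus an 8-bit SHA-256 checksum, so the last word carries only 3 bits of entropy. For any 23 candidate words, only 8 of the 2048 possible last words produce a valid checksum, and the other 2040 can be discarded with one cheap hash each.

```python
import hashlib
from itertools import combinations

def valid_last_words(first23):
    """Return the 8 word indices that complete `first23` with a valid BIP39 checksum.

    24 words x 11 bits = 264 bits = 256 bits of entropy + 8 checksum bits.
    The 23 known words supply 253 entropy bits; the last word supplies the
    final 3 entropy bits followed by the 8 checksum bits.
    """
    prefix = 0
    for idx in first23:
        prefix = (prefix << 11) | idx                      # 23 * 11 = 253 bits
    valid = []
    for extra in range(8):                                 # the 3 missing entropy bits
        entropy = ((prefix << 3) | extra).to_bytes(32, "big")
        checksum = hashlib.sha256(entropy).digest()[0]     # first 8 bits of SHA-256
        valid.append((extra << 8) | checksum)              # 11-bit last-word index
    return valid

def candidate_phrases(public_indices, n_real=23):
    """Yield every checksum-valid 24-word candidate as a list of word indices.

    itertools.combinations preserves input order, matching real words that
    were interleaved with decoys rather than shuffled.
    """
    for first23 in combinations(public_indices, n_real):
        for last in valid_last_words(first23):
            yield list(first23) + [last]
```

As a sanity check, `valid_last_words([0] * 23)` includes index 102 ("art"), matching the BIP39 test vector of 23 × "abandon" followed by "art". Run over the real 33 public words, `candidate_phrases` would yield 92,561,040 × 8 = 740,488,320 phrases instead of the 189,565,009,920 that counts every possible last word, which makes the attack about 256 times cheaper still.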
So if my math is correct, the worst-case scenario requires brute forcing 33!/(23!·10!) × 2048 = 92,561,040 × 2048 = 189,565,009,920 possible combinations. That sounds like a lot, right?
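The worst-case count is easy to double-check with Python’s standard library (`math.comb`, available since Python 3.8). Choosing which 23 of the 33 words are real is the same as choosing which 10 are decoys.

```python
import math

subsets = math.comb(33, 23)            # ways to pick the 23 real words, order preserved
assert subsets == math.comb(33, 10)    # identical to picking the 10 decoys
worst_case = subsets * 2048            # every possible last word

print(f"{subsets:,}")                  # 92,561,040
print(f"{worst_case:,}")               # 189,565,009,920
```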
Well, compare that to a normal 24-word seed phrase, which has more like 2048²³ combinations (not 2048²⁴, because the last word must satisfy the checksum; strictly speaking the count is 2²⁵⁶, since the last word still carries 3 bits of entropy), and you can see how greatly this scheme reduces resistance to brute forcing.
2048²³ = 14,474,011,154,664,524,427,946,373,126,085,988,481,658,748,083,205,070,504,932,198,000,989,141,204,992 combinations
Comparing that with 189,565,009,920 combinations shows that the total guessable set of combinations (and thus your computational security) has been reduced to about 0.00000000000000000000000000000000000000000000000000000000000000131% of a normal seed phrase. It’s difficult to even comprehend how drastic a reduction this is, though I wrote out the full number rather than using scientific notation to make it clearer.
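With numbers this large, exact integer arithmetic avoids rounding surprises. This small sketch uses `fractions.Fraction` to compute the ratio exactly and also confirms that 2048²³ is exactly 2²⁵³, a 77-digit number.

```python
from fractions import Fraction

normal = 2048 ** 23                     # the round figure used for a 24-word phrase
assert normal == 2 ** 253
reduced = 189_565_009_920               # worst case for the 33-word scheme

ratio = Fraction(reduced, normal)       # exact rational, converted to float only for display
print(f"{float(ratio):.2e}")            # 1.31e-65
print(f"{float(ratio) * 100:.2e} %")    # 1.31e-63 %
print(len(str(normal)))                 # 77
```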
How long would it take to actually run a brute-forcing script? This Stack Exchange user notes that it takes his average laptop about 4 days to test derivations for 500M addresses, but I’d note that this problem is also absurdly parallelizable. So if we assume 100M addresses per day on an average machine, it would take at most about 5 years (189,565,009,920 ÷ 100,000,000 ≈ 1,896 days) to brute force this seed with a naive script and hardware. A motivated, sophisticated attacker who spins up a bunch of virtual machines to spread the brute-forcing problem across them could easily cut that to a few days or hours.
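A rough timing sketch follows. It assumes, as the paragraph does, 100M candidates per machine per day, and additionally assumes perfect parallel scaling; the cluster sizes are hypothetical. It also compares the full search against one that skips last words failing the BIP39 checksum (only 8 of 2048 pass), under the further assumption that rejecting a bad checksum costs essentially nothing next to a full derivation.

```python
import math

RATE_PER_MACHINE_PER_DAY = 100_000_000      # assumption taken from the text

def days_to_exhaust(candidates, machines=1, rate=RATE_PER_MACHINE_PER_DAY):
    """Worst-case days to try every candidate, assuming perfect parallelism."""
    return candidates / (rate * machines)

worst_case = math.comb(33, 23) * 2048       # every possible last word
checksum_valid = math.comb(33, 23) * 8      # only checksum-valid last words

for machines in (1, 100, 1000):             # hypothetical cluster sizes
    print(machines,
          round(days_to_exhaust(worst_case, machines), 1),
          round(days_to_exhaust(checksum_valid, machines), 2))
```

On a single machine that works out to roughly 1,896 days (about 5.2 years) for the full 2048-word search, but only about a week once invalid checksums are skipped.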
In reality it’s a bit more complex than I’ve described. For example, you’d probably want to test several different common derivation paths for each candidate seed phrase, though this would not change the computational complexity of the problem by more than 10x. As we can see, it’s incredibly easy to create a custom seed solution that drastically reduces the computational security of your cold storage. This is one reason why at Casa we have decided that the most secure solution is one where the user can’t possibly screw up handling seeds because they don’t store the seeds at all.
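To put a number on the derivation-path factor, here is a small sketch. The four account-level paths are the standard Bitcoin mainnet prefixes from BIP44, BIP49, BIP84 and BIP86; that an attacker would try exactly these four, and the hypothetical `addresses_per_path` parameter, are assumptions for illustration. The work grows linearly with the number of paths, so a handful of paths multiplies the cost by a handful, not by orders of magnitude.

```python
import math

# Standard Bitcoin mainnet account paths (BIP44 legacy, BIP49 nested SegWit,
# BIP84 native SegWit, BIP86 Taproot). Trying exactly these four is an assumption.
COMMON_PATHS = ["m/44'/0'/0'", "m/49'/0'/0'", "m/84'/0'/0'", "m/86'/0'/0'"]

def total_derivations(candidates, paths=COMMON_PATHS, addresses_per_path=1):
    """Linear cost: one derivation per candidate, per path, per address checked."""
    return candidates * len(paths) * addresses_per_path

worst_case = math.comb(33, 23) * 2048       # every possible last word
checksum_valid = math.comb(33, 23) * 8      # only checksum-valid last words

print(f"{total_derivations(worst_case):,}")
print(f"{total_derivations(checksum_valid):,}")
```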
Multisig seed security
As I was writing this article I decided to perform a real-world demonstration regarding seed security.
Note that you should never take a photo of your seed phrase, or even the back of a seed phrase sheet. As no fewer than half a dozen of my observant followers pointed out to me, many of the words on the sheets could be recovered via careful manipulation of the image in order to see the indentations more clearly. However, they were making an assumption that this seed phrase was for a single signature setup. What I didn’t mention was that it was only 1 seed phrase for a 3-of-5 multisig Casa Keyshield; discovery of 1 seed phrase could have privacy ramifications but would not destroy the security of this setup. Because some folks didn’t seem to understand why I wasn’t concerned with this seed phrase being discovered, let’s do some more math!
Remember that each of these seed phrases has over 2048²³ combinations. You might be thinking that since this is a 3-of-5 multisig setup, an attacker would only need to know 3 seed phrases, so if they extracted the phrase from the photo of the seed sheet they’d only have to brute force 2 seeds, which would be more like 2048⁴⁶ combinations. However, in order to generate an address for the wallet to check whether it has funds (and may in fact be the correct set of seeds), you have to know all 5 public keys. You’d also have to know the order of the keys and their derivation paths, which adds several orders of magnitude more complexity. But even in the attacker’s best-case scenario, you’d have to search through a set of 2048⁹² combinations even if you already knew 1 of the 5 seed phrases AND the pubkey ordering AND the derivation paths. Even if you could test 1,000,000,000,000 combinations per second, you’d be guessing for 1.4×10²⁸⁵ years before you exhausted the possible set of combinations.
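The multisig figure can be reproduced the same way. This sketch assumes, as the text does, 10¹² guesses per second, plus a 365.25-day year; it counts only the 2048²³ figure per unknown seed and ignores key ordering and derivation paths, so it understates the real search.

```python
GUESSES_PER_SECOND = 10 ** 12            # assumption from the text
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # Julian year

def years_to_exhaust(space, rate=GUESSES_PER_SECOND):
    """Years needed to try every element of a search space of size `space`."""
    return space / rate / SECONDS_PER_YEAR

one_seed = 2048 ** 23                    # per-seed figure used in the text
unknown_seeds = 4                        # 1 of the 5 seeds is already known
space = one_seed ** unknown_seeds        # 2048**92 == 2**1012

print(f"{years_to_exhaust(space):.1e}")  # 1.4e+285
```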
If you wanted to get more sophisticated and hope that this wallet has been spent from, you could also search the set of 500,000+ known public keys from the 100,000+ known spent TX outputs that were encumbered by 3-of-5 scripts. If you got a hit and found a 3-of-5 P2SH/P2WSH TXO that was spent using one of the public keys, it would be somewhat safe to assume that you had found 1 of the 5 seeds. You could then add the other 4 public keys from that TXO to the search set, improving your odds because you wouldn’t have to find the combination of all 5 seed phrases simultaneously, but rather one seed phrase at a time. Thus you could reduce your brute-force search space from 2048⁹² down to 2048²³. If you could test 1,000,000,000,000 combinations per second, you’d be guessing for 4.59×10⁵⁶ years before you exhausted the possible set of combinations.
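Under the same assumptions (10¹² guesses per second, 365.25-day year), a short sketch of how much the one-seed-at-a-time approach shrinks the search. Checking each candidate seed against the set of known public keys would be a constant-time lookup, so the space to exhaust drops from 2048⁹² to 2048²³, a factor of 2048⁶⁹ = 2⁷⁵⁹, yet the remaining search is still astronomically long.

```python
GUESSES_PER_SECOND = 10 ** 12            # assumption from the text
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # Julian year

def years_to_exhaust(space, rate=GUESSES_PER_SECOND):
    """Years needed to try every element of a search space of size `space`."""
    return space / rate / SECONDS_PER_YEAR

all_four_at_once = 2048 ** 92            # joint search over the 4 unknown seeds
one_at_a_time = 2048 ** 23               # each seed matched against known pubkeys

print(f"{years_to_exhaust(one_at_a_time):.2e}")        # 4.59e+56
print(all_four_at_once // one_at_a_time == 2 ** 759)   # True
```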
Vires in numeris
Hopefully the above scenarios have helped you understand how bitcoin users are protected by large numbers. There are so many possible private keys that could be used to secure bitcoin that even if someone set up a large cluster of computers to search through as many keys as possible, it’s unlikely they’d ever find a key that is actually in use during their lifetime.
These large numbers can be manipulated to your benefit or to your detriment — it’s easy to accidentally do the latter if you aren’t well versed in the technical underpinnings of the system!