Hashing & Bitcoins

Great Resources for Figuring It Out:

Bitcoin FAQ https://en.bitcoin.it/wiki/FAQ
SHA-2 from Wikipedia http://en.wikipedia.org/wiki/Sha256
The best YouTube Explanation http://www.youtube.com/watch?v=mrlgw5KpkXM
Archive of blocks mined http://blockexplorer.com/

***And for sure look back in the Java Revolution for Hashing and Hash-table pages.***

JSR Summary:

The one line summary is: you perform a very sophisticated calculation on a specific string of letters trying to yield a hexadecimal number result which begins with 13 zeros - the first computer in the world to achieve this with each new string receives 50 bitcoins, and this happens several times per hour.

Fleshed out a bit, you take a "message", which is based on, among other things, the transactions that occurred with the last block's mining (so some combination of public keys of sellers and buyers, combined with dates, in all likelihood - so, a simplistic example might be PB1234-MF5678-02-14-13-etc) and add a random number (in the example, it could become PB1234-MF5678-02-14-13-etc-9876) and apply the SHA256 hashing algorithm to it (hashing is where you do any specific calculation to something which can be repeated to confirm receipt of it (and for other reasons) - I pasted the pseudocode of SHA256 below).

If you come up with a hexadecimal "hash result" which (as of February 2013) starts with 13 zeros (for example 00000000000003da6b507b256cde91e063eed9a13c8b93b57bc70a3260b3f4b0), then Bingo!, you've come up a new block mining result, and you receive (as of Feb 2013) 50 bitcoins (today 1 BTC is equal to 25.64 USD, so that's 1282 USD). The average number of times it would take to come up with that result is 1/2 * 16 to the power of 13 (since the odds of getting a 0 out of the 16 digits of hexadecimal (0-F) is 1/16, and so to get two 0s in a row is 1/16*16, and so on.)

And the other thing to note is that not only does your computer have to do, on average, an enormous number of calculations to get a result, the calculation itself is enormously time consuming compared to most calculations computers do (again, see the pseudocode below).

Never-the-less, through the network of bitcoin mining nodes around the world, the "Bingo"s happen several times per hour, based on, currently, tens to hundreds of transactions per mining block. What people like us do to hedge their bets is pool their processing with others to increase the likelihood of "scoring".

Interestingly, and importantly, the integrity of the system - i.e. what prevents it from being counterfeited - comes from the fact that one of the things used in each new block's calculation is the last hash result in the chain! 221,089 blocks have so far been mined since the start on January 1st, 2009, and enough blocks will be mined until there are exactly, and no more than, 21,000,000 bitcoins in existence. But that's moving into economics, and out of my area of competence! So, on to you, Perry!! ;)

Random JSR Points/Notes

Bitcoin http://www.youtube.com/watch?v=mrlgw5KpkXM at the 15 minute mark is where he starts to explain the math.

blockexplorer.com is great.

sha256
solve the (continually changing) hashing problem puzzle of 13 etc. zeros

21 million bitcoins to be created.

January 2009-13, the first half created.
2013-17, the next quarter, etc. logarithmically
controlled by the puzzle getting harder and harder

220,000 blocks produced to this point (see http://blockexplorer.com/)

186 billion hash operations per second?

The value is 50 bitcoins per block solved now, but that goes down logarithmically as well over time.

The chain of blocks assures the integrity of itself.

payload = <some data related to things happening on the Bitcoin network>
nonce = 1
hash = SHA2( SHA2( payload + nonce ) )

So, the things happening on the Bitcoin network is some representation of the transactions that just happened.
A bitcoin exchange is: a public key of the sender and the public key of the person being sent to, and signed with their own private key.

All nodes get all transactions. All can be transacted by verifying the public key.

The same bitcoins cannot be sent again because

(A block is a collection of transactions - the nodes create a block with all the transactions - blocks chained together by having the hash of the previous block.

All nodes download this chain as soon as they are established.

From Wikipedia, the Proof-of-work System for Bitcoin

How does the proof-of-work system help secure Bitcoin?

To give a general idea of the mining process, imagine this setup:
payload = <some data related to things happening on the Bitcoin network>
nonce = 1
hash = SHA2( SHA2( payload + nonce ) )

The cost function use in bitcoin is the Hashcash cost-function. Bit coin mining intensively computes a high value Hashcash stamp on the underlying block chain data. The hashcash stamp that Bitcoin miners produce is computed by repeatedly increasing "nonce" until the hash function yields a value, that has the rare property of being below a certain target threshold. (In other words: The hash "starts with a certain number of zeroes", if you display it in the fixed-length representation, that is typically used.)
As can be seen, the mining process doesn't compute anything special. It merely tries to find a number (also referred to as nonce) which - in combination with the payload - results in a hash with special properties.

The advantage of using the hashcash mechanism consists of the fact, that it is very easy to check a result: Given the payload and a specific nonce, only a single call of the hashing function is needed to verify that the hash has the required properties. Since there is no known way to find these hashes other than brute force, this can be used as a "proof of work" that someone invested a lot of computing power to find the correct nonce for this payload.

This feature is then used in the Bitcoin network to secure various aspects. An attacker that wants to introduce malicious payload data into the network, will need to do the required hashcash proof of work before it will be accepted. And as long as honest miners have more computing power, they can always outpace an attacker.
Also see Hashcash and Proof-of-work system and SHA2 and on Wikipedia.

Technical details of SHA-1 used for e-mail from Wikepedia

The header line looks something like this[2]:
X-Hashcash: 1:20:060408:adam@cypherspace.org::1QTjaYd7niiQA/sc:ePa
The header contains: the recipient's email address, the date, and information proving the required computation has been performed. The presence of the recipient's email address requires that a new header be computed for each recipient, and the date allows the recipient to record headers received recently and make sure the header is unique to this email.

Sender's side
The sender prepares a header and adds an initial random number. It then computes the 160 bit SHA-1 hash of the header. If the first 20 bits of the hash are zeros then this is an acceptable header. If not then the sender increments the random number and tries again. Since about 1 in 220 headers will have 20 zeros as the beginning of the hash the sender will on average have to try 220 random numbers to find a valid header. Given reasonable estimates of the time needed to compute the hash,[when?] this would take about 1 second to find. At this time no more efficient method is known to find a valid header.
A normal user on a desktop PC would not be significantly impacted by the processing time required to generate the Hashcash string. However, spammers would suffer a significant impact due to the high number of spam messages required.

Recipient's side
Technically the system is implemented as follows:
• The recipient's computer calculates the 160 bit SHA-1 hash of the entire string "1:20:060408:adam@cypherspace.org::1QTjaYd7niiQA/sc:ePa". This takes about two microseconds on a 1 GHz machine—far shorter than the time it took for the rest of the e-mail to be received. If the first 20 bits are all zero, then it is valid. Later versions may require more bits to be zero.
• The recipient's computer checks the date in that header "060408" (8 Apr 2006). If it's within 2 days of today, it's valid (to compensate for clock skew and routing time).
• The recipient's computer checks to see if the e-mail address in the hashcash header is (any of) the valid e-mail address(es) of the recipient (or any mailing lists to which the recipient is subscribed).
• If all the other checks are valid, the recipient's computer puts that string in a database. If that string was not already in the database, it is valid.
All these tests take far less time and disk space than receiving the rest of the e-mail.
Required effort

The time needed to compute such a hash collision is exponential with the number of zero bits. So one can keep adding zero bits (doubling the amount of time needed to send with each zero bit) until it is too expensive for spammers to generate valid header lines. (Confirming the header is valid always takes the same amount of time, no matter how many zero bits are required for a valid header.)

Pseudocode for the SHA-256 algorithm from Wikipedia.

Note the great increase in mixing between bits of the w[16..63] words compared to SHA-1.
Note 1: All variables are unsigned 32 bits and wrap modulo 232 when calculating
Note 2: All constants in this pseudo code are in big endian

Initialize variables
(first 32 bits of the fractional parts of the square roots of the first 8 primes 2..19):
h[0..7] :=
0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19

Initialize table of round constants
(first 32 bits of the fractional parts of the cube roots of the first 64 primes 2..311):
k[0..63] :=
0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2

Pre-processing:
append the bit '1' to the message
append k bits '0', where k is the minimum number >= 0 such that the resulting message
length (in bits) is 448 (modulo 512).
append length of message (before pre-processing), in bits, as 64-bit big-endian integer

Process the message in successive 512-bit chunks:
break message into 512-bit chunks
for each chunk
break chunk into sixteen 32-bit big-endian words w[0..15]

Extend the sixteen 32-bit words into sixty-four 32-bit words:
for i from 16 to 63
s0 := (w[i-15] rightrotate 7) xor (w[i-15] rightrotate 18) xor (w[i-15] rightshift 3)
s1 := (w[i-2] rightrotate 17) xor (w[i-2] rightrotate 19) xor (w[i-2] rightshift 10)
w[i] := w[i-16] + s0 + w[i-7] + s1

Initialize hash value for this chunk:
a := h0
b := h1
c := h2
d := h3
e := h4
f := h5
g := h6
h := h7

Main loop:
for i from 0 to 63
S1 := (e rightrotate 6) xor (e rightrotate 11) xor (e rightrotate 25)
ch := (e and f) xor ((not e) and g)
temp := h + S1 + ch + k[i] + w[i]
d := d + temp;
S0 := (a rightrotate 2) xor (a rightrotate 13) xor (a rightrotate 22)
maj := (a and (b xor c)) xor (b and c)
temp := temp + S0 + maj

h := g
g := f
f := e
e := d
d := c
c := b
b := a
a := temp

Add this chunk's hash to result so far:
h0 := h0 + a
h1 := h1 + b
h2 := h2 + c
h3 := h3 + d
h4 := h4 + e
h5 := h5 + f
h6 := h6 + g
h7 := h7 + h

Produce the final hash value (big-endian):
digest = hash = h0 append h1 append h2 append h3 append h4 append h5 append h6 append h7
The computation of the ch and maj values can be optimized the same way as described for SHA-1.
SHA-224 is identical to SHA-256, except that:
• the initial variable values h0 through h7 are different, and
• the output is constructed by omitting h7.
Here the initial values for the variables (in big endian):
(The second 32 bits of the fractional parts of the square roots of the 9th through 16th primes 23..53)
h[0..7] :=
0xc1059ed8, 0x367cd507, 0x3070dd17, 0xf70e5939, 0xffc00b31, 0x68581511, 0x64f98fa7, 0xbefa4fa4