So you want to know about blockchain coding but maybe don’t know where to start? In this article, we’ll explore introductory information about the most commonly used languages in blockchain and general programming concepts. By the end, you’ll be able to write your own blockchain from scratch with a few lines of Python code!
It’s advisable to learn to code and understand at least the basics of programming before diving into this article. But anyone with minimal or even no knowledge can get a well-rounded introduction since each aspect will be thoroughly explained.
Blockchain technology is an exciting and rapidly developing industry. As such, the vast majority of projects are developed in an open source manner. This means the code is available for anyone to copy, modify, or redistribute. The original blockchain, Bitcoin, is a prime example of open source development: Bitcoin Core, the most popular Bitcoin client, currently has over 600 contributors from around the world.
Because of the nature of open source development, many different languages are used. There is actually no official or standard language. In fact, you could write blockchain code in just about any computer language as long as it is Turing complete (i.e. it can compute anything that is computable, given enough time and memory). Funny enough, an in-game engineering system in the popular game Minecraft, called
Since blockchains are abstract pieces of data, any Turing complete language could hypothetically interact with the network.
Best options for blockchain coding languages
But before you try to write a blockchain in PowerPoint or a smart contract with 3D blocks, we should consider the more practical options that are better proven and widely used by open source projects. Most of the common languages you have probably heard of are Turing complete. But some languages make it easier to do certain kinds of projects than others.
Languages like Ethereum’s Solidity were developed specifically for blockchain development. Solidity helps in that it allows for more Object-Oriented Programming, which makes it much easier for people to read and write the language. This is a super important characteristic of good and maintainable
We can go to GitHub to find out which blockchain coding languages are used in the client software of most blockchains. That way we have a good idea of where to start.
Finding the most popular coding languages
Looking at the top 10 blockchains on Coinmarketcap.com (excluding tokens and
If you are working in the blockchain industry, however, it is likely that you would be working on a project on Ethereum and therefore be working with Solidity. About 45 of the top 100 projects on Coinmarketcap are tokens built off other blockchains. The vast majority of these are on Ethereum. But this article is more about the basics of blockchain coding with the end goal of understanding how it works on a technical level.
So if you are looking to become a developer in the industry, you would probably want to put Solidity at the top of that list, followed by the others. Because, of
Let’s continue with a little history of these languages.
C is one of the early general-purpose programming languages. It was initially created in 1972 from an earlier language called B. The B language was a bit slow and lacked some features. So developers of the Unix operating system at Bell Labs, Ken Thompson
C++ was invented in 1979 by a Danish computer scientist named Bjarne
C++ was the language that Satoshi Nakamoto originally used in the first implementation of Bitcoin.
Python is a high-level, interpreted language with a strong emphasis on readability and significant whitespace. Being an interpreted language means the source code is not compiled to machine code ahead of time. Instead, when the code is run, an interpreter translates it into an intermediate form such as bytecode and executes that. Python is also dynamically typed, which significantly reduces the amount of code you need to write, since it removes the explicit declarations of things like variable types that are required in languages like C. It was released in 1991 by Guido van Rossum. Python actually got its name from the British comedy sketch group Monty Python!
Python has risen significantly in popularity in the last few years. This is particularly because it’s a lot quicker to write programs with it when compared to languages like C. Generally, you don’t need to worry about memory allocation and other small details. But also it’s because of the rising popularity of data science and machine learning applications, for which Python has tons of fantastic libraries.
Like Python, Java is another
Java syntax is very similar to C and C++. The similarity was intentional so that experienced programmers in the industry could easily transition and pick it up without any major hurdles. Java was designed with the philosophy of “write once, run anywhere,” so that Java code could easily run across many platforms without the
We have to mention Solidity in this article. It is the de facto smart contract language, even if we aren’t going to use it in our blockchain. Solidity was designed to be an easy-to-use
What is a Blockchain Technically?
To oversimplify, a blockchain is very much like a linked list.
First, to understand what a linked list is, you should know what a pointer is. A pointer is an address to a specific place in your computer’s memory. If you have ever programmed in a language like C or C++ before, you should be fairly familiar with pointers already. In practice, they are a bit tricky to get the hang of, but they allow you more granular control over how your program may use memory. This can be useful
Pointers can reference other pointers, structs, or basically any other data type. We won’t have to worry about pointers in this article beyond understanding what a linked list is. This is because we will be using Python, which is called a “high level” language. A lot of the nitty-gritty stuff is simplified and hidden, such as all the referencing and handling of pointers. It’s done automatically by the language and is abstracted away to make the code easier to read and write.
What is a linked list?
A linked list is just a sequence of elements or objects that are “linked” together by pointers. Each element is made up of two pieces. One piece is the data you are trying to record, and the second piece of data is a pointer or “link” to the next element.
Unlike more typical lists used in programming, such as arrays, a linked list has to be traversed sequentially, item by item, to reach a given element. Between that and the extra space in memory needed for the pointers, linked lists can be a bit slower. With arrays (another type of list), you can just use an index to access the specific item you want because of the way they allocate space in memory.
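To make that concrete, here is a minimal sketch of a singly linked list in Python 3. The class and method names (Node, LinkedList, prepend, get) are made up for this illustration. Python hides the actual pointers, so the "link" is simply a reference to another object.

```python
class Node:
    """One element of the list: a piece of data plus a link to the next node."""

    def __init__(self, data, next_node=None):
        self.data = data
        self.next = next_node


class LinkedList:
    def __init__(self):
        self.head = None

    def prepend(self, data):
        # The new node links to the old head, then becomes the new head.
        self.head = Node(data, self.head)

    def get(self, position):
        # There is no index arithmetic here: we have to walk the links one by one.
        node = self.head
        for _ in range(position):
            if node is None:
                break
            node = node.next
        if node is None:
            raise IndexError("position out of range")
        return node.data

    def to_list(self):
        items = []
        node = self.head
        while node is not None:
            items.append(node.data)
            node = node.next
        return items


chain = LinkedList()
for value in ["c", "b", "a"]:
    chain.prepend(value)

print(chain.to_list())  # ['a', 'b', 'c']
print(chain.get(2))     # c
```

Notice that get() has to step through every earlier node to reach the one it wants, which is exactly the sequential access cost described above.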
Cryptographic hash pointers
With a blockchain, instead of using a regular pointer as a reference, it uses a cryptographic hash pointer, which contains the hash of the previous block. A hash is the output of a hash function, which is normally called a “one-way function.” It allows you to input some piece of data and get an output of a fixed length that is, for all practical purposes, unique to that input.
You can’t reverse the data from the output hash very easily. But you can easily prove that a hash is correct or that your data has not been tampered with by putting your data into the hash function and verifying that the data does in fact map to that specific output of the hash function. And so using this, you can include the hash of the previous block, as well as other pieces of data such
When we have a really good hashing function that has all of the right qualities, we can see that changing even a small piece of the input data will give us a very different output hash.
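You can see this behaviour with a real hash function. The sketch below uses SHA-256 from hashlib, which is part of Python's standard library. The helper name sha256_hex and the two example messages are made up for the illustration.

```python
import hashlib


def sha256_hex(text):
    """Return the SHA-256 digest of a string as 64 hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


first = sha256_hex("send 10 coins to Alice")
second = sha256_hex("send 11 coins to Alice")

print(first)
print(second)

# Count how many of the 64 hex positions differ between the two digests.
differing = sum(1 for a, b in zip(first, second) if a != b)
print(differing, "of", len(first), "hex characters differ")
```

Only one character of the input changed, yet the two digests should look completely unrelated. Both are always 64 hex characters long, no matter how long the input is.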
The properties needed for a good hash function do vary slightly. It depends on the specific application you are using it for. But here are some of the general attributes that you would want to have.
Hash functions must produce a fixed-length output. No matter what the length of the input string (also called the “plaintext” input or “message”) is, you always want the output to be the same length. Even if the input is shorter than the output, you would just add some sort of “padding” to the initial input.
The hash function should be fairly efficient. It’s possible to create hash functions that have the other properties but they are very inefficient and slow to compute. This just comes down to the practicality of using the function, especially when using it over a decentralized network. We need as much speed as we can get wherever and whenever we can get it.
Preimage resistance means that, given an output hash, it should be infeasible to find an input (a “pre-image”) that produces it. Even if you try many input texts in your hash function, you likely won’t find a correlation or mapping to a specific output character or combination of characters.
Collision resistance means that two different inputs should not produce the same output hash. More precisely, since inputs can be longer than the output and collisions must therefore exist, it should be extremely improbable to find two such inputs.
This property is similar to collision resistance, but stricter: it should be unlikely to find two output hashes that are even somewhat similar.
The input text should not be correlated in any way to the output text. Ciphers like the Caesar cipher (discussed below) would not fulfill this criterion, because each element of the input is correlated 1:1 to an element in the output text. With a good hash function, you should not find any such correlation.
How to write your first blockchain in Python
We will be using trinket.io to write a proof of work blockchain from scratch in Python 2 all in the browser!
There are definitely secure libraries with the standard Bitcoin hashing algorithm SHA-256 among others that are available in Python. One of these standard libraries is called
Understanding hash functions
But it’s important to have an understanding of what hash functions actually do. It’s also good to know what makes one secure as well as understand some basic cryptography. So we will write our hashing function from scratch.
Again we will assume you have some programming knowledge. But feel free to follow along without that knowledge. You can also learn the basics of Python at learnpythonthehardway.org. But here is a simple program demonstrating some of the basic
Try running the code and see what happens.
With blockchain coding, it’s first things first: the hashing function. There are libraries that you can use that already have standard hashing functions. But for the sake of learning, we will write some from scratch. Of course, it’s going to be super simple and not secure, so I don’t recommend using it for anything! But it’s important to understand how everything works and to have a grasp of basic cryptographic principles if you are going to be learning how to build a CRYPTO-currency and blockchain.
To understand hashing, you need to understand cryptography and look at some basic and early encryption algorithms. We’ll start off with the Caesar cipher and the Vigenère cipher.
The Caesar cipher is named after Julius Caesar because he apparently used this cipher in some of his own correspondence. It is one of the earliest and most well-known encryption methods. It works quite simply by shifting each letter by a “shift,” which can also be referred to as a key. For example, if the shift is 1 and you input some plaintext like “abc”, the Caesar cipher will output “bcd” because each letter is shifted by one index in the alphabet. You can change the shift to any index in your alphabet to encrypt it differently.
During Roman times, when literacy rates may have been quite a bit lower than in the modern world, the copying of documents could not be done with a quick CTRL-C, and encryption wasn’t a widely known idea, the Caesar cipher may have been sufficient for such an application. But obviously, in the modern world, when a computer can do billions of operations per second, it’s not a very tall order to try 25 possible character shifts until it’s broken. This cipher could be broken in less than a second on a regular computer. It could even be broken fairly quickly by hand.
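Here is a small sketch of the classic Caesar cipher over the lowercase alphabet, together with the brute-force attack just described (Python 3). The function name caesar and the sample message are made up for the example.

```python
import string

ALPHABET = string.ascii_lowercase  # 'a' through 'z'


def caesar(text, shift):
    """Shift every lowercase letter by `shift` places, wrapping around after 'z'."""
    result = []
    for char in text:
        if char in ALPHABET:
            index = (ALPHABET.index(char) + shift) % len(ALPHABET)
            result.append(ALPHABET[index])
        else:
            result.append(char)  # leave spaces and punctuation untouched
    return "".join(result)


print(caesar("abc", 1))  # bcd

secret = caesar("attack at dawn", 3)
print(secret)  # dwwdfn dw gdzq

# Brute force: there are only 25 non-trivial shifts, so simply try them all.
for shift in range(1, 26):
    print(shift, caesar(secret, -shift))
```

One of the 25 printed lines is the readable plaintext, which is all an attacker needs.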
If we introduce more elements to our shift key, however, something interesting happens. If our key has several elements, then we can have a key like “
This cipher is called the Vigenère cipher due to a misattribution. It was actually created by Giovan Battista Bellaso, an Italian cryptologist, in the 1500s, not by Vigenère. It can be a secure cipher as long as the key is a completely random set of characters and the length of the key is at least as long as the input message you are trying to encrypt. Ciphers like this cannot be cracked. Additionally, they have a one-time-use
Now let’s do an implementation of these ciphers in Python. The only difference, made for programming simplicity, is that the alphabet we are using is the ASCII table. So the results will be slightly different from those of the traditional ciphers and will use more characters.
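What follows is one possible sketch of both ciphers in Python 3, not the exact code this article was originally published with. It assumes the alphabet is the printable part of the ASCII table (codes 32 to 126) so the output stays readable. The function names and the key "key" are made up for the illustration.

```python
FIRST, SIZE = 32, 95  # printable ASCII: code 32 (space) through code 126 (~)


def shift_char(char, shift):
    """Shift one printable ASCII character, wrapping around inside the range."""
    return chr((ord(char) - FIRST + shift) % SIZE + FIRST)


def caesar(text, shift):
    """Caesar cipher: every character is moved by the same shift."""
    return "".join(shift_char(char, shift) for char in text)


def vigenere(text, key, decrypt=False):
    """Vigenere cipher: each character is shifted by the matching key character."""
    sign = -1 if decrypt else 1
    out = []
    for i, char in enumerate(text):
        # The key repeats when it is shorter than the message.
        shift = ord(key[i % len(key)]) - FIRST
        out.append(shift_char(char, sign * shift))
    return "".join(out)


message = "Hello, blockchain!"
print(caesar(message, 1))

encrypted = vigenere(message, "key")
print(encrypted)
print(vigenere(encrypted, "key", decrypt=True))  # Hello, blockchain!
```

The sketch assumes every character of the message and the key is printable ASCII. Anything outside that range would need extra handling.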
Press run code and see what the output is!
If you can understand this code or create your own implementation, congratulations! You now have the power to encrypt and decrypt messages! Maybe play with it and see if you can come up with something on your own.
Hashing vs encryption
Now, as it turns out, there is a subtle distinction to be made between encryption and cryptographic hashing, even though they share a lot of similarities. In blockchain coding, it’s important to differentiate. Encryption is meant to be reversible if you have the right key, and it’s generally used to send private messages, even if the channel is insecure. Hashing, on the other hand, is not reversible. Even if you know the exact algorithm that was used, there is no information to be gained from the output hash.
So there is also no key
A simple hashing function
A very simple hashing function is taking the value of each input character on the index table and adding them up. So ABC = 1 + 2 + 3 = 6. The idea here is that you can’t easily figure out what exact combination of letters leads to that sum. Let’s see what that function looks like in Python. For the index, we will use the ASCII table instead of a regular alphabet.
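A minimal sketch of such an additive hash could look like this (Python 3). The name sum_hash is made up for the example.

```python
def sum_hash(text):
    """Add up the ASCII code of every character in the input."""
    return sum(ord(char) for char in text)


print(sum_hash("ABC"))   # 65 + 66 + 67 = 198
print(sum_hash("this"))  # 440
print(sum_hash("hits"))  # 440
```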
If you run this code, you will notice that the inputs “this” and “hits” actually produce the same output hash. This is bad and will not work, because there are too many collisions! Let’s try again. But this time, let’s multiply each number by its index in the input string before we add it to the sum.
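A sketch of that second attempt is below (Python 3). It assumes positions are counted from 1, so the first character is not multiplied by zero. The name weighted_hash is made up for the example.

```python
def weighted_hash(text):
    """Multiply each ASCII code by its 1-based position before summing."""
    return sum(position * ord(char) for position, char in enumerate(text, start=1))


print(weighted_hash("this"))  # 1099
print(weighted_hash("hits"))  # 1122
print(weighted_hash("thiz"))  # 1127
```

The anagrams no longer collide, but the three results still sit very close to each other.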
This is a lot better but still insufficient as similar inputs produce similar outputs due to the similar lengths of the input. Even if you couldn’t figure out exactly what the inputs were, you could gain some information about them.
A stronger hashing function
We need something much much stronger for a blockchain. So let’s take a crack at it and see if we can create something that’s reasonably strong.
Altogether, our hashing function is 90 lines of code. It’s not terribly complicated to understand if you are familiar with Python. Here is an explanation of how it works:
Hashing function explanation
First, in the hash() function we created an alphabet. This can be anything of your choosing, like the ASCII table. In this example, it is just uppercase letters and digits. Any other characters are omitted and removed from the input string.
The next step is to either extend or compress our input message as we want a fixed length output string. We do this inside the chomp() function which will divide the string into multiple equal length parts if it’s too long, and then it simply adds each character from the new separated strings together by whatever their index was in the alphabet.
If the index is greater than the length of the alphabet, it simply rolls over back to 1 (or A in this case) and continues counting from there. And if the string is too short, we add “padding,” which in this case just means copying the alphabet onto the end of the input string until it is the desired length.
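The filtering, folding and padding steps described above can be sketched roughly as follows (Python 3). This is an interpretation of the description, not the original 90-line function. The output length of 8 and the helper name clean() are assumptions, and A counts as 0 here rather than 1 so the rollover can be written with the modulo operator.

```python
import string

ALPHABET = string.ascii_uppercase + string.digits  # 36 symbols: A-Z, then 0-9
LENGTH = 8  # hypothetical fixed output length for this sketch


def clean(text):
    """Drop every character that is not in the alphabet."""
    return "".join(char for char in text if char in ALPHABET)


def chomp(text, length=LENGTH):
    """Pad or fold an already-cleaned string to exactly `length` characters."""
    if len(text) < length:
        # Too short: keep copying the alphabet onto the end, then trim.
        while len(text) < length:
            text += ALPHABET
        return text[:length]
    # Too long: start from the first chunk, then add each later chunk onto it
    # position by position, rolling over at the end of the alphabet.
    totals = [ALPHABET.index(char) for char in text[:length]]
    for start in range(length, len(text), length):
        for offset, char in enumerate(text[start:start + length]):
            totals[offset] = (totals[offset] + ALPHABET.index(char)) % len(ALPHABET)
    return "".join(ALPHABET[total] for total in totals)


print(chomp(clean("AB")))               # ABABCDEF
print(chomp(clean("ABCDEFGHIJ")))       # IKCDEFGH
print(chomp(clean("Hello World 123")))  # HW123ABC
```

Whatever goes in, the result is always 8 characters from the alphabet, which is the fixed-length property we wanted.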
Once we are here, we already have more or less gibberish for a string. But it still may be possible to reverse and you might be able to observe some common patterns in the output. So now we will put it through an actual cipher or “mixer” algorithm.
Here is the simple version of how it works. With an input text of “AB” the function returns A^2 + B + (index of A) x 2. This is in the character_map() function, which is the key part of the function.
We start our transformation of the plaintext by inputting the first two letters of the string. So A and B or character 1 and character 2. Then we modify the string with the new output and perform the action on the next pair of characters, overlapping. So now we do the transformation on character 2 and character 3 and so on and so forth until we have done it on all the characters. This interlinks each character together so that a small change anywhere in the text will have a downstream effect on all the other characters of the output hash.
Using multiplication and a rollover
The multiplications in the remapping of the characters are helpful in obscuring the input further. This is because it causes numbers and combinations of numbers to “blow up” and roll back over to 1. Since the combination of numbers and the multiplication could lead to huge or smaller numbers, the roll over obscures which it could be and makes it hard to know what the original index was even if you know the exact hashing function used.
The first hashing function, which just used the sum of ASCII characters, could give you some indication as to the length of the input string as well as the characters used, because longer strings and higher characters would lead to a higher number. But using multiplication and a
How the cipher works
That may seem a little complicated. And you might be able to create a simpler function. But it’s designed in such a way to take into account the combination of the current and the next character in the string of plaintext. This is important because it means that if we iterate the cipher multiple times over, the same text that’s the input of a character at the beginning of the text will eventually carry over to the next character of
The avalanche effect
This is also known as the avalanche effect or butterfly effect. Even the slightest change in initial conditions will drastically change the end result. It is one of the important and desirable qualities of designing a hash function.
If we then run our message through several rounds of this, it becomes increasingly difficult to determine what the original message was. In our function, we will just leave the default value as 3 for efficiency. But the more rounds, the more secure (in theory).
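To tie the mixing step and the rounds together, here is a rough sketch in Python 3. It is only one reading of the formula above, not the article's original code: it assumes “index of A” means the position of the first character of the pair in the string, that the result replaces the second character of the pair, that the last character pairs with the first, and that A counts as 0. It expects a string that has already been reduced to the alphabet.

```python
import string

ALPHABET = string.ascii_uppercase + string.digits  # 36 symbols: A-Z, then 0-9


def character_map(a, b, position):
    """Mix two neighbouring alphabet indices: a^2 + b + position * 2, with rollover."""
    return (a * a + b + 2 * position) % len(ALPHABET)


def mix(text, rounds=3):
    """Run the overlapping pairwise transformation over `text` for several rounds."""
    values = [ALPHABET.index(char) for char in text]
    for _ in range(rounds):
        for i in range(len(values)):
            j = (i + 1) % len(values)  # the last character pairs with the first
            # The freshly mixed value feeds into the next pair, chaining them.
            values[j] = character_map(values[i], values[j], i)
    return "".join(ALPHABET[value] for value in values)


print(mix("AAAAAAAA", rounds=1))  # 4ACI8MKE
print(mix("AAAAAAAB", rounds=1))  # DACI8MKF

# With the default three rounds the two inputs have more chances to diverge.
print(mix("AAAAAAAA"))
print(mix("AAAAAAAB"))
```

A mixer this small is nowhere near secure. For example, squaring modulo 36 maps several different indices to the same value, so differences between inputs can occasionally cancel out.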
Now that the hard part of building the hashing function is over, we can move on to more blockchain coding and actually create our blockchain!
Coding the blockchain
So what we do here is start by creating a new object class called Block. It takes in an index, timestamp, and previous hash as initial variables, then simply combines them all together in a plaintext string. Then it runs them through the hashing function we wrote earlier.
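A minimal sketch of such a class is shown below (Python 3). To keep it self-contained, it uses SHA-256 from hashlib as a stand-in for the hand-written hashing function. The helper names hash_text and compute_hash and the fixed example timestamps are made up for the illustration.

```python
import hashlib
from datetime import datetime


def hash_text(text):
    # Stand-in for the article's own hash(); SHA-256 keeps the sketch self-contained.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class Block:
    def __init__(self, index, timestamp, previous_hash):
        self.index = index
        self.timestamp = timestamp
        self.previous_hash = previous_hash
        self.hash = self.compute_hash()

    def compute_hash(self):
        # Combine the fields into one plaintext string, then hash it.
        plaintext = "{}{}{}".format(self.index, self.timestamp, self.previous_hash)
        return hash_text(plaintext)


block = Block(1, datetime(2020, 1, 1, 12, 0, 0), "0" * 64)
print(block.hash)

# Tampering with any field means the stored hash no longer matches the contents.
block.timestamp = datetime(2020, 1, 1, 12, 0, 1)
print(block.hash == block.compute_hash())  # False
```

Because each block also stores the previous block's hash, changing an old block would change its hash and break the link from every block after it.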
Creating the genesis block
Next, we create a helper function called create_genesis_block() since we have to start our blockchain from somewhere. Then, another helper called next_block() which just iterates to a new block with a new date-time and message.
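Those two helpers, plus a short loop that chains a few blocks together, might look roughly like this (Python 3). Again, SHA-256 stands in for the hand-written hash. The message field, the placeholder previous hash "0" and the block count of 5 are assumptions made for the sketch.

```python
import hashlib
from datetime import datetime


class Block:
    def __init__(self, index, timestamp, message, previous_hash):
        self.index = index
        self.timestamp = timestamp
        self.message = message
        self.previous_hash = previous_hash
        plaintext = "{}{}{}{}".format(index, timestamp, message, previous_hash)
        # SHA-256 stands in for the article's own hashing function.
        self.hash = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()


def create_genesis_block():
    # The first block has no predecessor, so its previous hash is a placeholder.
    return Block(0, datetime.now(), "Genesis block", "0")


def next_block(last_block):
    # The new block records the hash of the block before it: the hash pointer.
    index = last_block.index + 1
    return Block(index, datetime.now(), "Block #{}".format(index), last_block.hash)


blockchain = [create_genesis_block()]
for _ in range(5):
    blockchain.append(next_block(blockchain[-1]))

for block in blockchain:
    print(block.index, block.hash[:16], "prev:", block.previous_hash[:16])
```

In the printed output, each block's "prev" value should match the hash printed on the line above it.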
Then all we have to do is create the first block and put it in a list and then iterate for as many blocks as we want to create. Now we have a
Coding a Proof-of-Work blockchain
We will replace our next_block() function with a function called mine(). This function will allow us to have a decentralized proof of work blockchain. Proof of Work (PoW) implements two important elements: computational power and randomness. The computational power ensures that miners are generally acting in good faith because they are burning electricity in order to confirm the next block. Since energy isn’t free, there is a high cost to adding blocks. So the reasoning is that rational actors won’t burn money for no reason.
The randomness helps to keep things decentralized and fair. It’s not always the miner with the most computing power that wins the block reward. Even if that is the case on average, this gives incentive to the other smaller miners on the network. Because they occasionally still get something even if they aren’t always number one with overall mining power. The incentive for mining, which is the block reward, is the transaction fees + newly minted coins. This is the case on most proof of work blockchains.
How do the nodes agree?
So how do all the nodes on the network agree on who actually won the block reward? How does a miner know when it has won? Well, this is done through something called difficulty. It’s somewhat self-explanatory: it’s a measure of how much work (and therefore energy) is spent on average to find a block. The network periodically adjusts the difficulty to maintain a consistent interval between blocks. If more miners join or leave the network, the total computing power changes, which would otherwise decrease or increase the average time to find a block.
Under the hood of Proof-of-Work
How does a miner win a block reward? By getting an output hash whose sum is less than or equal to a variable called the “target”. If your blockchain’s hashing function is designed properly and has the qualities listed earlier in this article, there should be virtually no way to compute or know in advance what input will produce a certain output. So the only way to get a hash with a sum at or below the target is to literally try as many random inputs (block data + nonces) in the hashing function as you can until you get a hash that has the values you want. Again, this random hashing takes time and energy or work, hence proof of work.
The sum of the output hash in our case will just be the index of each element added together. So if our output hash was:
ABC = 1 + 2 + 3 = 6
If our target was say 5 then we would have to try again. Now if we got lucky and our output hash was AAA:
AAA = 1 + 1 + 1 = 3
Now that our output hash is lower than the target, we would broadcast our input message. We broadcast this with the other important information to the network. Then the nodes can quickly verify that with the data we sent them, the output of the hashing function does in fact sum to 3. Now they all confirm it. Once 51% or more of the nodes confirm it, we have reached consensus, permanently adding the block to the chain.
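The check that every node performs can be sketched in a few lines of Python 3. It follows the worked example above with A = 1 through Z = 26. The function names hash_sum and meets_target are made up for the illustration.

```python
import string

ALPHABET = string.ascii_uppercase  # A = 1 ... Z = 26, as in the worked example


def hash_sum(output_hash):
    """Add up the 1-based alphabet index of every character in the hash."""
    return sum(ALPHABET.index(char) + 1 for char in output_hash)


def meets_target(output_hash, target):
    """A block is valid when its hash sum is less than or equal to the target."""
    return hash_sum(output_hash) <= target


target = 5
print(hash_sum("ABC"), meets_target("ABC", target))  # 6 False
print(hash_sum("AAA"), meets_target("AAA", target))  # 3 True
```

Verifying takes a single hash and one comparison, while finding a valid hash may take a huge number of attempts. That asymmetry is what makes the scheme work.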
This mechanism is quite clever because it utilizes the fact that the outputs of (good) hashing functions produce a random, approximately normally distributed set of sums. If you remember from a stats class, a normal (or Gaussian) distribution follows a bell curve, which means it’s very predictable how often a random sample will lead to a certain outcome or, in our case, a specific sum.
Each character of a hash is uniformly distributed. That means no character happens more frequently than any other character. With a 26-letter alphabet indexed from 1 to 26, a single character (N = 1) averages 13.5. If you increase the length of your output hash to N = 2, it would produce sums around 27 on average. If N was changed to 3, the sums would look even more like a bell curve, and the average would increase to about 40.5.
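If you want to check those averages yourself, a quick simulation will do. The sketch below (Python 3) assumes a 26-letter alphabet indexed from 1 to 26, where a single uniformly random character averages 13.5. The function names, sample count and seed are arbitrary choices for the illustration.

```python
import random
import string

ALPHABET = string.ascii_uppercase  # indexed 1 to 26


def random_hash_sum(length, rng):
    """Sum of `length` uniformly random character indices."""
    return sum(rng.randint(1, len(ALPHABET)) for _ in range(length))


def average_sum(length, samples=20000, seed=1):
    """Estimate the average hash sum for a given hash length by sampling."""
    rng = random.Random(seed)
    total = sum(random_hash_sum(length, rng) for _ in range(samples))
    return total / float(samples)


for n in (1, 2, 3):
    print(n, round(average_sum(n), 1))
```

The averages should land close to 13.5, 27 and 40.5. Collecting the sums for N = 3 into a histogram should show them starting to bunch up around the middle in a bell shape.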
Adjusting network difficulty
We know that the outcome is always going to be normally distributed. So we can easily predict the probability of a sum and therefore the time and energy needed on average to find a block.
Take a network with only 1 miner. If that miner tries 100 hashes per minute on average and our difficulty is set to 1 in 100 (1% odds per hash), then our average block time is 1 minute. If 99 more miners joined the network (and they all had the same computer and latency), the mining power would increase 100-fold, and new blocks would be discovered 100 times a minute. The network difficulty would then adjust so that the odds of any individual hash discovering a block drop from 1% to 0.01%, and the block time of 1 minute is maintained.
Note that the relationship is not linear. Because the sums follow a bell curve, reducing the target sum linearly makes the odds of hitting it fall off much faster, roughly exponentially in the tail of the curve.
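The arithmetic in this example can be written out as a tiny sketch (Python 3). The function names are made up, and it assumes every hash attempt is an independent try with the same odds of success.

```python
def expected_block_time(hashes_per_minute, success_probability):
    """Average minutes per block: one success is expected every 1/p hashes."""
    return 1.0 / (hashes_per_minute * success_probability)


def adjusted_probability(hashes_per_minute, target_block_time):
    """Per-hash odds that keep the average block time at `target_block_time` minutes."""
    return 1.0 / (hashes_per_minute * target_block_time)


one_miner = 100                    # hashes per minute
hundred_miners = 100 * one_miner   # 99 identical miners join

print(expected_block_time(one_miner, 0.01))       # about 1 minute per block
print(expected_block_time(hundred_miners, 0.01))  # about 0.01 minutes per block
print(adjusted_probability(hundred_miners, 1.0))  # about 0.0001, i.e. 0.01%
```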
The beauty of Proof of Work (PoW)
This is ingenious because no node on the network knows the exact hash of the next block until it’s discovered. Proof of Work makes sure everyone is honest purely by mathematics!
In proof of work implementations, if you are a miner you take all the data you want to add in the next block and then you add a randomly generated number called nonce to that data to give it a different variation. Then you run your data through the hashing algorithm. This way you can see if the output hash is lower than the target. If it’s not, you generate a new nonce and try again. Rinse and repeat until you discover a block! And that’s it! That’s how a simple mining algorithm works in blockchain coding.
You will see in our proof of work blockchain that the mine() function is very similar to the next_block() function. The difference is that it has a while loop which continually tries a new hash with a different nonce until it gets the right sum. With the parameters in this blockchain and a single miner, it takes about 4 seconds to find a block. If you set the target variable lower, it will take longer. If you set it higher, it will take less time.
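A stripped-down sketch of such a mine() function is shown below (Python 3). It is not the article's original code: SHA-256 stands in for the hand-written hash, the “sum” is the total of the 64 hexadecimal digits of the digest, blocks are plain dictionaries, and the target of 420 is an arbitrary value picked so blocks are found after a modest number of tries.

```python
import hashlib
from datetime import datetime

TARGET = 420  # hypothetical difficulty: a lower target is harder to hit


def hash_sum(hex_digest):
    """Add up the value (0 to 15) of every hexadecimal digit in the digest."""
    return sum(int(digit, 16) for digit in hex_digest)


def mine(index, message, previous_hash, target=TARGET):
    """Keep trying nonces until the block's hash sum is at or below the target."""
    timestamp = datetime.now()
    nonce = 0
    while True:
        plaintext = "{}{}{}{}{}".format(index, timestamp, message, previous_hash, nonce)
        digest = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()
        if hash_sum(digest) <= target:
            return {"index": index, "timestamp": timestamp, "message": message,
                    "previous_hash": previous_hash, "nonce": nonce, "hash": digest}
        nonce += 1  # a simple counter instead of a random nonce keeps the sketch short


blockchain = [mine(0, "Genesis block", "0")]
for i in range(1, 4):
    blockchain.append(mine(i, "Block #{}".format(i), blockchain[-1]["hash"]))

for block in blockchain:
    print(block["index"], block["nonce"], hash_sum(block["hash"]), block["hash"][:16])
```

The nonce column shows how many failed attempts came before each block. Try lowering TARGET to 380 and the nonce values should grow noticeably.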
Your own Proof of Work blockchain!
So now we have an entire Proof of Work blockchain all in our browser in only about 170 lines of code made completely from scratch! Of
Blockchain Coding Resources
If after this you are still excited about blockchain development there are lots of online resources for learning to code blockchain.
Dr. Rob Edwards from San Diego State University has a good video series on YouTube about hashing functions and programming them in C.
And if you want a more academic guide in building hash functions you can see this paper which builds hash functions with compression from block ciphers similar to the hash function we built in this article.
Codecademy is one of the best places to learn the basics of the more common languages. Many people will highly recommend it whether you are trying to learn
Cryptozombies is a popular platform for learning the basics of Solidity. It teaches you by doing blockchain coding puzzles in your browser to build a zombie Dapp. It’s definitely a fun way to learn Dapp development for Ethereum and it’s completely free!
Jameson Lopp has a great resource page for learning Bitcoin development specifically, but lots of the information will be applicable to other blockchains as well.