Algorithms_in_C++ 1.0.0
Set of algorithms implemented in C++.
Loading...
Searching...
No Matches
md5.cpp File Reference

Simple C++ implementation of the MD5 Hashing Algorithm More...

#include <algorithm>
#include <array>
#include <cassert>
#include <cstring>
#include <iostream>
#include <string>
#include <vector>
Include dependency graph for md5.cpp:

Namespaces

namespace  hashing
 Hashing algorithms.
 
namespace  MD5
 Functions for the MD5 algorithm implementation.
 

Functions

uint32_t hashing::md5::leftRotate32bits (uint32_t n, std::size_t rotate)
 Rotates the bits of a 32-bit unsigned integer.
 
bool hashing::md5::isBigEndian ()
 Checks whether integers are stored as big endian or not.
 
uint32_t hashing::md5::toLittleEndian32 (uint32_t n)
 Sets 32-bit integer to little-endian if needed.
 
uint64_t hashing::md5::toLittleEndian64 (uint64_t n)
 Sets 64-bit integer to little-endian if needed.
 
std::string hashing::md5::sig2hex (void *sig)
 Transforms the 128-bit MD5 signature into a 32 char hex string.
 
void * hashing::md5::hash_bs (const void *input_bs, uint64_t input_size)
 The MD5 algorithm itself, taking in a bytestring.
 
void * hashing::md5::hash (const std::string &message)
 Converts the string to bytestring and calls the main algorithm.
 
static void test ()
 Self-test implementations of well-known MD5 hashes.
 
static void interactive ()
 Puts user in a loop where inputs can be given and MD5 hash will be computed and printed.
 
int main ()
 Main function.
 

Detailed Description

Simple C++ implementation of the MD5 Hashing Algorithm

Author
tGautot

The MD5 Algorithm is a hashing algorithm which was designed in 1991 by Ronal Rivest.

MD5 is one of the most used hashing algorithm there is. Some of its use cases are:

  1. Providing checksum for downloaded software
  2. Store salted password

However MD5 has be know to be cryptographically weak for quite some time, yet it is still widely used. This weakness was exploited by the Flame Malware in 2012

Algorithm

First of all, all values are expected to be in [little endian] (https://en.wikipedia.org/wiki/Endianness). This is especially important when using part of the bytestring as an integer.

The first step of the algorithm is to pad the message for its length to be a multiple of 64 (bytes). This is done by first adding 0x80 (10000000) and then only zeroes until the last 8 bytes must be filled, where then the 64 bit size of the input will be added

Once this is done, the algo breaks down this padded message into 64 bytes chunks. Each chunk is used for one round, a round breaks the chunk into 16 blocks of 4 bytes. During these rounds the algorithm will update its 128 bit state (represented by 4 ints: A,B,C,D) For more precisions on these operations please see the Wikipedia aritcle. The signature given by MD5 is its 128 bit state once all rounds are done.

Note
This is a simple implementation for a byte string but some implmenetations can work on bytestream, messages of unknown length.

Function Documentation

◆ hash()

void * hashing::md5::hash ( const std::string & message)

Converts the string to bytestring and calls the main algorithm.

Parameters
messagePlain character message to hash
Returns
void* Pointer to the MD5 signature
287 {
288 return hash_bs(&message[0], message.size());
289}
void * hash_bs(const void *input_bs, uint64_t input_size)
The MD5 algorithm itself, taking in a bytestring.
Definition md5.cpp:138
T size(T... args)
Here is the call graph for this function:

◆ hash_bs()

void * hashing::md5::hash_bs ( const void * input_bs,
uint64_t input_size )

The MD5 algorithm itself, taking in a bytestring.

Parameters
input_bsThe bytestring to hash
input_sizeThe size (in BYTES) of the input
Returns
void* Pointer to the 128-bit signature

Values of K are pseudo-random and used to "salt" each round The values can be obtained by the following python code

from math import floor, sin
for i in range(64):
print(floor(2**32 * abs(sin(i+1))))
138 {
139 auto* input = static_cast<const uint8_t*>(input_bs);
140
141 // Step 0: Initial Data (Those are decided in the MD5 protocol)
142 // s is the shift used in the leftrotate each round
144 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22,
145 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20,
146 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23,
147 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21};
148 // K is pseudo-random values used each round
149 // The values can be obtained by the following python code:
150
151 /**
152 * @brief Values of K are pseudo-random and used to "salt" each round
153 * The values can be obtained by the following python code
154 * @code{.py}
155 * from math import floor, sin
156 *
157 * for i in range(64):
158 * print(floor(2**32 * abs(sin(i+1))))
159 * @endcode
160 */
162 3614090360, 3905402710, 606105819, 3250441966, 4118548399, 1200080426,
163 2821735955, 4249261313, 1770035416, 2336552879, 4294925233, 2304563134,
164 1804603682, 4254626195, 2792965006, 1236535329, 4129170786, 3225465664,
165 643717713, 3921069994, 3593408605, 38016083, 3634488961, 3889429448,
166 568446438, 3275163606, 4107603335, 1163531501, 2850285829, 4243563512,
167 1735328473, 2368359562, 4294588738, 2272392833, 1839030562, 4259657740,
168 2763975236, 1272893353, 4139469664, 3200236656, 681279174, 3936430074,
169 3572445317, 76029189, 3654602809, 3873151461, 530742520, 3299628645,
170 4096336452, 1126891415, 2878612391, 4237533241, 1700485571, 2399980690,
171 4293915773, 2240044497, 1873313359, 4264355552, 2734768916, 1309151649,
172 4149444226, 3174756917, 718787259, 3951481745};
173
174 // The initial 128-bit state
175 uint32_t a0 = 0x67452301, A = 0;
176 uint32_t b0 = 0xefcdab89, B = 0;
177 uint32_t c0 = 0x98badcfe, C = 0;
178 uint32_t d0 = 0x10325476, D = 0;
179
180 // Step 1: Processing the bytestring
181
182 // First compute the size the padded message will have
183 // so it is possible to allocate the right amount of memory
184 uint64_t padded_message_size = 0;
185 if (input_size % 64 < 56) {
186 padded_message_size = input_size + 64 - (input_size % 64);
187 } else {
188 padded_message_size = input_size + 128 - (input_size % 64);
189 }
190
191 std::vector<uint8_t> padded_message(padded_message_size);
192
193 // Beginning of the padded message is the original message
194 std::copy(input, input + input_size, padded_message.begin());
195
196 // Afterwards comes a single 1 bit and then only zeroes
197 padded_message[input_size] = 1 << 7; // 10000000
198 for (uint64_t i = input_size; i % 64 != 56; i++) {
199 if (i == input_size) {
200 continue; // pass first iteration
201 }
202 padded_message[i] = 0;
203 }
204
205 // We then have to add the 64-bit size of the message at the end
206 // When there is a conversion from int to bytestring or vice-versa
207 // We always need to make sure it is little endian
208 uint64_t input_bitsize_le = toLittleEndian64(input_size * 8);
209 for (uint8_t i = 0; i < 8; i++) {
210 padded_message[padded_message_size - 8 + i] =
211 (input_bitsize_le >> (56 - 8 * i)) & 0xFF;
212 }
213
214 // Already allocate memory for blocks
216
217 // Rounds
218 for (uint64_t chunk = 0; chunk * 64 < padded_message_size; chunk++) {
219 // First, build the 16 32-bits blocks from the chunk
220 for (uint8_t bid = 0; bid < 16; bid++) {
221 blocks[bid] = 0;
222
223 // Having to build a 32-bit word from 4-bit words
224 // Add each and shift them to the left
225 for (uint8_t cid = 0; cid < 4; cid++) {
226 blocks[bid] = (blocks[bid] << 8) +
227 padded_message[chunk * 64 + bid * 4 + cid];
228 }
229 }
230
231 A = a0;
232 B = b0;
233 C = c0;
234 D = d0;
235
236 // Main "hashing" loop
237 for (uint8_t i = 0; i < 64; i++) {
238 uint32_t F = 0, g = 0;
239 if (i < 16) {
240 F = (B & C) | ((~B) & D);
241 g = i;
242 } else if (i < 32) {
243 F = (D & B) | ((~D) & C);
244 g = (5 * i + 1) % 16;
245 } else if (i < 48) {
246 F = B ^ C ^ D;
247 g = (3 * i + 5) % 16;
248 } else {
249 F = C ^ (B | (~D));
250 g = (7 * i) % 16;
251 }
252
253 // Update the accumulators
254 F += A + K[i] + toLittleEndian32(blocks[g]);
255
256 A = D;
257 D = C;
258 C = B;
259 B += leftRotate32bits(F, s[i]);
260 }
261 // Update the state with this chunk's hash
262 a0 += A;
263 b0 += B;
264 c0 += C;
265 d0 += D;
266 }
267
268 // Build signature from state
269 // Note, any type could be used for the signature
270 // uint8_t was used to make the 16 bytes obvious
271 // The sig needs to be little endian
272 auto* sig = new uint8_t[16];
273 for (uint8_t i = 0; i < 4; i++) {
274 sig[i] = (a0 >> (8 * i)) & 0xFF;
275 sig[i + 4] = (b0 >> (8 * i)) & 0xFF;
276 sig[i + 8] = (c0 >> (8 * i)) & 0xFF;
277 sig[i + 12] = (d0 >> (8 * i)) & 0xFF;
278 }
279
280 return sig;
281}
double g(double x)
Another test function.
Definition composite_simpson_rule.cpp:115
T copy(T... args)
uint32_t toLittleEndian32(uint32_t n)
Sets 32-bit integer to little-endian if needed.
Definition md5.cpp:89
uint64_t toLittleEndian64(uint64_t n)
Sets 64-bit integer to little-endian if needed.
Definition md5.cpp:102
uint32_t leftRotate32bits(uint32_t n, std::size_t rotate)
Rotates the bits of a 32-bit unsigned integer.
Definition md5.cpp:66
Here is the call graph for this function:

◆ interactive()

static void interactive ( )
static

Puts user in a loop where inputs can be given and MD5 hash will be computed and printed.

Returns
void
351 {
352 while (true) {
353 std::string input;
354 std::cout << "Enter a message to be hashed (Ctrl-C to exit): "
355 << std::endl;
356 std::getline(std::cin, input);
357 void* sig = hashing::md5::hash(input);
358 std::cout << "Hash is: " << hashing::md5::sig2hex(sig) << std::endl;
359
360 while (true) {
361 std::cout << "Want to enter another message? (y/n) ";
362 std::getline(std::cin, input);
363 if (input.compare("y") == 0) {
364 break;
365 } else if (input.compare("n") == 0) {
366 return;
367 }
368 }
369 }
370}
T compare(T... args)
T endl(T... args)
T getline(T... args)
void * hash(const std::string &message)
Converts the string to bytestring and calls the main algorithm.
Definition md5.cpp:287
std::string sig2hex(void *sig)
Transforms the 128-bit MD5 signature into a 32 char hex string.
Definition md5.cpp:122
Here is the call graph for this function:

◆ isBigEndian()

bool hashing::md5::isBigEndian ( )

Checks whether integers are stored as big endian or not.

Note
Taken from this StackOverflow post
Returns
true IF integers are detected to work as big-endian
false IF integers are detected to work as little-endian
76 {
77 union {
78 uint32_t i;
80 } bint = {0x01020304};
81
82 return bint.c[0] == 1;
83}
Here is the call graph for this function:

◆ leftRotate32bits()

uint32_t hashing::md5::leftRotate32bits ( uint32_t n,
std::size_t rotate )

Rotates the bits of a 32-bit unsigned integer.

Parameters
nInteger to rotate
rotateHow many bits for the rotation
Returns
uint32_t The rotated integer
66 {
67 return (n << rotate) | (n >> (32 - rotate));
68}
T rotate(T... args)
Here is the call graph for this function:

◆ main()

int main ( void )

Main function.

Returns
0 on exit
376 {
377 test(); // run self-test implementations
378
379 // Launch interactive mode where user can input messages and see
380 // their hash
381 interactive();
382 return 0;
383}
static void test()
Self-test implementations of well-known MD5 hashes.
Definition md5.cpp:297
static void interactive()
Puts user in a loop where inputs can be given and MD5 hash will be computed and printed.
Definition md5.cpp:351
Here is the call graph for this function:

◆ sig2hex()

std::string hashing::md5::sig2hex ( void * sig)

Transforms the 128-bit MD5 signature into a 32 char hex string.

Parameters
sigThe MD5 signature (Expected 16 bytes)
Returns
std::string The hex signature
122 {
123 const char* hexChars = "0123456789abcdef";
124 auto* intsig = static_cast<uint8_t*>(sig);
125 std::string hex = "";
126 for (uint8_t i = 0; i < 16; i++) {
127 hex.push_back(hexChars[(intsig[i] >> 4) & 0xF]);
128 hex.push_back(hexChars[(intsig[i]) & 0xF]);
129 }
130 return hex;
131}
T hex(T... args)
Here is the call graph for this function:

◆ test()

static void test ( )
static

Self-test implementations of well-known MD5 hashes.

Returns
void
297 {
298 // Hashes empty string and stores signature
299 void* sig = hashing::md5::hash("");
300 std::cout << "Hashing empty string" << std::endl;
301 // Prints signature hex representation
303 // Test with cassert whether sig is correct from the expected value
304 assert(hashing::md5::sig2hex(sig).compare(
305 "d41d8cd98f00b204e9800998ecf8427e") == 0);
306
307 // Hashes "The quick brown fox jumps over the lazy dog" and stores signature
308 void* sig2 =
309 hashing::md5::hash("The quick brown fox jumps over the lazy dog");
310 std::cout << "Hashing The quick brown fox jumps over the lazy dog"
311 << std::endl;
312 // Prints signature hex representation
314 // Test with cassert whether sig is correct from the expected value
315 assert(hashing::md5::sig2hex(sig2).compare(
316 "9e107d9d372bb6826bd81d3542a419d6") == 0);
317
318 // Hashes "The quick brown fox jumps over the lazy dog." (notice the
319 // additional period) and stores signature
320 void* sig3 =
321 hashing::md5::hash("The quick brown fox jumps over the lazy dog.");
322 std::cout << "Hashing "
323 "The quick brown fox jumps over the lazy dog."
324 << std::endl;
325 // Prints signature hex representation
327 // Test with cassert whether sig is correct from the expected value
328 assert(hashing::md5::sig2hex(sig3).compare(
329 "e4d909c290d0fb1ca068ffaddf22cbd0") == 0);
330
331 // Hashes "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
332 // and stores signature
333 void* sig4 = hashing::md5::hash(
334 "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789");
336 << "Hashing "
337 "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
338 << std::endl;
339 // Prints signature hex representation
341 // Test with cassert whether sig is correct from the expected value
342 assert(hashing::md5::sig2hex(sig4).compare(
343 "d174ab98d277d9f5a5611c2c9f419d9f") == 0);
344}
int compare(const void *a, const void *b)
Definition shell_sort2.cpp:87
Here is the call graph for this function:

◆ toLittleEndian32()

uint32_t hashing::md5::toLittleEndian32 ( uint32_t n)

Sets 32-bit integer to little-endian if needed.

Parameters
nNumber to set to little-endian (uint32_t)
Returns
uint32_t param n with binary representation as little-endian
89 {
90 if (!isBigEndian()) {
91 return ((n << 24) & 0xFF000000) | ((n << 8) & 0x00FF0000) |
92 ((n >> 8) & 0x0000FF00) | ((n >> 24) & 0x000000FF);
93 }
94 // Machine works on little endian, no need to change anything
95 return n;
96}
bool isBigEndian()
Checks whether integers are stored as big endian or not.
Definition md5.cpp:76
Here is the call graph for this function:

◆ toLittleEndian64()

uint64_t hashing::md5::toLittleEndian64 ( uint64_t n)

Sets 64-bit integer to little-endian if needed.

Parameters
nNumber to set to little-endian (uint64_t)
Returns
uint64_t param n with binary representation as little-endian
102 {
103 if (!isBigEndian()) {
104 return ((n << 56) & 0xFF00000000000000) |
105 ((n << 40) & 0x00FF000000000000) |
106 ((n << 24) & 0x0000FF0000000000) |
107 ((n << 8) & 0x000000FF00000000) |
108 ((n >> 8) & 0x00000000FF000000) |
109 ((n >> 24) & 0x0000000000FF0000) |
110 ((n >> 40) & 0x000000000000FF00) |
111 ((n >> 56) & 0x00000000000000FF);
112 ;
113 }
114 // Machine works on little endian, no need to change anything
115 return n;
116}
Here is the call graph for this function: