RSA encryption is a commonly used encryption algorithm (I will not write it myself). This article attempts to use the most basic way to explain the calculation steps, theoretical basis and proof of each step in RSA; even if you have forgotten all the cryptography you have learned before Knowledge, you can fully understand RSA by following the ideas of this article, and you don’t have to memorize things you don’t fully understand. The entire article will be a bit long. If you encounter a formula or theorem that you already understand, you can skip it directly.
RSA encryption process
First, we will introduce the RSA encryption process. The content marked with [] will be explained later:
- Find two large prime numbers: p, q; calculate the product n; it can be guaranteed that no one can pass n and separate p and q.
- Calculate another number fn = (p – 1) * (q – 1); this fn means: between 1 and n, the number of numbers that are relatively prime to n [1] (This comment explains why fn can represent This number), because n cannot be split, so the attacker cannot calculate fn.
- Then choose a prime number e = 65537 as the public key (you can also choose something else), e is a prime number.
- Find the private key d, which satisfies e * d mod f == 1 [2] (this note describes how to find d).
The above is the preparation work; the person who prepares these data and saves the private key is called the receiver, and the person who obtains the public key through plain text and encrypts it using the public key is called the sender.
After the sender obtains the public key [e, n], it encrypts it in the following way:
- Now assume that there is a number m that needs to be encrypted, ensuring that m<n; (If it is relatively long, it can be added in sections, and the receiver can solve it in sections, always ensuring that m<n);
- The sender calculates the ciphertext c = m ^ e mod n and sends it to the receiver.
- After the receiver obtains the ciphertext c and its own private key [d,n], it decrypts it in the following way:
- The receiver calculates m’ = c ^ d mod n.
- m’ == m must be established and decryption is completed [3] (This comment explains why m’ == m is established, which is the theoretical basis of RSA)
【1】Why fn = (p-1) * (q-1) represents the number of numbers relatively prime to n in [1, n]
- Because p is a prime number, in [1, p], all numbers except p are relatively prime to p, so fp = (p – 1)
- Because q is a prime number, in [1, q], all numbers other than q are relatively prime with q, so fq = (q – 1)
- Because n = p * q, n is not a prime number, so the above rule cannot be applied: However, the number of numbers that are relatively prime to n can be calculated through the elimination method.
- In [1, n], there are total numbers p * q – 1 except n; the numbers that are not relatively prime with n include: A. All numbers smaller than p, the product of q, total p – 1; B .The product of all numbers smaller than q and p, totaling q-1; therefore, there are the number of numbers in [1,n] that are relatively prime to n: fn = p * q – 1 – (p – 1) – ( q – 1) = p * q – p – q + 1 = (p – 1) * (q – 1).
- To sum up: when p and q are prime numbers and n = p * q, fn = (p – 1) * (q – 1) can represent the number of numbers between [1, n] that are relatively prime to n.
【2】When the prime number e and the number fn are known, how to find d so that e * d mod fn = 1
In order to illustrate this process more clearly, here is a simple example:
Assume: p = 3; q = 13; then n = 39; fn = 2 * 12 = 24; now, I can specify any prime number as the public key e. Suppose I specify e = 11 and need to find a number d, Satisfies e * d mod fn = 1.
According to the most intuitive operation method, violently traverse d = 1, 2, 3, 4… until you find a d that satisfies d * e mod fn = 1:
11 mod 24 = 11
22 mod 24 = 22
33 mod 24 = 9
44 mod 24 = 20
55 mod 24 = 7
66 mod 24 = 18
77 mod 24 = 5
88 mod 24 = 16
99 mod 24 = 3
110 mod 24 = 14
121 mod 24 = 1
Ending; in fact, the process is looking for:
x * 11 – y * 24 = 1;
So the question becomes: If I want to get 1, how many 11 (public keys) do I need to use? This “how many” is the private key. Because it is private key * public key mod n = 1.
Then, we demonstrate the process of solving the greatest common divisor of 11 and 24. What we need is not the greatest common divisor, because the greatest common divisor you require must be 1. We only need the e used in the process of solving the greatest common divisor. number.
Note: In fact, because RSA only stipulates that the private keys p and q are both large prime numbers, what if (p – 1) * (q – 1) happens to be a multiple of e? In this case, the greatest common divisor is e instead of 1, so if you encounter this situation, you need to change e: you only need to calculate once whether (p – 1)*(q – 1)/e is an integer That’s it: if it is not an integer, you can use it. If it is an integer, re-select e. As long as you ensure that fn = (p – 1) * (q – 1) is not an integer multiple of e, then fn and e must be relatively prime. , when fn and e are relatively prime, the greatest common divisor must be 1, which needs to be ensured when selecting e.
Then we list the greatest common divisor solution process (euclidean subtraction):
24 – 11 = 13
13 – 11 = 2
11 – 2 = 9
9 – 2 = 7
7 – 2 = 5
5 – 2 = 3
3 – 2 = 1
This process is complete, so what value do we want? The number of 11 used in each step, mark it at the end:
24 – 11 = 13 How many 11s are used? 1, because 24 does not use 11, and 11 uses 1 11, so there is 1 in total.
13 – 11 = 2 How many 11s are used? 2, because according to the previous step: 13 uses 1 11; this step 11 uses 1 11, so there are 2 in total
11 – 2 = 9 How many 11s are used? 3, because according to the previous step: 2 uses 2 11s; this step 11 uses 1 11, so there are 3 in total
9 – 2 = 7 How many 11s are used? 5, because according to the previous step: 9 uses 3 11s; the previous step 2 uses 2 11s, so there are 5 in total
7 – 2 = 5 How many 11 are used? 7, because according to the previous step: 5 are used for 7; 2 are used for 2 in the second step, a total of 7
5 – 2 = 3 How many 11s are used? 9, because in the previous step: 5 used 7; in the second step 2 used 2, a total of 9
3 – 2 = 1 How many 11’s are used? 11 pieces, because the previous step: 3 used 9 pieces; the second step 2 used 2 pieces; a total of 11 pieces
At the end, when we get 1, we use 11 11s. The first 11 is d, and the last 11 is e. (Emmm, this example is not very good, e and d are equal, but I have already written this, so I won’t find other examples again. You can calculate and verify by yourself. Normally, e and d are not equal. Same, exactly equal here)
Then why are the number of 11 calculated by the two methods consistent. If you already understand, you can skip the following example. If you still don’t understand, you can read this example:
【2.1】Example of going up and down stairs
According to the above steps of solving the greatest common divisor, is it very similar to a dynamic programming: a list records the results of each step, and the results of the next step depend on the previous results; when it comes to dynamic programming, you have to go up and down the steps. If I put the question Change it to: A little person goes up and down the stairs. Each time he can only choose one of: A. Walk up 11 steps or B. Walk down 24 steps. The starting position is 0. Question: How many times should you walk A at least? Only then can you reach level 1.
The first method: As long as it does not exceed 24, keep going up 11. If it exceeds 24, go down 24 until you reach 1, and record the number of times you go up 11.
The second method is smarter:
- As long as it doesn’t exceed 24, I keep walking up to 11. If it exceeds 24, I will go up to 11: 3 times, the next time, to 9; I will record it: it takes 3 11 to go up 9 steps; then next time when I am about to reach 24, Is it possible to give priority to 9 and directly use the result of step 9 three times?
- Starting from 9, go up 11 to 20, not exceeding; then go up 11, to 31, exceeded. At this time, I can see if I don’t use 11, but use the smallest one to satisfy the number of steps exceeding 24: 9, 20 + 9 = 29, which is also exceeded; then I will use 9 directly; so adding 1 to the original 3 and adding 3 to get 7; finally reached 5; record: I walked up 5 steps and used 7 11, next time Can use.
- Starting from 5, it goes up to 11, to 16, which is not exceeded; then it goes up to 11, and it is exceeded. At this time, can I choose a smaller value to replace 11:5? No, because 16 + 5 = 21 < 24; Is 9 OK? OK, because 16 + 9 = 25 > 24; so choose 9, which means adding 1 11 to the original 7 11s, and then adding 3 11s: 7 + 1 +3 = 11, and then you can see that the current 25 – 24 = 1, indicating that the result requirements are met, so a total of 11 11s are needed to reach level 1.
The third method is a little smarter:
Record two numbers, the minimum upward step m, the minimum downward step n, the initial value m = 11, n = 24; record the other two values stepM, stepN, indicating the current minimum step up/down number, the number of 11 is used, then the initial values are: mNeed11 = 1, nNeed11 = 0. Start looping:
- Let me first see which one is bigger, upward or downward? Big downward: Then I can subtract as many 11 as possible from 24 to get a downward order closest to 1: is it 24 – 2 * 11 = 2; then replace the previous minimum order 24 with this 2 : m = 11, n = 2, update nNeed11 = 2, because two 11s are subtracted.
- Let me see which one is bigger, m or n? Upwards: Then you can subtract as many 2s as possible from 11 to get an upward order closest to 1: is it 11 – 5 * 2 = 1; then use this 1 to replace the previous minimum upward order 11: m = 1, n = 2, mNeed11 = mNeed11 + 5 * nNeed11 = 11, because you used one upward m to subtract as many 5 n as possible, so the previous mNeed11 was added later.
At this point m is already equal to 1, so exit the loop.
Then take a look at these three methods: method one is actually a method of adding 11 upwards one by one, and subtracting 24 from 24; method three is actually a method of finding the greatest common divisor by euclidean division. In the process of solving the greatest common divisor of e and fn, Record the number of e used.
This example is over. Summary: Solving for e * d mod fn = 1 and solving for the greatest common divisor by euclidean division are related because; the former requires an intermediate product of the latter, not the greatest common divisor itself.
【3】m’ == m must be established
The interval is quite far, let’s re-describe the problem:
m is plain text, any positive integer; p and q are prime numbers; n = p * q; fn = (p – 1) * (q – 1);
e is any prime number, and d satisfies e * d mod fn = 1;
c is ciphertext, c = m ^ e mod n
m’ is the decrypted text, m’ = c ^ d mod n
Proof: m’ == m
To clear some obstacles before telling the final proof, let’s first prove three small corollaries:
【3.1】Module distribution rate: “The module of the product is equal to the module of the product”
That is: a * b mod c = (a mod c) * (b mod c) mod c (I made up the “modular allocation rate”, I don’t know how to call it, this is easy to read and easy to remember)
Proof: Split a and b into the form of the sum of multiples and remainders of c:
Let: a = k * c + x; b = s * c + y; k and s are multiples, x and y are remainders, x and y must be less than or equal to c.
Then there is: a * b mod c = (k * c + x) * (s * c + y) mod c = (k * m* c^2 + k * c * y + m * c * x + x * y) mod c
The first three terms all contain the c factor, so they will be modulated out, so the final remainder is:
a * b mod c = x * y mod c = (a mod c) * (b mod c) mod c
Get certified.
【3.2】If a and c are mutually prime and b and c are mutually prime, then a * b and c are mutually prime
Proof by contradiction: If a * b and c are not mutually prime, then there is a factor m greater than 1, which is in a * b and c at the same time; for a * b, if you want to have a factor m, then the factor m is either in a, or In b, or in a, b; in either case, it is mutually prime with a, c and contradicts a, b, so there is no m; so a * b is coprime with c.
Get certified.
【3.3】If a and b are relatively prime, then a mod b and b are relatively prime
Decompose a into multiples and remainders in the form a = k * b + x; then x = a mod b; because k * b must not be co-prime with b, and if x and b are not co-prime, then k * b + x and b must not be mutually prime, and contradict a and b, so x and b must be mutually prime.
Get certified.
【3.4】Start proving m’ == m, first prove this problem: for two numbers that are relatively prime: a and b; between [1, a], for all numbers x that are relatively prime with a, the calculated y = b * x mod a are not equal.
That is: all y and x are mapped one-to-one, and there is no situation where two different xi and xj can calculate the same y.
Proof by contradiction: I randomly select two numbers, xi and xj, that are relatively prime with a from [1,a]; if there is: b * xi mod a == b * xj mod a; then: (b * xj – b * xi) must be an integer multiple of a, which is equivalent to subtracting the same remainders, and the rest are multiples of a. This is easy to understand; it means: b * (xj – xi) is an integer of a times.
Because |xj – xi| must be smaller than a, so if you want to extract the factor a from b * (xj – xi), you must extract a factor greater than 1 from b and multiply it by (xj – xi), Only then is it possible to make up a factor a; assuming that a factor m greater than 1 is extracted from b, and a is made up; then the factor m must also be a factor of a; and there are factors m greater than 1 for a and b, It contradicts a and b, so it does not exist; therefore, b * (xj – xi) cannot be an integer multiple of a. That is to say, any pair: b * xi mod a and b * xj mod a are not equal.
Get certified.
[3.5] For all y = b * x mod a in [3.4], they are relatively prime with a.
Obviously, according to [3.2]: because b and a are mutually prime and x and a are mutually prime, so b * x and a are mutually prime; and according to [3.3] because b * x and a are mutually prime, so b * x mod a and a is relatively prime.
Get certified.
[3.6] Combine [3.4] and [3.5] to make inference:
Because all y in [3.4] are smaller than a; and all y correspond to x one-to-one without repetition, so the quantities are equal; and all y are relatively prime with a. If I put all x into the set X and all y into the set Y, then X and Y must be equivalent, indicating: all positive integers between [1, a] that are relatively prime to a.
Next: Because set X and set Y are equivalent, the product of all elements in the two sets must be equal:
x1 * x2 * x3 … x(f(a)) == y1 * y2 * y3 … y(f(a))
Sure, the elements inside are all the same, but the order of multiplication may not be equal, so none of them can be missed.
Then replace both the y side with the form: y = b * x mod a
x1 * x2 * … == (b * x1 mod a) * (b * x2 mod a) …
Mods on both sides are still equal:
x1 * x2 * … mod a == (b * x1 mod a) * (b * x2 mod a) … mod a
According to [3.1] “The module of the product is equal to the module of the product”:
x1 * x2 * … mod a = (b * x1 mod a) * (b * x2 mod a) … mod a = (b * x1) * (b * x2) … mod a = b ^ f(a) * (x1*x2…) mod a
f(a) is the number of elements in the set X, that is, the number of all numbers in [1,a] that are relatively prime to a.
The condition that the mods on both sides are equal is equivalent to subtracting the two sides and leaving k a’s, k is arbitrary;
So there are right – left:
(b ^ fa – 1) * (x1 * x2 * …) = k * a
Because x1,
Therefore: b ^ fa mod a = 1 is established. 【Theorem A】
Based on this inference, we apply it to RSA. The above formula a is n in RSA, and b is m in RSA;
So:
c = m ^ e mod n; e * d mod fn = 1;
There are: m’ = m ^ (e * d) mod n = m ^ (k * fn + 1) mod n = m * (m^fn * m^ fn … *m^fn) mod n
According to [3.1] “The module of the product is equal to the module of the product”:
m’ ={ m mod n * [ (m^fn mod n) *…] } mod n
According to the previous [Theorem A], m ^ fn mod n = 1, so the brackets in the above formula are a bunch of 1 multiplied, so:
m’ = {m mod n * (1 * 1 * 1 …)} mod n = m mod n
Because RSA always guarantees m < n when intercepting ciphertext m, so m mod n = m, that is, m’ = m.
Proof: decrypted text m’ == original text m; RSA encryption and decryption is established.
The above [Theorem A] is: Euler’s theorem, the core of which is to prove that the set All the b’s in y were proposed separately to form the form b^fa. Finally, mathematical transformation verified that: b^(fa – 1) mod a = 1 is established.
【3.7】The case where m and n are not mutually prime
In the above [3.6], we simply applied [Theorem A]. The requirements for a and b in [Theorem A] are: a and b are mutually prime. However, in RSA, our requirement for the original text m is only : m < n, mutual prime is not guaranteed; in actual situations: m may appear any number smaller than n; n = p * q; n must not be a prime number, so there is no guarantee that m and n are mutually prime; but in this case In this case, RSA still holds, and the proof is given below:
Consider the few cases where m and n are not mutually prime: Since m < n; n = p * q; p and q are both prime numbers; then there are only the following cases where m and n are not mutually prime:
- m = k * p; k < q;
- m = k * q; k < p;
Since p and q are equivalent, we only need to prove that RSA still holds in one case and is completely symmetric in the other case.
Let’s discuss the case m = k * p; k < q;:
Goal: Calculate c = (k * p) ^ e mod n, verify: m’ = c ^ d mod n is equal to m.
Since p and q are prime numbers and k and q are mutually prime, according to [3.2] k * p and q are mutually prime; therefore, k * p and q satisfy Euler’s theorem:
(k * p) ^ fq mod q = 1
Then: [ (k * p) ^ fq ] * [(k * p) ^ fq] mod q = ?
Obviously it is still 1. According to [3.1] “the module of the product is equal to the module of the product”:
Original formula = { [ (k * p) ^ fq mod q] * [ (k * p) ^ fq mod q] } mod q = (1 * 1) mod q
Does that mean that the multiplication mod q of any (k * p) ^ fq is 1, so the multiplication of fp is also 1, so there is:
(k * p) ^(fp * fq) mod q = 1
fp = p – 1; fq = q -1; fn = (p – 1) * (q – 1); so:
(k * p) ^ fn mod q = 1; means:
(k * p) ^ fn = s * q + 1; s is arbitrary, I don’t know how many.
Multiply both sides by k * p at the same time:
(k * p) ^ (fn + 1) = s * q * k * p + k * p
Mod n on both sides:
(k * p) ^ (fn + 1) mod n = k * p
According to RSA, e * d mod fn = 1; therefore: e * d = s * fn + 1; s is arbitrary;
Combining the above formula: I now need to base it on:
(k * p) ^ (fn + 1) mod n = k * p
roll out:
(k * p) ^ (s * fn + 1) mod n = k * p
What if I say I can based on:
(k * p) ^ (fn + 1) mod n = k * p
roll out:
(k * p) ^ (2 * fn + 1) mod n = k * p;
Does it mean that if I replace 2 with s; s >= 1; it is true:
Then dismantle:
(k * p) ^ (2 * fn + 1) mod n = [ (k * p) ^ fn * (k * p)^(fn + 1)] mod n = [ (k * p) ^ fn mod n ] * {[(k * p)^(fn + 1)] mod n} mod n
Is the latter term of fn + 1 equal to k * p:
Original formula = [ (k * p) ^ fn mod n ] * (k * p) mod n
Because k * p must be smaller than n, so k * p = k * p mod n
Original formula = [ (k * p) ^ fn mod n ] * (k * p mod n) mod n
Put it together again:
Original formula = [ (k * p) ^ fn * (k * p) mod n ] mod n = (k * p) ^ (fn + 1) mod n = k * p
No matter how many (k * p) ^ fn are multiplied before, I can eat it like this, so:
m’ = (k * p) ^ (s * fn + 1) mod n = (k * p) ^ (fn + 1) mod n = k * p = m
Therefore, RSA still holds when m and n are not mutually prime.
【4】Summary
Let’s summarize by re-narrating the RSA encryption and decryption process:
- By looking for two large prime numbers p and q, it is guaranteed that fn = fp * fq; fp and fq represent the number of numbers between [0, p/q] that are relatively prime to p/q. Since p and q are prime numbers, it is guaranteed : fp = (p – 1); fq = (q – 1); fn = (p – 1) * (q – 1) is established (f function is called Euler function, this function has some special properties, this article [1] What is proved is one of them: the properties when p and q are prime numbers, f(p * q) = fp * fq).
- Take the prime number e as the public key, and use euclidean division to solve for the private key d based on e and n. [2] Stage proof, and examples of up and down steps are listed to facilitate understanding of why euclidean division is used. Its essence is not to solve the greatest common divisor.
- Encrypt the original text m, c = m ^ e mod n
- To decrypt the ciphertext c, m’ = c ^ d mod n = m; including two situations: 1. m and n are mutually prime, proven by applying Euler’s theorem; 2. m and n are not mutually prime, [3.7] The proof is given at the stage. In both cases: m’ = m is established, so it is only necessary to ensure that m < n.
Ending, the above is the entire proof process of the theoretical basis that RSA relies on; all the arguments in this article start from the most basic theory to ensure that they can still understand without understanding the Euler function, Euler’s theorem, and the purpose of euclidean division. The reason why RSA encryption and decryption is established. At the same time, it also fills in the problems that arise during actual use: e and n are not co-prime, and m and n are not co-prime. It is a good article to fully understand RSA, hahaha; of course, for some people who have already compared For engineers who know RSA, some of the discussions in this article really seem protracted.