The future of encryption technology: from server password storage to user data encryption scheme

Reprinted with the authorization of the author (original link). Author: Roronoa Zoro

This article covers data encryption schemes for common scenarios and looks ahead to the future of encryption technology.

Facebook stored user passwords in clear text:

Hundreds of millions of Facebook users had their account passwords stored in plain text and searchable by thousands of Facebook employees — in some cases going back to 2012, KrebsOnSecurity has learned. Facebook says an ongoing investigation has so far found no indication that employees have abused access to this data.

In other words, Facebook stored the account passwords of hundreds of millions of users in clear text, in some cases as far back as 2012, and thousands of Facebook employees could search them.

Original text: Facebook Stored Hundreds of Millions of User Passwords in Plain Text for Years

CSDN leak of 6 million user accounts and passwords:

On the evening of December 21 (Beijing time), CSDN, the Chinese developer community, issued a statement publicly apologizing for the “leakage of 6 million user accounts and passwords”. It admitted that some user accounts were at risk, said it would temporarily shut down user login, and asked users who “registered before April 2009 and have not changed their passwords since September 2010” to change their passwords immediately.

Original text: CSDN explains the details of password disclosure of 6 million users: temporarily shut down login

Why passwords can't be stored in clear text

Many novice programmers store passwords like this:

username | phone | password
Xiao Ming | 18888888888 | asd123456
Daming | 17777777777 | 123abc!@#

Why is this unsafe?

First, if the data leaks, plaintext passwords expose users' private information directly: anyone can log in to the exposed accounts and change them at will. Second, even if nothing leaks, internal employees can easily read users' plaintext passwords. Once a company reaches a certain size, you cannot guarantee that no one inside it will search for users' passwords and violate their privacy. Storing passwords in plaintext is therefore never safe.

Even a password like ppnn13%dkstfeb.1st (a mnemonic for a line of classical poetry: "graceful and slender, just past thirteen, a cardamom bud in early February") offers no security at all if it is stored in plaintext.

No matter how complex the password is, it is no match for CSDN's plaintext storage.

From Zhihu user: Right Here

Digression: what is the most famous computer password in history?

password | meaning
FLZX3000cY4yhx9day | The torrent plunges three thousand feet, as if the Milky Way were falling from the ninth heaven
hanshansi.location()!∈[gusucity] | Hanshan Temple, outside the city of Gusu
hold?fish:palm | You can't have both the fish and the bear's paw
Tree_0f0=sprintf(“2_Bird_ff0/a”) | Two orioles sing in the green willows
csbt34.ydhl12s | Three or four patches of green moss on the pond, one or two oriole calls beneath the leaves
for_$n(@ RenSheng)_$n+=”die” | Since ancient times, who has not died
while(1)Ape1Cry&&Ape2Cry | The apes on both banks cry without end
doWhile(1){LeavesFly();YangtzeRiverFlows()}; | Boundless falling leaves rustle down, the endless Yangtze rolls on
dig?F*ckDang5 | Hoeing the grain under the midday sun

How to store and check passwords

Since passwords cannot be stored in clear text, how can they be stored safely? And how do we check that the password a user enters is correct?

Some information has to be stored for verification. Is there a mechanism that keeps only part of the information about a password yet can still be used to verify it? Then, even if the database leaks, an attacker cannot work backward from the stored information to the user's password, and the account stays protected.

A hash function solves this problem.

A hash function is one-way and irreversible: some information is discarded when data passes through it, just as in this algorithm:

Algorithm: when storing a user's name, discard the surname and then scramble the order of the remaining characters. Input: Zhao Ritian. Output: Tianri.

Even if we know the algorithm and the output Tianri, we cannot infer the name Zhao Ritian, because some information has been lost.

h = hash(p)

Here h is the value finally stored in the database and p is the user's original password. When the user logs in and enters a password p1, we compute h1 = hash(p1) and check whether h1 equals the h recorded in the database, which tells us whether the entered password is correct.
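The store-and-check flow just described can be sketched in a few lines. This is only an illustration: SHA-256 is an assumed choice of hash, the function names are hypothetical, and a bare hash like this is not sufficient on its own, for the reasons discussed in the following sections.

```python
import hashlib

def hash_password(p: str) -> str:
    """Return h = hash(p) as a hex string (SHA-256 is an assumed choice)."""
    return hashlib.sha256(p.encode("utf-8")).hexdigest()

def check_password(p1: str, h: str) -> bool:
    """Login: recompute h1 = hash(p1) and compare it with the stored h."""
    return hash_password(p1) == h

# Registration: only h is written to the (hypothetical) user table.
stored_h = hash_password("asd123456")

print(check_password("asd123456", stored_h))  # the correct password matches
print(check_password("asd12345", stored_h))   # a wrong password does not
```

The database never sees p after registration; it only needs h to answer "does this login match?".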

All hash functions share one property: they are deterministic, so if two h values differ, the inputs p must differ. The reverse does not hold, because inputs and outputs are not in one-to-one correspondence: different p values can produce the same h value (a collision).
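To make the collision property concrete, here is a small sketch that uses a deliberately tiny hash: the first byte of SHA-256, so only 256 outputs exist. The function names and the "pw0, pw1, ..." inputs are made up for illustration; real hash functions have the same property, only with an output space far too large to search this way.

```python
import hashlib

def toy_hash(p: str) -> int:
    """A deliberately tiny hash: the first byte of SHA-256 (256 possible outputs)."""
    return hashlib.sha256(p.encode("utf-8")).digest()[0]

def find_collision():
    """Try 'pw0', 'pw1', ... until two different inputs share a hash value."""
    seen = {}  # hash value -> first input that produced it
    i = 0
    while True:
        p = f"pw{i}"
        h = toy_hash(p)
        if h in seen:
            return seen[h], p
        seen[h] = p
        i += 1

a, b = find_collision()
print(a, b, toy_hash(a), toy_hash(b))  # two different inputs, one hash value
```

With only 256 possible outputs, a collision must appear within the first 257 inputs, so the loop always terminates.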

Is a hash function safe?

No. Because of the properties above, if two users both use 123456abc, the h values stored in the database are identical (and different passwords may occasionally produce the same h, a collision). An attacker can therefore compute the hash of every likely password by brute force and build a table that maps h values back to passwords. On finding a matching h in a leaked database, the attacker can infer that the password is 123456abc. This is called a rainbow table attack.
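The attack can be sketched as a plain precomputed lookup table. Note the simplification: a real rainbow table compresses this table with hash chains, which is not shown here. The user names, the candidate list, and the use of SHA-256 are all assumptions for illustration.

```python
import hashlib

def h(p: str) -> str:
    """Unsalted hash, as in h = hash(p)."""
    return hashlib.sha256(p.encode("utf-8")).hexdigest()

# A leaked table of unsalted hashes (hypothetical data).
leaked = {
    "xiaoming": h("123456abc"),
    "daming": h("123456abc"),
    "other": h("a-long-password-not-in-any-list"),
}

# The attacker hashes a list of common passwords once, in advance.
common_passwords = ["123456", "password", "123456abc", "qwerty"]
lookup = {h(p): p for p in common_passwords}

# Every leaked hash found in the table reveals its password immediately.
cracked = {user: lookup[hv] for user, hv in leaked.items() if hv in lookup}
print(cracked)
```

Because identical passwords give identical hashes, one table entry cracks every account that uses that password.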

For example, the commonly used hash function MD5 is no longer safe as computing power keeps increasing:

In 1996, MD5 was shown to have weaknesses and to be breakable. For data that needs high security, experts generally recommend other algorithms, such as SHA-2. In 2004, MD5 was shown to be unable to resist collision attacks, so it is not suitable for security authentication such as SSL public key authentication or digital signatures.

Safety reinforcement: adding salt

To counter this, a random salt value is generated and recorded for each user when the password is stored. The salt and the password p are hashed together to compute h:

h = hash(salt, p)

The database stores both the salt and h. Because every user has a different salt, a single precomputed table no longer works across users: to recover one user's password, the attacker has to build a rainbow table for that user's salt, which raises the cost of the attack.
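A minimal sketch of salted storage follows. It assumes SHA-256 over the salt concatenated with the password as one possible reading of hash(salt, p); the function names and the 16-byte salt length are illustrative choices, not something the scheme above prescribes.

```python
import hashlib
import secrets

def hash_with_salt(salt: bytes, p: str) -> str:
    """h = hash(salt, p), here SHA-256 over salt followed by the password."""
    return hashlib.sha256(salt + p.encode("utf-8")).hexdigest()

def register(p: str):
    """Generate a fresh random salt and return the (salt, h) pair to store."""
    salt = secrets.token_bytes(16)
    return salt, hash_with_salt(salt, p)

def verify(p1: str, salt: bytes, h: str) -> bool:
    """Recompute the hash with the stored salt and compare."""
    return hash_with_salt(salt, p1) == h

salt_a, h_a = register("123456abc")
salt_b, h_b = register("123456abc")
print(h_a == h_b)  # the same password no longer yields the same stored value
```

Two users with the same password now have different stored hashes, so a table built for one salt is useless against the other.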

Even so, salted SHA-2 is still not safe enough: computing power increases year by year and the cost of an attack keeps falling. A well-funded attacker can still build these rainbow tables and steal users' passwords.

Safety reinforcement: improving calculation strength

Suppose we could fix the time of each hash computation at, say, 1 second, no matter what machine or how fast a CPU is used. An attacker who wanted to compute a rainbow table of 10 million combinations would then need about 115 days (and the real password space is far larger than 10 million), so this method is very hard to crack.
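The figure of roughly 115 days follows from simple arithmetic, shown here as a quick check. The constant names are illustrative.

```python
SECONDS_PER_HASH = 1             # assumed fixed cost of one hash computation
CANDIDATES = 10_000_000          # number of password guesses to hash
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds

days = CANDIDATES * SECONDS_PER_HASH / SECONDS_PER_DAY
print(f"{days:.1f} days")  # prints "115.7 days"
```

Doubling the cost per hash doubles the attacker's time, while a legitimate login still pays that cost only once.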

bcrypt is a password hashing function designed by Niels Provos and David Mazières on the basis of the Blowfish cipher and presented at USENIX in 1999. Its implementation incorporates a salt to defend against rainbow table attacks. bcrypt is also an adaptive function: its iteration count can be increased to keep pace with growing computing power.

A file encryption utility that is also called bcrypt should not be confused with the hash function. In addition to encrypting data, that utility by default overwrites the original input file three times with random data before deleting it, to frustrate attempts by anyone with access to your computer to recover the data. This behavior can be disabled.

bcrypt is not the only algorithm whose computing cost can be adjusted to resist ever-faster CPUs. The scrypt algorithm additionally consumes memory, occupying a configurable amount for each computation. bcrypt, however, is widely used because of its mature implementations; Spring Security, for example, provides it for password hashing.

For example, after bcrypt hashing, a password looks like this:

$2a$07$woshiyigesaltzhi$$$$$.lrU488y7E1Xw.JA4uizIu.PBSSe7t4y

Here 2a is the version of the bcrypt algorithm and 07 is the cost factor: the number of iterations is 2^7, and the higher the value, the longer each computation takes. The characters that follow, starting with woshiyigesaltzhi, are the salt used for hashing, and the remainder is the hash itself. The database can store this field directly, for example:

name | phone | pwd_hash
Xiao Ming | 1234 | $2a$07$woshiyigesaltzhi$$$$$.lrU488y7E1Xw.JA4uizIu.PBSSe7t4y
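The parts of such a stored field can be pulled apart by position. The sketch below assumes the standard bcrypt layout, in which a 22-character salt follows the cost and the hash takes up the rest; the function name and the returned dictionary keys are hypothetical.

```python
stored = "$2a$07$woshiyigesaltzhi$$$$$.lrU488y7E1Xw.JA4uizIu.PBSSe7t4y"

def parse_bcrypt_field(field: str) -> dict:
    """Split a bcrypt-style field by fixed offsets.

    The salt in this example contains '$' characters, so splitting the
    whole string on '$' would not work.
    """
    version = field[1:3]      # "2a"
    cost = int(field[4:6])    # "07" -> 7
    salt = field[7:29]        # assumed: 22 characters of salt
    digest = field[29:]       # the remaining characters are the hash
    return {
        "version": version,
        "cost": cost,
        "iterations": 2 ** cost,
        "salt": salt,
        "digest": digest,
    }

info = parse_bcrypt_field(stored)
print(info["version"], info["cost"], info["iterations"])
```

Because version, cost, and salt travel with the hash, the verifier needs no extra columns to recompute it.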

This is the password storage method recommended in this article: hash + salt + computing cost. It protects user passwords well, and because login is not a frequent operation, it does not matter if the user waits up to a second each time.
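The "hash + salt + computing cost" recipe can be sketched with Python's standard library. Since bcrypt is not part of the standard library, this example swaps in PBKDF2-HMAC-SHA256, a different algorithm with the same three ingredients. The record format, function names, and the iteration count are assumptions for illustration only.

```python
import hashlib
import hmac
import secrets

def make_record(password: str, iterations: int = 100_000) -> str:
    """Return one storable field: algorithm$iterations$salt$hash."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"

def check_record(password: str, record: str) -> bool:
    """Recompute the hash with the stored salt and iteration count."""
    _name, iterations, salt_hex, digest_hex = record.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(digest.hex(), digest_hex)

record = make_record("asd123456")
print(check_record("asd123456", record), check_record("wrong", record))
```

As with the bcrypt field, the cost parameter is stored next to the hash, so it can be raised for new passwords without breaking old records.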

User data encryption with the user's password: double hash

Passwords can be hashed because they never need to be recovered, but some user data must be recoverable and therefore has to be encrypted rather than hashed. What should we do then? For example, a user's online documents may be encrypted and decrypted with a password the user defines.

Symmetric encryption algorithms such as AES-256 come to mind first:

e = AES256(salt, text)

Here the user's password is used as the salt value (that is, the key) to encrypt and decrypt the document, which means the server has to store the user's password.

This brings back the earlier problem: storing passwords in plaintext is not safe. A double hash can provide a reasonable level of security here.

Implementation plan

User passwords are still stored with the scheme described above, for example with bcrypt; call the stored value h1. When the user asks to encrypt data, the user supplies the password. We check the password against h1 and then compute a second hash of the password, called h2. h2 is computed differently from h1, but it can be a simple hash. We then use h2 to encrypt and decrypt the user's documents:

e = AES256(h2, text)

Remember that h2 must never be stored (never saved in the database). It is used only to encrypt and decrypt data and is discarded after use. The hash function for h2 can be kept private for additional protection.

Why is h2 needed? First, user passwords vary in length, while symmetric encryption algorithms such as AES need a fixed-length key. Second, hashing adds another layer of protection: if the database leaks, an attacker who knows the user's password but not the private hash function still cannot decrypt the data.
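The double-hash flow can be sketched as follows. Several things here are assumptions rather than the scheme as specified: PBKDF2 stands in for both hashes, a different label distinguishes h2 from h1 (in place of a private hash function), and the AES step itself is left out because Python's standard library has no AES. All names are hypothetical.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # assumed work factor

def derive(password: str, salt: bytes, label: bytes) -> bytes:
    """Derive 32 bytes from the password; the label separates h1 from h2."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), label + salt, ITERATIONS)

# Stored per user: the salt and h1. h2 is never stored.
salt = secrets.token_bytes(16)
h1 = derive("user-password", salt, b"login")

def key_for_documents(password: str):
    """Check the password against h1; on success return h2, else None."""
    if not hmac.compare_digest(derive(password, salt, b"login"), h1):
        return None
    # h2 is 32 bytes, the key length AES-256 expects. It would be passed to
    # e = AES256(h2, text) and then discarded.
    return derive(password, salt, b"document-key")

h2 = key_for_documents("user-password")
print(len(h2), key_for_documents("wrong-password"))
```

Because h2 is recomputed from the password on every request, a database leak exposes h1 and the ciphertext but not the key.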

Encryption of other user data

The scheme above is controlled by the user's password, and its security is very high: only the user knows the password, and the data cannot be recovered if the password is lost. But it does not suit many scenarios. Routine user data such as mobile phone numbers, social accounts, addresses, and names is accessed frequently, and password-based encryption is too inefficient for it. How can such data be protected?

To answer this, we need to know how user data is transmitted.

Data that the user produces in client software (such as a browser) is encrypted in transit over HTTPS to the back-end server, processed by server software (such as a Java application), and then written through the database interface by database software (such as MySQL) to a storage device (such as a hard disk).

Data can be encrypted at four stages, and the closer the encryption is to the user, the more secure it is. Leaving the client side aside for now, the three stages on the service side are:

  • Server software encryption: data is encrypted (in memory) as soon as it reaches the server and is then stored; for example, a Java application performs AES-256 encryption
  • Database software encryption: encryption is done by calling the database API, such as MySQL's AES encryption functions
  • Storage-side disk encryption: hardware encryption is used for encrypted storage, such as the cloud disk encryption offered by cloud service providers

Encryption mode | Prevents internal leakage | Prevents database leakage | Prevents loss or theft of the physical machine
Server software encryption | √ (most scenarios)
Database software encryption
Storage-side disk encryption

The first two methods can ensure that even a database administrator cannot view users' data. The last is usually of little practical value, but it is still needed where the laws of some countries and regions require it or where users demand disk encryption. Database software encryption still carries a risk of internal leakage: with the MySQL binlog, for example, even if you use AES-256, the key is written to the binlog during data synchronization, which is one path for a leak.

If the user data never needs to be searched (only ordinary reads and writes), server-side or database encryption is enough to protect it. If the data must support search, the problem becomes very hard.

Searchable encryption technology

The paper Practical Techniques for Searches on Encrypted Data, published in 2000, opened a new research direction, searchable encryption, and proposed SWP, the first practical searchable encryption scheme. The idea is to encrypt each word and embed a hash value into the ciphertext in a special format. To answer a query, the server extracts and transforms this value to check whether the ciphertext contains the matching special format, which confirms whether it matches the search.
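To give a feel for searching without revealing plaintext words, here is a much simpler technique than SWP: a keyed-token index, in which the client replaces each word with an HMAC token and the server matches tokens only. It is not the SWP construction, it reveals which documents share a word, and its whitespace tokenization is naive. The key, document IDs, and function names are hypothetical.

```python
import hashlib
import hmac

SEARCH_KEY = b"client-side secret key"  # hypothetical; never sent to the server

def token(word: str) -> str:
    """Client side: turn a word into an opaque search token."""
    return hmac.new(SEARCH_KEY, word.lower().encode("utf-8"), hashlib.sha256).hexdigest()

def index_document(doc_id: str, text: str, index: dict) -> None:
    """Client computes tokens; the server stores only token -> document IDs."""
    for word in text.split():
        index.setdefault(token(word), set()).add(doc_id)

def search(word_token: str, index: dict) -> set:
    """Server side: match the token without learning the word."""
    return index.get(word_token, set())

server_index = {}
index_document("doc1", "quarterly financial report", server_index)
index_document("doc2", "holiday photo album", server_index)

hits = search(token("financial"), server_index)
print(hits)  # the document IDs whose text contained the word
```

Even this toy version depends on splitting text into words up front, which hints at why word segmentation becomes a central difficulty.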

The idea is appealing, but putting it into practice runs into many difficulties. For example, it requires fixed-size words, yet the most important part of a search system is its word segmentation engine: how words are segmented across multiple languages directly determines search quality. Many search systems therefore still use the following structure:

MySQL data is synchronized one way to Elasticsearch, which provides the search functions. More than 20 years later, searchable encryption is still not usable in practice. One can even say that if a software provider offers search, the data is stored unencrypted at the software layer (disk encryption is invisible to the software layer), and that encrypted data cannot support search.

The premise of a successful conspiracy

If I were a bad actor planning a conspiracy, the essential premise would be that the operation is simple enough and that few enough people know about it. When the complexity is too high or too many people have to be involved, the conspiracy cannot be carried out.

Knowing this, we can easily judge that the conspiracy theory that “the United States' moon landing was faked” is wrong: the moon landing project involved a great many people and was highly complex, so such a conspiracy could not have been carried out.

This conclusion is the basis for what follows.

What do software vendors mean when they claim to be secure?

As discussed above, searchable encryption cannot yet be put into practice. Yet some software vendors (including some large ones) provide search and still claim that they encrypt user data and are very secure. What are they talking about?

First, they may mean disk encryption rather than software-layer encryption, which defends only against theft of the disk, not against data leakage or internal risk. Second, they may mean encryption in transit, such as HTTPS or a private encrypted communication protocol. Third, they may have sound internal management processes that control internal and external risk.

Applying the premise about conspiracies above: as long as we add processes, make them open and transparent, and raise the cost of sabotage, we can protect user data even where the technology falls short. For example:

  • Database operation log review: other people review whether the DBA's database operations are compliant, for example whether a user's data has been viewed in secret
  • Multiple passwords: the password is split among several people, and the data can be decrypted only when all of them enter their parts together, which raises the complexity of the operation, as in a multi-person audit
  • Open and transparent process: confidential operations are recorded, for example that XX needed to decrypt data for development testing
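The "multiple passwords" item can be sketched as XOR secret splitting, in which every holder's share is required to rebuild the key. This is one simple way to realize the idea, not necessarily what any particular vendor does; the function names and the choice of three holders are illustrative.

```python
import secrets
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def split_secret(secret: bytes, holders: int) -> list:
    """Give each holder one share; all shares are needed to rebuild the secret."""
    shares = [secrets.token_bytes(len(secret)) for _ in range(holders - 1)]
    # The last share is the secret XORed with every random share.
    last = reduce(xor_bytes, shares, secret)
    return shares + [last]

def combine(shares: list) -> bytes:
    """XOR all shares together to recover the secret."""
    return reduce(xor_bytes, shares)

key = secrets.token_bytes(32)    # e.g. a data decryption key
shares = split_secret(key, 3)    # handed to three different people
print(combine(shares) == key)    # all three together recover the key
```

Any subset smaller than the full group holds only random-looking bytes, so no single person can decrypt the data alone.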

With sound processes in place, users can also feel confident storing private data.

The future of encryption technology: no encryption

Client-side encryption, not covered above, belongs here. The closer the encryption point is to the user, the more secure the data. If the data is already encrypted when the user sends it (and not merely with transport encryption such as HTTPS), and the user controls the password, the service provider needs no further encryption. That is what the title “no encryption” refers to: by handing ownership of the data to the user, the service provider no longer has to spend a great deal of effort and money on data security.

One of the technologies that client-side encryption needs is fully homomorphic encryption. In 2009, Dr. Craig Gentry of IBM published the paper Fully Homomorphic Encryption Using Ideal Lattices, which proposed a feasible construction and solved a major open problem in cryptography. Homomorphic encryption can be understood simply as follows:

f(data) = DE( f( E(data) ))

Here f is an arbitrary operation, E is the encryption function, and DE is the decryption function. In other words, performing any computation on the ciphertext and then decrypting gives the same result as performing the same computation on the plaintext.
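The identity can be demonstrated with a much weaker relative of fully homomorphic encryption: textbook RSA, which is homomorphic for multiplication only. The sketch below uses tiny, insecure parameters purely for illustration; it is not Gentry's scheme, and the names E, DE, f_plain, and f_cipher simply mirror the formula.

```python
# Toy RSA parameters (tiny primes, for illustration only, not secure).
p, q = 61, 53
n = p * q           # 3233
e, d = 17, 2753     # e * d is 1 modulo (p - 1) * (q - 1)

def E(m: int) -> int:
    """Encrypt with the public exponent."""
    return pow(m, e, n)

def DE(c: int) -> int:
    """Decrypt with the private exponent."""
    return pow(c, d, n)

def f_plain(a: int, b: int) -> int:
    """The operation on plaintext: multiplication."""
    return a * b

def f_cipher(ca: int, cb: int) -> int:
    """The matching operation on ciphertext: multiplication modulo n."""
    return (ca * cb) % n

a, b = 7, 6
result = DE(f_cipher(E(a), E(b)))
print(result, f_plain(a, b))  # both are 42
```

The party that multiplies the ciphertexts never sees 7, 6, or 42. A fully homomorphic scheme extends this from one operation to arbitrary computations.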

That's awesome! For example, suppose I hold some financial data that needs statistical analysis by a third-party organization, but I don't want to hand the data over directly. If they tell me they support fully homomorphic encryption, I can encrypt the data before giving it to them. They return the encrypted results of the statistical computation, and I decrypt those results. The real data is visible only to me.

These properties of fully homomorphic encryption solve the problems of data security and trust very well. Craig Gentry gave one construction, and many cryptographers have since proposed others. For now, though, the technology is not mature: a key may have to be around 100 MB, for example, which is impractical in today's network environment.

There is no absolute security, only relative security. Hopefully, homomorphic encryption will achieve a breakthrough in commercial use.

This work is licensed under a CC agreement. Reprints must credit the author and link to this article.
