Entropy-Based Image Encryption Using Orthogonal Variable Spreading Factor (OVSF)

The purpose of image encryption is to provide data privacy and security. Privacy ensures that only authorized personnel can access the original content, while security implies that there is no evident relationship between the encrypted and the original content, and that the key space is large enough and its keys are equally likely. In the current state of the field, several image encryption techniques offer very high privacy (in terms of entropy) but weak security (i.e., a small key space). Recently, a new encoding-based method was proposed that provides a large key space (namely $8{,}57 \times 10^{506}$) with a moderate entropy value (87%). Our proposal preserves the strength of the image encryption methods based on encoding, but with a higher value placed on security than the preliminary works. Every pixel of an image is mapped into an orthogonal code of 256 bits. The 8-OVSF codes are selected to encode the image because their inter-symbol entropy is near the possible maximum. Numerous test results verify that our ciphered data have a very high entropy value (98,5%) with a large key space of equally likely keys ($8{,}57 \times 10^{506}$), thus providing an adequate balance between privacy and security.


Introduction
Nowadays, the number of images that are published on social networks, chats, public web pages, and other digital mediums is rapidly increasing. In some cases, these images contain confidential content, which is why their originator wants only authorized personnel to have access rights (privacy) through robust key (security) encryption schemes. Accordingly, the security and privacy standards of multimedia content have aroused the interest of the scientific community in the field of information technologies. Therefore, the quantity and quality of state-of-the-art digital privacy and security methods have increased. One way to provide privacy and security to digital image content is through encryption, by applying the properties of diffusion and confusion (Shannon, 1949). Diffusion implies that, if the image is slightly changed, the encrypted data must change significantly; regarding confusion, the direct relationship between the key and the encrypted data must be null. Typically, pixel permutation guarantees confusion, whereas pixel substitution is used for the diffusion property. The ideal result is a ciphered image with a uniform distribution (histogram), regardless of the original image's pixel characteristics.
For two decades, one of the most popular methods for image encryption was based on chaotic mapping. Proposed initially by Fridrich (1998), this approach divides the image into several sub-blocks of non-fixed size. Next, after a quantized map is formed, pixel permutation (place) and bit permutation (grayscale level) are applied. The simplest chaotic 1D map is known as the 'logistic map' (Ye and Huang, 2017; Telem, Segning, Kenne, and Fotsin, 2014). More recent proposals include 2D and 3D sequences (Saljoughi and Mirvaziri, 2019; Hua, Jin, Xu, and Huang, 2018), as well as dynamic functions at runtime (Asgari-Chenaghlu, Balafar, and Feizi-Derakhshi, 2019). Although the uncertainty (entropy) of the ciphered data in these methods has reached a very high value, security remains a significant challenge, given that these methods have been cryptanalyzed due to weaknesses in the diffusion rounds (Li, Xie, Liu, and Cheng, 2014; Feng, He, Li, and Li, 2019). Currently, researchers focus their efforts on the following aspects: (1) improving the confusion property, (2) improving the diffusion property, and (3) changing the relationship between confusion and diffusion.
To enhance the diffusion property, chaotic sequences are mixed with DNA-based techniques, which change the value of the shuffled pixel through an encoding process into nucleotides (Zhang, Fang, and Ren, 2014; Chai, Gan, Yuan, Chen, and Liu, 2019). Nevertheless, DNA encoding has also been cryptanalyzed (Wen, Yu, and Lü, 2019; Akhavan, Samsudin, and Akhshani, 2019). Another solution uses elliptic curves to obtain pseudo-random sequences (Hayat and Azam, 2019). This method was also cryptanalyzed (Khoirom, Laiphrakpam, and Themrichon, 2018). Unlike the aforementioned methods, the literature also contains proposals that differ from the traditional schemes and have yet to be broken. For instance, Kumar and Quan (2019) analyze images using polar decomposition and the Shearlet transform, whereas in Ballesteros, Peña, and Renza (2018), diffusion and confusion tasks are performed through an encoding process using scrambled Collatz conjecture-based codes. However, the entropy value of the ciphered data in these proposals fails to reach the possible maximum.
In summary, many proposals of methods concerning image encryption have very high entropy values, but suffer from security weaknesses. However, there are some nontraditional proposals that have yet to be cryptanalyzed. The lingering issue with these proposals is that their entropy values are not high enough. Therefore, it can be generally stated that systems that focus on privacy do not possess optimal security, and vice-versa. As a result, the trade-off between security and privacy is still a challenge to be overcome. It is thus necessary to increase the key space and equalize the likelihood in encryption keys for methods based on chaotic mapping while also improving the degree of ciphered data uncertainty in non-traditional schemes.
The highlights of the proposed method in this study are the following:
• Image content privacy is provided through the Orthogonal Variable Spreading Factor (OVSF) coding process. Each pixel (8 bits) of the input image is encoded into an orthogonal code (256 bits) using a specific map obtained from a key. Due to the quasi-perfect symmetry between ones and zeros over the entire set of orthogonal codes, a very high entropy value is theoretically expected.
• The maximum number of possible mappings between the 8-bit pixel values and the 256-bit orthogonal codes provides security to the ciphered content. It corresponds to the number of permutations used to scramble the 8-OVSF codes; a very high value of 256!
• Since the number of bits in the encrypted data is 32 times greater than the input image, it may be more convenient to group them into 16-bit words and write them as an audio file (samples). In other words, our proposed method is image-to-audio encryption.

Background concepts
The aim of this section is to provide some basic concepts which are necessary to understand the proposed method.

Entropy
In information theory, a well-known parameter that measures uncertainty and data distribution is entropy (Robinson, 2008). The greater the homogeneity in data distribution, the greater the value of the entropy. The most famous entropy formula is Shannon's entropy equation, calculated in terms of the probability of each available data value. It is shown as Equation (1):

$$H(X) = -\sum_{i=1}^{N} p(x_i) \log_2 p(x_i) \quad (1)$$

where $X$ is the input image (or audio), $p(x_i)$ is the probability of occurrence of the value $x_i$ in the image (or audio), and $N$ is the total number of levels of the image (or audio), with $N = 2^q$ for $q$ bits of quantization. For example, in grayscale images, $N = 2^8$. If all intensity values have equal occurrence, then entropy is equal to $q$ (8 in the example). However, if occurrence is not homogeneous, the entropy value will be lower because uncertainty decreases.
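Equation (1) can be checked numerically with a minimal sketch such as the following. The function name `shannon_entropy` and the two test signals are illustrative assumptions, not part of the original study; the sketch only shows that a uniform 8-bit distribution reaches the maximum of 8 bits, while a skewed one falls well below it.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy, in bits per symbol, of a sequence of discrete values."""
    counts = Counter(values)
    total = len(values)
    # p(x_i) is estimated as the relative frequency of each observed value
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical data: every 8-bit level occurs exactly once (uniform histogram)
uniform = list(range(256))
# Hypothetical data: only two levels, used unevenly (non-uniform histogram)
skewed = [0] * 192 + [255] * 64

h_uniform = shannon_entropy(uniform)  # equals q = 8 for a uniform histogram
h_skewed = shannon_entropy(skewed)    # much lower, because uncertainty decreases
```

Dividing the measured entropy by the number of quantization bits gives the percentage figures used throughout the text (e.g., 14 bits out of 16 corresponds to 87,5%).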

Orthogonal Variable Spreading Factor (OVSF)
OVSF codes were originally used for channelization in Wideband Code Division Multiple Access (WCDMA). These codes are orthogonal to one another, and their length is determined by the level ($L$): an $L$-OVSF has $2^L$ codes of $2^L$ bits each. One way to obtain OVSF codes is through an iterative tree that uses the predecessor code as the root of the current code (Saini and Bhooshan, 2006). Each root has two descendants that double its length. Figure 1 shows the 3-OVSF code tree.
The first root is the code "1", which does not have a counterpart. In the 1-OVSF, there are $2^1$ codes of $2^1$ bits, obtained as follows: the first code doubles its root, and the result is "11". The second one uses the root in the first part and its complement in the second part, thus obtaining "10". In the second level (2-OVSF), there are four codes with four bits each. Each pair of codes has the same root. For the first descendant, the root is repeated; for the second, the root is followed by its complement. This process is repeated until all codes are obtained.
The main advantage of OVSF codes is orthogonality, which results in 50% of the bits in a pair of codes being different for every available pair. For example, the codes '1100' and '1001' are equal in the first and third bit, but the second and fourth bits are different.
In terms of entropy, OVSF is expected to provide a high value of uncertainty for the encrypted image because the quantity of zeros is similar to the quantity of ones. For example, for 3-OVSF, there are seven codes where the number of zeros is equal to the number of ones; only the first code has all bits equal to "1". In total, there are 28 zeros (44%) and 36 ones (56%). For 8-OVSF, the number of bits equal to 0 is 32 640 (49,8%), while the number of bits equal to 1 is 32 896 (50,2%). For higher values of $L$, the quantity of zeros and ones is more symmetrical. Thus, the orthogonality property of OVSF codes is used in our proposal for obtaining high entropy values in encrypted images.
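The tree construction and the bit counts quoted above can be reproduced with a short sketch. The helper names `ovsf_codes`, `count_bits`, and `hamming` are assumptions introduced for illustration; codes are held as strings of '0' and '1' for readability rather than efficiency.

```python
def ovsf_codes(level):
    """Return the 2**level OVSF codes of the given level as bit strings."""
    codes = ["1"]  # the first root
    for _ in range(level):
        next_codes = []
        for root in codes:
            complement = "".join("0" if b == "1" else "1" for b in root)
            next_codes.append(root + root)        # first descendant: root repeated
            next_codes.append(root + complement)  # second: root, then its complement
        codes = next_codes
    return codes

def count_bits(codes):
    """Return (number of zeros, number of ones) over all codes."""
    joined = "".join(codes)
    return joined.count("0"), joined.count("1")

def hamming(a, b):
    """Number of positions in which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))

codes_2 = ovsf_codes(2)                            # ['1111', '1100', '1010', '1001']
zeros_3, ones_3 = count_bits(ovsf_codes(3))        # 28 zeros, 36 ones
zeros_8, ones_8 = count_bits(ovsf_codes(8))        # 32640 zeros, 32896 ones
```

Any two distinct codes of the same level differ in exactly half of their positions, which is the orthogonality property described above and can be confirmed with `hamming`.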

The proposed scheme
Recently, a scheme for image encryption based on the Collatz Conjecture was proposed (Ballesteros et al., 2018). That proposal differs from classical schemes in some aspects: (i) an encoding block with a non-fixed length map is used to replace the permutation and diffusion processes; (ii) the output is not a ciphered image, but a ciphered audio; (iii) the number of available keys related to the security of the scheme is significantly higher than in state-of-the-art methods. Since its encoding process uses binary codes obtained from the Collatz Conjecture, which do not have symmetry between zeros and ones, the entropy of the ciphered audio is not close to the highest possible value. The authors reported entropy values of 14 for audio files quantized to 16 bits. In the current proposal, the aim is to preserve the strengths of Ballesteros et al. (2018) and improve upon its weaknesses by replacing the encoding block with 8-OVSF codes, which should theoretically provide greater entropy given their orthogonality. As a result, the uncertainty about which image corresponds to the encrypted audio is greater compared to the scheme in Ballesteros et al. (2018). Figure 2 shows the general diagram of the proposal. The inputs of the image coding module are the image ($I_{2D}$) and the seed; the outputs are the ciphered audio (CA) and the key. In the image recovering module, the inputs are CA and the key; the output is the recovered image.

Image coding
This module is used at the transmitter stage in order to send the information with unintelligible content. Figure 3 presents the block diagram of this module, including the following blocks: generation of 8-OVSF codes, scrambling the 8-OVSF codes, mapping block, splitting the binary code, and creating the audio file. These blocks are detailed below.

Generation of 8-OVSF codes:
In this block, 256 orthogonal codes of 256 bits are obtained. The creation of these codes follows the theory presented in the Orthogonal Variable Spreading Factor section. The first code of the 8-OVSF has all bits equal to one, while the remaining binary codes have an equal number of zeros and ones. These 256 codes form a matrix with dimensions 256 × 256, with each row being an orthogonal code. Unlike the method proposed by Ballesteros et al. (2018), the length of each code is fixed. The output of the block is named "OC" (OVSF Codes).

Scrambling the 8-OVSF codes:
The aim of this block is to reorder the matrix obtained in the previous block. Using a seed value, the rows of the OC matrix are rearranged. The output is named "SOC" (Scrambled OVSF Codes) and has dimensions 256 × 256. This block provides security to the scheme (analyzed in detail in the Security analysis section).
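A seed-driven row permutation of this kind can be sketched as follows. The text does not specify which pseudo-random number generator is used, so Python's `random.Random` seeded with the key string is an assumption made purely for illustration, and `scramble_codes` is a hypothetical helper name. The placeholder rows stand in for the 256 orthogonal codes.

```python
import random

def scramble_codes(codes, seed):
    """Return the rows of `codes` reordered by a permutation derived from `seed`."""
    rng = random.Random(seed)         # assumption: any seeded PRNG could play this role
    order = list(range(len(codes)))
    rng.shuffle(order)                # one of len(codes)! possible row orders
    return [codes[i] for i in order]

# Placeholder rows (not real OVSF codes), used only to show the permutation
placeholder_oc = [format(i, "08b") for i in range(256)]

soc_a = scramble_codes(placeholder_oc, "Shannon")
soc_b = scramble_codes(placeholder_oc, "Shannon")   # same seed, same order
soc_c = scramble_codes(placeholder_oc, "shannon")   # slightly different seed
```

The same seed always reproduces the same SOC matrix, which is what allows the receiver to rebuild it, while a slightly different seed is expected to yield an unrelated row order.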

Mapping:
The 2D image, $I_{2D}$, is converted into a row vector by reading it from left to right and top to bottom. The output is named $I_{1D}$. Each value of $I_{1D}$ is mapped to an orthogonal code. Therefore, every 8-bit pixel is represented by an orthogonal code of $2^8$ bits (i.e., 256 bits). In this regard, $I_{1D}$ is used as the multiplexor selector, where the inputs correspond to the rows of the SOC matrix. The output is a matrix of 256 columns (bits) and m × n rows, named $MI_{2D}$.
Consequently, the total number of bits of encrypted data is 32 times greater than that of the original grayscale image (256 bits per 8-bit pixel). Once each pixel of the image has been mapped, the values are arranged into a 1D vector. The output is named $MI_{1D}$ (Mapped Image).

Splitting the binary code and creating the audio file:
Next, the binary code is split into 16-bit words. Each orthogonal code yields 16 sub-blocks of 16 bits each. According to Equation (2), the total number of sub-blocks, $t_s$, for an image of m rows and n columns is:

$$t_s = 16 \times m \times n \quad (2)$$

The result is a matrix of $t_s \times 16$, named 'S'. Finally, every 16-bit row of S is transformed into a floating-point value in the range [−1, 1], which becomes one sample of the ciphered audio. The audio is saved as a WAV file with a specific sampling frequency (e.g., $f_s$ = 8 kHz). The duration (in seconds) of the ciphered audio is defined by Equation (3):

$$t = \frac{t_s}{f_s} \quad (3)$$

For example, if the original image is 128 × 128 and $f_s$ = 8 kHz, then $t_s$ = 262144 and $t$ = 32,768 s. Note that $t_s$ is also the number of samples of the ciphered audio.
Key: According to Shannon's theory, the security of an encryption system must rely solely on the key. In our proposal, the key is composed of the seed, the number of rows (m), and the number of columns (n) of the image. However, the security analysis is performed only on the seed.
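The arithmetic of Equations (2) and (3), together with one possible word-to-sample conversion, can be sketched as below. The exact mapping from a 16-bit word to a value in [−1, 1] is not detailed in the text; reading the word as an unsigned integer and rescaling it linearly is an assumption of this sketch, and the function names are hypothetical.

```python
def audio_length(m, n, fs):
    """Return (number of samples, duration in seconds) for an m x n image."""
    ts = 16 * m * n          # Equation (2): 16 words of 16 bits per pixel
    return ts, ts / fs       # Equation (3): duration = samples / sampling frequency

def word_to_sample(bits16):
    """Assumed mapping: unsigned 16-bit word rescaled linearly into [-1, 1)."""
    return (int(bits16, 2) - 32768) / 32768

def sample_to_word(sample):
    """Inverse of the assumed mapping, returning a 16-character bit string."""
    return format(round(sample * 32768) + 32768, "016b")

ts, seconds = audio_length(128, 128, 8000)   # 262144 samples, about 32.768 s
lowest = word_to_sample("0" * 16)            # -1.0
highest = word_to_sample("1" * 16)           # just below 1.0
```

Because the scaling factor is a power of two, the conversion is exactly invertible in floating point, which matters for recovering the orthogonal codes bit for bit at the receiver.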
The image encoding procedure is illustrated with the following example. Suppose that the system works with 2-OVSF codes (to simplify the example), and the orthogonal codes are '1111', '1100', '1010', and '1001'. The OC matrix is:

$$OC = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{bmatrix} \quad (4)$$

Now, suppose that, from the seed value, the OC matrix is scrambled as follows:

$$SOC = \begin{bmatrix} 1 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix} \quad (5)$$

Suppose also that the 2D image is 2 × 2, with the following values:

$$I_{2D} = \begin{bmatrix} 1 & 3 \\ 2 & 0 \end{bmatrix} \quad (6)$$

Note that the highest value of $I_{2D}$ in the current example is 3; thus, the system can work with 2-OVSF. For our proposed method, the highest intensity of a pixel is 255, and therefore, it is necessary to work with 8-OVSF.
Continuing the example, $I_{2D}$ is rearranged into a 1D array, resulting in:

$$I_{1D} = \begin{bmatrix} 1 & 3 & 2 & 0 \end{bmatrix} \quad (7)$$

Then, each value of $I_{1D}$ is used as the multiplexor selector in which the input is SOC and the output is $MI_{2D}$. For the current example, row 1 of SOC is selected, followed by row 3, row 2, and row 0 (rows are counted from zero). The following matrix is obtained:

$$MI_{2D} = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 \\ 1 & 0 & 0 & 1 \end{bmatrix} \quad (8)$$

Next, $MI_{2D}$ is transformed into a 1D array:

$$MI_{1D} = \begin{bmatrix} 1 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 1 & 1 & 1 & 1 & 1 & 0 & 0 & 1 \end{bmatrix} \quad (9)$$

With this 16-bit array, one sample of the ciphered audio is obtained. However, for the real method proposed in this study, 16 samples are obtained for each pixel of the original image.
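The worked example can be traced in a few lines of code. This is a sketch of the mapping step only, using the 2-OVSF codes and the scrambled order given in the example; the function name `encode_image` is a hypothetical label, and rows are indexed from zero.

```python
# Scrambled 2-OVSF codes from the example (row 0 to row 3)
SOC = ["1001", "1100", "1111", "1010"]

# 2 x 2 example image whose pixel values select rows of SOC
IMAGE_2D = [[1, 3],
            [2, 0]]

def encode_image(image_2d, soc):
    """Map every pixel to the SOC row it selects and concatenate the codes."""
    flat = [pixel for row in image_2d for pixel in row]  # left to right, top to bottom
    mapped_rows = [soc[pixel] for pixel in flat]         # pixel acts as the selector
    return mapped_rows, "".join(mapped_rows)

mapped_rows, mapped_bits = encode_image(IMAGE_2D, SOC)
# mapped_rows -> ['1100', '1010', '1111', '1001']
# mapped_bits -> '1100101011111001' (16 bits, i.e. one audio sample in this toy case)
```

With 8-OVSF codes the same lookup produces 256 bits per pixel, which are then cut into 16 words of 16 bits.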

Image recovery
The proposed scheme is intended to provide covert communication between a transmitter and receiver. The ciphered audio and the key are transmitted in separate channels, e.g., email, social networks, public webpages, and others. Once the receiver acquires both the ciphered audio and the key, the image can be recovered (see Figure 4).
Each block is explained as follows:

Generation of 8-OVSF codes:
This block works in the same way as the corresponding block in the image coding module. The output is named 'OC', with a size of 256 × 256.

Scrambling the 8-OVSF codes:
Using the same seed as in the image coding module, the rows of the OC matrix are reorganized. The number of available options for the scrambled matrix is discussed in detail in the Security analysis section. The output is SOC.

Deciphering the audio:
The inputs of this block are the unintelligible ciphered audio file (CA) and the SOC matrix. First, every sample of CA is represented by 16 bits. Second, a 1D bit sequence is obtained by concatenating the binary representations of all samples of CA. The total number of bits is calculated using

$$bits = 16 \times t_s$$

where $t_s$ is the total number of samples of CA.
Next, the bit sequence is split into sub-blocks of 256 bits. The number of sub-blocks is the ratio between bits and 256, that is, $t_s/16$. If the process has been performed successfully, $t_s/16$ must be equal to m × n. Next, the sub-blocks are arranged into a matrix of $t_s/16$ rows and 256 columns, so that every row of this matrix is an orthogonal code. Finally, each row is compared against the SOC codes, and the row position of the matching SOC code is returned; the returned positions form the recovered 1D pixel vector.

Creating the grayscale image:
The inputs of this block are the key (specifically the values of m and n) and the recovered 1D pixel vector, which is reorganized into a matrix of m × n. The result is the recovered image.
The above steps are illustrated through an example. As in the image coding example, suppose the system works with 2-OVSF codes and the orthogonal codes are '1111', '1100', '1010', and '1001'. The OC matrix, the SOC matrix obtained with the same seed, and the code matrix rebuilt from the ciphered audio are:

$$OC = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{bmatrix}, \quad SOC = \begin{bmatrix} 1 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix}, \quad \text{code matrix} = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 \\ 1 & 0 & 0 & 1 \end{bmatrix}$$

Each row of the code matrix is compared against the SOC codes as the algorithm searches for a match (rows are counted from zero). Row 0 of the code matrix, '1100', is equal to row 1 of SOC, so the returned value is 1. Row 1, '1010', matches row 3 of SOC, and the returned value is 3. Next, row 2, '1111', matches row 2 of SOC, and the returned value is 2. Finally, row 3, '1001', matches row 0 of SOC, and the returned value is 0. At the end, the recovered 1D vector is [1 3 2 0]. With m = 2 and n = 2, the recovered image is:

$$\begin{bmatrix} 1 & 3 \\ 2 & 0 \end{bmatrix}$$

It is easy to verify that the recovered image is equal to the original image.
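The matching step of the recovery example can be sketched as follows. The function name `decode_bits` is a hypothetical label, and a dictionary lookup is used here in place of an explicit row-by-row comparison; both return the position of the matching SOC row.

```python
# Scrambled 2-OVSF codes from the example (row 0 to row 3)
SOC = ["1001", "1100", "1111", "1010"]

def decode_bits(bits, soc, m, n):
    """Recover an m x n image from concatenated orthogonal codes."""
    code_length = len(soc[0])
    position = {code: index for index, code in enumerate(soc)}  # code -> SOC row
    blocks = [bits[i:i + code_length] for i in range(0, len(bits), code_length)]
    if len(blocks) != m * n:
        raise ValueError("number of codes does not match the image size in the key")
    flat = [position[block] for block in blocks]   # raises KeyError on an unknown code
    return [flat[r * n:(r + 1) * n] for r in range(m)]

recovered = decode_bits("1100101011111001", SOC, 2, 2)
# recovered -> [[1, 3], [2, 0]]
```

A wrong seed produces a different SOC order, so the same blocks would be looked up at different positions and the recovered pixel values would not match the original image.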

Performance assessment
Certain metrics have been selected in order to evaluate the performance of the proposed method in terms of similarity between the input image and the recovered image, as well as the quality of the CA.

Metrics to evaluate image similarity
Among the metrics commonly used to measure the image similarity are the Structural Similarity Index (SSIM) and Peak Signal to Noise Ratio (PSNR). These metrics are explained below.

Peak Signal to Noise Ratio (PSNR):
It is commonly used to compare two images. It is obtained as follows:

$$PSNR = 10 \times \log_{10}\left(\frac{255^2}{MSE}\right) \quad (17)$$

$$MSE = \frac{1}{m \times n} \sum_{i=1}^{m \times n} (A_i - B_i)^2$$

where MSE is the Mean Squared Error, A and B are the images of m × n pixels, and the index $i$ represents the absolute position of the pixel (e.g., left to right and top to bottom). If A and B are equal, then MSE is 0 and PSNR is ∞. Higher values of PSNR are preferable.
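Equation (17) can be evaluated with a minimal sketch such as the one below, assuming 8-bit grayscale images stored as nested lists; the function name `psnr` and the two tiny test images are illustrative assumptions.

```python
import math

def psnr(image_a, image_b, peak=255):
    """PSNR in dB between two equally sized grayscale images (nested lists)."""
    flat_a = [p for row in image_a for p in row]
    flat_b = [p for row in image_b for p in row]
    mse = sum((a - b) ** 2 for a, b in zip(flat_a, flat_b)) / len(flat_a)
    if mse == 0:
        return math.inf            # identical images: PSNR is infinite
    return 10 * math.log10(peak ** 2 / mse)

original = [[1, 3], [2, 0]]
identical = [[1, 3], [2, 0]]
one_pixel_off = [[1, 3], [2, 1]]   # a single pixel differs by one gray level

psnr_same = psnr(original, identical)       # infinite
psnr_close = psnr(original, one_pixel_off)  # finite, since MSE = 0.25
```

An infinite PSNR therefore indicates a bit-exact recovery, whereas any finite value reveals at least one differing pixel.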

Metrics to evaluate quality of the ciphered audio
A good CA file has unintelligible content, meaning that it should appear and sound like noise. Mathematically, this implies that neighboring samples are uncorrelated and that its entropy is high. One way to measure intercorrelation is the DS metric of Equation (20), in which $x_i$ is the current sample of the signal $x$, $x_{i+1}$ is its right neighbor, $x_{i-1}$ is its left neighbor, and $N$ is the total number of samples. Since natural audio signals are highly intercorrelated, the current sample is very similar to its neighbors, and the resulting DS is very low. On the other hand, in CA files, the difference between neighboring samples is high. The highest value of DS is illustrated in Figure 5 as an example. Suppose the odd samples are equal to 1 and the even samples are equal to −1. The dynamic range of that signal is 2. Applying Equation (20) to this signal gives the maximum DS value for audio signals with a dynamic range of 2. However, in a real-world scenario, if the audio signal is quantized to 16 bits, the total number of different values is $2^{16}$, distributed in the range [−1, 1], with a mean close to 0. In that case, the maximum DS value is expressed in terms of $x^{+}$, the highest value of $x$; $x^{-}$, the lowest value of $x$; and $\mu$, the mean of the audio signal. For $x^{+}$ = 1, $x^{-}$ = −1, and $\mu$ = 0, $DS_{max}$ is equal to $\sqrt{2}$. A DS value close to this maximum means that the ciphered data has unintelligible content.
On the other hand, entropy is a well-known metric to measure the uncertainty of data and, therefore, the quality of the CA. For unintelligible audio content with a uniform distribution (i.e., all values being equally likely), entropy is equal to the number of quantization bits. For example, if the audio is represented with 16 bits, the audio content is highly unintelligible when its entropy is close to 16. In other words, the lower the entropy value, the higher the intelligibility of the audio. The formula for entropy was presented in the Entropy section.

Validation
The aim of this section is to validate the performance of the proposed system in terms of the quality of the CA and the recovered image as well as the security analysis. A total of 20 grayscale images (128 x 128 pixels) were used as input for the image coding module; each image is ciphered with 200 keys. At the end, 4000 CA signals were obtained. Figure 6 shows the selected grayscale images.

Preliminary results
The performance of the proposed method is illustrated for three images. Figure 7 shows the original image, the CA, and the recovered image; Figure 8 shows the data distribution (image and CA); and Figure 9 shows the data correlation. According to the results shown in Figure 7, each recovered image is highly similar to the input image. In all three cases, the SSIM is higher than 0,999 and PSNR is ∞. Additionally, the ciphered signals look like noise, with a maximum value of 1 and minimum value of −1. Histograms of the CA signals (Figure 8(b), 8(d), 8(f)) are quasi-uniform, even though the histograms of the images are not uniform (Figure 8(a), 8(c), 8(e)). The entropy of the images is around 6 for 8 bits of quantization (75%), while the entropy of the CA files is 15,75 for 16 bits of quantization (98,5%). These results suggest that the system performed as expected. Finally, Figure 9 shows the behavior of adjacent (horizontal) pixels and neighboring samples. It is clear that the original image is highly correlated because the behavior of the adjacent pixels is around the main diagonal. However, in the case of neighboring samples (audio signal), the data is uncorrelated because the results are scattered in all directions.

Quality of the ciphered audio and the recovered image
This section tests the performance of the proposed method in terms of the quality of the recovered images and their CA files. For the first group of tests, SSIM (Figure 10) and PSNR between the input image and the recovered image were calculated. For the second group, the DS of the CA, the entropy of the input image, and the entropy of the CA were measured (Figure 11). Figure 10 shows a very high structural similarity between the input image and the recovered images for all 4000 tests: SSIM values are higher than 0,9999990. The PSNR values of 15 images were ∞, while the others were higher than 90 dB. These results indicate that the proposed method is reversible. Figure 11(a) shows the entropy results for the 4000 CA files. Most of the results (95% confidence) are between 15,75 and 15,78. Therefore, the CA signals are very close to the ideal behavior of unintelligible data (i.e., an entropy of 16 for data quantized with 16 bits). Regarding DS (Figure 11(b)),

Security analysis
The following step in the validation process consists of analyzing and testing the security of the proposed method. First, a theoretical analysis is performed on the key space. Secondly, key sensitivity is measured through several tests.
Figure 11. Quality of the ciphered data: (a) Entropy, (b) DS. Source: Authors
Key space: According to Shannon's theory, the security of a system must rely solely on the key. It is assumed that the details of the method (e.g., image coding and decoding modules) are known by a third party (e.g., an eavesdropper). Then, the keys must satisfy the following conditions:
• The number of keys must be large enough to resist a brute-force attack, at least for a considerably long time.
• All keys must be of equal likelihood, which means that the uncertainty of the keys (entropy) must be as high as possible.
• Using a different key must provide a different output.
Thus, even if third parties have enough time and hardware resources to test all the keys, they have no certainty about which one is correct.
To satisfy the first condition, our proposed system uses a seed value as an input for a pseudo-random number generator, which reorganizes the 8-OVSF matrix. Then, there are as many available scrambled matrices as the factorial of the number of orthogonal codes. That is, the key space is 256! = $8{,}57 \times 10^{506}$. The reader can verify that this value is the same as in Ballesteros et al. (2018). For the second condition, the method works with seed values of different lengths and characteristics (e.g., only numbers, only letters, hybrid, uppercase and/or lowercase). The third condition is analyzed in the following section.
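The size of the key space can be checked directly, since Python integers have arbitrary precision. The equivalent key length in bits is added here only as an illustrative figure; it is not reported in the original text.

```python
import math

key_space = math.factorial(256)           # number of row orders of the 8-OVSF matrix

digits = len(str(key_space))              # 507 decimal digits, i.e. about 10^506
leading = str(key_space)[:3]              # '857', matching 8,57 x 10^506
equivalent_bits = math.log2(key_space)    # roughly 1684 bits
```

This count covers the scrambled matrices themselves; how many of them a given seed format can actually reach depends on the seed length and the generator, which the text addresses through the second condition.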

Key sensitivity analysis:
Although the first two conditions are satisfied, the system can still be insecure if two slightly different keys provide the same results. This means that the key sensitivity must be verified as well. At the receiver, the original key is slightly changed (e.g., an upper case instead of a lower case of the same character), and the new key is then used in the image recovering module. The dissimilarity between the input image and the recovered image is expected to be high. SSIM is selected to compare the images. Figure 12 shows an example of this test, using the key "Shannon" in the transmitter and, in the receiver, a key that differs from it only in the case of one character. Figure 13 shows the results (confidence range) of SSIM between the original image and the recovered image for 200 tests. According to Figure 12 and Figure 13, the structural similarity is very low; that is, the ciphered data is very sensitive to the key. Therefore, if a non-authorized user has access to the ciphered data and knows the method but not the exact key, the original content will not be revealed.

Comparison with state-of-the-art methods
Most of the image encryption methods provide a ciphered image in which the size of the original image is preserved. In our proposal, the output is an audio signal instead of an image. For this reason, comparison to the state-of-the-art methods is divided into three parts: a) image-to-image encryption (Table 1), b) audio-to-audio encryption (Table 2), and c) image-to-audio encryption (Table 3). In all cases, the entropy of the ciphered data (quality of the process) and key space (security of the method) were taken into account.
According to Table 1, the image encryption methods reached a very high entropy value for the encrypted data. However, in audio encryption methods (Table 2), there is still a need for improvement in terms of entropy. In terms of the key space, audio encryption methods are better than image encryption ones. In conclusion, the strength of the image encryption methods is the weakness of the audio encryption methods and vice-versa.

Conclusions
The purpose of this research was to increase the entropy of the encrypted data obtained by encoding-based encryption methods (i.e., 87,5%), while preserving the key space ($10^{506}$). The mapping process between the pixels of an image and the resulting orthogonal codes provided ciphered data with a very high entropy percentage (98,5%), similar to that of image encryption methods based on chaotic mapping (99,6%).
Taking into account that the image encryption methods based on this encoding approach have yet to be broken, and that the results of several tests demonstrated high key sensitivity (i.e., SSIM = 0,015 between the original image and the recovered image with a slight change in the key), it is concluded that the aforementioned challenge has been overcome, i.e., the transmitted content remains private and can only be revealed by authorized personnel.