Gaussian Pyramid Decomposition in Copy-Move Image Forgery Detection with SIFT and Zernike Moment Algorithms

ABSTRACT


INTRODUCTION
Image manipulation for criminal purposes remains a dangerous side effect of the growth of imaging technology. In Indonesia, many image manipulation cases target public figures and government officials to spread hoaxes and pornography-related content (Riadi, Fadlil, & Sari, 2017). One of the simplest image manipulation techniques is copy-move forgery. This manipulation allows a computer user to hide or add an object by pasting a copied part of the same image (Tyagi, 2018).
There are two types of copy-move forgery detection: keypoint-based and block-based (Sadeghi, Dadkhah, Jalab, Mazzola, & Uliyan, 2018). Block-based detection sorts and searches for similarity among the blocks produced by splitting the input image before determining the tampered area, while the keypoint-based method generates a feature vector per keypoint, which is used to find similar objects in the tampered area.
Scale Invariant Feature Transform (SIFT) is one of many keypoint-based methods for detecting copy-move forgery (Sadeghi et al., 2018). The advantages of SIFT are its detection accuracy under rotation, noise addition, and JPEG compression (Hailing, Weiqiang, & Yu, 2008). Nuari, Utami, and Raharjo (2019) held a quality comparison between SIFT and Speeded Up Robust Features (SURF) in copy-move detection. The comparison concludes that SIFT has a higher accuracy rate than SURF, with a maximum difference of 16.32%. Nevertheless, SURF detection is quicker than SIFT, with a maximum difference of 1.97 seconds.
To deal with this problem, some researchers combine SIFT with other methods. Lin et al. (2019) used SIFT with the Local Intensity Order Pattern (LIOP) to help detect areas with few keypoints and significant geometric transformations in copy-move detection. After extracting features, they performed keypoint matching with transitive matching. The proposed method earned a 73.44% precision rate and a 75.41% recall rate.
Besides LIOP, many researchers use Zernike Moments to deal with SIFT's problems. This method is also better at detecting copy-moved pictures with rotation, blurring, noise addition, and JPEG compression (Ryu, Lee, & Lee, 2010). Md Salleh, Rohani, and Maarof (2018) used Zernike Moments for feature extraction after applying the Discrete Wavelet Transform (DWT) in the pre-processing step. However, noise appears in the results due to the lack of a morphological process (as seen in Figure 1). The detection time, however, is faster than Zernike Moments without DWT, where this method can detect in up to 234 seconds.
Figure 1 From left to right: Original image, copy-move forgery result, detection result

Zheng et al. (2016) used an adaptive segmentation method to separate textured and smooth areas. After separating the regions, they used SIFT with the g2NN matching method in the textured area and Zernike Moments with overlapping blocks in the smooth area. This method achieves more than 88% precision and 86% recall. Nevertheless, it takes 179 seconds, which means the running time is slower than SIFT alone. Sun, Ni, and Zhao (2018) used non-overlapping block segmentation to separate textured and smooth areas in copy-move detection to decrease computing time. In the textured region, they used SIFT features and g2NN keypoint matching. In smooth areas, they used Zernike Moments feature extraction and hashing-based similarity calculation. This method earns a maximum precision rate of 99.18% and a maximum recall rate of 89.57%. However, the detection time is still slower than SIFT, as this method needs 853 seconds to process the images.
Many researchers use a combination of SIFT and Zernike Moments to detect copy-move forgery (Mohamadian & Pouyan, 2013). However, this combination is slower than SIFT alone (Sun et al., 2018; Zheng et al., 2016). On the other hand, the Gaussian pyramid is helpful for detecting copy-move forgery, including fake images with noise and JPEG compression, without burdening the testing time (Shabanian & Mashhadi, 2018). Shabanian and Mashhadi (2018) used Gaussian pyramid decomposition in this research field. They used it to resize the forged image before dividing it into blocks. Then, each pair of blocks is checked for similarity with the Structural Similarity Index Measure (SSIM). This method yields a better result, with 81.62% precision and 100% recall. It is also stable against copy-move with JPEG compression and noise-adding attacks and helps reduce the computational time.
This study intends to combine SIFT and Zernike Moments with the Gaussian pyramid decomposition method to detect copy-move image forgery. It examines the average precision, recall, and detection time when the Gaussian pyramid is used for decomposition and the result is subsequently detected by a combination of the SIFT and Zernike Moments methods.

RESEARCH METHODS
The proposed method consists of decomposition with the Gaussian pyramid, separation of the smooth and textured areas, SIFT extraction in the textured area, Zernike Moments in the smooth area, and post-processing.

Gaussian Pyramid Decomposition
The first process is decomposition with the Gaussian pyramid. This method reduces the image size by half and smooths it with a Gaussian kernel (Shabanian & Mashhadi, 2018). Then, the image is placed on the layer above the previous image. This process repeats until an image pyramid is formed.
However, the smaller the image size, the lower the detection quality. To control the image resolution before it is passed to the next process, we use the same method as previous research (Rahma, Utami, & Fatta, 2020) to obtain the smaller image, as shown in Figure 2.
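The halving-and-smoothing loop described above can be sketched in Python with NumPy. This is a minimal illustration, not the paper's code: the 5-tap binomial kernel is the one conventionally used for Gaussian pyramids, and the `pyramid_downsample` name and the `size_limit` default of 768 pixels (one of the limits tested later) are our choices.

```python
import numpy as np

# 5-tap kernel commonly used to approximate Gaussian smoothing in pyramids
KERNEL_1D = np.array([1, 4, 6, 4, 1], dtype=float) / 16.0

def gaussian_blur(img):
    """Separable Gaussian smoothing (reflect padding) on a 2-D array."""
    blurred = np.apply_along_axis(
        lambda r: np.convolve(np.pad(r, 2, mode="reflect"), KERNEL_1D, mode="valid"),
        axis=1, arr=img)
    blurred = np.apply_along_axis(
        lambda c: np.convolve(np.pad(c, 2, mode="reflect"), KERNEL_1D, mode="valid"),
        axis=0, arr=blurred)
    return blurred

def pyramid_downsample(img, size_limit=768):
    """Blur and drop every other pixel (one pyramid level per pass)
    until both sides are at or below size_limit."""
    level = img.astype(float)
    while max(level.shape) > size_limit:
        level = gaussian_blur(level)[::2, ::2]
    return level
```

Each pass keeps roughly half the resolution in both dimensions, so a 1600 × 1200 image falls below a 768-pixel limit after two passes.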

Separating Smooth and Textured Area
Before separating the image into smooth and textured areas, we use the adaptive segmentation method (Pun, Yuan, & Bi, 2015) to obtain the initial size of each segment before determining the number of parts, as shown in equation (1):

n = √((H × W) / s) (1)

where n is the number of segments, H and W represent the height and width of the resized image, and s is the initial size of each part. This process begins by calculating the area of the picture by multiplying H × W. Next, the image's area is divided by the initial size (s). Nevertheless, this result leads to too many segments, which slows down the separation process. Therefore, we use the square root of the division result to get n.
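Equation (1) is simple enough to sketch directly; rounding the square root to a whole number of segments is our assumption, since the paper does not say how a fractional result is handled.

```python
import math

def segment_count(height, width, initial_size):
    """n = sqrt(H * W / s): the square root keeps the number of
    segments from exploding on large images (equation (1))."""
    return max(1, round(math.sqrt(height * width / initial_size)))
```

For a 600 × 800 resized image and an initial segment size of 3000 pixels, this gives round(√160) = 13 segments.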
Next, we split up the image by adapting the modified method from Zheng et al. (2016). First, we extract the keypoints with the SIFT method and then make segments with the value n. However, due to the irregular shape of each section, we compute the size and area of its bounding box, as exemplified in Figure 3, before assigning keypoints to it. Then, we compute the ratio between the number of keypoints and the bounding box's area and compare it to a limit value. We assign the segment to the smooth region if the ratio is less than or equal to the limit value; otherwise, we assign it to the textured region.
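The ratio test for assigning a segment can be sketched as follows. The function name is ours, the zero-area guard is our addition, and `ratio_limit=0.003` is one of the three limit values the paper tests later.

```python
def classify_segment(num_keypoints, box_width, box_height, ratio_limit=0.003):
    """Label a segment by the ratio of SIFT keypoints inside it to the
    area of its bounding box: sparse segments go to the smooth region."""
    area = box_width * box_height
    ratio = num_keypoints / area if area else 0.0
    return "smooth" if ratio <= ratio_limit else "textured"
```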

SIFT Detection
We merge the textured area and convert it into a mask that guides SIFT extraction. SIFT looks for image features based on the "scale-space" concept to take features at more than one scale level and image resolution, which not only increases the number of available features but is also tolerant of scale changes (Lowe, 1999; Lowe, 2004).
After that, we apply the g2NN feature matching method from Amerini, Ballan, Caldelli, Del Bimbo, and Serra (2011). First, we compute the Euclidean distance between each keypoint and every other keypoint. Next, we sort the distances in ascending order. Then, we evaluate the ratio between the first distance (d1) and the following distance (d2) and compare it with a predetermined ratio value T, as shown in equation (2), to check whether the two points are similar:

d1 / d2 < T (2)

The last step is determining the pairs of keypoint locations to be processed in the post-processing step.
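A minimal NumPy sketch of the g2NN test on raw descriptors; the `ratio=0.5` threshold and the handling of duplicate descriptors are our assumptions, since the paper only says the ratio is compared with a predetermined value.

```python
import numpy as np

def g2nn_matches(descriptors, ratio=0.5):
    """g2NN: for each descriptor, sort the Euclidean distances to all
    others and accept the k-th neighbour while d_k / d_(k+1) < ratio."""
    n = len(descriptors)
    matches = []
    for i in range(n):
        d = np.linalg.norm(descriptors - descriptors[i], axis=1)
        order = [j for j in np.argsort(d) if j != i]  # nearest first, self excluded
        for k in range(len(order) - 1):
            a, b = d[order[k]], d[order[k + 1]]
            if b == 0 or a / b >= ratio:
                break  # the generalized ratio test stops at the first failure
            matches.append((i, order[k]))
    return matches
```

Duplicated regions produce descriptor pairs whose nearest-neighbour distance is near zero, so they pass the ratio test while unrelated keypoints fail it.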

Zernike Moment Detection
In the Zernike Moments detection process, we adopted and modified the Zernike Moments process by Sun, Ni, and Zhao (2018). First, we merge the smooth-area segments to form an image and split it into 24 × 24 pixel blocks. Next, we convert each block from RGB to HSV mode and quantize the H and S components to ten levels.
Then, we make k block groups based on the levels. We split the block groups into a main block and sub-blocks. Phase correlation is used to calibrate each sub-block. This calibration helps realign the location of the sub-blocks, where these sub-blocks are the forged area, as illustrated in Figure 4. This method uses orthogonal functions called Zernike polynomials, which form a complete orthogonal set over the interior of the unit circle (Teague, 1980). Next, we determine the Euclidean distance between the Zernike Moments values of both areas. Then, we check whether the Euclidean distance is less than the predetermined value (0.05). The last step is finding the center points of each pair of blocks. This result is also used in the post-processing step.
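The ten-level quantization of H and S can be sketched with the standard library's `colorsys`; grouping blocks by the quantized mean colour is our simplification of the per-block quantization described above, and the function name is ours.

```python
import colorsys

def block_group_key(rgb_mean, levels=10):
    """Quantize a block's mean colour: H and S each go to `levels` bins
    (V is kept aside for the Zernike Moments computation)."""
    r, g, b = (c / 255.0 for c in rgb_mean)
    h, s, _v = colorsys.rgb_to_hsv(r, g, b)
    quantize = lambda x: min(int(x * levels), levels - 1)
    return (quantize(h), quantize(s))
```

Blocks sharing a key land in the same of the k groups, so only blocks with similar colour statistics are compared.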

Post-Processing
First, we join the SIFT keypoint locations and the Zernike Moments center points into a single set of point locations. Then, we compute the Euclidean distance between two position coordinates (denoted (i, j) and (k, l)) and check it against a predetermined value (D), as shown in equation (3):

d((i, j), (k, l)) = √((i − k)² + (j − l)²) (3)

Next, we build a density-based cluster to form groups from the selected positions (Soni, Das, & Thounaojam, 2018). Each group is made based on the density of distances between a main point and the others. The last steps are to remove unnecessary locations and apply a morphology process to fill the gaps between the points.
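The density-based grouping can be sketched as a DBSCAN-like flood fill: points with enough close neighbours seed a cluster, and sparse points are left unlabelled. The `eps` and `min_neighbors` defaults are illustrative, not values from the paper.

```python
import math

def cluster_points(points, eps=40, min_neighbors=2):
    """Naive density-based clustering: link points within eps of each
    other, grow clusters from dense points, leave sparse points at -1."""
    n = len(points)
    neigh = [[j for j in range(n)
              if j != i and math.dist(points[i], points[j]) <= eps]
             for i in range(n)]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neigh[i]) < min_neighbors:
            continue
        stack, labels[i] = [i], cluster
        while stack:  # flood-fill the connected dense region
            p = stack.pop()
            for q in neigh[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    if len(neigh[q]) >= min_neighbors:
                        stack.append(q)
        cluster += 1
    return labels
```

Isolated false matches keep the label -1 and can be discarded before the morphology step.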

Tools and Dataset
In this research, we use the cloud-based Google Colaboratory. It has specifications equivalent to an Intel Xeon 2.3 GHz CPU, 13 GB of RAM, and 108 GB of disk space. We also use Python 3.6.7 and image processing libraries: OpenCV 3.4.2.17, scikit-image 0.16.2, and Mahotas 1.4.11. This research also uses datasets from various sources. First, 96 images (48 plain copy-move forgeries and 48 ground truths) from Christlein et al. (2012). We also used copy-move with rotation (240 copy-move images and 288 ground truths) and copy-move with JPEG compression (240 copy-move images and 288 ground truths) (Christlein et al., 2012). All images in this dataset have a resolution between 533 × 800 pixels and 3888 × 2592 pixels.
We also use another dataset of 40 copy-move images and 40 ground truths, modified from the same dataset above (Christlein et al., 2012) and taken from MICC F600 (Amerini et al., 2011), for detecting multiple copy-move. These images have the same resolution range as above. We modified 40 original images from MICC F2000 (Amerini et al., 2011) with the GNU Image Manipulation Program (GIMP) for copy-move with reflection attack. All images in this dataset have a resolution of 2048 × 1536 pixels. For copy-move with image inpainting, we modified 46 fresh pictures (Christlein et al., 2012).

Test Scenario
In the test scenario, we use plain copy-move and copy-move with rotation and JPEG compression, with the following transformation criteria:
1. Rotation angle between 2° and 10° in 2° steps
2. JPEG compression with a quality factor between 20 and 100 in multiples of 10
Besides plain copy-move and copy-move with transformations, we also use other scenarios related to the processes of this method:
1. Comparison between Gaussian pyramid decomposition size limits with three limit values in pixels: 512, 768, 960.
2. Comparison between area separation ratio limits with three limit values: 0.001, 0.003, 0.005.
Last, we add three scenarios containing other types of copy-move, as mentioned by Walia and Kumar (2019): multiple copy-move, copy-move with reflection attack, and copy-move with image inpainting.

Analysis
In the analysis process, we measure the precision and recall rates at the pixel level using a confusion matrix, as shown in Figure 5. Then, we compute the accuracy, precision, and recall rates as defined in equations (4), (5), and (6):

Accuracy = (TP + TN) / (TP + TN + FP + FN) (4)

Precision = TP / (TP + FP) (5)

Recall = TP / (TP + FN) (6)

The accuracy rate measures the proportion of all true results among all results in the confusion matrix (Al-Qershi & Khoo, 2018). The precision rate measures whether a detection result indicates a forgery in the image, while the recall rate shows the probability that a part of the faked image is detected (Christlein et al., 2012). All tests are repeated three times, and we then compute the average result of each test.
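The three rates follow directly from the pixel-level confusion-matrix counts; the guards against empty denominators are our addition.

```python
def pixel_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from pixel-level counts, as in
    equations (4), (5) and (6)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall
```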

Results in Plain Copy-Move
In plain copy-move, we used a Gaussian size limit of 768 pixels and an area separation ratio of 0.003. The result, however, still generates false detections. Some unmanipulated parts are detected as copy-move, even though the detection reaches the right forgery area. The resulting image is shown in Figure 6.
Due to the many false results, there is an imbalance between the precision and recall rates: the recall rate is much greater than the precision rate. The average precision and recall in this experiment reached 18.21% and 69.39%, respectively. The average accuracy is also relatively low at 58.46%.

Results in Rotation Transformation
In copy-move with rotation attack detection, we use a Gaussian size limit of 768 pixels and an area separation ratio of 0.003. Based on the data in Table 1, the average precision and accuracy levels are generally stable.
However, there are still many undetected areas or excess detection results. Nevertheless, the recall rate and processing time rise with the rotation angle. This result is shown in Figure 7. These data indicate that the higher the rotation value, the wider the detection range. The graph of accuracy, precision, recall, and time is shown in Figure 8.

Results in JPEG Compression
We also use the preset values for the Gaussian size limit and area separation ratio in copy-move with JPEG compression attack detection. The average precision and recall rates in this detection fluctuate: they decrease from quality factor 40 to 70 but increase again at 80. The result data are presented in Table 2.
This condition happens due to the large number of detection results along with the large number of pixels. The fluctuation happens because of the increase in the quality factor value, which also triggers false detections and results in low precision values. The graph of accuracy, precision, recall, and time is shown in Figure 9.

Results in Other Scenarios

a. Comparison between decomposition result size limit
In this experiment, we use three Gaussian limits in pixels (512, 768, 960). The block size in Zernike Moments adjusts with the image size. The results are presented in Table 3 and Figure 10.
Based on these data, when the limit value rises, the precision rate increases, but the recall rate decreases. The image size reduction by the Gaussian pyramid causes this result. As for the test time, image processing takes longer as the limit value for the image size and the size of each block increase.

b. Comparison between area separation ratio limit
In this experiment, we used three separation ratio limit values (0.001, 0.003, 0.005) and a Gaussian pyramid decomposition limit of 768 pixels. The results are presented in Table 4 and Figure 11.
Based on these data, the right separation ratio value helps the detection quality of copy-move in particular areas. If the value is too small, some smooth regions are not detected because SIFT cannot produce keypoints in these areas. Otherwise, if the ratio is too large, there are many false detections, including in smooth fields.

Results in Other Copy-Move Types

a. Multiple Copy-Move
In multiple copy-move, we used the same settings as in the regular copy-move experiment. This experiment reached the highest result of all detections, where the precision rate is 18.66%, the recall rate is 71.73%, and the accuracy rate is 56.97%. This result shows that the proposed method can detect copy-move images with multiple duplications, but there are still many unwanted detections. The detection time is slightly slower, at 213 seconds. The resulting image is shown in Figure 12.

b. Copy-Move with Reflection Attack
In copy-move with reflection attack, we found the method inadequate for this type, as the precision rate is 6.74% and the recall rate is 22.40%, although the accuracy rate is 64.5%. The cause of the low precision and recall is that the matching process in the SIFT and Zernike Moments steps cannot check whether a keypoint or block detected as copy-move has a symmetrical position with respect to the original part (Warif, Wahab, Idris, Salleh, & Othman, 2017). The resulting image is shown in Figure 13.

c. Copy-Move with Image Inpainting
Figure 14 From left to right: Original image, copy-move with image inpainting result, ground truth, result map of the detection

As with the previous type, we got the lowest rates of all detections: the precision rate is 4.7%, and the recall rate is 41.72%, with an accuracy rate of 59.58%. These results indicate that this method cannot check whether original or undetected parts of the image were sampled to cover other objects in the picture (Liang, Yang, Ding, & Li, 2015). The resulting image is shown in Figure 14 above.

Discussion
We now explain why this method reaches only low precision and recall rates. First, we found that the Gaussian pyramid influences the detection results: as the image resolution decreases, false detections increase, as shown in Figure 15. Another factor that affects the result is that the filtering of Zernike Moments for smooth-surface detection still relies on the similarity of Euclidean distances from one block to another. We also found faults on objects with smooth surfaces, such as sky and ground areas.
Besides these, the clustering in the post-processing step has not been able to produce accurate detection results, as many clusters are still formed from parts that should not be detected. The Google Colaboratory system also plays a role in the detection. In response to the low precision and recall rates on copy-move with reflection attack and image inpainting, we conclude that the causes are the lack of a symmetrical-position check in the SIFT and Zernike Moments detection and the lack of sample checking for image inpainting.
Our experiment has a computational complexity similar to one of our reference studies (Sun et al., 2018): the complexity of the g2NN process in both experiments is O(n²), where n is the number of SIFT keypoints. However, the computational complexity of smooth-area detection in our test is O(k(m/k)³), where k is the number of color groups and m is the total size of the smooth area divided by the block size.

CONCLUSIONS AND RECOMMENDATIONS
We tested the impact of the Gaussian pyramid on the combination of SIFT and Zernike Moments in copy-move forgery detection. All results have values that are too low, in both precision and recall. The cause of this weakness is false detections that spread too wide, although they reach the right faked area.
Besides that, there are many undetected areas on some types of copy-move forgery.
After investigating this weakness, we found that the causes of the low accuracy are the effect of the Gaussian pyramid decomposition process, area filtering in Zernike Moments that is not deep enough, and clustering in post-processing that cannot produce accurate detection results. Besides these, the matching process lacks additional features such as a symmetrical-position check and sample checking of uncopied or original parts. We hope that future research can overcome these problems.

Figure 2
Figure 2 Controlling size after decomposition process

Figure 3
Figure 3 Zernike Moments detection from Zheng et al. (2016), which uses a bounding box in each segment.

Figure 4
Figure 4 Alignment of the main block (a), a sub-block (b), and a sub-block after calibration (c) (Sun et al., 2018)

The next step is counting the Zernike Moments with the V component at each main block and sub-block.

Figure 6
Figure 6 From left to right: Original image, plain copy-move image, ground truth, result map of the detection.

Figure 7
Figure 7 From left to right: Original image, copy-move with 10° rotation image, ground truth, result map of the detection.
Figure 8 Graphs of the rotation transformation detection test: (a) average accuracy, (b) average precision, (c) average recall, (d) average time

Figure 9 Graphs of the JPEG compression detection test: (a) average accuracy, (b) average precision, (c) average recall, (d) average time

Figure 11
Figure 11 Left: result with 768-pixel limit value; right: result with 960-pixel limit value

Figure 13
Figure 13 From left to right: Original image, copy-move with reflection attack result, ground truth, result map of the detection.

Table 4
Results in area separation ratio limit comparison test