Survey on Deep Learning Based Intrusion Detection System

The development of computer networks has changed human lives in many ways. Today, people are connected to each other from almost anywhere, and information can be accessed easily. This massive development has to be accompanied by a good security system. An Intrusion Detection System (IDS) is an important component of network security that is capable of monitoring hardware and software in a computer network. Researchers have continuously developed Intrusion Detection Systems and have faced many challenges, for instance low detection accuracy, the emergence of new types of malicious traffic, and detection error rates. Researchers have tried to overcome these problems in many ways. One of them is to use Deep Learning, a branch of Machine Learning, to develop Intrusion Detection Systems, which is the topic of this paper. Machine Learning itself is a branch of Artificial Intelligence that is currently growing rapidly. Several studies have shown that Machine Learning and Deep Learning provide very promising results for developing Intrusion Detection Systems. This paper presents an overview of Intrusion Detection Systems in general, the Deep Learning models often used by researchers, the available datasets, and the challenges that researchers will face ahead.


INTRODUCTION
Network security is one of the important areas in network implementation, particularly in a network which provides important services or contains important or crucial data. Network security has long been a research area for many organizations and scientists who conduct research in network development.
As networks develop rapidly, issues related to network security have also become more varied. According to (CyberEdge Group, 2020), attacks on enterprise networks have increased over the last five years. These attacks include ransomware, malware, denial of service (DoS), and advanced persistent threats.
The purpose of these attacks is generally to obtain, destroy, obscure, or alter information. There are many forms of network security implementation, ranging from AAA systems (Authentication, Authorization & Accounting), firewalls, routing filters, access control, Intrusion Prevention Systems, Intrusion Detection Systems, honeypots, and others. This paper focuses on the Intrusion Detection System (IDS).
According to (Bace & Mell, 2001), intrusion detection is a technique of monitoring activities and events which occur on a computer system or network and analyzing them for signs of intrusion. In recent years, with increasing computing speed, artificial intelligence has begun to develop rapidly, especially in the field of Machine Learning. Many studies can be found in journals regarding the implementation of Machine Learning-based IDS with various datasets. Deep Learning, which is a branch of Machine Learning, has also begun to be widely used to implement IDS. Many researchers are interested in it because deep learning algorithms attempt to act in a way similar to how humans think. This paper reviews essential matters related to research on the development of IDS based on deep learning.
http://dx.doi.org/10.35671/telematika.v14i2.1317
Several studies have reviewed deep learning-based IDS. For instance, (Berman et al., 2019) described deep learning methods used in cyber security, performance metrics of deep learning, and applications of deep learning-based IDS, and summarized studies in this area. (Ring et al., 2019) focused on describing datasets for network-based intrusion detection systems, giving a comprehensive view of IDS datasets. Similar work was done by (Hindy et al., 2020), who also studied datasets for intrusion detection.
(Drewek-Ossowicka et al., 2021) characterized the use of neural networks only for intrusion detection systems. In that study, the authors described deep learning methods, public datasets for IDS, and the results of several previous studies. (Gamage & Samarabandu, 2020) described the datasets and deep learning models, and compared and evaluated both of them. (Aldweesh et al., 2020) wrote an extensive paper about deep learning approaches for deploying anomaly-based intrusion detection systems. In detail, that work presents deep learning models, a comparison between deep learning and shallow models, the datasets, and a comparison of several previous studies in this area.
This study covers the deep learning models, datasets, frameworks, recent studies, and research challenges in this area.

RESEARCH METHODS
This review paper began with a brief literature review to obtain a general overview of the topic of Deep Learning-based Intrusion Detection Systems. After obtaining this general information, the authors prepared the paper structure. This was followed by raising several questions related to IDS based on Deep Learning.
Because this paper aims to help readers understand this topic from the ground up, especially readers who are still unfamiliar with IDS and Deep Learning, the questions raised also start from the basics. After the above questions were formed, the next step was the paper review itself. The search for research papers to answer Q4 was done using the keywords "intrusion detection" or "intrusion detection system" and "deep learning" ((intrusion detection OR intrusion detection system) AND deep learning). In addition, paper selection was limited to papers published no earlier than 2016. Google Scholar, a good paper search engine as well as a tool for indexing papers, was used as the main tool for conducting paper searches and also as a benchmark for recording the number of citations of each paper. After the required papers had been collected, the information contained in each paper was organized and compiled to answer the seven questions above. The final step in the review process was to write the search results into the paper.

Intrusion Detection System
The Intrusion Detection System concept was introduced by (Anderson, 1980). Various IDS products have been created over four decades of IDS development. During this development, various problems arose, for instance the high rate of false alarms, which are alerts sent when there is no dangerous traffic. False alarms increase the workload of the network security analyst. If the analyst keeps receiving false alarms, an actual attack may slip through unnoticed among them. Therefore, many studies on IDS focus on reducing the number of false alarms and increasing the ability to detect malicious traffic. On the other hand, traditional IDS is incapable of detecting unrecognized attacks. Network conditions and environments change very quickly, and the emergence of various new network technologies also gives rise to new types of attacks. Therefore, it is urgent to develop an IDS that can detect attacks that have not been seen before (unknown attacks).
To address the problems above, many researchers have started using Machine Learning as a method for developing intrusion detection systems. Machine Learning is an artificial intelligence technique which is able to dig out and discover valuable information from a huge dataset (Fulkerson et al., 1995). An IDS based on Machine Learning can achieve a high level of detection accuracy if sufficient datasets are available and the Machine Learning models are created using the right method. Machine Learning is easy for many people to learn because it does not depend on a single field of science. Furthermore, Deep Learning performs better than traditional Machine Learning methods in processing big data. The difference in characteristics between traditional Machine Learning and Deep Learning lies in how they do their calculations. Deep Learning uses several hidden layers in its processing, whereas traditional Machine Learning models typically have only one processing layer. Therefore, a traditional Machine Learning model is often termed a shallow model.
Generally, IDS can be classified by detection method into two types, Anomaly-based IDS and Misuse / Signature-based IDS (Aghdam & Kabiri, 2016). Anomaly-based IDS performs detection by comparing traffic with normal traffic that has been profiled beforehand. If traffic looks different from the usual traffic, it is detected as anomalous traffic, and the IDS sends a warning alert to the network administrator. On the other hand, Signature-based IDS compares all traffic with the attack signature database which has been prepared on the IDS. If traffic matches one or more entries in the signature database, the IDS sends an alert to the administrator. The advantages of Signature-based IDS are its low false alarm rate and its detailed reports. However, if a malicious traffic pattern does not exist in the IDS database, the IDS cannot detect it and will not report anything to the administrator.
Moreover, this type of IDS cannot detect new types or patterns of malicious traffic, so administrators must keep the IDS database up to date. Anomaly-based IDS attempts to deal with this weakness. The most important thing in preparing an Anomaly-based IDS is to define the normal traffic database in as much detail and with as many entries as possible, so that the IDS is able to recognize anomalous traffic patterns. If traffic from new protocols and services which are not in the IDS database appears on the network, it will raise a false alarm. In addition, this type of IDS cannot provide accurate and detailed reports to administrators because it only contains a normal traffic database.
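The contrast between the two detection methods can be illustrated with a minimal sketch. It assumes a hypothetical signature database of payload substrings and a single traffic feature (bytes per flow) whose normal range is learned from examples. The names `SIGNATURES`, `signature_detect`, `fit_baseline`, and `anomaly_detect`, the sample values, and the three-standard-deviation threshold are all illustrative assumptions, not part of any real IDS.

```python
from statistics import mean, stdev

# Hypothetical signature database: payload substrings that mark known attacks.
SIGNATURES = {"sql_injection": "' OR 1=1", "path_traversal": "../../"}

def signature_detect(payload):
    """Return the names of all known signatures found in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]

def fit_baseline(normal_values):
    """Learn a normal-traffic profile (mean and standard deviation)."""
    return mean(normal_values), stdev(normal_values)

def anomaly_detect(value, baseline, threshold=3.0):
    """Flag a value that deviates more than `threshold` std devs from normal."""
    mu, sigma = baseline
    return abs(value - mu) > threshold * sigma

# Assumed examples of normal traffic (bytes per flow).
normal_bytes_per_flow = [480, 510, 495, 505, 520, 490, 500, 515]
baseline = fit_baseline(normal_bytes_per_flow)

known_attack = signature_detect("GET /item?id=1' OR 1=1")          # matches a signature
unknown_attack = signature_detect("GET /item?id=<novel exploit>")  # no signature exists
oversized_flow = anomaly_detect(9000, baseline)                    # far from the profile
typical_flow = anomaly_detect(502, baseline)                       # within the profile
```

In this sketch the signature detector names the attack it found but stays silent on the novel payload, while the anomaly detector flags the unusual flow without being able to say what kind of attack it is.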
Based on the data source, IDS can also be classified into two types, Host-based IDS and Network-based IDS (Heberlein et al., 1989). Host-based IDS is implemented on a specific host and focuses on detecting malicious traffic entering that host. This type of IDS can perform detailed detection because it can monitor any suspicious activity on the system, whether in files, programs, or ports. On the other hand, Network-based IDS is placed on a network device or on a host which can access every activity on the network. This type of IDS focuses on checking every activity on the network, usually by considering traffic addresses in network communication. In Network-based IDS, data is usually divided into three types, namely packet data, flow data, and session data. Host-based IDS obtains its data by accessing logs.

Deep Learning Models

a. Restricted Boltzmann Machine (RBM)
The RBM model consists of a hidden layer and a visible layer. Units in the same layer are not connected to each other, and the model follows the Boltzmann distribution. Each neuron stores the weight computations which occur in every layer. Using randomly generated stochastic coefficients, nodes pass on the weighted inputs through a random process (Alrawashdeh & Purdy, 2017). RBM does not differentiate between the forward and backward directions because the same weights are used for both. RBM is an unsupervised learning model that is trained with the contrastive divergence algorithm (Deng et al., 2013) and is usually used to perform feature extraction and denoising.
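A minimal sketch of one-step contrastive divergence (CD-1) for a binary RBM is given here to make the description concrete. It is written in plain Python under simplifying assumptions: binary units, a toy two-pattern dataset, and an arbitrary learning rate. The function names and all numeric values are illustrative and are not taken from the cited works.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample(probs, rng):
    """Draw binary states from per-unit activation probabilities."""
    return [1 if rng.random() < p else 0 for p in probs]

def hidden_probs(v, W, b_h):
    # W[i][j] is the single shared weight between visible unit i and hidden unit j.
    return [sigmoid(b_h[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b_h))]

def visible_probs(h, W, b_v):
    # The same weights are reused in the backward (hidden -> visible) direction.
    return [sigmoid(b_v[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(b_v))]

def cd1_update(v0, W, b_v, b_h, rng, lr=0.1):
    """One CD-1 step: compare data statistics with one-step reconstruction statistics."""
    ph0 = hidden_probs(v0, W, b_h)
    h0 = sample(ph0, rng)
    v1 = sample(visible_probs(h0, W, b_v), rng)
    ph1 = hidden_probs(v1, W, b_h)
    for i in range(len(b_v)):
        for j in range(len(b_h)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b_v[i] += lr * (v0[i] - v1[i])
    for j in range(len(b_h)):
        b_h[j] += lr * (ph0[j] - ph1[j])

rng = random.Random(0)
n_visible, n_hidden = 6, 3
W = [[rng.gauss(0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
b_v = [0.0] * n_visible
b_h = [0.0] * n_hidden

# Toy unlabeled data: two binary patterns (an assumption for illustration).
data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]
for _ in range(200):
    for v in data:
        cd1_update(v, W, b_v, b_h, rng)

# The hidden activation probabilities act as extracted features.
features = hidden_probs(data[0], W, b_h)
```

No labels are used anywhere in the update, which is what makes the training unsupervised.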

b. Deep Belief Network (DBN)
The DBN model consists of several RBM layers (Xin et al., 2018) and a Softmax classification layer. The hidden layer of each RBM in the DBN is trained and then used again at the next training stage (Nadeem et al., 2016). DBN has two stages of training, unsupervised pretraining and supervised fine-tuning (Ranzato et al., 2009). DBN uses feature extraction and classification to detect attacks (Alrawashdeh & Purdy, 2017) (Zhao et al., 2017).

c. Deep Neural Network (DNN)
DNN is an algorithm formed from many interconnected layers and is known as end-to-end machine learning. In DNN, patterns are extracted from simple feature representations with limited prior knowledge. This deep learning model is widely used in cases that cannot be solved properly by traditional machine learning algorithms (Z. Wang, 2018). A model can be trained using this neural network to perform regression and classification.
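The layered structure can be sketched as a forward pass through a small fully connected network. This is a minimal illustration only: the layer sizes, the ReLU and softmax activations, the random untrained weights, and the input values are assumptions chosen for the example, and training is omitted.

```python
import math
import random

def dense(x, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(values):
    shift = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(n_in, n_out, rng):
    """Random (untrained) weights and zero biases for one layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(x, layers):
    """Pass x through every hidden layer (ReLU), then a softmax output layer."""
    for weights, biases in layers[:-1]:
        x = relu(dense(x, weights, biases))
    weights, biases = layers[-1]
    return softmax(dense(x, weights, biases))

rng = random.Random(42)
layer_sizes = [4, 8, 8, 2]  # 4 input features, two hidden layers, 2 classes
layers = [init_layer(n_in, n_out, rng)
          for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

flow_features = [0.2, 0.9, 0.1, 0.5]  # hypothetical, already normalized
class_probs = forward(flow_features, layers)  # e.g. P(normal), P(attack)
```

For regression, the softmax output layer would be replaced by a plain linear layer.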

d. Recurrent Neural Network (RNN)
RNN is an artificial neural network that is designed to handle sequential data. It is commonly used to perform Natural Language Processing (NLP) (Graves et al., 2013) (Graves & Jaitly, 2014).
Sequential data are contextual, which means that the elements of a sequence should not be analyzed separately. In order to collect contextual information, every RNN unit receives both its current input and its previous state. Data in the RNN model flows in one direction, from one hidden unit to the next. To address the limitations of the basic RNN, such as its difficulty with long-term dependencies, several researchers have developed variants of RNN, including long short-term memory (LSTM) (Hochreiter & Schmidhuber, 1997), gated recurrent unit (GRU) (Chung et al., 2014), and bi-RNN (Schuster & Paliwal, 1997).

e. Autoencoder
Autoencoder has two components, an encoder and a decoder. Features are extracted from the raw data by the encoder. These extracted features are then used by the decoder to reconstruct the data. During the training process, the difference between the encoder input and the decoder output gradually decreases. The dataset does not need to be labelled because Autoencoder is an unsupervised learning algorithm. If the decoder can successfully reconstruct the data from the extracted features, this indicates that the features extracted by the encoder capture the substance of the data. There are many well-known variants of autoencoders, such as the denoising autoencoder (Vincent et al., 2008) (Vincent et al., 2010) and the sparse autoencoder (Deng et al., 2013). Autoencoders are neural networks that are trained with back-propagation (H. Lee et al., 2007).
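The encoder and decoder idea can be sketched with a tiny linear autoencoder trained by gradient descent on unlabeled samples. The sketch makes several assumptions for illustration: linear layers without activation functions, a 4-feature input compressed to a 2-value code, toy "normal" data with a repeating structure, and an arbitrary learning rate. Using the reconstruction error to tell apart a sample that does not follow the learned structure is a common way to apply autoencoders to anomaly detection, and is shown here only as an example.

```python
import random

def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def reconstruction_error(x, w_enc, w_dec):
    """Squared difference between the input and the decoder output."""
    x_hat = matvec(w_dec, matvec(w_enc, x))
    return sum((a - b) ** 2 for a, b in zip(x_hat, x))

def train_step(x, w_enc, w_dec, lr=0.05):
    """One back-propagation step on the squared reconstruction error."""
    code = matvec(w_enc, x)        # encoder: extract features
    x_hat = matvec(w_dec, code)    # decoder: reconstruct the input
    err = [a - b for a, b in zip(x_hat, x)]
    # Gradient with respect to the code, computed before w_dec is changed.
    grad_code = [sum(err[i] * w_dec[i][k] for i in range(len(err)))
                 for k in range(len(code))]
    for i in range(len(err)):
        for k in range(len(code)):
            w_dec[i][k] -= lr * err[i] * code[k]
    for k in range(len(code)):
        for j in range(len(x)):
            w_enc[k][j] -= lr * grad_code[k] * x[j]

rng = random.Random(0)
n_features, n_code = 4, 2
w_enc = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)] for _ in range(n_code)]
w_dec = [[rng.uniform(-0.1, 0.1) for _ in range(n_code)] for _ in range(n_features)]

def normal_sample(rng):
    """Toy unlabeled 'normal' data: the last two features repeat the first two."""
    a, b = rng.random(), rng.random()
    return [a, b, a, b]

for _ in range(10000):
    train_step(normal_sample(rng), w_enc, w_dec)

normal_error = reconstruction_error([0.3, 0.7, 0.3, 0.7], w_enc, w_dec)
anomaly_error = reconstruction_error([1.0, 0.0, 0.0, 1.0], w_enc, w_dec)
```

After training, the sample that breaks the repeating structure is reconstructed less accurately than the sample that follows it.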

f. Convolutional Neural Network (CNN)
The CNN algorithm was designed to imitate the human visual system, which means that CNN performs very well on computer vision problems (Razavian et al., 2014) (Krizhevsky et al., 2012) (Lawrence et al., 1997). The CNN model consists of alternating convolution and pooling layers.
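A one-dimensional sketch of a single convolution layer followed by a pooling layer is shown here. The input signal, the hand-picked edge-detecting kernel, the ReLU activation, and the pooling size are assumptions for illustration; in a real CNN the kernel values are learned during training and the layers usually operate on two-dimensional inputs.

```python
def conv1d(signal, kernel):
    """Valid 1-D convolution (cross-correlation form, as used in CNNs)."""
    span = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(span))
            for i in range(len(signal) - span + 1)]

def relu(values):
    return [max(0, v) for v in values]

def max_pool1d(values, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]

signal = [0, 0, 1, 1, 1, 0, 0, 0]
kernel = [-1, 1]  # responds to a rising edge in the signal

feature_map = relu(conv1d(signal, kernel))  # [0, 1, 0, 0, 0, 0, 0]
pooled = max_pool1d(feature_map, size=2)    # [1, 0, 0]
```

The convolution layer detects where the pattern occurs, and the pooling layer reduces the size of the feature map while keeping the strongest response in each window.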

g. Generative Adversarial Network (GAN)
GAN is a framework for learning generative models (Goodfellow et al., 2014). A GAN model has two subnetworks, a generator and a discriminator. The generator produces synthetic data which is similar to real data, while the discriminator tries to differentiate between synthetic data and real data. Hence, the generator and the discriminator improve each other. GAN can learn a generative neural network that models the training data distribution without labels. The generative network converts random input vectors into outputs similar to the training data.
In GAN there is a separate discriminative network that tries to distinguish between actual training data and sample data generated by the generative network (H. Wang & Yu, 2019).
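The competition between the two subnetworks is commonly written as the minimax objective introduced in (Goodfellow et al., 2014). The symbols are notation introduced here for illustration: $G$ is the generator, $D$ is the discriminator, $p_{\text{data}}$ is the distribution of the real training data, and $p_z$ is the distribution of the random input vectors.

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator tries to maximize this value by assigning high probability to real samples and low probability to generated ones, while the generator tries to minimize it by producing samples that the discriminator accepts as real.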

Datasets
Understanding data is the foundation of deep learning methodologies. A dataset which is used for IDS must reflect the behavior of the host or network. Data sources for IDS generally come from packets, flows, sessions, and logs. Creating a dataset is not easy and takes a long time.
However, once a dataset has been successfully created, the researcher can share it, so that many other researchers can use it without having to collect the data themselves. Apart from convenience, there are other advantages of using public datasets. First, public datasets are reliable and recognized by other researchers, which means they produce trustworthy results. Second, many studies use these datasets, which allows new research results to be compared with previous research. It also makes it possible for other authors to cite results from the research in their papers.
Based on (Ring et al., 2019), there are 34 public datasets that can be used for Network-based IDS, which are summarized in Table 1. It can be seen from Table 1 that IDS datasets have been created since 2009, and there are also some more recent datasets provided by other authors. There are three common types of IDS dataset: emulated traffic generated by software, real traffic captured from a real network, and synthetic traffic which is artificially generated. Some datasets were created over hours, while others were created over days or even months. When deciding which dataset to use in research, it is better to consider the file size of the dataset because it determines how long a deep learning model has to be trained. Thus, it also determines how long the research will take, even though, according to (Liu & Lang, 2019), deep learning-based models perform better on big datasets.

Frameworks
TensorFlow is basically a library for numerical computations. It can run fast because its Python API is backed by a C/C++ engine. Theano is a library written in Python. A Machine Learning team from the University of Montreal created it to define, optimize, and evaluate multidimensional mathematical functions.
Keras is a framework written in Python which was developed to implement deep learning on top of Theano and TensorFlow. Keras provides an excellent neural network API for implementing deep learning algorithms. Because it is based on Theano and TensorFlow, this deep learning framework is widely used and offers a platform that is extensible, modular, and easy to use for Python users (Nweke et al., 2018). The Torch framework performs scientific computations and offers wide support for Machine Learning mechanisms. PyTorch is starting to be widely used for deep learning models and is considered a competitor to TensorFlow. PyTorch, which is developed at Facebook, builds on the Torch framework and is used for constructing deep neural networks. Caffe was developed for computer vision and machine learning. It was created by the Berkeley Vision and Learning Center together with community contributors. Its architecture is expressive, modular, and fast, so it can be used to design algorithms in a modular way. Currently, deep learning acceleration using Caffe is supported by NVIDIA GPUs.
cuDNN stands for CUDA DNN. It is a GPU-accelerated library for DNN. It provides highly tuned implementations of standard routines such as forward and backward convolution, normalization, pooling, and activation layers. DIGITS is a web-based tool for deep learning development which was created by NVIDIA. DIGITS uses text files to set parameters and networks. DIGITS is capable of network visualization and visualization of the learning process, and has extensive GPU support (Erickson et al., 2017).
MXNet is a framework for deep learning which was built using C++ with multiple language bindings. It also supports distributed computing with multiple GPUs. The efficiency of MXNet is on par with that of TensorFlow and Caffe (Erickson et al., 2017). CNTK was released in 2015 by Microsoft Research and is characterized as a Visual Studio for Machine Learning. This framework is much easier to use for developers who have been using Visual Studio for programming. It is not used as widely as the other well-known frameworks because it is relatively new (Nweke et al., 2018).

Recent Research
In Table 2, the authors have compiled several papers in this research area. In Table 2 [...] the authors believe that computing speed will become faster over time, so in the end this will not be an obstacle anymore.
The next challenge is that researchers must also understand that many network equipment vendors already provide Intrusion Detection and Intrusion Prevention features on their devices, which also have high accuracy and can automatically update their (signature-based) database from the vendor server. Although these vendors may not take part in research publications, researchers must be aware of these tools so that the research conducted remains relevant to the needs of science and technology.
The final challenge is how to put research results into practice by combining existing models with the IDS tools in use while making regular improvements to the model.

CONCLUSIONS AND RECOMMENDATIONS
The beginning of this paper explains the urgency of developing research in the field of networks. A total of 34 datasets are presented briefly in Table 1. The authors hope that this paper can provide a brief overview of research in the field of IDS based on deep learning, so that researchers who are interested in conducting research in this area can study research papers both in leading journals and among the references of this paper.