Data Poisoning Attack

arikkl
Feb 26, 2023

Updated: Mar 2, 2023

Many armies are developing artificial intelligence systems that help their combatants automatically detect enemy armored vehicles and immediately neutralize the threat. These systems are trained on images and videos of the vehicles, along with labels identifying the vehicles. Now imagine that an enemy carried out a cyber-attack against such a system and managed to disrupt its detection capabilities. What would happen if the attacker had poisoned the images used to train the classifier, causing the system to identify "friendly" armor as enemy forces?

Background

An artificial intelligence system learns from examples, so its inference accuracy is closely tied to the quality of the examples it is trained on. A system trained on inaccurate data (which may have been corrupted on purpose) will draw erroneous conclusions, degrading the quality of the whole system.

For example, suppose you are developing a machine-learning model that classifies images of dogs and cats, and you train it on a labeled dataset of images. If the images are mislabeled, i.e., dogs are labeled as cats and vice versa, the model's classification performance will be very poor.
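
To make the dog/cat example concrete, here is a minimal Python sketch using only the standard library. It is a toy, not a real image classifier: the "images" are assumed to be synthetic 2-D feature vectors (cats around (0, 0), dogs around (3, 3)), the model is a simple nearest-centroid classifier, and all function names are hypothetical. Swapping every cat and dog label, as described above, inverts the rule the model learns.

```python
import random

def make_data(n, seed):
    """Synthetic 2-D 'features': cats (label 0) around (0, 0), dogs (label 1) around (3, 3)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 3.0 * label
        data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return data

def flip_labels(data, fraction, seed=0):
    """Poison the dataset: swap the label (cat <-> dog) of a random fraction of examples."""
    rng = random.Random(seed)
    return [(x, 1 - y) if rng.random() < fraction else (x, y) for x, y in data]

def train_centroids(data):
    """Nearest-centroid 'model': the mean feature vector of each label."""
    sums = {0: [0.0, 0.0, 0], 1: [0.0, 0.0, 0]}
    for (a, b), y in data:
        sums[y][0] += a
        sums[y][1] += b
        sums[y][2] += 1
    return {y: (s[0] / s[2], s[1] / s[2]) for y, s in sums.items()}

def predict(model, x):
    """Assign the label whose centroid is closest to x."""
    def dist2(c):
        return (x[0] - c[0]) ** 2 + (x[1] - c[1]) ** 2
    return min(model, key=lambda y: dist2(model[y]))

def accuracy(model, data):
    return sum(predict(model, x) == y for x, y in data) / len(data)

train = make_data(1000, seed=1)
test = make_data(500, seed=2)  # clean, correctly labeled evaluation set

clean_acc = accuracy(train_centroids(train), test)
poisoned_acc = accuracy(train_centroids(flip_labels(train, 1.0)), test)
print(f"clean: {clean_acc:.2f}  fully flipped: {poisoned_acc:.2f}")
```

With this particular model, flipping a random minority of labels in both classes barely moves the decision boundary, because the two centroids shift symmetrically; it is the systematic swap that inverts it. Real models and targeted poisoning behave differently, so treat this only as an illustration.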

How is a Data Poisoning Attack Executed?

Public dataset poisoning - Artificial intelligence development platforms may import training data from external sources, such as free or paid public data repositories. Kaggle, Google Dataset Search, Datahub, and others are examples of free sources that offer datasets on a wide variety of topics. The benefits of open, free data sources are clear: the datasets cost nothing, and because they are usually maintained by a community (crowd wisdom), they are assumed to be of good quality. To poison such a dataset, the attacker may exploit vulnerabilities in the way the dataset is protected. In addition, the attacker may become a contributing member of the community and use that position to poison the dataset.

Poisoning the data during transmission (in transit) - If the dataset is transferred from external repositories into the organization insecurely, an attacker may poison it by changing, inserting, or removing data, either by interfering with the transmission or by impersonating a legitimate data source.

Attacking the dataset stored in the cloud - Artificial intelligence systems are trained on large volumes of data. Many organizations find it difficult to keep that much data in internal storage systems and choose to store it in the cloud. Storing the data in the cloud and connecting enterprise systems to it creates a significant attack surface, which attackers may use to poison the data. Moreover, many studies show that data breaches stem mainly from cloud misconfiguration and from an unclear division of security responsibilities between the organization and the cloud service provider.

How To Prevent Data Poisoning Attacks?

Data Management Policy (Data Governance) - As part of the organization's data management policy, guidelines that ensure the use of high-quality data must be defined. These guidelines should specify the controls to be implemented throughout the entire data life cycle. In the context of preventing data poisoning, the main areas of focus include the following questions:

What is the source of the data? What are the legitimate data sources?
What is the policy for updating the data? Who has permission to update it, and how often is it updated?
What monitoring alerts are raised when the data is updated?
Is it known and clear what the datasets contain?
Is the data accurate? How can its accuracy be verified?
How will an "insider threat" that poisons the data be identified?
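
Several of these questions (monitoring updates, spotting an insider who quietly edits records) can be partly answered with a content manifest: record a cryptographic digest of every file in a vetted dataset, then compare later snapshots against it. The sketch below assumes the dataset is a directory of files; `build_manifest` and `diff_manifests` are hypothetical helpers, not part of any particular tool, and the file names in the demo are made up.

```python
import hashlib
import tempfile
from pathlib import Path

def build_manifest(root):
    """Map each file's path (relative to root) to the SHA-256 digest of its contents."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest

def diff_manifests(old, new):
    """Report files added, removed, or modified between two snapshots of a dataset."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return {"added": added, "removed": removed, "modified": modified}

# Demo on a throwaway directory standing in for a vetted dataset.
with tempfile.TemporaryDirectory() as d:
    Path(d, "cats.csv").write_text("img1,cat\n")
    Path(d, "dogs.csv").write_text("img2,dog\n")
    baseline = build_manifest(d)                   # recorded when the data was vetted
    Path(d, "cats.csv").write_text("img1,dog\n")   # a label silently changed
    Path(d, "extra.csv").write_text("img3,cat\n")  # an unexpected new file
    report = diff_manifests(baseline, build_manifest(d))

print(report)  # {'added': ['extra.csv'], 'removed': [], 'modified': ['cats.csv']}
```

The baseline manifest only helps if it is stored where the people and systems that can modify the dataset cannot also rewrite it; every reported change should then be traced back to an authorized update.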

Securing the data transfer process into the organization - The system administrator should ensure that data is transferred securely, preferably without any intermediary third party that could compromise the security of the system. Implementing encryption in transit and integrity checks is recommended.
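
An integrity check can be as simple as comparing the SHA-256 digest of the received file with a digest recorded when the data was vetted, or published by the provider over a separate trusted channel. The Python sketch below assumes such a pinned digest exists; `sha256_of_file` and `verify_dataset` are hypothetical names. Encryption in transit (for example, downloading over HTTPS) protects the data on the wire, while the digest check catches any change made after the digest was recorded.

```python
import hashlib
import hmac
import os
import tempfile

def sha256_of_file(path, chunk_size=1 << 20):
    """Hash the file in chunks so large datasets need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_dataset(path, expected_sha256):
    """Raise ValueError unless the file matches the pinned SHA-256 digest."""
    actual = sha256_of_file(path)
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise ValueError(f"integrity check failed for {path}: got {actual}")
    return True

# Demo: a small file stands in for a downloaded dataset.
content = b"img1,cat\nimg2,dog\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as f:
    f.write(content)
    path = f.name
pinned = hashlib.sha256(content).hexdigest()  # digest recorded when the data was vetted

ok = verify_dataset(path, pinned)
with open(path, "ab") as f:
    f.write(b"img3,cat\n")  # simulate tampering in transit
rejected = False
try:
    verify_dataset(path, pinned)
except ValueError:
    rejected = True
os.remove(path)
print(ok, rejected)
```

A digest check is only as trustworthy as the digest itself: if an attacker can replace both the dataset and the published checksum, the check passes, which is why the expected value should come from a separate, trusted channel.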

Securing the data stored in the cloud - As mentioned above, most data breaches in the cloud result from poor cloud security implementation or misconfiguration. Data protection in the cloud must implement the data management policy described above. In addition, data protection best practices, such as those required by HIPAA and GDPR, should be implemented. Special attention should be given to security controls that prevent external attackers from accessing the buckets where the dataset is stored.
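
One misconfiguration worth checking for is a bucket policy that grants access to everyone. The sketch below parses a policy in the AWS S3 bucket-policy JSON format and flags `Allow` statements whose principal is `"*"`. It is illustrative only: the function name, bucket name, and account ID are made-up examples, and the check ignores `Condition` blocks, object ACLs, and account-level settings such as S3 Block Public Access, so a flagged statement is a prompt for review rather than a verdict.

```python
import json

def public_allow_statements(policy_json):
    """Return the Allow statements in a bucket policy that grant access to any principal ("*")."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]
    risky = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        if principal == "*":
            risky.append(stmt)
        elif isinstance(principal, dict):
            aws = principal.get("AWS")
            aws_list = aws if isinstance(aws, list) else [aws]
            if "*" in aws_list:  # {"AWS": "*"} also means "anyone"
                risky.append(stmt)
    return risky

# Hypothetical policy for a bucket holding training data.
example_policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "PublicRead", "Effect": "Allow", "Principal": "*",
         "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-training-data/*"},
        {"Sid": "TeamWrite", "Effect": "Allow",
         "Principal": {"AWS": "arn:aws:iam::111122223333:role/ml-data-team"},
         "Action": "s3:PutObject", "Resource": "arn:aws:s3:::example-training-data/*"},
    ],
})

for stmt in public_allow_statements(example_policy):
    print("review:", stmt.get("Sid"), stmt.get("Action"))
```

Public write actions such as `s3:PutObject` would let anyone poison the dataset directly, while public read access exposes the data itself; both deserve attention when reviewing flagged statements.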

Conclusion

Data poisoning attacks can be prevented. A clear and applicable data governance policy, the use of reliable and accurate data, and tight cyber protection will dramatically reduce the risk.

CyberIN provides strategic consulting in the fields of emerging technologies and cyber threats for governmental ministries and defense organizations. If you have any questions about our services, please don't hesitate to contact us.
