C Source Code Generation from IR towards Making Bug Samples to Fluctuate for Machine Learning

Yuto Sugawa, Chisato Murakami, Mamoru Ohara

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

In recent years, machine learning (ML) techniques have become popular for detecting software bugs. However, a common challenge in ML-based bug detection arises from the unequal distribution of correct and incorrect training data. Specifically, incorrect data (containing bugs) is scarce compared to the abundance of correct data, which negatively impacts ML model performance. To address this issue, researchers suggest artificially injecting bugs into correct samples. In addition to an equal distribution of samples, the diversity of training data significantly affects ML performance. Our work focuses on generating various incorrect samples stemming from a single root cause (bug). Specifically, we plan to inject bugs into LLVM IR code and translate it into source code written in high-level programming languages. For diversity, we use probabilistic language models in the translator. In this paper, we present an IR-to-C translator using seq2seq and explore the resulting diversity of generated samples.
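The role of a probabilistic model in producing diverse outputs can be illustrated with a minimal sketch. This is not the authors' seq2seq translator: the table `TOY_LOGITS`, the opcode names, and the helper functions below are hypothetical stand-ins, assuming only that a translator assigns scores to several equivalent C renderings of the same IR operation and samples among them. A higher sampling temperature spreads the choices across more renderings; a temperature near zero collapses to the single most likely one.

```python
import math
import random

# Hypothetical toy "model": for each IR operation, candidate C renderings
# with made-up scores (logits). A real translator would compute these.
TOY_LOGITS = {
    "add": {"a + b": 2.0, "b + a": 1.0, "a - (-b)": 0.2},
    "icmp_slt": {"a < b": 2.0, "b > a": 1.5, "!(a >= b)": 0.5},
}


def softmax(logits, temperature):
    """Convert logits to probabilities, scaled by a sampling temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_rendering(opcode, temperature, rng):
    """Draw one C rendering for an opcode according to the toy distribution."""
    candidates = list(TOY_LOGITS[opcode])
    probs = softmax([TOY_LOGITS[opcode][c] for c in candidates], temperature)
    return rng.choices(candidates, weights=probs, k=1)[0]


def distinct_renderings(opcode, temperature, n, seed=0):
    """Collect the set of distinct renderings seen over n samples."""
    rng = random.Random(seed)
    return {sample_rendering(opcode, temperature, rng) for _ in range(n)}
```

Calling `distinct_renderings("add", 0.01, 50)` and `distinct_renderings("add", 5.0, 200)` contrasts near-greedy decoding with high-temperature sampling; the size of the returned set is one crude way to compare diversity under the two settings.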

Original language: English
Title of host publication: Proceedings - 2024 IEEE 35th International Symposium on Software Reliability Engineering Workshops, ISSREW 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 321-328
Number of pages: 8
ISBN (Electronic): 9798350367041
Publication status: Published - 2024
Event: 35th IEEE International Symposium on Software Reliability Engineering Workshops, ISSREW 2024 - Tsukuba, Japan
Duration: 28 Oct 2024 to 31 Oct 2024

Publication series

Name: Proceedings - 2024 IEEE 35th International Symposium on Software Reliability Engineering Workshops, ISSREW 2024

Conference

Conference: 35th IEEE International Symposium on Software Reliability Engineering Workshops, ISSREW 2024
Country/Territory: Japan
City: Tsukuba
Period: 28/10/24 to 31/10/24
