Researchers from Salus Security, a blockchain security company with global offices, recently published a study assessing GPT-4's ability to parse and audit smart contracts. Their conclusion: although artificial intelligence (AI) shows promise in generating and analyzing code, it falls short as a security auditor.
The researchers used a dataset of 35 smart contracts, the SolidiFI-benchmark vulnerability library, which contains 732 vulnerabilities. Their aim was to assess GPT-4's ability to identify security weaknesses across seven common vulnerability types.
Their findings indicate that GPT-4 excels at flagging true positives, actual vulnerabilities that warrant further investigation, achieving over 80% precision during testing. However, it also produced a large number of false negatives, missing most of the real vulnerabilities, as evidenced by a recall rate of only 11% (higher recall is better).
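To make the gap between those two figures concrete, here is a minimal sketch using illustrative counts, chosen only to roughly line up with the reported rates and the 732-vulnerability benchmark; they are not the study's actual raw numbers.

```python
# Illustrative counts only, chosen to roughly match the reported rates and the
# 732-vulnerability benchmark; these are not the study's actual raw figures.
true_positives = 80    # real vulnerabilities GPT-4 flagged
false_positives = 19   # GPT-4 findings that were not real vulnerabilities
false_negatives = 652  # real vulnerabilities GPT-4 missed (732 - 80)

precision = true_positives / (true_positives + false_positives)  # ~0.81
recall = true_positives / (true_positives + false_negatives)     # ~0.11

print(f"precision: {precision:.0%}")  # share of flagged findings that are real
print(f"recall:    {recall:.0%}")     # share of real vulnerabilities it found
```

In other words, when the model does flag something it is usually right, but it stays silent on roughly nine out of ten real bugs, which is why a high-precision, low-recall tool cannot stand in for a full audit.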
The researchers therefore concluded that GPT-4's vulnerability detection capabilities are inadequate, with an overall accuracy of only 33%, and they recommend relying on dedicated auditing tools and human expertise for smart contract audits until AI systems like GPT-4 can be improved to meet the required standard.
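The write-up does not detail how the 33% figure was derived; under the textbook definition, accuracy counts every correct decision, including correctly ignored non-issues, which is why it can diverge sharply from precision. A hedged sketch of that definition, for reference only:

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Textbook accuracy: correct decisions over all decisions.
    Computing it requires a true-negative count, which the article does not
    report; precision and recall are the more informative numbers here."""
    return (tp + tn) / (tp + tn + fp + fn)
```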