WVU Med prof Hu explores the possibilities of Code Interpreter in bioinformatics

Code Interpreter, the newest official plugin for ChatGPT, didn’t make the grade with West Virginia University researchers who put it to the test using biological data. However, the team was impressed with some of the tool’s added features, which they expect to enhance classroom learning experiences. (WVU Illustration/Aira Burkhart)

Researchers at West Virginia University have discovered limitations in the use of the latest official ChatGPT plugin, called Code Interpreter, by scientists who work with biological data and use computational methods to prioritize targeted treatment for cancer and genetic disorders. However, they believe that there is potential for its use in educational settings. 

“Code Interpreter is a good thing and it’s helpful in an educational setting as it makes coding in the STEM fields more accessible to students,” said Gangqing “Michael” Hu, assistant professor in the Department of Microbiology, Immunology and Cell Biology at the WVU School of Medicine and director of the Bioinformatics Core. “However, it doesn’t have the features you need for bioinformatics. These are technical issues that can be overcome. Future developments of Code Interpreter are likely to extend its use to many fields such as bioinformatics, finance, and economics.”

Since its release in November 2022, the popular artificial intelligence chatbot ChatGPT has gained the attention of businesses, educators, and the general public. However, it didn’t quite meet the needs of people working in biomedical research, including bioinformatics (the field where computer science meets biology), who had eagerly awaited OpenAI’s Code Interpreter plugin in the hope that it would fill the gaps.

Hu and his team put Code Interpreter to the test on a variety of tasks to evaluate its features. Their findings, published in Annals of Biomedical Engineering, show the plugin breaks down some of the barriers, but not all of them.

For example, people without a science background will have easier access to coding, or computer programming, with Code Interpreter. Hu said it’s also cost-effective, sparks students’ curiosity to explore data analysis, and boosts their interest in learning. He points out, though, that users will need to understand how to interpret data, recognize whether the results are accurate, and know how to interact with the chatbot.

Bioinformaticians rely on precise coding, computer software programs, and internet access to store, analyze, and interpret biological data, such as DNA sequences and the human genome, that are used for advancements in modern medicine.

Despite the need for improvements specific to bioinformatics, Hu said, Code Interpreter helps users determine whether a response is accurate or if it is a fictitious answer presented with confidence, known as a hallucination. 

“People know that ChatGPT can do many impressive things, but it is not good at providing a citation or reference to support its answer. If it is asked about the source to support the claim of a response, it may start to make up references,” Hu explained. “Code Interpreter provides a solution to minimize hallucinations. For questions that can be addressed through coding, the code itself serves as the source or citation. That is a significant step forward.”
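To illustrate what it means for code to serve as the citation, here is a minimal sketch of the kind of short, checkable script a chatbot might return for a simple sequence question. The function name `gc_content` and the example sequence are hypothetical and are not taken from the study; the point is only that a reader can rerun the calculation rather than take the answer on trust.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    # Count every base that is either G or C.
    gc = sum(1 for base in seq if base in "GC")
    return gc / len(seq)


# Hypothetical example sequence, made up for illustration only.
example = "ATGCGCGATTACGCTA"
print(f"GC content: {gc_content(example):.2%}")
```

Because the answer comes with the code that produced it, anyone can verify it by counting the bases or by running the script on a sequence whose composition is already known.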

Working with Hu were Lei Wang, a postdoctoral fellow in the WVU Department of Microbiology, Immunology and Cell Biology; Xijin Ge, of South Dakota State University; and Li Liu, of Arizona State University.

The team found positive results in the Code Interpreter’s ability to convert data to charts and graphs.

Suggestions for upgrades to Code Interpreter include internet access for downloading genome data, installation of software specific to bioinformatics, expansion of storage capacity, and support for additional programming languages. In addition, researchers found a need for privacy and security applications to comply with regulations such as HIPAA. 

In testing data analysis, they discovered several limitations. The plugin supports only one programming language, Python, and few of its software packages are dedicated to bioinformatics. In addition, it doesn’t allow access to internet data and cannot work with large files.

“It allows for 100 megabytes or so, but the files we’re handling are at a gigabyte level,” Hu said. “Also, it doesn’t support parallel processing needed for large datasets, which results in slow performance.”
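The size gap Hu describes can be made concrete with a little arithmetic. The sketch below is an illustration only: the limit is an assumption based on the quoted “100 megabytes or so,” not a documented figure, and the helper names `fits_upload_limit` and `chunks_needed` are hypothetical.

```python
import os

# Assumption: taken from the quoted "100 megabytes or so"; not an official limit.
ASSUMED_LIMIT_BYTES = 100 * 1024 * 1024


def fits_upload_limit(path: str, limit_bytes: int = ASSUMED_LIMIT_BYTES) -> bool:
    """Return True if the file at `path` is no larger than the assumed limit."""
    return os.path.getsize(path) <= limit_bytes


def chunks_needed(size_bytes: int, limit_bytes: int = ASSUMED_LIMIT_BYTES) -> int:
    """Return how many pieces of at most `limit_bytes` a file of this size needs."""
    # Ceiling division without floating point.
    return -(-size_bytes // limit_bytes)


# A hypothetical 3-gigabyte genomics file would need dozens of pieces.
three_gib = 3 * 1024 ** 3
print(chunks_needed(three_gib))
```

Under these assumptions, a single gigabyte-scale file would have to be split into many pieces before it could be uploaded, which shows why the limit matters for routine bioinformatics work.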

Hu said that while he anticipates more upgrades for Code Interpreter, he plans to help students learn more about the advantages of the current plugin.

“In my class next spring, I plan to introduce this plugin to help students learn about data visualization,” Hu said. “AI is a fast-moving field. I hope by that time OpenAI may overcome some of the limitations so it can be used for a broad range of bioinformatics coding.” 

Earlier this year, Hu led another study to prepare high school and college students to harness the power of ChatGPT by learning more about coding. The process employed OPTIMAL — Optimization of Prompts Through Iterative Mentoring and Assessment — to improve communication with a chatbot.

In the long run, Hu said he will continue to monitor and test new AI programming and features.

“As new products develop, I’ll just keep going,” Hu said. “There are certainly many other innovative uses awaiting to be discovered.”

Overall, the ChatGPT plugin, Code Interpreter, has the potential to be a powerful tool for students, researchers, and medical professionals alike. It can provide a more efficient and accurate way to interpret data and can be used in a variety of fields, including education, biology, and health. While there are some potential drawbacks, such as the need for additional training and the potential for errors, the benefits of this technology far outweigh the risks. With the right implementation, ChatGPT can be a valuable resource for those in the scientific and medical communities.