Purdue University drug discovery researchers have created a new data mining framework for training artificial intelligence and machine learning models.

The software, known as Lemon, helps researchers mine the Protein Data Bank (PDB), which hosts data on more than 140,000 biomolecular structures, in about six minutes.

A key challenge in using machine learning for drug development is creating a process by which a computer can extract the needed information from a data pool.

Drug scientists must pull biological data and train the software to understand how a typical human body will interact with the combinations that come together to form a medication.

Purdue College of Science assistant professor Gaurav Chopra said: “It can take an enormous amount of time to sort through all the accumulated data. Machine learning can help, but you still need a strong framework from which the computer can quickly analyze data to help in the creation of safe and effective drugs.”

Lemon’s fast C++11 library with Python bindings means it can mine the PDB with exceptional speed. Loading all traditional mmCIF files in the PDB typically takes around 290 minutes, but Lemon does this in about six minutes when applying a simple workflow on an 8-core machine.
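To put those figures in perspective, the short Python sketch below works out the implied speedup and the average time per structure. It uses only the numbers quoted in this article (290 minutes, six minutes, roughly 140,000 structures); the function names are illustrative and are not part of Lemon.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the article.
# The helper names are hypothetical; nothing here calls Lemon itself.

BASELINE_MINUTES = 290   # loading all mmCIF files the traditional way
LEMON_MINUTES = 6        # simple Lemon workflow on an 8-core machine
STRUCTURES = 140_000     # approximate number of structures in the PDB


def speedup(baseline_minutes: float, improved_minutes: float) -> float:
    """Return how many times faster the improved run is."""
    return baseline_minutes / improved_minutes


def ms_per_structure(minutes: float, count: int) -> float:
    """Return the average wall-clock time per structure in milliseconds."""
    return minutes * 60_000 / count


if __name__ == "__main__":
    factor = speedup(BASELINE_MINUTES, LEMON_MINUTES)
    print(f"Overall speedup: about {factor:.0f}x")
    print(f"Traditional loading: {ms_per_structure(BASELINE_MINUTES, STRUCTURES):.1f} ms per structure")
    print(f"Lemon workflow: {ms_per_structure(LEMON_MINUTES, STRUCTURES):.1f} ms per structure")
```

The implied speedup is roughly 48 times, which is well above the eight cores mentioned, so parallelism alone would not appear to account for the difference. The article does not say how the 290-minute baseline was measured, so this comparison should be read as indicative only.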

Lemon also lets users write custom functions for use in their own software suites, and develop those functions in a standard manner to generate unique benchmarking datasets for the entire scientific community.

Purdue PhD chemistry student Jonathan Fine, who worked on the platform, said: “Experimental structures deposited in PDB have resulted in several advances for structural and computational biology scientific and education communities that help advance drug development and other areas.

“We created Lemon as a one-stop-shop to quickly mine the entire data bank and pull out the useful biological information that is key for developing drugs.”

Lemon was originally designed to create benchmarking sets for drug design software and to identify biomolecular interactions in the PDB that cannot be modeled well, which are known as lemons.
