Molactivity 3.0 Instructions

Advanced Molecular Activity Prediction Toolkit - A multi-mode molecular property prediction platform based on Transformer neural networks.

1. Introduction

Molactivity is an advanced molecular activity prediction toolkit that provides five different implementation modes, ranging from pure Python implementations to GPU-accelerated deep learning methods. The toolkit supports two primary input types: SMILES strings (the standard, fast, and rocket modes) and molecular images (the D-series image mode). Additionally, Molactivity provides several utility tools (the E series), such as SMILES structure analysis (E1_structure_analysis), conversion of SMILES to images (E2_smiles_to_images), and calculation of molecular weight from SMILES (E3_molecular_mass).
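
For example, the molecular-mass tool (E3_molecular_mass) conceptually amounts to parsing each SMILES string and looking up its molecular weight. The snippet below is a minimal illustrative sketch of that idea using RDKit directly; the actual tool's interface and output format may differ.

# Illustrative sketch only: molecular weight from SMILES via RDKit,
# mirroring what E3_molecular_mass does conceptually.
from rdkit import Chem
from rdkit.Chem import Descriptors

for smi in ["CCO", "c1ccccc1", "CC(C)O"]:
    mol = Chem.MolFromSmiles(smi)   # returns None for invalid SMILES
    if mol is None:
        print(f"{smi}: invalid SMILES")
        continue
    print(f"{smi}: molecular weight = {Descriptors.MolWt(mol):.2f}")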

1.1 Project Highlights

  • Multi-mode Architecture: Five different implementation modes, from pure Python to GPU acceleration
  • Transformer Core: Molecular activity prediction based on attention mechanisms
  • Flexible Deployment: Support for various deployment scenarios from lightweight to high-performance
  • Chemical Intelligence: Professional molecular fingerprints and chemical feature extraction
  • Complete Workflow: Complete solution for training, evaluation, prediction, and analysis

1.2 Five Modes Overview

Mode          | Series   | Description                | Use Cases                   | Tech Stack
Standard Mode | A Series | Pure Python Implementation | Education, Resource-limited | Pure Python
Fast Mode     | B Series | NumPy Optimization         | Medium-scale Data           | Python + NumPy
Rocket Mode   | C Series | PyTorch Deep Learning      | Large-scale Training        | PyTorch + GPU
Image Mode    | D Series | Molecular Image Processing | Visual Analysis             | CNN + Vision
Tools Mode    | E Series | Analysis Toolkit           | Data Processing             | RDKit + Tools

2. Standard Mode

Standard mode is user-friendly and does not require the installation of any third-party libraries. Users only need to install Anaconda and run the program using Spyder. This mode uses the CPU for training and does not require a GPU. Standard mode is further divided into three sub-modes: training, evaluation, and prediction.

2.1 Training in Standard Mode

This mode offers two options: training a new model from scratch or loading an existing model to continue training. Additionally, when training multiple models, users can choose between sequential training and parallel training. Theoretically, parallel training reduces the total training time.

Users can open run_train_standard.py in Spyder and configure the training parameters inside.

Training Speed Example:

For 96 SMILES data points, sequentially training 3 models with 2 epochs each takes approximately 132 seconds, averaging about 22 seconds per epoch.
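
Conceptually, training several models in parallel just means launching one independent training job per model instead of running the jobs one after another. The sketch below illustrates the principle with Python's multiprocessing module; train_one_model is a hypothetical placeholder, not Molactivity's actual API.

# Sketch of sequential vs. parallel training of several models.
# train_one_model is a hypothetical placeholder, not part of Molactivity.
from multiprocessing import Pool

def train_one_model(model_index):
    # ... build one model, loop over its epochs, save its weights ...
    return f"model_{model_index} trained"

if __name__ == "__main__":
    model_indices = [0, 1, 2]

    # Sequential: total time is roughly the sum of the individual training times.
    sequential_results = [train_one_model(i) for i in model_indices]

    # Parallel: with enough CPU cores, total time approaches that of the slowest single job.
    with Pool(processes=len(model_indices)) as pool:
        parallel_results = pool.map(train_one_model, model_indices)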

2.2 Evaluation in Standard Mode

This mode evaluates trained models and requires SMILES data and corresponding true activity labels. When evaluating multiple models, users can choose between sequential evaluation and parallel evaluation. Theoretically, parallel evaluation reduces the total evaluation time. By default, evaluation results are saved to evaluating_dataset_with_predictions.csv.

Users can open run_evaluate_standard.py in Spyder and configure evaluation parameters, such as setting the output file name.
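
After an evaluation run, the saved CSV can be inspected with pandas to compare predicted and true activities. This is only a sketch; the column names in evaluating_dataset_with_predictions.csv are assumptions and may differ in the real output.

# Sketch: comparing predicted vs. true activity from the evaluation output.
# The column names 'Activity' and 'Prediction' are assumptions for illustration.
import pandas as pd

df = pd.read_csv("evaluating_dataset_with_predictions.csv")
accuracy = (df["Activity"] == df["Prediction"]).mean()
print(f"Accuracy on {len(df)} molecules: {accuracy:.3f}")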

2.3 Prediction in Standard Mode

This mode uses trained models to predict the activity of unknown SMILES data. It only requires SMILES input and does not need true activity labels. When using multiple models for prediction, users can choose between sequential prediction and parallel prediction. Theoretically, parallel prediction reduces the total prediction time. By default, prediction results are saved to predicting_dataset_with_predictions.csv.

Users can open run_predict_standard.py in Spyder and configure prediction parameters, such as setting the output file name.

Prediction Speed Example:

Predicting 96 SMILES data points using 3 models in parallel takes approximately 13 seconds.
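
The prediction output can likewise be loaded with pandas, for example to list the molecules predicted as active. The prediction column name below is an assumption for illustration.

# Sketch: selecting molecules predicted as active from the prediction output.
import pandas as pd

df = pd.read_csv("predicting_dataset_with_predictions.csv")
active = df[df["Prediction"] == 1]   # 'Prediction' is an assumed column name
print(f"{len(active)} of {len(df)} molecules predicted active")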

3. Fast Mode

Similar to the standard mode, the fast mode also includes three sub-modes: training, evaluation, and prediction. The main difference is that fast mode uses NumPy to vectorize the computations, providing roughly a 3-5x performance improvement over standard mode while retaining the flexibility of plain Python.
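
The speedup comes from replacing Python-level loops with vectorized NumPy array operations. The toy example below illustrates the principle on one dense layer applied to a batch of fingerprint-like vectors; it is not Molactivity's actual code, and the sizes are kept small so the loop version finishes quickly.

# Toy illustration of element-wise loops vs. a vectorized NumPy operation.
import numpy as np

rng = np.random.default_rng(0)
fingerprints = rng.random((96, 256))   # batch of 96 molecules, 256 features
weights = rng.random((256, 64))        # one dense layer with 64 hidden units

# Pure-Python style: explicit loops over every element (slow).
def dense_loops(x, w):
    rows, inner, cols = x.shape[0], x.shape[1], w.shape[1]
    out = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                out[i, j] += x[i, k] * w[k, j]
    return out

# NumPy style: a single vectorized matrix multiplication (fast).
slow = dense_loops(fingerprints, weights)
fast = fingerprints @ weights
assert np.allclose(slow, fast)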

4. Rocket Mode

Similar to the standard and fast modes, the rocket mode includes training, evaluation, and prediction functionalities. This mode is built on the PyTorch deep learning framework and is designed for high-performance computing; a capable GPU is required to reach its maximum processing speed.
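
In practice, GPU acceleration in PyTorch follows the usual pattern of selecting a device and moving the model and data onto it. The snippet below is a generic sketch of that pattern, not Molactivity's actual model code.

# Generic PyTorch device-selection pattern (sketch only, not Molactivity's model).
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Running on: {device}")

model = nn.Sequential(nn.Linear(2048, 64), nn.ReLU(), nn.Linear(64, 2)).to(device)
batch = torch.rand(32, 2048, device=device)   # a batch of fingerprint-like inputs
logits = model(batch)                         # forward pass runs on the GPU when available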

5. Development Environment Setup

5.1 Recommended Environment

For the best experience with Molactivity, we strongly recommend using:

  • Anaconda: Comprehensive Python distribution with package management
  • Spyder IDE: Scientific Python development environment
  • Python 3.8+: Core language requirement

5.2 Installation

Basic Installation (Standard Mode)

pip install molactivity

Complete Installation (All Modes)

pip install molactivity[all]

6. Step-by-Step Usage Guide

6.1 Preparing Your Data

Training Data Format (train_sample.csv)

SMILES,Activity
CCO,1
CCN,0
c1ccccc1,1
CCC,0
CC(C)O,1

Prediction Data Format (predict_sample.csv)

SMILES
CCO
CCN
CCC
CC(C)O
c1ccccc1
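
Before launching a run, it can help to sanity-check the CSV files, for example confirming that the required columns exist and that every SMILES string parses. A rough sketch, assuming pandas and RDKit are installed:

# Sketch: basic sanity checks on a training CSV before running Molactivity.
import pandas as pd
from rdkit import Chem

df = pd.read_csv("train_sample.csv")
assert {"SMILES", "Activity"}.issubset(df.columns), "missing required columns"

bad = [smi for smi in df["SMILES"] if Chem.MolFromSmiles(smi) is None]
print(f"{len(df)} rows, {len(bad)} unparsable SMILES: {bad}")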

6.2 Configuration Example

STANDARD_CONFIG = {
    'PARALLEL_TRAINING': False,       # Set to True for parallel training
    'CONTINUE_TRAIN': False,          # Continue from existing model
    'optimal_parameters': {
        'learning_rate': 0.001,
        'transformer_depth': 2,       # Number of transformer layers
        'attention_heads': 2,         # Number of attention heads
        'hidden_dimension': 64
    },
    'model_parameters': {
        'input_features': 2048,       # Morgan fingerprint size
        'epochs': 2,
        'batch_size': 32
    },
    'num_networks': 2,                # Number of models to train
    'device': 'cpu',
}

7. Performance Benchmarks

Mode     | Dataset Size   | Training Time | Prediction Time | Hardware
Standard | 96 molecules   | ~132s         | ~13s            | CPU
Fast     | 1000 molecules | ~45s          | ~5s             | CPU
Rocket   | 10K molecules  | ~15s          | ~2s             | GPU

8. Application Scenarios

Research Applications

Drug discovery, virtual screening, QSAR modeling

Educational Applications

Machine learning and cheminformatics education

Industrial Applications

High-throughput screening, materials design

9. Troubleshooting

Common Issues:

  • Ensure you're using the correct Conda environment in Spyder
  • For GPU modes, verify that PyTorch with CUDA support is installed (see the quick check after this list)
  • Use PARALLEL_TRAINING for better performance with multiple models
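
A quick way to check the first two points from Spyder's console is a short diagnostic like the following; it only reports what the currently active interpreter can see.

# Quick diagnostic: which Python is running, and is CUDA-enabled PyTorch available?
import sys

print("Interpreter:", sys.executable)
print("Python version:", sys.version.split()[0])

try:
    import torch
    print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
except ImportError:
    print("PyTorch is not installed in this environment (needed for rocket mode).")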

10. License and Acknowledgments

This project is licensed under the MIT License.

Official Website: molactivity.com

Contact: jiangshanxue@btbu.edu.cn

Author: Dr. Jiang at BTBU (Beijing Technology and Business University)