Molactivity 3.0 Instructions

Advanced Molecular Activity Prediction Toolkit - A multi-mode molecular property prediction platform based on Transformer neural networks.

1. Introduction

Molactivity is an advanced molecular activity prediction toolkit that provides five different implementation modes, ranging from pure Python implementations to GPU-accelerated deep learning methods. The toolkit supports two primary input types: SMILES strings (the standard, fast, and rocket modes) and molecular images (the D-series image mode). Additionally, Molactivity provides several utility tools (the E series), such as SMILES structure analysis (E1_structure_analysis), conversion of SMILES to images (E2_smiles_to_images), and calculation of molecular weight from SMILES (E3_molecular_mass).
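
For example, the molecular-mass tool (E3_molecular_mass) conceptually amounts to parsing each SMILES string and looking up its molecular weight. The snippet below is a minimal illustrative sketch of that idea using RDKit directly; the actual tool's interface and output format may differ.

# Illustrative sketch only: molecular weight from SMILES via RDKit,
# mirroring what E3_molecular_mass does conceptually.
from rdkit import Chem
from rdkit.Chem import Descriptors

for smi in ["CCO", "c1ccccc1", "CC(C)O"]:
    mol = Chem.MolFromSmiles(smi)   # returns None for invalid SMILES
    if mol is None:
        print(f"{smi}: invalid SMILES")
        continue
    print(f"{smi}: molecular weight = {Descriptors.MolWt(mol):.2f}")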

1.1 Project Highlights

  • Multi-mode Architecture: Five different implementation modes, from pure Python to GPU acceleration
  • Transformer Core: Molecular activity prediction based on attention mechanisms
  • Flexible Deployment: Support for various deployment scenarios from lightweight to high-performance
  • Chemical Intelligence: Professional molecular fingerprints and chemical feature extraction
  • Complete Workflow: Complete solution for training, evaluation, prediction, and analysis

1.2 Five Modes Overview

Mode          | Series   | Description                | Use Cases                   | Tech Stack
Standard Mode | A Series | Pure Python Implementation | Education, Resource-limited | Pure Python
Fast Mode     | B Series | NumPy Optimization         | Medium-scale Data           | Python + NumPy
Rocket Mode   | C Series | PyTorch Deep Learning      | Large-scale Training        | PyTorch + GPU
Image Mode    | D Series | Molecular Image Processing | Visual Analysis             | CNN + Vision
Tools Mode    | E Series | Analysis Toolkit           | Data Processing             | RDKit + Tools

2. Standard Mode

Standard mode is user-friendly and does not require the installation of any third-party libraries. Users only need to install Anaconda and run the program using Spyder. This mode uses the CPU for training and does not require a GPU. Standard mode is further divided into three sub-modes: training, evaluation, and prediction.

2.1 Training in Standard Mode

This mode offers two options: training a new model from scratch or loading an existing model to continue training. Additionally, when training multiple models, users can choose between sequential training and parallel training. Theoretically, parallel training reduces the total training time.

Users can open run_train_standard.py in Spyder and configure the training parameters inside.

Training Speed Example:

For 96 SMILES data points, sequentially training 3 models with 2 epochs each takes approximately 132 seconds, averaging about 22 seconds per epoch.
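
Conceptually, training several models in parallel just means launching one independent training job per model instead of running the jobs one after another. The sketch below illustrates the principle with Python's multiprocessing module; train_one_model is a hypothetical placeholder, not Molactivity's actual API.

# Sketch of sequential vs. parallel training of several models.
# train_one_model is a hypothetical placeholder, not part of Molactivity.
from multiprocessing import Pool

def train_one_model(model_index):
    # ... build one model, loop over its epochs, save its weights ...
    return f"model_{model_index} trained"

if __name__ == "__main__":
    model_indices = [0, 1, 2]

    # Sequential: total time is roughly the sum of the individual training times.
    sequential_results = [train_one_model(i) for i in model_indices]

    # Parallel: with enough CPU cores, total time approaches that of the slowest single job.
    with Pool(processes=len(model_indices)) as pool:
        parallel_results = pool.map(train_one_model, model_indices)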

2.2 Evaluation in Standard Mode

This mode evaluates trained models and requires SMILES data and corresponding true activity labels. When evaluating multiple models, users can choose between sequential evaluation and parallel evaluation. Theoretically, parallel evaluation reduces the total evaluation time. By default, evaluation results are saved to evaluating_dataset_with_predictions.csv.

Users can open run_evaluate_standard.py in Spyder and configure evaluation parameters, such as setting the output file name.
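
After an evaluation run, the saved CSV can be inspected with pandas to compare predicted and true activities. This is only a sketch; the column names in evaluating_dataset_with_predictions.csv are assumptions and may differ in the real output.

# Sketch: comparing predicted vs. true activity from the evaluation output.
# The column names 'Activity' and 'Prediction' are assumptions for illustration.
import pandas as pd

df = pd.read_csv("evaluating_dataset_with_predictions.csv")
accuracy = (df["Activity"] == df["Prediction"]).mean()
print(f"Accuracy on {len(df)} molecules: {accuracy:.3f}")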

2.3 Prediction in Standard Mode

This mode uses trained models to predict the activity of unknown SMILES data. It only requires SMILES input and does not need true activity labels. When using multiple models for prediction, users can choose between sequential prediction and parallel prediction. Theoretically, parallel prediction reduces the total prediction time. By default, prediction results are saved to predicting_dataset_with_predictions.csv.

Users can open run_predict_standard.py in Spyder and configure prediction parameters, such as setting the output file name.

Prediction Speed Example:

Predicting 96 SMILES data points using 3 models in parallel takes approximately 13 seconds.
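
The prediction output can likewise be loaded with pandas, for example to list the molecules predicted as active. The prediction column name below is an assumption for illustration.

# Sketch: selecting molecules predicted as active from the prediction output.
import pandas as pd

df = pd.read_csv("predicting_dataset_with_predictions.csv")
active = df[df["Prediction"] == 1]   # 'Prediction' is an assumed column name
print(f"{len(active)} of {len(df)} molecules predicted active")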

3. Fast Mode

Similar to the standard mode, the fast mode also includes three sub-modes: training, evaluation, and prediction. The main difference is that fast mode uses NumPy to vectorize the computations, providing roughly a 3-5x performance improvement over standard mode while retaining the flexibility of plain Python.
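
The speedup comes from replacing Python-level loops with vectorized NumPy array operations. The toy example below illustrates the principle on one dense layer applied to a batch of fingerprint-like vectors; it is not Molactivity's actual code, and the sizes are kept small so the loop version finishes quickly.

# Toy illustration of element-wise loops vs. a vectorized NumPy operation.
import numpy as np

rng = np.random.default_rng(0)
fingerprints = rng.random((96, 256))   # batch of 96 molecules, 256 features
weights = rng.random((256, 64))        # one dense layer with 64 hidden units

# Pure-Python style: explicit loops over every element (slow).
def dense_loops(x, w):
    rows, inner, cols = x.shape[0], x.shape[1], w.shape[1]
    out = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                out[i, j] += x[i, k] * w[k, j]
    return out

# NumPy style: a single vectorized matrix multiplication (fast).
slow = dense_loops(fingerprints, weights)
fast = fingerprints @ weights
assert np.allclose(slow, fast)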

4. Rocket Mode

Similar to the standard and fast modes, the rocket mode includes training, evaluation, and prediction functionalities. This mode is built on the PyTorch deep learning framework and is designed for high-performance computing; a capable GPU is required to reach its maximum processing speed.
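
In practice, GPU acceleration in PyTorch follows the usual pattern of selecting a device and moving the model and data onto it. The snippet below is a generic sketch of that pattern, not Molactivity's actual model code.

# Generic PyTorch device-selection pattern (sketch only, not Molactivity's model).
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Running on: {device}")

model = nn.Sequential(nn.Linear(2048, 64), nn.ReLU(), nn.Linear(64, 2)).to(device)
batch = torch.rand(32, 2048, device=device)   # a batch of fingerprint-like inputs
logits = model(batch)                         # forward pass runs on the GPU when available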

5. Development Environment Setup

5.1 Recommended Environment

For the best experience with Molactivity, we strongly recommend using:

  • Anaconda: Comprehensive Python distribution with package management
  • Spyder IDE: Scientific Python development environment
  • Python 3.8+: Core language requirement

5.2 Installation

Basic Installation (Standard Mode)

pip install molactivity

Complete Installation (All Modes)

pip install molactivity[all]

6. Step-by-Step Usage Guide

6.1 Preparing Your Data

Training Data Format (train_sample.csv)

SMILES,Activity
CCO,1
CCN,0
c1ccccc1,1
CCC,0
CC(C)O,1

Prediction Data Format (predict_sample.csv)

SMILES
CCO
CCN
CCC
CC(C)O
c1ccccc1
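
Before launching a run, it can help to sanity-check the CSV files, for example confirming that the required columns exist and that every SMILES string parses. A rough sketch, assuming pandas and RDKit are installed:

# Sketch: basic sanity checks on a training CSV before running Molactivity.
import pandas as pd
from rdkit import Chem

df = pd.read_csv("train_sample.csv")
assert {"SMILES", "Activity"}.issubset(df.columns), "missing required columns"

bad = [smi for smi in df["SMILES"] if Chem.MolFromSmiles(smi) is None]
print(f"{len(df)} rows, {len(bad)} unparsable SMILES: {bad}")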

6.2 Configuration Example

STANDARD_CONFIG = {
    'PARALLEL_TRAINING': False,       # Set to True for parallel training
    'CONTINUE_TRAIN': False,          # Continue from existing model
    'optimal_parameters': {
        'learning_rate': 0.001,
        'transformer_depth': 2,       # Number of transformer layers
        'attention_heads': 2,         # Number of attention heads
        'hidden_dimension': 64
    },
    'model_parameters': {
        'input_features': 2048,       # Morgan fingerprint size
        'epochs': 2,
        'batch_size': 32
    },
    'num_networks': 2,                # Number of models to train
    'device': 'cpu',
}

7. Performance Benchmarks

Mode     | Dataset Size   | Training Time | Prediction Time | Hardware
Standard | 96 molecules   | ~132s         | ~13s            | CPU
Fast     | 1000 molecules | ~45s          | ~5s             | CPU
Rocket   | 10K molecules  | ~15s          | ~2s             | GPU

8. Application Scenarios

Research Applications

Drug discovery, virtual screening, QSAR modeling

Educational Applications

Machine learning and cheminformatics education

Industrial Applications

High-throughput screening, materials design

9. Troubleshooting

Common Issues:

  • Ensure you're using the correct Conda environment in Spyder
  • For GPU modes, verify that PyTorch with CUDA support is installed (see the quick check after this list)
  • Use PARALLEL_TRAINING for better performance with multiple models
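
A quick way to check the first two points from Spyder's console is a short diagnostic like the following; it only reports what the currently active interpreter can see.

# Quick diagnostic: which Python is running, and is CUDA-enabled PyTorch available?
import sys

print("Interpreter:", sys.executable)
print("Python version:", sys.version.split()[0])

try:
    import torch
    print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
except ImportError:
    print("PyTorch is not installed in this environment (needed for rocket mode).")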

10. License and Acknowledgments

This project is licensed under the MIT License.

Official Website: molactivity.com

Contact: jiangshanxue@btbu.edu.cn

Author: Dr. Jiang at BTBU (Beijing Technology and Business University)