Advanced Molecular Activity Prediction Toolkit - A multi-mode molecular property prediction platform based on Transformer neural networks.
Molactivity is an advanced molecular activity prediction toolkit that provides five implementation modes, ranging from pure Python to GPU-accelerated deep learning. The toolkit supports two primary input types: SMILES strings (Standard, Fast, and Rocket modes, the A-C series) and molecular images (Image mode, the D series). Additionally, Molactivity provides several utility tools (the E series), such as SMILES structure analysis (E1_structure_analysis), SMILES-to-image conversion (E2_smiles_to_images), and molecular weight calculation (E3_molecular_mass).
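The E3_molecular_mass tool computes molecular weight from SMILES, presumably via RDKit. As a minimal stdlib sketch of the underlying calculation only (a hypothetical helper, not the toolkit's actual API), the function below sums average atomic masses for a plain molecular formula rather than parsing SMILES:

```python
import re

# Average atomic masses (g/mol) for a few common elements -- illustrative subset.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}

def molecular_mass(formula: str) -> float:
    """Sum atomic masses for a simple molecular formula such as 'C2H6O'.

    Toy stand-in for the toolkit's E3_molecular_mass tool: it parses a plain
    formula string, not SMILES, and covers only the elements listed above.
    """
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol:
            total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total

molecular_mass("C2H6O")  # ethanol (SMILES: CCO) -> ~46.07 g/mol
```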
| Mode | Series | Description | Use Cases | Tech Stack |
|---|---|---|---|---|
| Standard Mode | A Series | Pure Python Implementation | Education, Resource-limited | Pure Python |
| Fast Mode | B Series | NumPy Optimization | Medium-scale Data | Python + NumPy |
| Rocket Mode | C Series | PyTorch Deep Learning | Large-scale Training | PyTorch + GPU |
| Image Mode | D Series | Molecular Image Processing | Visual Analysis | CNN + Vision |
| Tools Mode | E Series | Analysis Toolkit | Data Processing | RDKit + Tools |
Standard mode is user-friendly and does not require the installation of any third-party libraries. Users only need to install Anaconda and run the program using Spyder. This mode uses the CPU for training and does not require a GPU. Standard mode is further divided into three sub-modes: training, evaluation, and prediction.
This mode offers two options: training a new model from scratch or loading an existing model to continue training. Additionally, when training multiple models, users can choose between sequential training and parallel training. Theoretically, parallel training reduces the total training time.
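The sequential-versus-parallel choice can be sketched with the standard library's multiprocessing module. The function names and the dummy training step below are illustrative assumptions, not the toolkit's actual code:

```python
from multiprocessing import Pool

def train_one_model(seed: int) -> dict:
    # Placeholder for the real Standard-mode training loop; it just
    # returns a dummy "trained model" record tagged with its seed.
    return {"seed": seed, "status": "trained"}

def train_models(num_networks: int, parallel: bool = False) -> list:
    seeds = list(range(num_networks))
    if parallel:
        # Each model trains in its own worker process.
        with Pool(processes=num_networks) as pool:
            return pool.map(train_one_model, seeds)
    # Sequential fallback: one model after another.
    return [train_one_model(s) for s in seeds]
```

With independent models and enough CPU cores, the parallel path can approach a num_networks-fold reduction in wall-clock time, which is why parallel training is expected to be faster in theory.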
Users can open run_train_standard.py in Spyder and configure the training parameters inside.
Training Speed Example:
For 96 SMILES data points, sequentially training 3 models with 2 epochs each takes approximately 132 seconds, averaging about 22 seconds per epoch.
This mode evaluates trained models and requires SMILES data and corresponding true activity labels. When evaluating multiple models, users can choose between sequential evaluation and parallel evaluation. Theoretically, parallel evaluation reduces the total evaluation time. By default, evaluation results are saved to evaluating_dataset_with_predictions.csv.
Users can open run_evaluate_standard.py in Spyder and configure evaluation parameters, such as setting the output file name.
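Evaluation compares predicted labels against the true activity labels. A minimal sketch of one such metric (accuracy) over rows like those written to evaluating_dataset_with_predictions.csv — the column names "Activity" and "Predicted" are assumptions here, not documented output columns:

```python
import csv
import io

def accuracy_from_csv(csv_text: str) -> float:
    """Fraction of rows where the Predicted label matches the Activity label."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    correct = sum(int(r["Activity"]) == int(r["Predicted"]) for r in rows)
    return correct / len(rows)

sample = "SMILES,Activity,Predicted\nCCO,1,1\nCCN,0,1\nc1ccccc1,1,1\n"
accuracy_from_csv(sample)  # 2 of 3 rows correct
```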
This mode uses trained models to predict the activity of unknown SMILES data. It only requires SMILES input and does not need true activity labels. When using multiple models for prediction, users can choose between sequential prediction and parallel prediction. Theoretically, parallel prediction reduces the total prediction time. By default, prediction results are saved to predicting_dataset_with_predictions.csv.
Users can open run_predict_standard.py in Spyder and configure prediction parameters, such as setting the output file name.
Prediction Speed Example:
Predicting 96 SMILES data points using 3 models in parallel takes approximately 13 seconds.
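When several models predict the same molecules, their outputs must be combined into one label per molecule. The toolkit's actual aggregation rule is not documented here; a common choice, sketched below, is to average the per-model probabilities and threshold the mean:

```python
def ensemble_predict(per_model_probs, threshold=0.5):
    """Average each molecule's probability across models, then threshold.

    per_model_probs: one list of probabilities per model, all the same length.
    """
    n_models = len(per_model_probs)
    n_mols = len(per_model_probs[0])
    labels = []
    for i in range(n_mols):
        mean_p = sum(model[i] for model in per_model_probs) / n_models
        labels.append(1 if mean_p >= threshold else 0)
    return labels

# Three models scoring four molecules:
probs = [
    [0.9, 0.2, 0.6, 0.4],
    [0.8, 0.1, 0.4, 0.5],
    [0.7, 0.3, 0.6, 0.3],
]
ensemble_predict(probs)  # [1, 0, 1, 0]
```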
Similar to the standard mode, the fast mode also includes three sub-modes: training, evaluation, and prediction. The main difference is that the fast mode utilizes NumPy optimization to accelerate the computation process, providing 3-5x performance improvement over Standard Mode while maintaining Python flexibility.
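The kind of vectorization Fast mode relies on can be illustrated with a matrix-vector product, the core operation of a dense network layer. The function names are illustrative; the 3-5x figure above is the source's claim, not measured here:

```python
import numpy as np

def dense_layer_python(x, w):
    # Pure-Python matrix-vector product, Standard-mode style: explicit loops.
    return [sum(xi * wij for xi, wij in zip(x, col)) for col in zip(*w)]

def dense_layer_numpy(x, w):
    # The same computation vectorized with NumPy, Fast-mode style.
    return np.asarray(x) @ np.asarray(w)

x = [1.0, 2.0, 3.0]
w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dense_layer_python(x, w)  # [4.0, 5.0]
```

Both functions compute the same result; NumPy pushes the inner loops into compiled code, which is where the speedup comes from on medium-scale data.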
Similar to the standard and fast modes, the rocket mode includes training, evaluation, and prediction functionalities. This mode is designed for high-performance computing and uses the PyTorch deep learning framework, requiring a high-end GPU to reach its maximum processing speed.
For the best experience with Molactivity, we strongly recommend installing it via pip:

```
pip install molactivity          # core package
pip install molactivity[all]     # core package plus all optional dependencies
```
Training data format (SMILES strings with binary activity labels):

```
SMILES,Activity
CCO,1
CCN,0
c1ccccc1,1
CCC,0
CC(C)O,1
```
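A minimal stdlib sketch of reading training data in this format into parallel lists (the column names come from the sample above; the loader function itself is a hypothetical helper, not part of the toolkit's API):

```python
import csv
import io

def load_training_csv(csv_text: str):
    """Split a SMILES/Activity CSV into a list of SMILES and a list of int labels."""
    reader = csv.DictReader(io.StringIO(csv_text))
    smiles, labels = [], []
    for row in reader:
        smiles.append(row["SMILES"])
        labels.append(int(row["Activity"]))
    return smiles, labels

sample = "SMILES,Activity\nCCO,1\nCCN,0\nc1ccccc1,1\n"
load_training_csv(sample)  # (['CCO', 'CCN', 'c1ccccc1'], [1, 0, 1])
```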
Prediction input format (SMILES only, no activity labels required):

```
SMILES
CCO
CCN
CCC
CC(C)O
c1ccccc1
```
```python
STANDARD_CONFIG = {
    'PARALLEL_TRAINING': False,      # Set to True for parallel training
    'CONTINUE_TRAIN': False,         # Continue from existing model
    'optimal_parameters': {
        'learning_rate': 0.001,
        'transformer_depth': 2,      # Number of transformer layers
        'attention_heads': 2,        # Number of attention heads
        'hidden_dimension': 64
    },
    'model_parameters': {
        'input_features': 2048,      # Morgan fingerprint size
        'epochs': 2,
        'batch_size': 32
    },
    'num_networks': 2,               # Number of models to train
    'device': 'cpu',
}
```
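A small sanity check over this configuration: in standard multi-head attention, the hidden dimension must divide evenly across the attention heads. Whether Molactivity enforces this rule is an assumption; the checker below is an illustrative helper, not toolkit code:

```python
STANDARD_CONFIG = {
    # Subset of the configuration relevant to the check.
    "optimal_parameters": {"hidden_dimension": 64, "attention_heads": 2},
}

def per_head_dim(cfg: dict) -> int:
    """Return the per-head dimension, raising if the config is inconsistent."""
    params = cfg["optimal_parameters"]
    hidden, heads = params["hidden_dimension"], params["attention_heads"]
    if hidden % heads != 0:
        raise ValueError(f"hidden_dimension {hidden} is not divisible by {heads} heads")
    return hidden // heads

per_head_dim(STANDARD_CONFIG)  # 64 / 2 = 32 dimensions per head
```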
| Mode | Dataset Size | Training Time | Prediction Time | Hardware |
|---|---|---|---|---|
| Standard | 96 molecules | ~132s | ~13s | CPU |
| Fast | 1000 molecules | ~45s | ~5s | CPU |
| Rocket | 10K molecules | ~15s | ~2s | GPU |
Typical applications:

- Drug discovery, virtual screening, and QSAR modeling
- Machine learning and cheminformatics education
- High-throughput screening and materials design
This project is licensed under the MIT License.
Official Website: molactivity.com
Contact: jiangshanxue@btbu.edu.cn
Author: Dr. Jiang at BTBU (Beijing Technology and Business University)