PFASGroups Documentation
PFASGroups is a Python cheminformatics package for automated detection, classification, and analysis of Per- and Polyfluoroalkyl Substances (PFAS) using SMARTS-based structural group matching.
from PFASGroups import parse_smiles
results = parse_smiles(["CCCC(F)(F)F", "FC(F)(F)C(=O)O", "OCCOCCO"])
for mol in results:
    print(mol.smiles, "—", len(mol.matches), "group match(es)")
# CCCC(F)(F)F — 1 group match(es)
# FC(F)(F)C(=O)O — 1 group match(es)
# OCCOCCO — 0 group match(es)
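A common next step is to separate molecules that matched at least one group from those that matched none. The sketch below relies only on the two attributes used in the example above (`smiles` and `matches`). The `Result` dataclass and the `split_by_match` helper are hypothetical stand-ins so the snippet runs without the package installed; they are not part of PFASGroups. With the real package, the output of `parse_smiles` would presumably be passed in instead of `demo`.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    """Hypothetical stand-in for one parse_smiles result (not the package's class)."""
    smiles: str
    matches: list = field(default_factory=list)


def split_by_match(results):
    """Return (matched, unmatched) SMILES lists, split on len(result.matches)."""
    matched, unmatched = [], []
    for res in results:
        target = matched if len(res.matches) > 0 else unmatched
        target.append(res.smiles)
    return matched, unmatched


# Toy data mirroring the match counts in the example above;
# "g1" and "g2" are placeholder match labels, not real group names.
demo = [
    Result("CCCC(F)(F)F", ["g1"]),
    Result("FC(F)(F)C(=O)O", ["g2"]),
    Result("OCCOCCO"),
]

matched, unmatched = split_by_match(demo)
print(matched)    # SMILES with at least one group match
print(unmatched)  # SMILES with no group match
```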
Key capabilities
119 halogen groups: 27 OECD, 48 generic, 43 fluorotelomer, 1 aggregate (114 compiled for F-only)
PFAS definition screening: classify molecules against 5 regulatory frameworks
Fingerprinting: generate 116-column fingerprints (binary, count or max-component) for ML
Group selection: use all groups, OECD only, generic, telomers, or a custom subset
Dimensionality reduction: PCA, t-SNE, UMAP on fingerprint matrices
Molecule prioritization: rank molecules by structural novelty
Command-line interface: pfasgroups parse, fingerprint, list-groups
Multi-halogen support: extend detection to Cl, Br, I via Multi-Halogen Analysis (Advanced)
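To make the fingerprinting and prioritization items more concrete, here is a minimal pure-Python sketch of the general idea; it does not use the PFASGroups API. It assumes each molecule has already been reduced to a list of matched group indices. It builds count and binary fingerprints, then ranks candidates by one possible notion of structural novelty: one minus the highest Tanimoto similarity to a reference set. The function names, the six-column toy width, and the novelty definition are all illustrative assumptions; the package's own fingerprint layout and ranking method may differ.

```python
def count_fingerprint(group_ids, n_groups):
    """Count vector: entry g is the number of times group g matched."""
    fp = [0] * n_groups
    for g in group_ids:
        fp[g] += 1
    return fp


def binary_fingerprint(group_ids, n_groups):
    """Binary vector: entry g is 1 if group g matched at least once."""
    return [1 if c > 0 else 0 for c in count_fingerprint(group_ids, n_groups)]


def tanimoto(a, b):
    """Tanimoto similarity of two binary vectors (1.0 if both are all-zero)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0


def novelty(fp, reference_fps):
    """Assumed novelty score: 1 minus the best similarity to any reference."""
    return 1.0 - max(tanimoto(fp, ref) for ref in reference_fps)


N_GROUPS = 6  # toy width for illustration only

# Hypothetical matched-group indices for reference and candidate molecules.
reference = [binary_fingerprint(ids, N_GROUPS) for ids in ([0, 1], [1, 2, 2])]
candidates = {"cand_a": [0, 1], "cand_b": [3, 4, 4], "cand_c": [1, 2, 5]}

scores = {
    name: novelty(binary_fingerprint(ids, N_GROUPS), reference)
    for name, ids in candidates.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)  # most novel first

print(count_fingerprint(candidates["cand_b"], N_GROUPS))  # [0, 0, 0, 1, 2, 0]
print(ranked)  # ['cand_b', 'cand_c', 'cand_a']
```

In this toy data, `cand_b` shares no groups with the reference set and ranks first, while `cand_a` duplicates a reference fingerprint and ranks last.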
Getting Started
Core Concepts
Advanced Topics
- Tutorial
- Advanced Features: H-Components and Wildcard Groups
- H-Components (Hydrocarbon Analysis)
- Wildcard Groups
- Advanced: Combining H-Components and Wildcards
- Custom Wildcard Definitions
- Best Practices
- Summary
- See Also
- Customization
- Prioritization
- Group Feature Extraction
- Benchmarking
- Multi-Halogen Analysis (Advanced)
- Command-Line Interface
API Reference
Project Info