PFASGroups Documentation

Python 3.7+ | License: CC BY-ND 4.0 | Powered by RDKit

PFASGroups is a Python cheminformatics package for automated detection, classification, and analysis of Per- and Polyfluoroalkyl Substances (PFAS) using SMARTS-based structural group matching.

from PFASGroups import parse_smiles

results = parse_smiles(["CCCC(F)(F)F", "FC(F)(F)C(=O)O", "OCCOCCO"])

for mol in results:
    print(mol.smiles, "—", len(mol.matches), "group match(es)")
# CCCC(F)(F)F — 1 group match(es)
# FC(F)(F)C(=O)O — 1 group match(es)
# OCCOCCO — 0 group match(es)

Key capabilities

  • 119 halogen groups: 27 OECD, 48 generic, 43 fluorotelomer, and 1 aggregate (114 are compiled when only fluorine is considered)

  • PFAS definition screening: classify molecules against 5 regulatory frameworks

  • Fingerprinting: generate 116-column fingerprints (binary, count, or max-component) for use as machine-learning features

  • Group selection: use all groups, OECD only, generic, telomers, or a custom subset

  • Dimensionality reduction: PCA, t-SNE, UMAP on fingerprint matrices

  • Molecule prioritization: rank molecules by structural novelty

  • Command-line interface: the pfasgroups command with parse, fingerprint, and list-groups subcommands

  • Multi-halogen support: extend detection to Cl, Br, I via Multi-Halogen Analysis (Advanced)
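To make the fingerprinting and group-selection items above more concrete, here is a minimal, standard-library sketch of how per-group match counts could be turned into binary and count fingerprints, and how a column subset could be chosen. It does not use the PFASGroups API: the group names, the `count_fingerprint`, `binary_fingerprint`, and `select_columns` helpers, and the match counts are all hypothetical and exist only for illustration. The max-component mode is not shown because its definition is not given here.

```python
from typing import Dict, List, Sequence

# Hypothetical group names (assumption): the real package defines its own
# group list and column order.
GROUP_COLUMNS: List[str] = [
    "perfluoroalkyl",
    "carboxylic_acid",
    "sulfonic_acid",
    "ether",
]


def count_fingerprint(
    group_counts: Dict[str, int], columns: Sequence[str] = GROUP_COLUMNS
) -> List[int]:
    """Return one match count per column; unmatched groups count as 0."""
    return [group_counts.get(name, 0) for name in columns]


def binary_fingerprint(
    group_counts: Dict[str, int], columns: Sequence[str] = GROUP_COLUMNS
) -> List[int]:
    """Return 1 for every column whose group matched at least once, else 0."""
    return [1 if n > 0 else 0 for n in count_fingerprint(group_counts, columns)]


def select_columns(columns: Sequence[str], subset: Sequence[str]) -> List[str]:
    """Keep only the requested columns, preserving the original column order."""
    wanted = set(subset)
    return [name for name in columns if name in wanted]


# Hypothetical match counts for two placeholder molecules (not real results).
molecules: Dict[str, Dict[str, int]] = {
    "mol_a": {"perfluoroalkyl": 3, "carboxylic_acid": 1},
    "mol_b": {"ether": 2},
}

count_matrix = [count_fingerprint(counts) for counts in molecules.values()]
binary_matrix = [binary_fingerprint(counts) for counts in molecules.values()]

# Restrict the fingerprint to a custom subset of groups.
custom_columns = select_columns(GROUP_COLUMNS, ["ether", "perfluoroalkyl"])
custom_matrix = [
    count_fingerprint(counts, custom_columns) for counts in molecules.values()
]

print(count_matrix)   # [[3, 1, 0, 0], [0, 0, 0, 2]]
print(binary_matrix)  # [[1, 1, 0, 0], [0, 0, 0, 1]]
print(custom_matrix)  # [[3, 0], [0, 2]]
```

Fixing the column order up front keeps every row of the matrix aligned, which is what downstream steps such as dimensionality reduction rely on.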
