HalogenGroups / PFASGroups

Getting Started

  • Installation
    • Requirements
    • Install with pip
    • Install from source
    • Optional dependencies
    • Verify installation
    • Troubleshooting
  • Quickstart
    • Parsing SMILES
    • Accessing matches
    • Converting to a DataFrame
    • Generating embeddings
    • PFAS definition screening
    • Saturation filter
    • Multi-halogen parsing (advanced)
    • Command-line usage

Core Concepts

  • PFAS Definitions
    • OECD 2021
    • EU REACH
    • OPPT 2023
    • UK Environment Agency
    • PFASSTRUCTv5
    • Retrieving definition objects
    • Checking a single molecule
  • Algorithm
    • Overview
    • Group library
    • SMARTS-based matching
      • Halogen filtering
      • Saturation filtering
    • Overlap deduplication
    • Component graph metrics
      • Bond-order model
      • Resistance distance and Kirchhoff index
      • Accessing resistance metrics
    • Embedding generation
    • PFAS definition classification

Advanced Topics

  • Tutorial
    • Sample data
    • Parsing SMILES
      • Iterating over results
      • Drilling into a single match
      • Exporting to DataFrame
    • Parsing RDKit Mol objects
    • Plotting results
    • Component filters
    • Embeddings
      • Saving to a database
    • Dimensionality reduction
      • PCA
      • t-SNE
      • UMAP
    • Prioritisation
      • Ranking against a reference set
      • Ranking by intrinsic fluorination
    • Comparing two datasets
    • Multi-halogen analysis
    • PFAS definition screening
    • Loading and inspecting groups
  • Customization
    • Inspecting the built-in groups
    • Filtering groups
    • Defining a custom group
    • Combining built-in and custom groups
    • HalogenGroup attributes
    • Using raw (uncompiled) groups
    • Custom SMARTS tips
    • Component-type constraints
  • Prioritization
    • Basic usage
    • Novelty against a reference set
    • Scoring formula
    • Parameters
    • Example: selecting a diverse subset
  • Benchmarking
    • Validation against PFASSTRUCTv5
    • Comparison with CSRML classifier
    • Running the benchmark scripts
    • Performance
  • Multi-Halogen Analysis (Advanced)
    • Quick Comparison
    • Functions with Altered Defaults
    • Parsing Multi-Halogen Molecules
    • Multi-Halogen Fingerprints
      • Via PFASEmbeddingSet
    • Combining with Saturation Filters
    • CLI with Multiple Halogens
    • Implementation Details
    • When to Use Which Import
    • See Also
  • Command-Line Interface
    • Synopsis
    • parse
    • fingerprint
    • list-groups
    • list-paths
    • Global Options
    • Environment
    • Performance Tips
    • See Also

API Reference

  • Core API
    • Parsing
      • parse_smiles
        • parse_smiles()
      • parse_mols
        • parse_mols()
      • parse_mol
        • parse_mol()
    • Fingerprinting
      • generate_fingerprint
        • generate_fingerprint()
    • Group library
      • get_compiled_HalogenGroups
        • get_compiled_HalogenGroups()
      • get_HalogenGroups
        • get_HalogenGroups()
      • load_HalogenGroups
        • load_HalogenGroups()
      • get_PFASDefinitions
        • get_PFASDefinitions()
      • get_componentSMARTSs
        • get_componentSMARTSs()
    • Molecule prioritization
      • prioritise_molecules
        • prioritise_molecules()
  • Data Models
    • PFASEmbeddingSet
      • PFASEmbeddingSet
        • PFASEmbeddingSet.__init__()
        • PFASEmbeddingSet.matches
        • PFASEmbeddingSet.from_raw()
        • PFASEmbeddingSet.from_smiles()
        • PFASEmbeddingSet.from_mols()
        • PFASEmbeddingSet.from_inchis()
        • PFASEmbeddingSet.reorder()
        • PFASEmbeddingSet.iter_group_matches()
        • PFASEmbeddingSet.plot_components_for_group()
        • PFASEmbeddingSet.show()
        • PFASEmbeddingSet.plot()
        • PFASEmbeddingSet.to_sql()
        • PFASEmbeddingSet.svg()
        • PFASEmbeddingSet.summarise()
        • PFASEmbeddingSet.table()
        • PFASEmbeddingSet.classify()
        • PFASEmbeddingSet.summary()
        • PFASEmbeddingSet.plot_all_components_with_group_colours()
        • PFASEmbeddingSet.to_sql_all()
        • PFASEmbeddingSet.to_fingerprint()
        • PFASEmbeddingSet.n_molecules
        • PFASEmbeddingSet.has_cache
        • PFASEmbeddingSet.match_cache
        • PFASEmbeddingSet.get_embedding()
        • PFASEmbeddingSet.to_array()
        • PFASEmbeddingSet.compare_kld()
        • PFASEmbeddingSet.perform_pca()
        • PFASEmbeddingSet.perform_kernel_pca()
        • PFASEmbeddingSet.perform_tsne()
        • PFASEmbeddingSet.perform_umap()
        • PFASEmbeddingSet.column_names()
        • PFASEmbeddingSet.from_sql()
    • MoleculeResult
      • MoleculeResult
    • GroupMatch
    • MatchComponent
    • EmbeddingArray
    • HalogenGroup
      • HalogenGroup
        • HalogenGroup.id
        • HalogenGroup.name
        • HalogenGroup.smarts
        • HalogenGroup.componentSmarts
        • HalogenGroup.componentForm
        • HalogenGroup.componentHalogens
        • HalogenGroup.componentSaturation
        • HalogenGroup.max_dist_from_comp
        • HalogenGroup.linker_smarts
        • HalogenGroup.constraints
        • HalogenGroup.__init__()
        • HalogenGroup.set_component_smarts()
        • HalogenGroup.set_componentSmarts()
        • HalogenGroup.constraint_gte()
        • HalogenGroup.constraint_lte()
        • HalogenGroup.constraint_eq()
        • HalogenGroup.constraint_only()
        • HalogenGroup.constraint_rel()
        • HalogenGroup.formula_dict_satisfies_constraints()
        • HalogenGroup.find_matched_atoms()
        • HalogenGroup.component_satisfies_all_smarts()
        • HalogenGroup.find_alkyl_components()
        • HalogenGroup.find_aryl_components()
        • HalogenGroup.find_components()
        • HalogenGroup.test()
    • PFASDefinition
      • PFASDefinition
        • PFASDefinition.id
        • PFASDefinition.name
        • PFASDefinition.description
        • PFASDefinition.fluorineRatio
        • PFASDefinition.smarts_strings
        • PFASDefinition.smarts_patterns
        • PFASDefinition.includeHydrogen
        • PFASDefinition.requireBoth
        • PFASDefinition.__init__()
        • PFASDefinition.applies_to_molecule()
        • PFASDefinition.test()
  • Embedding Analysis
    • PFASEmbeddingSet
      • PFASEmbeddingSet
        • PFASEmbeddingSet.__init__()
        • PFASEmbeddingSet.matches
        • PFASEmbeddingSet.from_raw()
        • PFASEmbeddingSet.from_smiles()
        • PFASEmbeddingSet.from_mols()
        • PFASEmbeddingSet.from_inchis()
        • PFASEmbeddingSet.reorder()
        • PFASEmbeddingSet.iter_group_matches()
        • PFASEmbeddingSet.plot_components_for_group()
        • PFASEmbeddingSet.show()
        • PFASEmbeddingSet.plot()
        • PFASEmbeddingSet.to_sql()
        • PFASEmbeddingSet.svg()
        • PFASEmbeddingSet.summarise()
        • PFASEmbeddingSet.table()
        • PFASEmbeddingSet.classify()
        • PFASEmbeddingSet.summary()
        • PFASEmbeddingSet.plot_all_components_with_group_colours()
        • PFASEmbeddingSet.to_sql_all()
        • PFASEmbeddingSet.to_fingerprint()
        • PFASEmbeddingSet.n_molecules
        • PFASEmbeddingSet.has_cache
        • PFASEmbeddingSet.match_cache
        • PFASEmbeddingSet.get_embedding()
        • PFASEmbeddingSet.to_array()
        • PFASEmbeddingSet.compare_kld()
        • PFASEmbeddingSet.perform_pca()
        • PFASEmbeddingSet.perform_kernel_pca()
        • PFASEmbeddingSet.perform_tsne()
        • PFASEmbeddingSet.perform_umap()
        • PFASEmbeddingSet.column_names()
        • PFASEmbeddingSet.from_sql()
      • Generating an embedding array
      • Key attributes
    • Dimensionality reduction
      • perform_pca
      • perform_tsne
      • perform_umap
    • Statistical comparison
      • compare_kld
    • Database I/O
      • to_sql / from_sql

Project Info

  • Changelog
    • Version 3.1.3
    • Version 3.2.0
    • Version 3.1.0
    • Version 2.2.4
    • Version 2.2.3
    • Version 2.0
  • Contributing
    • How to contribute
    • Setting up a development environment
    • Running tests
    • Code style
    • Adding new halogen groups
    • Reporting issues
    • Contact
  • License
    • Summary
    • Full license text
    • Citation

© Copyright 2026, Luc T. Miaz.