Introduction
pm4py is a Python library that supports state-of-the-art process mining algorithms. It is open source (licensed under GPL) and intended for use in both academic and industry projects. pm4py is a product of the Fraunhofer Institute for Applied Information Technology.
This repo contains links to process mining training, links to documentation that helps with installing and using the library, and an example Python notebook that you can run with the artificial healthcare data provided.
Presentation
I presented on process mining to a few different communities in the East of England; the presentation is below:
Recommended training in process mining
Process Mining in Python - YouTube Videos
pm4py tutorials - tutorial #1: What is Process Mining? This video covers what process mining is: examples, the definition of process mining, the event log, and the main tasks of process mining (process discovery, conformance checking, process enhancement).
pm4py tutorials - tutorial #2: Importing CSV Files This video covers an example process, how to read graphical representations of processes, example data in CSV format (can be downloaded here), importing CSV data in Python using the pandas library, reformatting the data into an event log using the format_dataframe function, and obtaining start and end activities using the get_start_activities and get_end_activities functions from the pm4py library.
pm4py tutorials - tutorial #3: Importing XES Files This video covers case-level attributes; the XES format (tools supporting XES, how XES looks, an example XES file); XES extensions, including the standard extensions (website) and extensions on log, trace and event level; public XES datasets; globals (default values) in XES files; classifiers and meta information in XES files; and reading XES files using the read_xes function from the pm4py library and getting start and end activities.
pm4py tutorials - tutorial #4: Playing with Event Data; Lambda Functions
pm4py tutorials - tutorial #5: Playing with Event Data; Shipped Filters
pm4py tutorials - tutorial #6: Exporting Event Data
pm4py tutorials - tutorial #7: Process Discovery
pm4py tutorials - tutorial #8: Conformance Checking
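To make the idea of start and end activities from tutorial #2 concrete, here is a minimal sketch that uses only the Python standard library. It is not how pm4py implements it; in pm4py you would format a pandas dataframe with format_dataframe and then call get_start_activities and get_end_activities. The sample rows and the column names (case_id, activity, timestamp) are made up for illustration.

```python
import csv
import io
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical sample data; the column names are assumptions for illustration.
SAMPLE_CSV = """case_id,activity,timestamp
p1,Registration,2024-01-01 09:00:00
p1,Triage,2024-01-01 09:20:00
p1,Discharge,2024-01-01 11:00:00
p2,Registration,2024-01-01 09:05:00
p2,Triage,2024-01-01 09:40:00
p2,Admission,2024-01-01 12:30:00
p3,Triage,2024-01-01 10:10:00
p3,Discharge,2024-01-01 10:50:00
"""


def start_and_end_activities(csv_text):
    """Count the first and last activity of every case in a CSV event log."""
    # Group events by case, keeping (timestamp, activity) pairs.
    cases = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        when = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
        cases[row["case_id"]].append((when, row["activity"]))

    starts, ends = Counter(), Counter()
    for events in cases.values():
        events.sort()  # order each case's events by timestamp
        starts[events[0][1]] += 1   # activity of the earliest event
        ends[events[-1][1]] += 1    # activity of the latest event
    return dict(starts), dict(ends)


starts, ends = start_and_end_activities(SAMPLE_CSV)
print("Start activities:", starts)
print("End activities:", ends)
```

Each case (here, a patient) contributes exactly one start and one end activity, so the counts in each dictionary add up to the number of cases.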
Resources Available:
Official pm4py
Process Mining - Data Science in Action Book
dcr4py - supporting documentation
dcr4pydocs - extension of pm4py documentation
Example Publications using pm4py
https://processintelligence.solutions/pm4py/publications
Structure of this repo
Python Notebook - open in Google Colab
Example artificial healthcare data
Additional example datasets for healthcare & process mining
Real-life log of a Dutch academic hospital, originally intended for use in the first Business Process Intelligence Contest (BPIC 2011). Uploaded to this repo.
Requirements
pm4py depends on some other Python packages, with different levels of importance:
Essential requirements: numpy, pandas, deprecation, networkx
Normal requirements (installed by default with the pm4py package, important for mainstream usage): graphviz, intervaltree, lxml, matplotlib, pydotplus, pytz, scipy, stringdist, tqdm
Optional requirements (not installed by default): scikit-learn, pyemd, pyvis, jsonschema, polars, openai, pywin32, python-dateutil, requests, workalendar
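Because the optional requirements are not installed by default, it can help to check which ones are present before running the notebook. The sketch below uses only the standard library. The import names are my assumptions, since they can differ from the package names (for example, scikit-learn is imported as sklearn and python-dateutil as dateutil). pywin32 is left out because it is Windows-only.

```python
from importlib.util import find_spec

# Package name -> import name. The import names are assumptions and may
# differ from the package names used when installing.
OPTIONAL_IMPORT_NAMES = {
    "scikit-learn": "sklearn",
    "pyemd": "pyemd",
    "pyvis": "pyvis",
    "jsonschema": "jsonschema",
    "polars": "polars",
    "openai": "openai",
    "python-dateutil": "dateutil",
    "requests": "requests",
    "workalendar": "workalendar",
}


def installed_packages(import_names):
    """Return {package name: True/False} without importing the packages."""
    return {
        package: find_spec(module) is not None
        for package, module in import_names.items()
    }


for package, available in installed_packages(OPTIONAL_IMPORT_NAMES).items():
    print(f"{package}: {'installed' if available else 'missing'}")
```

find_spec only looks a module up, so the check stays fast and does not import heavy libraries.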