Explore and select categorical features:
Explore and select numeric features:
"Public meeting" Categories
Prob("Pump status" | "Public meeting")
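The title Prob("Pump status" | "Public meeting") refers to the conditional distribution of pump status within each "Public meeting" category. Below is a minimal plain-Python sketch of how such a table could be estimated from relative frequencies. The toy records and the function name `conditional_probabilities` are hypothetical and are not taken from the dashboard's code.

```python
from collections import Counter, defaultdict


def conditional_probabilities(rows, given, target):
    """Estimate P(target | given) as relative frequencies within each `given` category."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[given]][row[target]] += 1
    table = {}
    for category, tally in counts.items():
        total = sum(tally.values())
        # Each category's probabilities sum to 1.
        table[category] = {label: n / total for label, n in tally.items()}
    return table


# Hypothetical toy records (not the real Taarifa data).
rows = [
    {"public_meeting": True, "status": "Functional"},
    {"public_meeting": True, "status": "Functional"},
    {"public_meeting": True, "status": "Non-functional"},
    {"public_meeting": False, "status": "Non-functional"},
    {"public_meeting": False, "status": "Functional needs repair"},
]

probs = conditional_probabilities(rows, given="public_meeting", target="status")
print(probs[True])
```

For the `True` category, two of the three toy pumps are functional, so the estimate there is 2/3 "Functional" and 1/3 "Non-functional".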

Decision tree model parameters:


The criterion for measuring the quality of a split.
The strategy used to choose the split at each node.
The maximum depth of the tree.
The minimum number of samples required to split an internal node.
The minimum number of samples required to be at a leaf node.
The number of features to consider when looking for the best split.
The seed value used by the random number generator.
Grow a tree with at most this many leaf nodes, in best-first fashion (the best nodes are those with the largest relative reduction in impurity).
The minimum reduction in impurity a split must achieve for a node to be split.
The "Balanced" mode assigns class weights inversely proportional to label frequencies in the training set. Use "None" for equal class weights.
Fraction of the dataset to include in the test split.
Seed used for the random selection of the test split.
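The splitting criterion and the minimum impurity decrease work together. The criterion scores how mixed the labels in a node are, and a candidate split is accepted only if it lowers that score enough. The sketch below assumes the Gini and entropy criteria and the weighted-decrease formula documented for scikit-learn's tree estimators, which these parameter descriptions appear to follow. The function names and the threshold value are illustrative.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the class proportions, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def weighted_impurity_decrease(parent, left, right, n_total, criterion=gini):
    """N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)."""
    n_t = len(parent)
    return (n_t / n_total) * (
        criterion(parent)
        - len(right) / n_t * criterion(right)
        - len(left) / n_t * criterion(left)
    )


# A node holding 6 of 12 training samples, split into two children.
parent = ["F", "F", "F", "N", "N", "N"]
left, right = ["F", "F", "F", "N"], ["N", "N"]

decrease = weighted_impurity_decrease(parent, left, right, n_total=12)
min_impurity_decrease = 0.05  # hypothetical setting for this parameter
print(round(decrease, 4), decrease >= min_impurity_decrease)
```

The node's impurity drops from 0.5 to a weighted 0.25 across its children. Because the node holds half of all samples, the weighted decrease is 0.125, which clears the hypothetical 0.05 threshold.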

Random forest model parameters:


Number of trees in the ensemble (forest).
The criterion for measuring the quality of a split.
The maximum depth of the tree.
The minimum number of samples required to split an internal node.
The minimum number of samples required to be at a leaf node.
The number of features to consider when looking for the best split.
Grow a tree with at most this many leaf nodes, in best-first fashion (the best nodes are those with the largest relative reduction in impurity).
The minimum reduction in impurity a split must achieve for a node to be split.
Whether to use bootstrap samples or the entire dataset when building trees.
Whether to use out-of-bag samples to estimate the generalization accuracy.
Controls both the randomness of the bootstrapping of the samples and the sampling of the features to consider when looking for the best split at each node.
When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, fit a whole new forest.
The "Balanced" mode assigns class weights inversely proportional to label frequencies in the training set. Use "None" for equal class weights.
Fraction of the dataset to include in the test split.
Seed used for the random selection of the test split.
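With bootstrap sampling enabled, each tree is trained on rows drawn with replacement from the training set. The rows a given tree never sees are its out-of-bag rows, and the out-of-bag option uses them to estimate generalization accuracy. Below is a minimal standard-library sketch of one bootstrap draw and its out-of-bag complement. The function name is hypothetical, and scikit-learn's internal sampling code differs in detail.

```python
import random


def bootstrap_indices(n_samples, seed):
    """Draw n_samples row indices with replacement; return (in-bag, out-of-bag)."""
    rng = random.Random(seed)  # the seed makes the draw reproducible
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag


in_bag, oob = bootstrap_indices(n_samples=10_000, seed=0)
# For large n, roughly 1/e (about 36.8%) of the rows end up out-of-bag for one tree.
print(len(in_bag), len(oob) / 10_000)
```

Each tree in the forest would get its own draw, so every row is out-of-bag for some subset of trees and can be scored by those trees alone.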
Gradient Boosting Model
Model summary
Trained model summary:
Set values for categorical model inputs:
Set values for numeric model inputs:
Prediction results
Predicted pump status:
Calculated class probabilities:
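In scikit-learn, a random forest's class probabilities are the mean of the individual trees' probability vectors. Each tree's vector holds the class fractions in the leaf the input reaches. The predicted class is then the one with the highest mean probability. Assuming the probabilities shown here follow that rule, the sketch below averages made-up per-tree outputs. The tree outputs and function names are hypothetical.

```python
CLASSES = ["Functional", "Functional needs repair", "Non-functional"]


def forest_proba(tree_probas):
    """Average the per-tree class-probability vectors (soft voting)."""
    n_trees = len(tree_probas)
    return [sum(p[k] for p in tree_probas) / n_trees for k in range(len(CLASSES))]


def predict(tree_probas):
    """Return the class with the highest averaged probability, plus the probabilities."""
    proba = forest_proba(tree_probas)
    best = max(range(len(CLASSES)), key=lambda k: proba[k])
    return CLASSES[best], proba


# Hypothetical leaf-level class fractions from a three-tree forest.
tree_probas = [
    [0.6, 0.1, 0.3],
    [0.2, 0.1, 0.7],
    [0.5, 0.0, 0.5],
]
label, proba = predict(tree_probas)
print(label, [round(p, 3) for p in proba])
```

Averaging gives about 0.433, 0.067 and 0.5, so "Non-functional" is predicted. A single decision tree is the one-tree case of the same computation.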

Classification Dashboard

The AI-enabled dashboard developed by DataOrbs.com provides a unified platform where users can explore data, select any number of categorical or numeric inputs, train a tree-based model (either a single Decision Tree or an ensemble of trees, i.e., a Random Forest), and predict the functional status of water pumps.

Contact us at contact@dataorbs.com for any questions or comments.

Original data source and size

59400 records, provided by Taarifa and the Tanzanian Ministry of Water.
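The test-split fraction and test-split seed set in the model parameters decide how many of these 59,400 records are held out for evaluation, and which ones. Below is a minimal sketch of a seeded shuffle-and-split in plain Python. Rounding the test size up follows, to our understanding, scikit-learn's `train_test_split`. The function name and the 0.25 fraction are hypothetical.

```python
import math
import random


def train_test_indices(n_samples, test_fraction, seed):
    """Shuffle row indices with a fixed seed and hold out a fraction as the test set."""
    n_test = math.ceil(test_fraction * n_samples)  # round the test size up
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # same seed, same split
    return indices[n_test:], indices[:n_test]


train, test = train_test_indices(59_400, test_fraction=0.25, seed=42)
print(len(train), len(test))  # 44550 14850
```

Reusing the seed reproduces the same held-out rows, which keeps model comparisons on the dashboard consistent.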

Input features used by the dashboard

Categorical features:

Public meeting, Permit, Extraction type, Management group, Payment type, Quality group, Quantity, Source type, Source class, Waterpoint type.

Numeric features:

Longitude, Latitude, Population, Construction year, Year inspected, Week inspected, Day inspected.

Target variable

Pump status: 'Functional', 'Functional needs repair', 'Non-functional'
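The "Balanced" class-weight option in the model parameters derives one weight per pump-status class from the class frequencies. scikit-learn documents the rule as n_samples / (n_classes * count of the class). The sketch below applies it to hypothetical label counts, not the real class distribution.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical, deliberately imbalanced labels (not the real class counts).
labels = (
    ["Functional"] * 60
    + ["Functional needs repair"] * 10
    + ["Non-functional"] * 30
)
weights = balanced_class_weights(labels)
print({cls: round(w, 3) for cls, w in weights.items()})
```

The rare "Functional needs repair" class gets the largest weight (about 3.333). Each class's total weight, count times weight, comes out equal, so every class contributes the same share to the training loss.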

Water point locations are not available in the demo version.