Our First ML Model: Decision Tree from Scratch
8/23/2025
decision-tree · tutorial · supervised-learning · bigml · BigML Analyst Certification I
Hands-on tutorial • 8-10 min read • 30-60 min to run
TL;DR: Decision trees split data by feature thresholds to create easy-to-interpret rules. We’ll train a small tree on a toy dataset and evaluate accuracy.
Why decision trees
- Highly interpretable rules.
- Work with mixed data types without heavy preprocessing.
- Fast to train for small-to-medium datasets.
Minimal working example (Python + scikit-learn)
Prerequisites
```powershell
# Windows PowerShell
python -m venv .venv; .\.venv\Scripts\Activate.ps1; pip install scikit-learn pandas
```
Code (minimal)
```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Toy dataset
X = pd.DataFrame({
    'age':    [22, 25, 47, 52, 46, 56, 55, 60],
    'income': [20000, 30000, 50000, 60000, 52000, 80000, 70000, 90000],
})
y = [0, 0, 1, 1, 1, 1, 1, 1]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)

preds = model.predict(X_test)
print('Accuracy:', accuracy_score(y_test, preds))
```
Interpreting splits
- Each internal node tests a feature threshold.
- Leaves contain predicted labels and class probabilities.
- Use `export_text` or `plot_tree` (sklearn) to visualize rules.
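To see these rules in text form, a minimal sketch using scikit-learn's `export_text` on the same toy data as above:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Same toy data as in the minimal example above.
X = pd.DataFrame({
    'age':    [22, 25, 47, 52, 46, 56, 55, 60],
    'income': [20000, 30000, 50000, 60000, 52000, 80000, 70000, 90000],
})
y = [0, 0, 1, 1, 1, 1, 1, 1]

model = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X, y)

# export_text prints each split as an indented "feature <= threshold" rule,
# with "class: ..." lines at the leaves.
rules = export_text(model, feature_names=list(X.columns))
print(rules)
```

Each indented line is one node test; following a chain of tests from the top down to a `class:` line reproduces the prediction path for a sample.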
Visual: decision tree example
Replace with an SVG showing a small tree (root → branches → leaves) and one highlighted prediction path.
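Until that figure is in place, you can generate the diagram yourself. A sketch using scikit-learn's `export_graphviz` (the class labels `no`/`yes` are illustrative, not from the dataset); render the resulting `.dot` file to SVG with Graphviz, e.g. `dot -Tsvg tree.dot -o tree.svg`:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Same toy data as in the minimal example above.
X = pd.DataFrame({
    'age':    [22, 25, 47, 52, 46, 56, 55, 60],
    'income': [20000, 30000, 50000, 60000, 52000, 80000, 70000, 90000],
})
y = [0, 0, 1, 1, 1, 1, 1, 1]

model = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X, y)

# With out_file=None, export_graphviz returns the Graphviz source as a string.
dot_source = export_graphviz(
    model,
    out_file=None,
    feature_names=list(X.columns),
    class_names=['no', 'yes'],  # illustrative labels for classes 0 and 1
    filled=True,
    rounded=True,
)
with open('tree.dot', 'w') as f:
    f.write(dot_source)
```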
Production notes
- Trees can overfit; prefer pruning or ensembles for robust performance.
- Save the preprocessing pipeline with the model.
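The pruning point above can be sketched with scikit-learn's minimal cost-complexity pruning: `cost_complexity_pruning_path` enumerates candidate `ccp_alpha` values, and larger alphas prune more aggressively. The toy data is reused purely for illustration:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Same toy data as in the minimal example above.
X = pd.DataFrame({
    'age':    [22, 25, 47, 52, 46, 56, 55, 60],
    'income': [20000, 30000, 50000, 60000, 52000, 80000, 70000, 90000],
})
y = [0, 0, 1, 1, 1, 1, 1, 1]

# Enumerate the effective alphas for this data; each one corresponds to a
# subtree of the fully grown tree.
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(X, y)

for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=42, ccp_alpha=alpha).fit(X, y)
    print(f"ccp_alpha={alpha:.4f} -> {tree.get_n_leaves()} leaves")
```

In practice, choose `ccp_alpha` by cross-validation on a realistic dataset; for the ensemble route mentioned above, `RandomForestClassifier` is the usual drop-in.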
Next article
We’ll explain classification vs. regression and when to use each.
Appendix: BigML decision-tree quickstart (from transcript)
Reproduce the transcript demo steps quickly on BigML:
- Create a source
  - Upload `diabetes.csv` (or create from a public URL). Let BigML infer field types; adjust locale/decimal separator if needed.
- Inspect dataset
  - Confirm instance count (e.g., 768) and class balance for the target field. Hover histograms to see mean/median/std.
- Choose objective
  - Ensure `diabetes` (or your target) is selected as the objective (BigML defaults to the last column).
- Build a decision tree
  - From the dataset view, open the actions menu (cloud icon) → Model. One click builds a default tree.
  - Use the gears icon to configure max_depth, pruning, and whether to treat missing as a value.
- Explore predictions
  - Click any node to view the prediction, confidence, and prediction path (the sequence of splits producing the prediction).
  - Use alternative visualizations: tree branches, sunburst, and 2D PDP (plasma glucose vs BMI heatmap).
- Export / programmatic predictions
  - Download sample Python code from the model's actions menu to call the prediction endpoint or embed the model.
Notes
- Confidence in BigML’s decision trees reflects node purity and instance counts (higher instance backing → higher confidence).
- You can filter the UI to show only branches that predict a particular class for quick rule extraction.
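BigML documents its tree confidence as the lower bound of a Wilson score interval over a node's class counts, which formalizes the first note above: the same observed proportion earns more confidence when backed by more instances. A minimal sketch of that calculation (standard 95% Wilson lower bound; an approximation of the idea, not BigML's exact code):

```python
import math

def wilson_lower_bound(successes, n, z=1.96):
    """Lower bound of the Wilson score interval: a pessimistic estimate
    of the true class proportion given `successes` out of `n` instances."""
    if n == 0:
        return 0.0
    p = successes / n
    denom = 1 + z**2 / n
    centre = p + z**2 / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (centre - margin) / denom

# A pure leaf backed by 100 instances earns a much higher bound
# than an equally pure leaf backed by only 10.
print(wilson_lower_bound(10, 10))
print(wilson_lower_bound(100, 100))
```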