Our First ML Model: Decision Tree from Scratch

8/23/2025
decision-tree · tutorial · supervised-learning · bigml · BigML Analyst Certification I

Hands-on tutorial · 8-10 min read · 30-60 min to run

TL;DR: Decision trees split data by feature thresholds to create easy-to-interpret rules. We’ll train a small tree on a toy dataset and evaluate accuracy.

Why decision trees

Minimal working example (Python + scikit-learn)

Prerequisites

# Windows PowerShell
python -m venv .venv; .\.venv\Scripts\Activate.ps1; pip install scikit-learn pandas

Code (minimal)

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

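# Toy dataset: two numeric features and a binary label
X = pd.DataFrame({
    'age': [22, 25, 47, 52, 46, 56, 55, 60],
    'income': [20000, 30000, 50000, 60000, 52000, 80000, 70000, 90000]
})
y = [0, 0, 1, 1, 1, 1, 1, 1]

# Hold out 25% of the rows (2 of 8) for testing; the seed makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Cap the depth so the tree stays small and easy to read
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)

# With only two test rows, accuracy can only be 0.0, 0.5, or 1.0
preds = model.predict(X_test)
print('Accuracy:', accuracy_score(y_test, preds))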
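
To see which rules a tree has learned, scikit-learn's export_text prints a fitted tree as indented if/else rules. The sketch below repeats the toy data so it runs on its own, and it fits on all eight rows rather than the training split; that choice is an assumption made here only to keep the printed rules easy to read.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Same toy data as the example above, repeated so this block is self-contained.
X = pd.DataFrame({
    'age': [22, 25, 47, 52, 46, 56, 55, 60],
    'income': [20000, 30000, 50000, 60000, 52000, 80000, 70000, 90000],
})
y = [0, 0, 1, 1, 1, 1, 1, 1]

# Assumption: fit on every row (no train/test split), purely to inspect the rules.
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X, y)

# export_text returns a string with one line per split or leaf.
rules = export_text(model, feature_names=list(X.columns))
print(rules)
```

Because a single threshold on either feature separates the two classes in this data, the printed tree should contain one split and two leaves. Which feature is used can depend on the random seed.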

Interpreting splits
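
A split is a threshold on one feature, chosen so that the two resulting groups are as pure as possible. The sketch below is a simplified illustration of that idea using Gini impurity (scikit-learn's default criterion) on the toy age column. The helper names gini and best_threshold are hypothetical, and this is not scikit-learn's actual implementation.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 1 - p0^2 - p1^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def best_threshold(values, labels):
    """Try midpoints between sorted unique values.

    Returns (threshold, weighted impurity of the two child groups).
    """
    uniq = sorted(set(values))
    best = (None, float('inf'))
    for lo, hi in zip(uniq, uniq[1:]):
        t = (lo + hi) / 2
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:  # keep the first threshold with the lowest impurity
            best = (t, score)
    return best


age = [22, 25, 47, 52, 46, 56, 55, 60]
y = [0, 0, 1, 1, 1, 1, 1, 1]

print(gini(y))            # impurity before splitting: 0.375
threshold, impurity = best_threshold(age, y)
print(threshold, impurity)  # 35.5 0.0
```

The rule "age <= 35.5" sends both class-0 rows to one side and all six class-1 rows to the other, so the weighted impurity drops from 0.375 to 0.0.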

Visual: decision tree example

[Figure placeholder: a small decision tree (root → branches → leaves) with one prediction path highlighted.]

Production notes

Next article

We’ll explain classification vs. regression and when to use each.


Appendix: BigML decision-tree quickstart (from transcript)

Reproduce the transcript demo steps quickly on BigML:

  1. Create a source
    • Upload diabetes.csv (or create the source from a public URL). Let BigML infer the field types; adjust the locale or decimal separator if needed.
  2. Inspect the dataset
    • Confirm the instance count (e.g., 768) and the class balance of the target field. Hover over the histograms to see each field's mean, median, and standard deviation.
  3. Choose the objective
    • Make sure diabetes (or your own target) is selected as the objective field. BigML defaults to the last column.
  4. Build a decision tree
    • From the dataset view, open the actions menu (cloud icon) → Model. The one-click option builds a tree with default settings.
    • Use the gears icon to configure max_depth, pruning, and whether missing values are treated as a value in their own right.
  5. Explore predictions
    • Click any node to view its prediction, its confidence, and the prediction path (the sequence of splits that leads to that prediction).
    • Try the alternative visualizations: tree branches, sunburst, and a 2D partial dependence plot (PDP), e.g. a heatmap of plasma glucose vs. BMI.
  6. Export / programmatic predictions
    • Download sample Python code from the model's actions menu to call the prediction endpoint or to embed the model locally.
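
The prediction path from step 5 has a local analogue for readers following along in scikit-learn rather than BigML. The sketch below uses the estimator's decision_path method on the toy age/income data from the main example (not the diabetes dataset); the sample row and the variable names are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data from the main example: columns are age and income.
X = np.array([[22, 20000], [25, 30000], [47, 50000], [52, 60000],
              [46, 52000], [56, 80000], [55, 70000], [60, 90000]])
y = [0, 0, 1, 1, 1, 1, 1, 1]
feature_names = ['age', 'income']

model = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X, y)
tree = model.tree_

# Hypothetical new person to score.
sample = np.array([[30, 35000]])

# decision_path returns a sparse indicator matrix; for one sample,
# .indices lists the visited node ids from the root down to the leaf.
node_ids = model.decision_path(sample).indices

path = []
for node in node_ids:
    # Leaves have identical left/right child ids (both -1); skip them.
    if tree.children_left[node] == tree.children_right[node]:
        continue
    feat = tree.feature[node]
    thr = tree.threshold[node]
    op = '<=' if sample[0, feat] <= thr else '>'
    path.append(f'{feature_names[feat]} {op} {thr:.1f}')

prediction = model.predict(sample)[0]
print(' AND '.join(path), '->', prediction)
```

Each entry in path is one split on the way to the leaf, which is the same information BigML shows when a node is selected.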

Notes