Start Here

Course Map

Forecasting Methods
├── Qualitative (Delphi, Jury of Executives, Sales Force, Market Survey)
└── Quantitative
    ├── Time Series (Endogenous — uses ONLY past data patterns)
    │   ├── FMTS (Fixed-Model) — assumes components, simple formulas
    │   │   ├── Average / Naive
    │   │   ├── Moving Average (Simple & Weighted)
    │   │   ├── Exponential Smoothing (α)
    │   │   ├── Adaptive Smoothing (auto-tuning α)
    │   │   ├── Holt's Method — Exp Smoothing + Trend (α, β)
    │   │   └── Winters' Method — Exp Smoothing + Trend + Season (α, β, γ)
    │   │       ├── Additive (seasonal added)
    │   │       └── Multiplicative (seasonal multiplied)
    │   ├── OMTS (Open-Model) — analyzes data first, then builds model
    │   │   ├── Decomposition Analysis
    │   │   └── ARIMA / Box-Jenkins (AR, MA, ARMA, ARIMA)
    │   └── Soft Computing
    │       ├── Fuzzy Time Series (Chen's method + improvements)
    │       └── ML/DL approaches (CNN, RNN, Neuro-Fuzzy)
    └── Causal / Associative
        └── Regression (Simple Linear, Multiple)
Endogenous vs Exogenous: Time series methods are endogenous — they only look at the data's own past patterns. Regression is exogenous — it uses outside factors (price, economy, etc.) to explain the data. This is a fundamental distinction the professor draws.
Foundation

Time Series Components

Every time series is made up of four components. All forecasting techniques try to identify and project some or all of these:

📊 Level
  • The horizontal baseline — what sales would be if there were no trend, seasonality, or noise
  • Think of it as the "average base" of the series
  • Example: a component whose demand is always ~1000/month has a level of 1000
📈 Trend
  • A continuing pattern of increase or decrease over time
  • Can be linear (straight line) or curved
  • Caused by population growth, technology changes, culture shifts
  • For exponential smoothing, think of trend as a "step function" — level steps up/down each period
🔄 Seasonality
  • Repeating up-down pattern within ≤ 1 year
  • Examples: air conditioners peak in summer, toys peak in fall
  • Patterns longer than 1 year are called "cycles"
  • Key: the pattern repeats itself every year
🎲 Noise (Random)
  • Random fluctuation that TS techniques CANNOT explain
  • If noise looks non-random, your model is missing something!
  • A good test: if residuals don't look random, there's still pattern left

Additive vs. Multiplicative Seasonality

This comes up in Winters' method and decomposition. You must know the difference:

Additive: Y = Level + Seasonal + Error
  • Seasonal swings stay the SAME size regardless of level
  • Peaks and valleys have constant amplitude
  • Seasonal factors sum to ZERO
  • Forecast: (L + B×h) + S
Multiplicative: Y = Level × Seasonal + Error
  • Seasonal swings GROW as the level grows
  • Peak-to-valley swings grow larger over time
  • Seasonal factors average to 1.0
  • Forecast: (L + B×h) × S
If you see a plot where the "wave" gets bigger as the series goes up → multiplicative. If the wave stays the same height → additive. Most real-world business data is multiplicative.
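A tiny Python sketch makes the difference concrete; the trend line and seasonal factors below are invented for illustration, not lecture data:

```python
# Additive vs multiplicative seasonality on a noiseless toy series: both share
# the same trend line; only how the seasonal part attaches differs.

def trend(t):
    return 100 + 2 * t                      # shared level + trend

S_add  = [20.0, 5.0, -10.0, -15.0]          # additive factors, sum to 0
S_mult = [1.20, 1.05, 0.90, 0.85]           # multiplicative factors, average 1.0

add  = [trend(t) + S_add[t % 4]  for t in range(12)]
mult = [trend(t) * S_mult[t % 4] for t in range(12)]

# Additive: the deviation from the trend line repeats with the SAME size.
print([round(y - trend(t), 1) for t, y in enumerate(add)][:4])  # [20.0, 5.0, -10.0, -15.0]

# Multiplicative: the RATIO to the trend repeats, so the absolute swing grows.
print(round(mult[0] - mult[2], 1), round(mult[8] - mult[10], 1))  # 26.4 31.2
```

The widening gap printed last is exactly the "wave grows with the level" cue described above.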
Foundation

Forecast Error Metrics — Fully Explained

Error for any period: et = Actualt − Forecastt

MAD — Mean Absolute Deviation
MAD = Σ|et| / n = (|e₁| + |e₂| + ... + |eₙ|) / n

Take each error, make it positive (absolute value), average them. Tells you the average size of your mistakes in the same units as your data. Easy to understand: "on average, we're off by X units."

MSE — Mean Squared Error
MSE = Σ(et²) / n = (e₁² + e₂² + ... + eₙ²) / n

Square each error, then average. Big errors get punished much more than small ones (because squaring amplifies large numbers). Good for comparing models when you want to penalize big misses.

MAPE — Mean Absolute Percentage Error
MAPE = [Σ(|et| / At) × 100] / n

Turns each error into a percentage of the actual value, then averages. Great for comparing accuracy across products with different scales. "On average, we're off by X%."

Worked Example — Port of Baltimore (from lecture)

α = 0.10, initial forecast = 175. Actual data: 180, 168, 159, 175, 190, 205, 180, 182

Qtr     Actual   Forecast (α=.10)   Error     |Error|   Error²     |Error|/Actual
1       180      175.00               5.00      5.00      25.00      2.78%
2       168      175.50              −7.50      7.50      56.25      4.46%
3       159      174.75             −15.75     15.75     248.06      9.91%
4       175      173.18               1.82      1.82       3.33      1.04%
5       190      173.36              16.64     16.64     276.89      8.76%
6       205      175.02              29.98     29.98     898.80     14.62%
7       180      178.02               1.98      1.98       3.92      1.10%
8       182      178.22               3.78      3.78      14.29      2.08%
TOTALS                                         82.45    1526.54     44.74%

MAD = 82.45 / 8 = 10.31    MSE = 1526.54 / 8 = 190.82    MAPE = 44.74% / 8 = 5.59%

How was F₂ calculated? F₂ = 0.10 × 180 + 0.90 × 175 = 18 + 157.5 = 175.50

How was F₃ calculated? F₃ = 0.10 × 168 + 0.90 × 175.50 = 16.8 + 157.95 = 174.75
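The whole worked table can be replayed in a few lines. A minimal Python sketch (function and variable names are mine, not from the lecture) that recomputes the α = 0.10 forecasts at full precision and then all three metrics:

```python
# Exponential smoothing forecasts + MAD / MSE / MAPE for the Port of
# Baltimore data (alpha = 0.10, initial forecast F1 = 175).

def exp_smooth(actuals, alpha, f1):
    """One-step-ahead forecasts F1..Fn: F(t+1) = alpha*A(t) + (1-alpha)*F(t)."""
    forecasts = [f1]
    for a in actuals[:-1]:
        forecasts.append(alpha * a + (1 - alpha) * forecasts[-1])
    return forecasts

actuals = [180, 168, 159, 175, 190, 205, 180, 182]
forecasts = exp_smooth(actuals, alpha=0.10, f1=175.0)

errors = [a - f for a, f in zip(actuals, forecasts)]   # e(t) = A(t) - F(t)
n = len(errors)
mad  = sum(abs(e) for e in errors) / n
mse  = sum(e * e for e in errors) / n
mape = sum(abs(e) / a for e, a in zip(errors, actuals)) * 100 / n

print(round(mad, 2), round(mse, 2), round(mape, 2))  # 10.31 190.82 5.59
```

The forecasts are carried at full precision here, yet the metrics still land on the lecture's rounded totals.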

FMTS

Naive & Moving Averages

Naive Forecast

Ft+1 = At

"Tomorrow will be the same as today." Free, simple, surprisingly hard to beat sometimes. Used as a baseline benchmark.

Simple Moving Average (SMA)

Ft+1 = (At + At-1 + ... + At-N+1) / N
Worked Example — 3-month SMA

Data: Jan=12, Feb=13, Mar=16, Apr=19, May=23

1. F(Apr) = (12 + 13 + 16) / 3 = 41/3 = 13.67
2. F(May) = (13 + 16 + 19) / 3 = 48/3 = 16.00
3. F(Jun) = (16 + 19 + 23) / 3 = 58/3 = 19.33

Notice: each new forecast "drops" the oldest value and "adds" the newest.

Weighted Moving Average

Ft+1 = (w₁·At + w₂·At-1 + w₃·At-2) / (w₁+w₂+w₃)

Example with weights 3, 2, 1: F = (3×16 + 2×13 + 1×12) / 6 = (48+26+12)/6 = 86/6 = 14.33
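Both averages are one-liners in Python; a small sketch (helper names are mine) replaying the numbers above:

```python
# Simple and weighted moving averages on the Jan-May demand data.

def sma(history, n):
    """Simple moving average of the last n observations."""
    return sum(history[-n:]) / n

def wma(history, weights):
    """Weighted moving average; weights[0] applies to the most recent value."""
    recent = history[::-1]                      # newest first
    return sum(w * x for w, x in zip(weights, recent)) / sum(weights)

data = [12, 13, 16, 19, 23]                     # Jan..May

f_apr   = sma(data[:3], 3)                      # (12 + 13 + 16) / 3
f_jun   = sma(data, 3)                          # (16 + 19 + 23) / 3
f_apr_w = wma(data[:3], [3, 2, 1])              # (3*16 + 2*13 + 1*12) / 6

print(round(f_apr, 2), round(f_jun, 2), round(f_apr_w, 2))  # 13.67 19.33 14.33
```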

Visual Intuition

Raw Series vs Smoothers

This plot uses the exact worked-example numbers: actuals for Jan-May, then SMA and WMA forecasts for Apr-Jun.

[Plot omitted: actuals for Jan–May with the 3-period SMA and 3,2,1 WMA forecasts for Apr–Jun. The WMA stays closer because it weights recent data more; the SMA is smoother but lags more.]
Exam takeaway: the smoother the line, the more it lags when the series is trending.

Decision Cue

What Bigger N Does

Increasing the window lowers noise but delays the forecast’s response to real movement.

[Plot omitted: actual series vs two moving averages. The smaller-N line follows turns faster; the larger-N line looks calmer but reacts later.]
Exam takeaway: if the series is trending hard, a large moving average can look “clean” while still forecasting too low or too high.
The big trade-off with N: More periods → smoother but slower to react (lags behind trends). Fewer periods → reactive but noisy. A 12-month MA on monthly data completely destroys the seasonal pattern!
FMTS

Exponential Smoothing — Full Explanation

Think of exponential smoothing as a "smart moving average" that puts more weight on recent data and less on older data, with the weights decaying exponentially into the past. You only need the last forecast and last actual — no need to store all history!
Ft+1 = α × At + (1 − α) × Ft

Same thing written differently:

Ft+1 = Ft + α × (At − Ft) = Old Forecast + α × (Error)

What does α control? It's the "reactivity dial" between 0 and 1:

Low α (0.05 – 0.2)
  • Forecast is mostly old forecast (heavy smoothing)
  • Slow to react to changes
  • Great when data is noisy but stable
  • Example: α=0.1 means 10% new data, 90% old forecast
High α (0.5 – 0.9)
  • Forecast is mostly last actual value
  • Reacts quickly to changes
  • Good when level is shifting
  • But overreacts to noise!
  • At α=1.0, it becomes a naive forecast

How weights actually decay

With α=0.1, the weight on each past period's sales is:

Period t:   α       = 0.10  (10.0% weight)
Period t-1: α(1-α)  = 0.09  (9.0%)
Period t-2: α(1-α)² = 0.081 (8.1%)
Period t-3: α(1-α)³ = 0.073 (7.3%)
...and so on, decaying exponentially forever
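The decay is easy to verify yourself (illustrative Python, not from the lecture):

```python
# The implicit weight exponential smoothing puts on the actual from k periods
# ago is alpha * (1 - alpha)**k, and the weights sum to 1 in the limit.

alpha = 0.1
weights = [alpha * (1 - alpha) ** k for k in range(5)]
print([round(w, 4) for w in weights])   # [0.1, 0.09, 0.081, 0.0729, 0.0656]

total = sum(alpha * (1 - alpha) ** k for k in range(1000))
print(round(total, 6))                  # 1.0
```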
Exponential smoothing assumes NO trend and NO seasonality. The forecast for ALL future periods is the same flat number. If your data has trend or seasonality, you need Holt's or Winters' method.
FMTS

Holt's Method (Exponential Smoothing with Trend)

Adds a trend component to capture upward/downward movement. Uses two constants: α (level) and β (trend).

Step 1 — Smooth the level:    Ft = α × At-1 + (1-α) × (Ft-1 + Tt-1)
Step 2 — Smooth the trend:    Tt = β × (Ft − Ft-1) + (1-β) × Tt-1
Step 3 — Combined forecast:   FITt = Ft + Tt
For h periods ahead:          FITt+h = Ft + h × Tt

Visual Intuition

Holt Tracks the Slope

The smoothed level absorbs noise, then Holt projects forward using the current trend estimate instead of staying flat.

[Plot omitted: actuals for t1–t9 with the smoothed level as baseline; the FIT forecast extends the current slope rather than staying flat. Legend: Actual, Level, Forecast.]
Exam takeaway: Holt is still smoothing, but it refuses to stay flat because it carries an explicit trend estimate.
Worked Example — Portland Manufacturer (from lecture)

Given: F₁ = 11, T₁ = 2, α = 0.2, β = 0.4. Actual data: A₁=12, A₂=17, ...

Computing Month 2:

1. F₂ = α × A₁ + (1-α) × (F₁ + T₁) = 0.2 × 12 + 0.8 × (11 + 2) = 2.4 + 10.4 = 12.80
2. T₂ = β × (F₂ − F₁) + (1-β) × T₁ = 0.4 × (12.8 − 11) + 0.6 × 2 = 0.72 + 1.2 = 1.92
3. FIT₂ = F₂ + T₂ = 12.8 + 1.92 = 14.72

Computing Month 3 (A₂=17):

1. F₃ = 0.2 × 17 + 0.8 × (12.8 + 1.92) = 3.4 + 11.776 = 15.18
2. T₃ = 0.4 × (15.18 − 12.80) + 0.6 × 1.92 = 0.952 + 1.152 = 2.10
3. FIT₃ = 15.18 + 2.10 = 17.28
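The same two months can be replayed in Python; this sketch (function name mine) just encodes the three steps:

```python
# Holt's three steps, replaying the Portland example
# (F1 = 11, T1 = 2, alpha = 0.2, beta = 0.4).

def holt_step(f_prev, t_prev, actual_prev, alpha, beta):
    """One Holt update: returns (new level F, new trend T, forecast FIT)."""
    f = alpha * actual_prev + (1 - alpha) * (f_prev + t_prev)   # Step 1: level
    t = beta * (f - f_prev) + (1 - beta) * t_prev               # Step 2: trend
    return f, t, f + t                                          # Step 3: FIT

f, t = 11.0, 2.0
f, t, fit2 = holt_step(f, t, 12, alpha=0.2, beta=0.4)   # uses A1 = 12
f, t, fit3 = holt_step(f, t, 17, alpha=0.2, beta=0.4)   # uses A2 = 17

print(round(fit2, 2), round(fit3, 2))  # 14.72 17.28
```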
FMTS — Important

Winters' Method (Trend + Seasonality)

Multiplicative Formulation

Level:    Lt = α × (Yt / St-m) + (1-α) × (Lt-1 + Bt-1)
Trend:    Bt = γ × (Lt − Lt-1) + (1-γ) × Bt-1
Seasonal: St = δ × (Yt / Lt) + (1-δ) × St-m
Forecast h steps ahead: Ŷt+h = (Lt + Bt × h) × St+h-m
What each piece does in the Level formula:
Yt / St-m = "de-seasonalize" this period's actual (divide out last year's seasonal factor) → gives you an estimate of the level without seasonal noise
Lt-1 + Bt-1 = last period's level + trend → the "expected level" from the previous model
• α blends these two estimates together

Seasonality Shape

Additive vs Multiplicative

Both have trend and repeating waves. The difference is whether the wave stays constant or grows with the level.

[Plot omitted: the same base trend with two seasonal overlays; the additive wave stays a similar height while the multiplicative wave widens as the level rises.]
Exam takeaway: if the seasonal peaks get larger as the series rises, think multiplicative.

Forecast Logic

De-seasonalize, Then Reapply Season

Winters first strips the seasonal effect out, updates level and trend, then puts seasonality back onto the future forecast.

1. Actual Yₜ is observed (with seasonality)
2. Divide by Sₜ₋ₘ to remove the old seasonality
3. Update Lₜ and Bₜ (smooth level and trend)
4. Update Sₜ (refresh the seasonal factor)
5. Forecast = (L + B·h) × S (trend core times season)
Exam takeaway: multiplicative Winters is “trend forecast first, seasonal multiplier last.”

Initialization (How to Get Started)

Winters' needs initial values for L₀, B₀, and all m seasonal factors. The procedure from your lecture (Kuliah 6):

Full Initialization Procedure (Multiplicative)
1. Fit a least-squares regression to the first few years of data (at least 4-5 years for quarterly data). The y-intercept = L₀ (initial level), the slope = B₀ (initial trend).
2. Compute fitted values ŷt = L₀ + B₀ × t for each period t = 1, 2, ..., n used in the regression.
3. Detrend the data: compute S₀t = Yt / ŷt for each period. This removes the trend and gives you raw seasonal ratios.
4. Average the seasonal values for each season across years. For quarterly data: average all Q1 ratios, all Q2 ratios, all Q3 ratios, all Q4 ratios.
   Example: S̄[Q1] = (0.7368 + 0.7156 + 0.6894 + 0.6831) / 4 = 0.7062
5. Normalize so the seasonal factors average to 1.0:
   CF = m / Σ(averages), where m = number of seasons (4 for quarterly).
   Multiply each average by CF.
   Example: if the averages sum to 3.9999, CF = 4/3.9999 ≈ 1.0000
   Initial seasonal factors: S₋₃ = 0.7062, S₋₂ = 1.1114, S₋₁ = 1.2937, S₀ = 0.8886
The normalization constraint:
• Multiplicative: seasonal factors must average to 1.0 (equivalently, sum to m)
• Additive: seasonal factors must sum to 0
A factor of 1.15 means that season is 15% above the average. A factor of 0.85 means 15% below.
For the multiplicative forecast formula (L + B×h) × S, remember it's × (times), not + (plus). The additive version uses +. This distinction has appeared on exams.
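One full Winters update can be sketched in Python. All the numbers below (level 100, trend 2, last year's factor 1.1, actual Y = 115, and the smoothing constants) are made-up illustrations, not lecture data:

```python
# One multiplicative Winters update: de-seasonalize, smooth level and trend,
# refresh the seasonal factor.

def winters_step(y, l_prev, b_prev, s_old, alpha, gamma, delta):
    """One update of level L, trend B, and seasonal factor S (multiplicative)."""
    l = alpha * (y / s_old) + (1 - alpha) * (l_prev + b_prev)   # de-seasonalized level
    b = gamma * (l - l_prev) + (1 - gamma) * b_prev             # smoothed trend
    s = delta * (y / l) + (1 - delta) * s_old                   # refreshed seasonal factor
    return l, b, s

l, b, s = winters_step(y=115, l_prev=100.0, b_prev=2.0, s_old=1.1,
                       alpha=0.2, gamma=0.1, delta=0.3)
print(round(l, 2), round(b, 3), round(s, 3))   # 102.51 2.051 1.107

# One step ahead: trend core first, seasonal multiplier last -> (L + B*h) * S
print(round((l + b * 1) * s, 1))
```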
ARIMA Deep Dive

What is ARIMA? — The Big Picture

ARIMA stands for AutoRegressive Integrated Moving Average. It's the most powerful classical time series method. While FMTS techniques assume which patterns exist (and use fixed formulas), ARIMA first analyzes the data to discover what patterns are there, then builds a custom model.

Think of ARIMA as being built from three ingredients you mix together:

AR (AutoRegressive): Today's value depends on yesterday's value (and maybe the day before, etc.)
I (Integrated): We may need to "difference" the data to make it stationary first
MA (Moving Average): Today's value depends on yesterday's forecast error (and maybe earlier errors)

ARIMA(p, d, q) → p = AR order, d = differencing order, q = MA order
Why do we need this? Sometimes a time series has complex dependency patterns that simple exponential smoothing can't capture. Maybe today's sales depend on sales from 2 weeks ago, or maybe they depend on how wrong our recent forecasts were. ARIMA can model these patterns.
ARIMA

AR (AutoRegressive) Models — Explained from Scratch

The Idea

In an AR model, the current value Xt is predicted using past values of itself. It's literally a regression where the predictors are the variable's own lagged values.

AR(1) — First Order

Xt = β₀ + β₁·Xt-1 + et

This says: "Today's value = some constant + some fraction of yesterday's value + random shock."

If β₁ = 0: There's no temporal dependence. Xt is just random noise around β₀.

If β₁ is large (close to 1): Yesterday's value strongly influences today. The series has "memory."

If |β₁| ≥ 1: The series is not stationary. At β₁ = 1 it is a random walk; beyond 1 it is explosive (grows without bound).

Real-world analogy: Think of temperature. Today's temperature is strongly related to yesterday's (high AR). But tomorrow's stock price might barely relate to today's (low AR). An AR model captures exactly this kind of "yesterday influences today" relationship.

AR(2) — Second Order

Xt = β₁·Xt-1 + β₂·Xt-2 + et

Now today depends on the last TWO values. More complex patterns can be captured — like oscillating behavior.

AR(p) — General

Xt = β₁·Xt-1 + β₂·Xt-2 + ... + βp·Xt-p + et

Today depends on the last p values. p is the "order" of the AR model.

How to tell if you have an AR process

AR Signature:
PACF: sharp cutoff after lag p (only first p lags are significant, rest drop to zero)
ACF: gradual decay (slowly decreasing, possibly oscillating, but many lags significant)

The PACF tells you the order: if PACF cuts off after lag 2 → AR(2).
If model works well, the residuals (what's left after fitting) should look random — no patterns, no dependence. If residuals still show patterns, the model isn't capturing everything.
ARIMA

MA (Moving Average) Models — Explained from Scratch

The Idea

In an MA model, the current value depends on past random shocks (errors), NOT past values of itself. It's like saying: "Today's value is the normal level, plus an echo of yesterday's surprise, plus an echo of the surprise before that..."

MA(1) — First Order

Xt = μ + et + α₁·et-1

μ = the mean level. et = today's random shock. α₁·et-1 = an echo of yesterday's shock.

If α₁ = 0: no temporal dependence, just pure random noise around the mean.

If α₁ is large: past shocks strongly influence current value.

Real-world analogy: Imagine a lake. You throw a stone (shock) and it creates ripples. The MA model says today's water level is the normal level plus the ripple from today's stone plus the lingering ripple from yesterday's stone. An MA(2) would include ripples from two days ago too.

MA(2) — Second Order

Xt = μ + et + α₁·et-1 + α₂·et-2

How to tell if you have an MA process

MA Signature:
ACF: sharp cutoff after lag q (only first q lags are significant)
PACF: gradual decay (slowly decreasing)

The ACF tells you the order: if ACF cuts off after lag 1 → MA(1).
Extra hint: If the lag-1 autocorrelation is NEGATIVE, strongly consider an MA term.
ARIMA — Critical for Exam

ACF & PACF — The Complete Identification Guide

What Are These?

ACF (Autocorrelation Function): Measures the correlation between the series and lagged versions of itself. ACF at lag 3 = correlation between Xt and Xt-3. This includes BOTH direct effects and indirect effects that "propagate" through intermediate lags.

PACF (Partial Autocorrelation Function): Measures the correlation between Xt and Xt-k AFTER removing the effects of all intermediate lags. So PACF at lag 3 = the "pure" correlation at lag 3 that isn't explained by lags 1 and 2.

Analogy: Imagine you measure how much a child's height correlates with their grandparent's height. The ACF would show a strong correlation (because height passes through the parent). The PACF would remove the parent's influence first, showing only the "direct" grandparent effect — which is much weaker.

The Master Identification Table

AR(1)
  ACF: decays exponentially from lag 1 (many lags slowly fading)
  PACF: cuts off sharply after lag 1 (only lag 1 is significant)
  Read: PACF shows 1 spike then nothing → AR(1)
AR(2)
  ACF: decays gradually (may oscillate — go positive/negative)
  PACF: cuts off after lag 2 (lags 1 and 2 significant, rest zero)
  Read: PACF shows 2 spikes then nothing → AR(2)
MA(1)
  ACF: cuts off sharply after lag 1 (only lag 1 is significant)
  PACF: decays exponentially from lag 1
  Read: ACF shows 1 spike then nothing → MA(1)
MA(2)
  ACF: cuts off after lag 2 (lags 1 and 2 significant, rest zero)
  PACF: decays gradually
  Read: ACF shows 2 spikes then nothing → MA(2)
ARMA(1,1)
  ACF: decays gradually
  PACF: decays gradually
  Read: neither cuts off cleanly → try ARMA
The Memory Trick (never forget this!):

AR → PACF cuts off (the one with the different letter cuts off: A≠P)
MA → ACF cuts off (the one with the matching A cuts off: MA↔ACF)

Or even simpler: "AR = Partial, MA = Auto" — each model is identified by the function that cuts off.
[Stem-plot panels omitted; one ACF/PACF pair per model:]
• AR(1): PACF cuts off first, so this is AR.
• AR(2): two PACF spikes then silence is the AR(2) signature.
• MA(1): ACF cuts off first, so this is MA.
• MA(2): two ACF spikes then silence means MA(2).

What "Cuts Off" vs "Decays" Looks Like

Sharp Cutoff
Lag 1: ████████  0.65 (BIG)
Lag 2: █         0.05 (tiny)
Lag 3: ░         0.02 (tiny)
Lag 4: ░        −0.01 (tiny)
...all remaining near zero

One or two significant spikes, then everything drops into the "not significant" band (dashed lines on the plot).

Gradual Decay
Lag 1: ████████  0.65 (big)
Lag 2: ██████    0.42 (still big)
Lag 3: ████      0.27 (smaller)
Lag 4: ███       0.18 (smaller)
Lag 5: ██        0.12 (fading)
...slowly dying out

Values slowly shrink over many lags. May be all positive (exponential decay) or alternate positive/negative (oscillating decay).

On the exam, you might see an ACF/PACF plot and be asked "What model does this suggest?" Follow these steps:
1. Look at PACF — does it cut off sharply? If yes → AR model, order = lag where it cuts off
2. Look at ACF — does it cut off sharply? If yes → MA model, order = lag where it cuts off
3. Both decay gradually? → Try ARMA. Try the simplest (1,1) first.
4. "Significant" = the bar extends beyond the blue dashed confidence lines on the plot
ARIMA

ARMA & ARIMA Models

ARMA(p,q) — Mixing AR and MA

Xt = β₁Xt-1 + ... + βpXt-p + et + α₁et-1 + ... + αqet-q

Combines autoregressive terms (past values) with moving average terms (past errors). Sometimes an ARMA(1,1) works better than AR(3) and is simpler.

Rule of thumb: Simpler models are better! If ARMA(1,1) fits as well as AR(3), use ARMA(1,1). Also, if a mixed ARMA model seems to fit, try dropping one AR and one MA term — they might be canceling each other.

ARIMA(p,d,q) — Adding Differencing for Non-Stationary Data

Problem: AR and MA only work on stationary data (constant mean, constant variance). Real data often has trends → non-stationary.

Solution: Difference the data to remove the trend, then apply ARMA.

First difference:  Y't = Yt − Yt-1  (removes a linear trend)
Second difference: Y''t = Y't − Y't-1  (removes a quadratic trend)
d = how many times you difference.
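Differencing is one line of code; a toy Python check (numbers invented for illustration):

```python
# Differencing in action: a linear trend flattens after one difference,
# a quadratic trend after two.

def difference(series):
    """First difference: y'[t] = y[t] - y[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]

y = [10, 12, 14, 16, 18, 20]          # linear trend, slope 2
d1 = difference(y)
print(d1)                             # [2, 2, 2, 2, 2] -> stationary (constant)

y2 = [t * t for t in range(6)]        # quadratic trend: 0 1 4 9 16 25
print(difference(difference(y2)))     # [2, 2, 2, 2] -> d = 2 flattens it
```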

ARIMA Notation

Notation        p (AR)  d (diff)  q (MA)  English
ARIMA(1,0,0)    1       0         0       Just AR(1) — no differencing, no MA
ARIMA(0,0,1)    0       0         1       Just MA(1) — no differencing, no AR
ARIMA(1,0,1)    1       0         1       ARMA(1,1) — mixed, no differencing
ARIMA(1,1,0)    1       1         0       AR(1) on first-differenced data
ARIMA(0,1,1)    0       1         1       MA(1) on first-differenced data
ARIMA(2,0,0)    2       0         0       AR(2) on original stationary data
ARIMA

The Box-Jenkins Process — Step by Step

This is the systematic approach to building an ARIMA model. It's an iterative cycle:

STEP 1: PLOT THE SERIES
   ↓
Is it stationary? (constant mean, constant variance)
   No → Difference it (d=1, maybe d=2) until stationary
   Yes ↓
STEP 2: MODEL IDENTIFICATION
   Examine ACF and PACF of the (differenced) series
   Use the signature table to pick p and q
   ↓
STEP 3: MODEL ESTIMATION
   Fit the ARIMA(p,d,q) model (minimize sum of squared errors)
   ↓
STEP 4: IS THE MODEL ADEQUATE?
   Check 1: Are residuals random? (plot them, Ljung-Box test)
   Check 2: Are coefficients significant? (check standard errors)
   Check 3: Compare AIC/BIC with alternative models (lower = better)
   No → Modify p and/or q, go back to Step 3
   Yes ↓
STEP 5: FORECAST
   Use the estimated model to predict future values
   Include confidence intervals (±2 × standard error)

Stationarity — When Do You Difference?

A stationary series fluctuates around a constant mean with constant variance. Visually, it looks like "random noise around a flat line."

A non-stationary series has a trending mean or changing variance. If you see the data going up or down over time → non-stationary → difference it.

The Ljung-Box Test

H₀: The residuals are random (white noise) — THIS IS WHAT WE WANT
H₁: The residuals are NOT random — model is inadequate

If p-value > 0.05 → fail to reject H₀ → residuals are random → MODEL IS GOOD ✓
If p-value < 0.05 → reject H₀ → residuals have patterns → MODEL NEEDS WORK ✗

fitdf parameter = p + q (total AR + MA terms in the model)
For model comparison, lower AIC = better model. In the GNP growth rate example from lecture: ARIMA(1,0,0) had AIC=2261.7, ARIMA(0,0,1) had AIC=2273.4, and ARIMA(2,0,0) had AIC=2255.3. So AR(2) was the best — lowest AIC wins.
ARIMA

Reading R Output — A Trap to Know

Your lecture specifically warned about this:

R calls the mean "intercept" — but it's NOT the true intercept when AR terms are present!

For an AR(1) model: Xt = β₀ + β₁·Xt-1 + et
The mean μ and intercept β₀ are related by: μ = β₀ / (1 − β₁)
Or equivalently: β₀ = μ × (1 − β₁)

R gives you μ (labeled as "intercept"). You need to convert if you want the true intercept.
Example from Lecture — GNP Growth Rate

R output for ARIMA(1,0,0):

Coefficients:
          ar1   intercept
        0.390      36.093   ← R calls this "intercept" but it's actually the MEAN (μ)
s.e.    0.062       4.268

sigma² = 1513,  AIC = 2261.66

To get the true intercept β₀:

β₀ = μ × (1 − β₁) = 36.093 × (1 − 0.39) = 36.093 × 0.61 = 22.017

The actual model equation: Xt = 22.017 + 0.39·Xt-1 + et

For the ARIMA(2,0,0) model: β₀ = 36.052 × (1 − (0.3136 + 0.1931)) = 36.052 × 0.4933 = 17.784

Model: Xt = 17.784 + 0.3136·Xt-1 + 0.1931·Xt-2 + et

When is mean = intercept? Only when there is NO AR term (p=0). For pure MA models, R's "intercept" IS the true mean/intercept.
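The conversion is a one-liner; this sketch (function name mine) replays both lecture models:

```python
# Converting R's "intercept" (really the mean mu) into the true intercept
# beta0, replaying both GNP models from the lecture.

def ar_intercept(mu, ar_coeffs):
    """beta0 = mu * (1 - sum of the AR coefficients)."""
    return mu * (1 - sum(ar_coeffs))

b0_ar1 = ar_intercept(36.093, [0.390])             # ARIMA(1,0,0)
b0_ar2 = ar_intercept(36.052, [0.3136, 0.1931])    # ARIMA(2,0,0)

print(round(b0_ar1, 3), round(b0_ar2, 3))  # 22.017 17.784
```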
ARIMA

ARIMA — Practice Scenarios

Scenario 1: You plot ACF and see: lag 1 = 0.8, lag 2 = 0.6, lag 3 = 0.4, lag 4 = 0.25... (slowly decaying). PACF shows: lag 1 = 0.8, lag 2 = −0.05, lag 3 = 0.02 (sharp cutoff after lag 1). What model?

Read it fast: decaying ACF + cutoff PACF means AR.
PACF cuts off sharply after lag 1 (only lag 1 significant) → AR signature. ACF decays gradually → confirms AR. Order = 1 (where PACF cuts off). Model: AR(1) = ARIMA(1,0,0).

Scenario 2: ACF shows lag 1 = −0.6, lag 2 = 0.03, lag 3 = −0.01 (cutoff after lag 1). PACF decays: lag 1 = −0.6, lag 2 = −0.3, lag 3 = −0.15... What model?

Read it fast: cutoff ACF, even if negative, still points to MA.
ACF cuts off sharply after lag 1 → MA signature. Lag-1 autocorrelation is negative (−0.6) → extra confirmation for MA. PACF decays → confirms MA. Model: MA(1) = ARIMA(0,0,1).

Scenario 3: The original series trends upward. After first differencing, the series looks stationary. The differenced series has PACF cutting off at lag 2 and ACF decaying. What model?

We differenced once (d=1). The differenced series shows AR(2) signature (PACF cuts off at 2, ACF decays). So the model is ARIMA(p=2, d=1, q=0).

Scenario 4: Both ACF and PACF decay gradually (neither cuts off cleanly). What should you try?

Both decaying = ARMA signature. Start with ARMA(1,1). Compare AIC values for different (p,q) combinations. Remember: simpler models are better, and AR/MA terms can cancel each other out.
Fuzzy TS Deep Dive

What is Fuzzy Time Series?

The core idea: Classical methods (ARIMA, smoothing) work with precise numbers. But forecasting is inherently uncertain! Fuzzy Time Series says: instead of working with exact values like "15,460 students," let's work with fuzzy categories like "many students" — and build relationships between these categories to forecast.

Traditional (Boolean) logic: a student count is either in the range [15000, 16000] or it's not. It's 0 or 1.

Fuzzy logic: a student count of 15,900 might be "mostly in [15000, 16000]" (membership 0.9) and "slightly in [16000, 17000]" (membership 0.1). It can belong to multiple sets with different degrees.

Why Fuzzy for Time Series?

Statistical methods like ARIMA are powerful but they're "certain" — they give you a precise number. Fuzzy TS embraces uncertainty and works well when data is limited, imprecise, or when you want an interpretable model. It was first proposed by Song and Chissom in 1993 and improved significantly by Chen in 1996.

Fuzzy TS — Critical

Chen's 6-Step Method — Fully Explained

Step 1: Define the Universe of Discourse & Partition It

First, find the range of your data and add some padding:

U = [Dmin − D₁, Dmax + D₂]
D₁ and D₂ are arbitrary buffer values to ensure all data fits comfortably.
Then divide U into n EQUAL-length intervals: u₁, u₂, ..., uₙ

The number of intervals (n) and the buffer values (D₁, D₂) are your choices. They significantly affect accuracy.

Step 2: Define Fuzzy Sets on the Universe of Discourse

Each interval gets a fuzzy set A₁, A₂, ..., Aₙ. Each set has a triangular membership function:

A₁ = 1/u₁ + 0.5/u₂ + 0/u₃ + 0/u₄ + ...
A₂ = 0.5/u₁ + 1/u₂ + 0.5/u₃ + 0/u₄ + ...
A₃ = 0/u₁ + 0.5/u₂ + 1/u₃ + 0.5/u₄ + ...
...
Format: membership_degree/interval. Ai has membership 1 in ui, 0.5 in the adjacent intervals, 0 elsewhere.

These can be given linguistic labels: A₁ = "not many", A₂ = "not too many", A₃ = "many", etc.

Step 3: Fuzzify Historical Data

For each historical data point, find which interval it falls in, and assign it to the fuzzy set with the highest membership degree (which is the set whose interval contains the value).

Example: if enrollment is 15,460 and u₃ = [15000, 16000], then 15,460 ∈ u₃, so it gets fuzzified as A₃.

Step 4: Identify Fuzzy Logical Relationships (FLR)

Look at consecutive time periods. If F(t−1) is fuzzified as Aj and F(t) is fuzzified as Ak, then:

FLR: Aj → Ak
This is a FIRST-ORDER relationship (depends on only 1 previous period): "If last year was Aj, then this year is Ak."

Step 5: Establish Fuzzy Logical Relationship Groups (FLRG)

Group all FLRs that have the same left-hand side. Merge the right-hand sides (remove duplicates):

If you have: A₃ → A₂, A₃ → A₃, A₃ → A₄, A₃ → A₃
Group becomes: A₃ → A₂, A₃, A₄ (duplicates removed)

Step 6: Defuzzify the Forecasted Output

To forecast F(t), check which fuzzy set F(t−1) belongs to, look up its FLRG, then:

Case 1: Empty group

If Aj has no relationships (never appeared as a left-hand side), forecast = midpoint of uj

Case 2: One-to-one (Aj → Ak)

Only one target. Forecast = midpoint of uk

Case 3: One-to-many (Aj → Aa, Ab, Ac)

Forecast = average of midpoints of ua, ub, uc

Example: A₃ → A₂, A₃, A₅. Midpoints: m₂=14500, m₃=15500, m₅=17500.
Forecast = (14500 + 15500 + 17500) / 3 = 47500 / 3 = 15,833

Fuzzy TS

Full Worked Example — Alabama Enrollments (Chen's Method)

Complete Solution — Using ACTUAL VALUES

Data: Enrollments 1971–1992 at the University of Alabama

Year  Enroll   Year  Enroll   Year  Enroll
1971  13055    1979  16807    1987  16859
1972  13563    1980  16919    1988  18150
1973  13867    1981  16388    1989  18970
1974  14696    1982  15433    1990  19328
1975  15460    1983  15497    1991  19337
1976  15311    1984  15145    1992  18876
1977  15603    1985  15163
1978  15861    1986  15984

Step 1: Universe of Discourse

Min = 13055, Max = 19337. Let D₁ = 55, D₂ = 663 → U = [13000, 20000]

Divide into 7 equal intervals (length = 1000 each):

u₁ = [13000, 14000]   midpoint = 13500
u₂ = [14000, 15000]   midpoint = 14500
u₃ = [15000, 16000]   midpoint = 15500
u₄ = [16000, 17000]   midpoint = 16500
u₅ = [17000, 18000]   midpoint = 17500
u₆ = [18000, 19000]   midpoint = 18500
u₇ = [19000, 20000]   midpoint = 19500

Step 2: Define Fuzzy Sets

A₁ = 1/u₁ + 0.5/u₂ + 0/u₃ + 0/u₄ + 0/u₅ + 0/u₆ + 0/u₇
A₂ = 0.5/u₁ + 1/u₂ + 0.5/u₃ + 0/u₄ + 0/u₅ + 0/u₆ + 0/u₇
A₃ = 0/u₁ + 0.5/u₂ + 1/u₃ + 0.5/u₄ + 0/u₅ + 0/u₆ + 0/u₇
A₄ = 0/u₁ + 0/u₂ + 0.5/u₃ + 1/u₄ + 0.5/u₅ + 0/u₆ + 0/u₇
A₅ = 0/u₁ + 0/u₂ + 0/u₃ + 0.5/u₄ + 1/u₅ + 0.5/u₆ + 0/u₇
A₆ = 0/u₁ + 0/u₂ + 0/u₃ + 0/u₄ + 0.5/u₅ + 1/u₆ + 0.5/u₇
A₇ = 0/u₁ + 0/u₂ + 0/u₃ + 0/u₄ + 0/u₅ + 0.5/u₆ + 1/u₇

Step 3: Fuzzify Every Data Point

Year  Enroll  Interval  Fuzzified   Year  Enroll  Interval  Fuzzified
1971  13055   u₁        A₁          1982  15433   u₃        A₃
1972  13563   u₁        A₁          1983  15497   u₃        A₃
1973  13867   u₁        A₁          1984  15145   u₃        A₃
1974  14696   u₂        A₂          1985  15163   u₃        A₃
1975  15460   u₃        A₃          1986  15984   u₃        A₃
1976  15311   u₃        A₃          1987  16859   u₄        A₄
1977  15603   u₃        A₃          1988  18150   u₆        A₆
1978  15861   u₃        A₃          1989  18970   u₆        A₆
1979  16807   u₄        A₄          1990  19328   u₇        A₇
1980  16919   u₄        A₄          1991  19337   u₇        A₇
1981  16388   u₄        A₄          1992  18876   u₆        A₆

Step 4: Build FLRs (consecutive pairs)

1971→1972: A₁ → A₁     1978→1979: A₃ → A₄     1985→1986: A₃ → A₃
1972→1973: A₁ → A₁     1979→1980: A₄ → A₄     1986→1987: A₃ → A₄
1973→1974: A₁ → A₂     1980→1981: A₄ → A₄     1987→1988: A₄ → A₆
1974→1975: A₂ → A₃     1981→1982: A₄ → A₃     1988→1989: A₆ → A₆
1975→1976: A₃ → A₃     1982→1983: A₃ → A₃     1989→1990: A₆ → A₇
1976→1977: A₃ → A₃     1983→1984: A₃ → A₃     1990→1991: A₇ → A₇
1977→1978: A₃ → A₃     1984→1985: A₃ → A₃     1991→1992: A₇ → A₆

Step 5: Build FLRGs

A₁ → A₁, A₂
A₂ → A₃
A₃ → A₃, A₄
A₄ → A₃, A₄, A₆
A₆ → A₆, A₇
A₇ → A₆, A₇

Step 6: Forecast (example: forecast 1975)

1974 is fuzzified as A₂. FLRG: A₂ → A₃ (one-to-one)

Forecast = midpoint of u₃ = 15,500 (actual: 15,460 — very close!)

Forecast 1976: 1975 is A₃. FLRG: A₃ → A₃, A₄ (one-to-many)

Forecast = (midpoint u₃ + midpoint u₄) / 2 = (15500 + 16500) / 2 = 16,000 (actual: 15,311)
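All six steps fit in a short Python sketch that reproduces the two forecasts above (data and interval bounds from the example; helper names are mine):

```python
# Chen's method end-to-end on the Alabama enrollments.

enroll = {1971: 13055, 1972: 13563, 1973: 13867, 1974: 14696, 1975: 15460,
          1976: 15311, 1977: 15603, 1978: 15861, 1979: 16807, 1980: 16919,
          1981: 16388, 1982: 15433, 1983: 15497, 1984: 15145, 1985: 15163,
          1986: 15984, 1987: 16859, 1988: 18150, 1989: 18970, 1990: 19328,
          1991: 19337, 1992: 18876}

LO, WIDTH, N = 13000, 1000, 7                       # U = [13000, 20000], 7 intervals

def midpoint(i):
    return LO + WIDTH * i + WIDTH / 2               # midpoint of interval u_{i+1}

def fuzzify(y):
    return min((y - LO) // WIDTH, N - 1)            # interval index 0..6 -> A_{i+1}

years = sorted(enroll)
labels = {yr: fuzzify(enroll[yr]) for yr in years}  # Step 3

# Steps 4-5: first-order FLRs grouped into FLRGs (a set merges duplicates)
flrg = {}
for prev, nxt in zip(years, years[1:]):
    flrg.setdefault(labels[prev], set()).add(labels[nxt])

def forecast(year):
    """Step 6: average the midpoints of the FLRG of last year's fuzzy set."""
    lhs = labels[year - 1]
    group = flrg.get(lhs, {lhs})                    # empty group -> own midpoint
    return sum(midpoint(i) for i in group) / len(group)

print(forecast(1975), forecast(1976))  # 15500.0 16000.0
```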

Fuzzy TS

The Variations Approach — Handling Trend

The Problem: Chen's original method works on actual values. But if there's a strong upward trend, the values keep climbing and the FLRs become less useful (everything just goes up). The forecasts tend to lag behind.

The Fix: Instead of fuzzifying actual values, compute the year-to-year changes (variations) first: V(t) = Y(t) − Y(t−1). Then apply Chen's method to the variation series. This removes the trend from the analysis!

To get the final forecast: The defuzzified output gives you a predicted CHANGE. Add this change to the last known actual value: Forecast(t) = Actual(t−1) + Predicted Change.

Why It Works

Trend Before vs After Variations

This version mirrors the Alabama example: rising enrollments through 1975, then a negative variation in 1976.

[Plot omitted: raw values Y(t) for 1971–1976 next to the variation series V(t); the raw series still trends up while the variations switch sign and center near zero.]
Exam takeaway: fuzzy variations model the changes, not the climbing level, so the rule base stops getting dragged upward.

Forecast Rule

Predict Change, Then Add It Back

The fuzzy model outputs a variation first. Converting that to the final enrollment forecast is a second step.

Last actual: Y(t−1) = 15,460
Predicted change: V̂(t) = +500
Final forecast: 15,460 + 500 = 15,960
Forecast(t) = Actual(t−1) + Predicted Change
Exam takeaway: the defuzzified result is not the final enrollment itself until you add it back to the latest actual value.
From Your Professor's Actual Test!

Fuzzy Exam Problem — Fully Solved

📝 The Exact Problem from Your In-Class Test

"Use variations of historical enrollment data. Let U = [−1000, 1400] be the universe of discourse. Partition into 4 equal intervals to produce 4 fuzzy sets. Find first-order FLRGs. Forecast enrollment for 1975 and 1976."

Complete Step-by-Step Solution

Step 0: Compute ALL Variations

Year   Enrollment   Variation V(t)
1971   13055        —
1972   13563        13563 − 13055 = 508
1973   13867        13867 − 13563 = 304
1974   14696        14696 − 13867 = 829
1975   15460        15460 − 14696 = 764
1976   15311        15311 − 15460 = −149
1977   15603        15603 − 15311 = 292
1978   15861        15861 − 15603 = 258
1979   16807        16807 − 15861 = 946
1980   16919        16919 − 16807 = 112
1981   16388        16388 − 16919 = −531
1982   15433        15433 − 16388 = −955
1983   15497        15497 − 15433 = 64
1984   15145        15145 − 15497 = −352
1985   15163        15163 − 15145 = 18
1986   15984        15984 − 15163 = 821
1987   16859        16859 − 15984 = 875
1988   18150        18150 − 16859 = 1291
1989   18970        18970 − 18150 = 820
1990   19328        19328 − 18970 = 358
1991   19337        19337 − 19328 = 9
1992   18876        18876 − 19337 = −461

Step 1: Universe of Discourse (given)

U = [−1000, 1400], divide into 4 equal intervals.

Range = 1400 − (−1000) = 2400. Interval length = 2400 / 4 = 600

u₁ = [−1000, −400], midpoint = −700
u₂ = [−400, 200], midpoint = −100
u₃ = [200, 800], midpoint = 500
u₄ = [800, 1400], midpoint = 1100

Step 2: Define 4 Fuzzy Sets

A₁ = 1/u₁ + 0.5/u₂ + 0/u₃ + 0/u₄
A₂ = 0.5/u₁ + 1/u₂ + 0.5/u₃ + 0/u₄
A₃ = 0/u₁ + 0.5/u₂ + 1/u₃ + 0.5/u₄
A₄ = 0/u₁ + 0/u₂ + 0.5/u₃ + 1/u₄

Step 3: Fuzzify Each Variation

Year   V(t)    Falls in            Fuzzified
1972   508     u₃ [200, 800]       A₃
1973   304     u₃                  A₃
1974   829     u₄ [800, 1400]      A₄
1975   764     u₃                  A₃
1976   −149    u₂ [−400, 200]      A₂
1977   292     u₃                  A₃
1978   258     u₃                  A₃
1979   946     u₄                  A₄
1980   112     u₂                  A₂
1981   −531    u₁ [−1000, −400]    A₁
1982   −955    u₁                  A₁
1983   64      u₂                  A₂
1984   −352    u₂                  A₂
1985   18      u₂                  A₂
1986   821     u₄                  A₄
1987   875     u₄                  A₄
1988   1291    u₄                  A₄
1989   820     u₄                  A₄
1990   358     u₃                  A₃
1991   9       u₂                  A₂
1992   −461    u₁                  A₁

Step 4: Build ALL FLRs

V(1972)→V(1973): A₃ → A₃
V(1973)→V(1974): A₃ → A₄
V(1974)→V(1975): A₄ → A₃
V(1975)→V(1976): A₃ → A₂
V(1976)→V(1977): A₂ → A₃
V(1977)→V(1978): A₃ → A₃
V(1978)→V(1979): A₃ → A₄
V(1979)→V(1980): A₄ → A₂
V(1980)→V(1981): A₂ → A₁
V(1981)→V(1982): A₁ → A₁
V(1982)→V(1983): A₁ → A₂
V(1983)→V(1984): A₂ → A₂
V(1984)→V(1985): A₂ → A₂
V(1985)→V(1986): A₂ → A₄
V(1986)→V(1987): A₄ → A₄
V(1987)→V(1988): A₄ → A₄
V(1988)→V(1989): A₄ → A₄
V(1989)→V(1990): A₄ → A₃
V(1990)→V(1991): A₃ → A₂
V(1991)→V(1992): A₂ → A₁

Step 5: Group into FLRGs

A₁ → A₁, A₂
A₂ → A₁, A₂, A₃, A₄
A₃ → A₂, A₃, A₄
A₄ → A₂, A₃, A₄

Step 6: Forecast 1975

We need the variation for 1975. The variation for 1974 is V(1974) = 829 → fuzzified as A₄.

FLRG for A₄: A₄ → A₂, A₃, A₄

Predicted variation = average of midpoints = (−100 + 500 + 1100) / 3 = 1500 / 3 = 500

Forecast enrollment 1975 = Actual(1974) + predicted change = 14696 + 500 = 15,196

(Actual 1975 = 15,460. Error = 264)

Forecast 1976

The variation for 1975 is V(1975) = 764 → fuzzified as A₃.

FLRG for A₃: A₃ → A₂, A₃, A₄

Predicted variation = (−100 + 500 + 1100) / 3 = 500

Forecast enrollment 1976 = Actual(1975) + predicted change = 15460 + 500 = 15,960

(Actual 1976 = 15,311. Error = 649)
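The whole solution above fits in a short script. This is a sketch of first-order Chen on the variation series (function names and index bookkeeping are my own), and it recovers the same 15,196 and 15,960 forecasts:

```python
# Variations-based fuzzy time series (Chen's method on year-to-year changes),
# reproducing the exam solution above.
years = list(range(1971, 1993))
enroll = [13055, 13563, 13867, 14696, 15460, 15311, 15603, 15861,
          16807, 16919, 16388, 15433, 15497, 15145, 15163, 15984,
          16859, 18150, 18970, 19328, 19337, 18876]

# Step 0: variations V(t) = Y(t) - Y(t-1); var[0] is V(1972)
var = [b - a for a, b in zip(enroll, enroll[1:])]

# Step 1: universe U = [-1000, 1400], four equal intervals of width 600
lo, hi, k = -1000, 1400, 4
width = (hi - lo) / k                                # 600
mids = [lo + width * (i + 0.5) for i in range(k)]    # [-700, -100, 500, 1100]

def fuzzify(v):
    """0-based index of the interval containing v (fuzzy set A_{i+1})."""
    return min(max(int((v - lo) // width), 0), k - 1)

labels = [fuzzify(v) for v in var]                   # Step 3

# Steps 4-5: first-order FLRs grouped into FLRGs
flrg = {i: set() for i in range(k)}
for a, b in zip(labels, labels[1:]):
    flrg[a].add(b)

def forecast(year):
    """Step 6: last actual + mean midpoint of the FLRG for V(year-1)."""
    i = years.index(year)
    rhs = sorted(flrg[labels[i - 2]])                # labels[i-2] = V(year-1)
    change = sum(mids[j] for j in rhs) / len(rhs)
    return enroll[i - 1] + change

print(forecast(1975))   # 15196.0
print(forecast(1976))   # 15960.0
```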

Question 3 from the test: Effect of Increasing Order

From 1st to 2nd order: Instead of just looking at one previous variation to predict the next, you look at TWO previous variations together. For example, the pattern (A₃, A₄) → A₃ is more specific than just A₄ → ?. This generally improves accuracy because you capture more context. Think of it like: instead of predicting weather based on just today, you use today AND yesterday's weather.

From 1st to 20th order: You'd use 20 previous variations to predict the next. With only 21 data points (1972-1992), you'd have very few examples of any specific 20-length pattern. This likely leads to overfitting — the model memorizes the training data but can't generalize. Most patterns won't repeat, so many FLRGs will be empty or have only one example. There's a sweet spot: higher order helps up to a point, then accuracy plateaus or degrades.

Fuzzy TS

All 4 Improvements to Chen's Method

1. High-Order Models

Problem: First-order only uses F(t−1). Why just one previous value?

Solution: Use multiple previous values. 2nd order: (Ai, Aj) → Ak. 3rd order: (Ai, Aj, Ak) → Am.

Dynamic approach (Chen et al., 2015): Use LCS/LRS (Longest Common/Repeated Subsequence) algorithm to automatically find the optimal order from the data.

2. Better Universe of Discourse (Poulsen, 2009)

Problem: D₁, D₂ are arbitrary. Equal intervals may not suit the data.

Solution: (1) Sort values ascending. (2) Compute average distance between consecutive sorted values + standard deviation. (3) Remove outliers (distances > 1 std dev from average). (4) Compute revised average distance (ADR). (5) Use U = [Dmin − ADR, Dmax + ADR]. (6) Use trapezoidal (not triangular) membership functions for smoother transitions.

3. Non-Equal Interval Lengths via Clustering

Problem: Equal-width intervals waste resolution on sparse regions and under-represent dense regions.

Solution (Singh & Samariya, 2015): Use data clustering (e.g., k-means) to determine natural groupings in the data. Dense clusters get narrower intervals (more precision where data concentrates), sparse areas get wider intervals.

4. Trend Handling via Variations (Chen et al., 2015)

Problem: Chen's method on raw values doesn't explicitly handle trend.

Solution: Compute V(t) = Y(t) − Y(t−1) and apply FTS to the variation series. This is the approach from your professor's test! The universe of discourse is defined over variations instead of actual values.

OMTS

Decomposition Analysis

An Open-Model technique that breaks data into Level, Trend, Seasonality, Noise using a centered moving average. Needs 48+ data points.

Steps:
1. Compute CENTERED 12-month MA → removes noise and seasonality → gives Level + Trend
2. Trend = change between consecutive MA values
3. Original − MA = Seasonality + Noise
4. Average within each month across years → isolates Seasonality (removes noise)
5. Forecast = Last Level + (months ahead × Trend) + Seasonal adjustment

Example: F(Jan04) = 2584 + (7 × 45) + (−203) = 2696
Decomposition uses additive seasonal adjustments. These are added to (Level + Trend), not multiplied.
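A sketch of the centered 12-month MA (step 1) and the additive forecast rule (step 5); 2584, 45, and −203 are the level, trend, and January seasonal from the example above:

```python
def centered_ma12(y):
    """2x12 centered moving average: mean of the two adjacent 12-month MAs.
    Removes noise and a 12-month seasonal pattern, leaving Level + Trend."""
    return [(sum(y[i-6:i+6]) + sum(y[i-5:i+7])) / 24
            for i in range(6, len(y) - 6)]

def decomp_forecast(level, trend, h, seasonal):
    """Additive decomposition: Level + h*Trend + Seasonal adjustment."""
    return level + h * trend + seasonal

print(decomp_forecast(2584, 45, 7, -203))  # 2696
print(centered_ma12([100] * 24))           # flat series -> twelve 100.0 values
```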
Associative

Regression Analysis

ŷ = a + bx, where:
b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)
a = ȳ − b·x̄

r (correlation coefficient): −1 to +1. Strength of linear relationship. Does NOT prove causation.

R² = r²: % of variation in y explained by x. Example: r = 0.901 → R² = 0.81 → 81% explained.

Standard Error Sy,x: Used for prediction intervals: ŷ ± tα/2 × Sy,x (use t with df = n−2).

Nodel Construction Example

Sales = 1.75 + 0.25 × (payroll). Payroll next year = $6B.

Forecast: Sales = 1.75 + 0.25(6) = $3.25M. Sy,x = $0.306M. With n=6, df=4, t₀.₀₂₅ = 2.78:

95% CI: $3.25M ± 2.78 × $0.306M = [$2.40M, $4.10M]
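The slope/intercept formulas above can be checked directly. The data pairs below are hypothetical payroll ($B) / sales ($M) values chosen so the formulas reproduce the fitted line (b = 0.25, a = 1.75); the actual Nodel dataset may differ.

```python
def simple_linreg(x, y):
    """Least-squares fit via the cheat-sheet formulas: returns (a, b)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = sy / n - b * sx / n
    return a, b

# Hypothetical data chosen to reproduce a = 1.75, b = 0.25
x = [1, 3, 4, 2, 1, 7]                 # payroll, $B
y = [2.0, 3.0, 2.5, 2.0, 2.0, 3.5]     # sales, $M
a, b = simple_linreg(x, y)
print(a, b)        # 1.75 0.25
print(a + b * 6)   # 3.25 -> the $3.25M forecast at $6B payroll
```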

Adding interest rates (multiple regression): r improved from 0.901 to 0.96.

Statistics

Key Probability Distributions

Distribution   Type         Key Formula                                        When
Binomial       Discrete     p(x) = C(n,x)·p^x·q^(n−x);  μ = np, σ = √(npq)     n fixed trials, 2 outcomes, independent
Poisson        Discrete     p(x) = e^(−λ)·λ^x / x!;  μ = σ² = λ                Events per interval (time/area)
Normal         Continuous   z = (x − μ)/σ; use z-table                         Bell-shaped, symmetric. Basis of CLT
Exponential    Continuous   P(x > a) = e^(−a/μ)                                Time between events

Central Limit Theorem: For n ≥ 30, x̄ is approximately normal with μ_x̄ = μ and σ_x̄ = σ/√n.

Normal approx. to Binomial: When np ≥ 5 and nq ≥ 5. Use continuity correction ±0.5.
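A quick check of the approximation with the continuity correction, using only the standard library. The values n = 100, p = 0.5 are an assumed illustration (np = nq = 50 ≥ 5, so the rule applies):

```python
import math

def binom_pmf(n, x, p):
    """Binomial pmf: C(n,x) * p^x * q^(n-x)."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p = 100, 0.5
exact = sum(binom_pmf(n, x, p) for x in range(56))   # P(X <= 55) exactly
mu, sigma = n * p, math.sqrt(n * p * (1 - p))
approx = normal_cdf((55.5 - mu) / sigma)             # continuity: 55 -> 55.5
print(round(exact, 3), round(approx, 3))             # both come out near 0.86
```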

Quick Reference

Formula Cheat Sheet

Method           Formula                                                                              Constants
Exp. Smoothing   F(t+1) = α·A(t) + (1−α)·F(t)                                                         α
Holt's           F(t) = α·A(t−1) + (1−α)(F(t−1) + T(t−1));  T(t) = β(F(t) − F(t−1)) + (1−β)·T(t−1)    α, β
Winters' (Mult)  L = α(Y/S(t−m)) + (1−α)(L + B);  Forecast = (L + B·h) × S                            α, β, γ
ARIMA(p,d,q)     AR: uses past X values; MA: uses past errors; I: differencing                        p, d, q
ACF/PACF         AR → PACF cuts off; MA → ACF cuts off; ARMA → both decay                             —
Error metrics    MAD = Σ|e|/n;  MSE = Σe²/n;  MAPE = Σ(|e|/A × 100)/n                                 —
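The three error metrics from the last row, as a small helper (the actual/forecast numbers are an assumed toy example):

```python
def error_metrics(actual, forecast):
    """MAD, MSE, MAPE from the cheat sheet (MAPE in percent)."""
    errs = [a - f for a, f in zip(actual, forecast)]
    n = len(errs)
    mad = sum(abs(e) for e in errs) / n
    mse = sum(e * e for e in errs) / n
    mape = sum(abs(e) / a * 100 for a, e in zip(actual, errs)) / n
    return mad, mse, mape

mad, mse, mape = error_metrics([100, 200, 300], [110, 190, 330])
print(mad, mse, mape)   # MAD ~16.67, MSE ~366.67, MAPE ~8.33%
```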
Method Selection

Which Method When?

Situation                                Use                  Why
Stable, no trend/season, limited data    Exp. Smoothing       Simple, dampens noise
Level shifts unpredictably               Adaptive Smoothing   Auto-adjusts α
Clear trend, no season                   Holt's               Captures trend with β
Trend + seasonality                      Winters'             Full 3-parameter model
Long history (48+), complex patterns     ARIMA                Discovers optimal model
External factors drive demand            Regression           Uses exogenous variables
Uncertain/linguistic/limited data        Fuzzy TS             Handles imprecision
Test Yourself

Self-Test Quiz — 12 Questions

Q1: In exponential smoothing, α = 0.9 means the forecast puts what percentage of weight on the most recent actual value?

F = α×A + (1−α)×F. With α=0.9: 90% weight on last actual, 10% on previous forecast. Very reactive — almost a naive forecast.
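The Q1 formula in a minimal loop (toy actuals assumed; the starting forecast is seeded with the first actual):

```python
def exp_smooth(actuals, alpha, f0):
    """Exponential smoothing: each new forecast is alpha*A(t) + (1-alpha)*F(t)."""
    f, out = f0, []
    for a in actuals:
        f = alpha * a + (1 - alpha) * f
        out.append(f)
    return out

print(exp_smooth([100, 120, 110], 0.9, 100))  # [100.0, 118.0, 110.8]
```

With α = 0.9 the forecast chases each actual almost immediately, which is exactly the "nearly naive" behavior the answer describes.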

Q2: For Winters' multiplicative method, seasonal factors must:

Multiplicative factors are ratios (1.15 = 15% above average). They must average to 1.0. Additive factors (not ratios) must sum to 0.

Q3: ACF decays slowly, PACF cuts off after lag 2. The model is:

PACF cuts off → AR model. Cuts off at lag 2 → AR(2). ACF decaying confirms it. Remember: AR = PACF cuts off.

Q4: ACF cuts off after lag 1 (lag-1 autocorrelation is negative). PACF decays gradually. The model is:

ACF cuts off → MA model. Cuts off at lag 1 → MA(1). Negative lag-1 autocorrelation is an extra hint toward MA. PACF decaying confirms it.

Q5: The Ljung-Box test gives p-value = 0.43. This means:

H₀: residuals are random. p-value 0.43 > 0.05 → fail to reject H₀ → residuals appear random → model is adequate! We WANT a high p-value here.

Q6: In Chen's fuzzy TS, FLRG is A₃ → A₂, A₃, A₅. Midpoints: m₂=14500, m₃=15500, m₅=17500. The forecast is:

One-to-many: average ALL midpoints on the right side. (14500+15500+17500)/3 = 15,833. No weighting — simple average in basic Chen.

Q7: Using the variations approach in fuzzy TS addresses which problem?

Computing V(t) = Y(t) − Y(t−1) removes the trend, allowing FTS to focus on patterns of change rather than absolute values. This is Improvement #4.

Q8: ARIMA(0,1,1) means:

ARIMA(p=0, d=1, q=1): zero AR terms, one differencing, one MA term. So it's an MA(1) on the first-differenced series.

Q9: R gives ARIMA(1,0,0) with ar1=0.39, intercept=36.09. The TRUE intercept β₀ is:

R reports the MEAN as "intercept." For AR models, true intercept = mean × (1 − Σφ). Here: 36.09 × (1−0.39) = 22.01. This is a trap your lecture specifically warned about!
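The mean-to-intercept conversion is one line; 36.09 and 0.39 come from the R output in the question (the function name here is my own):

```python
def ar_intercept(mean, phis):
    """True AR intercept beta0 = mean * (1 - sum of the AR coefficients)."""
    return mean * (1 - sum(phis))

print(round(ar_intercept(36.09, [0.39]), 2))  # 22.01
```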

Q10: A 12-month MA on monthly data with 12-month seasonality will:

A moving average removes any pattern whose length equals the MA period. A 12-month MA kills a 12-month seasonal pattern. This is used intentionally in decomposition, but dangerous if you actually need the seasonal forecast.

Q11: Which model should you compare if ARMA(1,1) seems to fit?

AR and MA terms can cancel each other's effects. If ARMA(1,1) fits, try models with one fewer AR or MA term. Simpler models are preferred (parsimony principle). Compare with AIC.

Q12: Increasing fuzzy TS order from 1 to 20 on a dataset with 22 years of data will likely:

With 22 years, you'd have only ~2 examples of any 20-length FLR. Not enough data to establish reliable patterns. Moderate increases (1→2, 1→3) help; extreme orders overfit.
Strategy

Exam Strategy

🎯 Priority Topics

HIGH Fuzzy TS — Chen's method + variations approach (your prof tested this specifically)
HIGH ARIMA — ACF/PACF identification, Box-Jenkins process, reading R output
HIGH Exponential smoothing calculations (with & without trend)
HIGH Winters' multiplicative initialization & forecasting
MED Error metrics (MAD, MSE, MAPE) calculation
MED Regression, decomposition, probability distributions

🧠 Common Traps

• Additive seasonal factors sum to 0, while multiplicative factors average to 1 — don't mix these up
• Winters' multiplicative forecast: (L + B×h) × S (multiply!), not + S
• AR → PACF cuts off (not ACF!); MA → ACF cuts off
• R's "intercept" = mean for AR models. True intercept = mean × (1−Σφ)
• Ljung-Box: HIGH p-value = GOOD (residuals are random = model works)
• Lower AIC = better model
• Fuzzy fuzzification: assign to HIGHEST membership set
• Fuzzy defuzzification (one-to-many): simple average of midpoints, no weighting
• For variations approach: forecast = last actual + predicted variation

✅ Night-Before Checklist

□ Can I do exponential smoothing by hand, step by step?
□ Can I do Holt's method (F, T, FIT) for 2-3 periods?
□ Can I initialize Winters' multiplicative (regression → detrend → seasonal factors)?
□ Can I walk through Chen's fuzzy 6 steps completely?
□ Can I do the VARIATIONS approach from scratch? (the test problem)
□ Can I identify AR/MA/ARMA from an ACF/PACF description?
□ Can I interpret R output for ARIMA and convert mean→intercept?
□ Do I know the Box-Jenkins 5-step process?
□ Can I compute MAD, MSE, MAPE for a dataset?
□ Can I explain what Ljung-Box tests and what "p > 0.05" means?

Good luck tomorrow! You've done the work. Trust the preparation. 💪