Course Map
Time Series Components
Every time series is made up of four components. All forecasting techniques try to identify and project some or all of these:
📊 Level
- The horizontal baseline — what sales would be if there were no trend, seasonality, or noise
- Think of it as the "average base" of the series
- Example: a component whose demand is always ~1000/month has a level of 1000
📈 Trend
- A continuing pattern of increase or decrease over time
- Can be linear (straight line) or curved
- Caused by population growth, technology changes, culture shifts
- For exponential smoothing, think of trend as a "step function" — level steps up/down each period
🔄 Seasonality
- Repeating up-down pattern within ≤ 1 year
- Examples: air conditioners peak in summer, toys peak in fall
- Patterns longer than 1 year are called "cycles"
- Key: the pattern repeats itself every year
🎲 Noise (Random)
- Random fluctuation that TS techniques CANNOT explain
- If noise looks non-random, your model is missing something!
- A good test: if residuals don't look random, there's still pattern left
Additive vs. Multiplicative Seasonality
This comes up in Winters' method and decomposition. You must know the difference:
Additive: Y = (Level + Trend) + Seasonal + Error
- Seasonal swings stay the SAME size regardless of level
- Peaks and valleys have constant amplitude
- Seasonal factors sum to ZERO
- Forecast: (L + B×h) + S
Multiplicative: Y = (Level + Trend) × Seasonal + Error
- Seasonal swings GROW as the level grows
- Peaks and valleys get wider over time
- Seasonal factors average to 1.0
- Forecast: (L + B×h) × S
Forecast Error Metrics — Fully Explained
Error for any period: et = Actualt − Forecastt
MAD (Mean Absolute Deviation): take each error, make it positive (absolute value), then average. It tells you the average size of your mistakes in the same units as your data. Easy to understand: "on average, we're off by X units."
MSE (Mean Squared Error): square each error, then average. Big errors get punished much more than small ones (because squaring amplifies large numbers). Good for comparing models when you want to penalize big misses.
MAPE (Mean Absolute Percentage Error): turn each error into a percentage of the actual value, then average. Great for comparing accuracy across products with different scales. "On average, we're off by X%."
Worked example: α = 0.10, initial forecast F₁ = 175. Actual data: 180, 168, 159, 175, 190, 205, 180, 182
| Qtr | Actual | Forecast (α=.10) | Error | |Error| | Error² | |Error|/Actual |
|---|---|---|---|---|---|---|
| 1 | 180 | 175.00 | 5.00 | 5.00 | 25.00 | 2.78% |
| 2 | 168 | 175.50 | −7.50 | 7.50 | 56.25 | 4.46% |
| 3 | 159 | 174.75 | −15.75 | 15.75 | 248.06 | 9.91% |
| 4 | 175 | 173.18 | 1.82 | 1.82 | 3.33 | 1.04% |
| 5 | 190 | 173.36 | 16.64 | 16.64 | 276.89 | 8.76% |
| 6 | 205 | 175.02 | 29.98 | 29.98 | 898.80 | 14.62% |
| 7 | 180 | 178.02 | 1.98 | 1.98 | 3.92 | 1.10% |
| 8 | 182 | 178.22 | 3.78 | 3.78 | 14.29 | 2.08% |
| TOTALS | | | | 82.45 | 1526.54 | 44.74% |
MAD = 82.45 / 8 = 10.31
MSE = 1526.54 / 8 = 190.82
MAPE = 44.74% / 8 = 5.59%
How was F₂ calculated? F₂ = 0.10 × 180 + 0.90 × 175 = 18 + 157.5 = 175.50
How was F₃ calculated? F₃ = 0.10 × 168 + 0.90 × 175.50 = 16.8 + 157.95 = 174.75
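The whole table and all three metrics can be reproduced in a short script (a plain-Python sketch; variable names are mine, not from the course):

```python
# Reproduce the exponential-smoothing table and the three error metrics.
# Update rule: F(t+1) = alpha*A(t) + (1-alpha)*F(t), starting from F1 = 175.

alpha, forecast = 0.10, 175.0
actuals = [180, 168, 159, 175, 190, 205, 180, 182]

errors = []
for a in actuals:
    errors.append(a - forecast)                    # e_t = Actual - Forecast
    forecast = alpha * a + (1 - alpha) * forecast  # next period's forecast

n = len(actuals)
mad  = sum(abs(e) for e in errors) / n
mse  = sum(e * e for e in errors) / n
mape = sum(abs(e) / a for e, a in zip(errors, actuals)) / n * 100

print(round(mad, 2), round(mse, 2), round(mape, 2))   # 10.31 190.82 5.59
```

Note the tiny rounding differences against the hand-filled table (the table rounds forecasts to 2 decimals each step; the script carries full precision), yet the final metrics agree.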
Naive & Moving Averages
Naive Forecast
"Tomorrow will be the same as today." Free, simple, surprisingly hard to beat sometimes. Used as a baseline benchmark.
Simple Moving Average (SMA)
Data: Jan=12, Feb=13, Mar=16, Apr=19, May=23
3-month SMA forecasts: Apr = (12+13+16)/3 = 13.67; May = (13+16+19)/3 = 16.00; Jun = (16+19+23)/3 = 19.33
Notice: each new forecast "drops" the oldest value and "adds" the newest.
Weighted Moving Average
Example with weights 3, 2, 1: F = (3×16 + 2×13 + 1×12) / 6 = (48+26+12)/6 = 86/6 = 14.33
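Both averages, sketched in plain Python (function names are mine, not from the course):

```python
# Simple and weighted moving averages on the Jan-May worked example.

data = [12, 13, 16, 19, 23]          # Jan..May

def sma(values, window):
    """Next-period forecast = mean of the last `window` actuals."""
    return sum(values[-window:]) / window

def wma(values, weights):
    """Weights listed oldest-to-newest; heaviest weight on the newest value."""
    recent = values[-len(weights):]
    return sum(w * v for w, v in zip(weights, recent)) / sum(weights)

print(round(sma(data[:3], 3), 2))          # Apr SMA forecast: 13.67
print(round(wma(data[:3], [1, 2, 3]), 2))  # Apr WMA forecast: 14.33
```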
Visual Intuition
Raw Series vs Smoothers
This plot uses the exact worked-example numbers: actuals for Jan-May, then SMA and WMA forecasts for Apr-Jun.
Decision Cue
What Bigger N Does
Increasing the window lowers noise but delays the forecast’s response to real movement.
Exponential Smoothing — Full Explanation
What does α control? It's the "reactivity dial" between 0 and 1:
Low α (0.05 – 0.2)
- Forecast is mostly old forecast (heavy smoothing)
- Slow to react to changes
- Great when data is noisy but stable
- Example: α=0.1 means 10% new data, 90% old forecast
High α (0.5 – 0.9)
- Forecast is mostly last actual value
- Reacts quickly to changes
- Good when level is shifting
- But overreacts to noise!
- At α=1.0, it becomes a naive forecast
How weights actually decay
Exponential smoothing implicitly weights the actual from k periods back by α(1−α)^k. With α = 0.1: 0.100 on the latest actual, 0.090 one period back, 0.081 two back, 0.073 three back, and so on. Each weight is 90% of the previous one, and the weights sum to 1.
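A quick sketch of that decay (plain Python, values rounded):

```python
# Exponential smoothing is a weighted average of ALL past actuals:
# the actual k periods back gets weight alpha*(1-alpha)**k.

alpha = 0.1
weights = [alpha * (1 - alpha) ** k for k in range(6)]
print([round(w, 4) for w in weights])
# [0.1, 0.09, 0.081, 0.0729, 0.0656, 0.059] -> geometric decay, ratio 0.9
```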
Holt's Method (Exponential Smoothing with Trend)
Adds a trend component to capture upward/downward movement. Uses two constants: α (level) and β (trend).
Visual Intuition
Holt Tracks the Slope
The smoothed level absorbs noise, then Holt projects forward using the current trend estimate instead of staying flat.
Given: F₁ = 11, T₁ = 2, α = 0.2, β = 0.4. Actual data: A₁=12, A₂=17, ...
Computing Month 2 (A₁ = 12):
F₂ = αA₁ + (1−α)(F₁ + T₁) = 0.2×12 + 0.8×(11 + 2) = 2.4 + 10.4 = 12.80
T₂ = β(F₂ − F₁) + (1−β)T₁ = 0.4×(12.80 − 11) + 0.6×2 = 0.72 + 1.20 = 1.92
FIT₂ = F₂ + T₂ = 12.80 + 1.92 = 14.72
Computing Month 3 (A₂ = 17):
F₃ = 0.2×17 + 0.8×(12.80 + 1.92) = 3.40 + 11.78 = 15.18
T₃ = 0.4×(15.18 − 12.80) + 0.6×1.92 = 0.95 + 1.15 = 2.10
FIT₃ = 15.18 + 2.10 = 17.28
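The two updates can be scripted directly (a sketch in plain Python, using the F/T formulas from the cheat sheet):

```python
# Holt's method with the given start values:
#   F_t = alpha*A_(t-1) + (1-alpha)*(F_(t-1) + T_(t-1))
#   T_t = beta*(F_t - F_(t-1)) + (1-beta)*T_(t-1)
#   FIT_t = F_t + T_t

alpha, beta = 0.2, 0.4
F, T = 11.0, 2.0                 # F1, T1
actuals = [12, 17]               # A1, A2

for a in actuals:
    F_new = alpha * a + (1 - alpha) * (F + T)
    T_new = beta * (F_new - F) + (1 - beta) * T
    F, T = F_new, T_new
    print(round(F, 3), round(T, 3), round(F + T, 3))   # F, T, FIT

# Month 2: F = 12.8,   T = 1.92,  FIT = 14.72
# Month 3: F = 15.176, T = 2.102, FIT = 17.278
```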
Winters' Method (Trend + Seasonality)
Multiplicative Formulation
Level update: Lt = α(Yt / St-m) + (1−α)(Lt-1 + Bt-1)
- Yt / St-m "de-seasonalizes" this period's actual (divide out last year's seasonal factor), giving an estimate of the level without seasonal noise
- Lt-1 + Bt-1 is last period's level plus trend: the "expected level" from the previous model
- α blends these two estimates together
Seasonality Shape
Additive vs Multiplicative
Both have trend and repeating waves. The difference is whether the wave stays constant or grows with the level.
Forecast Logic
De-seasonalize, Then Reapply Season
Winters first strips the seasonal effect out, updates level and trend, then puts seasonality back onto the future forecast.
Initialization (How to Get Started)
Winters' needs initial values for L₀, B₀, and all m seasonal factors. The procedure from your lecture (Kuliah 6):
1. Fit a trend line (regression) to the data to get the initial level L₀ and trend B₀, then detrend: compute each period's ratio Actual / Trend (the "regression → detrend → seasonal factors" sequence)
2. Average the ratios for each season across the available years
   Example: S̄[Q1] = (0.7368 + 0.7156 + 0.6894 + 0.6831) / 4 = 0.7062
3. Normalize with a correction factor CF = m / Σ(averages), where m = number of seasons (4 for quarterly)
   Example: if the averages sum to 3.9999, CF = 4/3.9999 ≈ 1.0000
4. Multiply each average by CF to get the initial seasonal factors
Initial seasonal factors: S₋₃=0.7062, S₋₂=1.1114, S₋₁=1.2937, S₀=0.8886
• Multiplicative: seasonal factors must average to 1.0 (equivalently, sum to m)
• Additive: seasonal factors must sum to 0
A factor of 1.15 means that season is 15% above the average. A factor of 0.85 means 15% below.
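The normalization step can be sketched as (plain Python, using the raw averages from the example):

```python
# Normalize multiplicative seasonal factors so they average to 1
# (equivalently, sum to m), via CF = m / sum(averages).

averages = [0.7062, 1.1114, 1.2937, 0.8886]   # raw per-season averages
m = len(averages)
cf = m / sum(averages)                         # correction factor
factors = [round(a * cf, 4) for a in averages]

print(factors)                                 # normalized seasonal factors
print(round(sum(a * cf for a in averages), 6)) # 4.0 -> they now sum to m
```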
What is ARIMA? — The Big Picture
ARIMA stands for AutoRegressive Integrated Moving Average. It's the most powerful classical time series method. While FMTS techniques assume which patterns exist (and use fixed formulas), ARIMA first analyzes the data to discover what patterns are there, then builds a custom model.
AR (AutoRegressive): Today's value depends on yesterday's value (and maybe the day before, etc.)
I (Integrated): We may need to "difference" the data to make it stationary first
MA (Moving Average): Today's value depends on yesterday's forecast error (and maybe earlier errors)
ARIMA(p, d, q) → p = AR order, d = differencing order, q = MA order
AR (AutoRegressive) Models — Explained from Scratch
The Idea
In an AR model, the current value Xt is predicted using past values of itself. It's literally a regression where the predictors are the variable's own lagged values.
AR(1) — First Order
Model: Xt = β₀ + β₁·Xt-1 + et
This says: "Today's value = some constant + some fraction of yesterday's value + random shock."
If β₁ = 0: There's no temporal dependence. Xt is just random noise around β₀.
If β₁ is large (close to 1): Yesterday's value strongly influences today. The series has "memory."
If β₁ > 1: The series is explosive (grows without bound) — not stationary!
AR(2) — Second Order
Model: Xt = β₀ + β₁·Xt-1 + β₂·Xt-2 + et
Now today depends on the last TWO values. More complex patterns can be captured, like oscillating behavior.
AR(p) — General
Today depends on the last p values. p is the "order" of the AR model.
How to tell if you have an AR process
• PACF: sharp cutoff after lag p (only first p lags are significant, rest drop to zero)
• ACF: gradual decay (slowly decreasing, possibly oscillating, but many lags significant)
The PACF tells you the order: if PACF cuts off after lag 2 → AR(2).
MA (Moving Average) Models — Explained from Scratch
The Idea
In an MA model, the current value depends on past random shocks (errors), NOT past values of itself. It's like saying: "Today's value is the normal level, plus an echo of yesterday's surprise, plus an echo of the surprise before that..."
MA(1) — First Order
Model: Xt = μ + et + α₁·et-1
μ = the mean level. et = today's random shock. α₁·et-1 = an echo of yesterday's shock.
If α₁ = 0: no temporal dependence, just pure random noise around the mean.
If α₁ is large: past shocks strongly influence current value.
MA(2) — Second Order
Model: Xt = μ + et + α₁·et-1 + α₂·et-2. Today's value echoes the last TWO shocks.
How to tell if you have an MA process
• ACF: sharp cutoff after lag q (only first q lags are significant)
• PACF: gradual decay (slowly decreasing)
The ACF tells you the order: if ACF cuts off after lag 1 → MA(1).
Extra hint: If the lag-1 autocorrelation is NEGATIVE, strongly consider an MA term.
ACF & PACF — The Complete Identification Guide
What Are These?
ACF (Autocorrelation Function): Measures the correlation between the series and lagged versions of itself. ACF at lag 3 = correlation between Xt and Xt-3. This includes BOTH direct effects and indirect effects that "propagate" through intermediate lags.
PACF (Partial Autocorrelation Function): Measures the correlation between Xt and Xt-k AFTER removing the effects of all intermediate lags. So PACF at lag 3 = the "pure" correlation at lag 3 that isn't explained by lags 1 and 2.
The Master Identification Table
| Model | ACF Pattern | PACF Pattern | How to Read It |
|---|---|---|---|
| AR(1) | Decays exponentially from lag 1 (many lags slowly fading) | Cuts off sharply after lag 1 (only lag 1 is significant) | PACF shows 1 spike then nothing → AR(1) |
| AR(2) | Decays gradually (may oscillate — go positive/negative) | Cuts off after lag 2 (lags 1 and 2 significant, rest zero) | PACF shows 2 spikes then nothing → AR(2) |
| MA(1) | Cuts off sharply after lag 1 (only lag 1 is significant) | Decays exponentially from lag 1 | ACF shows 1 spike then nothing → MA(1) |
| MA(2) | Cuts off after lag 2 (lags 1 and 2 significant, rest zero) | Decays gradually | ACF shows 2 spikes then nothing → MA(2) |
| ARMA(1,1) | Both decay gradually | Both decay gradually | Neither cuts off cleanly → try ARMA |
AR → PACF cuts off (the one with the different letter cuts off: A≠P)
MA → ACF cuts off (the one with the matching A cuts off: MA↔ACF)
Or even simpler: "AR = Partial, MA = Auto" — each model is identified by the function that cuts off.
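The table's "decay vs cutoff" contrast can be checked numerically with the standard textbook closed forms (these formulas are assumed background, not from the lecture slides):

```python
# Theoretical autocorrelations behind the identification table.
#   AR(1): rho_k = phi**k                        -> gradual geometric decay
#   MA(1): rho_1 = theta/(1 + theta**2),
#          rho_k = 0 for k >= 2                  -> sharp cutoff after lag 1

def ar1_acf(phi, max_lag):
    return [phi ** k for k in range(1, max_lag + 1)]

def ma1_acf(theta, max_lag):
    return [theta / (1 + theta ** 2)] + [0.0] * (max_lag - 1)

print([round(r, 4) for r in ar1_acf(0.8, 4)])   # [0.8, 0.64, 0.512, 0.4096]
print(ma1_acf(0.5, 4))                          # [0.4, 0.0, 0.0, 0.0]
```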
AR(1)
PACF cuts off first, so this is AR.
AR(2)
Two PACF spikes then silence is the AR(2) signature.
MA(1)
ACF cuts off first, so this is MA.
MA(2)
Two ACF spikes then silence means MA(2).
What "Cuts Off" vs "Decays" Looks Like
Sharp Cutoff
One or two significant spikes, then everything drops into the "not significant" band (dashed lines on the plot).
Gradual Decay
Values slowly shrink over many lags. May be all positive (exponential decay) or alternate positive/negative (oscillating decay).
1. Look at PACF — does it cut off sharply? If yes → AR model, order = lag where it cuts off
2. Look at ACF — does it cut off sharply? If yes → MA model, order = lag where it cuts off
3. Both decay gradually? → Try ARMA. Try the simplest (1,1) first.
4. "Significant" = the bar extends beyond the blue dashed confidence lines on the plot
ARMA & ARIMA Models
ARMA(p,q) — Mixing AR and MA
Combines autoregressive terms (past values) with moving average terms (past errors). Sometimes an ARMA(1,1) fits better than a higher-order pure AR model such as AR(3), and with fewer parameters.
ARIMA(p,d,q) — Adding Differencing for Non-Stationary Data
Problem: AR and MA only work on stationary data (constant mean, constant variance). Real data often has trends → non-stationary.
Solution: Difference the data to remove the trend, then apply ARMA.
ARIMA Notation
| Notation | p (AR) | d (diff) | q (MA) | English |
|---|---|---|---|---|
| ARIMA(1,0,0) | 1 | 0 | 0 | Just AR(1) — no differencing, no MA |
| ARIMA(0,0,1) | 0 | 0 | 1 | Just MA(1) — no differencing, no AR |
| ARIMA(1,0,1) | 1 | 0 | 1 | ARMA(1,1) — mixed, no differencing |
| ARIMA(1,1,0) | 1 | 1 | 0 | AR(1) on first-differenced data |
| ARIMA(0,1,1) | 0 | 1 | 1 | MA(1) on first-differenced data |
| ARIMA(2,0,0) | 2 | 0 | 0 | AR(2) on original stationary data |
The Box-Jenkins Process — Step by Step
This is the systematic approach to building an ARIMA model. It's an iterative cycle; one standard formulation:
1. Check stationarity; difference the data if needed (this chooses d)
2. Identify candidate p and q from the ACF/PACF plots
3. Estimate the model parameters
4. Diagnostic check: are the residuals white noise (Ljung-Box)? If not, go back to step 2
5. Forecast with the validated model
Stationarity — When Do You Difference?
A stationary series fluctuates around a constant mean with constant variance. Visually, it looks like "random noise around a flat line."
A non-stationary series has a trending mean or changing variance. If you see the data going up or down over time → non-stationary → difference it.
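First differencing itself is trivial to sketch (plain Python; the trending series is made up for illustration):

```python
# First differencing: dX_t = X_t - X_(t-1).
# A series with a steady linear trend becomes constant (stationary in the
# mean) after one difference; a quadratic trend needs two differences.

def difference(series, order=1):
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

trending = [10, 13, 16, 19, 22, 25]   # linear trend, slope 3
print(difference(trending))           # [3, 3, 3, 3, 3] -> flat
```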
The Ljung-Box Test
Tests whether the residuals are white noise. H₀: the residuals are random (no remaining autocorrelation). A HIGH p-value (e.g., > 0.05) means you fail to reject H₀: the residuals look random, so the model has captured the pattern. A LOW p-value means structure remains and the model should be revised.
Reading R Output — A Trap to Know
Your lecture specifically warned about this:
For an AR(1) model: Xt = β₀ + β₁·Xt-1 + et
The mean μ and intercept β₀ are related by: μ = β₀ / (1 − β₁)
Or equivalently: β₀ = μ × (1 − β₁)
R gives you μ (labeled as "intercept"). You need to convert if you want the true intercept.
R output for ARIMA(1,0,0): ar1 = 0.39, "intercept" = 36.093 (really the mean μ).
To get the true intercept β₀:
β₀ = μ × (1 − β₁) = 36.093 × (1 − 0.39) = 36.093 × 0.61 = 22.017
The actual model equation: Xt = 22.017 + 0.39·Xt-1 + et
For the ARIMA(2,0,0) model: β₀ = 36.052 × (1 − (0.3136 + 0.1931)) = 36.052 × 0.4933 = 17.784
Model: Xt = 17.784 + 0.3136·Xt-1 + 0.1931·Xt-2 + et
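The conversion is one line of arithmetic (a sketch; the numbers are the two examples above):

```python
# Convert R's "intercept" (really the mean mu) into the true model
# intercept: beta0 = mu * (1 - sum of AR coefficients).

def true_intercept(mu, ar_coefs):
    return mu * (1 - sum(ar_coefs))

print(round(true_intercept(36.093, [0.39]), 3))            # 22.017  ARIMA(1,0,0)
print(round(true_intercept(36.052, [0.3136, 0.1931]), 3))  # 17.784  ARIMA(2,0,0)
```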
ARIMA — Practice Scenarios
Scenario 1: You plot ACF and see: lag 1 = 0.8, lag 2 = 0.6, lag 3 = 0.4, lag 4 = 0.25... (slowly decaying). PACF shows: lag 1 = 0.8, lag 2 = −0.05, lag 3 = 0.02 (sharp cutoff after lag 1). What model?
Scenario 2: ACF shows lag 1 = −0.6, lag 2 = 0.03, lag 3 = −0.01 (cutoff after lag 1). PACF decays: lag 1 = −0.6, lag 2 = −0.3, lag 3 = −0.15... What model?
Scenario 3: The original series trends upward. After first differencing, the series looks stationary. The differenced series has PACF cutting off at lag 2 and ACF decaying. What model?
Scenario 4: Both ACF and PACF decay gradually (neither cuts off cleanly). What should you try?
What is Fuzzy Time Series?
Traditional (Boolean) logic: a student count is either in the range [15000, 16000] or it's not. It's 0 or 1.
Fuzzy logic: a student count of 15,900 might be "mostly in [15000, 16000]" (membership 0.9) and "slightly in [16000, 17000]" (membership 0.1). It can belong to multiple sets with different degrees.
Why Fuzzy for Time Series?
Statistical methods like ARIMA are powerful but they're "certain" — they give you a precise number. Fuzzy TS embraces uncertainty and works well when data is limited, imprecise, or when you want an interpretable model. It was first proposed by Song and Chissom in 1993 and improved significantly by Chen in 1996.
Chen's 6-Step Method — Fully Explained
Step 1: Define the Universe of Discourse & Partition It
First, find the range of your data and add some padding: U = [Dmin − D₁, Dmax + D₂], where Dmin and Dmax are the smallest and largest observations.
The number of intervals (n) and the buffer values (D₁, D₂) are your choices. They significantly affect accuracy.
Step 2: Define Fuzzy Sets on the Universe of Discourse
Each interval gets a fuzzy set A₁, A₂, ..., Aₙ. Each set has a triangular membership function: Aᵢ has membership 1 on its own interval uᵢ, membership 0.5 on the neighboring intervals, and 0 elsewhere (the standard choice in Chen's formulation).
These can be given linguistic labels: A₁ = "not many", A₂ = "not too many", A₃ = "many", etc.
Step 3: Fuzzify Historical Data
For each historical data point, find which interval it falls in, and assign it to the fuzzy set with the highest membership degree (which is the set whose interval contains the value).
Step 4: Identify Fuzzy Logical Relationships (FLR)
Look at consecutive time periods. If F(t−1) is fuzzified as Aj and F(t) is fuzzified as Ak, then the FLR is written Aj → Ak ("Aj is followed by Ak").
Step 5: Establish Fuzzy Logical Relationship Groups (FLRG)
Group all FLRs that have the same left-hand side. Merge the right-hand sides (remove duplicates). Example: the FLRs A₃ → A₃ and A₃ → A₄ become the single group A₃ → A₃, A₄.
Step 6: Defuzzify the Forecasted Output
To forecast F(t), check which fuzzy set F(t−1) belongs to, look up its FLRG, then:
Case 1: Empty group
If Aj has no relationships (never appeared as a left-hand side), forecast = midpoint of uj
Case 2: One-to-one (Aj → Ak)
Only one target. Forecast = midpoint of uk
Case 3: One-to-many (Aj → Aa, Ab, Ac)
Forecast = average of midpoints of ua, ub, uc
Example: A₃ → A₂, A₃, A₅. Midpoints: m₂=14500, m₃=15500, m₅=17500.
Forecast = (14500 + 15500 + 17500) / 3 = 47500 / 3 = 15,833
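Step 6 can be sketched as a small function covering all three cases (plain Python; the set names mirror the example above):

```python
# Chen's defuzzification rules (Step 6).
# flrg maps each fuzzy set to the list of sets it was followed by;
# midpoints maps each fuzzy set to the midpoint of its interval.

def defuzzify(state, flrg, midpoints):
    targets = flrg.get(state, [])
    if not targets:                      # Case 1: empty group
        return midpoints[state]
    # Case 2 (one-to-one) and Case 3 (one-to-many): average of midpoints
    return sum(midpoints[t] for t in targets) / len(targets)

midpoints = {"A2": 14500, "A3": 15500, "A5": 17500}
flrg = {"A3": ["A2", "A3", "A5"]}

print(round(defuzzify("A3", flrg, midpoints)))   # 15833 (= 47500/3)
print(defuzzify("A2", flrg, midpoints))          # 14500 (empty group)
```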
Full Worked Example — Alabama Enrollments (Chen's Method)
Data: Enrollments 1971–1992 at the University of Alabama
| Year | Enroll | Year | Enroll | Year | Enroll |
|---|---|---|---|---|---|
| 1971 | 13055 | 1979 | 16807 | 1987 | 16859 |
| 1972 | 13563 | 1980 | 16919 | 1988 | 18150 |
| 1973 | 13867 | 1981 | 16388 | 1989 | 18970 |
| 1974 | 14696 | 1982 | 15433 | 1990 | 19328 |
| 1975 | 15460 | 1983 | 15497 | 1991 | 19337 |
| 1976 | 15311 | 1984 | 15145 | 1992 | 18876 |
| 1977 | 15603 | 1985 | 15163 | | |
| 1978 | 15861 | 1986 | 15984 | | |
Step 1: Universe of Discourse
Min = 13055, Max = 19337. Let D₁ = 55, D₂ = 663 → U = [13000, 20000]
Divide into 7 equal intervals (length = 1000 each): u₁=[13000,14000), u₂=[14000,15000), u₃=[15000,16000), u₄=[16000,17000), u₅=[17000,18000), u₆=[18000,19000), u₇=[19000,20000]
Step 2: Define Fuzzy Sets
A₁, ..., A₇, one per interval, each with membership 1 on its own interval and 0.5 on its neighbors.
Step 3: Fuzzify Every Data Point
| Year | Enroll | Interval | Fuzzified | Year | Enroll | Interval | Fuzzified | |
|---|---|---|---|---|---|---|---|---|
| 1971 | 13055 | u₁ | A₁ | 1982 | 15433 | u₃ | A₃ | |
| 1972 | 13563 | u₁ | A₁ | 1983 | 15497 | u₃ | A₃ | |
| 1973 | 13867 | u₁ | A₁ | 1984 | 15145 | u₃ | A₃ | |
| 1974 | 14696 | u₂ | A₂ | 1985 | 15163 | u₃ | A₃ | |
| 1975 | 15460 | u₃ | A₃ | 1986 | 15984 | u₃ | A₃ | |
| 1976 | 15311 | u₃ | A₃ | 1987 | 16859 | u₄ | A₄ | |
| 1977 | 15603 | u₃ | A₃ | 1988 | 18150 | u₆ | A₆ | |
| 1978 | 15861 | u₃ | A₃ | 1989 | 18970 | u₆ | A₆ | |
| 1979 | 16807 | u₄ | A₄ | 1990 | 19328 | u₇ | A₇ | |
| 1980 | 16919 | u₄ | A₄ | 1991 | 19337 | u₇ | A₇ | |
| 1981 | 16388 | u₄ | A₄ | 1992 | 18876 | u₆ | A₆ |
Step 4: Build FLRs (consecutive pairs)
From the fuzzified sequence: A₁→A₁ (twice), A₁→A₂, A₂→A₃, A₃→A₃ (seven times), A₃→A₄ (twice), A₄→A₄ (twice), A₄→A₃, A₄→A₆, A₆→A₆, A₆→A₇, A₇→A₇, A₇→A₆
Step 5: Build FLRGs
A₁ → A₁, A₂
A₂ → A₃
A₃ → A₃, A₄
A₄ → A₃, A₄, A₆
A₆ → A₆, A₇
A₇ → A₆, A₇
Step 6: Forecast (example: forecast 1975)
1974 is fuzzified as A₂. FLRG: A₂ → A₃ (one-to-one)
Forecast = midpoint of u₃ = 15,500 (actual: 15,460 — very close!)
Forecast 1976: 1975 is A₃. FLRG: A₃ → A₃, A₄ (one-to-many)
Forecast = (midpoint u₃ + midpoint u₄) / 2 = (15500 + 16500) / 2 = 16,000 (actual: 15,311)
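Steps 4-6 can be run end-to-end on the fuzzified sequence from the Step 3 table (a plain-Python sketch; names are mine):

```python
# Chen Steps 4-6 on the fuzzified Alabama sequence (1971-1992),
# reproducing the 1975 and 1976 forecasts from the worked example.

seq = ["A1", "A1", "A1", "A2", "A3", "A3", "A3", "A3", "A4", "A4", "A4",
       "A3", "A3", "A3", "A3", "A3", "A4", "A6", "A6", "A7", "A7", "A6"]
midpoint = {f"A{i}": 13000 + 1000 * i - 500 for i in range(1, 8)}  # u_i midpoints

# Steps 4/5: FLRs grouped into FLRGs (duplicates removed, order kept)
flrg = {}
for prev, nxt in zip(seq, seq[1:]):
    flrg.setdefault(prev, [])
    if nxt not in flrg[prev]:
        flrg[prev].append(nxt)

# Step 6: defuzzify as the average of the target-set midpoints
def forecast(state):
    targets = flrg.get(state, [state])
    return sum(midpoint[t] for t in targets) / len(targets)

print(forecast("A2"))   # 1975 forecast: 15500.0 (1974 was A2)
print(forecast("A3"))   # 1976 forecast: 16000.0 (1975 was A3)
```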
The Variations Approach — Handling Trend
The Fix: Instead of fuzzifying actual values, compute the year-to-year changes (variations) first: V(t) = Y(t) − Y(t−1). Then apply Chen's method to the variation series. This removes the trend from the analysis!
To get the final forecast: The defuzzified output gives you a predicted CHANGE. Add this change to the last known actual value: Forecast(t) = Actual(t−1) + Predicted Change.
Why It Works
Trend Before vs After Variations
This version mirrors the Alabama example: rising enrollments through 1975, then a negative variation in 1976.
Forecast Rule
Predict Change, Then Add It Back
The fuzzy model outputs a variation first. Converting that to the final enrollment forecast is a second step.
Fuzzy Exam Problem — Fully Solved
"Use variations of historical enrollment data. Let U = [−1000, 1400] be the universe of discourse. Partition into 4 equal intervals to produce 4 fuzzy sets. Find first-order FLRGs. Forecast enrollment for 1975 and 1976."
Step 0: Compute ALL Variations
| Year | Enrollment | Variation V(t) | Year | Enrollment | Variation V(t) | |
|---|---|---|---|---|---|---|
| 1971 | 13055 | — | 1982 | 15433 | 15433−16388 = −955 | |
| 1972 | 13563 | 13563−13055 = 508 | 1983 | 15497 | 15497−15433 = 64 | |
| 1973 | 13867 | 13867−13563 = 304 | 1984 | 15145 | 15145−15497 = −352 | |
| 1974 | 14696 | 14696−13867 = 829 | 1985 | 15163 | 15163−15145 = 18 | |
| 1975 | 15460 | 15460−14696 = 764 | 1986 | 15984 | 15984−15163 = 821 | |
| 1976 | 15311 | 15311−15460 = −149 | 1987 | 16859 | 16859−15984 = 875 | |
| 1977 | 15603 | 15603−15311 = 292 | 1988 | 18150 | 18150−16859 = 1291 | |
| 1978 | 15861 | 15861−15603 = 258 | 1989 | 18970 | 18970−18150 = 820 | |
| 1979 | 16807 | 16807−15861 = 946 | 1990 | 19328 | 19328−18970 = 358 | |
| 1980 | 16919 | 16919−16807 = 112 | 1991 | 19337 | 19337−19328 = 9 | |
| 1981 | 16388 | 16388−16919 = −531 | 1992 | 18876 | 18876−19337 = −461 |
Step 1: Universe of Discourse (given)
U = [−1000, 1400], divide into 4 equal intervals.
Range = 1400 − (−1000) = 2400. Interval length = 2400 / 4 = 600
u₁ = [−1000, −400), u₂ = [−400, 200), u₃ = [200, 800), u₄ = [800, 1400]
Step 2: Define 4 Fuzzy Sets
A₁ on u₁ (midpoint −700), A₂ on u₂ (midpoint −100), A₃ on u₃ (midpoint 500), A₄ on u₄ (midpoint 1100)
Step 3: Fuzzify Each Variation
| Year | V(t) | Falls in | Fuzzified | Year | V(t) | Falls in | Fuzzified | |
|---|---|---|---|---|---|---|---|---|
| 1972 | 508 | u₃ [200,800] | A₃ | 1983 | 64 | u₂ [−400,200] | A₂ | |
| 1973 | 304 | u₃ | A₃ | 1984 | −352 | u₂ | A₂ | |
| 1974 | 829 | u₄ [800,1400] | A₄ | 1985 | 18 | u₂ | A₂ | |
| 1975 | 764 | u₃ | A₃ | 1986 | 821 | u₄ | A₄ | |
| 1976 | −149 | u₂ | A₂ | 1987 | 875 | u₄ | A₄ | |
| 1977 | 292 | u₃ | A₃ | 1988 | 1291 | u₄ | A₄ | |
| 1978 | 258 | u₃ | A₃ | 1989 | 820 | u₄ | A₄ | |
| 1979 | 946 | u₄ | A₄ | 1990 | 358 | u₃ | A₃ | |
| 1980 | 112 | u₂ | A₂ | 1991 | 9 | u₂ | A₂ | |
| 1981 | −531 | u₁ [−1000,−400] | A₁ | 1992 | −461 | u₁ | A₁ | |
| 1982 | −955 | u₁ | A₁ |
Step 4: Build ALL FLRs
From the fuzzified variation sequence (1972-1992): A₃→A₃, A₃→A₄, A₄→A₃, A₃→A₂, A₂→A₃, A₃→A₃, A₃→A₄, A₄→A₂, A₂→A₁, A₁→A₁, A₁→A₂, A₂→A₂, A₂→A₂, A₂→A₄, A₄→A₄, A₄→A₄, A₄→A₄, A₄→A₃, A₃→A₂, A₂→A₁
Step 5: Group into FLRGs
A₁ → A₁, A₂
A₂ → A₁, A₂, A₃, A₄
A₃ → A₂, A₃, A₄
A₄ → A₂, A₃, A₄
Step 6: Forecast 1975
We need the variation for 1975. The variation for 1974 is V(1974) = 829 → fuzzified as A₄.
FLRG for A₄: A₄ → A₂, A₃, A₄
Predicted variation = average of midpoints = (−100 + 500 + 1100) / 3 = 1500 / 3 = 500
Forecast enrollment 1975 = Actual(1974) + predicted change = 14696 + 500 = 15,196
(Actual 1975 = 15,460. Error = 264)
Forecast 1976
The variation for 1975 is V(1975) = 764 → fuzzified as A₃.
FLRG for A₃: A₃ → A₂, A₃, A₄
Predicted variation = (−100 + 500 + 1100) / 3 = 500
Forecast enrollment 1976 = Actual(1975) + predicted change = 15460 + 500 = 15,960
(Actual 1976 = 15,311. Error = 649)
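The whole exam problem can be replayed in code (a plain-Python sketch; how to bin a value exactly on the upper boundary is an implementation choice I've made here, not part of the problem statement):

```python
# Variations approach on the exam problem: fuzzify year-to-year changes
# into 4 intervals over U = [-1000, 1400], build first-order FLRGs,
# then forecast = last actual + predicted variation.

years = list(range(1971, 1993))
enroll = [13055, 13563, 13867, 14696, 15460, 15311, 15603, 15861, 16807,
          16919, 16388, 15433, 15497, 15145, 15163, 15984, 16859, 18150,
          18970, 19328, 19337, 18876]

variations = [b - a for a, b in zip(enroll, enroll[1:])]   # V(t) = Y(t) - Y(t-1)

lo, width, n = -1000, 600, 4
mid = [lo + width * (i + 0.5) for i in range(n)]           # -700, -100, 500, 1100

def fuzzify(v):
    """Index of the interval containing v (top boundary goes to the last set)."""
    return min((v - lo) // width, n - 1)

labels = [fuzzify(v) for v in variations]                  # labels[k] = year 1972+k

flrg = {}                                                  # first-order FLRGs
for prev, nxt in zip(labels, labels[1:]):
    flrg.setdefault(prev, set()).add(nxt)

def forecast(year):
    state = labels[years.index(year) - 2]                  # fuzzified V(year-1)
    targets = sorted(flrg[state])
    change = sum(mid[t] for t in targets) / len(targets)   # predicted variation
    return enroll[years.index(year) - 1] + change          # add back to last actual

print(forecast(1975))   # 15196.0
print(forecast(1976))   # 15960.0
```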
Question 3 from the test: Effect of Increasing Order
From 1st to 2nd order: Instead of just looking at one previous variation to predict the next, you look at TWO previous variations together. For example, the pattern (A₃, A₄) → A₃ is more specific than just A₄ → ?. This generally improves accuracy because you capture more context. Think of it like: instead of predicting weather based on just today, you use today AND yesterday's weather.
From 1st to 20th order: You'd use 20 previous variations to predict the next. With only 21 data points (1972-1992), you'd have very few examples of any specific 20-length pattern. This likely leads to overfitting — the model memorizes the training data but can't generalize. Most patterns won't repeat, so many FLRGs will be empty or have only one example. There's a sweet spot: higher order helps up to a point, then accuracy plateaus or degrades.
All 4 Improvements to Chen's Method
Problem: First-order only uses F(t−1). Why just one previous value?
Solution: Use multiple previous values. 2nd order: (Ai, Aj) → Ak. 3rd order: (Ai, Aj, Ak) → Am.
Dynamic approach (Chen et al., 2015): Use LCS/LRS (Longest Common/Repeated Subsequence) algorithm to automatically find the optimal order from the data.
Problem: D₁, D₂ are arbitrary. Equal intervals may not suit the data.
Solution: (1) Sort values ascending. (2) Compute average distance between consecutive sorted values + standard deviation. (3) Remove outliers (distances > 1 std dev from average). (4) Compute revised average distance (ADR). (5) Use U = [Dmin − ADR, Dmax + ADR]. (6) Use trapezoidal (not triangular) membership functions for smoother transitions.
Problem: Equal-width intervals waste resolution on sparse regions and under-represent dense regions.
Solution (Singh & Samariya, 2015): Use data clustering (e.g., k-means) to determine natural groupings in the data. Dense clusters get narrower intervals (more precision where data concentrates), sparse areas get wider intervals.
Problem: Chen's method on raw values doesn't explicitly handle trend.
Solution: Compute V(t) = Y(t) − Y(t−1) and apply FTS to the variation series. This is the approach from your professor's test! The universe of discourse is defined over variations instead of actual values.
Decomposition Analysis
An Open-Model technique that breaks data into Level, Trend, Seasonality, Noise using a centered moving average. Needs 48+ data points.
Regression Analysis
r (correlation coefficient): −1 to +1. Strength of linear relationship. Does NOT prove causation.
R² = r²: % of variation in y explained by x. Example: r = 0.901 → R² = 0.81 → 81% explained.
Standard Error Sy,x: Used for prediction intervals: ŷ ± tα/2 × Sy,x (use t with df = n−2).
Sales = 1.75 + 0.25 × (payroll). Payroll next year = $6B.
Forecast: Sales = 1.75 + 0.25(6) = $3.25M. Sy,x = $0.306M. With n=6, df=4, t₀.₀₂₅ = 2.78:
95% CI: $3.25M ± 2.78 × $0.306M = [$2.40M, $4.10M]
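The interval arithmetic, as a sketch (the t value 2.78 is taken from the notes, not computed):

```python
# Prediction interval: y_hat +/- t * S_yx, with t(0.025, df=4) = 2.78.

y_hat, t_crit, s_yx = 3.25, 2.78, 0.306    # $M units, from the example
lower = y_hat - t_crit * s_yx
upper = y_hat + t_crit * s_yx
print(round(lower, 2), round(upper, 2))    # 2.4 4.1
```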
Adding interest rates (multiple regression): r improved from 0.901 to 0.96.
Key Probability Distributions
| Distribution | Type | Key Formula | When |
|---|---|---|---|
| Binomial | Discrete | p(x) = C(n,x)·p^x·q^(n−x); μ = np, σ = √(npq) | n fixed trials, 2 outcomes, independent |
| Poisson | Discrete | p(x) = e^(−λ)·λ^x / x!; μ = σ² = λ | Events per interval (time/area) |
| Normal | Continuous | z = (x − μ)/σ; use z-table | Bell-shaped, symmetric. Basis of CLT |
| Exponential | Continuous | P(x > a) = e^(−a/μ) | Time between events |
Central Limit Theorem: For n ≥ 30, x̄ is approximately normal with μx̄ = μ and σx̄ = σ/√n.
Normal approx. to Binomial: When np ≥ 5 and nq ≥ 5. Use continuity correction ±0.5.
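Both rules of thumb, sketched (function names are mine):

```python
# CLT scaling of the standard error, plus the np/nq >= 5 check for
# using the normal approximation to the binomial.
import math

def standard_error(sigma, n):
    return sigma / math.sqrt(n)          # sigma_xbar = sigma / sqrt(n)

def normal_approx_ok(n, p):
    return n * p >= 5 and n * (1 - p) >= 5

print(standard_error(12, 36))            # 2.0
print(normal_approx_ok(20, 0.5))         # True  (np = nq = 10)
print(normal_approx_ok(20, 0.1))         # False (np = 2)
```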
Formula Cheat Sheet
| Method | Formula | Constants |
|---|---|---|
| Exp. Smoothing | Ft+1 = αAt + (1−α)Ft | α |
| Holt's | Ft=αAt-1+(1−α)(Ft-1+Tt-1); Tt=β(Ft−Ft-1)+(1−β)Tt-1 | α, β |
| Winters' (Mult) | L=α(Y/St-m)+(1−α)(L+B); Forecast=(L+Bh)×S | α, γ, δ |
| ARIMA(p,d,q) | AR: uses past X values; MA: uses past errors; I: differencing | p, d, q |
| ACF/PACF | AR→PACF cuts off; MA→ACF cuts off; ARMA→both decay | — |
| Error metrics | MAD=Σ|e|/n; MSE=Σe²/n; MAPE=Σ(|e|/A×100)/n | — |
Which Method When?
| Situation | Use | Why |
|---|---|---|
| Stable, no trend/season, limited data | Exp. Smoothing | Simple, dampens noise |
| Level shifts unpredictably | Adaptive Smoothing | Auto-adjusts α |
| Clear trend, no season | Holt's | Captures trend with β |
| Trend + seasonality | Winters' | Full 3-parameter model |
| Long history (48+), complex patterns | ARIMA | Discovers optimal model |
| External factors drive demand | Regression | Uses exogenous variables |
| Uncertain/linguistic/limited data | Fuzzy TS | Handles imprecision |
Self-Test Quiz — 12 Questions
Q1: In exponential smoothing, α = 0.9 means the forecast puts what percentage of weight on the most recent actual value?
Q2: For Winters' multiplicative method, seasonal factors must:
Q3: ACF decays slowly, PACF cuts off after lag 2. The model is:
Q4: ACF cuts off after lag 1 (lag-1 autocorrelation is negative). PACF decays gradually. The model is:
Q5: The Ljung-Box test gives p-value = 0.43. This means:
Q6: In Chen's fuzzy TS, FLRG is A₃ → A₂, A₃, A₅. Midpoints: m₂=14500, m₃=15500, m₅=17500. The forecast is:
Q7: Using the variations approach in fuzzy TS addresses which problem?
Q8: ARIMA(0,1,1) means:
Q9: R gives ARIMA(1,0,0) with ar1=0.39, intercept=36.09. The TRUE intercept β₀ is:
Q10: A 12-month MA on monthly data with 12-month seasonality will:
Q11: Which model should you compare if ARMA(1,1) seems to fit?
Q12: Increasing fuzzy TS order from 1 to 20 on a dataset with 22 years of data will likely:
Exam Strategy
HIGH Fuzzy TS — Chen's method + variations approach (your prof tested this specifically)
HIGH ARIMA — ACF/PACF identification, Box-Jenkins process, reading R output
HIGH Exponential smoothing calculations (with & without trend)
HIGH Winters' multiplicative initialization & forecasting
MED Error metrics (MAD, MSE, MAPE) calculation
MED Regression, decomposition, probability distributions
• Additive seasonal factors sum to 0, while multiplicative factors average to 1 (don't mix them up)
• Winters' multiplicative forecast: (L + B×h) × S (multiply!), not + S
• AR → PACF cuts off (not ACF!); MA → ACF cuts off
• R's "intercept" = mean for AR models. True intercept = mean × (1−Σφ)
• Ljung-Box: HIGH p-value = GOOD (residuals are random = model works)
• Lower AIC = better model
• Fuzzy fuzzification: assign to HIGHEST membership set
• Fuzzy defuzzification (one-to-many): simple average of midpoints, no weighting
• For variations approach: forecast = last actual + predicted variation
□ Can I do exponential smoothing by hand, step by step?
□ Can I do Holt's method (F, T, FIT) for 2-3 periods?
□ Can I initialize Winters' multiplicative (regression → detrend → seasonal factors)?
□ Can I walk through Chen's fuzzy 6 steps completely?
□ Can I do the VARIATIONS approach from scratch? (the test problem)
□ Can I identify AR/MA/ARMA from an ACF/PACF description?
□ Can I interpret R output for ARIMA and convert mean→intercept?
□ Do I know the Box-Jenkins 5-step process?
□ Can I compute MAD, MSE, MAPE for a dataset?
□ Can I explain what Ljung-Box tests and what "p > 0.05" means?