Course Map
Time Series Components
Every time series is made up of four components. All forecasting techniques try to identify and project some or all of these:
📊 Level
- The horizontal baseline — what sales would be if there were no trend, seasonality, or noise
- Think of it as the "average base" of the series
- Example: a component whose demand is always ~1000/month has a level of 1000
📈 Trend
- A continuing pattern of increase or decrease over time
- Can be linear (straight line) or curved
- Caused by population growth, technology changes, culture shifts
- For exponential smoothing, think of trend as a "step function" — level steps up/down each period
🔄 Seasonality
- Repeating up-down pattern within ≤ 1 year
- Examples: air conditioners peak in summer, toys peak in fall
- Patterns longer than 1 year are called "cycles"
- Key: the pattern repeats itself every year
🎲 Noise (Random)
- Random fluctuation that TS techniques CANNOT explain
- If noise looks non-random, your model is missing something!
- A good test: if residuals don't look random, there's still pattern left
Additive vs. Multiplicative Seasonality
This comes up in Winters' method and decomposition. You must know the difference:
Additive: Y = (Level + Trend) + Seasonal + Error
- Seasonal swings stay the SAME size regardless of level
- Peaks and valleys have constant amplitude
- Seasonal factors sum to ZERO
- Forecast: (L + B×h) + S
Multiplicative: Y = (Level + Trend) × Seasonal + Error
- Seasonal swings GROW as the level grows
- Peaks and valleys get wider over time
- Seasonal factors average to 1.0
- Forecast: (L + B×h) × S
Forecast Error Metrics — Fully Explained
Error for any period: et = Actualt − Forecastt
MAD (Mean Absolute Deviation): take each error, make it positive (absolute value), then average. It tells you the average size of your mistakes in the same units as your data. Easy to understand: "on average, we're off by X units."
MSE (Mean Squared Error): square each error, then average. Big errors get punished much more than small ones (because squaring amplifies large numbers). Good for comparing models when you want to penalize big misses.
MAPE (Mean Absolute Percentage Error): turn each error into a percentage of the actual value, then average. Great for comparing accuracy across products with different scales. "On average, we're off by X%."
Worked example: α = 0.10, initial forecast F₁ = 175. Actual data: 180, 168, 159, 175, 190, 205, 180, 182
| Qtr | Actual | Forecast (α=.10) | Error | |Error| | Error² | |Error|/Actual |
|---|---|---|---|---|---|---|
| 1 | 180 | 175.00 | 5.00 | 5.00 | 25.00 | 2.78% |
| 2 | 168 | 175.50 | −7.50 | 7.50 | 56.25 | 4.46% |
| 3 | 159 | 174.75 | −15.75 | 15.75 | 248.06 | 9.91% |
| 4 | 175 | 173.18 | 1.82 | 1.82 | 3.33 | 1.04% |
| 5 | 190 | 173.36 | 16.64 | 16.64 | 276.89 | 8.76% |
| 6 | 205 | 175.02 | 29.98 | 29.98 | 898.80 | 14.62% |
| 7 | 180 | 178.02 | 1.98 | 1.98 | 3.92 | 1.10% |
| 8 | 182 | 178.22 | 3.78 | 3.78 | 14.29 | 2.08% |
| TOTALS | | | | 82.45 | 1526.54 | 44.74% |
MAD = 82.45 / 8 = 10.31
MSE = 1526.54 / 8 = 190.82
MAPE = 44.74% / 8 = 5.59%
How was F₂ calculated? F₂ = 0.10 × 180 + 0.90 × 175 = 18 + 157.5 = 175.50
How was F₃ calculated? F₃ = 0.10 × 168 + 0.90 × 175.50 = 16.8 + 157.95 = 174.75
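The whole table and all three metrics can be reproduced in a short script (a plain-Python sketch; variable names are mine, not from the course):

```python
# Reproduce the exponential-smoothing table and the three error metrics.
# Update rule: F(t+1) = alpha*A(t) + (1-alpha)*F(t), starting from F1 = 175.

alpha, forecast = 0.10, 175.0
actuals = [180, 168, 159, 175, 190, 205, 180, 182]

errors = []
for a in actuals:
    errors.append(a - forecast)                    # e_t = Actual - Forecast
    forecast = alpha * a + (1 - alpha) * forecast  # next period's forecast

n = len(actuals)
mad  = sum(abs(e) for e in errors) / n
mse  = sum(e * e for e in errors) / n
mape = sum(abs(e) / a for e, a in zip(errors, actuals)) / n * 100

print(round(mad, 2), round(mse, 2), round(mape, 2))   # 10.31 190.82 5.59
```

Note the tiny rounding differences against the hand-filled table (the table rounds forecasts to 2 decimals each step; the script carries full precision), yet the final metrics agree.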
Naive & Moving Averages
Naive Forecast
"Tomorrow will be the same as today." Free, simple, surprisingly hard to beat sometimes. Used as a baseline benchmark.
Simple Moving Average (SMA)
Data: Jan=12, Feb=13, Mar=16, Apr=19, May=23
3-month SMA forecasts: Apr = (12+13+16)/3 = 13.67; May = (13+16+19)/3 = 16.00; Jun = (16+19+23)/3 = 19.33
Notice: each new forecast "drops" the oldest value and "adds" the newest.
Weighted Moving Average
Example with weights 3, 2, 1: F = (3×16 + 2×13 + 1×12) / 6 = (48+26+12)/6 = 86/6 = 14.33
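Both averages, sketched in plain Python (function names are mine, not from the course):

```python
# Simple and weighted moving averages on the Jan-May worked example.

data = [12, 13, 16, 19, 23]          # Jan..May

def sma(values, window):
    """Next-period forecast = mean of the last `window` actuals."""
    return sum(values[-window:]) / window

def wma(values, weights):
    """Weights listed oldest-to-newest; heaviest weight on the newest value."""
    recent = values[-len(weights):]
    return sum(w * v for w, v in zip(weights, recent)) / sum(weights)

print(round(sma(data[:3], 3), 2))          # Apr SMA forecast: 13.67
print(round(wma(data[:3], [1, 2, 3]), 2))  # Apr WMA forecast: 14.33
```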
Visual Intuition
Raw Series vs Smoothers
This plot uses the exact worked-example numbers: actuals for Jan-May, then SMA and WMA forecasts for Apr-Jun.
Decision Cue
What Bigger N Does
Increasing the window lowers noise but delays the forecast’s response to real movement.
Exponential Smoothing — Full Explanation
What does α control? It's the "reactivity dial" between 0 and 1:
Low α (0.05 – 0.2)
- Forecast is mostly old forecast (heavy smoothing)
- Slow to react to changes
- Great when data is noisy but stable
- Example: α=0.1 means 10% new data, 90% old forecast
High α (0.5 – 0.9)
- Forecast is mostly last actual value
- Reacts quickly to changes
- Good when level is shifting
- But overreacts to noise!
- At α=1.0, it becomes a naive forecast
How weights actually decay
Exponential smoothing implicitly weights the actual from k periods back by α(1−α)^k. With α = 0.1: 0.100 on the latest actual, 0.090 one period back, 0.081 two back, 0.073 three back, and so on. Each weight is 90% of the previous one, and the weights sum to 1.
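A quick sketch of that decay (plain Python, values rounded):

```python
# Exponential smoothing is a weighted average of ALL past actuals:
# the actual k periods back gets weight alpha*(1-alpha)**k.

alpha = 0.1
weights = [alpha * (1 - alpha) ** k for k in range(6)]
print([round(w, 4) for w in weights])
# [0.1, 0.09, 0.081, 0.0729, 0.0656, 0.059] -> geometric decay, ratio 0.9
```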
Holt's Method (Exponential Smoothing with Trend)
Adds a trend component to capture upward/downward movement. Uses two constants: α (level) and β (trend).
Visual Intuition
Holt Tracks the Slope
The smoothed level absorbs noise, then Holt projects forward using the current trend estimate instead of staying flat.
Given: F₁ = 11, T₁ = 2, α = 0.2, β = 0.4. Actual data: A₁=12, A₂=17, ...
Computing Month 2 (A₁ = 12):
F₂ = αA₁ + (1−α)(F₁ + T₁) = 0.2×12 + 0.8×(11 + 2) = 2.4 + 10.4 = 12.80
T₂ = β(F₂ − F₁) + (1−β)T₁ = 0.4×(12.80 − 11) + 0.6×2 = 0.72 + 1.20 = 1.92
FIT₂ = F₂ + T₂ = 12.80 + 1.92 = 14.72
Computing Month 3 (A₂ = 17):
F₃ = 0.2×17 + 0.8×(12.80 + 1.92) = 3.40 + 11.78 = 15.18
T₃ = 0.4×(15.18 − 12.80) + 0.6×1.92 = 0.95 + 1.15 = 2.10
FIT₃ = 15.18 + 2.10 = 17.28
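The two updates can be scripted directly (a sketch in plain Python, using the F/T formulas from the cheat sheet):

```python
# Holt's method with the given start values:
#   F_t = alpha*A_(t-1) + (1-alpha)*(F_(t-1) + T_(t-1))
#   T_t = beta*(F_t - F_(t-1)) + (1-beta)*T_(t-1)
#   FIT_t = F_t + T_t

alpha, beta = 0.2, 0.4
F, T = 11.0, 2.0                 # F1, T1
actuals = [12, 17]               # A1, A2

for a in actuals:
    F_new = alpha * a + (1 - alpha) * (F + T)
    T_new = beta * (F_new - F) + (1 - beta) * T
    F, T = F_new, T_new
    print(round(F, 3), round(T, 3), round(F + T, 3))   # F, T, FIT

# Month 2: F = 12.8,   T = 1.92,  FIT = 14.72
# Month 3: F = 15.176, T = 2.102, FIT = 17.278
```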
Winters' Method (Trend + Seasonality)
Multiplicative Formulation
Level update: Lt = α(Yt / St-m) + (1−α)(Lt-1 + Bt-1)
- Yt / St-m "de-seasonalizes" this period's actual (divide out last year's seasonal factor), giving an estimate of the level without seasonal noise
- Lt-1 + Bt-1 is last period's level plus trend: the "expected level" from the previous model
- α blends these two estimates together
Seasonality Shape
Additive vs Multiplicative
Both have trend and repeating waves. The difference is whether the wave stays constant or grows with the level.
Forecast Logic
De-seasonalize, Then Reapply Season
Winters first strips the seasonal effect out, updates level and trend, then puts seasonality back onto the future forecast.
Initialization (How to Get Started)
Winters' needs initial values for L₀, B₀, and all m seasonal factors. The procedure from your lecture (Kuliah 6):
1. Fit a trend line (regression) to the data to get the initial level L₀ and trend B₀, then detrend: compute each period's ratio Actual / Trend (the "regression → detrend → seasonal factors" sequence)
2. Average the ratios for each season across the available years
   Example: S̄[Q1] = (0.7368 + 0.7156 + 0.6894 + 0.6831) / 4 = 0.7062
3. Normalize with a correction factor CF = m / Σ(averages), where m = number of seasons (4 for quarterly)
   Example: if the averages sum to 3.9999, CF = 4/3.9999 ≈ 1.0000
4. Multiply each average by CF to get the initial seasonal factors
Initial seasonal factors: S₋₃=0.7062, S₋₂=1.1114, S₋₁=1.2937, S₀=0.8886
• Multiplicative: seasonal factors must average to 1.0 (equivalently, sum to m)
• Additive: seasonal factors must sum to 0
A factor of 1.15 means that season is 15% above the average. A factor of 0.85 means 15% below.
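The normalization step can be sketched as (plain Python, using the raw averages from the example):

```python
# Normalize multiplicative seasonal factors so they average to 1
# (equivalently, sum to m), via CF = m / sum(averages).

averages = [0.7062, 1.1114, 1.2937, 0.8886]   # raw per-season averages
m = len(averages)
cf = m / sum(averages)                         # correction factor
factors = [round(a * cf, 4) for a in averages]

print(factors)                                 # normalized seasonal factors
print(round(sum(a * cf for a in averages), 6)) # 4.0 -> they now sum to m
```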
What is ARIMA? — The Big Picture
ARIMA stands for AutoRegressive Integrated Moving Average. It's the most powerful classical time series method. While FMTS techniques assume which patterns exist (and use fixed formulas), ARIMA first analyzes the data to discover what patterns are there, then builds a custom model.
AR (AutoRegressive): Today's value depends on yesterday's value (and maybe the day before, etc.)
I (Integrated): We may need to "difference" the data to make it stationary first
MA (Moving Average): Today's value depends on yesterday's forecast error (and maybe earlier errors)
ARIMA(p, d, q) → p = AR order, d = differencing order, q = MA order
AR (AutoRegressive) Models — Explained from Scratch
The Idea
In an AR model, the current value Xt is predicted using past values of itself. It's literally a regression where the predictors are the variable's own lagged values.
AR(1) — First Order
Model: Xt = β₀ + β₁·Xt-1 + et
This says: "Today's value = some constant + some fraction of yesterday's value + random shock."
If β₁ = 0: There's no temporal dependence. Xt is just random noise around β₀.
If β₁ is large (close to 1): Yesterday's value strongly influences today. The series has "memory."
If β₁ > 1: The series is explosive (grows without bound) — not stationary!
AR(2) — Second Order
Model: Xt = β₀ + β₁·Xt-1 + β₂·Xt-2 + et
Now today depends on the last TWO values. More complex patterns can be captured, like oscillating behavior.
AR(p) — General
Today depends on the last p values. p is the "order" of the AR model.
How to tell if you have an AR process
• PACF: sharp cutoff after lag p (only first p lags are significant, rest drop to zero)
• ACF: gradual decay (slowly decreasing, possibly oscillating, but many lags significant)
The PACF tells you the order: if PACF cuts off after lag 2 → AR(2).
MA (Moving Average) Models — Explained from Scratch
The Idea
In an MA model, the current value depends on past random shocks (errors), NOT past values of itself. It's like saying: "Today's value is the normal level, plus an echo of yesterday's surprise, plus an echo of the surprise before that..."
MA(1) — First Order
Model: Xt = μ + et + α₁·et-1
μ = the mean level. et = today's random shock. α₁·et-1 = an echo of yesterday's shock.
If α₁ = 0: no temporal dependence, just pure random noise around the mean.
If α₁ is large: past shocks strongly influence current value.
MA(2) — Second Order
Model: Xt = μ + et + α₁·et-1 + α₂·et-2. Today's value echoes the last TWO shocks.
How to tell if you have an MA process
• ACF: sharp cutoff after lag q (only first q lags are significant)
• PACF: gradual decay (slowly decreasing)
The ACF tells you the order: if ACF cuts off after lag 1 → MA(1).
Extra hint: If the lag-1 autocorrelation is NEGATIVE, strongly consider an MA term.
ACF & PACF — The Complete Identification Guide
What Are These?
ACF (Autocorrelation Function): Measures the correlation between the series and lagged versions of itself. ACF at lag 3 = correlation between Xt and Xt-3. This includes BOTH direct effects and indirect effects that "propagate" through intermediate lags.
PACF (Partial Autocorrelation Function): Measures the correlation between Xt and Xt-k AFTER removing the effects of all intermediate lags. So PACF at lag 3 = the "pure" correlation at lag 3 that isn't explained by lags 1 and 2.
The Master Identification Table
| Model | ACF Pattern | PACF Pattern | How to Read It |
|---|---|---|---|
| AR(1) | Decays exponentially from lag 1 (many lags slowly fading) | Cuts off sharply after lag 1 (only lag 1 is significant) | PACF shows 1 spike then nothing → AR(1) |
| AR(2) | Decays gradually (may oscillate — go positive/negative) | Cuts off after lag 2 (lags 1 and 2 significant, rest zero) | PACF shows 2 spikes then nothing → AR(2) |
| MA(1) | Cuts off sharply after lag 1 (only lag 1 is significant) | Decays exponentially from lag 1 | ACF shows 1 spike then nothing → MA(1) |
| MA(2) | Cuts off after lag 2 (lags 1 and 2 significant, rest zero) | Decays gradually | ACF shows 2 spikes then nothing → MA(2) |
| ARMA(1,1) | Both decay gradually | Both decay gradually | Neither cuts off cleanly → try ARMA |
AR → PACF cuts off (the one with the different letter cuts off: A≠P)
MA → ACF cuts off (the one with the matching A cuts off: MA↔ACF)
Or even simpler: "AR = Partial, MA = Auto" — each model is identified by the function that cuts off.
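The table's "decay vs cutoff" contrast can be checked numerically with the standard textbook closed forms (these formulas are assumed background, not from the lecture slides):

```python
# Theoretical autocorrelations behind the identification table.
#   AR(1): rho_k = phi**k                        -> gradual geometric decay
#   MA(1): rho_1 = theta/(1 + theta**2),
#          rho_k = 0 for k >= 2                  -> sharp cutoff after lag 1

def ar1_acf(phi, max_lag):
    return [phi ** k for k in range(1, max_lag + 1)]

def ma1_acf(theta, max_lag):
    return [theta / (1 + theta ** 2)] + [0.0] * (max_lag - 1)

print([round(r, 4) for r in ar1_acf(0.8, 4)])   # [0.8, 0.64, 0.512, 0.4096]
print(ma1_acf(0.5, 4))                          # [0.4, 0.0, 0.0, 0.0]
```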
AR(1)
PACF cuts off first, so this is AR.
AR(2)
Two PACF spikes then silence is the AR(2) signature.
MA(1)
ACF cuts off first, so this is MA.
MA(2)
Two ACF spikes then silence means MA(2).
What "Cuts Off" vs "Decays" Looks Like
Sharp Cutoff
One or two significant spikes, then everything drops into the "not significant" band (dashed lines on the plot).
Gradual Decay
Values slowly shrink over many lags. May be all positive (exponential decay) or alternate positive/negative (oscillating decay).
1. Look at PACF — does it cut off sharply? If yes → AR model, order = lag where it cuts off
2. Look at ACF — does it cut off sharply? If yes → MA model, order = lag where it cuts off
3. Both decay gradually? → Try ARMA. Try the simplest (1,1) first.
4. "Significant" = the bar extends beyond the blue dashed confidence lines on the plot
ARMA & ARIMA Models
ARMA(p,q) — Mixing AR and MA
Combines autoregressive terms (past values) with moving average terms (past errors). Sometimes an ARMA(1,1) fits better than a higher-order pure AR model such as AR(3), and with fewer parameters.
ARIMA(p,d,q) — Adding Differencing for Non-Stationary Data
Problem: AR and MA only work on stationary data (constant mean, constant variance). Real data often has trends → non-stationary.
Solution: Difference the data to remove the trend, then apply ARMA.
ARIMA Notation
| Notation | p (AR) | d (diff) | q (MA) | English |
|---|---|---|---|---|
| ARIMA(1,0,0) | 1 | 0 | 0 | Just AR(1) — no differencing, no MA |
| ARIMA(0,0,1) | 0 | 0 | 1 | Just MA(1) — no differencing, no AR |
| ARIMA(1,0,1) | 1 | 0 | 1 | ARMA(1,1) — mixed, no differencing |
| ARIMA(1,1,0) | 1 | 1 | 0 | AR(1) on first-differenced data |
| ARIMA(0,1,1) | 0 | 1 | 1 | MA(1) on first-differenced data |
| ARIMA(2,0,0) | 2 | 0 | 0 | AR(2) on original stationary data |
The Box-Jenkins Process — Step by Step
This is the systematic approach to building an ARIMA model. It's an iterative cycle; one standard formulation:
1. Check stationarity; difference the data if needed (this chooses d)
2. Identify candidate p and q from the ACF/PACF plots
3. Estimate the model parameters
4. Diagnostic check: are the residuals white noise (Ljung-Box)? If not, go back to step 2
5. Forecast with the validated model
Stationarity — When Do You Difference?
A stationary series fluctuates around a constant mean with constant variance. Visually, it looks like "random noise around a flat line."
A non-stationary series has a trending mean or changing variance. If you see the data going up or down over time → non-stationary → difference it.
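First differencing itself is trivial to sketch (plain Python; the trending series is made up for illustration):

```python
# First differencing: dX_t = X_t - X_(t-1).
# A series with a steady linear trend becomes constant (stationary in the
# mean) after one difference; a quadratic trend needs two differences.

def difference(series, order=1):
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

trending = [10, 13, 16, 19, 22, 25]   # linear trend, slope 3
print(difference(trending))           # [3, 3, 3, 3, 3] -> flat
```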
The Ljung-Box Test
Tests whether the residuals are white noise. H₀: the residuals are random (no remaining autocorrelation). A HIGH p-value (e.g., > 0.05) means you fail to reject H₀: the residuals look random, so the model has captured the pattern. A LOW p-value means structure remains and the model should be revised.
Reading R Output — A Trap to Know
Your lecture specifically warned about this:
For an AR(1) model: Xt = β₀ + β₁·Xt-1 + et
The mean μ and intercept β₀ are related by: μ = β₀ / (1 − β₁)
Or equivalently: β₀ = μ × (1 − β₁)
R gives you μ (labeled as "intercept"). You need to convert if you want the true intercept.
R output for ARIMA(1,0,0): ar1 = 0.39, "intercept" = 36.093 (really the mean μ).
To get the true intercept β₀:
β₀ = μ × (1 − β₁) = 36.093 × (1 − 0.39) = 36.093 × 0.61 = 22.017
The actual model equation: Xt = 22.017 + 0.39·Xt-1 + et
For the ARIMA(2,0,0) model: β₀ = 36.052 × (1 − (0.3136 + 0.1931)) = 36.052 × 0.4933 = 17.784
Model: Xt = 17.784 + 0.3136·Xt-1 + 0.1931·Xt-2 + et
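The conversion is one line of arithmetic (a sketch; the numbers are the two examples above):

```python
# Convert R's "intercept" (really the mean mu) into the true model
# intercept: beta0 = mu * (1 - sum of AR coefficients).

def true_intercept(mu, ar_coefs):
    return mu * (1 - sum(ar_coefs))

print(round(true_intercept(36.093, [0.39]), 3))            # 22.017  ARIMA(1,0,0)
print(round(true_intercept(36.052, [0.3136, 0.1931]), 3))  # 17.784  ARIMA(2,0,0)
```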
ARIMA — Practice Scenarios
Scenario 1: You plot ACF and see: lag 1 = 0.8, lag 2 = 0.6, lag 3 = 0.4, lag 4 = 0.25... (slowly decaying). PACF shows: lag 1 = 0.8, lag 2 = −0.05, lag 3 = 0.02 (sharp cutoff after lag 1). What model?
Scenario 2: ACF shows lag 1 = −0.6, lag 2 = 0.03, lag 3 = −0.01 (cutoff after lag 1). PACF decays: lag 1 = −0.6, lag 2 = −0.3, lag 3 = −0.15... What model?
Scenario 3: The original series trends upward. After first differencing, the series looks stationary. The differenced series has PACF cutting off at lag 2 and ACF decaying. What model?
Scenario 4: Both ACF and PACF decay gradually (neither cuts off cleanly). What should you try?
What is Fuzzy Time Series?
Traditional (Boolean) logic: a student count is either in the range [15000, 16000] or it's not. It's 0 or 1.
Fuzzy logic: a student count of 15,900 might be "mostly in [15000, 16000]" (membership 0.9) and "slightly in [16000, 17000]" (membership 0.1). It can belong to multiple sets with different degrees.
Why Fuzzy for Time Series?
Statistical methods like ARIMA are powerful but they're "certain" — they give you a precise number. Fuzzy TS embraces uncertainty and works well when data is limited, imprecise, or when you want an interpretable model. It was first proposed by Song and Chissom in 1993 and improved significantly by Chen in 1996.
Chen's 6-Step Method — Fully Explained
Step 1: Define the Universe of Discourse & Partition It
First, find the range of your data and add some padding: U = [Dmin − D₁, Dmax + D₂], where Dmin and Dmax are the smallest and largest observations.
The number of intervals (n) and the buffer values (D₁, D₂) are your choices. They significantly affect accuracy.
Step 2: Define Fuzzy Sets on the Universe of Discourse
Each interval gets a fuzzy set A₁, A₂, ..., Aₙ. Each set has a triangular membership function: Aᵢ has membership 1 on its own interval uᵢ, membership 0.5 on the neighboring intervals, and 0 elsewhere (the standard choice in Chen's formulation).
These can be given linguistic labels: A₁ = "not many", A₂ = "not too many", A₃ = "many", etc.
Step 3: Fuzzify Historical Data
For each historical data point, find which interval it falls in, and assign it to the fuzzy set with the highest membership degree (which is the set whose interval contains the value).
Step 4: Identify Fuzzy Logical Relationships (FLR)
Look at consecutive time periods. If F(t−1) is fuzzified as Aj and F(t) is fuzzified as Ak, then the FLR is written Aj → Ak ("Aj is followed by Ak").
Step 5: Establish Fuzzy Logical Relationship Groups (FLRG)
Group all FLRs that have the same left-hand side. Merge the right-hand sides (remove duplicates). Example: the FLRs A₃ → A₃ and A₃ → A₄ become the single group A₃ → A₃, A₄.
Step 6: Defuzzify the Forecasted Output
To forecast F(t), check which fuzzy set F(t−1) belongs to, look up its FLRG, then:
Case 1: Empty group
If Aj has no relationships (never appeared as a left-hand side), forecast = midpoint of uj
Case 2: One-to-one (Aj → Ak)
Only one target. Forecast = midpoint of uk
Case 3: One-to-many (Aj → Aa, Ab, Ac)
Forecast = average of midpoints of ua, ub, uc
Example: A₃ → A₂, A₃, A₅. Midpoints: m₂=14500, m₃=15500, m₅=17500.
Forecast = (14500 + 15500 + 17500) / 3 = 47500 / 3 = 15,833
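Step 6 can be sketched as a small function covering all three cases (plain Python; the set names mirror the example above):

```python
# Chen's defuzzification rules (Step 6).
# flrg maps each fuzzy set to the list of sets it was followed by;
# midpoints maps each fuzzy set to the midpoint of its interval.

def defuzzify(state, flrg, midpoints):
    targets = flrg.get(state, [])
    if not targets:                      # Case 1: empty group
        return midpoints[state]
    # Case 2 (one-to-one) and Case 3 (one-to-many): average of midpoints
    return sum(midpoints[t] for t in targets) / len(targets)

midpoints = {"A2": 14500, "A3": 15500, "A5": 17500}
flrg = {"A3": ["A2", "A3", "A5"]}

print(round(defuzzify("A3", flrg, midpoints)))   # 15833 (= 47500/3)
print(defuzzify("A2", flrg, midpoints))          # 14500 (empty group)
```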
Full Worked Example — Alabama Enrollments (Chen's Method)
Data: Enrollments 1971–1992 at the University of Alabama
| Year | Enroll | Year | Enroll | Year | Enroll |
|---|---|---|---|---|---|
| 1971 | 13055 | 1979 | 16807 | 1987 | 16859 |
| 1972 | 13563 | 1980 | 16919 | 1988 | 18150 |
| 1973 | 13867 | 1981 | 16388 | 1989 | 18970 |
| 1974 | 14696 | 1982 | 15433 | 1990 | 19328 |
| 1975 | 15460 | 1983 | 15497 | 1991 | 19337 |
| 1976 | 15311 | 1984 | 15145 | 1992 | 18876 |
| 1977 | 15603 | 1985 | 15163 | | |
| 1978 | 15861 | 1986 | 15984 | | |
Step 1: Universe of Discourse
Min = 13055, Max = 19337. Let D₁ = 55, D₂ = 663 → U = [13000, 20000]
Divide into 7 equal intervals (length = 1000 each): u₁=[13000,14000), u₂=[14000,15000), u₃=[15000,16000), u₄=[16000,17000), u₅=[17000,18000), u₆=[18000,19000), u₇=[19000,20000]
Step 2: Define Fuzzy Sets
A₁, ..., A₇, one per interval, each with membership 1 on its own interval and 0.5 on its neighbors.
Step 3: Fuzzify Every Data Point
| Year | Enroll | Interval | Fuzzified | Year | Enroll | Interval | Fuzzified | |
|---|---|---|---|---|---|---|---|---|
| 1971 | 13055 | u₁ | A₁ | 1982 | 15433 | u₃ | A₃ | |
| 1972 | 13563 | u₁ | A₁ | 1983 | 15497 | u₃ | A₃ | |
| 1973 | 13867 | u₁ | A₁ | 1984 | 15145 | u₃ | A₃ | |
| 1974 | 14696 | u₂ | A₂ | 1985 | 15163 | u₃ | A₃ | |
| 1975 | 15460 | u₃ | A₃ | 1986 | 15984 | u₃ | A₃ | |
| 1976 | 15311 | u₃ | A₃ | 1987 | 16859 | u₄ | A₄ | |
| 1977 | 15603 | u₃ | A₃ | 1988 | 18150 | u₆ | A₆ | |
| 1978 | 15861 | u₃ | A₃ | 1989 | 18970 | u₆ | A₆ | |
| 1979 | 16807 | u₄ | A₄ | 1990 | 19328 | u₇ | A₇ | |
| 1980 | 16919 | u₄ | A₄ | 1991 | 19337 | u₇ | A₇ | |
| 1981 | 16388 | u₄ | A₄ | 1992 | 18876 | u₆ | A₆ |
Step 4: Build FLRs (consecutive pairs)
From the fuzzified sequence: A₁→A₁ (twice), A₁→A₂, A₂→A₃, A₃→A₃ (seven times), A₃→A₄ (twice), A₄→A₄ (twice), A₄→A₃, A₄→A₆, A₆→A₆, A₆→A₇, A₇→A₇, A₇→A₆
Step 5: Build FLRGs
A₁ → A₁, A₂
A₂ → A₃
A₃ → A₃, A₄
A₄ → A₃, A₄, A₆
A₆ → A₆, A₇
A₇ → A₆, A₇
Step 6: Forecast (example: forecast 1975)
1974 is fuzzified as A₂. FLRG: A₂ → A₃ (one-to-one)
Forecast = midpoint of u₃ = 15,500 (actual: 15,460 — very close!)
Forecast 1976: 1975 is A₃. FLRG: A₃ → A₃, A₄ (one-to-many)
Forecast = (midpoint u₃ + midpoint u₄) / 2 = (15500 + 16500) / 2 = 16,000 (actual: 15,311)
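Steps 4-6 can be run end-to-end on the fuzzified sequence from the Step 3 table (a plain-Python sketch; names are mine):

```python
# Chen Steps 4-6 on the fuzzified Alabama sequence (1971-1992),
# reproducing the 1975 and 1976 forecasts from the worked example.

seq = ["A1", "A1", "A1", "A2", "A3", "A3", "A3", "A3", "A4", "A4", "A4",
       "A3", "A3", "A3", "A3", "A3", "A4", "A6", "A6", "A7", "A7", "A6"]
midpoint = {f"A{i}": 13000 + 1000 * i - 500 for i in range(1, 8)}  # u_i midpoints

# Steps 4/5: FLRs grouped into FLRGs (duplicates removed, order kept)
flrg = {}
for prev, nxt in zip(seq, seq[1:]):
    flrg.setdefault(prev, [])
    if nxt not in flrg[prev]:
        flrg[prev].append(nxt)

# Step 6: defuzzify as the average of the target-set midpoints
def forecast(state):
    targets = flrg.get(state, [state])
    return sum(midpoint[t] for t in targets) / len(targets)

print(forecast("A2"))   # 1975 forecast: 15500.0 (1974 was A2)
print(forecast("A3"))   # 1976 forecast: 16000.0 (1975 was A3)
```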
The Variations Approach — Handling Trend
The Fix: Instead of fuzzifying actual values, compute the year-to-year changes (variations) first: V(t) = Y(t) − Y(t−1). Then apply Chen's method to the variation series. This removes the trend from the analysis!
To get the final forecast: The defuzzified output gives you a predicted CHANGE. Add this change to the last known actual value: Forecast(t) = Actual(t−1) + Predicted Change.
Why It Works
Trend Before vs After Variations
This version mirrors the Alabama example: rising enrollments through 1975, then a negative variation in 1976.
Forecast Rule
Predict Change, Then Add It Back
The fuzzy model outputs a variation first. Converting that to the final enrollment forecast is a second step.
Fuzzy Exam Problem — Fully Solved
"Use variations of historical enrollment data. Let U = [−1000, 1400] be the universe of discourse. Partition into 4 equal intervals to produce 4 fuzzy sets. Find first-order FLRGs. Forecast enrollment for 1975 and 1976."
Step 0: Compute ALL Variations
| Year | Enrollment | Variation V(t) | Year | Enrollment | Variation V(t) | |
|---|---|---|---|---|---|---|
| 1971 | 13055 | — | 1982 | 15433 | 15433−16388 = −955 | |
| 1972 | 13563 | 13563−13055 = 508 | 1983 | 15497 | 15497−15433 = 64 | |
| 1973 | 13867 | 13867−13563 = 304 | 1984 | 15145 | 15145−15497 = −352 | |
| 1974 | 14696 | 14696−13867 = 829 | 1985 | 15163 | 15163−15145 = 18 | |
| 1975 | 15460 | 15460−14696 = 764 | 1986 | 15984 | 15984−15163 = 821 | |
| 1976 | 15311 | 15311−15460 = −149 | 1987 | 16859 | 16859−15984 = 875 | |
| 1977 | 15603 | 15603−15311 = 292 | 1988 | 18150 | 18150−16859 = 1291 | |
| 1978 | 15861 | 15861−15603 = 258 | 1989 | 18970 | 18970−18150 = 820 | |
| 1979 | 16807 | 16807−15861 = 946 | 1990 | 19328 | 19328−18970 = 358 | |
| 1980 | 16919 | 16919−16807 = 112 | 1991 | 19337 | 19337−19328 = 9 | |
| 1981 | 16388 | 16388−16919 = −531 | 1992 | 18876 | 18876−19337 = −461 |
Step 1: Universe of Discourse (given)
U = [−1000, 1400], divide into 4 equal intervals.
Range = 1400 − (−1000) = 2400. Interval length = 2400 / 4 = 600
u₁ = [−1000, −400), u₂ = [−400, 200), u₃ = [200, 800), u₄ = [800, 1400]
Step 2: Define 4 Fuzzy Sets
A₁ on u₁ (midpoint −700), A₂ on u₂ (midpoint −100), A₃ on u₃ (midpoint 500), A₄ on u₄ (midpoint 1100)
Step 3: Fuzzify Each Variation
| Year | V(t) | Falls in | Fuzzified | Year | V(t) | Falls in | Fuzzified | |
|---|---|---|---|---|---|---|---|---|
| 1972 | 508 | u₃ [200,800] | A₃ | 1983 | 64 | u₂ [−400,200] | A₂ | |
| 1973 | 304 | u₃ | A₃ | 1984 | −352 | u₂ | A₂ | |
| 1974 | 829 | u₄ [800,1400] | A₄ | 1985 | 18 | u₂ | A₂ | |
| 1975 | 764 | u₃ | A₃ | 1986 | 821 | u₄ | A₄ | |
| 1976 | −149 | u₂ | A₂ | 1987 | 875 | u₄ | A₄ | |
| 1977 | 292 | u₃ | A₃ | 1988 | 1291 | u₄ | A₄ | |
| 1978 | 258 | u₃ | A₃ | 1989 | 820 | u₄ | A₄ | |
| 1979 | 946 | u₄ | A₄ | 1990 | 358 | u₃ | A₃ | |
| 1980 | 112 | u₂ | A₂ | 1991 | 9 | u₂ | A₂ | |
| 1981 | −531 | u₁ [−1000,−400] | A₁ | 1992 | −461 | u₁ | A₁ | |
| 1982 | −955 | u₁ | A₁ |
Step 4: Build ALL FLRs
From the fuzzified variation sequence (1972-1992): A₃→A₃, A₃→A₄, A₄→A₃, A₃→A₂, A₂→A₃, A₃→A₃, A₃→A₄, A₄→A₂, A₂→A₁, A₁→A₁, A₁→A₂, A₂→A₂, A₂→A₂, A₂→A₄, A₄→A₄, A₄→A₄, A₄→A₄, A₄→A₃, A₃→A₂, A₂→A₁
Step 5: Group into FLRGs
A₁ → A₁, A₂
A₂ → A₁, A₂, A₃, A₄
A₃ → A₂, A₃, A₄
A₄ → A₂, A₃, A₄
Step 6: Forecast 1975
We need the variation for 1975. The variation for 1974 is V(1974) = 829 → fuzzified as A₄.
FLRG for A₄: A₄ → A₂, A₃, A₄
Predicted variation = average of midpoints = (−100 + 500 + 1100) / 3 = 1500 / 3 = 500
Forecast enrollment 1975 = Actual(1974) + predicted change = 14696 + 500 = 15,196
(Actual 1975 = 15,460. Error = 264)
Forecast 1976
The variation for 1975 is V(1975) = 764 → fuzzified as A₃.
FLRG for A₃: A₃ → A₂, A₃, A₄
Predicted variation = (−100 + 500 + 1100) / 3 = 500
Forecast enrollment 1976 = Actual(1975) + predicted change = 15460 + 500 = 15,960
(Actual 1976 = 15,311. Error = 649)
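The whole exam problem can be replayed in code (a plain-Python sketch; how to bin a value exactly on the upper boundary is an implementation choice I've made here, not part of the problem statement):

```python
# Variations approach on the exam problem: fuzzify year-to-year changes
# into 4 intervals over U = [-1000, 1400], build first-order FLRGs,
# then forecast = last actual + predicted variation.

years = list(range(1971, 1993))
enroll = [13055, 13563, 13867, 14696, 15460, 15311, 15603, 15861, 16807,
          16919, 16388, 15433, 15497, 15145, 15163, 15984, 16859, 18150,
          18970, 19328, 19337, 18876]

variations = [b - a for a, b in zip(enroll, enroll[1:])]   # V(t) = Y(t) - Y(t-1)

lo, width, n = -1000, 600, 4
mid = [lo + width * (i + 0.5) for i in range(n)]           # -700, -100, 500, 1100

def fuzzify(v):
    """Index of the interval containing v (top boundary goes to the last set)."""
    return min((v - lo) // width, n - 1)

labels = [fuzzify(v) for v in variations]                  # labels[k] = year 1972+k

flrg = {}                                                  # first-order FLRGs
for prev, nxt in zip(labels, labels[1:]):
    flrg.setdefault(prev, set()).add(nxt)

def forecast(year):
    state = labels[years.index(year) - 2]                  # fuzzified V(year-1)
    targets = sorted(flrg[state])
    change = sum(mid[t] for t in targets) / len(targets)   # predicted variation
    return enroll[years.index(year) - 1] + change          # add back to last actual

print(forecast(1975))   # 15196.0
print(forecast(1976))   # 15960.0
```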
Question 3 from the test: Effect of Increasing Order
From 1st to 2nd order: Instead of just looking at one previous variation to predict the next, you look at TWO previous variations together. For example, the pattern (A₃, A₄) → A₃ is more specific than just A₄ → ?. This generally improves accuracy because you capture more context. Think of it like: instead of predicting weather based on just today, you use today AND yesterday's weather.
From 1st to 20th order: You'd use 20 previous variations to predict the next. With only 21 data points (1972-1992), you'd have very few examples of any specific 20-length pattern. This likely leads to overfitting — the model memorizes the training data but can't generalize. Most patterns won't repeat, so many FLRGs will be empty or have only one example. There's a sweet spot: higher order helps up to a point, then accuracy plateaus or degrades.
All 4 Improvements to Chen's Method
Problem: First-order only uses F(t−1). Why just one previous value?
Solution: Use multiple previous values. 2nd order: (Ai, Aj) → Ak. 3rd order: (Ai, Aj, Ak) → Am.
Dynamic approach (Chen et al., 2015): Use LCS/LRS (Longest Common/Repeated Subsequence) algorithm to automatically find the optimal order from the data.
Problem: D₁, D₂ are arbitrary. Equal intervals may not suit the data.
Solution: (1) Sort values ascending. (2) Compute average distance between consecutive sorted values + standard deviation. (3) Remove outliers (distances > 1 std dev from average). (4) Compute revised average distance (ADR). (5) Use U = [Dmin − ADR, Dmax + ADR]. (6) Use trapezoidal (not triangular) membership functions for smoother transitions.
Problem: Equal-width intervals waste resolution on sparse regions and under-represent dense regions.
Solution (Singh & Samariya, 2015): Use data clustering (e.g., k-means) to determine natural groupings in the data. Dense clusters get narrower intervals (more precision where data concentrates), sparse areas get wider intervals.
Problem: Chen's method on raw values doesn't explicitly handle trend.
Solution: Compute V(t) = Y(t) − Y(t−1) and apply FTS to the variation series. This is the approach from your professor's test! The universe of discourse is defined over variations instead of actual values.
Decomposition Analysis
An Open-Model technique that breaks data into Level, Trend, Seasonality, Noise using a centered moving average. Needs 48+ data points.
Regression Analysis
r (correlation coefficient): −1 to +1. Strength of linear relationship. Does NOT prove causation.
R² = r²: % of variation in y explained by x. Example: r = 0.901 → R² = 0.81 → 81% explained.
Standard Error Sy,x: Used for prediction intervals: ŷ ± tα/2 × Sy,x (use t with df = n−2).
Sales = 1.75 + 0.25 × (payroll). Payroll next year = $6B.
Forecast: Sales = 1.75 + 0.25(6) = $3.25M. Sy,x = $0.306M. With n=6, df=4, t₀.₀₂₅ = 2.78:
95% CI: $3.25M ± 2.78 × $0.306M = [$2.40M, $4.10M]
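The interval arithmetic, as a sketch (the t value 2.78 is taken from the notes, not computed):

```python
# Prediction interval: y_hat +/- t * S_yx, with t(0.025, df=4) = 2.78.

y_hat, t_crit, s_yx = 3.25, 2.78, 0.306    # $M units, from the example
lower = y_hat - t_crit * s_yx
upper = y_hat + t_crit * s_yx
print(round(lower, 2), round(upper, 2))    # 2.4 4.1
```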
Adding interest rates (multiple regression): r improved from 0.901 to 0.96.
Key Probability Distributions
| Distribution | Type | Key Formula | When |
|---|---|---|---|
| Binomial | Discrete | p(x) = C(n,x)·p^x·q^(n−x); μ = np, σ = √(npq) | n fixed trials, 2 outcomes, independent |
| Poisson | Discrete | p(x) = e^(−λ)·λ^x / x!; μ = σ² = λ | Events per interval (time/area) |
| Normal | Continuous | z = (x − μ)/σ; use z-table | Bell-shaped, symmetric. Basis of CLT |
| Exponential | Continuous | P(x > a) = e^(−a/μ) | Time between events |
Central Limit Theorem: For n ≥ 30, x̄ is approximately normal with μx̄ = μ and σx̄ = σ/√n.
Normal approx. to Binomial: When np ≥ 5 and nq ≥ 5. Use continuity correction ±0.5.
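Both rules of thumb, sketched (function names are mine):

```python
# CLT scaling of the standard error, plus the np/nq >= 5 check for
# using the normal approximation to the binomial.
import math

def standard_error(sigma, n):
    return sigma / math.sqrt(n)          # sigma_xbar = sigma / sqrt(n)

def normal_approx_ok(n, p):
    return n * p >= 5 and n * (1 - p) >= 5

print(standard_error(12, 36))            # 2.0
print(normal_approx_ok(20, 0.5))         # True  (np = nq = 10)
print(normal_approx_ok(20, 0.1))         # False (np = 2)
```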
Formula Cheat Sheet
| Method | Formula | Constants |
|---|---|---|
| Exp. Smoothing | Ft+1 = αAt + (1−α)Ft | α |
| Holt's | Ft=αAt-1+(1−α)(Ft-1+Tt-1); Tt=β(Ft−Ft-1)+(1−β)Tt-1 | α, β |
| Winters' (Mult) | L=α(Y/St-m)+(1−α)(L+B); Forecast=(L+Bh)×S | α, γ, δ |
| ARIMA(p,d,q) | AR: uses past X values; MA: uses past errors; I: differencing | p, d, q |
| ACF/PACF | AR→PACF cuts off; MA→ACF cuts off; ARMA→both decay | — |
| Error metrics | MAD=Σ|e|/n; MSE=Σe²/n; MAPE=Σ(|e|/A×100)/n | — |
Which Method When?
| Situation | Use | Why |
|---|---|---|
| Stable, no trend/season, limited data | Exp. Smoothing | Simple, dampens noise |
| Level shifts unpredictably | Adaptive Smoothing | Auto-adjusts α |
| Clear trend, no season | Holt's | Captures trend with β |
| Trend + seasonality | Winters' | Full 3-parameter model |
| Long history (48+), complex patterns | ARIMA | Discovers optimal model |
| External factors drive demand | Regression | Uses exogenous variables |
| Uncertain/linguistic/limited data | Fuzzy TS | Handles imprecision |
Self-Test Quiz — 12 Questions
Q1: In exponential smoothing, α = 0.9 means the forecast puts what percentage of weight on the most recent actual value?
Q2: For Winters' multiplicative method, seasonal factors must:
Q3: ACF decays slowly, PACF cuts off after lag 2. The model is:
Q4: ACF cuts off after lag 1 (lag-1 autocorrelation is negative). PACF decays gradually. The model is:
Q5: The Ljung-Box test gives p-value = 0.43. This means:
Q6: In Chen's fuzzy TS, FLRG is A₃ → A₂, A₃, A₅. Midpoints: m₂=14500, m₃=15500, m₅=17500. The forecast is:
Q7: Using the variations approach in fuzzy TS addresses which problem?
Q8: ARIMA(0,1,1) means:
Q9: R gives ARIMA(1,0,0) with ar1=0.39, intercept=36.09. The TRUE intercept β₀ is:
Q10: A 12-month MA on monthly data with 12-month seasonality will:
Q11: Which model should you compare if ARMA(1,1) seems to fit?
Q12: Increasing fuzzy TS order from 1 to 20 on a dataset with 22 years of data will likely:
Exam Strategy
HIGH Fuzzy TS — Chen's method + variations approach (your prof tested this specifically)
HIGH ARIMA — ACF/PACF identification, Box-Jenkins process, reading R output
HIGH Exponential smoothing calculations (with & without trend)
HIGH Winters' multiplicative initialization & forecasting
MED Error metrics (MAD, MSE, MAPE) calculation
MED Regression, decomposition, probability distributions
• Additive seasonal factors sum to 0, while multiplicative factors average to 1 (don't mix them up)
• Winters' multiplicative forecast: (L + B×h) × S (multiply!), not + S
• AR → PACF cuts off (not ACF!); MA → ACF cuts off
• R's "intercept" = mean for AR models. True intercept = mean × (1−Σφ)
• Ljung-Box: HIGH p-value = GOOD (residuals are random = model works)
• Lower AIC = better model
• Fuzzy fuzzification: assign to HIGHEST membership set
• Fuzzy defuzzification (one-to-many): simple average of midpoints, no weighting
• For variations approach: forecast = last actual + predicted variation
□ Can I do exponential smoothing by hand, step by step?
□ Can I do Holt's method (F, T, FIT) for 2-3 periods?
□ Can I initialize Winters' multiplicative (regression → detrend → seasonal factors)?
□ Can I walk through Chen's fuzzy 6 steps completely?
□ Can I do the VARIATIONS approach from scratch? (the test problem)
□ Can I identify AR/MA/ARMA from an ACF/PACF description?
□ Can I interpret R output for ARIMA and convert mean→intercept?
□ Do I know the Box-Jenkins 5-step process?
□ Can I compute MAD, MSE, MAPE for a dataset?
□ Can I explain what Ljung-Box tests and what "p > 0.05" means?