About Data Analysis Report

This RMarkdown report documents an RFM (Recency, Frequency, Monetary) customer segmentation of Superstore sales data in R. It covers data cleaning, computation of RFM metrics, customer segmentation, visualization, and business insights. The final report was completed on 2025-11-05.

Data Description:

This dataset contains detailed sales transactions for a retail “Superstore”, including order information, customer details, sales, quantity, discounts, profit, and shipping data. It covers a wide range of customers, products, and regions.

Data Source: Superstore Dataset

Disclaimer:

This dataset is used for educational purposes only. All analyses, results, and insights presented in this report are meant for learning, demonstration, and practice. The original dataset belongs to its respective owners and creators. No commercial use or reproduction of the dataset is intended.

1. Load Data

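library(tidyverse)  # assumed setup: provides readr, dplyr, ggplot2
library(janitor)    # clean_names()
library(here)       # project-relative paths
data_path <- here("data", "superstore.csv")
superstore <- read_csv(data_path, show_col_types = FALSE) %>% clean_names()
glimpse(superstore)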

Dataset Overview:

  • Total records: 9994
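  • Date range: 1/1/2017 to 9/9/2017 (min and max as reported; these values look like a text comparison of unparsed date strings, because the 1,166-day maximum recency in Section 3 implies the orders span more than three years)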
  • Unique customers: 793
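
A date range is only meaningful when it is computed on parsed dates. The sketch below uses made-up date strings (not the Superstore data) to show how `min()` and `max()` behave on month/day/year text compared with `Date` values. The variable names are illustrative only.

```r
# Toy date strings in month/day/year form (made up for illustration).
dates_chr <- c("1/3/2014", "11/8/2016", "1/1/2017", "9/9/2017", "12/30/2017")

# On character data, min() and max() compare text, not calendar order.
chr_range <- c(min(dates_chr), max(dates_chr))

# Parsing first gives the true earliest and latest dates.
dates <- as.Date(dates_chr, format = "%m/%d/%Y")
date_range <- range(dates)

print(chr_range)
print(format(date_range, "%Y-%m-%d"))
```

If the overview is computed before the cleaning step, parsing `order_date` first (as Section 2 does) avoids this pitfall.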

2. Clean Data

df <- superstore %>%
  mutate(order_date = as.Date(order_date, tryFormats = c("%Y-%m-%d", "%m/%d/%Y"))) %>%
  filter(!is.na(customer_id), !is.na(order_date), !is.na(sales))

Data Quality:

  • Records after cleaning: 9994
  • Records removed: 0
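
One detail of the cleaning step is worth checking: `as.Date()` with `tryFormats` selects a single format from the first non-missing value and applies it to the whole column. The minimal sketch below, on made-up strings, shows what would happen if a column mixed both formats; zero records removed suggests the real data uses one format throughout.

```r
# Toy order dates in two different formats (made up for illustration).
raw_dates <- c("2017-01-05", "01/06/2017", "2017-02-10")

# as.Date() picks the first format in tryFormats that parses the first
# non-missing value, then applies that single format to the whole vector.
parsed <- as.Date(raw_dates, tryFormats = c("%Y-%m-%d", "%m/%d/%Y"))

# Values in the other format become NA and would be dropped by the
# !is.na(order_date) filter.
n_unparsed <- sum(is.na(parsed))

print(parsed)
print(n_unparsed)
```

Comparing `nrow()` before and after the filter, as the Data Quality figures do, is a simple way to confirm that nothing was silently lost.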

3. Compute RFM Metrics

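# Reference date: one day after the latest order, so the most recent customer has a recency of 1 day
analysis_date <- max(df$order_date, na.rm = TRUE) + 1
rfm_data <- df %>%
  group_by(customer_id, customer_name) %>%
  summarise(
    recency_days = as.integer(analysis_date - max(order_date)),  # days since last order
    frequency = n_distinct(order_id),                            # distinct orders, not line items
    monetary = sum(sales, na.rm = TRUE),                         # total spend
    .groups = "drop"
  )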

RFM Metrics Summary:

| Metric | Average | Minimum | Maximum |
|---|---|---|---|
| Recency (days) | 147.80 | 1.00 | 1,166.00 |
| Frequency (orders) | 6.30 | 1.00 | 17.00 |
| Monetary ($) | 2,896.85 | 4.83 | 25,043.05 |
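
To make the three metrics concrete, here is a small worked example on made-up line items (two customers, three orders). It follows the same `summarise()` logic as the chunk above; the toy data frame and its values are assumptions for illustration.

```r
library(dplyr)

# Toy line items (made up): one row per product line, so an order can
# span several rows.
toy <- data.frame(
  customer_id = c("A", "A", "A", "B"),
  order_id    = c("O1", "O1", "O2", "O3"),
  order_date  = as.Date(c("2017-01-01", "2017-01-01", "2017-03-01", "2017-02-01")),
  sales       = c(10, 20, 30, 5)
)

# Reference date: the day after the latest order in the toy data.
analysis_date <- max(toy$order_date) + 1

toy_rfm <- toy %>%
  group_by(customer_id) %>%
  summarise(
    recency_days = as.integer(analysis_date - max(order_date)),
    frequency    = n_distinct(order_id),  # O1 counts once despite two rows
    monetary     = sum(sales),
    .groups = "drop"
  )

print(toy_rfm)
```

Customer A has two distinct orders across three rows, which is why frequency uses `n_distinct(order_id)` rather than a row count.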

4. Assign RFM Scores

rfm_data <- rfm_data %>%
  mutate(
    R_score = ntile(-recency_days, 5),  # smaller recency = higher score
    F_score = ntile(frequency, 5),
    M_score = ntile(monetary, 5),
    RFM_Score = R_score + F_score + M_score
  )

Score Distribution:

  • RFM Scores range from 3 to 15
  • Average RFM Score: 9
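
The quintile scoring can be illustrated on a handful of made-up recency values. This sketch also shows one property of `ntile()` to keep in mind: it splits rows into equal-sized buckets by rank, so identical values can receive different scores.

```r
library(dplyr)

# Five made-up recency values, one per quintile.
recency_days <- c(5, 400, 30, 90, 200)

# Negating recency makes the most recent customer land in bucket 5.
r_score <- ntile(-recency_days, 5)

# Ties: four identical values are still split evenly across two buckets.
tied_score <- ntile(c(2, 2, 2, 2), 2)

print(r_score)
print(tied_score)
```

This matters most for frequency, where many customers share the same order count and may therefore be scored differently.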

5. Customer Segmentation

rfm_data <- rfm_data %>%
  mutate(
    Segment = case_when(
      RFM_Score >= 13 ~ "Champions",
      RFM_Score >= 10 ~ "Loyal Customers",
      RFM_Score >= 7  ~ "Potential Loyalists",
      RFM_Score >= 4  ~ "Needs Attention",
      TRUE ~ "At Risk"
    )
  )

Segment Distribution:

| Segment | Customers | Percentage (%) |
|---|---|---|
| Loyal Customers | 256 | 32.3 |
| Potential Loyalists | 218 | 27.5 |
| Needs Attention | 159 | 20.1 |
| Champions | 121 | 15.3 |
| At Risk | 39 | 4.9 |
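
The thresholds and the distribution table can be reproduced on a few made-up scores. In this sketch, `assign_segment()` is a hypothetical helper that wraps the same `case_when()` rules; the counting step is an assumption about how such a table could be built, not the report's own chunk.

```r
library(dplyr)

# Hypothetical helper wrapping the segment thresholds used in this report.
assign_segment <- function(score) {
  case_when(
    score >= 13 ~ "Champions",
    score >= 10 ~ "Loyal Customers",
    score >= 7  ~ "Potential Loyalists",
    score >= 4  ~ "Needs Attention",
    TRUE        ~ "At Risk"
  )
}

# Made-up RFM scores covering every branch.
scores   <- c(15, 13, 12, 10, 9, 7, 6, 4, 3)
segments <- assign_segment(scores)

# Count customers per segment and express each as a share of the total.
segment_dist <- data.frame(Segment = segments) %>%
  count(Segment, name = "Customers") %>%
  mutate(Percentage = round(100 * Customers / sum(Customers), 1)) %>%
  arrange(desc(Customers))

print(segment_dist)
```

Because the minimum possible score is 3, only customers scoring 1 on all three dimensions reach the final "At Risk" branch.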

6. Segment Characteristics

| Segment | Customers | Avg. Recency (days) | Avg. Frequency (orders) | Avg. Monetary ($) |
|---|---|---|---|---|
| Champions | 121 | 27.4 | 9.4 | 5,254.07 |
| Loyal Customers | 256 | 75.0 | 7.5 | 3,879.87 |
| Potential Loyalists | 218 | 137.7 | 5.6 | 2,170.64 |
| Needs Attention | 159 | 283.2 | 4.0 | 1,129.12 |
| At Risk | 39 | 503.0 | 2.5 | 396.99 |
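
The chunk that builds this table is not shown in the report. A summary of this shape could be computed roughly as follows; the toy data and the exact rounding are assumptions, and `segment_summary` is used here only because that name appears in the save step.

```r
library(dplyr)

# Made-up per-customer RFM rows with a Segment label.
toy_rfm <- data.frame(
  Segment      = c("Champions", "Champions", "At Risk"),
  recency_days = c(10, 30, 500),
  frequency    = c(9, 11, 2),
  monetary     = c(5000, 6000, 400)
)

# One row per segment: size plus the mean of each RFM metric.
segment_summary <- toy_rfm %>%
  group_by(Segment) %>%
  summarise(
    Customers     = n(),
    Avg_Recency   = round(mean(recency_days), 1),
    Avg_Frequency = round(mean(frequency), 1),
    Avg_Monetary  = round(mean(monetary), 2),
    .groups = "drop"
  ) %>%
  arrange(desc(Avg_Monetary))

print(segment_summary)
```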

7. Visualizations

7.1 Customer Segment Distribution

7.2 Recency vs Monetary Value by Segment

Interpretation:

  • Customers toward the left (low recency) and high on the y-axis (high spend) are Champions
  • Those farther right show longer inactivity and lower spend, typical of the At Risk segment
  • The plot shows a clear visual separation between high-value and at-risk customer groups
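
The plotting chunk is not shown in the report. A scatter plot of this kind could be sketched with ggplot2 as below; the toy data, aesthetics, and labels are assumptions, and `rfm_plot` is used only because that name appears in the save step.

```r
library(ggplot2)

# Made-up customers, one per segment, for illustration only.
toy_rfm <- data.frame(
  recency_days = c(10, 60, 150, 300, 500),
  monetary     = c(5200, 3900, 2200, 1100, 400),
  Segment      = c("Champions", "Loyal Customers", "Potential Loyalists",
                   "Needs Attention", "At Risk")
)

# Recency on the x-axis, spend on the y-axis, colour by segment.
rfm_plot <- ggplot(toy_rfm, aes(x = recency_days, y = monetary, colour = Segment)) +
  geom_point(size = 3, alpha = 0.8) +
  labs(
    title = "Recency vs Monetary Value by Segment",
    x = "Recency (days since last order)",
    y = "Monetary ($)"
  ) +
  theme_minimal()
```

With real data, a log scale on the y-axis (`scale_y_log10()`) may help if a few very large spenders compress the rest of the points.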

8. Business Insights & Recommendations

Strategic Recommendations by Segment
| Segment | Description | Suggested Action |
|---|---|---|
| Champions | Recent, frequent, and high spenders | Offer loyalty rewards, exclusive deals, VIP treatment |
| Loyal Customers | Frequent and consistent buyers | Encourage reviews, referrals, upsell, cross-sell |
| Potential Loyalists | Medium RFM score, possible repeat buyers | Engage with promotions, targeted campaigns |
| Needs Attention | Low-medium RFM score, may churn | Send reactivation campaigns, special offers |
| At Risk | Low in all RFM metrics | Win-back campaigns, discounts, personalized outreach |

Key Strategic Insights

Overall Customer Behavior Summary:

  • The store serves 793 unique customers across 5 distinct segments
  • Top-performing segment: Champions - highest average spend and frequency
  • Most vulnerable segment: At Risk - lowest engagement metrics
  • 121 Champions (15.3%) drive premium sales and deserve VIP treatment
  • 39 At Risk customers (4.9%) require immediate reactivation efforts

Strategic Priorities:

  1. Champions & Loyal Customers (377 customers)
    • Maintain engagement with VIP programs and early product access
    • Encourage reviews and referrals with incentive programs
    • Offer exclusive bundles and personalized recommendations
  2. Potential Loyalists (218 customers)
    • Deploy targeted email campaigns with personalized discounts
    • Highlight complementary products based on purchase history
    • Create urgency with limited-time offers
  3. Needs Attention & At Risk (198 customers)
    • Launch win-back campaigns: ‘We miss you’ messaging
    • Offer special comeback discounts (10-20% off next order)
    • Survey to understand pain points and improve service

Operational Focus:

  • Allocate 60-70% of marketing budget to Champions and Loyal Customers for maximum ROI
  • Monitor segment transitions monthly to catch early warning signs of churn
  • Implement automated triggers for customers moving to ‘At Risk’ category
  • A/B test campaign effectiveness across segments to optimize messaging
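
Monitoring segment transitions amounts to comparing two scoring snapshots. The sketch below uses two made-up monthly snapshots in base R; the data frames, column names, and the idea of flagging new "At Risk" entrants are illustrative assumptions rather than part of the analysis above.

```r
# Two hypothetical monthly snapshots (made-up customers and segments).
prev <- data.frame(
  customer_id = c("A", "B", "C", "D"),
  segment     = c("Champions", "Needs Attention", "Loyal Customers", "At Risk")
)
curr <- data.frame(
  customer_id = c("A", "B", "C", "D"),
  segment     = c("Champions", "At Risk", "Potential Loyalists", "At Risk")
)

# Pair each customer's previous and current segment.
moves <- merge(prev, curr, by = "customer_id", suffixes = c("_prev", "_curr"))

# Cross-tabulate previous vs current segment to see all transitions.
transitions <- table(from = moves$segment_prev, to = moves$segment_curr)

# Customers who newly entered "At Risk" this month: trigger candidates.
newly_at_risk <- moves$customer_id[
  moves$segment_curr == "At Risk" & moves$segment_prev != "At Risk"
]

print(transitions)
print(newly_at_risk)
```

Customer D is already "At Risk" in both snapshots, so only customer B is flagged as a new entrant.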

9. Interactive Customer Details

10. Save Outputs

# Ensure folders exist
if(!dir.exists(here("output"))) dir.create(here("output"), recursive = TRUE)
if(!dir.exists(here("figures"))) dir.create(here("figures"), recursive = TRUE)

# Save CSV files
# (segment_summary is assumed to be created in an earlier chunk not shown in this report)
write_csv(rfm_data, here("output", "rfm_scores.csv"))
write_csv(segment_summary, here("output", "rfm_segment_summary.csv"))

# Also save with alternative name for compatibility
write_csv(rfm_data, here("output", "rfm_customer_segments.csv"))

# Save plot
# (rfm_plot is assumed to be created in the visualization chunk not shown in this report)
ggsave(here("figures", "rfm_segment_plot.png"), plot = rfm_plot, width = 10, height = 6)

Outputs saved successfully:

  • CSV files exported to output/ folder
  • Visualizations saved to figures/ folder

11. Key Findings and Conclusions

Summary of Analysis

Throughout this project, I analyzed customer behavior using the Superstore sales dataset by applying RFM (Recency, Frequency, Monetary) analysis. Here are the key findings:

1. Data Exploration

The dataset contains 9,994 transaction records across 793 customers. No records were removed during cleaning: every record has a valid customer ID, order date, and sales amount, which the RFM computation requires.

2. RFM Metrics Computation

  • Recency: Measures days since last purchase (lower = better)
  • Frequency: Counts unique orders per customer
  • Monetary: Sums total customer spend

Each customer received R, F, and M scores (1–5 each, by quintile), which were summed into an overall RFM Score (3–15). The resulting segments differ sharply: average spend ranges from $396.99 for At Risk customers to $5,254.07 for Champions.

3. Customer Segmentation Results

  • Champions (121 customers): Very recent, frequent, and high spenders
  • Loyal Customers (256 customers): Frequent and consistent buyers
  • Potential Loyalists (218 customers): Medium RFM scores, growth potential
  • Needs Attention (159 customers): Low-medium RFM scores, churn risk
  • At Risk (39 customers): Low in all RFM metrics, immediate action needed

4. Visualization Insights

The visualizations confirm that Loyal Customers and Potential Loyalists together make up about 60% of customers (474 of 793), with smaller but critical segments of Champions and At Risk customers. The scatter plot shows how recency and monetary value vary across segments.

5. Business Implications

  • Champions & Loyal Customers should receive loyalty rewards and VIP treatment
  • Potential Loyalists need targeted campaigns to increase engagement
  • Needs Attention & At Risk segments require urgent reactivation efforts

6. Next Steps

This RFM analysis (Project 1) provides the foundation for Project 2: Predictive Customer Segmentation, where machine learning models will predict:

  • Customer Lifetime Value (CLV): Forecast future spending
  • Churn Probability: Identify at-risk customers proactively
  • Advanced Segmentation: Combine predictions with RFM for targeted marketing

The predictive approach will enable proactive customer management and optimized marketing spend allocation.


Analysis completed on: 2025-11-05

For questions or feedback, please contact .