This RMarkdown report documents an RFM (Recency, Frequency, Monetary) customer segmentation analysis of Superstore sales data in R. It covers data cleaning, computation of RFM metrics, customer segmentation, visualization, and business insights. The final report was completed on 2025-11-05.
Data Description:
This dataset contains detailed sales transactions for a retail “Superstore”, including order information, customer details, sales, quantity, discounts, profit, and shipping data. It covers a wide range of customers, products, and regions.
Data Source: Superstore Dataset
Disclaimer:
This dataset is used for educational purposes only. All analyses, results, and insights presented in this report are meant for learning, demonstration, and practice. The original dataset belongs to its respective owners and creators. No commercial use or reproduction of the dataset is intended.
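library(tidyverse) # read_csv(), glimpse(), %>%, dplyr verbs, ggplot2
library(here)      # here() for project-relative paths
library(janitor)   # clean_names()
data_path <- here("data", "superstore.csv")
superstore <- read_csv(data_path, show_col_types = FALSE) %>% clean_names()
glimpse(superstore)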
Dataset Overview:
df <- superstore %>%
  mutate(order_date = as.Date(order_date, tryFormats = c("%Y-%m-%d", "%m/%d/%Y"))) %>%
  filter(!is.na(customer_id), !is.na(order_date), !is.na(sales))
Data Quality:
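As a rough sketch of the kind of check that backs the cleaning filter above, the snippet below counts missing values in the three fields the filter relies on and reports how many rows the filter drops. The data frame `toy_orders` and its values are made up for illustration and are not taken from the Superstore data.

```r
library(dplyr)

# Hypothetical toy data standing in for the three key Superstore columns
toy_orders <- data.frame(
  customer_id = c("C1", "C2", NA, "C4"),
  order_date  = as.Date(c("2017-01-05", NA, "2017-02-10", "2017-03-01")),
  sales       = c(100.5, 20, 35.2, NA)
)

# Count missing values per key column before filtering
na_counts <- toy_orders %>%
  summarise(across(c(customer_id, order_date, sales), ~ sum(is.na(.x))))

# Apply the same filter as the report and compare row counts
toy_clean <- toy_orders %>%
  filter(!is.na(customer_id), !is.na(order_date), !is.na(sales))

rows_dropped <- nrow(toy_orders) - nrow(toy_clean)
```

Comparing `rows_dropped` against the per-column counts in `na_counts` shows whether missing values are concentrated in one field or spread across rows.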
analysis_date <- max(df$order_date, na.rm = TRUE) + 1
rfm_data <- df %>%
  group_by(customer_id, customer_name) %>%
  summarise(
    recency_days = as.integer(analysis_date - max(order_date)),
    frequency = n_distinct(order_id),
    monetary = sum(sales, na.rm = TRUE),
    .groups = "drop"
  )
RFM Metrics Summary:
| Metric | Average | Minimum | Maximum |
|---|---|---|---|
| Recency (days) | 147.80 | 1.00 | 1,166.00 |
| Frequency (orders) | 6.30 | 1.00 | 17.00 |
| Monetary ($) | 2,896.85 | 4.83 | 25,043.05 |
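To make the three metrics concrete, here is a small worked example that applies the same recency, frequency, and monetary logic to invented order lines. The customers, orders, and amounts in `toy_orders` are hypothetical.

```r
library(dplyr)

# Hypothetical order lines: one row per product line, so an order can span rows
toy_orders <- data.frame(
  customer_id = c("C1", "C1", "C1", "C2", "C2"),
  order_id    = c("O1", "O1", "O2", "O3", "O4"),
  order_date  = as.Date(c("2017-12-01", "2017-12-01", "2017-12-30",
                          "2017-06-15", "2017-11-30")),
  sales       = c(50, 25, 100, 10, 40)
)

# Reference date: one day after the latest order, as in the report
toy_analysis_date <- max(toy_orders$order_date) + 1

toy_rfm <- toy_orders %>%
  group_by(customer_id) %>%
  summarise(
    recency_days = as.integer(toy_analysis_date - max(order_date)),
    frequency    = n_distinct(order_id),  # distinct orders, not rows
    monetary     = sum(sales),
    .groups = "drop"
  )
```

Customer C1 has three rows but only two distinct orders, which is why frequency uses `n_distinct(order_id)` rather than a row count. Setting the reference date one day past the latest order means the most recent customer has a recency of 1 rather than 0.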
rfm_data <- rfm_data %>%
  mutate(
    R_score = ntile(-recency_days, 5), # smaller recency = higher score
    F_score = ntile(frequency, 5),
    M_score = ntile(monetary, 5),
    RFM_Score = R_score + F_score + M_score
  )
Score Distribution:
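The sketch below shows how quintile scoring with `ntile()` behaves on a handful of invented customers and how the resulting scores could be tallied. The values in `toy_rfm` are hypothetical, and `score_distribution` is a name chosen for this example only.

```r
library(dplyr)

# Hypothetical RFM values for ten customers
toy_rfm <- data.frame(
  customer_id  = paste0("C", 1:10),
  recency_days = c(1, 5, 12, 30, 45, 60, 90, 180, 365, 700),
  frequency    = c(12, 10, 9, 8, 6, 5, 4, 3, 2, 1),
  monetary     = c(5000, 4200, 3900, 3000, 2500, 1800, 1200, 800, 300, 50)
)

toy_scored <- toy_rfm %>%
  mutate(
    R_score   = ntile(-recency_days, 5),  # negate so recent buyers score high
    F_score   = ntile(frequency, 5),
    M_score   = ntile(monetary, 5),
    RFM_Score = R_score + F_score + M_score
  )

# Number of customers at each combined score
score_distribution <- toy_scored %>% count(RFM_Score, name = "customers")
```

One caveat worth keeping in mind: `ntile()` builds evenly sized buckets and ignores ties, so two customers with the same frequency can receive different F scores. With a metric that takes few distinct values, such as order counts, this can matter near bucket boundaries.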
rfm_data <- rfm_data %>%
  mutate(
    Segment = case_when(
      RFM_Score >= 13 ~ "Champions",
      RFM_Score >= 10 ~ "Loyal Customers",
      RFM_Score >= 7 ~ "Potential Loyalists",
      RFM_Score >= 4 ~ "Needs Attention",
      TRUE ~ "At Risk"
    )
  )
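Because `case_when()` uses the first condition that matches, the thresholds above split the 3 to 15 score range into non-overlapping bands. One way to confirm the bands is to apply the same rules to every possible score. `score_map` and `score_ranges` are names made up for this sketch.

```r
library(dplyr)

# Every possible combined score, from 3 (1+1+1) to 15 (5+5+5)
score_map <- data.frame(RFM_Score = 3:15) %>%
  mutate(
    Segment = case_when(
      RFM_Score >= 13 ~ "Champions",
      RFM_Score >= 10 ~ "Loyal Customers",
      RFM_Score >= 7  ~ "Potential Loyalists",
      RFM_Score >= 4  ~ "Needs Attention",
      TRUE            ~ "At Risk"
    )
  )

# Lowest and highest score that falls into each segment
score_ranges <- score_map %>%
  group_by(Segment) %>%
  summarise(
    min_score = min(RFM_Score),
    max_score = max(RFM_Score),
    .groups = "drop"
  ) %>%
  arrange(desc(min_score))
```

The resulting bands are 13 to 15, 10 to 12, 7 to 9, 4 to 6, and 3. Under these rules only customers who score 1 on all three metrics fall into "At Risk".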
Segment Distribution:
| Segment | Customers | Percentage (%) |
|---|---|---|
| Loyal Customers | 256 | 32.3 |
| Potential Loyalists | 218 | 27.5 |
| Needs Attention | 159 | 20.1 |
| Champions | 121 | 15.3 |
| At Risk | 39 | 4.9 |
| Segment | Customers | Avg_Recency | Avg_Frequency | Avg_Monetary |
|---|---|---|---|---|
| Champions | 121 | 27.4 | 9.4 | 5,254.07 |
| Loyal Customers | 256 | 75.0 | 7.5 | 3,879.87 |
| Potential Loyalists | 218 | 137.7 | 5.6 | 2,170.64 |
| Needs Attention | 159 | 283.2 | 4.0 | 1,129.12 |
| At Risk | 39 | 503.0 | 2.5 | 396.99 |
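The chunk that builds `segment_summary` is not shown in this report, although the object is written to disk later. The following is one plausible way to produce a table with the same columns, run on invented data. The values in `toy_scored` are hypothetical, and the `Percentage` column is an assumption added here to mirror the segment distribution table.

```r
library(dplyr)

# Hypothetical scored customers (values invented for illustration)
toy_scored <- data.frame(
  Segment      = c("Champions", "Champions", "At Risk", "Loyal Customers"),
  recency_days = c(10, 20, 500, 80),
  frequency    = c(9, 11, 2, 7),
  monetary     = c(5000, 6000, 400, 3500)
)

segment_summary <- toy_scored %>%
  group_by(Segment) %>%
  summarise(
    Customers     = n(),
    Avg_Recency   = round(mean(recency_days), 1),
    Avg_Frequency = round(mean(frequency), 1),
    Avg_Monetary  = round(mean(monetary), 2),
    .groups = "drop"
  ) %>%
  # Share of all customers in each segment, in percent
  mutate(Percentage = round(100 * Customers / sum(Customers), 1)) %>%
  arrange(desc(Avg_Monetary))
```

Sorting by average monetary value puts the highest-spending segment first, which matches the ordering of the table above.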
Interpretation:
| Segment | Description | Suggested_Action |
|---|---|---|
| Champions | Recent, frequent, and high spenders | Offer loyalty rewards, exclusive deals, VIP treatment |
| Loyal Customers | Frequent and consistent buyers | Encourage reviews, referrals, upsell, cross-sell |
| Potential Loyalists | Medium RFM score, possible repeat buyers | Engage with promotions, targeted campaigns |
| Needs Attention | Low-medium RFM score, may churn | Send reactivation campaigns, special offers |
| At Risk | Low in all RFM metrics | Win-back campaigns, discounts, personalized outreach |
Overall Customer Behavior Summary:
Strategic Priorities:
Operational Focus:
# Ensure folders exist
if(!dir.exists(here("output"))) dir.create(here("output"), recursive = TRUE)
if(!dir.exists(here("figures"))) dir.create(here("figures"), recursive = TRUE)
# Save CSV files
write_csv(rfm_data, here("output", "rfm_scores.csv"))
write_csv(segment_summary, here("output", "rfm_segment_summary.csv"))
# Also save with alternative name for compatibility
write_csv(rfm_data, here("output", "rfm_customer_segments.csv"))
# Save plot
ggsave(here("figures", "rfm_segment_plot.png"), plot = rfm_plot, width = 10, height = 6)
Outputs saved successfully:
output/ folder
figures/ folder
Throughout this project, I analyzed customer behavior using the Superstore sales dataset by applying RFM (Recency, Frequency, Monetary) analysis. Here are the key findings:
The dataset contains 9,994 transaction records across 793 customers. After cleaning, all records have valid customer IDs, order dates, and sales amounts, ensuring accurate RFM computation.
Customers were assigned R, F, and M scores (1–5), which were then summed into an overall RFM Score (3–15). The segment averages separate clearly: Champions average 27.4 days since their last order, 9.4 orders, and $5,254.07 in sales, while At Risk customers average 503.0 days, 2.5 orders, and $396.99.
The visualizations confirm that most customers are Loyal Customers (32.3%) or Potential Loyalists (27.5%), with smaller but critical segments of Champions (15.3%) and At Risk customers (4.9%). The scatter plot shows the relationship between recency and monetary value across segments.
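The plotting code itself is not included in this report. As a minimal sketch of a recency versus monetary scatter plot coloured by segment, something like the following would work with ggplot2. The data in `toy_scored` and the name `toy_plot` are hypothetical, and the styling choices are assumptions rather than the settings used for the saved figure.

```r
library(ggplot2)

# Hypothetical scored customers (values invented for illustration)
toy_scored <- data.frame(
  recency_days = c(10, 25, 80, 140, 290, 510),
  monetary     = c(5200, 4800, 3900, 2100, 1100, 400),
  Segment      = c("Champions", "Champions", "Loyal Customers",
                   "Potential Loyalists", "Needs Attention", "At Risk")
)

# Recency on the x-axis, spend on the y-axis, one colour per segment
toy_plot <- ggplot(toy_scored,
                   aes(x = recency_days, y = monetary, colour = Segment)) +
  geom_point(size = 3, alpha = 0.8) +
  labs(
    title  = "Recency vs. monetary value by segment",
    x      = "Days since last order",
    y      = "Total sales ($)",
    colour = "Segment"
  ) +
  theme_minimal()
```

A plot object built this way can be passed to `ggsave()` through its `plot` argument, as the report does for its own figure.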
This RFM analysis (Project 1) provides the foundation for Project 2: Predictive Customer Segmentation, where machine learning models will predict:
The predictive approach will enable proactive customer management and optimized marketing spend allocation.
Analysis completed on: 2025-11-05
For questions or feedback, please contact moneteer808@gmail.com.