
1. Time series analysis
This query calculates daily total sales from 2011 to 2014 and breaks them down by region (Global, North America, Europe, and Pacific). This makes it possible to compare sales trends across territories and to identify growth patterns and seasonality in each market.
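A minimal sketch of how such a daily, per-region query can be written follows. It is not the original query. It runs on SQLite through Python's standard sqlite3 module so it executes standalone, the two tables are hypothetical reductions of SalesOrderHeader and SalesTerritory, the order rows are made up, and it assumes "Global" means the sum over all territories. The helper names build_demo_db and daily_sales_by_region are illustrative. Each region becomes its own column through conditional aggregation (SUM over a CASE expression). In T-SQL the date truncation would typically be CAST(OrderDate AS date) rather than SQLite's date().

```python
import sqlite3

# Hypothetical, heavily simplified stand-ins for SalesTerritory and
# SalesOrderHeader; the real AdventureWorks tables have many more columns.
SCHEMA = """
CREATE TABLE SalesTerritory (TerritoryID INTEGER PRIMARY KEY, "Group" TEXT);
CREATE TABLE SalesOrderHeader (
    SalesOrderID INTEGER PRIMARY KEY,
    OrderDate    TEXT,
    TerritoryID  INTEGER REFERENCES SalesTerritory(TerritoryID),
    TotalDue     REAL
);
"""

# One row per day: a global total plus one conditional sum per territory group.
DAILY_SALES_SQL = """
SELECT date(h.OrderDate) AS OrderDay,
       SUM(h.TotalDue) AS GlobalSales,
       SUM(CASE WHEN t."Group" = 'North America' THEN h.TotalDue ELSE 0.0 END) AS NorthAmericaSales,
       SUM(CASE WHEN t."Group" = 'Europe'        THEN h.TotalDue ELSE 0.0 END) AS EuropeSales,
       SUM(CASE WHEN t."Group" = 'Pacific'       THEN h.TotalDue ELSE 0.0 END) AS PacificSales
FROM SalesOrderHeader AS h
JOIN SalesTerritory AS t ON t.TerritoryID = h.TerritoryID
WHERE h.OrderDate >= '2011-01-01' AND h.OrderDate < '2015-01-01'
GROUP BY date(h.OrderDate)
ORDER BY OrderDay;
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding a few made-up orders."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO SalesTerritory VALUES (?, ?)",
                     [(1, "North America"), (2, "Europe"), (3, "Pacific")])
    conn.executemany("INSERT INTO SalesOrderHeader VALUES (?, ?, ?, ?)", [
        (1, "2011-05-31 00:00:00", 1, 100.0),
        (2, "2011-05-31 00:00:00", 2, 50.0),
        (3, "2011-06-01 00:00:00", 3, 25.0),
        (4, "2011-06-01 00:00:00", 1, 10.0),
        (5, "2010-12-31 00:00:00", 1, 999.0),  # before 2011, filtered out
    ])
    return conn


def daily_sales_by_region(conn: sqlite3.Connection) -> list[tuple]:
    """Return (day, global, north_america, europe, pacific) rows."""
    return conn.execute(DAILY_SALES_SQL).fetchall()


if __name__ == "__main__":
    for row in daily_sales_by_region(build_demo_db()):
        print(row)
```

Pivoting the regions into columns inside SQL keeps one row per day, which is the shape most plotting tools expect for a multi-series line chart.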
2. Linear regression
This query prepares a dataset combining customer demographics and sales history. It includes fields like income, age, education, and first purchase date, which are key features for building a regression model that explains or predicts purchasing behavior based on personal attributes.
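The regression dataset can be sketched in the same way. Assumptions: a single hypothetical PersonDemographics table stands in for wherever the real database keeps income, education, birth date, and first purchase date; YearlyIncome is treated as numeric; age is measured at a hypothetical reference date of 2014-12-31; and total spend per customer is used only as an example target. The helper names and all rows are illustrative. The LEFT JOIN keeps customers with no orders, and COALESCE turns their missing spend into 0.0.

```python
import sqlite3

# Hypothetical, simplified tables; the real schema stores demographics differently.
SCHEMA = """
CREATE TABLE PersonDemographics (
    BusinessEntityID  INTEGER PRIMARY KEY,
    YearlyIncome      REAL,
    Education         TEXT,
    BirthDate         TEXT,
    DateFirstPurchase TEXT
);
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, PersonID INTEGER);
CREATE TABLE SalesOrderHeader (
    SalesOrderID INTEGER PRIMARY KEY, CustomerID INTEGER, TotalDue REAL
);
"""

# One row per customer: demographic features plus total spend as the target.
# Age = year difference, minus 1 if the birthday has not yet occurred by the
# reference date (SQLite comparisons evaluate to 0 or 1).
REGRESSION_SQL = """
SELECT c.CustomerID,
       d.YearlyIncome,
       d.Education,
       d.DateFirstPurchase,
       CAST(strftime('%Y', :ref) AS INTEGER) - CAST(strftime('%Y', d.BirthDate) AS INTEGER)
         - (strftime('%m-%d', :ref) < strftime('%m-%d', d.BirthDate)) AS Age,
       COALESCE(SUM(h.TotalDue), 0.0) AS TotalSpent
FROM Customer AS c
JOIN PersonDemographics AS d ON d.BusinessEntityID = c.PersonID
LEFT JOIN SalesOrderHeader AS h ON h.CustomerID = c.CustomerID
GROUP BY c.CustomerID, d.YearlyIncome, d.Education, d.DateFirstPurchase, d.BirthDate
ORDER BY c.CustomerID;
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding two made-up customers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO PersonDemographics VALUES (?, ?, ?, ?, ?)", [
        (1, 70000.0, "Bachelors", "1971-10-06", "2011-07-22"),
        (2, 40000.0, "High School", "1976-05-10", "2012-01-15"),
    ])
    conn.executemany("INSERT INTO Customer VALUES (?, ?)", [(11000, 1), (11001, 2)])
    conn.executemany("INSERT INTO SalesOrderHeader VALUES (?, ?, ?)",
                     [(1, 11000, 3000.0), (2, 11000, 500.0)])  # 11001 has no orders
    return conn


def regression_dataset(conn: sqlite3.Connection, ref_date: str = "2014-12-31") -> list[tuple]:
    """Return (customer, income, education, first_purchase, age, total_spent) rows."""
    return conn.execute(REGRESSION_SQL, {"ref": ref_date}).fetchall()


if __name__ == "__main__":
    for row in regression_dataset(build_demo_db()):
        print(row)
```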
3. Logistic regression
This query constructs a classification-ready dataset by labeling customers who have purchased bicycles (based on product category). It joins this label with demographic and behavioral data to support logistic regression models that predict the probability of a customer making a bicycle purchase.
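A sketch of the labeling step, under stated assumptions: a customer gets BoughtBike = 1 if any of their order lines is a product in the 'Bikes' category, reached through the Product, ProductSubcategory, and ProductCategory tables. The table layouts are simplified, NumberCarsOwned is just one hypothetical example feature, the helper names are illustrative, and all IDs and rows are made up. As with the other sketches, this uses SQLite via Python's sqlite3 module rather than SQL Server.

```python
import sqlite3

# Hypothetical, simplified tables covering the product hierarchy and customers.
SCHEMA = """
CREATE TABLE ProductCategory    (ProductCategoryID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE ProductSubcategory (ProductSubcategoryID INTEGER PRIMARY KEY, ProductCategoryID INTEGER);
CREATE TABLE Product            (ProductID INTEGER PRIMARY KEY, ProductSubcategoryID INTEGER);
CREATE TABLE Customer           (CustomerID INTEGER PRIMARY KEY, PersonID INTEGER);
CREATE TABLE PersonDemographics (BusinessEntityID INTEGER PRIMARY KEY, NumberCarsOwned INTEGER);
CREATE TABLE SalesOrderHeader   (SalesOrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
CREATE TABLE SalesOrderDetail   (SalesOrderDetailID INTEGER PRIMARY KEY, SalesOrderID INTEGER, ProductID INTEGER);
"""

# Binary label: 1 if any order line of the customer is in the 'Bikes' category.
LABEL_SQL = """
SELECT c.CustomerID,
       d.NumberCarsOwned,
       CASE WHEN EXISTS (
           SELECT 1
           FROM SalesOrderHeader   AS h
           JOIN SalesOrderDetail   AS od ON od.SalesOrderID = h.SalesOrderID
           JOIN Product            AS p  ON p.ProductID = od.ProductID
           JOIN ProductSubcategory AS s  ON s.ProductSubcategoryID = p.ProductSubcategoryID
           JOIN ProductCategory    AS pc ON pc.ProductCategoryID = s.ProductCategoryID
           WHERE h.CustomerID = c.CustomerID AND pc.Name = 'Bikes'
       ) THEN 1 ELSE 0 END AS BoughtBike
FROM Customer AS c
JOIN PersonDemographics AS d ON d.BusinessEntityID = c.PersonID
ORDER BY c.CustomerID;
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database: one bike buyer, one non-buyer, one with no orders."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO ProductCategory VALUES (?, ?)", [(1, "Bikes"), (3, "Clothing")])
    conn.executemany("INSERT INTO ProductSubcategory VALUES (?, ?)", [(1, 1), (21, 3)])
    conn.executemany("INSERT INTO Product VALUES (?, ?)", [(771, 1), (881, 21)])
    conn.executemany("INSERT INTO Customer VALUES (?, ?)", [(11000, 1), (11001, 2), (11002, 3)])
    conn.executemany("INSERT INTO PersonDemographics VALUES (?, ?)", [(1, 2), (2, 0), (3, 1)])
    conn.executemany("INSERT INTO SalesOrderHeader VALUES (?, ?)", [(43659, 11000), (43660, 11001)])
    conn.executemany("INSERT INTO SalesOrderDetail VALUES (?, ?, ?)",
                     [(1, 43659, 771), (2, 43659, 881), (3, 43660, 881)])
    return conn


def bike_labels(conn: sqlite3.Connection) -> list[tuple]:
    """Return (customer, cars_owned, bought_bike) rows."""
    return conn.execute(LABEL_SQL).fetchall()


if __name__ == "__main__":
    for row in bike_labels(build_demo_db()):
        print(row)
```

Using EXISTS rather than joining the order lines directly yields exactly one row per customer, even for customers who bought several bikes, so the label can sit next to the features without duplicating rows.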
Business data modeling for predictive analysis
The queries in this project were developed on the AdventureWorks2008 OLTP database, a complex transactional data model with over 70 interconnected tables covering business domains such as sales, production, human resources, and purchasing.
To design analytical outputs from this schema, I worked across tables such as SalesOrderHeader, SalesOrderDetail, SalesTerritory, Person, and Customer, which are normalized and linked by foreign keys. The diagram below illustrates the relational model of the database used.
This added complexity required:
- Understanding the dimensional relationships between business entities.
- Writing efficient SQL queries to join multiple normalized tables.
- Structuring outputs into flat analytical datasets suitable for time series visualization and predictive modeling.
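One way to package the flattening step is a view that performs the foreign-key joins once and exposes one row per order. This is an illustrative SQLite sketch, not the project's actual code; vOrderFlat, fetch_flat, build_demo_db, the reduced schemas, and the customer rows are all hypothetical.

```python
import sqlite3

# Hypothetical, simplified normalized tables plus a flattening view.
SCHEMA = """
CREATE TABLE Person (BusinessEntityID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE SalesTerritory (TerritoryID INTEGER PRIMARY KEY, Name TEXT, "Group" TEXT);
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, PersonID INTEGER, TerritoryID INTEGER);
CREATE TABLE SalesOrderHeader (
    SalesOrderID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderDate TEXT, TotalDue REAL
);

-- One row per order, with the foreign-key hops resolved into readable columns.
CREATE VIEW vOrderFlat AS
SELECT h.SalesOrderID                   AS SalesOrderID,
       date(h.OrderDate)                AS OrderDay,
       p.FirstName || ' ' || p.LastName AS CustomerName,
       t.Name                           AS Territory,
       t."Group"                        AS TerritoryGroup,
       h.TotalDue                       AS TotalDue
FROM SalesOrderHeader AS h
JOIN Customer       AS c ON c.CustomerID = h.CustomerID
JOIN Person         AS p ON p.BusinessEntityID = c.PersonID
JOIN SalesTerritory AS t ON t.TerritoryID = c.TerritoryID;
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding two made-up orders."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Person VALUES (?, ?, ?)", [(1, "Ana", "Diaz"), (2, "Ben", "Kato")])
    conn.executemany("INSERT INTO SalesTerritory VALUES (?, ?, ?)",
                     [(1, "Northwest", "North America"), (7, "France", "Europe")])
    conn.executemany("INSERT INTO Customer VALUES (?, ?, ?)", [(11000, 1, 1), (11001, 2, 7)])
    conn.executemany("INSERT INTO SalesOrderHeader VALUES (?, ?, ?, ?)", [
        (43659, 11000, "2011-05-31 00:00:00", 120.5),
        (43660, 11001, "2011-06-01 00:00:00", 80.0),
    ])
    return conn


def fetch_flat(conn: sqlite3.Connection) -> tuple[list[str], list[tuple]]:
    """Return the view's column names and its rows, ready for a BI tool or DataFrame."""
    cur = conn.execute("SELECT * FROM vOrderFlat ORDER BY SalesOrderID")
    columns = [col[0] for col in cur.description]
    return columns, cur.fetchall()


if __name__ == "__main__":
    cols, rows = fetch_flat(build_demo_db())
    print(cols)
    for row in rows:
        print(row)
```

Keeping the joins in a view means downstream analyses read a single flat relation and do not need to know the normalized structure behind it.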
This environment simulates real-world business databases and emphasizes the ability to translate relational models into meaningful BI-ready data structures.