Extracting Insights by Clustering Structured Data

EasyChair Preprint 6821

9 pages • Date: October 9, 2021

Abstract

Institutional Research (IR) is an integral part of the higher education ecosystem. Institutional data is one of the building blocks that make IR vital to decision making and to shaping policy and strategy. All institutional entities, such as students and courses, consist of different attributes (program code and name, course code and name, credit points, etcetera) and are stored in defined structures called tables. These tables are conventionally stored in Relational Database Management Systems (RDBMSs) as structured data elements (fields or columns) and tuples (records or rows). Breaking down entities and their attributes and storing them in separate tables is called normalization. Its purpose is to reduce data redundancy, which is the main concern in large RDBMSs. Given that entities and their attributes are concepts already categorized and stored in database tables, to what extent can this clichéd structure negatively affect researchers by limiting their view of the institutional data?

The objective of this research presentation is to introduce a new lens through which to analyze structured data with the aid of clustering algorithms. To achieve this objective, the attributes of different entities can be merged using classical database views. Before embarking on the conventional analysis of the extracted data, we can apply an unsupervised Machine Learning algorithm (clustering) to detect hidden correlations among the attributes and thereby regroup the data points into new clusters from which to begin the analysis. This can help institutional researchers distill different perspectives on the data and extract valuable insights from the automatically detected clusters. The key factors in this approach are choosing an appropriate number of clusters and, subsequently, the skill with which the new clusters are interpreted.
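The workflow described above (merge attributes through a database view, then cluster the merged rows) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the method used in the preprint: the table and column names, the `mark` attribute, and the sample rows are all hypothetical, and plain k-means with a simple deterministic initialization stands in for whichever clustering algorithm the author applied. It uses only the Python standard library.

```python
import sqlite3


def build_demo_db():
    """Create hypothetical normalized tables and a view that merges their attributes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE student (student_id INTEGER PRIMARY KEY, program_code TEXT);
        CREATE TABLE course (course_code TEXT PRIMARY KEY, credit_points INTEGER);
        CREATE TABLE enrolment (student_id INTEGER, course_code TEXT, mark REAL);
        INSERT INTO student VALUES (1, 'P1'), (2, 'P1'), (3, 'P2');
        INSERT INTO course VALUES ('C101', 12), ('C102', 12), ('C201', 24);
        INSERT INTO enrolment VALUES
            (1, 'C101', 55.0), (2, 'C101', 60.0), (3, 'C102', 58.0),
            (1, 'C201', 85.0), (2, 'C201', 90.0), (3, 'C201', 88.0);
        -- The view joins the normalized tables back into one row per enrolment.
        CREATE VIEW student_course AS
            SELECT e.student_id, s.program_code, c.course_code, c.credit_points, e.mark
            FROM enrolment AS e
            JOIN student AS s ON s.student_id = e.student_id
            JOIN course AS c ON c.course_code = e.course_code;
    """)
    return conn


def load_features(conn):
    """Read the numeric attributes from the view in a fixed row order."""
    rows = conn.execute(
        "SELECT credit_points, mark FROM student_course "
        "ORDER BY student_id, course_code"
    ).fetchall()
    return [(float(cp), float(mark)) for cp, mark in rows]


def min_max_scale(points):
    """Rescale each column to [0, 1] so no attribute dominates the distance."""
    columns = list(zip(*points))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [tuple((v - lo) / sp for v, lo, sp in zip(p, lows, spans)) for p in points]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain k-means; initial centroids are evenly spaced picks from the sorted
    points (an assumption made here for reproducibility, not k-means++)."""
    ordered = sorted(points)
    n = len(ordered)
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def inertia(points, labels, centroids):
    """Within-cluster sum of squared distances; lower means tighter clusters."""
    return sum(sq_dist(p, centroids[label]) for p, label in zip(points, labels))


if __name__ == "__main__":
    points = min_max_scale(load_features(build_demo_db()))
    for k in (1, 2, 3):
        labels, centroids = kmeans(points, k)
        print(k, labels, round(inertia(points, labels, centroids), 4))
```

Comparing the inertia across several values of k, as the loop at the end does, is one common heuristic for choosing the number of clusters; interpreting what each resulting cluster means for the institution remains a manual step.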

Keyphrases: Clustering, Insight extraction, RDBMS, machine learning, structured data

Links:

https://easychair.org/publications/preprint/WJCh

BibTeX entry

BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

@booklet{EasyChair:6821,
  author    = {Amir Hossein Rouhi},
  title     = {Extracting Insights by Clustering Structured Data},
  howpublished = {EasyChair Preprint 6821},
  year      = {EasyChair, 2021}}
