This notebook demonstrates a reproducible, end-to-end gene expression analysis workflow that starts from a raw GEO Series Matrix file. The focus is on a core real-world bioinformatics skill: turning “messy” public repository files into analysis-ready matrices and metadata, then performing basic differential expression and exploratory analysis.
Although the dataset is public and compact, the same steps apply to larger transcriptomics studies, including oncology pipelines.
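
The parsing and differential expression steps can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names `parse_series_matrix` and `welch_ttest` are hypothetical, the toy series text is made up, and the choice of a Welch t-test on values assumed to be already log2-scaled is an assumption about what "basic differential expression" means here.

```python
import io

import pandas as pd
from scipy import stats


def parse_series_matrix(text):
    """Split GEO Series Matrix text into sample metadata and an expression matrix.

    Hypothetical helper: metadata lines start with '!', and the expression
    table sits between the table_begin and table_end markers.
    """
    meta, table_lines, in_table = {}, [], False
    for line in text.splitlines():
        if line.startswith("!series_matrix_table_begin"):
            in_table = True
        elif line.startswith("!series_matrix_table_end"):
            in_table = False
        elif in_table:
            table_lines.append(line)
        elif line.startswith("!Sample_"):
            key, *values = line.split("\t")
            # Keys such as Sample_characteristics_ch1 can repeat, so keep a list of rows.
            meta.setdefault(key.lstrip("!"), []).append([v.strip('"') for v in values])
    expr = pd.read_csv(io.StringIO("\n".join(table_lines)), sep="\t", index_col=0)
    return meta, expr


def welch_ttest(expr, groups, case="tumor", control="normal"):
    """Per-probe Welch t-test; assumes expression values are already log2-scaled."""
    groups = pd.Series(list(groups), index=expr.columns)
    a = expr.loc[:, (groups == case).to_numpy()]
    b = expr.loc[:, (groups == control).to_numpy()]
    t, p = stats.ttest_ind(a.to_numpy(), b.to_numpy(), axis=1, equal_var=False)
    out = pd.DataFrame(
        {"log2FC": a.mean(axis=1) - b.mean(axis=1), "t": t, "p_value": p},
        index=expr.index,
    )
    return out.sort_values("p_value")


# Made-up toy input in Series Matrix layout (not real GEO data).
samples = ["GSM1", "GSM2", "GSM3", "GSM4", "GSM5", "GSM6"]
rows = [
    ["!Series_title", '"Toy series"'],
    ["!Sample_geo_accession"] + [f'"{s}"' for s in samples],
    ["!Sample_source_name_ch1"] + ['"tumor"'] * 3 + ['"normal"'] * 3,
    ["!series_matrix_table_begin"],
    ['"ID_REF"'] + [f'"{s}"' for s in samples],
    ['"probe_up"', "9.0", "9.2", "8.8", "6.0", "6.1", "5.9"],
    ['"probe_flat"', "7.0", "7.1", "6.9", "7.0", "6.9", "7.1"],
    ["!series_matrix_table_end"],
]
toy_text = "\n".join("\t".join(r) for r in rows)

meta, expr = parse_series_matrix(toy_text)
result = welch_ttest(expr, meta["Sample_source_name_ch1"][0])
print(result)
```

With a real series, the group labels usually have to be derived from one of the `Sample_characteristics_ch1` rows rather than read directly, and p-values across thousands of probes would normally be adjusted for multiple testing before interpretation.
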

| Figure |
|---|
| ![]() Figure 1: Volcano plot, Tumor vs Normal |
| ![]() Figure 2: PCA plot, Tumor vs Normal |
| ![]() Figure 3: Top 20 genes ranked by p-value |

To run the notebook locally:

    git clone https://github.com/Anwesha19-prog/Breast-Cancer-Microarray-Analysis
    cd Breast-Cancer-Microarray-Analysis
    pip install pandas numpy scipy matplotlib seaborn scikit-learn
    jupyter notebook Project-1_clean.ipynb