# Replication Materials — v1.4

**Mapping the Structural Divide: Institutional Resilience, Post-College Market Position, and Artificial Intelligence Exposure Across U.S. Higher Education**

Saunders, K. (2026). Working Paper v1.4, April 2026.

---

## Quick Start

1. Install Python dependencies: `pip install pandas numpy scipy`
2. Place `university_mapping_dataset.csv` (from the Data & Downloads tab) in the same directory as the script
3. Run: `python replicate.py`
4. Output prints to console: composite scores, quadrant assignments, correlation diagnostics, factor analysis, and sensitivity analysis

**Note on scope:** The replication script verifies all analytical results reported in the working paper — composite scores, quadrant assignments, pre-factor correlation diagnostics, factor analysis (KMO, Bartlett's, parallel analysis, promax rotation), the full sensitivity analysis, z-score robustness check, Carnegie tier stratification table, and AI exposure summary — starting from the prepared institutional dataset. The raw data sources used to construct that dataset are documented below for transparency and independent verification.

---

## Raw Data Sources

The following publicly available data sources were used to construct the institutional dataset. They are documented here so that any researcher can trace and verify the inputs.

### IPEDS (nces.ed.gov/ipeds/datacenter/DataFiles.aspx)

Download the "Data" CSV for each survey component and year listed below. Both EFFY files are needed: the 2019 file is used alongside the 2024 file for the 5-year enrollment change.

| File | Survey Component | Year |
|------|-----------------|------|
| `hd2024.csv` | Institutional Characteristics (HD) | 2024 |
| `f2324_f1a.csv` | Finance - Public (GASB, F1A) | 2023-24 |
| `f2324_f2.csv` | Finance - Private (FASB, F2) | 2023-24 |
| `effy2024.csv` | 12-Month Enrollment (EFFY) | 2024 |
| `effy2019.csv` | 12-Month Enrollment (EFFY) | 2019 |

### College Scorecard (collegescorecard.ed.gov/data)

| File | Description |
|------|-------------|
| `Most-Recent-Cohorts-Institutionedited.csv` | Institution-level data, most recent cohort. Download the full "Most Recent Institution-Level Data" file. The full file is large; to reduce its size, keep only the columns whose names start with UNITID, INSTNM, CONTROL, ADM_RATE, UGDS, C150_4, MD_EARN_WNE, GRAD_DEBT, NPT4, PCIP, CURROPER, or PREDDEG. |

### WICHE (wiche.edu/knocking)

| File | Description |
|------|-------------|
| `Knocking-at-the-College-Door-11th-Edition-Projections-Dataset-12-11-2024.xlsx` | 11th Edition projections dataset. Available from the Knocking at the College Door data download page. |

### O*NET (onetcenter.org/database.html)

Download the full O*NET database (v29.0 or later). We use two files:

| File | Description |
|------|-------------|
| `Work_Activities.xlsx` | Generalized Work Activities ratings by occupation |
| `Job_Zones.xlsx` | Job Zone assignments by occupation |

### NCES CIP-SOC Crosswalk (nces.ed.gov)

| File | Description |
|------|-------------|
| `CIP2020_SOC2018_Crosswalk.xlsx` | CIP 2020 to SOC 2018 crosswalk file |

### Anthropic Economic Index (huggingface.co/datasets/Anthropic/EconomicIndex)

| File | Description |
|------|-------------|
| `aei_enriched_claude_ai_2025-08-04_to_2025-08-11.csv` | Enriched conversation-level data with SOC occupation codes. Download from the Anthropic Economic Index HuggingFace repository. |

---

## Output Description

### university_mapping_dataset.csv

1,609 institutions (1,556 with both composite scores; 53 lack sufficient data). Fields:

**Identifiers:** UNITID, INSTNM, STABBR, CONTROL, C21BASIC, CARNEGIE_LABEL, IC2025, IC2025_NAME, RESEARCH2025, RESEARCH2025_NAME, CARNEGIE25_TIER, CARNEGIE25_TIER_CODE

**Raw Variables:** UGDS (enrollment)

**AI Exposure:** AI_EXPOSURE_BLENDED (entry-level weighted average), OBSERVED_AI_ADOPTION (from Anthropic Economic Index)

**Resilience Axis Components (percentile ranks, 0–1):**

| Column | Component |
|--------|-----------|
| `R_ENDOW` | Endowment per student |
| `R_REVDIV` | Revenue diversification (1 − tuition dependence) |
| `R_ENROLL` | Enrollment trajectory (5-year change) |
| `R_SELECT` | Selectivity (1 − admission rate) |

**Market Position Axis Components (percentile ranks, 0–1):**

| Column | Component |
|--------|-----------|
| `R_COMPLETION` | Completion rate (6-year graduation rate) |
| `L_EARNDEBT` | Earnings-to-debt ratio |
| `L_AIEXP` | AI exposure (inverted; low exposure = high score) |
| `L_DEMO` | Demographic trajectory (WICHE projected HS graduates, 2024–2030) |
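
The component columns above are percentile ranks, some computed on an inverted raw value (for example, selectivity as 1 − admission rate). The following is a minimal sketch of that transformation, not the code in `replicate.py`. The raw column names and toy values are hypothetical, and the use of pandas' default average-rank tie handling is an assumption; the paper's exact ranking convention may differ.

```python
import pandas as pd

# Hypothetical raw inputs for four institutions (values are made up).
raw = pd.DataFrame(
    {
        "endowment_per_student": [5_000, 250_000, 40_000, 12_000],
        "admission_rate": [0.90, 0.10, 0.55, 0.75],
    },
    index=["A", "B", "C", "D"],
)


def percentile_rank(series: pd.Series) -> pd.Series:
    """Rank values as a fraction of the non-missing count; NaN stays NaN."""
    # Assumption: ties share their average rank (pandas default).
    return series.rank(pct=True)


ranks = pd.DataFrame(
    {
        "R_ENDOW": percentile_rank(raw["endowment_per_student"]),
        # Selectivity is 1 - admission rate, so rank the inverted value:
        # the lowest admission rate receives the highest rank.
        "R_SELECT": percentile_rank(1 - raw["admission_rate"]),
    }
)
print(ranks)
```

With this convention, the institution with the largest endowment per student and the lowest admission rate ranks highest on both components.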

**Composites:**

| Column | Description |
|--------|-------------|
| `RESILIENCE` | Mean of R_ENDOW, R_REVDIV, R_ENROLL, R_SELECT (0–1) |
| `LABOR_ALIGNMENT` | Mean of R_COMPLETION, L_EARNDEBT, L_AIEXP, L_DEMO (0–1) |
| `QUADRANT` | HC (High Capacity), SE (Structurally Exposed), MM (Market Misaligned), HS (High Stress) |
| `STABILITY_SCORE` | Number of sensitivity specifications (out of 13) that produce the same quadrant |

Note: `LABOR_ALIGNMENT` is a legacy internal name for the Post-College Market Position axis, retained for code compatibility.

Note: `R_COMPLETION` uses the `R_` prefix because it is a percentile rank, but since v1.1 it belongs to the Market Position axis (not Resilience). The prefix convention predates the v1.1 restructure.
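
As an illustration of how the two composites relate to their components, the sketch below recomputes each axis as the unweighted mean of its four percentile-rank columns. The column names come from the tables above. The helper name `add_composites`, the toy rows, and the missing-data rule (any missing component leaves the composite missing) are assumptions; the rule actually used in `replicate.py` may differ.

```python
import pandas as pd

RESILIENCE_COLS = ["R_ENDOW", "R_REVDIV", "R_ENROLL", "R_SELECT"]
MARKET_COLS = ["R_COMPLETION", "L_EARNDEBT", "L_AIEXP", "L_DEMO"]


def add_composites(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with equal-weight composites (0.25 per component)."""
    out = df.copy()
    # Assumption: skipna=False, so one missing component yields NaN.
    out["RESILIENCE"] = out[RESILIENCE_COLS].mean(axis=1, skipna=False)
    out["LABOR_ALIGNMENT"] = out[MARKET_COLS].mean(axis=1, skipna=False)
    return out


# Two made-up institutions; the second lacks an endowment rank.
toy = pd.DataFrame(
    {
        "R_ENDOW": [0.2, None],
        "R_REVDIV": [0.4, 0.5],
        "R_ENROLL": [0.6, 0.5],
        "R_SELECT": [0.8, 0.5],
        "R_COMPLETION": [0.9, 0.1],
        "L_EARNDEBT": [0.7, 0.3],
        "L_AIEXP": [0.5, 0.5],
        "L_DEMO": [0.3, 0.7],
    }
)
scored = add_composites(toy)
print(scored[["RESILIENCE", "LABOR_ALIGNMENT"]])
```

Comparing such recomputed values against the `RESILIENCE` and `LABOR_ALIGNMENT` columns in the dataset is one quick consistency check a replicator could run.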

### cip2_ai_exposure.csv

AI exposure scores for each 2-digit CIP field, with entry-level weighted, raw, and blended scores plus the number of linked occupations.

---

## Framework Structure (8 components, 4 per axis)

**X-Axis: Institutional Resilience** (equal weight 0.25 each): Endowment per student, revenue diversification, enrollment trajectory, selectivity.

**Y-Axis: Post-College Market Position** (equal weight 0.25 each): Completion rate, earnings-to-debt ratio, AI exposure (inverted), demographic trajectory.

**Changes from v1.0:** Dropped raw median earnings (redundant with earnings-to-debt ratio at r = 0.80). Moved completion rate from Resilience to Market Position axis to resolve cross-axis loading identified in factor analysis (completion–earnings r = 0.734). Framework now uses 8 components (4+4) rather than the original 9 (5+4).

**Changes from v1.1:** Added formal factor analysis diagnostics (KMO, Bartlett's test, parallel analysis, communalities). Softened interpretive framing. Added composite index methodology literature. Reframed PSEO section as earnings context rather than external benchmarking. Documented missingness in factor analysis sample. See working paper Section 5.6.4 and Supplementary Appendix C.7 for complete diagnostics.

---

## Critical Methodological Notes

### GASB/FASB Finance Correction

Public institutions report under GASB accounting standards, where the IPEDS F1A form fields F1A01 and F1A05 are cumulative running totals, not individual revenue line items. Using them directly produces tuition dependence ratios near 1.0 for all public institutions, which is implausible. The correct columns are F1D01 (total core revenues) and F1B01 (net tuition and fees). After correction, mean tuition dependence is ~17% for public institutions and ~78% for private institutions.
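
To make the corrected calculation concrete, here is a small sketch for a single public institution. It assumes tuition dependence is the ratio of F1B01 to F1D01 and that revenue diversification is its complement, as the `R_REVDIV` description suggests. The function names and dollar amounts are hypothetical.

```python
def tuition_dependence(net_tuition: float, core_revenues: float) -> float:
    """Share of total core revenues that comes from net tuition and fees."""
    if core_revenues <= 0:
        raise ValueError("core revenues must be positive")
    return net_tuition / core_revenues


def revenue_diversification(net_tuition: float, core_revenues: float) -> float:
    """Complement of tuition dependence (higher means more diversified)."""
    return 1.0 - tuition_dependence(net_tuition, core_revenues)


# Hypothetical public institution (made-up amounts, in dollars).
f1b01 = 34_000_000   # net tuition and fees
f1d01 = 200_000_000  # total core revenues

dependence = tuition_dependence(f1b01, f1d01)
diversification = revenue_diversification(f1b01, f1d01)
print(f"tuition dependence: {dependence:.2f}")
print(f"revenue diversification: {diversification:.2f}")
```

A ratio near 1.0 for a public institution is a sign that cumulative fields were used in place of F1B01 and F1D01.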

### AI Exposure Pipeline

The AI exposure measure classifies O*NET Generalized Work Activities into two groups:

- **AI-positive** (routine cognitive): processing information, analyzing data, working with computers, documenting, evaluating compliance, administrative activities, getting information.
- **AI-negative** (physical/interpersonal): physical activities, handling objects, operating vehicles, repairing equipment, inspecting, assisting others, public-facing work, coaching, training, controlling machines.

Occupation-level scores are weighted by Job Zone for entry-level relevance (Zones 2–3 full weight, Zone 4 half weight, Zones 1 and 5 minimal weight), crosswalked to CIP degree fields, and aggregated to institutions by program mix.
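
The Job Zone weighting step can be sketched as a weighted average over the occupations linked to one CIP field. The full and half weights follow the description above; the value 0.1 for "minimal" is a placeholder assumption, since the exact figure is not stated here, and the function name and example scores are hypothetical.

```python
# Zones 2-3 full weight and Zone 4 half weight, per the description above.
# Assumption: 0.1 stands in for the unspecified "minimal" weight.
JOB_ZONE_WEIGHT = {1: 0.1, 2: 1.0, 3: 1.0, 4: 0.5, 5: 0.1}


def entry_level_weighted_exposure(occupations):
    """Weighted mean exposure over (exposure_score, job_zone) pairs."""
    weighted_sum = 0.0
    total_weight = 0.0
    for score, zone in occupations:
        weight = JOB_ZONE_WEIGHT[zone]
        weighted_sum += weight * score
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no occupations to aggregate")
    return weighted_sum / total_weight


# Made-up occupations linked to one CIP field: (exposure score, Job Zone).
linked = [(0.8, 2), (0.4, 4), (0.2, 5)]
field_exposure = entry_level_weighted_exposure(linked)
print(round(field_exposure, 4))
```

In this toy case the Zone 2 occupation dominates the result, which is the intended effect of emphasizing entry-level roles.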

### Equal Weighting

All components within each axis receive equal weight. This is a deliberate transparency choice, not an empirical claim, consistent with composite index methodology (Saisana & Saltelli, 2011; NORC, 2024). Factor analysis across eight specifications (raw/ranked variables × varimax/promax rotation × 2/3 factors) reveals that the data's natural structure is dominated by a single institutional hierarchy dimension (completion, earnings, endowment, selectivity) rather than the theorized two-axis structure. The forward-looking indicators the framework deliberately elevates — AI exposure, enrollment trend, demographic trajectory — are orthogonal to this hierarchy (uniqueness > 0.85 in factor analysis), meaning they capture variation that institutional quality and prestige alone cannot explain. Whether this orthogonal variation proves predictive of future institutional outcomes is the framework's central empirical bet.

Sampling adequacy diagnostics (v1.2): KMO = 0.63, Bartlett's χ²(28) = 1900.58, p < .001. Parallel analysis (Horn, 1965; 1,000 replications) confirms a three-factor solution. The moderate KMO is expected given deliberately orthogonal components; PCA is used for data reduction, not latent construct identification. See working paper Section 5.6 and Supplementary Appendix C for complete results.
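
For readers who want to see how the two adequacy diagnostics are defined, the sketch below computes Bartlett's test of sphericity and the overall KMO statistic from their standard formulas using NumPy and SciPy. It runs on synthetic data, so it will not reproduce the values reported above, and the function names are illustrative rather than those used in `replicate.py`.

```python
import numpy as np
from scipy import stats


def bartlett_sphericity(data: np.ndarray):
    """Return (chi-square, degrees of freedom, p-value)."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    _, logdet = np.linalg.slogdet(corr)  # log|R|, numerically stable
    chi2 = -(n - 1 - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) // 2
    return chi2, dof, stats.chi2.sf(chi2, dof)


def kmo_overall(data: np.ndarray) -> float:
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale  # partial correlations off the diagonal
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    a2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + a2)


# Synthetic data: 500 rows, 8 variables sharing one common factor.
rng = np.random.default_rng(0)
common = rng.normal(size=(500, 1))
data = 0.6 * common + rng.normal(size=(500, 8))

chi2, dof, p_value = bartlett_sphericity(data)
kmo = kmo_overall(data)
print(f"Bartlett chi2({dof}) = {chi2:.2f}, p = {p_value:.3g}")
print(f"KMO = {kmo:.2f}")
```

With eight variables the test has 8 × 7 / 2 = 28 degrees of freedom, matching the χ²(28) reported above.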

---

## License

Dataset released under CC BY 4.0. Replication code released under MIT License.

## Citation

Saunders, K. (2026). "Mapping the Structural Divide: Institutional Resilience, Post-College Market Position, and Artificial Intelligence Exposure Across U.S. Higher Education." Working Paper v1.4, April 2026. Available at: https://kylesaunders.com/university-map

## Version History

| Version | Date | Changes |
|---------|------|---------|
| v1.0 | March 2026 | Initial release: 9 components (5 resilience, 4 market position), including median earnings and completion on resilience axis |
| v1.1 | March 2026 | Restructured to 8 components (4+4). Dropped median earnings. Moved completion rate to Market Position axis. Recomputed all composites, quadrants, factor analysis, and sensitivity. |
| v1.2 | March 2026 | Added factor analysis diagnostics (KMO, Bartlett's, parallel analysis, communalities). Softened interpretive framing. Added composite index methodology references. Reframed PSEO as earnings context. Documented missingness bias in factor analysis sample. |
| v1.2.1 | March 2026 | Fixed quadrant label swap in sensitivity analysis function. Clarified replication scope in README (verification from prepared dataset; raw-build pipeline forthcoming). |
| v1.3 | March 2026 | Full Carnegie 2025 integration (9-tier grouping from IC2025 + Research Activity Designation). Recomputed all tier-dependent statistics. Added 5 new sensitivity specifications: revenue diversification sponsored research discount (ICR and full HERD removal), endowment yield replacement, half-weight endowment, admission yield replacement. Total sensitivity specifications: 18. |
| v1.4 | April 2026 | Added z-score (probit transform) robustness check: 92.1% quadrant agreement, ρ > 0.98 on both axes. Updated replication script with z-score specification, Carnegie tier cross-tabulation, and AI exposure summary. Revised manuscript framing of AI exposure measure and factor analysis. |
