Data Dictionary
This is the canonical reference for field names, types, and normalization rules across EDORA datasets. Use it to align dashboards, briefs, and exports so numbers remain consistent and comparable over time.
Core Schemas
- County-Year Panel: one row per county and year, normalized to ages 12–17 where applicable. Includes education exclusion, detention/diversion rates, SUD/MH proxies, foster care, housing proxies, violent incidents, and composite YARI+.
- State JJ Pipeline: narrative fields and structured keys summarizing intake screening, detention criteria, adjudication/disposition, commitment/placement, reentry/aftercare, authority, statutes, data handoffs, and sources.
- Provider Directory: programs and coverage, referral pathways, eligibility, contact info, and citations.
- Program Outcomes: minimal “what works/counterexamples” schema with initiation, completion, improvement, follow-up outcomes, and study design notes.
Field Conventions
- snake_case for field names (e.g.,
county_fips
,detention_rate
). - Rates per 1,000 unless stated; specify denominator in field description.
- Booleans as
true
/false
; enums as lowercase strings. - Dates as ISO
YYYY-MM-DD
; years as integers. - Missing as
null
; small-n pooled values flagged withpooled_small_n=true
.
County-Year Panel (Minimal)
Field | Type | Description |
---|---|---|
state | string | Display name (e.g., “Arkansas (AR)”). |
county_fips | string | 5-digit FIPS code. |
county_name | string | County name. |
year | number | Calendar year. |
edu_exclusion_rate | number | Suspensions/expulsions per 1,000 students. |
detention_rate | number | Juvenile detention admissions per 1,000 youth (12–17). |
diversion_rate | number | Diversions per 1,000 youth (12–17). |
sudmh_need_rate | number | Proxy for behavioral health need (per 1,000). |
foster_entries_per_1k | number | Foster care entries per 1,000 youth (12–17). |
homelessness_proxy_rate | number | McKinney–Vento or equivalent proxy, per 1,000. |
violent_incidents_per_1k | number | Proxy for violent incidents, per 1,000. |
yari_score | number | Composite risk index (0–1 recommended). |
pooled_small_n | boolean | True if values pooled across years due to small counts. |
notes | string | Context about pooling/series breaks. |
State JJ Pipeline (Keys)
intake_screening
,detention_criteria
,adjudication_disposition
,commitment_placement
,reentry_aftercare
authority_structure
,statutes_policies
,data_handoff_gaps
,sources[]
,last_updated
Provider Directory (Keys)
state
,name
,program_type
,coverage
,referral_path
eligibility
,process_notes
,contact
(url/phone/email),sources[]
Program Outcomes (Keys)
program_name
,state
,domain
,cohort_n
,start_year
,end_year
outcome_metric
,baseline_value
,post_value
,followup_days
,effect_size
design
(RCT/QED/PrePost),caveats
Normalization & Pooling
- Rates normalized to population ages 12–17 unless noted; school metrics to enrolled students.
- For small counts, pool across 2–3 years and set
pooled_small_n=true
; annotatenotes
. - Mark breaks in series when definitions or systems change; do not stitch incompatible series.
Versioning & Provenance
- Use semantic versions on exports:
county_panel.v1.tsv
,providers.v1.tsv
. - Record dataset vintage in briefs and dashboards (e.g., “2015–2025; pulled 2025-08-15”).
- Per-page citations live on topic pages; see Sources.
Downloads & Appendix
- TSV examples live at Data Appendix.
- For visualization specs and column mappings, see Methods.
Transparency note: Where values are pooled or proxied, we flag them and explain the implications for interpretation.