Justice Sector · Synthetic Data
Synthetic Justice
Data Test Packs
Structured synthetic legal matter packs for evaluating AI systems, testing court technology, and benchmarking digital justice infrastructure before deployment.
The Problem
Evaluating justice AI requires realistic legal data.
AI systems entering courts, tribunals, and legal aid services need to be tested against realistic, structured legal matter data — not generic test datasets that look nothing like actual proceedings.
Real court records are protected. Naively fabricated data is too clean, too simple, and too coherent to reveal how systems behave under operational conditions — with real documentary noise, procedural variation, and the kinds of evidentiary gaps that characterise actual litigation.
Synthetic Justice Data Test Packs provide structured, jurisdiction-aware, procedurally varied matter packs designed to stress-test AI systems in conditions that approximate real-world deployment.
Product
What you receive
Documents
Complete matter packs
Each pack includes a full set of procedurally appropriate documents — from originating process and pleadings through to evidence schedules, draft orders, and judgment scaffolds — rendered in proper .docx format.
Structure
JusticeData-aligned schema
All matter metadata follows a modular schema aligned to JusticeData MVP — including case lifecycle, parties and representation, hearing events, evidence inventory, fees, and confidentiality markers.
Variation
Procedural diversity
Matters span seven procedural postures — commencement through appeal — with configurable representation mix, evidence quality, vulnerability flags, and noise profiles that simulate operational documentary conditions.
Artifacts
Hearing events & transcripts
Each matter pack optionally includes structured hearing event metadata, synthetic transcript scaffolds, and audio placeholder files — enabling testing of transcription and audio-processing pipelines.
Evidence
Evidence inventory
A structured evidence inventory and exhibit register per matter, with evidence dates that correctly extend before commencement — as in real proceedings.
QA
Batch index & provenance
Every batch includes a machine-readable index, spreadsheet export, and per-matter provenance log recording every profile applied — enabling systematic QA and regression testing.
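As an illustration of how the machine-readable index supports systematic QA, here is a minimal sketch that summarises a parsed batch index. The field names used (`matters`, `procedural_posture`, `edge_case`) are assumptions for illustration, not the documented schema:

```python
from collections import Counter

def summarise_batch(index: dict) -> dict:
    """Summarise a parsed batch_index.json for a QA report.

    Assumes (hypothetically) that the index holds a "matters" list whose
    entries carry "procedural_posture" and "edge_case" fields.
    """
    matters = index.get("matters", [])
    postures = Counter(m.get("procedural_posture", "unknown") for m in matters)
    edge_cases = sum(1 for m in matters if m.get("edge_case"))
    return {
        "total_matters": len(matters),
        "by_posture": dict(postures),
        "edge_case_count": edge_cases,
    }
```

A QA pipeline could run a summary like this per batch and diff it against the commissioned configuration (matter count, posture mix, edge-case percentage) before accepting delivery.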
Samples
Review before you enquire
Sample Originating Process
A rendered originating process from a commercial dispute matter pack — commencement bundle, Technology & Construction List, NSW Supreme Court.
Request sample ↓
Sample Chronology
A chronology document demonstrating procedural event sequencing, noise-affected date notation, and conditional section rendering.
Request sample ↓
Sample Case Metadata
A complete case_truth.json from a generated matter — including JusticeData MVP fields, batch metadata, and provenance log.
Audience
Who uses test packs
Test packs are designed for technical teams and evaluators working at the intersection of AI and justice sector deployment.
- Court technology teams evaluating AI document processing
- Legal technology vendors preparing for tender or procurement
- Justice sector AI governance bodies conducting benchmarking
- Academic and research groups studying AI in court contexts
- Digital transformation teams building court system integrations
- Legal aid technology teams testing self-represented litigant tools
- Procurement officers assessing AI product claims
Offer
Available formats
Standard Pack
50-matter batch
Fifty synthetic matters across a single jurisdiction and procedural profile. Configurable edge-case percentage, representation mix, and evidence quality. Delivered as structured folders with full batch index.
Evaluation Pack
200-matter multi-profile batch
Two hundred matters spanning multiple procedural postures, representation types, and noise profiles. Suitable for systematic AI evaluation and benchmarking. Includes hearing artifacts and evidence inventories.
Custom Commission
Bespoke matter configuration
Tailored to your jurisdiction, court level, matter types, and evaluation objectives. Configurable at the schema level — specific vulnerability flags, procedural complexity profiles, and noise characteristics.
Enterprise Licence
Generator access + ongoing packs
Access to the underlying generator with configuration rights, schema documentation, and ongoing pack generation on demand. Suitable for vendors building AI products for the justice sector.
Use Cases
What test packs enable
AI document classification
Test whether a document classification model correctly identifies pleading type, procedural posture, party roles, and filing context across varied document sets.
Transcription pipeline testing
Evaluate transcription AI against structured hearing scaffolds with audio placeholders, attendance records, and procedural orders — aligned to realistic court settings.
Information extraction benchmarking
Assess extraction accuracy against known ground-truth values in case_truth.json — testing party identification, date extraction, claim value parsing, and relief interpretation.
Noise and anomaly handling
Test how AI systems respond to the kinds of documentary noise that appear in real matters — duplicate exhibits, inconsistent labels, padded witness statements, and partial chains of custody.
Representation and access equity
Evaluate AI behaviour across self-represented litigants, legal aid matters, interpreter-required proceedings, and matters with vulnerability and support flags.
Regression and load testing
Use seeded reproducible batches for regression testing — the same seed always produces identical matter configurations, enabling consistent comparison across model versions.
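One way a regression harness might verify that two runs from the same seed produced identical configurations is to diff their case_truth.json payloads while ignoring run-specific metadata. This is a sketch under assumptions: the volatile field name `generated_at` is hypothetical.

```python
def same_configuration(a: dict, b: dict,
                       volatile: tuple = ("generated_at",)) -> bool:
    """True if two case_truth.json payloads match, ignoring volatile
    run metadata (the "generated_at" field name is an assumption)."""
    strip = lambda d: {k: v for k, v in d.items() if k not in volatile}
    return strip(a) == strip(b)
```

Running this across matter pairs from two seeded batches gives a quick confirmation that the fixture set is stable before comparing model versions against it.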
Delivery
Format and structure
All packs are delivered as structured folder hierarchies with machine-readable indices, document files, and metadata.
- Per-matter folders with .docx documents
- case_truth.json — full matter schema and provenance
- batch_index.json and batch_index.csv — machine-readable indices
- evidence/ — inventory, exhibit register CSV, evidence schedule
- hearings/ — transcript markdown and audio placeholder MP3
- metadata/provenance.json — full audit trail per matter
- Delivered as ZIP archive or via OneDrive / secure transfer
- JusticeData-aligned schema documentation included
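A receiving team could sanity-check each matter folder against the delivery structure listed above. A minimal sketch, operating on a listing of relative paths within one matter folder (exact filenames beyond those documented are assumptions):

```python
def check_matter_pack(paths) -> list:
    """Return the expected entries missing from a matter folder listing.

    Expects the delivered layout: case_truth.json,
    metadata/provenance.json, evidence/ and hearings/ subfolders, and at
    least one rendered .docx document.
    """
    paths = set(paths)
    missing = []
    for required in ("case_truth.json", "metadata/provenance.json"):
        if required not in paths:
            missing.append(required)
    if not any(p.startswith("evidence/") for p in paths):
        missing.append("evidence/")
    if not any(p.startswith("hearings/") for p in paths):
        missing.append("hearings/")
    if not any(p.endswith(".docx") for p in paths):
        missing.append("*.docx")
    return missing
```

An empty return value means the folder matches the documented layout; anything else can be flagged before the pack enters an evaluation pipeline.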
Differentiation
Why these packs are different
Procedurally grounded
Documents reflect actual procedural posture. An interlocutory bundle generates a notice of motion, supporting affidavit, and draft orders — not a generic document set.
Realistic noise, not random corruption
Noise profiles simulate real documentary conditions — duplicate exhibits, padded affidavits, inconsistent chronology notes — traceable and reproducible, not random mangling.
JusticeData-aligned schema
Metadata follows a structured, modular schema aligned to JusticeData MVP — enabling integration with broader justice data infrastructure.
Edge cases are first-class
Fourteen documented edge-case profiles with verification metadata — missing solicitor details, sparse pleadings, conflicting roles — each with expected system behaviour documented.
Access equity built in
Representation mix, vulnerability flags, and interpreter-required settings are configurable dimensions — not an afterthought.
Audit trail per matter
Every matter carries a provenance log recording the seed, profiles, bundles, and validation results — enabling full reproducibility and systematic regression testing.
Enquiries
Request a sample or discuss a commission
Test packs are available by commission. Contact us to discuss your evaluation objectives, jurisdiction, and volume requirements.