Justice Sector · Synthetic Data
Synthetic Justice
Data Test Packs
Structured synthetic legal matter packs for evaluating AI systems, testing court technology, and benchmarking digital justice infrastructure before deployment.
The Problem
Evaluating justice AI requires realistic legal data.
AI systems entering courts, tribunals, and legal aid services need to be tested against realistic, structured legal matter data — not generic test datasets that look nothing like actual proceedings.
Real court records are protected. Naively fabricated data is too clean, too simple, and too coherent to reveal how systems behave under operational conditions — with real documentary noise, procedural variation, and the kinds of evidentiary gaps that characterise actual litigation.
Synthetic Justice Data Test Packs provide structured, jurisdiction-aware, procedurally varied matter packs designed to stress-test AI systems in conditions that approximate real-world deployment.
Product
What you receive
Documents
Complete matter packs
Each pack includes a full set of procedurally appropriate documents — from originating process and pleadings through to evidence schedules, draft orders, and judgment scaffolds — rendered in proper .docx format.
Structure
JusticeData-aligned schema
All matter metadata follows a modular schema aligned to JusticeData MVP — including case lifecycle, parties and representation, hearing events, evidence inventory, fees, and confidentiality markers.
Variation
Procedural diversity
Matters span seven procedural postures — commencement through appeal — with configurable representation mix, evidence quality, vulnerability flags, and noise profiles that simulate operational documentary conditions.
Artifacts
Hearing events & transcripts
Each matter pack optionally includes structured hearing event metadata, synthetic transcript scaffolds, and audio placeholder files — enabling testing of transcription and audio-processing pipelines.
Evidence
Evidence inventory
A structured evidence inventory and exhibit register per matter, with evidence dates that correctly extend before commencement — as in real proceedings.
QA
Batch index & provenance
Every batch includes a machine-readable index, spreadsheet export, and per-matter provenance log recording every profile applied — enabling systematic QA and regression testing.
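As an illustration of how the machine-readable index supports systematic QA, here is a minimal sketch that summarises a parsed batch index. The field names used (`matters`, `procedural_posture`, `edge_case`) are assumptions for illustration, not the documented schema:

```python
from collections import Counter

def summarise_batch(index: dict) -> dict:
    """Summarise a parsed batch_index.json for a QA report.

    Assumes (hypothetically) that the index holds a "matters" list whose
    entries carry "procedural_posture" and "edge_case" fields.
    """
    matters = index.get("matters", [])
    postures = Counter(m.get("procedural_posture", "unknown") for m in matters)
    edge_cases = sum(1 for m in matters if m.get("edge_case"))
    return {
        "total_matters": len(matters),
        "by_posture": dict(postures),
        "edge_case_count": edge_cases,
    }
```

A QA pipeline could run a summary like this per batch and diff it against the commissioned configuration (matter count, posture mix, edge-case percentage) before accepting delivery.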
Samples
Review before you enquire
Sample Originating Process
A rendered originating process from a commercial dispute matter pack — commencement bundle, Technology & Construction List, NSW Supreme Court.
Request sample ↓
Sample Chronology
A chronology document demonstrating procedural event sequencing, noise-affected date notation, and conditional section rendering.
Request sample ↓
Sample Case Metadata
A complete case_truth.json from a generated matter — including JusticeData MVP fields, batch metadata, and provenance log.
Audience
Who uses test packs
Test packs are designed for technical teams and evaluators working at the intersection of AI and justice sector deployment.
- Court technology teams evaluating AI document processing
- Legal technology vendors preparing for tender or procurement
- Justice sector AI governance bodies conducting benchmarking
- Academic and research groups studying AI in court contexts
- Digital transformation teams building court system integrations
- Legal aid technology teams testing self-represented litigant tools
- Procurement officers assessing AI product claims
Offer
Available formats
Standard Pack
50-matter batch
Fifty synthetic matters across a single jurisdiction and procedural profile. Configurable edge-case percentage, representation mix, and evidence quality. Delivered as structured folders with full batch index.
Evaluation Pack
200-matter multi-profile batch
Two hundred matters spanning multiple procedural postures, representation types, and noise profiles. Suitable for systematic AI evaluation and benchmarking. Includes hearing artifacts and evidence inventories.
Custom Commission
Bespoke matter configuration
Tailored to your jurisdiction, court level, matter types, and evaluation objectives. Configurable at the schema level — specific vulnerability flags, procedural complexity profiles, and noise characteristics.
Enterprise Licence
Generator access + ongoing packs
Access to the underlying generator with configuration rights, schema documentation, and ongoing pack generation on demand. Suitable for vendors building AI products for the justice sector.
Use Cases
What test packs enable
AI document classification
Test whether a document classification model correctly identifies pleading type, procedural posture, party roles, and filing context across varied document sets.
Transcription pipeline testing
Evaluate transcription AI against structured hearing scaffolds with audio placeholders, attendance records, and procedural orders — aligned to realistic court settings.
Information extraction benchmarking
Assess extraction accuracy against known ground-truth values in case_truth.json — testing party identification, date extraction, claim value parsing, and relief interpretation.
Noise and anomaly handling
Test how AI systems respond to the kinds of documentary noise that appear in real matters — duplicate exhibits, inconsistent labels, padded witness statements, and partial chains of custody.
Representation and access equity
Evaluate AI behaviour across self-represented litigants, legal aid matters, interpreter-required proceedings, and matters with vulnerability and support flags.
Regression and load testing
Use seeded reproducible batches for regression testing — the same seed always produces identical matter configurations, enabling consistent comparison across model versions.
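One way a regression harness might verify that two runs from the same seed produced identical configurations is to diff their case_truth.json payloads while ignoring run-specific metadata. This is a sketch under assumptions: the volatile field name `generated_at` is hypothetical.

```python
def same_configuration(a: dict, b: dict,
                       volatile: tuple = ("generated_at",)) -> bool:
    """True if two case_truth.json payloads match, ignoring volatile
    run metadata (the "generated_at" field name is an assumption)."""
    strip = lambda d: {k: v for k, v in d.items() if k not in volatile}
    return strip(a) == strip(b)
```

Running this across matter pairs from two seeded batches gives a quick confirmation that the fixture set is stable before comparing model versions against it.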
Delivery
Format and structure
All packs are delivered as structured folder hierarchies with machine-readable indices, document files, and metadata.
- Per-matter folders with .docx documents
- case_truth.json — full matter schema and provenance
- batch_index.json and batch_index.csv — machine-readable indices
- evidence/ — inventory, exhibit register CSV, evidence schedule
- hearings/ — transcript markdown and audio placeholder MP3
- metadata/provenance.json — full audit trail per matter
- Delivered as ZIP archive or via OneDrive / secure transfer
- JusticeData-aligned schema documentation included
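A receiving team could sanity-check each matter folder against the delivery structure listed above. A minimal sketch, operating on a listing of relative paths within one matter folder (exact filenames beyond those documented are assumptions):

```python
def check_matter_pack(paths) -> list:
    """Return the expected entries missing from a matter folder listing.

    Expects the delivered layout: case_truth.json,
    metadata/provenance.json, evidence/ and hearings/ subfolders, and at
    least one rendered .docx document.
    """
    paths = set(paths)
    missing = []
    for required in ("case_truth.json", "metadata/provenance.json"):
        if required not in paths:
            missing.append(required)
    if not any(p.startswith("evidence/") for p in paths):
        missing.append("evidence/")
    if not any(p.startswith("hearings/") for p in paths):
        missing.append("hearings/")
    if not any(p.endswith(".docx") for p in paths):
        missing.append("*.docx")
    return missing
```

An empty return value means the folder matches the documented layout; anything else can be flagged before the pack enters an evaluation pipeline.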
Differentiation
Why these packs are different
Procedurally grounded
Documents reflect actual procedural posture. An interlocutory bundle generates a notice of motion, supporting affidavit, and draft orders — not a generic document set.
Realistic noise, not random corruption
Noise profiles simulate real documentary conditions — duplicate exhibits, padded affidavits, inconsistent chronology notes — traceable and reproducible, not random mangling.
JusticeData-aligned schema
Metadata follows a structured, modular schema aligned to JusticeData MVP — enabling integration with broader justice data infrastructure.
Edge cases are first-class
Fourteen documented edge-case profiles with verification metadata — missing solicitor details, sparse pleadings, conflicting roles — each with expected system behaviour documented.
Access equity built in
Representation mix, vulnerability flags, and interpreter-required settings are configurable dimensions — not an afterthought.
Audit trail per matter
Every matter carries a provenance log recording the seed, profiles, bundles, and validation results — enabling full reproducibility and systematic regression testing.
Enquiries
Request a sample or discuss a commission
Test packs are available by commission. Contact us to discuss your evaluation objectives, jurisdiction, and volume requirements.