Synthetic Justice Data Test Packs

Justice Sector · Synthetic Data

Structured synthetic legal matter packs for evaluating AI systems, testing court technology, and benchmarking digital justice infrastructure before deployment.

The Problem

Evaluating justice AI requires realistic legal data.

AI systems entering courts, tribunals, and legal aid services need to be tested against realistic, structured legal matter data, not against generic synthetic datasets that bear little resemblance to actual proceedings.

Real court records are protected. Hand-fabricated test data is too clean, too simple, and too coherent to reveal how systems behave under operational conditions: real documentary noise, procedural variation, and the kinds of evidentiary gaps that characterise actual litigation.

Synthetic Justice Data Test Packs provide structured, jurisdiction-aware, procedurally varied matter packs designed to stress-test AI systems in conditions that approximate real-world deployment.

Product

What you receive

Documents

Complete matter packs

Each pack includes a full set of procedurally appropriate documents, from originating process and pleadings through to evidence schedules, draft orders, and judgment scaffolds, rendered as properly formatted .docx files.

Structure

JusticeData-aligned schema

All matter metadata follows a modular schema aligned to JusticeData MVP — including case lifecycle, parties and representation, hearing events, evidence inventory, fees, and confidentiality markers.
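
As an illustration of the shape this metadata takes, the sketch below reads a case_truth.json and pulls out the schema dimensions named above. Every key name and path here is an assumption for illustration, since the actual JusticeData MVP field list is not reproduced on this page.

    # Illustrative sketch only: the key names below are assumptions based on
    # the schema dimensions described above, not confirmed JusticeData MVP keys.
    import json

    with open("matter_0001/case_truth.json") as f:   # path is illustrative
        case = json.load(f)

    lifecycle = case["case_lifecycle"]       # filing through finalisation
    parties   = case["parties"]              # roles and representation status
    hearings  = case["hearing_events"]       # listed events and outcomes
    evidence  = case["evidence_inventory"]   # exhibit-level records
    fees      = case["fees"]                 # fee records
    privacy   = case["confidentiality"]      # suppression and access markers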

Variation

Procedural diversity

Matters span seven procedural postures — commencement through appeal — with configurable representation mix, evidence quality, vulnerability flags, and noise profiles that simulate operational documentary conditions.

Artifacts

Hearing events & transcripts

Each matter pack optionally includes structured hearing event metadata, synthetic transcript scaffolds, and audio placeholder files — enabling testing of transcription and audio-processing pipelines.

Evidence

Evidence inventory

A structured evidence inventory and exhibit register per matter, with evidence dates that extend back before commencement, as they do in real proceedings.
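
A quick sanity check of that property might look like the following sketch; the field names and path are assumptions.

    # Sketch: confirm the earliest evidence date precedes commencement,
    # as it would in a real proceeding. Key names are assumptions.
    import json
    from datetime import date

    with open("matter_0001/case_truth.json") as f:
        case = json.load(f)

    commenced = date.fromisoformat(case["commencement_date"])
    earliest = min(date.fromisoformat(item["date"])
                   for item in case["evidence_inventory"])
    assert earliest < commenced, "expected pre-commencement evidence"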

QA

Batch index & provenance

Every batch includes a machine-readable index, spreadsheet export, and per-matter provenance log recording every profile applied — enabling systematic QA and regression testing.
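
For instance, a batch-level QA pass might walk the index and read each matter's provenance log. The structure assumed here (a matters list in the index, seed and profile keys in the log) is illustrative rather than a confirmed format.

    # Sketch of a batch QA pass. Index and log structures are assumptions.
    import json
    from pathlib import Path

    batch = Path("batch_001")                          # illustrative path
    index = json.loads((batch / "batch_index.json").read_text())

    for entry in index["matters"]:
        prov_file = batch / entry["id"] / "metadata" / "provenance.json"
        prov = json.loads(prov_file.read_text())
        print(entry["id"], prov["seed"], prov["profiles"])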

Samples

Review before you enquire

.docx · NSW Supreme Court

Sample Originating Process

A rendered originating process from a commercial dispute matter pack — commencement bundle, Technology & Construction List, NSW Supreme Court.

Request sample ↓

.docx · Chronology document

Sample Chronology

A chronology document demonstrating procedural event sequencing, noise-affected date notation, and conditional section rendering.

Request sample ↓

.json · Case metadata

Sample Case Metadata

A complete case_truth.json from a generated matter — including JusticeData MVP fields, batch metadata, and provenance log.

Request sample ↓

Audience

Who uses test packs

Test packs are designed for technical teams and evaluators working at the intersection of AI and justice sector deployment.

  • Court technology teams evaluating AI document processing
  • Legal technology vendors preparing for tender or procurement
  • Justice sector AI governance bodies conducting benchmarking
  • Academic and research groups studying AI in court contexts
  • Digital transformation teams building court system integrations
  • Legal aid technology teams testing self-represented litigant tools
  • Procurement officers assessing AI product claims

Offer

Available formats

Standard Pack

50-matter batch

Fifty synthetic matters across a single jurisdiction and procedural profile. Configurable edge-case percentage, representation mix, and evidence quality. Delivered as structured folders with full batch index.

Evaluation Pack

200-matter multi-profile batch

Two hundred matters spanning multiple procedural postures, representation types, and noise profiles. Suitable for systematic AI evaluation and benchmarking. Includes hearing artifacts and evidence inventories.

Custom Commission

Bespoke matter configuration

Tailored to your jurisdiction, court level, matter types, and evaluation objectives. Configurable at the schema level — specific vulnerability flags, procedural complexity profiles, and noise characteristics.

Enterprise Licence

Generator access + ongoing packs

Access to the underlying generator with configuration rights, schema documentation, and ongoing pack generation on demand. Suitable for vendors building AI products for the justice sector.
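
Purely to illustrate the dimensions named across these formats, a commission specification could be sketched along the following lines. Every key here is hypothetical, not the generator's actual configuration schema.

    # Hypothetical commission spec. All keys are illustrative only.
    pack_config = {
        "jurisdiction": "NSW",
        "court_level": "Supreme Court",
        "matter_count": 50,
        "procedural_postures": ["commencement", "interlocutory", "appeal"],
        "representation_mix": {"represented": 0.7, "self_represented": 0.3},
        "edge_case_percentage": 10,
        "evidence_quality": "mixed",
        "noise_profile": "operational",
        "seed": 42,
    }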

Use Cases

What test packs enable

AI document classification

Test whether a document classification model correctly identifies pleading type, procedural posture, party roles, and filing context across varied document sets.

Transcription pipeline testing

Evaluate transcription AI against structured hearing scaffolds with audio placeholders, attendance records, and procedural orders — aligned to realistic court settings.

Information extraction benchmarking

Assess extraction accuracy against known ground-truth values in case_truth.json — testing party identification, date extraction, claim value parsing, and relief interpretation.
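
A minimal benchmarking loop over that ground truth might look like this sketch, where extract_fields stands in for the system under test and the field names compared are assumptions.

    # Sketch: score an extraction system against case_truth.json.
    # extract_fields() is a placeholder for the system under test;
    # the field names compared are assumptions.
    import json
    from pathlib import Path

    FIELDS = ("parties", "commencement_date", "claim_value", "relief_sought")

    def score_matter(folder: Path, extract_fields) -> float:
        truth = json.loads((folder / "case_truth.json").read_text())
        predicted = extract_fields(folder)   # parses the .docx documents
        hits = sum(predicted.get(f) == truth.get(f) for f in FIELDS)
        return hits / len(FIELDS)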

Noise and anomaly handling

Test how AI systems respond to the kinds of documentary noise that appear in real matters — duplicate exhibits, inconsistent labels, padded witness statements, and partial chains of custody.
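
As a baseline for that kind of test, the sketch below scans an exhibit register for injected duplicates; the file name and column headings are assumptions.

    # Sketch: flag duplicate exhibit descriptions in a matter's register.
    # File name and column headings are assumptions.
    import csv
    from collections import Counter

    with open("matter_0001/evidence/exhibit_register.csv", newline="") as f:
        rows = list(csv.DictReader(f))

    counts = Counter(row["description"].strip().lower() for row in rows)
    duplicates = [desc for desc, n in counts.items() if n > 1]
    print(f"{len(duplicates)} duplicated exhibit descriptions found")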

Representation and access equity

Evaluate AI behaviour across self-represented litigants, legal aid matters, interpreter-required proceedings, and matters with vulnerability and support flags.

Regression and load testing

Use seeded, reproducible batches for regression testing: the same seed always produces identical matter configurations, enabling consistent comparison across model versions, while larger batches double as load-test fixtures.
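
That guarantee can be checked mechanically. A sketch, assuming two deliveries generated from the same seed sit side by side (directory names are illustrative):

    # Sketch: two batches built from the same seed should be byte-identical
    # at the index level. Directory names are illustrative.
    import hashlib
    from pathlib import Path

    def index_digest(batch_dir: str) -> str:
        data = Path(batch_dir, "batch_index.json").read_bytes()
        return hashlib.sha256(data).hexdigest()

    assert index_digest("batch_seed42_run1") == index_digest("batch_seed42_run2")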

Delivery

Format and structure

All packs are delivered as structured folder hierarchies with machine-readable indices, document files, and metadata. A minimal layout-check sketch follows the list.

  • Per-matter folders with .docx documents
  • case_truth.json — full matter schema and provenance
  • batch_index.json and batch_index.csv — machine-readable indices
  • evidence/ — inventory, exhibit register CSV, evidence schedule
  • hearings/ — transcript markdown and audio placeholder MP3
  • metadata/provenance.json — full audit trail per matter
  • Delivered as ZIP archive or via OneDrive / secure transfer
  • JusticeData-aligned schema documentation included
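
A minimal layout check over a delivered batch, assuming the per-matter layout listed above (the batch path is illustrative):

    # Sketch: verify each matter folder contains the expected artifacts.
    # Folder names follow the delivery list above; batch path is illustrative.
    from pathlib import Path

    EXPECTED = ("case_truth.json", "metadata/provenance.json", "evidence")
    # hearings/ is present only when hearing artifacts are commissioned

    for matter in sorted(p for p in Path("batch_001").iterdir() if p.is_dir()):
        missing = [rel for rel in EXPECTED if not (matter / rel).exists()]
        if missing:
            print(matter.name, "missing:", ", ".join(missing))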

Differentiation

Why these packs are different

Procedurally grounded

Documents reflect actual procedural posture. An interlocutory bundle generates a notice of motion, supporting affidavit, and draft orders — not a generic document set.

Realistic noise, not random corruption

Noise profiles simulate real documentary conditions — duplicate exhibits, padded affidavits, inconsistent chronology notes — traceable and reproducible, not random mangling.

JusticeData-aligned schema

Metadata follows a structured, modular schema aligned to JusticeData MVP — enabling integration with broader justice data infrastructure.

Edge cases are first-class

Fourteen edge-case profiles with verification metadata, covering scenarios such as missing solicitor details, sparse pleadings, and conflicting party roles, each with its expected system behaviour documented.

Access equity built in

Representation mix, vulnerability flags, and interpreter-required settings are configurable dimensions — not an afterthought.

Audit trail per matter

Every matter carries a provenance log recording the seed, profiles, bundles, and validation results — enabling full reproducibility and systematic regression testing.

Enquiries

Request a sample or discuss a commission

Test packs are available by commission. Contact us to discuss your evaluation objectives, jurisdiction, and volume requirements.