Rassada Pier schedule labeling dataset

1) Purpose
This folder contains schedule datasets exported from the PHP $schedules array.
These files standardize time format, destination parsing, and boat type naming so the data can be reused in pages, APIs, and AI retrieval.

2) Files
1) schedule.json
2) schedule.jsonl
3) README.txt

3) Record schema (one trip)
Fields
1) id
2) origin
3) timezone
4) destination_text
5) destination_parts
6) departure_time
7) arrival_time
8) duration_min
9) boat_type
10) class_info
11) tags
12) source

Definitions
1) id: stable row identifier, rassada_schedule_0001 format
2) origin: "Rassada Pier"
3) timezone: "Asia/Bangkok"
4) destination_text: human readable destination string
5) destination_parts: list of the individual stops named in destination_text; always at least one item
6) departure_time: 24 hour HH:MM
7) arrival_time: 24 hour HH:MM
8) duration_min: integer minutes from departure_time to arrival_time
9) boat_type: "ferry boat" or "speedboat" only
10) class_info: label for display or filtering, for example "Standard Class" or "Premium Class"
11) tags: list of tags, for example ["fastest_option"]
12) source: origin pointer, { "file": "index_schedule.php", "row_index": N }
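
Example (illustrative sketch)
The Python snippet below shows what one record could look like under the definitions above, and how duration_min could be derived from the two time fields. All values are placeholders, not rows from the dataset. The stop names, the 0-based row_index, and the helper name duration_minutes are assumptions made for illustration. The helper assumes both times fall on the same day.

```python
from datetime import datetime

# Hypothetical record for illustration only. The stop names, times, and
# row_index value are placeholders, not taken from the dataset.
example_record = {
    "id": "rassada_schedule_0001",
    "origin": "Rassada Pier",
    "timezone": "Asia/Bangkok",
    "destination_text": "Stop A, Stop B",
    "destination_parts": ["Stop A", "Stop B"],
    "departure_time": "08:30",
    "arrival_time": "10:30",
    "duration_min": 120,
    "boat_type": "ferry boat",
    "class_info": "Standard Class",
    "tags": [],
    "source": {"file": "index_schedule.php", "row_index": 0},
}


def duration_minutes(departure_time: str, arrival_time: str) -> int:
    """Return whole minutes between two same-day HH:MM times."""
    fmt = "%H:%M"
    delta = datetime.strptime(arrival_time, fmt) - datetime.strptime(departure_time, fmt)
    return int(delta.total_seconds() // 60)
```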

4) Normalization rules (locked)
1) Time format is HH:MM and uses 24 hour time.
2) Type mapping
- Ferry becomes "ferry boat"
- Speedboat becomes "speedboat"
3) Destination split
- Any HTML <br> tag in the source destination string is treated as a separator; each segment becomes one destination_parts item.
- destination_text is the comma-separated, human-readable form of the same stops.
4) Fastest tag
- If the PHP row has fastest set to true, tags includes "fastest_option".
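
Example (illustrative sketch)
The rules above can be sketched in Python as follows. This is not the actual export script. The input keys "destination", "type", and "fastest" are assumed names for the PHP row fields, and normalize_row is a hypothetical helper. The sketch covers type mapping, destination split, and the fastest tag only.

```python
import re

# Rule 2: type mapping from source labels to normalized boat_type values.
TYPE_MAP = {"Ferry": "ferry boat", "Speedboat": "speedboat"}

# Rule 3: matches <br>, <br/>, and <br />, in any letter case.
BR_PATTERN = re.compile(r"<br\s*/?>", re.IGNORECASE)


def normalize_row(row: dict) -> dict:
    """Apply the type, destination, and fastest-tag rules to one source row.

    The input keys used here are assumptions, not confirmed field names.
    """
    parts = [p.strip() for p in BR_PATTERN.split(row["destination"]) if p.strip()]
    tags = ["fastest_option"] if row.get("fastest") else []
    return {
        "destination_text": ", ".join(parts),
        "destination_parts": parts,
        "boat_type": TYPE_MAP[row["type"]],  # raises KeyError on an unknown type
        "tags": tags,
    }
```

An unknown type raises KeyError on purpose, so an unmapped value cannot slip through as a third boat_type.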

5) Quality checks
1) departure_time and arrival_time match regex ^([01][0-9]|2[0-3]):[0-5][0-9]$ (hours 00 to 23, so values such as 24:10 or 29:59 are rejected)
2) duration_min is greater than 0
3) boat_type is exactly "ferry boat" or "speedboat"
4) destination_parts length is at least 1
5) Each id is unique
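
Example (illustrative sketch)
The checks above could be automated roughly as shown below. The function name check_records and the error message wording are assumptions. The time pattern used here restricts hours to 00 through 23. The function returns a list of problems instead of stopping at the first one, so a single run reports every failing row.

```python
import re

TIME_RE = re.compile(r"([01][0-9]|2[0-3]):[0-5][0-9]")
BOAT_TYPES = {"ferry boat", "speedboat"}


def check_records(records: list[dict]) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    errors = []
    seen_ids = set()
    for rec in records:
        rid = rec.get("id")
        # Check 1: both time fields are valid 24 hour HH:MM strings.
        for field in ("departure_time", "arrival_time"):
            if not TIME_RE.fullmatch(str(rec.get(field, ""))):
                errors.append(f"{rid}: bad {field}")
        # Check 2: duration_min is a positive integer.
        duration = rec.get("duration_min")
        if not isinstance(duration, int) or duration <= 0:
            errors.append(f"{rid}: bad duration_min")
        # Check 3: boat_type is one of the two allowed values.
        if rec.get("boat_type") not in BOAT_TYPES:
            errors.append(f"{rid}: bad boat_type")
        # Check 4: at least one destination part.
        if len(rec.get("destination_parts") or []) < 1:
            errors.append(f"{rid}: empty destination_parts")
        # Check 5: ids are unique across the whole list.
        if rid in seen_ids:
            errors.append(f"{rid}: duplicate id")
        seen_ids.add(rid)
    return errors
```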

6) How to validate on Ubuntu
Commands
1) jq . schedule.json >/dev/null && echo "schedule.json OK"
2) wc -l schedule.jsonl
3) head -n 3 schedule.jsonl
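
Example (illustrative sketch)
Where jq is not available, a similar parse check for the JSONL file could be done in Python. The helper name load_jsonl is an assumption, and the sample below uses in-memory text with placeholder ids instead of the real file.

```python
import io
import json


def load_jsonl(stream) -> list[dict]:
    """Parse one JSON object per non-empty line; raise ValueError on a bad line."""
    records = []
    for line_no, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, such as a trailing newline
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no}: invalid JSON ({exc.msg})") from exc
    return records


# In-memory sample with placeholder ids. For the real file, pass
# open("schedule.jsonl", encoding="utf-8") instead.
sample = io.StringIO(
    '{"id": "rassada_schedule_0001"}\n{"id": "rassada_schedule_0002"}\n'
)
records = load_jsonl(sample)
print(len(records), "records parsed")
```

The record count printed here plays the same role as the wc -l line count in the commands above.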