Tya v0.23 Specification

This document specifies Tya v0.23, the release that follows v0.22 (unit testing).

Theme

Tya v0.23 expands the standard library with data-format and small utility modules, completes the deferred filesystem stdlib expansion, and adds three small global built-ins (ord, chr, and kind) that enable data-format work in Tya.

v0.23 introduces five pure-Tya modules — toml, json, csv, base64, and url — covering human-edited configuration, machine data interchange, tabular data, byte-to-text encoding, and URL handling. v0.23 also ships the filesystem expansion that was deferred from v0.22 (dir, file.remove / file.rename / file.stat, path.expand_user, os.cwd / os.chdir).

Three small global built-ins are added to enable pure-Tya data-format implementations: ord, chr, and kind. These are character-level and type-introspection primitives, not data-format functions; they unblock parser and emitter work in Tya itself.

Goals

partial parsing.

needed by pure-Tya data-format code.

Included in v0.23

v0.23 includes all v0.22 behavior and adds:

url.encode_query, url.decode_query)

Not Included in v0.23

v0.23 does not include:

Importing

All five modules are standard attached-library modules. They are not available without import.

import toml
import json
import csv
import base64
import url

The standard module search behavior from v0.17 applies.

toml

The toml module reads and writes TOML 1.0 documents.

Functions

Data model

| TOML kind | Tya kind |
|---|---|
| string | string |
| integer | int |
| float | float |
| boolean | bool |
| array | array |
| inline table | dict |
| table | dict |
| array of tables | array of dicts |
| offset date-time | string (as written, RFC 3339) |
| local date-time | string (as written) |
| local date | string (as written) |
| local time | string (as written) |

The top-level result of toml.parse(text) is always a dict.

Example

import toml

text = "
title = \"Example\"
[server]
host = \"127.0.0.1\"
port = 8080
"

config = toml.parse(text)
println config["title"]
println config["server"]["port"]

println toml.dump(config)

Subset

v0.23 supports the following TOML 1.0 features: comments, bare and quoted keys, dotted keys, basic and multi-line basic strings, literal and multi-line literal strings, integers (decimal/hex/oct/bin), floats including inf and nan, booleans, homogeneous and heterogeneous arrays, inline tables, [table] headers, [[array.of.tables]] headers, and date/time scalars (returned as strings).

Errors

toml.parse(text) raises a structured error when the input contains a syntax error. The error includes the line number where the problem was detected.

toml.dump(value) requires a dict at the top level. It raises a structured error when the value contains nil, a function, class, object, module, or a dict with a non-string key.

json

The json module reads and writes JSON documents (RFC 8259).

Functions

Data model

| JSON kind | Tya kind |
|---|---|
| object | dict |
| array | array |
| string | string |
| number (integer) | int |
| number (fractional or exponent) | float |
| true / false | bool |
| null | nil |

Example

import json

text = "{\"name\": \"tya\", \"versions\": [0.22, 0.23]}"
data = json.parse(text)
println data["name"]
println data["versions"][1]

println json.dump(data)
println json.dump(data, 2)

Behavior

json.parse(text) parses a complete JSON document. The top-level value may be any JSON value (object, array, string, number, boolean, or null).

json.dump(value) emits compact JSON (no insignificant whitespace).

json.dump(value, indent) emits pretty-printed JSON with indent spaces of indentation per nesting level. indent must be a non-negative integer.

Errors

json.parse(text) raises a structured error when the input contains a syntax error. The error includes a line number and a short message.

json.dump(value) raises a structured error when the value contains a function, class, object, module, a dict with a non-string key, or a float that is nan, inf, or -inf (JSON does not represent these).
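The number mapping in the data model and the rejection of non-finite floats can be illustrated with a short Python sketch. This is not Tya code and not the json module's implementation; the helper names number_kind and dump_strict are hypothetical, and Python's own json library is used only because it happens to follow the same two rules.

```python
import json


def number_kind(literal):
    """Classify a JSON number literal as "int" or "float".

    Python's parser, like the table above, yields an integer only when the
    literal has no fraction and no exponent.
    """
    value = json.loads(literal)
    return "int" if isinstance(value, int) else "float"


def dump_strict(value):
    """Emit compact JSON and refuse nan, inf, and -inf (raises ValueError)."""
    return json.dumps(value, separators=(",", ":"), allow_nan=False)


print(number_kind("42"))             # int
print(number_kind("4.2"))            # float
print(number_kind("1e2"))            # float
print(dump_strict({"a": [1, 2.5]}))  # {"a":[1,2.5]}
```

In this sketch a nan or inf value surfaces as a Python ValueError; Tya reports the same condition as a structured error.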

csv

The csv module reads and writes CSV documents (RFC 4180).

Functions

Data model

CSV is row-oriented. By default csv.parse(text) returns an array of arrays of strings, one inner array per row. With the header: true option, it returns an array of dicts using the first row as keys.

csv.dump(rows) accepts:

and their union determines the header row)

Every CSV value is emitted and parsed as a string. Numeric or boolean inference is not performed in v0.23.

Options

options is a dict with the following keys; all are optional:

TSV.

true on dump, emit the dict keys as the first row.

Example

import csv

text = "name,age\ntya,1\nzig,12\n"
rows = csv.parse(text, { header: true })
for row in rows
  println "{row[\"name\"]}: {row[\"age\"]}"

raw = [["a", "b"], ["1", "2"]]
println csv.dump(raw)

Behavior

csv.parse(text) accepts CRLF or LF line endings.

csv.dump(rows) emits LF line endings and quotes fields that contain the separator, double-quote, CR, or LF. Double-quotes inside quoted fields are doubled per RFC 4180.
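The quoting rule above can be sketched in a few lines of Python. This is an illustration of the rule only, not the csv module's implementation: the helper names quote_field and dump_rows are hypothetical, and emitting an LF after the final row is an assumption that this specification does not state.

```python
def quote_field(field, sep=","):
    """Quote a field only if it contains the separator, a double-quote, CR, or LF."""
    needs_quotes = any(ch in field for ch in (sep, '"', "\r", "\n"))
    if not needs_quotes:
        return field
    # Double-quotes inside a quoted field are doubled, per RFC 4180.
    return '"' + field.replace('"', '""') + '"'


def dump_rows(rows, sep=","):
    """Join string rows with LF line endings (trailing LF is an assumption)."""
    return "".join(sep.join(quote_field(f, sep) for f in row) + "\n" for row in rows)


print(quote_field("plain"))       # plain
print(quote_field("x,y"))         # "x,y"
print(quote_field('say "hi"'))    # "say ""hi"""
print(dump_rows([["a", "b"], ["1", "2"]]), end="")
```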

Errors

csv.parse(text) raises a structured error on unterminated quoted fields or malformed escapes.

csv.dump(rows) raises a structured error when rows are not a uniform structure (mixed array-of-arrays and array-of-dicts), when a row contains non-string values, or when dict rows have inconsistent keys.

base64

The base64 module encodes and decodes Base64 (RFC 4648).

Functions

Behavior

base64.encode(text) encodes a Tya string (UTF-8 bytes) and returns the Base64 representation as a string. Output uses the standard alphabet with = padding.

base64.decode(text) decodes a Base64 string and returns a Tya string. It accepts standard alphabet input with or without padding. Whitespace inside the input is ignored.
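The decoding rules above (optional padding, ignored whitespace, UTF-8 result) can be sketched with Python's standard library. This is an illustration, not the Tya module: decode_text is a hypothetical helper, and restoring padding before decoding is just one way to accept unpadded input.

```python
import base64


def decode_text(text):
    """Decode standard-alphabet Base64 to a UTF-8 string.

    Whitespace is removed first, missing "=" padding is restored, and any
    character outside the standard alphabet is rejected (validate=True).
    """
    compact = "".join(text.split())
    padded = compact + "=" * (-len(compact) % 4)
    raw = base64.b64decode(padded, validate=True)
    return raw.decode("utf-8")  # raises UnicodeDecodeError on non-UTF-8 bytes


print(decode_text("aGVsbG8="))    # hello
print(decode_text("aGVsbG8"))     # hello
print(decode_text("aGVs\nbG8="))  # hello
```

In this sketch malformed input surfaces as a Python ValueError (binascii.Error or UnicodeDecodeError); Tya reports the same conditions as structured errors.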

Example

import base64

encoded = base64.encode("hello")
println encoded                # aGVsbG8=

println base64.decode(encoded) # hello

Errors

base64.encode(text) requires a string argument.

base64.decode(text) raises a structured error on characters outside the standard alphabet, or on input that does not decode cleanly. The decoded result is interpreted as a UTF-8 string; non-UTF-8 byte sequences raise an error.

Out of scope

URL-safe Base64 (- and _ instead of + and /) is not part of v0.23. Binary blob handling is not part of v0.23 because Tya has no byte-array type yet.

url

The url module performs percent-encoding, percent-decoding, and basic URL parsing and construction.

Functions

Encoding helpers

url.encode(text) percent-encodes a string for safe inclusion in a URL component. Characters that are unreserved per RFC 3986 (ASCII letters, digits, -, ., _, ~) are passed through; every other byte of the string's UTF-8 encoding is emitted as %XX, where XX is the byte value in hexadecimal.

url.decode(text) reverses percent-encoding. It decodes %XX to bytes and returns the result interpreted as a UTF-8 string. Plus signs are not treated as spaces; for that, use url.decode_query.
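The two encoding helpers can be sketched in Python as follows. This is an illustration of the rules above, not the url module's implementation: percent_encode and percent_decode are hypothetical names, and the use of uppercase hexadecimal digits is an assumption, since this specification does not state the case.

```python
import string

# Unreserved characters per RFC 3986.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def percent_encode(text):
    """Pass unreserved characters through; emit every other UTF-8 byte as %XX."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        out.append(ch if ch in UNRESERVED else "%{:02X}".format(byte))
    return "".join(out)


def percent_decode(text):
    """Turn %XX escapes back into bytes and read the result as UTF-8.

    A plus sign is left untouched, as described for url.decode.
    """
    raw = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "%":
            digits = text[i + 1:i + 3]
            if len(digits) != 2 or any(c not in string.hexdigits for c in digits):
                raise ValueError("malformed percent-escape at index %d" % i)
            raw.append(int(digits, 16))
            i += 3
        else:
            raw.extend(text[i].encode("utf-8"))
            i += 1
    return raw.decode("utf-8")


print(percent_encode("hello world"))    # hello%20world
print(percent_decode("hello%20world"))  # hello world
print(percent_decode("a+b"))            # a+b
```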

Query helpers

url.encode_query(pairs) accepts either:

keys).

It emits key=value&key=value form, percent-encoding each key and value using the same rules as url.encode. Spaces are encoded as %20 (not +).

url.decode_query(text) parses a key=value&key=value string. It returns an array of [key, value] two-element arrays so that order and duplicate keys are preserved. Both + and %20 decode to a space, following the application/x-www-form-urlencoded convention.
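The shape of the query helpers' input and output can be illustrated with Python's urllib.parse. This is a sketch of the behavior, not the Tya implementation: encode_query and decode_query here are hypothetical Python functions, the encoder accepts only a list of pairs, and Python's parser is more lenient about malformed percent-escapes than the structured errors this specification requires.

```python
from urllib.parse import parse_qsl, quote


def encode_query(pairs):
    """Join [key, value] pairs as key=value&key=value; spaces become %20."""
    return "&".join(quote(k, safe="") + "=" + quote(v, safe="") for k, v in pairs)


def decode_query(text):
    """Return a list of [key, value] lists, keeping order and duplicate keys."""
    return [[k, v] for k, v in parse_qsl(text, keep_blank_values=True)]


print(encode_query([["q", "tya lang"], ["page", "2"]]))
# q=tya%20lang&page=2
print(decode_query("q=tya+lang&q=x%20y&page=2"))
# [['q', 'tya lang'], ['q', 'x y'], ['page', '2']]
```

Returning pairs rather than a dict is what keeps the second q entry from overwriting the first.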

Parsing and building

url.parse(text) returns a dict with the following string members:

Members that are absent in the input are returned as the empty string. The query is returned as the raw string; pass it to url.decode_query to get key/value pairs.

url.build(parts) is the inverse of url.parse. It accepts a dict with the same keys (any may be omitted) and returns a URL string.

Example

import url

println url.encode("hello world")            # hello%20world
println url.decode("hello%20world")          # hello world

q = url.encode_query({ q: "tya lang", page: "2" })
println q                                    # q=tya%20lang&page=2

parts = url.parse("https://example.com:8080/path?x=1#frag")
println parts["host"]                        # example.com
println parts["port"]                        # 8080

rebuilt = url.build({
  scheme: "https",
  host: "example.com",
  path: "/search",
  query: "q=tya",
})
println rebuilt                              # https://example.com/search?q=tya

Errors

All url.* functions raise a structured error on wrong argument kinds. url.decode and url.decode_query raise a structured error on malformed percent-escapes or non-UTF-8 byte sequences.

New Built-ins

v0.23 adds three small global built-ins that data-format code in Tya needs:

argument string. Empty input is an error.

Out-of-range values are an error.

"string", "array", "dict", "object", "function", "error".

ord and chr are byte-level. They are not full Unicode primitives. Multi-byte sequences must be handled by the caller (for example by reading raw UTF-8 bytes from a string).
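To see why byte-level primitives leave multi-byte handling to the caller, the following Python sketch lists the UTF-8 byte values of two short strings. It is an illustration only: utf8_bytes and from_utf8_bytes are hypothetical helpers and do not correspond to Tya built-ins.

```python
def utf8_bytes(text):
    """Return the UTF-8 byte values of text as a list of integers (0 to 255)."""
    return list(text.encode("utf-8"))


def from_utf8_bytes(values):
    """Rebuild a string from a complete sequence of UTF-8 byte values."""
    return bytes(values).decode("utf-8")


print(utf8_bytes("A"))              # [65]
print(utf8_bytes("\u00e9"))         # [195, 169]
print(from_utf8_bytes([195, 169]))  # é
```

An ASCII character occupies one byte, while é (U+00E9) occupies two, so code that walks a string byte by byte must reassemble such sequences itself.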

kind distinguishes integer and float numbers based on whether the numeric value is exactly representable as an integer.

Short-circuit Evaluation

v0.23 changes and and or to short-circuit. The right operand is evaluated only when the left operand does not already determine the result: for and, it is skipped when the left operand already makes the result false; for or, it is skipped when the left operand already makes the result true. This matches the conventional behavior of Boolean operators in mainstream languages and is required to write parser-style Tya code that guards index accesses, as in i < n and is_valid(s[i]).
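The guarded-index pattern that short-circuiting enables can be sketched in Python, whose and operator short-circuits in the same way. The names is_digit and scan_digits are hypothetical and stand in for the kind of scanner loop a pure-Tya parser would contain.

```python
def is_digit(ch):
    """Return True for the ASCII digits 0 through 9."""
    return "0" <= ch <= "9"


def scan_digits(s, start):
    """Consume a run of digits beginning at start; return (digits, next_index)."""
    i = start
    n = len(s)
    # s[i] is read only when i < n holds, so the loop cannot index past the end.
    while i < n and is_digit(s[i]):
        i += 1
    return s[start:i], i


print(scan_digits("123abc", 0))  # ('123', 3)
print(scan_digits("42", 0))      # ('42', 2)
```

Without short-circuiting, the second call would evaluate s[2] on a two-character string and fail with an out-of-range access.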

Filesystem Modules

v0.23 ships the filesystem stdlib expansion deferred from v0.22. See docs/STDLIB.md for usage details. Native operation failures raise structured errors so they can be caught with block try ... catch.

dictionary order, excluding . and ...

home directory.

Diagnostics

v0.23 implementations should report source-oriented errors for:

(with line number where applicable)

url.build

Diagnostics should mention the module name, function name, expected argument shape, and actual value kind when available.