Comma programs are written using a subset of the 8-bit character set ISO 8859-1 (Latin-1). This subset is called the standard character set. A character that occurs in the source code but is not present in the standard character set triggers a compile-time error.
The following table associates with each character a name, the hexadecimal value of its encoding, and a brief description. The character's name is the standard glyph used to present that character if it is a graphic character. For characters with no distinct (or visible) graphic representation, a symbolic name is provided.
Table 2.1. Standard character set
Char | Hex | Description | Char | Hex | Description |
---|---|---|---|---|---|
a | 61 | small a | A | 41 | capital A |
b | 62 | small b | B | 42 | capital B |
c | 63 | small c | C | 43 | capital C |
d | 64 | small d | D | 44 | capital D |
e | 65 | small e | E | 45 | capital E |
f | 66 | small f | F | 46 | capital F |
g | 67 | small g | G | 47 | capital G |
h | 68 | small h | H | 48 | capital H |
i | 69 | small i | I | 49 | capital I |
j | 6A | small j | J | 4A | capital J |
k | 6B | small k | K | 4B | capital K |
l | 6C | small l | L | 4C | capital L |
m | 6D | small m | M | 4D | capital M |
n | 6E | small n | N | 4E | capital N |
o | 6F | small o | O | 4F | capital O |
p | 70 | small p | P | 50 | capital P |
q | 71 | small q | Q | 51 | capital Q |
r | 72 | small r | R | 52 | capital R |
s | 73 | small s | S | 53 | capital S |
t | 74 | small t | T | 54 | capital T |
u | 75 | small u | U | 55 | capital U |
v | 76 | small v | V | 56 | capital V |
w | 77 | small w | W | 57 | capital W |
x | 78 | small x | X | 58 | capital X |
y | 79 | small y | Y | 59 | capital Y |
z | 7A | small z | Z | 5A | capital Z |
1 | 31 | digit 1 | 6 | 36 | digit 6 |
2 | 32 | digit 2 | 7 | 37 | digit 7 |
3 | 33 | digit 3 | 8 | 38 | digit 8 |
4 | 34 | digit 4 | 9 | 39 | digit 9 |
5 | 35 | digit 5 | 0 | 30 | digit 0 |
! | 21 | exclamation mark | $ | 24 | dollar sign |
" | 22 | quotation mark, or double quote | ' | 27 | apostrophe, or single quote |
( | 28 | left parenthesis, or open parenthesis | ) | 29 | right parenthesis, or close parenthesis |
, | 2C | comma | _ | 5F | low line, or underscore |
- | 2D | hyphen, or minus | . | 2E | full stop, period, or dot |
/ | 2F | solidus, or slash | : | 3A | colon |
; | 3B | semicolon | ? | 3F | question mark |
+ | 2B | plus | < | 3C | less-than |
= | 3D | equals | > | 3E | greater-than |
# | 23 | number sign, or sharp | % | 25 | percent |
& | 26 | ampersand | * | 2A | asterisk, or star |
@ | 40 | commercial at, or at-sign | [ | 5B | left bracket |
\ | 5C | reverse solidus, or backslash | ] | 5D | right bracket |
{ | 7B | left curly bracket, or left brace | | | 7C | vertical bar |
} | 7D | right curly bracket, or right brace | ` | 60 | grave accent, or backquote |
^ | 5E | circumflex accent | ~ | 7E | tilde |
HT | 09 | horizontal tab | VT | 0B | vertical tab |
CR | 0D | carriage return | LF | 0A | line feed |
SP | 20 | space | FF | 0C | form feed |
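
The rule that any character outside the standard character set is an error can be sketched as a simple byte-level check. The following is a minimal illustration in Python, not part of the Comma toolchain: the names `STANDARD_CHARSET` and `find_nonstandard` are hypothetical, and the set is built from the observation that Table 2.1 lists every code from 21 to 7E plus the six control and spacing characters.

```python
# Code points of the standard character set, taken from Table 2.1.
# The graphic characters in the table cover every code from 0x21 to 0x7E.
STANDARD_CHARSET = frozenset(
    list(range(0x21, 0x7F))
    + [0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20]  # HT, LF, VT, FF, CR, SP
)


def find_nonstandard(source: bytes):
    """Return (offset, byte value) pairs for bytes outside the standard set.

    Hypothetical helper: a real compiler would report a compile-time error
    with a line number instead of returning a list.
    """
    return [(i, b) for i, b in enumerate(source) if b not in STANDARD_CHARSET]
```

For instance, the Latin-1 encoding of a word containing an accented letter would be flagged at the offset of that letter, while plain source text yields an empty list.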
It is possible that this specification will evolve to include program source written using the UTF-8 encoding, defined by the Unicode character standard. All reserved words will be specified using the current, compatible, standard character set.
Lexical analysis proceeds by obeying the “maximal match” rule: when a character sequence can be transformed into two or more lexemes, the lexeme with the longest character representation is selected. Thus, although domain is a reserved word, domains is not.
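The maximal match rule can be illustrated with a small scanner for word-like lexemes. This is a sketch in Python under stated assumptions: `RESERVED` holds only a few entries from Table 2.2, `classify_words` is a hypothetical name, and the pattern does not enforce every identifier rule of this specification.

```python
import re

# A small subset of the reserved words in Table 2.2, for illustration only.
RESERVED = frozenset({"domain", "is", "end"})

# Greedy match: a letter followed by as many letters, digits, or underscores
# as possible, so the longest candidate lexeme is always taken.
WORD = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def classify_words(text):
    """Return (lexeme, kind) pairs for the word-like lexemes in text."""
    pairs = []
    for match in WORD.finditer(text):
        word = match.group()
        kind = "reserved" if word in RESERVED else "identifier"
        pairs.append((word, kind))
    return pairs
```

Because the whole run of characters is consumed before classification, `domains` is never split into the reserved word `domain` followed by `s`.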
The syntax of the following grammar rules makes use of the following constructs:
[ pattern ]
pattern may occur optionally.
{ pattern }
pattern may occur zero or more times.
pattern1 | pattern2
Choice of either pattern1 or pattern2.
Input programs are scanned and divided into lines. Error messages reported by the compiler and associated tools will make use of the line number to produce useful diagnostic messages. Line terminators also indicate the end of a comment.
Line Terminators
Comments begin with the two characters --, with no whitespace between them, and continue to the end of the line. Comments do not appear within character or string literals.
Comments
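The comment rule can be sketched as a single pass over one line that skips literals before looking for the comment marker. This Python sketch makes assumptions that the text above does not state: string literals contain no escape sequences, a character literal is exactly one character between single quotes, and `strip_comment` is a hypothetical helper name.

```python
def strip_comment(line: str) -> str:
    """Return the line with any trailing comment removed."""
    i = 0
    n = len(line)
    while i < n:
        ch = line[i]
        if ch == '"':
            # Skip to the closing double quote (assumes no escape sequences).
            close = line.find('"', i + 1)
            if close == -1:
                return line  # unterminated string: leave the line untouched
            i = close + 1
        elif ch == "'" and i + 2 < n and line[i + 2] == "'":
            i += 3  # skip a character literal such as '-'
        elif line.startswith("--", i):
            return line[:i]  # everything from -- onward is a comment
        else:
            i += 1
    return line
```

A marker that occurs inside a string or character literal is skipped along with the literal, so only a marker in ordinary program text ends the line.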
Whitespace consists of the space, horizontal tab, and form feed characters, as well as line terminators and comments. Whitespace is a proper delimiter for lexemes.
Whitespace
The following characters are the separators (also known as punctuators or delimiters).
( | ) | : | ; | , | . |
An identifier is a sequence of characters. The initial character of the sequence must be an alphabetic character. Each remaining character can be a lowercase or uppercase alphabetic character, a numeric digit, or the character '_'. An identifier may not contain two consecutive underscore characters.
Two identifiers are considered the same if their respective character sequences are identical. Thus, identifiers are case sensitive.
Identifiers
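The identifier rules described above can be expressed as one regular expression. The sketch below is written in Python and reflects only the prose rules: `IDENTIFIER` and `is_identifier` are hypothetical names, a single trailing underscore is accepted because the prose forbids only consecutive underscores, and the check against reserved words is deliberately left out.

```python
import re

# A letter, then letters or digits each optionally preceded by one
# underscore, then an optional single trailing underscore. Two underscores
# in a row can never match.
IDENTIFIER = re.compile(r"[A-Za-z](?:_?[A-Za-z0-9])*_?")


def is_identifier(text: str) -> bool:
    """Return True if text has the shape of an identifier."""
    return IDENTIFIER.fullmatch(text) is not None
```

Since identifiers are compared character by character, `Stack` and `stack` both pass this check but name different entities.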
The following character sequences are reserved words and may not be used as identifiers:
Table 2.2. Reserved Words
abstract | add | and | array | carrier | begin |
declare | domain | else | elsif | end | for |
function | generic | if | import | in | inj |
is | loop | mod | of | out | others |
pragma | prj | procedure | range | rem | return |
reverse | signature | subtype | then | type | while |
with |
The following tokens are the operators. These symbols have special productions in the grammar of the language and can be used as the defining identifier of a function declaration.
Literals are primitive values in Comma programs which have a direct representation in source code. There are literals for integer, floating point, string, and character values.
An integer literal may be expressed in decimal (base 10), hexadecimal (base 16), octal (base 8), or binary (base 2).
For the sake of readability, the underscore character '_' can appear within an integer literal. These characters are ignored and serve only as visual separators.
Integer Literals
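The effect of underscores in an integer literal can be shown for the decimal case. This Python sketch covers decimal literals only, since the notation for the other bases is not reproduced here; it also assumes that a literal must start with a digit, and `decimal_literal_value` is a hypothetical name.

```python
DECIMAL_DIGITS = frozenset("0123456789")


def decimal_literal_value(literal: str) -> int:
    """Return the value of a decimal integer literal, ignoring underscores."""
    digits = literal.replace("_", "")
    if (
        not digits
        or literal[0] not in DECIMAL_DIGITS  # assumption: no leading underscore
        or any(c not in DECIMAL_DIGITS for c in digits)
    ):
        raise ValueError(f"not a decimal integer literal: {literal!r}")
    return int(digits)
```

Under these assumptions, `1_000_000` and `1000000` denote the same value.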
A floating-point literal can consist of an integer part, a decimal point, a fractional part, and an exponent. The decimal point is represented by the ASCII character '.'. The exponent is introduced by the character 'e' or 'E', followed by an optional '+' or '-' sign, followed by one or more digits. In order to avoid ambiguity with decimal integer literals, a floating-point literal must contain a decimal point, an exponent, or a float type suffix.
Floating-point Literals
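The disambiguation between integer and floating-point literals can be sketched with a regular expression. This Python example rests on several assumptions: the integer part is required, a decimal point must be followed by at least one digit, underscores are not handled, and the float type suffix is omitted because its spelling is not given here. `FLOAT_SHAPE` and `is_float_literal` are hypothetical names.

```python
import re

# Integer part, optional fractional part, optional exponent.
FLOAT_SHAPE = re.compile(r"[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?")


def is_float_literal(text: str) -> bool:
    """Return True if text is float-shaped and not a plain decimal integer."""
    match = FLOAT_SHAPE.fullmatch(text)
    if match is None:
        return False
    has_point = match.group(1) is not None
    has_exponent = match.group(2) is not None
    # A decimal point or an exponent is what separates it from an integer.
    return has_point or has_exponent
```

With this check, `42` is rejected as a floating-point literal while `42.0` and `42e0` are accepted.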
A string literal is a character sequence delimited by the ASCII " (double quote, code 0x22) character.
String Literals
A character literal is a single input character delimited by single quotes (ASCII code 0x27).
Character Literals