Ocean Programming Language Definition

Lawrence A. Crowl

Department of Computer Science
Oregon State University
Corvallis, Oregon  97331-3202

Technical Report 94-60-08 (Revised)

December 1995

Abstract

Ocean is a simple imperative programming language with the flavor of the C programming language. This report describes the Ocean programming language, particularly as it compares with the C programming language. The Ocean programming language has two goals:
  1. to demonstrate by example that a language can be simpler and safer than the C programming language while still preserving the flavor and expressiveness of C; and
  2. to require a compiler that is simple enough for students to implement in a single course while still touching on most compilation issues.

Contents

  1. Introduction
  2. Lexical Structure
  3. Declarations and Definitions
    1. General Form
    2. Externals and Forwards
    3. Variables
    4. Manifest Constants
    5. Members and Parameters
    6. Functions
    7. In-Line Functions
    8. Type Definitions
    9. Type Specifiers
  4. Types
    1. Void
    2. Boolean
    3. Character
    4. Word
    5. Integer
    6. Floating-Point
    7. Pointer
    8. Array
    9. Structure and Union
    10. Functions
  5. Expressions
    1. Operations
      1. Lvalue
      2. Primary
      3. Unary
      4. Power
      5. Product
      6. Sum
      7. Relation
      8. Conjunction
      9. Disjunction
      10. Assignment
    2. Predefined Functions
  6. Statements
    1. Empty
    2. Expression
    3. Compound
    4. If
    5. Switch
    6. While
    7. Do-While
    8. For-Loop
    9. Return
    10. Goto
  7. Grammar
    1. Tokens
    2. Grammar

Introduction

Ocean is intended as an example of a language with the conciseness and expressiveness of the C programming language, but without the syntactic or semantic pitfalls of C. As such, it is similar to C in many ways. Ocean is mostly similar to K&R C [Kernighan and Ritchie, 1978], but it has elements of ANSI C [ANSI, 1988; Kernighan and Ritchie, 1988; Plauger and Brodie, 1992].

Ocean is also intended for use in a compiler course. As such, it is simple to compile, but still touches on many of the issues in compiling imperative languages. Furthermore, it is relatively easy to translate to C.

This report defines the Ocean programming language. As a language definition, this report primarily highlights the differences between C and Ocean. Note, however, that this report does not generally provide the rationale for specific design features.

Ocean is a block-structured, imperative programming language. Its primary features are as follows.

The main body of this report is generally informal about the syntax of Ocean; the appendix contains the formal syntax.

Lexical Structure

Ocean programs consist of a sequence of tokens. Each token is a sequence of one or more characters from the ASCII character set.

Where possible ambiguities exist in determining the tokens, Ocean defines the interpretation to be the longest possible match reading from left to right. (This interpretation matches that of lex.) Tokens that might otherwise be ambiguous or confusing may be separated with white space. White space consists of one or more of spaces, tabs, or newlines. White space is not itself a token; it separates tokens.
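The longest-match rule can be illustrated with a small C sketch. The operator table is an abridged subset chosen for illustration, and the function name `match_operator` is hypothetical; neither is part of the language definition.

```c
#include <stddef.h>
#include <string.h>

/* Abridged operator list; a real scanner would cover every Ocean token. */
static const char *const operators[] = {
    "<<=", "<<", "<=", "<>", "<", ":=", "**=", "**", "*=", "*"
};

/* Return the length of the longest operator that is a prefix of `text`,
   or 0 if no operator in the table matches. */
size_t match_operator(const char *text)
{
    size_t best = 0;
    size_t count = sizeof operators / sizeof operators[0];
    for (size_t k = 0; k < count; k++) {
        size_t length = strlen(operators[k]);
        if (length > best && strncmp(text, operators[k], length) == 0)
            best = length;
    }
    return best;
}
```

Under this rule the text `a<<=b` scans as `a`, `<<=`, `b` rather than as `a`, `<`, `<=`, `b`.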

Comments start with `$' and extend to the end of the line. Comments act as white space. This comment structure permits nested comments on a single line, but relies on a good text editor to comment out sections of programs.

Identifiers consist of a sequence of characters starting with a letter (upper or lower case) or an underscore, followed by zero or more letters, digits, or underscores, excluding those sequences that are keywords.

The remaining tokens are described as needed in the remainder of this document. These tokens are quoted and appear in teletype font, e.g. `return' or `<='.

Declarations and Definitions

Ocean declarations and definitions are the most complex part of the language. They look similar to those of C, but are simpler. A declaration associates an identifier with a type. A definition associates an identifier with a type and with a storage area, a value, or both. A declaration does not allocate storage or provide a value. Variables and functions may have only one definition per program, but may have an arbitrary number of declarations, provided that the declarations match the definition. Types, manifest constants, and in-line functions may have only one definition per compilation unit, but may be defined several times within the same program. Types, manifest constants, and in-line functions have no declarations.

Variables and functions must be declared (or defined) before they are used. Identifiers declared local to a function shadow variables declared as parameters, which in turn shadow global identifiers.

Types need not be defined before being used, but no operations may be applied to objects of this type until the type is defined. This rule enables forward references to types and enables the implementation of opaque types.

General Form

Ocean declarations and definitions consist of a mode, a base type, and one or more comma-separated declarators. Modes are similar to the storage classes of C, except that modes are more general, some mode names are different, and the mode is required. The modes for declarations are `extern' and `forward'. The modes for definitions are `public', `private', `static', `auto', `register', `manifest', and `inline'. The base type is usually a simple identifier, but may be a structure or union description (described later). Declarators give the name and final type of variables (or types, or functions) being declared. They consist of the identifier followed by any number of derived-type indicators for arrays (`[number-of-elements `]'), pointers (`^'), and functions (`(parameters `)'). These indicators are the same as the operators used to access the types, so as in C, variables are ``declared by example of use''. In Ocean, the pointer dereference is a postfix `^', and not a prefix `*', which means that all derived-type indicators are uniformly postfix. For example, an array of integers and a pointer to integers are declared as:
static int ids[ 8 ], ptr^ ;

Externals and Forwards

Variables and functions defined in another compilation unit are declared in the current compilation unit with the external statement, which is like that of a global variable definition, except that the mode is `extern' and initializers are not allowed.

Variables and functions defined later in the current compilation unit are declared in the current compilation unit with the forward statement, which is like that of external declarations except that the mode is `forward'.

Variables

Modes for global variable definitions are `public' and `private'. Public means that the variable names are given to the linker, and private means that they are not. Modes for local variable definitions are `static', `auto' (automatic), and `register'. Static means that the variables have only one instance, which exists throughout the lifetime of the program. Automatic means that the variables have one instance per execution of the compound statement in which the definition appears. Their lifetime is exactly that of the compound statement. Register means the same as automatic, except that the compiler should place the variable in a register rather than on the stack frame. Compilers are free to implement a register variable as an automatic variable. Also, because registers (generally) have no address, the ``address of'' operator does not apply to register variables.

Definitions may have initializers, as in C, except that the assignment operator is `:=' rather than `='. The initializers for `public', `private', and `static' definitions must be constant expressions (evaluable at compile time), but the `auto' and `register' initializers may be dynamic expressions.

Manifest Constants

Manifest constants are symbolic names for compile-time constants. The syntax is like that of variable definition, except that the mode is `manifest' and initializers are required. Furthermore, local manifest constants must have constant (expression) initializers; dynamic initializers are not permitted. For example,
manifest int a := 8*20 ;

Members and Parameters

The declarations of parameters, structure members, and union members share the same form, which is a base type followed by a comma-separated list of declarators. For example,
int alpha, beta ;
float gamma ;

Functions

Function definitions are like those of global variables (i.e. `public' and `private'), except the initializer is a compound statement rather than an expression and there is no assignment operator. The function type is derived from its return type and its parameter types. Unlike K&R C, Ocean function declarations include their parameters. Like K&R C, Ocean function definitions include their parameters. For example,
extern void malloc( int n )^ ; $ returns a pointer to void
forward int double( int n ) ;
public int quadruple( int n ) { return double( double( n ) ) ; }

In-Line Functions

The in-line function syntax is like that of regular function definitions, except that the mode is `inline' and initializers are required. In-line functions have the semantics of regular functions, not the semantics of macro substitution. There are several consequences to this interpretation of in-line functions.

Type Definitions

The definition of symbolic names for types is similar to that of variables, except that the mode is `typedef' and initializers are not allowed. Type definitions may appear locally as well as globally. Type definitions define an alias for a type; they do not define a new type. As in C, Ocean types are structurally equivalent.

Type Specifiers

Some Ocean constructs require a type specifier independent of any specific declaration. In these cases, the syntax for a type specifier is like that of a member declaration, except that the member identifier is omitted. That is, a type specifier consists of a base type and a sequence of zero or more derived-type indicators. So, the type ``pointer to an array of integers'' has the specifier `int ^[]'.

Types

The C language overloads the concepts of booleans, characters, machine words, and integers onto one type, the machine word (called an (unsigned) int). Ocean separates those concepts into distinct types. To support this distinction, Ocean provides seven primitive types: void, boolean, character, word, integer, floating-point, and pointer. The word, integer, and floating-point types are collectively called numeric types, because they are castable among each other and share many operations. The numeric and pointer types are collectively called linear types, because they share addition and subtraction operations.

Ocean provides three data structuring types, the array, the structure, and the union. Within these types, Ocean distinguishes between sized types and unsized types. Sized types have a size associated with them, and as such may be allocated space (defined as a variable) and copied. Unsized types have no size, so they cannot be allocated space (defined as a variable) or copied. A consequence is that variables, parameters, and function results may not have an unsized type. Furthermore, unsized types may not be assigned. Unsized types must be handled indirectly, via a pointer. Pointers are always sized types, even if what they point to is unsized.

See section ``Expressions'' for a description of the operations that apply to these types.

Void

Void is the primitive unsized type. Unlike all other unsized types, void is a legal function return type. It is used to indicate functions that have no return value.

Boolean

The boolean type is identified by the keyword `bool'. Its literals are `true' and `false'.

Character

The character type is identified by the keyword `char'. Character literals consist of a printable character (including space and tab but excluding newline) enclosed within single quotes. Examples are `'H'' and `'"''.

Within quotes, the `\' character begins an escaped character. In both character and string literals, the following sequences represent a single (escaped) character: `\b' (backspace), `\e' (escape), `\n' (newline), `\t' (horizontal tab), `\v' (vertical tab), `\'' (single quote), `\"' (double quote), `\\' (backslash), and `\w.' (the character with the code w where w is a word number constant, see below). All other backslash sequences are illegal.

Simple character string literals consist of zero or more printable or escaped characters within double quotes. As in C, the value of a character string literal is a pointer to an array of characters containing the given characters and terminated by the null character, `'\0.''. To enable the definition of long string literals within relatively short source lines, Ocean provides complex character string literals. A complex character string literal consists of two or more adjacent simple string literals, and represents the concatenation of the simple literals.

Word

Ocean represents machine words explicitly with the `word' type identifier. Word values are unsigned binary numbers of a fixed number of digits, i.e. a bit string. All arithmetic operations on words are done modulo the word size of the computer. No overflow is possible.

Word literals consist of one or more digits, in binary, octal, decimal or hexadecimal bases. Binary, octal, and hexadecimal constants are specified by placing `0b', `0q', and `0x' before the digits, respectively. The hexadecimal digits `A'--`F' and `a'--`f' represent 10--15, respectively.
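As a rough illustration, a scanner might convert word literals to values along the following lines. This is a sketch under stated assumptions: the word size is taken to be 32 bits, the function names are hypothetical, underscores are accepted after the first digit as in the token grammar at the end of this report, and oversized literals simply wrap (this report does not say how they are treated).

```c
#include <stdbool.h>
#include <stdint.h>

/* Value of a digit character in bases up to 16, or -1 if it is not a digit.
   Assumes ASCII, as Ocean source text does. */
static int digit_value(char ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    return -1;
}

/* Parse a complete word literal; returns false if `text` is not one.
   Arithmetic on uint32_t wraps, so oversized literals reduce modulo 2^32
   (an assumption, see above). */
bool parse_word(const char *text, uint32_t *value)
{
    uint32_t base = 10;
    if (text[0] == '0' && text[1] == 'b') { base = 2; text += 2; }
    else if (text[0] == '0' && text[1] == 'q') { base = 8; text += 2; }
    else if (text[0] == '0' && text[1] == 'x') { base = 16; text += 2; }

    /* The first character after any prefix must be a digit of the base. */
    int digit = digit_value(text[0]);
    if (digit < 0 || (uint32_t)digit >= base) return false;

    uint32_t result = 0;
    for (; *text != '\0'; text++) {
        if (*text == '_') continue;   /* separator, carries no value */
        digit = digit_value(*text);
        if (digit < 0 || (uint32_t)digit >= base) return false;
        result = result * base + (uint32_t)digit;
    }
    *value = result;
    return true;
}
```

For instance, `0b1010`, `0q12`, `10`, and `0xA` would all yield the same word value.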

Integer

The signed integer type is identified by `int'. Integers do not have a bit string interpretation because the representation of sign may vary. The result of an overflow for integer operations is undefined, and may include aborting the program or ignoring the overflow.

There are no integer literals. You can achieve the effect of an integer literal by placing a `-' sign before a word literal. Alternatively, you can cast the word value to an integer value (see ``Cast to Int'').

Floating-Point

The signed floating-point number type is identified by `float'. The result of an overflow or an underflow for floating-point operations is undefined, and may include aborting the program or ignoring the overflow.

Floating-point literals consist of one or more digits followed by a decimal part, an exponent part, or both. Decimal parts consist of a decimal point `.' followed by zero or more digits. Exponent parts consist of the letter `E' (or `e'), an optional sign, `+' or `-', and one or more digits. Again, there are no negative literals, but a unary `-' will produce a negative constant.

Pointer

Pointers are described as in C, except that the derived-type indicator is a postfix `^' rather than a prefix `*'. Likewise the dereference operator is a postfix `^'. For example, an automatic integer pointer variable `ip' is defined with
auto int ip^;
and is dereferenced as in
i := ip^;

The only pointer literal is `nil', which is a nil pointer to the void type. To obtain nil pointers to other types, one must cast `nil' to the appropriate type (see ``Cast to Pointer'').

Pointers are implicitly converted to pointers to void when necessary to obtain a legal interpretation of an expression.

The arithmetic operators `+' and `-' may be applied to pointers with a sized base type, but not an unsized base type. Pointers to which `+' and `-' operations are applied must point to elements of an array. Such pointers may not point to singleton variables.

Array

Arrays are described as in C, but they are first-class objects. The first-class nature of arrays means you may assign arrays and pass arrays as parameters. It also means that there is no implicit conversion from the name of an array to a pointer to its first element.

The array derived-type indicator is `[expression `]' where the expression is a word (non-negative) constant specifying the number of elements in the array. The elements are indexed from 0 to n-1. The array type includes the element type, which must be a sized type. Array types may also include the number of elements as part of their type. These are sized array types. Programs may assign arrays and pass array parameters so long as the number of elements in both source and destination is identical.

For example, an automatic integer array variable `ia' is defined with

auto int ia[3];
and is indexed as in
i := ia[j];

If an array declaration omits the number of elements, the array is unsized. (An array definition may not omit the number of elements, because the array would have no size, and hence the corresponding variable could not be allocated. Declarations do not allocate storage, and so may be unsized.) Unsized arrays have restricted use, as discussed earlier. Where necessary, pointers to sized arrays are implicitly converted to pointers to unsized arrays. This is useful for passing parameters to library functions. For example, the C function strcpy has the Ocean declaration and use:

extern char strcpy( char s1^[], s2^[] )^[];
static char source[ 10 ], target[ 20 ];
strcpy( target@, source@ );
(The `@' operator is a postfix operator equivalent to the C prefix `&' operator.)

Structure and Union

Structure and union descriptions are as in C, excluding tags. The description consists of a `struct' or `union' followed by the member declarations (as discussed earlier) enclosed in braces. The last `;' in the member declarations may be omitted. The following example defines two structure variables, each containing three members.
public struct { int i, j; float x } s, r;

All but the last member of a structure must have a sized type. If the last member has a sized type, the structure has a sized type. If the last member is unsized, the structure has an unsized type. Any member of a union may be unsized, in which case the union is unsized. In both cases, the same restrictions apply to unsized structures and unions as apply to unsized arrays. An example of an unsized structure is a counted character string.

typedef struct { word length; char text[] } counted_string;
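For readers coming from C, the closest counterpart to this unsized structure is probably a C99 structure ending in a flexible array member, which is likewise normally handled through a pointer. The following is a minimal sketch, not part of Ocean; the helper `counted_string_new` is hypothetical, and `size_t` stands in for the Ocean `word` type.

```c
#include <stdlib.h>
#include <string.h>

/* C99 counterpart of the Ocean counted_string; `text` is a flexible
   array member, so the structure has no fixed total size. */
struct counted_string {
    size_t length;
    char text[];
};

/* Allocate a counted string holding a copy of the first `length` bytes of
   `source`. Returns NULL if allocation fails; the caller frees the result. */
struct counted_string *counted_string_new(const char *source, size_t length)
{
    struct counted_string *string = malloc(sizeof *string + length);
    if (string == NULL) return NULL;
    string->length = length;
    memcpy(string->text, source, length);
    return string;
}
```

The storage for `text` exists only because the allocation reserves it, which is why such an object cannot be defined directly as a variable or copied by assignment.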

Functions

Functions are as in C, but there is no implicit conversion from the name of a function to a pointer to that function. Arguments to a function must match the types of the parameters to that function. Parameters are pass-by-value. An Ocean function is call compatible with an equivalent C function. This means that Ocean functions can call C functions. C functions can also call Ocean functions, except those functions with array parameters or results.

Where ANSI C would define a function:

int foobar( int n, int m, float x ) { int o = n+m; return o+x; }
Ocean declares the function parameters using the structure member syntax:
public int foobar( int n, m; float x ) { auto int o := n+m; return o+x; }

Expressions

Expressions compute values or designate variables. Expressions are composed of variable names, literal values, and prefix, infix, and postfix operators. The following table summarizes the position, precedence, and associativity of the various groups of operators.
group        position  precedence  associativity  members
lvalue       infix     highest     left-to-right  data structure access
             postfix                              pointer dereference
primary      postfix               left-to-right  address of, function call, casts, post-inc/dec-rement, isnil
             prefix                               sizeof type
unary        prefix                right-to-left  pre-inc/dec-rement, invert, negate, sizeof value
power        infix                 left-to-right  power, logarithm, shift
product      infix                 left-to-right  multiply, divide, modulo, bitwise and
sum          infix                 left-to-right  add, subtract, bitwise ior, bitwise xor
relation     infix                 left-to-right  comparisons
conjunction  infix                 left-to-right  logical, short-circuit and
disjunction  infix                 left-to-right  logical, short-circuit or
assignment   infix     lowest      right-to-left  assignments

Expressions may be enclosed in parentheses to override the operator precedence or associativity.

Operations

This section defines the operators and their allowable operands. Within the tables of this section, operators are defined with an indication of the type and form of their arguments, as follows.
indicator  meaning              indicator  meaning
a          array rvalue         b          boolean rvalue
c          character rvalue     f          floating-point rvalue
g          function rvalue      i          integer rvalue
l          linear rvalue        n          numeric rvalue
o          an operator          p          pointer rvalue
s          structure rvalue     t          type specification
u          union rvalue         w          word rvalue
z          any rvalue

The upper case variants indicate an lvalue rather than an rvalue.

Where needed to provide a legal interpretation of an expression, there are implicit conversions from `char' to `word', from `word' to `int', and from `int' to `float'.

Lvalue

operation   result  meaning and notes
Zp^         Z       --- dereference a pointer
ZA[i]       Z       --- select an array member
ZP[i]       Z       (ZP!zt^[]!^[i]) --- pointer index is array index
S. ident    Z       --- select a structure member
U. ident    Z       --- select a union member

Primary

operation    result  meaning and notes
Z@           zp      --- a pointer to the storage of the lvalue
G(...)       z       --- call a function, argument types must be implicitly castable to parameter types
L++          l       ((L += 1) - 1) --- increment, return old value
L--          l       ((L -= 1) + 1) --- decrement, return old value
sizeof !t!   w       --- size of the type in addressable units
c !word!     w       --- the ordinal position of the character
w !int!      i       --- the integral value of the word, may overflow
w !float!    f       --- the floating-point value of the word, large words may also lose precision
i !float!    f       --- the floating-point value of the integer, large integers may also lose precision
c !int!      i       c !word! !int!
c !float!    f       c !word! !float!
f !int!      i       if f>0 then floor(f) else ceiling(f) --- truncated, undefined if overflow
i !word!     w       i mod 2^{wordsize}
w !char!     c       --- the character with ordinal position w
f !word!     w       floor(f mod 2^{wordsize})
f !char!     c       f !word! !char!
i !char!     c       i !word! !char!
z1p !z2tp!   z2p     --- casting the pointer to the given type
z1 !z2!      z2      --- any cast structurally equivalent to the above
p isnil      b       --- `true' if p is nil, `false' otherwise
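Two of these casts have direct C analogues, which may help when translating Ocean to C. The sketch below assumes a 32-bit word and integer; the function names are hypothetical.

```c
#include <stdint.h>

/* i !word! : the value of i modulo 2^32. C defines conversion from a
   signed to an unsigned integer type in exactly this way. */
uint32_t int_to_word(int32_t i)
{
    return (uint32_t)i;
}

/* f !int! : discard the fraction (floor above zero, ceiling below).
   As in the table, the result is undefined if it does not fit. */
int32_t float_to_int(double f)
{
    return (int32_t)f;
}
```

The `f !word!` cast has no such one-line analogue, because C leaves the conversion of a negative floating-point value to an unsigned type undefined.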

Unary

operation   result  meaning and notes
~b          b       --- the boolean one's complement
~w          w       --- the bit-by-bit one's complement
-i          i       --- integer negation, undefined if overflow
-f          f       --- floating-point negation
++L         l       (L += 1) --- increment, return new value
--L         l       (L -= 1) --- decrement, return new value
sizeof z    w       --- size of the value in addressable units

Power

operation   result  meaning and notes
w1<<w2      w       w1 × 2^{w2} mod 2^{wordsize} --- left shift
i<<w        i       i × 2^{w} --- left shift, undefined if overflow
f<<w        f       f × 2^{w} --- add to exponent, undefined if overflow
w1>>w2      w       floor(w1 ÷ 2^{w2}) --- right shift, undefined if w2>=wordsize
i>>w        i       floor(i ÷ 2^{w}) --- right shift, undefined if w>=wordsize
f>>w        f       f ÷ 2^{w} --- subtract from exponent, undefined if underflow
w1**w2      w       w1^{w2} mod 2^{wordsize}
i**w        i       i^{w} --- undefined if overflow
f1**f2      f       f1^{f2} --- undefined if overflow
w1//w2      w       floor(log_{w2} w1) --- undefined if w1=0 or w2<=1, w//2 is the index of the highest set bit
f1//f2      f       log_{f2} f1 --- undefined if f1<=0 or f2<=1 or underflow
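The word logarithm has no C operator, so a translation to C needs a helper. One possible sketch uses repeated division; the name `word_log` and the 32-bit word size are assumptions, and the assertion stands in for the cases the table leaves undefined.

```c
#include <assert.h>
#include <stdint.h>

/* floor(log_base(value)) for unsigned words, by repeated division.
   Mirrors the table's preconditions: value != 0 and base > 1. */
uint32_t word_log(uint32_t value, uint32_t base)
{
    assert(value != 0 && base > 1);
    uint32_t result = 0;
    while (value >= base) {
        value /= base;
        result++;
    }
    return result;
}
```

With a base of 2 the result is the index of the highest set bit, so `word_log(9, 2)` gives 3.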

Product

operation   result  meaning and notes
w1*w2       w       w1 × w2 mod 2^{wordsize}
i1*i2       i       i1 × i2 --- undefined if overflow
f1*f2       f       f1 × f2 --- undefined if over/under-flow
w1/w2       w       floor(w1 ÷ w2) --- undefined if w2=0
i1/i2       i       truncate( i1 ÷ i2 ) --- undefined if i2=0
f1/f2       f       f1 ÷ f2 --- undefined if f2=0 or over/under-flow
w1%w2       w       w1 mod w2 --- undefined if w2=0
b1&b2       b       boolean b1 and b2
w1&w2       w       bitwise w1 and w2

Sum

operation   result  meaning and notes
w1+w2       w       w1+w2 mod 2^{wordsize}
i1+i2       i       i1+i2 --- undefined if overflow
f1+f2       f       f1+f2 --- undefined if over/under-flow
p+i         p       --- the ith element higher in memory
w1-w2       w       w1-w2 mod 2^{wordsize}
i1-i2       i       i1-i2 --- undefined if overflow
f1-f2       f       f1-f2 --- undefined if over/under-flow
p1-p2       i       i1-i2 where p1=p3+i1 and p2=p3+i2
p-i         p       --- the ith element lower in memory
b1|b2       b       boolean b1 or b2
w1|w2       w       bitwise w1 or w2
b1~b2       b       boolean b1 xor b2
w1~w2       w       bitwise w1 xor w2

Relation

operation   result  meaning and notes
n1<n2       b       n1<n2
p1<p2       b       i1<i2 where p1=p3+i1 and p2=p3+i2
n1>n2       b       n1>n2
p1>p2       b       i1>i2 where p1=p3+i1 and p2=p3+i2
n1==n2      b       n1=n2
p1==p2      b       --- p1 and p2 are nil or point to the same location
l1<=l2      b       ~(l1>l2)
l1>=l2      b       ~(l1<l2)
l1<>l2      b       ~(l1==l2)

Conjunction

operation result meaning and notes
b1&&b2 b if b1 then b2 else false --- short-circuit

Disjunction

operation result meaning and notes
b1||b2 b if b1 then true else b2 --- short-circuit

Assignment

operation   result  meaning and notes
Z1:=z2      z1      --- copy z2!z1t! to the location Z1, the result is the assigned value z1
Z1 o= z2    z1      Z1:= Z1 o (z2) --- Z1 is evaluated only once and o is one of `<<', `>>', `**', `//', `*', `/', `%', `&', `+', `-', `|', or `~'

Predefined Functions

Implementations must provide the following predefined functions.
extern char get_char( ); $ get the next character from stdin
extern void put_char( char character ); $ put the character to stdout
extern word word_abs( int number ); $ the absolute value of the number
extern int ceiling( float number ); $ the ceiling of the number
extern int floor( float number ); $ the floor of the number
extern int truncate( float number ); $ the truncation of the number
extern int round( float number ); $ the number rounded, preferring evens
extern float float_abs( float number ); $ the absolute value of the number
extern void allocate( word number )^; $ allocate number units of memory
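The phrase ``preferring evens'' for `round' presumably means that a value exactly halfway between two integers goes to the even one. A C sketch of that interpretation follows; the name `round_even` is hypothetical, the parameter is a `double` rather than an Ocean `float`, and the result is undefined when it does not fit in an `int`.

```c
/* Round to the nearest integer; exact halves go to the even neighbor. */
int round_even(double number)
{
    int truncated = (int)number;            /* rounds toward zero */
    double fraction = number - truncated;   /* strictly between -1 and 1 */

    if (fraction > 0.5 || (fraction == 0.5 && truncated % 2 != 0))
        return truncated + 1;
    if (fraction < -0.5 || (fraction == -0.5 && truncated % 2 != 0))
        return truncated - 1;
    return truncated;
}
```

Under this reading 0.5 and 2.5 round to 0 and 2, while 1.5 rounds to 2.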

Statements

Statements specify the sequence of expressions to evaluate. Their value is in controlling the side-effects of expressions.

Empty

A single semicolon is an empty statement. No action will be performed.

Expression

An expression followed by a semicolon is an expression statement. The result of the expression is discarded. A compiler warning is desirable when the discarded result is not void and the expression is not an assignment.

Compound

A compound statement is a sequence of zero or more statements or local declarations enclosed by braces. Unlike C, local declarations may appear anywhere in the block, not just at the beginning. However, variables must still be defined before use.

If

The if statement provides selection; it has the forms:
`if' `(' boolean-expression `)' statement
`if' `(' boolean-expression `)' statement `else' statement
As in C, the ambiguity of statements like:
if ( a < b ) if ( c < d ) e := f ; else g := h ;
is resolved by associating each `else' with the nearest possible `if'.

Switch

The switch statement also provides selection; it has the form:
`switch' `(' integer-expression `)' statement
where the sub-statement is typically a compound statement. The switch statement is like that of C, with switch cases labelled the same way, except that Ocean provides for case ranges:
`case' constant-integer-expression `:' statement
`case' constant-integer-expression `to' constant-integer-expression `:' statement
`default' `:' statement
In addition, the break statement will exit the (innermost enclosing) switch statement:
`break' `;'
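Standard C has no case ranges, so a C translation of an Ocean range has to enumerate each value or fall back on comparisons. The sketch below assumes that both bounds of a range are inclusive, which this report does not state explicitly; the function `digit_class` and the Ocean fragment in its comment are hypothetical illustrations.

```c
/* C rendering of a hypothetical Ocean switch:
 *
 *     switch ( n ) {
 *         case 0:      ...
 *         case 1 to 9: ...
 *         default:     ...
 *     }
 *
 * Returns 0 for zero, 1 for a single nonzero digit, and 2 otherwise. */
int digit_class(int n)
{
    switch (n) {
    case 0:
        return 0;
    case 1: case 2: case 3: case 4: case 5:
    case 6: case 7: case 8: case 9:
        return 1;   /* the range 1 to 9, written out value by value */
    default:
        return 2;
    }
}
```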

While

The while statement provides repetition; it has the form:
`while' `(' boolean-expression `)' statement

The break statement will exit the while statement. The continue statement will restart the while statement at the condition.

`continue' `;'

Do-While

The do-while statement also provides repetition; it has the form:
`do' statement1 `while' `(' boolean-expression `)' statement2

The do-while statement is equivalent to:

statement1 `while' `(' boolean-expression `)' `{' statement2 statement1 `}'
except that the continue will restart at the (innermost enclosing) `do', i.e. at statement1. The break statement will exit the (innermost enclosing) do-while statement.

An Ocean do-while statement with an empty statement1 is equivalent to the C while statement; an Ocean do-while statement with an empty statement2 is equivalent to the C do-while statement.

Input/output often prefers conditions in the middle of a loop, rather than at the beginning or end. Ocean provides these middle-condition loops with the do-while statement, which is intended to be used in situations like the following:

do                 c := getchar( );
while ( c <> EOF ) putchar( c );
Implementing these semantics in C requires more complicated code.
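One way the middle-condition loop might be written in C is with an unconditional loop and a `break`. This is a sketch for comparison only; the function `copy_stream` and its stream parameters are hypothetical, standing in for the standard input and output used in the Ocean example.

```c
#include <stdio.h>

/* Copy `in` to `out` one character at a time; returns the count copied.
   The loop test sits between reading and writing, as in the Ocean example. */
long copy_stream(FILE *in, FILE *out)
{
    long copied = 0;
    for (;;) {
        int c = getc(in);        /* statement1: runs before the test */
        if (c == EOF) break;     /* the middle condition */
        putc(c, out);            /* statement2: runs after the test */
        copied++;
    }
    return copied;
}
```

C programmers often fold the read into the condition instead, as in `while ((c = getc(in)) != EOF)`, which relies on assignment inside an expression.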

For-Loop

The for-loop statement provides iteration; it has the form:
`for' `(' boolean-expression `)' statement1 `loop' statement2

The for-loop statement is equivalent to:

`while' `(' boolean-expression `)' `{' statement2 statement1 `}'
except that the continue will restart at statement1 of the (innermost enclosing) for-loop statement. The break statement will exit the (innermost enclosing) for-loop statement.

The for-loop is intended to be used as follows:

{ auto int i := 0;
  for ( i < 100 ) i++;
  loop            j += a[i];
}

The Ocean for-loop (used as intended) provides two advantages over the C for statement. It puts declaration and initialization of iteration variables in the same place and it permits an arbitrary statement as the iterator. The latter advantage more than makes up for the loss of the C `,' and `?:' operators.
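Following the stated equivalence, in which statement2 runs before statement1 on each iteration, the example above corresponds to a C loop like the one sketched here. The function `sum_elements` and its parameters are hypothetical wrappers added so the fragment is self-contained.

```c
/* Sum the first `count` elements, structured like the Ocean for-loop:
   test, then the body (statement2), then the iterator (statement1). */
int sum_elements(const int a[], int count)
{
    int j = 0;
    int i = 0;
    while (i < count) {
        j += a[i];   /* statement2, the loop body */
        i++;         /* statement1, the iterator */
    }
    return j;
}
```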

Return

The following statements provide a function return value and transfer control back to the calling function.
`return' `;'
`return' expression `;'
The first form is for void functions. The second form is for non-void functions, and the type of the expression must match that of the function definition.

Goto

The following statement unconditionally transfers control.
`goto' label-identifier `;'
Since the goto exists, statements may be labeled, as in C.
label-identifier `:' statement

Grammar

Within this section, required items are grouped by parentheses, ( ), optional items are enclosed in square brackets, [ ], zero or more occurrences of items are enclosed in curly braces, { }, and alternative items are separated by vertical bars, |.

Tokens

Character sets are enclosed by angle brackets, < >. Within character sets, characters from a contiguous range are specified by the numerically extreme points separated by an en-dash, --. For example, the lower case letters are <a--z>.

boolean = `true' | `false'
word = <0--9> { <0--9_> } | `0b' <01> { <01_> } | `0q' <0--7> { <0--7_> }
| `0x' <0--9A--Fa--f> { <0--9A--Fa--f_>}
float = <0--9> { <0--9_> } `.' <0--9> { <0--9_> }
| <0--9> { <0--9_> } [ `.' <0--9> { <0--9_> } ] <Ee> [ <+-> ] <0--9> { <0--9_> }
identifier = <A--Za--z_> { <0--9A--Za--z_> }
character = `'' ( <!"#$%&()*+,./0-9:;<=>?@A-Z[]^_`a-z{}~-|>
| space | tab | `\' <betnv\'"> | `\' word `.' ) `''
string = `"' { <!#$%&'()*+,./0-9:;<=>?@A-Z[]^_`a-z{}~-|>
| space | tab | `\' <betnv\'"> | `\' word `.' } `"'

Grammar

lvalue = identifier | lvalue `.' identifier | lvalue `[' expression `]' | primary `^'
literal = boolean | character | word | float | string { string }
primary = literal | lvalue | lvalue `@' | lvalue `(' expressions `)' | lvalue ( `++' | `--' )
| `sizeof' `!' typespec `!' | primary `!' typespec `!' | `nil' | primary `isnil'
| `(' expression `)'
unary = primary | ( `++' | `--' ) lvalue | ( `~' | `-' ) unary | `sizeof' unary
power = unary | power ( `<<' | `>>' | `**' | `//' ) unary
product = power | product ( `*' | `/' | `%' | `&' ) power
sum = product | sum ( `+' | `-' | `~' | `|' ) product
relation = sum | sum ( `<' | `<=' | `>=' | `>' | `==' | `<>' ) sum
conjunction = relation | conjunction `&&' relation
disjunction = conjunction | disjunction `||' conjunction
expression = disjunction | lvalue ( `:=' | `<<=' | `>>=' | `**=' | `//=' | `*=' | `/=' | `%=' | `&=' | `+=' | `-=' | `~=' | `|=' ) expression
expressions = [ { expression `,' } expression ]
statement = `;' | expression `;'
| `break' `;' | `continue' `;' | `goto' identifier `;' | `return' [ expression ] `;'
| ( identifier | `case' expression [ `to' expression ] | `default' ) `:' statement
| `switch' `(' expression `)' statement
| `if' `(' expression `)' statement [ `else' statement ]
| [ `do' statement ] `while' `(' expression `)' statement
| `for' `(' expression `)' statement `loop' statement
| compound
value = expression | `{' [ value { `,' value } ] `}'
initializer = [ `:=' value ]
typebase = identifier | `bool' | `char' | `word' | `int' | `float' | `void'
| `struct' `{' parameters `}' | `union' `{' parameters `}'
typespec = typebase { `^' | `[' `]' | `[' expression `]' | `(' parameters `)' }
declarator = identifier { `^' | `[' `]' | `[' expression `]' | `(' parameters `)' }
member = typebase declarator { `,' declarator }
parameters = [ member { `;' member } [ `;' ] ]
declaration = ( `extern' | `forward' | `typedef' ) typebase declarator { `,' declarator } `;'
manifest = `manifest' typebase declarator `:=' value { `,' declarator `:=' value } `;'
localdef = ( `auto' | `register' | `static' ) typebase declarator initializer
{ `,' declarator initializer } `;'
compound = `{' { declaration `;' | manifest `;' | localdef `;' | statement } `}'
globaldef = ( `public' | `private' ) typebase declarator initializer
{ `,' declarator initializer } `;'
funcdef = ( `inline' | `public' | `private' ) typebase declarator compound
compilation = { declaration | manifest | globaldef | funcdef }

References

[ANSI, 1988] American National Standards Institute, Preliminary Draft Proposed Standard --- The C Language, 1988.

[Kernighan and Ritchie, 1978] Brian W. Kernighan and Dennis M. Ritchie, The C Programming Language, Prentice Hall, Inc., Englewood Cliffs, New Jersey 07632, 1978.

[Kernighan and Ritchie, 1988] Brian W. Kernighan and Dennis M. Ritchie, The C Programming Language, Prentice Hall, Inc., Englewood Cliffs, New Jersey 07632, second edition, 1988.

[Plauger and Brodie, 1992] P. J. Plauger and Jim Brodie, ANSI and ISO Standard C: Programmer's Reference, Microsoft Press, One Microsoft Way, Redmond, Washington 98052-6399, 1992.