m8cas - Single-pass assembler for the M8C assembly language
===========================================================

Copyright 2006 Werner Almesberger

The files in this directory are distributed under the terms of the
GNU General Public License (GPL), version 2. This license is
included in the file ../COPYING.GPLv2.


Overview
--------

This is a single-pass assembler for the M8C assembly language. It can
currently only generate absolute code and thus does not require a
linker. Instead of implementing its own preprocessor, it uses CPP.
(For compatibility, CPP is only enabled if the option -e is set.)


Usage
-----

m8cas reads the files specified on the command line. If no file is
specified or if a file's name is "-", it reads from standard input.
Local labels are only visible in the file in which they are defined.
Global labels must be defined in one of the files passed to m8cas,
i.e., there is no separate linker pass following assembly.

The assembler language either closely follows the specification in the
assembly language user guide (see below), or you can enable various
extensions, including the use of the C pre-processor (CPP), with the
option -e.

If using CPP, the CPP options -I, -D, and -U are recognized with their
usual semantics.

The output is written to standard output or the file specified with
the option -o. By the default, the output is in ROM format, which can
be overridden with the options -h (Intel HEX) and -b (raw binary).

An optional symbol map is written to the file specified with the -m
option. Each line of the symbol map contains the following fields:

- the address space of the symbol (RAM for ROM labels, EQU for
  assignments)
- the hexadecimal value of the symbol. ROM labels are padded to four
  digits, RAM labels are padded to three digits, assigned values are
  padded to four digits, except if they are larger than 0xFFFF, in which
  case they are padded to eight digits.
- "L" for a local, "G" for a global symbol
- the full name of the symbol, including the leading dot of a reusable
  label
- the area in which the symbol was defined, "-" if the symbol is not a
  label
- the source code location where the symbol was defined

Fields are separated by one or more spaces. The symbol file is sorted
alphabetically by the full symbol names.

The assembler can generate Flash protection data. By default, it looks
for a file called "flashsecurity.txt" in the current directory, and
uses the information contained therein. If no such file exists, no
protection data is generated. A different file name can be specified
with the option -f. If the protection is different from allowing all
types of access, the output file must be in Intel HEX format, since
this is the only format that can convey this information.


Compatibility
-------------

This assembler is based on the description in the Cypress document
"Assembly Language User Guide, Document #38-12004 Rev *C". This
section describes assumptions that have been made about behaviour
not explicitly specified in the above document, missing features,
and language extensions.


Assumptions
- - - - - -

- A, F, SP, X, and REG are case-insensitive
- the register names A, F, SP, X, and REG, mnemonics, and assembler directives
  are reserved words and not allowed as labels
- MVI A,[[expr]++] is a valid synonym for MVI A,[expr] (and likewise for
  MVI [[expr]++],A)
- there can be whitespace around the double plus, but not between the two plus
  signs
- label names cannot start with a decimal digit (3.1.1 says they can ?)
- hexadecimal constants can be in lower, upper, or mixed case
- the "h" of the second form of a hexadecimal constant (table 3-2) can be upper
  or lower case
- only ASCII characters are allowed in character constants
- the only escape sequences recognized in character constants are \' and \\
  (and not \n, \r, etc.; table 3-2)
- the escape sequences recognized in strings are \n, \r, \t, \b, \', \", \\,
  and between one and three octal digits
- no whitespace is allowed between a label and its colon
- no whitespace is allowed between the two colons of a global label
- no whitespace is allowed after the period of a re-usable local label
- a re-usable local label cannot be made global
- code not selected by IF ... ELSE ... ENDIF must still be syntactically
  correct
- all directives can be preceded by a label on the same line
- all calculations are done in unsigned 32 bit words (BIG question mark here)
- values that are too big for their destination are silently truncated (with
  the exception of PC-relative addresses)
- there can be whitespace between "REG" and the opening square bracket
- exporting a label more than once is valid and does not yield a warning
- the argument of ORG, BLK, etc. must be constant at the time of evaluation
- the argument of EQU and DB, etc. can be undefined at the time of evaluation
- the attributes in an AREA directive are case-insensitive and can appear in
  any order
- the attributes in an AREA directive are not reserved words
- the name of an area does not implicitly define a label, and is available for
  use as a label, even if completely unrelated to the area
- it is an error to try to store data in a RAM area
- undefined local labels are automatically resolved through global labels
- defined local labels are not affected by global labels


Missing features
- - - - - - - -

- DSU: not yet implemented
- IF, ELSE, ENDIF: not yet implemented - use CPP's conditional compilation
- INCLUDE: not yet implemented - use CPP's file inclusion
- .LITERAL, .ENDLITERAL, .SECTION, .ENDSECTION: ignored
- MACRO, ENDM: not yet implemented - use CPP's macros


Extensions
- - - - -

These are features that, according to the manual, do not exist in Cypress'
assembler:

- in addition to [X+expr], we also allow [X-expr]
- we support the unary minus
- we support the logical operators (&&, ||, and !), the relational operators,
  and the shift operators
- we support nesting of expressions with parentheses
- the input is free-format, so multiple instructions or directives can appear
  on the same line, and a single instruction or directive can span multiple
  lines
- ORG, EQU, etc. accept any valid expression
- added a unary ffs()-1 operator, >>expression. Note that its precedence is
  like that of all other unary C operators, and not like unary < or >. A bit
  mask covering a field in a register or similar can be used to shift a value
  into position with the >> operator, e.g.:
  mov reg[CPU_F],2 << >>CPU_F_PgMode
- there is an ASSERT directive that causes an assemble-time error if the
  expression evaluates to zero
- re-definable local labels similar to the ones of the GNU assembler: any label
  whose name begins with a dot and a digit is not removed by other labels, is
  accessed by appending the direction of the definition ("b" for backwards, "f"
  for forwards) to the name, and can be redefined
- the special variable @ contains the cumulative number of CPU cycles since the
  beginning of the program. Note that this information is static and does not
  take into account loops, branches, or code that is not produced by a mnemonic
  (e.g., "db 40" would not change the cycle count)

Most of these extensions are only available when the option -e is set.
Otherwise, attempting to use them yields an error.


Errata
- - -

According to "Assembly Language User Guide, Document #38-12004 Rev *C",
section 5.1, page 78, the list of attributes of AREA is not optional.
However, the example in section 9.3.1, on page 115, in "PSoC Designer
IDE User Guide, Document # 38-12002 Rev. E", shows an AREA directive
without attributes. This type of use appears reasonable, so that's also
what m8cas supports.


Known problems
--------------

- there is no provision for detecting exceeding the available ROM or RAM
  space (this is of course something the linker should do, not the assembler)
- the behaviour of AREA is still unsatisfactory
- far too many wildly distinct types of problems are reported simply as "syntax
  error"
- the line number in most error messages is off by at least one line
