| Here we'll: |
|
Describing Decaf |
Decaf is a programming language and toolset designed for persons who are not computer professionals. It bears the same relationship to Java that BASIC bore to Fortran - speed and power are willingly sacrificed to make the language easy to learn and use.
In 1964 and 1965, engineers working with IBM-donated computers at MIT were creating a revolutionary new computing technology called "time-sharing." Prior to that time computers were million-dollar machines that ran programs sequentially. The basic idea of time-sharing was that this very expensive resource could be made to split its time among multiple concurrent users, thereby bringing the resource into the realm of the affordable for lots of users.
At the same time, engineers were working at Dartmouth with GE-donated computers to achieve precisely the same goal. (Who was first? That depends on whom you ask.) But Dartmouth's math department, under which the project was run, had one additional insight: Fortran (the language of choice for scientific computing at that time) was unsuitable for casual computer users. What was needed was something far easier to learn and use.
BASIC (Beginner's All-purpose Symbolic Instruction Code - acronyms were all the rage in the mid-sixties) filled the need nicely. It's a credit to the language's designers (Professors Kemeney and Kurtz) that BASIC is still actively used today. While it has achieved the speed and power of other languages, its ongoing success is as a scripting language (in Microsoft Office and Lotus Notes, for examples) where it is used by non-programmers to create modest applications based on the underlying products.
Decaf is targeted toward the same need, but coming in the age of Java, not of Fortran. This is a sample dialog.
| User types | Decaf responds |
|---|---|
| (User types randomly in input window) | That's not a valid Decaf expression. Try "2 + 2". |
| 3 + 4 | That's a valid Decaf expression! You can assign the result to a variable or see it printed. (Buttons shown for "Assign" and "Print") |
| (Clicks "Print") | To print to the output window, use a print statement:"3 + 4 print" |
| 7 - 5 print | (Output window says "2") Congratulations! You've written a correct Decaf statement. |
Decaf will help beginners ease into programming.
I had originally intended to write Decaf in Java. Extended experiments with the Decaf Virtual Machine (DVM) suggested that this would fail. With considerable help from the capable people on Sun's discussion boards, I was unable to get the DVM to call an instruction in less than 100k machine cycles. (I do not expect any Java fan to believe that. Go ahead and try to prove me wrong.) I now intend to write the DVM and Decaf assembler in C. Decaf and its IDE will be written in Decaf assembler.
| P. S. Decaf would be a good standalone Python program. Or it might be a JavaScript application so it would run inside the browser. Or it might be written in C++ as a Firefox plug-in or even as an integral part of Firefox. Hmmm. |
|
Decaf's Design |
The Decaf application, the part that the Decaf user sees, is the topmost layer in a complete computer design that goes all the way down to hardware. This is the overall design:
|
||
|
||
|
||
|
||
|
||
|
||
|
Let's look at these layers, from the bottom up.
The Decaf Virtual Machine is simpler than the Java Virtual Machine. The CPU is documented but not programmed. This summary information is taken from the Javadoc for language.DecafComputer.
All registers are 64-bits wide. These are the CPU's registers:
| Register name | Capability |
| Addr1 | Holds memory address |
| Addr2 | Holds memory address |
| Addr3 | Holds memory address |
| Real1 | Holds double-precision float value |
| Real2 | Holds double-precision float value |
| Flags | Flags register |
| Int1 | Integer register |
| Int2 | Integer register |
| Reps | Integer repeat count register |
These are the basic data types:
| Symbol | Data Type |
| O | Object |
| R | Real (64-bit IEEE 754) |
| I | Int (64-bit signed integer) |
| C | Char (16-bit unsigned integer) |
| B | Byte (8-bit unsigned integer) |
| T | Bit (0 or 1) |
The RAM is simply arrays within the RAM. The peripherals are not designed. We'll talk to CRT, keyboard and mouse through pseudo-OS calls.
The instruction set was originally documented in language.instructions. This information originated from that Javadoc and now supercedes it.
A Program is an array of these Instruction objects.
Some instructions have suffixes to operate on many combinations of data types. For examples, ADDBB adds two bytes; ADDIR adds an int to a real (returning a real) and so on. For brevity, the add instruction is documented as ADDpq. As you will see in the next table, this means there is an ADD instruction for every numeric type except bit that adds every equal or narrower type, except bit. There is no ADDIL instruction because long is wider than int. (Requiring the wider operand on the left cuts the instruction set size without costing any functionality.) The following table shows these suffixes.
| Suffix Letter(s) | Suffixes |
| a | Any data type: "O", "R", "I", "C", "B" or "T" |
| aa | Any data type with itself: OO, RR, II, CC, BB or TT |
| cc | Comparison: EQ, NE, GE, GT, LT or LE |
| i | Any integer type: I, C, B or T |
| j | Any integer not wider than the first type |
| k | Any integer except bit (list i excluding T) |
| n | Any numeric type: R and list i |
| o | Any number not wider than the first type |
| p | Any number, except bit (list n excluding T) |
| q | Any number in p not wider than the first type |
Decaf assembler instructions are written:
The following table lists the Decaf machine instructions.opcode [address or distance][, flag number][; comment] Examples: ADDLI // add long + int -> long CMPRREQ // compare two reals for equality JBR 5 // skip 5 instructions if Flags[0] is reset LOADA 1234, 1 // loads 1234 into Addr1
| OpCode | Value | Flag (0-63) | Meaning |
| ADDpq | Add - pops two items from stacks, push sum | ||
| AND | And - replace the top two bits with their logical and | ||
| BRESET | the flag to reset | Reset the specified flag bit | |
| BSET | the flag to set | Set the specified flag bit | |
| CALL | address | Call a subroutine | |
| COMPaacc | Compare two items, setting Flags[0] | ||
| CMPijcc | Compare ints, setting Flags[0] | ||
| DIVpp | Divide any numeric except bit by any numeric except bit | ||
| JBR | distance to jump (relative to IP) | Jump if flags[0] is reset | |
| JBRF | distance to jump (relative to IP) | flag to check | Jump if a selected flag is reset |
| JBS | distance to jump (relative to IP) | Jump if flags[0] is set | |
| JBSF | distance to jump (relative to IP) | flag to check | Jump if a selected flag is set |
| JMP | address | Jump to the specified address | |
| LOADA | address | 1, 2 or 3 | Load addres into Addr1, Addr2 or Addr3 |
| JREL | distance to jump (relative to IP) | Change the IP | |
| MOVaa | Move the datum at Addr1 to Addr2 | ||
| MULpq | Multiply - pops two items from stacks, pushes product | ||
| NEGp | Negate - reverse the sign of the number on the top of the stack | ||
| NOTi | Bitwise not - reverse all bits, operate on top of stack datum | ||
| OR | Or - replace the top two bits with their logical or | ||
| POWpp |
Exponentiate - pops two items from stacks, push (top-1)^top |
||
| PUSHa | the value to push | Push a value onto a stack | |
| REMjj |
Modulus - remainder after division, any integers except bit |
||
| REPLL | repeat count | Repeat next instruction, list to list | |
| REPLS | repeat count | Repeat next instruction, list to single | |
| REPSL | repeat count | Repeat next instruction, single to list | |
| RET | return | ||
| SHLk | distance to shift | Bitwise shift left, top stack item | |
| SHRk | distance to shift | Bitwise shift right, top stack item | |
| SHRAI | distance to shift |
Shift right arithmetic, top integer shifted with sign extension |
|
| SHRAL | distance to shift |
Shift right arithmetic, top long shifted with sign extension |
|
| SUBpq |
Subtract - pops two items from stacks, push (top-1) - top |
||
| XOR | Xor - replace the top two bits with their logical xor |
The REP instructions may precede any of this list of instructions:
| Math | Shift | Bitwise and Logical |
| ADD | SHL | AND |
| DIV | SHR | NOT |
| MUL | SHRAI | OR |
| POW | SHRAL | XOR |
| REM | ||
| SUB | ||
| NEG |
The assembler layer consists of a quality code editor, the assembler itself and a machine-level debugger. While I hope that the target end-user will never see this layer, I expect that I'll use it extensively in building Decaf.
The language is partly designed. The information below is taken from the Javadoc of language.decaf. If you are not familiar with the Intel REPZ MOVS instruction, read my short article on The System.arraycopy() Magic.
Decaf - Java without the jitters™ - is a computer language that feels simple, like Basic, but is really close to Java. Decaf has a sneaky bit of APL, too, which gives it a lot of power. The design goal is to provide a modern, object-oriented language for Basic's original target audience: students and others who need to use a computer but aren't interested in professional computer programming.
The following notes give Java programmers a brief introduction to some of the features of Decaf.
A small API preserves the "Java without the jitters" feel of Decaf. It will be a contemporary API—fully graphical from the start.
Wherever possible, Java has been simplified. First, EOL is significant - a line of code is a terminated statement without the semicolon. (The semicolon is used as a statement continuation character.) Second, the idea of sender/receiver lists (a.k.a. argument/parameter lists) is generalized: they can be used wherever values are moved across a boundary, such as in method calls, method returns and in assignment statements.
The for statement is simplified to resemble the original Basic for loop. This sacrifices power in favor of being easy to teach.
A Block statement can be used anywhere a statement is permitted, but it is never required. Just as you can use a single statement or a block statement after an 'if' or a 'for' in Java (which grew from C++ which grew from C) you can use a single or a block in decaf. You can also have a method with a single statement in decaf, which is consistent. (Why wasn't this allowed in C?) If this were allowed in Java a setter might be:
public void setThing( Thing t ) myThing = t;
This language designer thinks it looks cleaner without the unnecessary braces.
The operator precedence table is discarded in favor of ultimate simplicity: all operators have equal precedence and processing is strictly left-to-right except as modified by parentheses. Here we lose the convenience of the intelligent, C-based precedence rules but we also simplify the teaching effort.
The left-to-right rule is strict. This is easy to learn but it has certain consequences that break with tradition. For example, assignment is left to right:
expression => variable
The Java setter above is this Decaf setter:
public (Thing t)setThing t => myThing
Why is the parameter list on the left? First, there are two types of methods in Decaf: procedures and functions. The former do not return a result; the latter do return a result. The definition of a function has the parameter list on the left and the return list on the right. Decaf is always left to right.
This may seem strange at first, but I suspect that it's less strange than reading an expression from left to right and then assigning from right to left. Read the Decaf setter slowly: A Thing, "t", is given to the "setThing" method which assigns "t" to the field "myThing." It reads just as it runs. Try the same read in Java and you'll see that it has the jitters. The Thing "t" is passed - right to left - to the "setThing" method; now to the far right "t" is passed left to the "myThing" field. Why have we put up with this for so long?
Now for the APL flavor.
If you've never met APL you've missed probably the strangest computer language ever designed. It was invented by Ken Iverson at IBM as a meta language. It is extremely cryptic (lots of Greek letters for operators) and extremely powerful. Around 1980 it took over on Wall Street as the language of choice for programmers in investment banking firms. Decaf attempts to capture a bit of its power without its cryptic nature.
Our simple variables (3 => int x) are called "singles" (APL calls them "scalars"). Multi-valued variables are called "lists" (7, 8, 9 => int[3] x). Our unary operators apply to singles or to lists. "-list" negates every member of the list. Every binary operator applies to singles, singles and lists, lists and singles or lists and lists. Given x holding list 7, 8, 9, the expression x + 1 produces the list 8, 9, 10.
Booleans are out. Beginners find the concept strange. The problem may be with the word "boolean." What's that?
Bits are in. A bit is an unsigned int equal to 0 or 1. A bit can be used as Java's booleans are used (1 == true). Combining this with the arithmetic virtues of 0 and 1 and the notion of lists yields powerful operations that are familiar to APL programmers. For example, assume that you have a list of customers and want to know how much the U.S. customers have spent:
bit[custs.length] inUS int[custs.length] spending custs.country == "U.S." => inUS inUS * custs.amtSpent => spending ( inUS )sum ? // reports # of custs in US ( spending )sum ? // reports total spending by US custs
(The "?" operator sends the result of an expression to the output console.) In the above a list of bits and a list of ints, both as long as the list "custs", are created in the first pair of lines. The == operator compares the single "U.S." to each list element, creating a list of bits as long as the custs list, with 1 values for each "U.S." customer and 0 values for the other customers. Multiplying this list by the "amtSpent" property assigns amtSpent or zero to the spending list.
An experienced analyst who only wanted the total spending might have done this:
( custs.country == "U.S." * custs.amtSpent )sum ?
Unlike Basic, Decaf is a strongly typed language, like Java. Unlike Java, Decaf's lists are strongly typed, too. (No more casts for whatever's put into ArrayLists or Vectors.) An expression list.field (custs.amtSpent) returns a list of "field" items.
An expression single op list or list op single applies the single to each list element producing another list of the same type that would be produced by single op single. The expression list1 op list2 is valid for lists that are equal in size. The operand is applied to each pair of elements - list1[0] op list2[0], list1[1] op list2[1], etc. - creating a third list.
The for each statement iterates through a list very nicely.
Lists have a length property - the number of elements in the list. They also have a shape property that is a list of the dimensions.
int x // single - shape is {} length is 1 int[1] x // list - shape is {1} length is 1 int[24] x // shape is {24} length is 24 int[2][3][4] x // shape is {2, 3, 4} length is 24
The shape of an item is a list of ints. As a list, it has a length property. The length of the shape of a single is zero.
Decaf calls lists "arrays" if the length of their shape (number of dimensions) is more than 1. Arrays vary their subscripts from left to right. Given int[2][3][4] x the elements are
1. x[0][0][0] 2. x[1][0][0] 3. x[0][1][0] 4. x[1][1][0] 5. x[0][2][0] 6. x[1][2][0] 7. x[0][0][1] . . .
The "unwind" method (APL called it "ravel") reduces any array to a simple, one-dimensional list. The "reshape" method (APL's rho operator) performs the opposite function. Note that these methods are extremely cheap as they don't actually touch the data - just the shape.
Operators in the form list1 op list2 apply to lists of equal length. There is no requirement for equal shape. I have no idea why anyone would want to list1 op list2 arrays of different shape, but it seems that if you let people use flexible tools, someone always finds a perfectly reasonable use for even the strangest operation.
The Decaf computer has built-in list support. (Credit where due: all this extends Intel's REPZ instruction.)
The default origin for lists is zero. The user may specify start and end, or start-only.
byte[12] eggs // dozen eggs numbered 0 thru 11 int[1-12] months // dozen months, 1 thru 12 double[1-?] amounts // ? amounts, 1 thru ?
The last range, "1-?" specifies numbering starting with 1 and not limited on the top side. (In Java this is an ArrayList or Vector but that loses strong typing.)
The native types are:
The whole language mechanism is here and in the related classes. A Tokenizer converts input into an array of Token objects. The DecafCompiler uses a Grammar built from an array of Productions (named Rules) to create a set of Instructions, which are then executed by the DecafComputer.
The IDE Layer is a simple IDE, integrating an editor, the Decaf compiler and a source-level debugger. The IDE layer is, in fact, a Decaf application.
A Decaf application includes the input console (where you can type "2 + 2 print"), the output console (which would say "4"), a class tree and the main application window.
|
Decaf Summarized |
The invention of time-sharing in the mid 1960's let many people use a computer simultaneously. This led to the requirement for a simple computer language, which was filled by the BASIC language. BASIC is still popular today as the scripting language in MS Office and Lotus Notes, among others.
The same requirement exists today for a language that is simpler and friendlier than Java for non-computer professionals. Decaf has been designed to fill that need.
Decaf starts at the bottom level with a Decaf Virtual Machine (DVM). This machine has an instruction set and simple programming tools will be built for native DVM programming.
The Decaf language will be built on these tools. It will support an Integrated Development Environment (IDE) targeted toward the programming newcomer.