#! /home/pornin/prog/bsforth/bsf-eng ( This is the Forth part of BSForth. When we enter this file, some basic Forth words are already implemented natively. ) : SOURCE ( -- c-addr u ) SOURCE-PTR @ SOURCE-LEN-PTR @ ; : NIP ( x1 x2 -- x2 ) SWAP DROP ; : IMMEDIATE ( -- ) LAST-WORD @ DUP WH-CFA-C + SWAP WH-CFA-I + @ SWAP ! ; : \ ( "ccc" -- ) SOURCE NIP >IN ! ; IMMEDIATE \ \ At that point, we have the comments introduced by a backslash. It is \ time for longer explanations. \ \ The BSForth engine is written in C and defines some of the most basic \ Forth words, as native words. This file implements the other Forth words. \ \ >IN native constant : 2DROP ( x1 x2 -- ) DROP DROP ; : 2DUP ( x1 x2 -- x1 x2 x1 x2 ) OVER OVER ; : HERE ( -- addr ) HERE-PTR @ ; : COMPILE, ( xt -- ) , ; \ STATE native constant : [ ( -- ) 0 STATE ! ; IMMEDIATE : ] ( -- ) -1 STATE ! ; \ \ Control-flow words: they use the OP_IF and OP_JMP opcodes; those \ opcodes have a static argument which contains an instruction-pointer \ displacement from the cell containing the static argument itself \ (hence an argument of CELLSIZE, where CELLSIZE is the size of a cell \ in bytes, means "no jump whatsoever"). \ \ On the control-flow stack, "orig" is the address of the opcode \ argument and "dest" is the address of the branch destination. The \ control-flow stack is implemented on the data stack itself. \ : IF ( C: -- orig ) OPCODE-IF , HERE 0 , ; IMMEDIATE : THEN ( C: orig -- ) HERE OVER - SWAP ! ; IMMEDIATE \ \ We cannot use the definition from the ANS standard because that \ definition needs POSTPONE and we need ELSE to define POSTPONE. We \ just "un-postpone" the calls. \ : ELSE ( orig1 -- orig2 / -- ) OPCODE-JMP , HERE 0 , SWAP HERE OVER - SWAP ! ; IMMEDIATE : BEGIN ( C: -- dest ) HERE ; IMMEDIATE : AGAIN ( C: dest -- ) OPCODE-JMP , HERE - , ; IMMEDIATE : UNTIL ( C: dest -- ) OPCODE-IF , HERE - , ; IMMEDIATE : AHEAD ( C: -- orig ) OPCODE-JMP , HERE 0 , ; IMMEDIATE \ \ Since the control-flow stack is implemented as the data-stack, and \ control-flow elements are simple cells, we can use plain PICK and \ ROLL for, respectively, CS-PICK and CS-ROLL . \ : CS-PICK ( C: destu ... orig0|dest0 -- destu ... orig0|dest0 destu )( S: u -- ) PICK ; : CS-ROLL ( C: origu|destu origu-1|destu-1 ... orig0|dest0 -- origu-1|destu-1 ... orig0|dest0 origu|destu )( S: u -- ) ROLL ; : CHAR ( "name" -- char ) PARSE-WORD 0= IF DROP 32 \ Note: we use the literal "32" and not BL because ELSE \ BL is not yet defined C@ THEN ; \ \ The definition of [CHAR] could use POSTPONE and LITERAL but those are \ not yet defined. \ : [CHAR] ( "name" -- ) CHAR OPCODE-LIT , , ; IMMEDIATE : CR ( -- ) 10 EMIT ; : / ( n1 n2 -- n3 ) /MOD NIP ; : > ( n1 n2 -- flag ) SWAP < ; \ \ Like / , but with unsigned numbers (not part of ANS Forth). \ : U/ ( u1 u2 -- u3 ) U/MOD NIP ; \ DEPTH native \ \ Parse a string from the input buffer, using the provided character as \ delimiter. The string is compiled into the current definition. Note that \ this word is not immediate. \ : COMPILE-STRING ( cc "ccc" -- ) ( runtime: -- c-addr u ) OPCODE-ALIT , HERE 0 , \ prepare opcode and push arg address HERE \ push MOVE destination ROT PARSE DUP ALLOT DUP >R \ parse string, allocate and keep length SWAP -ROT MOVE ALIGN \ copy string and re-align HERE HERE OVER - SWAP ! \ adjust opcode argument OPCODE-LIT , R> , ; \ compile the string length \ \ This definition of S" works only in compilation mode. ANS Forth (in the \ "file access" extension words -- note in the core words) states that S" \ may be used outside compilation mode, using some buffer which is at least \ 80-character long, and potentially transient. \ \ TODO: extend S" with proper interpretation semantics. This can be done \ later on. \ : S" [CHAR] " COMPILE-STRING ; IMMEDIATE \ \ This word is not from ANS Forth. It works along with FIND-WORD and is \ used to abort with a descriptive text when a word was looked for but \ not found. \ : CHECK-NAME-EXISTS ( c-addr u 0 | c-addr u wa n -- wa ) S" word not found: " ROT 0= IF TYPE TYPE CR ABORT ELSE 2DROP >R 2DROP R> THEN ; \ \ Parse a (blank-delimited) word name and find its header address. \ Complain and abort if the word does not exist. Not part of ANS Forth. \ : NEXT-WORD ( "name" -- wa ) PARSE-WORD 2DUP FIND-WORD-HEADER CHECK-NAME-EXISTS ; \ \ Get interpretation semantics (as an xt) from the word header address. \ Not part of ANS Forth. \ : INTERPRETATION-SEMANTICS ( wa -- xt ) WH-CFA-I + @ ; \ \ Get compilation semantics (as an xt) from the word header address. \ The returned xt is 0 for default compilation semantics. \ : COMPILATION-SEMANTICS ( wa -- xt ) WH-CFA-C + @ ; \ \ Set interpretation semantics of the provided word (designated by \ its header address). Not part of ANS Forth. \ : SET-INTERPRET ( interpret-xt wa -- ) WH-CFA-I + ! ; \ \ Set compilation semantics of the provided word (designated by \ its header address). Not part of ANS Forth. \ : SET-COMPILE ( interpret-xt wa -- ) WH-CFA-C + ! ; : ['] ( "name" -- ) NEXT-WORD INTERPRETATION-SEMANTICS OPCODE-LIT , , ; IMMEDIATE : POSTPONE ( "name" -- ) NEXT-WORD DUP COMPILATION-SEMANTICS DUP 0= IF DROP OPCODE-LIT , INTERPRETATION-SEMANTICS , ['] , , ELSE COMPILE, DROP THEN ; IMMEDIATE : LITERAL ( x -- ) ( runtime: -- x ) OPCODE-LIT , , ; IMMEDIATE : [COMPILE] ( "name" -- ) NEXT-WORD DUP COMPILATION-SEMANTICS DUP 0= IF DROP INTERPRETATION-SEMANTICS ELSE NIP THEN COMPILE, ; IMMEDIATE \ ABORT native : ABORT" ( "ccc" -- ) POSTPONE S" ROT IF TYPE ABORT ELSE DROP DROP THEN ; IMMEDIATE : WHILE ( dest -- orig dest / flag -- ) POSTPONE IF 1 CS-ROLL ; IMMEDIATE : REPEAT ( orig dest -- / -- ) POSTPONE AGAIN POSTPONE THEN ; IMMEDIATE : 1+ ( n1|u1 -- n2|u2 ) 1 + ; : 1- ( n1|u1 -- n2|u2 ) 1 - ; : NEGATE ( n1 -- n2 ) 0 SWAP - ; : ABS ( n -- u ) DUP 0< IF NEGATE THEN ; : DO-INTERNAL POSTPONE SWAP POSTPONE >R POSTPONE >R POSTPONE BEGIN ; : DO ( C: -- do-sys ) ( runtime: n1|u1 n2|u2 -- / R: -- loop-sys ) 0 DO-INTERNAL ; IMMEDIATE : ?DO POSTPONE 2DUP POSTPONE - POSTPONE IF -1 DO-INTERNAL ; IMMEDIATE : I ( -- n|u ) ( R: loop-sys -- loop-sys ) POSTPONE R@ ; IMMEDIATE \ J native : UNLOOP ( -- ) ( R: loop-sys -- ) POSTPONE R> POSTPONE DROP POSTPONE R> POSTPONE DROP ; IMMEDIATE : LOOP ( C: do-sys -- ) ( runtime: -- / R: loop-sys1 -- | loop-sys2 ) POSTPONE R> POSTPONE R@ POSTPONE SWAP POSTPONE 1+ POSTPONE DUP POSTPONE >R POSTPONE = POSTPONE UNTIL POSTPONE UNLOOP IF POSTPONE ELSE POSTPONE 2DROP POSTPONE THEN THEN ; IMMEDIATE : WITHIN ( test low high -- flag ) OVER - >R - R> U< ; \ +LOOP requires some tricky computations. \ \ We keep on the return stack the limit "l" and the index "i". We want \ to add the increment "k". If we compute "b = i - l" and "c = b + k" \ then "b" represents our relative position prior to the increment and \ "c" just after the increment. \ \ We partition the relative positions into the strictly negative \ (N-space) and the positive or zero (P-space). We use n-bit two's \ complement, hence those two spaces have exactly the same length \ (2^(n-1)). The limit is crossed (thus ending the loop) if "b" is in \ one of the spaces, "c" in the other, and the transition is done \ through the 0 and _not_ through the wrap-around "2^(n-1) <-> -2^(n-1)". \ The sign of "k" tells us which way we move. \ \ Thus, we end the loop if and only if "b" and "c" have different \ signs (0 is considered positive) _and_ "b" and "k" have different \ signs. Since we use two's complement, we can perform some bitwise \ computations which will involve the sign bits, and get the final \ result with a sign test. If "bs" is the sign bit of "b", "cs" the \ sign bit of "c" and "ks" the sign bit of "k", then we end the loop \ if and only if "((bs XOR cs) AND (bs XOR ks)) = 1", i.e. \ "((b XOR c) AND (b XOR k)) < 0". I have not found a cheaper expression. \ \ Another fully different trick is to encode the loop index differently: \ use "j = start - (limit XOR minint)" where "minint" is -2^(n-1). Then \ we add "k" (the increment) to "j" and the loop ends when this signed \ addition overflows or underflows (which can be tested at assembly level). \ The I and J word become a bit more complex. \ (TODO: understand, verify and maybe implement this trick) \ \ This word takes as arguments k, i and l, computes internally b=i-l, \ c=b+k, tests whether ((b XOR c) AND (b XOR k)) < 0. It also lets \ on the stack the sum u+v (u=k and v=i) : +LOOP-HELPER ( u v k i l -- flag u+v ) - 2DUP + OVER XOR >R XOR R> AND 0< -ROT + ; : +LOOP ( C: do-sys -- ) ( runtime: n -- / R: loop-sys1 -- | loop-sys2 ) POSTPONE R> POSTPONE 2DUP POSTPONE R@ POSTPONE +LOOP-HELPER POSTPONE >R POSTPONE UNTIL POSTPONE UNLOOP IF POSTPONE ELSE POSTPONE 2DROP POSTPONE THEN THEN ; IMMEDIATE \ obsolete \ : FOO \ S" ZOINX1" TYPE CR \ 3 0 DO \ 48 42 DO \ S" HUP" TYPE J . I . CR \ LOOP \ LOOP \ S" ZOINX2" TYPE CR ; \ \ FOO \ \ : BAR \ 0 17 DO \ S" ZAP" TYPE I . CR \ -5 +LOOP ; \ \ BAR : CONSTANT ( x "name" -- ) >R : R> POSTPONE LITERAL POSTPONE ; ; 32 CONSTANT BL : SPACE ( -- ) BL EMIT ; : SPACES ( n -- ) DUP 0 > IF 0 ?DO SPACE LOOP ELSE DROP THEN ; \ \ Get the "numeric character" for a given digit (0..9 then A..Z). In ASCII, \ 'A' is 65 whereas '9' is 57, so there is a 7-value gap. \ : NUMERIC-CHAR ( n -- n ) 9 OVER < IF 7 + THEN 48 + ; \ \ Convert a signed number into its digits, which are assembled on the \ stack, most-significant coming on top of the stack. Then the number \ of digits is pushed. If the input number is 0, a single 0-digit is \ created; otherwise, the most significant digit is not used. If the \ input number is negative, all digits are negated. \ \ Note: this implementation works only if /MOD implements symmetric \ division (rounded towards 0). In BSForth, /MOD is implemented with the \ native '/' C operator, which guarantees such rounding (at least \ in C99). \ : DIGIT-SPLIT ( n -- d0 d1 .. dk k ) 0 >R BEGIN BASE @ /MOD R> 1+ >R DUP 0= UNTIL DROP R> ; : . ( n -- ) BL EMIT DUP 0< IF [CHAR] - EMIT THEN DIGIT-SPLIT 0 DO ABS NUMERIC-CHAR EMIT LOOP ; : .R ( n1 n2 -- ) OVER 0< IF 1- -1 >R ELSE 0 >R THEN >R DIGIT-SPLIT R> OVER - SPACES R> IF [CHAR] - EMIT THEN 0 DO ABS NUMERIC-CHAR EMIT LOOP ; \ \ This word works like DIGIT-SPLIT but with unsigned numbers. It uses \ the non-standard U/MOD word, which implements division and remainder \ with unsigned numbers. \ : UDIGIT-SPLIT ( u -- d0 d1 .. dk k ) 0 >R BEGIN BASE @ U/MOD R> 1+ >R DUP 0= UNTIL DROP R> ; : U. ( u -- ) BL EMIT UDIGIT-SPLIT 0 DO NUMERIC-CHAR EMIT LOOP ; : U.R ( u n -- ) >R UDIGIT-SPLIT R> OVER - SPACES 0 DO NUMERIC-CHAR EMIT LOOP ; : .S ( -- ) 32 EMIT [CHAR] < EMIT DEPTH . [CHAR] > EMIT DEPTH 0 ?DO DATA-STACK-BASE I 1+ CELLS - @ . LOOP CR ; \ ! native \ # TODO \ #> TODO \ #S TODO : ' ( "name" -- xt ) NEXT-WORD INTERPRETATION-SEMANTICS ; \ ( native \ * native \ */ TODO \ */MOD TODO \ + native : +! ( n|u a-addr -- ) DUP @ ROT + SWAP ! ; \ , native \ - native : ." ( "quote" -- ) POSTPONE S" POSTPONE TYPE ; IMMEDIATE \ /MOD native \ U/MOD native \ 0< native \ 0= native : 2! ( x1 x2 a-addr -- ) SWAP OVER ! CELL+ ! ; : 2* ( x1 -- x2 ) DUP + ; \ 2/ native : 2@ ( a-addr -- x1 x2 ) DUP CELL+ @ SWAP @ ; : 2OVER ( x1 x2 x3 x4 -- x1 x2 x3 x4 x1 x2 ) 3 PICK 3 PICK ; : 2SWAP ( x1 x2 x3 x4 -- x3 x4 x1 x2 ) 3 ROLL 3 ROLL ; \ : native \ ; native \ < native \ <# TODO \ = native \ \ CREATEd words are supposed to begin with an OPCODE-BODY opcode, which \ is followed by an argument (the xt of the code to execute) and then by \ the word body. \ : >BODY ( xt -- a-addr ) 2 CELLS + ; \ >NUMBER native \ >R native : ?DUP ( x -- 0 | x x ) DUP IF DUP THEN ; \ @ native \ ACCEPT TODO \ ALIGN native \ ALLOT native \ AND native \ BASE native constant \ C! native : C, ( x -- ) HERE 1 ALLOT C! ; \ C@ native \ CELL+ native \ CELLS native : CHAR+ ( c-addr1 -- c-addr2 ) 1+ ; : CHARS ( n1 -- n2 ) ; : COUNT ( c-addr1 -- c-addr2 u ) DUP CHAR+ SWAP C@ ; \ CREATE native : DECIMAL ( -- ) 10 BASE ! ; \ \ This word is used at runtime by DOES> ; it pops an address and \ modifies the last defined word so that it uses that address as \ interpretation semantics, immediately after the initial OPCODE-BODY \ opcode. This should be used only with CREATEd words. \ : DOES>-HELPER ( cfa -- ) LAST-WORD @ WH-CFA-I + @ CELL+ ! ; \ \ For DOES> , we compile in a call to DOES>-HELPER with the appropriate \ argument, followed by an exit opcode. \ : DOES> ( C: colon-sys1 -- colon-sys2 ) OPCODE-LIT , HERE 3 CELLS + , POSTPONE DOES>-HELPER OPCODE-EXIT , ; IMMEDIATE \ DROP native \ DUP native \ EMIT native \ ENVIRONMENT? TODO \ EVALUATE TODO : EXECUTE ( i*x xt -- j*x ) [ OPCODE-EXECUTE COMPILE, ] ; : EXIT ( -- ) ( R: nest-sys -- ) OPCODE-EXIT , ; IMMEDIATE \ FILL native \ FIND TODO \ FM/MOD TODO \ HOLD TODO \ INVERT native \ KEY TODO \ LEAVE TODO \ LSHIFT native \ M* TODO : MAX ( n1 n2 -- n3 ) 2DUP < IF SWAP THEN DROP ; : MIN ( n1 n2 -- n3 ) 2DUP > IF SWAP THEN DROP ; : MOD ( n1 n2 -- n3 ) /MOD DROP ; \ MOVE native \ OR native \ OVER native \ QUIT TODO \ R> native \ R@ native : RECURSE ( -- ) CURRENT-XT @ , ; IMMEDIATE \ ROT native \ -ROT native \ RSHIFT native : S>D ( n -- d ) DUP 0< ; \ SIGN TODO \ SM/REM TODO \ SWAP native \ TYPE native \ U< native \ UM* TODO \ UM/MOD TODO : VARIABLE ( "name" -- ) CREATE 1 CELLS ALLOT ; \ WORD TODO \ XOR native \ ========================================================================== \ #TIB TODO : 0<> ( x -- flag ) 0= 0= ; : 0> ( n -- flag ) 0 SWAP < ; : 2>R ( x1 x2 -- ) ( R: -- x1 x2 ) SWAP >R >R ; : 2R> ( -- x1 x2 ) ( R: x1 x2 -- ) R> R> SWAP ; : 2R@ ( -- x1 x2 ) ( R: x1 x2 -- x1 x2 ) R> R> 2DUP >R >R SWAP ; \ \ For :NONAME, we push a 0 (so that ; knows what to do), reference \ CURRENT-XT (for RECURSE) and enter compilation state. Unnamed code \ chunks have no header. \ : :NONAME ( C: -- colon-sys ) ( S: -- xt ) ALIGN HERE CURRENT-XT ! 0 ] ; \ \ BSForth uses, for each word, two code field addresses, one for \ interpretation semantics and one for compilation semantics. The \ following word is used to define a word by specifying both its \ interpretation and compilation semantics. Its name and semantics are \ copied from those in Gforth. \ : INTERPRET/COMPILE: ( interpret-xt compile-xt "name" -- ) : POSTPONE ;NO-TERMINATOR LAST-WORD @ SWAP OVER WH-CFA-C + ! WH-CFA-I + ! ; \ \ This implementation of .( works both when compiling and when interprating \ (ANS Forth does not define interpretation behaviour). \ :NONAME ( "ccc" -- ) [CHAR] ) PARSE TYPE ; :NONAME ( "ccc" -- ) [CHAR] ) COMPILE-STRING POSTPONE TYPE ; INTERPRET/COMPILE: .( : <> ( x1 x2 -- flag ) = 0= ; \ C" TODO \ This implementation is copied from ANS Forth 0 CONSTANT CASE IMMEDIATE ( init count of OFs ) : OF ( #of -- orig #of+1 / x -- ) 1+ >R POSTPONE OVER POSTPONE = POSTPONE IF POSTPONE DROP R> ; IMMEDIATE : ENDOF ( orig1 #of -- orig2 #of ) >R POSTPONE ELSE R> ; IMMEDIATE : ENDCASE ( orig1..orign #of -- ) POSTPONE DROP 0 ?DO POSTPONE THEN LOOP ; IMMEDIATE \ CONVERT TODO : ERASE ( addr u -- ) 0 FILL ; \ EXPECT TODO 0 CONSTANT FALSE : HEX ( -- ) 16 BASE ! ; \ MARKER TODO \ Not sure about this one... : PAD ( -- c-addr ) HERE ; \ PARSE native \ PICK native \ QUERY TODO \ REFILL native \ RESTORE-INPUT TODO \ ROLL native \ SAVE-INPUT TODO : SOURCE-ID ( -- 0 | -1 | file-id ) SOURCE-ID-PTR @ ; \ SPAN TODO \ TIB TODO \ \ TO must have a special definition because it has defined \ interpretation semantics AND defined non-default compilation \ semantics, and those are not equal (hence, that word cannot be simply \ immediate). Making it state-smart (immediate, with a behaviour which \ depends upon the value of STATE) is not a satisfying solution (it \ fails in a number of settings involving '). TO is the main reason why \ BSForth has two xts for each word, one for interpretation and one for \ execution. \ :NONAME ' >BODY ! ; :NONAME ' POSTPONE LITERAL POSTPONE >BODY POSTPONE ! ; INTERPRET/COMPILE: TO \ \ State-smart definition (obsolete). \ \ : TO ( x "name" -- ) ( C: "name" -- ) \ STATE @ IF \ ' POSTPONE LITERAL POSTPONE >BODY POSTPONE ! \ ELSE \ ' >BODY ! \ THEN ; IMMEDIATE : VALUE ( x "name" -- ) CREATE , DOES> @ ; -1 CONSTANT TRUE : TUCK ( x1 x2 -- x2 x1 x2 ) DUP -ROT ; : U> ( u1 u2 -- flag ) SWAP U< ; \ UNUSED TODO \ ==========================================================================