Help


KBH_BIO
04-09-2006, 12:09 PM
Hi guys, I'm a newbie on this forum.

I want to ask: has anyone ever made a lexical analyser for C++ in Java?

ecwx
04-09-2006, 12:40 PM
I never have :D

Shaka_RDR
04-09-2006, 04:43 PM
What's a lexical analyser?
Text recognition? Or semantic recognition? It's not very clear.

KBH_BIO
05-09-2006, 08:58 AM
A lexical analyser is kind of like text recognition, but I only want it to go as far as grouping the pieces into categories.
Has anyone ever made one??

Shaka_RDR
05-09-2006, 09:59 AM
Which programming language?
My senior's thesis was handwriting recognition using Fuzzy ARTMAP, written in Java. Is that what you mean?

Or... semantic recognition for Indonesian, like the one my friend built for his thesis?

By the way, what does "lexical" even mean? In ACM contests there's the word "Lexicography", but I never understood it :P

KBH_BIO
07-09-2006, 08:34 AM
Something like that Indonesian semantic recognition, except mine recognizes the C language, and it only needs to go as far as splitting the input into pieces. I'm stuck on the algorithm. Can you help??

ruboW
08-09-2006, 09:34 AM
Do you mean text like this:

ini ibu budi (Indonesian for "this is Budi's mother") -> split into
i ni i bu bu di

Lexical analysis is the processing of an input sequence of characters (such as the source code of a computer program) to produce, as output, a sequence of symbols called "lexical tokens", or just "tokens". For example, lexers for many programming languages convert the character sequence 123 abc into two tokens: 123 and abc (whitespace is not a token in most languages). The purpose of producing these tokens is usually to forward them as input to another program, such as a parser.
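A minimal C sketch of the "123 abc" example (not part of the quoted article): it skips whitespace and groups a run of digits, or a letter followed by letters and digits, into one token. The names `tokenize`, `struct token`, the token kinds and the 32-byte lexeme buffer are illustrative assumptions; any other character simply becomes a one-character token.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token kinds used only by this sketch. */
enum kind { T_NUMBER, T_IDENT, T_OTHER };

struct token {
    enum kind kind;
    char text[32];               /* lexeme, truncated to 31 characters */
};

/* Split src into tokens and store at most max of them in out.
   Whitespace separates tokens and is never emitted.
   Returns the number of tokens stored. */
size_t tokenize(const char *src, struct token *out, size_t max) {
    size_t count = 0;
    const char *p = src;

    while (*p != '\0' && count < max) {
        if (isspace((unsigned char)*p)) {    /* whitespace is not a token */
            p++;
            continue;
        }
        const char *start = p;
        enum kind k;
        if (isdigit((unsigned char)*p)) {            /* run of digits */
            k = T_NUMBER;
            while (isdigit((unsigned char)*p)) p++;
        } else if (isalpha((unsigned char)*p)) {     /* letter, then letters/digits */
            k = T_IDENT;
            while (isalnum((unsigned char)*p)) p++;
        } else {                                     /* anything else: one char */
            k = T_OTHER;
            p++;
        }
        size_t len = (size_t)(p - start);
        if (len >= sizeof out[count].text)
            len = sizeof out[count].text - 1;
        memcpy(out[count].text, start, len);
        out[count].text[len] = '\0';
        out[count].kind = k;
        count++;
    }
    return count;
}
```

Calling `tokenize("123 abc", toks, 4)` yields two tokens, a number `123` and an identifier `abc`; the space between them is consumed but never stored.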

A lexical analyzer, or lexer for short, can be thought of as having two stages, namely a scanner and an evaluator. (These are often integrated, for efficiency reasons, so they operate in parallel.)

The first stage, the scanner, is usually based on a finite state machine. It has encoded within it information on the possible sequences of characters that can be contained within any of the tokens it handles (individual instances of these character sequences are known as lexemes). For instance, an integer token may contain any sequence of numerical digit characters. In many cases, the first non-whitespace character can be used to deduce the kind of token that follows; the input characters are then processed one at a time until reaching a character that is not in the set of characters acceptable for that token (this is known as the maximal munch rule). In some languages the lexeme creation rules are more complicated and may involve backtracking over previously read characters.
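The maximal munch rule can be sketched as a small hand-built finite state machine in C. This is an illustration only, not from the quoted article: the states and the name `longest_lexeme` are hypothetical, and it covers just numbers, identifiers, `<`, `<=`, `<>`, `>`, `>=` and `:=`. The scanner runs the machine as far as it can, remembers the last position where it was in an accepting state, and falls back to that position when no further transition exists (a simple form of the backtracking mentioned above).

```c
#include <ctype.h>
#include <stddef.h>

/* States of a small hand-built DFA (hypothetical, for illustration). */
enum state { S_START, S_NUM, S_IDENT, S_LT, S_LE_NE, S_GT, S_GE,
             S_COLON, S_ASSIGN, S_DEAD };

/* Transition function: the next state after reading c in state s. */
static enum state step(enum state s, unsigned char c) {
    switch (s) {
    case S_START:
        if (isdigit(c)) return S_NUM;
        if (isalpha(c)) return S_IDENT;
        if (c == '<') return S_LT;
        if (c == '>') return S_GT;
        if (c == ':') return S_COLON;
        return S_DEAD;
    case S_NUM:   return isdigit(c) ? S_NUM : S_DEAD;
    case S_IDENT: return isalnum(c) ? S_IDENT : S_DEAD;
    case S_LT:    return (c == '=' || c == '>') ? S_LE_NE : S_DEAD;
    case S_GT:    return c == '=' ? S_GE : S_DEAD;
    case S_COLON: return c == '=' ? S_ASSIGN : S_DEAD;
    default:      return S_DEAD;   /* two-char operators end the token */
    }
}

/* Every state is accepting except START, DEAD and COLON
   (a lone ':' is not a token in this sketch). */
static int accepting(enum state s) {
    return s != S_START && s != S_COLON && s != S_DEAD;
}

/* Maximal munch: return the length of the longest lexeme at the start
   of src, or 0 if no token can start there. */
size_t longest_lexeme(const char *src) {
    enum state s = S_START;
    size_t i = 0, last_accept = 0;

    while (src[i] != '\0') {
        s = step(s, (unsigned char)src[i]);
        if (s == S_DEAD)
            break;                 /* no transition: stop reading */
        i++;
        if (accepting(s))
            last_accept = i;       /* remember the longest match so far */
    }
    return last_accept;
}
```

For example, `longest_lexeme("<=x")` returns 2 (the whole `<=`), `longest_lexeme("<5")` returns 1, and `longest_lexeme(":x")` returns 0 because `:` alone never reaches an accepting state.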

A lexeme, however, is only a string of characters known to be of a certain type. In order to construct a token, the lexical analyzer needs a second stage, the evaluator, which goes over the characters of the lexeme to produce a value. The lexeme's type combined with its value is what properly constitutes a token, which can be given to a parser. (Some tokens such as parentheses do not really have values, and so the evaluator function for these can return nothing. The evaluators for integers, identifiers, and strings can be considerably more complex. Sometimes evaluators can suppress a lexeme entirely, concealing it from the parser, which is useful for whitespace and comments.)
Source: http://en.wikipedia.org/wiki/Lexical_analysis

JLex, a lexical analyzer generator written in Java (source: http://www.cs.princeton.edu/~appel/modern/java/JLex/):
http://www.cs.princeton.edu/~appel/modern/java/JLex/current/Main.java
http://www.cs.princeton.edu/~appel/modern/java/JLex/current/manual.html
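The evaluator stage from the quoted passage could look roughly like this in C. It is a sketch under assumptions: the lexeme types, `struct token` and `evaluate()` are hypothetical, and the scanner is assumed to have already found where each lexeme starts and ends. Integers get a numeric value, identifiers keep their spelling, a parenthesis carries no value, and whitespace and comments are suppressed (the function returns 0, so the caller never hands them to the parser).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lexeme types handed over by the scanner stage. */
enum lexeme_type { L_INTEGER, L_IDENT, L_LPAREN, L_SPACE, L_COMMENT };

/* A token: the lexeme's type plus its value (if any). */
struct token {
    enum lexeme_type type;
    long int_value;              /* meaningful for L_INTEGER */
    char name[32];               /* meaningful for L_IDENT, truncated to 31 chars */
};

/* Evaluator: turn one lexeme (text, len) of a known type into a token.
   Returns 1 if the token should go to the parser, 0 if it is suppressed. */
int evaluate(enum lexeme_type type, const char *text, size_t len,
             struct token *out) {
    out->type = type;
    out->int_value = 0;
    out->name[0] = '\0';

    switch (type) {
    case L_INTEGER:                      /* "123" -> 123 (no overflow check) */
        for (size_t i = 0; i < len; i++)
            out->int_value = out->int_value * 10 + (text[i] - '0');
        return 1;
    case L_IDENT:                        /* the spelling is the value */
        if (len >= sizeof out->name)
            len = sizeof out->name - 1;
        memcpy(out->name, text, len);
        out->name[len] = '\0';
        return 1;
    case L_LPAREN:                       /* punctuation carries no value */
        return 1;
    case L_SPACE:
    case L_COMMENT:                      /* hidden from the parser */
    default:
        return 0;
    }
}
```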

ruboW
08-09-2006, 09:39 AM
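#include <stdio.h>
#include <ctype.h>

/* Excerpt: this code assumes declarations made elsewhere, e.g.
 *   FILE *source; int cur_line, cur_col, err_line, err_col;
 *   int num; char id[MAX_ID + 1]; int id_len;
 *   the Symbol enum (eof, plus, minus, ..., ident, number, nul, ...),
 *   a keyword hash table (create_htab, enter_htab, find_htab,
 *   get_htab_data, Entry) and an error() reporting function. */

/* Read one character, keeping track of the current line and column. */
int read_ch(void) {
    int ch = fgetc(source);
    cur_col++;
    if (ch == '\n') {
        cur_line++;
        cur_col = 0;
    }
    return ch;
}

/* Push a character back so that the next read_ch() returns it again. */
void put_back(int ch) {
    if (ch == EOF) return;          /* nothing to push back at end of file */
    ungetc(ch, source);
    cur_col--;
    if (ch == '\n') cur_line--;
}

/* Return the next token (Symbol) from the source file. */
Symbol getsym(void) {
    int ch;

    /* Skip spaces, tabs, newlines and other control characters. */
    while ((ch = read_ch()) != EOF && ch <= ' ')
        ;
    err_line = cur_line;            /* remember where this token starts */
    err_col = cur_col;
    switch (ch) {
    case EOF: return eof;
    case '+': return plus;
    case '-': return minus;
    case '*': return times;
    case '/': return slash;
    case '=': return eql;
    case '(': return lparen;
    case ')': return rparen;
    case ',': return comma;
    case ';': return semicolon;
    case '.': return period;
    case ':':
        ch = read_ch();
        if (ch == '=') return becomes;
        put_back(ch);               /* ':' alone is not a valid token */
        return nul;
    case '<':
        ch = read_ch();
        if (ch == '>') return neq;
        if (ch == '=') return leq;
        put_back(ch);
        return lss;
    case '>':
        ch = read_ch();
        if (ch == '=') return geq;
        put_back(ch);
        return gtr;
    default:
        if (isdigit(ch)) {          /* number: consume all following digits */
            num = 0;
            do { /* no checking for overflow! */
                num = 10 * num + ch - '0';
            } while ((ch = read_ch()) != EOF && isdigit(ch));
            put_back(ch);
            return number;
        }
        if (isalpha(ch)) {          /* identifier or keyword */
            Entry *entry;
            id_len = 0;
            do {
                if (id_len < MAX_ID) {   /* silently truncate long names */
                    id[id_len] = (char)ch;
                    id_len++;
                }
            } while ((ch = read_ch()) != EOF && isalnum(ch));
            id[id_len] = '\0';
            put_back(ch);
            entry = find_htab(keywords, id);
            return entry ? (Symbol)get_htab_data(entry) : ident;
        }

        error("getsym: invalid character '%c'", ch);
        return nul;
    }
}

/* Open the source file and fill the keyword table; returns 0 on failure. */
int init_scan(const char fn[]) {
    if ((source = fopen(fn, "r")) == NULL) return 0;
    cur_line = 1;
    cur_col = 0;
    keywords = create_htab(11);
    enter_htab(keywords, "begin", beginsym);
    enter_htab(keywords, "call", callsym);
    enter_htab(keywords, "const", constsym);
    enter_htab(keywords, "do", dosym);
    enter_htab(keywords, "end", endsym);
    enter_htab(keywords, "if", ifsym);
    enter_htab(keywords, "odd", oddsym);
    enter_htab(keywords, "procedure", procsym);
    enter_htab(keywords, "then", thensym);
    enter_htab(keywords, "var", varsym);
    enter_htab(keywords, "while", whilesym);
    return 1;
}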
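The `getsym()` function relies on hash-table helpers (`create_htab`, `enter_htab`, `find_htab`, `get_htab_data`) that are not part of standard C and are not shown in the post. As a sketch of one possible replacement (an assumption, not the original author's code), a sorted keyword array searched with the standard `bsearch()` gives the same "keyword or plain identifier" decision. The `Symbol` enum here is a hypothetical subset of the one the scanner uses.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical subset of the Symbol values used by the scanner. */
typedef enum { ident, beginsym, callsym, constsym, dosym, endsym, ifsym,
               oddsym, procsym, thensym, varsym, whilesym } Symbol;

struct keyword {
    const char *name;
    Symbol sym;
};

/* Must stay sorted in strcmp() order, or bsearch() will miss entries. */
static const struct keyword keyword_table[] = {
    {"begin", beginsym}, {"call", callsym}, {"const", constsym},
    {"do", dosym},       {"end", endsym},   {"if", ifsym},
    {"odd", oddsym},     {"procedure", procsym}, {"then", thensym},
    {"var", varsym},     {"while", whilesym},
};

/* bsearch() passes the key first, then a pointer to an array element. */
static int cmp_keyword(const void *key, const void *elem) {
    return strcmp((const char *)key, ((const struct keyword *)elem)->name);
}

/* Return the keyword's symbol, or ident if the word is not reserved. */
Symbol classify_word(const char *word) {
    const struct keyword *k = bsearch(word, keyword_table,
                                      sizeof keyword_table / sizeof keyword_table[0],
                                      sizeof keyword_table[0], cmp_keyword);
    return k ? k->sym : ident;
}
```

With only eleven keywords, a plain linear scan with `strcmp()` would also be perfectly adequate. Note that the lookup is case-sensitive, so `Begin` is treated as an ordinary identifier.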

Now, contrast the hand-written scanner above with the code needed for a Flex-generated scanner for the same language:

%{
#include <stdio.h>
#include <stdlib.h>   /* atoi */
#include <string.h>   /* strdup */
#include "y.tab.h"    /* token codes and yylval, generated by yacc/bison */
%}

digit     [0-9]
letter    [a-zA-Z]

%%
"+"         { return PLUS; }
"-"         { return MINUS; }
"*"         { return TIMES; }
"/"         { return SLASH; }
"("         { return LPAREN; }
")"         { return RPAREN; }
";"         { return SEMICOLON; }
","         { return COMMA; }
"."         { return PERIOD; }
":="        { return BECOMES; }
"="         { return EQL; }
"<>"        { return NEQ; }
"<"         { return LSS; }
">"         { return GTR; }
"<="        { return LEQ; }
">="        { return GEQ; }
"begin"     { return BEGINSYM; }
"call"      { return CALLSYM; }
"const"     { return CONSTSYM; }
"do"        { return DOSYM; }
"end"       { return ENDSYM; }
"if"        { return IFSYM; }
"odd"       { return ODDSYM; }
"procedure" { return PROCSYM; }
"then"      { return THENSYM; }
"var"       { return VARSYM; }
"while"     { return WHILESYM; }
{letter}({letter}|{digit})* {
              yylval.id = (char *)strdup(yytext);
              return IDENT; }
{digit}+    { yylval.num = atoi(yytext);
              return NUMBER; }
[ \t\n\r]   { /* skip whitespace */ }
.           { printf("Unknown character [%c]\n", yytext[0]);
              return UNKNOWN; }
%%

int yywrap(void) { return 1; }

Sorry!!! Double post... it's because this part is rather long.
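One detail that makes the Flex version work: the input `begin` matches both the `"begin"` rule and the identifier rule with the same length. Flex always takes the longest match, and when two rules match the same amount of text it picks the one listed first in the file, which is why the keyword rules come before `{letter}({letter}|{digit})*`. The C sketch below imitates that selection with two hypothetical matcher functions; it only illustrates the rule, it is not how Flex implements it (Flex compiles all the patterns into a single DFA).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Each hypothetical rule returns how many characters it matches at s (0 = none). */
typedef size_t (*matcher)(const char *s);

static size_t match_begin(const char *s) {        /* like the "begin" rule */
    return strncmp(s, "begin", 5) == 0 ? 5 : 0;
}

static size_t match_ident(const char *s) {        /* like {letter}({letter}|{digit})* */
    size_t n = 0;
    if (!isalpha((unsigned char)s[0]))
        return 0;
    while (isalnum((unsigned char)s[n]))
        n++;
    return n;
}

/* Rules in "source file" order: keyword rule before identifier rule. */
static const matcher rules[] = { match_begin, match_ident };
enum { RULE_BEGIN, RULE_IDENT, NRULES };

/* Longest match wins; on a tie the earlier rule wins.
   Returns the rule index (or -1 if nothing matches) and stores the length in *len. */
int select_rule(const char *s, size_t *len) {
    int best = -1;
    size_t best_len = 0;
    for (int r = 0; r < NRULES; r++) {
        size_t n = rules[r](s);
        if (n > best_len) {       /* strictly longer only, so ties keep the earlier rule */
            best = r;
            best_len = n;
        }
    }
    *len = best_len;
    return best;
}
```

So `begin x` selects the keyword rule (both rules match 5 characters), while `beginning` selects the identifier rule because it matches 9 characters.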