BennuGD parser BUG detecting char[]=

The issue

People used to developing in C or C++ are quite familiar with escaping certain characters when defining string literals and char arrays. For example, here is a trivial code snippet that defines some valid chars and puts them all inside an initialized array of chars:

char _valid[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ123456789";

Now, in the BennuGD language it works pretty much the same, as long as you are NOT trying to escape, let’s say, the quotation character ("). In C, it’d look like:

char _valid[] = "abcde ...\"";

So, \" is the usual way of escaping the quotation symbol in C and C-derived languages. When trying to do so in BennuGD, its compiler (bgdc) says:

tonicas@catxarru:~/dev/kids/editor/tests$ bgdc -d ./chars.prg
BGDC 1.0.0 (Jun 20 2010 23:59:57)
Copyright © 2006-2010 SplinterGU (Fenix/BennuGD)
Copyright © 2002-2006 Fenix Team (Fenix)
Copyright © 1999-2002 José Luis Cebrián Pagüe (Fenix)
Bennu Game Development comes with ABSOLUTELY NO WARRANTY;
see COPYING for details

././chars.prg:7: error: ";" expected (12345)

Pretty annoying, huh? The code I was trying to compile looked like:

1 import "mod_say"
2 import "mod_key"
3
4 Process Main()
5 Private
6     /* C style, we are going to escape " this way: "\""; */
7     char a[] = "abcdefghijkl'\"12345";
8     int idx=-1;
9 Begin
10
11     Repeat
12         frame;
13     Until(Key(_ESC))
14
15 End

Again, I had to deal with the BennuGD entrails. So let’s do it, matey!

Inside the BennuGD strings.c parser

The BennuGD compiler spreads its parsing of the language across several C files, but when it comes to dealing with strings, the one to look at is core/bgdc/src/strings.c. After reading this source file carefully for the best part of an hour, I had an idea: Hey, let’s try a single character instead! So I did:

char a[] = "\"";

Then I compiled my BennuGD source file, or at least I tried to, and got what you are about to see:

BGDC 1.0.0 (Jun 20 2010 23:59:57)
Copyright © 2006-2010 SplinterGU (Fenix/BennuGD)
Copyright © 2002-2006 Fenix Team (Fenix)
Copyright © 1999-2002 José Luis Cebrián Pagüe (Fenix)
Bennu Game Development comes with ABSOLUTELY NO WARRANTY;
see COPYING for details

././chars.prg:16: error: ";" expected (EOF)

Again! The compiler says it expected ";". Somehow, my first \ char was totally ignored. Bearing this in mind, I ended up quite close to the root of the problem, right ‘ere:

if ( *( *source ) == c )   /* Termina la string? */

Hmm… quite interesting. This line (its Spanish comment means “Does the string end?”) decides WHEN the string comes to an end. To do so, it compares the current character with c, which here is the " character. All the code inside the function string_compile() looks at one character at a time, without worrying at all about, well, the previous one. AHA! I thought I had it! That was the problem: if you are comparing one char at a time, it is TOTALLY impossible to detect any kind of escape sequence, right? Of course it is, goddamnit, ’cause an escape sequence has two characters, not just one!

Let’s add some code to fix it!

Pretty easy. I added this code just before the line I discussed earlier, plus a label further down:

        /* TCG patch: process \", that is, to escape " using char[] = "abcde\""; */
        if( *(*source)==0x5c && *(*source+1)==0x22){
            (*source)++;        /* Get quotation char instead of backslash */
            goto _lbl_convert;
        }
(...)
_lbl_convert:
            conv = convert( *( *source ) ) ;
            string_mem[ string_used++ ] = conv ;

            ( *source ) ++ ;
        }

Now, when the current string character is \ and the next one is ", we treat it as an escape sequence, so the character we have to process is not \ but " instead. That’s why I add 1 to the pointer and then jump to _lbl_convert. Okay, let’s see what happens now after rebuilding BennuGD and using it to compile my BennuGD code (with some debug information added, by the way):

---- 11 strings ----

0:
1: 2010/08/22
2: 19:03:57
3: 1.0.0
4: mod_say
5: mod_say
6: mod_key
7: mod_key
8: libkey
9: libsdlhandler
10: "

Have a look at entry 10. It worked! Now, we are going to try the same thing, but this time my array’s gonna contain more chars:

char _valid[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ123456789\"$%&/(!";

---- 11 strings ----

0:
1: 2010/08/22
2: 19:05:24
3: 1.0.0
4: mod_say
5: mod_say
6: mod_key
7: mod_key
8: libkey
9: libsdlhandler
10: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ123456789"$%&/(!

Again, entry 10 shows clearly that, among all those chars, we have got the " one. Yeah!

Strangely, as soon as I executed the BennuGD DCB file …

According to the compilation process, all was pretty fine. The “chars” appearing on the console were, no doubt, the right ones. So, when I ran the program through the interpreter (bgdi), I was totally astonished because of what I got (below, the example BennuGD code I ran):

import "mod_say"
import "mod_key"

Process Main()
Private
    /* C style, we are going to escape " this way: "\""; */
    char _valid[] ="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ123456789\"$%&/(! ";
    String chars;
    int idx = -1;
Begin
    While(_valid[idx++]!= ' ')
        chars+=_valid[idx];
    End
    say("Chars are: " + chars);
End

What that trivial code reported was:

Chars are: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567""$%&/(!

Shit! Obviously, I had to have a better look at the BennuGD sources. Something odd was happening when running the program, and I came to a realization: whatever it was, it had to happen when “generating” the DCB segments, right ‘ere (file core/bgdc/src/c_code.c), in the function compile_array_data():

                  segment_add_as( data,
                            (*(str+2)==0x22)?*(str+=2):*str++,
                            *t);

This code snippet is in charge of generating the array characters for the BennuGD DCB segment. With it, I got two " chars every time, while two of the chars next to my " went missing. In the previous output, the string has all the chars that come before the ", except that 8 and 9 have been replaced by two identical " characters. This segment_add_as call was the key, so I tricked the code this way:

//                TCG patch to detect \" escape sequences:
                  segment_add_as( data,
//                          (*(str+2)==0x22)?*(str+=2):*str++,
                            (*(str+1)==0x5c && *(str+2)==0x22)?*(str+=3):*str++,
                            *t
                  );

Then, I recompiled the BennuGD sources and tried again:

Chars are: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ123456789"$%&/(!

So far, so good. This time it worked!