CakePHP encoding problem: uppercase S with caron (Š) saves to the database but causes errors when processed by Cake


I am working on a site that stores information about cuneiform tablets. We use Semitic characters for transliteration.

In my script, I create a term list from the transliteration of a tablet.

My problem is with the Š: my script creates two different terms because it thinks there is a space inside the word, due to the way Cake treats the special character.

Example:

Partial contents of a tablet:

  1. utu-DIŠ-nu-il2

Terms from the tablet after processing by my script:

utu-DIŠ, -nu-il2

It should be:

utu-DIŠ-nu-il2

When I print the contents of my array while the text is being processed, I see this:

  1. utu-DI� -nu-il2

So the incorrect parsing of the text creates a space, which my script interprets as two words instead of one.

In the database, the text is fine.

I also get these errors:

Warning (512): SQL Error: 1366: Incorrect string value: '\xC5' for column 'term' at row 1 [CORE\cake\libs\model\datasources\dbo_source.php, line 684]

Query: INSERT INTO terms (term, lft, rght) VALUES ('utu-DI�', 449, 450)

Query: INSERT INTO terms (term, lft, rght) VALUES ('A�', 449, 450)

Query: INSERT INTO terms (term, lft, rght) VALUES ('xDI�', 449, 450)
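For context on the 1366 error: Š is U+0160, which UTF-8 encodes as the two bytes 0xC5 0xA0. A term that ends in a lone 0xC5 is not valid UTF-8, which is presumably why MySQL rejects it. The sketch below checks a term before inserting it; `isValidUtf8Term()` is a hypothetical helper name, not part of CakePHP, and it assumes the mbstring extension is loaded.

```php
<?php
// Sketch only: isValidUtf8Term() is a hypothetical helper, not a CakePHP API.

/**
 * Returns true when $term is a well-formed UTF-8 string.
 */
function isValidUtf8Term($term)
{
    return mb_check_encoding($term, 'UTF-8');
}

// Š (U+0160) written as explicit bytes so the example does not depend on the file encoding.
$intact = "utu-DI\xC5\xA0";   // complete character: 0xC5 0xA0
$broken = "utu-DI\xC5";       // second byte lost (assumed to match the failing INSERT)

echo bin2hex("\xC5\xA0"), "\n";          // c5a0
var_dump(isValidUtf8Term($intact));      // bool(true)
var_dump(isValidUtf8Term($broken));      // bool(false)
```

A check like this only detects the damage; the actual fix is to stop the string from being cut in the middle of a character.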

Does anybody know what I could do to make this work?

Thanks !

Added info:

    $terms = $this->data['Tablet']['translit'];
    // Replace line breaks and tabs with spaces (double quotes so the escapes are interpreted)
    $terms = str_replace(array("\r\n", "\r", "\n", "\n\r", "\t"), ' ', $terms);
    $terms = trim($terms, chr(173));
    print_r($terms);
    $terms = preg_replace('/\s+/', ' ', $terms);
    $terms = explode(" ", $terms);
    $terms = array_map('trim', $terms);
    // Tokens that must not be stored as terms
    $anti_terms = array('@tablet','1.','2.','3.','4.','5.','6.','7.','8.','9.','10.','11.','12.','13.','14.','15.','16.','17.','18.','19.','20.','Rev.',
        'Obv.','@tablet','@obverse','@reverse','C1','C2','C3','C4','C5','C6','C7','C8','C9', "\r", "\n", "\r\n", "\t", '', ' ', null, chr(173), 'x', '[x]', '[...]');
    foreach ($terms as $key => $term) {
        if (in_array($term, $anti_terms) || is_numeric($term)) {
            unset($terms[$key]);
        }
    }

If I put my print_r before the preg_replace() call, the Š characters are fine; if I put it after, they display as the black lozenge (�). So I guess preg_replace() is the problem!
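A possible explanation, offered as an assumption rather than a confirmed diagnosis: the second byte of Š in UTF-8 is 0xA0, which is the non-breaking space in Latin-1, and a byte-oriented `\s` may match it depending on the PCRE build and locale. The sketch below imitates that effect deterministically with `str_replace()`; `splitOnByteA0()` is a hypothetical helper and is not what preg_replace() literally does.

```php
<?php
// Sketch only: splitOnByteA0() is a hypothetical helper that imitates a
// byte-oriented whitespace match on the byte 0xA0.

function splitOnByteA0($text)
{
    // Treat the single byte 0xA0 as if it were a space, then split on spaces.
    return explode(' ', str_replace("\xA0", ' ', $text));
}

$line = "utu-DI\xC5\xA0-nu-il2";   // "utu-DIŠ-nu-il2" with Š as bytes 0xC5 0xA0

$parts = splitOnByteA0($line);
echo count($parts), "\n";          // 2
echo bin2hex($parts[0]), "\n";     // 7574752d4449c5 (ends in the orphaned byte c5)
echo $parts[1], "\n";              // -nu-il2
```

This reproduces both symptoms from the question: one word becomes two terms, and the first term ends in a lone 0xC5 byte.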


I just found this:
http://www.php.net/manual/fr/function.preg-replace.php#84385

But it seems that mb_ereg_replace() causes the same problem as preg_replace().


Solution: set the mbstring encodings to UTF-8 before calling mb_ereg_replace():

    mb_internal_encoding("UTF-8");
    mb_regex_encoding("UTF-8");
    $terms = mb_ereg_replace('\s+', ' ', $terms);

and the error is gone!
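As an alternative sketch, not the approach used above: PCRE's `u` modifier makes the preg functions treat the pattern and the subject as UTF-8, so `\s` can no longer match part of a multi-byte character. `splitTerms()` is a hypothetical helper name, and the sample text is made up for illustration.

```php
<?php
// Sketch only: splitTerms() is a hypothetical helper.

/**
 * Splits a transliteration into whitespace-separated terms, UTF-8 aware.
 */
function splitTerms($text)
{
    // The u modifier makes PCRE treat pattern and subject as UTF-8,
    // so \s only matches whole characters, never one byte of a character.
    $terms = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on failure, e.g. when $text is not valid UTF-8.
    return $terms === false ? array() : $terms;
}

// Š written as explicit bytes (0xC5 0xA0) so the file encoding does not matter.
$text = "1. utu-DI\xC5\xA0-nu-il2\r\n2. a-na";
print_r(splitTerms($text));   // four terms; "utu-DIŠ-nu-il2" stays in one piece
```

Because `PREG_SPLIT_NO_EMPTY` drops empty pieces, this would also replace the separate explode() and empty-string filtering steps. Note that with the `u` modifier an invalid UTF-8 subject makes the call fail, which is why the false case is handled.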