Thanks Beer!
I'm almost ashamed I didn't spot that…
I also realize I could have included all the relevant code…
Since there's still one thing I don't understand, let me fix that:
```php
/**
 * Return array of Emojis. We can't move this function inside a common lib because we need it for security before loading any file.
 *
 * @return array<string,array<string>> Array of Emojis in hexadecimal
 * @see getArrayOfEmojiBis()
 */
function getArrayOfEmoji()
{
    $arrayofcommonemoji = array(
        'misc' => array('2600', '26FF'),    // Miscellaneous Symbols
        'ding' => array('2700', '27BF'),    // Dingbats
        '????' => array('9989', '9989'),    // Variation Selectors
        'vars' => array('FE00', 'FE0F'),    // Variation Selectors
        'pict' => array('1F300', '1F5FF'),  // Miscellaneous Symbols and Pictographs
        'emot' => array('1F600', '1F64F'),  // Emoticons
        'tran' => array('1F680', '1F6FF'),  // Transport and Map Symbols
        'flag' => array('1F1E0', '1F1FF'),  // Flags (note: may be 1F1E6 instead of 1F1E0)
        'supp' => array('1F900', '1F9FF'),  // Supplemental Symbols and Pictographs
    );
    return $arrayofcommonemoji;
}

/**
 * Return the real char for a numeric entity.
 * WARNING: This function is required by testSqlAndScriptInject() and the GETPOST 'restricthtml'. Regex calling must be similar.
 *
 * @param array<int,string> $matches Array with a decimal numeric entity into key 0, value without the &# into the key 1
 * @return string New value
 */
function realCharForNumericEntities($matches)
{
    $newstringnumentity = preg_replace('/;$/', '', $matches[1]);
    //print ' $newstringnumentity='.$newstringnumentity;
    if (preg_match('/^x/i', $newstringnumentity)) { // if numeric is hexadecimal
        $newstringnumentity = hexdec(preg_replace('/^x/i', '', $newstringnumentity));
    } else {
        $newstringnumentity = (int) $newstringnumentity;
    }
    // The numeric values we don't want as entities because they encode ascii char, and why use html entities on ascii except for hacking?
    if (($newstringnumentity >= 65 && $newstringnumentity <= 90) || ($newstringnumentity >= 97 && $newstringnumentity <= 122)) {
        return chr((int) $newstringnumentity);
    }
    // The numeric values we want in UTF8 instead of entities because it is emoji
    $arrayofemojis = getArrayOfEmoji();
    foreach ($arrayofemojis as $valarray) {
        if ($newstringnumentity >= hexdec($valarray[0]) && $newstringnumentity <= hexdec($valarray[1])) {
            // This is a known emoji
            return html_entity_decode($matches[0], ENT_COMPAT | ENT_HTML5, 'UTF-8');
        }
    }
    return '&#'.$matches[1]; // Value will be unchanged because regex was /&#( )/
}

/**
 * Security: WAF layer for SQL Injection and XSS Injection (scripts) protection (Filters on GET, POST, PHP_SELF).
 * Warning: Such a protection can't be enough. It is not reliable as it will always be possible to bypass this. Good protection can
 * only be guaranteed by escaping data during output.
 *
 * @param string $val Brute value found into $_GET, $_POST or PHP_SELF
 * @param int<0, 3> $type 0=POST, 1=GET, 2=PHP_SELF, 3=GET without sql reserved keywords (the less tolerant test)
 * @return int >0 if there is an injection, 0 if none
 */
function testSqlAndScriptInject($val, $type)
{
    // Decode string first because a lot of things are obfuscated by encoding or multiple encoding.
    // So <svg o&#110;load='console.log(&quot;123&quot;)' becomes <svg onload='console.log(&quot;123&quot;)'
    // So "&colon;&apos;" becomes ":'" (due to ENT_HTML5)
    // So "&Tab;&NewLine;" becomes ""
    // So "&lpar;&rpar;" becomes "()"
    // Loop to decode until no more things to decode.
    //print "before decoding $val\n";
    do {
        $oldval = $val;
        $val = html_entity_decode($val, ENT_QUOTES | ENT_HTML5); // Decode '&colon;', '&apos;', '&Tab;', '&NewLine', ...
        // Sometimes we have entities without the ; at end so html_entity_decode does not work but entities is still interpreted by browser.
        $val = preg_replace_callback(
            '/&#(x?[0-9][0-9a-f]+;?)/i',
            /**
             * @param string[] $m
             * @return string
             */
            static function ($m) {
                // Decode '&#110;', ...
                return realCharForNumericEntities($m);
            },
            $val
        );
        // ...
```
What I'm struggling with is this line:

```php
return html_entity_decode($matches[0], ENT_COMPAT | ENT_HTML5, 'UTF-8');
```

Why `$matches[0]`? Doesn't that just correspond to the index assigned by `preg_replace_callback`? Shouldn't it be the value that gets decoded instead?
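To make the indices concrete, here is a minimal sketch, assuming PHP 7 or later; the sample string and the `$captured` variable are illustrative only. With the same pattern, `preg_replace_callback()` passes the callback an array where index 0 holds the text matched by the whole pattern (`&#x1F600;`, prefix included) and index 1 holds what the first parenthesised group captured (`x1F600;`). The closure in `testSqlAndScriptInject()` forwards that array unchanged to `realCharForNumericEntities()`. `html_entity_decode()` only recognises a complete entity, so decoding index 1 alone would leave it untouched.

```php
<?php
// Same pattern as in testSqlAndScriptInject(): group 1 captures what follows "&#".
$pattern = '/&#(x?[0-9][0-9a-f]+;?)/i';
$input = 'smile &#x1F600; here'; // illustrative sample value

$captured = []; // illustrative: records each array the callback receives
$output = preg_replace_callback(
    $pattern,
    function (array $m) use (&$captured): string {
        $captured[] = $m;
        // $m[0] is the whole match, so it still starts with "&#",
        // which html_entity_decode() needs in order to recognise the entity.
        return html_entity_decode($m[0], ENT_COMPAT | ENT_HTML5, 'UTF-8');
    },
    $input
);

echo $captured[0][0], "\n"; // &#x1F600;
echo $captured[0][1], "\n"; // x1F600;
// Group 1 alone is not an entity, so it comes back unchanged.
echo html_entity_decode($captured[0][1], ENT_COMPAT | ENT_HTML5, 'UTF-8'), "\n"; // x1F600;
echo $output, "\n"; // smile 😀 here
```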