Saturday, May 05, 2007

Register Globals

If you are wondering why the CommunityColor upgrade is taking so long, it is because I am having a bear of a time trying to figure out how to code with register globals off. In PHP 4, with register globals on, all of the GET and POST parameters magically appeared in each script as variables. So, if you called a page as "page.php?a=1&b=hello," PHP would automatically turn a and b into variables. All form elements magically appeared as variables as well. This created a big security hole: if a programmer forgot to initialize a variable, anyone could set it from the URL. The PHP.net example of a hole looks something like:

if (authenticated_user($user_nm, $password)) {
    $authorized = 'yes';
}
if ($authorized == 'yes') {
    showSecretInformation();
}


Since the coder forgot to initialize $authorized, a hacker could get at the super secret information by calling the page with "page.php?authorized=yes."
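The usual fix for this particular hole is to give the variable a safe starting value before testing it. A minimal sketch: authenticated_user() here is a made-up stand-in for a real login check, and check_authorized() is a hypothetical wrapper added only so the example runs on its own.

```php
<?php
// Hypothetical stand-in for a real login check (assumption for illustration).
function authenticated_user($user_nm, $password) {
    return $user_nm === 'alice' && $password === 'secret';
}

// Returns 'yes' or 'no'. $authorized always starts as 'no', so a value
// injected through the URL can never leave it set to 'yes'.
function check_authorized($user_nm, $password) {
    $authorized = 'no';
    if (authenticated_user($user_nm, $password)) {
        $authorized = 'yes';
    }
    return $authorized;
}

echo check_authorized('alice', 'secret'), "\n";
echo check_authorized('mallory', 'guess'), "\n";
```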

My way around this security hole was to put everything in functions. The parameter list would filter out noise thrown by hackers. The page doSomething.php might look like:

// define function
function doSomething($parm1, $parm2) {
    // do something ...
}

// main program flow
doSomething($parm1, $parm2);


In PHP 4, the variables ($parm1 and $parm2) would magically appear in the code. I would pass them on to doSomething() as parameters.

In PHP 5, with register globals off, all the data comes in "superglobals." The superglobal $_GET contains the data appended to the URL. The superglobal $_POST contains the data from HTML forms.
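To make that concrete, here is a small sketch of what $_GET would hold for the "page.php?a=1&b=hello" request from earlier. It uses parse_str() on the query string so it can run from the command line; the $query array is only a stand-in for $_GET.

```php
<?php
// The query string from the example URL page.php?a=1&b=hello.
$queryString = 'a=1&b=hello';

// parse_str() decodes a query string into an array, much as PHP does
// when it builds $_GET for a real request.
parse_str($queryString, $query);

// Nothing becomes a variable on its own; each value is looked up by name.
// Every value arrives as a string, even the "1".
var_dump($query['a']);
var_dump($query['b']);
```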

My job as a programmer is to get the data from these superglobals and into the code. I am having a hard time finding an elegant way of doing this. For that matter, I find myself wondering if I need to convert the data to variables at all.

So, rather than fixing my broken sites, I've been wasting days doing all sorts of experiments with ways of getting data out of the superglobals.

It is strange how one change in the foundation has ramifications throughout a system. The change even affects the overall thinking about programs.

In the case of this register globals change, I am going through a great deal of brain damage on a problem that I had (at least theoretically) solved in the way I was doing my coding.

If I just turned register globals on, all of my programs would work.

With it off, I find I have to figure out where and how to pull the data from the superglobals. I am even left wondering if I should pull the data out as variables.

The PHP manual suggested that I do the following to create variables.

$var = (array_key_exists('var',$_POST))? $_POST['var'] : '';


I would have to write that code for every single variable. It turns out that the code is very slow. I would rather deal with all the variables at once.

In the next section of code, I create an array listing the variables I expect to see and their types. The foreach line loops through everything in $_POST. The sample code would produce variables called $a, $b, or $c.

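// list of expected POST fields and their types
$type = array('a' => 'int', 'b' => 'str', 'c' => 'int');

foreach ($_POST as $key => $value) {
    if (!isset($type[$key])) {
        continue; // ignore anything not on the list
    }
    if ($type[$key] == 'int') {
        ${$key} = (int) $value;
    } elseif ($type[$key] == 'str') {
        ${$key} = strip_tags($value);
    }
}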


The above code is essentially a filter that leaves only the variables I list in $type.
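The same filter can also be wrapped in a function that hands back an array instead of creating variables by name. This is only a sketch of one alternative: filter_fields() is a hypothetical name, and $post stands in for $_POST so the example runs on its own.

```php
<?php
// Keep only the fields listed in $type, cast to the listed type.
// Anything not on the list (or not submitted) is dropped.
function filter_fields(array $input, array $type) {
    $clean = array();
    foreach ($type as $key => $kind) {
        if (!isset($input[$key])) {
            continue; // field was not submitted
        }
        if ($kind == 'int') {
            $clean[$key] = (int) $input[$key];
        } elseif ($kind == 'str') {
            $clean[$key] = strip_tags($input[$key]);
        }
    }
    return $clean;
}

// Simulated form data; a real page would pass $_POST instead.
$post = array('a' => '42', 'b' => '<b>hi</b>', 'authorized' => 'yes');
$clean = filter_fields($post, array('a' => 'int', 'b' => 'str', 'c' => 'int'));
print_r($clean);
```

Looping over the list rather than over the input means the unexpected "authorized" field is never even looked at.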

The question of where to process the variables is more problematic.

I could place the foreach code at the top of each page. This is essentially the same as having register globals on, but I gain a little security as I pass the variables through two filters (the foreach code and the function parameter list).

There is also the possibility of putting the variable translation code in each of the functions. This eliminates the need for creating functions with large variable lists. It also breaks some of the coding rules that I have developed over the years. For example, I believe in using an n-tiered approach to database development. IMHO, the objects that encapsulate a database should know nothing about the source of the data. The objects should know nothing beyond what is given to them as parameters. Taking data directly from $_POST violates this mantra.
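A rough sketch of that separation, with everything in it invented for illustration: PersonStore is a hypothetical class that keeps rows in memory rather than in a real database, and $post stands in for $_POST.

```php
<?php
// Data tier: knows nothing about $_POST, only the parameters it is given.
// (Hypothetical class; an in-memory array stands in for the database.)
class PersonStore {
    private $rows = array();

    public function save($user_id, $fname, $lname) {
        $this->rows[$user_id] = array('fname' => $fname, 'lname' => $lname);
    }

    public function find($user_id) {
        return isset($this->rows[$user_id]) ? $this->rows[$user_id] : null;
    }
}

// Page tier: the only place that touches the request data.
$post = array('user_id' => '7', 'fname' => 'Ann', 'lname' => 'Lee'); // stands in for $_POST

$store = new PersonStore();
$store->save(
    (int) $post['user_id'],
    strip_tags($post['fname']),
    strip_tags($post['lname'])
);
print_r($store->find(7));
```

The page pays for this with a longer parameter list, but the store could be fed from a form, a script, or a test without changing a line of it.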

Of course, if I am going so far as to break my rules on n-tiered structure, why not go all the way and simply feed the $_POST data directly into the database? Why waste precious computer cycles translating things into interim variables? For example, it would be possible to update a database with the command:

$sql = "UPDATE My_Table
    SET fname='".validStr($_POST['fname'])."',
    lname='".validStr($_POST['lname'])."',
    children='".intval($_POST['children'])."',
    hometown='".validStr($_POST['hometown'])."'
    WHERE user_id='".intval($_POST['user_id'])."'";


(FYI, the jab at local culture was intended to be humorous.)

Folding the data directly into SQL statements is the absolute fastest way to move data from a web page into the database. Of course, I become 100% dependent on my little validation functions (validStr(), intval() and floatval()) to keep the hackers at bay. Then again, even with the other methods I've experimented with, I am dependent on the integrity of the validation functions to keep the data safe from hackers.
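A different technique from the one above, worth naming plainly, is the prepared statement: the SQL text and the data travel separately, so the query does not lean on a string-escaping function at all. The sketch below uses PDO (available since PHP 5.1) and assumes the SQLite driver is installed. The in-memory database, the sample values, and $post (standing in for $_POST) are all made up for illustration.

```php
<?php
// In-memory SQLite database, used here only so the sketch is self-contained.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE My_Table (user_id INTEGER PRIMARY KEY, fname TEXT,
           lname TEXT, children INTEGER, hometown TEXT)");
$db->exec("INSERT INTO My_Table (user_id) VALUES (7)");

// Stands in for $_POST; note the quote in the last name.
$post = array('user_id' => '7', 'fname' => 'Pat', 'lname' => "O'Neil",
              'children' => '3', 'hometown' => 'Springfield');

// Placeholders keep the data out of the SQL text entirely.
$stmt = $db->prepare("UPDATE My_Table
                         SET fname = ?, lname = ?, children = ?, hometown = ?
                       WHERE user_id = ?");
$stmt->execute(array($post['fname'], $post['lname'], intval($post['children']),
                     $post['hometown'], intval($post['user_id'])));

// Read the row back to confirm the quote survived intact.
$check = $db->prepare("SELECT fname, lname, hometown FROM My_Table WHERE user_id = ?");
$check->execute(array(7));
$row = $check->fetch(PDO::FETCH_ASSOC);
echo $row['lname'], "\n";
```

Type checks such as intval() still earn their keep here, since a prepared statement protects the query but does not decide whether "3" is a sensible number of children.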

Anyway, my poor little brain is filled to capacity as I think through all of the ramifications of one minor change in a PHP revision.
