MakeWebGames

Arrays 101 - Lead by examples


Spudinski

Arrays 101 (PHP 5)

Level: Intermediate

Most people learn by practice, right? Well, so it may be, and I created this tutorial by using scenario-solution examples. I hope most people will take part in this if they want to learn more about using arrays to their full potential.

Just to make it clear, I will not be explaining how to define arrays as you should already know the basics of using arrays and defining them.

This is just an article to help you discover the different ways of handling arrays in a script.

Personally, I think arrays are one of the best things in any programming language.

An array is a set of variables, or rather of defined data. There are many ways to handle arrays and, as with many other things in PHP, there are a lot of functions that can be used to manipulate them.

I would like to help everyone improve their skills with arrays, but I will need scenarios from you guys; I will then write a script and explain the functions I used.

Here is an example below:

Scenario 1:

You want to make a script to show the most active users, but you currently only have timestamps of their visits within the last month.

First, you have decided to collect all the user timestamps from a database, and you have put them in an array (whether via MySQL or a flat-file DB; not explained here) with the following structure:

Array ( username => Array( timestamp, timestamp, etc.. ) )

Solution 1:

<?php

// sample population data, ignore this, please.
$arr_usertimestamp = array();
for($i=0;$i<=50;$i++) {
$arr_usertimestamp['user' . $i] = array();
for($v=0;$v<=10;$v++) 
	$arr_usertimestamp['user' . $i][] = rand(time()-999, time()+999);
}

// Here we loop through the first dimension of the array, which holds the usernames.
// The loop visits the usernames in the order in which they were added to the array.
foreach ($arr_usertimestamp as $user => $timestamps) {
	// On each pass, "=>" gives us the key ($user) and the value ($timestamps) of one element.
	// The value here is the inner array of timestamps, so the original structure
	// array(username => array(t,t,t)) is handed to us one username => array(t,t,t) pair at a time.
	// You can think of it as being inside the array, running down it sequentially.

	// Sum all the values in the second dimension and store the total under the username.
	// PS. array_sum() returns an int for integer input, but it returns a float if the total
	// overflows the integer range (possible on 32-bit builds), so intval() is used to
	// convert the result back to an integer (it truncates, it does not round).
	$arr_usertimestamp[$user] = intval(array_sum($timestamps));
}

// Note: array_flip() is not needed at this point. It swaps keys and values
// (array('a' => 1) becomes array(1 => 'a')); reversing the order is array_reverse()'s job.
// It also returns a new array instead of changing its argument, so a bare
// array_flip($arr_usertimestamp); call has no effect. We want the usernames to stay as keys.

arsort($arr_usertimestamp);
// Here I sort the array by its VALUES, descending (high to low), while keeping each
// username key attached to its value. asort() does the same in ascending order.
// SORT_DESC is an array_multisort() flag and is not a valid flag for asort() or arsort().
// arsort() works directly on the variable containing the array, so its return value
// does not need to be stored.


$arr_topten = array_slice($arr_usertimestamp, 0, 10, true);
// array_slice() does what it says: it returns a slice of the array and leaves the original untouched.
// The first argument is the array, the second is the offset at which to start keeping
// elements, and the third is how many elements to keep (a length, not an end position).
// The fourth and last argument is whether to preserve numeric keys; string keys,
// such as our usernames, are always preserved.
// We only want the first 10 entries of the array.
// For example: given array(0,1,2,3,4,5,6,7,8,9), to get only the values 4 to 8 we use
// array_slice($var, 4, 5, true), which returns array(4 => 4, 5 => 5, 6 => 6, 7 => 7, 8 => 8).

echo implode('<br>', array_keys($arr_topten));
// implode() joins the elements of an array into a single string.
// The first argument is the "glue", i.e. what to place between the elements.
// For the second argument we make use of the function array_keys(),
// because what we have at hand is an associative array of the form
// array(username => sum of all t) and we only need the first part (the "key")
// of each element. So we now have an array of only usernames, like array(peter, jane, john).
// array_keys() still returns an array, but as the array argument of implode()
// it is joined into a string, with the elements separated by the glue.

?>
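
For readers who want to check what the array functions used in the solution actually do, here is a minimal sketch on a small hand-made array. The user names and scores are made up for illustration; only the built-in PHP functions are real.

```php
<?php
// Hypothetical scores keyed by user name (made-up data).
$scores = array('peter' => 30, 'jane' => 50, 'john' => 10, 'mary' => 40);

// array_flip() swaps keys and values and RETURNS a new array;
// the original array is left untouched.
$flipped = array_flip($scores);
// $flipped is array(30 => 'peter', 50 => 'jane', 10 => 'john', 40 => 'mary')

// array_reverse() is the function that reverses the order of the elements.
$reversed = array_reverse($scores, true);

// arsort() sorts by value, high to low, keeps the keys and works in place.
arsort($scores);
// $scores is now jane => 50, mary => 40, peter => 30, john => 10

// array_slice(array, offset, length, preserve_keys):
// start at offset 0 and take 2 elements.
$top_two = array_slice($scores, 0, 2, true);

echo implode(', ', array_keys($top_two)), "\n"; // prints "jane, mary"
```

The point of the sketch is that only arsort() changes its argument; array_flip(), array_reverse() and array_slice() all hand back a new array that has to be stored.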

 

So, set up a scenario and I'll show you a solution. It's also a kind of test of my abilities. :P

Edited by Spudinski
Documentation

foreach($arr_usertimestamp as $user => $timestamps) {
   $arr_usertimestamp[$user] = intval(array_sum($timestamps)); // PS. array_sum returns a float
}

what? Sorry, you lost me totally there - why are you aggregating timestamps? why the int cast? this makes no sense.

I have better documented my code, please see if you can better understand it now.


No still not clear - why are you aggregating timestamps?

The initial loop populates an array such that:

$arr_usertimestamp[ <element> ] = array( <timestamp1>, <timestamp2>, <timestamp3>, ... );

Which is fine, but then you aggregate those timestamps creating

$arr_usertimestamp[ <element> ] = <timestamp1> + <timestamp2> + <timestamp3> ...

Why? What purpose does adding times together get you?

 

As for array_sum returning an int... does it?

$ cat x1.php
<?php

$a = array(1,2,3,4,time());
$b = array_sum($a);

echo "b is :\n";
echo $b . "\n";

echo "serialize(b) is\n";
echo serialize($b) . "\n";

echo "print_r(b) is\n";
print_r($b);

echo "\nvardump(b) is\n";
var_dump($b);

$ php -f x1.php
b is :
1313705145
serialize(b) is
i:1313705145;
print_r(b) is
1313705145
vardump(b) is
int(1313705145)

 

[edit]

The PHP manual states that array_sum returns an int or a float, so the result will be determined by the values stored in the array and the bit size of the machine, although I've not verified the latter assumption. In my case, on a 64-bit machine, the cast to int is redundant.
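
A minimal sketch of when array_sum() returns each type, using made-up values and the built-in constant PHP_INT_MAX. It is only meant to illustrate the int-or-float behaviour discussed here, not the timestamp script itself.

```php
<?php
// All-integer input whose total fits in a native integer: the result is an int.
$small = array_sum(array(1, 2, 3, 4));
var_dump($small); // int(10)

// One float in the input is enough to make the result a float.
$mixed = array_sum(array(1, 2.5));
var_dump($mixed); // float(3.5)

// Integer overflow: the total no longer fits in an int, so PHP returns a float.
$overflow = array_sum(array(PHP_INT_MAX, 1));
var_dump(is_float($overflow)); // bool(true)
```

On a 32-bit build PHP_INT_MAX is small enough that a handful of current timestamps already overflows, which is the one case where the sum of integer timestamps would come back as a float.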

Edited by Anonymous

Well, there are three basic methods to get the highest ranking user here: 1. sum up all the timestamps in the array, 2. find the average of all the timestamps (what I call the median here; strictly speaking it is the mean), or 3. simply use count() to count the values in the array. To keep it at an intermediate level, I chose to sum up all the timestamps, which in turn gives me an integer or a float.

The reason I do this is that I depend on the array's structure being array(string username => int timestamp), as can be seen in the script's later stages of execution.

 

The PHP manual states that array_sum returns an int or a float, so the result will be determined by the values stored in the array and the bit size of the machine, although I've not verified the latter assumption. In my case, on a 64-bit machine, the cast to int is redundant.

Indeed it does. But a good script has to be able to deal with the odds.

It is more than likely (spare me the math, please) that it will return an integer.

Oh my, and I have to apologize, it shouldn't return a float with the values presented, as they are all integers. It's probably something that went wrong in testing, and I probably patched it. There are actually no excuses, sorry, I should have double checked the code before posting.

Should I elaborate more, or do you get what I'm doing in the loop?

Edited by Spudinski

If I may presume to simplify; assume user A logs in on Jan 1st 1970 @ midday, and is active for 2 more hits spread 30 minutes apart, and user B logs in on Jan 2nd 1970 @ midday, but leaves instantly:

$array[ 'user-A' ] = array( 39600, 40400, 42200 );
$array[ 'user-B' ] = array( 126000 );

Summing these we get:

$array[ 'user-A' ] = 122200
$array[ 'user-B' ] = 126000

Which user is more active? The basic assumption here is that you can add timestamps; which just doesn't work for me. The number produced is not a valid indication of activity, rather it's an arbitrary number with no meaning.
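
To make the objection concrete, here is a small sketch that ranks the same two users once by the sum of their timestamps and once by the number of hits. The variable names are illustrative; the data is the example above.

```php
<?php
// The two users from the example above.
$hits = array(
    'user-A' => array(39600, 40400, 42200),
    'user-B' => array(126000),
);

// Summing timestamps: user-A => 122200, user-B => 126000.
$sums = array_map('array_sum', $hits);

// Counting hits: user-A => 3, user-B => 1.
$counts = array_map('count', $hits);

// Sort both high to low, keeping the user names as keys.
arsort($sums);
arsort($counts);

$by_sum   = array_keys($sums);
$by_count = array_keys($counts);

echo 'Top by sum:   ', $by_sum[0], "\n";   // user-B, who made a single hit
echo 'Top by count: ', $by_count[0], "\n"; // user-A, who made three hits
```

The sum mostly reflects how late the visits happened, so a single recent hit can outrank several older ones.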


True; in that case an averaging approach (what I call the median, strictly speaking the mean) would be more applicable, if you were to use it.

Example (no documentation beyond the alterations):

<?php

$arr_usertimestamp = array();
for($i=0;$i<=50;$i++) {
$arr_usertimestamp['user' . $i] = array();
for($v=0;$v<=10;$v++) 
	$arr_usertimestamp['user' . $i][] = rand(time()-999, time()+999);
}


foreach ($arr_usertimestamp as $user => $timestamps) {
	// here we still find the sum of all the timestamps, but then
	// divide it by the total number of timestamps there are,
	// giving us their average (strictly speaking the mean, not the median)
	$mean = array_sum($timestamps) / count($timestamps);
	// a float is more likely here, so we still convert to an integer
	$arr_usertimestamp[$user] = intval($mean);
}

arsort($arr_usertimestamp); // sort by value, high to low, keeping the username keys
$arr_topten = array_slice($arr_usertimestamp, 0, 10, true);
echo implode('<br>', array_keys($arr_topten));

?>

This should suffice for the scenario in your most recent post.
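
Since the script above divides a sum by a count, it computes a mean rather than a median. For comparison, here is a minimal sketch of an actual median. The helper name median() is hypothetical (PHP has no built-in of that name) and the sample values are made up.

```php
<?php
// Hypothetical helper (not part of PHP): the middle value of a list of numbers.
function median(array $values)
{
    sort($values);                 // sorts the local copy, lowest first, re-indexed from 0
    $n = count($values);
    if ($n === 0) {
        return null;               // no data, no median
    }
    $mid = (int) floor($n / 2);
    if ($n % 2 === 1) {
        return $values[$mid];      // odd count: the single middle value
    }
    // even count: the mean of the two middle values
    return ($values[$mid - 1] + $values[$mid]) / 2;
}

$timestamps = array(100, 200, 9000);

echo 'mean:   ', array_sum($timestamps) / count($timestamps), "\n"; // 3100
echo 'median: ', median($timestamps), "\n";                          // 200
```

Either way, both figures describe when the visits happened rather than how many there were.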


Sorry, still doesn't make sense:

 

$ cat x1.php
<?php

$data['user-a'] = array
(
strtotime('Jan 1st 1970 12:00:00'),
strtotime('Jan 1st 1970 12:30:00'),
strtotime('Jan 1st 1970 13:00:00'),
strtotime('Jan 1st 1970 13:30:00'),
strtotime('Jan 1st 1970 14:00:00'),
strtotime('Jan 1st 1970 14:30:00'),
strtotime('Jan 1st 1970 15:00:00'),
strtotime('Jan 1st 1970 15:30:00'),
);

$data['user-b'] = array
(
strtotime('Jan 2nd 1970 12:00:00'),
);

echo "user-a : " . floor(array_sum($data['user-a']) / count($data['user-a'])) . "\n";
echo "user-b : " . floor(array_sum($data['user-b']) / count($data['user-b'])) . "\n";

$ php -f x1.php
user-a : 45900
user-b : 126000

After some research, and some logic, I found the problem with your code.

Time only reaches 7 digits at exactly "2 January 1970 05:46:40", so anything before that would result in what you see.

This median-finding technique is based on exactly 7 digits, but the result you see is valid all the same: user-b has a timestamp later in time than user-a.

Please move the date to anything after the minimum of "2 January 1970 05:46:40" to see the desired result.

EDIT

An activity indicator is a bit more difficult than this. It works by collecting timestamps, measuring the time between them, and subtracting the timeout (the amount of time that passes with no activity from a user until they are presumed offline) at each "visit". One would end up with "sets" of timestamps. I'll give you an example of this later, if needed.

Edited by Spudinski

Sorry, please explain where the problem is. As stated, if you compute your median values, then the code as shown produces those results, which suggests that user B is more active than user A, and that is clearly not the case. Moving the date past your 2 January 1970 05:46:40 marker still doesn't help:

$data['user-a'] = array
(
   strtotime('Jan 1st 1980 12:00:00'),
   strtotime('Jan 1st 1980 12:30:00'),
   strtotime('Jan 1st 1980 13:00:00'),
   strtotime('Jan 1st 1980 13:30:00'),
   strtotime('Jan 1st 1980 14:00:00'),
   strtotime('Jan 1st 1980 14:30:00'),
   strtotime('Jan 1st 1980 15:00:00'),
   strtotime('Jan 1st 1980 15:30:00'),
);

$data['user-b'] = array
(
   strtotime('Jan 2nd 1980 12:00:00'),
);

echo "user-a : " . floor(array_sum($data['user-a']) / count($data['user-a'])) . "\n";
echo "user-b : " . floor(array_sum($data['user-b']) / count($data['user-b'])) . "\n";


>> user-a : 315,582,300
>> user-b : 315,662,400

(Commas added for clarity)

For comparison, a different tack:

$now        = time();     // cache the time
$timestamps = array();    // the main data array(user => array(timestamp, ...))

foreach (range(1, 50) as $user)
{
   $timestamps[$user] = array();

   foreach (range(1, mt_rand(1, 10)) as $j)    // random number of hits (1-10) per user
   {
       $timestamps[$user][] = $now + mt_rand(-999, 999);
   }
}

/**
* Compute most recent active user
* i.e.: for each user, compute the maximum timestamp
*       display the user with the highest timestamp.
*/
$best = array(0, 'none');  /* highest timestamp, user # */

// $best is captured by reference with use (&$best) so the closure can update it;
// call-time pass-by-reference (&$best as an argument) is a fatal error since PHP 5.4.
array_walk($timestamps, function ($value, $key) use (&$best) {
   if (($latest = max($value)) > $best[0]) { // if the highest timestamp > the best...
       $best = array($latest, $key);
   }
});

echo "Most recent user was " . $best[1] . " whose last access was at " . date('Y-m-d H:i:s', $best[0]) . "\n";

/**
* Compute the most active user in respect of page hits
* i.e.: for each user, compute the number of hits
*       display the user with the highest number of hits.
*/
$best = array(0, 'none');  /* number of hits, user # */

array_walk($timestamps, function ($value, $key) use (&$best) {
   if (($num = count($value)) > $best[0]) { // if the # of page hits > the best...
       $best = array($num, $key);
   }
});

echo "Most active user was " . $best[1] . " who made " . $best[0] . " page hits\n";

>> Most recent user was 35 whose last access was at 2011-08-19 01:15:24
>> Most active user was 29 who made 10 page hits

Purely as an example - I realize we are stepping outside the bounds of the original topic. Results will differ due to random data.
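
As a possible variation on the same idea, the two answers can also be sketched without a callback, using array_map() with the built-in max() and count(). The sample data below is made up, and the variable names are illustrative only.

```php
<?php
// Made-up sample data: user id => list of visit timestamps.
$timestamps = array(
    7  => array(1000, 1500, 1700),
    12 => array(1900),
    35 => array(1200, 1300),
);

// Latest visit per user; with a single input array, array_map() keeps the keys.
$latest = array_map('max', $timestamps);            // 7 => 1700, 12 => 1900, 35 => 1300
// The key (user id) that holds the highest "latest visit".
$most_recent = array_search(max($latest), $latest);

// Number of hits per user, then the user id with the most hits.
$hits = array_map('count', $timestamps);            // 7 => 3, 12 => 1, 35 => 2
$most_active = array_search(max($hits), $hits);

echo "Most recent user was $most_recent\n";
echo "Most active user was $most_active who made {$hits[$most_active]} page hits\n";
```

On ties, array_search() returns the first matching key, which matches the strict ">" comparison in the array_walk() version.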


I do get what you're trying to do, something like SMF's "total time online".

It's rather difficult if you have to do it with only timestamps; usually it is done per session, or with a cron job.

This code makes little use of array-specific functions, as it's a prototype, but as far as I can tell from the logic, it will work.

 

<?php
error_reporting(E_ERROR);

$data = array();

// random data, ignore, again.
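for ($i = 0; $i <= 19; $i++) {
	$data['user-' . $i] = array();
	for ($x = 0; $x <= 49; $x++) {
		$data['user-' . $i][] = strtotime('2 January 2011 ' .
			rand(0, 23) . ':' . rand(0, 59) . ':' . rand(0, 59));
	}
	// oldest first; SORT_ASC is an array_multisort() flag, not a valid flag for sort()/asort()
	sort($data['user-' . $i]);
} // 20 users, 50 timestamps each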
	error_reporting(E_ALL);
// we need to define a new array, for the time each user has been active.
$userdata = array();
$last = 0; // previous timestamp of the current user; 0 means "none yet"
// loop through to split up the users
foreach ($data as $user => $timestamp) {
	$userdata[$user] = 1;
	// whether the above starts at 1 instead of 0 is debatable:
	// the last timestamp in the timestamp array adds no time of its own.
	foreach ($timestamp as $t) {
		// loop through the timestamp array
		if (empty($last)) {
			// if there is no last entry, this has to be the first, so remember it
			$last = $t;
		} elseif (($t - $last) >= (60 * 15)) {
			// if the user hasn't visited the site for 15 minutes, we assume they
			// went offline and start a new visit at the current timestamp
			$last = $t;
		} else {
			// if the timestamp was recorded within 15 minutes of the last one,
			// we subtract the last timestamp from the current one
			// and add the difference to our current "activity meter"
			$userdata[$user] += $t - $last;
			// again, the current timestamp becomes the previous one
			$last = $t;
		}
	}
	// we're done looping through the timestamps of one user, so reset for the next
	$last = 0;
}

arsort($userdata);
// we sort by the values of the array, highest first, while preserving the keys
// (SORT_ASC is not a valid flag here; arsort() always sorts high to low)
$new_data = array_slice($userdata, 0, 10, true);
// only get the top 10 items from the array

$c = 0; // rank counter
foreach ($new_data as $user => $time) { // loop through the user => seconds pairs
	$c++; // increment the counter by 1
	echo '#' . $c . "\t" . $user . "\t" . round($time / 60) . " minutes active.\n";
	// echo the results; "/ 60" converts seconds to minutes, round() gives a whole number.
	// \t = tab \n = newline
}

?>

 

It's not pinpoint accurate, but it's the basic concept by which most activity counters work.


  • 11 months later...

Spudinski, you were asking for a test array to show how this would work. How about a function(s) that first shows the most popular day of the week players are on your game, then break it down further that would show the most popular time frame of that day. It should potentially return say something like Sunday between 3pm and 6pm.

Is that something that arrays could do?

I realize that this is probably going back to using the timestamp again but it could be a very useful function for a game owner to have for creating activities that could take place during that time or to have activities during the deadest time to promote activity.


Sure, it's something arrays could do, but it would require a lot of manipulation.

The overhead on something like this would be insane (depending on user base and activity).

MySQL is designed for this, using date ranges would be much more efficient.
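
For a modest number of timestamps, the array approach could be sketched roughly as follows. Everything here is an assumption for illustration: the flat $visits list, the made-up dates, bucketing in UTC, and the 3-hour slot width. It is a sketch of the idea, not a replacement for doing the grouping in the database.

```php
<?php
date_default_timezone_set('UTC'); // assumption: bucket the visits in UTC

// Hypothetical input: a flat list of visit timestamps (made-up values).
$visits = array(
    strtotime('2011-08-14 15:10:00'), // Sunday
    strtotime('2011-08-14 16:45:00'), // Sunday
    strtotime('2011-08-14 17:05:00'), // Sunday
    strtotime('2011-08-15 09:30:00'), // Monday
    strtotime('2011-08-21 15:55:00'), // Sunday
);
$slot_hours = 3; // width of each time slot, in hours (illustrative choice)

// Step 1: count the visits per weekday and pick the busiest day.
$day_names = array_map(function ($t) { return date('l', $t); }, $visits);
$per_day = array_count_values($day_names); // e.g. Sunday => 4, Monday => 1
arsort($per_day);
$days = array_keys($per_day);
$top_day = $days[0];

// Step 2: within that weekday, count the visits per time slot.
$per_slot = array(); // slot start hour => number of visits
foreach ($visits as $t) {
    if (date('l', $t) !== $top_day) {
        continue; // not on the busiest day
    }
    $hour  = (int) date('G', $t);           // hour of the day, 0 to 23
    $start = $hour - ($hour % $slot_hours); // round down to the start of the slot
    $per_slot[$start] = isset($per_slot[$start]) ? $per_slot[$start] + 1 : 1;
}
arsort($per_slot);
$slots = array_keys($per_slot);
$top_slot = $slots[0];

printf("%s between %02d:00 and %02d:00\n", $top_day, $top_slot, $top_slot + $slot_hours);
// prints "Sunday between 15:00 and 18:00"
```

Each visit is touched twice, so the cost grows linearly with the number of timestamps loaded into PHP; the real expense is fetching them all in the first place, which is where grouping in MySQL would pay off.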
