Spudinski, posted August 18, 2011 (edited)

Arrays 101 (PHP 5)
Level: Intermediate

Most people learn by practice, right? With that in mind, I built this tutorial around scenario-and-solution examples. I hope people who want to learn more about using arrays to their full potential will take part.

Just to be clear, I will not explain how to define arrays, as you should already know the basics of defining and using them. This is just an article to help you discover the different ways of handling arrays in a script.

Arrays are, for me personally, one of the best things in any programming language. An array is a set of variables, or rather a set of defined data. There are many ways to handle arrays and, as with many other things in PHP, there are a lot of functions for manipulating them.

I would like to help everyone improve their skills with arrays, but I will need scenarios from you, and I will then write a script explaining the functions I used. Here is an example below:

Scenario 1:
You want to make a script that shows the most active users, but you currently only have timestamps of their visits within the last month. First, you have collected all the user timestamps from a database and put them in an array (whether via MySQL or a flat-file DB; not explained here) with the following structure:

Array ( username => Array( timestamp, timestamp, etc.. ) )

Solution 1:

<?php
// Sample population data; please ignore this part.
$arr_usertimestamp = array();
for ($i = 0; $i <= 50; $i++) {
    $arr_usertimestamp['user' . $i] = array();
    for ($v = 0; $v <= 10; $v++) {
        $arr_usertimestamp['user' . $i][] = rand(time() - 999, time() + 999);
    }
}

// Here we loop through the first dimension of the array, which holds the usernames.
// The loop visits the usernames in the order in which they were assigned.
foreach ($arr_usertimestamp as $user => $timestamps) {
    $arr_usertimestamp[$user] = intval(array_sum($timestamps));
    // Here we sum all the values contained in the second dimension of the original array.
    // The "=>" in the foreach gives us each entry's key ($user) and its value ($timestamps),
    // and the value is itself an array here.
    // The main thing to remember is that a foreach loop hands you the array one entry at a time:
    // the original structure array(username => array(t,t,t)) is seen as username => array(t,t,t).
    // You can think of it as being inside the array, running down it sequentially.
    // PS. array_sum() could potentially, though not likely, return a float, so I use intval(),
    // which truncates a float to a whole integer.
}

// A note on array_flip(): it swaps keys and values, so array('a' => 1, 'b' => 2) becomes
// array(1 => 'a', 2 => 'b'). It does not reverse the order, and it returns a new array
// instead of changing its argument. It is not needed here: flipping would make the sort
// below work on the usernames instead of the sums.

arsort($arr_usertimestamp);
// Here I sort the array by its values, descending (high to low), while keeping each
// username attached to its sum. asort() does the same in ascending order.
// The optional second argument of both functions is a comparison flag such as SORT_NUMERIC,
// not a sort direction.
// The function works directly on the variable containing the array,
// so its return value does not need to be stored.

$arr_topten = array_slice($arr_usertimestamp, 0, 10, true);
// array_slice() does what its name says: it cuts a piece out of the array and returns it.
// The first argument is the array, the second is the offset at which to start, the third is
// the number of elements to keep (a length, not an end position), and the fourth is whether
// to preserve integer keys. String keys, like our usernames, are always preserved.
// We only want the first 10 entries of the array.
// For example: take array(0,1,2,3,4,5,6,7,8,9). If we only want the values 4 to 8, we use
// array_slice($var, 4, 5), which returns array(4,5,6,7,8).

echo implode('<br>', array_keys($arr_topten));
// implode() glues the elements of an array together and returns the result as a string.
// The first argument is the "glue", i.e. what to place between the elements.
// For the second argument we use array_keys(), because we have an associative array at hand,
// array(username => sum of all t), and we only need the first part (the "key") of each entry.
// array_keys() gives us an array of only usernames, like array(peter, jane, john).
// It still returns an array, but because we pass it to implode() it is converted into a
// string, with the elements separated by implode()'s first argument.
?>

So, set up a scenario and I'll show you a solution. It's also a kind of test of my abilities. :P

Edited August 18, 2011 by Spudinski (Documentation)
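For reference, a minimal sketch of how the three array functions discussed in the post behave on a tiny array. The names $scores, $flipped, $topTwo and $middle are made up for illustration and are not part of the original script.

```php
<?php
// Hypothetical sample data: username => score.
$scores = array('peter' => 30, 'jane' => 50, 'john' => 40);

// array_flip() swaps keys and values and RETURNS the result;
// $scores itself is left untouched.
$flipped = array_flip($scores);

// arsort() sorts in place by value, high to low, and keeps the keys.
arsort($scores);

// array_slice(array, offset, length, preserve_keys): the third argument
// is a length, so this keeps the first two entries.
$topTwo = array_slice($scores, 0, 2, true);

// Values 4 to 8 of array(0..9): start at offset 4 and keep 5 elements.
$middle = array_slice(range(0, 9), 4, 5);

echo implode(', ', array_keys($topTwo)), "\n";
echo implode(',', $middle), "\n";
```

The second echo is the quickest way to convince yourself that the third argument is a count rather than an end position.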
Anonymous, posted August 18, 2011

foreach ($arr_usertimestamp as $user => $timestamps) {
    $arr_usertimestamp[$user] = intval(array_sum($timestamps)); // PS. array_sum returns a float
}

What? Sorry, you lost me totally there - why are you aggregating timestamps? Why the int cast? This makes no sense.
Spudinski (Author), posted August 18, 2011

I have documented my code more thoroughly; please see whether you can follow it better now.
Anonymous, posted August 18, 2011 (edited)

No, still not clear - why are you aggregating timestamps? The initial loop populates an array such that:

$arr_usertimestamp[ <element> ] = array( <timestamp1>, <timestamp2>, <timestamp3>, ... );

Which is fine, but then you aggregate those timestamps, creating

$arr_usertimestamp[ <element> ] = <timestamp1> + <timestamp2> + <timestamp3> ...

Why? What purpose does adding times together serve?

As for array_sum returning a float... does it?

cat x1.php
<?php
$a = array(1, 2, 3, 4, time());
$b = array_sum($a);
echo "b is :\n";
echo $b . "\n";
echo "serialize(b) is\n";
echo serialize($b) . "\n";
echo "print_r(b) is\n";
print_r($b);
echo "\nvardump(b) is\n";
var_dump($b);

php -f x1.php
b is :
1313705145
serialize(b) is
i:1313705145;
print_r(b) is
1313705145
vardump(b) is
int(1313705145)

[edit] The PHP manual states that array_sum returns an int or a float, so the result will be determined by the values stored in the array and the bit-size of the machine, although I've not verified the latter assumption. In my case, on a 64-bit machine, the cast to int is redundant.

Edited August 18, 2011 by Anonymous
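To illustrate the int-or-float point, here is a small sketch of when array_sum() returns which type. The variable names are made up; PHP_INT_MAX is the built-in constant for the largest integer on the current platform, so the overflow case behaves the same way on 32-bit and 64-bit builds.

```php
<?php
// All integers, result fits in an int: array_sum() returns an int.
$intSum = array_sum(array(1, 2, 3, 4));

// At least one float in the array: the result is a float.
$mixedSum = array_sum(array(1, 2.5));

// Integer overflow: the sum no longer fits in an int, so PHP switches to a float.
// On a 32-bit build, a handful of present-day timestamps is enough to get here.
$overflowSum = array_sum(array(PHP_INT_MAX, 1));

var_dump($intSum);
var_dump($mixedSum);
var_dump(is_float($overflowSum));
```

Note that intval() cannot rescue the overflow case: a float that is larger than PHP_INT_MAX has no meaningful integer value.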
Spudinski (Author), posted August 18, 2011 (edited)

On why I aggregate the timestamps: well, there are three basic methods to get the highest-ranking user here:

1. sum up all the timestamps in the array, or
2. find the median of all the timestamps, or
3. simply use count() to count the values in the array.

To keep it at an intermediate level, I chose to sum up all the timestamps, which in turn gives me an integer or a float. The reason I do this is that I depend on the array's structure being array(string username => int timestamp), as can be seen in the script's later stages of execution.

On array_sum returning an int or a float depending on the values and the machine: indeed it does. But a good script has to be able to deal with the odds. The chances are more than likely (spare me the math, please) that it will return an integer.

Oh my, and I have to apologize: it shouldn't return a float with the values presented, as they are all integers. Something probably went wrong in testing, and I patched around it. There are no excuses, sorry; I should have double-checked the code before posting.

Should I elaborate more, or do you get what I'm doing in the loop?

Edited August 18, 2011 by Spudinski (like my posts)
Anonymous, posted August 18, 2011

If I may presume to simplify: assume user A logs in on Jan 1st 1970 @ midday and is active for 2 more hits spread 30 minutes apart, and user B logs in on Jan 2nd 1970 @ midday but leaves instantly:

$array[ 'user-A' ] = array( 39600, 40400, 42200 );
$array[ 'user-B' ] = array( 126000 );

Summing these we get:

$array[ 'user-A' ] = 122200
$array[ 'user-B' ] = 126000

Which user is more active? The basic assumption here is that you can add timestamps, which just doesn't work for me. The number produced is not a valid indication of activity; rather, it's an arbitrary number with no meaning.
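To make that comparison concrete, here is a small sketch that ranks the same two users once by the sum of their timestamps and once by their number of hits. The names $visits, $sums and $hits are made up for illustration; the data is taken from the post above.

```php
<?php
$visits = array(
    'user-A' => array(39600, 40400, 42200),
    'user-B' => array(126000),
);

// Ranking by summed timestamps: later dates win, regardless of activity.
$sums = array_map('array_sum', $visits);
arsort($sums);

// Ranking by number of hits: array_map() keeps the string keys
// because only one array is passed to it.
$hits = array_map('count', $visits);
arsort($hits);

$topBySum  = array_keys($sums);
$topByHits = array_keys($hits);

echo "By sum:  " . $topBySum[0] . "\n";
echo "By hits: " . $topByHits[0] . "\n";
```

The two rankings disagree, which is the heart of the objection: the sum mostly measures how recent the visits are, not how many there were.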
Spudinski (Author), posted August 18, 2011

True, then a median approach would be more applicable if you were to use it. (Strictly speaking, the code below computes the arithmetic mean: the sum divided by the count.)

Example (no documentation beyond the alterations):

<?php
$arr_usertimestamp = array();
for ($i = 0; $i <= 50; $i++) {
    $arr_usertimestamp['user' . $i] = array();
    for ($v = 0; $v <= 10; $v++) {
        $arr_usertimestamp['user' . $i][] = rand(time() - 999, time() + 999);
    }
}

foreach ($arr_usertimestamp as $user => $timestamps) {
    $average = array_sum($timestamps) / count($timestamps);
    // Here we still find the sum of all the timestamps, but then
    // divide it by the total number of timestamps,
    // giving us the average (mean) timestamp.
    $arr_usertimestamp[$user] = intval($average);
    // The likelihood of a float is greater here, so we still convert to an integer.
}

arsort($arr_usertimestamp); // sort by value, high to low, keeping the usernames as keys
$arr_topten = array_slice($arr_usertimestamp, 0, 10, true);
echo implode('<br>', array_keys($arr_topten));
?>

This should suffice for the scenario in your most recent post.
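Since the thread uses "median" for what is computed as sum divided by count, here is a minimal sketch of the difference. The helper names mean() and median() are hypothetical; PHP 5 has no built-in function for either.

```php
<?php
// Arithmetic mean: sum divided by count (what the example above computes).
function mean(array $values)
{
    return array_sum($values) / count($values);
}

// Median: the middle value of the sorted list, or the average
// of the two middle values when the count is even.
function median(array $values)
{
    sort($values); // works on a copy; the caller's array is not changed
    $n = count($values);
    $mid = (int) floor($n / 2);
    if ($n % 2 === 1) {
        return $values[$mid];
    }
    return ($values[$mid - 1] + $values[$mid]) / 2;
}

$sample = array(1, 2, 3, 100);
echo "mean:   " . mean($sample) . "\n";
echo "median: " . median($sample) . "\n";
```

Neither figure says how active a user is, but the sample shows that a single outlier moves the mean far more than the median.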
Anonymous, posted August 18, 2011

Sorry, still doesn't make sense:

cat x1.php
<?php
$data['user-a'] = array (
    strtotime('Jan 1st 1970 12:00:00'),
    strtotime('Jan 1st 1970 12:30:00'),
    strtotime('Jan 1st 1970 13:00:00'),
    strtotime('Jan 1st 1970 13:30:00'),
    strtotime('Jan 1st 1970 14:00:00'),
    strtotime('Jan 1st 1970 14:30:00'),
    strtotime('Jan 1st 1970 15:00:00'),
    strtotime('Jan 1st 1970 15:30:00'),
);
$data['user-b'] = array (
    strtotime('Jan 2nd 1970 12:00:00'),
);
echo "user-a : " . floor(array_sum($data['user-a']) / count($data['user-a'])) . "\n";
echo "user-b : " . floor(array_sum($data['user-b']) / count($data['user-b'])) . "\n";

php -f x1.php
user-a : 45900
user-b : 126000
Spudinski (Author), posted August 18, 2011 (edited)

After some research, and some logic, I found the problem with your code. Time only reaches 7 digits at exactly "2 January 1970 05:46:40", so anything before that would result in what you see. This median-finding technique is based on exactly 7 digits. The result you see is valid, though: user-b has a timestamp further along in time than user-a. Please move the date to anything after the minimum of "2 January 1970 05:46:40" to see the desired result.

EDIT
An activity indicator is a bit more difficult than this. It works by collecting timestamps, measuring the time in between them, and subtracting the timeout (the amount of time that passes with no activity from a user until they are presumed offline) at each "visit". One would end up with "sets" of timestamps. I'll give you an example of this later, if needed.

Edited August 18, 2011 by Spudinski
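For reference, a quick check of where Unix timestamps gain a digit, done in UTC with gmdate() so that the server's time zone does not matter. The variable names are made up. The time quoted in the post matches 100000 seconds (the first six-digit value) on a clock two hours ahead of UTC, which is only a guess about where that figure came from.

```php
<?php
$format = 'j F Y H:i:s';

// First six-digit timestamp.
$sixDigits = gmdate($format, 100000);

// First seven-digit timestamp.
$sevenDigits = gmdate($format, 1000000);

echo "100000  = " . $sixDigits . " UTC\n";
echo "1000000 = " . $sevenDigits . " UTC\n";
```

Either way, the number of digits has no bearing on the averaging itself: dividing a sum by a count works the same for timestamps of any length.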
Anonymous, posted August 19, 2011

Sorry, please explain where the problem is. As stated, if you compute your median values, the code as shown produces those results, which suggests that user B is more active than user A; that is clearly not the case. Moving the date past your 2 January 1970 05:46:40 marker still doesn't help:

$data['user-a'] = array (
    strtotime('Jan 1st 1980 12:00:00'),
    strtotime('Jan 1st 1980 12:30:00'),
    strtotime('Jan 1st 1980 13:00:00'),
    strtotime('Jan 1st 1980 13:30:00'),
    strtotime('Jan 1st 1980 14:00:00'),
    strtotime('Jan 1st 1980 14:30:00'),
    strtotime('Jan 1st 1980 15:00:00'),
    strtotime('Jan 1st 1980 15:30:00'),
);
$data['user-b'] = array (
    strtotime('Jan 2nd 1980 12:00:00'),
);
echo "user-a : " . floor(array_sum($data['user-a']) / count($data['user-a'])) . "\n";
echo "user-b : " . floor(array_sum($data['user-b']) / count($data['user-b'])) . "\n";

>> user-a : 315,582,300
>> user-b : 315,662,400
(Commas added for clarity)

For comparison, a different tack:

$now = time(); // cache the time
$timestamps = array(); // the main data: array(user => array(timestamp, ...))
foreach (range(1, 50) as $user) {
    $timestamps[$user] = array();
    foreach (range(1, mt_rand(1, 10)) as $j) { // random number of hits (1-10) per user
        $timestamps[$user][] = $now + mt_rand(-999, 999);
    }
}

/**
 * Compute the most recently active user,
 * i.e.: for each user, compute the maximum timestamp,
 * then display the user with the highest timestamp.
 */
$best = array(0, 'none'); /* highest timestamp, user # */
// NB: $best is captured by reference with use (&$best); passing &$best at the
// call site (call-time pass-by-reference) is a fatal error from PHP 5.4 on.
array_walk($timestamps, function ($value, $key) use (&$best) {
    if (($latest = max($value)) > $best[0]) { // if the highest timestamp > the best...
        $best = array($latest, $key);
    }
});
echo "Most recent user was " . $best[1] . " whose last access was at " . date('Y-m-d H:i:s', $best[0]) . "\n";

/**
 * Compute the most active user in respect of page hits,
 * i.e.: for each user, compute the number of hits,
 * then display the user with the highest number of hits.
 */
$best = array(0, 'none'); /* number of hits, user # */
array_walk($timestamps, function ($value, $key) use (&$best) {
    if (($num = count($value)) > $best[0]) { // if the # of page hits > the best...
        $best = array($num, $key);
    }
});
echo "Most active user was " . $best[1] . " who made " . $best[0] . " page hits\n";

>> Most recent user was 35 whose last access was at 2011-08-19 01:15:24
>> Most active user was 29 who made 10 page hits

Purely as an example - I realize we are stepping outside the bounds of the original topic. Results will differ due to random data.
Spudinski (Author), posted August 19, 2011

I do get what you're trying to do: something like SMF's "total time online". It's rather difficult if you have to do it with only timestamps; usually it is done per session, or by use of a cron job. This code doesn't make use of any array-specific functions, as it's a prototype, but from what I guess (logic), it will work.

<?php
error_reporting(E_ERROR); // hide warnings and notices while generating the sample data
$data = array();
// Random data; ignore, again.
for ($i = 0; $i <= 19; $i++) {
    $data['user-' . $i] = array();
    for ($x = 0; $x <= 49; $x++) {
        $data['user-' . $i][] = strtotime('2 January 2011 ' . rand(0, 23) . ':' . rand(0, 59) . ':' . rand(0, 59));
    }
    sort($data['user-' . $i]); // oldest timestamp first
} // 20 users, 50 timestamps each
error_reporting(E_ALL);

// We need a new array for the amount of time each user was online.
$userdata = array();

// Loop through the data, one user at a time.
foreach ($data as $user => $timestamp) {
    $userdata[$user] = 1;
    // Starting at 1 instead of 0 is purely arguable:
    // the last timestamp in the timestamp array won't be counted.
    $last = 0; // no previous timestamp yet for this user
    foreach ($timestamp as $t) { // loop through the timestamp array
        if (empty($last)) {
            $last = $t; // there is no previous entry, so this has to be the first one
        } else {
            if (($t - $last) >= (60 * 15)) {
                // If the user hasn't visited the site for 15 minutes, we assume a new
                // visit starts here and take the current timestamp as its start.
                $last = $t;
            } else {
                // If the timestamp was recorded within 15 minutes of the last one,
                // we subtract the last timestamp from the current one
                // and add the difference to our "activity meter".
                $userdata[$user] += $t - $last;
                $last = $t; // again, just set the current time as the previous one
            }
        }
    }
}

arsort($userdata); // sort the values of the array, high to low, while preserving the keys
$new_data = array_slice($userdata, 0, 10, true); // only get the top ten items from the array
$c = 0; // counter, whoo. :|
foreach ($new_data as $user => $time) { // loop through the user => seconds array
    $c++; // increment the counter by 1
    echo '#' . $c . "\t" . $user . "\t" . round($time / 60) . " minutes active.\n";
    // Echo the results; "/ 60" converts seconds to minutes and round() rounds the result.
    // \t = tab, \n = newline
}
?>

It's not pinpoint accurate, but it's the basic concept on which most activity counters work.
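The same idea can be packed into a small function and checked against fixed data instead of random data. This is only a sketch: the name activeSeconds() and its $timeout parameter are made up for illustration, and unlike the prototype above it starts counting at 0 rather than 1.

```php
<?php
// Sum the gaps between consecutive hits, skipping any gap that is as long as
// the timeout or longer (the user is presumed to have left in between).
function activeSeconds(array $timestamps, $timeout = 900)
{
    sort($timestamps); // oldest first; sorts a copy, not the caller's array
    $total = 0;
    $last = null;
    foreach ($timestamps as $t) {
        if ($last !== null && ($t - $last) < $timeout) {
            $total += $t - $last;
        }
        $last = $t;
    }
    return $total;
}

// Two visits: hits at 0, 300 and 600 seconds, then a long break,
// then hits at 2000 and 2100 seconds.
$hits = array(2100, 0, 300, 600, 2000);
echo activeSeconds($hits) . " seconds active\n";
```

With the 15-minute default, the gaps of 300, 300 and 100 seconds are counted and the 1400-second break is not.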
Anonymous, posted August 19, 2011

Actually no, I wasn't - I was retrieving two values from the initial data array: the user id of the user who made the most recent hit, and the user id of the user who made the most hits. No attempt was made to extract time online at any stage.
Razor42, posted July 21, 2012

Nice tutorial, got some useful information :)
newttster, posted July 22, 2012

Spudinski, you were asking for a test scenario to show how this would work. How about a function (or functions) that first shows the most popular day of the week that players are on your game, and then breaks it down further to show the most popular time frame of that day? It should return something like "Sunday between 3pm and 6pm". Is that something arrays could do? I realize this probably goes back to using timestamps again, but it could be a very useful function for a game owner: for creating activities that take place during that time, or for running activities during the deadest time to promote activity.
Spudinski (Author), posted July 22, 2012

Sure, it's something arrays could do, but it would require a lot of manipulation. The overhead of something like this would be insane (depending on the user base and activity). MySQL is designed for this; using date ranges there would be much more efficient.
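For small data sets, the array-only version might look like the sketch below. Everything in it is an assumption made for illustration: the function name busiestSlot(), the three-hour frames, and the use of UTC via gmdate() and gmmktime(). For a real user base, grouping in the database as suggested above is likely the better route.

```php
<?php
// Find the busiest weekday, then the busiest time frame on that weekday.
function busiestSlot(array $timestamps, $frameHours = 3)
{
    if (count($timestamps) === 0) {
        return null;
    }

    // Count hits per weekday name ("Sunday", "Monday", ...).
    $days = array();
    foreach ($timestamps as $t) {
        $days[] = gmdate('l', $t);
    }
    $dayCounts = array_count_values($days);
    arsort($dayCounts);
    $dayNames = array_keys($dayCounts);
    $topDay = $dayNames[0];

    // Count hits per frame (0-3, 3-6, ...) on that weekday only.
    $frames = array();
    foreach ($timestamps as $t) {
        if (gmdate('l', $t) === $topDay) {
            $hour = (int) gmdate('G', $t);
            $frames[] = (int) floor($hour / $frameHours) * $frameHours;
        }
    }
    $frameCounts = array_count_values($frames);
    arsort($frameCounts);
    $frameStarts = array_keys($frameCounts);

    return array(
        'day'  => $topDay,
        'from' => $frameStarts[0],
        'to'   => $frameStarts[0] + $frameHours,
    );
}

// gmmktime(hour, minute, second, month, day, year); 22 July 2012 was a Sunday.
$hits = array(
    gmmktime(15, 10, 0, 7, 22, 2012),
    gmmktime(16, 20, 0, 7, 22, 2012),
    gmmktime(17, 45, 0, 7, 22, 2012),
    gmmktime(9, 0, 0, 7, 22, 2012),
    gmmktime(10, 0, 0, 7, 23, 2012),
    gmmktime(11, 0, 0, 7, 23, 2012),
);
$slot = busiestSlot($hits);
echo $slot['day'] . ' between ' . $slot['from'] . ':00 and ' . $slot['to'] . ":00\n";
```

Ties between days or frames are resolved arbitrarily by arsort(), so real code would need an explicit tie-break rule.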