TagPop: The Making Of

As promised, here is a description of the inner workings of TagPop.

TagPop is a tag cloud of all tags used on this website, in which the size of each tag represents its popularity among the visitors of this site. The popularity is determined by pageviews of individual articles.

I have assigned tags to each article on this site. Those tags reside in a table called tags in the database behind this site. It looks somewhat like this:

     --------------------------
    | article_id | tagname     |
    |--------------------------|
    |          1 | foo         |
    |          1 | bar         |
    |          2 | baz         |
    |          2 | bar         |
    |          2 | qux         |
    |          3 | qux         |
    |          4 | quux        |
     --------------------------

In the navigation bar on the left of this page, this table is used directly: a count(article_id) is selected and grouped by tagname, and the result is then processed in a fashion similar to TagPop.
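
The navigation bar query itself is not shown here. Presumably it is something along the lines of `SELECT tagname, COUNT(article_id) FROM tags GROUP BY tagname`, but that is a guess. As a minimal sketch of what such a grouping yields, here is the same count done in plain Perl over the sample rows from the table above (the variable names are made up for this illustration):

```perl
use strict;
use warnings;

# The sample rows from the tags table above: [article_id, tagname].
my @tags = ([1, 'foo'], [1, 'bar'], [2, 'baz'], [2, 'bar'],
            [2, 'qux'], [3, 'qux'], [4, 'quux']);

# Plain-Perl equivalent of COUNT(article_id) ... GROUP BY tagname:
# count how many articles carry each tag.
my %articles_per_tag;
$articles_per_tag{ $_->[1] }++ for @tags;

print "$_: $articles_per_tag{$_}\n" for sort keys %articles_per_tag;
```

With the sample data, bar and qux each end up with 2 articles and the other tags with 1.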

When an article is requested, the following query gets executed:

      INSERT OR REPLACE INTO tagpopularity (
              tagname,
              hits)
          SELECT  tags.tagname,
              COALESCE(tagpopularity.hits + 1, 1)
          FROM tags
          LEFT JOIN tagpopularity
              ON tags.tagname = tagpopularity.tagname
          WHERE tags.article_id = ?
          GROUP BY tags.tagname
      ;

This selects all the tags assigned to the requested article and inserts them into a second table, called tagpopularity. If there is no row in tagpopularity for a tagname yet, it is inserted with hits set to 1. If a row already exists, it is replaced by one with hits incremented by 1. (For the replace to happen, tagname has to be the primary key of tagpopularity, or at least carry a unique constraint; otherwise INSERT OR REPLACE would simply add duplicate rows.)

So, for instance, with a freshly initialized tagpopularity table, when the article with id 1 is requested, the tagnames foo and bar are inserted into tagpopularity, both with hits 1. When the next request is for the article with id 2, baz and qux are inserted with hits 1, and the record for bar is replaced by one with hits 2. This way tagpopularity maintains a count of all hits per tag with a single SQL query per request.
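
To try the walkthrough above without touching a real site, something like the following sketch should do. It assumes SQLite through DBD::SQLite and an in-memory database. The table definitions are my guess at a workable schema, not the actual one behind this site, and the handle names are invented for the example.

```perl
use strict;
use warnings;
use DBI;

# Assumption: SQLite via DBD::SQLite, using a throwaway in-memory database.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });

# Hypothetical schema. The PRIMARY KEY on tagname is what lets
# INSERT OR REPLACE overwrite an existing row instead of adding a duplicate.
$dbh->do('CREATE TABLE tags (article_id INTEGER, tagname TEXT)');
$dbh->do('CREATE TABLE tagpopularity (tagname TEXT PRIMARY KEY, hits INTEGER)');

# The sample rows from the tags table shown earlier.
my $ins = $dbh->prepare('INSERT INTO tags (article_id, tagname) VALUES (?, ?)');
$ins->execute(@$_) for [1, 'foo'], [1, 'bar'], [2, 'baz'], [2, 'bar'],
                       [2, 'qux'], [3, 'qux'], [4, 'quux'];

# The per-request query described above.
my $hit = $dbh->prepare(q{
    INSERT OR REPLACE INTO tagpopularity (tagname, hits)
    SELECT tags.tagname, COALESCE(tagpopularity.hits + 1, 1)
    FROM tags
    LEFT JOIN tagpopularity ON tags.tagname = tagpopularity.tagname
    WHERE tags.article_id = ?
    GROUP BY tags.tagname
});

# Simulate a request for article 1, then one for article 2.
$hit->execute($_) for 1, 2;

# Read the counters back, keyed on tagname, and print them.
my $counts = $dbh->selectall_hashref(
    'SELECT tagname, hits FROM tagpopularity', 'tagname');
printf "%-4s %d\n", $_, $counts->{$_}{hits} for sort keys %$counts;
```

After the two simulated requests, bar should have 2 hits and foo, baz and qux 1 each, matching the description above.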

Then a piece of Perl gathers these counts and presents them as a tag cloud:

      $poptags = $dbh->selectall_hashref(
          q{
              SELECT tagname, hits
              FROM tagpopularity
          }, 1);    # key the rows on column 1 (tagname)

      # find the lowest and highest hit counts
      my ($minval, $maxval);
      for my $key (keys %$poptags) {
          my $hits = $poptags->{$key}{hits};
          $minval = $hits if !defined $minval || $hits < $minval;
          $maxval = $hits if !defined $maxval || $hits > $maxval;
      }
      my $tagrange = $maxval - $minval;

      # assign tags to frequency buckets: tag0 (least popular) to tag5
      # (most popular), each covering one sixth of the range of hit counts
      for my $key (keys %$poptags) {
          my $hits  = $poptags->{$key}{hits};
          my $class = 0;
          # the highest bucket whose lower bound this tag reaches wins
          for my $bucket (1 .. 5) {
              $class = $bucket if $hits >= $minval + $bucket * $tagrange / 6;
          }
          $poptags->{$key}{class} = "tag$class";
      }

The above queries the database for all records in tagpopularity. It then derives the range of values for hits. This ensures that when the cloud gets displayed, it uses the full dynamic range of possible sizes to reflect the differences in popularity.

Then all the tags are assigned to buckets based on the popularity range they fall in. These buckets, tag0 to tag5, are CSS class names that get assigned to the tags. They have an increasing font size in the stylesheet, like this:

      .tag {
              display:        inline;
      }
      
      .tag a {
          text-decoration: none;
      }
      
      .tag a:hover {
          color:        #800517;
      }
      
      #tags .tag0 {
              font-size:      1.5em;
      }
      
      #tags .tag1 {
              font-size:      2em;
      }
      
      #tags .tag2 {
              font-size:      3em;
      }
      
      #tags .tag3 {
              font-size:      4em;
      }
      
      #tags .tag4 {
              font-size:      5em;
      }
      
      #tags .tag5 {
              font-size:      6em;
      }
      
      #tags .tagS {
              color:          #800517;
      }
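
As a worked example of the bucketing described above, here is a small standalone sketch. The hit counts are invented purely for illustration, and tag_class is a hypothetical helper name, not something from the site's code.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical hit counts, invented purely for illustration.
my %hits = (foo => 1, bar => 13, baz => 4, qux => 7, quux => 3);

my $minval   = min(values %hits);     # 1
my $maxval   = max(values %hits);     # 13
my $tagrange = $maxval - $minval;     # 12, so each bucket spans 2 hits

# Return the CSS class for a hit count: the highest bucket whose
# lower bound ($minval + $bucket * $tagrange / 6) the count reaches.
sub tag_class {
    my ($count) = @_;
    my $class = 0;
    for my $bucket (1 .. 5) {
        $class = $bucket if $count >= $minval + $bucket * $tagrange / 6;
    }
    return "tag$class";
}

# foo -> tag0, quux -> tag1, baz -> tag1, qux -> tag3, bar -> tag5
printf "%-5s %2d hits -> %s\n", $_, $hits{$_}, tag_class($hits{$_})
    for sort { $hits{$a} <=> $hits{$b} } keys %hits;
```

With these numbers the bucket boundaries fall at 3, 5, 7, 9 and 11 hits. The least popular tag always lands in tag0 and the most popular in tag5, which is what stretches the cloud over the full range of font sizes.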

Then for each tag an item in an HTML unordered list is written to the page, like this:

      <li class="ptag $$poptags{$tag}{'class'}">
          <a href="index.html?tag=$tag" rel="tag">$tag</a>
      </li>
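
Put together, the output step could look roughly like the sketch below. The sample data, the escape_html helper and the `<ul id="tags">` wrapper are assumptions made for this example; the stylesheet only tells us that the list items live inside some element with id tags.

```perl
use strict;
use warnings;

# Hypothetical data in the same shape as $poptags after bucketing.
my $poptags = {
    foo => { hits => 1, class => 'tag0' },
    bar => { hits => 9, class => 'tag5' },
};

# Minimal HTML escaping for text and attribute values.
sub escape_html {
    my ($text) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# Build one list item per tag. Tag names are assumed to be URL-safe here;
# anything else would also need URI escaping in the href.
my @items;
for my $tag (sort keys %$poptags) {
    my $name = escape_html($tag);
    push @items, qq{<li class="ptag $poptags->{$tag}{class}">}
               . qq{<a href="index.html?tag=$name" rel="tag">$name</a></li>};
}

my $html = join "\n", '<ul id="tags">', @items, '</ul>';
print "$html\n";
```

Sorting the tags alphabetically is a choice made for the sketch; any order works, since the size of each item comes from its class alone.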

And there is your tag cloud: pretty simple actually, but very shiny and web 2.0-ish. ;)
