Impact Software Blog: June 2012

Monday, June 25, 2012

Apache Thrift PHP Client for Workflow Engine

In a previous post, I outlined the reasons why using Apache Thrift as a mechanism to invoke the Workflow Execution Engine (WEE) made sense and also listed some code for creating an Apache Thrift client and server in Erlang.

In this post, I want to demonstrate how to use Apache Thrift to create a simple client in PHP. This might be useful if you want to integrate the workflow engine with a web page.

The first step is to generate the PHP bindings for the WEE Thrift service discussed previously. From the command line, run the following command:

thrift-0.8.0.exe -r --gen php wee.thrift

Second, create a directory where your web server can process a PHP file. In my case, I'm using Apache on a Windows computer, so I created a directory named C:\Program Files\Apache Software Foundation\Apache2.2\htdocs\example.

Third, copy the PHP library code shipped with the Apache Thrift distribution into the directory created above. In my case, this code is located in C:\thrift-0.8.0\lib\php\src and I copied it to C:\Program Files\Apache Software Foundation\Apache2.2\htdocs\example\src.

Fourth, take the code that was generated in the first step (the Thrift compiler writes it to a gen-php directory) and copy it to the packages directory. The complete path in my example is C:\Program Files\Apache Software Foundation\Apache2.2\htdocs\example\src\packages.

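When you are done, the example directory should contain a src directory holding Thrift.php along with the protocol, transport, and packages subdirectories, and the generated Wee.php should be in src\packages\wee.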
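If you would rather verify the layout with a script than by eye, something like the following could do it. This is only a minimal sketch in Python and is not part of the Thrift distribution. The file list is taken from the require_once lines of the client script in the next step, and the name missing_files is made up for illustration.

```python
from pathlib import Path

# Files the client script includes, relative to the "example" directory.
# The list mirrors the require_once lines in wee_client.php.
REQUIRED_FILES = [
    "src/Thrift.php",
    "src/protocol/TBinaryProtocol.php",
    "src/transport/TSocket.php",
    "src/transport/THttpClient.php",
    "src/transport/TBufferedTransport.php",
    "src/packages/wee/Wee.php",
]


def missing_files(example_dir):
    """Return the required files that are absent under example_dir.

    Hypothetical helper, written only for this illustration.
    """
    root = Path(example_dir)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]
```

Calling missing_files with the path of the example directory returns an empty list when everything is in place, and otherwise the relative paths that still need to be copied.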

Fifth, create a new file named wee_client.php. This file should be located in the C:\Program Files\Apache Software Foundation\Apache2.2\htdocs\example directory. Copy the following code and paste it into the file.

<?php
$GLOBALS['THRIFT_ROOT'] = 'src';

require_once $GLOBALS['THRIFT_ROOT'].'/Thrift.php';
require_once $GLOBALS['THRIFT_ROOT'].'/protocol/TBinaryProtocol.php';
require_once $GLOBALS['THRIFT_ROOT'].'/transport/TSocket.php';
require_once $GLOBALS['THRIFT_ROOT'].'/transport/THttpClient.php';
require_once $GLOBALS['THRIFT_ROOT'].'/transport/TBufferedTransport.php';

require_once $GLOBALS['THRIFT_ROOT'].'/packages/wee/Wee.php';

try {
    $socket = new TSocket('localhost', 9090);
    $transport = new TBufferedTransport($socket, 1024, 1024);
    $protocol = new TBinaryProtocol($transport);
    $client = new WeeClient($protocol);

    $transport->open();

    // create a new WEE Rule structure
    $r = new Rule();
    $r->filename = 'c:/temp/wee2.txt';

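    // Run the WEE rule. The service definition declares run() as returning a
    // string, so this test relies on PHP treating a non-empty string
    // (other than '0') as true.
    if ($client->run($r)) {
        echo 'Rule ran successfully<br/>';
    } else {
        echo 'Rule did not run successfully<br/>';
    }

    // Close the connection once the call has completed
    $transport->close();

} catch (TException $tx) {
    print 'Thrift Exception: '.$tx->getMessage()."\n";
}

?>

The parts of this code that are specific to my example are the generated Wee.php include, the WeeClient and Rule class names, and the rule filename. Normally, you would replace these with the names appropriate to your Apache Thrift service definition.

To run the PHP client, with the WEE Thrift server listening on port 9090, I open the URL http://localhost:8081/example/wee_client.php in my browser, which displays the message Rule ran successfully.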

As you can see, most of the code that is needed is already supplied in the Apache Thrift distribution. You just need to copy this code and revise as needed.
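If the page shows a Thrift Exception instead, one of the first things worth checking is whether anything is actually listening on the port the client connects to (localhost:9090 in the code above). Here is a small sketch of such a check using only the Python standard library. The function name port_is_open is an assumption for illustration, and the check only tells you that a TCP connection can be made, not that a Thrift service is behind it.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Hypothetical helper; it does not speak the Thrift protocol at all.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For the setup in this post, the call would be port_is_open("localhost", 9090).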

Thursday, June 21, 2012

Using Apache Thrift

A few months ago, we developed a port of the Workflow Execution Engine (WEE) using the Erlang programming language. The source code is now available on SourceForge. This workflow engine has been used on some internal projects with great success.

Because our company uses a variety of programming languages for development, I looked for a mechanism for integrating WEE into these other environments. My goal was to end up with a single workflow engine that programs written in any of those languages could call.

My investigation found several good alternatives, but Apache Thrift seemed best because of its broad programming language support and general features. Documentation for Thrift is spotty and inconsistent, and it was a struggle to get all the pieces working together. I strongly recommend the online document Thrift: The Missing Guide for anyone else attempting to do this. Apache Thrift is actually quite simple to use, and I realize now that I was making things much harder than they needed to be.

The first step in using Thrift is to develop a specification of what the remote service and data structures will look like. In the example shown below, a structure named Rule is defined with three elements named filename, contextList, and rules. The remote application will provide one service named run with the Rule structure as the payload.

/*
   This file contains the definition of an Apache Thrift service for executing rules using the
   Workflow Execution Engine (WEE).

   Version 1.0 - June 2012
*/

struct Rule {
    1: optional string filename;       // name of a file containing rules to run
    2: optional string contextList;    // array of variable names and values in Erlang list format
    3: optional string rules;          // instead of embedding rules in a file, specify the rules to run here
}

service Wee {
    string run(1:Rule rule);
}

The next step is to compile this specification using the Thrift Compiler into the various programming languages that will act as clients or servers. Three examples are shown below for generating Erlang, PHP, and Java code bindings respectively.

thrift-0.8.0.exe -r --gen erl wee.thrift

thrift-0.8.0.exe -r --gen php wee.thrift

thrift-0.8.0.exe -r --gen java wee.thrift

The client-side code is relatively simple as shown by the following example using Erlang. First, a connection is made to the server-side code. Next, a data structure is created and populated with the appropriate values. After that, the client invokes the run service passing the data that was defined in the previous step. Finally, the connection is closed.

start(Args) ->
    #options{port = Port, client_opts = ClientOpts} = parse_args(Args),

    % Connect to server
    {ok, Client0} = thrift_client_util:new("127.0.0.1", Port, wee_thrift, ClientOpts),

    % Create data structure
    RuleStruct = #rule {
        filename = "c:/temp/wee_rules.txt"
    },

    % Call remote server
    {Client01, R} = thrift_client:call(Client0, run, [RuleStruct]),

    % Display results
    io:format("Result ~p~n", [R]),

    % Close connection
    thrift_client:close(Client01).

The server-side code is also simple. An example in Erlang is shown below.

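handle_function(run, {Rule}) when is_record(Rule, rule) ->

    % extract data components
    Filename = Rule#rule.filename,
    ContextList = Rule#rule.contextList,
    Rules = Rule#rule.rules,

    % do some processing here; Result must be bound to the string that is
    % returned to the client (the value below is only a placeholder)
    Result = <<"done">>,

    % send reply back to client
    {reply, Result}.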

Note that the code shown above is the only custom code that I needed to write. Some additional boilerplate code is required to set up and tear down the connection, but it is provided in the Apache Thrift distribution.
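Because all three fields of Rule are optional, the processing step has to decide where the rules actually come from. The sketch below shows one way that decision might look. It is written in Python rather than Erlang only to keep it self-contained. The Rule class is a hand-written stand-in for the generated type, and both resolve_rule_text and its precedence (inline rules first, then the file) are assumptions for illustration, not documented WEE behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    """Plain-Python stand-in for the Thrift Rule struct (not generated code)."""
    filename: Optional[str] = None
    contextList: Optional[str] = None
    rules: Optional[str] = None


def resolve_rule_text(rule):
    """Return the rule text to execute.

    Assumed precedence: inline rules win, otherwise the file is read.
    """
    if rule.rules:
        return rule.rules
    if rule.filename:
        with open(rule.filename, encoding="utf-8") as handle:
            return handle.read()
    raise ValueError("Rule needs either 'rules' or 'filename'")
```

A server written this way can reject an empty Rule early instead of passing it on to the engine.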

Monday, June 11, 2012

Building a Recommendation System

I have been interested in recommendation systems for a number of years and have followed the research in this area, and I finally had an opportunity to experiment with the technology. I was spurred on by a chance conversation with a relative who had purchased an eReader and visited some of the public domain book sites I had told them about, but was a bit overwhelmed by the number of books available and the difficulty of browsing the lists.

So I decided to build a mobile web application that would serve as a front-end to these public domain sites and help guide people to: new books, top books that others enjoyed reading, and recommendations for other books based on your reading history and ratings. You can try the application at http://www.impactsoftwarelabs.com/book/new.php and please provide comments about ways that it could be improved.

It turns out that a book application is a great subject area for recommendation systems because:

The quantity of items is large enough (about 40,000 books) to be non-trivial, but not so large as to be difficult to work with.
Each book has a variety of primary attributes that can be used by the recommendation algorithm, including title, book category, subject, author, date written, and language.
Download information is available over several years, which can help rate a book's popularity.
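To illustrate the last point, several years of download counts can be folded into a single popularity number. The following is a minimal sketch under my own assumptions and is not the scoring used by the application: it weights recent years more heavily with an exponential decay, and the function name, the half_life_years parameter, and the sample figures in the usage note are all hypothetical.

```python
def popularity_score(downloads_by_year, current_year, half_life_years=2.0):
    """Sum yearly download counts, halving a year's weight every half_life_years.

    downloads_by_year maps a year (int) to a download count for that year.
    """
    score = 0.0
    for year, count in downloads_by_year.items():
        age = max(0, current_year - year)
        score += count * 0.5 ** (age / half_life_years)
    return score
```

With a two-year half-life, 100 downloads in 2012 and 100 in 2010 give a score of 150.0 when evaluated for 2012, because the older year counts for half.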

I built the system using the following key pieces of technology:

Information about books, categories, downloads, and recommendations is stored in a MySQL database.
The user interface is built using HTML5 and the jQuery Mobile JavaScript library.
Initially I built the recommendation components using custom code, but later replaced it with the MyMediaLite Recommender System Library. MyMediaLite is an open-source system developed at the University of Hildesheim in Germany. It contains a number of different recommendation methods that can be easily applied to many different problem domains.
XML parsing of book catalog and download information performed by custom-written Java parsers.
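To give a feel for what a simple attribute-based recommender does with data like this, here is a short sketch that ranks books by how many attributes they share, using Jaccard similarity. It is neither MyMediaLite's API nor the code behind the application. The catalog, the attribute tags, and the function names are all made up for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: shared items / all items (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target_id, books, top_n=3):
    """Rank the other books by attribute overlap with the target book."""
    target = books[target_id]
    scored = [
        (jaccard(target, attrs), book_id)
        for book_id, attrs in books.items()
        if book_id != target_id
    ]
    # Highest similarity first; ties broken by book id for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [book_id for score, book_id in scored[:top_n] if score > 0]


# Made-up catalog: each book is a set of "field:value" attribute tags.
BOOKS = {
    "A": {"author:a1", "category:mystery", "language:en"},
    "B": {"author:a1", "category:mystery", "language:en"},
    "C": {"author:a2", "category:adventure", "language:fr"},
    "D": {"author:a3", "category:mystery", "language:en"},
}
```

For this catalog, recommend("A", BOOKS) puts B first (identical attributes), then D (same category and language), and leaves out C, which shares nothing with A.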

I think the finished product turned out much nicer than I had hoped, and the recommendations generated by the engine seem to be very relevant. Please try it and let me know what you think.