CPS: User Manual

Tool used

All these diagrams were created using Umbrello, a free and open-source UML modelling tool that can both generate and read Java code. CPS was designed with this powerful tool. Umbrello can generate code in several languages (Java among them) and import source files written in those languages. Furthermore, the generated code contains Javadoc-formatted comments holding all the recorded descriptions of variables, classes, methods, etc.

Before going any further

This page describes how splitsort is implemented in CPS. If you have not read the principle page of this documentation yet, we strongly recommend doing so now. If you plan to use CPS, here is a checklist of everything you need before starting.

Class Diagram

Legend:

red: sorter
blue: myShell
green: myDataBases
yellow: crowdUser

Global sequence diagram of CPS

Figure: a sequence diagram describing the general steps of a CPS execution.

This sequence diagram shows how the user and CPS interact so that the media end up placed along the specified axes.

First of all, the user calls CPS with the --new parameter using this command:

./cps.sh --new 'insert here the DB password'

CPS first checks several things (the connection to the MySQL database, the content of the config.xml file, etc.) before going any further, as explained below (initialize() method).
This command triggers a call to the MainClass.generateJob() method, which performs the first half of a splitsort iteration. The details of this "first half" are explained in another sequence diagram below. In doing so, CPS uploads to CrowdFlower the data to be used in HITs.
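The option handling inside cps.sh is not shown in this manual. As a hedged illustration only, assuming the script forwards its two arguments unchanged to a Java entry point, the choice between the two commands could be made like this (the class `CpsLauncherSketch` and its `Mode` enum are hypothetical, not part of CPS):

```java
/**
 * Hypothetical sketch: how the two cps.sh commands could be dispatched in Java.
 * The class and enum names are illustrative, not taken from CPS.
 */
class CpsLauncherSketch {

    /** The two modes described in this manual. */
    enum Mode { NEW_JOB, ITERATION }

    /**
     * Accepts "--new <password>" or "--iteration <password>".
     * NEW_JOB stands for MainClass.generateJob(); ITERATION for a full splitsort iteration.
     */
    static Mode parseMode(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException("usage: cps.sh (--new | --iteration) <DB password>");
        }
        switch (args[0]) {
            case "--new":
                return Mode.NEW_JOB;
            case "--iteration":
                return Mode.ITERATION;
            default:
                throw new IllegalArgumentException("unknown option: " + args[0]);
        }
    }
}
```

Rejecting anything other than exactly two arguments mirrors the two command lines given in this manual, where the database password is always the second argument.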
A loop driven by the user then starts. It stops once the media are placed along every axis (or when the user gets bored and stops launching new iterations).
Inside the loop:
1. When all the HITs of a job have been submitted, CrowdFlower sends a webhook to the antechamber, whose address you must register beforehand through CrowdFlower's web interface. A PHP script receiving this webhook transforms its content into a CSV file and sends an email to an address that you also have to set beforehand.
2. When receiving this email, the user logs on to the machine hosting CPS and runs:
  
  ./cps.sh --iteration 'insert here the DB password'
  
  This triggers a splitsort iteration.
3. First, the result-retrieval part of the iteration downloads the HIT results from the antechamber.
4. These results are then read and interpreted by CPS to fill the database with comparison results.
5. These results are used in the next part of the iteration to create new HITs to upload.
6. The data for the new HITs is uploaded to CrowdFlower where they will be processed by workers.
7. The temporary results along each axis are saved, as well as the identifier of the CrowdFlower job to which the data was uploaded.
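The exact CSV layout produced by the PHP script, and the way CPS interprets it in step 4, are not documented on this page. The following sketch assumes, purely for illustration, a header row followed by rows of the form `mediaA,mediaB,greater`, and aggregates the workers' answers by majority vote; both the layout and the vote are assumptions, and `ResultsReaderSketch` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: aggregating downloaded HIT results into one answer per pair.
 * Assumed CSV layout: a header row, then "mediaA,mediaB,greater" rows.
 */
class ResultsReaderSketch {

    /** Order-independent key, so (a, b) and (b, a) count as the same comparison. */
    static String pairKey(String a, String b) {
        return a.compareTo(b) < 0 ? a + "|" + b : b + "|" + a;
    }

    /** Maps each pair key to the media that most workers judged greater. */
    static Map<String, String> majorityWinners(List<String> csvLines) {
        // votes.get(pair).get(candidate) = number of workers who chose that candidate
        Map<String, Map<String, Integer>> votes = new HashMap<>();
        for (int i = 1; i < csvLines.size(); i++) { // line 0 is the header
            String[] cols = csvLines.get(i).split(",");
            String key = pairKey(cols[0].trim(), cols[1].trim());
            votes.computeIfAbsent(key, k -> new HashMap<>())
                 .merge(cols[2].trim(), 1, Integer::sum);
        }
        Map<String, String> winners = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> pair : votes.entrySet()) {
            String best = null;
            int bestCount = 0;
            for (Map.Entry<String, Integer> vote : pair.getValue().entrySet()) {
                // Ties keep whichever candidate the HashMap happens to iterate first.
                if (vote.getValue() > bestCount) {
                    best = vote.getKey();
                    bestCount = vote.getValue();
                }
            }
            winners.put(pair.getKey(), best);
        }
        return winners;
    }
}
```

The naive `split(",")` assumes media names contain no commas; a real reader would need proper CSV quoting rules.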

Keep in mind that this diagram is only a simplified overview: the real sequence is MUCH more complex. More details are given below.

Initialisation

This sequence diagram describes how the classes used to perform the splitsort are initialised by the... initialize() method. The names displayed are the actual method names, so feel free to browse the part of this website dedicated to the documentation of each class for more details on how each method works.

getConfig() reads the content of the config.xml file and returns, among other things, the user of the database, the path to the media.xml file, etc. For more details, go here.
getAxes() reads the same file and returns the axes along which the sort will be performed.
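The structure of config.xml is described on another page. Assuming, for illustration only, that each axis is stored in an `<axis>` element and that single settings such as the database user sit in their own elements (for instance a hypothetical `<dbUser>`), getAxes() and getConfig() could be approximated with the JDK's DOM parser. `ConfigReaderSketch` and the element names are assumptions, not CPS's actual code or schema.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical sketch of getAxes()/getConfig(); the element names are assumptions. */
class ConfigReaderSketch {

    /** Parses an XML string into a DOM document, turning checked errors into unchecked ones. */
    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot read config", e);
        }
    }

    /** Returns the text of every <axis> element, in document order. */
    static List<String> readAxes(String xml) {
        NodeList nodes = parse(xml).getElementsByTagName("axis");
        List<String> axes = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            axes.add(nodes.item(i).getTextContent().trim());
        }
        return axes;
    }

    /** Returns the text of the first element with the given tag, or null if there is none. */
    static String readValue(String xml, String tag) {
        NodeList nodes = parse(xml).getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }
}
```

The sketch parses a string to stay self-contained; reading the actual file would use `DocumentBuilder.parse(new File("config.xml"))` instead.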
A MediaBase instance is created so that information on the media to sort can easily be retrieved later.
An Interpreter instance is also built. Its role is to write the results obtained to the correct file.
The method then loops over the axes and performs the following actions for each one:
1. Call getCurrentState() to find out the current state of the media along this axis.
2. Create a SplitSort instance from the data returned by the previous method.
3. Check whether the sort along this axis is finished.
The program then branches:
- If notFinished() is true for at least one axis:
  1. A MySQLbase instance is created so that the program can communicate with the database.
  2. A Comparator instance is built to manage the sort along every axis, gather the comparisons and send them to...
  3. A CrowdManager instance, which is created right after.
- Otherwise, the sort is finished.
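The control flow of initialize() described above can be summarised in a few lines. In this hedged sketch, `SplitSort` is reduced to a hypothetical one-method interface, and the getCurrentState()/SplitSort construction pair is passed in as a function; `InitializeSketch` and `InitResult` are illustrative names, and the real classes have richer signatures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of the control flow of initialize(). */
class InitializeSketch {

    /** Stand-in for CPS's SplitSort, reduced to the one method used here. */
    interface SplitSort {
        boolean notFinished();
    }

    /** One sorter per axis, plus whether MySQLbase, Comparator and CrowdManager are needed. */
    static final class InitResult {
        final List<SplitSort> sorters;
        final boolean needsCrowd;

        InitResult(List<SplitSort> sorters, boolean needsCrowd) {
            this.sorters = sorters;
            this.needsCrowd = needsCrowd;
        }
    }

    /** fromCurrentState stands for getCurrentState(axis) followed by new SplitSort(...). */
    static InitResult initialize(List<String> axes, Function<String, SplitSort> fromCurrentState) {
        List<SplitSort> sorters = new ArrayList<>();
        boolean needsCrowd = false;
        for (String axis : axes) {
            SplitSort sorter = fromCurrentState.apply(axis);
            sorters.add(sorter);
            // A single unfinished axis is enough to require the crowd-related objects.
            needsCrowd |= sorter.notFinished();
        }
        return new InitResult(sorters, needsCrowd);
    }
}
```

When `needsCrowd` is true, the caller would go on to create the MySQLbase, Comparator and CrowdManager instances; otherwise the sort is over.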

The `generatingJob()` method

This diagram describes the generatingJob() method.

First of all, all instances are initialised with the initialize() method.
Using its sortIteration() method, the Comparator enters a loop:
1. Along each axis, the corresponding SplitSort instance contained in the sorters attribute calls its iteration() method.
2. In doing so, each SplitSort uses addComparisons() to generate the comparisons it needs.
3. The Comparator gathers the comparisons using SplitSort's getGreater() and getSmaller().
The Comparator then checks, through the MySQLbase instance, which comparisons have already been performed.
Comparisons that have not been performed yet are submitted to a CrowdManager instance using demandNewComparisons().
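One way to picture the gathering and filtering steps above: assuming getGreater() and getSmaller() return parallel lists (the i-th "greater" candidate is to be compared with the i-th "smaller" one) and that a set of keys stands in for the MySQLbase lookup, the Comparator's work reduces to collecting per-axis pairs and discarding those already answered. `ComparatorSketch` and its key format are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of gathering comparisons and dropping those already answered. */
class ComparatorSketch {

    /** A comparison is identified by its axis and its unordered pair of media. */
    static String comparisonKey(String axis, String a, String b) {
        return axis + ":" + (a.compareTo(b) < 0 ? a + "|" + b : b + "|" + a);
    }

    /**
     * greaterByAxis/smallerByAxis play the role of getGreater()/getSmaller(), assumed
     * to be parallel lists; alreadyDone plays the role of the MySQLbase lookup.
     */
    static List<String> unknownComparisons(Map<String, List<String>> greaterByAxis,
                                           Map<String, List<String>> smallerByAxis,
                                           Set<String> alreadyDone) {
        Set<String> pending = new LinkedHashSet<>(); // drops duplicates, keeps insertion order
        for (Map.Entry<String, List<String>> entry : greaterByAxis.entrySet()) {
            String axis = entry.getKey();
            List<String> greater = entry.getValue();
            List<String> smaller = smallerByAxis.get(axis);
            for (int i = 0; i < greater.size(); i++) {
                String key = comparisonKey(axis, greater.get(i), smaller.get(i));
                if (!alreadyDone.contains(key)) {
                    pending.add(key);
                }
            }
        }
        return new ArrayList<>(pending);
    }
}
```

The axis is part of the key because comparing the same two media along two different axes is two distinct questions for the workers.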
The CrowdManager then creates a new job on CrowdFlower by copying the job whose ID is stored in the config.xml file.
The data CrowdFlower needs to submit the job is then generated: a CSV file containing, among other things, the names of the media to compare and their identifiers. This file is uploaded with an HTTP PUT request.
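A minimal sketch of the CSV generation and the HTTP PUT, using the JDK's java.net.http package (Java 11+). The column names and the upload URL are placeholders, since the actual CrowdFlower endpoint and CSV header used by CPS are not given on this page; `UploadSketch` is a hypothetical name, and the request is only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;

/** Hypothetical sketch of the CSV upload; column names and URL are placeholders. */
class UploadSketch {

    /** Builds the CSV body: a header row, then one row per comparison (no quoting). */
    static String toCsv(List<String[]> rows) {
        StringBuilder sb = new StringBuilder("media_a,id_a,media_b,id_b\n");
        for (String[] row : rows) {
            sb.append(String.join(",", row)).append('\n');
        }
        return sb.toString();
    }

    /** Prepares (but does not send) an HTTP PUT carrying the CSV as its body. */
    static HttpRequest buildPut(String uploadUrl, String csv) {
        return HttpRequest.newBuilder(URI.create(uploadUrl))
                .header("Content-Type", "text/csv")
                .PUT(HttpRequest.BodyPublishers.ofString(csv))
                .build();
    }
}
```

Sending it would then be a matter of `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, plus whatever authentication the real endpoint requires.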

Once all this has been done, you have to wait for the email signalling that the webhook has been received.

The `Useresults()` method

In this part, we assume that results have been downloaded and read.

The Comparator instance retrieves the comparison results from the MySQLbase instance.
The comparison results are passed to each SplitSort instance through its endIteration() method, which also triggers the splitting of every array into three parts using the insertion() method.
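The three parts are not spelled out on this page. Assuming, as in quicksort, that they are the media judged smaller than a pivot, the pivot itself and the media judged greater, a split driven by the comparison results could look like the following. `ThreeWaySplitSketch.split` is hypothetical and is not CPS's insertion() method; see the principle page for the actual splitsort rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a pivot-based three-way split (quicksort-style assumption). */
class ThreeWaySplitSketch {

    /**
     * greaterThanPivot maps each non-pivot media to the crowd's answer:
     * true if it was judged greater than the pivot, false if smaller.
     */
    static List<List<String>> split(List<String> array, String pivot,
                                    Map<String, Boolean> greaterThanPivot) {
        List<String> smaller = new ArrayList<>();
        List<String> greater = new ArrayList<>();
        for (String media : array) {
            if (media.equals(pivot)) {
                continue; // the pivot forms the middle part on its own
            }
            Boolean isGreater = greaterThanPivot.get(media);
            if (isGreater == null) {
                throw new IllegalStateException("no comparison result for " + media);
            }
            if (isGreater) {
                greater.add(media);
            } else {
                smaller.add(media);
            }
        }
        List<List<String>> parts = new ArrayList<>();
        parts.add(smaller);
        parts.add(List.of(pivot));
        parts.add(greater);
        return parts;
    }
}
```

Failing loudly on a missing result is a deliberate choice in this sketch: a split should only happen once every comparison with the pivot has been answered.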
A dump of the database is then generated so that its content can easily be restored if anything goes horribly wrong.
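This manual does not say how the dump is produced. If it were done with MySQL's standard `mysqldump` client (an assumption), the command could be assembled and launched from Java as follows; `DumpSketch` is a hypothetical name, and the user, database and output file would come from config.xml.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical sketch: dumping the database with mysqldump (assumed tool). */
class DumpSketch {

    /**
     * Builds the mysqldump command line. Note: a password passed on the command line
     * may be visible to other local users in the process list; a MySQL option file avoids this.
     */
    static List<String> dumpCommand(String user, String password, String database, String outFile) {
        return List.of("mysqldump",
                "--user=" + user,
                "--password=" + password,
                "--result-file=" + outFile,
                database);
    }

    /** Starts the dump; the SQL goes to --result-file, errors go to this process's stderr. */
    static Process startDump(List<String> command) throws IOException {
        return new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
    }
}
```

A caller would then use `process.waitFor()` and check for a zero exit code before trusting the dump file.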
Finally, XML files containing the intermediate results along each axis are generated.
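The format of these intermediate XML files is not described here. As an illustration only, with hypothetical `<axis>`, `<group>` and `<media>` elements, the ordered parts of one axis could be serialised with the JDK's DOM and Transformer APIs (`ResultsXmlSketch` is also a hypothetical name):

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical sketch of writing intermediate results; element names are assumptions. */
class ResultsXmlSketch {

    /** Serialises the ordered groups of media for one axis into an XML string. */
    static String toXml(String axis, List<List<String>> parts) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("axis");
            root.setAttribute("name", axis);
            doc.appendChild(root);
            for (List<String> part : parts) {
                Element group = doc.createElement("group"); // one element per part, in order
                for (String media : part) {
                    Element m = doc.createElement("media");
                    m.setTextContent(media);
                    group.appendChild(m);
                }
                root.appendChild(group);
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("cannot build results XML", e);
        }
    }
}
```

The returned string can then be written to disk, for example with `Files.writeString(Path.of(axis + ".xml"), xml)`.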

UML modelling