0.8
Sorting media using crowdsourcing.   

crowdUser.Refiner Class Reference

Parses CSV-formatted raw results and refines them with Get Another Label, a program designed by P. Ipeirotis (a computer science researcher specializing in crowdsourcing).


Public Member Functions

 Refiner (List< String > fields)
 Creates a Refiner instance and sets the fields considered in the formatted results.
void csv2List (String csvFile)
 Puts the data contained in a CSV file in the data attribute.
void getanotherlabel ()
 Calls Get Another Label, by Panos Ipeirotis and his students, to process the results of the HIT.
List< String[]> getData ()
 Returns the data attribute.

Private Member Functions

String signature (String axis, String idMedia1, String idMedia2)
 Returns the "signature" of a comparison, a String identifying it without ambiguity.
void getFields (List< String > csvFields)
 Initializes columnsCSV and column* using the content of the 'csvFields' parameter.
String generateCorrectFile ()
 Creates the "correct file" used by Get Another Label (it is returned as a String).
String generateInputFile ()
 Creates the "input file" used by Get Another Label (it is returned as a String) and records the score of each comparison in the compScores Map.

Private Attributes

String[] columnsCSV
 The labels of the columns in the CSV file.
Integer columnAxis
 The column in which the axis considered for the comparison is stored.
Integer columnMedia1
 The column in which the identifier of the first media is stored.
Integer columnMedia2
 The column in which the identifier of the second media is stored.
Integer columnWorkerId
 The column in which the identifier of the worker who performed the comparison is stored.
Integer columnAnswer
 The column in which the result of the question asked to the worker is stored.
Map< String, Integer > compScores
 A Map<String,Integer> containing the score of each comparison: for each HIT, if the worker answered that the first media ("Media 1") was greater, score += 1; otherwise score -= 1.
List< String[]> data
 A List of String arrays containing the data currently studied: raw data or refined data, depending on the moment.

Detailed Description

Parses CSV-formatted raw results and refines them with Get Another Label, a program designed by P. Ipeirotis (a computer science researcher specializing in crowdsourcing).

In doing so, the noise is reduced and the comparisons become more accurate, which ensures a better sort.

More generally, this class parses results so that they can easily be used by a CrowdManager instance, for example by transforming a CSV file into a List of String arrays.

Author:
Leo Perrin (perrin.leo@gmail.com)

Definition at line 25 of file Refiner.java.


Constructor & Destructor Documentation

crowdUser.Refiner.Refiner ( List< String >  fields)

Creates a Refiner instance and sets the fields considered in the formatted results.

It uses the fields contained in the 'fields' parameter to initialize the columnsCSV attribute as well as every attribute holding a column number (i.e., column[Axis | Media[1|2] | Answer | WorkerId]).

Parameters:
fields: A list containing the fields of the CSV file that holds the HIT results.

Definition at line 78 of file Refiner.java.

  {
        this.data = new ArrayList <String[]> ();
        getFields(fields);
  }

Member Function Documentation

void crowdUser.Refiner.csv2List ( String  csvFile)

Puts the data contained in a csv file in the data attribute.

The data attribute is a List of String arrays. Each array holds the content of one row of the CSV file, and each of its elements holds the content of one column of that row.

Parameters:
csvFile: The path to the CSV file within the '../data/HITresults/' directory, minus the extension (for instance, results$JOB_ID instead of results$JOB_ID.csv).

Definition at line 153 of file Refiner.java.

  {

      this.data.clear();
        try
        {
              // connecting to the .csv "database"
              Class.forName("org.relique.jdbc.csv.CsvDriver");
            Connection conCSV = DriverManager.getConnection("jdbc:relique:csv:" + "../data/HITresults/");
            Statement stmt = conCSV.createStatement();
            // creating and sending the query
            String queryCSV = "SELECT * FROM " + csvFile;
            ResultSet rs = stmt.executeQuery(queryCSV);
            // using the results
            while (rs.next())
            {
              List<String> request = new ArrayList<String>();
              for (int i=1; i<this.columnsCSV.length+1; i++)
                    request.add(rs.getString(i));
              this.data.add(request.toArray(new String[0]));
            }
              // closing everything
              rs.close();
              stmt.close();
              conCSV.close();
        }
        catch(Exception e)  { e.printStackTrace(); }
  }
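
To illustrate the structure that csv2List fills in, here is a minimal sketch that turns CSV text into a List of String arrays without the CsvJdbc driver. The names CsvRowsSketch and parseRows are hypothetical, and the naive comma split assumes that no field contains a comma or a line break. The sketch also assumes that the first line holds the column labels and is not returned as data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for csv2List; this is not the Refiner implementation.
final class CsvRowsSketch {

    // Splits CSV text into rows and returns one String[] per data row.
    // Assumption: fields never contain commas or line breaks.
    static List<String[]> parseRows(String csvText) {
        List<String[]> rows = new ArrayList<>();
        String[] lines = csvText.split("\\R");
        for (int i = 1; i < lines.length; i++) {   // line 0 holds the column labels
            if (lines[i].isEmpty()) {
                continue;                          // ignore blank lines
            }
            rows.add(lines[i].split(",", -1));     // -1 keeps trailing empty fields
        }
        return rows;
    }

    public static void main(String[] args) {
        String csv = "axis,idMedia1,idMedia2,_worker_id\nsize,a,b,w1\nsize,a,c,w2\n";
        for (String[] row : parseRows(csv)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Each array index is zero-based, whereas the JDBC loop in csv2List reads columns 1 to columnsCSV.length, so column i of the ResultSet ends up at position i-1 of the array.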

String crowdUser.Refiner.generateCorrectFile ( ) [private]

Creates the "correct file" used by Get Another Label (it is returned as a String).

Get Another Label needs a "correct file" as part of its input. Each line written here holds the worker identifier, the signature of the comparison and the majority label ("1" or "2"), separated by tabs.

Definition at line 265 of file Refiner.java.

  {
        String correctFile = "" ;
        try
        {
              String line;
              for (int i=0; i<this.data.size(); i++)
              {
                    String[] dataLine = this.data.get(i);
                    String sig = signature(dataLine[this.columnAxis], dataLine[this.columnMedia1], dataLine[this.columnMedia2]) ;
                    line = dataLine[this.columnWorkerId] + "\t" + sig + "\t" ;
                    if (  this.compScores.get(sig) > 0)
                          line += "1";
                    else
                          line += "2";
                    correctFile += line+"\n";  
              }
        }
        catch (Exception e) { e.printStackTrace(); }
        return correctFile ;
  }
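
The line format built above can be sketched in isolation as follows. This is only an illustration under stated assumptions: CorrectFileSketch, majorityLabel and build are hypothetical names, and the scores are passed in as a plain Map rather than read from the compScores attribute.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the "correct file" layout; not the Refiner code.
final class CorrectFileSketch {

    // "1" when the net score is strictly positive, "2" otherwise.
    // As in the listing above, a tie (score 0) therefore yields "2".
    static String majorityLabel(int score) {
        return score > 0 ? "1" : "2";
    }

    // Each entry of workerAndSignature is {workerId, signature}.
    // Every signature must be present in scores, otherwise get() returns null.
    static String build(String[][] workerAndSignature, Map<String, Integer> scores) {
        StringBuilder out = new StringBuilder();
        for (String[] pair : workerAndSignature) {
            out.append(pair[0]).append('\t')
               .append(pair[1]).append('\t')
               .append(majorityLabel(scores.get(pair[1])))
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> scores = new HashMap<>();
        scores.put("size__a__b", 2);
        String[][] pairs = { { "w1", "size__a__b" }, { "w2", "size__a__b" } };
        System.out.print(build(pairs, scores));
    }
}
```

A StringBuilder is used here instead of repeated String concatenation, which matters only when the number of HIT results is large.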

String crowdUser.Refiner.generateInputFile ( ) [private]

Creates the "input file" used by Get Another Label (it is returned as a String) and records the score of each comparison in the compScores Map.

Get Another Label needs an "input file" as part of its input.

This String, formatted like a file (it contains "\n" at the end of each "line"), contains the raw results of the HITs. Each line is appended at the end of the loop body inside the "try" block. In this String, results are recorded the way they are in the CSV file, i.e., "1" means that the first of the two media is the greater one and "2" means the contrary.

The score of each comparison is also calculated in this function, so that the raw results are read only once. The score is an integer that starts at 0; 1 is added each time a worker answers that media1 is greater than media2, and 1 is subtracted each time a worker answers the contrary. Scores are added to the compScores Map attribute as the results are read.

Definition at line 304 of file Refiner.java.

  {
        String inputFile = "";
        this.compScores = new Hashtable <String,Integer> () ;
        try{
              String line;
              for (int i=0; i<this.data.size(); i++)
              {
                    String[] dataLine = this.data.get(i);
                    String sig = signature(dataLine[this.columnAxis], dataLine[this.columnMedia1], dataLine[this.columnMedia2]) ;
                  
                    // storing the results for future retrieval
                    if (dataLine[this.columnAnswer].equals("Media 1"))
                          if (this.compScores.containsKey(sig))
                                this.compScores.put(sig, this.compScores.get(sig) + 1) ;
                          else
                                this.compScores.put(sig, 1);
                    else
                          if (this.compScores.containsKey(sig))
                                this.compScores.put(sig, this.compScores.get(sig) - 1) ;
                          else
                                this.compScores.put(sig, -1);
                    
                    // writing the "input file"
                    line = dataLine[this.columnWorkerId] + "\t" + sig + "\t" ;
                    if (dataLine[this.columnAnswer].equals("Media 1"))
                          line += "1";
                    else
                          line += "2";
                    inputFile += line + "\n";
              }
        }
        catch (Exception e) { e.printStackTrace(); }
        return inputFile ;
  }
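
The score bookkeeping performed in this method can be sketched on its own as follows. This is a minimal sketch, not the Refiner code: ScoreSketch and tally are hypothetical names, and Map.merge stands in for the containsKey/put branches of the listing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of how compScores is filled; not the Refiner code.
final class ScoreSketch {

    // Each entry of answers is {signature, answer}.
    // "Media 1" counts +1 for the signature; any other answer counts -1.
    static Map<String, Integer> tally(String[][] answers) {
        Map<String, Integer> scores = new HashMap<>();
        for (String[] answer : answers) {
            int delta = "Media 1".equals(answer[1]) ? 1 : -1;
            scores.merge(answer[0], delta, Integer::sum); // starts at delta, then accumulates
        }
        return scores;
    }

    public static void main(String[] args) {
        String[][] answers = {
            { "size__a__b", "Media 1" },
            { "size__a__b", "Media 1" },
            { "size__a__b", "Media 2" },
        };
        System.out.println(tally(answers));
    }
}
```

With two votes for "Media 1" and one against, the score of size__a__b is 1, so the comparison is later labelled "1".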

void crowdUser.Refiner.getanotherlabel ( )

Calls Get Another Label, by Panos Ipeirotis and his students, to process the results of the HIT.

This method has three steps:

  1. All the "files" (here, Strings formatted like files) are generated using the appropriate methods, except the costFile, which is hard-coded.
  2. An ipeirotis.gal.scripts.DawidSkene instance is then created if there are enough HIT results (more than 10) to improve their quality; otherwise the label of each comparison is taken directly from the sign of its score in compScores, which is also the label written in the correctFile String. Indeed, too little data causes Get Another Label to crash violently.
  3. The results are parsed and put in the data attribute.

For more details on how get another label works, feel free to read its partially commented source files or to browse its absence of documentation.

Definition at line 199 of file Refiner.java.

  {
        // step 1: creation of the "files"
        String inputFile = generateInputFile() ;
        String correctFile = generateCorrectFile() ;
        String costFile = "1\t1\t0\n2\t2\t0\n1\t2\t1\n2\t1\t1" ;
        Integer iterations = 10 ;
        HashMap<String,String> posterior_voting = new HashMap <String,String>() ;
        List<String[]> newData = new ArrayList <String[]> () ;
        
        // step 2: Get Another Label is called if there are enough HIT results (here, more than 10).
        if (this.data.size() > 10)
              try
              {   
                    // the content of this block is roughly copied from the main method of Get Another Label.
                    String[] lines_input = inputFile.split("\n");
                    Vector<Labeling> labelings = DawidSkene.loadLabels(lines_input);
                    String[] lines_correct = correctFile.split("\n");
                    Vector<Labeling> correct = DawidSkene.loadLabels(lines_correct);
                    String[] lines_cost = costFile.split("\n") ; 
                    Vector<Labeling> costs = DawidSkene.loadLabels(lines_cost);
                    DawidSkene ds = new DawidSkene(labelings, correct, costs);
                    ds.estimate(iterations);
                    ds.updateAnnotatorCosts();
                    posterior_voting = ds.getMajorityVote();
              }
              catch (Exception e) { e.printStackTrace(); }

        // step 2 (again): if there are not enough HIT results, the majority vote recorded in compScores is used directly.
        else
        {
              for (int i=0; i<this.data.size(); i++)
              {
                    String[] dataLine = this.data.get(i);
                    String sig = signature(dataLine[this.columnAxis], dataLine[this.columnMedia1], dataLine[this.columnMedia2]) ;
                    if (  this.compScores.get(sig) > 0)
                          posterior_voting.put(sig, "1");
                    else
                          posterior_voting.put(sig, "2");
              }     
        }
        
        // Step 3: The refined results are put in the "data" attribute
        for (int i=0; i<this.data.size(); i++)
        {
              List<String> newEntry = new ArrayList <String> () ;
              String sigComparisonStudied = signature( this.data.get(i)[this.columnAxis] , 
                              this.data.get(i)[this.columnMedia1] , this.data.get(i)[this.columnMedia2]);
              newEntry.add(this.data.get(i)[this.columnAxis]);
              newEntry.add(this.data.get(i)[this.columnMedia1]);
              newEntry.add(this.data.get(i)[this.columnMedia2]);
              newEntry.add ( posterior_voting.get(sigComparisonStudied) ) ;
              newData.add(newEntry.toArray(new String[0]));
        }
        this.data = newData ;  
  }
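
The threshold test of step 2 and the row rebuilding of step 3 can be sketched without the Dawid-Skene library. RefineSketch, enoughData and refine are hypothetical names, and the sketch assumes that the axis and the two media identifiers sit in columns 0, 1 and 2, whereas Refiner looks them up through columnAxis, columnMedia1 and columnMedia2.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of steps 2 and 3; not the Refiner implementation.
final class RefineSketch {

    // Mirrors the test in the listing above: the estimator runs only above 10 rows.
    static boolean enoughData(int rowCount) {
        return rowCount > 10;
    }

    // Step 3: each input row becomes {axis, idMedia1, idMedia2, label},
    // the label being looked up under the signature of the comparison.
    // Assumption: columns 0, 1 and 2 hold axis, idMedia1 and idMedia2.
    static List<String[]> refine(List<String[]> rows, Map<String, String> labels) {
        List<String[]> refined = new ArrayList<>();
        for (String[] row : rows) {
            String sig = row[0] + "__" + row[1] + "__" + row[2];
            refined.add(new String[] { row[0], row[1], row[2], labels.get(sig) });
        }
        return refined;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] { "size", "a", "b", "w1", "Media 1" });
        Map<String, String> labels = new HashMap<>();
        labels.put("size__a__b", "1");
        System.out.println(String.join(",", refine(rows, labels).get(0)));
    }
}
```

As in the listing, the output keeps one row per input row, so a comparison answered by several workers appears several times with the same refined label.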

List<String[]> crowdUser.Refiner.getData ( )

Returns the data attribute.

Returns:
this.data

Definition at line 345 of file Refiner.java.

  {
        return this.data ;
  }

void crowdUser.Refiner.getFields ( List< String >  csvFields) [private]

Initializes columnsCSV and column* using the content of the 'csvFields' parameter.

Each String contained in csvFields is checked against the compulsory field names (i.e., idMedia1, idMedia2, _worker_id, axis, and a last, much longer one corresponding to the question asked, which is recognized by its length of more than 25 characters). When it matches, its column number is saved in the corresponding attribute.

Although MiscData[1/2] are also compulsory fields, their column numbers are not saved here, to avoid using space unnecessarily.

Parameters:
csvFields: A list containing the fields of the CSV file that holds the HIT results.

Definition at line 122 of file Refiner.java.

  {
       this.columnsCSV = csvFields.toArray(new String[0]) ;
       Integer rank = 0;
       for (String field : csvFields)
            {
              if (field.equals("idMedia1") )
                    this.columnMedia1 = rank ;
              else if (field.equals("idMedia2") )
                    this.columnMedia2 = rank ;
              else if (field.equals("_worker_id") )
                    this.columnWorkerId = rank ;
              else if (field.equals("axis") )
                    this.columnAxis = rank ;
              else if (field.length() > 25 )
                    this.columnAnswer = rank ;
              rank ++ ;
            }
  }
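
The column lookup can be illustrated with a small self-contained sketch. ColumnIndexSketch and indexColumns are hypothetical names, the result is returned as a Map instead of being stored in separate attributes, and the long question label used in main is invented for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the field lookup; not the Refiner code.
final class ColumnIndexSketch {

    // Maps each recognized field to its zero-based rank in the header.
    // The answer column is detected only by its length (more than 25 characters),
    // so it is stored under the made-up key "answer".
    static Map<String, Integer> indexColumns(List<String> csvFields) {
        Map<String, Integer> index = new HashMap<>();
        for (int rank = 0; rank < csvFields.size(); rank++) {
            String field = csvFields.get(rank);
            switch (field) {
                case "idMedia1":
                case "idMedia2":
                case "_worker_id":
                case "axis":
                    index.put(field, rank);
                    break;
                default:
                    if (field.length() > 25) {
                        index.put("answer", rank);
                    }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> header = Arrays.asList("axis", "idMedia1", "idMedia2",
                "MiscData1", "_worker_id", "which_of_these_two_media_is_greater");
        System.out.println(indexColumns(header));
    }
}
```

Because the answer column is identified by length alone, any other header longer than 25 characters would overwrite it; the last such column wins.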

String crowdUser.Refiner.signature ( String  axis, String  idMedia1, String  idMedia2 ) [private]

Returns the "signature" of a comparison, a String identifying it without ambiguity.

The signature of a comparison is defined as follows: if a comparison has been performed between idMedia1 and idMedia2 along the axis axis, its signature is axis__idMedia1__idMedia2. It is the key under which the result of the comparison is stored in the compScores attribute.

Parameters:
axis: the axis along which the studied comparison is performed
idMedia1: the identifier of the first media
idMedia2: the identifier of the second media
Returns:
sig, a String: the signature of a comparison.
See also:
compScores

Definition at line 104 of file Refiner.java.

  {
        String sig =  axis + "__" + idMedia1 + "__" + idMedia2;
        return sig;
  }
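
A short sketch shows two properties of this format that follow from the concatenation. SignatureSketch and parts are hypothetical names; splitting a signature back into its components is not something Refiner does, and it is only reliable under the assumption that no component contains "__".

```java
// Hypothetical illustration of the signature format; not the Refiner code.
final class SignatureSketch {

    // Same concatenation as the listing above.
    static String signature(String axis, String idMedia1, String idMedia2) {
        return axis + "__" + idMedia1 + "__" + idMedia2;
    }

    // Recovers {axis, idMedia1, idMedia2} from a signature.
    // Assumption: none of the three components contains "__".
    static String[] parts(String sig) {
        return sig.split("__", 3);
    }

    public static void main(String[] args) {
        String forward = signature("size", "a", "b");
        String reversed = signature("size", "b", "a");
        System.out.println(forward);
        System.out.println(forward.equals(reversed));
    }
}
```

The order of the two media is part of the signature, so comparing a with b and comparing b with a are stored under two different keys.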

Member Data Documentation

Integer crowdUser.Refiner.columnAnswer [private]

The column in which the result of the question asked to the worker is stored.

Definition at line 54 of file Refiner.java.

Integer crowdUser.Refiner.columnAxis [private]

The column in which the axis considered for the comparison is stored.

Definition at line 38 of file Refiner.java.

Integer crowdUser.Refiner.columnMedia1 [private]

The column in which the identifier of the first media is stored.

Definition at line 42 of file Refiner.java.

Integer crowdUser.Refiner.columnMedia2 [private]

The column in which the identifier of the second media is stored.

Definition at line 46 of file Refiner.java.

String [] crowdUser.Refiner.columnsCSV [private]

The labels of the columns in the csv file.

Definition at line 34 of file Refiner.java.

Integer crowdUser.Refiner.columnWorkerId [private]

The column in which the identifier of the worker who performed the comparison is stored.

Definition at line 50 of file Refiner.java.

Map<String,Integer> crowdUser.Refiner.compScores [private]

A Map<String,Integer> containing the score of each comparison: for each HIT, if the worker answered that the first media ("Media 1") was greater, score += 1; otherwise score -= 1.

Each comparison is identified by its signature.

Definition at line 59 of file Refiner.java.

List<String[]> crowdUser.Refiner.data [private]

A List of String arrays containing the data currently studied: raw data or refined data, depending on the moment.

Definition at line 63 of file Refiner.java.


The documentation for this class was generated from the following file: