Parses CSV-formatted raw results and allows them to be refined by get another label, a program designed by P. Ipeirotis (a computer science researcher specializing in crowdsourcing).
Public Member Functions
Refiner (List< String > fields)
Creates a Refiner instance and sets the fields considered in the formatted results.
void | csv2List (String csvFile)
Puts the data contained in a csv file in the data attribute.
void | getanotherlabel ()
Calls Panos Ipeirotis and his students' get another label to process the results of the HIT.
List< String[]> | getData ()
Returns the data attribute.
Private Member Functions
String | signature (String axis, String idMedia1, String idMedia2)
Returns the "signature" of a comparison, a String identifying it without ambiguity.
void | getFields (List< String > csvFields)
Initializes columnsCSV and column* using the content of the 'csvFields' parameter.
String | generateCorrectFile ()
Creates the "correct file" used by get another label (it is returned as a String).
String | generateInputFile ()
Creates the "input file" used by get another label (it is returned as a String) and records the score of each comparison in the compScores Map.
Private Attributes
String[] | columnsCSV
The labels of the columns in the csv file.
Integer | columnAxis
The column in which the axis considered for the comparison is stored.
Integer | columnMedia1
The column in which the identifier of the first media is stored.
Integer | columnMedia2
The column in which the identifier of the second media is stored.
Integer | columnWorkerId
The column in which the identifier of the worker who performed the comparison is stored.
Integer | columnAnswer
The column in which the result of the question asked to the worker is stored.
Map< String, Integer > | compScores
A Map<String,Integer> containing the score of each comparison: for each HIT, if the worker said the first media was greater, score += 1; otherwise score -= 1.
List< String[]> | data
A List of String arrays containing the data currently studied: raw data or processed data, depending on the moment.
Parses CSV-formatted raw results and allows them to be refined by get another label, a program designed by P. Ipeirotis (a computer science researcher specializing in crowdsourcing).
In doing so, the noise is reduced and the comparisons are more accurate, which leads to a better sort.
More generally, this class parses results so that they can easily be used by a CrowdManager instance, for example by transforming a csv file into a list of String arrays.
Definition at line 25 of file Refiner.java.
crowdUser.Refiner.Refiner ( List< String > fields )
Creates a Refiner instance and sets the fields considered in the formatted results.
It uses the fields contained in the 'fields' parameter to initialize the columnsCSV attribute as well as all the attributes holding a column number (i.e. column[Axis | Media[1|2] | Answer | WorkerId]).
fields | A list containing the fields of the CSV file containing the HIT results. |
Definition at line 78 of file Refiner.java.
void crowdUser.Refiner.csv2List ( String csvFile )
Puts the data contained in a csv file in the data attribute.
The data attribute is a List of String arrays. Each array holds the content of one row of the csv file, and each of its cells holds the content of one column of that row.
csvFile | The path to the csv file within the '../data/HITresul../..de> directory, minus the extension (for instance, |
Definition at line 153 of file Refiner.java.
{
    this.data.clear();
    try {
        // connecting to the .csv "database"
        Class.forName("org.relique.jdbc.csv.CsvDriver");
        Connection conCSV = DriverManager.getConnection("jdbc:relique:csv:" + "../data/HITresults/");
        Statement stmt = conCSV.createStatement();
        // creating and sending the query
        String queryCSV = "SELECT * FROM " + csvFile;
        ResultSet rs = stmt.executeQuery(queryCSV);
        // using the results
        while (rs.next()) {
            List<String> request = new ArrayList<String>();
            for (int i=1; i<this.columnsCSV.length+1; i++)
                request.add(rs.getString(i));
            this.data.add(request.toArray(new String[0]));
        }
        // closing everything
        rs.close();
        stmt.close();
        conCSV.close();
    }
    catch(Exception e) {
        e.printStackTrace();
    }
}
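The row layout described above (one String array per csv row, one cell per column) can be illustrated without the JDBC driver. The following is only a minimal sketch under simplifying assumptions: the class CsvRowsSketch and its method parseRows are hypothetical names that are not part of Refiner, the first line is assumed to be a header, and fields are assumed to contain no quoted or escaped commas.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Refiner: turns naive CSV text into rows.
final class CsvRowsSketch {

    // Assumes a header line, comma separators and no quoted or escaped fields.
    static List<String[]> parseRows(String csvText, int columnCount) {
        List<String[]> rows = new ArrayList<>();
        String[] lines = csvText.split("\\R"); // split on any line break
        for (int i = 1; i < lines.length; i++) { // index 0 is the header line
            if (lines[i].isEmpty()) {
                continue; // ignore blank lines
            }
            // A limit of -1 keeps trailing empty cells.
            String[] cells = lines[i].split(",", -1);
            if (cells.length != columnCount) {
                throw new IllegalArgumentException("Unexpected column count in line: " + lines[i]);
            }
            rows.add(cells);
        }
        return rows;
    }

    public static void main(String[] args) {
        String csv = "axis,idMedia1,idMedia2\nsize,a,b\nsize,b,c\n";
        List<String[]> rows = parseRows(csv, 3);
        System.out.println(rows.size() + " rows, first idMedia1 = " + rows.get(0)[1]);
    }
}
```

Unlike the listing above, this sketch rejects rows whose column count differs from the expected one instead of reading them silently.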
String crowdUser.Refiner.generateCorrectFile ( ) [private]
Creates the "correct file" used by get another label (it is returned as a String).
Get another label needs a "correct file" as part of its input. For more details, see this (correct file section).
Definition at line 265 of file Refiner.java.
{
    String correctFile = "" ;
    try {
        String line;
        for (int i=0; i<this.data.size(); i++) {
            String[] dataLine = this.data.get(i);
            String sig = signature(dataLine[this.columnAxis],
                                   dataLine[this.columnMedia1],
                                   dataLine[this.columnMedia2]) ;
            line = dataLine[this.columnWorkerId] + "\t" + sig + "\t" ;
            if ( this.compScores.get(sig) > 0)
                line += "1";
            else
                line += "2";
            correctFile += line+"\n";
        }
    }
    catch (Exception e) {
        e.printStackTrace();
    }
    return correctFile ;
}
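The labeling rule used in this listing can be isolated as follows. This is a sketch only: CorrectLabelSketch, labelFor and correctLine are hypothetical names introduced for illustration. It makes one consequence of the rule visible: a tied comparison (score of 0) is labeled "2", exactly like a negative score.

```java
// Hypothetical helper, not part of Refiner: the score-to-label rule in isolation.
final class CorrectLabelSketch {

    // Mirrors the test "score > 0" of the listing: ties (score == 0) yield "2".
    static String labelFor(int score) {
        return score > 0 ? "1" : "2";
    }

    // One tab-separated line of the "correct file": worker, signature, label.
    static String correctLine(String workerId, String signature, int score) {
        return workerId + "\t" + signature + "\t" + labelFor(score);
    }

    public static void main(String[] args) {
        System.out.println(correctLine("w1", "size__a__b", 2));
        System.out.println(correctLine("w2", "size__a__b", 0));
    }
}
```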
String crowdUser.Refiner.generateInputFile ( ) [private]
Creates the "input file" used by get another label (it is returned as a String) and records the score of each comparison in the compScores Map.
Get another label needs an "input file" as part of its input. For more details, see this (input file section).
This String, formatted like a file (it contains "\n" at the end of each "line"), contains the raw results of the HITs. Each line of the String is generated at the end of the "try" block. In this String, "1" means that the worker judged the first of the two media to be the greater one, and "2" means the contrary.
However, the score of each comparison is also calculated in this function for better performance. The score is an integer that starts at 0; 1 is added each time media1 is judged greater than media2, and 1 is subtracted otherwise. All the raw results are read only once, and the scores are stored in the compScores Map attribute as they are read.
Definition at line 304 of file Refiner.java.
{
    String inputFile = "";
    this.compScores = new Hashtable <String,Integer> () ;
    try {
        String line;
        for (int i=0; i<this.data.size(); i++) {
            String[] dataLine = this.data.get(i);
            String sig = signature(dataLine[this.columnAxis],
                                   dataLine[this.columnMedia1],
                                   dataLine[this.columnMedia2]) ;
            // storing the results for future retrieval
            if (dataLine[this.columnAnswer].equals("Media 1"))
                if (this.compScores.containsKey(sig))
                    this.compScores.put(sig, this.compScores.get(sig) + 1) ;
                else
                    this.compScores.put(sig, 1);
            else
                if (this.compScores.containsKey(sig))
                    this.compScores.put(sig, this.compScores.get(sig) - 1) ;
                else
                    this.compScores.put(sig, -1);
            // writing the "input file"
            line = dataLine[this.columnWorkerId] + "\t" + sig + "\t" ;
            if (dataLine[this.columnAnswer].equals("Media 1"))
                line += "1";
            else
                line += "2";
            inputFile += line + "\n";
        }
    }
    catch (Exception e) {
        e.printStackTrace();
    }
    return inputFile ;
}
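The score bookkeeping of this listing can be written more compactly with Map.merge. The sketch below is an illustration, not the class's actual code: ScoreTallySketch and tally are hypothetical names, and each answer is assumed to be reduced to a pair {signature, answer}.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Refiner: tallies comparison scores.
final class ScoreTallySketch {

    // Each entry is {signature, answer}; "Media 1" counts +1, anything else -1.
    static Map<String, Integer> tally(List<String[]> answers) {
        Map<String, Integer> scores = new HashMap<>();
        for (String[] answer : answers) {
            int delta = "Media 1".equals(answer[1]) ? 1 : -1;
            // merge() stores delta for a new key, or adds it to the existing score.
            scores.merge(answer[0], delta, Integer::sum);
        }
        return scores;
    }

    public static void main(String[] args) {
        List<String[]> answers = Arrays.asList(
                new String[] {"size__a__b", "Media 1"},
                new String[] {"size__a__b", "Media 1"},
                new String[] {"size__a__b", "Media 2"},
                new String[] {"size__b__c", "Media 2"});
        System.out.println(tally(answers));
    }
}
```

The four nested branches of the listing collapse into a single merge call, with the same resulting scores.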
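void crowdUser.Refiner.getanotherlabel ( )
Calls Panos Ipeirotis and his students' get another label to process the results of the HIT.
This method has three steps:
1. The input "files" are created as Strings: the input file and the correct file are generated, and the costFile is hard coded.
2. get another label is called if there are enough HIT results (more than 10); otherwise, the raw majority answers, as encoded in the correctFile String, are used directly. Indeed, too little data causes get another label to crash violently.
3. The refined results are put in the data attribute.
For more details on how get another label works, feel free to read its partially commented source files or to browse its absence of documentation.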
Definition at line 199 of file Refiner.java.
{
    // step 1: creation of the "files"
    String inputFile = generateInputFile() ;
    String correctFile = generateCorrectFile() ;
    String costFile = "1\t1\t0\n2\t2\t0\n1\t2\t1\n2\t1\t1" ;
    Integer iterations = 10 ;
    HashMap<String,String> posterior_voting = new HashMap <String,String>() ;
    List<String[]> newData = new ArrayList <String[]> () ;

    // step 2: get another label is called if there are enough HIT results (here, more than 10).
    if (this.data.size() > 10)
        try {
            // the content of this block is roughly copied from the main method of "get another label".
            String[] lines_input = inputFile.split("\n");
            Vector<Labeling> labelings = DawidSkene.loadLabels(lines_input);
            String[] lines_correct = correctFile.split("\n");
            Vector<Labeling> correct = DawidSkene.loadLabels(lines_correct);
            String[] lines_cost = costFile.split("\n") ;
            Vector<Labeling> costs = DawidSkene.loadLabels(lines_cost);
            DawidSkene ds = new DawidSkene(labelings, correct, costs);
            ds.estimate(iterations);
            ds.updateAnnotatorCosts();
            posterior_voting = ds.getMajorityVote();
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    // step 2 (again): if there are not enough HIT results, the raw majority answers
    // (the same ones as in the "correctFile" String) are used directly.
    else {
        for (int i=0; i<this.data.size(); i++) {
            String[] dataLine = this.data.get(i);
            String sig = signature(dataLine[this.columnAxis],
                                   dataLine[this.columnMedia1],
                                   dataLine[this.columnMedia2]) ;
            if ( this.compScores.get(sig) > 0)
                posterior_voting.put(sig, "1");
            else
                posterior_voting.put(sig, "2");
        }
    }

    // step 3: the refined results are put in the "data" attribute
    for (int i=0; i<this.data.size(); i++) {
        List<String> newEntry = new ArrayList <String> () ;
        String sigComparisonStudied = signature( this.data.get(i)[this.columnAxis] ,
                                                 this.data.get(i)[this.columnMedia1] ,
                                                 this.data.get(i)[this.columnMedia2]);
        newEntry.add(this.data.get(i)[this.columnAxis]);
        newEntry.add(this.data.get(i)[this.columnMedia1]);
        newEntry.add(this.data.get(i)[this.columnMedia2]);
        newEntry.add ( posterior_voting.get(sigComparisonStudied) ) ;
        newData.add(newEntry.toArray(new String[0]));
    }
    this.data = newData ;
}
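The hard-coded costFile String is easier to read once it is split into its tab-separated fields. The sketch below only decodes the String; CostFileSketch and parseCosts are hypothetical names, and reading each line as "first label, second label, cost of confusing them" is an assumption about the format expected by get another label, not something this page confirms.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Refiner: decodes the hard-coded cost String.
final class CostFileSketch {

    // Assumed line format: label <TAB> label <TAB> cost.
    static Map<String, Double> parseCosts(String costFile) {
        Map<String, Double> costs = new LinkedHashMap<>();
        for (String line : costFile.split("\n")) {
            String[] parts = line.split("\t");
            costs.put(parts[0] + "->" + parts[1], Double.parseDouble(parts[2]));
        }
        return costs;
    }

    public static void main(String[] args) {
        // Same value as the costFile variable in the listing above.
        String costFile = "1\t1\t0\n2\t2\t0\n1\t2\t1\n2\t1\t1";
        System.out.println(parseCosts(costFile));
    }
}
```

Under that assumption, the value used by the method assigns a cost of 0 to matching labels and a cost of 1 to either kind of mismatch.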
List<String[]> crowdUser.Refiner.getData ( )
Returns the data attribute.
Definition at line 345 of file Refiner.java.
{ return this.data ; }
void crowdUser.Refiner.getFields ( List< String > csvFields ) [private]
Initializes columnsCSV and column* using the content of the 'csvFields' parameter.
For each String contained in csvFields, it is checked whether its value is one of the compulsory ones (i.e. idMedia1, idMedia2, _worker_id, axis, and a last, much longer one corresponding to the question asked), in which case the column number is saved in the corresponding attribute. The question field is recognized by its length (more than 25 characters).
Although MiscData[1/2] are also compulsory fields, their column numbers are not saved here, since they are not needed.
csvFields | A list containing the fields of the CSV file containing the HIT results. |
Definition at line 122 of file Refiner.java.
{
    this.columnsCSV = csvFields.toArray(new String[0]) ;
    Integer rank = 0;
    for (String field : csvFields) {
        if (field.equals("idMedia1") )
            this.columnMedia1 = rank ;
        else if (field.equals("idMedia2") )
            this.columnMedia2 = rank ;
        else if (field.equals("_worker_id") )
            this.columnWorkerId = rank ;
        else if (field.equals("axis") )
            this.columnAxis = rank ;
        else if (field.length() > 25 )
            this.columnAnswer = rank ;
        rank ++ ;
    }
}
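In this listing, a compulsory field that is missing from csvFields leaves the corresponding attribute null, which only surfaces later as an exception. A possible variant that fails early is sketched below; ColumnLookupSketch, requireColumn and answerColumn are hypothetical names, and the length threshold of 25 is taken from the listing.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Refiner: column lookup that fails early.
final class ColumnLookupSketch {

    // Returns the index of a named field, or throws if it is absent.
    static int requireColumn(List<String> fields, String name) {
        int index = fields.indexOf(name);
        if (index < 0) {
            throw new IllegalArgumentException("Missing CSV field: " + name);
        }
        return index;
    }

    // Returns the last field longer than 25 characters, as getFields does.
    static int answerColumn(List<String> fields) {
        for (int i = fields.size() - 1; i >= 0; i--) {
            if (fields.get(i).length() > 25) {
                return i;
            }
        }
        throw new IllegalArgumentException("No field longer than 25 characters");
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("axis", "idMedia1", "idMedia2",
                "_worker_id", "which_of_the_two_media_is_greater");
        System.out.println("idMedia2 at " + requireColumn(fields, "idMedia2")
                + ", answer at " + answerColumn(fields));
    }
}
```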
String crowdUser.Refiner.signature ( String axis, String idMedia1, String idMedia2 ) [private]
Returns the "signature" of a comparison, a String identifying it without ambiguity.
The signature of a comparison is defined as follows: if a comparison has been performed between idMedia1 and idMedia2 along the axis axis, its signature is axis__idMedia1__idMedia2. It is the key under which the result of the comparison is stored in the compScores attribute.
axis | the axis along which the studied comparison is performed |
idMedia1 | the identifier of the first media |
idMedia2 | the identifier of the second media |
Definition at line 104 of file Refiner.java.
{ String sig = axis + "__" + idMedia1 + "__" + idMedia2; return sig; }
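The signature is unambiguous as long as none of its three components contains the "__" separator itself. The sketch below illustrates this limit; SignatureSketch and parts are hypothetical names, and the signature method simply reproduces the concatenation of the listing above.

```java
// Hypothetical helper, not part of Refiner: builds and splits signatures.
final class SignatureSketch {

    static final String SEPARATOR = "__";

    // Same concatenation as in the listing above.
    static String signature(String axis, String idMedia1, String idMedia2) {
        return axis + SEPARATOR + idMedia1 + SEPARATOR + idMedia2;
    }

    // Splits a signature back into its components; exactly 3 parts are
    // returned only if no component contains the separator.
    static String[] parts(String signature) {
        return signature.split(SEPARATOR, -1);
    }

    public static void main(String[] args) {
        System.out.println(signature("size", "a", "b"));
        // Two different comparisons that collide on the same signature:
        System.out.println(signature("a__b", "c", "d").equals(signature("a", "b__c", "d")));
    }
}
```

If identifiers may contain "__", a separator that cannot occur in them, or a composite key object, would avoid such collisions.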
Integer crowdUser.Refiner.columnAnswer [private]
The column in which the result of the question asked to the worker is stored.
Definition at line 54 of file Refiner.java.
Integer crowdUser.Refiner.columnAxis [private]
The column in which the axis considered for the comparison is stored.
Definition at line 38 of file Refiner.java.
Integer crowdUser.Refiner.columnMedia1 [private]
The column in which the identifier of the first media is stored.
Definition at line 42 of file Refiner.java.
Integer crowdUser.Refiner.columnMedia2 [private]
The column in which the identifier of the second media is stored.
Definition at line 46 of file Refiner.java.
String [] crowdUser.Refiner.columnsCSV [private]
The labels of the columns in the csv file.
Definition at line 34 of file Refiner.java.
Integer crowdUser.Refiner.columnWorkerId [private]
The column in which the identifier of the worker who performed the comparison is stored.
Definition at line 50 of file Refiner.java.
Map<String,Integer> crowdUser.Refiner.compScores [private]
A Map<String,Integer> containing the score of each comparison: for each HIT, if the worker said the first media was greater, score += 1; otherwise score -= 1.
Each comparison is identified by its signature.
Definition at line 59 of file Refiner.java.
List<String[]> crowdUser.Refiner.data [private]
A List of String arrays containing the data currently studied: raw data or processed data, depending on the moment.
Definition at line 63 of file Refiner.java.