Our group has purchased licenses for several data sets, and our industry partners also share some data with us. Please pay attention to the license of each individual data set. By default, restrict read access to any data you hold to yourself only, unless you have discussed it with Yi and are clear about what the license allows.
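For a copy of the data kept on a Linux machine, restricting read access to yourself could look roughly like the sketch below. It is only an illustration: the function name restrict_to_owner and the example path are hypothetical, and it assumes ordinary POSIX file permissions (no ACLs).

```python
import stat
from pathlib import Path


def restrict_to_owner(root):
    """Strip all group/other permission bits from root and everything under it.

    Owner bits are left untouched, so the owner keeps whatever access
    they already had. Symbolic links are skipped rather than followed.
    """
    root = Path(root)
    # Handle the top-level path itself, then every file and directory below it.
    paths = [root] + sorted(root.rglob("*"))
    for path in paths:
        if path.is_symlink():
            continue
        mode = stat.S_IMODE(path.stat().st_mode)
        path.chmod(mode & ~(stat.S_IRWXG | stat.S_IRWXO))


# Hypothetical usage, with a made-up directory name:
# restrict_to_owner(Path.home() / "licensed_data")
```

After running it, ls -l on the directory should show no permissions for group or others.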
We have also crawled some data sets from the internet. If you need to purchase other data sets, please contact Yi.
Some of the data are available at /home/shared/data on irkm.cse.ucsc.edu (Linux)
Some of the data are on \\castlerock (Windows)
A data set usually includes a corpus, a set of queries and a set of relevance judgments (ask Yi for data)
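As a rough illustration of the three parts such a data set has (corpus, queries, relevance judgments), here is a minimal in-memory sketch. The class name, field names, IDs and the integer relevance grades are all assumptions made for the example; the actual data sets each come in their own format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class RetrievalCollection:
    """Hypothetical container for one retrieval data set."""
    corpus: Dict[str, str] = field(default_factory=dict)    # doc_id -> document text
    queries: Dict[str, str] = field(default_factory=dict)   # query_id -> query text
    # (query_id, doc_id) -> relevance grade; 0 is assumed to mean "not relevant"
    judgments: Dict[Tuple[str, str], int] = field(default_factory=dict)

    def relevant_docs(self, query_id: str) -> List[str]:
        """Return the sorted IDs of documents judged relevant for a query."""
        return sorted(
            doc_id
            for (qid, doc_id), grade in self.judgments.items()
            if qid == query_id and grade > 0
        )


# Toy example with made-up content
collection = RetrievalCollection(
    corpus={"d1": "text about search engines", "d2": "text about cooking"},
    queries={"q1": "information retrieval"},
    judgments={("q1", "d1"): 1, ("q1", "d2"): 0},
)
print(collection.relevant_docs("q1"))
```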
Need to get your data annotated?
If you need many annotators for small tasks, try Amazon Mechanical Turk (suggested rate: $2-$4/hr)
If you need a few annotators for a long period of time, try oDesk https://www.odesk.com/w/odesk_story or hire work-study undergraduate students ($5/hr for work-study students)
If you need many annotators for many tasks, try to build a game (suggested rate: -$1-$0/hr)