Has anyone figured out the incorrect annotations? The polygon coordinates exceed the height or width of the image.
Example 1: Image - id_efa2d99456fa83bfcf0a6d2b.jpg | Polygon - POLYGON ((1752 1123, 1752 1222, 1492 1222, 1492 1123, 1752 1123)); width, height = 1196, 1585
Example 2: Image - id_139c84d7b7b9355147f1f830.jpg | Polygon - POLYGON ((1473 1719, 1473 1885, 1296 1885, 1296 1719, 1473 1719)); width, height = 1577, 1567
There are over a hundred such incorrect polygons, which also raises doubts about the accuracy of the other polygons.
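To flag these systematically, here is a minimal check using plain string parsing, so no GIS libraries are needed (the function name is just illustrative, and it assumes the first coordinate is x; swap width and height if the annotations turn out to use row/column order):

```python
import re

def polygon_exceeds_frame(wkt, width, height):
    """True if any coordinate in the WKT polygon falls outside the image frame."""
    coords = [float(v) for v in re.findall(r'-?\d+(?:\.\d+)?', wkt)]
    xs, ys = coords[0::2], coords[1::2]
    return min(xs) < 0 or min(ys) < 0 or max(xs) > width or max(ys) > height

# Example 1 above: coordinates run to 1752, but width, height = 1196, 1585.
print(polygon_exceeds_frame(
    'POLYGON ((1752 1123, 1752 1222, 1492 1222, 1492 1123, 1752 1123))',
    1196, 1585))  # True
```

For the examples quoted above the result is True in either axis order, since at least one coordinate exceeds both the width and the height.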
Can the moderator suggest what routes we competitors can take?
I don't believe we are pointing to the same problem!
Images that contain this many bollworms usually come from farmers who do not check their traps as often as guidelines suggest. As a result, the pests are in a later phase of decomposition, and can thus be structurally a little different from the ones in images with fewer pests taken closer to the guidelines. For now we've asked annotators not to get too caught up in the details of such organic science. Hopefully in future releases of the dataset we can get more exact.
In a paper from a couple of years ago we discuss some of the difficulties in annotating this data. Please have a look if you're curious.
Please count just 100 of them yourself. They occur at very low density, so you may assume the data is right, but the bounding box may be wrong in some places. We should try to make it more accurate using the given data. I counted. Once you try...
I am a member of the Wadhwani AI ML team and had some part in the curation of this data set.
I've looked into this issue and I believe this is a result of the annotation tool we used for a subset of the images, including these. The tool allowed annotators to draw boxes whose bounds could go outside of the image, but didn't "clip" the coordinates back to the valid area prior to saving. Competitors whose modelling framework doesn't automatically take care of this can force the clipping by:
```python
from shapely.wkt import loads
from shapely.ops import clip_by_rect
from shapely.geometry import box

# Annotation from Example 1 above (image width, height = 1196, 1585).
bounding_box = loads('POLYGON ((1752 1123, 1752 1222, 1492 1222, 1492 1123, 1752 1123))')
# Valid image area, in the same coordinate order used by the annotations.
image_frame = box(0, 0, 1585, 1196)
# Clip the out-of-bounds annotation back to the image frame.
valid_box = clip_by_rect(bounding_box, *image_frame.bounds)
```
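One caveat: clipping leaves an empty geometry when a polygon lies entirely outside the frame, so it is worth dropping those rather than feeding degenerate boxes to a model. A small sketch of such a wrapper (the helper name and interface are illustrative, not part of any released tooling):

```python
from shapely.wkt import loads, dumps
from shapely.ops import clip_by_rect

def clip_annotation(wkt, max_a, max_b):
    """Clip a WKT polygon to the frame [0, max_a] x [0, max_b].

    Returns the clipped polygon as WKT, or None if nothing remains.
    """
    clipped = clip_by_rect(loads(wkt), 0, 0, max_a, max_b)
    return None if clipped.is_empty else dumps(clipped)

# A box hanging over the frame edge is trimmed back inside it.
print(clip_annotation('POLYGON ((10 10, 10 30, 120 30, 120 10, 10 10))', 100, 50))
# A box entirely outside the frame is dropped.
print(clip_annotation('POLYGON ((200 60, 200 90, 300 90, 300 60, 200 60))', 100, 50))  # None
```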
I hope that helps. The Wadhwani AI team really appreciates everyone's effort, not just with the modelling, but in threads like this with the data. We wish everyone the best of luck!
@jerome-white @amyfIorida626 I wanted to get some clarity regarding the missing annotations in the dataset. I have posed my questions below:
1. There are multiple images in the train set that contain worms (both pink bollworm and american bollworm), yet no annotations are provided for them and their worm count in the train set is 0. A few examples are id_3ccc249d56051088043fc07c.jpg, id_3ce537a8c42732341d5abf63.jpg, id_3d95259edba1e3baa82a9de0.jpg. There are many such images. My question is: is similar behavior present in the test set as well, or is the worm count precise for the test set that Wadhwani AI holds?
2. I wanted to know whether it falls within the rules of the competition to correct the annotations of incorrectly annotated images by drawing polygons with a labelling tool. These may include polygons that exceed the image bounds and images that aren't tagged at all! The reason this is needed is cross-validation: such images in the cross-validation set impact the decisions made when choosing the algorithm and architecture.
It wouldn't make sense to provide a solution to the problem if that's not the approach Wadhwani AI is looking for! Do let me know the answers to the above points.
Repeating here for transparency:
With respect to the second point: from Wadhwani's perspective, we wouldn't have any problems with you altering the training data to boost your performance. Most competitors augment the data somehow -- I guess this would just be an extreme instance of that. We only ask that assistance with such alterations doesn't come from Wadhwani AI staff; I don't think that would be fair to other competitors.
If your model ends up being one of the winners, please provide your altered set along with your submission so that we can reproduce the results and really get to the bottom of how your solution works. Even if you're not one of the finalists, however, it would be great if you could share whatever new resources you create with us. Once the competition is complete, I can get your annotations vetted by our experts and potentially added to our permanent set.