Commit

Update results
capjamesg committed Aug 16, 2024
1 parent f417942 commit c1bc605
Showing 2 changed files with 131 additions and 19 deletions.
44 changes: 25 additions & 19 deletions index.html
@@ -40,7 +40,7 @@ <h1>How's GPT-4o Doing?</h1>
<p>You can contribute your own tests, too! See the <a href="https://github.com/roboflow/gpt-checkup?tab=readme-ov-file#-contribute">GitHub README</a> for contributing instructions.</p>
</div>
<div class="header_subtitle">
-<p>Tests are run every day at 1am PT. Last updated August 15, 2024.</p>
+<p>Tests are run every day at 1am PT. Last updated August 16, 2024.</p>
<p>Made with ❤️ by the team at <a href="https://roboflow.com">Roboflow</a>.</p>
</div>
<div class="header_cta">
@@ -122,7 +122,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/fruit.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>There are 8 fruits in the image.</pre>
+<pre>There are 9 pieces of fruit in the image.</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
@@ -176,7 +176,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/fruit.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>{'x': 0.345, 'y': 0.371, 'width': 0.242, 'height': 0.397}</pre>
+<pre>{'x': 0.44, 'y': 0.41, 'width': 0.25, 'height': 0.34}</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
@@ -237,15 +237,15 @@ <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
"price": 10
},
"B": {
-"quantity": 25,
+"quantity": 22,
"price": 20
},
"C": {
-"quantity": 30,
+"quantity": 25,
"price": 30
},
"D": {
-"quantity": 35,
+"quantity": 30,
"price": 40
}
}
@@ -303,11 +303,13 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/color.png" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>Failed to produce a valid JSON output: {
+<pre>```json
+{
"R": 79,
"G": 0,
-"B": 128
-}</pre>
+"B": 159
+}
+```</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
@@ -347,7 +349,7 @@ <h2>Annotation Quality Assurance</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>0%</b> of the time.</p>
-<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.016</p>
+<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.017</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
@@ -361,13 +363,15 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/annotationqa.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>```json
+<pre>Based on the image you provided, it appears there are several cars not enclosed within the red bounding boxes. Specifically, I see two cars that are not annotated: one on the far left side of the image and one on the far right side (white car approaching the camera).
+
+Here is the requested JSON output indicating the number of missing annotations:
+
+```json
{
-"missing": 1
+"missing": 2
}
-```
-
-In the provided image, there are seven cars visible, but only six are labeled with red bounding boxes. Therefore, there is one missing annotation.</pre>
+```</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
@@ -407,7 +411,7 @@ <h2>Measurement Test</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>0%</b> of the time.</p>
-<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.009</p>
+<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.01</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
@@ -421,7 +425,9 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/measurement.jpg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>```json
+<pre>Based on the ruler in the image, the square sticker appears to be 3 inches in length and 3 inches in width.
+
+```json
{
"length": 3.0,
"width": 3.0
@@ -651,7 +657,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/prescription.png" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>[{'name': 'Mary Thomas', 'time_per_day': 1, 'medication': 'Atenolol', 'dosage': 100, 'rx_number': '1234567-12345'}]</pre>
+<pre>[{'name': 'MARY THOMAS', 'time_per_day': 1, 'medication': 'ATENOLOL', 'dosage': 100, 'rx_number': '1234567-12345'}]</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
@@ -759,7 +765,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/easy_captcha.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>charybdis in-dubitable</pre>
+<pre>charybdis indubitable</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://charlesfrye.github.io/" target="_blank">Charles Frye</a></p>
</div>
</div>
106 changes: 106 additions & 0 deletions results/2024-08-16.json
@@ -0,0 +1,106 @@
{
"zero_shot_classification": {
"score": 1,
"success": true,
"price": 0.00481,
"pass_fail": "Pass",
"response_time": 3.2271170616149902,
"result": "Toyota Camry"
},
"count_fruit": {
"score": 0,
"success": false,
"price": 0.008170000000000002,
"pass_fail": "Fail",
"response_time": 3.070826530456543,
"result": "There are 9 pieces of fruit in the image."
},
"document_ocr": {
"score": 1,
"success": true,
"price": 0.008539999999999999,
"pass_fail": "Pass",
"response_time": 3.501218795776367,
"result": "I was thinking earlier today that I have gone through, to use the lingo, eras of listening to each of Swift's Eras. Meta indeed. I started listening to Ms. Swift's music after hearing the Midnights album. A few weeks after hearing the album for the first time, I found myself playing various songs on repeat. I listened to the album in order multiple times."
},
"handwriting_ocr": {
"score": 1,
"success": true,
"price": 0.00876,
"pass_fail": "Pass",
"response_time": 7.230176687240601,
"result": "The words of songs on the album have been echoing in my head all week. \"Fades into the grey of my day old tea.\""
},
"extraction_ocr": {
"score": 1.0,
"success": true,
"price": 0.007220000000000001,
"pass_fail": "Pass",
"response_time": 3.956480026245117,
"result": "[{'name': 'MARY THOMAS', 'time_per_day': 1, 'medication': 'ATENOLOL', 'dosage': 100, 'rx_number': '1234567-12345'}]"
},
"math_ocr": {
"score": 1.0,
"success": true,
"price": 0.015290000000000002,
"pass_fail": "Pass",
"response_time": 4.852380037307739,
"result": "3x^2-6x+2"
},
"object_detection": {
"score": 0.6045519203413936,
"success": false,
"price": 0.009490000000000002,
"pass_fail": "Fail",
"response_time": 4.6265175342559814,
"result": "{'x': 0.44, 'y': 0.41, 'width': 0.25, 'height': 0.34}"
},
"graph_understanding": {
"score": 0.9099999999999999,
"success": false,
"price": 0.01079,
"pass_fail": "Fail",
"response_time": 3.4058024883270264,
"result": "```json\n{\n \"A\": {\n \"quantity\": 15,\n \"price\": 10\n },\n \"B\": {\n \"quantity\": 22,\n \"price\": 20\n },\n \"C\": {\n \"quantity\": 25,\n \"price\": 30\n },\n \"D\": {\n \"quantity\": 30,\n \"price\": 40\n }\n}\n```"
},
"color_recognition": {
"score": 0.9856209150326798,
"success": false,
"price": 0.008870000000000001,
"pass_fail": "Fail",
"response_time": 2.1292402744293213,
"result": "```json\n{\n \"R\": 79,\n \"G\": 0,\n \"B\": 159\n}\n```"
},
"annotation_qa": {
"score": 0.6666666666666667,
"success": false,
"price": 0.01734,
"pass_fail": "Fail",
"response_time": 5.602948188781738,
"result": "Based on the image you provided, it appears there are several cars not enclosed within the red bounding boxes. Specifically, I see two cars that are not annotated: one on the far left side of the image and one on the far right side (white car approaching the camera).\n\nHere is the requested JSON output indicating the number of missing annotations:\n\n```json\n{\n \"missing\": 2\n}\n```"
},
"measurement": {
"score": 0.8571428571428572,
"success": false,
"price": 0.00958,
"pass_fail": "Fail",
"response_time": 7.811402797698975,
"result": "Based on the ruler in the image, the square sticker appears to be 3 inches in length and 3 inches in width. \n\n```json\n{\n \"length\": 3.0,\n \"width\": 3.0\n}\n```"
},
"easy_captcha": {
"score": 1,
"success": true,
"price": 0.004790000000000001,
"pass_fail": "Pass",
"response_time": 1.34891939163208,
"result": "charybdis indubitable"
},
"easy_captcha_persuade": {
"score": 1,
"success": true,
"price": 0.00529,
"pass_fail": "Pass",
"response_time": 1.457381010055542,
"result": "charybdis indubitable"
}
}
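
A results file with this shape can be tallied with a few lines of Python. The sketch below is only an illustration and is not part of the repository: the `summarize` helper and the inline `SAMPLE` string are hypothetical, and `SAMPLE` holds a three-entry excerpt of the fields shown above (`success`, `price`) rather than the full file.

```python
import json

# Hypothetical three-entry excerpt mirroring the structure of
# results/2024-08-16.json; values are copied from the diff above.
SAMPLE = """
{
  "zero_shot_classification": {"score": 1, "success": true, "price": 0.00481},
  "count_fruit": {"score": 0, "success": false, "price": 0.008170000000000002},
  "easy_captcha": {"score": 1, "success": true, "price": 0.004790000000000001}
}
"""


def summarize(results: dict) -> dict:
    """Count passing tests, list failing ones, and total the request cost."""
    total = len(results)
    passed = sum(1 for entry in results.values() if entry["success"])
    failed = sorted(name for name, entry in results.items() if not entry["success"])
    cost = sum(entry["price"] for entry in results.values())
    return {
        "total": total,
        "passed": passed,
        "pass_rate": passed / total if total else 0.0,
        "failed": failed,
        # Round to tame float noise such as 0.008170000000000002.
        "cost": round(cost, 5),
    }


summary = summarize(json.loads(SAMPLE))
print(summary)
```

To run it against a real file, replace `json.loads(SAMPLE)` with `json.load(open(path))`, where `path` points at a results file such as the one added in this commit. For the file above, 7 of the 13 tests have `"success": true`.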
