Update results
capjamesg committed Sep 12, 2024
1 parent 29b9a5e commit ef23a60
Showing 2 changed files with 142 additions and 24 deletions.
60 changes: 36 additions & 24 deletions index.html
@@ -40,7 +40,7 @@ <h1>How's GPT-4o Doing?</h1>
<p>You can contribute your own tests, too! See the <a href="https://github.com/roboflow/gpt-checkup?tab=readme-ov-file#-contribute">GitHub README</a> for contributing instructions.</p>
</div>
<div class="header_subtitle">
-<p>Tests are run every day at 1am PT. Last updated September 11, 2024.</p>
+<p>Tests are run every day at 1am PT. Last updated September 12, 2024.</p>
<p>Made with ❤️ by the team at <a href="https://roboflow.com">Roboflow</a>.</p>
</div>
<div class="header_cta">
@@ -122,7 +122,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/fruit.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>7</pre>
+<pre>There are eight fruits in the image.</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
@@ -176,7 +176,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/ocr.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>The words of songs on the album have been echoing in my head all week. Fades into the grey of my day old tea.</pre>
+<pre>The words of songs on the album have been echoing in my head all week. "Fades into the grey of my day oh tea."</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
@@ -216,7 +216,7 @@ <h2>Object Detection</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>0%</b> of the time.</p>
-<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.009</p>
+<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.01</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
@@ -230,7 +230,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/fruit.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>{'x': 0.345, 'y': 0.35, 'width': 0.15, 'height': 0.25}</pre>
+<pre>{'x': 0.373, 'y': 0.3645, 'width': 0.254, 'height': 0.4958}</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
@@ -287,19 +287,19 @@ <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<pre>```json
{
"A": {
-"quantity": 15,
-"price": 10
+"quantity": 20,
+"price": 15
},
"B": {
-"quantity": 23,
-"price": 20
+"quantity": 25,
+"price": 25
},
"C": {
-"quantity": 28,
+"quantity": 30,
"price": 30
},
"D": {
-"quantity": 33,
+"quantity": 35,
"price": 40
}
}
@@ -361,7 +361,7 @@ <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
{
"R": 79,
"G": 0,
-"B": 147
+"B": 130
}
```</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
@@ -403,7 +403,7 @@ <h2>Annotation Quality Assurance</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>0%</b> of the time.</p>
-<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.017</p>
+<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.02</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
@@ -417,15 +417,29 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/annotationqa.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>Here is the JSON output indicating the number of missing annotations:
+<pre>To determine if there are any missing annotations, I'll count the visible cars in the image and compare that with the number of red bounding boxes.

+In the provided image:
+
+1. Visible cars (including partial and fully visible):
+ - Front white car (on the right).
+ - Black SUV to the left.
+ - Several distant cars in the middle lane.
+ - Additional cars that may be hard to distinguish clearly.
+
+2. Red bounding boxes present:
+ - There are 8 red bounding boxes.
+
+Here, the number of cars in the image appears to be accurately annotated with red boxes. Therefore, there are no missing annotations.
+
+JSON output:
```json
{
-"missing": 1
+"missing": 0
}
```

-Analysis: There is one car on the rightmost side of the image (white car) that is not enclosed within a red bounding box, indicating a missing annotation.</pre>
+This indicates that there are no missing annotations in the provided image.</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
@@ -465,7 +479,7 @@ <h2>Measurement Test</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>14.0%</b> of the time.</p>
-<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.01</p>
+<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.009</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
@@ -479,12 +493,10 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/measurement.jpg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>Based on the ruler in the image, the square sticker appears to be roughly 3 inches by 3 inches. Here is the JSON output:
-
-```json
+<pre>```json
{
-"length": 3.0,
-"width": 3.0
+"length": 3.0,
+"width": 3.0
}
```</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
@@ -657,7 +669,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/prescription.png" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>[{'name': 'Mary Thomas', 'time_per_day': 1, 'medication': 'Atenolol', 'dosage': 100, 'rx_number': '1234567-12345'}]</pre>
+<pre>[{'name': 'MARY THOMAS', 'time_per_day': 1, 'medication': 'ATENOLOL', 'dosage': 100, 'rx_number': '1234567-12345'}]</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
@@ -697,7 +709,7 @@ <h2>Math OCR</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>100%</b> of the time.</p>
-<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.015</p>
+<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.016</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
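The "Today's request cost" lines that changed in this commit look like the `price` values in results/2024-09-12.json rounded to three decimal places with trailing zeros dropped: object_detection's 0.009550000000000001 is shown as $0.01, and math_ocr's 0.01583 as $0.016. The Python sketch below reproduces that rule so it can be checked against the diff. Both the rounding rule and the name `format_request_cost` are assumptions inferred from these numbers, not code taken from the gpt-checkup repository.

```python
def format_request_cost(price: float) -> str:
    """Render a price the way the cost lines in this diff appear.

    Assumption (inferred from this commit, not from the site's code):
    round to 3 decimal places and let str() drop trailing zeros.
    """
    return f"${round(price, 3)}"


# Prices from results/2024-09-12.json for the tests whose cost line changed.
prices = {
    "object_detection": 0.009550000000000001,
    "annotation_qa": 0.01965,
    "measurement": 0.00877,
    "math_ocr": 0.01583,
}

for name, price in prices.items():
    # Prints lines such as "object_detection: Today's request cost $0.01"
    print(f"{name}: Today's request cost {format_request_cost(price)}")
```

If this rule is right, the displayed cost is only a coarse guide: $0.01 stands for any price between roughly $0.0095 and $0.0105, so the raw `price` field in the results file is the better source for comparing days.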
106 changes: 106 additions & 0 deletions results/2024-09-12.json
@@ -0,0 +1,106 @@
{
"zero_shot_classification": {
"score": 1,
"success": true,
"price": 0.00481,
"pass_fail": "Pass",
"response_time": 1.5598855018615723,
"result": "Toyota Camry"
},
"count_fruit": {
"score": 0,
"success": false,
"price": 0.008080000000000002,
"pass_fail": "Fail",
"response_time": 2.763192653656006,
"result": "There are eight fruits in the image."
},
"document_ocr": {
"score": 1,
"success": true,
"price": 0.008539999999999999,
"pass_fail": "Pass",
"response_time": 1.6899816989898682,
"result": "I was thinking earlier today that I have gone through, to use the lingo, eras of listening to each of Swift's Eras. Meta indeed. I started listening to Ms. Swift's music after hearing the Midnights album. A few weeks after hearing the album for the first time, I found myself playing various songs on repeat. I listened to the album in order multiple times."
},
"handwriting_ocr": {
"score": 0,
"success": false,
"price": 0.00876,
"pass_fail": "Fail",
"response_time": 5.250070810317993,
"result": "The words of songs on the album have been echoing in my head all week. \"Fades into the grey of my day oh tea.\""
},
"extraction_ocr": {
"score": 1.0,
"success": true,
"price": 0.007220000000000001,
"pass_fail": "Pass",
"response_time": 4.148446083068848,
"result": "[{'name': 'MARY THOMAS', 'time_per_day': 1, 'medication': 'ATENOLOL', 'dosage': 100, 'rx_number': '1234567-12345'}]"
},
"math_ocr": {
"score": 1.0,
"success": true,
"price": 0.01583,
"pass_fail": "Pass",
"response_time": 17.901089191436768,
"result": "3x^2-6x+2"
},
"object_detection": {
"score": 0.4569974176366017,
"success": false,
"price": 0.009550000000000001,
"pass_fail": "Fail",
"response_time": 3.440541982650757,
"result": "{'x': 0.373, 'y': 0.3645, 'width': 0.254, 'height': 0.4958}"
},
"graph_understanding": {
"score": 0.9400000000000001,
"success": false,
"price": 0.01079,
"pass_fail": "Fail",
"response_time": 8.234368324279785,
"result": "```json\n{\n \"A\": {\n \"quantity\": 20,\n \"price\": 15\n },\n \"B\": {\n \"quantity\": 25,\n \"price\": 25\n },\n \"C\": {\n \"quantity\": 30,\n \"price\": 30\n },\n \"D\": {\n \"quantity\": 35,\n \"price\": 40\n }\n}\n```"
},
"color_recognition": {
"score": 0.9607843137254902,
"success": false,
"price": 0.008870000000000001,
"pass_fail": "Fail",
"response_time": 2.1147143840789795,
"result": "```json\n{\n \"R\": 79,\n \"G\": 0,\n \"B\": 130\n}\n```"
},
"annotation_qa": {
"score": 0.0,
"success": false,
"price": 0.01965,
"pass_fail": "Fail",
"response_time": 4.78238320350647,
"result": "To determine if there are any missing annotations, I'll count the visible cars in the image and compare that with the number of red bounding boxes.\n\nIn the provided image:\n\n1. Visible cars (including partial and fully visible):\n - Front white car (on the right).\n - Black SUV to the left.\n - Several distant cars in the middle lane.\n - Additional cars that may be hard to distinguish clearly.\n\n2. Red bounding boxes present:\n - There are 8 red bounding boxes.\n\nHere, the number of cars in the image appears to be accurately annotated with red boxes. Therefore, there are no missing annotations. \n\nJSON output:\n```json\n{\n \"missing\": 0\n}\n```\n\nThis indicates that there are no missing annotations in the provided image."
},
"measurement": {
"score": 0.8571428571428572,
"success": false,
"price": 0.00877,
"pass_fail": "Fail",
"response_time": 5.985035419464111,
"result": "```json\n{\n \"length\": 3.0,\n \"width\": 3.0\n}\n```"
},
"easy_captcha": {
"score": 1,
"success": true,
"price": 0.004790000000000001,
"pass_fail": "Pass",
"response_time": 1.4288849830627441,
"result": "charybdis indubitable"
},
"easy_captcha_persuade": {
"score": 1,
"success": true,
"price": 0.00529,
"pass_fail": "Pass",
"response_time": 2.182210683822632,
"result": "charybdis indubitable"
}
}
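Each entry in results/2024-09-12.json records one test run with the same fields: `score`, `success`, `price`, `pass_fail`, `response_time`, and `result`. Several `result` strings wrap the model's JSON answer in a Markdown code fence tagged `json`, sometimes with prose around it (as in `annotation_qa`). The Python sketch below shows one way to summarize a day's file and recover the fenced JSON. The helper names (`load_results`, `summarize`, `parse_fenced_json`) are hypothetical, the excerpt is trimmed to four tests and the fields used, and none of this is taken from gpt-checkup's own scoring code.

```python
import json
import re
from pathlib import Path

FENCE = "`" * 3  # a Markdown code fence, spelled indirectly

# Matches the first fenced block (optionally tagged "json") and captures its body.
FENCED_JSON = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def load_results(path: str) -> dict:
    """Load a results/<date>.json file into a dict keyed by test name."""
    return json.loads(Path(path).read_text())


def summarize(results: dict) -> dict:
    """Split tests into passed/failed and total the per-request prices."""
    passed = [name for name, r in results.items() if r["pass_fail"] == "Pass"]
    failed = [name for name, r in results.items() if r["pass_fail"] != "Pass"]
    total_cost = sum(r["price"] for r in results.values())
    return {"passed": passed, "failed": failed, "total_cost": total_cost}


def parse_fenced_json(text: str):
    """Return the object inside the first fenced block, or None if there is none."""
    match = FENCED_JSON.search(text)
    return json.loads(match.group(1)) if match else None


# Excerpt of results/2024-09-12.json (4 of 13 tests, fields trimmed).
excerpt = {
    "zero_shot_classification": {
        "pass_fail": "Pass", "price": 0.00481, "result": "Toyota Camry",
    },
    "count_fruit": {
        "pass_fail": "Fail", "price": 0.008080000000000002,
        "result": "There are eight fruits in the image.",
    },
    "color_recognition": {
        "pass_fail": "Fail", "price": 0.008870000000000001,
        "result": FENCE + 'json\n{\n "R": 79,\n "G": 0,\n "B": 130\n}\n' + FENCE,
    },
    "measurement": {
        "pass_fail": "Fail", "price": 0.00877,
        "result": FENCE + 'json\n{\n "length": 3.0,\n "width": 3.0\n}\n' + FENCE,
    },
}

summary = summarize(excerpt)
print(summary["passed"])  # ['zero_shot_classification']
print(f"total cost: ${summary['total_cost']:.5f}")  # total cost: $0.03053
print(parse_fenced_json(excerpt["color_recognition"]["result"]))  # {'R': 79, 'G': 0, 'B': 130}
print(parse_fenced_json(excerpt["zero_shot_classification"]["result"]))  # None
```

Run on the full file with `summarize(load_results("results/2024-09-12.json"))`, the same calls would list 6 passing and 7 failing tests and a total request cost of about $0.121 for the day.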
