Python: reject non-base64 data URIs in detect_media_type_from_base64 #6629

Open
he-yufeng wants to merge 1 commit into microsoft:main from he-yufeng:fix/detect-media-type-non-base64-uri

@he-yufeng

Contributor

Problem

detect_media_type_from_base64 is part of the public API, and its docstring documents that it raises ValueError on bad input. However, when given a data URI without a ;base64, segment, it raises a bare IndexError instead:

```python
from agent_framework import detect_media_type_from_base64

detect_media_type_from_base64(data_uri="data:image/svg+xml,<svg/>")
# IndexError: list index out of range
```

Per RFC 2397, a data URI may carry a URL-encoded (non-base64) payload, so URIs such as data:image/svg+xml,<svg/> and data:text/plain,Hello are valid inputs that a caller can reasonably pass in. Because the function only understands base64 payloads, the correct outcome is the documented ValueError, not an opaque IndexError leaking from the internal split(";base64,", 1)[1].
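The IndexError can be reproduced without agent_framework at all. The sketch below applies the same split(";base64,", 1)[1] expression to plain strings; payload_after_marker is a hypothetical name used only for illustration, not a function in the library:

```python
def payload_after_marker(data_uri: str) -> str:
    # Hypothetical helper reproducing the unguarded expression.
    # str.split returns a one-element list when ";base64," is absent,
    # so indexing [1] raises IndexError for non-base64 data URIs.
    return data_uri.split(";base64,", 1)[1]


print(payload_after_marker("data:text/plain;base64,SGVsbG8="))  # SGVsbG8=

try:
    payload_after_marker("data:image/svg+xml,<svg/>")
except IndexError as exc:
    print(f"{type(exc).__name__}: {exc}")  # IndexError: list index out of range
```

The base64 URI splits into two parts, while the URL-encoded one yields a single-element list, which is why the failure surfaces as an IndexError rather than anything mentioning base64.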

Fix

Guard the split with the same check the rest of this module already uses: _get_data_bytes_as_str does if ";base64," not in uri: raise ContentError(...) before splitting, and _validate_uri checks for the comma before its own split. This function was the one spot that skipped the check. With the guard in place, a URI without ;base64, raises a clear ValueError.
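As a rough sketch of the guard-before-split pattern (not the actual code in _types.py), a hypothetical helper decode_base64_data_uri can check for the ;base64, marker before splitting and only then decode the payload with the standard library:

```python
import base64


def decode_base64_data_uri(data_uri: str) -> bytes:
    """Hypothetical helper: return the decoded payload of a base64 data URI.

    Raises ValueError (not IndexError) when the URI has no ';base64,'
    segment, mirroring the kind of guard this PR adds.
    """
    if ";base64," not in data_uri:
        raise ValueError("data_uri must be a base64 data URI (missing ';base64,')")
    payload = data_uri.split(";base64,", 1)[1]  # safe: the marker is present
    # validate=True rejects non-alphabet characters with binascii.Error,
    # which is itself a subclass of ValueError.
    return base64.b64decode(payload, validate=True)


print(decode_base64_data_uri("data:text/plain;base64,SGVsbG8="))  # b'Hello'
```

Because binascii.Error subclasses ValueError, a caller that catches ValueError sees both "not a base64 URI" and "malformed base64 payload" through one except clause.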

Test

Added test_detect_media_type_rejects_non_base64_data_uri, which asserts that three non-base64 data URIs like those above raise ValueError. It fails on main (the call raises IndexError, which pytest.raises(ValueError) does not catch) and passes with this change. The existing detect_media_type_from_base64 test still passes.
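The test reasoning hinges on the exception hierarchy: IndexError derives from LookupError, not ValueError, so a check expecting ValueError lets it propagate. The PR's test uses pytest.raises; the sketch below expresses the same idea with the standard library's unittest, against hypothetical old_split and new_split stand-ins for the pre-fix and post-fix code paths (neither is the library's code):

```python
import unittest


def old_split(data_uri: str) -> str:
    # Hypothetical stand-in for the pre-fix path: indexes without a guard.
    return data_uri.split(";base64,", 1)[1]


def new_split(data_uri: str) -> str:
    # Hypothetical stand-in for the post-fix path: guard first, then split.
    if ";base64," not in data_uri:
        raise ValueError("data_uri is not a base64 data URI")
    return data_uri.split(";base64,", 1)[1]


NON_BASE64_URIS = ["data:image/svg+xml,<svg/>", "data:text/plain,Hello"]


class RejectNonBase64DataUri(unittest.TestCase):
    def test_fixed_path_raises_value_error(self):
        for uri in NON_BASE64_URIS:
            with self.subTest(uri=uri), self.assertRaises(ValueError):
                new_split(uri)

    def test_old_path_escapes_value_error_check(self):
        # IndexError is not a ValueError subclass, so a ValueError check
        # does not catch it; on the old path the exception propagates.
        self.assertFalse(issubclass(IndexError, ValueError))
        with self.assertRaises(IndexError):
            old_split(NON_BASE64_URIS[0])
```

Saved to a file, it can be run with python -m unittest.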

Copilot AI review requested due to automatic review settings June 19, 2026 15:15
@moonbox3 added the python label (Issues related to the Python codebase) Jun 19, 2026

Copilot AI left a comment

Contributor


Pull request overview

This PR hardens the Python public API detect_media_type_from_base64 so that non-base64 data URIs (valid per RFC 2397, but unsupported by this helper) raise the documented ValueError instead of leaking an internal IndexError.

Changes:

  • Add an explicit ";base64," presence check for data_uri inputs in detect_media_type_from_base64 and raise a clear ValueError when missing.
  • Add a regression test ensuring several non-base64 data: URIs raise ValueError with a base64-related message.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| python/packages/core/agent_framework/_types.py | Adds validation to reject non-base64 data_uri inputs with a clear ValueError. |
| python/packages/core/tests/core/test_types.py | Adds regression coverage for non-base64 data URIs passed to detect_media_type_from_base64. |

Comment on lines 121 to 123
```python
if data_uri is not None:
    if data is not None:
        raise ValueError("Provide exactly one of data_bytes, data_str, or data_uri.")
```
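The quoted lines enforce that callers pass only one input source. As a hedged sketch of that exactly-one-of pattern (the helper name pick_single_source and its return shape are hypothetical, not the library's code), the check can be written by counting the arguments that are not None:

```python
from typing import Optional


def pick_single_source(
    data_bytes: Optional[bytes] = None,
    data_str: Optional[str] = None,
    data_uri: Optional[str] = None,
) -> tuple:
    """Hypothetical helper: return (name, value) for the single argument given.

    Raises ValueError unless exactly one argument is not None, reusing the
    error message from the quoted snippet.
    """
    candidates = (("data_bytes", data_bytes), ("data_str", data_str), ("data_uri", data_uri))
    provided = [(name, value) for name, value in candidates if value is not None]
    if len(provided) != 1:
        raise ValueError("Provide exactly one of data_bytes, data_str, or data_uri.")
    return provided[0]
```

Checking "is not None" rather than truthiness means an empty b"" or "" still counts as a provided argument, so the error only fires when the count of supplied sources is zero or more than one.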

Labels

python Issues related to the Python codebase


3 participants