Python: reject non-base64 data URIs in detect_media_type_from_base64#6629
Open
he-yufeng wants to merge 1 commit into
Open
Python: reject non-base64 data URIs in detect_media_type_from_base64#6629he-yufeng wants to merge 1 commit into
he-yufeng wants to merge 1 commit into
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
This PR hardens the Python public API detect_media_type_from_base64 so that non-base64 data URIs (valid per RFC 2397, but unsupported by this helper) raise the documented ValueError instead of leaking an internal IndexError.
Changes:
- Add an explicit
";base64,"presence check fordata_uriinputs indetect_media_type_from_base64and raise a clearValueErrorwhen missing. - Add a regression test ensuring several non-base64
data:URIs raiseValueErrorwith a base64-related message.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| python/packages/core/agent_framework/_types.py | Adds validation to reject non-base64 data_uri inputs with a clear ValueError. |
| python/packages/core/tests/core/test_types.py | Adds regression coverage for non-base64 data URIs passed to detect_media_type_from_base64. |
Comment on lines
121
to
123
| if data_uri is not None: | ||
| if data is not None: | ||
| raise ValueError("Provide exactly one of data_bytes, data_str, or data_uri.") |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
detect_media_type_from_base64is part of the public API and its docstring documents that it raisesValueErroron bad input. When given a data URI that has no;base64,segment, though, it raises a bareIndexErrorinstead:Per RFC 2397 a data URI can carry a URL-encoded (non-base64) payload, so
data:image/svg+xml,<svg/>,data:text/plain,Hello, etc. are all valid URIs a caller can reasonably pass in. The function only understands base64 payloads, so the right outcome is the documentedValueError, not an opaqueIndexErrorleaking from the internalsplit(";base64,", 1)[1].Fix
Guard the split with the same check the rest of this module already uses.
_get_data_bytes_as_strdoesif ";base64," not in uri: raise ContentError(...)before splitting, and_validate_urichecks for the comma before its split — this was the one spot that skipped the check. Now a URI without;base64,raises a clearValueError.Test
Added
test_detect_media_type_rejects_non_base64_data_uri, which asserts the three URIs above raiseValueError. It fails onmain(the call raisesIndexError, which thepytest.raises(ValueError)does not catch) and passes with this change. The existingdetect_media_type_from_base64test still passes.