fix: specify UTF-8 encoding for file I/O to prevent UnicodeEncodeError on Windows (#432)

Open
octo-patch wants to merge 1 commit into stanford-oval:main from octo-patch:fix/issue-352-unicode-encode-error-utf8

@octo-patch

Fixes #352

Problem

On Windows, Python's open() falls back to the locale's preferred encoding (e.g. cp1252) when no explicit encoding is given. That encoding cannot represent many Unicode characters, such as Greek letters (β, α, μ), so a UnicodeEncodeError is raised when STORM writes an article containing them.

Example error:

UnicodeEncodeError: 'charmap' codec can't encode character '\u03b2' in position 2116: character maps to <undefined>
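
The failure can be reproduced on any platform by requesting cp1252 explicitly, which mimics a typical Windows locale default. This is a minimal sketch, not code from the PR: the sample text, the temporary path, and the `failed` flag are made up for illustration.

```python
import os
import tempfile

# Hypothetical sample text containing U+03B2 (Greek small letter beta).
text = "TGF-β signalling"
path = os.path.join(tempfile.mkdtemp(), "article.md")

# Request cp1252 explicitly to mimic a Windows locale default on any platform.
try:
    with open(path, "w", encoding="cp1252") as f:
        f.write(text)
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print(exc)

# The same write succeeds once UTF-8 is requested.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)
```
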

Solution

Added encoding="utf-8" to all file open calls that were missing it:

  • FileIOHelper.write_str() in knowledge_storm/utils.py — used to write article .md files and outlines
  • FileIOHelper.load_str() in knowledge_storm/utils.py — used to read saved articles
  • LLM call history write in knowledge_storm/storm_wiki/engine.py
  • Output file writes in examples/costorm_examples/run_costorm_gpt.py

The encoding parameter is exposed as an optional argument with "utf-8" as the default, maintaining backward compatibility.
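
The shape of the change can be sketched roughly as follows. This is an illustrative stand-in, not the actual `knowledge_storm/utils.py` code: the method bodies and exact signatures are assumptions, and only the names `FileIOHelper`, `write_str`, `load_str` and the `encoding="utf-8"` default come from the description above.

```python
import os
import tempfile


class FileIOHelper:
    """Illustrative stand-in; the real class in knowledge_storm may differ."""

    @staticmethod
    def write_str(s, path, encoding="utf-8"):
        # An explicit encoding avoids depending on the platform's locale default.
        with open(path, "w", encoding=encoding) as f:
            f.write(s)

    @staticmethod
    def load_str(path, encoding="utf-8"):
        # Read with the same encoding that write_str used.
        with open(path, "r", encoding=encoding) as f:
            return f.read()


# Usage: existing callers that pass no encoding get UTF-8 on every platform.
path = os.path.join(tempfile.mkdtemp(), "article.md")
sample = "α, β, μ and 中文"
FileIOHelper.write_str(sample, path)
round_trip = FileIOHelper.load_str(path)
```

Keeping `encoding` as a keyword argument with a default means call sites that pass only the string and path need no changes, while a caller that has to read a file in some other encoding can still pass one explicitly.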

Testing

The fix ensures consistent UTF-8 encoding across all platforms (Windows, macOS, Linux), preventing encoding errors when articles contain non-ASCII characters like Greek letters, Chinese characters, or other Unicode symbols.

Development

Successfully merging this pull request may close these issues.

[BUG] UnicodeEncodeError: 'charmap' codec can't encode character '\u03b2'