Skip to content

P4suta/aozora-proof

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

aozora-proof

ci latest release license msrv

A modern, FOSS proofreading toolkit for 青空文庫記法 (Aozora Bunko notation) text — built to run locally, in CI, and in the browser as a static web app.

aozora-proof checks the character level of a manuscript — the layer the aozora parser deliberately leaves alone — and folds in the parser's notation diagnostics into one unified report:

  • Character conformance — flags characters that may not appear literally in conformant text (outside JIS X 0208 / 機種依存文字 / half-width katakana) and must instead be written as 外字注記; plus file-structure checks (BOM, line endings, encoding).
  • Old-/new-form kanji (旧字体↔新字体) — detects kanji that have an old/new-form counterpart and suggests the alternate for the editor to confirm.
  • Gaiji (外字) lookup — 注記 ⇔ character ⇔ JIS 面区点 ⇔ Unicode, both ways.

The notation level (ruby, bouten, 外字 resolution, bracket pairing, diagnostics) is handled by the aozora parser, which aozora-proof consumes rather than reimplements.

Use

$ aozora-proof check seihon.txt
$ cat seihon.txt | aozora-proof check -
$ aozora-proof check --format json *.txt          # machine-readable, for CI
$ aozora-proof check --fail-on warning chapter*.txt
$ aozora-proof check --watch draft.txt            # re-check on every save
$ aozora-proof explain aozora::char::platform_dependent   # why a code fired
$ aozora-proof completions zsh > ~/.zfunc/_aozora-proof   # shell completions

Exit codes: 0 clean · 1 findings (--strict, or at/above --fail-on) · 2 usage / IO error · 3 internal-source finding (a tool bug).

CI / pre-commit

GitHub Action — runs the checks and uploads findings to the Security tab as SARIF:

# .github/workflows/aozora-proof.yml
permissions:
  contents: read
  security-events: write
jobs:
  proof:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: P4suta/aozora-proof/action@main
        with:
          files: "**/*.txt"
          fail-on: error

pre-commit (pre-commit.com):

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/P4suta/aozora-proof
    rev: main
    hooks:
      - id: aozora-proof

Workspace

crate role
aozora-proof-core the engine — pure, forbid(unsafe), WASM-clean; &str / &[u8] → findings
aozora-proof-data character-classification tables (JIS 水準, 機種依存文字, 旧字体, gaiji), baked at build time
aozora-proof-cli the aozora-proof binary
aozora-proof-wasm wasm-bindgen façade powering the in-browser web app (web/)

A static web app (web/) runs the checks in the browser — paste text to see findings plus 外字 search — published to GitHub Pages.

Develop

./bootstrap.sh provisions the toolchain and dev tools; just --list shows every task. See CONTRIBUTING and ARCHITECTURE.

License

Apache-2.0 OR MIT, at your option. Vendored character data carries its own upstream licenses; see NOTICE.

About

Character-level proofreading for 青空文庫記法 (Aozora Bunko notation) text: JIS X 0208 conformance, 機種依存文字, 旧字体/新字体, gaiji lookup. A CLI + library built on the aozora parser.

Topics

Resources

License

Apache-2.0, MIT licenses found

Licenses found

Apache-2.0
LICENSE-APACHE
MIT
LICENSE-MIT

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors