public inbox for pgsql-bugs@postgresql.org
From: Thomas Munro <thomas.munro@gmail.com>
To: Robert Haas <robertmhaas@gmail.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Jeroen Vermeulen <jtvjtv@gmail.com>
Cc: VASUKI M <vasukianand0119@gmail.com>
Cc: pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #19354: JOHAB rejects valid byte sequences
Date: Tue, 14 Apr 2026 18:30:08 +1200
Message-ID: <CA+hUKGKy-ViGBXdOjcPownBM=OdWiULO8H1RyH1r_8qNp=U4CA@mail.gmail.com>
In-Reply-To: <CA+TgmoaoW4F2rRzYcQQim9ddT4-6H3oi0UYV9Ucw-rRQ5MdHsg@mail.gmail.com>
References: <19354-eefe6d8b3e84f9f2@postgresql.org>
<CA+TgmoaRGSezRaA7x00X495Qho8WGTzggbDSUt-JsruXceZWug@mail.gmail.com>
<CA+zULE4L4rA2DLAcfy=eQL7w_ZexV4P5zpQRbP=_qrhJBEOzjg@mail.gmail.com>
<2292889.1765846569@sss.pgh.pa.us>
<CAE2r8H5vaSyaC_t1FcpHBo-BB_=SrFj7GFnOC-SxC6WDf5c9VA@mail.gmail.com>
<CA+zULE47EXZOp7qKYODd+mjSgDiR-WX5ZNBkwdKnj-Zc0FT58w@mail.gmail.com>
<CA+TgmoZaoc37ohnhF5inoPxWzfoznV483xQw8Fmw+ELFScv47g@mail.gmail.com>
<2393116.1765899706@sss.pgh.pa.us>
<CA+TgmoaoW4F2rRzYcQQim9ddT4-6H3oi0UYV9Ucw-rRQ5MdHsg@mail.gmail.com>

On Wed, Dec 17, 2025 at 7:43 AM Robert Haas <robertmhaas@gmail.com> wrote:
> I think there is a good chance that the right going-forward fix is to
> deprecate the encoding, because according to
> https://www.unicode.org/Public/MAPPINGS/EASTASIA/ReadMe.txt this and
> everything else that's now under
> https://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/ were
> deprecated in 2001. By the time v19 is released, the deprecation will
> be a quarter-century old, and the fact that it doesn't work is good
> evidence that few people will miss it, though perhaps the original
> poster will want to put forward an argument for why we should still
> care about this.

Right, that stuff was withdrawn, along with the BIG5 and JIS X 0212
mappings (here's some interesting discussion about their normative
status[1]). From what I can figure out, JOHAB was an MS-DOS codepage
(1361), obsoleted by UHC (949) some time around MS-DOS 6.22 or MS-DOS
7 and Windows 95.

So +1 from me, set the phasers to git rm. Based on the comments for
enum pg_enc, we don't need to worry about numerical stability of
client-only encodings, so I just deleted it (unlike PG_MULE_INTERNAL
which became PG_UNUSED_1). I didn't mention it in
doc/src/sgml/appendix-obsolete.sgml: the decision criterion for that
seems to be that there was an SGML id that appeared in a URL, which is
not the case here. The release notes seem like enough of a tombstone
for something that we strongly suspect has 0 users. Wait until v20, or
just do it now?

I don't have an opinion yet whether the code in the back-branches
might be dangerous, or "fixing" it might be more dangerous, but it's
an interesting question...

[1] https://unicode.org/mail-arch/unicode-ml/y2002-m03/0691.html
Attachments:
[application/gzip] 0001-Remove-JOHAB-encoding.patch.gz (126.5K, 2-0001-Remove-JOHAB-encoding.patch.gz)