public inbox for pgsql-bugs@postgresql.org
From: Henson Choi <assam258@gmail.com>
To: Thomas Munro <thomas.munro@gmail.com>
Cc: Heikki Linnakangas <hlinnaka@iki.fi>
Cc: Robert Haas <robertmhaas@gmail.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Jeroen Vermeulen <jtvjtv@gmail.com>
Cc: VASUKI M <vasukianand0119@gmail.com>
Cc: pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #19354: JOHAB rejects valid byte sequences
Date: Wed, 15 Apr 2026 14:57:50 +0900
Message-ID: <CAAAe_zAFz1v-3b7Je4L+=wZM3UGAczXV47YVZfZi9wbJxspxeA@mail.gmail.com>
In-Reply-To: <CAAAe_zCwaccH7h+GOtHbo_docCY-o0c5NMRuYkdz15f=KL4f0g@mail.gmail.com>
References: <19354-eefe6d8b3e84f9f2@postgresql.org>
<CA+TgmoaRGSezRaA7x00X495Qho8WGTzggbDSUt-JsruXceZWug@mail.gmail.com>
<CA+zULE4L4rA2DLAcfy=eQL7w_ZexV4P5zpQRbP=_qrhJBEOzjg@mail.gmail.com>
<2292889.1765846569@sss.pgh.pa.us>
<CAE2r8H5vaSyaC_t1FcpHBo-BB_=SrFj7GFnOC-SxC6WDf5c9VA@mail.gmail.com>
<CA+zULE47EXZOp7qKYODd+mjSgDiR-WX5ZNBkwdKnj-Zc0FT58w@mail.gmail.com>
<CA+TgmoZaoc37ohnhF5inoPxWzfoznV483xQw8Fmw+ELFScv47g@mail.gmail.com>
<2393116.1765899706@sss.pgh.pa.us>
<CA+TgmoaoW4F2rRzYcQQim9ddT4-6H3oi0UYV9Ucw-rRQ5MdHsg@mail.gmail.com>
<CA+hUKGKy-ViGBXdOjcPownBM=OdWiULO8H1RyH1r_8qNp=U4CA@mail.gmail.com>
<6a8122ac-123d-4e93-9269-0b3be1e4a5a4@iki.fi>
<CAAAe_zCLVunjt1u+2E86shwc3hk1x4bzUyU86nY1fq-nAVYN0Q@mail.gmail.com>
<CA+hUKGJMrcS=hBkqVk=5pjM4w8edG=_ArASC82RqB6HQro-v-g@mail.gmail.com>
<CAAAe_zCwaccH7h+GOtHbo_docCY-o0c5NMRuYkdz15f=KL4f0g@mail.gmail.com>
Subject: Fix and expand comments for Korean encodings in encnames.c
Hi hackers,
While reading through the encoding alias table in src/common/encnames.c,
I noticed a few long-standing inaccuracies and omissions in the per-entry
comments for the three Korean encodings.
The most visible issue is the JOHAB entry, whose comment describes it as
"Extended Unix Code for simplified Chinese" -- apparently a copy/paste
slip from a neighboring EUC entry. JOHAB is in fact the Korean
combining-style encoding defined in KS X 1001 annex 3.
The attached 0002 patch makes comment-only changes to the entries for the
three Korean encodings:
* JOHAB: replace the incorrect "simplified Chinese" description with
a correct one that identifies it as the Korean combining (Johab)
encoding standardized in KS X 1001 annex 3.
* EUC_KR: drop a stray space before the comma in the existing
comment, and note that the encoding covers the KS X 1001
precomposed (Wansung) form.
* UHC: spell out "Unified Hangul Code", clarify that it is
Microsoft Windows CodePage 949, and describe its relationship to
EUC-KR (a superset covering all 11,172 precomposed Hangul syllables).
No behavior change, no catalog change, no pg_wchar.h change -- this
touches comments in src/common/encnames.c only. pgindent is clean.
Thanks,
Henson Choi
From c7a7335d2cf5a2881b25d9091fd020a2d62f7661 Mon Sep 17 00:00:00 2001
From: Henson Choi <assam258@gmail.com>
Date: Wed, 15 Apr 2026 14:52:35 +0900
Subject: [PATCH v1] Fix and expand comments for Korean encodings in encnames.c
---
src/common/encnames.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/src/common/encnames.c b/src/common/encnames.c
index 9085dbecce1..959b991dde4 100644
--- a/src/common/encnames.c
+++ b/src/common/encnames.c
@@ -61,8 +61,9 @@ static const pg_encname pg_encname_tbl[] =
* Japanese, standard OSF */
{
"euckr", PG_EUC_KR
- }, /* EUC-KR; Extended Unix Code for Korean , KS
- * X 1001 standard */
+ }, /* EUC-KR; Extended Unix Code for Korean
+ * precomposed (Wansung) encoding, standard KS
+ * X 1001 */
{
"euctw", PG_EUC_TW
}, /* EUC-TW; Extended Unix Code for
@@ -119,8 +120,8 @@ static const pg_encname pg_encname_tbl[] =
}, /* ISO-8859-9; RFC1345,KXS2 */
{
"johab", PG_JOHAB
- }, /* JOHAB; Extended Unix Code for simplified
- * Chinese */
+ }, /* JOHAB; Korean combining (Johab) encoding,
+ * standard KS X 1001 annex 3 */
{
"koi8", PG_KOI8R
}, /* _dirty_ alias for KOI8-R (backward
@@ -186,7 +187,9 @@ static const pg_encname pg_encname_tbl[] =
}, /* alias for WIN1258 */
{
"uhc", PG_UHC
- }, /* UHC; Korean Windows CodePage 949 */
+ }, /* UHC; Unified Hangul Code, Microsoft Windows
+ * CodePage 949; superset of EUC-KR covering
+ * all 11,172 precomposed Hangul syllables */
{
"unicode", PG_UTF8
}, /* alias for UTF8 */
--
2.50.1 (Apple Git-155)
Attachments:
[text/plain] 0002-Fix-and-expand-comments-for-Korean-encodings.txt (1.7K)