Encoding（日本語訳）

1. 序

~UTF-8符号化法は、普遍的な有符号~文字~集合である~Unicodeの交換に最も適切な符号化法である。よって，この仕様は、新たな［ ~protocol, 形式］および［新たな文脈において配備される既存の形式］に対し，~UTF-8符号化法を要求する（また，それを定義する）。 ◎ The UTF-8 encoding is the most appropriate encoding for interchange of Unicode, the universal coded character set. Therefore, for new protocols and formats, as well as existing formats deployed in new contexts, this specification requires (and defines) the UTF-8 encoding.

符号化法には，他のもの（旧来の符号化法）もあり、過去にある程度までは定義されたが， ~UA間で常に同じ仕方で実装されているとは限らない。また、常に同じ~labelを利用するとは限らず，［符号化法の中の未定義な区画, あるいはかつての~proprietaryな区画への~~対処］も相違することが多い。この仕様は、［新たな実装が符号化法~実装を~reverse-engineerせずに済む］よう，および［既存の~UAが一つに収束できる］よう，これらの隔たりを埋めることに取組む。 ◎ The other (legacy) encodings have been defined to some extent in the past. However, user agents have not always implemented them in the same way, have not always used the same labels, and often differ in dealing with undefined and former proprietary areas of encodings. This specification addresses those gaps so that new user agents do not have to reverse engineer encoding implementations and existing user agents can converge.

特に，この仕様は、それらすべての符号化法を，各~符号化法の［ `~byte列$と`~scalar値$列を相互に変換する~algo, 正準的な名前, 識別-用の`~label$たち］とともに定義する。また、符号化法を成す各種~algoのうち一部を~JSに公開する~APIも定義する。 ◎ In particular, this specification defines all those encodings, their algorithms to go from bytes to scalar values and back, and their canonical names and identifying labels. This specification also defines an API to expose part of the encoding algorithms to JavaScript.

~UAは、すでに `IANA Character Sets＠~IANA-a/character-sets/character-sets.xhtml$en ~registryに挙げられている~labelからも有意に逸脱している。旧来の符号化法をこれ以上~拡散させないため、この仕様は前述の詳細~について網羅的であり， ~registryはもう不要である。特に，この仕様は、符号化法を拡張するための仕組みは供さない。 ◎ User agents have also significantly deviated from the labels listed in the IANA Character Sets registry. To stop spreading legacy encodings further, this specification is exhaustive about the aforementioned details and therefore has no need for the registry. In particular, this specification does not provide a mechanism for extending any aspect of encodings.

2. ~securityに関する背景0

符号化法には、いくつか，~securityの課題がある — 生産器と消費器の間で，［利用-中にある符号化法, あるいは所与の符号化法の実装-法］について合意されていないときに。例えば、 2011 年には，次のような攻撃が報告された：そこでは、［攻撃者が何らかの~fieldを制御し得るような，~JSON資源］内で， `Shift_JIS$n の頭部~byte `82^X が尾部~byte `22^X を “隠す” ために利用された。生産器からは，これが違法な~byte対であっても問題が見えない一方で、消費器は，この~byte対を 1 個の `FFFD^U1 として復号する~~結果、全体的な解釈が変わってしまう — `0022^U1 は重要な区切子なので。［ `~scalar値$に対し複数~byteを利用する符号化法］の復号器には、今や，違法な~byte対の事例では［範囲 `0000^U 〜 `007F^U に入る~scalar値］を “隠せない” ようにすることが要求される — 前述の~byte対に対しては、出力が［ `FFFD^U1, `0022^U1 ］になるよう（あいにく，これには例外があり、 `~gb18030復号器$は，`~EoQ$にあるそのような~byte 1 個を “隠して” しまう）。 ◎ There is a set of encoding security issues when the producer and consumer do not agree on the encoding in use, or on the way a given encoding is to be implemented. For instance, an attack was reported in 2011 where a Shift_JIS leading byte 0x82 was used to “mask” a 0x22 trailing byte in a JSON resource of which an attacker could control some field. The producer did not see the problem even though this is an illegal byte combination. The consumer decoded it as a single U+FFFD (�) and therefore changed the overall interpretation as U+0022 (") is an important delimiter. Decoders of encodings that use multiple bytes for scalar values now require that in case of an illegal byte combination, a scalar value in the range U+0000 to U+007F, inclusive, cannot be “masked”. For the aforementioned sequence the output would be U+FFFD U+0022. (As an unfortunate exception to this, the gb18030 decoder will “mask” up to one such byte at end-of-queue.)
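
The "cannot be masked" rule can be illustrated with a toy decoder. The following is a minimal sketch, not the Shift_JIS decoder defined by this specification: the function names, the `table` argument, and the simplified handling of single bytes are assumptions made for illustration. When a lead byte is followed by a byte that does not form a legal pair, the sketch emits U+FFFD and, if the second byte is an ASCII byte, leaves it in place so that it is decoded on its own.

```python
REPLACEMENT = "\ufffd"


def is_lead(byte: int) -> bool:
    """Lead-byte ranges borrowed from Shift_JIS (0x81-0x9F, 0xE0-0xFC)."""
    return 0x81 <= byte <= 0x9F or 0xE0 <= byte <= 0xFC


def decode_two_byte(data: bytes, table: dict) -> str:
    """Toy decoder: `table` maps (lead, trail) pairs to characters (hypothetical)."""
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        i += 1
        if byte < 0x80:
            out.append(chr(byte))        # ASCII byte decodes to itself
            continue
        if not is_lead(byte):
            out.append(REPLACEMENT)      # toy simplification: no single-byte non-ASCII
            continue
        if i == len(data):
            out.append(REPLACEMENT)      # lead byte at the end of the input
            break
        trail = data[i]
        char = table.get((byte, trail))
        if char is not None:
            out.append(char)
            i += 1                       # legal pair: consume the trail byte
        else:
            out.append(REPLACEMENT)
            if trail >= 0x80:
                i += 1                   # non-ASCII trail byte is consumed with the error
            # An ASCII trail byte is NOT consumed, so it cannot be "masked".
    return "".join(out)


# The closing quote after 0x82 survives, so the JSON structure is unchanged.
masked_attempt = decode_two_byte(b'{"a":"\x82"}', {})
legal_pair = decode_two_byte(b"\x82\xa0", {(0x82, 0xA0): "\u3042"})
```

With an empty table, `masked_attempt` is `{"a":"\ufffd"}`: the attacker-controlled 0x82 becomes U+FFFD, while the following 0x22 still terminates the string.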

これは、より~~一般的には，［頭部~byteが伴われないときに，`~ASCII~byte$を`~ASCII~cp$でない何かへ対応付ける］ような符号化法における課題である。これらは， “~ASCII互換でない” 符号化法であり、あいにく，配備-済みな内容に因り要求されるが、［ `ISO-2022-JP$n, `UTF-16BE/LE$n ］以外のものは，~supportされない。（他のそのような符号化法についても、その~labelを［未知な符号化法へ~fallbackすることなく， `replacement$n 符号化法に対応付けられるかどうか］の究明が`進行中にある＠https://github.com/whatwg/encoding/issues/8$。）攻撃の例として、注意深く細工された内容を資源の中へ注入して，利用者に符号化法を上書きするよう促す~~結果、例えば，~scriptの実行へ至らすものがある。 ◎ This is a larger issue for encodings that map anything that is an ASCII byte to something that is not an ASCII code point, when there is no leading byte present. These are “ASCII-incompatible” encodings and other than ISO-2022-JP and UTF-16BE/LE, which are unfortunately required due to deployed content, they are not supported. (Investigation is ongoing whether more labels of other such encodings can be mapped to the replacement encoding, rather than the unknown encoding fallback.) An example attack is injecting carefully crafted content into a resource and then encouraging the user to override the encoding, resulting in, e.g., script execution.

［ ~HTMLや~HTMLの~form特能］において見出される~URLに利用される符号化器も、その符号化法により表現できない~scalar値がある場合には，若干の情報~喪失に至らせ得る。例えば，資源が `windows-1252$n 符号化法を利用しているとき、 ~serverは，末端利用者が~formに手入力した "💩" と "&#128169;" とを判別できなくなる。 ◎ Encoders used by URLs found in HTML and HTML’s form feature can also result in slight information loss when an encoding is used that cannot represent all scalar values. E.g., when a resource uses the windows-1252 encoding a server will not be able to distinguish between an end user entering “💩” and “&#128169;” into a form.
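
The information loss can be reproduced with Python's standard codecs. This is a sketch under two assumptions: Python's `cp1252` codec stands in for the windows-1252 encoder, and the `xmlcharrefreplace` error handler stands in for the "html" error mode (both emit a decimal numeric character reference for an unencodable code point).

```python
# What the user typed: the emoji itself, which windows-1252 cannot represent.
typed_emoji = "\U0001F4A9".encode("cp1252", errors="xmlcharrefreplace")

# What another user typed: the literal nine characters "&#128169;".
typed_reference = "&#128169;".encode("cp1252")

# Both submissions arrive at the server as the same bytes.
indistinguishable = typed_emoji == typed_reference
```

Encoding the same two inputs as UTF-8 yields different byte sequences, which is why the problem disappears there.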

ここに要旨した問題は、 ~UTF-8を排他的に利用しているときは，消え去る。それが、今や，すべてに対し~UTF-8符号化法が義務付けられている理由の一つである。 ◎ The problems outlined here go away when exclusively using UTF-8, which is one of the many reasons that is now the mandatory encoding for all things.

注記： `§ ~browser~UI＠#browser-ui$ も見よ。 ◎ See also the Browser UI chapter.

3. 各種用語

この仕様は、 `Infra Standard^cite `INFRA$r に依存する。 ◎ This specification depends on the Infra Standard. [INFRA]

16 進数には "0x" が接頭される。 ◎ Hexadecimal numbers are prefixed with "0x".

算術式の中のすべての数値は整数であり、各種~演算は，次に挙げる記号で表現される：

記号	意味
~PLUS	加算
~MINUS	減算
~INCBY	左辺~値に対する右辺~値による加算【この訳による追加】
~DECBY	左辺~値に対する右辺~値による減算【この訳による追加】
~MUL	乗算
~DIV	整数の除算【小数切り捨て】
~MOD	整数の除算の剰余（ modulo ）【常に 0 以上（負な数には演算され得ない）】
~Lshift	論理-左~shift
~Rshift	論理-右~shift
~bAND	~bit AND
~bOR	~bit OR

◎ In equations, all numbers are integers, addition is represented by "+", subtraction by "−", multiplication by "×", integer division by "/" (returns the quotient), modulo by "%" (returns the remainder of an integer division), logical left shifts by "<<", logical right shifts by ">>", bitwise AND by "&", and bitwise OR by "|".

論理-右~shiftの演算対象の精度は、少なくとも 21 ~bit以上にするモノトスル。 ◎ For logical right shifts operands must have at least twenty-one bits precision.

`入出力~queue@ （ `I/O queue^en ）は、特定0の型（ `~byte$, `~scalar値$, `~cu$ など）の`~item$たちが成す`~list$である。 ~item型も含めて指定するときは、 “`入出力~queue$`~byte^tA” 等々と記される†。【†この表記規約は、この訳に特有。】 ◎ An I/O queue is a type of list with items of a particular type (i.e., bytes or scalar values).＼

`入出力~queue$は、特別な`~item$として `~EoQ@ （ `end-of-queue^en ）も含み得る — それは、当の~queue内には，それより後に`~item$は無いことを徴す。【`~EoQ$の型は、形式的に，それが属する`入出力~queue$の~item型と見なされる。】 ◎ End-of-queue is a special item that can be present in I/O queues of any type and it signifies that there are no more items in the queue.

注記： `入出力~queue$は、 2 つの仕方 — “~streamしている” ~mode，していない~mode — で利用され、順に［ ~networkから来ている~data, ~memory内に格納された入出力~data ］を表現する。 ~streamしていない~queue内には、最後の~itemとして【常に】`~EoQ$が在る。一方で，~streamしている`入出力~queue$は： ◎ There are two ways to use an I/O queue: in immediate mode, to represent I/O data stored in memory, and in streaming mode, to represent data coming in from the network. Immediate queues have end-of-queue as their last item, whereas streaming queues＼

`~EoQ$は無いこともあり、［ `入出力~queueから~itemを読取る$／ `入出力~queueから~item列を読取る$ ］演算は，【何らかの~itemが可用になるまで】阻まれるかもしれない。 ◎ need not have it, and so their read operation might block.
次の順に演算されるものと期待される ⇒＃空として作成される, ~networkから~dataが来るに伴い，新たな`~item$が`~push$ioQされる, 下層の~network~streamが~closeされるとき，`~EoQ$が`~push$ioQされる ◎ It is expected that streaming I/O queues will be created empty, and that new items will be pushed to it as data comes in from the network. When the underlying network stream closes, an end-of-queue item is to be pushed into the queue.
そこから読取るときは，阻まれるかもしれないので、 `~event~loop$からは利用されず，代わりに`並列的$に利用される。 ◎ Since reading from a streaming I/O queue might block, streaming I/O queues are not to be used from an event loop. They are to be used in parallel instead.

`入出力~queueから~itemを読取る@ ~algoは、所与の ( `入出力~queue$ %入出力~queue ) に対し： ◎ To read an item from an I/O queue ioQueue, run these steps:

~IF［ %入出力~queue は`空$である］ ⇒ 次が満たされるまで待機する ⇒ %入出力~queue の`~size$ ~GTE 1 ◎ If ioQueue is empty, then wait until its size is at least 1.
%結果 ~LET %入出力~queue[ 0 ] ◎ ↓
~IF［ %結果 ~NEQ `~EoQ$ ］ ⇒ %入出力~queue から最初の~itemを`除去する$ ◎ If ioQueue[0] is end-of-queue, then return end-of-queue. ◎ Remove ioQueue[0] and return it.
~RET %結果 ◎ ↑

`入出力~queueから~item列を読取る@ ~algoは、所与の ( `入出力~queue$ %入出力~queue, 無符号整数 %個数 ) に対し： ◎ To read a number number of items from ioQueue, run these steps:

~Assert：［ %個数は負でない整数である］~OR［ %個数 ~EQ `不定^i ］ ◎ ↑
%結果 ~LET « » ◎ Let readItems be « ».
~WHILE［ %結果の`~size$ ~NEQ %個数］：
1. %~item ~LET `入出力~queueから~itemを読取る$( %入出力~queue )
2. ~IF［ %~item ~EQ `~EoQ$ ］ ⇒ ~BREAK
3. %結果に %~item を`付加する$
◎ Perform the following step number times: • Append to readItems the result of reading an item from ioQueue. ◎ Remove end-of-queue from readItems.
~RET %結果 ◎ Return readItems.

【［ `入出力~queueから~itemを読取る$, `入出力~queueから~item列を読取る$ ］~algoは、原文では同じ［名前, ~ID ］を伴う 2 個の~algoとして定義されているが，この訳では異なる［名前, ~ID ］を与えることにする。】
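
The two read algorithms above can be sketched as follows. This is a minimal illustration that assumes an immediate (non-streaming) I/O queue, represented as a Python list whose last item is a sentinel, so no waiting is ever needed. The names `END_OF_QUEUE`, `read_item`, and `read_items` are hypothetical, and `None` stands in for an indefinite number.

```python
END_OF_QUEUE = object()  # sentinel standing in for end-of-queue


def read_item(io_queue: list):
    """Return the first item; end-of-queue is returned but never removed."""
    item = io_queue[0]
    if item is not END_OF_QUEUE:
        io_queue.pop(0)
    return item


def read_items(io_queue: list, number=None) -> list:
    """Read up to `number` items (None = indefinite), stopping at end-of-queue."""
    result = []
    while number is None or len(result) != number:
        item = read_item(io_queue)
        if item is END_OF_QUEUE:
            break
        result.append(item)
    return result


queue = [0x41, 0x42, 0x43, END_OF_QUEUE]
first_two = read_items(queue, 2)   # removes 0x41 and 0x42
rest = read_items(queue, 5)        # only 0x43 is left before end-of-queue
```

Because end-of-queue stays in the queue, reading from an exhausted queue keeps returning it rather than failing.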

`入出力~queueを覗見る@ ~algoは、所与の ( `入出力~queue$ %入出力~queue, 無符号整数 %個数 ) に対し： ◎ To peek a number number of items from an I/O queue ioQueue, run these steps:

次が満たされるまで待機する ⇒ ［ %入出力~queue の`~size$ ~GTE %個数］~OR［ `~EoQ$ ~IN %入出力~queue ］ ◎ Wait until either ioQueue’s size is equal to or greater than number, or ioQueue contains end-of-queue, whichever comes first.
%接頭辞 ~LET « » ◎ Let prefix be « ».
`範囲$ { 0 〜 %個数 ~MINUS 1 }【！range 1 to number】を成す ~EACH( %n ) に対し： ◎ For each n in the range 1 to number, inclusive:
1. ~IF［ %入出力~queue[ %n ] ~EQ `~EoQ$ ］ ⇒ ~BREAK ◎ If ioQueue[n] is end-of-queue, break.
2. %接頭辞に %入出力~queue[ %n ] を`付加する$ ◎ Otherwise, append ioQueue[n] to prefix.
~RET %接頭辞 ◎ Return prefix.
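
Peeking differs from reading in that nothing is removed. A minimal sketch, again assuming an immediate queue held in a Python list that ends with a sentinel (the names `END_OF_QUEUE` and `peek` are hypothetical):

```python
END_OF_QUEUE = object()  # sentinel standing in for end-of-queue


def peek(io_queue: list, number: int) -> list:
    """Return up to `number` leading items without removing them."""
    prefix = []
    for n in range(number):          # indices 0 .. number - 1
        if io_queue[n] is END_OF_QUEUE:
            break
        prefix.append(io_queue[n])
    return prefix


queue = [0xEF, 0xBB, END_OF_QUEUE]
short_prefix = peek(queue, 3)   # the queue ends before three items are available
```

The loop runs over indices 0 to number minus 1, matching the zero-based indexing used by the read algorithm.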

`入出力~queueに~pushする@ ~algoは、所与の ( `入出力~queue$ %入出力~queue, %~item列 ) に対し：

~Assert：［ %~item列は 1 個の~itemであるか［ ~itemたちが成す連列］である］~AND［ %~item列を成すどの~itemも，その型は %入出力~queue の~item型である］
%~item列を成す ~EACH( %~item ) に対し：
1. %最後の~index ~LET %入出力~queue の`~size$ ~MINUS 1
2. ~IF［ %入出力~queue[ %最後の~index ] ~EQ `~EoQ$ ］：
  1. ~IF［ %~item ~NEQ `~EoQ$ ］ ⇒ %入出力~queue の中へ %~item を %最後の~index の前に`挿入する$
3. ~ELSE ⇒ %入出力~queue に %~item を`付加する$

◎ To push an item item to an I/O queue ioQueue, run these steps: • If the last item in ioQueue is end-of-queue: •• If item is end-of-queue, do nothing. •• Otherwise, insert item before the last item in ioQueue. • Otherwise, append item to ioQueue. ◎ To push a sequence of items to an I/O queue ioQueue is to push each item in the sequence to ioQueue, in the given order.
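
The push algorithm keeps end-of-queue as the last item once it has been pushed. A minimal sketch, assuming the queue is a Python list and that a single item is anything that is not a list or tuple (the names `END_OF_QUEUE` and `push` are hypothetical):

```python
END_OF_QUEUE = object()  # sentinel standing in for end-of-queue


def push(io_queue: list, items) -> None:
    """Push one item or a sequence of items, keeping end-of-queue last."""
    if not isinstance(items, (list, tuple)):
        items = [items]                       # treat a single item as a sequence
    for item in items:
        if io_queue and io_queue[-1] is END_OF_QUEUE:
            if item is not END_OF_QUEUE:      # pushing end-of-queue twice does nothing
                io_queue.insert(len(io_queue) - 1, item)
        else:
            io_queue.append(item)


queue = []
push(queue, [1, 2])
push(queue, END_OF_QUEUE)
push(queue, 3)              # lands before end-of-queue
push(queue, END_OF_QUEUE)   # no effect
```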

`入出力~queueに格納し直す@ ~algoは、所与の ( `入出力~queue$ %入出力~queue, %~item列 ) に対し：

~Assert：
- %~item列は 1 個の~itemであるか［ ~itemたちが成す連列］である
- %~item列内に`~EoQ$は無い
- %~item列を成すどの~itemも，その型は %入出力~queue の~item型である
%入出力~queue の先頭に %~item列を — ~itemたちの順序を~~保ったまま — 挿入する

◎ To restore an item other than end-of-queue to an I/O queue, perform the list prepend operation. To restore a list of items excluding end-of-queue to an I/O queue, insert those items, in the given order, before the first item in the queue.

入出力~queue`~byte^tA « `92^X, `A9^X, ~EoQ » に ~byte列 « `F0^X, `9F^X » を挿入した【！Inserting】なら、結果の入出力~queueは « `F0^X, `9F^X, `92^X, `A9^X, ~EoQ » になり，次回に読取られる~itemは `F0^X になる。 ◎ Inserting the bytes « 0xF0, 0x9F » in an I/O queue « 0x92 0xA9, end-of-queue », results in an I/O queue « 0xF0, 0x9F, 0x92 0xA9, end-of-queue ». The next item to be read would be 0xF0.
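
The example above can be reproduced with a short sketch. It assumes the queue is a Python list ending with a sentinel; the names `END_OF_QUEUE` and `restore` are hypothetical.

```python
END_OF_QUEUE = object()  # sentinel standing in for end-of-queue


def restore(io_queue: list, items) -> None:
    """Put items back at the front of the queue, preserving their order."""
    if not isinstance(items, (list, tuple)):
        items = [items]
    assert all(item is not END_OF_QUEUE for item in items)
    io_queue[0:0] = items    # slice assignment inserts before the first item


queue = [0x92, 0xA9, END_OF_QUEUE]
restore(queue, [0xF0, 0x9F])
next_item = queue[0]         # the item that would be read next
```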

`入出力~queue$ %入出力~queue を［ `~list$／`文字列$／`~byte列$ ］に `変換する@ ~algoは ⇒ ~RET `入出力~queueから~item列を読取る$( %入出力~queue, `不定^i ) ◎ To convert an I/O queue ioQueue into a list, string, or byte sequence, return the result of reading an indefinite number of items from ioQueue.

`入出力~queueに変換する@ ~algoは、所与の ( %入力 ) に対し：

~Assert： %入力は次に挙げるいずれかである ⇒＃ `文字列$（ `DOMString^I ）／ `~scalar値~文字列$（ `USVString^I ）／ `~byte列$／ `~list$
~IF［ %入力は`~list$である］ ⇒ ~Assert ⇒ ［ %入力を成すすべての~itemは同じ型である］~AND［ `~EoQ$ ~NIN %入力］
%入出力~queue ~LET 新たな`入出力~queue$`入力 を成す~itemの型^tA
%入力を成す ~EACH( %~item ) に対し ⇒ %入出力~queue に %~item を`付加する$
%入出力~queue に`~EoQ$を`付加する$
~RET %入出力~queue

◎ To convert a list, string, or byte sequence input into an I/O queue, run these steps: • Assert: input is not a list or it does not contain end-of-queue. • Return an I/O queue containing the items in input, in order, followed by end-of-queue.
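
Both conversions can be sketched in a few lines. The names below are hypothetical, and one simplification is assumed: iterating a Python `str` yields code points rather than the code units of a DOMString, so the string case is only an approximation.

```python
END_OF_QUEUE = object()  # sentinel standing in for end-of-queue


def to_io_queue(value) -> list:
    """Convert a list, string, or byte sequence into an immediate I/O queue."""
    items = list(value)
    assert all(item is not END_OF_QUEUE for item in items)
    return items + [END_OF_QUEUE]


def from_io_queue(io_queue: list) -> list:
    """Read an indefinite number of items, leaving only end-of-queue behind."""
    items = []
    while io_queue[0] is not END_OF_QUEUE:
        items.append(io_queue.pop(0))
    return items


queue = to_io_queue(b"\xf0\x9f")
round_trip = bytes(from_io_queue(queue))
```

The caller decides how to repackage the returned items, for example with `bytes(...)` for a byte sequence or `"".join(...)` for a string.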

`INFRA$r 標準が型~変換~周りの何らかの基盤を定義するものと期待される。 `whatwg/infra 課題 #319＠https://github.com/whatwg/infra/issues/319$ を見よ。 ◎ The Infra standard is expected to define some infrastructure around type conversions. See whatwg/infra issue #319. [INFRA]

注記： `入出力~queue$が`~queue$ではなく`~list$として定義されているのは、 `格納し直す演算$を要するからである。しかしながら，この演算は、この仕様が与える~algoの内部的な詳細であり，他の標準からは利用されない。実装は、そのような~algoを代替な仕方を見出して実装してもかまわない — 詳細は、 `§ 実装の考慮点＠#implementation-considerations$ に。 ◎ I/O queues are defined as lists, not queues, because they feature a restore operation. However, this restore operation is an internal detail of the algorithms in this specification, and is not to be used by other standards. Implementations are free to find alternative ways to implement such algorithms, as detailed in Implementation considerations.

`~surrogate対から~scalar値を得する@ ~algoは、所与の ( `頭部~surrogate$ %頭部, `尾部~surrogate$ %尾部 ) に対し ⇒ ~RET `10000^X ~PLUS ( ( %頭部 ~MINUS `D800^X ) ~Lshift 10 ) ~PLUS ( %尾部 ~MINUS `DC00^X ) ◎ To obtain a scalar value from surrogates, given a leading surrogate leading and a trailing surrogate trailing, return 0x10000 + ((leading − 0xD800) << 10) + (trailing − 0xDC00).
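
The formula translates directly into code. In this sketch the function name is hypothetical and the range assertions are an added precaution, not part of the algorithm as stated.

```python
def scalar_from_surrogates(leading: int, trailing: int) -> int:
    """Combine a leading and a trailing surrogate into one scalar value."""
    assert 0xD800 <= leading <= 0xDBFF    # leading surrogate range
    assert 0xDC00 <= trailing <= 0xDFFF   # trailing surrogate range
    return 0x10000 + ((leading - 0xD800) << 10) + (trailing - 0xDC00)


# U+D83D U+DCA9 is the UTF-16 surrogate pair for U+1F4A9.
pile_of_poo = scalar_from_surrogates(0xD83D, 0xDCA9)
```

Worked through: (0xD83D − 0xD800) << 10 is 0xF400, 0xDCA9 − 0xDC00 is 0xA9, and 0x10000 + 0xF400 + 0xA9 is 0x1F4A9.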

【この訳に特有な表記規約】

◎表記記号

加えて、次に挙げる記法も利用される：

~byte列 « %n1, %n2, … »: 数値として %n1, %n2 … と同じ値をとる`~byte$たちが成す, 挙げられた順による，新たな`~item$列を表す（括弧の中が空な “« »” と記されたときは、空な`~item$列を表す）。
~byte « %n »: ~byte列 « %n » と同義だが、 ~itemが 1 個だけの場合は，このように記される。
~cp « %n »: 数値として %n と同じ値をとる 1 個の`~cp$からなる，新たな`~item$列を表す。

（原文では，~itemが 1 個だけの場合は — 1 個の~itemをそれのみからなる~item列として透過的に扱う仕組み（`入出力~queue$に対する各種~演算を見よ）を利用して — 括弧（ «, » ）で括らずに記されているが、この訳では — “同じ値をとる” という含意を簡潔に表すため — 括弧で括ることにする。）（ “~cp列 « … »” という記法が無いのは、複数個の~cpからなる~item列が利用される所が，一箇所しかないからである。）

`Uint8Array ~objを作成する＠#create-a-uint8array-object@ ~algoは… ◎ To create a Uint8Array object, given an I/O queue ioQueue and a realm realm: • Let bytes be the result of converting ioQueue into a byte sequence. • Return the result of creating a Uint8Array object from bytes in realm.

【この訳では、この~algoを利用しない — 代わりに，この~algoが利用する`~buffer~sourceを作成する$~algo `WEBIDL$r を直に呼出す。】

4. 符号化法

`符号化法@ （ `encoding^en ）は、 `~scalar値$ 列から~byte列への対応付け【符号化-】および逆方向への対応付け【復号-】を定義する。各 `符号化法$には、 `名前@ および， 1 個~以上の `~label@ が`あてがわれている＠#encoding-labels$。 ◎ An encoding defines a mapping from a scalar value sequence to a byte sequence (and vice versa). Each encoding has a name, and one or more labels.

注記：この仕様は、 ~Unicode標準に定義される符号化~scheme（ `encoding scheme^en ）として，同じ名前を伴う 3 種の`符号化法$ — `UTF-8$n, `UTF-16LE$n, `UTF-16BE$n — を定義する。 `符号化法$は、 ~BOM（ `byte order mark^en, “バイト順マーク” ）の取扱いにおいて符号化~schemeから相違する — ~BOMの取扱いは、この仕様においては［ `符号化法$自身の一部を成す代わりに，それを包装する~algoの一部を成している］一方で， ~Unicode標準においては符号化~schemeの定義の一部を成す。 `~UTF-8復号する$~algoと一緒に利用される `UTF-8$n は、同じ名前の符号化~schemeに合致する。この仕様は、同様に命名される符号化~schemeに合致するような［ `UTF-16LE$n ／ `UTF-16BE$n ］と組合せて包装する~algoは，供さない。 `UNICODE$r ◎ This specification defines three encodings with the same names as encoding schemes defined in the Unicode standard: UTF-8, UTF-16LE, and UTF-16BE. The encodings differ from the encoding schemes by byte order mark (also known as BOM) handling not being part of the encodings themselves and instead being part of wrapper algorithms in this specification, whereas byte order mark handling is part of the definition of the encoding schemes in the Unicode Standard. UTF-8 used together with the UTF-8 decode algorithm matches the encoding scheme of the same name. This specification does not provide wrapper algorithms that would combine with UTF-16LE and UTF-16BE to match the similarly-named encoding schemes. [UNICODE]

4.1. 符号化器と復号器

各 `符号化法$には、 `復号器@ （ `decoder^en ）が結付けられ， `符号化器@ （ `encoder^en ）が結付けられ得る。 ◎ Each encoding has an associated decoder and most of them have an associated encoder.＼

［ `復号器$ ／ `符号化器$ ］の各~instanceは、 `~handler@ ~algoが結付けられることに加え，状態も伴い得る。【状態を伴うがゆえに、状態が異なるそれらを別個な~instanceとして扱う必要がある。】 ◎ Instances of decoders and encoders have a handler algorithm and might also have state.＼

`~handler$は、所与の ( `入出力~queue$, 1 個の`~item$ ) に対し，次に挙げるいずれかを返す~algoである：

`完遂d@i
1 個以上の`~item$

【 ~item型は、［符号化器の場合は`~byte$ ／復号器の場合は`~cp$ ］になる。】【 `~Big5復号器$用の~handlerだけ， 2 個の~cpを返す場合があり、他の復号器~用の~handlerは，常に 1 個の~cpからなる~item列を返す。】
`~error@i

`符号化器$用の`~handler$が返す `~error$i は、常に，~cpを伴う（`復号器$用の`~handler$が返す `~error$i が~cpを伴うことは無い）。所与の`~cp$ %~cp を伴う `~error$i を作成する所では、 “`~error$i( %~cp )” のように表記される。

【これらの記述は、この訳による補完 — 原文では、（~errorは，） “省略可能な`~cp$も伴い得る” としか記されていない。】
`継続-@i

◎ A handler algorithm takes an input I/O queue and an item, and returns finished, one or more items, error optionally with a code point, or continue.

注記：次に挙げる`符号化法$には、 `符号化器$は無い ⇒＃ `replacement$n, `UTF-16BE/LE$n ◎ The replacement and UTF-16BE/LE encodings have no encoder.

以下で利用される `~error~mode@ は： ◎ An error mode as used below is＼

`復号器$においては、次のいずれかをとる ⇒ `replacement^l, `fatal^l ◎ "replacement" or "fatal" for a decoder and＼
`符号化器$においては、次のいずれかをとる ⇒ `fatal^l, `html^l ◎ "fatal" or "html" for an encoder.

注記： ~XML処理器は、その`復号器$の`~error~mode$を `fatal^l に設定することになる。 `XML$r ◎ An XML processor would set error mode to "fatal". [XML]

注記： `~error~mode$に `html^l が存在するわけは、 ~HTML~formにおいては， `~error$i に際しても旧来の`符号化器$は終了させない取扱いが要求されることに因る。 `html^l `~error~mode$の下では、合法な入力と判別できない連列が発され得る結果，~~検知されずに~dataが失われ得る。これを防ぐため、開発者には `UTF-8$n `符号化法$の利用が強く奨励される。 `HTML$r ◎ "html" exists as error mode due to HTML forms requiring a non-terminating legacy encoder. The "html" error mode causes a sequence to be emitted that cannot be distinguished from legitimate input and can therefore lead to silent data loss. Developers are strongly encouraged to use the UTF-8 encoding to prevent this from happening. [HTML]

`~queueを処理する@ ~algoは、所与の ⇒＃ `符号化法$の［ `復号器$ ／ `符号化器$ ］の~instance %~coder, `入出力~queue$ %入力, `入出力~queue$ %出力, `~error~mode$ %~mode ◎終に対し： ◎ To process a queue given an encoding’s decoder or encoder instance encoderDecoder, I/O queue input, I/O queue output, and error mode mode:

~WHILE 無条件： ◎ While true:
1. %~item ~LET `入出力~queueから~itemを読取る$( %入力 ) ◎ ↓
2. %結果 ~LET `~itemを処理する$( %~item, %~coder, %入力, %出力, %~mode ) ◎ Let result be the result of processing an item with the result of reading from input, encoderDecoder, input, output, and mode.
3. ~IF［ %結果 ~NEQ `継続-$i ］ ⇒ ~RET %結果 ◎ If result is not continue, then return result.

`~itemを処理する@ ~algoは、所与の ⇒＃ `~item$ %~item, `符号化法$の［ `符号化器$／`復号器$ ］の~instance %~coder, `入出力~queue$ %入力, `入出力~queue$ %出力, `~error~mode$ %~mode ◎終に対し： ◎ To process an item given an item item, encoding’s encoder or decoder instance encoderDecoder, I/O queue input, I/O queue output, and error mode mode:

~IF［ %~coder は`符号化器$の~instanceである］ ⇒ ~Assert ⇒ ［ %~mode ~NEQ `replacement^l ］~AND［ %~item は`~surrogate$ではない］ ◎ Assert: encoderDecoder is not an encoder instance or mode is not "replacement". ◎ Assert: encoderDecoder is not a decoder instance or mode is not "html". ◎ Assert: encoderDecoder is not an encoder instance or item is not a surrogate.
~IF［ %~coder は`復号器$の~instanceである］ ⇒ ~Assert ⇒ %~mode ~NEQ `html^l ◎ ↑
%結果 ~LET %~coder の`~handler$( %入力, %~item ) ◎ Let result be the result of running encoderDecoder’s handler on input and item.
~IF［ %結果 ~EQ `完遂d$i ］ ⇒＃ `入出力~queueに~pushする$( %出力, « `~EoQ$ » )； ~RET %結果 ◎ If result is finished: • Push end-of-queue to output. • Return result.
~IF［ %結果は 1 個~以上の`~item$からなる］： ◎ Otherwise, if result is one or more items:
1. ~IF［ %~coder は`復号器$の~instanceである］ ⇒ ~Assert ⇒ %結果は`~surrogate$を包含しない。 ◎ Assert: encoderDecoder is not a decoder instance or result does not contain any surrogates.
2. `入出力~queueに~pushする$( %出力, %結果 ) ◎ Push result to output.
~ELIF［ %結果は `~error$i である］： ◎ Otherwise, if result is an error,＼
1. %~mode に応じて： ◎ switch on mode and run the associated steps:
  - `replacement^l ⇒ `入出力~queueに~pushする$( %出力, « `FFFD^U1 » ) ◎ "replacement" • Push U+FFFD (�) to output.
  - `html^l：
    1. %数字列 ~LET ［ %結果を成す`~cp$ ］の`値$cpを基数 10 で最短に表現するような， 1 個以上の［ `30^X `0^smb 〜 `39^X `9^smb ］たちが成す~byte列
    2. `入出力~queueに~pushする$( %出力, 次の並びが成す~byte列 ) ⇒＃ `26^X `&^smb, `23^X `#^smb, %数字列, `3B^X `;^smb
    ◎ "html" • Push 0x26 (&), 0x23 (#), followed by the shortest sequence of 0x30 (0) to 0x39 (9), inclusive, representing result’s code point’s value in base ten, followed by 0x3B (;) to output.
  - `fatal^l ⇒ ~RET %結果 ◎ "fatal" • Return result.
~RET `継続-$i ◎ Return continue.
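
The interplay between a handler, the error modes, and the two processing algorithms can be sketched as below. Everything here is an illustration under stated assumptions: queues are Python lists, the output queue starts empty so a plain append replaces the push algorithm, and `ascii_encoder_handler` is a hypothetical toy handler rather than an encoder defined by this specification.

```python
END_OF_QUEUE = object()   # sentinel standing in for end-of-queue
FINISHED = "finished"
CONTINUE = "continue"


class Error:
    """Error result, optionally carrying the offending code point."""
    def __init__(self, code_point=None):
        self.code_point = code_point


def ascii_encoder_handler(io_queue, item):
    """Toy encoder handler: ASCII passes through, anything else is an error."""
    if item is END_OF_QUEUE:
        return FINISHED
    if item < 0x80:
        return [item]
    return Error(item)


def process_item(item, handler, io_queue, output, mode):
    result = handler(io_queue, item)
    if result is FINISHED:
        output.append(END_OF_QUEUE)
        return result
    if isinstance(result, list):
        output.extend(result)
    elif isinstance(result, Error):
        if mode == "replacement":
            output.append(0xFFFD)
        elif mode == "html":
            # "&#", the shortest decimal digits, then ";" as bytes
            output.extend(b"&#" + str(result.code_point).encode("ascii") + b";")
        elif mode == "fatal":
            return result
    return CONTINUE


def process_queue(handler, io_queue, output, mode):
    while True:
        item = io_queue[0]
        if item is not END_OF_QUEUE:
            io_queue.pop(0)
        result = process_item(item, handler, io_queue, output, mode)
        if result is not CONTINUE:
            return result


html_out = []
html_result = process_queue(ascii_encoder_handler, [0x61, 0x1F4A9, END_OF_QUEUE], html_out, "html")

fatal_out = []
fatal_result = process_queue(ascii_encoder_handler, [0x61, 0x1F4A9, END_OF_QUEUE], fatal_out, "fatal")
```

In "html" mode the run finishes and the output holds the bytes of `a&#128169;` followed by end-of-queue. In "fatal" mode processing stops at the first error and the error is returned to the caller.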

4.2. 名前と~label

~UAは、下の表tに挙げる各［ `符号化法$, それ用の`~label$すべて］を~supportするモノトスル — 他の`符号化法$や`~label$は~supportしないモノトスル。 ◎ The table below lists all encodings and their labels user agents must support. User agents must not support any other encodings or labels.

注記：どの符号化法についても、次が満たされる ⇒ その`名前$を`~ASCII小文字~化$した結果 ~IN それ用の`~label$たちが成す集合 ◎ For each encoding, ASCII-lowercasing its name yields one of its labels.

【加えて、異なる符号化法~用の~labelどうしが一致することはない。】

作者は、 `UTF-8$n `符号化法$を利用しなければナラナイ — その利用が識別されるよう，それ用の`~label$のうち `utf-8^lb （`~ASCII大小無視$）を利用しなければナラナイ。 ◎ Authors must use the UTF-8 encoding and must use its (ASCII case-insensitive) "utf-8" label to identify it.

［新たな~protocol／新たな形式／新たな文脈において配備される既存の形式］には、 `UTF-8$n `符号化法$を排他的に利用しなければナラナイ。そのような［ ~protocolや形式］が公開する`符号化法$の［ `名前$／`~label$ ］は、 `utf-8^lb でなければナラナイ。 ◎ New protocols and formats, as well as existing formats deployed in new contexts, must use the UTF-8 encoding exclusively. If these protocols and formats need to expose the encoding’s name or label, they must expose it as "utf-8".

`~labelから符号化法を取得する@ ~algoは、所与の ( 文字列 %~label ) に対し： ◎ To get an encoding from a string label, run these steps:

%~label ~SET `前後の~ASCII空白~列を剥ぐ$( %~label ) ◎ Remove any leading and trailing ASCII whitespace from label.
~IF［ %~label は［下の表tを成すいずれかの~label ］に`~ASCII大小無視$で合致する］ ⇒ ~RET 合致した~labelに対応する`符号化法$ ◎ If label is an ASCII case-insensitive match for any of the labels listed in the table below, then return the corresponding encoding;＼
~RET `失敗^i ◎ otherwise return failure.

注記：この［ ~labelを`符号化法$へ対応付ける~algo ］は、 `Unicode Technical Standard #22 § 1.4＠https://www.unicode.org/reports/tr22/tr22-8.html#Charset_Alias_Matching$ によるものより基本的かつ制約的である — 配備-済みな内容と互換になることが必要yなので。 ◎ This is a more basic and restrictive algorithm of mapping labels to encodings than section 1.4 of Unicode Technical Standard #22 prescribes, as that is necessary to be compatible with deployed content.
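
The lookup can be sketched as follows, using a small excerpt of the label table below for illustration only. The names are hypothetical and `None` stands in for failure. Two details matter: only ASCII whitespace (TAB, LF, FF, CR, SPACE) is stripped, and lowercasing is restricted to ASCII letters, which is why Python's Unicode-aware `str.lower()` is not used here.

```python
ASCII_WHITESPACE = "\t\n\f\r "

# Excerpt of the label table (illustration only, not the full list).
LABELS = {
    "utf-8": "UTF-8",
    "utf8": "UTF-8",
    "unicode-1-1-utf-8": "UTF-8",
    "latin1": "windows-1252",
    "ascii": "windows-1252",
    "iso-8859-1": "windows-1252",
    "windows-1252": "windows-1252",
    "sjis": "Shift_JIS",
    "shift_jis": "Shift_JIS",
    "utf-16": "UTF-16LE",
}


def ascii_lowercase(text: str) -> str:
    """Lowercase A-Z only; every other code point is left untouched."""
    return "".join(chr(ord(c) + 0x20) if "A" <= c <= "Z" else c for c in text)


def get_encoding(label: str):
    """Return the encoding name for a label, or None for failure."""
    label = label.strip(ASCII_WHITESPACE)
    return LABELS.get(ascii_lowercase(label))


from_padded = get_encoding("  UTF8\n")
from_latin1 = get_encoding("Latin1")
unknown = get_encoding("utf-7")
```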

名前 ◎ Name	~label ◎ Labels
`~~標準の符号化法＠#the-encoding$ ◎ The Encoding
`UTF-8$n	`unicode-1-1-utf-8^lb `unicode11utf8^lb `unicode20utf8^lb `utf-8^lb `utf8^lb `x-unicode20utf8^lb
`旧来の単-~byte符号化法＠#legacy-single-byte-encodings$ ◎ Legacy single-byte encodings
`IBM866$n	`866^lb `cp866^lb `csibm866^lb `ibm866^lb
`ISO-8859-2$n	`csisolatin2^lb `iso-8859-2^lb `iso-ir-101^lb `iso8859-2^lb `iso88592^lb `iso_8859-2^lb `iso_8859-2:1987^lb `l2^lb `latin2^lb
`ISO-8859-3$n	`csisolatin3^lb `iso-8859-3^lb `iso-ir-109^lb `iso8859-3^lb `iso88593^lb `iso_8859-3^lb `iso_8859-3:1988^lb `l3^lb `latin3^lb
`ISO-8859-4$n	`csisolatin4^lb `iso-8859-4^lb `iso-ir-110^lb `iso8859-4^lb `iso88594^lb `iso_8859-4^lb `iso_8859-4:1988^lb `l4^lb `latin4^lb
`ISO-8859-5$n	`csisolatincyrillic^lb `cyrillic^lb `iso-8859-5^lb `iso-ir-144^lb `iso8859-5^lb `iso88595^lb `iso_8859-5^lb `iso_8859-5:1988^lb
`ISO-8859-6$n	`arabic^lb `asmo-708^lb `csiso88596e^lb `csiso88596i^lb `csisolatinarabic^lb `ecma-114^lb `iso-8859-6^lb `iso-8859-6-e^lb `iso-8859-6-i^lb `iso-ir-127^lb `iso8859-6^lb `iso88596^lb `iso_8859-6^lb `iso_8859-6:1987^lb
`ISO-8859-7$n	`csisolatingreek^lb `ecma-118^lb `elot_928^lb `greek^lb `greek8^lb `iso-8859-7^lb `iso-ir-126^lb `iso8859-7^lb `iso88597^lb `iso_8859-7^lb `iso_8859-7:1987^lb `sun_eu_greek^lb
`ISO-8859-8$n	`csiso88598e^lb `csisolatinhebrew^lb `hebrew^lb `iso-8859-8^lb `iso-8859-8-e^lb `iso-ir-138^lb `iso8859-8^lb `iso88598^lb `iso_8859-8^lb `iso_8859-8:1988^lb `visual^lb
`ISO-8859-8-I$n	`csiso88598i^lb `iso-8859-8-i^lb `logical^lb
`ISO-8859-10$n	`csisolatin6^lb `iso-8859-10^lb `iso-ir-157^lb `iso8859-10^lb `iso885910^lb `l6^lb `latin6^lb
`ISO-8859-13$n	`iso-8859-13^lb `iso8859-13^lb `iso885913^lb
`ISO-8859-14$n	`iso-8859-14^lb `iso8859-14^lb `iso885914^lb
`ISO-8859-15$n	`csisolatin9^lb `iso-8859-15^lb `iso8859-15^lb `iso885915^lb `iso_8859-15^lb `l9^lb
`ISO-8859-16$n	`iso-8859-16^lb
`KOI8-R$n	`cskoi8r^lb `koi^lb `koi8^lb `koi8-r^lb `koi8_r^lb
`KOI8-U$n	`koi8-ru^lb `koi8-u^lb
`macintosh$n	`csmacintosh^lb `mac^lb `macintosh^lb `x-mac-roman^lb
`windows-874$n	`dos-874^lb `iso-8859-11^lb `iso8859-11^lb `iso885911^lb `tis-620^lb `windows-874^lb
`windows-1250$n	`cp1250^lb `windows-1250^lb `x-cp1250^lb
`windows-1251$n	`cp1251^lb `windows-1251^lb `x-cp1251^lb
`windows-1252$n 歴史的な “~Latin1”, “~ASCII” の概念の関係性は、 `下の注記＠#note-latin1-ascii$を見よ。 ◎ See below for the relationship to historical "Latin1" and "ASCII" concepts.	`ansi_x3.4-1968^lb `ascii^lb `cp1252^lb `cp819^lb `csisolatin1^lb `ibm819^lb `iso-8859-1^lb `iso-ir-100^lb `iso8859-1^lb `iso88591^lb `iso_8859-1^lb `iso_8859-1:1987^lb `l1^lb `latin1^lb `us-ascii^lb `windows-1252^lb `x-cp1252^lb
`windows-1253$n	`cp1253^lb `windows-1253^lb `x-cp1253^lb
`windows-1254$n	`cp1254^lb `csisolatin5^lb `iso-8859-9^lb `iso-ir-148^lb `iso8859-9^lb `iso88599^lb `iso_8859-9^lb `iso_8859-9:1989^lb `l5^lb `latin5^lb `windows-1254^lb `x-cp1254^lb
`windows-1255$n	`cp1255^lb `windows-1255^lb `x-cp1255^lb
`windows-1256$n	`cp1256^lb `windows-1256^lb `x-cp1256^lb
`windows-1257$n	`cp1257^lb `windows-1257^lb `x-cp1257^lb
`windows-1258$n	`cp1258^lb `windows-1258^lb `x-cp1258^lb
`x-mac-cyrillic$n	`x-mac-cyrillic^lb `x-mac-ukrainian^lb
`旧来の複-~byte~Chinese（簡体字）符号化法＠#legacy-multi-byte-chinese-(simplified)-encodings$ ◎ Legacy multi-byte Chinese (simplified) encodings
`GBK$n	`chinese^lb `csgb2312^lb `csiso58gb231280^lb `gb2312^lb `gb_2312^lb `gb_2312-80^lb `gbk^lb `iso-ir-58^lb `x-gbk^lb
`gb18030$n	`gb18030^lb
`旧来の複-~byte~Chinese（繁体字）符号化法＠#legacy-multi-byte-chinese-(traditional)-encodings$ ◎ Legacy multi-byte Chinese (traditional) encodings
`Big5$n	`big5^lb `big5-hkscs^lb `cn-big5^lb `csbig5^lb `x-x-big5^lb
`旧来の複-~byte~Japanese符号化法＠#legacy-multi-byte-japanese-encodings$ ◎ Legacy multi-byte Japanese encodings
`EUC-JP$n	`cseucpkdfmtjapanese^lb `euc-jp^lb `x-euc-jp^lb
`ISO-2022-JP$n	`csiso2022jp^lb `iso-2022-jp^lb
`Shift_JIS$n	`csshiftjis^lb `ms932^lb `ms_kanji^lb `shift-jis^lb `shift_jis^lb `sjis^lb `windows-31j^lb `x-sjis^lb
`旧来の複-~byte~Korean符号化法＠#legacy-multi-byte-korean-encodings$ ◎ Legacy multi-byte Korean encodings
`EUC-KR$n	`cseuckr^lb `csksc56011987^lb `euc-kr^lb `iso-ir-149^lb `korean^lb `ks_c_5601-1987^lb `ks_c_5601-1989^lb `ksc5601^lb `ksc_5601^lb `windows-949^lb
`旧来の諸々の符号化法＠#legacy-miscellaneous-encodings$ ◎ Legacy miscellaneous encodings
`replacement$n	`csiso2022kr^lb `hz-gb-2312^lb `iso-2022-cn^lb `iso-2022-cn-ext^lb `iso-2022-kr^lb `replacement^lb
`UTF-16BE$n	`unicodefffe^lb `utf-16be^lb
`UTF-16LE$n	`csunicode^lb `iso-10646-ucs-2^lb `ucs-2^lb `unicode^lb `unicodefeff^lb `utf-16^lb `utf-16le^lb
`x-user-defined$n	`x-user-defined^lb

注記：すべての`符号化法$とそれら用の`~label$は、規範的でない資源 `encodings.json$ としても可用である。 ◎ All encodings and their labels are also available as non-normative encodings.json resource.

注記： ~supportされる`符号化法$たちが成す集合は、首に［この標準の開発を開始した時点で，主要な各~browser~engineが~supportしていた集合］たちの交差集合に基づくが，符号化法のうち［稀にしか正当に利用されていない］かつ［攻撃にも利用され得る］ものは除去してある。一部の符号化法については、既存の~Web内容が利用している~~確たる証拠はなく，それを含めることには疑問がある。すなわち、それらは，各~browserから広く~supportされていたが、 ~Web内容から広く利用されているかどうかは不明瞭である。しかしながら、 `単-~byte符号化法$のうち［各~browserが広く~supportしていたもの／ ISO 8859 ~~族の一部を成すもの］を意欲的に除去する労は，為されていない。特に，次に挙げるものを含める必要性は、既存の内容を~supportする目的においては疑わしいが，除去する計画は無い ⇒＃ `IBM866$n, `macintosh$n, `x-mac-cyrillic$n, `ISO-8859-3$n, `ISO-8859-10$n, `ISO-8859-14$n, `ISO-8859-16$n ◎ The set of supported encodings is primarily based on the intersection of the sets supported by major browser engines when the development of this standard started, while removing encodings that were rarely used legitimately but that could be used in attacks. The inclusion of some encodings is questionable in the light of anecdotal evidence of the level of use by existing Web content. That is, while they have been broadly supported by browsers, it is unclear if they are broadly used by Web content. However, an effort has not been made to eagerly remove single-byte encodings that were broadly supported by browsers or are part of the ISO 8859 series. In particular, the necessity of the inclusion of IBM866, macintosh, x-mac-cyrillic, ISO-8859-3, ISO-8859-10, ISO-8859-14, and ISO-8859-16 is doubtful for the purpose of supporting existing content, but there are no plans to remove these.

注記： `windows-1252$n `符号化法$には、様々な`~label$ — `latin1^l, `iso-8859-1^l, `ascii^l など — がある。それは、歴史的に，開発者を惑わしていた。 ~web上では, および［ ~webに互換になるよう追求する~software ］においては、この標準を実装することにより，これらは同義語になる： `latin1^l も `ascii^l も， `windows-1252$n 用の~labelでしかない — この標準に従う~softwareは、例えば `80^X を［ “~Latin1” ／ “~ASCII” ］用に復号するよう依頼されたときには， `20AC^U1 として復号することになる。 ◎ The windows-1252 encoding has various labels, such as "latin1", "iso-8859-1", and "ascii", which have historically been confusing for developers. On the web, and in any software that seeks to be web-compatible by implementing this standard, these are synonyms: "latin1" and "ascii" are just labels for windows-1252, and any software following this standard will, for example, decode 0x80 as U+20AC (€) when asked for the "Latin1" or "ASCII" decoding of that byte.

この標準に従わない~softwareは、常に同じ回答を与えるとは限らない。その根源は、 ~Latin1 を指定した元の文書 `ISO8859-1$r が，範囲 { `00^X 〜 `1F^X } や { `7F^X 〜 `9F^X } 内の~byte用には対応付けを供さなかったことにある。類似に、 ~ASCIIを指定した元の文書（とりわけ `ISO646$r ）は，範囲 { `80^X 〜 `FF^X } 内の~byte用には対応付けを供さなかった。このことは、それらの~byte用に［ ~Latin1／~ASCII ］符号化法を利用するよう依頼されたとき選ばれる~cp対応付けが， ~softwareに応じて異なることを意味する。［ ~web~browser／~browserに互換な~software ］は、それらの~byteを `windows-1252$n — ［ ~Latin1, ~ASCII ］の上位集合であり，この標準~内に成文化された符号化法 — に則って対応付けることを選んだ。他の~softwareは、 ~errorを投出するか， `同型な復号-法＠~INFRA#isomorphic-decode$その他の対応付けを利用する。 ◎ Software that does not follow this standard does not always give the same answers. The root of this is that the original document that specified Latin1 (ISO/IEC 8859-1) did not provide any mappings for bytes in the inclusive ranges 0x00 to 0x1F or 0x7F to 0x9F. Similarly, the original documents that specified ASCII (ISO/IEC 646, among others) did not provide any mappings for bytes in the inclusive range 0x80 to 0xFF. This means different software has chosen different code point mappings for those bytes when asked to use Latin1 or ASCII encodings. Web browsers and browser-compatible software have chosen to map those bytes according to windows-1252, which is a superset of both, and this choice was codified in this standard. Other software throws errors, or uses isomorphic decoding, or other mappings. [ISO8859-1] [ISO646]

そのようなわけで，［実装者, 開発者］は、 “~Latin1” や “~ASCII” の用語で~APIを公開する~libraryを利用しているときは，気を付ける必要がある。そのような~libraryが，この標準に合わない回答を与えることは~~十分あり得る — 元の仕様において未定義なままにされた~byte用に他の挙動を選んだ場合には。 ◎ As such, implementers and developers need to be careful whenever they are using libraries which expose APIs in terms of "Latin1" or "ASCII". It’s very possible such libraries will not give answers in line with this standard, if they have chosen other behaviors for the bytes which were left undefined in the original specifications.
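
Python's standard codecs are one example of a library whose "latin-1" and "ascii" do not follow this standard. The sketch below decodes the single byte 0x80 three ways. It assumes Python's `cp1252` codec as an approximation of windows-1252, which holds for this byte.

```python
byte = bytes([0x80])

as_windows_1252 = byte.decode("cp1252")   # what this standard's labels resolve to
as_latin1 = byte.decode("latin-1")        # isomorphic decoding: byte value = code point

try:
    byte.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True                   # Python's "ascii" rejects bytes above 0x7F
```

The same label therefore yields U+20AC, U+0080, or an exception, depending on which interpretation the software implements.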

4.3. 出力~符号化法

`符号化法から出力~符号化法を取得する@ ~algoは、所与の ( `符号化法$ %符号化法 ) に対し： ◎ To get an output encoding from an encoding encoding, run these steps:

~IF［ %符号化法 ~IN { `replacement$n, `UTF-16BE$n, `UTF-16LE$n【！`UTF-16BE/LE$n】 } ］ ⇒ ~RET `UTF-8$n ◎ If encoding is replacement or UTF-16BE/LE, then return UTF-8.
~RET %符号化法 ◎ Return encoding.

注記：この~algoは、それを必要とする［ ~URL構文解析／ ~HTML~form提出］にて有用になる。 ◎ The get an output encoding algorithm is useful for URL parsing and HTML form submission, which both need exactly this.
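
As a minimal sketch, with encodings represented by their names as plain strings (an assumption made for illustration):

```python
def get_output_encoding(encoding: str) -> str:
    """Encodings without an encoder fall back to UTF-8 for output."""
    if encoding in ("replacement", "UTF-16BE", "UTF-16LE"):
        return "UTF-8"
    return encoding


for_utf16 = get_output_encoding("UTF-16LE")
for_sjis = get_output_encoding("Shift_JIS")
```

The three special-cased encodings are exactly those noted earlier as having no encoder.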

5. 索引

ほとんどの旧来の`符号化法$では、【当の符号化法に特有な】 `索引@ が利用される。 `索引$とは、 ~entryたちが成す有順序~listであり，それを成す各~entryは［ ~pointer, それに対応する~cp ］からなる。 `索引$の中では、 ~pointerは一意であり，~cpは重複し得る。 ◎ Most legacy encodings make use of an index. An index is an ordered list of entries, each entry consisting of a pointer and a corresponding code point. Within an index pointers are unique and code points can be duplicated.

注記：効率的な実装は、各`符号化法$に対し， 2 つの`索引$ — その`復号器$に最適化されたそれ, その`符号化器$に最適化されたそれ — を備えることになろう。 ◎ An efficient implementation likely has two indexes per encoding. One optimized for its decoder and one for its encoder.

`索引$ 【の~dataを供する資源（以下を見よ）】から，~pointerとそれに対応する~cpを見出すためには：

%行l~list は，その資源の内容を `000A^U `LF^cn で一連の “行l” に分割した結果とする。
%行l~list から［空~行l ／ `0023^U1 から開始する行l ］をすべて除去する。
%行l~list の各~行lに対し，行lを `0009^U `TAB^cn で分割した結果を成す：
- 1 個目の~itemが~pointer（ 10 進表記）を与える。
- 2 個目の~itemが対応する~cp（ 16 進表記）を与える。
- 他の~itemは関連しない。

◎ To find the pointers and their corresponding code points in an index, let lines be the result of splitting the resource’s contents on U+000A LF. Then remove each item in lines that is the empty string or starts with U+0023 (#). Then the pointers and their corresponding code points are found by splitting each item in lines on U+0009 TAB. The first subitem is the pointer (as a decimal number) and the second is the corresponding code point (as a hexadecimal number). Other subitems are not relevant.
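
The parsing rules above might be sketched as follows. `parseIndex` is a hypothetical helper name, and the `sample` text is made up for illustration; it is not an excerpt of any real index resource.

```javascript
// Parses index resource text into an ordered list of { pointer, codePoint }.
function parseIndex(text) {
  const entries = [];
  for (const line of text.split("\n")) {
    // Skip empty lines and comment lines starting with U+0023 (#).
    if (line === "" || line.startsWith("#")) continue;
    const [pointerText, codePointText] = line.split("\t");
    entries.push({
      pointer: parseInt(pointerText, 10), // decimal; leading spaces are ignored
      codePoint: parseInt(codePointText, 16), // hexadecimal; "0x" prefix is accepted
    });
  }
  return entries;
}

// Made-up sample input in the same line format (third column is ignored).
const sample =
  "# illustrative sample\n\n  0\t0x3000\tfirst entry\n  1\t0x3001\tsecond entry\n";
const parsed = parseIndex(sample);
```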

注記：各`索引$の冒頭には、変更の有無を記すため， `Identifier^i と `Date^i 【識別子と日付】が記されている。 `Identifier^i の変化は、 `索引$に変更が加えられたことを表す。 ◎ To signify changes an index includes an Identifier and a Date. If an Identifier has changed, so has the index.

%索引の中で %~pointer が指す `索引~cp@ とは、 %索引内に %~pointer は［在るならば，それに対応する~cp ／無いならば ~NULL ］である。 ◎ The index code point for pointer in index is the code point corresponding to pointer in index, or null if pointer is not in index.

%索引の中で %~cp を指す `索引~pointer@ とは、 %索引内に %~cp に対応する~pointerは［在るならば，それらのうち`最初の^em ~pointer ／無いならば ~NULL ］である。 ◎ The index pointer for codePoint in index is the first pointer corresponding to codePoint in index, or null if codePoint is not in index.
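
A direct, unoptimized sketch of these two lookups, assuming an index is held as an array of `{ pointer, codePoint }` objects in index order (a representation chosen for this example only). The toy index is hypothetical.

```javascript
// Index code point: the code point for pointer, or null if absent.
function indexCodePoint(pointer, index) {
  const entry = index.find((e) => e.pointer === pointer);
  return entry === undefined ? null : entry.codePoint;
}

// Index pointer: the FIRST pointer for codePoint, or null if absent.
function indexPointer(codePoint, index) {
  const entry = index.find((e) => e.codePoint === codePoint);
  return entry === undefined ? null : entry.pointer;
}

// Hypothetical toy index in which the code point 0x4e00 appears twice.
const toyIndex = [
  { pointer: 10, codePoint: 0x4e00 },
  { pointer: 11, codePoint: 0x4e01 },
  { pointer: 12, codePoint: 0x4e00 },
];
```

A production implementation would more likely build two lookup structures per encoding, one keyed by pointer and one keyed by code point, instead of scanning the list on every call.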

注記：各索引には，規範的でない視覚-化があり、 `索引~jis0208$には， `Shift_JIS$n 視覚-化も別にある。加えて，基本多言語面（ BMP（ `Basic Multilingual Plane^en ）, `0000^U 〜 `FFFF^U ）における被覆域の視覚-化もある。ただし、［ `索引~gb18030範囲~群$ ／ `索引~ISO-2022-JP~katakana$ ］には，これらの視覚-化はない。 ◎ There is a non-normative visualization for each index other than index gb18030 ranges and index ISO-2022-JP katakana. index jis0208 also has an alternative Shift_JIS visualization. Additionally, there is visualization of the Basic Multilingual Plane coverage of each index other than index gb18030 ranges and index ISO-2022-JP katakana.

視覚-化における凡例 ◎ The legend for the visualizations is:
表示	~~説明
	対応する~cpは無い。 ◎ Unmapped
	~UTF-8で 2 ~byte。 ◎ Two bytes in UTF-8
	~UTF-8で 2 ~byte, かつ ~cpは、前の~pointerの~cpの直後に続く。 ◎ Two bytes in UTF-8, code point follows immediately the code point of previous pointer
	~UTF-8で 3 ~byte（私用領域でない） ◎ Three bytes in UTF-8 (non-PUA)
	~UTF-8で 3 ~byte（私用領域でない）, かつ ~cpは、前の~pointerの~cpの直後に続く。 ◎ Three bytes in UTF-8 (non-PUA), code point follows immediately the code point of previous pointer
	私用領域 ◎ Private Use
	私用領域, かつ ~cpは、前の~pointerの~cpの直後に続く。 ◎ Private Use, code point follows immediately the code point of previous pointer
	~UTF-8で 4 ~byte ◎ Four bytes in UTF-8
	~UTF-8で 4 ~byte, かつ ~cpは、前の~pointerの~cpの直後に続く。 ◎ Four bytes in UTF-8, code point follows immediately the code point of previous pointer
	先に現れているものと重複する~cpに対応する。 ◎ Duplicate code point already mapped at an earlier index
	~CJK互換漢字（ `CJK Compatibility Ideograph^en ） ◎ CJK Compatibility Ideograph
	~CJK統合漢字拡張 A ◎ CJK Unified Ideographs Extension A

この仕様が定義する`索引$のうち，`単-~byte索引$でないものには、それぞれに自前の~tableがあり，以下に与えられる：【視覚-化／被覆域の~tableは巨大なことに注意】 ◎ These are the indexes defined by this specification, excluding index single-byte, which have their own table:

`名前$	`索引$	視覚-化	基本多言語面（ BMP ）の被覆域	備考
`索引~Big5@	`Big5$idx	これは、香港増補字符集（ `Hong Kong Supplementary Character Set^en ）, および他の共通な拡張と一式で、 ~Big5標準に合致する。 ◎ This matches the Big5 standard in combination with the Hong Kong Supplementary Character Set and other common extensions.
`索引~EUC-KR@	`EUC-KR$idx	これは、 KS X 1001 標準と~~統合~Hangul~code（ `Unified Hangul Code^en ）に合致する。これらは、まとめて Windows Codepage 949 としてより広く知られている。この索引は、 ~Unicodeの~Hangul音節文字（ `Hangul Syllables^en ）~blockの全体を覆う。視覚-化において左上隅が~pointer 9026 にある~Hangul~blockは、 ~Unicode順に並ぶ。それとは別個に見れば、この索引における残りの~Hangul音節文字も ~Unicode順に並ぶ。 ◎ This matches the KS X 1001 standard and the Unified Hangul Code, more commonly known together as Windows Codepage 949. It covers the Hangul Syllables block of Unicode in its entirety. The Hangul block whose top left corner in the visualization is at pointer 9026 is in the Unicode order. Taken separately, the rest of the Hangul syllables in this index are in the Unicode order, too.
`索引~gb18030@	`gb18030$idx	これは、 2 ~byteに符号化される~cp用の GB18030-2022 標準に合致する — ただし，配備-済みな内容と互換になるよう、 `A3^X `A0^X は `3000^U へ対応付けられる。この索引~全体で、 ~Unicodeの~CJK統合漢字（ `CJK Unified Ideographs^en ）~blockを覆う。その~block内の~entryのうち，視覚-化における（最初の） `3000^U `IDEOGRAPHIC SPACE^cn より上または左にあるものは、 ~Unicode順に並ぶ。 ◎ This matches the GB18030-2022 standard for code points encoded as two bytes, except for 0xA3 0xA0 which maps to U+3000 IDEOGRAPHIC SPACE to be compatible with deployed content. This index covers the CJK Unified Ideographs block of Unicode in its entirety. Entries from that block that are above or to the left of (the first) U+3000 in the visualization are in the Unicode order.
`索引~gb18030範囲~群@	`gb18030-ranges$idx	この`索引$の働きは、他のすべてと異なる。すべての~cpを挙げていくと項目数が 100 万を超えてしまうが、 207 個の範囲と自明な範囲検査の組合せにより，きれいに表現できる。したがって、 4 ~byteに符号化される~cp用の GB18030-2000 標準には，表面的にしか合致しない。その改訂 GB18030-2005 用の変更は、この索引に付随する［ `索引~gb18030範囲~群~cp$, `索引~gb18030範囲~群~pointer$ ］用の~algo（下に定義される）により，~inlineに取扱われる。その改訂 GB18030-2022 用の変更に関する取扱いも、私用領域に属する~cpへ対応付けられる~byte列の個数をこれ以上~増やさないようにするため，他と異なる — 関連な［私用領域に属する各~cp ］は、それまでの対応付けとの互換性を保全するよう， `~gb18030符号化器$において補助的な表tを通して直に対応付けられる。 ◎ This index works different from all others. Listing all code points would result in over a million items whereas they can be represented neatly in 207 ranges combined with trivial limit checks. It therefore only superficially matches the GB18030-2000 standard for code points encoded as four bytes. The change for the GB18030-2005 revision is handled inline by the index gb18030 ranges code point and index gb18030 ranges pointer algorithms below that accompany this index. And the changes for the GB18030-2022 revision are handled differently again to not further increase the number of byte sequences mapping to Private Use code points. The relevant Private Use code points are mapped in the gb18030 encoder directly through a side table to preserve compatibility with how they were mapped before.
`索引~jis0208@	`jis0208$idx	IBM と NEC によるかつての~proprietary拡張も含まれている， JIS X 0208 標準。 ◎ This is the JIS X 0208 standard including formerly proprietary extensions from IBM and NEC.
`索引~jis0212@	`jis0212$idx	JIS X 0212 標準。これを利用するのは、 `~EUC-JP復号器$に限られる（符号化器からは利用されない） — 広く~supportされていないので。 ◎ This is the JIS X 0212 standard. It is only used by the EUC-JP decoder due to lack of widespread support elsewhere.
`索引~ISO-2022-JP~katakana@	`iso-2022-jp-katakana$idx	これは、 ~Unicode正規化~形（ `Normalization Form^en ） KC に従って，半角~katakanaを全角~katakanaへ対応付ける。ただし、 `FF9E^U1 は `309B^U1 へ（ `3099^U1 ではなく）， `FF9F^U1 は `309C^U1 へ（ `309A^U1 ではなく）対応付ける【前者は濁点，後者は半濁点】。これを利用するものは、 `~ISO-2022-JP符号化器$に限られる。 `UNICODE$r ◎ This maps halfwidth to fullwidth katakana as per Unicode Normalization Form KC, except that U+FF9E (ﾞ) and U+FF9F (ﾟ) map to U+309B (゛) and U+309C (゜) rather than U+3099 (◌゙) and U+309A (◌゚). It is only used by the ISO-2022-JP encoder. [UNICODE]

%~pointer が指す `索引~gb18030範囲~群~cp@ は、次の手続きが返す~cpである： ◎ The index gb18030 ranges code point for pointer is the return value of these steps:

~IF［ 39419 ~LT %~pointer ~LT 189000 ］~OR［ 1237575 ~LT %~pointer ］ ⇒ ~RET ~NULL ◎ If pointer is greater than 39419 and less than 189000, or pointer is greater than 1237575, then return null.
~IF［ %~pointer ~EQ 7457 ］ ⇒ ~RET ~cp `E7C7^U ◎ If pointer is 7457, then return code point U+E7C7.
%~offset ~LET `索引~gb18030範囲~群$の中で %~pointer を超えない最後の~pointer ◎ Let offset be the last pointer in index gb18030 ranges that is less than or equal to pointer and let codePointOffset be its corresponding code point.
%~cp~offset ~LET %~offset が指している~cp ◎ ↑
~RET 次を値にとる~cp ⇒ %~cp~offset ~PLUS %~pointer ~MINUS %~offset ◎ Return a code point whose value is codePointOffset + pointer − offset.

%~cp を指す `索引~gb18030範囲~群~pointer@ は、次の手続きが返す~pointerである： ◎ The index gb18030 ranges pointer for codePoint is the return value of these steps:

~IF［ %~cp ~EQ `E7C7^U ］ ⇒ ~RET ~pointer 7457 ◎ If codePoint is U+E7C7, then return pointer 7457.
%~offset ~LET `索引~gb18030範囲~群$の中で %~cp を超えない最後の~cp ◎ Let offset be the last code point in index gb18030 ranges that is less than or equal to codePoint and let pointerOffset be its corresponding pointer.
%~pointer~offset ~LET %~offset に対応する~pointer ◎ ↑
~RET 次を値にとる~pointer ⇒ %~pointer~offset ~PLUS %~cp ~MINUS %~offset ◎ Return a pointer whose value is pointerOffset + codePoint − offset.
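
The two range algorithms above can be sketched together. `RANGES_EXCERPT` is a hand-picked excerpt of five entries that are believed to match the index gb18030 ranges resource, given here as `[pointer, codePoint]` pairs; it is not the full set of 207 ranges, so results are only meaningful for the pointers and code points exercised below. The function names are hypothetical.

```javascript
// Illustrative excerpt only (assumed to match the published index).
const RANGES_EXCERPT = [
  [0, 0x0080],
  [36, 0x00a5],
  [38, 0x00a9],
  [45, 0x00b2],
  [189000, 0x10000],
];

function rangesCodePoint(pointer, ranges) {
  if ((pointer > 39419 && pointer < 189000) || pointer > 1237575) return null;
  if (pointer === 7457) return 0xe7c7; // GB18030-2005 special case
  let offset = 0;
  let codePointOffset = 0;
  // Find the last range whose starting pointer is <= pointer.
  for (const [p, cp] of ranges) {
    if (p > pointer) break;
    offset = p;
    codePointOffset = cp;
  }
  return codePointOffset + pointer - offset;
}

function rangesPointer(codePoint, ranges) {
  if (codePoint === 0xe7c7) return 7457; // GB18030-2005 special case
  let offset = 0;
  let pointerOffset = 0;
  // Find the last range whose starting code point is <= codePoint.
  for (const [p, cp] of ranges) {
    if (cp > codePoint) break;
    offset = cp;
    pointerOffset = p;
  }
  return pointerOffset + codePoint - offset;
}
```

The linear scan keeps the sketch short; with the ranges sorted, a binary search would do the same job.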

%~cp を指す `索引~Shift_JIS~pointer@ は、次の手続きが返す~pointerである： ◎ The index Shift_JIS pointer for codePoint is the return value of these steps:

%索引 ~LET `索引~jis0208$ から［ ~pointerが範囲 { 8272 〜 8835 } に入る~entry ］すべてを除外した索引 ◎ Let index be index jis0208 excluding all entries whose pointer is in the range 8272 to 8835, inclusive.

注記： `索引~jis0208$は、重複する~cpを包含するので、これらの~entryの除外により，後続の~cpが利用されるようになる。 ◎ The index jis0208 contains duplicate code points so the exclusion of these entries causes later code points to be used.
~RET %索引の中で %~cp を指す`索引~pointer$ ◎ Return the index pointer for codePoint in index.

%~cp を指す `索引~Big5~pointer@ は、次の手続きが返す~pointerである： ◎ The index Big5 pointer for codePoint is the return value of these steps:

%索引 ~LET `索引~Big5$から［ ~pointerが ( (`A1^X ~MINUS `81^X) ~MUL 157 ) 未満の~entry ］すべてを除外した索引 ◎ Let index be index Big5 excluding all entries whose pointer is less than (0xA1 - 0x81) × 157.

注記：香港増補字符集（ `Hong Kong Supplementary Character Set^en ）拡張を~literalとして返さないようにする。 ◎ Avoid returning Hong Kong Supplementary Character Set extensions literally.
~IF［ %~cp ~IN { `2550^U1, `255E^U1, `2561^U1, `256A^U1, `5341^U1, `5345^U1 } ］ ⇒ ~RET %索引の中で %~cp に対応する`最後の^em ~pointer ◎ If codePoint is U+2550 (═), U+255E (╞), U+2561 (╡), U+256A (╪), U+5341 (十), or U+5345 (卅), then return the last pointer corresponding to codePoint in index.

注記：他にも重複している~cpはあるが、それら用には，`最初の^em ~pointerが利用されることになる。 ◎ There are other duplicate code points, but for those the first pointer is to be used.
~RET %索引の中で %~cp を指す`索引~pointer$ ◎ Return the index pointer for codePoint in index.
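
A sketch of the index Big5 pointer steps, assuming an index is an array of `{ pointer, codePoint }` objects in index order. The toy index and its pointer values are hypothetical and chosen only to exercise each branch. Returning null when one of the six listed code points is absent is an assumption of this sketch; the steps above do not cover that case.

```javascript
const BIG5_FIRST_POINTER = (0xa1 - 0x81) * 157; // 5024
const BIG5_USE_LAST_POINTER = new Set([
  0x2550, 0x255e, 0x2561, 0x256a, 0x5341, 0x5345,
]);

function indexBig5Pointer(codePoint, big5Index) {
  // Exclude the leading extension area so it is never returned.
  const index = big5Index.filter((e) => e.pointer >= BIG5_FIRST_POINTER);
  const matches = index.filter((e) => e.codePoint === codePoint);
  if (matches.length === 0) return null;
  // Six code points use their last pointer; all others use the first.
  return BIG5_USE_LAST_POINTER.has(codePoint)
    ? matches[matches.length - 1].pointer
    : matches[0].pointer;
}

// Hypothetical toy index (pointer values are not taken from index Big5).
const toyBig5 = [
  { pointer: 100, codePoint: 0x5341 },
  { pointer: 5250, codePoint: 0x5341 },
  { pointer: 5300, codePoint: 0x4e00 },
  { pointer: 6000, codePoint: 0x5341 },
  { pointer: 7000, codePoint: 0x4e00 },
];
```

The index Shift_JIS pointer steps follow the same pattern, with the exclusion applied to pointers 8272 to 8835 and no last-pointer exceptions.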

注記：すべての`索引$は、規範的でない資源 `indexes.json$ としても可用である（`索引~gb18030範囲~群$の形式は、範囲を表現できるようにするため，少し異なるものにされている）。 ◎ All indexes are also available as a non-normative indexes.json resource. (Index gb18030 ranges has a slightly different format here, to be able to represent ranges.)

6. 他の標準~用の~hook

注記：次に挙げる各種~algo（以下に定義される）は、他の仕様からの~~利用が意図されている： ◎ The algorithms defined below (UTF-8 decode, UTF-8 decode without BOM, UTF-8 decode without BOM or fail, and UTF-8 encode) are intended for usage by other standards.

`~UTF-8復号する$ ⇒ 新たな形式は、復号するときは，これを利用すること（次項は別として）。 ◎ For decoding, UTF-8 decode is to be used by new formats.＼
`~BOMはそのままに~UTF-8復号する$／ `~BOMも失敗-もそのままに~UTF-8復号する$ ⇒ 形式や~protocolの中の識別子や~byte列~用には、これらを利用すること。 ◎ For identifiers or byte sequences within a format or protocol, use UTF-8 decode without BOM or UTF-8 decode without BOM or fail.
`~UTF-8符号化する$ ⇒ 符号化するときは、これを利用すること。 ◎ For encoding, UTF-8 encode is to be used.

各~標準は、 `~UTF-8符号化する$（および，旧来の`符号化法を利用して符号化する$）~algoに渡す［入力~用の`入出力~queue$† ］が，実質的には`~scalar値$が成す入出力~queueである — すなわち`~surrogate$は包含しない — ことを確保すること。 ◎ Standards are to ensure that the input I/O queues they pass to UTF-8 encode (as well as the legacy encode) are effectively I/O queues of scalar values, i.e., they contain no surrogates.

これらの~hook （および，`~Unicodeに復号する$, `符号化法を利用して符号化する$）は、［入力~用の入出力~queue† ］の全体が消費されるまで~call元を阻む。各~出力~tokenを，~streamの中に~pushされるたびに利用するためには、 ~call元は，［当の~hookを呼出すときに，空な［出力~用の入出力~queue†† ］を伴わせて、そこから`並列的$に読取る］こと。 `~BOMも失敗-もそのままに~UTF-8復号する$を利用するときには，少し~careが必要になることに注意 — 復号している間に~errorが見出された場合、 `~EoQ$は，［出力~用の入出力~queue ］の中へ~pushされなくなるので。【† 各~algoにおける %入出力~queue 引数／†† %出力引数】 ◎ These hooks (as well as decode and encode) will block until the input I/O queue has been consumed in its entirety. In order to use the output tokens as they are pushed into the stream, callers are to invoke the hooks with an empty output I/O queue and read from it in parallel. Note that some care is needed when using UTF-8 decode without BOM or fail, as any error found during decoding will prevent the end-of-queue item from ever being pushed into the output I/O queue.

`~UTF-8復号する@ ~algoは、所与の ( `入出力~queue$`~byte^tA %入出力~queue, `入出力~queue$`~scalar値^tA %出力 ~DF « » ) に対し： ◎ To UTF-8 decode an I/O queue of bytes ioQueue given an optional I/O queue of scalar values output (default « »), run these steps:

%~buffer ~LET `入出力~queueを覗見る$( %入出力~queue, 3 ) ◎ Let buffer be the result of peeking three bytes from ioQueue,＼ ↓converted to a byte sequence.
~IF［ ( %~buffer[0], %~buffer[1], %~buffer[2] ) ~EQ ( `EF^X, `BB^X, `BF^X ) ］ ⇒ `入出力~queueから~item列を読取る$( %入出力~queue, 3 ) （結果は利用しない。） ◎ If buffer is 0xEF 0xBB 0xBF, then read three bytes from ioQueue. (Do nothing with those bytes.)
%復号器 ~LET `UTF-8$n の`復号器$の新たな~instance ◎ ↓
`~queueを処理する$( %復号器, %入出力~queue, %出力, `replacement^l ) ◎ Process a queue with an instance of UTF-8’s decoder, ioQueue, output, and "replacement".
~RET %出力 ◎ Return output.

`~BOMはそのままに~UTF-8復号する@ ~algoは、所与の ( `入出力~queue$`~byte^tA %入出力~queue, `入出力~queue$`~scalar値^tA %出力 ~DF « » ) に対し： ◎ To UTF-8 decode without BOM an I/O queue of bytes ioQueue given an optional I/O queue of scalar values output (default « »), run these steps:

%復号器 ~LET `UTF-8$n の`復号器$の新たな~instance ◎ ↓
`~queueを処理する$( %復号器, %入出力~queue, %出力, `replacement^l ) ◎ Process a queue with an instance of UTF-8’s decoder, ioQueue, output, and "replacement".
~RET %出力 ◎ Return output.

`~BOMも失敗-もそのままに~UTF-8復号する@ ~algoは、所与の ( `入出力~queue$`~byte^tA %入出力~queue, `入出力~queue$`~scalar値^tA %出力 ~DF « » ) に対し： ◎ To UTF-8 decode without BOM or fail an I/O queue of bytes ioQueue given an optional I/O queue of scalar values output (default « »), run these steps:

%復号器 ~LET `UTF-8$n の`復号器$の新たな~instance ◎ ↓
%~errorになり得る ~LET `~queueを処理する$( %復号器, %入出力~queue, %出力, `fatal^l ) ◎ Let potentialError be the result of processing a queue with an instance of UTF-8’s decoder, ioQueue, output, and "fatal".
~IF［ %~errorになり得る ~EQ `~error$i ］ ⇒ ~RET `失敗^i ◎ If potentialError is an error, then return failure.
~RET %出力 ◎ Return output.

`~UTF-8符号化する@ ~algoは、所与の ( `入出力~queue$`~scalar値^tA %入出力~queue, `入出力~queue$`~byte^tA %出力 ~DF « » ) に対し ⇒ ~RET `符号化法を利用して符号化する$( %入出力~queue, `UTF-8$n, %出力 ) ◎ To UTF-8 encode an I/O queue of scalar values ioQueue given an optional I/O queue of bytes output (default « »), return the result of encoding ioQueue with encoding UTF-8 and output.
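
For readers who know the JavaScript API, the four hooks behave roughly like the following wrappers around `TextDecoder` and `TextEncoder`. This is only an approximation for whole byte arrays: the hooks themselves operate on I/O queues, the function names are hypothetical, and `null` stands in for the failure return value.

```javascript
// UTF-8 decode: strips a leading BOM, replaces errors with U+FFFD.
function utf8Decode(bytes) {
  return new TextDecoder("utf-8").decode(bytes);
}

// UTF-8 decode without BOM: a leading BOM is kept as U+FEFF.
function utf8DecodeWithoutBOM(bytes) {
  return new TextDecoder("utf-8", { ignoreBOM: true }).decode(bytes);
}

// UTF-8 decode without BOM or fail: errors yield failure (null here).
function utf8DecodeWithoutBOMOrFail(bytes) {
  try {
    return new TextDecoder("utf-8", { fatal: true, ignoreBOM: true }).decode(bytes);
  } catch (e) {
    if (e instanceof TypeError) return null;
    throw e;
  }
}

// UTF-8 encode.
function utf8Encode(string) {
  return new TextEncoder().encode(string);
}

const withBOM = Uint8Array.of(0xef, 0xbb, 0xbf, 0x61); // BOM + "a"
const invalid = Uint8Array.of(0xff);
```

Note that `TextEncoder.prototype.encode()` takes a `USVString`, so a lone surrogate in a JavaScript string is replaced by U+FFFD before encoding.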

6.1. 各~標準~用の旧来の~hook

注記：各~標準は、互換性を得るために必要な場合を除き，次に挙げる~algoを利用しないことが強く奨励される ⇒＃ `~Unicodeに復号する$／ `~BOMを~sniffする$／ `符号化法を利用して符号化する$ ◎ Standards are strongly discouraged from using decode, BOM sniff, and encode, except as needed for compatibility.＼

これらの旧来の~hookを必要としている標準は、次の利用も必要になると見込まれる ⇒＃ `~labelから符号化法を取得する$（~labelを`符号化法$に転換するため）／ `符号化法から出力~符号化法を取得する$（`符号化法$を別の`符号化法$ — `符号化法を利用して符号化する$ときに渡すそれに相応しいもの — に転換するため） ◎ Standards needing these legacy hooks will most likely also need to use get an encoding (to turn a label into an encoding) and get an output encoding (to turn an encoding into another encoding that is suitable to pass into encode).

［ ~URL~percent-符号化法の極めて~~限定的な事例］用に，符号化器~errorに対する~customな取扱いが必要になる。［ `符号化器を取得する$／`符号化するか失敗する$ ］~algoは、そのために利用される。他の~algoは、直に利用されないことになる。 ◎ For the extremely niche case of URL percent-encoding, custom encoder error handling is needed. The get an encoder and encode or fail algorithms are to be used for that. Other algorithms are not to be used directly.

`~Unicodeに復号する@ ~algoは、所与の ( `入出力~queue$`~byte^tA %入出力~queue, ~fallback符号化法 %符号化法, `入出力~queue$`~scalar値^tA %出力 ~DF « » ) に対し： ◎ To decode an I/O queue of bytes ioQueue given a fallback encoding encoding and an optional I/O queue of scalar values output (default « »), run these steps:

%~BOM符号化法 ~LET `~BOMを~sniffする$( %入出力~queue ) ◎ Let BOMEncoding be the result of BOM sniffing ioQueue.
~IF［ %~BOM符号化法 ~NEQ ~NULL ］： ◎ If BOMEncoding is non-null:
1. %符号化法 ~SET %~BOM符号化法 ◎ Set encoding to BOMEncoding.
2. %N ~LET ［ %~BOM符号化法 ~EQ `UTF-8$n ならば 3 ／ ~ELSE_ 2 ］ ◎ ↓
3. `入出力~queueから~item列を読取る$( %入出力~queue, %N ) （結果は利用しない。） ◎ Read three bytes from ioQueue, if BOMEncoding is UTF-8; otherwise read two bytes. (Do nothing with those bytes.)
注記：配備-済みな内容との互換性を得るため、 ~BOMは他より~~優先される。 HTTP が利用される文脈においては、これは， `Content-Type` ~headerの意味論に対する違反である。 ◎ For compatibility with deployed content, the byte order mark is more authoritative than anything else. In a context where HTTP is used this is in violation of the semantics of the `Content-Type` header.
%復号器 ~LET %符号化法の`復号器$の新たな~instance ◎ ↓
`~queueを処理する$( %復号器, %入出力~queue, %出力, `replacement^l ) ◎ Process a queue with an instance of encoding’s decoder, ioQueue, output, and "replacement".
~RET %出力 ◎ Return output.

`~BOMを~sniffする@ ~algoは、所与の ( `入出力~queue$`~byte^tA %入出力~queue ) に対し： ◎ To BOM sniff an I/O queue of bytes ioQueue, run these steps:

%~BOM ~LET 次の結果を~byte列に変換した結果 ⇒ `入出力~queueを覗見る$( %入出力~queue, 3 ) ◎ Let BOM be the result of peeking 3 bytes from ioQueue, converted to a byte sequence.
下の表t内の ~EACH( %行 ) に対し，挙げられた順に ⇒ ~IF［ %~BOM は %行の 1 列目に与える~byte列`から開始して$byteいる］ ⇒ ~RET %行の 2 列目に与える`符号化法$ ◎ For each of the rows in the table below, starting with the first one and going down, if BOM starts with the bytes given in the first column, then return the encoding given in the cell in the second column of that row. Otherwise, return null.

~BOM 符号化法

`EF^X `BB^X `BF^X `UTF-8$n
`FE^X `FF^X `UTF-16BE$n
`FF^X `FE^X `UTF-16LE$n
◎ Byte order mark｜Encoding 0xEF 0xBB 0xBF｜UTF-8 0xFE 0xFF｜UTF-16BE 0xFF 0xFE｜UTF-16LE
~RET ~NULL ◎ ↑

注記： `~Unicodeに復号する$~algoには、［ ~BOMが見出されたので、供された符号化法は利用していないこと］を~call元に通信する仕方が無い。この~hookは、その事への対処法であり，`~Unicodeに復号する$前に呼出されることになる†。それは、［ ~BOMが見出されたならそれに対応する符号化法／ ~ELSE_ ~NULL ］を返す。【†特に，~HTMLの構文解析~algoは、入力~streamを~Unicodeに復号する前に，`符号化法を~sniffする~algo$にてこれを呼出す。】 ◎ This hook is a workaround for the fact that decode has no way to communicate back to the caller that it has found a byte order mark and is therefore not using the provided encoding. The hook is to be invoked before decode, and it will return an encoding corresponding to the byte order mark found, or null otherwise.
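
A minimal sketch of BOM sniffing over an array of bytes, together with the way the decode algorithm lets a BOM override the fallback encoding. Encodings are represented by their names as strings, and `bomSniff` and `resolveDecodeEncoding` are hypothetical helper names.

```javascript
function bomSniff(bytes) {
  // Rows are checked in order, as in the table above.
  const table = [
    [[0xef, 0xbb, 0xbf], "UTF-8"],
    [[0xfe, 0xff], "UTF-16BE"],
    [[0xff, 0xfe], "UTF-16LE"],
  ];
  for (const [bom, encoding] of table) {
    if (bom.every((b, i) => bytes[i] === b)) return encoding;
  }
  return null;
}

// Returns the encoding to decode with and how many BOM bytes to skip.
function resolveDecodeEncoding(bytes, fallbackEncoding) {
  const bomEncoding = bomSniff(bytes);
  if (bomEncoding === null) return { encoding: fallbackEncoding, skip: 0 };
  return { encoding: bomEncoding, skip: bomEncoding === "UTF-8" ? 3 : 2 };
}
```

Calling `bomSniff` before decoding lets a caller learn that the provided encoding was overridden.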

`符号化法を利用して符号化する@ ~algoは、所与の ( `入出力~queue$`~scalar値^tA %入出力~queue, `符号化法$ %符号化法, `入出力~queue$`~byte^tA %出力 ~DF « » ) に対し： ◎ To encode an I/O queue of scalar values ioQueue given an encoding encoding and an optional I/O queue of bytes output (default « »), run these steps:

%符号化器 ~LET `符号化器を取得する$( %符号化法 ) ◎ Let encoder be the result of getting an encoder from encoding.
`~queueを処理する$( %符号化器, %入出力~queue, %出力, `html^l ) ◎ Process a queue with encoder, ioQueue, output, and "html".
~RET %出力 ◎ Return output.

注記：これは、 ~HTML~form用の旧来の~hookである。 `~UTF-8符号化する$は決して `~error$i を誘発しないので、この上に被せても安全である。 `HTML$r ◎ This is a legacy hook for HTML forms. Layering UTF-8 encode on top is safe as it never triggers errors. [HTML]

`符号化器を取得する@ ~algoは、所与の ( `符号化法$ %符号化法 ) に対し： ◎ To get an encoder from an encoding encoding:

~Assert： %符号化法 ~NIN { `replacement$n, `UTF-16BE$n, `UTF-16LE$n【！`UTF-16BE/LE$n】 } ◎ Assert: encoding is not replacement or UTF-16BE/LE.
~RET %符号化法の`符号化器$の~instance ◎ Return an instance of encoding’s encoder.

`符号化するか失敗する@ ~algoは、所与の ( `入出力~queue$`~scalar値^tA %入出力~queue, `符号化器$の~instance %符号化器 , `入出力~queue$`~byte^tA %出力 ) に対し： ◎ To encode or fail an I/O queue of scalar values ioQueue given an encoder instance encoder and an I/O queue of bytes output, run these steps:

%~errorになり得る ~LET `~queueを処理する$( %符号化器, %入出力~queue, %出力, `fatal^l ) ◎ Let potentialError be the result of processing a queue with encoder, ioQueue, output, and "fatal".
`入出力~queueに~pushする$( %出力, « `~EoQ$ » ) ◎ Push end-of-queue to output.
~IF［ %~errorになり得るは `~error$i である］ ⇒ ~RET `~error$i の`~cp$の`値$cp ◎ If potentialError is an error, then return error’s code point’s value.
~RET ~NULL ◎ Return null.

注記：これは、 ~URL~percent-符号化法 `URL$r 用の旧来の~hookである。 ~call元は、 `符号化器$の~instanceを生きたまま保つ必要がある — `~ISO-2022-JP符号化器$が `~error$i を返すときにとり得る状態は、 2 つあるので。それはまた，~call元が［ ~errorを何らかの仕方で符号化するような~byte列］を発する場合、それらの各~byteは，範囲 { `00^X 〜 `7F^X } に入る, かつ［ `0E^X, `0F^X, `1B^X, `5C^X, `7E^X ］以外にする必要があることを意味する。 ◎ This is a legacy hook for URL percent-encoding. The caller will have to keep an encoder instance alive as the ISO-2022-JP encoder can be in two different states when returning an error. That also means that if the caller emits bytes to encode the error in some way, these have to be in the range 0x00 to 0x7F, inclusive, excluding 0x0E, 0x0F, 0x1B, 0x5C, and 0x7E. [URL]

特に，`~ISO-2022-JP符号化器$が `~Roman$i 状態にある下で `~error$i を返す場合、 ~call元は， `5C^X `\^smb を出力し得ない — それは、 `005C^U1 として復号されなくなるので。この理由から，`符号化するか失敗する$を意図されない目的に利用している応用 — `005C^U1 を置換~構文【~escape法】（例： `\u2603^c ）の一部として利用している，~JSや~CSSなど — は、次のいずれかで~~対処すること ⇒＃ `~ISO-2022-JP符号化器$をそのような置換~schemeと併用しないよう，~careする／置換~構文が，符号化器を必ず通過するようにする（~URL~percent-符号化法とは対照的に） ◎ In particular, if upon returning an error the ISO-2022-JP encoder is in the Roman state, the caller cannot output 0x5C (\) as it will not decode as U+005C (\). For this reason, applications using encode or fail for unintended purposes ought to take care to prevent the use of the ISO-2022-JP encoder in combination with replacement schemes, such as those of JavaScript and CSS, that use U+005C (\) as part of the replacement syntax (e.g., \u2603) or make sure to pass the replacement syntax through the encoder (in contrast to URL percent-encoding).

返り値は、［ `~error$i が生じなければ ~NULL ／ ~ELSE_ 符号化し得ない`~cp$を表現している数値］になる。数値が返された場合，~call元は、［同じ`符号化器$の~instance, 新たな出力~用の`入出力~queue$ ］を給して，再び呼出す必要があることになる。 ◎ The return value is either the number representing the code point that could not be encoded or null, if there was no error. When it returns non-null the caller will have to invoke it again, supplying the same encoder instance and a new output I/O queue.

7. ~API

この節では Web IDL `WEBIDL$r の各種用語を利用する。 ~browser~UAは、この~APIを~supportするモノトスル。 ~JS実装は、この~APIを~supportするベキである。他の~UA／~programming言語は、必要に応じて相応しい~API（これではないかもしれない）を利用することが奨励される。 ◎ This section uses terminology from Web IDL. Browser user agents must support this API. JavaScript implementations should support this API. Other user agents or programming languages are encouraged to use an API suitable to their needs, which might not be this one. [WEBIDL]

次の例は、 `TextEncoder$I ~objを利用して，文字列の配列を `ArrayBuffer$I に符号化する。結果は次を内容とする `Uint8Array$I になる ⇒ 先頭に（ `Uint32Array$I としての）文字列の個数，続いて 最初の文字列の（ `Uint32Array$I としての）長さ， `UTF-8$n に符号化されたその文字列~data， 2 番目の文字列の（ `Uint32Array$I としての）長さ， その文字列~data，… 等々。 ◎ The following example uses the TextEncoder object to encode an array of strings into an ArrayBuffer. The result is a Uint8Array containing the number of strings (as a Uint32Array), followed by the length of the first string (as a Uint32Array), the UTF-8 encoded string data, the length of the second string (as a Uint32Array), the string data, and so on.

function encodeArrayOfStrings(%strings) {
  var %encoder, %encoded, %len, %bytes, %view, %offset;

  %encoder = new TextEncoder();
  %encoded = [];

  %len = Uint32Array.BYTES_PER_ELEMENT;
  for (var %i = 0; %i < %strings.length; %i++) {
    %len += Uint32Array.BYTES_PER_ELEMENT;
    %encoded[%i] = %encoder.encode(%strings[%i]);
    %len += %encoded[%i].byteLength;
  }

  %bytes = new Uint8Array(%len);
  %view = new DataView(%bytes.buffer);
  %offset = 0;

  %view.setUint32(%offset, %strings.length);
  %offset += Uint32Array.BYTES_PER_ELEMENT;
  for (var %i = 0; %i < %encoded.length; %i += 1) {
    %len = %encoded[%i].byteLength;
    %view.setUint32(%offset, %len);
    %offset += Uint32Array.BYTES_PER_ELEMENT;
    %bytes.set(%encoded[%i], %offset);
    %offset += %len;
  }
  return %bytes.buffer;
}

次の例は、［［前の例, または `UTF-8$n 以外の符号化法に等価な~algo ］により生産される形式に符号化された~data ］を含んでいる `ArrayBuffer$I を復号して、 ~~元の，文字列たちが成す配列に戻す。 ◎ The following example decodes an ArrayBuffer containing data encoded in the format produced by the previous example, or an equivalent algorithm for encodings other than UTF-8, back into an array of strings.

function decodeArrayOfStrings(%buffer, %encoding) {
  var %decoder, %view, %offset, %num_strings, %strings, %len;

  %decoder = new TextDecoder(%encoding);
  %view = new DataView(%buffer);
  %offset = 0;
  %strings = [];

  %num_strings = %view.getUint32(%offset);
  %offset += Uint32Array.BYTES_PER_ELEMENT;
  for (var %i = 0; %i < %num_strings; %i++) {
    %len = %view.getUint32(%offset);
    %offset += Uint32Array.BYTES_PER_ELEMENT;
    %strings[%i] = %decoder.decode(
      new DataView(%view.buffer, %offset, %len));
    %offset += %len;
  }
  return %strings;
}

7.1. ~interface~mixin `TextDecoderCommon^I

interface mixin `TextDecoderCommon@I {
  readonly attribute `DOMString$ `encoding$m;
  readonly attribute `boolean$ `fatal$m;
  readonly attribute `boolean$ `ignoreBOM$m;
};

`TextDecoderCommon$I ~interface~mixinは、［ `TextDecoder$I, `TextDecoderStream$I ］~objで共有される共通な取得子を定義する。これらの各~objには、次に挙げるものが結付けられる： ◎ The TextDecoderCommon interface mixin defines common getters that are shared between TextDecoder and TextDecoderStream objects. These objects have an associated:

`符号化法@dec ⇒ ある`符号化法$ ◎ encoding • An encoding.
`復号器@dec ⇒ ある［ `復号器$の~instance ］ ◎ decoder • A decoder instance.
`入出力~queue@dec ⇒ ある`入出力~queue$`~byte^tA ◎ I/O queue • An I/O queue of bytes.
`~BOMは無視するか@dec ⇒ ある真偽値 — 初期~時は ~F とする。 ◎ ignore BOM • A boolean, initially false.
`~BOMを見つけたか@dec ⇒ ある真偽値 — 初期~時は ~F とする。 ◎ BOM seen • A boolean, initially false.
`~error~mode@dec ⇒ ある`~error~mode$ — 初期~時は `replacement^l とする。 ◎ error mode • An error mode, initially "replacement".

`入出力~queueを直列化する@ ~algoは、所与の ( `TextDecoderCommon$I %復号器, `入出力~queue$`~scalar値^tA %入出力~queue ) に対し： ◎ The serialize I/O queue algorithm, given a TextDecoderCommon decoder and an I/O queue of scalar values ioQueue, runs these steps:

%出力 ~LET 空~文字列 ◎ Let output be the empty string.
~WHILE 無条件： ◎ While true:
1. %~item ~LET `入出力~queueから~itemを読取る$( %入出力~queue ) ◎ Let item be the result of reading from ioQueue.
2. ~IF［ %~item ~EQ `~EoQ$ ］ ⇒ ~RET %出力 ◎ If item is end-of-queue, then return output.
3. ~IF［ %復号器の`符号化法$dec ~IN { `UTF-8$n, `UTF-16BE$n, `UTF-16LE$n【！`UTF-16BE/LE$n】 } ］~AND［ %復号器の`~BOMは無視するか$dec ~EQ ~F ］~AND［ %復号器の`~BOMを見つけたか$dec ~EQ ~F ］： ◎ If decoder’s encoding is UTF-8 or UTF-16BE/LE, and decoder’s ignore BOM and BOM seen are false:
  1. %復号器の`~BOMを見つけたか$dec ~SET ~T ◎ Set decoder’s BOM seen to true.
  2. ~IF［ %~item ~EQ `FEFF^U `BOM^cn ］ ⇒ ~CONTINUE ◎ If item is U+FEFF BOM, then continue.
4. %出力に %~item を付加する ◎ Append item to output.

注記：この~algoは、 ~APIの利用者にもっと制御を与えるため，［ ~platformの他所で利用される，`~Unicodeに復号する$ ~algo ］とは、 ~BOMの取扱いに関して意図的に異なるものにされている。 ◎ This algorithm is intentionally different with respect to BOM handling from the decode algorithm used by the rest of the platform to give API users more control.
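
The serialization steps can be sketched as follows, assuming the I/O queue is given as an array of scalar values (the end of the array plays the role of end-of-queue) and the decoder state is a plain object with hypothetical `encoding`, `ignoreBOM`, and `bomSeen` fields.

```javascript
function serializeIOQueue(decoder, scalarValues) {
  let output = "";
  for (const item of scalarValues) {
    const bomCapable =
      decoder.encoding === "UTF-8" ||
      decoder.encoding === "UTF-16BE" ||
      decoder.encoding === "UTF-16LE";
    if (bomCapable && !decoder.ignoreBOM && !decoder.bomSeen) {
      // Only the very first item ever seen can be dropped as a BOM.
      decoder.bomSeen = true;
      if (item === 0xfeff) continue;
    }
    output += String.fromCodePoint(item);
  }
  return output;
}

const state = { encoding: "UTF-8", ignoreBOM: false, bomSeen: false };
const first = serializeIOQueue(state, [0xfeff, 0x61]);
const second = serializeIOQueue(state, [0xfeff, 0x62]);
```

Because `bomSeen` persists across calls, a U+FEFF arriving in a later chunk is kept in the output.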

`encoding@m 取得子~手続きは ⇒ ~RET `~ASCII小文字~化する$( コレの`符号化法$decの`名前$ ) ◎ The encoding getter steps are to return this’s encoding’s name, ASCII lowercased.

`fatal@m 取得子~手続きは ⇒ ~RET ~IS［コレの`~error~mode$dec ~EQ `fatal^l ］ ◎ The fatal getter steps are to return true if this’s error mode is "fatal"; otherwise false.

`ignoreBOM@m 取得子~手続きは ⇒ ~RET コレの`~BOMは無視するか$dec ◎ The ignoreBOM getter steps are to return this’s ignore BOM.

7.2. ~interface `TextDecoder^I

dictionary `TextDecoderOptions@I {
  `boolean$ `fatal@mb = false;
  `boolean$ `ignoreBOM@mb = false;
};

dictionary `TextDecodeOptions@I {
  `boolean$ `stream@mb = false;
};

[`Exposed$=*]
interface `TextDecoder@I {
  `TextDecoder$mc(optional `DOMString$ %label = "utf-8", optional `TextDecoderOptions$I %options = {});

  `USVString$ `decode$m(optional `AllowSharedBufferSource$I %input, optional `TextDecodeOptions$I %options = {});
};

`TextDecoder$I includes `TextDecoderCommon$I;


各 `TextDecoder$I ~objには、真偽値をとる `書出さないか@dec が結付けられ，初期~時は ~F をとるとする。 ◎ A TextDecoder object has an associated do not flush, which is a boolean, initially false.

%decoder = `new TextDecoder$m([%label = "utf-8" [, %options]])

新たな `TextDecoder$I ~obj を返す。 ◎ Returns a new TextDecoder object.

%label が次を満たす場合、 `RangeError$E が`投出-$される ⇒ ［ ~labelでない］~OR［ `replacement$n 用の~labelである］ ◎ If label is either not a label or is a label for replacement, throws a RangeError.

%decoder . `encoding$m

`符号化法$decの`名前$を小文字~化して返す。 ◎ Returns encoding’s name, lowercased.

%decoder . `fatal$m

`~error~mode$dec ~EQ `fatal^l ならば ~T を返す。他の場合は ~F を返す。 ◎ Returns true if error mode is "fatal"; otherwise false.

%decoder . `ignoreBOM$m

`~BOMは無視するか$decの値を返す。 ◎ Returns the value of ignore BOM.

%decoder . `decode([input [, options]])$m

%input を `符号化法$decの`復号器$にかけた結果を返す。 %input を断片化して処理するときは、 %options の `stream$mb ~memberを ~T にした下で，この~methodを 0 回~以上~呼出してから， %options を省略して（またはその `stream$mb ~memberを ~F にして） 1 回だけ呼出すことで行える。後者の呼出nに %input もないならば、両~引数とも省略するのが最も簡明になる。 ◎ Returns the result of running encoding’s decoder. The method can be invoked zero or more times with options’s stream set to true, and then once without options’s stream (or set to false), to process a fragmented input. If the invocation without options’s stream (or set to false) has no input, it’s clearest to omit both arguments.

var %string = "", %decoder = new TextDecoder(%encoding), %buffer;
while(%buffer = next_chunk()) {
  %string += %decoder.decode(%buffer, {stream:true});
}
%string += %decoder.decode(); // ~EoQ

`~error~mode$dec ~EQ `fatal^l の下で， `符号化法$decの`復号器$が `~error$i を返した場合、 `TypeError$E が`投出-$される。 ◎ If the error mode is "fatal" and encoding’s decoder returns error, throws a TypeError.
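To illustrate the two error modes, the following sketch decodes the same invalid input with and without `fatal`. It assumes a global `TextDecoder`; the variable names are illustrative.

```javascript
// 0xFF can never occur in well-formed UTF-8.
const invalid = new Uint8Array([0x41, 0xFF, 0x42]);

// Default error mode ("replacement"): the bad byte becomes U+FFFD.
const lenient = new TextDecoder().decode(invalid);

// "fatal" error mode: decode() throws a TypeError instead.
let thrown = null;
try {
  new TextDecoder("utf-8", { fatal: true }).decode(invalid);
} catch (e) {
  thrown = e;
}

console.log(JSON.stringify(lenient), thrown instanceof TypeError);
```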

`new TextDecoder(label, options)@m 構築子~手続きは： ◎ The new TextDecoder(label, options) constructor steps are:

%符号化法 ~LET `~labelから符号化法を取得する$( %label ) ◎ Let encoding be the result of getting an encoding from label.
~IF［ %符号化法 ~IN { `失敗^i, `replacement$n } ］ ⇒ ~THROW `RangeError$E ◎ If encoding is failure or replacement, then throw a RangeError.
コレの `符号化法$dec ~SET %符号化法 ◎ Set this’s encoding to encoding.
~IF［ %options[ "`fatal$mb" ] ~EQ ~T ］ ⇒ コレの`~error~mode$dec ~SET `fatal^l ◎ If options["fatal"] is true, then set this’s error mode to "fatal".
コレの `~BOMは無視するか$dec ~SET %options[ "`ignoreBOM$mb" ] ◎ Set this’s ignore BOM to options["ignoreBOM"].
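A short sketch of what the constructor steps look like from script, assuming a global `TextDecoder`. The label `"not-a-real-label"` is a made-up string chosen only because it is not a label of any encoding.

```javascript
// "utf8" is one of the labels for UTF-8; the getter reports the lowercased name.
const decoder = new TextDecoder("utf8", { fatal: true, ignoreBOM: true });
console.log(decoder.encoding, decoder.fatal, decoder.ignoreBOM);

// A string that is not a label makes the constructor throw a RangeError.
let labelError = null;
try {
  new TextDecoder("not-a-real-label");
} catch (e) {
  labelError = e;
}
console.log(labelError instanceof RangeError);
```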

`decode(input, options)@m ~method~手続きは： ◎ The decode(input, options) method steps are:

~IF［コレの`書出さないか$dec ~EQ ~F ］ ⇒＃コレの`復号器$dec ~SET コレの`符号化法$decの`復号器$の新たな~instance；コレの`入出力~queue$dec ~SET `入出力~queue$`~byte^tA « `~EoQ$ »；コレの`~BOMを見つけたか$dec ~SET ~F ◎ If this’s do not flush is false, then set this’s decoder to a new instance of this’s encoding’s decoder, this’s I/O queue to the I/O queue of bytes « end-of-queue », and this’s BOM seen to false.
コレの`書出さないか$dec ~SET %options[ "`stream$mb" ] ◎ Set this’s do not flush to options["stream"].
~IF［ %input ~NEQ ε ］： ◎ If input is given,＼
1. %複製 ~LET %input に`保持された~byte列の複製を取得する$ ◎ ↓
2. `入出力~queueに~pushする$( コレの`入出力~queue$dec, %複製 ) ◎ then push a copy of input to this’s I/O queue.
注記：実装には、この複製を避けるよう実装することが強く奨励される。そうするときは、 %input が変更されても，後の `decode()$m の~callに影響しないようにする必要がある。 ◎ Implementations are strongly encouraged to use an implementation strategy that avoids this copy. When doing so they will have to make sure that changes to input do not affect future calls to decode().

警告： `SharedArrayBuffer^I ~objにより公開される~memoryは、［実装~用に概して利用される~programming言語］の~memory~modelに要求される `data race freedom^en な特質を固守しない。実装するときは、 `SharedArrayBuffer^I ~objが公開する~memoryに~accessするときに適切な便宜性†を利用するよう~careすること。【†そのような~accessにその種の特質が備わるよう指示する，言語~特有な構文など】 ◎ The memory exposed by SharedArrayBuffer objects does not adhere to data race freedom properties required by the memory model of programming languages typically used for implementations. When implementing, take care to use the appropriate facilities when accessing memory exposed by SharedArrayBuffer objects.
%出力 ~LET `入出力~queue$`~scalar値^tA « `~EoQ$ » ◎ Let output be the I/O queue of scalar values « end-of-queue ».
~WHILE 無条件： ◎ While true:
1. %~item ~LET `入出力~queueから~itemを読取る$( コレの`入出力~queue$dec ) ◎ Let item be the result of reading from this’s I/O queue.
2. ~IF［ %~item ~EQ `~EoQ$ ］~AND［コレの`書出さないか$dec ~EQ ~T ］ ⇒ ~RET `入出力~queueを直列化する$( コレ, %出力 ) ◎ If item is end-of-queue and this’s do not flush is true, then return the result of running serialize I/O queue with this and output.
  
  注記： ~streamingでは、［コレの`書出さないか$dec ~EQ ~T ］のとき，ここで`~EoQ$を取扱うことなく，それを ~F にしない仕方で働く。この仕方により，コレの`復号器$decは、後続な呼出nにおいて，この~algoの最初の段で一新されることなく，その状態は保全される。 ◎ The way streaming works is to not handle end-of-queue here when this’s do not flush is true and to not set it to false. That way in a subsequent invocation this’s decoder is not set anew in the first step of the algorithm and its state is preserved.
3. %結果 ~LET `~itemを処理する$( %~item, コレの`復号器$dec, コレの`入出力~queue$dec, %出力, コレの`~error~mode$dec ) ◎ Otherwise: ◎ Let result be the result of processing an item with item, this’s decoder, this’s I/O queue, output, and this’s error mode.
4. ~IF［ %結果 ~EQ `完遂d$i ］ ⇒ ~RET `入出力~queueを直列化する$( コレ, %出力 ) ◎ If result is finished, then return the result of running serialize I/O queue with this and output.
5. ~IF［ %結果 ~EQ `~error$i ］ ⇒ ~THROW `TypeError$E ◎ Otherwise, if result is error, throw a TypeError.
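The effect of the "do not flush" flag in the steps above is that decoder state survives between calls made with `stream: true`. The sketch below splits the three UTF-8 bytes of U+20AC across two calls, then shows what a final flush does when the input ends mid-sequence. It assumes a global `TextDecoder`; all names are illustrative.

```javascript
// U+20AC (EURO SIGN) is E2 82 AC in UTF-8; split it across two chunks.
const streamingDecoder = new TextDecoder();
const pieces = [
  streamingDecoder.decode(new Uint8Array([0xE2, 0x82]), { stream: true }),
  streamingDecoder.decode(new Uint8Array([0xAC]), { stream: true }),
  streamingDecoder.decode(), // final call without stream: flushes the decoder
];

// If the input ends in the middle of a sequence, the final call reports it.
const truncatedDecoder = new TextDecoder();
const beforeFlush = truncatedDecoder.decode(new Uint8Array([0xE2, 0x82]), { stream: true });
const afterFlush = truncatedDecoder.decode();

console.log(pieces, JSON.stringify(beforeFlush), JSON.stringify(afterFlush));
```

In the default error mode the incomplete sequence surfaces as a single U+FFFD on the flushing call; with `fatal: true` that call would throw instead.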

7.3. ~interface~mixin `TextEncoderCommon^I

interface mixin `TextEncoderCommon@I {
  readonly attribute `DOMString$ `~encoding0$m;
};

`TextEncoderCommon$I ~interface~mixinは、［ `TextEncoder$I, `TextEncoderStream$I ］~objで共有される共通な取得子を定義する。 ◎ The TextEncoderCommon interface mixin defines common getters that are shared between TextEncoder and TextEncoderStream objects.

`~encoding0@m 取得子~手続きは ⇒ ~RET `utf-8^l ◎ The encoding getter steps are to return "utf-8".

7.4. ~interface `TextEncoder^I

dictionary `TextEncoderEncodeIntoResult@I {
  `unsigned long long$ `read@m;
  `unsigned long long$ `written@m;
};

[`Exposed$=*]
interface `TextEncoder@I {
  `TextEncoder$mc();

  [NewObject] `Uint8Array$ `encode$m(optional `USVString$ %input = "");
  `TextEncoderEncodeIntoResult$I `encodeInto$m(`USVString$ %source, [`AllowShared$] `Uint8Array$ %destination);
};
`TextEncoder$I includes `TextEncoderCommon$I;

注記： `TextEncoder$I ~objは、 `UTF-8$n しか~supportしないので，構築子に %label 引数は無い。また、 ~scalar値~bufferを要求する`符号化器$は無いので， `stream^mb ~optionもない。 ◎ A TextEncoder object offers no label argument as it only supports UTF-8. It also offers no stream option as no encoder requires buffering of scalar values.

%encoder = `new TextEncoder()$m

新たな `TextEncoder$I ~obj を返す。 ◎ Returns a new TextEncoder object.

%encoder . `~encoding0$m

`utf-8^l を返す。 ◎ Returns "utf-8".

%encoder . `encode([input = ""])$m

%input を `UTF-8$n の`符号化器$にかけた結果を返す。 ◎ Returns the result of running UTF-8’s encoder.

%encoder . `encodeInto(source, destination)$m

%source を渡して`~UTF-8符号化器$を走らせた結果を %destination の中に格納して，その進捗を~objとして返す — 結果の ⇒＃ `read$m は %source から変換された`~cu$数になる／ `written$m は %destination 内で改変された~byte数になる ◎ Runs the UTF-8 encoder on source, stores the result of that operation into destination, and returns the progress made as an object wherein read is the number of converted code units of source and written is the number of bytes modified in destination.

`new TextEncoder()@m 構築子~手続きは、何もしない。 ◎ The new TextEncoder() constructor steps are to do nothing.

`encode(input)@m ~method~手続きは： ◎ The encode(input) method steps are:

%入力 ~LET `入出力~queueに変換する$( %input ) ◎ Convert input to an I/O queue of scalar values.
%出力 ~LET `入出力~queue$`~byte^tA « `~EoQ$ » ◎ Let output be the I/O queue of bytes « end-of-queue ».
%符号化器 ~LET `UTF-8$n の`符号化器$の新たな~instance ◎ ↓
~WHILE 無条件： ◎ While true:
1. %~item ~LET `入出力~queueから~itemを読取る$( %入力 ) ◎ Let item be the result of reading from input.
2. %結果 ~LET `~itemを処理する$( %~item, %符号化器, %入力, %出力, `fatal^l ) ◎ Let result be the result of processing an item with item, an instance of the UTF-8 encoder, input, output, and "fatal".
3. ~Assert： %結果は `~error$i でない ◎ Assert: result is not an error.
  
  注記： `~UTF-8符号化器$が `~error$i を返すことはない。 ◎ The UTF-8 encoder cannot return error.
4. ~IF［ %結果 ~EQ `完遂d$i ］：
  1. %出力 ~SET %出力を~byte列に`変換する$
  2. ~RET `~buffer~sourceを作成する$( `Uint8Array$I, %出力, コレに`関連な~realm$ )
  ◎ If result is finished, then return the result of creating a Uint8Array object given output and this’s relevant realm.
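A brief sketch of `encode()` as seen from script, assuming a global `TextEncoder`. Because the argument is a `USVString`, a lone surrogate is replaced with U+FFFD before encoding.

```javascript
const encoder = new TextEncoder();

// U+20AC encodes to three bytes.
const euro = encoder.encode("\u20AC");

// A lone surrogate is first replaced by U+FFFD, which encodes to EF BF BD.
const lone = encoder.encode("\uD800");

// The argument is optional and defaults to the empty string.
const empty = encoder.encode();

console.log(Array.from(euro), Array.from(lone), empty.length);
```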

`encodeInto(source, destination)@m ~method~手続きは： ◎ The encodeInto(source, destination) method steps are:

%read ~LET 0 ◎ Let read be 0.
%written ~LET 0 ◎ Let written be 0.
%符号化器 ~LET `~UTF-8符号化器$の新たな~instance ◎ Let encoder be an instance of the UTF-8 encoder.
%利用されない~queue ~LET `入出力~queue$`~scalar値^tA « `~EoQ$ » ◎ Let unused be the I/O queue of scalar values « end-of-queue ».

注記：以下で呼出される`~handler$~algoには，この引数が要求されるが、 `~UTF-8符号化器$は，それを利用しない。 ◎ The handler algorithm invoked below requires this argument, but it is not used by the UTF-8 encoder.
%source ~SET `入出力~queueに変換する$( %source ) ◎ Convert source to an I/O queue of scalar values.
~WHILE 無条件： ◎ While true:
1. %~item ~LET `入出力~queueから~itemを読取る$( %source ) ◎ Let item be the result of reading from source.
2. %結果 ~LET %符号化器の`~handler$( %利用されない~queue, %~item ) ◎ Let result be the result of running encoder’s handler on unused and item.
3. ~IF［ %結果 ~EQ `完遂d$i ］ ⇒ ~BREAK ◎ If result is finished, then break. ◎ Otherwise:
4. ~IF［ %written ~PLUS %結果を成す~byte数 ~GT %destination の`~byte長さ$BS ］ ⇒ ~BREAK ◎ If destination’s byte length − written is greater than or equal to the number of bytes in result:
5. %read ~INCBY ［ %~item ~LTE `FFFF^U ならば 1 ／ ~ELSE_ 2 ］ ◎ • If item is greater than U+FFFF, then increment read by 2. • Otherwise, increment read by 1.
6. `配列~buffer~viewの中へ~byte列を書込む$( %destination, %結果, %written ) ◎ • Write the bytes in result into destination, with startingOffset set to written.
  
  上述した `SharedArrayBuffer^I ~obj用の`警告＠#sharedarraybuffer-warning$を見よ。 ◎ See the warning for SharedArrayBuffer objects above.
7. %written ~INCBY %結果を成す~byte数 ◎ • Increment written by the number of bytes in result. ◎ Otherwise, break.
~RET « `read$m → %read, `written$m → %written » ◎ Return «[ "read" → read, "written" → written ]».
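The bookkeeping in the steps above can be checked with two small calls. This sketch assumes a global `TextEncoder`; the buffer sizes are arbitrary choices for illustration.

```javascript
const enc = new TextEncoder();

// Only 3 bytes available: "a" (1 byte) fits, but U+20AC needs 3 more bytes,
// so encoding stops before it.
const small = new Uint8Array(3);
const partial = enc.encodeInto("a\u20AC", small);

// U+1F600 is two code units in the string and four bytes in UTF-8,
// so read advances by 2 while written advances by 4 for that one character.
const large = new Uint8Array(8);
const full = enc.encodeInto("a\u{1F600}", large);

console.log(partial, full);
```

Since `read` counts code units of the source string, it can be passed straight to `substring()` to resume encoding where the previous call stopped.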

`encodeInto()$m ~methodを利用すれば、文字列を既存の `ArrayBuffer$I ~objの中へ符号化できる。下における様々な詳細は，読者への宿題として残しておくが、この例は，この~methodの用-法の一つをデモる： ◎ The encodeInto() method can be used to encode a string into an existing ArrayBuffer object. Various details below are left as an exercise for the reader, but this demonstrates an approach one could take to use this method:

function convertString(%buffer, %input, %callback) {
  let %bufferSize = 256,
      %bufferStart = malloc(%buffer, %bufferSize),
      %writeOffset = 0,
      %readOffset = 0;
  while (true) {
    const view = new Uint8Array(%buffer, %bufferStart + %writeOffset, %bufferSize - %writeOffset),
          {%read, %written} = cachedEncoder.encodeInto(%input.substring(%readOffset), view);
    %readOffset += %read;
    %writeOffset += %written;
    if (%readOffset === %input.length) {
      %callback(%bufferStart, %writeOffset);
      free(%buffer, %bufferStart);
      return;
    }
    %bufferSize *= 2;
    %bufferStart = realloc(%buffer, %bufferStart, %bufferSize);
  }
}

7.5. ~interface `TextDecoderStream^I

[`Exposed$=*]
interface `TextDecoderStream@I {
  `TextDecoderStream$mc(optional `DOMString$ %label = "utf-8", optional `TextDecoderOptions$I %options = {});
};
`TextDecoderStream$I includes `TextDecoderCommon$I;
`TextDecoderStream$I includes `GenericTransformStream$I;

%decoder = new TextDecoderStream([%label = "utf-8" [, %options]])

新たな `TextDecoderStream$I ~objを返す。 ◎ Returns a new TextDecoderStream object.

%decoder . `encoding$m

`符号化法$decの`名前$を小文字~化して返す。 ◎ Returns encoding’s name, lowercased.

%decoder . `fatal$m

`~error~mode$dec ~EQ `fatal^l ならば ~T を返す。他の場合は ~F を返す。 ◎ Returns true if error mode is "fatal", and false otherwise.

%decoder . `ignoreBOM$m

`~BOMは無視するか$decの値を返す。 ◎ Returns the value of ignore BOM.

%decoder . `readable$m

`可読~stream$を返す。その`~chunk$たちは、 `writable$m に書込まれた~chunkたちに対し，`符号化法$decの`復号器$を走らせた結果の文字列たちになる。 ◎ Returns a readable stream whose chunks are strings resulting from running encoding’s decoder on the chunks written to writable.

%decoder . `writable$m

`可書~stream$を返す。それは、 `AllowSharedBufferSource$I 型の~chunkたちを受容して — `readable$m に可用にされる前に — `符号化法$decの`復号器$にかける。 ◎ Returns a writable stream which accepts AllowSharedBufferSource chunks and runs them through encoding’s decoder before making them available to readable.

これは概して、 `ReadableStream$I ~sourceの `pipeThrough$m ~methodを介して利用されることになる。 ◎ Typically this will be used via the pipeThrough() method on a ReadableStream source.

var %decoder = new TextDecoderStream(%encoding);
byteReadable
  .pipeThrough(%decoder)
  .pipeTo(%textWritable);

`~error~mode$dec ~EQ `fatal^l かつ`符号化法$decの`復号器$は `~error$i を返す場合、 `readable$m, `writable$m とも `TypeError$E で~errorにされることになる。 ◎ If the error mode is "fatal" and encoding’s decoder returns error, both readable and writable will be errored with a TypeError.
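A self-contained variant of the `pipeThrough()` usage, as a sketch. It assumes global `ReadableStream` and `TextDecoderStream` (current browsers and recent Node.js releases); `decodeChunks` is a hypothetical helper name introduced for this example.

```javascript
// Hypothetical helper: feeds byte chunks through a TextDecoderStream
// and concatenates the decoded string chunks.
async function decodeChunks(chunks, label = "utf-8") {
  const source = new ReadableStream({
    start(controller) {
      for (const chunk of chunks) controller.enqueue(chunk);
      controller.close();
    },
  });
  const reader = source.pipeThrough(new TextDecoderStream(label)).getReader();
  let text = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) return text;
    text += value;
  }
}

// The three bytes of U+20AC arrive in two separate chunks.
const decoded = decodeChunks([
  new Uint8Array([0xE2, 0x82]),
  new Uint8Array([0xAC]),
]);
decoded.then((text) => console.log(text));
```

Unlike `TextDecoder`, no `stream` option is involved here: the stream object keeps the decoder state between chunks and flushes it when the writable side closes.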

`new TextDecoderStream(label, options)@m 構築子~手続きは： ◎ The new TextDecoderStream(label, options) constructor steps are:

%符号化法 ~LET `~labelから符号化法を取得する$( %label ) ◎ Let encoding be the result of getting an encoding from label.
~IF［ %符号化法 ~IN { `失敗^i, `replacement$n } ］ ⇒ ~THROW `RangeError$E ◎ If encoding is failure or replacement, then throw a RangeError.
コレの`符号化法$dec ~SET %符号化法 ◎ Set this’s encoding to encoding.
~IF［ %options[ "`fatal$mb" ] ~EQ ~T ］ ⇒ コレの`~error~mode$dec ~SET `fatal^l ◎ If options["fatal"] is true, then set this’s error mode to "fatal".
コレの`~BOMは無視するか$dec ~SET %options[ "`ignoreBOM$mb" ] ◎ Set this’s ignore BOM to options["ignoreBOM"].
コレの`復号器$dec ~SET コレの`符号化法$decの`復号器$の新たな~instance ◎ Set this’s decoder to a new instance of this’s encoding’s decoder,＼
コレの`入出力~queue$dec ~SET 新たな`入出力~queue$`~byte^tA ◎ and set this’s I/O queue to a new I/O queue.
%形式変換~stream ~LET `新たな~obj$( `TransformStream$I, コレに`関連な~realm$ ) ◎ ↓
%形式変換~stream を`設定しておく$TS — 次を与える下で：
- `形式変換~algo^i ~SET 所与の ( %~chunk ) に対し，次を走らす~algo ⇒ `~chunkを復号して~enqueueする$( コレ, %~chunk )
- `書出n~algo^i ~SET 次を走らす~algo ⇒ `書出して~enqueueする$( コレ )
◎ Let transformAlgorithm be an algorithm which takes a chunk argument and runs the decode and enqueue a chunk algorithm with this and chunk. ◎ Let flushAlgorithm be an algorithm which takes no arguments and runs the flush and enqueue algorithm with this. ◎ Let transformStream be a new TransformStream. ◎ Set up transformStream with transformAlgorithm set to transformAlgorithm and flushAlgorithm set to flushAlgorithm.
コレの`形式変換$ ~SET %形式変換~stream ◎ ◎ Set this’s transform to transformStream.

`~chunkを復号して~enqueueする@ ~algoは、所与の ( `TextDecoderStream$I ~obj %復号器, %~chunk ) に対し： ◎ The decode and enqueue a chunk algorithm, given a TextDecoderStream object decoder and a chunk, runs these steps:

%~buffer~source ~LET `~IDL値に変換する$( %~chunk, `AllowSharedBufferSource$I ) ◎ Let bufferSource be the result of converting chunk to an AllowSharedBufferSource.
`入出力~queueに~pushする$( %復号器の`入出力~queue$dec, 次の結果 ) ⇒ %~buffer~source に`保持された~byte列の複製を取得する$ ◎ Push a copy of bufferSource to decoder’s I/O queue.

上述した `SharedArrayBuffer^I ~obj用の`警告＠#sharedarraybuffer-warning$を見よ。 ◎ See the warning for SharedArrayBuffer objects above.
%出力 ~LET `入出力~queue$`~scalar値^tA « `~EoQ$ » ◎ Let output be the I/O queue of scalar values « end-of-queue ».
~WHILE 無条件： ◎ While true:
1. %~item ~LET `入出力~queueから~itemを読取る$( %復号器の`入出力~queue$dec ) ◎ Let item be the result of reading from decoder’s I/O queue.
2. ~IF［ %~item ~EQ `~EoQ$ ］： ◎ If item is end-of-queue:
  1. %出力~chunk ~LET `入出力~queueを直列化する$( %復号器, %出力 ) ◎ Let outputChunk be the result of running serialize I/O queue with decoder and output.
  2. ~IF［ %出力~chunk ~NEQ 空~文字列］ ⇒ %復号器の`形式変換$に`~chunkを~enqueueする$TS( %出力~chunk ) ◎ If outputChunk is not the empty string, then enqueue outputChunk in decoder’s transform.
  3. ~RET ◎ Return.
3. %結果 ~LET `~itemを処理する$( %~item, %復号器の`復号器$dec, %復号器の`入出力~queue$dec, %出力, %復号器の`~error~mode$dec ) ◎ Let result be the result of processing an item with item, decoder’s decoder, decoder’s I/O queue, output, and decoder’s error mode.
4. ~IF［ %結果 ~EQ `~error$i ］ ⇒ ~THROW `TypeError$E ◎ If result is error, then throw a TypeError.

`書出して~enqueueする@ ~algoは、入力 `ReadableStream$I ~objからの~dataの終端を取扱う — それは、所与の ( `TextDecoderStream$I ~obj %復号器 ) に対し： ◎ The flush and enqueue algorithm, which handles the end of data from the input ReadableStream object, given a TextDecoderStream object decoder, runs these steps:

%出力 ~LET `入出力~queue$`~scalar値^tA « `~EoQ$ » ◎ Let output be the I/O queue of scalar values « end-of-queue ».
~WHILE 無条件： ◎ While true:
1. %~item ~LET `入出力~queueから~itemを読取る$( %復号器の`入出力~queue$dec ) ◎ Let item be the result of reading from decoder’s I/O queue.
2. %結果 ~LET `~itemを処理する$( %~item, %復号器の`復号器$dec, %復号器の`入出力~queue$dec, %出力, %復号器の`~error~mode$dec ) ◎ Let result be the result of processing an item with item, decoder’s decoder, decoder’s I/O queue, output, and decoder’s error mode.
3. ~IF［ %結果 ~EQ `完遂d$i ］： ◎ If result is finished:
  1. %出力~chunk ~LET `入出力~queueを直列化する$( %復号器, %出力 ) ◎ Let outputChunk be the result of running serialize I/O queue with decoder and output.
  2. ~IF［ %出力~chunk ~NEQ 空~文字列］ ⇒ %復号器の`形式変換$に`~chunkを~enqueueする$TS( %出力~chunk ) ◎ If outputChunk is not the empty string, then enqueue outputChunk in decoder’s transform.
  3. ~RET ◎ Return.
4. ~ELIF［ %結果 ~EQ `~error$i ］ ⇒ ~THROW `TypeError$E ◎ Otherwise, if result is error, throw a TypeError.

7.6. ~interface `TextEncoderStream^I

[`Exposed$=*]
interface `TextEncoderStream@I {
  `TextEncoderStream$mc();
};

`TextEncoderStream$I includes `TextEncoderCommon$I;
`TextEncoderStream$I includes `GenericTransformStream$I;

各 `TextEncoderStream$I ~objには、次に挙げるものが結付けられる： ◎ A TextEncoderStream object has an associated:

`符号化器@enc ⇒ ある［ `符号化器$の~instance ］

【下に注記されるとおり， `UTF-8$n のそれしかとらない。また、 `復号器$decのときと違って，状態を保持する~fieldは無い。】
◎ encoder • An encoder instance.
`頭部~surrogate@enc ⇒ ~NULL ／`頭部~surrogate$ — 初期~時は ~NULL とする。 ◎ leading surrogate • Null or a leading surrogate, initially null.

注記： `TextEncoderStream$I ~objは `UTF-8$n しか~supportしないので、 %label 引数を提供しない。 ◎ A TextEncoderStream object offers no label argument as it only supports UTF-8.

%encoder = `new TextEncoderStream()$m

新たな `TextEncoderStream$I ~objを返す。 ◎ Returns a new TextEncoderStream object.

%encoder . `~encoding0$m

`utf-8^l を返す。 ◎ Returns "utf-8".

%encoder . `readable$m

`可読~stream$を返す。その各`~chunk$は、 `writable$m に書込まれた~chunkたちに対し `UTF-8$n の`符号化器$を走らせた結果の `Uint8Array$I になる。 ◎ Returns a readable stream whose chunks are Uint8Arrays resulting from running UTF-8’s encoder on the chunks written to writable.

%encoder . `writable$m

`可書~stream$を返す。それは、文字列~chunkたちを受容して — `readable$m に可用にされる前に — `UTF-8$n の`符号化器$にかける。 ◎ Returns a writable stream which accepts string chunks and runs them through UTF-8’s encoder before making them available to readable.

%textReadable
  .pipeThrough(new TextEncoderStream())
  .pipeTo(%byteWritable);

`new TextEncoderStream()@m 構築子~手続きは： ◎ The new TextEncoderStream() constructor steps are:

コレの`符号化器$enc ~SET `UTF-8$n の`符号化器$の新たな~instance ◎ Set this’s encoder to an instance of the UTF-8 encoder.
%形式変換~stream ~LET `新たな~obj$( `TransformStream$I, コレに`関連な~realm$ ) ◎ ↓
%形式変換~stream を`設定しておく$TS — 次を与える下で：
- `形式変換~algo^i ~SET 所与の ( %~chunk ) に対し，次を走らす~algo ⇒ `~chunkを符号化して~enqueueする$( コレ, %~chunk )
- `書出n~algo^i ~SET 次を走らす~algo ⇒ `符号化して書出す$( コレ )
◎ Let transformAlgorithm be an algorithm which takes a chunk argument and runs the encode and enqueue a chunk algorithm with this and chunk. ◎ Let flushAlgorithm be an algorithm which runs the encode and flush algorithm with this. ◎ Let transformStream be a new TransformStream. ◎ Set up transformStream with transformAlgorithm set to transformAlgorithm and flushAlgorithm set to flushAlgorithm.
コレの`形式変換$ ~SET %形式変換~stream ◎ Set this’s transform to transformStream.

`~chunkを符号化して~enqueueする@ ~algoは、所与の ( `TextEncoderStream$I ~obj %符号化器, %~chunk ) に対し： ◎ The encode and enqueue a chunk algorithm, given a TextEncoderStream object encoder and chunk, runs these steps:

%入力 ~LET `~IDL値に変換する$( %~chunk, `DOMString^I ) ◎ Let input be the result of converting chunk to a DOMString.
%入力 ~SET `入出力~queueに変換する$( %入力 ) ◎ Convert input to an I/O queue of code units.

注記： `DOMString^I 型から変換しているので、結果の`入出力~queue$の~item型は，~scalar値ではなく`~cu$になる。そのようにしているのは、［ 2 つの~chunkに分割された~surrogate~pairを，適切な~scalar値に組立直せるようにする］ためであり，他の挙動は `USVString^I と一致する。特に，~~孤立した~surrogateは `FFFD^U1 に置換されることになる。 ◎ DOMString, as well as an I/O queue of code units rather than scalar values, are used here so that a surrogate pair that is split between chunks can be reassembled into the appropriate scalar value. The behavior is otherwise identical to USVString. In particular, lone surrogates will be replaced with U+FFFD (�).
%出力 ~LET `入出力~queue$`~byte^tA « `~EoQ$ » ◎ Let output be the I/O queue of bytes « end-of-queue ».
~WHILE 無条件： ◎ While true:
1. %~item ~LET `入出力~queueから~itemを読取る$( %入力 ) ◎ Let item be the result of reading from input.
2. ~IF［ %~item ~EQ `~EoQ$ ］： ◎ If item is end-of-queue:
  1. %出力 ~SET %出力を~byte列に`変換する$ ◎ Convert output into a byte sequence.
  2. ~IF［ %出力は`空$でない］： ◎ If output is not empty:
    1. %~chunk ~LET `~buffer~sourceを作成する$( `Uint8Array$I, %出力, %符号化器に`関連な~realm$ ) ◎ Let chunk be the result of creating a Uint8Array object given output and encoder’s relevant realm.
    2. %符号化器の`形式変換$に`~chunkを~enqueueする$TS( %~chunk ) ◎ Enqueue chunk into encoder’s transform.
  3. ~RET ◎ Return.
3. %結果 ~LET `~cuを~scalar値に変換する$( %符号化器, %~item, %入力 ) ◎ Let result be the result of executing the convert code unit to scalar value algorithm with encoder, item and input.
4. ~IF［ %結果 ~NEQ `継続-$i ］ ⇒ `~itemを処理する$( %結果, %符号化器の`符号化器$enc, %入力, %出力, `fatal^l ) ◎ If result is not continue, then process an item with result, encoder’s encoder, input, output, and "fatal".

`~cuを~scalar値に変換する@ ~algoは、所与の ( `TextEncoderStream$I ~obj %符号化器, `~cu$ %~item, `入出力~queue$`~cu^tA %入力 ) に対し： ◎ The convert code unit to scalar value algorithm, given a TextEncoderStream object encoder, a code unit item, and an I/O queue of code units input, runs these steps:

~IF［ %符号化器の`頭部~surrogate$enc ~NEQ ~NULL ］： ◎ If encoder’s leading surrogate is non-null:
1. %頭部~surrogate ~LET %符号化器の`頭部~surrogate$enc ◎ Let leadingSurrogate be encoder’s leading surrogate.
2. %符号化器の`頭部~surrogate$enc ~SET ~NULL ◎ Set encoder’s leading surrogate to null.
3. ~IF［ %~item は`尾部~surrogate$である］ ⇒ ~RET `~surrogate対から~scalar値を得する$( %頭部~surrogate, %~item ) ◎ If item is a trailing surrogate, then return a scalar value from surrogates given leadingSurrogate and item.
4. `入出力~queueに格納し直す$( %入力, %~item ) ◎ Restore item to input.
5. ~RET `FFFD^U1 ◎ Return U+FFFD (�).
~IF［ %~item は`頭部~surrogate$である］ ⇒＃ %符号化器の`頭部~surrogate$enc ~SET %~item； ~RET `継続-$i ◎ If item is a leading surrogate, then set encoder’s leading surrogate to item and return continue.
~IF［ %~item は`尾部~surrogate$である］ ⇒ ~RET `FFFD^U1 ◎ If item is a trailing surrogate, then return U+FFFD (�).
~RET %~item ◎ Return item.

注記：これは， `INFRA$r による［ `文字列$を`~scalar値~文字列$に`変換する~algo＠~INFRA#javascript-string-convert$ ］と等価になるが、 2 つの文字列に分割された~surrogate~pairも許容する。 `INFRA$r ◎ This is equivalent to the "convert a string into a scalar value string" algorithm from the Infra Standard, but allows for surrogate pairs that are split between strings. [INFRA]

`符号化して書出す@ ~algoは、所与の ( `TextEncoderStream$I ~obj %符号化器 ) に対し： ◎ The encode and flush algorithm, given a TextEncoderStream object encoder, runs these steps:

~IF［ %符号化器の`頭部~surrogate$enc ~NEQ ~NULL ］： ◎ If encoder’s leading surrogate is non-null:
1. %~byte列 ~LET ~byte列 `EF^X `BF^X `BD^X
  
  注記：これは、 `FFFD^U1 を成す~UTF-8~byte列である。
  ◎ ↓
2. %~chunk ~LET `~buffer~sourceを作成する$( `Uint8Array$I, %~byte列, %符号化器に`関連な~realm$ ) ◎ Let chunk be the result of creating a Uint8Array object given « 0xEF, 0xBF, 0xBD » and encoder’s relevant realm. ◎ This is U+FFFD (�) in UTF-8 bytes.
3. %符号化器の`形式変換$に`~chunkを~enqueueする$TS( %~chunk ) ◎ Enqueue chunk into encoder’s transform.
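The surrogate bookkeeping of the "convert code unit to scalar value" and "encode and flush" algorithms can be modelled without any stream machinery. The following is a sketch only: `createScalarValueConverter`, `convertChunk`, and `flush` are hypothetical names, and the model returns scalar values rather than UTF-8 bytes to keep the example short.

```javascript
// Model of the per-stream state: at most one pending leading surrogate.
function createScalarValueConverter() {
  let leadingSurrogate = null;
  const isLeading = (cu) => cu >= 0xD800 && cu <= 0xDBFF;
  const isTrailing = (cu) => cu >= 0xDC00 && cu <= 0xDFFF;

  return {
    // Converts one string chunk into an array of scalar values.
    convertChunk(chunk) {
      const out = [];
      for (let i = 0; i < chunk.length; i++) {
        const cu = chunk.charCodeAt(i);
        if (leadingSurrogate !== null) {
          const lead = leadingSurrogate;
          leadingSurrogate = null;
          if (isTrailing(cu)) {
            // Scalar value from a surrogate pair.
            out.push(0x10000 + ((lead - 0xD800) << 10) + (cu - 0xDC00));
            continue;
          }
          // Unpaired leading surrogate: emit U+FFFD, then fall through so the
          // current code unit is examined again ("restore item to input").
          out.push(0xFFFD);
        }
        if (isLeading(cu)) leadingSurrogate = cu; // wait for the next code unit
        else if (isTrailing(cu)) out.push(0xFFFD); // lone trailing surrogate
        else out.push(cu);
      }
      return out;
    },
    // End of input: a still-pending leading surrogate becomes U+FFFD.
    flush() {
      return leadingSurrogate === null ? [] : [0xFFFD];
    },
  };
}

const converter = createScalarValueConverter();
const firstHalf = converter.convertChunk("\uD83D"); // leading surrogate only
const secondHalf = converter.convertChunk("\uDE00"); // matching trailing surrogate
console.log(firstHalf, secondHalf.map((v) => v.toString(16)));
```

The pair split across the two chunks is reassembled into U+1F600, which a plain per-chunk `USVString` conversion would have turned into two U+FFFD.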

8. ~~標準の符号化法

【この “~~標準の” は “The” の対訳であり、およそ， “規範とされるべき唯一無二の” を意味する。】

8.1. ~UTF-8

8.1.1. ~UTF-8復号器

注記： ~BOMは、 ~labelより~~優先される — その方が配備-済みな内容において正確aになるものと見出されたので。したがって，それは、 `~UTF-8復号器$~algoの一部を成さない — 代わりに［ `~Unicodeに復号する$／`~UTF-8復号する$ ］~algoの一部を成す。 ◎ A byte order mark has priority over a label as it has been found to be more accurate in deployed content. Therefore it is not part of the UTF-8 decoder algorithm, but rather the decode and UTF-8 decode algorithms.

各［ `UTF-8$n の`復号器$ ］には、次に挙げるものが結付けられる： ◎ UTF-8’s decoder has an associated:

`~UTF-8~cp@ ⇒ ある無符号整数 — 初期~時は 0 とする。 ◎ UTF-8 code point • ↓
`~UTF-8出現~byte数@ ⇒ ある無符号整数 — 初期~時は 0 とする。 ◎ UTF-8 bytes seen • ↓
`~UTF-8要~byte数@ ⇒ ある無符号整数 — 初期~時は 0 とする。 ◎ UTF-8 bytes needed • Each a number, initially 0.
`~UTF-8下限@ ⇒ ある~byte — 初期~時は `80^X とする。 ◎ UTF-8 lower boundary • A byte, initially 0x80.
`~UTF-8上限@ ⇒ ある~byte — 初期~時は `BF^X とする。 ◎ UTF-8 upper boundary • A byte, initially 0xBF.

`UTF-8$n の`復号器$の`~handler$は、所与の ( %入出力~queue, %~byte ) に対し： ◎ UTF-8’s decoder’s handler, given ioQueue and byte, runs these steps:

~IF［ %~byte ~EQ `~EoQ$ ］~AND［ `~UTF-8要~byte数$ ~NEQ 0 ］ ⇒＃ `~UTF-8要~byte数$ ~SET 0； ~RET `~error$i ◎ If byte is end-of-queue and UTF-8 bytes needed is not 0, then set UTF-8 bytes needed to 0 and return error.
~IF［ %~byte ~EQ `~EoQ$ ］ ⇒ ~RET `完遂d$i ◎ If byte is end-of-queue, then return finished.
~IF［ `~UTF-8要~byte数$ ~EQ 0 ］： ◎ If UTF-8 bytes needed is 0,＼
1. %~byte に応じて： ◎ based on byte:
  - `00^X 〜 `7F^X ⇒ ~RET ~cp « %~byte » ◎ 0x00 to 0x7F • Return a code point whose value is byte.
  - `C2^X 〜 `DF^X： ◎ 0xC2 to 0xDF
    1. `~UTF-8要~byte数$ ~SET 1 ◎ Set UTF-8 bytes needed to 1.
    2. `~UTF-8~cp$ ~SET %~byte ~bAND `1F^X （ %~byte の下位 5 ~bit ） ◎ Set UTF-8 code point to byte & 0x1F. ◎ The five least significant bits of byte.
  - `E0^X 〜 `EF^X： ◎ 0xE0 to 0xEF
    1. ~IF［ %~byte ~EQ `E0^X ］ ⇒ `~UTF-8下限$ ~SET `A0^X ◎ If byte is 0xE0, then set UTF-8 lower boundary to 0xA0.
    2. ~IF［ %~byte ~EQ `ED^X ］ ⇒ `~UTF-8上限$ ~SET `9F^X ◎ If byte is 0xED, then set UTF-8 upper boundary to 0x9F.
    3. `~UTF-8要~byte数$ ~SET 2 ◎ Set UTF-8 bytes needed to 2.
    4. `~UTF-8~cp$ ~SET %~byte ~bAND `F^X （ %~byte の下位 4 ~bit ） ◎ Set UTF-8 code point to byte & 0xF. ◎ The four least significant bits of byte.
  - `F0^X 〜 `F4^X： ◎ 0xF0 to 0xF4
    1. ~IF［ %~byte ~EQ `F0^X ］ ⇒ `~UTF-8下限$ ~SET `90^X ◎ If byte is 0xF0, then set UTF-8 lower boundary to 0x90.
    2. ~IF［ %~byte ~EQ `F4^X ］ ⇒ `~UTF-8上限$ ~SET `8F^X ◎ If byte is 0xF4, then set UTF-8 upper boundary to 0x8F.
    3. `~UTF-8要~byte数$ ~SET 3 ◎ Set UTF-8 bytes needed to 3.
    4. `~UTF-8~cp$ ~SET %~byte ~bAND `7^X （ %~byte の下位 3 ~bit ） ◎ Set UTF-8 code point to byte & 0x7. ◎ The three least significant bits of byte.
  - その他 ⇒ ~RET `~error$i ◎ Otherwise • Return error.
2. ~RET `継続-$i ◎ Return continue.
~IF［ %~byte ~NIN { `~UTF-8下限$ 〜 `~UTF-8上限$ } ］： ◎ If byte is not in the range UTF-8 lower boundary to UTF-8 upper boundary, inclusive:
1. ( `~UTF-8~cp$, `~UTF-8要~byte数$, `~UTF-8出現~byte数$ ) ~SET ( 0, 0, 0 ) ◎ Set UTF-8 code point, UTF-8 bytes needed, and UTF-8 bytes seen to 0,＼
2. ( `~UTF-8下限$, `~UTF-8上限$ ) ~SET ( `80^X, `BF^X ) ◎ set UTF-8 lower boundary to 0x80, and set UTF-8 upper boundary to 0xBF.
3. `入出力~queueに格納し直す$( %入出力~queue, %~byte ) ◎ Restore byte to ioQueue.
4. ~RET `~error$i ◎ Return error.
( `~UTF-8下限$, `~UTF-8上限$ ) ~SET ( `80^X, `BF^X ) ◎ Set UTF-8 lower boundary to 0x80 and UTF-8 upper boundary to 0xBF.
`~UTF-8~cp$ ~SET (`~UTF-8~cp$ ~Lshift 6) ~bOR (%~byte ~bAND `3F^X) ◎ Set UTF-8 code point to (UTF-8 code point << 6) | (byte & 0x3F)

注記： `~UTF-8~cp$内の既存の~bitを左へ 6 ~bit ~shiftして，~~空いた下位 6 ~bitに %~byte の下位 6 ~bitをあてがう。 ◎ Shift the existing bits of UTF-8 code point left by six places and set the newly-vacated six least significant bits to the six least significant bits of byte.
`~UTF-8出現~byte数$ ~INCBY 1 ◎ Increase UTF-8 bytes seen by one.
~IF［ `~UTF-8出現~byte数$ ~NEQ `~UTF-8要~byte数$ ］ ⇒ ~RET `継続-$i ◎ If UTF-8 bytes seen is not equal to UTF-8 bytes needed, then return continue.
%~cp ~LET `~UTF-8~cp$ ◎ Let codePoint be UTF-8 code point.
( `~UTF-8~cp$, `~UTF-8要~byte数$, `~UTF-8出現~byte数$ ) ~SET ( 0, 0, 0 ) ◎ Set UTF-8 code point, UTF-8 bytes needed, and UTF-8 bytes seen to 0.
~RET ~cp « %~cp » ◎ Return a code point whose value is codePoint.

注記： `~UTF-8復号器$における上の拘束は、 ~Unicode標準の “`Best Practices for Using U+FFFD^en” に準じる。他の挙動は、 Encoding 標準の下では許可されない（同じ結果を達成するなら、他の~algoでも~~十分であり，むしろ奨励される）。 `UNICODE$r ◎ The constraints in the UTF-8 decoder above match “Best Practices for Using U+FFFD” from the Unicode standard. No other behavior is permitted per the Encoding Standard (other algorithms that achieve the same result are fine, even encouraged). [UNICODE]
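The handler above translates almost line for line into a small state machine. The sketch below is one way to write it, under these assumptions: the whole input is available as an array of bytes, the error mode is "replacement" (each error becomes U+FFFD), and BOM handling is left out, as it is not part of the decoder itself. `utf8Decode` is a hypothetical name.

```javascript
// Returns an array of code points decoded from an array of bytes.
function utf8Decode(bytes) {
  let codePoint = 0, bytesSeen = 0, bytesNeeded = 0;
  let lower = 0x80, upper = 0xBF;
  const out = [];
  let i = 0;
  while (i < bytes.length) {
    const byte = bytes[i++];
    if (bytesNeeded === 0) {
      if (byte <= 0x7F) {
        out.push(byte);
      } else if (byte >= 0xC2 && byte <= 0xDF) {
        bytesNeeded = 1;
        codePoint = byte & 0x1F;
      } else if (byte >= 0xE0 && byte <= 0xEF) {
        if (byte === 0xE0) lower = 0xA0; // rejects overlong 3-byte forms
        if (byte === 0xED) upper = 0x9F; // rejects encoded surrogates
        bytesNeeded = 2;
        codePoint = byte & 0xF;
      } else if (byte >= 0xF0 && byte <= 0xF4) {
        if (byte === 0xF0) lower = 0x90; // rejects overlong 4-byte forms
        if (byte === 0xF4) upper = 0x8F; // rejects values above U+10FFFF
        bytesNeeded = 3;
        codePoint = byte & 0x7;
      } else {
        out.push(0xFFFD); // error
      }
      continue;
    }
    if (byte < lower || byte > upper) {
      codePoint = bytesNeeded = bytesSeen = 0;
      lower = 0x80;
      upper = 0xBF;
      i--; // restore the byte so it is processed again as a new sequence
      out.push(0xFFFD); // error
      continue;
    }
    lower = 0x80;
    upper = 0xBF;
    codePoint = (codePoint << 6) | (byte & 0x3F);
    bytesSeen++;
    if (bytesSeen !== bytesNeeded) continue;
    out.push(codePoint);
    codePoint = bytesNeeded = bytesSeen = 0;
  }
  // End-of-queue in the middle of a sequence is an error as well.
  if (bytesNeeded !== 0) out.push(0xFFFD);
  return out;
}

console.log(utf8Decode([0xE2, 0x82, 0xAC]).map((v) => v.toString(16)));
```

Restoring the offending byte is what makes a sequence such as E2 41 decode to U+FFFD followed by "A" rather than swallowing the "A".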

8.1.2. ~UTF-8符号化器

`UTF-8$n の`符号化器$の`~handler$は、所与の ( %利用されない~queue, %~cp ) に対し： ◎ UTF-8’s encoder’s handler, given unused and codePoint, runs these steps:

~IF［ %~cp ~EQ `~EoQ$ ］ ⇒ ~RET `完遂d$i ◎ If codePoint is end-of-queue, then return finished.
~IF［ %~cp ~IN `~ASCII~cp$ ］ ⇒ ~RET ~byte « %~cp » ◎ If codePoint is an ASCII code point, then return a byte whose value is codePoint.
( %count, %~offset ) ~SET %~cp が属する範囲に応じて ⇒＃ { `0080^U 〜 `07FF^U } ならば ( 1, `C0^X ) ／ { `0800^U 〜 `FFFF^U } ならば ( 2, `E0^X ) ／ { `10000^U 〜 `10FFFF^U } ならば ( 3, `F0^X ) ◎ Set count and offset based on the range codePoint is in: ◎ U+0080 to U+07FF, inclusive • 1 and 0xC0 U+0800 to U+FFFF, inclusive • 2 and 0xE0 U+10000 to U+10FFFF, inclusive • 3 and 0xF0
%~byte列 ~LET ~byte « ( %~cp ~Rshift ( 6 ~MUL %count ) ) ~PLUS %~offset » ◎ Let bytes be a byte sequence whose first byte is (codePoint >> (6 × count)) + offset.
~WHILE［ %count ~GT 0 ］： ◎ While count is greater than 0:
1. %temp ~SET %~cp ~Rshift ( 6 ~MUL ( %count ~MINUS 1 ) ) ◎ Set temp to codePoint >> (6 × (count − 1)).
2. %~byte列に ( `80^X ~bOR ( %temp ~bAND `3F^X ) ) を付加する ◎ Append to bytes 0x80 | (temp & 0x3F).
3. %count ~DECBY 1 ◎ Decrease count by one.
~RET %~byte列 ◎ Return bytes bytes, in order.

注記：この~algoは、 ~Unicode標準に述べられるものと一致する結果を得るが，完全さのためここに含められている。 `UNICODE$r ◎ This algorithm has identical results to the one described in the Unicode standard. It is included here for completeness. [UNICODE]
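The encoder handler is short enough to write out in full. This sketch handles one code point at a time and assumes the caller only passes scalar values (no surrogates), as the algorithms that invoke the encoder guarantee. `utf8EncodeCodePoint` is a hypothetical name.

```javascript
// Returns the UTF-8 bytes for a single scalar value.
function utf8EncodeCodePoint(codePoint) {
  if (codePoint <= 0x7F) return [codePoint];

  // count = number of continuation bytes, offset = lead byte marker.
  let count, offset;
  if (codePoint <= 0x7FF) { count = 1; offset = 0xC0; }
  else if (codePoint <= 0xFFFF) { count = 2; offset = 0xE0; }
  else { count = 3; offset = 0xF0; }

  // Lead byte carries the most significant bits.
  const bytes = [(codePoint >> (6 * count)) + offset];

  // Each continuation byte carries the next six bits, marked with 0x80.
  while (count > 0) {
    const temp = codePoint >> (6 * (count - 1));
    bytes.push(0x80 | (temp & 0x3F));
    count--;
  }
  return bytes;
}

console.log(utf8EncodeCodePoint(0x20AC).map((b) => b.toString(16)));
```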

9. 旧来の単-~byte符号化法

`符号化法$のうち［各~byteが， 1 個の~cpに対応するか, どの~cpにも対応しないもの］は、 `単-~byte符号化法@ と総称される。すべての`単-~byte符号化法$が、同じ［ `復号器$, `符号化器$ ］を共有する。 `単-~byte復号器$／`単-~byte符号化器$から参照される `単-~byte索引@ は、利用-中にある`単-~byte符号化法$に依存し，次の表tで定義される。［ `ISO-8859-8^n, `ISO-8859-8-I^n ］を除くすべての`単-~byte符号化法$は、それぞれに一意な`索引$を持つ。 ◎ An encoding where each byte is either a single code point or nothing, is a single-byte encoding. Single-byte encodings share the decoder and encoder. Index single-byte, as referenced by the single-byte decoder and single-byte encoder, is defined by the following table, and depends on the single-byte encoding in use. All but two single-byte encodings have a unique index.

【被覆域の~tableは巨大なことに注意。】【視覚-化~tableの各~cellの色については、 `凡例＠#visualization$を見よ。】

`名前$	`索引$	視覚-化	基本多言語面（ BMP ）の被覆域
`IBM866@n	`IBM866$idx
`ISO-8859-2@n	`ISO-8859-2$idx
`ISO-8859-3@n	`ISO-8859-3$idx
`ISO-8859-4@n	`ISO-8859-4$idx
`ISO-8859-5@n	`ISO-8859-5$idx
`ISO-8859-6@n	`ISO-8859-6$idx
`ISO-8859-7@n	`ISO-8859-7$idx
`ISO-8859-8@n	`ISO-8859-8$idx
`ISO-8859-8-I@n	`ISO-8859-8$n と同じ
`ISO-8859-10@n	`ISO-8859-10$idx
`ISO-8859-13@n	`ISO-8859-13$idx
`ISO-8859-14@n	`ISO-8859-14$idx
`ISO-8859-15@n	`ISO-8859-15$idx
`ISO-8859-16@n	`ISO-8859-16$idx
`KOI8-R@n	`KOI8-R$idx
`KOI8-U@n	`KOI8-U$idx
`macintosh@n	`macintosh$idx
`windows-874@n	`windows-874$idx
`windows-1250@n	`windows-1250$idx
`windows-1251@n	`windows-1251$idx
`windows-1252@n	`windows-1252$idx
`windows-1253@n	`windows-1253$idx
`windows-1254@n	`windows-1254$idx
`windows-1255@n	`windows-1255$idx
`windows-1256@n	`windows-1256$idx
`windows-1257@n	`windows-1257$idx
`windows-1258@n	`windows-1258$idx
`x-mac-cyrillic@n	`x-mac-cyrillic$idx

注記： ~layout方向に波及することから、 `ISO-8859-8$n と `ISO-8859-8-I$n の`符号化法$の`名前$は異なるものにされている。歴史的に、このことは `ISO-8859-6$n と "ISO-8859-6-I" についても該当していたが、それはもはや成立しない。【！ https://www.w3.org/Bugs/Public/show_bug.cgi?id=19505 】 ◎ ISO-8859-8 and ISO-8859-8-I are distinct encoding names, because ISO-8859-8 has influence on the layout direction. And although historically this might have been the case for ISO-8859-6 and "ISO-8859-6-I" as well, that is no longer true.

9.1. 単-~byte復号器

`単-~byte符号化法$の`復号器$の`~handler$は、所与の ( %利用されない~queue, %~byte ) に対し： ◎ Single-byte encodings’s decoder’s handler, given unused and byte, runs these steps:

~IF［ %~byte ~EQ `~EoQ$ ］ ⇒ ~RET `完遂d$i ◎ If byte is end-of-queue, then return finished.
~IF［ %~byte ~IN `~ASCII~byte$ ］ ⇒ ~RET ~cp « %~byte » ◎ If byte is an ASCII byte, then return a code point whose value is byte.
%~cp ~LET `単-~byte索引$ の中で ( %~byte ~MINUS `80^X ) が指す`索引~cp$ ◎ Let codePoint be the index code point for byte − 0x80 in index single-byte.
~IF［ %~cp ~EQ ~NULL ］ ⇒ ~RET `~error$i ◎ If codePoint is null, then return error.
~RET ~cp « %~cp » ◎ Return a code point whose value is codePoint.
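
The decoder steps above might look like this in Python. This is a sketch under stated assumptions: `TOY_INDEX` is a hypothetical two-entry stand-in for a real 128-entry single-byte index, end-of-queue is modelled as `None`, and the handler results are modelled as plain strings.

```python
from typing import List, Optional, Union

FINISHED = "finished"
ERROR = "error"

# Hypothetical index: position = pointer, value = code point (None = unmapped).
TOY_INDEX: List[Optional[int]] = [None] * 128
TOY_INDEX[0x00] = 0x20AC  # illustrative entry only
TOY_INDEX[0x69] = 0x00E9  # illustrative entry only


def single_byte_decode_step(
    index: List[Optional[int]], byte: Optional[int]
) -> Union[int, str]:
    """Return a code point, FINISHED, or ERROR for one input byte."""
    if byte is None:  # end-of-queue
        return FINISHED
    if byte <= 0x7F:  # ASCII byte maps to itself
        return byte
    code_point = index[byte - 0x80]
    if code_point is None:
        return ERROR
    return code_point
```

Because every single-byte encoding shares these steps, only the index passed in changes from one encoding to the next.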

9.2. 単-~byte符号化器

`単-~byte符号化法$ の`符号化器$の`~handler$は、所与の ( %利用されない~queue, %~cp ) に対し： ◎ Single-byte encodings’s encoder’s handler, given unused and codePoint, runs these steps:

~IF［ %~cp ~EQ `~EoQ$ ］ ⇒ ~RET `完遂d$i ◎ If codePoint is end-of-queue, then return finished.
~IF［ %~cp ~IN `~ASCII~cp$ ］ ⇒ ~RET ~byte « %~cp » ◎ If codePoint is an ASCII code point, then return a byte whose value is codePoint.
%~pointer ~LET `単-~byte索引$ の中で %~cp を指す`索引~pointer$ ◎ Let pointer be the index pointer for codePoint in index single-byte.
~IF［ %~pointer ~EQ ~NULL ］ ⇒ ~RET `~error$i( %~cp ) ◎ If pointer is null, then return error with codePoint.
~RET ~byte « %~pointer ~PLUS `80^X » ◎ Return a byte whose value is pointer + 0x80.
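
A matching sketch of the encoder steps, again in Python and again with a hypothetical toy index rather than a real one. The helper `index_pointer` is an assumed, simplified stand-in for the index pointer lookup (first pointer whose entry equals the code point), and an error is modelled as the tuple `("error", code_point)`.

```python
from typing import List, Optional, Tuple, Union

# Hypothetical index: position = pointer, value = code point (None = unmapped).
TOY_INDEX: List[Optional[int]] = [None] * 128
TOY_INDEX[0x00] = 0x20AC  # illustrative entry only


def index_pointer(index: List[Optional[int]], code_point: int) -> Optional[int]:
    """Return the first pointer that maps to code_point, or None."""
    for pointer, value in enumerate(index):
        if value == code_point:
            return pointer
    return None


def single_byte_encode_step(
    index: List[Optional[int]], code_point: Optional[int]
) -> Union[bytes, str, Tuple[str, int]]:
    """Return one byte, "finished", or ("error", code_point)."""
    if code_point is None:  # end-of-queue
        return "finished"
    if code_point <= 0x7F:  # ASCII code point maps to itself
        return bytes([code_point])
    pointer = index_pointer(index, code_point)
    if pointer is None:
        return ("error", code_point)
    return bytes([pointer + 0x80])
```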

10. 旧来の複-~byte~Chinese（簡体字）符号化法

10.1. ~GBK

10.1.1. ~GBK復号器

`GBK$n の`復号器$は，`gb18030$n の`復号器$である。 ◎ GBK’s decoder is gb18030’s decoder.

10.1.2. ~GBK符号化器

`GBK$n の`符号化器$は，［ `~GBK用か$ ~SET ~T ］にされた `gb18030$n の`符号化器$である。 ◎ GBK’s encoder is gb18030’s encoder with its is GBK set to true.

注記： `GBK$n を `gb18030$n に対する全くの別名にしないのは、［ `GBK$n の`符号化器$により生成された内容を消費する，旧来の~serverや他の消費器］を非互換化する機会cを減らすための，保守的な措置である。 ◎ Not fully aliasing GBK with gb18030 is a conservative move to decrease the chances of breaking legacy servers and other consumers of content generated with GBK’s encoder.

10.2. ~gb18030

10.2.1. ~gb18030復号器

各［ `gb18030$n の`復号器$ ］には、次に挙げるものが結付けられる： ◎ gb18030’s decoder has an associated:

`~gb1@ ⇒ ある~byte — 初期~時は `00^X とする。 ◎ gb18030 first • ↓
`~gb2@ ⇒ ある~byte — 初期~時は `00^X とする。 ◎ gb18030 second • ↓
`~gb3@ ⇒ ある~byte — 初期~時は `00^X とする。 ◎ gb18030 third • Each a byte, initially 0x00.

`gb18030$n の`復号器$の`~handler$は、所与の ( %入出力~queue, %~byte ) に対し： ◎ gb18030’s decoder’s handler, given ioQueue and byte, runs these steps:

~IF［ %~byte ~EQ `~EoQ$ ］： ◎ ↓
1. ~IF［ ( `~gb1$, `~gb2$, `~gb3$ ) ~EQ ( `00^X, `00^X, `00^X ) ］ ⇒ ~RET `完遂d$i ◎ If byte is end-of-queue and gb18030 first, gb18030 second, and gb18030 third are 0x00, then return finished.
2. ( `~gb1$, `~gb2$, `~gb3$ ) ~SET ( `00^X, `00^X, `00^X ) ◎ If byte is end-of-queue, and gb18030 first, gb18030 second, or gb18030 third is not 0x00, then set gb18030 first, gb18030 second, and gb18030 third to 0x00, and＼
3. ~RET `~error$i ◎ return error.
~IF［ `~gb3$ ~NEQ `00^X ］： ◎ If gb18030 third is not 0x00:
1. ~IF［ %~byte ~NIN { `30^X 〜 `39^X } ］： ◎ If byte is not in the range 0x30 to 0x39, inclusive:
  1. `入出力~queueに格納し直す$( %入出力~queue, ~byte列 « `~gb2$, `~gb3$, %~byte » ) ◎ Restore « gb18030 second, gb18030 third, byte » to ioQueue.
  2. ( `~gb1$, `~gb2$, `~gb3$ ) ~SET ( `00^X, `00^X, `00^X ) ◎ Set gb18030 first, gb18030 second, and gb18030 third to 0x00.
  3. ~RET `~error$i ◎ Return error.
2. %~cp ~LET 次に与える~pointerが指す`索引~gb18030範囲~群~cp$ ⇒ (( `~gb1$ ~MINUS `81^X ) ~MUL ( 10 ~MUL 126 ~MUL 10 )) ~PLUS (( `~gb2$ ~MINUS `30^X ) ~MUL ( 10 ~MUL 126 )) ~PLUS (( `~gb3$ ~MINUS `81^X ) ~MUL 10 ) ~PLUS ( %~byte ~MINUS `30^X ) ◎ Let codePoint be the index gb18030 ranges code point for ((gb18030 first − 0x81) × (10 × 126 × 10)) + ((gb18030 second − 0x30) × (10 × 126)) + ((gb18030 third − 0x81) × 10) + byte − 0x30.
3. ( `~gb1$, `~gb2$, `~gb3$ ) ~SET ( `00^X, `00^X, `00^X ) ◎ Set gb18030 first, gb18030 second, and gb18030 third to 0x00.
4. ~IF［ %~cp ~EQ ~NULL ］ ⇒ ~RET `~error$i ◎ If codePoint is null, then return error.
5. ~RET ~cp « %~cp » ◎ Return a code point whose value is codePoint.
~IF［ `~gb2$ ~NEQ `00^X ］： ◎ If gb18030 second is not 0x00:
1. ~IF［ %~byte ~IN { `81^X 〜 `FE^X } ］ ⇒＃ `~gb3$ ~SET %~byte ； ~RET `継続-$i ◎ If byte is in the range 0x81 to 0xFE, inclusive, then set gb18030 third to byte and return continue.
2. `入出力~queueに格納し直す$( %入出力~queue, ~byte列 « `~gb2$, %~byte » ) ◎ Restore « gb18030 second, byte » to ioQueue,＼
3. ( `~gb1$, `~gb2$ ) ~SET ( `00^X, `00^X ) ◎ set gb18030 first and gb18030 second to 0x00, and＼
4. ~RET `~error$i ◎ return error.
~IF［ `~gb1$ ~NEQ `00^X ］： ◎ If gb18030 first is not 0x00:
1. ~IF［ %~byte ~IN { `30^X 〜 `39^X } ］ ⇒＃ `~gb2$ ~SET %~byte ； ~RET `継続-$i ◎ If byte is in the range 0x30 to 0x39, inclusive, then set gb18030 second to byte and return continue.
2. %頭部 ~LET `~gb1$ ◎ Let leading be gb18030 first.
3. `~gb1$ ~SET `00^X ◎ Set gb18030 first to 0x00.
4. %~pointer ~LET ~NULL ◎ Let pointer be null.
5. %~offset ~LET ［ %~byte ~IN { `00^X 〜 `7E^X } ならば `40^X ／ ~ELSE_ `41^X ］ ◎ Let offset be 0x40 if byte is less than 0x7F; otherwise 0x41.
6. ~IF［ %~byte ~IN { `40^X 〜 `7E^X, `80^X 〜 `FE^X } ］ ⇒ %~pointer ~SET ( %頭部 ~MINUS `81^X ) ~MUL 190 ~PLUS ( %~byte ~MINUS %~offset ) ◎ If byte is in the range 0x40 to 0x7E, inclusive, or 0x80 to 0xFE, inclusive, then set pointer to (leading − 0x81) × 190 + (byte − offset).
7. %~cp ~LET %~pointer に応じて ⇒ ~NULL ならば ~NULL ／ ~ELSE_ `索引~gb18030$ の中で %~pointer が指す`索引~cp$ ◎ Let codePoint be null if pointer is null; otherwise the index code point for pointer in index gb18030.
8. ~IF［ %~cp ~NEQ ~NULL ］ ⇒ ~RET ~cp « %~cp » ◎ If codePoint is non-null, then return a code point whose value is codePoint.
9. ~IF［ %~byte ~IN `~ASCII~byte$ ］ ⇒ `入出力~queueに格納し直す$( %入出力~queue, %~byte ) ◎ If byte is an ASCII byte, then restore byte to ioQueue.
10. ~RET `~error$i ◎ Return error.
~IF［ %~byte ~IN { `81^X 〜 `FE^X } ］ ⇒＃ `~gb1$ ~SET %~byte ； ~RET `継続-$i ◎ ↓
~RET %~byte に応じて ⇒＃ `~ASCII~byte$ならば ~cp « %~byte » ／ `80^X ならば ~cp « `20AC^U1 » ／ `FF^X ならば `~error$i ◎ If byte is an ASCII byte, then return a code point whose value is byte. ◎ If byte is 0x80, then return code point U+20AC (€). ◎ If byte is in the range 0x81 to 0xFE, inclusive, then set gb18030 first to byte and return continue. ◎ Return error.
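
The two pointer computations used by the decoder can be isolated as below. This is only a sketch of the arithmetic: the function names are hypothetical, and the index lookups, the state variables, and the restore-to-queue handling are deliberately left out.

```python
from typing import Optional


def gb18030_two_byte_pointer(leading: int, byte: int) -> Optional[int]:
    """Pointer into index gb18030 for a two-byte sequence, or None."""
    if not 0x81 <= leading <= 0xFE:
        raise ValueError("leading byte must be in 0x81..0xFE")
    if 0x40 <= byte <= 0x7E or 0x80 <= byte <= 0xFE:
        # 0x7F is skipped, so trail bytes above it shift down by one.
        offset = 0x40 if byte < 0x7F else 0x41
        return (leading - 0x81) * 190 + (byte - offset)
    return None


def gb18030_four_byte_pointer(
    first: int, second: int, third: int, fourth: int
) -> Optional[int]:
    """Pointer into index gb18030 ranges for a four-byte sequence, or None."""
    in_range = (
        0x81 <= first <= 0xFE
        and 0x30 <= second <= 0x39
        and 0x81 <= third <= 0xFE
        and 0x30 <= fourth <= 0x39
    )
    if not in_range:
        return None
    return (
        (first - 0x81) * (10 * 126 * 10)
        + (second - 0x30) * (10 * 126)
        + (third - 0x81) * 10
        + (fourth - 0x30)
    )
```

With these definitions, the byte pair 0xA3 0xA0 corresponds to pointer 6555, and 190 is simply the number of valid trail bytes per leading byte (63 below 0x7F plus 127 above it).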

10.2.2. ~gb18030符号化器

各［ `gb18030$n の`符号化器$ ］には、次に挙げるものが結付けられる： ◎ gb18030’s encoder has an associated＼

`~GBK用か@ ⇒ ある真偽値 — 初期~時は ~F とする。 ◎ is GBK, which is a boolean, initially false.

`gb18030$n の`符号化器$の`~handler$は、所与の ( %利用されない~queue, %~cp ) に対し： ◎ gb18030’s encoder’s handler, given unused and codePoint, runs these steps:

~IF［ %~cp ~EQ `~EoQ$ ］ ⇒ ~RET `完遂d$i ◎ If codePoint is end-of-queue, then return finished.
~IF［ %~cp ~IN `~ASCII~cp$ ］ ⇒ ~RET ~byte « %~cp » ◎ If codePoint is an ASCII code point, then return a byte whose value is codePoint.
~IF［ %~cp ~EQ `E5E5^U ］ ⇒ ~RET `~error$i( %~cp ) ◎ If codePoint is U+E5E5, then return error with codePoint.

注記：配備-済みな内容との互換性を得るため、 `索引~gb18030$ は［ `A3^X `A0^X ］を `E5E5^U ではなく `3000^U `IDEOGRAPHIC SPACE^cn へ対応付けている。したがって、それは往復し得ない。 ◎ Index gb18030 maps 0xA3 0xA0 to U+3000 IDEOGRAPHIC SPACE rather than U+E5E5 for compatibility with deployed content. Therefore it cannot roundtrip.
~IF［ `~GBK用か$ ~EQ ~T ］~AND［ %~cp ~EQ `20AC^U1 ］ ⇒ ~RET ~byte « `80^X » ◎ If is GBK is true and codePoint is U+20AC (€), then return byte 0x80.

下の表tを成す ~EACH( %行 ) に対し ⇒ ~IF［ %~cp ~EQ %行の 1 列目に挙げられる~cp ］ ⇒ ~RET %行の 2 列目に挙げられる~byte列 ◎ If there is a row in the table below whose first column is codePoint, then return the two bytes on the same row listed in the second column:

~cp	~byte列
`E78D^U	`A6^X `D9^X
`E78E^U	`A6^X `DA^X
`E78F^U	`A6^X `DB^X
`E790^U	`A6^X `DC^X
`E791^U	`A6^X `DD^X
`E792^U	`A6^X `DE^X
`E793^U	`A6^X `DF^X
`E794^U	`A6^X `EC^X
`E795^U	`A6^X `ED^X
`E796^U	`A6^X `F3^X
`E81E^U	`FE^X `59^X
`E826^U	`FE^X `61^X
`E82B^U	`FE^X `66^X
`E82C^U	`FE^X `67^X
`E832^U	`FE^X `6D^X
`E843^U	`FE^X `7E^X
`E854^U	`FE^X `90^X
`E864^U	`FE^X `A0^X

注記：この非対称な符号化器~表tは、 GB18030-2005 標準との互換性を保全する。 `索引~gb18030範囲~群$における説明も見よ。 ◎ This asymmetric encoder table preserves compatibility with the GB18030-2005 standard. See also the explanation at index gb18030 ranges.

%~pointer ~LET `索引~gb18030$ の中で %~cp を指す`索引~pointer$ ◎ Let pointer be the index pointer for codePoint in index gb18030.
~IF［ %~pointer ~NEQ ~NULL ］： ◎ If pointer is non-null:
1. %頭部 ~LET ( %~pointer ~DIV 190 ) ~PLUS `81^X ◎ Let leading be pointer / 190 + 0x81.
2. %尾部 ~LET %~pointer ~MOD 190 ◎ Let trailing be pointer % 190.
3. %~offset ~LET ［ %尾部 ~IN { `00^X 〜 `3E^X } ならば `40^X【！0x7F-0x40 】／ ~ELSE_ `41^X ］ ◎ Let offset be 0x40 if trailing is less than 0x3F, otherwise 0x41.
4. ~RET ~byte列 « %頭部, ( %尾部 ~PLUS %~offset ) » ◎ Return two bytes whose values are leading and trailing + offset.
~IF［ `~GBK用か$ ~EQ ~T ］ ⇒ ~RET `~error$i( %~cp ) ◎ If is GBK is true, then return error with codePoint.
%~pointer ~SET %~cp を指す`索引~gb18030範囲~群~pointer$ ◎ Set pointer to the index gb18030 ranges pointer for codePoint.
%byte1 ~LET %~pointer ~DIV ( 10 ~MUL 126 ~MUL 10 ) ◎ Let byte1 be pointer / (10 × 126 × 10).
%~pointer ~SET %~pointer ~MOD ( 10 ~MUL 126 ~MUL 10 ) ◎ Set pointer to pointer % (10 × 126 × 10).
%byte2 ~LET %~pointer ~DIV ( 10 ~MUL 126 ) ◎ Let byte2 be pointer / (10 × 126).
%~pointer ~SET %~pointer ~MOD ( 10 ~MUL 126 ) ◎ Set pointer to pointer % (10 × 126).
%byte3 ~LET %~pointer ~DIV 10 ◎ Let byte3 be pointer / 10.
%byte4 ~LET %~pointer ~MOD 10 ◎ Let byte4 be pointer % 10.
~RET ~byte列 « ( %byte1 ~PLUS `81^X ), ( %byte2 ~PLUS `30^X ), ( %byte3 ~PLUS `81^X ), ( %byte4 ~PLUS `30^X ) » ◎ Return four bytes whose values are byte1 + 0x81, byte2 + 0x30, byte3 + 0x81, byte4 + 0x30.
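
The byte arithmetic at the end of the encoder is the inverse of the decoder's pointer arithmetic. A minimal Python sketch follows, with hypothetical function names and with the index lookups, the special-case table, and the is-GBK branch omitted.

```python
def gb18030_two_bytes(pointer: int) -> bytes:
    """Two-byte sequence for a pointer found in index gb18030."""
    leading = pointer // 190 + 0x81
    trailing = pointer % 190
    # Trail values 0..62 land on 0x40..0x7E; the rest skip 0x7F.
    offset = 0x40 if trailing < 0x3F else 0x41
    return bytes([leading, trailing + offset])


def gb18030_four_bytes(pointer: int) -> bytes:
    """Four-byte sequence for a pointer found via index gb18030 ranges."""
    byte1, pointer = divmod(pointer, 10 * 126 * 10)
    byte2, pointer = divmod(pointer, 10 * 126)
    byte3, byte4 = divmod(pointer, 10)
    return bytes([byte1 + 0x81, byte2 + 0x30, byte3 + 0x81, byte4 + 0x30])
```

For instance, pointer 6555 gives the byte pair 0xA3 0xA0, and four-byte pointer 0 gives 0x81 0x30 0x81 0x30.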

11. 旧来の複-~byte~Chinese（繁体字）符号化法

【！ Leading: 0x81 to 0xFE ／ Trailing: 0x40 to 0x7E or 0xA1 to 0xFE 】

11.1. ~Big5

11.1.1. ~Big5復号器

各［ `Big5$n の`復号器$ ］には、次に挙げるものが結付けられる： ◎ Big5’s decoder has an associated＼

`~Big5頭部@ ⇒ ある~byte — 初期~時は `00^X とする。 ◎ Big5 leading, which is a byte, initially 0x00.

`Big5$n の`復号器$の`~handler$は、所与の ( %入出力~queue, %~byte ) に対し： ◎ Big5’s decoder’s handler, given ioQueue and byte, runs these steps:

~IF［ %~byte ~EQ `~EoQ$ ］：
1. ~IF［ `~Big5頭部$ ~NEQ `00^X ］ ⇒＃ `~Big5頭部$ ~SET `00^X ； ~RET `~error$i
2. ~RET `完遂d$i
◎ If byte is end-of-queue and Big5 leading is not 0x00, then set Big5 leading to 0x00 and return error. ◎ If byte is end-of-queue and Big5 leading is 0x00, then return finished.

~IF［ `~Big5頭部$ ~NEQ `00^X ］： ◎ If Big5 leading is not 0x00:

%頭部 ~LET `~Big5頭部$ ◎ Let leading be Big5 leading.
`~Big5頭部$ ~SET `00^X ◎ Set Big5 leading to 0x00.
%~pointer ~LET ~NULL ◎ Let pointer be null.
%~offset ~LET ［ %~byte ~IN { `00^X 〜 `7E^X } ならば `40^X ／ ~ELSE_ `62^X 【！ 0x62 = 0xA1-0x7E+1+0x40 】］ ◎ Let offset be 0x40 if byte is less than 0x7F; otherwise 0x62.
~IF［ %~byte ~IN { `40^X 〜 `7E^X, `A1^X 〜 `FE^X } ］ ⇒ %~pointer ~SET ( %頭部 ~MINUS `81^X ) ~MUL 157 ~PLUS ( %~byte ~MINUS %~offset ) ◎ If byte is in the range 0x40 to 0x7E, inclusive, or 0xA1 to 0xFE, inclusive, then set pointer to (leading − 0x81) × 157 + (byte − offset).

~IF［下の表tの中で， 1 列目が %~pointer に等しい行がある］ ⇒ ~RET 同じ行の 2 列目に与える~cp列（ `2 個の^em ~cpからなる） ◎ If there is a row in the table below whose first column is pointer, then return the two code points listed in its second column (the third column is irrelevant):

【！ https://www.unicode.org/Public/UNIDATA/NamedSequences.txt 】

~pointer	~cp	注記（この段には関連しない）
1133【！ 0x88 0x62 】	`00CA^U `0304^U	Ê̄ ( `LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND MACRON^cn )
1135【！ 0x88 0x64 】	`00CA^U `030C^U	Ê̌ ( `LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND CARON^cn )
1164【！ 0x88 0xA3 】	`00EA^U `0304^U	ê̄ ( `LATIN SMALL LETTER E WITH CIRCUMFLEX AND MACRON^cn )
1166【！ 0x88 0xA5 】	`00EA^U `030C^U	ê̌ ( `LATIN SMALL LETTER E WITH CIRCUMFLEX AND CARON^cn )

【！~UA／環境／言語~codeによっては、~glyphが結合されず，正しく表示されないかもしれない。】【！lang=ja の下では正しく表示されない~UAがある】【！lang=en の下でも正しく表示しない~UAがある】【！文字参照（ê̌）を利用すると異なる表示になる~UAもある】 ◎ Pointer｜Code points｜Notes 1133｜U+00CA U+0304｜Ê̄ (LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND MACRON) 1135｜U+00CA U+030C｜Ê̌ (LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND CARON) 1164｜U+00EA U+0304｜ê̄ (LATIN SMALL LETTER E WITH CIRCUMFLEX AND MACRON) 1166｜U+00EA U+030C｜ê̌ (LATIN SMALL LETTER E WITH CIRCUMFLEX AND CARON)

注記： `索引$は単独の~cpに制限されるので、これらの~pointerにはこの表tが利用される。 ◎ Since indexes are limited to single code points this table is used for these pointers.

%~cp ~LET ［ %~pointer ~EQ ~NULL ならば ~NULL ／ ~ELSE_ `索引~Big5$ の中で %~pointer が指す`索引~cp$ ］ ◎ Let codePoint be null if pointer is null; otherwise the index code point for pointer in index Big5.
~IF［ %~cp ~NEQ ~NULL ］ ⇒ ~RET ~cp « %~cp » ◎ If codePoint is non-null, then return a code point whose value is codePoint.
~IF［ %~byte ~IN `~ASCII~byte$ ］ ⇒ `入出力~queueに格納し直す$( %入出力~queue, %~byte ) ◎ If byte is an ASCII byte, restore byte to ioQueue.
~RET `~error$i ◎ Return error.

~IF［ %~byte ~IN `~ASCII~byte$ ］ ⇒ ~RET ~cp « %~byte » ◎ If byte is an ASCII byte, then return a code point whose value is byte.
~IF［ %~byte ~IN { `81^X 〜 `FE^X } ］ ⇒＃ `~Big5頭部$ ~SET %~byte ； ~RET `継続-$i ◎ If byte is in the range 0x81 to 0xFE, inclusive, then set Big5 leading to byte and return continue.
~RET `~error$i ◎ Return error.
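
The pointer computation and the two-code-point special case from the decoder can be sketched as follows. The names `big5_pointer` and `BIG5_DOUBLES` are hypothetical; the table contents are copied from the four rows above, and the lookup in index Big5 is not shown.

```python
from typing import Dict, Optional, Tuple

# Pointers that decode to two code points instead of one.
BIG5_DOUBLES: Dict[int, Tuple[int, int]] = {
    1133: (0x00CA, 0x0304),
    1135: (0x00CA, 0x030C),
    1164: (0x00EA, 0x0304),
    1166: (0x00EA, 0x030C),
}


def big5_pointer(leading: int, byte: int) -> Optional[int]:
    """Pointer for a leading byte (0x81..0xFE) and a trail byte, or None."""
    if 0x40 <= byte <= 0x7E or 0xA1 <= byte <= 0xFE:
        # The gap 0x7F..0xA0 is skipped: 0x62 = 0xA1 - 0x3F.
        offset = 0x40 if byte < 0x7F else 0x62
        return (leading - 0x81) * 157 + (byte - offset)
    return None
```

As a check, `big5_pointer(0x88, 0x62)` is 1133, the first row of the table, so the bytes 0x88 0x62 decode to U+00CA followed by U+0304.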

11.1.2. ~Big5符号化器

`Big5$n の`符号化器$の`~handler$は、所与の ( %利用されない~queue, %~cp ) に対し： ◎ Big5’s encoder’s handler, given unused and codePoint, runs these steps:

~IF［ %~cp ~EQ `~EoQ$ ］ ⇒ ~RET `完遂d$i ◎ If codePoint is end-of-queue, then return finished.
~IF［ %~cp ~IN `~ASCII~cp$ ］ ⇒ ~RET ~byte « %~cp » ◎ If codePoint is an ASCII code point, then return a byte whose value is codePoint.
%~pointer ~LET %~cp を指す`索引~Big5~pointer$ ◎ Let pointer be the index Big5 pointer for codePoint.
~IF［ %~pointer ~EQ ~NULL ］ ⇒ ~RET `~error$i( %~cp ) ◎ If pointer is null, then return error with codePoint.
%頭部 ~LET ( %~pointer ~DIV 157 ) ~PLUS `81^X ◎ Let leading be pointer / 157 + 0x81.
%尾部 ~LET %~pointer ~MOD 157 ◎ Let trailing be pointer % 157.
%~offset ~LET ［ %尾部 ~IN { `00^X 〜 `3E^X } ならば `40^X【！0x7F-0x40 】／ ~ELSE_ `62^X【！0xA1-0x3F 】］ ◎ Let offset be 0x40 if trailing is less than 0x3F, otherwise 0x62.
~RET ~byte列 « %頭部, ( %尾部 ~PLUS %~offset ) » ◎ Return two bytes whose values are leading and trailing + offset.
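
The last four encoder steps reduce to a small piece of arithmetic, sketched here in Python. The function name is hypothetical, and the pointer is assumed to have been obtained already from the index Big5 pointer lookup, which is not modelled.

```python
def big5_bytes(pointer: int) -> bytes:
    """Two-byte Big5 sequence for a non-null pointer."""
    leading = pointer // 157 + 0x81
    trailing = pointer % 157
    # Trail values 0..62 land on 0x40..0x7E; the rest land on 0xA1..0xFE.
    offset = 0x40 if trailing < 0x3F else 0x62
    return bytes([leading, trailing + offset])
```

For example, pointer 5024 (that is, 32 × 157) yields 0xA1 0x40, and pointer 1133 yields 0x88 0x62.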

12. 旧来の複-~byte~Japanese符号化法

12.1. ~EUC-JP

【！ ~IANA-a/charset-reg/CP51932 】

12.1.1. ~EUC-JP復号器

各［ `EUC-JP$n の`復号器$ ］には、次に挙げるものが結付けられる： ◎ EUC-JP’s decoder has an associated:

`~EUC-JP~jis0212か@ ⇒ ある真偽値 — 初期~時は ~F とする。 ◎ EUC-JP jis0212 • A boolean, initially false.
`~EUC-JP頭部@ ⇒ ある~byte — 初期~時は `00^X とする。 ◎ EUC-JP leading • A byte, initially 0x00.

`EUC-JP$n の`復号器$の`~handler$は、所与の ( %入出力~queue, %~byte ) に対し： ◎ EUC-JP’s decoder’s handler, given ioQueue and byte, runs these steps:

~IF［ %~byte ~EQ `~EoQ$ ］： ◎ ↓
1. ~IF［ `~EUC-JP頭部$ ~NEQ `00^X ］ ⇒＃ `~EUC-JP頭部$ ~SET `00^X ； ~RET `~error$i ◎ If byte is end-of-queue and EUC-JP leading is not 0x00, then set EUC-JP leading to 0x00 and return error.
2. ~ELSE ⇒ ~RET `完遂d$i ◎ If byte is end-of-queue and EUC-JP leading is 0x00, then return finished.
~IF［ `~EUC-JP頭部$ ~EQ `8E^X ］~AND［ %~byte ~IN { `A1^X 〜 `DF^X } ］【！ katakana; subtraction is done first to avoid upsetting compilers 】 ⇒＃ `~EUC-JP頭部$ ~SET `00^X ； ~RET ~cp « `FF61^X ~MINUS `A1^X ~PLUS %~byte » ◎ If EUC-JP leading is 0x8E and byte is in the range 0xA1 to 0xDF, inclusive, then set EUC-JP leading to 0x00 and return a code point whose value is 0xFF61 − 0xA1 + byte.
~IF［ `~EUC-JP頭部$ ~EQ `8F^X ］~AND［ %~byte ~IN { `A1^X 〜 `FE^X } ］ ⇒＃ `~EUC-JP~jis0212か$ ~SET ~T ； `~EUC-JP頭部$ ~SET %~byte ； ~RET `継続-$i ◎ If EUC-JP leading is 0x8F and byte is in the range 0xA1 to 0xFE, inclusive, then set EUC-JP jis0212 to true, set EUC-JP leading to byte, and return continue.
~IF［ `~EUC-JP頭部$ ~NEQ `00^X ］： ◎ If EUC-JP leading is not 0x00:
1. %頭部 ~LET `~EUC-JP頭部$ ◎ Let leading be EUC-JP leading.
2. `~EUC-JP頭部$ ~SET `00^X ◎ Set EUC-JP leading to 0x00.
3. %~cp ~LET ~NULL ◎ Let codePoint be null.
4. ~IF［ %頭部 ~IN { `A1^X 〜 `FE^X } ］~AND［ %~byte ~IN { `A1^X 〜 `FE^X } ］：
  1. %索引 ~LET `~EUC-JP~jis0212か$に応じて ⇒＃ ~F ならば`索引~jis0208$ ／ ~T ならば `索引~jis0212$
  2. %~cp ~SET %索引の中で ( ( %頭部 ~MINUS `A1^X ) ~MUL 94 ~PLUS %~byte ~MINUS `A1^X ) が指す`索引~cp$
  ◎ If leading and byte are both in the range 0xA1 to 0xFE, inclusive, then set codePoint to the index code point for (leading − 0xA1) × 94 + byte − 0xA1 in index jis0208 if EUC-JP jis0212 is false and in index jis0212 otherwise.
5. `~EUC-JP~jis0212か$ ~SET ~F ◎ Set EUC-JP jis0212 to false.
6. ~IF［ %~cp ~NEQ ~NULL ］ ⇒ ~RET ~cp « %~cp » ◎ If codePoint is non-null, then return a code point whose value is codePoint.
7. ~IF［ %~byte ~IN `~ASCII~byte$ ］ ⇒ `入出力~queueに格納し直す$( %入出力~queue, %~byte ) ◎ If byte is an ASCII byte, then restore byte to ioQueue.
8. ~RET `~error$i ◎ Return error.
~IF［ %~byte ~IN `~ASCII~byte$ ］ ⇒ ~RET ~cp « %~byte » ◎ If byte is an ASCII byte, then return a code point whose value is byte.
~IF［ %~byte ~IN { `8E^X, `8F^X, `A1^X 〜 `FE^X } ］ ⇒＃ `~EUC-JP頭部$ ~SET %~byte ； ~RET `継続-$i ◎ If byte is 0x8E, 0x8F, or in the range 0xA1 to 0xFE, inclusive, then set EUC-JP leading to byte and return continue.
~RET `~error$i ◎ Return error.
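
One way to read the decoder steps above is as a small state machine. The Python sketch below follows the same order of checks. It rests on several assumptions made for illustration: the class and function names are hypothetical, the indexes are passed in as plain pointer-to-code-point dictionaries, `TOY_JIS0208` holds a single illustrative entry rather than the real index, end-of-queue is `None`, and errors are rendered as U+FFFD.

```python
from typing import Dict, List, Optional, Union

FINISHED, CONTINUE, ERROR = "finished", "continue", "error"
TOY_JIS0208: Dict[int, int] = {0: 0x3000}  # illustrative entry only


class EucJpDecoder:
    def __init__(self, jis0208: Dict[int, int], jis0212: Dict[int, int]) -> None:
        self.jis0208, self.jis0212 = jis0208, jis0212
        self.is_jis0212 = False
        self.leading = 0x00

    def handle(self, queue: List[int], byte: Optional[int]) -> Union[int, str]:
        if byte is None:  # end-of-queue
            if self.leading != 0x00:
                self.leading = 0x00
                return ERROR
            return FINISHED
        if self.leading == 0x8E and 0xA1 <= byte <= 0xDF:
            self.leading = 0x00
            return 0xFF61 - 0xA1 + byte  # halfwidth katakana
        if self.leading == 0x8F and 0xA1 <= byte <= 0xFE:
            self.is_jis0212 = True
            self.leading = byte
            return CONTINUE
        if self.leading != 0x00:
            leading, self.leading = self.leading, 0x00
            code_point = None
            if 0xA1 <= leading <= 0xFE and 0xA1 <= byte <= 0xFE:
                index = self.jis0212 if self.is_jis0212 else self.jis0208
                code_point = index.get((leading - 0xA1) * 94 + byte - 0xA1)
            self.is_jis0212 = False
            if code_point is not None:
                return code_point
            if byte <= 0x7F:
                queue.insert(0, byte)  # restore the ASCII byte
            return ERROR
        if byte <= 0x7F:
            return byte
        if byte in (0x8E, 0x8F) or 0xA1 <= byte <= 0xFE:
            self.leading = byte
            return CONTINUE
        return ERROR


def euc_jp_decode(data: bytes, jis0208: Dict[int, int],
                  jis0212: Optional[Dict[int, int]] = None) -> str:
    decoder = EucJpDecoder(jis0208, jis0212 or {})
    queue, out = list(data), []
    while True:
        result = decoder.handle(queue, queue.pop(0) if queue else None)
        if result == FINISHED:
            return "".join(out)
        if result != CONTINUE:
            out.append("\uFFFD" if result == ERROR else chr(result))
```

Note how an ASCII byte that follows a dangling leading byte is restored to the queue, so it is still decoded after the error is reported.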

12.1.2. ~EUC-JP符号化器

`EUC-JP$n の`符号化器$の`~handler$は、所与の ( %利用されない~queue, %~cp ) に対し： ◎ EUC-JP’s encoder’s handler, given unused and codePoint, runs these steps:

%結果 ~LET %~cp に応じて ⇒＃ `~EoQ$ ならば `完遂d$i ／ `~ASCII~cp$ならば ~byte « %~cp » ／ `00A5^U1 ならば ~byte « `5C^X » ／ `203E^U1 ならば ~byte « `7E^X » ／ `FF61^U1 〜 `FF9F^U1 ならば ~byte列 « `8E^X, ( %~cp ~MINUS `FF61^X ~PLUS `A1^X ) » ／ ~ELSE_ ~NULL ◎ If codePoint is end-of-queue, then return finished. ◎ If codePoint is an ASCII code point, then return a byte whose value is codePoint. ◎ If codePoint is U+00A5 (¥), then return byte 0x5C. ◎ If codePoint is U+203E (‾), then return byte 0x7E. ◎ If codePoint is in the range U+FF61 (｡) to U+FF9F (ﾟ), inclusive, then return two bytes whose values are 0x8E and codePoint − 0xFF61 + 0xA1.
~IF［ %結果 ~NEQ ~NULL ］ ⇒ ~RET %結果 ◎ ↑
~IF［ %~cp ~EQ `2212^U1 ］ ⇒ %~cp ~SET `FF0D^U1 ◎ If codePoint is U+2212 (−), then set it to U+FF0D (－).
%~pointer ~LET `索引~jis0208$ の中で %~cp を指す`索引~pointer$ ◎ Let pointer be the index pointer for codePoint in index jis0208.

注記： %~pointer は、 ~NULL でなければ，`索引~jis0208$と~pointer演算の資質に因り 8836 未満になる。 ◎ If pointer is non-null, it is less than 8836 due to the nature of index jis0208 and the index pointer operation.
~IF［ %~pointer ~EQ ~NULL ］ ⇒ ~RET `~error$i( %~cp ) ◎ If pointer is null, then return error with codePoint.
%頭部 ~LET ( %~pointer ~DIV 94 ) ~PLUS `A1^X ◎ Let leading be pointer / 94 + 0xA1.
%尾部 ~LET ( %~pointer ~MOD 94 ) ~PLUS `A1^X ◎ Let trailing be pointer % 94 + 0xA1.
~RET ~byte列 « %頭部, %尾部 » ◎ Return two bytes whose values are leading and trailing.
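
The encoder steps can be sketched as a single function. As before, this is illustrative only: `euc_jp_encode_step` is a hypothetical name, the index pointer lookup is supplied by the caller as a function, and `TOY_POINTERS` contains one illustrative entry instead of the real index jis0208.

```python
from typing import Callable, Dict, Optional, Tuple, Union

TOY_POINTERS: Dict[int, int] = {0x3000: 0}  # code point -> pointer, toy data


def euc_jp_encode_step(
    code_point: Optional[int],
    jis0208_pointer: Callable[[int], Optional[int]],
) -> Union[bytes, str, Tuple[str, int]]:
    """Return bytes, "finished", or ("error", code_point)."""
    if code_point is None:  # end-of-queue
        return "finished"
    if code_point <= 0x7F:
        return bytes([code_point])
    if code_point == 0x00A5:  # YEN SIGN
        return b"\x5c"
    if code_point == 0x203E:  # OVERLINE
        return b"\x7e"
    if 0xFF61 <= code_point <= 0xFF9F:  # halfwidth katakana
        return bytes([0x8E, code_point - 0xFF61 + 0xA1])
    if code_point == 0x2212:  # MINUS SIGN is encoded as FULLWIDTH HYPHEN-MINUS
        code_point = 0xFF0D
    pointer = jis0208_pointer(code_point)
    if pointer is None:
        return ("error", code_point)
    return bytes([pointer // 94 + 0xA1, pointer % 94 + 0xA1])
```

A call such as `euc_jp_encode_step(0xFF71, TOY_POINTERS.get)` returns the two bytes 0x8E 0xB1 without consulting the index at all.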

12.2. ~ISO-2022-JP

【！ https://tools.ietf.org/html/rfc1468】【！ https://tools.ietf.org/html/rfc2237 (iso-2022-jp-1; not used)】【！ "ESC ) I" is from iso-2022-jp-3 reportedly】

12.2.1. ~ISO-2022-JP復号器

各［ `ISO-2022-JP$n の`復号器$ ］には、次に挙げるものが結付けられる： ◎ ISO-2022-JP’s decoder has an associated:

`~ISO-2022-JP復号器~状態@ ⇒ ある状態 — 初期~時は `~ASCII$i とする。 ◎ ISO-2022-JP decoder state • A state, initially ASCII.
`~ISO-2022-JP復号器~出力~状態@ ⇒ ある状態 — 初期~時は `~ASCII$i とする。 ◎ ISO-2022-JP decoder output state • A state, initially ASCII.
`~ISO-2022-JP頭部@ ⇒ ある~byte — 初期~時は `00^X とする。 ◎ ISO-2022-JP leading • A byte, initially 0x00.
`~ISO-2022-JP出力するか@ ⇒ ある真偽値 — 初期~時は ~F とする。 ◎ ISO-2022-JP output • A boolean, initially false.

`ISO-2022-JP$n の`復号器$の`~handler$は、所与の ( %入出力~queue, %~byte ) に対し，`~ISO-2022-JP復号器~状態$に応じて： ◎ ISO-2022-JP’s decoder’s handler, given ioQueue and byte, runs these steps, switching on ISO-2022-JP decoder state:

`~ASCII@i ◎ ASCII

%~byte に応じて： ◎ Based on byte:

`1B^X ⇒＃ `~ISO-2022-JP復号器~状態$ ~SET `~escape開始$i ； ~RET `継続-$i ◎ 0x1B • Set ISO-2022-JP decoder state to escape start and return continue.
`0E^X, `0F^X, `1B^X 以外の`~ASCII~byte$ ⇒＃ `~ISO-2022-JP出力するか$ ~SET ~F ； ~RET ~cp « %~byte » ◎ 0x00 to 0x7F, excluding 0x0E, 0x0F, and 0x1B • Set ISO-2022-JP output to false and return a code point whose value is byte.
`~EoQ$ ⇒＃ ~RET `完遂d$i ◎ end-of-queue • Return finished.
その他 ⇒＃ `~ISO-2022-JP出力するか$ ~SET ~F ； ~RET `~error$i ◎ Otherwise • Set ISO-2022-JP output to false and return error.

`~Roman@i ◎ Roman

%~byte に応じて： ◎ Based on byte:

`1B^X ⇒＃ `~ISO-2022-JP復号器~状態$ ~SET `~escape開始$i ； ~RET `継続-$i ◎ 0x1B • Set ISO-2022-JP decoder state to escape start and return continue.
`5C^X ⇒＃ `~ISO-2022-JP出力するか$ ~SET ~F ； ~RET ~cp « `00A5^U1 » ◎ 0x5C • Set ISO-2022-JP output to false and return code point U+00A5 (¥).
`7E^X ⇒＃ `~ISO-2022-JP出力するか$ ~SET ~F ； ~RET ~cp « `203E^U1 » ◎ 0x7E • Set ISO-2022-JP output to false and return code point U+203E (‾).
`0E^X, `0F^X, `1B^X, `5C^X, `7E^X 以外の`~ASCII~byte$ ⇒＃ `~ISO-2022-JP出力するか$ ~SET ~F ； ~RET ~cp « %~byte » ◎ 0x00 to 0x7F, excluding 0x0E, 0x0F, 0x1B, 0x5C, and 0x7E • Set ISO-2022-JP output to false and return a code point whose value is byte.
`~EoQ$ ⇒＃ ~RET `完遂d$i ◎ end-of-queue • Return finished.
その他 ⇒＃ `~ISO-2022-JP出力するか$ ~SET ~F ； ~RET `~error$i ◎ Otherwise • Set ISO-2022-JP output to false and return error.

`~katakana@i ◎ katakana

%~byte に応じて： ◎ Based on byte:

`1B^X ⇒＃ `~ISO-2022-JP復号器~状態$ ~SET `~escape開始$i ； ~RET `継続-$i ◎ 0x1B • Set ISO-2022-JP decoder state to escape start and return continue.
`21^X 〜 `5F^X 【！ katakana; subtraction is done first to avoid upsetting compilers 】 ⇒＃ `~ISO-2022-JP出力するか$ ~SET ~F ； ~RET ~cp « `FF61^X ~MINUS `21^X ~PLUS %~byte » ◎ 0x21 to 0x5F • Set ISO-2022-JP output to false and return a code point whose value is 0xFF61 − 0x21 + byte.
`~EoQ$ ⇒＃ ~RET `完遂d$i ◎ end-of-queue • Return finished.
その他 ⇒＃ `~ISO-2022-JP出力するか$ ~SET ~F ； ~RET `~error$i ◎ Otherwise • Set ISO-2022-JP output to false and return error.

`頭部~byte@i ◎ Leading byte

%~byte に応じて： ◎ Based on byte:

`1B^X ⇒＃ `~ISO-2022-JP復号器~状態$ ~SET `~escape開始$i ； ~RET `継続-$i ◎ 0x1B • Set ISO-2022-JP decoder state to escape start and return continue.
`21^X 〜 `7E^X ⇒＃ `~ISO-2022-JP出力するか$ ~SET ~F ； `~ISO-2022-JP頭部$ ~SET %~byte ； `~ISO-2022-JP復号器~状態$ ~SET `尾部~byte$i ； ~RET `継続-$i ◎ 0x21 to 0x7E • Set ISO-2022-JP output to false, ISO-2022-JP leading to byte, ISO-2022-JP decoder state to trailing byte, and return continue.
`~EoQ$ ⇒＃ ~RET `完遂d$i ◎ end-of-queue • Return finished.
その他 ⇒＃ `~ISO-2022-JP出力するか$ ~SET ~F ； ~RET `~error$i ◎ Otherwise • Set ISO-2022-JP output to false and return error.

`尾部~byte@i ◎ Trailing byte

%~byte に応じて： ◎ Based on byte:

`1B^X 【！ iso-2022-jp decoder output state is still leading byte 】 ⇒＃ `~ISO-2022-JP復号器~状態$ ~SET `~escape開始$i ； ~RET `~error$i ◎ 0x1B • Set ISO-2022-JP decoder state to escape start and return error.
`21^X 〜 `7E^X ： ◎ 0x21 to 0x7E
1. `~ISO-2022-JP復号器~状態$ ~SET `頭部~byte$i ◎ Set the ISO-2022-JP decoder state to leading byte.
2. %~pointer ~LET ( `~ISO-2022-JP頭部$ ~MINUS `21^X ) ~MUL 94 ~PLUS %~byte ~MINUS `21^X ◎ Let pointer be (ISO-2022-JP leading − 0x21) × 94 + byte − 0x21.
3. %~cp ~LET `索引~jis0208$ の中で %~pointer が指す`索引~cp$ ◎ Let codePoint be the index code point for pointer in index jis0208.
4. ~IF［ %~cp ~EQ ~NULL ］ ⇒ ~RET `~error$i ◎ If codePoint is null, then return error.
5. ~RET ~cp « %~cp » ◎ Return a code point whose value is codePoint.
`~EoQ$ ⇒＃ `~ISO-2022-JP復号器~状態$ ~SET `頭部~byte$i ； ~RET `~error$i ◎ end-of-queue • Set the ISO-2022-JP decoder state to leading byte and return error.
その他【！ iso-2022-jp decoder output state is still leading byte 】 ⇒＃ `~ISO-2022-JP復号器~状態$ ~SET `頭部~byte$i ； ~RET `~error$i ◎ Otherwise • Set ISO-2022-JP decoder state to leading byte and return error.

`~escape開始@i ◎ Escape start

~IF［ %~byte ~IN { `24^X【！ $ 】, `28^X【！ ( 】 } ］ ⇒＃ `~ISO-2022-JP頭部$ ~SET %~byte ； `~ISO-2022-JP復号器~状態$ ~SET `~escape$i ； ~RET `継続-$i ◎ If byte is either 0x24 or 0x28, then set ISO-2022-JP leading to byte, ISO-2022-JP decoder state to escape, and return continue.
~IF［ %~byte ~NEQ `~EoQ$ ］ ⇒ `入出力~queueに格納し直す$( %入出力~queue, %~byte ) ◎ If byte is not end-of-queue, then restore byte to ioQueue.
`~ISO-2022-JP出力するか$ ~SET ~F ◎ Set ISO-2022-JP output to false,＼
`~ISO-2022-JP復号器~状態$ ~SET `~ISO-2022-JP復号器~出力~状態$ ◎ ISO-2022-JP decoder state to ISO-2022-JP decoder output state, and＼
~RET `~error$i ◎ return error.

`~escape@i ◎ Escape

%頭部 ~LET `~ISO-2022-JP頭部$ ◎ Let leading be ISO-2022-JP leading and＼
`~ISO-2022-JP頭部$ ~SET `00^X ◎ set ISO-2022-JP leading to 0x00.
%状態 ~LET ( %頭部, %~byte ) に応じて ⇒＃ ( `28^X, `42^X【！B 】 ) ならば `~ASCII$i ／ ( `28^X, `4A^X【！J 】 ) ならば `~Roman$i ／ ( `28^X, `49^X【！I 】 ) ならば `~katakana$i ／ ( `24^X, `40^X【！@ 】 ) ならば `頭部~byte$i ／ ( `24^X, `42^X【！B 】 ) ならば `頭部~byte$i ／ ~ELSE_ ~NULL ◎ Let state be null. ◎ If leading is 0x28 and byte is 0x42, then set state to ASCII. ◎ If leading is 0x28 and byte is 0x4A, then set state to Roman. ◎ If leading is 0x28 and byte is 0x49, then set state to katakana. ◎ If leading is 0x24 and byte is either 0x40 or 0x42, then set state to leading byte.
~IF［ %状態 ~NEQ ~NULL ］： ◎ If state is non-null:
1. `~ISO-2022-JP復号器~状態$ ~SET %状態 ◎ ↓
2. `~ISO-2022-JP復号器~出力~状態$ ~SET %状態 ◎ Set ISO-2022-JP decoder state and ISO-2022-JP decoder output state to state.
3. %出力 ~LET `~ISO-2022-JP出力するか$ ◎ Let output be the value of ISO-2022-JP output.
4. `~ISO-2022-JP出力するか$ ~SET ~T ◎ Set ISO-2022-JP output to true.
5. ~RET %出力に応じて ⇒＃ ~F ならば `継続-$i ／ ~T ならば `~error$i ◎ Return continue, if output is false, and error otherwise.
~IF［ %~byte ~EQ `~EoQ$ ］ ⇒ `入出力~queueに格納し直す$( %入出力~queue, %頭部 ) ◎ If byte is end-of-queue, then restore leading to ioQueue;＼
~ELSE ⇒ `入出力~queueに格納し直す$( %入出力~queue, ~byte列 « %頭部, %~byte » ) ◎ otherwise, restore « leading, byte » to ioQueue.
`~ISO-2022-JP出力するか$ ~SET ~F ◎ Set ISO-2022-JP output to false,＼
`~ISO-2022-JP復号器~状態$ ~SET `~ISO-2022-JP復号器~出力~状態$ ◎ ISO-2022-JP decoder state to ISO-2022-JP decoder output state and＼
~RET `~error$i ◎ return error.
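
The seven decoder states above can be condensed into one handler. The Python sketch below is an illustration under assumptions, not a reference implementation: all names are hypothetical, index jis0208 is passed as an optional pointer-to-code-point dictionary (empty by default), end-of-queue is `None`, and errors are rendered as U+FFFD.

```python
from typing import Dict, List, Optional, Union

FINISHED, CONTINUE, ERROR = "finished", "continue", "error"
# (byte after ESC, next byte) -> state selected by the escape sequence
ESCAPES = {(0x28, 0x42): "ascii", (0x28, 0x4A): "roman", (0x28, 0x49): "katakana",
           (0x24, 0x40): "lead", (0x24, 0x42): "lead"}


class Iso2022JpDecoder:
    def __init__(self, jis0208: Optional[Dict[int, int]] = None) -> None:
        self.jis0208 = jis0208 or {}
        self.state = self.output_state = "ascii"
        self.leading = 0x00
        self.output = False

    def handle(self, queue: List[int], byte: Optional[int]) -> Union[int, str]:
        state = self.state
        if state in ("ascii", "roman", "katakana", "lead"):
            if byte == 0x1B:
                self.state = "escape start"
                return CONTINUE
            if byte is None:
                return FINISHED
            self.output = False
            if state == "katakana":
                return 0xFF61 - 0x21 + byte if 0x21 <= byte <= 0x5F else ERROR
            if state == "lead":
                if 0x21 <= byte <= 0x7E:
                    self.leading, self.state = byte, "trail"
                    return CONTINUE
                return ERROR
            if byte > 0x7F or byte in (0x0E, 0x0F):
                return ERROR
            if state == "roman" and byte in (0x5C, 0x7E):
                return 0x00A5 if byte == 0x5C else 0x203E
            return byte
        if state == "trail":
            if byte == 0x1B:
                self.state = "escape start"
                return ERROR
            self.state = "lead"
            if byte is not None and 0x21 <= byte <= 0x7E:
                pointer = (self.leading - 0x21) * 94 + byte - 0x21
                return self.jis0208.get(pointer, ERROR)
            return ERROR
        if state == "escape start":
            if byte in (0x24, 0x28):
                self.leading, self.state = byte, "escape"
                return CONTINUE
            if byte is not None:
                queue.insert(0, byte)  # restore byte
            self.output, self.state = False, self.output_state
            return ERROR
        # state == "escape"
        leading, self.leading = self.leading, 0x00
        new_state = ESCAPES.get((leading, byte))
        if new_state is not None:
            self.state = self.output_state = new_state
            output, self.output = self.output, True
            return ERROR if output else CONTINUE
        queue[:0] = [leading] if byte is None else [leading, byte]
        self.output, self.state = False, self.output_state
        return ERROR


def iso_2022_jp_decode(data: bytes, jis0208: Optional[Dict[int, int]] = None) -> str:
    decoder, queue, out = Iso2022JpDecoder(jis0208), list(data), []
    while True:
        result = decoder.handle(queue, queue.pop(0) if queue else None)
        if result == FINISHED:
            return "".join(out)
        if result != CONTINUE:
            out.append("\uFFFD" if result == ERROR else chr(result))
```

The `output` flag is what turns two escape sequences with nothing decoded between them into an error: decoding 0x1B 0x28 0x4A 0x5C 0x1B 0x28 0x42 twice in a row gives U+00A5 U+FFFD U+00A5.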

12.2.2. ~ISO-2022-JP符号化器

注記： `~ISO-2022-JP符号化器$は、［複数の出力を連結した結果を対応する`復号器$にかけたとき， `~error$i になり得る］ような，唯一の`符号化器$である。 ◎ The ISO-2022-JP encoder is the only encoder for which the concatenation of multiple outputs can result in an error when run through the corresponding decoder.

`00A5^U1 を符号化した結果は [ `1B^X `28^X `4A^X `5C^X `1B^X `28^X `42^X ] になる。その結果に同じ結果を連結してから復号した結果は、 [ `00A5^U `FFFD^U `00A5^U ] になる。 ◎ Encoding U+00A5 (¥) gives 0x1B 0x28 0x4A 0x5C 0x1B 0x28 0x42. Doing that twice, concatenating the results, and then decoding yields U+00A5 U+FFFD U+00A5.

各［ `ISO-2022-JP$n の`符号化器$ ］には、次に挙げるものが結付けられる： ◎ ISO-2022-JP’s encoder has an associated＼

`~ISO-2022-JP符号化器~状態@ ⇒ `~ASCII@i1 ／ `~Roman@i1 ／ `jis0208@i1 — 初期~時は `~ASCII$i1 とする ◎ ISO-2022-JP encoder state which is ASCII, Roman, or jis0208, initially ASCII.

`ISO-2022-JP$n の`符号化器$の`~handler$は、所与の ( %入出力~queue, %~cp ) に対し： ◎ ISO-2022-JP’s encoder’s handler, given ioQueue and codePoint, runs these steps:

~IF［ %~cp ~EQ `~EoQ$ ］： ◎ ↓
1. ~IF［ `~ISO-2022-JP符号化器~状態$ ~NEQ `~ASCII$i1 ］ ⇒＃ `~ISO-2022-JP符号化器~状態$ ~SET `~ASCII$i1 ； ~RET ~byte列 « `1B^X, `28^X, `42^X » ◎ If codePoint is end-of-queue and ISO-2022-JP encoder state is not ASCII, then set ISO-2022-JP encoder state to ASCII and return three bytes 0x1B 0x28 0x42.
2. ~RET `完遂d$i ◎ If codePoint is end-of-queue and ISO-2022-JP encoder state is ASCII, then return finished.
~IF［ `~ISO-2022-JP符号化器~状態$ ~IN { `~ASCII$i1, `~Roman$i1 } ］~AND［ %~cp ~IN { `000E^U, `000F^U, `001B^U } ］ ⇒ ~RET `~error$i( `FFFD^U1 ) ◎ If ISO-2022-JP encoder state is ASCII or Roman, and codePoint is U+000E, U+000F, or U+001B, then return error with U+FFFD (�).

注記：攻撃を防ぐため、ここでは，［ %~cp ではなく， `FFFD^U1 ］を返す。 ◎ This returns U+FFFD (�) rather than codePoint to prevent attacks.
【！ https://github.com/whatwg/encoding/issues/15 】
~IF［ `~ISO-2022-JP符号化器~状態$ ~EQ `~ASCII$i1 ］~AND［ %~cp ~IN `~ASCII~cp$ ］ ⇒ ~RET ~byte « %~cp » ◎ If ISO-2022-JP encoder state is ASCII and codePoint is an ASCII code point, then return a byte whose value is codePoint.
~IF［ `~ISO-2022-JP符号化器~状態$ ~EQ `~Roman$i1 ］：
1. %結果 ~LET %~cp に応じて ⇒＃ `005C^U1, `007E^U1 以外の`~ASCII~cp$ならば ~byte « %~cp » ／ `00A5^U1 ならば ~byte « `5C^X » ／ `203E^U1 ならば ~byte « `7E^X » ／ ~ELSE_ ~NULL
2. ~IF［ %結果 ~NEQ ~NULL ］ ⇒ ~RET %結果 ◎ ↑
◎ If ISO-2022-JP encoder state is Roman and codePoint is an ASCII code point, excluding U+005C (\) and U+007E (~), or is U+00A5 (¥) or U+203E (‾): • If codePoint is an ASCII code point, then return a byte whose value is codePoint. • If codePoint is U+00A5 (¥), then return byte 0x5C. • If codePoint is U+203E (‾), then return byte 0x7E.
~IF［ %~cp ~IN `~ASCII~cp$ ］~AND［ `~ISO-2022-JP符号化器~状態$ ~NEQ `~ASCII$i1 ］ ⇒＃ `入出力~queueに格納し直す$( %入出力~queue, %~cp )； `~ISO-2022-JP符号化器~状態$ ~SET `~ASCII$i1 ； ~RET ~byte列 « `1B^X, `28^X, `42^X » ◎ If codePoint is an ASCII code point, and ISO-2022-JP encoder state is not ASCII, then restore codePoint to ioQueue, set ISO-2022-JP encoder state to ASCII, and return three bytes 0x1B 0x28 0x42.
~IF［ %~cp ~IN { `00A5^U1, `203E^U1 } ］~AND［ `~ISO-2022-JP符号化器~状態$ ~NEQ `~Roman$i1 ］ ⇒＃ `入出力~queueに格納し直す$( %入出力~queue, %~cp )； `~ISO-2022-JP符号化器~状態$ ~SET `~Roman$i1 ； ~RET ~byte列 « `1B^X, `28^X, `4A^X » ◎ If codePoint is either U+00A5 (¥) or U+203E (‾), and ISO-2022-JP encoder state is not Roman, then restore codePoint to ioQueue, set ISO-2022-JP encoder state to Roman, and return three bytes 0x1B 0x28 0x4A.
~IF［ %~cp ~EQ `2212^U1 ］ ⇒ %~cp ~SET `FF0D^U1 ◎ If codePoint is U+2212 (−), then set it to U+FF0D (－).
~IF［ %~cp ~IN { `FF61^U1 〜 `FF9F^U1 } ］ ⇒ %~cp ~SET `索引~ISO-2022-JP~katakana$の中で ( %~cp ~MINUS `FF61^X ) が指す`索引~cp$ ◎ If codePoint is in the range U+FF61 (｡) to U+FF9F (ﾟ), inclusive, then set it to the index code point for codePoint − 0xFF61 in index ISO-2022-JP katakana.
%~pointer ~LET `索引~jis0208$ の中で %~cp を指す`索引~pointer$ ◎ Let pointer be the index pointer for codePoint in index jis0208.

注記： %~pointer は、 ~NULL でなければ，`索引~jis0208$と~pointer演算の資質に因り 8836 未満になる。 ◎ If pointer is non-null, it is less than 8836 due to the nature of index jis0208 and the index pointer operation.
~IF［ %~pointer ~EQ ~NULL ］： ◎ If pointer is null:
1. ~IF［ `~ISO-2022-JP符号化器~状態$ ~EQ `jis0208$i1 ］ ⇒＃ `入出力~queueに格納し直す$( %入出力~queue, %~cp )； `~ISO-2022-JP符号化器~状態$ ~SET `~ASCII$i1 ； ~RET ~byte列 « `1B^X, `28^X, `42^X » ◎ If ISO-2022-JP encoder state is jis0208, then restore codePoint to ioQueue, set ISO-2022-JP encoder state to ASCII, and return three bytes 0x1B 0x28 0x42.
2. ~RET `~error$i( %~cp ) ◎ Return error with codePoint.
~IF［ `~ISO-2022-JP符号化器~状態$ ~NEQ `jis0208$i1 ］ ⇒＃ `入出力~queueに格納し直す$( %入出力~queue, %~cp )； `~ISO-2022-JP符号化器~状態$ ~SET `jis0208$i1 ； ~RET ~byte列 « `1B^X, `24^X, `42^X » ◎ If ISO-2022-JP encoder state is not jis0208, then restore codePoint to ioQueue, set ISO-2022-JP encoder state to jis0208, and return three bytes 0x1B 0x24 0x42.
%頭部 ~LET ( %~pointer ~DIV 94 ) ~PLUS `21^X ◎ Let leading be pointer / 94 + 0x21.
%尾部 ~LET ( %~pointer ~MOD 94 ) ~PLUS `21^X ◎ Let trailing be pointer % 94 + 0x21.
~RET ~byte列 « %頭部, %尾部 » ◎ Return two bytes whose values are leading and trailing.
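
The encoder steps above, including the restore-and-switch pattern for escape sequences, might be sketched like this in Python. All names are hypothetical. Both indexes are assumptions supplied by the caller: `jis0208_pointer` stands in for the index pointer lookup, and `katakana` stands in for index ISO-2022-JP katakana. When `katakana` is left empty, halfwidth katakana simply fall through to the jis0208 lookup, which is a simplification of this sketch.

```python
from typing import Callable, List, Optional, Sequence, Tuple, Union

FINISHED = "finished"
TO_ASCII, TO_ROMAN, TO_JIS0208 = b"\x1b(B", b"\x1b(J", b"\x1b$B"


class Iso2022JpEncoder:
    def __init__(self,
                 jis0208_pointer: Callable[[int], Optional[int]] = lambda cp: None,
                 katakana: Sequence[int] = ()) -> None:
        self.jis0208_pointer = jis0208_pointer
        self.katakana = katakana
        self.state = "ascii"

    def _switch(self, queue: List[int], cp: int, state: str, esc: bytes) -> bytes:
        """Restore cp to the queue, change state, and emit the escape."""
        queue.insert(0, cp)
        self.state = state
        return esc

    def handle(self, queue: List[int],
               cp: Optional[int]) -> Union[bytes, str, Tuple[str, int]]:
        if cp is None:  # end-of-queue
            if self.state != "ascii":
                self.state = "ascii"
                return TO_ASCII
            return FINISHED
        if self.state in ("ascii", "roman") and cp in (0x0E, 0x0F, 0x1B):
            return ("error", 0xFFFD)
        if self.state == "ascii" and cp <= 0x7F:
            return bytes([cp])
        if self.state == "roman":
            if cp <= 0x7F and cp not in (0x5C, 0x7E):
                return bytes([cp])
            if cp == 0x00A5:
                return b"\x5c"
            if cp == 0x203E:
                return b"\x7e"
        if cp <= 0x7F and self.state != "ascii":
            return self._switch(queue, cp, "ascii", TO_ASCII)
        if cp in (0x00A5, 0x203E) and self.state != "roman":
            return self._switch(queue, cp, "roman", TO_ROMAN)
        if cp == 0x2212:
            cp = 0xFF0D
        if 0xFF61 <= cp <= 0xFF9F and self.katakana:
            cp = self.katakana[cp - 0xFF61]
        pointer = self.jis0208_pointer(cp)
        if pointer is None:
            if self.state == "jis0208":
                return self._switch(queue, cp, "ascii", TO_ASCII)
            return ("error", cp)
        if self.state != "jis0208":
            return self._switch(queue, cp, "jis0208", TO_JIS0208)
        return bytes([pointer // 94 + 0x21, pointer % 94 + 0x21])


def iso_2022_jp_encode(text: str, **indexes) -> bytes:
    encoder, out = Iso2022JpEncoder(**indexes), bytearray()
    queue = [ord(ch) for ch in text]
    while True:
        result = encoder.handle(queue, queue.pop(0) if queue else None)
        if result == FINISHED:
            return bytes(out)
        if isinstance(result, tuple):
            raise ValueError(f"cannot encode U+{result[1]:04X}")
        out += result
```

Encoding U+00A5 with this sketch produces 0x1B 0x28 0x4A 0x5C 0x1B 0x28 0x42: an escape into Roman, the byte 0x5C, and an escape back to ASCII at end-of-queue.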

12.3. ~Shift_JIS

12.3.1. ~Shift_JIS復号器

各［ `Shift_JIS$n の`復号器$ ］には、次に挙げるものが結付けられる： ◎ Shift_JIS’s decoder has an associated＼

`~Shift_JIS頭部@ ⇒ ある~byte — 初期~時は `00^X とする。 ◎ Shift_JIS leading, which is a byte, initially 0x00.

`Shift_JIS$n の`復号器$の`~handler$は、所与の ( %入出力~queue, %~byte ) に対し： ◎ Shift_JIS’s decoder’s handler, given ioQueue and byte, runs these steps:

~IF［ %~byte ~EQ `~EoQ$ ］： ◎ ↓
1. ~IF［ `~Shift_JIS頭部$ ~NEQ `00^X ］ ⇒＃ `~Shift_JIS頭部$ ~SET `00^X ； ~RET `~error$i ◎ If byte is end-of-queue and Shift_JIS leading is not 0x00, then set Shift_JIS leading to 0x00 and return error.
2. ~ELSE ⇒ ~RET `完遂d$i ◎ If byte is end-of-queue and Shift_JIS leading is 0x00, then return finished.
~IF［ `~Shift_JIS頭部$ ~NEQ `00^X ］： ◎ If Shift_JIS leading is not 0x00:
1. %頭部 ~LET `~Shift_JIS頭部$ ◎ Let leading be Shift_JIS leading.
2. `~Shift_JIS頭部$ ~SET `00^X ◎ Set Shift_JIS leading to 0x00.
3. %~pointer ~LET ~NULL ◎ Let pointer be null.
4. %~offset ~LET ［ %~byte ~IN { `00^X 〜 `7E^X } ならば `40^X ／ ~ELSE_ `41^X ］ ◎ Let offset be 0x40 if byte is less than 0x7F; otherwise 0x41.
5. %頭部~offset ~LET ［ %頭部 ~IN { `00^X 〜 `9F^X } ならば `81^X ／ ~ELSE_ `C1^X ］ ◎ Let leadingOffset be 0x81 if leading is less than 0xA0; otherwise 0xC1.
6. ~IF［ %~byte ~IN { `40^X 〜 `7E^X, `80^X 〜 `FC^X } ］ ⇒ %~pointer ~SET ( %頭部 ~MINUS %頭部~offset ) ~MUL 188 ~PLUS %~byte ~MINUS %~offset ◎ If byte is in the range 0x40 to 0x7E, inclusive, or 0x80 to 0xFC, inclusive, then set pointer to (leading − leadingOffset) × 188 + byte − offset.
7. ~IF［ %~pointer ~IN { 8836 〜 10715 } ］【！ subtraction is done first to avoid upsetting compilers 】 ⇒ ~RET ~cp « `E000^X ~MINUS 8836 ~PLUS %~pointer » ◎ If pointer is in the range 8836 to 10715, inclusive, then return a code point whose value is 0xE000 − 8836 + pointer.
  
  注記：これは EUDC として周知な，旧来の Windows によるものと相互運用可能にする。【！ PUA 】 ◎ This is interoperable legacy from Windows known as EUDC.
  
  【 EUDC — いわゆる外字~用の機能。】【 8836 = 94 ~MUL 94 は~Shift_JIS（ JIS X 0208 ）の`区点番号＠https://ja.wikipedia.org/wiki/%E5%8C%BA%E7%82%B9%E7%95%AA%E5%8F%B7$の総数。結果の~cpは~Unicode私用領域に入る。】
8. %~cp ~LET ［ %~pointer ~EQ ~NULL ならば ~NULL ／ ~ELSE_ `索引~jis0208$ の中で %~pointer が指す`索引~cp$ ］ ◎ Let codePoint be null if pointer is null; otherwise the index code point for pointer in index jis0208.
9. ~IF［ %~cp ~NEQ ~NULL ］ ⇒ ~RET ~cp « %~cp » ◎ If codePoint is non-null, then return a code point whose value is codePoint.
10. ~IF［ %~byte ~IN `~ASCII~byte$ ］ ⇒ `入出力~queueに格納し直す$( %入出力~queue, %~byte ) ◎ If byte is an ASCII byte, then restore byte to ioQueue.
11. ~RET `~error$i ◎ Return error.
~IF［ %~byte ~IN { `~ASCII~byte$, `80^X} ］ ⇒ ~RET ~cp « %~byte » 【！ Opera has 0x7E 】 ◎ If byte is an ASCII byte or 0x80, then return a code point whose value is byte.
~IF［ %~byte ~IN { `A1^X 〜 `DF^X } ］【！ katakana; subtraction is done first to avoid upsetting compilers 】 ⇒ ~RET ~cp « `FF61^X ~PLUS ( %~byte ~MINUS `A1^X ) » ◎ If byte is in the range 0xA1 to 0xDF, inclusive, then return a code point whose value is 0xFF61 − 0xA1 + byte.
~IF［ %~byte ~IN { `81^X 〜 `9F^X, `E0^X 〜 `FC^X } ］ ⇒＃ `~Shift_JIS頭部$ ~SET %~byte ； ~RET `継続-$i ◎ If byte is in the range 0x81 to 0x9F, inclusive, or 0xE0 to 0xFC, inclusive, then set Shift_JIS leading to byte and return continue.
~RET `~error$i ◎ Return error.
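
The decoder steps above can be sketched as a small state machine. This is an illustration under stated assumptions, not a reference implementation: `INDEX_JIS0208` is a hypothetical two-entry stand-in for the real index jis0208, end-of-queue is modelled as `None`, and the I/O queue is modelled with `collections.deque`.

```python
from collections import deque

# Hypothetical stand-in for index jis0208 (pointer -> code point).
INDEX_JIS0208 = {0: 0x3000, 283: 0x3042}

ERROR, CONTINUE, FINISHED = "error", "continue", "finished"


class ShiftJISDecoder:
    def __init__(self):
        self.leading = 0x00  # "Shift_JIS leading"

    def handler(self, queue, byte):
        if byte is None:  # end-of-queue
            if self.leading != 0x00:
                self.leading = 0x00
                return ERROR
            return FINISHED
        if self.leading != 0x00:
            leading, self.leading = self.leading, 0x00
            pointer = None
            offset = 0x40 if byte < 0x7F else 0x41
            leading_offset = 0x81 if leading < 0xA0 else 0xC1
            if 0x40 <= byte <= 0x7E or 0x80 <= byte <= 0xFC:
                pointer = (leading - leading_offset) * 188 + byte - offset
            if pointer is not None and 8836 <= pointer <= 10715:
                return 0xE000 - 8836 + pointer  # EUDC, private use area
            code_point = None if pointer is None else INDEX_JIS0208.get(pointer)
            if code_point is not None:
                return code_point
            if byte <= 0x7F:  # ASCII byte: restore it to the queue
                queue.appendleft(byte)
            return ERROR
        if byte <= 0x80:  # ASCII byte or 0x80
            return byte
        if 0xA1 <= byte <= 0xDF:  # half-width katakana
            return 0xFF61 - 0xA1 + byte
        if 0x81 <= byte <= 0x9F or 0xE0 <= byte <= 0xFC:
            self.leading = byte
            return CONTINUE
        return ERROR


def decode_shift_jis(data: bytes) -> list:
    """Run the handler over data; returns code points and "error" markers."""
    queue, decoder, out = deque(data), ShiftJISDecoder(), []
    while True:
        byte = queue.popleft() if queue else None
        result = decoder.handler(queue, byte)
        if result == FINISHED:
            return out
        if result != CONTINUE:
            out.append(result)
```

In this sketch errors are simply recorded in the output list. How an error is actually handled (replacement, fatal, and so on) depends on the error mode of the caller and is outside the handler.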

12.3.2. ~Shift_JIS符号化器

`Shift_JIS$n の`符号化器$の`~handler$は、所与の ( %利用されない~queue, %~cp ) に対し： ◎ Shift_JIS’s encoder’s handler, given unused and codePoint, runs these steps:

%結果 ~LET %~cp に応じて ⇒＃ `~EoQ$ ならば `完遂d$i ／ `~ASCII~cp$ならば ~byte « %~cp » ／ `0080^U ならば ~byte « %~cp » ／ `00A5^U1 ならば ~byte « `5C^X » ／ `203E^U1 ならば ~byte « `7E^X » ／ `FF61^U1 〜 `FF9F^U1 ならば ~byte « ( %~cp ~MINUS `FF61^X ) ~PLUS `A1^X » ／ ~ELSE_ ~NULL ◎ If codePoint is end-of-queue, then return finished. ◎ If codePoint is an ASCII code point or U+0080, then return a byte whose value is codePoint. ◎ If codePoint is U+00A5 (¥), then return byte 0x5C. ◎ If codePoint is U+203E (‾), then return byte 0x7E. ◎ If codePoint is in the range U+FF61 (｡) to U+FF9F (ﾟ), inclusive, then return a byte whose value is codePoint − 0xFF61 + 0xA1.
~IF［ %結果 ~NEQ ~NULL ］ ⇒ ~RET %結果 ◎ ↑
~IF［ %~cp ~EQ `2212^U1 ］ ⇒ %~cp ~SET `FF0D^U1 ◎ If codePoint is U+2212 (−), then set it to U+FF0D (－).
%~pointer ~LET %~cp を指す`索引~Shift_JIS~pointer$ ◎ Let pointer be the index Shift_JIS pointer for codePoint.
~IF［ %~pointer ~EQ ~NULL ］ ⇒ ~RET `~error$i( %~cp ) ◎ If pointer is null, then return error with codePoint.
%頭部 ~LET ( %~pointer ~DIV 188 ) ◎ Let leading be pointer / 188.
%頭部~offset ~LET ［ %頭部 ~IN { `00^X 〜 `1E^X } ならば `81^X ／ ~ELSE_ `C1^X【！ 0xA0-0x81 】］ ◎ Let leadingOffset be 0x81 if leading is less than 0x1F; otherwise 0xC1.
%尾部 ~LET %~pointer ~MOD 188 ◎ Let trailing be pointer % 188.
%~offset ~LET ［ %尾部 ~IN { `00^X 〜 `3E^X } ならば `40^X ／ ~ELSE_ `41^X ］ ◎ Let offset be 0x40 if trailing is less than 0x3F; otherwise 0x41.
~RET ~byte列 « ( %頭部 ~PLUS %頭部~offset ), ( %尾部 ~PLUS %~offset ) » ◎ Return two bytes whose values are leading + leadingOffset and trailing + offset.
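
A minimal sketch of the encoder steps for a single code point follows. It rests on several assumptions: `INDEX_JIS0208` is a hypothetical two-entry stand-in for the real index, `index_shift_jis_pointer` approximates the index Shift_JIS pointer by skipping pointers 8272 to 8835 as the standard's index section describes, end-of-queue is not modelled, and an error is represented by returning `None`.

```python
# Hypothetical stand-in for index jis0208 (pointer -> code point).
INDEX_JIS0208 = {0: 0x3000, 283: 0x3042}


def index_shift_jis_pointer(code_point: int):
    """First pointer for code_point, ignoring pointers 8272..8835."""
    for pointer in sorted(INDEX_JIS0208):
        if 8272 <= pointer <= 8835:
            continue
        if INDEX_JIS0208[pointer] == code_point:
            return pointer
    return None


def shift_jis_encode_code_point(code_point: int):
    """Return the encoded bytes, or None for "error with code point"."""
    if code_point <= 0x80:                    # ASCII code point or U+0080
        return bytes([code_point])
    if code_point == 0x00A5:                  # YEN SIGN
        return b"\x5c"
    if code_point == 0x203E:                  # OVERLINE
        return b"\x7e"
    if 0xFF61 <= code_point <= 0xFF9F:        # half-width katakana
        return bytes([code_point - 0xFF61 + 0xA1])
    if code_point == 0x2212:                  # MINUS SIGN -> FULLWIDTH HYPHEN-MINUS
        code_point = 0xFF0D
    pointer = index_shift_jis_pointer(code_point)
    if pointer is None:
        return None
    leading = pointer // 188
    leading_offset = 0x81 if leading < 0x1F else 0xC1
    trailing = pointer % 188
    offset = 0x40 if trailing < 0x3F else 0x41
    return bytes([leading + leading_offset, trailing + offset])
```

The two offsets mirror the decoder: a lead byte skips the 0xA0 to 0xDF block, and a trail byte skips 0x7F.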

13. 旧来の複-~byte~Korean符号化法

13.1. ~EUC-KR

13.1.1. ~EUC-KR復号器

各［ `EUC-KR$n の`復号器$ ］には、次に挙げるものが結付けられる： ◎ EUC-KR’s decoder has an associated＼

`~EUC-KR頭部@ ⇒ ある~byte — 初期~時は `00^X とする。 ◎ EUC-KR leading, which is a byte, initially 0x00.

`EUC-KR$n の`復号器$の`~handler$は、所与の ( %入出力~queue, %~byte ) に対し： ◎ EUC-KR’s decoder’s handler, given ioQueue and byte, runs these steps:

~IF［ %~byte ~EQ `~EoQ$ ］： ◎ ↓
1. ~IF［ `~EUC-KR頭部$ ~NEQ `00^X ］ ⇒＃ `~EUC-KR頭部$ ~SET `00^X ； ~RET `~error$i ◎ If byte is end-of-queue and EUC-KR leading is not 0x00, then set EUC-KR leading to 0x00 and return error.
2. ~ELSE ⇒ ~RET `完遂d$i ◎ If byte is end-of-queue and EUC-KR leading is 0x00, then return finished.
~IF［ `~EUC-KR頭部$ ~NEQ `00^X ］： ◎ If EUC-KR leading is not 0x00:
1. %頭部 ~LET `~EUC-KR頭部$ ◎ Let leading be EUC-KR leading.
2. `~EUC-KR頭部$ ~SET `00^X ◎ Set EUC-KR leading to 0x00.
3. %~pointer ~LET ~NULL ◎ Let pointer be null.
4. ~IF［ %~byte ~IN { `41^X 〜 `FE^X } ］ ⇒ %~pointer ~SET ( %頭部 ~MINUS `81^X ) ~MUL 190 ~PLUS ( %~byte ~MINUS `41^X ) ◎ If byte is in the range 0x41 to 0xFE, inclusive, then set pointer to (leading − 0x81) × 190 + (byte − 0x41).
5. %~cp ~LET ［ %~pointer ~EQ ~NULL ならば ~NULL ／ ~ELSE_ `索引~EUC-KR$ の中で %~pointer が指す`索引~cp$ ］ ◎ Let codePoint be null if pointer is null; otherwise the index code point for pointer in index EUC-KR.
6. ~IF［ %~cp ~NEQ ~NULL ］ ⇒ ~RET ~cp « %~cp » ◎ If codePoint is non-null, then return a code point whose value is codePoint.
7. ~IF［ %~byte ~IN `~ASCII~byte$ ］ ⇒ `入出力~queueに格納し直す$( %入出力~queue, %~byte ) ◎ If byte is an ASCII byte, then restore byte to ioQueue.
8. ~RET `~error$i ◎ Return error.
~IF［ %~byte ~IN `~ASCII~byte$ ］ ⇒ ~RET ~cp « %~byte » ◎ If byte is an ASCII byte, then return a code point whose value is byte.
~IF［ %~byte ~IN { `81^X 〜 `FE^X } ］ ⇒＃ `~EUC-KR頭部$ ~SET %~byte ； ~RET `継続-$i ◎ If byte is in the range 0x81 to 0xFE, inclusive, then set EUC-KR leading to byte and return continue.
~RET `~error$i ◎ Return error.
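
The EUC-KR decoder follows the same lead-byte pattern with a simpler pointer formula. The sketch below is illustrative only: `INDEX_EUC_KR` is a hypothetical one-entry stand-in for index EUC-KR, end-of-queue is modelled as `None`, and the I/O queue as a `collections.deque`.

```python
from collections import deque

# Hypothetical stand-in for index EUC-KR: bytes 0xB0 0xA1 -> U+AC00.
INDEX_EUC_KR = {9026: 0xAC00}


class EUCKRDecoder:
    def __init__(self):
        self.leading = 0x00  # "EUC-KR leading"

    def handler(self, queue, byte):
        if byte is None:  # end-of-queue
            if self.leading != 0x00:
                self.leading = 0x00
                return "error"
            return "finished"
        if self.leading != 0x00:
            leading, self.leading = self.leading, 0x00
            pointer = None
            if 0x41 <= byte <= 0xFE:
                pointer = (leading - 0x81) * 190 + (byte - 0x41)
            code_point = None if pointer is None else INDEX_EUC_KR.get(pointer)
            if code_point is not None:
                return code_point
            if byte <= 0x7F:  # ASCII byte: restore it to the queue
                queue.appendleft(byte)
            return "error"
        if byte <= 0x7F:
            return byte
        if 0x81 <= byte <= 0xFE:
            self.leading = byte
            return "continue"
        return "error"


def decode_euc_kr(data: bytes) -> list:
    """Run the handler over data; returns code points and "error" markers."""
    queue, decoder, out = deque(data), EUCKRDecoder(), []
    while True:
        byte = queue.popleft() if queue else None
        result = decoder.handler(queue, byte)
        if result == "finished":
            return out
        if result != "continue":
            out.append(result)
```

Restoring an ASCII trail byte means a broken two-byte sequence consumes only its lead byte, so the ASCII byte that follows is still decoded normally.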

13.1.2. ~EUC-KR符号化器

`EUC-KR$n の`符号化器$の`~handler$は、所与の ( %利用されない~queue, %~cp ) に対し： ◎ EUC-KR’s encoder’s handler, given unused and codePoint, runs these steps:

~IF［ %~cp ~EQ `~EoQ$ ］ ⇒ ~RET `完遂d$i ◎ If codePoint is end-of-queue, then return finished.
~IF［ %~cp ~IN `~ASCII~cp$ ］ ⇒ ~RET ~byte « %~cp » ◎ If codePoint is an ASCII code point, then return a byte whose value is codePoint.
%~pointer ~LET `索引~EUC-KR$ の中で %~cp を指す`索引~pointer$ ◎ Let pointer be the index pointer for codePoint in index EUC-KR.
~IF［ %~pointer ~EQ ~NULL ］ ⇒ ~RET `~error$i( %~cp ) ◎ If pointer is null, then return error with codePoint.
%頭部 ~LET ( %~pointer ~DIV 190 ) ~PLUS `81^X ◎ Let leading be pointer / 190 + 0x81.
%尾部 ~LET ( %~pointer ~MOD 190 ) ~PLUS `41^X ◎ Let trailing be pointer % 190 + 0x41.
~RET ~byte列 « %頭部, %尾部 » ◎ Return two bytes whose values are leading and trailing.
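
The final three encoder steps are the inverse of the decoder's pointer formula. A minimal sketch of just that arithmetic follows. The function name `euc_kr_pointer_to_bytes` is hypothetical, and the index lookup is assumed to have succeeded already.

```python
def euc_kr_pointer_to_bytes(pointer: int) -> bytes:
    """Turn an index EUC-KR pointer into its lead and trail bytes.

    Sketch only: each lead byte covers 190 trail bytes (0x41..0xFE).
    """
    if pointer < 0:
        raise ValueError("pointer must be non-negative")
    leading = pointer // 190 + 0x81
    trailing = pointer % 190 + 0x41
    if leading > 0xFE:
        raise ValueError("pointer is too large for a lead byte")
    return bytes([leading, trailing])
```

For example, pointer 9026 gives 9026 // 190 = 47 and 9026 % 190 = 96, so the bytes are 0x81 + 47 = 0xB0 and 0x41 + 96 = 0xA1.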

【！ removed from the spec 2013-08-23: 13.2 iso-2022-kr id="iso-2022-kr" 】

14. 旧来の諸々の符号化法

14.1. ~replacement

注記： `replacement$n `符号化法$は、 ~serverと~clientにおける `符号化法$の~supportの不一致を突く，ある種の攻撃を防ぐためのものである。 ◎ The replacement encoding exists to prevent certain attacks that abuse a mismatch between encodings supported on the server and the client.

14.1.1. ~replacement復号器

各［ `replacement$n の`復号器$ ］には、次に挙げるものが結付けられる： ◎ replacement’s decoder has an associated＼

`~replacement~errorは返したか@ ⇒ ある真偽値 — 初期~時は ~F とする。 ◎ replacement error returned, which is a boolean, initially false.

`replacement$n の`復号器$の`~handler$は、所与の ( %利用されない~queue, %~byte ) に対し： ◎ replacement’s decoder’s handler, given unused and byte, runs these steps:

~IF［ %~byte ~EQ `~EoQ$ ］ ⇒ ~RET `完遂d$i ◎ If byte is end-of-queue, then return finished.
~IF［ `~replacement~errorは返したか$ ~EQ ~F ］ ⇒＃ `~replacement~errorは返したか$ ~SET ~T ； ~RET `~error$i ◎ If replacement error returned is false, then set replacement error returned to true and return error.
~RET `完遂d$i ◎ Return finished.
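
The behaviour of this handler is easiest to see in code: the first byte of any input yields a single error, and everything after it ends decoding. A minimal sketch, with `None` standing in for end-of-queue and hypothetical names throughout:

```python
class ReplacementDecoder:
    def __init__(self):
        self.error_returned = False  # "replacement error returned"

    def handler(self, byte):
        if byte is None:             # end-of-queue
            return "finished"
        if not self.error_returned:  # first byte seen: report one error
            self.error_returned = True
            return "error"
        return "finished"            # any later byte ends decoding
```

Under this sketch a non-empty input produces exactly one error regardless of its length, and an empty input produces none.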

【 `replacement$n には、 `符号化器$は無い。】

14.2. `UTF-16BE/LE$n に共通な基盤

`UTF-16BE/LE@n は、［ `UTF-16BE$n ／ `UTF-16LE$n ］の総称である。 ◎ UTF-16BE/LE is UTF-16BE or UTF-16LE.

14.2.1. 共用~UTF-16復号器

注記： ~BOMは~labelより優先される。それは，配備-済みな内容において、どの~labelよりも正確であることが見出されているので。したがって，それは、 `共用~UTF-16復号器$の一部ではなく，`~Unicodeに復号する$ ~algoの一部を成す。 ◎ A byte order mark has priority over a label as it has been found to be more accurate in deployed content. Therefore it is not part of the shared UTF-16 decoder algorithm, but rather the decode algorithm.

各［ `共用~UTF-16復号器$ ］には、次に挙げるものが結付けられる： ◎ shared UTF-16 decoder has an associated:

`~UTF-16頭部~byte@ ⇒ ~NULL ／ある~byte — 初期~時は ~NULL とする。 ◎ UTF-16 leading byte • Null or a byte, initially null.
`~UTF-16頭部~surrogate@ ⇒ ~NULL ／ある`頭部~surrogate$ — 初期~時は ~NULL とする。 ◎ UTF-16 leading surrogate • Null or a leading surrogate, initially null.
`~UTF-16BE復号器~用か@ ⇒ ある真偽値 — 初期~時は ~F とする。 ◎ is UTF-16BE decoder • A boolean, initially false.

`共用~UTF-16復号器$の`~handler$は、所与の ( %入出力~queue, %~byte ) に対し： ◎ shared UTF-16 decoder’s handler, given ioQueue and byte, runs these steps:

~IF［ %~byte ~EQ `~EoQ$ ］： ◎ ↓
1. ~IF［ `~UTF-16頭部~byte$ ~NEQ ~NULL ］~OR［ `~UTF-16頭部~surrogate$ ~NEQ ~NULL ］ ⇒＃ `~UTF-16頭部~byte$ ~SET ~NULL； `~UTF-16頭部~surrogate$ ~SET ~NULL； ~RET `~error$i ◎ If byte is end-of-queue and either UTF-16 leading byte or UTF-16 leading surrogate is non-null, then set UTF-16 leading byte and UTF-16 leading surrogate to null, and return error.
2. ~ELSE ⇒ ~RET `完遂d$i ◎ If byte is end-of-queue and UTF-16 leading byte and UTF-16 leading surrogate are null, then return finished.
~IF［ `~UTF-16頭部~byte$ ~EQ ~NULL ］ ⇒＃ `~UTF-16頭部~byte$ ~SET %~byte ； ~RET `継続-$i ◎ If UTF-16 leading byte is null, then set UTF-16 leading byte to byte and return continue.
%~cu ~LET `~UTF-16BE復号器~用か$に応じて ⇒＃ ~T ならば ( ( `~UTF-16頭部~byte$ ~Lshift 8 ) ~PLUS %~byte ) ／ ~F ならば ( ( %~byte ~Lshift 8 ) ~PLUS `~UTF-16頭部~byte$ ) ◎ Let codeUnit be the result of: • is UTF-16BE decoder is true •• (UTF-16 leading byte << 8) + byte. • is UTF-16BE decoder is false •• (byte << 8) + UTF-16 leading byte.
`~UTF-16頭部~byte$ ~SET ~NULL ◎ Set UTF-16 leading byte to null.
~IF［ `~UTF-16頭部~surrogate$ ~NEQ ~NULL ］： ◎ If UTF-16 leading surrogate is non-null:
1. %頭部~surrogate ~LET `~UTF-16頭部~surrogate$ ◎ Let leadingSurrogate be UTF-16 leading surrogate.
2. `~UTF-16頭部~surrogate$ ~SET ~NULL ◎ Set UTF-16 leading surrogate to null.
3. ~IF［ %~cu は`尾部~surrogate$である］ ⇒ ~RET `~surrogate対から~scalar値を得する$( %頭部~surrogate, %~cu ) ◎ If codeUnit is a trailing surrogate, then return a scalar value from surrogates given leadingSurrogate and codeUnit.
4. %byte1 ~LET %~cu ~Rshift 8 ◎ Let byte1 be codeUnit >> 8.
5. %byte2 ~LET %~cu ~bAND `00FF^X ◎ Let byte2 be codeUnit & 0x00FF.
6. %~byte列 ~LET `~UTF-16BE復号器~用か$に応じて ⇒＃ ~T ならば ~byte列 « %byte1, %byte2 » ／ ~F ならば ~byte列 « %byte2, %byte1 » ◎ Let bytes be a list of two bytes whose values are byte1 and byte2, if is UTF-16BE decoder is true; otherwise byte2 and byte1.
7. `入出力~queueに格納し直す$( %入出力~queue, %~byte列 ) ◎ Restore bytes to ioQueue and return error.
8. ~RET `~error$i ◎ ↑
~IF［ %~cu は`頭部~surrogate$である］ ⇒＃ `~UTF-16頭部~surrogate$ ~SET %~cu ； ~RET `継続-$i ◎ If codeUnit is a leading surrogate, then set UTF-16 leading surrogate to codeUnit and return continue.
~IF［ %~cu は`尾部~surrogate$である］ ⇒ ~RET `~error$i ◎ If codeUnit is a trailing surrogate, then return error.
~RET ~cp « %~cu » ◎ Return code point codeUnit.
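
The shared UTF-16 decoder steps above can be sketched as follows. This is an illustration with assumptions: end-of-queue is modelled as `None`, the I/O queue as a `collections.deque`, and the surrogate ranges and the surrogate-pair formula are written inline rather than taken from helper definitions.

```python
from collections import deque


class SharedUTF16Decoder:
    def __init__(self, is_utf16be=False):
        self.leading_byte = None       # "UTF-16 leading byte"
        self.leading_surrogate = None  # "UTF-16 leading surrogate"
        self.is_utf16be = is_utf16be   # "is UTF-16BE decoder"

    def handler(self, queue, byte):
        if byte is None:  # end-of-queue
            if self.leading_byte is not None or self.leading_surrogate is not None:
                self.leading_byte = None
                self.leading_surrogate = None
                return "error"
            return "finished"
        if self.leading_byte is None:
            self.leading_byte = byte
            return "continue"
        if self.is_utf16be:
            code_unit = (self.leading_byte << 8) + byte
        else:
            code_unit = (byte << 8) + self.leading_byte
        self.leading_byte = None
        if self.leading_surrogate is not None:
            lead, self.leading_surrogate = self.leading_surrogate, None
            if 0xDC00 <= code_unit <= 0xDFFF:  # trailing surrogate
                return 0x10000 + ((lead - 0xD800) << 10) + (code_unit - 0xDC00)
            byte1, byte2 = code_unit >> 8, code_unit & 0x00FF
            restored = [byte1, byte2] if self.is_utf16be else [byte2, byte1]
            queue.extendleft(reversed(restored))  # put both bytes back, in order
            return "error"
        if 0xD800 <= code_unit <= 0xDBFF:  # leading surrogate
            self.leading_surrogate = code_unit
            return "continue"
        if 0xDC00 <= code_unit <= 0xDFFF:  # unpaired trailing surrogate
            return "error"
        return code_unit


def decode_utf16(data: bytes, is_utf16be=False) -> list:
    """Run the handler over data; returns code points and "error" markers."""
    queue, decoder, out = deque(data), SharedUTF16Decoder(is_utf16be), []
    while True:
        byte = queue.popleft() if queue else None
        result = decoder.handler(queue, byte)
        if result == "finished":
            return out
        if result != "continue":
            out.append(result)
```

Restoring the two bytes after an unpaired leading surrogate lets the following code unit be decoded on its own, so one broken surrogate costs one error rather than swallowing the next character.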

14.3. ~UTF-16BE

14.3.1. ~UTF-16BE復号器

`UTF-16BE$n の`復号器$は、［ `~UTF-16BE復号器~用か$ ~SET ~T ］にされた`共用~UTF-16復号器$である。 ◎ UTF-16BE’s decoder is shared UTF-16 decoder with its is UTF-16BE decoder set to true.

14.4. ~UTF-16LE

注記：配備-済みな内容に~~対処するため、 `utf-16^lb は `UTF-16LE$n 用の`~label$にされている。 ◎ "utf-16" is a label for UTF-16LE to deal with deployed content.

14.4.1. ~UTF-16LE復号器

`UTF-16LE$n の`復号器$は、 `共用~UTF-16復号器$である。 ◎ UTF-16LE’s decoder is shared UTF-16 decoder.

14.5. ~x-user-defined

注記：これは形式上は`単-~byte符号化法$であるが、 ~algo的に実装できるので，別々に定義される。 ◎ While technically this is a single-byte encoding, it is defined separately as it can be implemented algorithmically.

14.5.1. ~x-user-defined復号器

`x-user-defined$n の`復号器$の`~handler$は、所与の ( %利用されない~queue, %~byte ) に対し： ◎ x-user-defined’s decoder’s handler, given unused and byte, runs these steps:

~RET %~byte に応じて ⇒＃ `~EoQ$ならば `完遂d$i ／ `~ASCII~byte$であるならば ~cp « %~byte » ／ ~ELSE_ ~cp « `F780^X ~PLUS %~byte ~MINUS `80^X » ◎ If byte is end-of-queue, then return finished. ◎ If byte is an ASCII byte, then return a code point whose value is byte. ◎ Return a code point whose value is 0xF780 + byte − 0x80.

14.5.2. ~x-user-defined符号化器

`x-user-defined$n の`符号化器$の`~handler$は、所与の ( %利用されない~queue, %~cp ) に対し： ◎ x-user-defined’s encoder’s handler, given unused and codePoint, runs these steps:

~RET %~cp に応じて ⇒＃ `~EoQ$ならば `完遂d$i ／ `~ASCII~cp$ならば ~byte « %~cp » ／ `F780^U 〜 `F7FF^U ならば ~byte « %~cp ~MINUS `F780^X ~PLUS `80^X » ／ ~ELSE_ `~error$i( %~cp ) ◎ If codePoint is end-of-queue, then return finished. ◎ If codePoint is an ASCII code point, then return a byte whose value is codePoint. ◎ If codePoint is in the range U+F780 to U+F7FF, inclusive, then return a byte whose value is codePoint − 0xF780 + 0x80. ◎ Return error with codePoint.
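
Both x-user-defined handlers are pure arithmetic, which is the reason given above for defining the encoding separately from the table-driven single-byte encodings. A minimal sketch of the per-byte and per-code-point mappings follows. The function names are hypothetical, end-of-queue is not modelled, and an encoder error is represented by `None`.

```python
def x_user_defined_decode_byte(byte: int) -> int:
    """Map one byte to a code point."""
    if byte <= 0x7F:              # ASCII byte
        return byte
    return 0xF780 + byte - 0x80   # 0x80..0xFF -> U+F780..U+F7FF


def x_user_defined_encode_code_point(code_point: int):
    """Map one code point to a byte value, or None for an error."""
    if code_point <= 0x7F:        # ASCII code point
        return code_point
    if 0xF780 <= code_point <= 0xF7FF:
        return code_point - 0xF780 + 0x80
    return None
```

Every byte value round-trips through these two functions, while code points outside ASCII and U+F780 to U+F7FF are reported as errors by the encoder.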

15. ~browser~UI

~browserには、資源の符号化法に対する上書きを可能化させないことが奨励される。にもかかわらず，その種の特能が在る場合、前述した`~securityの課題＠#security-background$から， ~browserは `UTF-16BE/LE$n を~optionとして提供するベキでない。 ~browserは、資源が `UTF-16BE/LE$n を利用して復号された場合にも，この特能を不能化するベキである。 ◎ Browsers are encouraged to not enable overriding the encoding of a resource. If such a feature is nonetheless present, browsers should not offer UTF-16BE/LE as an option, due to the aforementioned security issues. Browsers should also disable this feature if the resource was decoded using UTF-16BE/LE.

実装の考慮点

この標準における`符号化法$用の`復号器$は、 `格納し直す演算$を備える`入出力~queue$を~supportする代わりに，次により実装することもできる： ◎ Instead of supporting I/O queues with arbitrary restore, the decoders for encodings in this standard could be implemented with:

読取った現在の~byteを元に戻す能 ◎ The ability to unread the current byte.
`ISO-2022-JP$n 用の単-~byte（ `24^X ／ `28^X ）~buffer ◎ ↓
`gb18030$n 用の単-~byte（ `~ASCII~byte$ ）~buffer ◎ A single-byte buffer for gb18030 (an ASCII byte) and ISO-2022-JP (0x24 or 0x28).

`gb18030$n に対しては、［ `~gb3$ ~NEQ `00^X ］の間に~~不正な~byteに出くわしたときは、 `~gb2$は，次回に返すことになる単-~byte~bufferの中に移動でき、 `~gb3$が — 単-~byte~bufferを返して空にした後， `00^X でないことを検査したなら — 新たな`~gb1$になる。これは、 `gb18030$n における 1 個目と 3 個目の~byte範囲が一致するので可能になる。 ◎ For gb18030 when hitting a bogus byte while gb18030 third is not 0x00, gb18030 second could be moved into the single-byte buffer to be returned next, and gb18030 third would be the new gb18030 first, checked for not being 0x00 after the single-byte buffer was returned and emptied. This is possible as the range for the first and third byte in gb18030 is identical.

`~ISO-2022-JP符号化器$には，追加的な状態として`~ISO-2022-JP符号化器~状態$が必要になるが、それ以外では、この標準におけるどの`符号化法$用にも，その`符号化器$に追加的な［状態／~buffer ］は要求されない。 ◎ The ISO-2022-JP encoder needs ISO-2022-JP encoder state as additional state, but other than that, none of the encoders for encodings in this standard require additional state or buffers.

謝辞

年月に渡り、符号化法を相互運用可能なものにするために，たくさんの方々が助力され、この標準の目標へ近付けてきた。同様に多くの方々の助力が，この標準を~~現在の姿に仕立て上げてきた。特に，次の方々に感謝する： ◎ There have been a lot of people that have helped make encodings more interoperable over the years and thereby furthered the goals of this standard. Likewise many people have helped making this standard what it is today.

`_acks1@

知的財産権

`_ipr1@

1. 序

2. ~securityに関する背景

3. 各種用語

【この訳に特有な表記規約】

4. 符号化法

4.1. 符号化器と復号器

4.2. 名前と~label

4.3. 出力~符号化法

5. 索引

6. 他の標準~用の~hook

6.1. 各~標準~用の旧来の~hook

7. ~API

7.1. ~interface~mixin `TextDecoderCommon^I

7.2. ~interface `TextDecoder^I

7.3. ~interface~mixin `TextEncoderCommon^I

7.4. ~interface `TextEncoder^I

7.5. ~interface `TextDecoderStream^I

7.6. ~interface `TextEncoderStream^I

8. ~~標準の符号化法

8.1. ~UTF-8

8.1.1. ~UTF-8復号器

8.1.2. ~UTF-8符号化器

9. 旧来の単-~byte符号化法

9.1. 単-~byte復号器

9.2. 単-~byte符号化器

10. 旧来の複-~byte~Chinese（簡体字） 符号化法

10.1. ~GBK

10.1.1. ~GBK復号器

10.1.2. ~GBK符号化器

10.2. ~gb18030

10.2.1. ~gb18030復号器

10.2.2. ~gb18030符号化器

11. 旧来の複-~byte~Chinese（繁体字）符号化法

11.1. ~Big5

11.1.1. ~Big5復号器

11.1.2. ~Big5符号化器

12. 旧来の複-~byte~Japanese符号化法

12.1. ~EUC-JP

12.1.1. ~EUC-JP復号器

12.1.2. ~EUC-JP符号化器

12.2. ~ISO-2022-JP

12.2.1. ~ISO-2022-JP復号器

12.2.2. ~ISO-2022-JP符号化器

12.3. ~Shift_JIS

12.3.1. ~Shift_JIS復号器

12.3.2. ~Shift_JIS符号化器

13. 旧来の複-~byte~Korean符号化法

13.1. ~EUC-KR

13.1.1. ~EUC-KR復号器

13.1.2. ~EUC-KR符号化器

14. 旧来の諸々の符号化法

14.1. ~replacement

14.1.1. ~replacement復号器

14.2. `UTF-16BE/LE$n に共通な基盤

14.2.1. 共用~UTF-16復号器

14.3. ~UTF-16BE

14.3.1. ~UTF-16BE復号器

14.4. ~UTF-16LE

14.4.1. ~UTF-16LE復号器

14.5. ~x-user-defined

14.5.1. ~x-user-defined復号器

14.5.2. ~x-user-defined符号化器

15. ~browser~UI

実装の考慮点

謝辞

知的財産権
