CopyPastor

Detecting plagiarism made easy.

Score: 0.8146778212653266; Reported for: String similarity

Possible Plagiarism

Reposted on 2024-10-22
by Linh Vu

Original Post

Original - Posted on 2024-10-22
by Linh Vu



            

**TL;DR**
`Encoding` (also called **Character Encoding** or **Character-Encoding Scheme**) is one part of a `Charset`.
`Charset` is defined as the combination of one or more **coded character sets** (Unicode, US-ASCII, ISO 8859-1, ...) and a **character-encoding scheme** (UTF-8, UTF-16, ISO 2022, EUC, ...).
---
> A **coded character set** is a mapping between a set of abstract characters and a set of integers. US-ASCII, ISO 8859-1, JIS X 0201, and Unicode are examples of coded character sets.
Let's take `Unicode` as an example:
```text
Code Point <-> Character   Description
U+0041      |  A           Latin Capital letter A (https://symbl.cc/en/0041/)
U+0042      |  B           Latin Capital letter B (https://symbl.cc/en/0042/)
...         |
U+005A      |  Z           Latin Capital letter Z (https://symbl.cc/en/005A/)
...         |
U+5301      |  匁          Ideograph Japanese unit of weight (1/1000 of a kan) CJK 匁 (https://symbl.cc/en/5301/)
...         |
U+1F525     |  🔥          Fire Emoji (https://symbl.cc/en/1F525-fire-emoji/)
```
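The character-to-integer mapping in the table can be checked from Java. This is a minimal sketch; the class name `CodePointDemo` and the helper `toCodePointNotation` are made up for illustration, and only `String.codePointAt` and `String.format` come from the JDK.

```java
final class CodePointDemo {
    // Returns the first code point of s in U+XXXX notation (hypothetical helper).
    static String toCodePointNotation(String s) {
        return String.format("U+%04X", s.codePointAt(0));
    }

    public static void main(String[] args) {
        // Escapes keep the source ASCII-only: the CJK ideograph at U+5301
        // and the surrogate pair that Java strings use for U+1F525.
        String[] samples = {"A", "Z", "\u5301", "\uD83D\uDD25"};
        for (String s : samples) {
            System.out.println(toCodePointNotation(s)
                    + " (UTF-16 code units in the String: " + s.length() + ")");
        }
    }
}
```

Note that the code point is independent of how the character is stored: the fire emoji occupies two `char` values inside a Java `String`, yet `codePointAt` still reports the single Unicode code point.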
> A **character-encoding scheme** is a mapping between one or more **coded character sets** and a set of octet (eight-bit byte) sequences. UTF-8, UTF-16, ISO 2022, and EUC are examples of character-encoding schemes.
Let's take `UTF-8` as an example:
```text
Character   Code Point <-> Bytes(Hex)
A           U+0041      |  41            (https://symbl.cc/en/0041/)
B           U+0042      |  42            (https://symbl.cc/en/0042/)
...                     |
Z           U+005A      |  5A            (https://symbl.cc/en/005A/)
...                     |
匁          U+5301      |  E5 8C 81      (https://symbl.cc/en/5301/)
...                     |
🔥          U+1F525     |  F0 9F 94 A5   (https://symbl.cc/en/1F525-fire-emoji/)
```
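The byte sequences in the table can be reproduced with `String.getBytes(StandardCharsets.UTF_8)`. The sketch below is only an illustration; `Utf8BytesDemo` and `utf8Hex` are hypothetical names, not part of any library.

```java
import java.nio.charset.StandardCharsets;

final class Utf8BytesDemo {
    // Encodes s with UTF-8 and renders the bytes as space-separated uppercase hex.
    static String utf8Hex(String s) {
        StringBuilder hex = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            // Mask to 0..255 so negative byte values print as two hex digits.
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // ASCII-only source: U+5301 and the surrogate pair for U+1F525 are escaped.
        String[] samples = {"A", "B", "Z", "\u5301", "\uD83D\uDD25"};
        for (String s : samples) {
            System.out.println(String.format("U+%04X", s.codePointAt(0)) + " | " + utf8Hex(s));
        }
    }
}
```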
Refer to [Java Charset Class](https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/nio/charset/Charset.html)
---
If you're interested in `Unicode` and `UTF-8` in Java, my [video](https://youtu.be/Jdk39lBJ_hY) may help.

**TL;DR**
`UTF-8` is a **Character Encoding**.
`StandardCharsets.UTF_8` is a [Java `Charset`](https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/nio/charset/Charset.html) that combines `Unicode` as the **Coded Character Set** with `UTF-8` as the **Character-Encoding Scheme**.
---
A **Charset** is defined as the combination of one or more **coded character sets** and a **character-encoding scheme**.
`Unicode`: is a **Coded Character Set** - a mapping between a set of abstract characters and a set of integers (code points).
```text
Code Point <-> Character   Description
U+0041      |  A           Latin Capital letter A (https://symbl.cc/en/0041/)
U+0042      |  B           Latin Capital letter B (https://symbl.cc/en/0042/)
...         |
U+005A      |  Z           Latin Capital letter Z (https://symbl.cc/en/005A/)
...         |
U+5301      |  匁          Ideograph Japanese unit of weight (1/1000 of a kan) CJK 匁 (https://symbl.cc/en/5301/)
...         |
U+1F525     |  🔥          Fire Emoji (https://symbl.cc/en/1F525-fire-emoji/)
```
`UTF-8`: is a **Character-Encoding Scheme** (or you could simply say **Character Encoding**) - a mapping between one or more **coded character sets** (Unicode code points) and a set of octet (eight-bit byte) sequences.
```text
Character   Code Point <-> Bytes(Hex)
A           U+0041      |  41            (https://symbl.cc/en/0041/)
B           U+0042      |  42            (https://symbl.cc/en/0042/)
...                     |
Z           U+005A      |  5A            (https://symbl.cc/en/005A/)
...                     |
匁          U+5301      |  E5 8C 81      (https://symbl.cc/en/5301/)
...                     |
🔥          U+1F525     |  F0 9F 94 A5   (https://symbl.cc/en/1F525-fire-emoji/)
```
`UTF-8` (**encoding scheme**) is used only to encode `Unicode` (**coded character set**).
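One way to see the split between the coded character set and the encoding scheme is to encode the same Unicode code point with two different schemes. The sketch below compares `StandardCharsets.UTF_8` with `StandardCharsets.UTF_16BE`; the class `EncodingSchemeDemo` and its helpers `hex` and `roundTrips` are hypothetical names chosen for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingSchemeDemo {
    // Encodes s with the given charset and renders the bytes as space-separated hex.
    static String hex(String s, Charset charset) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(charset)) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("%02X", b & 0xFF));
        }
        return out.toString();
    }

    // Encodes to UTF-8 and decodes again; the code points should come back unchanged.
    static boolean roundTrips(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8).equals(s);
    }

    public static void main(String[] args) {
        String fire = "\uD83D\uDD25"; // surrogate pair for U+1F525
        System.out.println("UTF-8:    " + hex(fire, StandardCharsets.UTF_8));
        System.out.println("UTF-16BE: " + hex(fire, StandardCharsets.UTF_16BE));
        System.out.println("round trip ok: " + roundTrips(fire));
    }
}
```

Both charsets share Unicode as the coded character set, so the code point stays U+1F525 in each case; only the byte sequences produced by the encoding scheme differ.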
Refer to [Java Charset Class](https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/nio/charset/Charset.html)
---
If you're interested in `Unicode` and `UTF-8` in Java, my [video](https://youtu.be/Jdk39lBJ_hY) may help.


        