Update copilot prompts

2026-05-11 15:07:18 +08:00 · 2025-10-30 22:39:15 -07:00
parent 0aa6299dbe
commit 47be4f596b
243 changed files with 16743 additions and 0 deletions
@@ -0,0 +1,133 @@
+# Encoding and Decoding
+
+Text encoding conversion between different UTF formats with BOM support and binary data encoding.
+
+## UTF Encoding with BOM Support
+
+Use `BomEncoder` and `BomDecoder` for UTF encoding with BOM support.
+
+`BomEncoder` and `BomDecoder` convert data between `wchar_t` and a specified UTF encoding with BOM added to the very beginning.
+
+## UTF Encoding without BOM
+
+Use `UtfGeneralEncoder<Native, Expect>` and `UtfGeneralDecoder<Native, Expect>` for UTF conversion without BOM.
+
+`UtfGeneralEncoder<Native, Expect>` encode from `Expect` to `Native`, `UtfGeneralDecoder<Native, Expect>` decode from `Native` to `Expect`. They should be one of `wchar_t`, `char8_t`, `char16_t`, `char32_t` and `char16be_t`.
+
+Unlike `BomEncoder` and `BomDecoder`, `UtfGeneralEncoder` and `UtfGeneralDecodes` is without BOM.
+
+`char16be_t` means UTF-16 Big Endian, which is not a C++ native type, it can't be used with any string literal.
+
+## Specific UTF Conversion Aliases
+
+There are aliases for them to convert between `wchar_t` and any other UTF encoding:
+
+- Use `Utf8Encoder` and `Utf8Decoder` for UTF-8 conversion
+- Use `Utf16Encoder` and `Utf16Decoder` for UTF-16 conversion  
+- Use `Utf16BEEncoder` and `Utf16BEDecoder` for UTF-16 Big Endian conversion
+- Use `Utf32Encoder` and `Utf32Decoder` for UTF-32 conversion
+
+## ASCII/MBCS Encoding
+
+Use `MbcsEncoder` and `MbcsDecoder` for ASCII/MBCS conversion.
+
+`MbcsEncoder` and `MbcsDecoder` convert data between `wchar_t` and `char`, which is ASCII.
+
+`BomEncoder::Mbcs` also handles ASCII meanwhile there is no BOM for ASCII. A `BomEncoder(BomEncoder::Mbcs)` works like a `MbcsEncoder`.
+
+The actual encoding of `char` depends on the user setting in the running OS.
+
+## Automatic Encoding Detection
+
+Use `TestEncoding` for automatic encoding detection.
+
+There is a function `TestEncoding` to scan a binary data and guess the most possible UTF encoding.
+
+## Base64 Encoding
+
+Use `Utf8Base64Encoder` and `Utf8Base64Decoder` for Base64 encoding in UTF-8.
+
+`Utf8Base64Encoder` and `Utf6Base64Decoder` convert between binary data to Base64 in UTF8 encoding.
+They can work with `UtfGeneralEncoder` and `UtfGeneralDecoder` to convert binary data to Base64 in a `WString`.
+
+### Example: Converting Binary Data to Base64 WString
+
+```cpp
+MemoryStream memoryStream;
+{
+  UtfGeneralEncoder<wchar_t, char8_t> u8towEncoder;
+  EncoderStream u8towStream(memoryStream, u8towEncoder);
+  Utf8Base64Encoder base64Encoder;
+  EncoderStream base64Stream(u8t0wStream, base64Encoder);
+  base64Stream.Write(binary ...);
+}
+memoryStream.SeekFromBegin(0);
+{
+  StreamReader reader(memoryStream);
+  auto base64 = reader.ReadToEnd(reader);
+}
+```
+
+### Example: Converting Base64 WString to Binary Data
+
+```cpp
+MemoryStream memoryStreamn;
+{
+  StreamWriter writer(memoryStream);
+  writer.WriteString(base64);
+}
+memoryStream.SeekFromBegin(0);
+{
+  UtfGeneralEncoder<wchar_t, char8_t> wtou8Decoder;
+  DecoderStream wtou8Stream(memoryStream, wtou8Decoder);
+  Utf8Base64Decoder base64Decoder;
+  DecoderStream base64Stream(wtou8Stream, base64Decoder);
+  base64Stream.Read(binary ...);
+}
+```
+
+## Data Compression
+
+Use `LzwEncoder` and `LzwDecoder` for data compression.
+
+- `LzwEncoder` compress binary data.
+- `LzwDecoder` decompress binary data.
+
+## Helper Functions
+
+Use `CopyStream`, `CompressStream`, `DecompressStream` helper functions.
+
+There are help functions `CopyStream`, `CompressStream` and `DecompressStream` to make the code simpler.
+
+## Extra Content
+
+### Encoding Selection Guidelines
+
+When choosing between different encoding methods:
+- Use BOM encoders when you need to ensure proper encoding detection by other applications
+- Use general UTF encoders for maximum compatibility and control over BOM presence
+- Use MBCS encoders only when working with legacy systems that require ASCII compatibility
+
+### Performance Considerations
+
+Different encoding operations have varying performance characteristics:
+- ASCII/MBCS encoding is fastest but limited to basic character sets
+- UTF-8 encoding provides good balance between space efficiency and Unicode support
+- UTF-16 and UTF-32 provide different trade-offs between processing speed and memory usage
+
+### Error Handling
+
+Encoding operations may fail when:
+- Invalid byte sequences are encountered during decoding
+- Characters cannot be represented in the target encoding
+- BOM detection fails for corrupted files
+
+Always handle potential encoding exceptions, especially when processing user-provided files.
+
+### Pipeline Design
+
+The encoder/decoder system is designed for pipeline composition. You can chain multiple encoding operations together to create complex data transformation workflows, such as:
+1. Base64 decode → UTF-8 decode → String processing
+2. String processing → UTF-8 encode → Compression → File output
+
+This design provides flexibility for handling various data transformation scenarios efficiently.