Sync reserved characters proposal #9539

rasa · 2024-05-14T19:57:52Z

Sync reserved characters proposal v2

1. Preamble

This proposal is authored by @rasa and @JanKanis, and was inspired by JanKanis' comments here. It was last updated on 29-May-24. Feedback appreciated. An editable copy is here.

2. Abstract

Syncthing will report "Out of Sync" errors on peers where the underlying filesystem does not allow certain filenames that are allowed on other peers. This proposal addresses this issue. On https://roadmap.syncthing.net/, the issue is tied for 32nd, but was locked over five years ago, so it can't be voted on any more. If you value this proposal, click the thumbs up icon on this issue instead.

3. Motivation

As a user, I want to sync filenames containing reserved/unsupported/special characters on any filesystem. Specifically, I want to sync filenames containing "*:<>?| characters on NTFS/exFAT/FAT32 filesystems, which disallow these characters in filenames. For more information, see https://en.wikipedia.org/wiki/Filename#Comparison_of_filename_limitations

Use cases

@JanKanis' comment below documents several use cases.

4. Rationale

Each folder will be configured to use an encoder. Initially, there will be two encoders: the "None encoder" and the "FAT encoder".

All existing folders will start out "using" the None encoder. The None "encoder" isn't really an encoder. It's the way Syncthing works right now. In fact, no new code will be executed when a folder is configured to "use" the None encoder.

Newly created folders will default to using the None encoder as well.

The user can change a folder's encoder setting via the GUI, but only via Actions > Advanced > Folders > Folder. If possible, when the user clicks "Save", a dialog box will pop up that explains the potential pitfalls, and asks for a further confirmation.

5. Specification

The None encoder

The None encoder, as described above, is not really an encoder, as it reads and writes filenames on disk "as is," without any encoding. It does not reject or ignore any filenames it receives. It is designed to be used on filesystems that allow all characters except / and NUL, but it can be used on any filesystem. If it's used on a FAT-based filesystem, any filename the peer receives that contains reserved characters cannot be written to disk, leading to out-of-sync errors.

The FAT encoder

The FAT encoder is designed to be used on filesystems that disallow the characters \"*:<>?| in filenames. When filenames with these characters are written to disk, the FAT encoder encodes the filename in a format that the filesystem will accept. When read from disk, the filename is decoded to its original filename, before being sent to the other peers. The FAT encoder can be used on any filesystem, but there is no reason to run it on a non-FAT filesystem.

To clarify, encoding is something that happens purely locally. File names sent over the wire in the Syncthing protocol always use the original pre-encoded names, and peers don’t know if another peer is using any sort of encoder when storing their files.

Unicode Private Use Area (PUA) characters

The FAT encoder will replace reserved characters with Unicode Private Use characters (\xf000 - \xf0ff). It is the same method that GitBash, Windows Subsystem for Linux (WSL), Cygwin, MSYS, Linux's CIFS driver, and other platforms use to save these filenames, and was first implemented in 1996. It requires that the underlying filesystem support Unicode filenames, as NTFS, exFAT, and VFAT do. For the technical details, see https://cygwin.com/cygwin-ug-net/using-specialnames.html.
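The mapping described above can be sketched as follows. This is a minimal illustration, not Syncthing code: the names `RESERVED`, `PUA_BASE`, `encode_name` and `decode_name` are hypothetical, the reserved set is the one listed for the FAT encoder above, and the sketch assumes each reserved character maps to 0xF000 plus its own code point (so `:` becomes `\xf03a`, as in the examples later in this proposal). It works on a single path component, not a full path.

```python
# Illustrative sketch only; none of these names exist in Syncthing.
RESERVED = '\\"*:<>?|'  # characters the FAT encoder is described as handling
PUA_BASE = 0xF000       # start of the \xf000 - \xf0ff range


def encode_name(name: str) -> str:
    """Replace each reserved character with the PUA character PUA_BASE + code point."""
    return "".join(chr(PUA_BASE + ord(c)) if c in RESERVED else c for c in name)


def decode_name(name: str) -> str:
    """Map PUA characters that stand for a reserved character back to that character."""
    out = []
    for c in name:
        cp = ord(c)
        if PUA_BASE <= cp <= PUA_BASE + 0xFF and chr(cp - PUA_BASE) in RESERVED:
            out.append(chr(cp - PUA_BASE))
        else:
            out.append(c)  # everything else passes through unchanged
    return "".join(out)
```

Names without reserved characters pass through unchanged, which is why the encoder would be a no-op for most files.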

6. Backwards Compatibility

Since all folders, both existing and newly created ones, will default to using the None encoder, there are no backward compatibility issues. From the user's perspective nothing changes, and encoding-aware peers can communicate with non-encoding-aware peers without any issues.

A user can even downgrade a peer from an encoding-aware build to a non-encoding-aware build without issue.

The only issue arises if all of the following occur:

1. A folder is set to use the FAT encoder.
2. The peer using the FAT encoder receives filenames that require encoding, and saves these encoded filenames to disk.
3. The user switches the folder back to using the None encoder (or Syncthing is downgraded to a non-encoding-aware build, effectively "using" the None encoder).

First, we'll describe the problem in detail, and then the proposed solution.

The problem

We have two peers: N and F. Both use the None encoder. Peer F's filesystem is FAT, and so it had an out-of-sync error when it received a file named acolon:. Peer F switched its folder's encoder to FAT, which now can save acolon: as acolon\xf03a, and the out-of-sync error goes away.

Now, peer F switched the folder's encoder from FAT, back to None. The None encoder on peer F will find the file acolon\xf03a on disk and sync this file to peer N, which will see it as a new file, and save it. Peer N now has two files named acolon: and acolon\xf03a, which are effectively the same file.

Peer N will then sync these files with peer F. Peer F will still accept acolon\xf03a, but will reject acolon: as it has a reserved character, leading to an out-of-sync issue.

Proposed solution

A separate CLI program is run on any peer where the folder is not on a FAT filesystem. Using the example above, the program is run on peer N. It searches for files where encoded files (acolon\xf03a) coexist with their pre-encoded equivalents (acolon:). If a pair is found, it will see if the files are the same. If they are, it will delete the encoded version (acolon\xf03a).

If the two files differ, it will display the two filenames, timestamps, sizes, and attributes to the user, and ask them to choose:

1. Keep acolon: only (by deleting acolon\xf03a)
2. Keep acolon\xf03a only (by renaming acolon\xf03a to acolon:)
3. Keep both files (so they can research, and rerun, or possibly correct manually)
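The pair detection step of such a CLI program could look roughly like the sketch below. It is only an illustration under stated assumptions: `find_pairs`, `same_content` and `decode_name` are hypothetical names, the reserved set and the 0xF000 offset are taken from the specification above, and the program is assumed to look at the names within one directory at a time.

```python
import filecmp

# Illustrative sketch only; none of these names exist in Syncthing.
RESERVED = '\\"*:<>?|'
PUA_BASE = 0xF000


def decode_name(name: str) -> str:
    """Map PUA characters that stand for a reserved character back to it."""
    return "".join(
        chr(ord(c) - PUA_BASE)
        if PUA_BASE <= ord(c) <= PUA_BASE + 0xFF and chr(ord(c) - PUA_BASE) in RESERVED
        else c
        for c in name
    )


def find_pairs(names):
    """Return (pre-encoded, encoded) pairs where both names exist side by side."""
    present = set(names)
    pairs = []
    for name in sorted(present):
        decoded = decode_name(name)
        if decoded != name and decoded in present:
            pairs.append((decoded, name))
    return pairs


def same_content(path_a: str, path_b: str) -> bool:
    """Compare two files byte by byte rather than by size and timestamp only."""
    return filecmp.cmp(path_a, path_b, shallow=False)
```

If `same_content` reports the pair as identical, the encoded copy can be deleted; otherwise the user is asked to choose.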

Option 1 - Keep `acolon:`

Peer N syncs the delete of acolon\xf03a with the other peers. None peers will process the delete. FAT peers will silently ignore the delete, as they ignore all encoded filenames they receive on the wire.

Option 2 - Keep `acolon\xf03a`

Syncthing sees this rename of acolon\xf03a to acolon: as deleting acolon\xf03a and updating acolon:. None peers will process both the delete and update. FAT peers will ignore the delete, and update acolon:, by encoding the filename as acolon\xf03a.

Automating the process

The following startup options would automate the above selection process:

--decoded - always select the pre-encoded filename (choice 1. above)
--newer - always select the newer of the two files
--encoded - always select the encoded filename (choice 2. above)
--older - always select the older of the two files

The program will not back up files before deleting them. If a user wants backups, they should turn on versioning on a None peer, before running the program.
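How the four options might resolve a differing pair is sketched below. The `Candidate` type and the `choose` function are hypothetical, and the tie-breaking rule (prefer the pre-encoded file when both timestamps are equal) is an assumption that is not part of this proposal.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One file of a duplicate pair (hypothetical type for illustration)."""
    name: str
    mtime: float  # modification time in seconds since the epoch


def choose(policy: str, decoded: Candidate, encoded: Candidate) -> Candidate:
    """Return the file to keep for the given startup option (without the dashes)."""
    if policy == "decoded":
        return decoded
    if policy == "encoded":
        return encoded
    if policy == "newer":
        # Assumption: on equal timestamps, keep the pre-encoded file.
        return decoded if decoded.mtime >= encoded.mtime else encoded
    if policy == "older":
        return decoded if decoded.mtime <= encoded.mtime else encoded
    raise ValueError(f"unknown policy: {policy}")
```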

Which option is most likely to be the right one?

Option 1, "Keep acolon:", will almost always be the best choice. Why? Because pre-encoded filenames almost always originate on non-FAT peers, as users generally cannot create these filenames on FAT peers. The most likely way for a user on a FAT peer to create an encoded filename themselves is through a CLI environment such as GitBash, Cygwin, MSYS2, or WSL. So, since the FAT peer most likely didn't author the file, it's less likely to be the last peer to update it.

7. Security Implications

Other than a peer intentionally changing from FAT to None to cause duplications, I can't think of any security implications. If they do this on a Linux peer, they could edit one of the duplicates with the hope of another peer accepting their edited version instead of the original version of the file. If you have malicious peers, you have bigger problems than this proposal.

8. How to Teach This

The documentation will explain the benefits and drawbacks of changing a folder's encoder.

9. Reference Implementation

@rasa has volunteered to draft a PR with full unit tests if this proposal is accepted. Integration tests will also be provided using the new framework provided in #9266. @rasa will also draft a PR for the documentation needed.

10. Rejected Ideas

We are not aware of any alternate proposals.

11. Open Issues

None that we are aware of, but here's a good place to list a potential future enhancement:

Warning the user the encoder was changed

Due to the duplicate file issue noted above, we may want to alert the user whenever a folder's encoder is changed from FAT back to None. To do this, we can update .stfolder/syncthing-folder-xxxxxx.txt (See #9525), with either Encoder: None or Encoder: FAT, if the entry is missing.

Then whenever Syncthing starts up, if the encoder in the .stfolder file listed FAT, but config.xml lists None, a warning is shown in the GUI. The user can select "Revert", "Accept" or "Ignore". If they select "Revert", the encoder setting is changed back to FAT in config.xml. If they select "Accept", the .stfolder file is updated to contain Encoder: None. If they select "Ignore", the message goes away, until Syncthing restarts.
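The startup check and the three GUI choices could be modelled as in the sketch below. It is a rough illustration only: `startup_action` and `apply_choice` are hypothetical names, and the encoder values are passed in as plain strings rather than being read from the .stfolder file or config.xml.

```python
from typing import Optional, Tuple


def startup_action(marker_encoder: Optional[str], config_encoder: str) -> str:
    """Decide what to do at startup.

    marker_encoder is the encoder recorded in the .stfolder file, or None if
    the entry is missing; config_encoder is the encoder listed in config.xml.
    """
    if marker_encoder is None:
        return "record"  # write the current encoder to the .stfolder file
    if marker_encoder == "FAT" and config_encoder == "None":
        return "warn"    # show the Revert / Accept / Ignore warning
    return "ok"


def apply_choice(choice: str, marker_encoder: str, config_encoder: str) -> Tuple[str, str]:
    """Return the (marker, config) encoder values after the user's choice."""
    if choice == "Revert":
        return marker_encoder, "FAT"   # config.xml goes back to FAT
    if choice == "Accept":
        return "None", config_encoder  # the .stfolder file now records None
    if choice == "Ignore":
        return marker_encoder, config_encoder  # unchanged, so the warning returns on restart
    raise ValueError(f"unknown choice: {choice}")
```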

We could also provide CLI users with these options:
--report-on-encoder-changes: if the encoder was switched from FAT to None, scan the filesystem, and if there are duplicate files, log the duplicates, and continue
--abort-on-encoder-changes: do the above, but quit instead
--accept-encoder-changes: Update the .stfolder file with Encoder: None
--revert-encoder-changes: Switch the encoder back to FAT in the config.xml file

If no option is provided, a warning about the encoder change is logged.

12. Footnotes

For reference, see:
https://cygwin.com/cygwin-ug-net/using-specialnames.html
http://msdn.microsoft.com/en-us/library/aa365247%28VS.85%29.aspx
https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file
https://en.wikipedia.org/wiki/Filename#Comparison_of_filename_limitations

For implementations, see
https://github.com/mirror/newlib-cygwin/blob/fb01286fab9b370c86323f84a46285cfbebfe4ff/winsup/cygwin/path.cc#L435
https://github.com/billziss-gh/winfsp/blob/6e3a8f70b2bd958960012447544d492fc6a2f1af/src/shared/ku/posix.c#L1250
https://github.com/torvalds/linux/blob/master/fs/cifs/cifs_unicode.h#L27

Other encoding methods that could be implemented

URL-encoded

This encoding replaces reserved characters with their URL-encoded equivalent. See https://en.m.wikipedia.org/wiki/Percent-encoding. This would be a good choice on filesystems that don't support UTF-8 characters. Proposed by @AudriusButkevicius.
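A minimal sketch of what such an encoder could look like, assuming the same reserved set as the FAT encoder. The function names are hypothetical. The sketch also encodes a literal `%`, which the description above does not mention but which is needed for decoding to be unambiguous. Decoding uses the standard library's `urllib.parse.unquote`.

```python
from urllib.parse import unquote

# Illustrative sketch only; none of these names exist in Syncthing.
RESERVED = '\\"*:<>?|'


def url_encode_name(name: str) -> str:
    """Percent-encode reserved characters, plus '%' itself so decoding is unambiguous."""
    return "".join(
        f"%{ord(c):02X}" if c in RESERVED or c == "%" else c
        for c in name
    )


def url_decode_name(name: str) -> str:
    """Reverse url_encode_name by decoding every %XX sequence once."""
    return unquote(name)
```

Unlike the PUA approach, each encoded character grows from one character to three, so long names reach filesystem length limits sooner.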

Samba's Catia mapping

This encoding replaces reserved characters using the mapping "→¨ *→¤ /→ø :→÷ <→« >→» ?→¿ \→ÿ |→¦. This would be a good choice if the user wants to encode to more visually related characters. See https://www.samba.org/samba/docs/current/man-html/vfs_catia.8.html. Proposed by @JanKanis.
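The mapping above is a plain one-to-one character translation, which can be sketched with Python's `str.translate`. The names `CATIA`, `catia_encode` and `catia_decode` are hypothetical; the table itself is copied from the mapping listed above.

```python
# Illustrative sketch only; the table is the mapping listed in this section.
CATIA = {
    '"': "¨", "*": "¤", "/": "ø", ":": "÷", "<": "«",
    ">": "»", "?": "¿", "\\": "ÿ", "|": "¦",
}
_ENCODE = str.maketrans(CATIA)
_DECODE = str.maketrans({v: k for k, v in CATIA.items()})


def catia_encode(name: str) -> str:
    """Replace each reserved character with its visually related stand-in."""
    return name.translate(_ENCODE)


def catia_decode(name: str) -> str:
    """Replace each stand-in with the reserved character it represents."""
    return name.translate(_DECODE)
```

Because the stand-ins are ordinary characters, a name that already contains one of them does not survive a round trip: `catia_decode(catia_encode("a÷"))` gives `a:`. This is the same codomain clash that applies to the PUA characters.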


JanKanis · 2024-05-14T21:54:22Z

a nitpick, before I forget it. I'll see if I have more substantial comments when I have more time.

re url encoding and Samba's Catia mapping encoders: FAT12/16/32 do support unicode, as utf-16 I suppose, so they should be able to handle the PUA encoder just fine. The Catia mapping also requires unicode support. But the filesystem specifically using UTF-8 is not required for any encoder.

rasa · 2024-05-14T22:01:01Z

> FAT12/16/32 do support unicode, as utf-16 I suppose, so they should be able to handle the PUA encoder just fine.

You may be right. I was going off of here, but it may be wrong. I will remove that claim.

> The Catia mapping also requires unicode support.

Will update doc.

> But the filesystem specifically using UTF-8 is not required for any encoder.

Sorry, I don't follow. Can you clarify?

AudriusButkevicius · 2024-05-14T22:01:56Z

I think I got lost in the levels of inception of re-encoding, but I think this should be handled exactly the same way as the "case insensitive fs" wrapper.
i.e., there is a file system wrapper that hides the gnarliness of having to encode/decode files, so to syncthing it should look like every filesystem supports everything, and I'm not that concerned as to how the sausages are made.

I agree that you can end up with cases where you switch between the different wrappers leading to unexpected effects, i.e., files that were claimed to be with: now suddenly get deleted, and replaced with some encoded version, but I think that is ok, as there is no actual data loss, there is just change in names, and setting the encoding back on would unwind this.

In the majority of cases the codec should be a no-op, and that's fine, switching it back and forth should have no effect, and will only matter for cases where you do have a genuine ":" in the paths, which should be very few cases.

I guess the more interesting case that I don't see handled is where our encoding scheme clashes with files that already exist.
Namely I replace : with unicorn, want to sync a:, but already have a file aunicorn.
I guess perhaps this was covered by the inception of re-encodings, but I guess it lacks clarity and examples for me to digest what is being said there.

Agreed, we can have a helper cli utility that helps "decode" or "encode" things in place to allow you to convert.

acolomb · 2024-05-14T22:09:44Z

I think taking a step back and defining the assumed invariants would be good before diving into details and an action plan.

1. Is "encoding" always a well-defined, reversible process? Do encode and decode schemes have perfect symmetry and lossless round-trips?
2. Do we acknowledge that there is no way to deduce an encoding solely based on looking at encoded names? As long as we don't have any reserved escape character(s) that are otherwise forbidden except in names encoded by Syncthing, we must assume that detecting "already encoded" names vs. "happens to use one of our replacement characters" is a best-effort heuristic.
3. Can the encoding / decoding process be made foolproof if we assume the encoding scheme is known? Imagine a more radical encoder which, e.g. translates all names to their base64 equivalent. Then detecting a single file that uses any other character would point to a misconfiguration or externally placed, non-encoded files.
4. Is the encoding a strictly local matter, or is it announced to other peers?

Regarding point 3, I really like the idea of storing the encoding scheme with the data, under .stfolder. It survives database resets and messing with the configuration. I would even argue that once put there, Syncthing should not support changing the encoding, but rather require the folder to be set up again from scratch. Putting an encoding marker there after the fact should be safe, as any file name found locally that cannot be a product of that encoding, will be renamed (encoded) at that time. Assuming the choice of encoder is sensible (e.g. FAT on a FAT filesystem), there cannot even be unencoded existing names, as the filesystem would not allow them.

As to point 1, we do have some kind of encoders already in Syncthing: encrypted names on untrusted devices (not easily reversible) and the Unicode normalization code (also not reversible if the previous name was not normalized). Looking at those might give some hints regarding the invariance questions. Integrating that functionality with the proposed encoding stuff is probably too far fetched though.

Thinking one step further, I could imagine even more radical encoders emerging, such as the mentioned base64 encoding. That might prove useful to implement further filesystem types in Syncthing, e.g. to add object stores. But then it needs to be clear whether this encoding machinery works with only a (non-reversible) hash function. Again, laying down these invariants / requirements for encoding schemes will help set the boundaries for designing the basic encoders we actually need in the first step.

JanKanis · 2024-05-15T06:43:55Z

> > But the filesystem specifically using UTF-8 is not required for any encoder.
>
> Sorry, I don't follow. Can you clarify?

s/UTF-8/unicode/. It doesn't matter if a filesystem uses UTF8, they need to support unicode, any unicode encoding will do.

JanKanis · 2024-05-15T06:53:48Z

> > FAT12/16/32 do support unicode, as utf-16 I suppose, so they should be able to handle the PUA encoder just fine.
>
> You may be right. I was going off of here, but it may be wrong. I will remove that claim.

The base filesystems only support 8.3 length non-unicode filenames, but Windows uses an extension to also store longer unicode filenames as an add-on.

calmh · 2024-05-15T10:55:11Z

Thank you for writing this up, it's an excellent summary of the problem, your proposed solutions, and the potential issues. ❤️

For me, however, it also illustrates quite clearly why I'm disinclined to accept the proposal (and the corresponding PR). In a nutshell, the problem ("I want to sync filenames containing reserved/unsupported/special characters on any filesystem") is fairly easily avoided and/or corrected when it surfaces. The proposed solutions, however, are complicated and error prone, and the result of mistakes and misconfigurations much harder to reason about and fix than the original problem. In my mind this makes the cost higher than the benefit.

rdebath · 2024-05-15T14:44:42Z

Some small points to start with ...

There seems to be a requirement for a machine with a "Default" encoder within the swarm; it should not be assumed that there will be a Unix host available, all running on different versions of Windows seems rather likely. This would mean that if a particular host switches to a particular non-"Default" encoder you need to be able to fix the "mess" from that host.
Please use vFAT rather than FAT for your encoder name. This is because, pedantically, FAT is a filesystem that only supports uppercase Ascii 8+3 filenames (with some extras depending on localisation) that can (IMO) only be upgraded by a full overlay filesystem like umsdos.
Don't forget that the valid characters on a Windows filesystem depend on the version of Windows (sometimes even build).

rdebath · 2024-05-15T15:13:45Z

Oh and as a counterpoint.

The requirement seems to be that a particular host has all the files created on every other peer irrespective of the name it might be given here to overcome any local limitation. This is presumably useful for things like backup servers.

In that case a translation like the previously mentioned base64 would be acceptable, BUT might still hit a file length limitation. Taking a secure hash (MD5, SHA1 etc) of the pathname would give a name with four or five 8 character sections for any original filename which is (basically) guaranteed to be unique.
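As a rough illustration of the hashed-name idea, the sketch below turns a path into hex sections of eight characters each using Python's `hashlib`. The function name `hashed_name`, the UTF-8 encoding of the path, and the `-` separator between sections are assumptions made for the example.

```python
import hashlib


def hashed_name(path: str, algorithm: str = "sha1", sep: str = "-") -> str:
    """Return the hex digest of the path, split into 8-character sections."""
    digest = hashlib.new(algorithm, path.encode("utf-8")).hexdigest()
    return sep.join(digest[i:i + 8] for i in range(0, len(digest), 8))
```

With SHA-1 the digest is 40 hex characters, so the result has five sections; a 32-character digest such as MD5 gives four.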

A small database containing a list of all the paths would be required to know what filenames are stored on the local FS. Working with the filesystem would be mostly trivial but there would be no method of migrating to or from this scheme except for adding another peer to the swarm. Though individual pathnames can be translated using simple tools like sha1sum so restoring particular files would be quite feasible.

Personally I'm more likely to make the backup server a Linux box.

JanKanis · 2024-05-29T14:56:49Z

Hi @rasa, I finally got around to writing a full reply to your proposal.

Use cases

Judging by the other comments, the use cases/purpose needs fleshing out. My personal use case is the following:

I use Linux as desktop, and so I have some of my personal documents using windows reserved characters in their filenames. I also have some documentation downloaded for local use from a website using wget. Those pages used '?' in their url, so the files now also have that in their names. I would like to sync my documents to my Android phone, which does not accept those characters (depending on Android version etc). Of course I could go through all these files to rename them. But in the case of the wget-ed webpages that would break their internal links. I also don't really want to change my names as I mainly use these documents on my Linux laptop, and only occasionally on my phone.

For my use case I want to be able to view/edit the files with existing Android apps, so proposals such as base64-encoding or storing a filename hash don't cut it for me. In that case I would not be able to identify the file when browsing through the files in e.g. an Android file manager or any other app that is not Syncthing. Using the Unicode PUA works, as I only occasionally have a reserved character in my file names and I can still identify them from the rest of the file name. That is why I proposed including the Samba Catia mapping, in which all characters are still identifiable without using any special software.

Other use cases could be backing up your personal files to a Windows server, or using multiple computers with mixed operating systems where Mac or Linux is your main OS, or when handling files from a WSL/cygwin/etc environment on Windows. Other use cases could be when you are not syncing your own personal files, but a file set over which you have no direct control and for which you thus can't just change the file names. However some real use cases are probably more compelling than what I can think up.

Restricting file names

I see that, compared to my previous proposal, you've not adopted the part of configuring certain file name characters as disallowed for a folder. I'm totally fine with that, Syncthing cannot actually control what users put into their synced folders anyway. It was primarily a way to surface something similar to the existing proposal in a way that would be easier to understand.

The Default encoder

I would suggest renaming it to "None". A setting encoder: None is imo the clearest way to signal to users that this encoder doesn't do anything. If it's named 'default', I would then need to go look up what the default encoding for syncthing is.

re-encoding 'inception'

As others have also mentioned, the whole part on re-encoding and re-re-encoding is overly complicated. First, the encoder doesn’t know the meaning of the file names it receives, it only knows that it sees some characters incoming that are also in its encoding codomain. The question then is how to handle that. I think it would be clearer to rephrase the section in terms of incoming characters instead of as “re-(re-)*encoding”.

IMO there are two sane ways to handle it: don’t, or escape. Adding a separate encoder for different re-encoding is imo way too much complexity for questionable gain, so the FAT encoder should just implement one of these options.

If the encoder does not handle incoming encoding target characters, the encoder should reject the file which should lead to a synchronization failure, just like currently already happens with file names with reserved characters. However encoder target characters, whether PUA or the Samba Catia mapping, should be a lot less common than e.g. : and ? and probably won’t be an issue for most users.

If the encoder escapes target characters, it would prefix such codomain characters with an escape character such as \xf05c (in case of the PUA encoder). If the input name already contains an \xf05c, that would be escaped just like any other encoding codomain character, so it would end up as \xf05c\xf05c. No need to handle re-re-encoding specially. The only problem is that this increases the file name length, so it would not work for files with a length of 255. That isn’t really an issue as the proposed PUA encoding already increases the file name length by requiring surrogate pairs to encode the PUA characters. Also, file names that are almost 255 bytes long are quite rare in my observation, and file length limits already differ between file systems.
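A minimal sketch of the escaping option, with hypothetical names throughout. It assumes the reserved set is "*:<>?| and that backslash is not itself encoded, since its PUA image would be \xf05c and would collide with the escape character.

```python
# Illustrative sketch only; none of these names exist in Syncthing.
RESERVED = '"*:<>?|'   # assumption: backslash is not encoded in this variant
PUA_BASE = 0xF000
ESC = "\uf05c"         # escape character suggested above
TARGETS = {chr(PUA_BASE + ord(c)) for c in RESERVED}  # characters the encoder emits
CODOMAIN = TARGETS | {ESC}                            # characters that need escaping


def encode_escaped(name: str) -> str:
    out = []
    for c in name:
        if c in RESERVED:
            out.append(chr(PUA_BASE + ord(c)))  # normal encoding
        elif c in CODOMAIN:
            out.append(ESC + c)                 # incoming target character: escape it
        else:
            out.append(c)
    return "".join(out)


def decode_escaped(name: str) -> str:
    out = []
    i = 0
    while i < len(name):
        c = name[i]
        if c == ESC and i + 1 < len(name):
            out.append(name[i + 1])             # escaped character is taken literally
            i += 2
            continue
        out.append(chr(ord(c) - PUA_BASE) if c in TARGETS else c)
        i += 1
    return "".join(out)
```

An incoming \xf05c ends up as \xf05c\xf05c on disk, as described above, and names containing both reserved and target characters survive a round trip.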

For this issue it is also worth finding out what WSL/cygwin/MSYS2/CIFS do when they encounter the PUA target characters.

There are other ways to store files that sidestep the whole filename character issue, such as base64-encoding them or storing a base64-d hash of the filename and a separate file with the real filenames. But with such solutions it is no longer practical to edit files on the encoded side, and the feature set that Syncthing would offer in such a case would be (practically speaking) one-way backup instead of two way synchronization, which is Syncthing’s unique selling point.

I have a slight preference for doing escaping, but I’m also fine with not handling re-encoding. Especially if that—being simpler—contributes to the proposal being accepted.

Switching encoder settings

A lot of the potential problems come from changing encoder settings. There’s one simple solution for (most of) this problem: don’t allow changing the encoding setting. Syncthing already does this by not allowing the folder path and ID to be changed. Even though changing the path shouldn’t be such a big problem (as far as I know). Users can still change the encoding by editing the config.xml file directly, but in that case they should know what they are doing, and the user should manually rename any affected files or make sure there are no affected files in the folder.

There is still an issue with downgrading Syncthing after creating a folder with the FAT encoding, but I don’t think that is worth bothering about a lot. It seems like a quite rare situation, there won’t be any data loss, all that happens is files can be duplicated or their names messed up, and it can be fixed by a script or cli tool.

Of course it is also possible to handle this in the GUI. That would certainly be more user friendly, but I’m not sure if it is worth the complexity. In that case Syncthing should handle it as proposed in Potential Issues 5. If all files can be renamed automatically (or don’t need renaming), Syncthing can just do the rename. Otherwise, it should ask the user and affected files would need to be deleted and un-synced.

Technical scope of the encoder framework

Judging by some of the discussion, the proposal should probably clarify that the encoding is something that happens purely locally. File names sent over the wire in the Syncthing protocol are always the unencoded names, and nodes don’t know about each other if they use any kind of encoder when storing their files.

Proposal document structure

The proposal document is a bit too complicated, in my opinion. And that probably contributes to the issue appearing more complex to readers than it actually is.

Specifically, the description of what the encoders actually do is spread around the document, under the headings “The encoders”, “Other encoders”, and “Possible encoding methods”. I think the main proposal (including the PUA encoding) should be at the beginning of the document, so readers not already familiar with the existing discussion have a clear view of what the (current) proposal constitutes.

Also, you divide the work into several “phases”, but you don’t specify that you’re doing that or what these phases are before referring to them. However I think the notion of defining separate phases should be dropped altogether. We only need two “phases”: what will be included in the current proposal and (assuming it gets accepted) pull request, and Future Extensions, i.e. everything that can be implemented as later enhancements. The goal being to limit the scope and complexity of the current proposal, both in the amount of code that needs to be written, but more importantly in the number of issues to be discussed and that can be disagreed about. IMO the proposal should correspond mostly to what is now phase 1 using the PUA encoder, and probably not allowing changing the encoder from the GUI. For the choices of changing encoder and handling filenames that are already encoded, these are the most simple options and other options can be implemented in the future. These other enhancements should still be mentioned under future enhancements, of course.

As a reference it might be helpful to see the list of sections a Python Enhancement Proposal (PEP) should include. Not all of them apply to Syncthing, but it is still a helpful and thought out structure.

rasa · 2024-05-30T02:38:01Z

@JanKanis I completely rewrote the proposal, using the PEP layout you referenced. Let me know if I captured all your feedback, or if further simplification is needed. Clearly, my initial draft was way too complicated to be accepted. Lesson learned! Thanks again for the thoughtful and detailed feedback!

JanKanis · 2024-05-30T10:45:57Z

> If this issue is important to you, visit #1734, and click the thumbs up icon.

I'd like to do that, but I can not add any emoticon response/vote to it, I guess because the issue is locked. Also after authenticating to roadmap.syncthing.net, voting on the issue there doesn't do anything, presumably also because the issue is locked.

JanKanis · 2024-05-30T15:46:05Z

I added a number of comments to the Google doc version on the document structure.

One other question: you haven't commented on or adopted what I proposed w.r.t. changing the encoder setting from the GUI. (disallowing it, or if allowed, make sure all files are renamed) What do you think about that?

rasa · 2024-05-30T18:13:47Z

> One other question: you haven't commented on or adopted what I proposed w.r.t. changing the encoder setting from the GUI. (disallowing it, or if allowed, make sure all files are renamed) What do you think about that?

@JanKanis It's a good idea, but I don't think we can make a field read-only on the Actions > Advanced > Folder page. And that page already has a big red message

> Be careful! Incorrect configuration may damage your folder contents and render Syncthing inoperable.

so the user's been warned enough. And it doesn't appear that page restricts the user from editing any field, including the folderID, so adding logic to change some fields to read-only may defeat the purpose of this page (which is to change any field, no matter how disastrous the change would be, like changing 'Filesystem Type' to fake).

But if we ever add the setting to the folder's setting page, we should make the field read-only if it's not None. I purposely left this idea out of the proposal, as, IMO, changing the encoder is an "Advanced" feature, such as changing the "Case Sensitive FS", "Junctions As Dirs", or "modTimeWindowS" settings.

JanKanis · 2024-05-30T18:54:59Z

Ah, I wasn't aware of that advanced configuration page. That's basically equivalent to directly editing the configuration file, so I agree with you then. Does that mean this option will be an advanced configuration only feature, or do you still want to show the option in the regular UI when creating a new folder?

rasa · 2024-05-31T03:15:49Z

> Does that mean this option will be an advanced configuration only feature?

@JanKanis IMO, yes, as long as we have the potential for duplicate files, I think we need to hide the option from the user.

calmh · 2024-05-31T07:33:50Z

I don't think all the details around config handling need to be defined right out of the gate, but my gut feeling is that this would be the new default on Windows and Android, editable at folder creation time, and otherwise handled pretty much like the folder path -- not easily editable, with some FAQ or doc article explaining the situation and what to do to change it safely.

rasa added the enhancement and needs-triage labels on May 14, 2024

This was referenced May 14, 2024

lib/fs: Encode filenames containing reserved characters (fixes #1734) #7876

Closed

Rename special/unsupported characters in filenames #1734

Open
