Ticket #158 (accepted defect)

Opened 6 years ago

Last modified 5 years ago

Not being able to work with files with accents

Reported by: geroni339-noip@yahoo.es Owned by: alamaison
Priority: critical (affects core workflow) Milestone: 0.90 Beta 1
Component: i18n Version: 0.6.0
Keywords: remote files accent Cc:

Description

Hi,
I'm new to SSH and I'm exploring the possibilities of my new DD-WRT router. My main goal is to have a USB drive attached to the router that can be accessed remotely over the internet (backup, file access, ...).

I came across Swish, which lets me mount, somehow, a folder from the USB drive and access it remotely, as well as open files (Word, AutoCAD, PDF, etc.). It is excellent, since WinSCP does not let me open anything other than text files. Everything works well for me, except for files with accents in their names. I cannot download, upload or open files with accents. The names are also displayed strangely, showing odd symbols instead of the accented characters.

I haven't found any option in the program to correct this, if one exists.

Since I write in Spanish, where accents are used, it is a big problem not being able to work with files that have accents in their names.

I'm writing to let you know about this problem and to ask for help with it, if possible.

I'm enclosing two screenshots in order to show you the view from the same folder (with its files) from SWISH and from winSCP, if it could help with my explanation. Characters with accents are clearly distorted in SWISH.

Thank you very much in advance.


Attachments

swish.JPG Download (62.8 KB) - added by anonymous 6 years ago.
View of swish with files containing accents
Winscp.JPG Download (104.2 KB) - added by anonymous 6 years ago.
View of winscp with files containing accents
Pantalla help.JPG Download (55.6 KB) - added by geroni339-noip@yahoo.es 6 years ago.
Screenshot of "locale"

Change History

Changed 6 years ago by anonymous

View of swish with files containing accents

Changed 6 years ago by anonymous

View of winscp with files containing accents

comment:1 Changed 6 years ago by alamaison

Interesting. Swish supports non-ASCII characters but assumes that the server is sending UTF-8 filenames. Unfortunately the SFTP standard doesn't require this so, I suspect, the problem is that the filenames on your USB drive are stored as non-unicode and sent as-is.
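To illustrate the mismatch described above, here is a minimal Python sketch. It is not Swish's code; the filename is hypothetical and ISO-8859-1 is only an assumed example of a non-Unicode encoding, since the encoding actually used by the router is not known at this point.

```python
# Hypothetical filename; the server's real encoding is unknown.
name = "canción.pdf"

# Bytes a server might send if it stores names in a single-byte legacy
# encoding (ISO-8859-1 is an assumption chosen for illustration).
wire_bytes = name.encode("iso-8859-1")

# A client that assumes UTF-8 cannot interpret the lone 0xF3 byte,
# so the accented letter is lost when the name is displayed.
as_seen_by_client = wire_bytes.decode("utf-8", errors="replace")

print(as_seen_by_client)
```

The accented character is replaced by a substitution symbol, which matches the kind of "strange symbols" reported in the description.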

It would help if you could do a little experiment. Could you copy your files to a PC running Linux or Mac OS and access them there using Swish? Assuming that PC stores the filenames as UTF-8, I'm expecting that they would appear correctly in Swish.

If so, we'll have to do some thinking about how to get Swish to behave. Perhaps it can try to auto-detect the filename encoding on a file-by-file basis.
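One possible shape for such per-file auto-detection, sketched in Python under stated assumptions: the function name `decode_filename` and the `cp1252` fallback are hypothetical choices for illustration, not anything Swish implements.

```python
def decode_filename(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode a filename received from the server.

    Strict UTF-8 is tried first. Text in a legacy single-byte encoding
    that contains accented characters is rarely valid UTF-8, so a decode
    failure is treated as a hint that the fallback codepage is in use.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # The fallback codepage is an assumption; some bytes may be
        # undefined in it, hence errors="replace".
        return raw.decode(fallback, errors="replace")


print(decode_filename("canción.pdf".encode("utf-8")))
print(decode_filename("canción.pdf".encode("cp1252")))
```

This is a heuristic only: it cannot tell one legacy codepage from another, and a legacy-encoded name that happens to be valid UTF-8 would be misread.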

comment:2 Changed 6 years ago by alamaison

  • Status changed from new to accepted

comment:3 Changed 6 years ago by alamaison

  • Priority changed from blocker (cannot release, e.g. data loss) to major (affects peripheral workflow)
  • Component changed from remote folder to backend

comment:4 follow-up: ↓ 5 Changed 6 years ago by geroni339-noip@yahoo.es

Hi, thanks for your fast reply.

I'm afraid I cannot help you with the experiment in Linux or Mac OS, since I have to admit that I'm only a Windows user, and I unfortunately have no knowledge of Linux (though I'd like to), and I also have no friends in my community running Linux or Mac OS in order to ask them to help me with that test.

But, if you want, I could send you some of the files I showed to you on the photographs if it's easy for you to perform that test.

Let me know about that and, if you would be so kind as to help with it, let me know how you would like the files to be sent (to an e-mail address, attached in a comment of the ticket...).

Kind regards.

comment:5 in reply to: ↑ 4 ; follow-up: ↓ 6 Changed 6 years ago by alamaison

Replying to geroni339-noip@…:

I'm afraid I cannot help you with the experiment in Linux or Mac OS, since I have to admit that I'm only a Windows user, and I unfortunately have no knowledge of Linux (though I'd like to), and I also have no friends in my community running Linux or Mac OS in order to ask them to help me with that test.

No worries.

I've looked into it a bit more (and played with my own DD-WRT router). Do you know what SSH server is running on your router? On mine it's Dropbear, but mine doesn't have SFTP so I can't test the filenames. If you log in to your router using PuTTY and look at the event log it should tell you.

But, if you want, I could send you some of the files I showed to you on the photographs if it's easy for you to perform that test.

I'm not sure that would help because sending them will most likely reencode the filenames.

Do you know the filesystem of the USB drive you are accessing? Most Windows drives store filenames as unicode so it would be odd for the SFTP server to reencode these as something non-unicode. And where did the files come from? A windows PC?

comment:6 in reply to: ↑ 5 ; follow-up: ↓ 7 Changed 6 years ago by geroni339-noip@yahoo.es

Replying to alamaison:

Hi alamaison, I've been "out of business" these days. Sorry for the delay.

I've looked into it a bit more (and played with my own DD-WRT router). Do you know what SSH server is running on your router? On mine it's Dropbear, but mine doesn't have SFTP so I can't test the filenames. If you log in to your router using PuTTY and look at the event log it should tell you.

I've checked what version of SSH server is running on my router. I logged in with a program called "Bitvise Tunnelier" (pretty similar to PuTTY, with a friendly GUI which works well for me) and the event log states "Server version string: SSH-2.0-dropbear_0.52".

Do you know the filesystem of the USB drive you are accessing? Most Windows drives store filenames as unicode so it would be odd for the SFTP server to reencode these as something non-unicode. And where did the files come from? A windows PC?

The filesystem of the USB drive I'm using is FAT32. The files come from a Windows PC. In fact, these files are related to my work. One of the files that you saw in the screenshot with strange characters is a CAD file, and the other one displaying a strange character is a PDF file generated from a website guide on how to configure dynamic DNS. So, they are files generated on my Windows PC.

I hope that helps. Should you need any other answer or help (If I have the knowledge to help with it) I'll be delighted to contribute.

Kind regards.

comment:7 in reply to: ↑ 6 ; follow-up: ↓ 9 Changed 6 years ago by alamaison

Replying to geroni339-noip@…:

Replying to alamaison:

Hi alamaison, I've been "out of business" these days. Sorry for the delay.

I've checked what version of SSH server is running on my router. I logged in with a program called "Bitvise Tunnelier" (pretty similar to PuTTY, with a friendly GUI which works well for me) and the event log states "Server version string: SSH-2.0-dropbear_0.52".

How did you put SFTP support on your router? Was it via the Optware packages?

Do you know the filesystem of the USB drive you are accessing? Most Windows drives store filenames as unicode so it would be odd for the SFTP server to reencode these as something non-unicode. And where did the files come from? A windows PC?

The filesystem of the USB drive I'm using is FAT32. The files come from a Windows PC. In fact, these files are related to my work. One of the files that you saw in the screenshot with strange characters is a CAD file, and the other one displaying a strange character is a PDF file generated from a website guide on how to configure dynamic DNS. So, they are files generated on my Windows PC.

Truly odd. FAT32 encodes filenames as Unicode (so we aren't even dealing with the usual Unix encoding nightmare) which means that your server is taking those and explicitly converting them into a non-Unicode format! I thought it was just being lazy and not bothering to convert them from a non-Unicode format.

When I get time (not imminently) I'll set up SFTP on my own router and try to replicate the issue. In the mean time, I suggest you look into how your SFTP is configured. You may find there is a setting to get your SFTP server to render UTF-8 filenames.

comment:8 follow-up: ↓ 11 Changed 6 years ago by alamaison

It may be a problem with how your disk is mounted. It may need an additional language pack.

If you connect to your router via SMB (aka Windows Shares) how do the filenames appear?

comment:9 in reply to: ↑ 7 ; follow-up: ↓ 10 Changed 6 years ago by geroni339-noip@yahoo.es

Replying to alamaison:

How did you put SFTP support on your router? Was it via the Optware packages?

The only thing I did was activating Secure Shell. In my DD-WRT router I went to tab "services" --> option "services" and in that page I went to the box "Secure Shell", and there I selected the radio buttons displayed there (SSHd, password, port, etc.).

So, I haven't used any optware packages. I'm only using the plain installation of DD-WRT v24SP2.

When I get time (not imminently) I'll set up SFTP on my own router and try to replicate the issue. In the mean time, I suggest you look into how your SFTP is configured. You may find there is a setting to get your SFTP server to render UTF-8 filenames.

I've been browsing right now through the options of the router and I haven't seen the possibility to modify anything else from the SFTP server. Since the access was being done through SSH I deactivated FTP (in the DD-WRT webserver it's in tab "services" --> option "NAS"), but there is nothing there as well. I checked having FTP activated or deactivated, and there is no difference when I'm executing SWISH. So, I didn't find a setting to render UTF-8 filenames.

As I told you, I'm a newbie in that world of SSH and linux OS. Shall I see how my SFTP is configured from the command line of the terminal (i.e. from PuTTY's terminal)? In that case, what should I do?

Thanks in advance.

comment:10 in reply to: ↑ 9 Changed 6 years ago by alamaison

Replying to geroni339-noip@…:

Replying to alamaison:

How did you put SFTP support on your router? Was it via the Optware packages?

The only thing I did was activating Secure Shell. In my DD-WRT router I went to tab "services" --> option "services" and in that page I went to the box "Secure Shell", and there I selected the radio buttons displayed there (SSHd, password, port, etc.).

I guess our DD-WRT images are a bit different - mine comes with SSH but not SFTP.

When I get time (not imminently) I'll set up SFTP on my own router and try to replicate the issue. In the mean time, I suggest you look into how your SFTP is configured. You may find there is a setting to get your SFTP server to render UTF-8 filenames.

I've been browsing right now through the options of the router and I haven't seen the possibility to modify anything else from the SFTP server. Since the access was being done through SSH I deactivated FTP (in the DD-WRT webserver it's in tab "services" --> option "NAS"), but there is nothing there as well. I checked having FTP activated or deactivated, and there is no difference when I'm executing SWISH. So, I didn't find a setting to render UTF-8 filenames.

That sort of option won't be via the web interface.

As I told you, I'm a newbie in that world of SSH and linux OS. Shall I see how my SFTP is configured from the command line of the terminal (i.e. from PuTTY's terminal)? In that case, what should I do?

Yes, that's the way to do it. I don't know what option you have to set. Google is your friend.

It probably isn't your SFTP server at fault, now that I think about it. It's more likely your OS. Desktop Linux nowadays always uses UTF-8 by default. However, I can well imagine that the embedded Linux used in DD-WRT doesn't. Not sure how you change that though.

Or it could be the language pack thing I linked above.

comment:11 in reply to: ↑ 8 ; follow-up: ↓ 12 Changed 6 years ago by geroni339-noip@yahoo.es

Replying to alamaison:

Hi again

It may be a problem with how your disk is mounted. It may need an additional language pack.

I've checked the link that you have pointed me right now, and performed a dmesg command as it suggests.

Amongst all the text displayed, I haven't found anything like:

"FAT: codepage cpXYZ not found " or " FAT: IO charset isoVWXY-1 not found"

If you connect to your router via SMB (aka Windows Shares) how do the filenames appear?

I still haven't installed SAMBA server in the router (it is my pending task for next weekend). There are some Wiki pages on the DD-WRT explaining how to do that, and I'll follow them.

So, at the present moment I can't answer you to that.

Do you know if there is any other way to do what you are asking me?

Thanks.

comment:12 in reply to: ↑ 11 ; follow-up: ↓ 14 Changed 6 years ago by alamaison

Replying to geroni339-noip@…:

Do you know if there is any other way to do what you are asking me?

Can you post the output of running the locale command?

More information is here and here.

comment:13 Changed 6 years ago by alamaison

  • Milestone set to 0.6.1 Bug sprint

Changed 6 years ago by geroni339-noip@yahoo.es

Screenshot of "locale"

comment:14 in reply to: ↑ 12 ; follow-up: ↓ 15 Changed 6 years ago by anonymous

Replying to alamaison:

Hi, I discovered that BusyBox is an embedded version of Linux, and that some of the Linux commands have been removed. I also discovered through Google that BusyBox is compiled with UTF-8.

Can you post the output of running the locale command?

If I type "locale" I get the following message "sh: locale: not found".

If I type "help" I get a list of the built-in commands. (I attach a screenshot of that, file "pantalla help.jpg"). I've found that there is a command "local", but nothing seems to happen when I type it.

What I find odd is that using the same BusyBox embedded Linux, I can see file names having accents with no problems using software packages like "WinSCP" or "Bitvise Tunnelier", but unfortunately not "Swish" (which is the software that really meets what I was looking for). Isn't that strange?

This weekend I will install Samba and I'll let you know what happens with the file names.

Thanks for your interest.

comment:15 in reply to: ↑ 14 Changed 6 years ago by alamaison

Replying to anonymous:

Replying to alamaison:

Hi, I discovered that BusyBox is an embedded version of Linux, and that some of the Linux commands have been removed. I also discovered through Google that BusyBox is compiled with UTF-8.

Can you post the output of running the locale command?

If I type "locale" I get the following message "sh: locale: not found".

It's possible DD-WRT only has the one fixed locale. I'll have to look into this some more.

What I find odd is that using the same BusyBox embedded Linux, I can see file names having accents with no problems using software packages like "WinSCP" or "Bitvise Tunnelier", but unfortunately not "Swish" (which is the software that really meets what I was looking for). Isn't that strange?

Oh, this one is easily explained. WinSCP uses the non-Unicode 'Western' codepage by default when the connection uses SFTP v3. This means that accents appear correctly with DD-WRT but won't with any modern desktop Linux! In other words, it works for you by chance.

Now that's not entirely fair. It does give you a setting where you can change this to use UTF-8. Swish doesn't and neither do I want it to. Eventually, I'd like Swish to auto-detect the encoding of the filenames.
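The "works by chance" point above can be shown with a short Python sketch. The filename is hypothetical, and `cp1252` stands in for the 'Western' codepage as an assumption; the sketch only models a client that decodes every name with that codepage.

```python
name = "canción.pdf"  # hypothetical filename

# A server storing names in a legacy single-byte encoding (assumed cp1252).
legacy_bytes = name.encode("cp1252")
# A server storing names as UTF-8, as modern desktop Linux typically does.
utf8_bytes = name.encode("utf-8")

# A client that decodes everything with the Western codepage:
western_view_legacy = legacy_bytes.decode("cp1252")
western_view_utf8 = utf8_bytes.decode("cp1252")

print(western_view_legacy)  # canción.pdf
print(western_view_utf8)    # canciÃ³n.pdf
```

The same client setting gives correct names against one server and mojibake against the other, which is the mirror image of what a UTF-8-only client does.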

This weekend I will install Samba and I'll let you know what happens with the file names.

That would be an interesting experiment. You could also try the inbuilt ProFTPD server which might be easier to set up.

comment:16 Changed 5 years ago by alamaison

  • Milestone changed from 0.6.x Bug sprint to 0.90 Beta 1

comment:17 Changed 5 years ago by alamaison

* Ticket #188 marked duplicate of this one *

comment:18 Changed 5 years ago by alamaison

  • Priority changed from major (affects peripheral workflow) to critical (affects core workflow)
  • Component changed from backend to i18n

comment:19 Changed 5 years ago by alamaison

* Ticket #197 marked duplicate of this one *

comment:20 follow-up: ↓ 21 Changed 5 years ago by klonos@gmail.com

I was the one to file #197, sorry but the title of this ticket here doesn't make it clear that it is about Unicode in general.

The setup where I see this is Win7 x64 Greek PC --- [remote Swish connection to] ---> Ubuntu 12.04 Server x64 (openssh) with locale set to en_US.UTF-8 --- [LAN connection (mount) to] ---> Win7 x64 Greek PC.

The connection from the linux server to the Windows PC shares is actually a mount using:

mount -t cifs //server_name_or_ip/share -o username=someuser,password=somepass,iocharset=iso8859-7,codepage=737 /mnt/somedir

The filenames in the mounts display properly on the Linux server locally (and when connecting remotely through PuTTY of course). They also display properly with WinSCP. The Greek characters show up as gibberish in ExpanDrive and Swish.

comment:21 in reply to: ↑ 20 Changed 5 years ago by alamaison

Replying to klonos@…:

I was the one to file #197, sorry but the title of this ticket here doesn't make it clear that it is about Unicode in general.

Feel free to change it.

The setup where I see this is Win7 x64 Greek PC --- [remote Swish connection to] ---> Ubuntu 12.04 Server x64 (openssh) with locale set to en_US.UTF-8 --- [LAN connection (mount) to] ---> Win7 x64 Greek PC.

The connection from the linux server to the Windows PC shares is actually a mount using:

mount -t cifs //server_name_or_ip/share -o username=someuser,password=somepass,iocharset=iso8859-7,codepage=737 /mnt/somedir

Ok, so what happens here is you've told the Samba mount to convert all the Unicode Windows filenames into the Greek non-unicode codepage. Which it obediently does and then your Linux SFTP server sends those non-unicode filenames as unmolested binary blobs to Swish. Swish (slightly arrogantly) assumes the binary blob is UTF-8 and the result is gibberish.
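The chain described above can be modelled roughly in Python. This is a sketch under assumptions: `mount_bytes` and `swish_display` are hypothetical stand-ins for the CIFS mount and for a UTF-8-only client, the filename is made up, and the SFTP server is modelled as passing bytes through untouched.

```python
def mount_bytes(windows_name: str, iocharset: str) -> bytes:
    """Stand-in for the CIFS mount: convert the Unicode name to iocharset bytes."""
    return windows_name.encode(iocharset)


def swish_display(raw: bytes) -> str:
    """Stand-in for a client that assumes every filename is UTF-8."""
    return raw.decode("utf-8", errors="replace")


name = "αρχείο.txt"  # hypothetical Greek filename

# Mounted with iocharset=iso8859-7: the SFTP server forwards legacy bytes.
garbled = swish_display(mount_bytes(name, "iso-8859-7"))

# Mounted with a UTF-8 iocharset: the forwarded bytes are already UTF-8.
readable = swish_display(mount_bytes(name, "utf-8"))

print(garbled)
print(readable)
```

Only the second case round-trips, which is why changing the mount's character set, rather than the client, is one way to resolve it.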

In the other thread you say iocharset=utf8 doesn't work. Are you sure? With Swish, not WinSCP? Maybe removing the codepage option will fix that.

The filenames in the mounts display properly on the linux server locally (and when connecting remotely through Putty of course). They also display properly with WinSCP.

If WinSCP is reading them properly then you set it to use non-Unicode Greek or it guessed from your Windows locale. WinSCP doesn't use Unicode by default. I suspect if you set your mount to use iocharset=utf8 and ask WinSCP to use Unicode then everyone will be able to read your filenames.

(Although I read somewhere that the iocharset flag didn't exist in recent Samba versions - not sure what happens then)

comment:22 Changed 5 years ago by klonos@gmail.com

Well what do you know?!? This is really embarrassing.

...I've been working with putty/winscp for so many years and each time I had issues with how Greek would show up, the first thing I touched was the server side. I must have seen the "UTF-8 encoding for filenames" setting in WinSCP and the "Remote character" drop-down in putty like 1 million times. I just thought that leaving it to "auto" would simply work. I've spent countless hours/days testing various settings and trying to figure out how to mount Windows shares with the proper options so that Greek would display properly too and it was there in front of my eyes all along.

Anyways, I've removed the iocharset and codepage options from the mount commands in my fstab and this time simply "enforced" UTF-8 in both WinSCP and Putty and they worked like a charm. Of course so did Swish and Greek file/directory names come up as they should. So, please scratch my bug reports here and over at #197 too because I just didn't know what the hell I was doing/talking about.

Thanx for pointing me to the right direction ;)

PS: ExpanDrive mounted drives still have the issue, but that's a completely different story.
