author     phajdan.jr@chromium.org <phajdan.jr@chromium.org@0039d316-1c4b-4281-b951-d872f2087c98>  2011-04-06 08:03:49 +0000
committer  phajdan.jr@chromium.org <phajdan.jr@chromium.org@0039d316-1c4b-4281-b951-d872f2087c98>  2011-04-06 08:03:49 +0000
commit     27528b197f63c0cb8d71416ffc5065570358be8b
tree       5c69270196473d64a9a8aed7e9691ca3a33973ce  /base/i18n
parent     597787dd161b4959eb35cc9d0b365933fd2b315e
Revert 80587 - FTP: Multiple fixes for localized directory listings:
- fix detection of KOI8-R and possibly other encodings
- fix parsing Russian month names

When detecting the listing encoding, we need to not only check whether the data can be converted using given encoding, but also whether the result can be parsed as a valid directory listing.

Also, we only need to compare the first three characters of the abbreviated month name, because that's how they're abbreviated in FTP directory listings.

Finally, the Russian directory listings have swapped the "month" and "day of month" columns.

BUG=65917
Review URL: http://codereview.chromium.org/6718043
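The month handling described in the reverted message (compare only the first three characters of the month token, and accept "day month" as well as the usual "month day" order, as in a Russian line such as "123 23 май 2011 test") can be sketched as follows. This is a minimal sketch under stated assumptions, not Chromium's code: AbbreviatedMonthToNumber, ParseMonthAndDay and the hard-coded month tables are hypothetical. It works on UTF-16 so that each Cyrillic letter counts as one character rather than two UTF-8 bytes.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper: maps a month token to 1..12 by comparing only its
// first three characters. Returns 0 if nothing matches.
int AbbreviatedMonthToNumber(const std::u16string& token) {
  // Hypothetical tables; a real implementation might use locale data instead.
  static const std::array<std::u16string, 12> kEnglish = {
      u"jan", u"feb", u"mar", u"apr", u"may", u"jun",
      u"jul", u"aug", u"sep", u"oct", u"nov", u"dec"};
  static const std::array<std::u16string, 12> kRussian = {
      u"янв", u"фев", u"мар", u"апр", u"май", u"июн",
      u"июл", u"авг", u"сен", u"окт", u"ноя", u"дек"};
  if (token.size() < 3)
    return 0;
  std::u16string prefix = token.substr(0, 3);
  // Lower-cases ASCII letters only; Cyrillic case folding is not handled.
  for (char16_t& c : prefix) {
    if (c >= u'A' && c <= u'Z')
      c = static_cast<char16_t>(c - u'A' + u'a');
  }
  for (std::size_t i = 0; i < kEnglish.size(); ++i) {
    if (prefix == kEnglish[i] || prefix == kRussian[i])
      return static_cast<int>(i) + 1;
  }
  return 0;
}

// Hypothetical helper: interprets the two date tokens of an ls-style line,
// trying "month day" first and falling back to the swapped "day month" order.
bool ParseMonthAndDay(const std::u16string& first,
                      const std::u16string& second,
                      int* month, int* day) {
  // Accepts a one- or two-digit day of month in 1..31, else returns 0.
  auto to_day = [](const std::u16string& s) -> int {
    if (s.empty() || s.size() > 2)
      return 0;
    int value = 0;
    for (char16_t c : s) {
      if (c < u'0' || c > u'9')
        return 0;
      value = value * 10 + (c - u'0');
    }
    return (value >= 1 && value <= 31) ? value : 0;
  };
  int m = AbbreviatedMonthToNumber(first);
  int d = to_day(second);
  if (m == 0 || d == 0) {
    // Swapped columns: day of month comes before the month name.
    m = AbbreviatedMonthToNumber(second);
    d = to_day(first);
  }
  if (m == 0 || d == 0)
    return false;
  *month = m;
  *day = d;
  return true;
}
```

Because only the three-character prefix is compared, a token longer than the listing's abbreviation (for example "October") still matches.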
TBR=phajdan.jr@chromium.org
Failures (Windows only, both Vista and XP):
FtpDirectoryListingBufferTest.Parse:
.\ftp\ftp_directory_listing_parser_unittest.cc(133): error: Value of: entry.name
Actual: L""
Expected: UTF8ToUTF16(name)
Which is: L".."
Google Test trace:
.\ftp\ftp_directory_listing_parser_unittest.cc(112): Filename: ..
.\ftp\ftp_directory_listing_parser_unittest.cc(83): Test[25]: dir-listing-ls-25
.\ftp\ftp_directory_listing_parser_unittest.cc(133): error: Value of: entry.name
Actual: L""
Expected: UTF8ToUTF16(name)
Which is: L".message"
Google Test trace:
.\ftp\ftp_directory_listing_parser_unittest.cc(112): Filename: .message
.\ftp\ftp_directory_listing_parser_unittest.cc(83): Test[25]: dir-listing-ls-25
.\ftp\ftp_directory_listing_parser_unittest.cc(133): error: Value of: entry.name
Actual: L""
Expected: UTF8ToUTF16(name)
Which is: L".."
Google Test trace:
.\ftp\ftp_directory_listing_parser_unittest.cc(112): Filename: ..
.\ftp\ftp_directory_listing_parser_unittest.cc(83): Test[26]: dir-listing-ls-26
.\ftp\ftp_directory_listing_parser_unittest.cc(133): error: Value of: entry.name
Actual: L""
Expected: UTF8ToUTF16(name)
Which is: L".message"
Google Test trace:
.\ftp\ftp_directory_listing_parser_unittest.cc(112): Filename: .message
.\ftp\ftp_directory_listing_parser_unittest.cc(83): Test[26]: dir-listing-ls-26
.\ftp\ftp_directory_listing_parser_unittest.cc(133): error: Value of: entry.name
Actual: L""
Expected: UTF8ToUTF16(name)
Which is: L".."
Google Test trace:
.\ftp\ftp_directory_listing_parser_unittest.cc(112): Filename: ..
.\ftp\ftp_directory_listing_parser_unittest.cc(83): Test[27]: dir-listing-ls-27
.\ftp\ftp_directory_listing_parser_unittest.cc(133): error: Value of: entry.name
Actual: L""
Expected: UTF8ToUTF16(name)
Which is: L".message"
Google Test trace:
.\ftp\ftp_directory_listing_parser_unittest.cc(112): Filename: .message
.\ftp\ftp_directory_listing_parser_unittest.cc(83): Test[27]: dir-listing-ls-27
FtpDirectoryListingParserLsTest.Good:
c:\b\build\slave\cr-win-rel\build\src\net/ftp/ftp_directory_listing_parser_unittest.h(47): error: Value of: entry.name
Actual: L""
Expected: UTF8ToUTF16(test_case.filename)
Which is: L"test"
Google Test trace:
.\ftp\ftp_directory_listing_parser_ls_unittest.cc(109): Test[21]: -rwxrwxr-x 1 ftp ftp 123 23 май 2011 test
c:\b\build\slave\cr-win-rel\build\src\net/ftp/ftp_directory_listing_parser_unittest.h(47): error: Value of: entry.name
Actual: L""
Expected: UTF8ToUTF16(test_case.filename)
Which is: L"dir"
Google Test trace:
.\ftp\ftp_directory_listing_parser_ls_unittest.cc(109): Test[22]: drwxrwxr-x 1 ftp ftp 4096 19 окт 2011 dir
Review URL: http://codereview.chromium.org/6802006
git-svn-id: svn://svn.chromium.org/chrome/trunk/src@80590 0039d316-1c4b-4281-b951-d872f2087c98
Diffstat (limited to 'base/i18n')
-rw-r--r-- | base/i18n/icu_encoding_detection.cc | 40
-rw-r--r-- | base/i18n/icu_encoding_detection.h | 2
2 files changed, 1 insertions, 41 deletions
diff --git a/base/i18n/icu_encoding_detection.cc b/base/i18n/icu_encoding_detection.cc
index 3583fa9..d579af2 100644
--- a/base/i18n/icu_encoding_detection.cc
+++ b/base/i18n/icu_encoding_detection.cc
@@ -1,11 +1,9 @@
-// Copyright (c) 2011 The Chromium Authors. All rights reserved.
+// Copyright (c) 2010 The Chromium Authors. All rights reserved.
 // Use of this source code is governed by a BSD-style license that can be
 // found in the LICENSE file.
 
 #include "base/i18n/icu_encoding_detection.h"
 
-#include <set>
-
 #include "base/string_util.h"
 #include "unicode/ucsdet.h"
 
@@ -47,13 +45,6 @@ bool DetectAllEncodings(const std::string& text,
     return false;
   }
 
-  // ICU has some heuristics for encoding detection, such that the more likely
-  // encodings should be returned first. However, it doesn't always return
-  // all encodings that properly decode |text|, so we'll append more encodings
-  // later. To make that efficient, keep track of encodings sniffed in this
-  // first phase.
-  std::set<std::string> sniffed_encodings;
-
   encodings->clear();
   for (int i = 0; i < matches_count; i++) {
     UErrorCode get_name_status = U_ZERO_ERROR;
@@ -63,37 +54,8 @@ bool DetectAllEncodings(const std::string& text,
     if (U_FAILURE(get_name_status))
       continue;
 
-    int32_t confidence = ucsdet_getConfidence(matches[i], &get_name_status);
-
-    // We also treat this error as non-fatal.
-    if (U_FAILURE(get_name_status))
-      continue;
-
-    // A confidence level >= 10 means that the encoding is expected to properly
-    // decode the text. Drop all encodings with lower confidence level.
-    if (confidence < 10)
-      continue;
-
     encodings->push_back(encoding_name);
-    sniffed_encodings.insert(encoding_name);
-  }
-
-  // Append all encodings not included earlier, in arbitrary order.
-  // TODO(jshin): This shouldn't be necessary, possible ICU bug.
-  // See also http://crbug.com/65917.
-  UEnumeration* detectable_encodings = ucsdet_getAllDetectableCharsets(detector,
-                                                                       &status);
-  int detectable_count = uenum_count(detectable_encodings, &status);
-  for (int i = 0; i < detectable_count; i++) {
-    int name_length;
-    const char* name_raw = uenum_next(detectable_encodings,
-                                      &name_length,
-                                      &status);
-    std::string name(name_raw, name_length);
-    if (sniffed_encodings.find(name) == sniffed_encodings.end())
-      encodings->push_back(name);
   }
-  uenum_close(detectable_encodings);
 
   ucsdet_close(detector);
   return !encodings->empty();
diff --git a/base/i18n/icu_encoding_detection.h b/base/i18n/icu_encoding_detection.h
index 552eb3d..cdc4cb7 100644
--- a/base/i18n/icu_encoding_detection.h
+++ b/base/i18n/icu_encoding_detection.h
@@ -18,8 +18,6 @@ bool DetectEncoding(const std::string& text, std::string* encoding);
 
 // Detect all possible encodings of |text| and put their names
 // (as returned by ICU) in |encodings|. Returns true on success.
-// Note: this function may return encodings that may fail to decode |text|,
-// the caller is responsible for handling that.
 bool DetectAllEncodings(const std::string& text,
                         std::vector<std::string>* encodings);
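The lines removed from DetectAllEncodings built a two-phase candidate list: ICU matches with a confidence of at least 10 first, in ICU's order, then every other detectable charset not already listed. The removed header note left it to the caller to cope with candidates that fail to decode; the reverted commit message describes that caller accepting an encoding only if the converted text also parses as a directory listing. Both halves can be sketched in plain C++ under stated assumptions: ICU is replaced by hypothetical stand-ins (the CharsetMatch struct and the convert/parses callbacks) so the sketch runs without ICU, and BuildCandidateEncodings and PickEncoding are illustrative names, not the Chromium implementation.

```cpp
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for one ICU charset match; the removed code read
// these values via ucsdet_getName() and ucsdet_getConfidence().
struct CharsetMatch {
  std::string name;
  int confidence;  // ICU reports confidence on a 0..100 scale.
};

// Mirrors the removed logic: keep matches with confidence >= 10, then append
// every detectable charset that was not kept in the first phase.
std::vector<std::string> BuildCandidateEncodings(
    const std::vector<CharsetMatch>& matches,
    const std::vector<std::string>& all_detectable) {
  std::vector<std::string> encodings;
  std::set<std::string> sniffed;  // Names kept in the first phase.
  for (const CharsetMatch& match : matches) {
    if (match.confidence < 10)
      continue;  // Not expected to decode the text properly.
    encodings.push_back(match.name);
    sniffed.insert(match.name);
  }
  for (const std::string& name : all_detectable) {
    if (sniffed.find(name) == sniffed.end())
      encodings.push_back(name);
  }
  return encodings;
}

// Hypothetical caller: returns the first candidate whose converted text also
// parses as a directory listing. |convert| and |parses| stand in for ICU
// conversion and the FTP listing parsers.
bool PickEncoding(
    const std::string& raw,
    const std::vector<std::string>& candidates,
    const std::function<bool(const std::string& encoding,
                             const std::string& raw,
                             std::u16string* out)>& convert,
    const std::function<bool(const std::u16string&)>& parses,
    std::string* chosen) {
  for (const std::string& encoding : candidates) {
    std::u16string converted;
    if (!convert(encoding, raw, &converted))
      continue;  // |raw| cannot be decoded with this encoding.
    if (!parses(converted))
      continue;  // Decodes, but is not a valid listing.
    *chosen = encoding;
    return true;
  }
  return false;
}
```

Keeping the first-phase names in a std::set makes the second phase's duplicate check a logarithmic lookup instead of a scan of the output vector, which is what the removed comment means by "to make that efficient".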