Android http url encode

Contents

How to encode or decode a URL string in Java
URL Encoding in Java

How URL Encoding Works
URL Decoding in Java
A Guide to Java URL Encoding/Decoding
1. Introduction
2. Analyze the URL
3. Encode the URL
4. Decode the URL
5. Encode a Path Segment
6. Conclusion
Android http url encode
Username and Password
Query
Fragment
Encoding
Percent encoding
IDNA Mapping and Punycode encoding
Why another URL model?
Different URLs should be different
Equal URLs should be equal
If it works on the web, it should work in your application
Paths and Queries should decompose
Plus a modern API

How to encode or decode a URL string in Java

September 26, 2019 • Atta

It is common practice to URL-encode query strings and form parameters when calling a remote web service. Encoding keeps reserved characters in the data, such as & and =, from being misread as URL delimiters, and it helps guard against injection-style attacks. URL encoding converts a string into a form that is valid inside a URL, so the transmitted data arrives intact.

In this article, you will learn how to URL encode or decode query strings and form parameters using Java.

URL Encoding in Java

You can easily encode a URL string or a form parameter into a valid URL format by using the URLEncoder class in Java. This utility class contains static methods for converting a string into the application/x-www-form-urlencoded MIME format.

URL encoding in Java is performed with the URLEncoder.encode() method, which returns the encoded string.
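A minimal sketch of such a call, assuming a hypothetical endpoint (https://example.com/search) and a hypothetical query value; only the value is encoded, not the whole URL. The class and method names are illustrative.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class UrlEncodeExample {

    // Encodes one query value as application/x-www-form-urlencoded.
    static String encodeValue(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String baseUrl = "https://example.com/search?q="; // hypothetical endpoint
        String query = "Java & URL encoding";             // hypothetical value
        System.out.println(baseUrl + encodeValue(query));
        // prints https://example.com/search?q=Java+%26+URL+encoding
    }
}
```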

The encode() method takes two parameters:

str — The string to be encoded.

encodingScheme — The name of the character encoding. In the above example, we used the UTF-8 encoding scheme. The World Wide Web Consortium recommends that the UTF-8 encoding scheme should be used whenever possible to avoid incompatibilities. If the given encoding is not supported, an UnsupportedEncodingException is thrown.

Common Pitfall: When performing URL encoding, don't encode the entire URL. Encode only the individual component, such as a query string parameter value. Note that URLEncoder produces form encoding, which suits query parameters and form data rather than path segments.

When a URL has multiple query string parameters, each name and value is encoded separately before the query string is assembled.
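One way to sketch this for several parameters, assuming hypothetical parameter names and a hypothetical endpoint. Each name and value is encoded on its own, then the pairs are joined with &.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class QueryStringBuilderExample {

    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Encodes every name and value separately, then joins the pairs with '&'.
    static String buildQuery(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("name", "John Doe");
        params.put("email", "john@example.com");
        params.put("tag", "a&b=c");
        System.out.println("https://example.com/users?" + buildQuery(params));
        // prints https://example.com/users?name=John+Doe&email=john%40example.com&tag=a%26b%3Dc
    }
}
```

Because the & and = inside the tag value are escaped, the receiver still sees exactly three parameters.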

How URL Encoding Works

When URL encoding a string, the following rules apply:

The alphanumeric characters (a-z, A-Z, and 0-9) remain the same.

The special characters ., -, *, and _ remain the same.

The space character " " is converted into a + sign. This differs from JavaScript's encodeURIComponent(), which encodes a space as %20. Both forms are valid in a query string: + is the application/x-www-form-urlencoded convention used for form data and query parameters, while %20 is the general percent-encoded form and the one used in the path (the part of the URL before ?).

All other characters are considered unsafe and are first converted into one or more bytes using the given encoding scheme. Then each byte is represented by the 3-character string %XY, where XY is the two-digit hexadecimal representation of the byte.
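The rules above can be checked with a short sketch; the sample inputs are arbitrary and the class name is illustrative.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class EncodingRulesExample {

    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("abcXYZ019")); // prints abcXYZ019 (alphanumerics unchanged)
        System.out.println(encode(".-*_"));      // prints .-*_ (safe special characters unchanged)
        System.out.println(encode("a b"));       // prints a+b (space becomes +)
        System.out.println(encode("~"));         // prints %7E (one byte, one %XY triplet)
        System.out.println(encode("\u00E9"));    // prints %C3%A9 (two UTF-8 bytes for U+00E9)
    }
}
```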

URL Decoding in Java

URL decoding is the process of converting URL-encoded query strings and form parameters back into their original form. By default, HTML form parameters are encoded using the application/x-www-form-urlencoded MIME type, so they must be decoded before your application uses them. The same applies to query string parameters included in the URL.

Mostly, these parameters are already decoded by the framework you’re using in your application like Spring or Express. But in a standalone Java application, you must manually decode query string and form parameters by using the URLDecoder utility class.

URL decoding in Java is performed with the URLDecoder.decode() method, which returns the original string.
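A minimal decoding sketch, assuming a hypothetical encoded value; the class and method names are illustrative.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class UrlDecodeExample {

    // Decodes one application/x-www-form-urlencoded value.
    static String decodeValue(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeValue("Java+%26+URL+encoding"));
        // prints Java & URL encoding
    }
}
```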

The decode() method accepts two parameters:

str — The string to be decoded.

encodingScheme — The name of the character encoding scheme. It is recommended to use the UTF-8 encoding to avoid incompatibilities with other systems.

The decoding process is the opposite of that used by the URLEncoder class. It is assumed that all characters in the encoded string are one of the following: a through z, A through Z, 0 through 9, and -, _, ., and *. The character % is permitted but is interpreted as the start of a special escaped sequence.


A Guide to Java URL Encoding/Decoding

This article discusses URL encoding in Java, some of its pitfalls, and how to avoid them.

Author: baeldung

1. Introduction

Simply put, URL encoding translates special characters in a URL into a representation that conforms to the specification and can be correctly understood and interpreted.

This article focuses on how to encode and decode a URL or form data so that it conforms to the specification and is transmitted correctly over the network.

2. Analyze the URL

A URI consists of a scheme, an optional authority (user information, host, and port), a path, an optional query, and an optional fragment.

The first step in encoding a URI is to examine its parts and then encode only the relevant ones.

One way to analyze a URI is to load its string representation into a java.net.URI instance. The URI class parses the string and exposes its parts through a simple API of getXXX accessor methods.
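As a sketch of that parsing step, the following splits a hypothetical URI (not the one from the original article) into its components with java.net.URI. The class name and map keys are illustrative.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

class UriPartsExample {

    // Parses the string and collects the components exposed by java.net.URI.
    static Map<String, String> parts(String url) {
        URI uri = URI.create(url);
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("scheme", uri.getScheme());
        parts.put("userInfo", uri.getUserInfo());
        parts.put("host", uri.getHost());
        parts.put("port", String.valueOf(uri.getPort()));
        parts.put("path", uri.getRawPath());
        parts.put("rawQuery", uri.getRawQuery()); // still percent-encoded
        parts.put("query", uri.getQuery());       // percent escapes decoded, '+' left alone
        parts.put("fragment", uri.getRawFragment());
        return parts;
    }

    public static void main(String[] args) {
        // Hypothetical URI used only for illustration.
        String url = "https://user@example.com:8443/docs/guide?key1=value+1&key2=value%402#top";
        parts(url).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Note that getQuery() decodes %40 to @ but does not turn + into a space; that form-encoding convention is handled by URLDecoder.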

3. Encode the URL

A common mistake when encoding a URI is to encode the complete URI. Typically, only the query portion of the URI needs to be encoded.

The data is encoded with the encode(data, encodingScheme) method of the URLEncoder class, which takes two parameters:

data: the string to be translated

encodingScheme: the name of the character encoding

The encoding converts each special character into the two-digit hexadecimal representation of its 8-bit byte value, written in the form "%xy". When we deal with path parameters or add parameters that are dynamic, we encode the data and then send it to the server.

Note: The World Wide Web Consortium Recommendation states that UTF-8 should be used. Not doing so may introduce incompatibilities. (Reference: https://docs.oracle.com/javase/7/docs/api/java/net/URLEncoder.html)

4. Decode the URL

The encoded URL can be decoded with the decode method of URLDecoder.

Two points are important here:

Analyze the URL before decoding.

Use the same encoding scheme for encoding and decoding.

If we decoded first and analyzed afterward, parts of the URL might not be parsed correctly. If we used a different encoding scheme to decode the data, the result would be garbage data.
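The parse-then-decode order can be sketched as follows, assuming a hypothetical URL whose second value contains an encoded & and =. The helper names are illustrative, and the sketch keeps only the last value for a repeated name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryDecodeExample {

    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8"); // same scheme that was used to encode
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Splits the raw query on '&' and '=' first, then decodes each piece.
    static Map<String, String> parseQuery(String rawQuery) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            result.put(decode(name), decode(value));
        }
        return result;
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://example.com/?key1=value+1&key2=a%26b%3Dc"); // hypothetical
        System.out.println(parseQuery(uri.getRawQuery()));
        // prints {key1=value 1, key2=a&b=c}
        // Decoding the whole query first would yield "key1=value 1&key2=a&b=c",
        // which no longer splits into the original two parameters.
    }
}
```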

5. Encode a Path Segment

URLEncoder cannot be used to encode a path segment of a URL. The path component refers to the hierarchical structure that represents a directory path, or it serves to locate resources separated by "/".

The reserved characters in a path segment differ from those in query parameter values. For example, a "+" sign is a valid character in a path segment and therefore should not be encoded.

To encode a path segment, we use the UriUtils class from Spring Framework. UriUtils provides the encodePath and encodePathSegment methods for encoding a path and a path segment, respectively.

The encodePathSegment method returns the encoded value, and "+" is left unencoded because it is a valid character in the path component.

The same approach applies when a path variable is added to the test URL from Section 2: the path segment and the query values are encoded separately before the URL is assembled.
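The difference between path and query encoding can also be seen without Spring. The sketch below swaps in the multi-argument java.net.URI constructor, which quotes characters that are illegal in a path; it is a stand-in for illustration, not the UriUtils API, and the host and paths are hypothetical. Unlike encodePathSegment, it treats "/" as a separator and does not escape it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

class PathEncodeExample {

    // Percent-encodes a decoded path (must start with "/") via java.net.URI.
    static String encodePath(String path) {
        try {
            return new URI("http", "example.com", path, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Form encoding, shown only for contrast.
    static String formEncode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodePath("/path with space/a+b"));
        // prints /path%20with%20space/a+b  ('+' kept, space becomes %20)
        System.out.println(formEncode("a+b c"));
        // prints a%2Bb+c  ('+' escaped, space becomes '+')
    }
}
```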

6. Conclusion

In this tutorial, we looked at how to encode and decode data so that it can be transmitted and interpreted correctly. Although the article focused on encoding and decoding URI query parameter values, the approach applies to HTML form parameters as well.


Android http url encode

Scheme

Sometimes referred to as the protocol, a URL's scheme describes what mechanism should be used to retrieve the resource. Although URLs have many schemes (mailto, file, ftp), this class only supports http and https. Use java.net.URI for URLs with arbitrary schemes.

Username and Password

Username and password are either present, or the empty string "" if absent. This class offers no mechanism to differentiate empty from absent. Neither of these components is popular in practice. Typically HTTP applications use other mechanisms for user identification and authentication.

Host

The host identifies the webserver that serves the URL's resource. It is either a hostname like square.com or localhost, an IPv4 address like 192.168.0.1, or an IPv6 address like ::1.

Usually a webserver is reachable with multiple identifiers: its IP addresses, registered domain names, and even localhost when connecting from the server itself. Each of a webserver’s names is a distinct URL and they are not interchangeable. For example, even if http://square.github.io/dagger and http://google.github.io/dagger are served by the same IP address, the two URLs identify different resources.

Port

The port used to connect to the webserver. By default this is 80 for HTTP and 443 for HTTPS. This class never returns -1 for the port: if no port is explicitly specified in the URL then the scheme's default is used.
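For contrast, java.net.URI reports -1 when the URL has no explicit port. A small sketch of the default-port rule described here, written against java.net.URI rather than this class (the helper name is hypothetical):

```java
import java.net.URI;

class EffectivePortExample {

    // Returns the explicit port, or the scheme's default when none is given.
    static int effectivePort(String url) {
        URI uri = URI.create(url);
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        if ("http".equalsIgnoreCase(uri.getScheme())) {
            return 80;
        }
        if ("https".equalsIgnoreCase(uri.getScheme())) {
            return 443;
        }
        return -1; // no known default for other schemes in this sketch
    }

    public static void main(String[] args) {
        System.out.println(URI.create("http://host/").getPort()); // prints -1
        System.out.println(effectivePort("http://host/"));        // prints 80
        System.out.println(effectivePort("https://host/"));       // prints 443
        System.out.println(effectivePort("http://host:8080/"));   // prints 8080
    }
}
```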

Path

The path identifies a specific resource on the host. Paths have a hierarchical structure like "/square/okhttp/issues/1486" and decompose into a list of segments like ["square", "okhttp", "issues", "1486"].

This class offers methods to compose and decompose paths by segment. It composes each path from a list of segments by alternating between "/" and the encoded segment. For example the segments ["a", "b"] build "/a/b" and the segments ["a", "b", ""] build "/a/b/".

If a path's last segment is the empty string then the path ends with "/". This class always builds non-empty paths: if the path is omitted it defaults to "/". The default path's segment list is a single empty string: [""].
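The segment rules can be illustrated with plain string handling. This is a sketch of the composition rule only, not this class's implementation: the helper names are hypothetical and the segments are assumed to be already encoded.

```java
import java.util.Arrays;
import java.util.List;

class PathSegmentsExample {

    // Builds a path by writing "/" before every (already encoded) segment.
    static String compose(List<String> segments) {
        StringBuilder path = new StringBuilder();
        for (String segment : segments) {
            path.append('/').append(segment);
        }
        return path.toString();
    }

    // Splits a path that starts with "/" back into its segments.
    static List<String> decompose(String path) {
        // A limit of -1 keeps the trailing empty segment of paths ending in "/".
        return Arrays.asList(path.substring(1).split("/", -1));
    }

    public static void main(String[] args) {
        System.out.println(compose(Arrays.asList("a", "b")));     // prints /a/b
        System.out.println(compose(Arrays.asList("a", "b", ""))); // prints /a/b/
        System.out.println(compose(Arrays.asList("")));           // prints /
        System.out.println(decompose("/square/okhttp/issues/1486"));
        // prints [square, okhttp, issues, 1486]
    }
}
```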

Query

The query is optional: it can be null, empty, or non-empty. For many HTTP URLs the query string is subdivided into a collection of name-value parameters. This class offers methods to set the query as the single string, or as individual name-value parameters. With name-value parameters the values are optional and names may be repeated.

Fragment

The fragment is optional: it can be null, empty, or non-empty. Unlike host, port, path, and query the fragment is not sent to the webserver: it’s private to the client.

Encoding

Each component must be encoded before it is embedded in the complete URL. For example, the string cute #puppies is encoded as cute%20%23puppies when used as a query parameter value.
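With only the JDK, the same encoded value can be produced by form-encoding and then rewriting + as %20. This is a common workaround sketched under that assumption, not this class's API; the helper name is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class QueryValuePercentExample {

    // Form-encodes the value, then replaces '+' (an encoded space) with %20.
    // A literal '+' in the input is already escaped as %2B, so the replace is safe.
    static String encodeQueryValue(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeQueryValue("cute #puppies")); // prints cute%20%23puppies
    }
}
```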

Percent encoding

Percent encoding replaces a character (like 🍩 ) with its UTF-8 hex bytes (like %F0%9F%8D%A9 ). This approach works for whitespace characters, control characters, non-ASCII characters, and characters that already have another meaning in a particular context.
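The byte-level mechanics can be sketched directly. The helper below (a hypothetical name) escapes every byte, including safe ASCII characters, purely to show how the UTF-8 bytes map to %XY triplets; a real encoder escapes only the characters a component requires.

```java
import java.nio.charset.StandardCharsets;

class PercentEncodeBytesExample {

    // Writes each UTF-8 byte of the input as a %XY triplet.
    static String percentEncodeAll(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            out.append(String.format("%%%02X", b & 0xFF)); // mask to an unsigned value
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String donut = "\uD83C\uDF69"; // U+1F369 DOUGHNUT as a surrogate pair
        System.out.println(percentEncodeAll(donut)); // prints %F0%9F%8D%A9
        System.out.println(percentEncodeAll(" "));   // prints %20
    }
}
```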

Percent encoding is used in every URL component except for the hostname. But the set of characters that need to be encoded is different for each component. For example, the path component must escape all of its ? characters, otherwise it could be interpreted as the start of the URL's query. But within the query and fragment components, the ? character doesn't delimit anything and doesn't need to be escaped.

When parsing URLs that lack percent encoding where it is required, this class will percent encode the offending characters.
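The per-component rule can be observed with the JDK as well. In this sketch the multi-argument java.net.URI constructor stands in for this class: it escapes a ? that appears in the path but leaves one in the query untouched. The host, path, and query are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

class ComponentEncodingExample {

    // Assembles a URL from decoded components; URI quotes what each component requires.
    static String build(String path, String query) {
        try {
            return new URI("http", "host", path, query, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("/a?b", "q?x"));
        // prints http://host/a%3Fb?q?x
    }
}
```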

IDNA Mapping and Punycode encoding

Hostnames have different requirements and use a different encoding scheme. It consists of IDNA mapping and Punycode encoding.

In order to avoid confusion and discourage phishing attacks, IDNA Mapping transforms names to avoid confusing characters. This includes basic case folding: transforming shouting SQUARE.COM into cool and casual square.com. It also handles more exotic characters. For example, the Unicode trademark sign (™) could be confused for the letters "TM" in http://ho™mail.com. To mitigate this, the single character (™) maps to the string "tm". There is a similar policy for all of the 1.1 million Unicode code points. Note that some code points such as "🍩" are not mapped and cannot be used in a hostname.

Punycode converts a Unicode string to an ASCII string to make international domain names work everywhere. For example, "σ" encodes as "xn--4xa". The encoded string is not human readable, but can be used with classes like InetAddress to establish connections.
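The JDK exposes a comparable conversion through java.net.IDN. Note the assumption: IDN implements the older RFC 3490 (IDNA2003) rules, so its mapping may differ in places from the one described here. The helper name is hypothetical.

```java
import java.net.IDN;

class PunycodeExample {

    // Converts a Unicode hostname to its ASCII-compatible (Punycode) form.
    static String toAscii(String host) {
        return IDN.toASCII(host);
    }

    public static void main(String[] args) {
        String sigma = "\u03C3"; // Greek small letter sigma
        System.out.println(toAscii(sigma));             // prints xn--4xa
        System.out.println(IDN.toUnicode("xn--4xa").equals(sigma)); // prints true
    }
}
```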

Why another URL model?

Java includes both java.net.URL and java.net.URI . We offer a new URL model to address problems that the others don’t.

Different URLs should be different

Although they have different content, java.net.URL considers the following two URLs equal, and the equals() method between them returns true:

http://square.github.io/

http://google.github.io/

This is because those two hosts share the same IP address. This is an old, bad design decision that makes java.net.URL unusable for many things. It shouldn’t be used as a Map key or in a Set . Doing so is both inefficient because equality may require a DNS lookup, and incorrect because unequal URLs may be equal because of how they are hosted.

Equal URLs should be equal

These two URLs are semantically identical, but java.net.URI disagrees:

http://host:80/

http://host

Both the unnecessary port specification ( :80 ) and the absent trailing slash ( / ) cause URI to bucket the two URLs separately. This harms URI’s usefulness in collections. Any application that stores information-per-URL will need to either canonicalize manually, or suffer unnecessary redundancy for such URLs.
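The mismatch, and the kind of manual canonicalization an application would otherwise need, can be sketched as follows. The canonicalize helper is hypothetical and deliberately small: it assumes an http or https URL with a host, and it round-trips the decoded path, which is lossy for paths containing encoded slashes.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

class UriCanonicalizeExample {

    // Drops a default port and supplies "/" for an empty path.
    static String canonicalize(String url) {
        try {
            URI u = new URI(url);
            String scheme = u.getScheme().toLowerCase(Locale.ROOT);
            int port = u.getPort();
            boolean defaultPort = (scheme.equals("http") && port == 80)
                    || (scheme.equals("https") && port == 443);
            if (defaultPort) {
                port = -1; // -1 tells the URI constructor to omit the port
            }
            String path = (u.getPath() == null || u.getPath().isEmpty()) ? "/" : u.getPath();
            return new URI(scheme, u.getUserInfo(), u.getHost().toLowerCase(Locale.ROOT),
                    port, path, u.getQuery(), u.getFragment()).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(URI.create("http://host:80/").equals(URI.create("http://host")));
        // prints false
        System.out.println(canonicalize("http://host:80/")); // prints http://host/
        System.out.println(canonicalize("http://host"));     // prints http://host/
    }
}
```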

Because they don't attempt canonical form, these classes are surprisingly difficult to use securely. Suppose you're building a webservice that checks that incoming paths are prefixed "/static/images/" before serving the corresponding assets from the filesystem. Because these classes return the input path without canonicalizing it, ".." segments survive the check, and the classes become complicit in directory traversal attacks. Code that checks only the path prefix may suffer!
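A sketch of the problem with java.net.URI, using a hypothetical attack URL: the raw path passes a prefix check, while the normalized path reveals the real target. This illustrates the pitfall only and is not a complete defense against traversal.

```java
import java.net.URI;

class TraversalCheckExample {

    static final String PREFIX = "/static/images/";

    // Checks the path exactly as java.net.URI returns it.
    static boolean naiveCheck(String url) {
        return URI.create(url).getPath().startsWith(PREFIX);
    }

    // Resolves "." and ".." segments before checking the prefix.
    static boolean normalizedCheck(String url) {
        return URI.create(url).normalize().getPath().startsWith(PREFIX);
    }

    public static void main(String[] args) {
        String attack = "http://example.com/static/images/../../etc/passwd"; // hypothetical
        System.out.println(URI.create(attack).getPath());
        // prints /static/images/../../etc/passwd
        System.out.println(URI.create(attack).normalize().getPath());
        // prints /etc/passwd
        System.out.println(naiveCheck(attack));      // prints true
        System.out.println(normalizedCheck(attack)); // prints false
    }
}
```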

If it works on the web, it should work in your application

The java.net.URI class is strict around what URLs it accepts. It rejects URLs like "http://example.com/abc|def" because the '|' character is unsupported. This class is more forgiving: it will automatically percent-encode the '|', yielding "http://example.com/abc%7Cdef". This forgiving behavior is consistent with web browsers. HttpUrl prefers consistency with major web browsers over consistency with obsolete specifications.
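The strictness is easy to reproduce. In this sketch (helper names are hypothetical) the single-argument java.net.URI constructor rejects the '|', while the multi-argument constructor, given the path as a separate decoded component, escapes it.

```java
import java.net.URI;
import java.net.URISyntaxException;

class StrictUriExample {

    // Reports whether java.net.URI accepts the string as written.
    static boolean parses(String url) {
        try {
            new URI(url);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Builds the URL from components so illegal path characters get quoted.
    static String buildEncoded(String host, String path) {
        try {
            return new URI("http", host, path, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("http://example.com/abc|def"));      // prints false
        System.out.println(buildEncoded("example.com", "/abc|def"));
        // prints http://example.com/abc%7Cdef
    }
}
```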

Paths and Queries should decompose

Neither of the built-in URL models offers direct access to path segments or query parameters. Manually using StringBuilder to assemble these components is cumbersome: do '+' characters get silently replaced with spaces? If a query parameter contains a '&', does that get escaped? By offering methods to read and write individual query parameters directly, this class saves application developers from the hassles of encoding and decoding.

Plus a modern API

The URL (JDK1.0) and URI (Java 1.4) classes predate builders and instead use telescoping constructors. For example, there’s no API to compose a URI with a custom port without also providing a query and fragment.

Instances of HttpUrl are well-formed and always have a scheme, host, and path. With java.net.URL it’s possible to create an awkward URL like http:/ with scheme and path but no hostname. Building APIs that consume such malformed values is difficult!

This class has a modern API. It avoids punitive checked exceptions: get() throws IllegalArgumentException on invalid input, and parse() returns null if the input is an invalid URL. You can even be explicit about whether each component has been encoded already.
