
- The modern ecosystem
- What are we looking for?
- Re.S.T. : REpresentational State Transfer
- Richardson’s maturity model or Web Service Maturity Heuristic
- H.A.T.E.O.A.S. & “Resource Linking”
- ReSTful thus Stateless
- Pragmatism, ideology and ReSTafarians
- Tips, tricks and best practices
- “standards” (or almost)
- Tools
- Rest APIs security
See the slides
1. The modern ecosystem
- Fewer and fewer monolithic applications.
- We need interactions between services.
- Single Page Applications / Progressive Web Apps etc…
- Micro-Services.
- Public APIs.
2. What are we looking for?
APIs must be :
- Flexible, extensible and reusable.
- Easy to use.
- Separation of Concerns.
- Compatible with as many technologies as possible.
- We should be able to develop light clients and servers.
- What about reusing our proxy caches? (Varnish, Cloudfront, Browser etc…).
- Scalable.
- Performant.
- Secure.
3. Re.S.T. : REpresentational State Transfer
3.1. What a ReST API is not
- ReST is not a standard. It’s an architectural style.
- ReST doesn’t only concern remote or HTTP-based APIs. A library can be ReSTful.
3.2. Roy Thomas FIELDING : ReST’s daddy
- ReST APIs are almost 20 years old.
3.3. ReST description
- A ReST API is an abstract interface of the data model we call resources.
- We can distinguish two types of resources:
- Instances (a user, a product etc…).
- Collections (a list of instances).
- ReST APIs allow us to add / update / remove resources.
- As opposed to SOAP APIs, we must avoid an imperative logic where we send actions to the API.
3.4. The 5 (and a half) rules of ReST APIs
- Uniform.
- “Stateless”.
- “Cacheable”.
- Client / Server: “Separation of Concerns”.
- Layered or based on “connectors”.
- Code on demand (the optional or inadequate rule).
- It’s the “extra” parameter that we can find in some RFCs to keep some flexibility.
Uniform
- Each resource is identified in a unique and canonicalized way using its URL.
- The interface is uniform at all levels. All the components communicate using the same interface.
Stateless
- A ReST API should not keep any session.
- This avoids load balancing issues (among lots of other issues).
Cacheable
- Resources can be cached at all levels (frontend, intermediate connectors, backend, etc…).
- We should be able to reuse existing standard and generic HTTP cache implementations.
Client / Server : “Separation of Concerns”
- ReST APIs are not concerned with the display, the user interactions or the session.
- All these items should be handled by the client (Example : frontend web application).
Layered
- The presence of intermediate connectors (cache / security connector etc…) should be transparent to both the client and the server.
3.5. Data exchange formats
- In theory, you are free to use the format you want.
- In practice, the format must be standardized and non-linear (Hypermedia).
- Pragmatically, the most common format today is JSON.
- The JavaScript universe is in a permanent expansion.
- In opposition to classic backend technologies, in JavaScript we are trying to reduce the number of libraries and tools in order to keep our apps light and fast.
- JSON tools are available in all languages.
- JSON is quick and easy to manipulate.
3.6. HTTP methods
- GET : Retrieve a resource or a collection.
- POST : Create a resource.
- PUT : Replace a resource or a collection.
- PATCH : Update a resource or a collection.
- DELETE : Remove a resource or a collection.
- More exactly:
- The POST method can be used to update a resource but it’s not recommended.
- The PUT method can also be used to create a resource if we can choose the id in advance for example. The only constraint with the PUT method is that it must be idempotent. I.e. the number of times we send the same PUT request should not have any impact on the result.
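The difference between POST and PUT can be sketched with a small in-memory store. This is only an illustration: the `ResourceStore` class and its method names are hypothetical and do not come from any framework.

```python
import itertools


class ResourceStore:
    """Hypothetical in-memory collection used to contrast POST and PUT."""

    def __init__(self):
        self._items = {}
        self._ids = itertools.count(1)

    def post(self, data):
        # POST: the server picks the id, so each call creates a new resource.
        resource_id = str(next(self._ids))
        self._items[resource_id] = dict(data, id=resource_id)
        return self._items[resource_id]

    def put(self, resource_id, data):
        # PUT: the client picks the id; the resource is created or fully replaced.
        self._items[resource_id] = dict(data, id=resource_id)
        return self._items[resource_id]

    def count(self):
        return len(self._items)


store = ResourceStore()
store.post({"title": "Hello"})
store.post({"title": "Hello"})        # not idempotent: two resources now exist
store.put("abc", {"title": "Hello"})
store.put("abc", {"title": "Hello"})  # idempotent: still a single "abc" resource
```

Sending the same PUT twice leaves the store in the same state as sending it once, which is exactly the constraint described above.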
4. Richardson’s Maturity Model
or Web Service Maturity Heuristic
https://www.crummy.com/writing/speaking/2008-QCon/act3.html (2008)
*P.O.X. : Plain Old XML
4.1. Level 0 : The Swamp of POX
XML-RPC over HTTP
4.2 – Level 1 : Resources
The API follows the data model and every resource is identified with a unique URL.
POST /blogs/11111/posts
POST /blogs/11111/posts/22222/comments
4.3 – Level 2 : HTTP Verbs
Usage of HTTP methods other than GET and POST to signify the wanted action : PATCH / PUT / DELETE.
… and especially HTTP “status codes” to summarize the operation’s result.
200 : OK
201 : Created
204 : No Content (delete)
400 : Bad Request
401 : Unauthorized
403 : Forbidden
404 : Not Found
409 : Conflict
…
Of course, 4xx errors can contain a body with additional information.
Use the right vocabulary and avoid smurf APIs.
SMURF /q?data=select:*:from:carts
4.4 – Level 3 : Hypermedia Controls
Hypermedia is one of the principal rules of Fielding’s thesis.
The idea behind this is to find in ReST APIs the same Hypermedia logic we can find in HTML for example. Today, this sums up to the presence of links in the resources allowing us to establish the relationship between that resource and other resources.
The ReST API becomes “discoverable“.
5 – H.A.T.E.O.A.S. & “Resource Linking”
Most of the time, Level 3 of Richardson’s maturity model is presented under the H.A.T.E.O.A.S. acronym : Hypermedia As The Engine Of Application State.
{
  "id": "22222",
  "href": "https://www.wishtack.com/blogs/11111/posts/22222",
  "blog": {
    "href": "https://www.wishtack.com/blogs/11111"
  },
  "comments": {
    "href": "https://www.wishtack.com/blogs/11111/posts/22222/comments"
  },
  ...
}
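On the server side, producing such links can be as simple as the sketch below. The `post_resource` function is a hypothetical helper, and it assumes that the comments link points to the comments collection nested under the post.

```python
def post_resource(base_url, blog_id, post_id):
    """Build a post representation with links to its related resources."""
    blog_url = f"{base_url}/blogs/{blog_id}"
    post_url = f"{blog_url}/posts/{post_id}"
    return {
        "id": post_id,
        "href": post_url,
        "blog": {"href": blog_url},
        "comments": {"href": f"{post_url}/comments"},
    }


resource = post_resource("https://www.wishtack.com", "11111", "22222")
```

A client can then follow `resource["comments"]["href"]` instead of rebuilding the URL itself.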
6. ReSTful thus Stateless
6.1. Stateless ?
Supposing the following HTTP requests scenario. Does this sound “stateless” to you?
1 - GET /init
2 - GET /select-cart?cartId=123ab
3 - POST /add-product
4 - POST /add-product
5 - POST /update-product-count?productId=12345
6 - GET /cart-summary
7 - POST /pay
6.2. Limits and issues with stateful APIs
- The never-click-back effect.
- Load balancing issues.
- How could we add two products in two different carts simultaneously?
- How could we cache the “/cart-summary” resource?
- The API is not very intuitive and extensible.
6.3. Stateless example
1.a - POST /carts/123ab/products
1.b - POST /carts/456cb/products
2 - PATCH /carts/456cb/products/33333
3 - POST /carts/123ab/payments
4 - GET /carts/123ab
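To see why this scenario needs no session, here is a minimal sketch of a backend behind these URLs. The `CartApi` class and its method names are assumptions made for the example; the point is only that every call carries the cart id.

```python
class CartApi:
    """Hypothetical in-memory backend: no session, the cart id is in every call."""

    def __init__(self):
        self._carts = {}

    def post_product(self, cart_id, product_id, count=1):
        # POST /carts/:cartId/products
        cart = self._carts.setdefault(cart_id, {})
        cart[product_id] = cart.get(product_id, 0) + count
        return {"id": product_id, "count": cart[product_id]}

    def patch_product(self, cart_id, product_id, count):
        # PATCH /carts/:cartId/products/:productId
        self._carts[cart_id][product_id] = count
        return {"id": product_id, "count": count}

    def get_cart(self, cart_id):
        # GET /carts/:cartId
        products = self._carts.get(cart_id, {})
        return {"id": cart_id, "products": dict(products)}


api = CartApi()
api.post_product("123ab", "33333")
api.post_product("456cb", "33333")
api.patch_product("456cb", "33333", count=4)
```

Both carts are updated independently, in any order, and any server instance with access to the same storage could handle any of these calls.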
6.4. Advantages
- No session management thus no load balancing issues.
- Fewer requests.
- Requests can be executed simultaneously.
- Cacheable.
- Intuitive and extensible API.
- The API is human readable (we don’t have to print the documentation and stick it to the wall).
- The API is easy to extend (we can add fields and filters).
- The API can easily cover unexpected needs (Ex.: clear the cart, update the product’s count in the cart, …).
Affordances
In general, we can compare the stateful and the stateless approaches using the geographic destination metaphor.
Which one of these two instructions sounds more precise?
- GPS coordinates of an address in Strasbourg.
- To go from Nice to Strasbourg:
- Turn right.
- At 100m turn left.
- At the roundabout (if it didn’t change since the last time), take the 3rd exit.
- Enjoy the view on your left… or was it on the right?
- …
7. Pragmatism, Ideology and ReSTafarians
- As described before, the main goals of ReST APIs are the following:
- Genericity.
- Ease of implementation, ease of use and extensibility.
- Performance and scalability.
- ReST is not a dogma or an ideology; therefore, we must be pragmatic while anticipating the future. We should be visionaries but not diviners. K.I.S.S. and Agile.
- We should avoid the dark forces who are trying to apply complex paradigms dating from the XML age! (I suspect they are the same people behind 2-space indentation and semicolon removal in JavaScript).
- Example : https://github.com/kevinswiber/siren.
- Don’t be ReSTafarians! The initial goal is to serve the User eXperience and the Developer Experience. We are not trying to be more ReST than the others. (There’s still no World ReST Competition… yet…)
8. Tips, tricks and best practices
8.1. Naming
- kebab-case for URLs.
- camelCase for parameters in the query string or resource fields.
- plural kebab-case for resource names in URLs.
- I also recommend converting plurals to a List suffix in variables and properties. In URLs, the plural is nice, but in source code it’s just too subtle and error-prone.
- Use explicit names that follow your service’s metaphor.
- URLs must be constructed this way:
/resources/:resourceId/subresources/:subResourceId
/blogs/:blogId/posts/:postId
- Avoid URLs like this:
/blogs/:blogId/posts/:postId/summary
- This might sound like ReSTafarianism, but lots of libraries and frameworks are built around this structure. Breaking this rule makes it harder to guess whether a URL points to a collection or an instance, requires lots of hacking and makes our libraries and frameworks suffer.
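As an illustration of what generic libraries rely on, the following sketch guesses the kind of resource from the URL path alone. The `resource_kind` function is hypothetical and assumes that paths strictly alternate resource names and ids.

```python
def resource_kind(path):
    """Return "collection" or "instance" for a /resources/:id/... path."""
    segments = [segment for segment in path.split("/") if segment]
    if not segments:
        raise ValueError("empty path")
    # Names and ids alternate, starting with a name, so an odd number of
    # segments means the path ends on a resource name, i.e. a collection.
    return "collection" if len(segments) % 2 == 1 else "instance"
```

With this rule, /blogs/11111/posts is a collection and /blogs/11111/posts/22222 is an instance, while a path ending in /summary after an id would wrongly be reported as a collection.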
8.2. Base URL
- The base URL is the root of the API.
- Avoid complex base URLs:
https://www.ibm.com/index.aspx/lastCompanyWeBought/service/rest/
- Make it look fun:
https://api.wishtack.com
8.3. Media Type
- The media type (Content-Type header) which is commonly used is application/json.
- It is also common to define a custom media type for an API or depending on the “standard” we’ll be using.
- Example: application/vnd.github+json.
- Media types like application/vnd* are vendor-specific rather than shared standards and can cause trouble with some libraries or connectors (ex.: Web Application Firewall).
- It is also possible to return an HTML response (presentation, documentation, demo, etc…) when the client doesn’t provide the right media type in the Accept header.
- It’s nice…
- … but not convenient at all. Who has never tested a ReST API’s URL directly in the browser?
- Later, we’ll see that for security reasons, requests that don’t present the right Content-Type header should be rejected as soon as possible.
8.4. Versioning
- Given that ReST APIs are meant to be used from multiple sources (mobile / web / desktop / partners / public clients…), they evolve at a different pace than the clients’ code; therefore, we need some versioning for our APIs.
- Versioning can be achieved with the two following ways:
- Media type.
- URL.
8.4.1. Media type versioning
Media type versioning uses the standard HTTP headers Accept and Content-Type.
The client indicates the API version it wants to use with the Accept header :
Accept: application/vnd.wishtack.v3+json
The API returns the data using the right media type and sets the Content-Type to that value.
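On the API side, reading the requested version could look like the sketch below. The `requested_version` function and the fallback to version 1 are assumptions made for the example; the media type layout follows the Accept header shown above.

```python
import re

# Assumed media type layout: application/vnd.wishtack.v<N>+json
VERSIONED_MEDIA_TYPE = re.compile(r"^application/vnd\.wishtack\.v(\d+)\+json$")


def requested_version(accept_header, default=1):
    """Return the API version named in an Accept header, or the default."""
    for item in accept_header.split(","):
        # Drop parameters such as ";q=0.9" and normalize the case.
        media_type = item.split(";")[0].strip().lower()
        match = VERSIONED_MEDIA_TYPE.match(media_type)
        if match:
            return int(match.group(1))
    return default
```

This sketch returns the first versioned media type it finds and ignores quality values, which a real content negotiation would have to take into account.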
- Nice but can get slightly tricky.
- What should we do if the new version is using a different language or a different platform?
- We would be forced to use a custom proxy for balancing.
8.4.2. URL versioning
- Given the potential balancing barrier introduced by media type versioning, why not use a standard balancing mechanism… but which one?
- We could use the URL path and balance with something like virtual hosts, but this would still need a proxy (a slightly more classic one this time).
https://www.wishtack.com/api/v1
- Any better idea?
- Yes! The DNS!
https://v1.api.wishtack.com : NodeJS hosted on Amazon.
https://v2.api.wishtack.com : Python hosted on Heroku.
https://v3.api.wishtack.com : Python hosted on Heroku using a common monorepo with the v2.
8.5. “id” property
The “id” property must be uniform.
Resources have a unique id field which is named “id” by convention.
8.6. Polymorphism
Sometimes, a collection resource can contain multiple resources which are slightly different : books and movies.
- First, we should harmonize the resources as much as we can. For example, books and movies both have a price; it should be in the same format and in the same field. Even if that is not the case in your data model (ex. : scraping), use some kind of computed field. We could imagine something (probably stupid) like calculating the price depending on the movie duration.
- We will then add to the resource a type field that should be fully documented.
- This will allow the client to remap the resources to the associated classes.
- Polymorphism abuse can harm your API’s health!
[
  {
    "id": "1",
    "author": {"id": "3"},
    "price": {"amount": 10, "currency": "EUR"},
    "type": "book"
  },
  {
    "duration": 5400,
    "id": "2",
    "price": {"amount": 6, "currency": "USD"},
    "type": "movie"
  }
]
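On the client side, the remapping based on the type field could be sketched as follows. The `Price`, `Book` and `Movie` classes and the `parse_item` function are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Price:
    amount: float
    currency: str


@dataclass
class Book:
    id: str
    price: Price
    author_id: str


@dataclass
class Movie:
    id: str
    price: Price
    duration: int


def parse_item(data):
    """Map a raw resource to a class according to its documented type field."""
    price = Price(**data["price"])  # same field and format for every type
    if data["type"] == "book":
        return Book(id=data["id"], price=price, author_id=data["author"]["id"])
    if data["type"] == "movie":
        return Movie(id=data["id"], price=price, duration=data["duration"])
    raise ValueError(f"unknown resource type: {data['type']}")


items = [
    parse_item({"id": "1", "author": {"id": "3"},
                "price": {"amount": 10, "currency": "EUR"}, "type": "book"}),
    parse_item({"duration": 5400, "id": "2",
                "price": {"amount": 6, "currency": "USD"}, "type": "movie"}),
]
```

Because the price is harmonized, the client parses it once for every type; only the type-specific fields need a dedicated branch.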
8.7. Datetime
- How should we format the date and time with ReST APIs?
- We will avoid using the word “timestamp” as most of the time it refers to the unix timestamp.
- There’s no debate about this point. ISO 8601 is available since 1997 (at Wishtack, we were 12 years old spending our time setting up our IP addresses to play Counter-Strike).
- https://www.w3.org/TR/NOTE-datetime
- https://www.iso.org/obp/ui#iso:std:iso:8601:-2:dis:ed-1:v1:en
1997-07-16
1997-07-16T19:20:01.003Z
- Make your clients’ lives easy by converting datetimes to UTC.
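A minimal sketch of this conversion with Python’s standard library is shown below. The `to_api_datetime` name and the choice to reject naive datetimes are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def to_api_datetime(value):
    """Serialize an aware datetime as ISO 8601 in UTC with millisecond precision."""
    if value.tzinfo is None:
        raise ValueError("naive datetimes are ambiguous; attach a timezone first")
    utc_value = value.astimezone(timezone.utc)
    return utc_value.isoformat(timespec="milliseconds").replace("+00:00", "Z")


# A local time two hours ahead of UTC, used here as sample input.
utc_plus_two = timezone(timedelta(hours=2))
local = datetime(1997, 7, 16, 21, 20, 1, 3000, tzinfo=utc_plus_two)
```

Calling `to_api_datetime(local)` returns the same instant expressed in UTC, in the format of the second example above.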
8.8. “Association Resource”
Given the following resource:
/users/123ab/friends
[
  {
    "id": ...,
    "firstName": ...,
    "type": "user"
  }
]
- How should we represent the datetime of the creation of the link between users?
We could create a collection resource that represents these links.
/friendships?userId=123ab
[
  {
    "id": "FRIENDSHIP_ID_1",
    "creationDateTime": "2017-01-01T18:16:00.000Z",
    "friend": {
      "id": ...,
      "type": "user"
    }
  }
]
…and the instance resource
/friendships/FRIENDSHIP_ID_1
8.9. Subjective thoughts about H.A.T.E.O.A.S. and Semantic Web
- Well, it’s nice…
- …but how can we integrate this in our real life applications?
- Suppose, we have the following objects and methods:
// GET https://v1.api.wishtack.com/users/SOME_USER_ID
userStore.getUser({userId: 'SOME_USER_ID'});
// GET https://v1.api.wishtack.com/users/SOME_USER_ID/wishes
wishStore.getWishList({userId: 'SOME_USER_ID'});
- Now, suppose that the API responds with the following data:
{
  "href": "https://api-v1.wishtack.com/users/SOME_USER_ID",
  "wishes": {
    "href": "https://api-v1.wishtack.com/users/SOME_USER_ID/wishes"
  }
}
- How can we reuse the wishStore.getWishList method?
- Should we add an additional wishStore.getWishListByUrl method?
- What about consistency?
- Data is not properly canonicalized.
- URLs are duplicated and take lots of space in the exchanged data
- How can we retrieve the resource id if we are not willing to use the href as an id?
- The URL makes us lose precious canonicalized information. A more canonical way of describing the resource would look something like this:
{
  "baseUrlList": [
    "https://api.wishtack.com",
    "https://api-backup.wishtack.com"
  ],
  "resourcePath": [
    {"id": "USER_ID", "type": "user"},
    {"id": "WISH_ID", "type": "wish"}
  ]
}
- Otherwise, how could we automatically switch from the principal API to a backup API without parsing and reconstructing the URL?
- Can we trust the API enough to naively trust the URLs it’s giving us?
- We would at least need the following information:
- Resource id.
- Resource type or a schema repository. I.A.N.A. ?
- Mapping information to map the type to the information we need to construct the URL (Base URL and Path).
- Affordances: What can I do with the resource?
It’s simply based on the utopia (or the future?) of the Semantic Web. This needs canonicalization standards and wide adoption.
For the moment, there are only a few attempts at this and they look more like an HTMLization of ReST APIs. They are associated with familiar names like : Richardson, Amundsen and Foster.
Application-Level Profile Semantics : http://alps.io/spec/drafts/draft-01.html
Interesting but quite far away from ReST conventions and the pragmatism we are looking for.
8.10. Why you should apply these best practices
In addition to genericity, readability and ease of use, these best practices allow us to write generic libraries and connectors without even knowing what the API is about.
- A cache connector could easily:
- Retrieve the next page of a paginated resource by anticipation.
- Retrieve a resource instance from the collection in cache.
- MISS: /products
- HIT: /products/1234
- Handle synchronization between local and API data (Progressive Web Apps).
- https://github.com/wishtack/wishtack-steroids/tree/develop/packages/rest-cache
- A generic connector could handle resource access authorization and even handle the filtering of readonly or hidden properties.
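The MISS / HIT scenario above can be sketched with a small generic cache that knows nothing about products. The `CollectionCache` class is hypothetical; it only relies on the conventions described earlier (an “id” field, and instances living at the collection path followed by the id).

```python
class CollectionCache:
    """Hypothetical cache that serves instances out of fetched collections."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable taking a path, returning parsed JSON
        self._instances = {}

    def get(self, path):
        if path in self._instances:
            return self._instances[path], "HIT"
        data = self._fetch(path)
        if isinstance(data, list):
            # Relies on the conventions: every item has an "id" and lives
            # at <collection path>/<id>.
            for item in data:
                self._instances[f"{path}/{item['id']}"] = item
        else:
            self._instances[path] = data
        return data, "MISS"


calls = []


def fake_fetch(path):
    # Stand-in for the HTTP call, recording which paths reach the API.
    calls.append(path)
    return [{"id": "1234", "name": "Lamp"}]


cache = CollectionCache(fake_fetch)
_, first = cache.get("/products")
product, second = cache.get("/products/1234")
```

The second call never reaches the API: the instance is served from the collection fetched by the first call. Expiration and invalidation are deliberately left out of this sketch.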
9. “standards” (or almost)
From the very beginning of the design of even our simplest ReST APIs, we have to make lots of important choices and we feel like kids at “Toys R Us”.
- Why are these choices so important?
- After all, we could think that a good documentation would be enough.
- The problem is that we will soon meet some barriers with libraries, frameworks and connectors that are highly based on ReST conventions.
- Even better than conventions, we would need a standard that covers most of the following points:
- Data exchange format.
- Resource typing.
- Linking.
- Pagination.
9.1. JSON API
- http://jsonapi.org/
- Created by the co-founder of http://www.tilde.io, a consulting company (about jsonapi?)
- It’s a specification but not a standard yet.
- Cool :
- Strict format but extensible.
- Standardizing parameters for sorting, filtering and pagination. (the pagination implementation is free).
- The idea of resource linking using “relationships” is interesting.
- Not cool :
- Collision risks between fields in attributes and relationships.
- No difference between a link to an instance or a collection resource.
- one-to-one relationships are ambiguous and don’t respect the URL structure : /resources/:resourceId/sub-resources/:subResourceId
- Warning, examples in the specification use untraditional naming conventions but they are not defined in the specification.
- kebab-case fields.
- type field doesn’t have to be plural.
- The idea of using bulk operations to update relationships is really interesting but it’s not applied to other kinds of resources.
- There are many implementations but most of them are not really maintained.
9.2. H.A.L.
- Hypertext Application Language.
- Created by the founder of http://stateless.co, a consulting company.
- This is not a standard yet.
{
  "_links": {
    "self": { "href": "/orders" },
    "next": { "href": "/orders?page=2" },
    "find": { "href": "/orders{?id}", "templated": true }
  },
  "_embedded": {
    "orders": [{
      "_links": {
        "self": { "href": "/orders/123" },
        "basket": { "href": "/baskets/98712" },
        "customer": { "href": "/customers/7809" }
      },
      "total": 30.00,
      "currency": "USD",
      "status": "shipped"
    },{ ... }]
  },
  "currentlyProcessing": 14,
  "shippedToday": 20
}
- Cool :
- Simple, clear and easy to implement.
- Templated links are promising and allow the decoupling of the client code and the API’s interface.
- Curies allow us to easily link resources to their documentation (and why not the schemas but this is not defined by H.A.L.).
- Not cool :
- As its name indicates, H.A.L. focuses only on linking. The perimeter is limited.
- The _embedded property is quite useless and can cause conflicts with resource properties.
- There are many implementations but most of them are not really maintained.
9.3. JSON LD
- A JSON-based Serialization for Linked Data.
- https://www.w3.org/TR/json-ld/
- Created by authors who are closely associated with the Semantic Web world, RDF etc…
- This is not a standard but a W3C recommendation.
- Example : http://json-ld.org/playground/
- Cool :
- Uses contexts from schema.org.
- It is possible to create custom contexts.
- Not cool :
- Inherits from XML / RDF culture.
- There are some implementations in the wild but most of them are not maintained.
9.4. Other initiatives
- Collection+json https://github.com/collection-json/spec
- Hydra http://www.markus-lanthaler.com/hydra/
- ReST vocabulary for JSON LD.
- Allows the addition of affordances.
9.5. So what?
Memento :
- JSON API, H.A.L. and JSON-LD cover different parts that collide sometimes.
- JSON-LD is the one receiving most support from the Hypermedia community.
- There is still no leading standard; pick the best of each depending on your needs.
- Don’t forget that you’ll have to parse, serialize and link your APIs’ resources in the languages you are using and also in those used by your partners. The more complex your exchange format is, the more it will hurt the adoption of your API.
10. Tools
10.1. Swagger
Swagger is a framework that helps you define and generate documentation for your ReST APIs.
10.2. Postman
Postman is a ReST API client which comes in handy to analyze, experiment with and debug your ReST APIs.
You should definitely try the Chrome extension that allows you to analyze the traffic and replay requests or scenarios.
10.3. Sandbox
Best playground ever to create mock ReST APIs.
10.4. JSON Generator
JSON Generator allows you to easily generate JSON data for your unit tests for example.
http://www.json-generator.com/
11. ReST APIs security
Modern and distributed architectures including ReST APIs are exposing us to new security risks.
It’s just the same for some modern authentication and authorization mechanisms where specifications are not strict enough.
11.1. Authentication and session management
11.1.1. What do we need?
- Session management?
- Not really. ReST APIs are stateless!
- Session information should be handled by the client.
- Data which is persisted by the ReST API is associated to resources.
- Resources can expire:
GET /carts/1234 => 404 Not Found
- Don’t smurf your API.
SMURF /smurf-api/sessions/current
- Authentication
- Ideally, we should require an authentication mechanism even if the data presented by the API is public.
- Identification
- If necessary. Authentication and identification are completely distinct matters.
- We can authenticate a user without identifying him.
- We can also identify a user without authentication but we would have no guarantee of their identity.
- Logout and revocation
- Logout can be triggered by another source than the final user.
- Authentication tokens must not be transmitted in the URL.
GET /users/123?token=asdf....
- Basic-Auth authentication must not be used.
- Tokens must be transmitted using the Authorization header:
Authorization: Bearer xxxxxx, Extra yyyyy
11.1.2. Authentication mechanisms
- We will browse the available authentication mechanisms later.
- In general (with some slight changes), most authentication mechanisms work as such:
- Authentication service provides a unique token to the client.
- The client presents the token to ReST APIs of a service provider.
- The service provider will then derive resource access authorizations from the token.
11.1.3. Client-side session management
- The most complex case is the one where the client application runs on the browser.
- If you want to persist data in the browser so you can recover the same state when the user opens another window or refreshes the page:
- Do not use cookies at least for the following reasons:
- It’s not their purpose.
- You don’t want to send all this data to the ReST API with each request.
- Cookies are EVIL!
- IndexedDB is made for this but it’s not globally supported yet. The localStorage is an interesting backup solution.
- Problem 😦
- IndexedDB and localStorage cannot be expired (except on Firefox: https://developer.mozilla.org/en-US/docs/Web/API/IDBFactory/open).
- Have a look at the storages on your browser after logout or after closing the browser. You’ll be impressed to see what we can find there.
- Secure Storage
- Until we get an in-the-browser solution, it is recommended to encrypt all the locally stored data using a temporary key which should be unique for each client and retrieved from the ReST API.
- https://github.com/jas-/crypt.io
11.2. Authorization and permission management
- Authorization and access permissions on a resource depend on the clients’ credentials.
- For the same resource, the ReST API can respond with different data depending on the client’s authorization and permissions. This doesn’t break ReST rules.
- For example, for the same “user” resource, an “owner” role and a “friend” role will not have access to the same operations and properties:
With “owner” role
GET /users/123 (with owner role)
{
  "id": "123",
  "firstName": "John",
  "lastName": "DOE",
  "address": {
    "street": "...",
    ...
  }
}
With “friend” role
GET /users/123 (with friend role)
{
  "id": "123",
  "firstName": "John",
  "lastName": "DOE"
}
- There are three authorization levels:
- Resource level: Binary access authorization to the resource.
- Verb level: Methods that are authorized on a resource (create / read / update / delete).
- Property level: Authorization management by resource property (read / write / mask / restricted values per role). Sadly, this authorization level is most of the time omitted by ReST API libraries and frameworks.
- For example, a “post” resource on a blog can have a “state” property that can take the following values: “draft”, “private” or “published”.
- Users with an administrator role can modify this property.
- Users with an editor role can modify any other property except this one.
- Whatever the implementation you choose, permissions must be constructed using a “whitelist” logic.
- In order to respect separation of concerns, improve scalability and simplify the implementation and the readability of the ReST API, permission management should be implemented on a dedicated connector.
This way, we can split the business and permission logic implementations.
The permissions implementation can be written as a framework’s middleware that could be migrated later to a dedicated connector or micro-service.
- This connector is similar to an A.C.L. (Access Control List) that we can find on file systems or firewalls.
- At Wishtack, we are using a dedicated Python middleware.
- Some links for the lucky ones who are implementing their APIs in Python.
http://www.django-rest-framework.org/tutorial/4-authentication-and-permissions/#object-level-permissions
http://django-tastypie.readthedocs.io/en/latest/authorization.html
- There’s also a NodeJS alternative, https://github.com/nyambati/express-acl, but it is not as easy to customize as django-rest-framework, especially for property level permissions.
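To make the property level and the “whitelist” logic more concrete, here is a minimal sketch, independent of any framework. The `READABLE_PROPERTIES` table and the `filter_readable` function are hypothetical; this is not the Wishtack middleware.

```python
# Hypothetical whitelist: role -> properties that role may read.
READABLE_PROPERTIES = {
    "owner": {"id", "firstName", "lastName", "address"},
    "friend": {"id", "firstName", "lastName"},
}


def filter_readable(resource, role):
    """Keep only the properties explicitly allowed for the role."""
    allowed = READABLE_PROPERTIES.get(role, set())  # unknown role: nothing
    return {key: value for key, value in resource.items() if key in allowed}


user = {
    "id": "123",
    "firstName": "John",
    "lastName": "DOE",
    "address": {"street": "1 Example Street"},
}
```

With a whitelist, a new property added to the resource stays hidden until a role is explicitly granted access to it, and an unknown role sees nothing.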
11.3. Validation: Canonicalization, Escaping & Sanitization
- All the data exchanged with the ReST API must be validated by the API.
- Validation should also be done on the client side to avoid useless requests.
11.3.1. Canonicalization
- ReST APIs should convert the received data to their canonical form or reject them.
{
  "firstName": "joHn",
  "lastName": " DoE",
  "url": "myWebsite.com"
}
- This example should be transformed or better, simply rejected.
{
  "firstName": "john",
  "lastName": "doe",
  "url": "https://mywebsite.com"
}
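Both options (transform or reject) can be sketched as follows. The canonicalization rules used here (trim, lowercase, default to https) are assumptions inferred from the example above, and lowercasing a whole URL is only acceptable because this one has no path.

```python
def canonicalize(user):
    """Return the canonical form of the payload (assumed rules, see above)."""
    url = user["url"].strip().lower()
    if not url.startswith(("http://", "https://")):
        url = "https://" + url
    return {
        "firstName": user["firstName"].strip().lower(),
        "lastName": user["lastName"].strip().lower(),
        "url": url,
    }


def validate(user):
    """Strict variant: reject anything that is not already canonical."""
    if user != canonicalize(user):
        raise ValueError("payload is not in canonical form")
    return user
```

The strict variant is the simpler one to reason about: the API never stores anything other than what the client actually sent.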
11.3.2. Escaping
- ReST APIs should not escape content.
- For example, on a blog, the following comment can make sense:
<img src="not-found.png" onerror=alert(1)>
- It’s up to the client to handle escaping and avoid attacks like XSS.
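For a client that renders this comment as HTML, escaping at display time could look like this minimal sketch, which uses Python’s standard `html` module for illustration (a browser client would rely on its templating engine instead).

```python
import html

# The comment is stored and returned by the API exactly as it was written.
comment = '<img src="not-found.png" onerror=alert(1)>'

# The client escapes it only when inserting it into an HTML page.
escaped = html.escape(comment)
```

The escaped value is displayed as text by the browser instead of being interpreted as an img tag.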
11.3.3. Sanitization
- Sanitization is a dangerous game where a potentially malicious payload is removed.
- For the previous example, sanitization would remove the onerror attribute.
<img src="not-found.png">
- Again, it’s up to the client to handle this.
- The problem with sanitization is that we can always try to find ways to bypass it.
- There are some sanitization escaping experts out there 🙂 http://n0p.net/penguicon/php_app_sec/mirror/xss.html
11.4. Cookies are EVIL
- Cookies don’t play well with C.O.R.S. (Cross Origin Resource Sharing) that we will see later.
- Cookies expose us to vulnerabilities like C.S.R.F. (Cross Site Request Forgery). We’ll dive deeper into this later.
- Clients are not always browsers (Mobile, Desktop, Micro-Services, Partners…)
- Cookies are sent with each request even for public resources like static assets.
- Cookies are strongly coupled with the session. With the Authorization header we can send multiple requests simultaneously but with different tokens.
Example: a session with multiple user accounts without any link between the users. It’s the case with google services / twitter / facebook etc… or the “see my facebook page as X” feature on facebook.
11.5. C.O.R.S.
- Cross-Origin Resource Sharing. https://www.w3.org/TR/cors/
- Origin : https://tools.ietf.org/html/rfc6454
- The origin is a composition of the following items:
- Scheme : http / https / …
- FQDN : www.attacker.com / api.target.com / …
- Port : 80 / 443 / 8000
- Suppose we have two origins: https://attacker.com and https://target.com.
- By default, if the https://attacker.com application sends a GET request from the browser (JavaScript) to the https://target.com origin:
- This request wouldn’t send any cookie from either of these two origins.
- The browser will then analyze some C.O.R.S. headers (that we’ll see later) but without them, the application will not be able to read the response and the following error is produced.
Error: No 'Access-Control-Allow-Origin' header is present on the requested resource
- If it’s not a GET request (or a similar one like HEAD…), the browser will send a preflight request with the OPTIONS verb to check whether the request is authorized for that origin.
- More precisely, browsers will send a preflight request if the method is anything other than GET, HEAD or POST, if the request has extra headers, or if the Content-Type has a value other than application/x-www-form-urlencoded, multipart/form-data or text/plain.
- These mechanisms have been created in order to avoid C.S.R.F. (Cross Site Request Forgery) attacks.
- But sadly, the first result we stumble upon after googling the error message is the following stackoverflow issue:
http://stackoverflow.com/questions/20035101/no-access-control-allow-origin-header-is-present-on-the-requested-resource
- It presents the following solutions:
- “The easy way is to just add the extension in google chrome to allow access using CORS.”
https://chrome.google.com/webstore/detail/allow-control-allow-origi/nlfbmbojpeacfghkpbjhddihlkkiljbi?hl=en-US
chrome.exe --user-data-dir="C:/Chrome dev session" --disable-web-security
- “It’s very simple to solve if you are using PHP. Just add the following script in the beginning of your PHP page which handles the request:”
<?php header('Access-Control-Allow-Origin: *'); ?>
- …nothing forbids the vulnerabilization of the web.
- Luckily, after setting up the Access-Control-Allow-Origin: * header, the request sent from the https://attacker.com origin to https://target.com doesn’t contain any cookies.
- To send them, we have to activate the withCredentials option of the XHR object (or credentials: 'include' with the fetch function).
- The request will then contain the https://target.com cookies, but once again C.O.R.S. rules are saving our lives because we can’t retrieve the response if Access-Control-Allow-Origin is set to *.
- In order to be allowed to send cookies and retrieve the response from another origin, the target API should set the Access-Control-Allow-Credentials header, but once again this is not enough. This feature is intentionally disabled if the Access-Control-Allow-Origin header is set to *.
- We would have to set a whitelist of allowed origins… but still with all these intentional barriers, some people are really determined:
http://stackoverflow.com/questions/26411480/angularjs-a-wildcard-cannot-be-used-in-the-access-control-allow-origin-he
Please don’t!!!
- After this ultimate step, any application from any origin can freely communicate with the API using the credentials (cookies) of the current user.
- As a precaution, even if the API is not using cookies, we should avoid using the * value for the Access-Control-Allow-Origin header.
- A whitelist-based logic on the API is safer. It should check the Origin header from the request and respond with the same origin in the Access-Control-Allow-Origin header if the origin is whitelisted.
- The “whitelist” verification should be strict! Checking the FQDN only is not enough.
- Think about implementing a rule on your W.A.F. (Web Application Firewall), middlewares or security monitoring to detect and block HTTP responses containing the Access-Control-Allow-Credentials header.
- WARNING! Client certificates and HTTP basic authentication are also considered credentials, so they raise the same issues as the ones we meet here with cookies.
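The whitelist logic described above can be sketched as follows. This is only a minimal illustration, not a complete C.O.R.S. implementation: the function name `corsHeadersFor` and the `https://admin.wishtack.com` origin are assumptions made for the example, and preflight handling is left out.

```javascript
// Hypothetical whitelist: full origins (scheme + host + port), never bare domain names.
const ALLOWED_ORIGINS = new Set([
  'https://www.wishtack.com',
  'https://admin.wishtack.com', // assumed second origin, for illustration only
]);

/**
 * Returns the C.O.R.S. headers to add to a response for the given request Origin header.
 * Unknown or missing origins get no C.O.R.S. header at all, so the browser blocks the read.
 */
function corsHeadersFor(originHeader, allowedOrigins = ALLOWED_ORIGINS) {
  // Strict equality on the whole origin string: no regex, no "endsWith", no "includes".
  if (typeof originHeader !== 'string' || !allowedOrigins.has(originHeader)) {
    return {};
  }
  return {
    'Access-Control-Allow-Origin': originHeader,
    // The response now depends on the Origin header, so caches must key on it.
    'Vary': 'Origin',
  };
}
```

Comparing the full origin means that `http://www.wishtack.com` (wrong scheme) and `https://www.wishtack.com.attacker.com` (look-alike FQDN) are both rejected.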
11.6. C.S.R.F.
- Cross Site Request Forgery is an in-the-browser attack with the following scenario:
- A user (the victim) must be authenticated on the vulnerable application.
- The attacker must trick the victim to visit a web application he controls (totally or partially).
- When the victim visits the application (controlled by the attacker), the attacker triggers operations on the vulnerable application while implicitly using the victim’s credentials (as he is authenticated on the vulnerable application).
- We assume that a GET request never triggers a sensitive operation; otherwise, the attacker just needs to redirect the victim to the URL that triggers the operation.
- If C.O.R.S. rules are deactivated with one of the means described in the C.O.R.S. chapter above, the attacker can simply trigger a POST request of his choice on the vulnerable application using the victim’s credentials.
- If the origins whitelist is not strictly checked, the attacker could possibly take control of the http domain (using ARP spoofing, DNS poisoning or any other attack of this kind) and use it to target the https application.
11.7. C.S.R.F. & Content-Type
- A common mistake is to accept media types other than application/*json (Content-Type header).
- What does that imply?
- Without any C.O.R.S. configuration mistake, accepting the application/x-www-form-urlencoded mime type allows the attacker to create a form and trigger a simple POST request.
<form method="POST" action="https://app.vulnerable.com/api/products/0/admins">
  <input name="email" value="pwned.by@attacker.io">
  <input name="grants" value="all">
</form>
<script>document.querySelector('form').submit()</script>
- In that case, most frameworks (Ex. : expressjs) will parse the request’s body and hand an object like this to the developer:
{ email: 'pwned.by@attacker.io', grants: 'all' }
- ReST APIs frameworks and libraries should only activate the JSON parser.
- …but suppose it’s activated on all “media types” and more precisely on the text/plain media type to simplify the lives of client developers.
- In that case, the attacker only has to adapt the form as below:
<form method="POST" action="https://app.vulnerable.com/api/products/0/admins" enctype="text/plain">
  <input name='{"email": "pwned.by@attacker.io", "grants": "all", "extra": "' value='"}'>
</form>
- This will send the following body in the request:
{"email": "pwned.by@attacker.io", "grants": "all", "extra": "="}
- The API will use the JSON parser even though the media type is set to text/plain and the developer will be handed the following object:
{ email: 'pwned.by@attacker.io', grants: 'all', extra: '=' }
- Media types should be strictly checked.
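A strict media type check could look like the sketch below. The function names are hypothetical and, as a simplifying assumption, only `application/json` is accepted (a real API may also want to accept its own `application/*+json` types explicitly).

```javascript
/**
 * Returns true only when the Content-Type header is exactly application/json,
 * optionally followed by parameters such as "; charset=utf-8".
 */
function isJsonContentType(contentTypeHeader) {
  if (typeof contentTypeHeader !== 'string') {
    return false;
  }
  // Drop the parameters and compare the media type itself, case-insensitively.
  const mediaType = contentTypeHeader.split(';')[0].trim().toLowerCase();
  return mediaType === 'application/json';
}

/**
 * Parses a request body only if the declared media type is JSON.
 * Returns null when the request must be rejected (e.g. with 415 Unsupported Media Type).
 */
function parseJsonBodyStrictly(contentTypeHeader, rawBody) {
  if (!isJsonContentType(contentTypeHeader)) {
    return null;
  }
  return JSON.parse(rawBody);
}
```

With this check, the text/plain form trick is rejected before the JSON parser ever sees the body.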
C.S.R.F. Mitigation
- If your application still can’t give up using credentials like cookies, basic auth and client certificates, a common mitigation solution is to set a non-http-only cookie containing a random and unpredictable token.
- The client application (JavaScript) must read the cookie and set it in the Authorization header (Ex. : Authorization: Bearer ..., Csrf: ...) for every request sent to the API.
- The API doesn’t have to store the token. It should only compare the csrf cookie with the csrf token in the Authorization header and reject the request if the values don’t match.
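Here is a minimal sketch of this mitigation on the API side. The function names are assumptions; reading the cookie and parsing the Authorization header depend on the framework and are left out.

```javascript
const crypto = require('node:crypto');

// Generates the random, unpredictable token to store in the non-http-only cookie.
function generateCsrfToken() {
  return crypto.randomBytes(32).toString('hex');
}

/**
 * Compares the token read from the cookie with the token read from the request header.
 * Nothing is stored server-side: the request is accepted only if both values match.
 */
function isCsrfTokenValid(cookieToken, headerToken) {
  if (typeof cookieToken !== 'string' || typeof headerToken !== 'string') {
    return false;
  }
  const a = Buffer.from(cookieToken);
  const b = Buffer.from(headerToken);
  // timingSafeEqual throws when lengths differ, so lengths are checked first.
  if (a.length === 0 || a.length !== b.length) {
    return false;
  }
  return crypto.timingSafeEqual(a, b);
}
```

The protection relies on the fact that a page from another origin can make the browser send the cookie but cannot read it, so it cannot copy its value into the header.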
11.8. C.S.R.F. & Resource Linking
- We’ve shared before our concerns about resource linking trust in general.
- In general, a client (browser or other) communicates with multiple APIs.
- A malicious or clumsy API response could make the client mistakenly send a request to an unpredictable API while containing the client’s token or any other confidential information.
{ "firstName": "Foo", "address": { "href": "https://api.attacker.com/" } }
- A partial protection could use C.S.P. (Content Security Policy)
connect-src
rules.
https://w3c.github.io/webappsec-csp/
- Otherwise, it is recommended to implement or use an HTTP library using strict whitelist checking or absolute URLs rejection even though relative URLs can also be malicious.
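The whitelist checking mentioned above could be sketched like this in the client’s HTTP layer. The `https://api.wishtack.com` origin and the function name are hypothetical.

```javascript
// Hypothetical list of API origins the client is allowed to send its token to.
const TRUSTED_API_ORIGINS = new Set(['https://api.wishtack.com']);

/**
 * Resolves a link found in an API response against the URL of that response and
 * returns the absolute URL only if its origin is whitelisted, otherwise null.
 */
function resolveTrustedLink(href, responseUrl, trustedOrigins = TRUSTED_API_ORIGINS) {
  let url;
  try {
    // Relative links are resolved first, so "//api.attacker.com/" is caught too.
    url = new URL(href, responseUrl);
  } catch (error) {
    return null;
  }
  return trustedOrigins.has(url.origin) ? url.href : null;
}
```

Resolving before checking matters: a scheme-relative link such as `//api.attacker.com/` looks relative but points to another origin.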
11.9. OAuth 2
- Despite what its name might suggest, OAuth 2 is an authorization protocol, not an authentication protocol.
- OAuth 2 is one of the most common authorization standards on the web (IETF stage: Proposed Standard).
11.9.1. OAuth 2 Roles
- OAuth 2 describes 4 roles:
- Resource Owner: An entity that has the legitimacy and decisional power allowing it to grant access to one or multiple protected resources.
- Ex.: A Google services user allowing a calendar aggregation application to access his personal and professional calendars.
- Resource Server: This service holds the protected resources. It is capable of responding to resource access requests depending on the presented access tokens.
- Ex. : Google Calendar.
- Client: An application sending requests to the Resource Server on behalf of the Resource Owner using its authorizations. The Client can be an entire frontend (Web / Progressive Web App / Mobile Web App / Desktop etc…) or backend application.
- Ex. : The calendar aggregation application.
- We will distinguish two types of Clients
- Confidential (Ex. : Backend)
- Public (Ex. : Frontend)
- Authorization Server : A server that provides access tokens after the Resource Owner authenticates and grants authorizations.
- Ex. : Google accounts.
11.9.2. OAuth 2 Flows
11.9.2.1. OAuth 2 Grant Type: Authorization Code
OAuth 2 suggests 4 different flows and here is the most common one:
- 1 – The Client redirects the Resource Owner to the Authorization Server:
https://auth.wishtack.com/v1/oauth/authorize?
    response_type=code
    &client_id=CLIENT_ID
    &redirect_uri=https://www.wishtack.com/oauth // optional
    &scope=read
    &state=... // state is recommended thus optional 😢
- 2 – The Resource Owner then confirms or rejects the requested access authorizations on the interface presented by the Authorization Server.
- 3 – The Client receives an Authorization Code through a redirect:
https://www.wishtack.com/oauth?code=AUTHORIZATION_CODE&state=...
- 4 – The Client can then send a request to the Authorization Server in order to exchange the Authorization Code for an Access Token.
POST https://auth.wishtack.com/v1/oauth/token
    client_id=CLIENT_ID
    &client_secret=CLIENT_SECRET
    &grant_type=authorization_code
    &code=AUTHORIZATION_CODE
    &redirect_uri=https://www.wishtack.com/oauth
- 5 – In case of success, the Client receives an Access Token and an optional Refresh Token.
{
"access_token":"ACCESS_TOKEN",
"token_type":"bearer",
"expires_in":2592000,
"refresh_token":"REFRESH_TOKEN",
"scope":"read",
"wishtack_user_data":{
"first_name":"John",
"last_name": "DOE",
"email":"j.doe@ibm.com",
"is_cool": "definitely not!"
}
}
- When the Access Token expires and if the Client has received a Refresh Token, the Client can send a request to exchange the Refresh Token for a new Access Token and a new Refresh Token.
- The Authorization Code is a single-use code with a short lifetime (should be less than 10 minutes).
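Steps 1 and 4 of this flow can be sketched as below. The endpoint and Redirect URI are the ones from the example; the function names are assumptions, and actually sending the POST request (with fetch or any HTTP client) is left out.

```javascript
// Values taken from the example above; the client id and secret are passed by the caller.
const AUTHORIZE_ENDPOINT = 'https://auth.wishtack.com/v1/oauth/authorize';
const REDIRECT_URI = 'https://www.wishtack.com/oauth';

// Step 1: URL the Client redirects the Resource Owner to.
function buildAuthorizeUrl({ clientId, scope, state }) {
  const url = new URL(AUTHORIZE_ENDPOINT);
  // URLSearchParams takes care of percent-encoding every value.
  url.search = new URLSearchParams({
    response_type: 'code',
    client_id: clientId,
    redirect_uri: REDIRECT_URI,
    scope,
    state,
  }).toString();
  return url.href;
}

// Step 4: form-encoded body of the POST that exchanges the code for an Access Token.
function buildTokenRequestBody({ clientId, clientSecret, code }) {
  return new URLSearchParams({
    client_id: clientId,
    client_secret: clientSecret,
    grant_type: 'authorization_code',
    code,
    redirect_uri: REDIRECT_URI,
  }).toString();
}
```

Note that this sketch always sends the state parameter, even though the specification only recommends it.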
11.9.2.2. OAuth 2 Grant Type: Implicit
- The Implicit Flow is a fallback mode of the Authorization Code Flow.
- It is used when the Client is public (non-confidential) as the Authorization Code Flow is not available in that case.
- 1 – Client redirects the Resource Owner to the Authorization Server:
https://auth.wishtack.com/v1/oauth/authorize?
    response_type=token
    &client_id=CLIENT_ID
    &redirect_uri=CALLBACK_URL
    &scope=read
    &state=...
- 2 – The Resource Owner confirms or rejects the requested access authorization on the interface presented by the Authorization Server.
- 3 – Authorization Server redirects the Resource Owner to the Client that receives the Access Token in the URL’s fragment.
https://www.wishtack.com/callback#
    access_token=ACCESS_TOKEN
    &token_type=bearer
    &scope=...
    &state=...
- 4 – The User-Agent follows the redirection but the fragment doesn’t leave the device.
- 5 – The User-Agent executes the code that extracts the Access Token from the fragment.
- 6 – The Access Token is transmitted to the Client.
- Some playful User-Agents tend to lose the fragments in some cases:
https://bugs.webkit.org/show_bug.cgi?id=24175
- The Resource Owner can grab the Access Token and bypass the Client to communicate directly with the Resource Server.
- In other words, in case of a man-in-the-middle, an attacker can grab the Access Token and communicate with the Resource Server.
- Those are the reasons why it is recommended to use the Authorization Code flow instead.
11.9.2.3. OAuth 2 Grant Type: Resource Owner Password Credentials
- In really rare cases where the Resource Owner can fully trust the Client, the Resource Owner can give its credentials directly to the Client.
- The Client sends the credentials to the Authorization Server to obtain the Access Token directly.
POST https://auth.wishtack.com/v1/oauth/token
    grant_type=password
    &username=...
    &password=...
    &client_id=CLIENT_ID // If client is confidential
    &client_secret=CLIENT_SECRET // If client is confidential
- This flow is rarely implemented for the following reasons:
- Resource Owner can’t validate the requested authorizations.
- It only allows password authentication.
- Clients are not designed to handle credentials.
- This flow is the kind of catch-all “default” case (as in a switch/case) that we can find in many specifications in order to increase the adoption of the protocol.
- OpenID Connect doesn’t forbid the use of this flow in order to be compatible with OAuth 2 but it’s not mentioned in the specification.
11.9.2.4. OAuth 2 Grant Type: Client Credentials
- The Client can request an Access Token from the Authorization Server in order to access its own resources.
POST https://auth.wishtack.com/v1/oauth/token
    grant_type=client_credentials
    &client_id=CLIENT_ID
    &client_secret=CLIENT_SECRET
11.9.3. OAuth 2 Registration
- Before a Client can communicate with an Authorization Server, it has to register first.
- This registration can be done in different ways (outside the scope of the OAuth 2 specification):
- API.
- Applicative interface.
- Offline (paperwork).
- A mix of all the previous ways.
- During this process, the Authorization Server should at least obtain the following information:
- Client type : confidential or public.
- Redirect URIs : The URI or URIs that the Authorization Server should redirect the Resource Owner to after authorization validation.
- If the Client type is public, the Authorization Server will not provide a Client Secret but it can constrain the access to resources.
11.9.4. OAuth 2 Risks and Recommendations
- TLS EVERYWHERE! Unfortunately, OAuth 2 doesn’t require the usage of TLS for Redirect URIs but it highly recommends it.
- The Authorization Server must verify that the Client really owns the domain name associated to Redirect URIs. (Ex.: Check that the client can add a random and unpredictable DNS TXT entry generated by the Authorization Server)
- The Authorization Server should allow the Client to renew the Client Secret quickly and automatically.
- The Authorization Server should force the Clients to renew their Client Secrets regularly.
- The Client Secret is often stored in an environment variable on the application server.
- A simple configuration error can compromise the whole system by revealing the Client Secret.
- The Authorization Server can allow the Client to programmatically update the Redirect URIs by presenting the Client ID and the Client Secret; therefore, revealing the Client Secret can have a huge security impact.
- Redirect URIs are absolute URLs (scheme, fqdn, port, path) and must not contain a fragment (…/path#fragment).
- The Authorization Server must check the Redirect URIs with a strict equality. https://www.wishtack.com/oauth?source=test is strictly different from https://www.wishtack.com/oauth.
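The strict equality rule for Redirect URIs can be sketched as follows on the Authorization Server side. The function name and the way registered URIs are passed in are assumptions made for the example.

```javascript
/**
 * Returns true only if the requested Redirect URI is an absolute URL without a
 * fragment and is strictly equal to one of the URIs registered by the Client.
 */
function isRedirectUriAllowed(requestedUri, registeredUris) {
  if (typeof requestedUri !== 'string' || requestedUri.includes('#')) {
    return false;
  }
  try {
    // new URL() without a base throws on relative URLs.
    new URL(requestedUri);
  } catch (error) {
    return false;
  }
  // Plain string equality: no prefix matching, no normalization, no wildcard.
  return registeredUris.includes(requestedUri);
}
```

Prefix or pattern matching is deliberately avoided here, since any looseness in this check gives an attacker room to receive the Authorization Code.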
11.9.5. OAuth 2 Substitution Attack
11.9.5.1. Attack description
- This attack assumes that the attacker and the victim are Resource Owners registered with the same Authorization Server.
- The attacker initiates an authorization code or implicit flow.
- The attacker stops the flow’s scenario at step 3 when he obtains the Authorization Code (or the Access Token when using the implicit flow).
- The attacker tricks the victim into following a link pointing to a Redirect URI containing the authorization code or the access token retrieved during the previous step (social engineering or malicious application).
- The victim follows the link to the Client and ends up authorized to access the attacker’s resources.
11.9.5.2. Some scenario examples
- Bank
- The attacker and the victim are customers at the same bank.
- The victim ends up on the bank’s application but with the attacker’s data.
- While trying to download his IBAN, the victim retrieves the attacker’s IBAN.
- Messaging
- The attacker steals the victim’s identity by creating a fake Facebook account.
- The attacker can add the same friends as the victim to this fake account.
- The attacker subscribes to a messaging application using Facebook’s OAuth 2 service.
- After the attack, the victim ends up on the messaging application with the attacker’s fake account and starts sending messages to his real friends.
- The attacker signs in to the messaging application and retrieves the private messages the victim has exchanged with his friends.
11.9.5.3. The vulnerability’s origin and the solution
- This vulnerability exists because OAuth 2 doesn’t enforce any link between the flow’s step 1 (requesting the Authorization Code or Access Token) and step 3 (retrieving the Authorization Code or Access Token).
- Luckily, there is an optional
state
parameter, initially intended to help the Client recover the initial state of the application after the authorization process.
- This parameter is now diverted from its initial goal. It is now used to mitigate this attack by verifying that the authorized Resource Owner is the same as the one that initiated the flow.
- The most common implementation works as follows:
- The Client generates an unpredictable “nonce” which is also unique for each authorization request.
- The Client can then set the “nonce” in a cookie and in the state parameter before redirecting the Resource Owner to the Authorization Server.
- After the authorization process, the Authorization Server redirects the Resource Owner to the Client by adding the “nonce” to the state parameter in the URL.
- The Client verifies that the state matches the “nonce” in the cookie.
- Unfortunately, this is a conceptual vulnerability in OAuth 2.
- In the specification, the state parameter is “RECOMMENDED” instead of being “REQUIRED”, which does not encourage the Client to secure its application and protect it from this attack.
- If the Authorization Server made this parameter mandatory, it would break the standard.
- OpenID Connect adds an explicit
nonce
parameter but in order to stay compatible with OAuth 2, this parameter is also optional 😭.
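The “nonce” in cookie plus state parameter implementation described above could be sketched like this on the Client side. The function names are hypothetical; setting and reading the cookie depend on the framework and are not shown.

```javascript
const crypto = require('node:crypto');

// Generates the unpredictable, single-use value sent both in a cookie and in "state".
function generateState() {
  return crypto.randomBytes(32).toString('hex');
}

/**
 * Called when the Resource Owner comes back on the Redirect URI.
 * The flow continues only if the "state" query parameter matches the cookie value.
 */
function isStateValid(callbackUrl, stateFromCookie) {
  const stateFromUrl = new URL(callbackUrl).searchParams.get('state');
  if (!stateFromUrl || !stateFromCookie) {
    return false;
  }
  const a = Buffer.from(stateFromUrl);
  const b = Buffer.from(stateFromCookie);
  // Lengths are compared first because timingSafeEqual throws when they differ.
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}
```

In the substitution attack, the link forged by the attacker carries the attacker’s state (or none), which cannot match the cookie stored in the victim’s browser.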
11.9.5.4. Solution and workaround
- The best solution is to make the
state
parameter mandatory but of course, without any verification on the Client’s side, the parameter is useless.
- The Authorization Server can reduce the authorization scope when the
state
parameter is missing.
- This is one of the reasons why the Authorization Code should be short-lived. The problem with the implicit flow is that we can’t reduce the Access Token’s lifetime to a few minutes.
11.10. J.O.S.E. (Javascript Object Signing and Encryption)
- JOSE is a framework intended to provide a means of securely transmitting claims (authorization information for example) between parties.
https://datatracker.ietf.org/wg/jose/charter/
- JOSE contains four main items:
- J.W.K. (JSON Web Key)
Defines the JSON representation of a symmetric or asymmetric cryptographic key.
- J.W.S. (JSON Web Signature)
Defines the JSON representation of a signed content.
- J.W.E. (JSON Web Encryption)
Defines the JSON representation of an encrypted content.
- J.W.T. (JSON Web Token)
Defines the compact and URL-safe representation of a token (optionally signed or encrypted or signed then encrypted) in addition to a list of standard claims registered on IANA.
- JOSE doesn’t define any authentication or authorization mechanism.
11.10.1. JWK
- Symmetric key intended for AES256 encryption with a HMAC SHA512 hash.
{
  "kty":"oct", // Key type : Octet Sequence.
  "alg":"A256CBC-HS512", // Algorithm intended for this key.
  "k":"GawgguFyGrWKav7AX4VKUg", // Key.
  "kid": "0" // Key Id.
}
- Public asymmetric key intended for signature with its X509 certificate chain.
{
  "kty":"RSA", // Key type: RSA.
  "alg": "RS512", // RSA SHA512.
  "use":"sig", // signature.
  "kid":"1b94c", // Key Id.
  "n":"vrjOfz9Ccdgx5nQudyhdoR17V...",
  "e":"AQAB",
  "x5c": ["MIIDQjCCAiqgAwIBAgIGATz/FuLiMA0GCS...A6SdS4xSvdXK3IVfOWA=="]
}
- Private asymmetric key intended for encryption.
{
  "kty":"RSA",
  "kid":"3j4h",
  "use":"enc",
  "n":"t6Q8PWSi1dkJj9hTP8hNYF...PFGGcG1qs2Wz-Q",
  "e":"AQAB",
  "d":"GRtbIQmhOZtyszfgKdg4...SdSgqcN96X52esAQ",
  "p":"2rnSOV4hKSN8sS4Cgc...Ngqh56HDnETTQhH3rCT5T3yJws",
  "q":"1u_RiFDP7LBYh3N4GXL...TB7LbAHRK9GqocDE5B0f808I4s",
  "dp":"KkMTWqBUefVwZ2_Dbj1...2pYhEAeYrhttWtxVqLCRViD6c",
  "dq":"AvfS0-gRxvn0bwJoMSnF...Y63TmmEAu_lRFCOJ3xDea-ots",
  "qi":"lSQi-w9CpyUReMErP1RsBL...2lNx_76aBZoOUu9HCJ-UsfSOI8"
}
11.10.2. JWS : Asymmetric Signature vs Symmetric “Signature”
- Representation of a signed content.
{
  "payload": "eyJpc3MiOiJqb2...kjp0cnVlfQ",
  "signatures": [
    {
      "protected":"eyJhbGciOiJSUzI1NiJ9",
      "header": {"kid":"123"},
      "signature": "cC4hiUPoj9E...cN_IoypGlUPQGe77Rw"
    },
    {
      "protected":"eyJhbGciOiJFUzI1NiJ9",
      "header": {"kid":"456"},
      "signature": "DtEhU3ljbEg8L38VWA...Kg6NU1Q"
    }
  ]
}
- It is possible to use symmetric keys to authenticate a message using HMAC.
This would be a message authentication code and not a signature.
11.10.3. JWE : Asymmetric and Symmetric Encryption
- Representation of an encrypted content.
{
  // Integrity protected header but not encrypted!
  "protected": "eyJlbmMiOiJBMTI4Q0JDLUhTMjU2In0",
  "unprotected": {"jku":"https://server.example.com/keys.jwks"},
  "recipients":[
    {
      // Key and Alg hints.
      "header": {"alg":"RSA1_5","kid":"123"},
      // Encryption key encrypted using 123's public key.
      "encrypted_key": "UGhIOguC7IuEvf_N...XMR4gp_A"
    },
    {
      "header": {"alg":"A128KW","kid":"456"},
      "encrypted_key": "6KB707dM9YTIgHt...2IlrT1oOQ"
    }
  ],
  "iv": "AxY8DCtDaGlsbGljb3RoZQ",
  // Encrypted message.
  "ciphertext": "KDlTtXchhZTGufMYmO...4HffxPSUrfmqCHXaI9wOGY",
  // AEAD authentication tag.
  "tag": "Mz-VPPyU4RlcuYv1IwIvzw"
}
- Asymmetric encryption has a size limit for messages (modulus size minus padding). For that reason, we generate an ad-hoc symmetric key that encrypts the message, then we encrypt the symmetric key using the asymmetric public key.
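The idea of encrypting the message with an ad-hoc symmetric key and then encrypting that key with the public key can be illustrated with Node’s crypto module. This is only a sketch of the principle: it does not produce the JWE format, and the algorithms (AES-256-GCM and RSA-OAEP with SHA-256) as well as the function names are assumptions chosen for the example, not the ones from the JWE shown above.

```javascript
const crypto = require('node:crypto');

const OAEP = { padding: crypto.constants.RSA_PKCS1_OAEP_PADDING, oaepHash: 'sha256' };

/**
 * Encrypts a message of any size for the owner of an RSA public key:
 * the message is encrypted with a one-off AES key, and only that key is RSA-encrypted.
 */
function hybridEncrypt(plaintext, rsaPublicKey) {
  const contentKey = crypto.randomBytes(32); // ad-hoc symmetric key (AES-256)
  const iv = crypto.randomBytes(12); // 96-bit IV, the usual size for GCM
  const cipher = crypto.createCipheriv('aes-256-gcm', contentKey, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return {
    encryptedKey: crypto.publicEncrypt({ key: rsaPublicKey, ...OAEP }, contentKey),
    iv,
    ciphertext,
    tag: cipher.getAuthTag(), // authentication tag, checked during decryption
  };
}

// Recovers the symmetric key with the RSA private key, then decrypts the message.
function hybridDecrypt({ encryptedKey, iv, ciphertext, tag }, rsaPrivateKey) {
  const contentKey = crypto.privateDecrypt({ key: rsaPrivateKey, ...OAEP }, encryptedKey);
  const decipher = crypto.createDecipheriv('aes-256-gcm', contentKey, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');
}
```

Only the 32-byte key goes through RSA, so the message itself can be much larger than the RSA modulus.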
11.10.4. JWT
11.10.4.1. How JWT works
- JWT describes the structure of a token (encrypted, signed or insecure) containing standard, public or private claims.
https://www.iana.org/assignments/jwt/jwt.xhtml
- In order to make the JWT tokens easier to transfer, the token is serialized in a compact format (which is also applicable on JWE and JWS).
- Each block (header / payload / signature etc…) is encoded in URL-safe base64 and separated using a dot “.”.
- Example:
eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiYWRtaW4iOmZhbHNlfQ.FMy5mxG5mDjL4rW8defHN2fZ_U_ypDW6hUT-Oan2F6P36NzCEHq85IXWUChQc5vzCXa_SHWK9j1ZZG3vRwuEkEH-lA_FNPL2EAQjdqq_qxMhaS5SscW8RVb30rd7lw1-OvEESrKcAtqipDmkufpsv3R3YWBItF1Uev0wF1U9QGU
- Header
{"alg":"RS256","typ":"JWT"}
- Payload
{"sub":"1234567890","name":"John Doe","admin":false}
- Signature
14ccb99b11b99838cbe2b5bc75e7c73767d9532a435ba8544ce6a7d85e8fdfa3730841eaf392175940a141ce6fcc25da48758af63d59646def470b849041e500534f2f610042376aaaac4c85a4b94ac716f1155bdf4addee5c353af1044ab29c02daa2a439a4b9fa6cbf7477616048b45d547afd3017553d4065
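The decoding shown above can be reproduced with a few lines of Node.js. This is a minimal sketch with hypothetical function names; it only decodes the blocks and does not verify anything.

```javascript
// Decodes one URL-safe base64 block (no padding, "-" and "_" instead of "+" and "/").
function base64UrlDecode(block) {
  const base64 = block.replace(/-/g, '+').replace(/_/g, '/');
  return Buffer.from(base64, 'base64');
}

/**
 * Splits a compact signed token into its three blocks and decodes them.
 * This only DECODES the token: it does not verify the signature.
 */
function decodeCompactToken(token) {
  const blocks = token.split('.');
  if (blocks.length !== 3) {
    throw new Error('Expected 3 dot-separated blocks (header.payload.signature)');
  }
  const [header, payload, signature] = blocks;
  return {
    header: JSON.parse(base64UrlDecode(header).toString('utf8')),
    payload: JSON.parse(base64UrlDecode(payload).toString('utf8')),
    signatureHex: base64UrlDecode(signature).toString('hex'),
  };
}
```

Anyone holding a signed (non-encrypted) token can read its claims this way, which is why the payload must never be considered confidential.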
11.10.4.2. Usage and advantages
- JWT tokens are mainly used for authentication and authorization (Ex.: SSO, session, web sockets, email links, etc…).
- They can also be used to transfer and store encrypted or signed data.
- As opposed to binary formats or XML formats like SAML, JWT tokens are:
- relatively lightweight.
- easy to manipulate (many libraries available in different environments).
- easy to archive (Ex. : can be easily stored in some NoSQL databases like MongoDB).
11.10.4.3. JWT usage for authentication
- As the JWT token contains all the required information concerning the identity of the Resource Owner and as it is signed, it can be trusted and we don’t have to check the token with a database or a remote service for every request. This can improve performance and scalability if no other solution is used (caching for example).
- JWT tokens might contain personal information related to the Resource Owner and are most of the time stored on the user’s device (Ex.: Local Storage).
- It is recommended to use signed then encrypted JWT tokens but this increases the token’s size which is sent with every request.
11.10.5. JWT, authentication, sessions and security risks
- Before diving deeper in security, JWT tokens used for authentication and session management have the following issues:
- Large size (mainly when using encryption).
- Claims immutability. We should generate new JWT tokens to transmit updated claims.
- No key policy: JWT does not define any security policy concerning key management (symmetric keys generation rules, key storage, key rotation etc…).
- HMAC is not a signature algorithm: Many documentations and implementations use HMAC to authenticate JWT tokens and call this signature.
- No invalidation: Even though JWT tokens can contain an expiration date, JWT can’t define any way of revoking or invalidating JWT tokens.
- Mmmm… How should we handle the logout?
- The only possible solution is to store some information somewhere (invalidated tokens list, logout time etc…)
- We would then need to check this information anytime we verify a token. We would then lose one of the main interests in using JWT tokens.
TLS private key security policy analogy
- Let’s analyze the common security policy used to handle TLS private keys.
- It is generally recommended (and it is increasingly implemented) to use a dedicated machine for encryption/decryption of TLS data exchanges. That way, even if an application is compromised, the private key held in memory can’t be revealed.
- Keys are renewed regularly.
- Keys are protected with passphrases that sometimes need 2 or 3 persons to be present in order to load the key.
- What happens if a TLS private key is stolen?
- The attacker needs to combine this with Man-In-The-Middle attacks (ARP Spoofing, DNS Cache Poisoning…).
- In general, the attacker would only be able to impact a limited set of users depending on their location and if they are connected during the attack.
- Once the attack is detected, it is possible to revoke the certificate quite quickly and thanks to some protocols like OCSP, the clients would refuse the certificate immediately.
https://tools.ietf.org/html/rfc6960
JWT private keys theft risk
- Let’s now analyze the classical way JWT is used for authentication.
- Unfortunately, the authentication server’s private key is commonly stored in environment variables, databases or files (let’s just hope they won’t end up in the Version Control System).
https://github.com/mitreid-connect/OpenID-Connect-Java-Spring-Server/wiki/Key-generation
http://django-oidc-provider.readthedocs.io/en/v0.4.x/sections/serverkeys.html
- Private keys can be leaked due to different reasons:
- Version Control System unauthorized access.
- SQL Injection.
- Insecure remote file access.
- Environment variables dump due to common configuration errors.
- If the attacker retrieves the private key, he can simply generate JWT tokens with arbitrary claims. He will then be able to retrieve and modify personal data of all the users that are allowed to use JWT for authentication.
“none” alg
- Regrettably, JWT also allows an algorithm called “none” where data is neither signed nor encrypted.
- The attacker can generate JWT tokens with a “none” value as the “alg” property.
- If the token verification implementation relies on the “alg” property from the token, it is vulnerable and may end up accepting tokens with “alg” “none”.
- Some implementations can also be vulnerable to an attack that targets systems using asymmetric signature. The attacker simply generates a JWT token using “HS256” as a value for the “alg” property and the asymmetric public key as a symmetric key. The vulnerable implementation “successfully” verifies the token thinking that it was authenticated using an HMAC where the secret key is the string representing the public asymmetric key.
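Both attacks are avoided when the verifier decides which algorithm is expected instead of trusting the token. The sketch below pins RS256 using Node’s crypto module; the function names are hypothetical and claims validation (expiration, audience, etc…) is left out. In a real application, a maintained JWT library configured with an explicit list of allowed algorithms is preferable.

```javascript
const crypto = require('node:crypto');

// Decodes a URL-safe base64 block into a Buffer.
function base64UrlDecode(block) {
  return Buffer.from(block.replace(/-/g, '+').replace(/_/g, '/'), 'base64');
}

/**
 * Verifies a compact JWT that MUST be signed with RS256.
 * The expected algorithm is decided by the verifier, never by the token's "alg" header.
 * Returns the payload if the signature is valid, otherwise null.
 */
function verifyRs256Token(token, publicKey) {
  const blocks = token.split('.');
  if (blocks.length !== 3) {
    return null;
  }
  const [headerBlock, payloadBlock, signatureBlock] = blocks;
  let header;
  try {
    header = JSON.parse(base64UrlDecode(headerBlock).toString('utf8'));
  } catch (error) {
    return null;
  }
  // Rejects "none", "HS256" and anything else that is not the pinned algorithm.
  if (!header || header.alg !== 'RS256') {
    return null;
  }
  const isValid = crypto.verify(
    'sha256', // RS256 = RSA PKCS#1 v1.5 signature with SHA-256
    Buffer.from(`${headerBlock}.${payloadBlock}`),
    publicKey,
    base64UrlDecode(signatureBlock)
  );
  return isValid ? JSON.parse(base64UrlDecode(payloadBlock).toString('utf8')) : null;
}
```

Because the public key is only ever passed to an RSA verification, a token forged with “HS256” and the public key as HMAC secret is rejected before any comparison happens.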
11.10.6. JWT : Recommendations
- Use RSA keys for signature (at least 2048 bits long).
- The implementation should apply a regular and automatic key rotation. Public keys should be published automatically too.
- Given that the rotation period should be longer than the tokens’ lifetime, tokens should be as short-lived as possible. (Ex.: Google’s implementation of OpenID Connect seems to rotate keys every 3 to 4 days but personally I would suggest a shorter duration).
- To mitigate the risks, use multiple keys.
- Ideally, secret keys should only be manipulated by dedicated highly-secured services (micro-services ?) with advanced monitoring mechanisms.
- JWT tokens can be used as an additional mechanism to classic tokens. We could wrap our tokens in JWT tokens in order to verify their validity and expiration quickly without checking in database or remote service if they are invalid or expired.
11.11. OpenID Connect
- OpenID Connect (OIDC) is a layer that comes on top of OAuth 2. It adds new features concerning authentication and identification.
http://openid.net/connect/
- It’s an OpenID Foundation standard.
- OpenID Connect is compatible with OAuth 2 implementations.
- It contains many interesting concepts similar to those from Liberty Alliance Project (R.I.P.): http://www.projectliberty.org/
11.11.1. Terminology
- OpenID Provider : OAuth 2 Authorization Server capable of authenticating the End-User (Resource Owner) and transmitting claims to the Relying Party (Client).
- Relying Party : OAuth 2 Client that can request claims from the OpenID Provider.
- End-User : OAuth 2 Resource Owner.
11.11.2. What’s new?
OpenID Connect provides the following additional features:
- The Relying Party can ask the OpenID Provider to authenticate or reauthenticate the End-User.
- The Relying Party can also transmit additional information (hint) like the End-User’s identifier to ease the authentication process.
- JWT tokens are used, but we can avoid transmitting them to the End-User.
- Distributed and aggregated claims.
- End-User data are commonly distributed between multiple OpenID Providers.
- With OpenID Connect, an OpenID Provider can aggregate claims or provide the necessary information to recover the claims from other OpenID Providers.
- Logout: When an End-User logs out from the OpenID Provider, the OpenID Provider can notify the Relying Parties through different mechanisms.
- Dynamic Client Registration: Some OpenID Providers can activate this feature to allow Relying Parties to subscribe dynamically.
- Discovery: OpenID Provider can publish information allowing other entities (Relying Party, OpenID Provider, …) to dynamically discover the features and the configuration of the OpenID Provider.
https://accounts.google.com/.well-known/openid-configuration
- OpenID Connect describes some additional claims.
- OpenID Connect describes some common scopes that encompass multiple claims.
11.11.3. OpenID Connect Flows
OpenID Connect describes 3 flows :
- Authorization Code Flow which is identical to OAuth 2 but it adds some parameters and the OpenID Provider can return an id_token which is a JWT token.
- In this flow, the JWT token is directly transmitted from the OpenID Provider to the Relying Party without going through the User-Agent. The token’s signature is an additional security on top of the security provided by the TLS channel.
- Implicit Flow which is identical to OAuth 2 but it also provides an id_token.
- It has the same security issues as OAuth 2 implicit flow in addition to the security risks implied by the fact the id_token is a JWT token.
- Hybrid Flow is a fusion of the two previous flows.
11.11.4. What should we do?
- Unfortunately, OpenID Connect does not define any rule concerning the JWT tokens signature, storage or key rotation.
- OpenID Connect is one of the most advanced standards concerning authentication, authorization and identity management in general.
- One should absolutely avoid the implicit flow. Having to implement a dedicated backend for the Relying Party shouldn’t be a barrier.
- Use asymmetric keys.
- Keys must be rotated regularly and automatically.
“ReSTful Web APIs” by Richardson and Amundsen