Fixing `Invalid argument (string): Contains invalid characters` when using http's Response
Today, while writing a unit test for a web crawler, I ran into an encoding error. I had downloaded a web page with curl and tried to make my mock Client return it. This had worked with another page, but not with this one. Let's fix it.
My code looked like this:
import 'dart:io';
import 'package:gpu_benchmarks/game_crawler.dart';
import 'package:http/http.dart';
import 'package:mockito/annotations.dart';
import 'package:mockito/mockito.dart';
import 'package:test/test.dart';
import 'videocardbenchmark_crawler_test.mocks.dart';
@GenerateMocks([Client])
void main() {
  test('Crawls a games page', () async {
    const url = '[some url]';
    final client = MockClient();
    final uri = Uri.parse(url);
    final mockResponse =
        Response(File('./test/games.html').readAsStringSync(), 200);
    when(client.get(any)).thenAnswer((_) => Future.value(mockResponse));
    final r = await GameCrawler.crawl(uri, client);
    expect(r.length, equals(0));
  });
}
The error came from the line where I initialize Response:
Invalid argument (string): Contains invalid characters.: "<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n<meta charset=\"utf-8\">\n\n...
...
...
dart:convert Latin1Codec.encode
new Response
package:http/src/response.dart:37
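The stack trace points at Latin1Codec.encode, so the failure can be reproduced without Response at all. The sketch below uses only dart:convert; the helper name fitsInLatin1 and the sample strings are made up for illustration. It suggests why one page could work while another fails: any character above U+00FF cannot be encoded as Latin-1.

```dart
import 'dart:convert';

/// Hypothetical helper: returns true if [text] can be encoded as Latin-1.
bool fitsInLatin1(String text) {
  try {
    latin1.encode(text);
    return true;
  } on ArgumentError {
    // Thrown with "Contains invalid characters." for code points above 255.
    return false;
  }
}

void main() {
  print(fitsInLatin1('<p>caf\u00e9</p>')); // true: é is U+00E9
  print(fitsInLatin1('<p>\u20ac5</p>')); // false: € is U+20AC
  print(utf8.encode('\u20ac')); // [226, 130, 172]
}
```

A page that happens to contain only ASCII or Latin-1 characters would pass through the default encoding untouched, which is probably what happened with the first page I tried.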
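According to the stack trace, the Response constructor encodes the body string, and it picks the encoding by calling _encodingForHeaders on the headers. Here's the code for _encodingForHeaders: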
/// Returns the encoding to use for a response with the given headers.
///
/// Defaults to [latin1] if the headers don't specify a charset or if that
/// charset is unknown.
Encoding _encodingForHeaders(Map<String, String> headers) =>
    encodingForCharset(_contentTypeForHeaders(headers).parameters['charset']);
So it tries to find the charset in the headers.
/// Returns the [Encoding] that corresponds to [charset].
///
/// Returns [fallback] if [charset] is null or if no [Encoding] was found that
/// corresponds to [charset].
Encoding encodingForCharset(String? charset, [Encoding fallback = latin1]) {
  if (charset == null) return fallback;
  return Encoding.getByName(charset) ?? fallback;
}
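To see the lookup and the fallback in isolation, here is a small sketch that copies the function above into a standalone file (the copy is only there so it runs without the package; 'no-such-charset' is a made-up name).

```dart
import 'dart:convert';

/// Same logic as the package snippet above, copied so this file runs alone.
Encoding encodingForCharset(String? charset, [Encoding fallback = latin1]) {
  if (charset == null) return fallback;
  return Encoding.getByName(charset) ?? fallback;
}

void main() {
  print(encodingForCharset('utf-8').name); // utf-8
  print(encodingForCharset(null).name); // iso-8859-1 (the latin1 fallback)
  print(encodingForCharset('no-such-charset').name); // iso-8859-1
}
```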
And if it can't find a charset in the headers, it falls back to latin1. I find it odd that it wouldn't fall back to utf8, but ok. Let's try to pass a charset in the headers, using one of the names that Encoding.getByName recognizes. I want to use "utf-8", so here's how I pass it:
final mockResponse = Response(
    File('./test/games.html').readAsStringSync(), 200,
    headers: {'charset': 'utf-8'});
Unfortunately, it still threw the exception. Looking at the code again, it turns out Response doesn't read a charset entry from the headers I passed. Instead, it parses the content-type header and extracts the charset from that:
/// Returns the encoding to use for a response with the given headers.
///
/// Defaults to [latin1] if the headers don't specify a charset or if that
/// charset is unknown.
Encoding _encodingForHeaders(Map<String, String> headers) =>
    encodingForCharset(_contentTypeForHeaders(headers).parameters['charset']);
/// Returns the [MediaType] object for the given headers's content-type.
///
/// Defaults to `application/octet-stream`.
MediaType _contentTypeForHeaders(Map<String, String> headers) {
  var contentType = headers['content-type'];
  if (contentType != null) return MediaType.parse(contentType);
  return MediaType('application', 'octet-stream');
}
The parsing is done here:
/// Parses a media type.
///
/// This will throw a FormatError if the media type is invalid.
factory MediaType.parse(String mediaType) =>
// This parsing is based on sections 3.6 and 3.7 of the HTTP spec:
// http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html.
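Here is a rough sketch of that header lookup, assuming package:http_parser (where MediaType lives) is available as a dependency. The helper charsetFromHeaders is hypothetical and only mirrors the snippets above; the 'application/json' value is an arbitrary sample.

```dart
import 'package:http_parser/http_parser.dart';

/// Hypothetical helper mirroring the snippets above: only the
/// 'content-type' header is consulted when looking for a charset.
String? charsetFromHeaders(Map<String, String> headers) {
  final contentType = headers['content-type'];
  if (contentType == null) return null; // no content-type, so no charset
  return MediaType.parse(contentType).parameters['charset'];
}

void main() {
  // A standalone 'charset' header is never read, so nothing is found.
  print(charsetFromHeaders({'charset': 'utf-8'})); // null

  // The charset has to be a parameter of the media type.
  final mediaType = MediaType.parse('application/json; charset=utf-8');
  print(mediaType.type); // application
  print(mediaType.subtype); // json
  print(mediaType.parameters['charset']); // utf-8
}
```

With no charset found, the encoding lookup ends up on its latin1 fallback, which is why my first attempt changed nothing.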
I was unfamiliar with content-types, but Google gives a few common examples, and it looks like "text/html; charset=utf-8" should do the trick. And indeed, this code doesn't throw an exception anymore:
final mockResponse = Response(
    File('./test/games.html').readAsStringSync(), 200,
    headers: {'content-type': 'text/html; charset=utf-8'});
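As a quick sanity check, the sketch below builds a Response from a string containing a non-Latin-1 character and reads it back. It assumes package:http is available; the html constant is a made-up stand-in for the contents of games.html.

```dart
import 'dart:convert';

import 'package:http/http.dart';

void main() {
  // Hypothetical stand-in for the downloaded page; € is outside Latin-1.
  const html = '<p>Price: \u20ac5</p>';

  final response = Response(html, 200,
      headers: {'content-type': 'text/html; charset=utf-8'});

  // The body is stored as UTF-8 bytes and decoded with the same charset.
  print(response.body == html); // true
  print(utf8.decode(response.bodyBytes) == html); // true
}
```

Since the crawler under test reads the body through the same Response object, the mock now behaves like a server that declares its charset.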