serde's deserialization architecture is built around three traits: Deserialize, Deserializer, and Visitor. Each has a distinct responsibility.
Deserialize Trait
pub trait Deserialize<'de>: Sized {
    /// Deserialize this value from the given Serde deserializer.
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>;
}
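This trait defines the interface for deserializing a value: the type uses a given deserializer (Deserializer) to construct an instance of itself, and returns an error if that fails.
It carries a generic lifetime parameter 'de.
'de is the lifetime of the input data that the deserialized value is allowed to borrow from, rather than the lifetime of the deserializer object itself. The error returned is the error type defined by the deserializer, D::Error.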
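To make the role of a lifetime like 'de concrete, here is a minimal std-only sketch. It does not use serde: `FromInput` and `BorrowedName` are hypothetical names, and the trait takes the input string directly instead of a Deserializer. The point is only that the lifetime ties the produced value to the input buffer, which is what makes zero-copy results possible.

```rust
// Hypothetical stand-in for a `Deserialize<'de>`-style trait (not serde's API).
// `'de` is the lifetime of the input data the result may borrow from.
trait FromInput<'de>: Sized {
    fn from_input(input: &'de str) -> Result<Self, String>;
}

// A value that borrows its contents from the input instead of copying them.
#[derive(Debug, PartialEq)]
struct BorrowedName<'a>(&'a str);

impl<'de> FromInput<'de> for BorrowedName<'de> {
    fn from_input(input: &'de str) -> Result<Self, String> {
        let trimmed = input.trim();
        if trimmed.is_empty() {
            Err("expected a non-empty name".to_string())
        } else {
            // No allocation: the result points into `input`.
            Ok(BorrowedName(trimmed))
        }
    }
}

fn main() {
    let input = String::from("  Ada  ");
    // `name` cannot outlive `input`; the compiler enforces this through 'de.
    let name = BorrowedName::from_input(&input);
    println!("{:?}", name);
}
```

A type that owns all of its data (for example one holding a String) could implement such a trait for every 'de, because it never borrows from the input.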
Deserializer Trait
pub trait Deserializer<'de>: Sized {
    /// The error type that can be returned if some error occurs during
    /// deserialization.
    type Error: Error;

    /// Require the `Deserializer` to figure out how to drive the visitor based
    /// on what data type is in the input.
    ///
    /// When implementing `Deserialize`, you should avoid relying on
    /// `Deserializer::deserialize_any` unless you need to be told by the
    /// Deserializer what type is in the input. Know that relying on
    /// `Deserializer::deserialize_any` means your data type will be able to
    /// deserialize from self-describing formats only, ruling out Bincode and
    /// many others.
    fn deserialize_any<V>(self, visitor: V) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    /// Hint that the `Deserialize` type is expecting a `bool` value.
    fn deserialize_bool<V>(self, visitor: V) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    /// Hint that the `Deserialize` type is expecting an `i8` value.
    fn deserialize_i8<V>(self, visitor: V) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    /// Hint that the `Deserialize` type is expecting an `i16` value.
    fn deserialize_i16<V>(self, visitor: V) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    /// Hint that the `Deserialize` type is expecting an `i32` value.
    fn deserialize_i32<V>(self, visitor: V) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    /// Hint that the `Deserialize` type is expecting an `i64` value.
    fn deserialize_i64<V>(self, visitor: V) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    /// Hint that the `Deserialize` type is expecting a `u8` value.
    fn deserialize_u8<V>(self, visitor: V) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    /// Hint that the `Deserialize` type is expecting a `u16` value.
    fn deserialize_u16<V>(self, visitor: V) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    /// Hint that the `Deserialize` type is expecting a `u32` value.
    fn deserialize_u32<V>(self, visitor: V) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    /// Hint that the `Deserialize` type is expecting a `u64` value.
    fn deserialize_u64<V>(self, visitor: V) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    /// Hint that the `Deserialize` type is expecting a `f32` value.
    fn deserialize_f32<V>(self, visitor: V) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    /// Hint that the `Deserialize` type is expecting a `f64` value.
    fn deserialize_f64<V>(self, visitor: V) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    /// Hint that the `Deserialize` type is expecting a `char` value.
    fn deserialize_char<V>(self, visitor: V) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    /// Hint that the `Deserialize` type is expecting a string value and does
    /// not benefit from taking ownership of buffered data owned by the
    /// `Deserializer`.
    ///
    /// If the `Visitor` would benefit from taking ownership of `String` data,
    /// indicate this to the `Deserializer` by using `deserialize_string`
    /// instead.
    fn deserialize_str<V>(self, visitor: V) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    /// Hint that the `Deserialize` type is expecting a string value and would
    /// benefit from taking ownership of buffered data owned by the
    /// `Deserializer`.
    ///
    /// If the `Visitor` would not benefit from taking ownership of `String`
    /// data, indicate that to the `Deserializer` by using `deserialize_str`
    /// instead.
    fn deserialize_string<V>(self, visitor: V) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    /// Hint that the `Deserialize` type is expecting a byte array and does not
    /// benefit from taking ownership of buffered data owned by the
    /// `Deserializer`.
    ///
    /// If the `Visitor` would benefit from taking ownership of `Vec<u8>` data,
    /// indicate this to the `Deserializer` by using `deserialize_byte_buf`
    /// instead.
    fn deserialize_bytes<V>(self, visitor: V) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    /// Hint that the `Deserialize` type is expecting a byte array and would
    /// benefit from taking ownership of buffered data owned by the
    /// `Deserializer`.
    ///
    /// If the `Visitor` would not benefit from taking ownership of `Vec<u8>`
    /// data, indicate that to the `Deserializer` by using `deserialize_bytes`
    /// instead.
    fn deserialize_byte_buf<V>(self, visitor: V) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    /// Hint that the `Deserialize` type is expecting an optional value.
    ///
    /// This allows deserializers that encode an optional value as a nullable
    /// value to convert the null value into `None` and a regular value into
    /// `Some(value)`.
    fn deserialize_option<V>(self, visitor: V) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    /// Hint that the `Deserialize` type is expecting a unit value.
    fn deserialize_unit<V>(self, visitor: V) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    /// Hint that the `Deserialize` type is expecting a unit struct with a
    /// particular name.
    fn deserialize_unit_struct<V>(
        self,
        name: &'static str,
        visitor: V,
    ) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    /// Hint that the `Deserialize` type is expecting a newtype struct with a
    /// particular name.
    fn deserialize_newtype_struct<V>(
        self,
        name: &'static str,
        visitor: V,
    ) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    /// Hint that the `Deserialize` type is expecting a sequence of values.
    fn deserialize_seq<V>(self, visitor: V) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    /// Hint that the `Deserialize` type is expecting a sequence of values and
    /// knows how many values there are without looking at the serialized data.
    fn deserialize_tuple<V>(self, len: usize, visitor: V) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    /// Hint that the `Deserialize` type is expecting a tuple struct with a
    /// particular name and number of fields.
    fn deserialize_tuple_struct<V>(
        self,
        name: &'static str,
        len: usize,
        visitor: V,
    ) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    /// Hint that the `Deserialize` type is expecting a map of key-value pairs.
    fn deserialize_map<V>(self, visitor: V) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    /// Hint that the `Deserialize` type is expecting a struct with a particular
    /// name and fields.
    fn deserialize_struct<V>(
        self,
        name: &'static str,
        fields: &'static [&'static str],
        visitor: V,
    ) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    /// Hint that the `Deserialize` type is expecting an enum value with a
    /// particular name and possible variants.
    fn deserialize_enum<V>(
        self,
        name: &'static str,
        variants: &'static [&'static str],
        visitor: V,
    ) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    /// Hint that the `Deserialize` type is expecting the name of a struct
    /// field or the discriminant of an enum variant.
    fn deserialize_identifier<V>(self, visitor: V) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    /// Hint that the `Deserialize` type needs to deserialize a value whose type
    /// doesn't matter because it is ignored.
    ///
    /// Deserializers for non-self-describing formats may not support this mode.
    fn deserialize_ignored_any<V>(self, visitor: V) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    /// Determine whether `Deserialize` implementations should expect to
    /// deserialize their human-readable form.
    ///
    /// Some types have a human-readable form that may be somewhat expensive to
    /// construct, as well as a binary form that is compact and efficient.
    /// Generally text-based formats like JSON and YAML will prefer to use the
    /// human-readable one and binary formats like Bincode will prefer the
    /// compact one.
    ///
    /// ```
    /// # use std::ops::Add;
    /// # use std::str::FromStr;
    /// #
    /// # struct Timestamp;
    /// #
    /// # impl Timestamp {
    /// #     const EPOCH: Timestamp = Timestamp;
    /// # }
    /// #
    /// # impl FromStr for Timestamp {
    /// #     type Err = String;
    /// #     fn from_str(_: &str) -> Result<Self, Self::Err> {
    /// #         unimplemented!()
    /// #     }
    /// # }
    /// #
    /// # struct Duration;
    /// #
    /// # impl Duration {
    /// #     fn seconds(_: u64) -> Self { unimplemented!() }
    /// # }
    /// #
    /// # impl Add<Duration> for Timestamp {
    /// #     type Output = Timestamp;
    /// #     fn add(self, _: Duration) -> Self::Output {
    /// #         unimplemented!()
    /// #     }
    /// # }
    /// #
    /// use serde::de::{self, Deserialize, Deserializer};
    ///
    /// impl<'de> Deserialize<'de> for Timestamp {
    ///     fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    ///         where D: Deserializer<'de>
    ///     {
    ///         if deserializer.is_human_readable() {
    ///             // Deserialize from a human-readable string like "2015-05-15T17:01:00Z".
    ///             let s = String::deserialize(deserializer)?;
    ///             Timestamp::from_str(&s).map_err(de::Error::custom)
    ///         } else {
    ///             // Deserialize from a compact binary representation, seconds since
    ///             // the Unix epoch.
    ///             let n = u64::deserialize(deserializer)?;
    ///             Ok(Timestamp::EPOCH + Duration::seconds(n))
    ///         }
    ///     }
    /// }
    /// ```
    ///
    /// The default implementation of this method returns `true`. Data formats
    /// may override this to `false` to request a compact form for types that
    /// support one. Note that modifying this method to change a format from
    /// human-readable to compact or vice versa should be regarded as a breaking
    /// change, as a value serialized in human-readable mode is not required to
    /// deserialize from the same data in compact mode.
    #[inline]
    fn is_human_readable(&self) -> bool {
        true
    }
}
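The deserializer offers a family of methods for Deserialize implementations to call, and every one of them takes a Visitor. The value a method returns is determined by that Visitor. When implementing Deserialize for a data type, you call the deserializer method that expresses what kind of data you expect and pass it a suitable Visitor. Note that these methods are only hints: calling deserialize_u8 does not necessarily return a u8. The actual return type is Visitor::Value, and the deserializer may call a different visit_* method than the hint suggests if that is what the input actually contains.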
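The following std-only sketch illustrates that last point. It is a deliberately reduced model, not serde itself: the `Visitor` and `Deserializer` traits here have a single method each and use `String` as the error type, and `TextDeserializer`, `Percent`, `PercentVisitor`, and `parse_percent` are hypothetical names. The caller asks for a u8, yet what comes back is a `Percent`, because the visitor decides the output type.

```rust
// Reduced stand-ins for serde's traits (assumption: one method each, String errors).
trait Visitor: Sized {
    type Value;
    fn visit_u8(self, v: u8) -> Result<Self::Value, String>;
}

trait Deserializer: Sized {
    // The return type is V::Value, not u8.
    fn deserialize_u8<V: Visitor>(self, visitor: V) -> Result<V::Value, String>;
}

// A toy deserializer whose input is a decimal number written as text.
struct TextDeserializer<'a> {
    input: &'a str,
}

impl<'a> Deserializer for TextDeserializer<'a> {
    fn deserialize_u8<V: Visitor>(self, visitor: V) -> Result<V::Value, String> {
        let n = self
            .input
            .trim()
            .parse::<u8>()
            .map_err(|e| format!("invalid u8: {}", e))?;
        // Hand the parsed data to the visitor, which builds the final value.
        visitor.visit_u8(n)
    }
}

#[derive(Debug, PartialEq)]
struct Percent(f64);

struct PercentVisitor;

impl Visitor for PercentVisitor {
    type Value = Percent;

    fn visit_u8(self, v: u8) -> Result<Percent, String> {
        if v <= 100 {
            Ok(Percent(f64::from(v) / 100.0))
        } else {
            Err(format!("{} is outside 0..=100", v))
        }
    }
}

fn parse_percent(input: &str) -> Result<Percent, String> {
    TextDeserializer { input }.deserialize_u8(PercentVisitor)
}

fn main() {
    println!("{:?}", parse_percent("25"));
    println!("{:?}", parse_percent("200"));
}
```

In this sketch "200" parses fine as a u8 and is rejected by the visitor, which shows the two separate places where deserialization can fail: parsing and conversion.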
Visitor Trait
pub trait Visitor<'de>: Sized {
    /// The value produced by this visitor.
    type Value;

    /// Format a message stating what data this Visitor expects to receive.
    ///
    /// This is used in error messages. The message should complete the sentence
    /// "This Visitor expects to receive ...", for example the message could be
    /// "an integer between 0 and 64". The message should not be capitalized and
    /// should not end with a period.
    ///
    /// ```rust
    /// # use std::fmt;
    /// #
    /// # struct S {
    /// #     max: usize,
    /// # }
    /// #
    /// # impl<'de> serde::de::Visitor<'de> for S {
    /// #     type Value = ();
    /// #
    /// fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
    ///     write!(formatter, "an integer between 0 and {}", self.max)
    /// }
    /// # }
    /// ```
    fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result;

    /// The input contains a boolean.
    ///
    /// The default implementation fails with a type error.
    fn visit_bool<E>(self, v: bool) -> Result<Self::Value, E>
    where
        E: Error,
    {
        Err(Error::invalid_type(Unexpected::Bool(v), &self))
    }

    /// The input contains an `i8`.
    ///
    /// The default implementation forwards to [`visit_i64`].
    ///
    /// [`visit_i64`]: #method.visit_i64
    fn visit_i8<E>(self, v: i8) -> Result<Self::Value, E>
    where
        E: Error,
    {
        self.visit_i64(v as i64)
    }

    /// The input contains an `i16`.
    ///
    /// The default implementation forwards to [`visit_i64`].
    ///
    /// [`visit_i64`]: #method.visit_i64
    fn visit_i16<E>(self, v: i16) -> Result<Self::Value, E>
    where
        E: Error,
    {
        self.visit_i64(v as i64)
    }

    /// The input contains an `i32`.
    ///
    /// The default implementation forwards to [`visit_i64`].
    ///
    /// [`visit_i64`]: #method.visit_i64
    fn visit_i32<E>(self, v: i32) -> Result<Self::Value, E>
    where
        E: Error,
    {
        self.visit_i64(v as i64)
    }

    /// The input contains an `i64`.
    ///
    /// The default implementation fails with a type error.
    fn visit_i64<E>(self, v: i64) -> Result<Self::Value, E>
    where
        E: Error,
    {
        Err(Error::invalid_type(Unexpected::Signed(v), &self))
    }

    /// The input contains a `u8`.
    ///
    /// The default implementation forwards to [`visit_u64`].
    ///
    /// [`visit_u64`]: #method.visit_u64
    fn visit_u8<E>(self, v: u8) -> Result<Self::Value, E>
    where
        E: Error,
    {
        self.visit_u64(v as u64)
    }

    /// The input contains a `u16`.
    ///
    /// The default implementation forwards to [`visit_u64`].
    ///
    /// [`visit_u64`]: #method.visit_u64
    fn visit_u16<E>(self, v: u16) -> Result<Self::Value, E>
    where
        E: Error,
    {
        self.visit_u64(v as u64)
    }

    /// The input contains a `u32`.
    ///
    /// The default implementation forwards to [`visit_u64`].
    ///
    /// [`visit_u64`]: #method.visit_u64
    fn visit_u32<E>(self, v: u32) -> Result<Self::Value, E>
    where
        E: Error,
    {
        self.visit_u64(v as u64)
    }

    /// The input contains a `u64`.
    ///
    /// The default implementation fails with a type error.
    fn visit_u64<E>(self, v: u64) -> Result<Self::Value, E>
    where
        E: Error,
    {
        Err(Error::invalid_type(Unexpected::Unsigned(v), &self))
    }

    /// The input contains an `f32`.
    ///
    /// The default implementation forwards to [`visit_f64`].
    ///
    /// [`visit_f64`]: #method.visit_f64
    fn visit_f32<E>(self, v: f32) -> Result<Self::Value, E>
    where
        E: Error,
    {
        self.visit_f64(v as f64)
    }

    /// The input contains an `f64`.
    ///
    /// The default implementation fails with a type error.
    fn visit_f64<E>(self, v: f64) -> Result<Self::Value, E>
    where
        E: Error,
    {
        Err(Error::invalid_type(Unexpected::Float(v), &self))
    }

    /// The input contains a `char`.
    ///
    /// The default implementation forwards to [`visit_str`] as a one-character
    /// string.
    ///
    /// [`visit_str`]: #method.visit_str
    #[inline]
    fn visit_char<E>(self, v: char) -> Result<Self::Value, E>
    where
        E: Error,
    {
        self.visit_str(utf8::encode(v).as_str())
    }

    /// The input contains a string. The lifetime of the string is ephemeral and
    /// it may be destroyed after this method returns.
    ///
    /// This method allows the `Deserializer` to avoid a copy by retaining
    /// ownership of any buffered data. `Deserialize` implementations that do
    /// not benefit from taking ownership of `String` data should indicate that
    /// to the deserializer by using `Deserializer::deserialize_str` rather than
    /// `Deserializer::deserialize_string`.
    ///
    /// It is never correct to implement `visit_string` without implementing
    /// `visit_str`. Implement neither, both, or just `visit_str`.
    fn visit_str<E>(self, v: &str) -> Result<Self::Value, E>
    where
        E: Error,
    {
        Err(Error::invalid_type(Unexpected::Str(v), &self))
    }

    /// The input contains a string that lives at least as long as the
    /// `Deserializer`.
    ///
    /// This enables zero-copy deserialization of strings in some formats. For
    /// example JSON input containing the JSON string `"borrowed"` can be
    /// deserialized with zero copying into a `&'a str` as long as the input
    /// data outlives `'a`.
    ///
    /// The default implementation forwards to `visit_str`.
    #[inline]
    fn visit_borrowed_str<E>(self, v: &'de str) -> Result<Self::Value, E>
    where
        E: Error,
    {
        self.visit_str(v)
    }

    /// The input contains a string and ownership of the string is being given
    /// to the `Visitor`.
    ///
    /// This method allows the `Visitor` to avoid a copy by taking ownership of
    /// a string created by the `Deserializer`. `Deserialize` implementations
    /// that benefit from taking ownership of `String` data should indicate that
    /// to the deserializer by using `Deserializer::deserialize_string` rather
    /// than `Deserializer::deserialize_str`, although not every deserializer
    /// will honor such a request.
    ///
    /// It is never correct to implement `visit_string` without implementing
    /// `visit_str`. Implement neither, both, or just `visit_str`.
    ///
    /// The default implementation forwards to `visit_str` and then drops the
    /// `String`.
    #[inline]
    #[cfg(any(feature = "std", feature = "alloc"))]
    fn visit_string<E>(self, v: String) -> Result<Self::Value, E>
    where
        E: Error,
    {
        self.visit_str(&v)
    }

    /// The input contains a byte array. The lifetime of the byte array is
    /// ephemeral and it may be destroyed after this method returns.
    ///
    /// This method allows the `Deserializer` to avoid a copy by retaining
    /// ownership of any buffered data. `Deserialize` implementations that do
    /// not benefit from taking ownership of `Vec<u8>` data should indicate that
    /// to the deserializer by using `Deserializer::deserialize_bytes` rather
    /// than `Deserializer::deserialize_byte_buf`.
    ///
    /// It is never correct to implement `visit_byte_buf` without implementing
    /// `visit_bytes`. Implement neither, both, or just `visit_bytes`.
    fn visit_bytes<E>(self, v: &[u8]) -> Result<Self::Value, E>
    where
        E: Error,
    {
        let _ = v;
        Err(Error::invalid_type(Unexpected::Bytes(v), &self))
    }

    /// The input contains a byte array that lives at least as long as the
    /// `Deserializer`.
    ///
    /// This enables zero-copy deserialization of bytes in some formats. For
    /// example Bincode data containing bytes can be deserialized with zero
    /// copying into a `&'a [u8]` as long as the input data outlives `'a`.
    ///
    /// The default implementation forwards to `visit_bytes`.
    #[inline]
    fn visit_borrowed_bytes<E>(self, v: &'de [u8]) -> Result<Self::Value, E>
    where
        E: Error,
    {
        self.visit_bytes(v)
    }

    /// The input contains a byte array and ownership of the byte array is being
    /// given to the `Visitor`.
    ///
    /// This method allows the `Visitor` to avoid a copy by taking ownership of
    /// a byte buffer created by the `Deserializer`. `Deserialize`
    /// implementations that benefit from taking ownership of `Vec<u8>` data
    /// should indicate that to the deserializer by using
    /// `Deserializer::deserialize_byte_buf` rather than
    /// `Deserializer::deserialize_bytes`, although not every deserializer will
    /// honor such a request.
    ///
    /// It is never correct to implement `visit_byte_buf` without implementing
    /// `visit_bytes`. Implement neither, both, or just `visit_bytes`.
    ///
    /// The default implementation forwards to `visit_bytes` and then drops the
    /// `Vec<u8>`.
    #[cfg(any(feature = "std", feature = "alloc"))]
    fn visit_byte_buf<E>(self, v: Vec<u8>) -> Result<Self::Value, E>
    where
        E: Error,
    {
        self.visit_bytes(&v)
    }

    /// The input contains an optional that is absent.
    ///
    /// The default implementation fails with a type error.
    fn visit_none<E>(self) -> Result<Self::Value, E>
    where
        E: Error,
    {
        Err(Error::invalid_type(Unexpected::Option, &self))
    }

    /// The input contains an optional that is present.
    ///
    /// The default implementation fails with a type error.
    fn visit_some<D>(self, deserializer: D) -> Result<Self::Value, D::Error>
    where
        D: Deserializer<'de>,
    {
        let _ = deserializer;
        Err(Error::invalid_type(Unexpected::Option, &self))
    }

    /// The input contains a unit `()`.
    ///
    /// The default implementation fails with a type error.
    fn visit_unit<E>(self) -> Result<Self::Value, E>
    where
        E: Error,
    {
        Err(Error::invalid_type(Unexpected::Unit, &self))
    }

    /// The input contains a newtype struct.
    ///
    /// The content of the newtype struct may be read from the given
    /// `Deserializer`.
    ///
    /// The default implementation fails with a type error.
    fn visit_newtype_struct<D>(self, deserializer: D) -> Result<Self::Value, D::Error>
    where
        D: Deserializer<'de>,
    {
        let _ = deserializer;
        Err(Error::invalid_type(Unexpected::NewtypeStruct, &self))
    }

    /// The input contains a sequence of elements.
    ///
    /// The default implementation fails with a type error.
    fn visit_seq<A>(self, seq: A) -> Result<Self::Value, A::Error>
    where
        A: SeqAccess<'de>,
    {
        let _ = seq;
        Err(Error::invalid_type(Unexpected::Seq, &self))
    }

    /// The input contains a key-value map.
    ///
    /// The default implementation fails with a type error.
    fn visit_map<A>(self, map: A) -> Result<Self::Value, A::Error>
    where
        A: MapAccess<'de>,
    {
        let _ = map;
        Err(Error::invalid_type(Unexpected::Map, &self))
    }

    /// The input contains an enum.
    ///
    /// The default implementation fails with a type error.
    fn visit_enum<A>(self, data: A) -> Result<Self::Value, A::Error>
    where
        A: EnumAccess<'de>,
    {
        let _ = data;
        Err(Error::invalid_type(Unexpected::Enum, &self))
    }
}
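A Visitor acts as a type converter: it defines how each kind of input data is converted into Self::Value. Most default implementations reject the input with a type error, so a Visitor only overrides the visit_* methods for the input types it actually accepts.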
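The deserialization process
Deserialize tells the Deserializer what kind of value it wants and, through the Visitor, which input data types that value can be built from. The Deserializer parses the raw data in its particular format to obtain concrete data together with its type, then calls the matching visit_* method on the Visitor to convert that data into the type Deserialize needs. A failure in either parsing or conversion makes the whole deserialization fail.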
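The whole round trip can be sketched with a small std-only model. This is an illustration under simplifying assumptions, not serde's real API: the three traits are cut down to a few methods, errors are plain `String`s, and `TokenDeserializer`, `Port`, `PortVisitor`, and `from_text` are hypothetical names. The toy format is self-describing (`true`, `42`, `"text"`), so the deserializer inspects the input and picks the visit method itself.

```rust
use std::convert::TryFrom;

// Reduced stand-ins for serde's three traits (assumption: String errors).
trait Visitor: Sized {
    type Value;
    fn expecting(&self) -> &'static str;
    // Defaults reject the input; a visitor overrides what it accepts.
    fn visit_bool(self, _v: bool) -> Result<Self::Value, String> {
        Err(format!("found a bool, expected {}", self.expecting()))
    }
    fn visit_i64(self, _v: i64) -> Result<Self::Value, String> {
        Err(format!("found an integer, expected {}", self.expecting()))
    }
    fn visit_str(self, _v: &str) -> Result<Self::Value, String> {
        Err(format!("found a string, expected {}", self.expecting()))
    }
}

trait Deserializer: Sized {
    fn deserialize_any<V: Visitor>(self, visitor: V) -> Result<V::Value, String>;
}

trait Deserialize: Sized {
    fn deserialize<D: Deserializer>(deserializer: D) -> Result<Self, String>;
}

// Role 1: the format author parses raw text and drives the visitor.
struct TokenDeserializer<'a> {
    input: &'a str,
}

impl<'a> Deserializer for TokenDeserializer<'a> {
    fn deserialize_any<V: Visitor>(self, visitor: V) -> Result<V::Value, String> {
        let token = self.input.trim();
        if token == "true" || token == "false" {
            visitor.visit_bool(token == "true")
        } else if let Ok(n) = token.parse::<i64>() {
            visitor.visit_i64(n)
        } else if token.len() >= 2 && token.starts_with('"') && token.ends_with('"') {
            visitor.visit_str(&token[1..token.len() - 1])
        } else {
            Err(format!("cannot parse token: {}", token))
        }
    }
}

// Role 2: the data structure author states which inputs can become a Port.
#[derive(Debug, PartialEq)]
struct Port(u16);

impl Deserialize for Port {
    fn deserialize<D: Deserializer>(deserializer: D) -> Result<Self, String> {
        struct PortVisitor;

        impl Visitor for PortVisitor {
            type Value = Port;
            fn expecting(&self) -> &'static str {
                "a port number or a numeric string"
            }
            fn visit_i64(self, v: i64) -> Result<Port, String> {
                u16::try_from(v)
                    .map(Port)
                    .map_err(|_| format!("port out of range: {}", v))
            }
            fn visit_str(self, v: &str) -> Result<Port, String> {
                v.parse::<u16>()
                    .map(Port)
                    .map_err(|e| format!("invalid port {:?}: {}", v, e))
            }
        }

        deserializer.deserialize_any(PortVisitor)
    }
}

fn from_text<T: Deserialize>(input: &str) -> Result<T, String> {
    T::deserialize(TokenDeserializer { input })
}

fn main() {
    println!("{:?}", from_text::<Port>("8080"));
    println!("{:?}", from_text::<Port>("\"443\""));
    println!("{:?}", from_text::<Port>("true"));
}
```

Neither side knows the other's internals: `Port` never touches the text format, and `TokenDeserializer` never learns what a `Port` is. The visitor is the only contact point between them.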
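Division of responsibilities and decoupling
- The serde library can supply conversions for common data types, that is, Visitor implementations for commonly used types. If no existing Visitor fits, the implementer of Deserialize can write their own.
- Parsing of a particular raw data format is supplied by third parties, who implement a general-purpose Deserializer for that format, such as JSON or XML.
- The implementer of Deserialize supplies the way to construct a specific data structure. For common data types, the serde library can supply this as well.
The three roles carry different responsibilities yet work closely together, decoupling deserialization of data structures from parsing of the document format. Programmers in each role can work independently without having to think about the others' work, which maximizes code sharing and reuse.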
Notes
    /// The input contains a sequence of elements.
    ///
    /// The default implementation fails with a type error.
    fn visit_seq<A>(self, seq: A) -> Result<Self::Value, A::Error>
    where
        A: SeqAccess<'de>,
    {
        let _ = seq;
        Err(Error::invalid_type(Unexpected::Seq, &self))
    }

    /// The input contains a key-value map.
    ///
    /// The default implementation fails with a type error.
    fn visit_map<A>(self, map: A) -> Result<Self::Value, A::Error>
    where
        A: MapAccess<'de>,
    {
        let _ = map;
        Err(Error::invalid_type(Unexpected::Map, &self))
    }

    /// The input contains an enum.
    ///
    /// The default implementation fails with a type error.
    fn visit_enum<A>(self, data: A) -> Result<Self::Value, A::Error>
    where
        A: EnumAccess<'de>,
    {
        let _ = data;
        Err(Error::invalid_type(Unexpected::Enum, &self))
    }
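Three methods in Visitor are special in that they rely on three further traits: SeqAccess, MapAccess, and EnumAccess. Why are these traits needed? Some Self::Value types are built from compound data such as sequences, maps, or enums. That data is produced by the Deserializer, and the Visitor has no way of knowing how to reach the values inside it, so the Deserializer's implementer has to provide the access methods. These three traits are implemented by the author of the Deserializer and handed to the Visitor when needed.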
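As a rough illustration of that hand-off, the std-only sketch below models only the sequence case. It is a simplification with hypothetical names (`CsvDeserializer`, `CsvSeq`, `VecVisitor`, `parse_list`): here `SeqAccess` yields `i64` elements directly, whereas serde's real `SeqAccess` is generic over the element type. The format author knows that elements are separated by commas; the visitor only knows how to pull the next element and build its value.

```rust
// Reduced stand-in for an access trait, implemented by the format author.
trait SeqAccess {
    // Returns Ok(None) when the sequence is exhausted.
    fn next_element(&mut self) -> Result<Option<i64>, String>;
}

trait Visitor: Sized {
    type Value;
    fn visit_seq<A: SeqAccess>(self, seq: A) -> Result<Self::Value, String>;
}

trait Deserializer: Sized {
    fn deserialize_seq<V: Visitor>(self, visitor: V) -> Result<V::Value, String>;
}

// A toy format: comma-separated integers.
struct CsvDeserializer<'a> {
    input: &'a str,
}

// Knows the format's layout; the visitor never sees the commas.
struct CsvSeq<'a> {
    parts: std::str::Split<'a, char>,
}

impl<'a> SeqAccess for CsvSeq<'a> {
    fn next_element(&mut self) -> Result<Option<i64>, String> {
        match self.parts.next() {
            None => Ok(None),
            Some(part) => part
                .trim()
                .parse::<i64>()
                .map(Some)
                .map_err(|e| format!("invalid element {:?}: {}", part, e)),
        }
    }
}

impl<'a> Deserializer for CsvDeserializer<'a> {
    fn deserialize_seq<V: Visitor>(self, visitor: V) -> Result<V::Value, String> {
        // Hand the visitor an accessor instead of the raw input.
        visitor.visit_seq(CsvSeq {
            parts: self.input.split(','),
        })
    }
}

// Written by the data structure author: collects elements into a Vec.
struct VecVisitor;

impl Visitor for VecVisitor {
    type Value = Vec<i64>;

    fn visit_seq<A: SeqAccess>(self, mut seq: A) -> Result<Vec<i64>, String> {
        let mut items = Vec::new();
        while let Some(n) = seq.next_element()? {
            items.push(n);
        }
        Ok(items)
    }
}

fn parse_list(input: &str) -> Result<Vec<i64>, String> {
    CsvDeserializer { input }.deserialize_seq(VecVisitor)
}

fn main() {
    println!("{:?}", parse_list("1, 2, 3"));
    println!("{:?}", parse_list("1, x"));
}
```

Swapping in a different format would mean writing a new `SeqAccess` implementation, while `VecVisitor` stays unchanged.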