-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
使用pgloader将mysql迁移至pgsql并使用mysql_fdw校验表数据 #48
Comments
CREATE TABLE `tbmi_metadata_embedded_exif` (
`imageId` int unsigned NOT NULL,
`orientation` text,
`imageDescription` text,
`userComment` blob,
`artist` text,
`xpAuthor` blob,
`copyright` text,
`imageUniqueId` text,
`bodySerialNumber` text,
`make` text,
`model` text,
`software` text,
`customRendered` smallint DEFAULT NULL,
`dateTime` datetime DEFAULT NULL,
`dateTimeOffset` text,
`dateTimeDigitized` datetime DEFAULT NULL,
`dateTimeDigitizedOffset` text,
`dateTimeOriginal` datetime DEFAULT NULL,
`dateTimeOriginalOffset` text,
`offsetTime` text,
`offsetTimeDigitized` text,
`offsetTimeOriginal` text,
`gpsDateTime` datetime DEFAULT NULL,
`gpsCoordinate` point DEFAULT NULL,
`gpsImgDirection` float DEFAULT NULL,
`gpsImgDirectionRef` text,
`rawBytes` blob,
`xxHash3` binary(8) NOT NULL,
PRIMARY KEY (`imageId`)
) ENGINE=InnoDB CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci; 由pgloader在pgsql端按照 create table tbm.tbmi_metadata_embedded_exif (
"imageId" bigint primary key not null,
orientation text,
"imageDescription" text,
"userComment" text,
artist text,
"xpAuthor" text,
copyright text,
"imageUniqueId" text,
"bodySerialNumber" text,
make text,
model text,
software text,
"customRendered" smallint,
"dateTime" timestamp with time zone,
"dateTimeOffset" text,
"dateTimeDigitized" timestamp with time zone,
"dateTimeDigitizedOffset" text,
"dateTimeOriginal" timestamp with time zone,
"dateTimeOriginalOffset" text,
"offsetTime" text,
"offsetTimeDigitized" text,
"offsetTimeOriginal" text,
"gpsDateTime" timestamp with time zone,
"gpsCoordinate" point,
"gpsImgDirection" double precision,
"gpsImgDirectionRef" text,
"rawBytes" bytea,
"xxHash3" bytea not null
); 10.1. 使用 SELECT COUNT(*) FROM ONLY mysql."tbmi_metadata_embedded_exif" a FULL OUTER JOIN ONLY tbm."tbmi_metadata_embedded_exif" b ON row(a) = row(b) WHERE a IS NULL OR b IS NULL; 可得 ERROR: FULL JOIN is only supported with merge-joinable or hash-joinable join conditions 10.1.1. 执行 SELECT COUNT(*) FROM mysql."tbmi_metadata_embedded_exif" a FULL OUTER JOIN ONLY tbm."tbmi_metadata_embedded_exif" b
USING ("imageId", "orientation", "imageDescription", "userComment", "artist", "xpAuthor", "copyright", "imageUniqueId", "bodySerialNumber", "make", "model", "software", "customRendered", "dateTime", "dateTimeOffset", "dateTimeDigitized", "dateTimeDigitizedOffset", "dateTimeOriginal", "dateTimeOriginalOffset", "offsetTime", "offsetTimeDigitized", "offsetTimeOriginal", "gpsDateTime", "gpsCoordinate", "gpsImgDirection", "gpsImgDirectionRef", "rawBytes", "xxHash3")
WHERE (a."imageId" IS NULL AND a."orientation" IS NULL AND a."imageDescription" IS NULL AND a."userComment" IS NULL AND a."artist" IS NULL AND a."xpAuthor" IS NULL AND a."copyright" IS NULL AND a."imageUniqueId" IS NULL AND a."bodySerialNumber" IS NULL AND a."make" IS NULL AND a."model" IS NULL AND a."software" IS NULL AND a."customRendered" IS NULL AND a."dateTime" IS NULL AND a."dateTimeOffset" IS NULL AND a."dateTimeDigitized" IS NULL AND a."dateTimeDigitizedOffset" IS NULL AND a."dateTimeOriginal" IS NULL AND a."dateTimeOriginalOffset" IS NULL AND a."offsetTime" IS NULL AND a."offsetTimeDigitized" IS NULL AND a."offsetTimeOriginal" IS NULL AND a."gpsDateTime" IS NULL AND a."gpsCoordinate" IS NULL AND a."gpsImgDirection" IS NULL AND a."gpsImgDirectionRef" IS NULL AND a."rawBytes" IS NULL AND a."xxHash3" IS NULL)
OR (b."imageId" IS NULL AND b."orientation" IS NULL AND b."imageDescription" IS NULL AND b."userComment" IS NULL AND b."artist" IS NULL AND b."xpAuthor" IS NULL AND b."copyright" IS NULL AND b."imageUniqueId" IS NULL AND b."bodySerialNumber" IS NULL AND b."make" IS NULL AND b."model" IS NULL AND b."software" IS NULL AND b."customRendered" IS NULL AND b."dateTime" IS NULL AND b."dateTimeOffset" IS NULL AND b."dateTimeDigitized" IS NULL AND b."dateTimeDigitizedOffset" IS NULL AND b."dateTimeOriginal" IS NULL AND b."dateTimeOriginalOffset" IS NULL AND b."offsetTime" IS NULL AND b."offsetTimeDigitized" IS NULL AND b."offsetTimeOriginal" IS NULL AND b."gpsDateTime" IS NULL AND b."gpsCoordinate" IS NULL AND b."gpsImgDirection" IS NULL AND b."gpsImgDirectionRef" IS NULL AND b."rawBytes" IS NULL AND b."xxHash3" IS NULL); 会有 ERROR: operator does not exist: point = point
Hint: No operator matches the given name and argument types. You might need to add explicit type casts. 10.1.2. 按照 https://dba.stackexchange.com/questions/252066/how-to-formulate-equality-predicate-on-point-column-in-postgresql/252112#252112 使用 SELECT COUNT(*) FROM mysql."tbmi_metadata_embedded_exif" a FULL OUTER JOIN ONLY tbm."tbmi_metadata_embedded_exif" b ON
(a."imageId" = b."imageId") AND
(a."orientation" = b."orientation" OR (a."orientation" IS NULL AND b."orientation" IS NULL)) AND
(a."imageDescription" = b."imageDescription" OR (a."imageDescription" IS NULL AND b."imageDescription" IS NULL)) AND
(a."userComment" = b."userComment" OR (a."userComment" IS NULL AND b."userComment" IS NULL)) AND
(a."artist" = b."artist" OR (a."artist" IS NULL AND b."artist" IS NULL)) AND
(a."xpAuthor" = b."xpAuthor" OR (a."xpAuthor" IS NULL AND b."xpAuthor" IS NULL)) AND
(a."copyright" = b."copyright" OR (a."copyright" IS NULL AND b."copyright" IS NULL)) AND
(a."imageUniqueId" = b."imageUniqueId" OR (a."imageUniqueId" IS NULL AND b."imageUniqueId" IS NULL)) AND
(a."bodySerialNumber" = b."bodySerialNumber" OR (a."bodySerialNumber" IS NULL AND b."bodySerialNumber" IS NULL)) AND
(a."make" = b."make" OR (a."make" IS NULL AND b."make" IS NULL)) AND
(a."model" = b."model" OR (a."model" IS NULL AND b."model" IS NULL)) AND
(a."software" = b."software" OR (a."software" IS NULL AND b."software" IS NULL)) AND
(a."customRendered" = b."customRendered" OR (a."customRendered" IS NULL AND b."customRendered" IS NULL)) AND
(a."dateTime" = b."dateTime" OR (a."dateTime" IS NULL AND b."dateTime" IS NULL)) AND
(a."dateTimeOffset" = b."dateTimeOffset" OR (a."dateTimeOffset" IS NULL AND b."dateTimeOffset" IS NULL)) AND
(a."dateTimeDigitized" = b."dateTimeDigitized" OR (a."dateTimeDigitized" IS NULL AND b."dateTimeDigitized" IS NULL)) AND
(a."dateTimeDigitizedOffset" = b."dateTimeDigitizedOffset" OR (a."dateTimeDigitizedOffset" IS NULL AND b."dateTimeDigitizedOffset" IS NULL)) AND
(a."dateTimeOriginal" = b."dateTimeOriginal" OR (a."dateTimeOriginal" IS NULL AND b."dateTimeOriginal" IS NULL)) AND
(a."dateTimeOriginalOffset" = b."dateTimeOriginalOffset" OR (a."dateTimeOriginalOffset" IS NULL AND b."dateTimeOriginalOffset" IS NULL)) AND
(a."offsetTime" = b."offsetTime" OR (a."offsetTime" IS NULL AND b."offsetTime" IS NULL)) AND
(a."offsetTimeDigitized" = b."offsetTimeDigitized" OR (a."offsetTimeDigitized" IS NULL AND b."offsetTimeDigitized" IS NULL)) AND
(a."offsetTimeOriginal" = b."offsetTimeOriginal" OR (a."offsetTimeOriginal" IS NULL AND b."offsetTimeOriginal" IS NULL)) AND
(a."gpsDateTime" = b."gpsDateTime" OR (a."gpsDateTime" IS NULL AND b."gpsDateTime" IS NULL)) AND
(a."gpsCoordinate" ~= b."gpsCoordinate" OR (a."gpsCoordinate" IS NULL AND b."gpsCoordinate" IS NULL)) AND
(a."gpsImgDirection" = b."gpsImgDirection" OR (a."gpsImgDirection" IS NULL AND b."gpsImgDirection" IS NULL)) AND
(a."gpsImgDirectionRef" = b."gpsImgDirectionRef" OR (a."gpsImgDirectionRef" IS NULL AND b."gpsImgDirectionRef" IS NULL)) AND
(a."rawBytes" = b."rawBytes" OR (a."rawBytes" IS NULL AND b."rawBytes" IS NULL)) AND
(a."xxHash3" = b."xxHash3")
WHERE (a."imageId" IS NULL AND a."orientation" IS NULL AND a."imageDescription" IS NULL AND a."userComment" IS NULL AND a."artist" IS NULL AND a."xpAuthor" IS NULL AND a."copyright" IS NULL AND a."imageUniqueId" IS NULL AND a."bodySerialNumber" IS NULL AND a."make" IS NULL AND a."model" IS NULL AND a."software" IS NULL AND a."customRendered" IS NULL AND a."dateTime" IS NULL AND a."dateTimeOffset" IS NULL AND a."dateTimeDigitized" IS NULL AND a."dateTimeDigitizedOffset" IS NULL AND a."dateTimeOriginal" IS NULL AND a."dateTimeOriginalOffset" IS NULL AND a."offsetTime" IS NULL AND a."offsetTimeDigitized" IS NULL AND a."offsetTimeOriginal" IS NULL AND a."gpsDateTime" IS NULL AND a."gpsCoordinate" IS NULL AND a."gpsImgDirection" IS NULL AND a."gpsImgDirectionRef" IS NULL AND a."rawBytes" IS NULL AND a."xxHash3" IS NULL)
OR (b."imageId" IS NULL AND b."orientation" IS NULL AND b."imageDescription" IS NULL AND b."userComment" IS NULL AND b."artist" IS NULL AND b."xpAuthor" IS NULL AND b."copyright" IS NULL AND b."imageUniqueId" IS NULL AND b."bodySerialNumber" IS NULL AND b."make" IS NULL AND b."model" IS NULL AND b."software" IS NULL AND b."customRendered" IS NULL AND b."dateTime" IS NULL AND b."dateTimeOffset" IS NULL AND b."dateTimeDigitized" IS NULL AND b."dateTimeDigitizedOffset" IS NULL AND b."dateTimeOriginal" IS NULL AND b."dateTimeOriginalOffset" IS NULL AND b."offsetTime" IS NULL AND b."offsetTimeDigitized" IS NULL AND b."offsetTimeOriginal" IS NULL AND b."gpsDateTime" IS NULL AND b."gpsCoordinate" IS NULL AND b."gpsImgDirection" IS NULL AND b."gpsImgDirectionRef" IS NULL AND b."rawBytes" IS NULL AND b."xxHash3" IS NULL); 可得 ERROR: invalid input syntax for type point: "��" 其分别是 https://codepoints.net/U+FFFE 的utf8和 10.1.3. 即便尝试 - (a."gpsCoordinate" ~= b."gpsCoordinate" OR (a."gpsCoordinate" IS NULL AND b."gpsCoordinate" IS NULL)) AND
+ (a."gpsCoordinate"::text = b."gpsCoordinate"::text OR (a."gpsCoordinate" IS NULL AND b."gpsCoordinate" IS NULL)) AND 事实核查:截止2024年4月,mysql_fdw仍然不支持转换任何gis类型 EnterpriseDB/mysql_fdw#260 10.2.mysql创建 CREATE OR REPLACE VIEW tbmi_metadata_embedded_exif_fdw
AS SELECT *,
ST_Latitude(gpsCoordinate) gpsCoordinateLat,
ST_Longitude(gpsCoordinate) gpsCoordinateLng
FROM tbmi_metadata_embedded_exif;
SELECT imageId, gpsCoordinate, gpsCoordinateLat, gpsCoordinateLng
FROM tbmi_metadata_embedded_exif_fdw
WHERE gpsCoordinate IS NOT NULL; 清理: DROP VIEW tbmi_metadata_embedded_exif_fdw; 10.2.1. 按照 DROP FOREIGN TABLE mysql."tbmi_metadata_embedded_exif";
CREATE FOREIGN TABLE IF NOT EXISTS mysql."tbmi_metadata_embedded_exif"(
"gpsCoordinateLat" float8,
"gpsCoordinateLng" float8)
INHERITS (tbm."tbmi_metadata_embedded_exif")
SERVER mysql
OPTIONS (dbname 'tbm', table_name 'tbmi_metadata_embedded_exif_fdw'); - (a."gpsCoordinate" ~= b."gpsCoordinate" OR (a."gpsCoordinate" IS NULL AND b."gpsCoordinate" IS NULL)) AND
+ ((a."gpsCoordinateLat" = b."gpsCoordinate"[0] AND a."gpsCoordinateLng" = b."gpsCoordinate"[1])
+ OR (a."gpsCoordinate" IS NULL AND b."gpsCoordinate" IS NULL)) AND
-WHERE (a."imageId" IS NULL AND a."orientation" IS NULL AND a."imageDescription" IS NULL AND a."userComment" IS NULL AND a."artist" IS NULL AND a."xpAuthor" IS NULL AND a."copyright" IS NULL AND a."imageUniqueId" IS NULL AND a."bodySerialNumber" IS NULL AND a."make" IS NULL AND a."model" IS NULL AND a."software" IS NULL AND a."customRendered" IS NULL AND a."dateTime" IS NULL AND a."dateTimeOffset" IS NULL AND a."dateTimeDigitized" IS NULL AND a."dateTimeDigitizedOffset" IS NULL AND a."dateTimeOriginal" IS NULL AND a."dateTimeOriginalOffset" IS NULL AND a."offsetTime" IS NULL AND a."offsetTimeDigitized" IS NULL AND a."offsetTimeOriginal" IS NULL AND a."gpsDateTime" IS NULL AND a."gpsCoordinate" IS NULL AND a."gpsImgDirection" IS NULL AND a."gpsImgDirectionRef" IS NULL AND a."rawBytes" IS NULL AND a."xxHash3" IS NULL)
+WHERE (a."imageId" IS NULL AND a."orientation" IS NULL AND a."imageDescription" IS NULL AND a."userComment" IS NULL AND a."artist" IS NULL AND a."xpAuthor" IS NULL AND a."copyright" IS NULL AND a."imageUniqueId" IS NULL AND a."bodySerialNumber" IS NULL AND a."make" IS NULL AND a."model" IS NULL AND a."software" IS NULL AND a."customRendered" IS NULL AND a."dateTime" IS NULL AND a."dateTimeOffset" IS NULL AND a."dateTimeDigitized" IS NULL AND a."dateTimeDigitizedOffset" IS NULL AND a."dateTimeOriginal" IS NULL AND a."dateTimeOriginalOffset" IS NULL AND a."offsetTime" IS NULL AND a."offsetTimeDigitized" IS NULL AND a."offsetTimeOriginal" IS NULL AND a."gpsDateTime" IS NULL AND a."gpsCoordinate" IS NULL AND a."gpsCoordinateLat" IS NULL AND a."gpsCoordinateLng" IS NULL AND a."gpsImgDirection" IS NULL AND a."gpsImgDirectionRef" IS NULL AND a."rawBytes" IS NULL AND a."xxHash3" IS NULL) 10.2.2. 然而 ERROR: failed to execute the MySQL query:
Invalid GIS data provided to function st_latitude. https://dev.mysql.com/doc/refman/8.0/en/gis-point-property-functions.html
https://dev.mysql.com/doc/mysql-errors/8.0/en/server-error-reference.html#error_er_gis_invalid_data
10.2.3. 但即便直接在mysql端绕过视图换成 SELECT COUNT(ST_X(gpsCoordinate)) FROM tbmi_metadata_embedded_exif WHERE gpsCoordinate IS NOT NULL; [3037] Data truncation: Invalid GIS data provided to function st_x. 10.2.4. 但这又正常 SELECT COUNT(*) FROM (
SELECT gpsCoordinate, ST_X(gpsCoordinate) FROM tbmi_metadata_embedded_exif WHERE gpsCoordinate IS NOT NULL
) t;
-- 10026 10.3. 换wkt https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry 表达 CREATE OR REPLACE VIEW tbmi_metadata_embedded_exif_fdw
AS SELECT *,
ST_AsWKT(gpsCoordinate) gpsCoordinateWKT,
ST_AsWKB(gpsCoordinate) gpsCoordinateWKB
FROM tbmi_metadata_embedded_exif;
SELECT COUNT(gpsCoordinate), COUNT(gpsCoordinateWKT) FROM tbmi_metadata_embedded_exif_fdw WHERE gpsCoordinate IS NOT NULL;
可得有10行wkt为 10.3.1. 直接查看mysql内部存储 SELECT COUNT(gpsCoordinate), gpsCoordinate, COUNT(gpsCoordinateWKT), gpsCoordinateWKT
FROM tbmi_metadata_embedded_exif_fdw
WHERE gpsCoordinate IS NOT NULL AND gpsCoordinateWKT IS NULL
GROUP BY gpsCoordinate, gpsCoordinateWKT;
使用 https://rodic.fr/wp-content/uploads/2015/11/geom_converter.html 无法解析wkb之
手动修改第一个字节
10.3.2. 但实际上mysql内部存储 SELECT gpsCoordinate, gpsCoordinateWKB, gpsCoordinateWKT FROM tbmi_metadata_embedded_exif_fdw WHERE gpsCoordinate != gpsCoordinateWKB;
10.3.3. 而如果只看wkb则所有行都能转为wkb SELECT COUNT(gpsCoordinate), COUNT(gpsCoordinateWKB) FROM tbmi_metadata_embedded_exif_fdw WHERE gpsCoordinate IS NOT NULL;
10.3.4. 扩展 SELECT COUNT(gpsCoordinate), gpsCoordinate,
COUNT(gpsCoordinateWKT), gpsCoordinateWKT,
COUNT(gpsCoordinateWKB), gpsCoordinateWKB,
ST_PointFromWKB(gpsCoordinateWKB) a
FROM tbmi_metadata_embedded_exif_fdw
WHERE gpsCoordinate IS NOT NULL AND gpsCoordinateWKT IS NULL
GROUP BY gpsCoordinate, gpsCoordinateWKT, gpsCoordinateWKB;
10.3.5. 而在 SELECT COUNT(gpsCoordinate), gpsCoordinate, ST_IsValid(gpsCoordinate)
FROM tbmi_metadata_embedded_exif
WHERE gpsCoordinate IS NOT NULL AND ST_AsWKT(gpsCoordinate) IS NULL
GROUP BY gpsCoordinate; [3037] Data truncation: Invalid GIS data provided to function st_isvalid. 10.4. 由于pgsql本身不提供gis用途的 sudo apt install -y postgresql-16-postgis-3 CREATE EXTENSION IF NOT EXISTS postgis; 清理: DROP EXTENSION IF EXISTS postgis; sudo apt purge -y postgresql-16-postgis-3 &&
sudo apt autoremove --purge 10.5. 如同 CREATE TABLE tbmi_metadata_embedded_exif_wkb LIKE tbmi_metadata_embedded_exif;
INSERT INTO tbmi_metadata_embedded_exif_wkb SELECT * FROM tbmi_metadata_embedded_exif;
ALTER TABLE tbmi_metadata_embedded_exif_wkb ADD COLUMN gpsCoordinateWKB BLOB NULL AFTER gpsCoordinate;
UPDATE tbmi_metadata_embedded_exif_wkb SET gpsCoordinateWKB = ST_AsWKB(gpsCoordinate) WHERE gpsCoordinate IS NOT NULL;
SELECT COUNT(gpsCoordinate), COUNT(gpsCoordinateWKB) FROM tbmi_metadata_embedded_exif_wkb;
-- expecting the same count
SELECT COUNT(*) FROM tbmi_metadata_embedded_exif_wkb
WHERE (gpsCoordinate IS NULL AND gpsCoordinateWKB IS NOT NULL)
OR (gpsCoordinateWKB IS NULL AND gpsCoordinate IS NOT NULL);
-- expecting 0 rows returned, then:
-- ALTER TABLE tbmi_metadata_embedded_exif_wkb DROP COLUMN gpsCoordinate; 清理: DROP TABLE tbmi_metadata_embedded_exif_wkb; 10.5.1. DROP TABLE tbm."tbmi_metadata_embedded_exif" CASCADE; 类比 CREATE FOREIGN TABLE IF NOT EXISTS mysql."tbmi_metadata_embedded_exif_wkb"()
INHERITS (tbm."tbmi_metadata_embedded_exif_wkb")
SERVER mysql
OPTIONS (dbname 'tbm', table_name 'tbmi_metadata_embedded_exif_wkb'); 10.5.2. 由于根据 "dateTime" timestamp with time zone,
"dateTimeDigitized" timestamp with time zone,
"dateTimeOriginal" timestamp with time zone,
"gpsDateTime" timestamp with time zone, 10.5.3. create table tbm.tbmi_metadata_embedded_exif_wkb (
"imageId" bigint primary key not null,
orientation text,
"imageDescription" text,
"userComment" bytea,
artist text,
"xpAuthor" bytea,
copyright text,
"imageUniqueId" text,
"bodySerialNumber" text,
make text,
model text,
software text,
"customRendered" smallint,
"dateTime" timestamp with time zone,
"dateTimeOffset" text,
"dateTimeDigitized" timestamp with time zone,
"dateTimeDigitizedOffset" text,
"dateTimeOriginal" timestamp with time zone,
"dateTimeOriginalOffset" text,
"offsetTime" text,
"offsetTimeDigitized" text,
"offsetTimeOriginal" text,
"gpsDateTime" timestamp with time zone,
"gpsCoordinateWKB" bytea,
"gpsImgDirection" double precision,
"gpsImgDirectionRef" text,
"rawBytes" bytea,
"xxHash3" bytea not null
); 由于pgsql长期以来无法像mysql那样一键移动 https://stackoverflow.com/questions/6805426/how-to-move-columns-in-a-mysql-table 列顺序 https://stackoverflow.com/questions/285733/how-do-i-alter-the-position-of-a-column-in-a-postgresql-database-table http://wiki.postgresql.org/wiki/Alter_column_position -create table tbm.tbmi_metadata_embedded_exif_wkb (
+create table tbm.tbmi_metadata_embedded_exif (
- "gpsCoordinateWKB" bytea,
+ "gpsCoordinate" point, 10.5.4. 如同 DROP FOREIGN TABLE mysql."tbmi_metadata_embedded_exif_wkb"; -- 避免`4.3.2.`
ALTER TABLE tbm."tbmi_metadata_embedded_exif_wkb" ADD COLUMN "gpsCoordinate" POINT;
UPDATE tbm."tbmi_metadata_embedded_exif_wkb" SET "gpsCoordinate" = st_pointfromwkb("gpsCoordinateWKB")::point WHERE "gpsCoordinateWKB" IS NOT NULL;
SELECT COUNT("gpsCoordinate"), COUNT("gpsCoordinateWKB") FROM tbm."tbmi_metadata_embedded_exif_wkb";
-- expecting the same count
SELECT COUNT(*) FROM tbm."tbmi_metadata_embedded_exif_wkb"
WHERE ("gpsCoordinate" IS NULL AND "gpsCoordinateWKB" IS NOT NULL)
OR ("gpsCoordinateWKB" IS NULL AND "gpsCoordinate" IS NOT NULL);
-- expecting 0 rows returned, then:
ALTER TABLE tbm."tbmi_metadata_embedded_exif_wkb" DROP COLUMN "gpsCoordinateWKB";
INSERT INTO tbm."tbmi_metadata_embedded_exif"
("imageId", orientation, "imageDescription", "userComment", artist, "xpAuthor", copyright, "imageUniqueId", "bodySerialNumber", make, model, software, "customRendered", "dateTime", "dateTimeOffset", "dateTimeDigitized", "dateTimeDigitizedOffset", "dateTimeOriginal", "dateTimeOriginalOffset", "offsetTime", "offsetTimeDigitized", "offsetTimeOriginal", "gpsDateTime", "gpsCoordinate", "gpsImgDirection", "gpsImgDirectionRef", "rawBytes", "xxHash3")
SELECT "imageId", orientation, "imageDescription", "userComment", artist, "xpAuthor", copyright, "imageUniqueId", "bodySerialNumber", make, model, software, "customRendered", "dateTime", "dateTimeOffset", "dateTimeDigitized", "dateTimeDigitizedOffset", "dateTimeOriginal", "dateTimeOriginalOffset", "offsetTime", "offsetTimeDigitized", "offsetTimeOriginal", "gpsDateTime", "gpsCoordinate", "gpsImgDirection", "gpsImgDirectionRef", "rawBytes", "xxHash3"
FROM tbm."tbmi_metadata_embedded_exif_wkb"; 10.5.5. 按 -SELECT COUNT(*) FROM mysql."tbmi_metadata_embedded_exif" a FULL OUTER JOIN ONLY tbm."tbmi_metadata_embedded_exif" b ON
+SELECT COUNT(*) FROM tbm."tbmi_metadata_embedded_exif_wkb" a FULL OUTER JOIN tbm."tbmi_metadata_embedded_exif" b ON 清理: DROP TABLE tbmi_metadata_embedded_exif_wkb;
VACUUM FULL tbmi_metadata_embedded_exif; 10.6. 然而 SELECT "gpsCoordinate",
"gpsCoordinateWKB",
st_isvalid(st_pointfromwkb("gpsCoordinateWKB"))
FROM tbmi_metadata_embedded_exif_wkb
WHERE ("gpsCoordinate" IS NULL AND "gpsCoordinateWKB" IS NOT NULL)
OR ("gpsCoordinateWKB" IS NULL AND "gpsCoordinate" IS NOT NULL); 暂且理解为 10.6.1. 理论上可以将 10.6.2. 最终选择了在mysql表中将所有10行都 UPDATE tbmi_metadata_embedded_exif
SET gpsCoordinate = NULL
WHERE gpsCoordinate IS NOT NULL AND ST_AsWKT(gpsCoordinate) IS NULL; 10 rows affected in 10 s 455 ms SELECT imageId, gpsCoordinate, st_isvalid(gpsCoordinate)
FROM tbmi_metadata_embedded_exif
WHERE gpsCoordinate IS NOT NULL AND st_isvalid(gpsCoordinate) != 1
ORDER BY gpsCoordinate; 0 rows retrieved in 831 ms (execution: 778 ms, fetching: 53 ms) 后重新执行
- `id` int unsigned NOT NULL AUTO_INCREMENT,
+ id bigint primary key not null default nextval('tbmc_f97650_thread_id_seq'::regclass), 11.1. 而其早已被彻底批倒批臭 https://stackoverflow.com/questions/55300370/postgresql-serial-vs-identity CREATE OR REPLACE FUNCTION upgrade_serial_to_identity(tbl regclass, col name)
RETURNS void
LANGUAGE plpgsql
AS $$
DECLARE
colnum smallint;
seqid oid;
count int;
BEGIN
-- find column number
SELECT attnum INTO colnum FROM pg_attribute WHERE attrelid = tbl AND attname = col;
IF NOT FOUND THEN
RAISE EXCEPTION 'column does not exist';
END IF;
-- find sequence
SELECT INTO seqid objid
FROM pg_depend
WHERE (refclassid, refobjid, refobjsubid) = ('pg_class'::regclass, tbl, colnum)
AND classid = 'pg_class'::regclass AND objsubid = 0
AND deptype = 'a';
GET DIAGNOSTICS count = ROW_COUNT;
IF count < 1 THEN
RAISE EXCEPTION 'no linked sequence found';
ELSIF count > 1 THEN
RAISE EXCEPTION 'more than one linked sequence found';
END IF;
-- drop the default
EXECUTE 'ALTER TABLE ' || tbl || ' ALTER COLUMN ' || quote_ident(col) || ' DROP DEFAULT';
-- change the dependency between column and sequence to internal
UPDATE pg_depend
SET deptype = 'i'
WHERE (classid, objid, objsubid) = ('pg_class'::regclass, seqid, 0)
AND deptype = 'a';
-- mark the column as identity column
UPDATE pg_attribute
SET attidentity = 'd'
WHERE attrelid = tbl
AND attname = col;
END;
$$; 11.2. mysql端一键生成针对便乘 SELECT CONCAT(
'SELECT upgrade_serial_to_identity(\'tbm."',
TABLE_NAME,
'"\', \'',
COLUMN_NAME,
'\');'
)
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'tbm' AND EXTRA LIKE '%auto_increment%'
AND TABLE_NAME REGEXP '^tbm(_|i_(?!imageInReply)|c_f(78579|2265748|17019292|25459979|27546680))'
ORDER BY TABLE_NAME; REGEXP部分同 |
https://news.ycombinator.com/item?id=19500640 |
9.2.1. 事实上 open-tbm/c#/imagePipeline/src/Consumer/HashConsumer.cs Lines 18 to 19 in 8eca944
已经使得原 byte[] 被反转地解释为了小端序sharplab.io using System;
using System.Linq;
byte[] big = [0, 1, 2, 3, 4, 5, 6, 7];
var small = big.Reverse().ToArray();
Console.WriteLine(Convert.ToHexString(big)); // 0001020304050607
Console.WriteLine(BitConverter.ToUInt64(big, 0)); // 506097522914230528
Console.WriteLine();
Console.WriteLine(Convert.ToHexString(small)); // 0706050403020100
Console.WriteLine(BitConverter.ToUInt64(small, 0)); // 283686952306183 SELECT UNHEX(LPAD(HEX(506097522914230528), 16, '00')),
UNHEX(LPAD(HEX(283686952306183), 16, '00'));
而其他地方更常用的xxhash3 https://z.n0099.net/#narrow/near/88989 反而的确是大端序 SELECT t.b, text_reverse, decode(text_reverse, 'hex') AS bytea_reverse
FROM (SELECT '\x0001020304050607'::bytea b) t
LEFT JOIN LATERAL (
SELECT string_agg(to_hex(get_byte(b, x)), '') AS text_reverse
FROM generate_series(octet_length(t.b) - 1, 0, -1) x
) x ON true;
CREATE OR REPLACE FUNCTION reverse(bytea) RETURNS bytea AS $reverse$
SELECT string_agg(byte,''::bytea)
FROM (
SELECT substr($1,i,1) byte
FROM generate_series(length($1),1,-1) i) s
$reverse$ LANGUAGE sql;
SELECT reverse('\x0001020304050607'::bytea);
DROP FUNCTION reverse(bytea);
mysql端 UPDATE tbmi_hash SET pHash = REVERSE(pHash), averageHash = REVERSE(averageHash);
-- 14,322,348 rows affected in 20 m 6 s 174 ms pgsql端 UPDATE tbmi_hash SET "pHash" = reverse("pHash"), "averageHash" = reverse("averageHash");
-- 14,322,348 rows affected in 28 m 56 s 671 ms |
…`8.1.2.` @ `CrawlerDbContext..OnModelCreating()` @ crawler * converting string fields that may contain `0x00` to utf8 bytes: #48 `8.1.2.` @ `ImagePipelineDbContext.OnModelCreating()` * now will convert point with invalid coordinate to null: #48 `10.3.1.` @ `MetadataConsumer.ExifGpsTagValuesParser.ParseGpsCoordinateOrNull()` @ imagePipeline @ c#
https://www.postgresql.org/docs/current/functions-admin.html WITH t AS (
SELECT table_schema, table_name,
('"' || table_schema || '"."' || table_name ||'"')::regclass regclass
FROM information_schema.tables
WHERE table_schema = 'tbm'
)
SELECT table_schema, table_name,
pg_relation_filepath(regclass),
pg_indexes_size(regclass) / 1024 / 1024 i,
pg_table_size(regclass) / 1024 / 1024 t,
pg_relation_size(regclass) / 1024 / 1024 r,
pg_total_relation_size(regclass) / 1024 / 1024 tr
FROM t
ORDER BY tr DESC; |
create table tbmc_f97650_reply (
id bigint primary key not null default nextval('tbmc_f97650_reply_id_seq'::regclass),
tid bigint not null,
pid bigint not null,
floor bigint not null,
"authorUid" bigint not null,
"subReplyCount" bigint,
"postedAt" bigint not null,
"isFold" smallint,
"agreeCount" integer,
"disagreeCount" integer,
geolocation bytea,
"signatureId" bigint,
"createdAt" bigint not null,
"updatedAt" bigint,
"lastSeenAt" bigint
);
create unique index idx_23148_pid on tbmc_f97650_reply using btree (pid);
create index "idx_23148_postTime" on tbmc_f97650_reply using btree ("postedAt");
create index "idx_23148_authorUid" on tbmc_f97650_reply using btree ("authorUid");
create index idx_23148_tid on tbmc_f97650_reply using btree (tid); 12.1. 其中 12.2. 事实核查:正义削除该 CREATE TABLE tbmc_reply (fid int NOT NULL, LIKE tbmc_f97650_reply INCLUDING ALL);
ALTER TABLE tbmc_reply DROP COLUMN id, ADD PRIMARY KEY (pid);
DROP INDEX tbmc_reply_pid_idx; -- 重复的UNIQUE(pid)
INSERT INTO tbmc_reply
(fid, tid, pid, floor, "authorUid", "subReplyCount", "postedAt", "isFold", "agreeCount", "disagreeCount", geolocation, "signatureId", "createdAt", "updatedAt", "lastSeenAt")
SELECT 97650 AS fid, tid, pid, floor, "authorUid", "subReplyCount", "postedAt", "isFold", "agreeCount", "disagreeCount", geolocation, "signatureId", "createdAt", "updatedAt", "lastSeenAt"
FROM tbmc_f97650_reply;
VACUUM FULL tbmc_reply; 12.3. 而进一步削除 ALTER TABLE tbmc_reply DROP COLUMN fid;
VACUUM FULL tbmc_reply; 也只会进一步节省 |
11.3. SELECT *
FROM information_schema.columns
WHERE table_schema = 'tbm'
AND column_default IS NOT NULL;
SELECT 'ALTER TABLE tbm."' || table_name || '" ALTER COLUMN "' || column_name || '" TYPE text;'
FROM information_schema.columns
WHERE table_schema = 'tbm' AND data_type = 'character varying'; 而索引中的列长度似乎没有被pgloader迁移过来 |
一键 -- https://stackoverflow.com/questions/65069778/postgresql-rename-table-cascade-renaming-to-objects-like-indices-constraints/78485773#78485773
DO $$DECLARE r record;
BEGIN
FOR r IN
SELECT 'ALTER INDEX "' || schema_name || '"."' || index_name ||
'" RENAME TO "' || new_index_name || '";' sql, *
FROM (
SELECT schema_name, table_name, index_name,
table_name ||
CASE constraint_type
-- only uncomment below line when using postgresql<12 https://stackoverflow.com/questions/4107915/postgresql-default-constraint-names#comment109084417_4108266
-- WHEN 'f' THEN '_' || columns[1] -- https://stackoverflow.com/questions/4107915/postgresql-default-constraint-names#comment127573826_4108266
WHEN 'p' THEN '' -- https://stackoverflow.com/questions/4107915/postgresql-default-constraint-names#comment127553786_4108266
ELSE '_' || string_agg(column_name, '_' ORDER BY colum_index)
END || '_' ||
-- https://stackoverflow.com/questions/15417167/case-when-null-evaluates-to-false
CASE COALESCE(constraint_type, 'NULL')
-- https://stackoverflow.com/questions/4107915/postgresql-default-constraint-names/4108266#4108266
-- https://gist.github.com/popravich/d6816ef1653329fb1745
-- https://stackoverflow.com/questions/8674562/postgresql-is-it-possible-to-provide-custom-name-for-primary-key-or-unique/8674640#8674640
WHEN 'c' THEN 'check'
WHEN 'f' THEN 'fkey' -- TODO: foreign key constraint only exists on the referencing table
WHEN 'p' THEN 'pkey'
WHEN 'u' THEN 'key'
WHEN 'x' THEN 'excl'
WHEN 'NULL' THEN 'idx'
END new_index_name
FROM (
SELECT n.nspname schema_name, t.relname table_name, i.relname index_name, c.contype constraint_type, a.attname column_name, a.attnum colum_index
FROM pg_index x
-- https://stackoverflow.com/questions/37329561/how-to-list-indexes-created-for-table-in-postgres/37330092#37330092
JOIN pg_class t ON t.oid = x.indrelid
JOIN pg_namespace n ON n.oid = t.relnamespace
JOIN pg_class i ON i.oid = x.indexrelid
-- https://stackoverflow.com/questions/55447819/how-to-get-the-list-of-column-names-for-all-indexes-on-a-table-in-postgresql/55448707#55448707
JOIN pg_attribute a ON a.attrelid = x.indrelid AND a.attnum = ANY(x.indkey)
-- https://www.postgresql.org/docs/current/catalog-pg-constraint.html
LEFT JOIN pg_constraint c ON c.conindid = x.indexrelid AND c.conrelid = t.oid
WHERE n.nspname = 'your_schema'
) t
GROUP BY schema_name, table_name, index_name, constraint_type
) t
WHERE index_name != new_index_name
LOOP
-- https://stackoverflow.com/questions/1348126/postgresql-modify-owner-on-all-tables-simultaneously-in-postgresql/37259655#37259655
EXECUTE r.sql;
END LOOP;
END$$;
12.4. 一键 DO $$DECLARE r record;
BEGIN
FOR r IN
SELECT 'ALTER TABLE "' || table_schema || '"."' || table_name ||
'" DROP COLUMN id, ADD PRIMARY KEY USING INDEX "' || table_name || '_tid";' c
FROM information_schema.tables t
WHERE table_schema = 'tbm' AND table_name ~ '^tbmc_f\d+_thread$'
UNION
SELECT 'ALTER TABLE "' || table_schema || '"."' || table_name ||
'" DROP COLUMN id, ADD PRIMARY KEY USING INDEX "' || table_name || '_pid";' c
FROM information_schema.tables t
WHERE table_schema = 'tbm' AND table_name ~ '^tbmc_f\d+_reply$'
UNION
SELECT 'ALTER TABLE "' || table_schema || '"."' || table_name ||
'" DROP COLUMN id, ADD PRIMARY KEY USING INDEX "' || table_name || '_spid";' c
FROM information_schema.tables t
WHERE table_schema = 'tbm' AND table_name ~ '^tbmc_f\d+_subReply$'
LOOP
EXECUTE r.c;
END LOOP;
END$$; |
https://www.postgresql.org/message-id/flat/20060313104028.GB6714%40svana.org
|
|
pgloader
@3.6.10
mysql_fdw
@2.9.1
库名假定为
tbm
1.1 由于 dimitri/pgloader#1211 ,从postgre社区apt source安装的
pgloader 3.6.10-2.pgdg22.04+1
无法连接mysqld1.1.1. 建议从源码自行编译
3.6.10
1.1.2. 清理:
sudo apt purge -y pgloader && sudo apt autoremove --purge
1.2. 由于 dimitri/pgloader#782 ,pgloader只能通过仍在使用
mysql 8.0.34
中deprecated的mysql_native_password
用户登录mysqld1.2.1. 建议ad-hoc地临时创建一个最小权限(只
GRANT
库/表级权限)用户用于pgloader而非修改现有用户的default auth plugin
schema.load
有2.1. 可以使用
INCLUDING ONLY TABLE NAMES MATCHING ~/regexp/, 'exact'
来filter要导入的表名2.2. 令文件
data.load
有2.3. 分开导入
表结构
和表数据
是因为pgloader默认行为是按照指定和mysql://
源下默认的CASTING
创建目的表结构
(除了索引),再COPY
导入表数据
,最后并行{表索引数量}
个CREATE INDEX
并且不等待其完成而是并行地开始下一个表的过程: https://pgloader.readthedocs.io/en/latest/batches.html由于未知原因在
3.6.af8c3c1
中WITH max parallel create index 1
并没有实际作用仍然会执行上述流程所以不得不分开在WITH schema only
的.load
中先完成CREATE INDEX
dimitri/pgloader#1576 以减少io争用2.3.1. 出于类似的原因也建议先禁用
autovacuum
清理:待pgloader完成后再手动
VACUMM FULL
https://dba.stackexchange.com/questions/130496/is-it-worth-it-to-run-vacuum-on-a-table-that-only-receives-inserts/130514#130514ALTER SYSTEM RESET autovacuum; SELECT pg_reload_conf(); VACUMM FULL;
2.4.
WITH quote identifiers
是为了方便后续校验两端表数据
是否一致时可以假定列名相同,您可以之后再重命名为postgre社区习惯的snake_case
以减少满屏幕""
(相当于mysql的``
)2.5. 视奸正在导入什么表 https://stackoverflow.com/questions/35319597/how-to-stop-kill-a-query-in-postgresql
可根据
表大小
猜测进度 https://stackoverflow.com/questions/21738408/postgresql-list-and-order-tables-by-size/21738505#21738505tbmc_f97650_thread
结构mysql端:
由pgloader在pgsql端按照
2.3.
中mysql://
源下默认的CASTING
所创建:3.1. 可以通过将表中
每列值
concat起来hash后再aggreate地对每行hash
再hash来得出整个表的hash
mermaid.live
mysql
blob类型字段必须套
LOWER(HEX())
因为mysql默认的blob2text
输出类似0xDEADBEAF
形式而pgsql是deadbeaf
pgsql
3.2. 然而在大表上两者最终都会耗尽内存而失败,并且需要
LOWER(HEX())
GROUP_CONCAT()
时按照PK或UK排序行否则在pgsql中默认迫真假随机
序一键生成
3.3. 理论上可以通过赋予纯SQL(无需臭名昭著的命令式
存储过程
或外部程序辅助)图灵完备性 https://wiki.postgresql.org/wiki/Mandelbrot_set 导致其踏入了同一条Turing Tarpit
之河流的rCTE
来LIMIT 10000
地每次hash的行并对每10k行aggreate之hash
再aggreate地hash从而减少每行hash所aggreate的字符串长度mermaid.live
某so纯路人
指出可以使用postgre社区牛逼哄哄
的某知名企业EnterpriseDB
所开发的某pgsql扩展 https://github.com/EnterpriseDB/mysql_fdw 通过foreign data wrapper来在pgsql中将外部数据当做迫真物化视图
查询4.1. 安装:
清理:
4.2. 安装该pgsql扩展
清理:
4.3. 在pgsql中创建fdw表
使用
table inheritance
是为了避免在2.
已经pgloader schema.load
后再写一遍fdw表结构
,但这会导致4.3.1. 被继承表
tbmc_f97650_thread
无法再直接UPDATE
/DELETE
返回[HV005] ERROR: system attribute "tableoid" can't be fetched from remote relation
而必须加ONLY
EnterpriseDB/mysql_fdw#300 但仍然可以INSERT
4.3.1.1. 这个bug可能会在未来版本的
mysql_fdw
中修复但在这里并不重要,因为这种迫真假锁全表
正好避免了在校验期间UPDATE
/DELETE
(但防不了INSERT
)(默认事务隔离级别是READ COMMITTED
而非mysql默认的REPEATABLE READ
#32 (comment) 当然您也可以改)了pgsql表导致假阴性
4.3.2. 由于
table inheritance
的本质:也会出现
中的所有结果:也就是被继承表并集了所有继承表(在这里通过查询mysql),相当于
tbmi_ocr_box_Latn
结构:mysql端:
由pgloader在pgsql端按照
2.3.
中mysql://
源下默认的CASTING
所创建:5.1. 根据
某so纯路人
symmdifftable有4种写法:5.1.1. https://stackoverflow.com/questions/15330403/find-difference-between-two-big-tables-in-postgresql/15333054#15333054
5.1.1.1.
一键生成
5.1.1.2. 也可以使用
NATURAL FULL OUTER JOIN ONLY
https://www.postgresql.org/docs/current/queries-table-expressions.html 省略USING (...)
5.1.2. 但
5.1.1.
和5.1.1.2.
并不能用于有列类型允许NULL
之存在的表如7.
中的tbm_bilibiliVote
表有着因为根据sql特色之 https://en.wikipedia.org/wiki/Three-valued_logic
NULL = NULL
是UNKNOWN
而UNKNOWN
转bool
是false
于是不会被ON
/WHERE
等clause视作相同 https://stackoverflow.com/questions/14366004/sql-server-join-missing-null-values5.1.2.1. 因而需要将
5.1.1.
中的USING (...columns)
展开成对所有列=
operator以及额外的OR (IS NULL AND IS NULL)
5.1.2.1.1. 不对
NOT NULL
的列做额外判断是为了避免7.1.
中的5.1.2.2.
一键生成
5.1.3. https://stackoverflow.com/questions/15330403/find-difference-between-two-big-tables-in-postgresql/49381589#49381589
5.1.3.1. 其中
row(t)
是 https://www.postgresql.org/docs/current/sql-expressions.html#SQL-SYNTAX-ROW-CONSTRUCTORS5.1.3.2. 这也避免了
5.1.2.
5.1.4. https://stackoverflow.com/questions/6337871/how-can-i-speed-up-a-diff-between-tables
由于
EXCEPT
不是symmdiff之FULL OUTER JOIN
而是SELECT a.* FROM a LEFT OUTER JOIN b
所以只能执行两遍2x耗时正如同没有FULL OUTER JOIN
可用的mysql人 https://stackoverflow.com/questions/4796872/how-can-i-do-a-full-outer-join-in-mysql5.1.4.1. 由于未知bug
UNION ALL
后始终为0行即便单独查询有行5.1.4.2. 这也避免了
5.1.2.
5.1.5. 典型的误用
NOT EXISTS
因为
SELECT * FROM mysql."tbmi_ocr_box_Latn"
在此显然返回非0行
(因而只需要读一行mysql."tbmi_ocr_box_Latn"
几百ms后就会立即返回)从而使得该predicate在非空表上恒真5.1.5.1. 由于未知bug
UNION ALL
后始终为0行
即便单独查询有非0行
5.1.6. 除
5.1.5.
外的3种执行期间均需pgsql消耗1x表大小
的内存(总比3.2.
好)和3x表大小
的临时表
存储5.1.6.1. 可以修改
临时表空间
在fs上的默认路径/var/lib/postgresql/16/main/base/pgsql_tmp
https://dba.stackexchange.com/questions/170661/in-postgres-how-do-i-adjust-the-pgsql-tmp-setting/170665#1706655.2. 总行数对比
compare.sql
有在mysql中codegen出基于5.1.3.
、4.3.
和5.2.
校验用的一键生成
pgsqlREGEXP部分同
2.1.
6.1. 一键bash
tbm_bilibiliVote
结构 https://github.com/n0099/bilibiliVotemysql端:
由pgloader在pgsql端按照
2.3.
中mysql://
源下默认的CASTING
所创建:7.1. 使用
5.1.3.
中的symmetric difference FULL OUTER JOIN row(table)
可得 https://stackoverflow.com/questions/47405732/why-does-postgresql-throw-full-join-is-only-supported-with-merge-joinable-or-ha
7.1.1. 使用
5.1.5.
中的non symmetric difference EXCEPT
可得 https://stackoverflow.com/questions/48420438/could-not-identify-an-equality-operator-for-type-json-when-using-distinct
7.1.2. 执行
5.1.2.1.
会有 https://stackoverflow.com/questions/32843213/operator-does-not-exist-json-json
7.1.3.
7.
中可见列replyContent
是json
类型这恰好等于
5.2.
之和
7.1.4. 但即便是
json::text
也仍然相同
7.1.5. 可以在mysql端将该
json
列转text
后再重新按照
2.2.
仅导入该表(INCLUDING ONLY TABLE NAMES MATCHING 'tbm_bilibiliVote'
)7.2. 但即便两端的列
replyContent
重新导入后均为text
类型也仍然有所有行
不同实际上是由于列
postTime
mysql无时区类型datetime
由2.3.
中的pgloadermysql://
默认的CASTING
转为pgsql
有时区类型timestamptz
时由pgloader根据系统时区
UTC+8
而非pgsql时区
UTC+0
转换为UTC
7.2.1. 由于
timestamptz
实际上并没有额外存储时区UTC offset
https://stackoverflow.com/questions/5876218/difference-between-timestamps-with-without-time-zone-in-postgresql因而其并不能解决 https://z.n0099.net/#narrow/near/86397 https://old.reddit.com/r/PostgreSQL/comments/xpygbh/when_would_i_ever_use_timestamp_over_timestamptz/ https://news.ycombinator.com/item?id=20212671 中争论的
带时区引用未来指定datetime
问题https://www.w3.org/International/wiki/WorkingWithTimeZones#Past_and_Future_Events
https://www.w3.org/International/wiki/WorkingWithTimeZones#Floating_Time
再叠加mysql_fdw不论当前
pgsql时区
https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONES 是什么都不会在将从mysql读到的datetime
转换为timestamptz
时根据时区重新计算时间而是只修改offset(也就是使用跟mysql类型datetime
语义相同的pgsql类型timestamp
的语义)因而只需要修改
pgsql时区
便校验一致
7.2.2. 如果执行了
7.1.5.
需再在pgsql端将该列类型改回json
或jsonb
tbmi_metadata_gif
结构:由pgloader在pgsql端按照
2.3.
中mysql://
源下默认的CASTING
所创建:8.1. 按照
7.1.3.
有
8.1.1. 但按照
7.1.4.
就无差异8.1.2. 这是由于
jsonb
类型不同于本质text
的json
,会将json解析后存储为pgsql内置类型之集,再由于text
类型受c人最爱的
https://en.wikipedia.org/wiki/Null-terminated_string 影响不支持存储256个字节
中的唯一一个0x00
导致jsonb
也无法存储\u0000
转义后的0x00
进其内部的text
turbot/steampipe-postgres-fdw#118https://www.postgresql.org/docs/current/datatype-json.html
8.2. 该表该列中共有如下
\u0000
https://codepoints.net/U+0000[" ", "Created by TechSmith\u0000"]
["tvc\u0000"]
["Author:\t\tGod\_job\_dave\u0000\u0000\u0000???\u0001\u0000\u0000\u0000\u0000P??\u0001\u0000"]
["Optimized by Ulead SmartSaver!\u0000"]
["CREATOR: gd-jpeg v1.0 \(using IJG JPEG v62\), quality = 75\n\u0000"]
8.2.1. 使用如下3层如同
6.
中codegen迫真元编程
自我复制的quine病毒
可在mysql端检查所有含有0x00
的text
类型及其各个长度变种 https://stackoverflow.com/questions/13932750/tinytext-text-mediumtext-and-longtext-maximum-storage-sizes 列并生成ALTER
列类型为BLOB
(pgloader在pgsql端按照2.3.
中mysql://
源下默认的CASTING
会转换为可存储所有256个字节
的bytea
)和SELECT
查阅的一键生成
mysql:REGEXP部分同
2.1.
ALTER TABLE tbm.`tbmi_metadata_embedded_exif` MODIFY `userComment` BLOB NOT NULL;
SELECT * FROM tbm.`tbmi_metadata_embedded_exif` WHERE CAST(`userComment` AS BINARY) LIKE CONCAT('%', 0x00, '%');
ALTER TABLE tbm.`tbmi_metadata_embedded_exif` MODIFY `xpAuthor` BLOB NOT NULL;
SELECT * FROM tbm.`tbmi_metadata_embedded_exif` WHERE CAST(`xpAuthor` AS BINARY) LIKE CONCAT('%', 0x00, '%');
8.3. 值得注意的是
2.2.
导入 dimitri/pgloader#1573 不会有截断0x00
后字节的WARNING
且6.
校验 EnterpriseDB/mysql_fdw#299 过程中也无法发现以0x00
结尾text
中的最后一个0x00
字节消失tbmi_hash
结构:由pgloader在pgsql端按照
2.3.
中mysql://
源下默认的CASTING
所创建:9.1. 由于pgsql没有mysql特有(
isosql
不要求)带unsigned
修饰的的int类型 https://stackoverflow.com/questions/20810134/why-unsigned-integer-is-not-available-in-postgresql/59802732#59802732 正如同m$ft精神mvp
最爱的中小实体企业c#工控上位机人上壬
https://z.n0099.net/#narrow/near/94726 除非使用扩展 https://github.com/petere/pguint 因而导致该列上限值从$2^{64} = 18446744073709551615$ 降到 $2^{63} = 9223372036854775807$
9.2. 实际上所谓的
opencv_imghash
lumina37/aiotieba#63 (comment) 之averageHash
open-tbm/c#/imagePipeline/src/Consumer/HashConsumer.cs
Line 19 in 252710b
可运算的正整数
语义之类型存储而应直接视作斑点二进制
https://en.wikipedia.org/wiki/Binary_large_object9.3.$> 2^{63}$ 的
一键生成
找出所有存在值bigint unsigned
列的mysqlREGEXP部分同
2.1.
9.4.
类比
7.1.5.
重新导入该表Comment is too long (maximum is 65536 characters)
The text was updated successfully, but these errors were encountered: