LIDC-IDRI Public Lung Nodule Dataset: DICOM Files and XML Annotations Explained
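Data source

The data source is LIDC-IDRI, a dataset of thoracic medical images (such as CT scans and X-ray films) together with the corresponding diagnostic lesion annotations. The collection was initiated by the U.S. National Cancer Institute to support research on early cancer detection in high-risk populations.

The dataset contains 1018 study cases. The images in each case were annotated by four experienced thoracic radiologists in a two-phase process. In the first phase, each radiologist independently read the scan and marked lesion locations in three categories: 1) nodules >= 3 mm, 2) nodules < 3 mm, and 3) non-nodules >= 3 mm (the official description reads: "nodule > or =3 mm," "nodule <3 mm," and "non-nodule > or =3 mm"; see the Summary). In the second phase, each radiologist independently reviewed the marks made by the other three radiologists and gave a final reading. This two-phase process captures all findings as completely as possible without requiring forced consensus.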
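Parsing the data

1. Image matrix (pixel data)

The module works on a three-dimensional matrix D of size slicers * rows * cols. The element at row y, column x of slice z in D is stored at byte offset (z * rows * cols + y * cols + x) * sizeof(data_type), where rows is the number of image rows and cols is the number of image columns (both 512 by default), and data_type is the element type (short by default).

e.g.: case LIDC-IDRI-0001 is a 133*512*512 matrix: 133 slices of 512*512 pixels each, written to a binary file in order, with each pixel taking 2 bytes (the C short type).

2. Nodule region type annotations

First line: slicers rows cols data_type pixel_space_x pixel_space_y slice_thickness
slicers: number of slices;
Type num x1 y1 z1 x2 y2 z2 … xi yi zi ... xn yn zn
Type: "1" means "nodules", "2" means "small_nodules", "3" means "non_nodules";

3. File structure

The tests so far cover 1012 cases in total. Each case folder has the following structure:
LIDC-IDRI-XXXX / Study Instance UID / Series Instance UID / .dcm .xml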
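The layout described in items 1 and 2 can be sketched in a few lines of Python. This is only an illustration under assumptions the text does not state: header fields are whitespace-separated, samples are little-endian signed 16-bit integers, `num` in a region line is the number of (x, y, z) points that follow, and coordinates are integer voxel indices. All function names are hypothetical.

```python
import struct

def parse_header(line):
    """Split the first annotation line into named fields (assumed whitespace-separated)."""
    f = line.split()
    return {
        "slicers": int(f[0]), "rows": int(f[1]), "cols": int(f[2]),
        "data_type": f[3],
        "pixel_space_x": float(f[4]), "pixel_space_y": float(f[5]),
        "slice_thickness": float(f[6]),
    }

def voxel_offset(z, y, x, rows=512, cols=512, itemsize=2):
    """Byte offset of D[z][y][x]: (z*rows*cols + y*cols + x) * sizeof(data_type)."""
    return (z * rows * cols + y * cols + x) * itemsize

def read_voxel(buf, z, y, x, rows=512, cols=512):
    """Read one sample from a bytes-like volume (assumes little-endian int16)."""
    return struct.unpack_from("<h", buf, voxel_offset(z, y, x, rows, cols))[0]

def parse_region(line):
    """Parse 'Type num x1 y1 z1 ... xn yn zn' (assumes num counts the points)."""
    f = line.split()
    kind, num = int(f[0]), int(f[1])
    coords = [int(v) for v in f[2:2 + 3 * num]]
    points = [tuple(coords[i:i + 3]) for i in range(0, len(coords), 3)]
    return kind, points

# Tiny synthetic volume: 2 slices of 3 rows * 4 cols holding the values -12..11.
rows, cols = 3, 4
buf = struct.pack("<24h", *range(-12, 12))
print(read_voxel(buf, 1, 2, 3, rows, cols))  # last sample in the volume: 11
```

With the default 512*512 slices and 2-byte samples, `voxel_offset(1, 0, 0)` is 524288, which is the size of one slice in bytes.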
In the folder name LIDC-IDRI-XXXX, XXXX ranges from 0001 to 1012.

Important DICOM fields

(0008,0005) Specific Character Set CS: 'ISO_IR 100'
(0008,0008) Image Type CS: ['ORIGINAL','PRIMARY','AXIAL']
(0008,0016) SOP Class UID UI: CT Image Storage
(0008,0018) SOP Instance UID UI: 1.3.6.1.4.1.14519.5.2.1.6279.6001.143451261327128179989900675595
(0008,0020) Study Date DA: '20000101'
(0008,0021) Series Date DA: '20000101'
(0008,0022) Acquisition Date DA: '20000101'
(0008,0023) Content Date DA: '20000101'
(0008,0024) Overlay Date DA: '20000101'
(0008,0025) Curve Date DA: '20000101'
(0008,002a) Acquisition DateTime DT: '20000101'
(0008,0030) Study Time TM: ''
(0008,0032) Acquisition Time TM: ''
(0008,0033) Content Time TM: ''
(0008,0050) Accession Number SH: '2819497684894126'
(0008,0060) Modality CS: 'CT'
(0008,0070) Manufacturer LO: 'GE MEDICAL SYSTEMS'
(0008,0090) Referring Physician Name PN: ''
(0008,1090) Manufacturer Model Name LO: 'LightSpeed Plus'
(0008,1155) Referenced SOP Instance UID UI: 1.3.6.1.4.1.14519.5.2.1.6279.6001.675906998158803995297223798692
(0010,0010) Patient Name PN: ''
(0010,0020) Patient ID LO: 'LIDC-IDRI-0001'
(0010,0030) Patient Birth Date DA: ''
(0010,0040) Patient Sex CS: ''
(0010,1010) Patient Age AS: ''
(0010,21d0) Last Menstrual Date DA: '20000101'
(0012,0062) Patient Identity Removed CS: 'YES'
(0012,0063) De-identification Method LO: 'DCM:113100/113105/113107/113108/113109/113111'
(0013,0010) Private Creator LO: 'CTP'
(0013,1010) Private tag data LO: 'LIDC-IDRI'
(0013,1013) Private tag data LO: '62796001'
(0018,0010) Contrast/Bolus Agent LO: 'IV'
(0018,0015) Body Part Examined CS: 'CHEST'
(0018,0022) Scan Options CS: 'HELICAL MODE'
(0018,0050) Slice Thickness DS: '2.500000'
(0018,0060) KVP DS: '120'
(0018,0090) Data Collection Diameter DS: '500.000000'
(0018,1020) Software Version(s) LO: 'LightSpeedApps2.4.2_H2.4M5'
(0018,1100) Reconstruction Diameter DS: '360.000000'
(0018,1110) Distance Source to Detector DS: '949.075012'
(0018,1111) Distance Source to Patient DS: '541.000000'
(0018,1120) Gantry/Detector Tilt DS: '0.000000'
(0018,1130) Table Height DS: '144.399994'
(0018,1140) Rotation Direction CS: 'CW'
(0018,1150) Exposure Time IS: '570'
(0018,1151) X-Ray Tube Current IS: '400'
(0018,1152) Exposure IS: '4684'
(0018,1160) Filter Type SH: 'BODY FILTER'
(0018,1170) Generator Power IS: '48000'
(0018,1190) Focal Spot(s) DS: '1.200000'
(0018,1210) Convolution Kernel SH: 'STANDARD'
(0018,5100) Patient Position CS: 'FFS'
(0020,000d) Study Instance UID UI: 1.3.6.1.4.1.14519.5.2.1.6279.6001.298806137288633453246975630178
(0020,000e) Series Instance UID UI: 1.3.6.1.4.1.14519.5.2.1.6279.6001.179049373636438705059720603192
(0020,0010) Study ID SH: ''
(0020,0011) Series Number IS: '3000566'
(0020,0013) Instance Number IS: '80'
(0020,0032) Image Position (Patient) DS: ['-166.000000','-171.699997','-207.500000']
(0020,0037) Image Orientation (Patient) DS: ['1.000000','0.000000','0.000000','0.000000','1.000000','0.000000']
(0020,0052) Frame of Reference UID UI: 1.3.6.1.4.1.14519.5.2.1.6279.6001.229925374658226729607867499499
(0020,1040) Position Reference Indicator LO: 'SN'
(0020,1041) Slice Location DS: '-207.500000'
(0028,0002) Samples per Pixel US: 1
(0028,0004) Photometric Interpretation CS: 'MONOCHROME2'
(0028,0010) Rows US: 512
(0028,0011) Columns US: 512
(0028,0030) Pixel Spacing DS: ['0.703125','0.703125']
(0028,0100) Bits Allocated US: 16
(0028,0101) Bits Stored US: 16
(0028,0102) High Bit US: 15
(0028,0103) Pixel Representation US: 1
(0028,0120) Pixel Padding Value US: 63536
(0028,0303) Longitudinal Temporal Information M CS: 'MODIFIED'
(0028,1050) Window Center DS: '-600'
(0028,1051) Window Width DS: '1600'
(0028,1052) Rescale Intercept DS: '-1024'
(0028,1053) Rescale Slope DS: '1'
(0038,0020) Admitting Date DA: '20000101'
(0040,0002) Scheduled Procedure Step Start Date DA: '20000101'
(0040,0004) Scheduled Procedure Step End Date DA: '20000101'
(0040,0244) Performed Procedure Step Start Date DA: '20000101'
(0040,2016) Placer Order Number / Imaging Servi LO: ''
(0040,2017) Filler Order Number / Imaging Servi LO: ''
(0040,a075) Verifying Observer Name PN: 'Removed by CTP'
(0040,a123) Person Name PN: 'Removed by CTP'
(0040,a124) UID UI: 1.3.6.1.4.1.14519.5.2.1.6279.6001.335419887712224178340067932923
(0070,0084) Content Creator's Name PN: ''
(0088,0140) Storage Media File-set UID UI: 1.3.6.1.4.1.14519.5.2.1.6279.6001.211790042620307056609660772296
(7fe0,0010) Pixel Data OW: Array of 524288 bytes
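The pixel-related tags in this dump determine how stored values become displayable intensities. As a minimal sketch (hypothetical function names; it assumes the usual linear rescale, value * Rescale Slope + Rescale Intercept, and a simple linear window, not the exact windowing formula of the DICOM standard):

```python
def as_signed16(value):
    """Reinterpret an unsigned 16-bit value as signed (Pixel Representation = 1)."""
    return value - 65536 if value >= 32768 else value

def to_hounsfield(stored, slope=1.0, intercept=-1024.0):
    """Linear rescale using Rescale Slope (0028,1053) and Rescale Intercept (0028,1052)."""
    return stored * slope + intercept

def apply_window(hu, center=-600.0, width=1600.0):
    """Map a value to 0..255 with Window Center (0028,1050) and Window Width (0028,1051)."""
    low = center - width / 2.0
    high = center + width / 2.0
    if hu <= low:
        return 0
    if hu >= high:
        return 255
    return int(round((hu - low) / (high - low) * 255))

# Values taken from the dump above.
rows, columns, bits_allocated = 512, 512, 16
print(rows * columns * bits_allocated // 8)  # size of Pixel Data in bytes
print(as_signed16(63536))                    # Pixel Padding Value read as signed
print(to_hounsfield(1024))                   # stored value 1024 after rescale
print(apply_window(-1000.0))                 # grey level for -1000 in this window
```

The first print reproduces the 524288 bytes reported for Pixel Data, and the padding value 63536 corresponds to -2000 when read as a signed 16-bit integer.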
e.g.: 000001.dcm in LIDC-IDRI-0069 (a TOSHIBA scanner) reads as follows:
(0008,0018) SOP Instance UID UI: 1.3.6.1.4.1.14519.5.2.1.6279.6001.263800607656124864093833884216
(0008,0032) Acquisition Time TM: '185549.500'
(0008,0033) Content Time TM: '185605.277'
(0008,0070) Manufacturer LO: 'TOSHIBA'
(0008,1090) Manufacturer Model Name LO: 'Aquilion'
(0010,0020) Patient ID LO: 'LIDC-IDRI-0069'
(0010,0040) Patient Sex CS: 'M'
(0010,1010) Patient Age AS: '051Y'
(0010,2160) Ethnic Group SH: 'white-ns'
(0010,21c0) Pregnancy Status US: 4
(0013,0010) Private Creator OB: 'CTP '
(0013,1010) Private tag data OB: 'LIDC-IDRI '
(0013,1013) Private tag data OB: '62796001'
(0018,0010) Contrast/Bolus Agent LO: '100ccs_OMNI-350'
(0018,0022) Scan Options CS: 'HELICAL_CT'
(0018,0050) Slice Thickness DS: '2.0'
(0018,0060) KVP DS: '135'
(0018,0090) Data Collection Diameter DS: '400.00'
(0018,1020) Software Version(s) LO: 'V2.04ER001'
(0018,1100) Reconstruction Diameter DS: '379.687'
(0018,1120) Gantry/Detector Tilt DS: '+0.0'
(0018,1130) Table Height DS: '+48.00'
(0018,1150) Exposure Time IS: '500'
(0018,1151) X-Ray Tube Current IS: '260'
(0018,1152) Exposure IS: '130'
(0018,1210) Convolution Kernel SH: 'FC10'
(0020,000d) Study Instance UID UI: 1.3.6.1.4.1.14519.5.2.1.6279.6001.303241414168367763244410429787
(0020,000e) Series Instance UID UI: 1.3.6.1.4.1.14519.5.2.1.6279.6001.131939324905446238286154504249
(0020,0011) Series Number IS: '3079'
(0020,0012) Acquisition Number IS: '5'
(0020,0013) Instance Number IS: '134'
(0020,0020) Patient Orientation CS: ['L','P']
(0020,0032) Image Position (Patient) DS: ['-184.375000','-188.281200','1292.500000']
(0020,0052) Frame of Reference UID UI: 1.3.6.1.4.1.14519.5.2.1.6279.6001.228313061349684266844487315959
(0020,1040) Position Reference Indicator LO: ''
(0020,1041) Slice Location DS: '+324.00'
(0028,0030) Pixel Spacing DS: ['0.741','0.741']
(0028,1050) Window Center DS: '-500'
(0028,1051) Window Width DS: '2000'
(0028,1052) Rescale Intercept DS: '0'
(0028,1053) Rescale Slope DS: '1'
(0032,000a) Study Status ID CS: ''
(0032,1000) Scheduled Study Start Date DA: ''
(0032,1001) Scheduled Study Start Time TM: ''
(0032,1060) Requested Procedure Description LO: ''
(0032,1064) Requested Procedure Code Sequence 1 item(s) ----
   (0008,0104) Code Meaning LO: ''
   ---------
(0040,0003) Scheduled Procedure Step Start Time TM: ''
(0040,0005) Scheduled Procedure Step End Time TM: ''
(0040,0245) Performed Procedure Step Start Time TM: ''
(0040,a123) Person Name PN: 'Removed by CTP'
(0070,0084) Content Creator Name PN: ''
(7fe0,0010) Pixel Data OB or OW: Array of 524288 bytes
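As the two dumps show, scanners from different vendors store the examination information in somewhat different ways, but the main fields are present in both.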
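One way to check which fields two such dumps share is to parse the text lines and compare tag sets. The sketch below assumes each line has the form `(gggg,eeee) Name VR: value` as printed above; the regular expression and helper name are illustrative, and only a few lines from each dump are included.

```python
import re

# (group,element), attribute name, two-letter VR (or "OB or OW"), then the value.
LINE_RE = re.compile(
    r"^\((?P<tag>[0-9a-fA-F]{4},[0-9a-fA-F]{4})\)\s+"
    r"(?P<name>.+?)\s+"
    r"(?P<vr>[A-Z]{2}(?: or [A-Z]{2})?):\s*(?P<value>.*)$"
)

def parse_dump(text):
    """Return {tag: (name, raw value text)} for every line that matches."""
    fields = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            fields[m.group("tag").lower()] = (m.group("name"), m.group("value"))
    return fields

ge = parse_dump("""
(0008,0070) Manufacturer LO: 'GE MEDICAL SYSTEMS'
(0018,0050) Slice Thickness DS: '2.500000'
(0018,1110) Distance Source to Detector DS: '949.075012'
(0028,0030) Pixel Spacing DS: ['0.703125','0.703125']
(0028,1052) Rescale Intercept DS: '-1024'
""")

toshiba = parse_dump("""
(0008,0070) Manufacturer LO: 'TOSHIBA'
(0018,0050) Slice Thickness DS: '2.0'
(0020,0012) Acquisition Number IS: '5'
(0028,0030) Pixel Spacing DS: ['0.741','0.741']
(0028,1052) Rescale Intercept DS: '0'
""")

shared = sorted(set(ge) & set(toshiba))
only_ge = sorted(set(ge) - set(toshiba))
only_toshiba = sorted(set(toshiba) - set(ge))

for tag in shared:
    print(tag, ge[tag][0], ge[tag][1], "|", toshiba[tag][1])
print("only in first dump:", only_ge)
print("only in second dump:", only_toshiba)
```

Values are kept as raw text, so differences such as Rescale Intercept '-1024' versus '0' stay visible and can be handled explicitly before any pixel processing.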
Important XML fields

(Editor: 李大同)