ASF Bugzilla – Attachment 35139 Details for Bug 61296: Bring over missing constants from Tika
A quick comparison of Tika and POI constants
tika-poi-constants.tsv (text/tab-separated-values), 19.85 KB, created by Javen O'Neal on 2017-07-14 03:43:44 UTC
Tika-parsers microsoft office file type line POI Constant Likely location to add constant in POI
./ooxml/XWPFWordExtractorDecorator.java: schema url private final static String[] MAIN_PART_RELATIONS = new String[]{ ..., "http://schemas.openxmlformats.org/officeDocument/2006/relationships/endnotes", "http://schemas.openxmlformats.org/officeDocument/2006/relationships/comments", ... }; ./ooxml/java/org/apache/poi/xwpf/usermodel/XWPFRelation.java
./ooxml/xwpf/ml2006/CorePropertiesHandler.java: schema url final static String DC_NS = "http://purl.org/dc/elements/1.1"; ./ooxml/java/org/apache/poi/openxml4j/opc/PackageProperties.java#NAMESPACE_DC
./ooxml/xwpf/ml2006/CorePropertiesHandler.java: schema url final static String DC_TERMS_NS = "http://purl.org/dc/terms"; ./ooxml/java/org/apache/poi/openxml4j/opc/PackageProperties.java#NAMESPACE_DCTERMS
./ooxml/xwpf/ml2006/CorePropertiesHandler.java: schema url final static String CP_NS = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"; ./ooxml/java/org/apache/poi/openxml4j/opc/PackageRelationshipTypes.java#CORE_PROPERTIES
./ooxml/xwpf/ml2006/RelationshipsHandler.java: schema url final static String REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"; ./ooxml/java/org/apache/poi/openxml4j/opc/PackageRelationshipTypes.java#CORE_PROPERTIES_ECMA376_NS
./ooxml/xwpf/ml2006/Word2006MLDocHandler.java: schema url final static String PKG_NS = "http://schemas.microsoft.com/office/2006/xmlPackage"; none
./ooxml/xwpf/ml2006/ExtendedPropertiesHandler.java: schema url final static String EP_NS = "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"; none
./ooxml/AbstractOOXMLExtractor.java: schema url static final String RELATION_AUDIO = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/audio"; none
./ooxml/AbstractOOXMLExtractor.java: schema url static final String RELATION_IMAGE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"; many ./ooxml/java/org/apache/poi/openxml4j/opc/PackageRelationshipTypes.java#IMAGE_PART
./ooxml/AbstractOOXMLExtractor.java: schema url static final String RELATION_OLE_OBJECT = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/oleObject"; ./ooxml/java/org/apache/poi/POIXMLDocument.java#OLE_OBJECT_REL_TYPE
./ooxml/AbstractOOXMLExtractor.java: schema url static final String RELATION_PACKAGE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/package"; ./ooxml/java/org/apache/poi/POIXMLDocument.java#PACK_OBJECT_REL_TYPE
./ooxml/AbstractOOXMLExtractor.java: schema url static final String RELATION_MACRO = "http://schemas.microsoft.com/office/2006/relationships/vbaProject"; none ./ooxml/java/org/apache/poi/xssf/usermodel/XSSFRelation.java but should be in POIXMLDocument or other POI Overall Relations file
./ooxml/AbstractOOXMLExtractor.java: schema url static final String RELATION_OFFICE_DOCUMENT = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"; ./ooxml/java/org/apache/poi/openxml4j/opc/PackageRelationshipTypes.java#CORE_DOCUMENT
./ooxml/AbstractOOXMLExtractor.java: schema url static final String RELATION_DIAGRAM_DATA = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/diagramData"; none
./ooxml/AbstractOOXMLExtractor.java: schema url static final String RELATION_CHART = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/chart"; none some in XSSFRelation and XSLFRelation, but no POI Overall
./ooxml/OOXMLWordAndPowerPointTextHandler.java: schema url public final static String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: schema url private final static String MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006"; ./ooxml/java/org/apache/poi/openxml4j/opc/PackageNamespaces.java#MARKUP_COMPATIBILITY
./ooxml/OOXMLWordAndPowerPointTextHandler.java: schema url private final static String O_NS = "urn:schemas-microsoft-com:office:office"; none ./ooxml/java/org/apache/poi/POIXMLTypeLoader.java and others
./ooxml/OOXMLWordAndPowerPointTextHandler.java: schema url private final static String PIC_NS = "http://schemas.openxmlformats.org/drawingml/2006/picture"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: schema url private final static String DRAWING_MAIN_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"; none ooxml/java/org/apache/poi/xssf/usermodel/XSSFRelation.java#NS_DRAWINGML has it, but DrawingML is common to all of OOXML, not just XSSF. POI should move this constant to a different location
./ooxml/OOXMLWordAndPowerPointTextHandler.java: schema url private final static String V_NS = "urn:schemas-microsoft-com:vml"; none ./ooxml/java/org/apache/poi/POIXMLTypeLoader.java and others
./ooxml/OOXMLWordAndPowerPointTextHandler.java: schema url private final static String C_NS = "http://schemas.openxmlformats.org/drawingml/2006/chart"; ./ooxml/java/org/apache/poi/xssf/usermodel/XSSFRelation.java and ./ooxml/java/org/apache/poi/POIXMLTypeLoader.java
./ooxml/OOXMLWordAndPowerPointTextHandler.java: schema url private final static String OFFICE_DOC_RELATIONSHIP_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"; many ooxml/java/org/apache/poi/openxml4j/opc/PackageRelationshipTypes.java and ooxml/java/org/apache/poi/POIXMLTypeLoader.java
./ooxml/SXWPFWordExtractorDecorator.java: schema url private final static String[] MAIN_PART_RELATIONS = new String[]{ ..., "http://schemas.openxmlformats.org/officeDocument/2006/relationships/endnotes", "http://schemas.openxmlformats.org/officeDocument/2006/relationships/comments", ... };
./ooxml/SXSLFPowerPointExtractorDecorator.java: schema url private final static String HANDOUT_MASTER = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/handoutMaster";
./ooxml/XSLFPowerPointExtractorDecorator.java: schema url private final static String HANDOUT_MASTER = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/handoutMaster";
./xml/AbstractXML2003Parser.java: schema url final static String MS_OFFICE_PROPERTIES_URN = "urn:schemas-microsoft-com:office:office"; ./ooxml/java/org/apache/poi/POIXMLTypeLoader.java and ./ooxml/java/org/apache/poi/xssf/usermodel/XSSFVMLDrawing.java
./xml/AbstractXML2003Parser.java: schema url final static String MS_DOC_PROPERTIES_URN = "urn:schemas-microsoft-com:office:office"; none ./ooxml/java/org/apache/poi/POIXMLTypeLoader.java and ./ooxml/java/org/apache/poi/xssf/usermodel/XSSFVMLDrawing.java
./xml/AbstractXML2003Parser.java: schema url final static String MS_SPREADSHEET_URN = "urn:schemas-microsoft-com:office:spreadsheet"; none
./xml/AbstractXML2003Parser.java: schema url final static String MS_VML_URN = "urn:schemas-microsoft-com:vml"; many ./ooxml/java/org/apache/poi/POIXMLTypeLoader.java and ./ooxml/java/org/apache/poi/xssf/usermodel/XSSFVMLDrawing.java
./xml/AbstractXML2003Parser.java: schema url final static String WORD_ML_URL = "http://schemas.microsoft.com/office/word/2003/wordml"; none
./ExcelExtractor.java: POIFS directory name private static final String WORKBOOK_ENTRY = "Workbook"; none
./ExcelExtractor.java: POIFS directory name private static final String BOOK_ENTRY = "Book"; none
./ooxml/OOXMLTikaBodyPartHandler.java: ooxml tag private final static String P = "p"; None - this is carried up straight from the xml schema. POI avoids string literals by using xmlbeans to generate CT classes, and accesses xml tags via CT getters and setters.
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String R = "r"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String FLD = "fld"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String RPR = "rPr"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String P = "p"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String P_STYLE = "pStyle"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String PPR = "pPr"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String T = "t"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String TAB = "tab"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String B = "b"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String ILVL = "ilvl"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String NUM_ID = "numId"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String TC = "tc"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String TR = "tr"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String I = "i"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String NUM_PR = "numPr"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String BR = "br"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String HYPERLINK = "hyperlink"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String HLINK_CLICK = "hlinkClick"; //pptx hlink none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String TBL = "tbl"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String PIC = "pic"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String PICT = "pict"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String IMAGEDATA = "imagedata"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String BLIP = "blip"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String CHOICE = "Choice"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String FALLBACK = "Fallback"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String OLE_OBJECT = "OLEObject"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String CR = "cr"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String V = "v"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String BOOKMARK_START = "bookmarkStart"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String BOOKMARK_END = "bookmarkEnd"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String FOOTNOTE_REFERENCE = "footnoteReference"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String INS = "ins"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String DEL = "del"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String DEL_TEXT = "delText"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String MOVE_FROM = "moveFrom"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String MOVE_TO = "moveTo"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private final static String ENDNOTE_REFERENCE = "endnoteReference"; none
./ooxml/OOXMLWordAndPowerPointTextHandler.java: ooxml tag private static final String TEXTBOX = "textbox"; none
./WordExtractor.java: ooxml tag private static final TagAndStyle defaultParagraphStyle = new TagAndStyle("p", null); none
./xml/SpreadsheetMLParser.java: ooxml tag final static String CELL = "cell"; none
./xml/SpreadsheetMLParser.java: ooxml tag final static String DATA = "data"; none
./xml/SpreadsheetMLParser.java: ooxml tag final static String ROW = "row"; none
./xml/SpreadsheetMLParser.java: ooxml tag final static String WORKSHEET = "worksheet"; none
./xml/AbstractXML2003Parser.java: ooxml tag final static String PICT = "pict"; none
./xml/AbstractXML2003Parser.java: ooxml tag final static String BIN_DATA = "binData"; none
./xml/AbstractXML2003Parser.java: ooxml tag final static String A = "a"; none
./xml/AbstractXML2003Parser.java: ooxml tag final static String BODY = "body"; none
./xml/AbstractXML2003Parser.java: ooxml tag final static String BR = "br"; none
./xml/AbstractXML2003Parser.java: ooxml tag final static String CDATA = "cdata"; none
./xml/AbstractXML2003Parser.java: ooxml tag final static String DIV = "div"; none
./xml/AbstractXML2003Parser.java: ooxml tag final static String HREF = "href"; none
./xml/AbstractXML2003Parser.java: ooxml tag final static String IMG = "img"; none
./xml/AbstractXML2003Parser.java: ooxml tag final static String P = "p"; none
./xml/AbstractXML2003Parser.java: ooxml tag final static String TD = "td"; none
./xml/AbstractXML2003Parser.java: ooxml tag final static String TR = "tr"; none
./xml/AbstractXML2003Parser.java: ooxml tag final static String TABLE = "table"; none
./xml/AbstractXML2003Parser.java: ooxml tag final static String TBODY = "tbody"; none
./xml/AbstractXML2003Parser.java: ooxml tag final static String HLINK = "hlink"; none
./xml/AbstractXML2003Parser.java: ooxml tag final static String HLINK_DEST = "dest"; none
./xml/AbstractXML2003Parser.java: ooxml tag final static String NAME_ATTR = "name"; none
./xml/WordMLParser.java: ooxml tag private final static Map<String, String> WORDML_TO_XHTML = { "tbl": TABLE, "tc": TD, ... }; none ./ooxml/java/org/apache/poi/xwpf/usermodel/XWPFDocument.java and others
./xml/AbstractXML2003Parser.java: ooxml tag final static String DOCUMENT_PROPERTIES = "DocumentProperties"; none
./WMFParser.java: mime type private static final MediaType MEDIA_TYPE = MediaType.image("wmf"); none ./java/org/apache/poi/sl/draw/ImageRenderer.java
./TNEFParser.java: mime type private static final Set<MediaType> SUPPORTED_TYPES = { application("vnd.ms-tnef"), application("ms-tnef"), application("x-tnef") } none
./ooxml/xwpf/ml2006/Word2006MLParser.java: mime type protected static final Set<MediaType> SUPPORTED_TYPES = Collections.singleton( MediaType.application("vnd.ms-word2006ml") ); none
./ooxml/AbstractOOXMLExtractor.java: mime type private static final String TYPE_OLE_OBJECT = "application/vnd.openxmlformats-officedocument.oleObject"; none ./ooxml/java/org/apache/poi/xssf/usermodel/XSSFRelation.java; used in ./ooxml/java/org/apache/poi/xssf/usermodel/XSSFWorkbook.java
./ooxml/OOXMLParser.java: mime type protected static final Set<MediaType> SUPPORTED_TYPES = unmodifiableSet( MediaType.application("vnd.openxmlformats-officedocument.presentationml.presentation"), ... ) none ./ooxml/java/org/apache/poi/xslf/usermodel/XSLFRelation.java
./ooxml/OOXMLParser.java: mime type protected static final Set<MediaType> UNSUPPORTED_OOXML_TYPES = singleton( MediaType.application("vnd.ms-xpsdocument") ); none
./EMFParser.java: mime type private static final MediaType MEDIA_TYPE = MediaType.image("emf"); none ./java/org/apache/poi/sl/draw/ImageRenderer.java
./EMFParser.java: mime type private static final MediaType WMF_MEDIA_TYPE = MediaType.image("wmf"); none ./java/org/apache/poi/sl/draw/ImageRenderer.java
./POIFSContainerDetector.java: mime type public static final MediaType OLE = application("x-tika-msoffice"); none
./POIFSContainerDetector.java: mime type public static final MediaType OOXML_PROTECTED = application("x-tika-ooxml-protected"); none
./POIFSContainerDetector.java: mime type public static final MediaType GENERAL_EMBEDDED = application("x-tika-msoffice-embedded"); none
./POIFSContainerDetector.java: mime type public static final MediaType OLE10_NATIVE = new MediaType(GENERAL_EMBEDDED, "format", "ole10_native"); none ./java/org/apache/poi/poifs/filesystem/Ole10Native.java
./POIFSContainerDetector.java: mime type public static final MediaType COMP_OBJ = new MediaType(GENERAL_EMBEDDED, "format", "comp_obj"); none
./POIFSContainerDetector.java: mime type public static final MediaType MS_GRAPH_CHART = application("vnd.ms-graph"); none
./POIFSContainerDetector.java: mime type public static final MediaType MS_EQUATION = application("vnd.ms-equation"); none
./POIFSContainerDetector.java: mime type public static final MediaType XLS = application("vnd.ms-excel"); none ./ooxml/java/org/apache/poi/xssf/usermodel/XSSFRelation.java
./POIFSContainerDetector.java: mime type public static final MediaType DOC = application("msword"); none
./POIFSContainerDetector.java: mime type public static final MediaType PPT = application("vnd.ms-powerpoint"); none ./ooxml/java/org/apache/poi/xslf/usermodel/XSLFRelation.java
./POIFSContainerDetector.java: mime type public static final MediaType PUB = application("x-mspublisher"); none
./POIFSContainerDetector.java: mime type public static final MediaType VSD = application("vnd.visio"); none
./POIFSContainerDetector.java: mime type public static final MediaType WPS = application("vnd.ms-works"); none
./POIFSContainerDetector.java: mime type public static final MediaType XLR = application("x-tika-msworks-spreadsheet"); none
./POIFSContainerDetector.java: mime type public static final MediaType MSG = application("vnd.ms-outlook"); none
./POIFSContainerDetector.java: mime type public static final MediaType MPP = application("vnd.ms-project"); none
./POIFSContainerDetector.java: mime type public static final MediaType SDC = application("vnd.stardivision.calc"); none
./POIFSContainerDetector.java: mime type public static final MediaType SDA = application("vnd.stardivision.draw"); none
./POIFSContainerDetector.java: mime type public static final MediaType SDD = application("vnd.stardivision.impress"); none
./POIFSContainerDetector.java: mime type public static final MediaType SDW = application("vnd.stardivision.writer"); none
./POIFSContainerDetector.java: mime type public static final MediaType SLDWORKS = application("sldworks"); none
./POIFSContainerDetector.java: mime type public static final MediaType HWP = application("x-hwp-v5"); none
./POIFSContainerDetector.java: mime type public static final MediaType QUATTROPRO = application("x-quattro-pro"); none
./POIFSContainerDetector.java: mime type private static final byte[] MS_GRAPH_CHART_BYTES = "MSGraph.Chart".getBytes(); none
./MSOwnerFileParser.java: mime type private static final MediaType MEDIA_TYPE = MediaType.application("x-ms-owner"); none
./xml/SpreadsheetMLParser.java: mime type private static final MediaType MEDIA_TYPE = MediaType.application("vnd.ms-spreadsheetml"); none ./ooxml/java/org/apache/poi/xssf/usermodel/XSSFRelation.java
./xml/WordMLParser.java: mime type private static final MediaType MEDIA_TYPE = MediaType.application("vnd.ms-wordml"); none ./ooxml/java/org/apache/poi/xwpf/usermodel/XWPFRelation.java
./OldExcelParser.java: mime type private static final Set<MediaType> SUPPORTED_TYPES = { MediaType.application("vnd.ms-excel.4"), ... };
./ooxml/XWPFWordExtractorDecorator.java: private static final String LIST_DELIMITER = " ";
./ooxml/OOXMLTikaBodyPartHandler.java: private final static char[] NEWLINE = new char[]{'\n'};
./ooxml/OOXMLWordAndPowerPointTextHandler.java: private final static char[] TAB_CHAR = new char[]{'\t'};
./ooxml/OOXMLWordAndPowerPointTextHandler.java: private final static char NEWLINE = '\n';
./ooxml/XWPFListManager.java: private final static String SKIP_FORMAT = Character.toString((char) 61623);//if this shows up as the lvlText, don't show a number
./AbstractListManager.java: private final static String BULLET = "\u00b7";
./MSOwnerFileParser.java: private static final int ASCII_CHUNK_LENGTH = 54;
./WordExtractor.java: private static final char UNICODECHAR_NONBREAKING_HYPHEN = '\u2011';
./WordExtractor.java: private static final char UNICODECHAR_ZERO_WIDTH_SPACE = '\u200b';
./WordExtractor.java: private static final String LIST_DELIMITER = " ";
./xml/AbstractXML2003Parser.java: final static char[] NEWLINE = new char[] {'\n'};
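The "ooxml tag" rows reflect two styles of reaching the same XML. Tika's SAX handlers match elements by namespace URI plus local-name string constants, while POI (per the note in the table) goes through xmlbeans-generated CT classes and their getters. A minimal sketch of the Tika-style approach follows, using only the JDK's SAX parser and a few of the constants listed above (`W_NS`, `P`, `PPR`, `T`, `TAB`, `BR`). The class and method names are hypothetical; this is not Tika's actual `OOXMLWordAndPowerPointTextHandler`, and it ignores tables, fields, hyperlinks and tracked changes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: Tika-style text extraction keyed on namespace + local-name constants.
final class WordTextSketch {
    // Values copied from the Tika constants listed in the table.
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String P = "p";
    static final String PPR = "pPr";
    static final String T = "t";
    static final String TAB = "tab";
    static final String BR = "br";

    /** Collects w:t text; w:tab -> '\t', w:br -> '\n', end of w:p -> '\n'. */
    static final class TextHandler extends DefaultHandler {
        final StringBuilder out = new StringBuilder();
        private boolean inText = false;
        private boolean inParagraphProps = false; // w:tab inside w:pPr is a tab stop, not text

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (!W_NS.equals(uri)) return;          // only WordprocessingML elements matter here
            if (PPR.equals(localName)) {
                inParagraphProps = true;
            } else if (inParagraphProps) {
                return;                             // skip w:tabs/w:tab definitions etc.
            } else if (T.equals(localName)) {
                inText = true;
            } else if (TAB.equals(localName)) {
                out.append('\t');
            } else if (BR.equals(localName)) {
                out.append('\n');
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (!W_NS.equals(uri)) return;
            if (PPR.equals(localName)) {
                inParagraphProps = false;
            } else if (T.equals(localName)) {
                inText = false;
            } else if (P.equals(localName)) {
                out.append('\n');                   // paragraph boundary
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inText) out.append(ch, start, length);
        }
    }

    /** Parses a document.xml string and returns its plain text. */
    static String extract(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);        // needed so uri/localName are populated
            TextHandler handler = new TextHandler();
            factory.newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.out.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse document XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<w:document xmlns:w=\"" + W_NS + "\"><w:body>"
            + "<w:p><w:r><w:t>Hello</w:t><w:tab/><w:t>world</w:t></w:r></w:p>"
            + "<w:p><w:r><w:t>Second</w:t></w:r></w:p>"
            + "</w:body></w:document>";
        System.out.print(extract(xml));
    }
}
```

Running `main` prints `Hello`, a tab, `world`, then `Second` on its own line. The string constants are exactly what the table proposes moving into POI; with xmlbeans, the equivalent lookups would instead be typed accessors on the generated CT classes.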