Publications | Jialiang Xu

2024

EMNLP 2024
Do LLMs Know to Respect Copyright Notice?

Jialiang Xu, Shenglan Li, Zhaozhuo Xu, and Denghui Zhang

In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Nov 2024

Prior studies show that LLMs sometimes generate content that violates copyright. In this paper, we study another important yet underexplored problem, i.e., will LLMs respect copyright information in user input, and behave accordingly? The research problem is critical, as a negative answer would imply that LLMs will become the primary facilitator and accelerator of copyright infringement behavior. We conducted a series of experiments using a diverse set of language models, user prompts, and copyrighted materials, including books, news articles, API documentation, and movie scripts. Our study offers a conservative evaluation of the extent to which language models may infringe upon copyrights when processing user input containing protected material. This research emphasizes the need for further investigation and the importance of ensuring LLMs respect copyright regulations when handling user input to prevent unauthorized use or reproduction of protected content. We also release a benchmark dataset serving as a test bed for evaluating infringement behaviors by LLMs and stress the need for future alignment.
@inproceedings{xu-etal-2024-llms, title = {Do {LLM}s Know to Respect Copyright Notice?}, author = {Xu, Jialiang and Li, Shenglan and Xu, Zhaozhuo and Zhang, Denghui}, editor = {Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung}, booktitle = {Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing}, month = nov, year = {2024}, address = {Miami, Florida, USA}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2024.emnlp-main.1147/}, doi = {10.18653/v1/2024.emnlp-main.1147}, pages = {20604--20619}, selected = true, bibtex_show = true, }
EMNLP 2024
SPINACH: SPARQL-Based Information Navigation for Challenging Real-World Questions

Shicheng Liu, Sina Semnani, Harold Triedman, Jialiang Xu, Isaac Dan Zhao, and 1 more author

In Findings of the Association for Computational Linguistics: EMNLP 2024, Nov 2024

Large Language Models (LLMs) have led to significant improvements in the Knowledge Base Question Answering (KBQA) task. However, datasets used in KBQA studies do not capture the true complexity of KBQA tasks. They either have simple questions, use synthetically generated logical forms, or are based on small knowledge base (KB) schemas. We introduce the SPINACH dataset, an expert-annotated KBQA dataset collected from discussions on Wikidata’s “Request a Query” forum with 320 decontextualized question-SPARQL pairs. The complexity of these in-the-wild queries calls for a KBQA system that can dynamically explore large and often incomplete schemas and reason about them, as it is infeasible to create a comprehensive training dataset. We also introduce an in-context learning KBQA agent, also called SPINACH, that mimics how a human expert would write SPARQLs to handle challenging questions. SPINACH achieves a new state of the art on the QALD-7, QALD-9 Plus and QALD-10 datasets by 31.0%, 27.0%, and 10.0% in F_1, respectively, and comes within 1.6% of the fine-tuned LLaMA SOTA model on WikiWebQuestions. On our new SPINACH dataset, the SPINACH agent outperforms all baselines, including the best GPT-4-based KBQA agent, by at least 38.1% in F_1.
@inproceedings{liu-etal-2024-spinach, title = {{SPINACH}: {SPARQL}-Based Information Navigation for Challenging Real-World Questions}, author = {Liu, Shicheng and Semnani, Sina and Triedman, Harold and Xu, Jialiang and Zhao, Isaac Dan and Lam, Monica}, editor = {Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung}, booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2024}, month = nov, year = {2024}, address = {Miami, Florida, USA}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2024.findings-emnlp.938/}, doi = {10.18653/v1/2024.findings-emnlp.938}, pages = {15977--16001}, selected = false, bibtex_show = true, }
ArXiv
Reverse Image Retrieval Cues Parametric Memory in Multimodal LLMs

Jialiang Xu^*, Michael Moor^*, and Jure Leskovec

Nov 2024

Despite impressive advances in recent multimodal large language models (MLLMs), state-of-the-art models such as from the GPT-4 suite still struggle with knowledge-intensive tasks. To address this, we consider Reverse Image Retrieval (RIR) augmented generation, a simple yet effective strategy to augment MLLMs with web-scale reverse image search results. RIR robustly improves knowledge-intensive visual question answering (VQA) of GPT-4V by 37-43%, GPT-4 Turbo by 25-27%, and GPT-4o by 18-20% in terms of open-ended VQA evaluation metrics. To our surprise, we discover that RIR helps the model to better access its own world knowledge. Concretely, our experiments suggest that RIR augmentation helps by providing further visual and textual cues without necessarily containing the direct answer to a query. In addition, we elucidate cases in which RIR can hurt performance and conduct a human evaluation. Finally, we find that the overall advantage of using RIR makes it difficult for an agent that can choose to use RIR to perform better than an approach where RIR is the default setting.
@misc{xu2024reverse, title = {Reverse Image Retrieval Cues Parametric Memory in Multimodal LLMs}, author = {Xu, Jialiang and Moor, Michael and Leskovec, Jure}, year = {2024}, eprint = {2405.18740}, archiveprefix = {arXiv}, primaryclass = {cs.CL}, selected = true, bibtex_show = true, }
ACL 2024
Award
Word Embeddings Are Steers for Language Models

Chi Han, Jialiang Xu, Manling Li, Yi Fung, Chenkai Sun, and 3 more authors

In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Aug 2024

ACL 2024 Outstanding Paper Award

Language models (LMs) automatically learn word embeddings during pre-training on language corpora. Although word embeddings are usually interpreted as feature vectors for individual words, their roles in language model generation remain underexplored. In this work, we theoretically and empirically revisit output word embeddings and find that their linear transformations are equivalent to steering language model generation styles. We name such steers LM-Steers and find that they exist in LMs of all sizes. Steering each style requires learning parameters equal to 0.2% of the original LM’s size. On tasks such as language model detoxification and sentiment control, LM-Steers can achieve comparable or superior performance compared with state-of-the-art controlled generation methods while maintaining a better balance with generation quality. The learned LM-Steer serves as a lens on text styles: it reveals that word embeddings are interpretable when associated with language model generations and can highlight text spans that most indicate the style differences. An LM-Steer is transferable between different language models by an explicit form calculation. One can also continuously steer LMs simply by scaling the LM-Steer or compose multiple LM-Steers by adding their transformations. Our code is publicly available at https://github.com/Glaciohound/LM-Steer.
@inproceedings{han-etal-2024-word, title = {Word Embeddings Are Steers for Language Models}, author = {Han, Chi and Xu, Jialiang and Li, Manling and Fung, Yi and Sun, Chenkai and Jiang, Nan and Abdelzaher, Tarek and Ji, Heng}, editor = {Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek}, booktitle = {Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)}, month = aug, year = {2024}, address = {Bangkok, Thailand}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2024.acl-long.864/}, doi = {10.18653/v1/2024.acl-long.864}, pages = {16410--16430}, selected = true, bibtex_show = true, }

ACL 2024

Findings

SPAGHETTI: Open-Domain Question Answering from Heterogeneous Data Sources with Retrieval and Semantic Parsing

Heidi C. Zhang, Sina J. Semnani, Farhad Ghassemi, Jialiang Xu, Shicheng Liu, and 1 more author

Aug 2024

We introduce SPAGHETTI: Semantic Parsing Augmented Generation for Hybrid English information from Text Tables and Infoboxes, a hybrid question-answering (QA) pipeline that utilizes information from heterogeneous knowledge sources, including knowledge bases, text, tables, and infoboxes. Our LLM-augmented approach achieves state-of-the-art performance on the CompMix dataset, the most comprehensive heterogeneous open-domain QA dataset, with a 56.5% exact match (EM) rate. More importantly, manual analysis on a sample of the dataset suggests that SPAGHETTI is more than 90% accurate, indicating that EM is no longer suitable for assessing the capabilities of QA systems today.

@misc{zhang2024spaghetti,
  title = {SPAGHETTI: Open-Domain Question Answering from Heterogeneous Data Sources with Retrieval and Semantic Parsing},
  author = {Zhang, Heidi C. and Semnani, Sina J. and Ghassemi, Farhad and Xu, Jialiang and Liu, Shicheng and Lam, Monica S.},
  year = {2024},
  eprint = {2406.00562},
  archiveprefix = {arXiv},
  primaryclass = {cs.CL},
  selected = false,
  bibtex_show = true,
}

NAACL 2024
Findings
SUQL: Conversational Search over Structured and Unstructured Data with Large Language Models

Shicheng Liu, Jialiang Xu, Wesley Tjangnaka, Sina Semnani, Chen Jie Yu, and 1 more author

Aug 2024

Many knowledge sources consist of both structured information, such as relational databases, and unstructured free text. Building a conversational interface to such data sources is challenging. This paper introduces SUQL, Structured and Unstructured Query Language, the first formal executable representation that naturally covers compositions of structured and unstructured data queries. Specifically, it augments SQL with several free-text primitives to form a precise, succinct, and expressive representation. This paper also presents a conversational search agent based on large language models, including a few-shot contextual semantic parser for SUQL. To validate our approach, we introduce a dataset consisting of crowdsourced questions and conversations about real restaurants. Over 51% of the questions in the dataset require both structured and unstructured data, suggesting that it is a common phenomenon. We show that our few-shot conversational agent based on SUQL finds an entity satisfying all user requirements 89.3% of the time, compared to just 65.0% for a strong and commonly used baseline.
@misc{liu2024suql, title = {{SUQL}: Conversational Search over Structured and Unstructured Data with Large Language Models}, author = {Liu, Shicheng and Xu, Jialiang and Tjangnaka, Wesley and Semnani, Sina and Yu, Chen Jie and Lam, Monica}, year = {2024}, url = {https://openreview.net/forum?id=DoSQeeVlUO}, selected = false, bibtex_show = true, }

2023

ACL 2023
Findings
AnaMeta: A Table Understanding Dataset of Field Metadata Knowledge Shared by Multi-dimensional Data Analysis Tasks

Xinyi He, Mengyu Zhou, Mingjie Zhou, Jialiang Xu, Xiao Lv, and 5 more authors

Jul 2023

Tabular data analysis is performed every day across various domains. It requires an accurate understanding of field semantics to correctly operate on table fields and find common patterns in daily analysis. In this paper, we introduce the AnaMeta dataset, a collection of 467k tables with derived supervision labels for four types of commonly used field metadata: measure/dimension dichotomy, common field roles, semantic field type, and default aggregation function. We evaluate a wide range of models for inferring metadata as the benchmark. We also propose a multi-encoder framework, called KDF, which improves the metadata understanding capability of tabular models by incorporating distribution and knowledge information. Furthermore, we propose four interfaces for incorporating field metadata into downstream analysis tasks.
@inproceedings{he-etal-2023-anameta, title = {{A}na{M}eta: A Table Understanding Dataset of Field Metadata Knowledge Shared by Multi-dimensional Data Analysis Tasks}, author = {He, Xinyi and Zhou, Mengyu and Zhou, Mingjie and Xu, Jialiang and Lv, Xiao and Li, Tianle and Shao, Yijia and Han, Shi and Yuan, Zejian and Zhang, Dongmei}, editor = {Rogers, Anna and Boyd-Graber, Jordan and Okazaki, Naoaki}, booktitle = {Findings of the Association for Computational Linguistics: ACL 2023}, month = jul, year = {2023}, address = {Toronto, Canada}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2023.findings-acl.604}, doi = {10.18653/v1/2023.findings-acl.604}, pages = {9471--9492}, bibtex_show = true, }
ArXiv
InfoPattern: Unveiling Information Propagation Patterns in Social Media

Chi Han, Jialiang Xu, Manling Li, Hanning Zhang, Tarek Abdelzaher, and 1 more author

Jul 2023

Social media play a significant role in shaping public opinion and influencing ideological communities through information propagation. Our demo InfoPattern centers on the interplay between language and human ideology. The demo is capable of: (1) red teaming to simulate adversary responses from opposite ideology communities; (2) stance detection to identify the underlying political sentiments in each message; (3) information propagation graph discovery to reveal the evolution of claims across various communities over time. Code and a live demo are available online.
@misc{han2023infopattern, title = {InfoPattern: Unveiling Information Propagation Patterns in Social Media}, author = {Han, Chi and Xu, Jialiang and Li, Manling and Zhang, Hanning and Abdelzaher, Tarek and Ji, Heng}, year = {2023}, eprint = {2311.15642}, archiveprefix = {arXiv}, primaryclass = {cs.SI}, selected = false, bibtex_show = true, }

2022

EMNLP 2022
Towards Robust Numerical Question Answering: Diagnosing Numerical Capabilities of NLP Systems

Jialiang Xu, Mengyu Zhou, Xinyi He, Shi Han, and Dongmei Zhang

In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Dec 2022

Numerical Question Answering is the task of answering questions that require numerical capabilities. Previous works introduce general adversarial attacks to Numerical Question Answering, while not systematically exploring numerical capabilities specific to the topic. In this paper, we propose to conduct numerical capability diagnosis on a series of Numerical Question Answering systems and datasets. A series of numerical capabilities are highlighted, and corresponding dataset perturbations are designed. Empirical results indicate that existing systems are severely challenged by these perturbations. For example, Graph2Tree experienced a 53.83% absolute accuracy drop against the “Extra” perturbation on ASDiv-a, and BART experienced a 13.80% accuracy drop against the “Language” perturbation on the numerical subset of DROP. As a counteracting approach, we also investigate the effectiveness of applying perturbations as data augmentation to relieve systems’ lack of robust numerical capabilities. With experiment analysis and empirical studies, it is demonstrated that Numerical Question Answering with robust numerical capabilities is still to a large extent an open question. We discuss future directions of Numerical Question Answering and summarize guidelines on future dataset collection and system design.
@inproceedings{xu-etal-2022-towards-robust, title = {Towards Robust Numerical Question Answering: Diagnosing Numerical Capabilities of {NLP} Systems}, author = {Xu, Jialiang and Zhou, Mengyu and He, Xinyi and Han, Shi and Zhang, Dongmei}, editor = {Goldberg, Yoav and Kozareva, Zornitsa and Zhang, Yue}, booktitle = {Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing}, month = dec, year = {2022}, address = {Abu Dhabi, United Arab Emirates}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2022.emnlp-main.542/}, doi = {10.18653/v1/2022.emnlp-main.542}, pages = {7950--7966}, selected = true, bibtex_show = true, }
ArXiv
LUNA: Language Understanding with Number Augmentations on Transformers via Number Plugins and Pre-training

Hongwei Han^*, Jialiang Xu^*, Mengyu Zhou, Yijia Shao, Shi Han, and 1 more author

Dec 2022

Transformers are widely used in NLP tasks. However, current approaches to leveraging transformers to understand language expose one weak spot: number understanding. In some scenarios, numbers frequently occur, especially in semi-structured data like tables. But current approaches to rich-number tasks with transformer-based language models abandon or lose some of the numeracy information (e.g., by breaking numbers into sub-word tokens), which leads to many number-related errors. In this paper, we propose the LUNA framework, which improves the numerical reasoning and calculation capabilities of transformer-based language models. With the number plugin of NumTok and NumBed, LUNA represents each number as a whole in the model input. With number pre-training, including regression loss and model distillation, LUNA bridges the gap between number and vocabulary embeddings. To the best of our knowledge, this is the first work that explicitly injects numeracy capability into language models using Number Plugins. Besides evaluating toy models on toy tasks, we evaluate LUNA on three large-scale transformer models (RoBERTa, BERT, TabBERT) over three different downstream tasks (TAT-QA, TabFact, CrediTrans), and observe that LUNA consistently improves the performance of the language models. The augmented models also improve the official baseline of TAT-QA (EM: 50.15 -> 59.58) and achieve SOTA performance on CrediTrans (F1 = 86.17).
@misc{https://doi.org/10.48550/arxiv.2212.02691, doi = {10.48550/ARXIV.2212.02691}, url = {https://arxiv.org/abs/2212.02691}, author = {Han, Hongwei and Xu, Jialiang and Zhou, Mengyu and Shao, Yijia and Han, Shi and Zhang, Dongmei}, keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences}, title = {LUNA: Language Understanding with Number Augmentations on Transformers via Number Plugins and Pre-training}, publisher = {arXiv}, year = {2022}, copyright = {arXiv.org perpetual, non-exclusive license}, selected = false, bibtex_show = true, }